Signal Before Noise — Multilingual Publishing Platform
Production multilingual publishing platform for a news publication. Next.js 16 App Router with a 4-locale content model (Persian, English, Arabic, Turkish) in a single MongoDB document, RS256 asymmetric JWT auth, TipTap v3 editor with per-locale tabs and table support, Recharts analytics dashboard with bot suppression, tiered backup/restore system, and a hardened VPS deployment behind Cloudflare Full Strict SSL.
The Problem
Signal Before Noise is a multilingual news publication: Persian is primary, with select articles translated into English, Arabic, and Turkish. The platform shipped from zero to production in 9 days, then iterated to v1.51 post-launch. The core technical problem was data architecture: how do you store a multilingual article so that the editorial workflow stays coherent, the data model stays consistent, and RTL/LTR layout switches correctly per locale, without each locale becoming a separate record that has to be created, published, and kept in sync on its own?
Most i18n implementations take one of two approaches. The first is separate documents per locale — one article becomes four, one per language. This creates a synchronization problem: the four documents can drift independently, and you have to decide what it means when Persian is published but English is still draft. The second is a translations join table — normalized, clean, but every article fetch requires a join, and the editorial interface has to reassemble the article from normalized parts every time.
I used a third approach: a single MongoDB document with four nested translation slots. Persian is required; English, Arabic, and Turkish are optional. An article is always atomic — all translations travel with it in one document. If a locale slot is empty, that locale URL 404s. There is no half-published article visible to readers.
The trade-off is that the document gets wide for fully-translated articles and the update path requires knowing which locale slot you're writing into. Both are manageable. The atomicity guarantee was the deciding factor.
The full schema spans 8 Mongoose models: Article, Note, Media, Subscriber, SitePage, BackupConfig, ViewLog, and ViewSnapshot. The Article and Note models use the nested translation slot approach described above. The remaining models support the analytics, backup, subscriber, and media systems added post-launch.
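As a rough illustration of the nested translation slot idea, the sketch below models the document shape in plain TypeScript rather than Mongoose. The field names (slug, title, body), the locale codes, and both helper functions are assumptions made for the example, not the project's actual schema.

```typescript
// Hypothetical shapes: field names and locale codes are illustrative only.
type Locale = "fa" | "en" | "ar" | "tr";

interface Translation {
  title: string;
  body: string;
}

interface Article {
  slug: string;
  // Persian is required; the other three slots may be absent.
  translations: { fa: Translation } & Partial<Record<Exclude<Locale, "fa">, Translation>>;
}

// Returns the translation for a locale, or null when the slot is empty.
// A route handler would turn null into a 404.
function resolveTranslation(article: Article, locale: Locale): Translation | null {
  return article.translations[locale] ?? null;
}

// Builds a MongoDB-style $set document that touches only one locale slot,
// which is the "know which slot you are writing into" cost of this model.
function buildLocaleUpdate(
  locale: Locale,
  patch: Partial<Translation>
): { $set: Record<string, string> } {
  const $set: Record<string, string> = {};
  for (const [field, value] of Object.entries(patch)) {
    if (value !== undefined) {
      $set[`translations.${locale}.${field}`] = value;
    }
  }
  return { $set };
}
```

With this shape, an update to the Arabic title becomes a single dotted-path write (translations.ar.title) against one document, and the other three slots are left untouched.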
Routing and RTL/LTR Switching
Next.js App Router [lang] dynamic routing reads the locale from the URL segment and selects the corresponding translation slot. The layout reads the lang param and sets dir="rtl" for Persian and Arabic, dir="ltr" for English and Turkish. Tailwind's rtl: modifier handles the directional variants in components.
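The locale-to-direction rule is small enough to sketch directly. The locale codes (fa, en, ar, tr) and the function names here are assumptions; the source only states which languages are RTL and which are LTR.

```typescript
// Assumed ISO 639-1 codes for the four locales named in the text.
type Locale = "fa" | "en" | "ar" | "tr";

const LOCALES: readonly Locale[] = ["fa", "en", "ar", "tr"];
const RTL_LOCALES: ReadonlySet<Locale> = new Set<Locale>(["fa", "ar"]);

// Narrows an arbitrary URL segment to a supported locale.
function isLocale(value: string): value is Locale {
  return (LOCALES as readonly string[]).includes(value);
}

// Persian and Arabic render right-to-left; English and Turkish left-to-right.
function dirFor(locale: Locale): "rtl" | "ltr" {
  return RTL_LOCALES.has(locale) ? "rtl" : "ltr";
}
```

A layout could call isLocale on the [lang] segment, return a 404 for anything else, and pass dirFor(lang) to the dir attribute on the root element.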
In practice this works cleanly for layout — padding, margins, flexbox direction, text alignment. The harder part was the editor.
The TipTap Editor
The editor is TipTap v3 with four editor instances, one per locale, rendered as tabs. Switching tabs doesn't clear state — all four editors hold their content in memory simultaneously. This keeps the editing experience fluid: a writer can toggle between the Persian draft and the English draft without waiting for a fetch.
TipTap's default configuration assumes LTR. Cursor positioning, paragraph direction, and keyboard behavior all break when you put RTL text into an unconfigured editor. I added an explicit TextDirection extension that sets paragraph direction based on the active locale tab. The Persian and Arabic editors get dir="rtl" on the editor element itself. This covers the main case — writing in the primary language of that locale slot — but it does not handle a writer pasting English text into a Persian tab, where the cursor will misbehave on punctuation. That is a known limitation.
I also added explicit text alignment controls and a table extension with contextual insert and edit controls — the editorial workflow required structured content layouts and inline data tables without leaving the editor.
Beyond the standard image and link extensions, the client needed audio article support. Some articles have recorded audio versions. Files upload to the server with UUID filename generation, MIME-type and extension validation, and a 10MB size limit. On the published article view, audio playback uses wavesurfer.js waveform rendering rather than a plain <audio> element — the waveform gives readers a visual sense of the audio length and structure before playing.
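The upload checks described above could look roughly like the following. The allow-list of formats, the reason strings, and the function name are assumptions; the source specifies only UUID filenames, MIME-type and extension validation, and the 10MB limit.

```typescript
import { randomUUID } from "node:crypto";

const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // 10MB limit from the text

// Assumed allow-list: the source does not say which formats are accepted.
const ALLOWED_AUDIO: Record<string, string[]> = {
  ".mp3": ["audio/mpeg"],
  ".wav": ["audio/wav", "audio/x-wav"],
  ".ogg": ["audio/ogg"],
};

interface UploadCandidate {
  originalName: string;
  mimeType: string;
  sizeBytes: number;
}

type UploadResult =
  | { ok: true; storedName: string }
  | { ok: false; reason: "too_large" | "bad_extension" | "mime_mismatch" };

function validateAudioUpload(file: UploadCandidate): UploadResult {
  if (file.sizeBytes > MAX_AUDIO_BYTES) {
    return { ok: false, reason: "too_large" };
  }
  const dot = file.originalName.lastIndexOf(".");
  const ext = dot >= 0 ? file.originalName.slice(dot).toLowerCase() : "";
  const allowedMimes = ALLOWED_AUDIO[ext];
  if (!allowedMimes) {
    return { ok: false, reason: "bad_extension" };
  }
  // The declared MIME type must agree with the extension.
  if (!allowedMimes.includes(file.mimeType)) {
    return { ok: false, reason: "mime_mismatch" };
  }
  // The client-supplied name is discarded; only the validated extension survives.
  return { ok: true, storedName: `${randomUUID()}${ext}` };
}
```

Generating the stored name from a UUID means a user-controlled filename never reaches the filesystem, which sidesteps path traversal and name collisions in one step.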
Subscriber System
The subscriber system handles email signups as a public endpoint, which is why the rate limiting and deduplication checks matter more than they might on an authenticated route. Each signup goes through: rate limiting at 5 requests per 10 minutes per IP, duplicate email detection returning 409 rather than a generic error, and HTML-tag stripping before storage to prevent XSS via subscriber data. The editorial dashboard shows a paginated list of subscribers with delete controls — the full loop from acquisition to management without touching the database directly.
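The signup pipeline can be sketched with in-memory stand-ins for the database. Everything here is illustrative: the class name, the 201, 400, and 429 status codes, the email pattern, and the tag-stripping regex are assumptions. Only the 5-per-10-minutes limit, the 409 on duplicates, and the tag stripping come from the description above.

```typescript
const SIGNUP_WINDOW_MS = 10 * 60 * 1000; // 10 minutes
const SIGNUP_MAX_REQUESTS = 5;

// Removes anything that looks like an HTML tag before the value is stored.
function stripHtmlTags(input: string): string {
  return input.replace(/<[^>]*>/g, "").trim();
}

class SignupService {
  private hitsByIp = new Map<string, number[]>(); // request timestamps per IP
  private emails = new Set<string>(); // stand-in for the Subscriber collection

  signup(ip: string, rawEmail: string, now: number = Date.now()): { status: 201 | 400 | 409 | 429 } {
    // Rate limit first, so rejected requests never reach the lookup.
    const recent = (this.hitsByIp.get(ip) ?? []).filter((t) => now - t < SIGNUP_WINDOW_MS);
    if (recent.length >= SIGNUP_MAX_REQUESTS) {
      this.hitsByIp.set(ip, recent);
      return { status: 429 };
    }
    recent.push(now);
    this.hitsByIp.set(ip, recent);

    const email = stripHtmlTags(rawEmail).toLowerCase();
    if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
      return { status: 400 };
    }
    if (this.emails.has(email)) {
      return { status: 409 }; // duplicate, reported explicitly
    }
    this.emails.add(email);
    return { status: 201 };
  }
}
```

Returning 409 for duplicates lets the signup form show a specific "already subscribed" message instead of a generic failure.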
Analytics Dashboard
The analytics dashboard surfaces article performance via Recharts area and bar charts. The 30-day cumulative trend chart shows total view growth over the past month. The 24-hour breakdown chart shows hourly distribution in Tehran time (IRST, UTC+3:30) — the primary audience's timezone, not server time. Top content ranking appears below the charts, showing all-time and 24-hour views side by side.
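Bucketing views by Tehran hour rather than server hour can be sketched as below. This version assumes a fixed UTC+3:30 offset with no daylight saving adjustment, and the function names are hypothetical.

```typescript
// Fixed offset for Iran Standard Time (UTC+3:30); assumes no DST.
const TEHRAN_OFFSET_MINUTES = 3 * 60 + 30;

// Hour of day (0-23) in Tehran for a given instant.
function tehranHour(instant: Date): number {
  const shifted = new Date(instant.getTime() + TEHRAN_OFFSET_MINUTES * 60000);
  // Reading the UTC hour of the shifted instant yields the Tehran wall-clock hour.
  return shifted.getUTCHours();
}

// Counts views into 24 hourly buckets in Tehran time.
function hourlyBreakdown(viewTimes: Date[]): number[] {
  const buckets = new Array<number>(24).fill(0);
  for (const t of viewTimes) {
    const h = tehranHour(t);
    buckets[h] = (buckets[h] ?? 0) + 1;
  }
  return buckets;
}
```

For example, a view at 20:45 UTC lands in bucket 0, because it is 00:15 the next day in Tehran.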
The harder problem was bot suppression. Naive view counting produces numbers that are largely meaningless on a new publication — crawlers, monitoring services, and scrapers can account for a significant portion of early traffic. I implemented three layers: a client-side JavaScript gate (bots that don't execute JS don't count), a User-Agent regex filter on the server, and IP + slug deduplication via a ViewLog TTL collection with a 24-hour expiry. A bot that does execute JS but hits the same article from the same IP twice within 24 hours is counted once.
The 24-hour expiry is a TTL index on the ViewLog collection. MongoDB prunes expired documents automatically, which keeps the collection from growing unboundedly on a VPS with constrained disk.
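The three layers compose into one decision per incoming view. The sketch below uses an in-memory Map as a stand-in for the ViewLog TTL collection; the User-Agent pattern, the fromClientScript flag (which models the client-side JavaScript gate), and the class name are all assumptions for illustration.

```typescript
// Illustrative pattern only; a production filter would be tuned against real traffic.
const BOT_UA = /bot|crawler|spider|headless|monitor|curl|wget/i;
const DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000;

interface ViewEvent {
  ip: string;
  slug: string;
  userAgent: string;
  fromClientScript: boolean; // true only when the beacon was sent by page JavaScript
}

class ViewCounter {
  private lastSeen = new Map<string, number>(); // stand-in for the ViewLog collection
  private counts = new Map<string, number>();

  // Returns true when the view was counted.
  record(view: ViewEvent, now: number = Date.now()): boolean {
    if (!view.fromClientScript) return false; // layer 1: JS gate
    if (BOT_UA.test(view.userAgent)) return false; // layer 2: User-Agent filter
    const key = `${view.ip}|${view.slug}`;
    const last = this.lastSeen.get(key);
    if (last !== undefined && now - last < DEDUP_WINDOW_MS) return false; // layer 3: dedup
    this.lastSeen.set(key, now);
    this.counts.set(view.slug, (this.counts.get(view.slug) ?? 0) + 1);
    return true;
  }

  viewsFor(slug: string): number {
    return this.counts.get(slug) ?? 0;
  }
}
```

In the real system the deduplication window is enforced by document expiry rather than a timestamp comparison, but the counting outcome is the same.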
Backup and Restore
The backup system uses mongodump to produce .tar.gz archives with three retention tiers: daily (last 3 kept), weekly (last 1 kept), monthly (last 1 kept). Manual backups are never auto-deleted. Cron scheduling is gated by a separate BACKUP_CRON_TOKEN to prevent unauthorized HTTP triggers. A .backup.lock file prevents concurrent backup runs from racing each other.
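The retention rule reduces to "sort each tier newest first, keep the first N, delete the rest, never touch manual backups". A minimal sketch, with hypothetical type and function names:

```typescript
type Tier = "daily" | "weekly" | "monthly" | "manual";

interface Backup {
  file: string;
  tier: Tier;
  createdAt: number; // epoch milliseconds
}

// Retention counts from the text; manual backups are deliberately absent.
const KEEP: Record<Exclude<Tier, "manual">, number> = {
  daily: 3,
  weekly: 1,
  monthly: 1,
};

// Returns the backups that fall outside their tier's retention count.
function selectForDeletion(backups: Backup[]): Backup[] {
  const doomed: Backup[] = [];
  for (const tier of Object.keys(KEEP) as Array<keyof typeof KEEP>) {
    const newestFirst = backups
      .filter((b) => b.tier === tier)
      .sort((a, b) => b.createdAt - a.createdAt);
    doomed.push(...newestFirst.slice(KEEP[tier]));
  }
  return doomed;
}
```

Because manual backups never match a key in KEEP, they cannot end up in the deletion list regardless of how many exist.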
Restore accepts either a stored backup or an uploaded archive, so the system works for both routine recovery and server migration.
The choice of mongodump over Atlas continuous backup or a managed service was deliberate: the deployment is a single Ubuntu VPS without MongoDB Atlas, and the retention policy fit the publication's risk tolerance. The trade-off is that backups are point-in-time snapshots rather than continuous, so a crash between runs loses every change made since the last backup.
Authentication
The auth requirements were specific: RS256 asymmetric key signing via jose, httpOnly Secure SameSite=Strict cookies, bcrypt at cost 12, and rate-limited login (5 attempts per 15 minutes).
RS256 uses an asymmetric keypair — a private key to sign, a public key to verify. The practical difference from HS256 in a single-service application is small. The architectural reason to prefer it: any service that can verify tokens with HS256 can also issue them, because the secret is shared. With RS256, you can publish the public key and downstream services can verify tokens without being able to sign new ones. The client was explicit about RS256; the separation was also the correct default for a system that might eventually need external token verification.
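The sign/verify asymmetry can be demonstrated without any JWT library. The sketch below uses Node's built-in crypto module instead of jose, purely to stay dependency-free; it is not the project's auth code, it skips claims validation such as expiry, and the function names are hypothetical.

```typescript
import { createSign, createVerify, generateKeyPairSync, KeyObject } from "node:crypto";

// Base64url encoding as used by the JWT compact serialization.
function b64url(input: Buffer | string): string {
  const buf = typeof input === "string" ? Buffer.from(input, "utf8") : input;
  return buf.toString("base64url");
}

// Signing requires the private key.
function signRs256(payload: Record<string, unknown>, signingKey: KeyObject): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = createSign("RSA-SHA256").update(`${header}.${body}`).sign(signingKey);
  return `${header}.${body}.${b64url(signature)}`;
}

// Verification needs only the public key. Signature check only: no exp/aud validation.
function verifyRs256(token: string, verifyingKey: KeyObject): boolean {
  const [header, body, signature] = token.split(".");
  if (!header || !body || !signature) return false;
  return createVerify("RSA-SHA256")
    .update(`${header}.${body}`)
    .verify(verifyingKey, Buffer.from(signature, "base64url"));
}

// Demo keypair generated in-process; a real deployment would load keys from secure storage.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
```

A downstream service holding only publicKey can call verifyRs256 but has nothing it could pass to signRs256, which is the separation HS256 cannot offer.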
The rate limiter uses MongoDB TTL indexes on a loginAttempts collection rather than Redis. Each failed attempt creates a document with a 15-minute TTL. When a login arrives, the count of documents for that email in the TTL window determines whether the attempt is allowed. This avoids adding Redis to a deployment that was already running on a single VPS. The trade-off is slightly higher latency on the login path compared to a Redis counter. For 5 attempts per 15 minutes, that latency is not meaningful.
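The counting logic can be sketched with an array standing in for the loginAttempts collection. The names below are hypothetical. One detail worth modelling: MongoDB's TTL monitor removes expired documents in a background pass (roughly once a minute), so this sketch filters by createdAt explicitly rather than trusting that every remaining document is still inside the window.

```typescript
const LOGIN_WINDOW_MS = 15 * 60 * 1000; // 15 minutes
const MAX_FAILED_ATTEMPTS = 5;

// Shape of the TTL index a driver would be asked to create (illustrative field name).
const TTL_INDEX = {
  key: { createdAt: 1 },
  options: { expireAfterSeconds: LOGIN_WINDOW_MS / 1000 },
};

interface LoginAttempt {
  email: string;
  createdAt: number; // epoch milliseconds
}

class LoginLimiter {
  private attempts: LoginAttempt[] = []; // stand-in for the loginAttempts collection

  // Equivalent to counting documents where email matches and createdAt > cutoff.
  isAllowed(email: string, now: number): boolean {
    const cutoff = now - LOGIN_WINDOW_MS;
    const recent = this.attempts.filter((a) => a.email === email && a.createdAt > cutoff);
    return recent.length < MAX_FAILED_ATTEMPTS;
  }

  recordFailure(email: string, now: number): void {
    this.attempts.push({ email, createdAt: now });
  }
}
```

With the explicit cutoff, the TTL index becomes a cleanup mechanism only, and the limit stays exact even when expired documents linger briefly.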
Deployment and Security
The platform runs on an Ubuntu VPS with PM2 for process management, Nginx as reverse proxy with security headers, and Cloudflare Full Strict SSL using Origin Certificates. Full Strict means both legs of the connection are encrypted and Cloudflare validates the certificate the origin presents: visitor to Cloudflare, then Cloudflare to origin using the Origin Certificate. In Flexible mode, by contrast, the Cloudflare-to-origin leg travels over plain HTTP.
Beyond SSL, the server is hardened with: UFW with minimal open ports, Fail2ban with 4 jails (SSH, Nginx HTTP auth, Nginx bad bots, and a custom jail for repeated API 4xx), auditd for system call auditing, rkhunter for rootkit detection, kernel sysctl hardening, and a dedicated unprivileged app user running the PM2 process.
This is more extensive than the hardening on other projects I've deployed. The reason is threat model. A news publication that publishes in Persian and Arabic operates in a different risk environment than a satellite company website or an e-commerce store — politically adjacent content is a more common target for defacement and credential theft. The hardening reflects that, not a general philosophy of over-engineering security.
Winston structured logging writes to MongoDB with a 30-day TTL. The TTL means the logs collection self-prunes without a maintenance job. Log records include request path, status code, response time, user ID if authenticated, and IP address.
What Shipped
The platform shipped to production in 9 days and iterated to v1.51 post-launch. It is live at signalb4noise.com. Articles publish in up to four locales with automatic RTL/LTR layout switching. Audio articles play inline with waveform display. Editorial staff manage all content through the TipTap editor without developer involvement. The post-launch iteration added an analytics dashboard with bot-suppressed view tracking, a backup/restore system with three retention tiers, and a subscriber management system with rate limiting and deduplication.
What I'd Change
Holding all four editors' content in memory is convenient in the happy path but creates a risk: a browser crash or accidental navigation loses all four tabs simultaneously. The right answer is autosave per locale, debounced, writing drafts to the server. I built the editor before I built autosave, which meant drafts were client-side only in the first weeks of production use. Editorial staff were warned to save manually. That is not a sustainable answer; autosave should have been the first thing built, not an afterthought.
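A per-locale debounced autosave could be sketched as follows. This is a hypothetical design, not code from the project: the 2-second default delay, the Scheduler interface (injected so the timing can be tested without real timers), and all names are assumptions.

```typescript
type Locale = "fa" | "en" | "ar" | "tr";
type SaveFn = (locale: Locale, content: string) => void;

// Timer abstraction so tests can substitute a fake clock.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createAutosaver(save: SaveFn, delayMs: number = 2000, timers: Scheduler = realTimers) {
  // One pending timer per locale, so typing in one tab never delays another tab's save.
  const pending = new Map<Locale, unknown>();
  return {
    onChange(locale: Locale, content: string): void {
      if (pending.has(locale)) {
        timers.clear(pending.get(locale)); // restart the quiet period for this locale
      }
      const handle = timers.set(() => {
        pending.delete(locale);
        save(locale, content); // in practice, a request that writes the draft to the server
      }, delayMs);
      pending.set(locale, handle);
    },
  };
}
```

Each editor's update callback would call onChange with its own locale, so a burst of keystrokes in the Persian tab collapses into one draft write while an idle English tab triggers none.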
The Fail2ban jail tuning also took longer than I expected. Getting the regex patterns right for the Nginx log format, testing them against real traffic, calibrating thresholds — I did this directly against production rather than against a staging environment with replayed traffic. The first version of the API jail threshold was too aggressive and briefly affected legitimate traffic. The correct process is: replay production logs against candidate patterns in a staging environment, tune thresholds there, then deploy. Doing it on production is faster initially and slower overall.