Signal Before Noise — Multilingual Publishing Platform

Production multilingual publishing platform for a news publication. Next.js 16 App Router with a 4-locale content model (Persian, English, Arabic, Turkish) in a single MongoDB document, RS256 asymmetric JWT auth, TipTap v3 editor with per-locale tabs and table support, Recharts analytics dashboard with bot suppression, tiered backup/restore system, and a hardened VPS deployment behind Cloudflare Full Strict SSL.

Locales: 4
JWT signing: RS256
Fail2ban jails: 4
Log retention: 30 days
MongoDB models: 9
Deployed in: 9 days

Next.js 16, TypeScript, Tailwind CSS v4, MongoDB, Mongoose, TipTap v3, Recharts, jose (RS256), wavesurfer.js, Cloudflare, PM2 + Nginx

The Problem

Signal Before Noise is a multilingual news publication — Persian primary, with select articles translated into English, Arabic, and Turkish. The platform shipped from zero to production in 9 days, then iterated to v1.61 post-launch. The core technical problem was data architecture: how do you store a multilingual article so that the editorial workflow stays coherent, the data model stays consistent, and RTL/LTR layout switches correctly per locale without treating each locale as a separate thing to manage?

Most i18n implementations take one of two approaches. The first is separate documents per locale — one article becomes four, one per language. This creates a synchronization problem: the four documents can drift independently, and you have to decide what it means when Persian is published but English is still draft. The second is a translations join table — normalized, clean, but every article fetch requires a join, and the editorial interface has to reassemble the article from normalized parts every time.

I used a third approach: a single MongoDB document with four nested translation slots. Persian is required; English, Arabic, and Turkish are optional. An article is always atomic — all translations travel with it in one document. If a locale slot is empty, that locale URL 404s. There is no half-published article visible to readers.
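A minimal sketch of the single-document shape, written as plain TypeScript types rather than the actual Mongoose schema. The field names (slug, title, body) and the resolveTranslation helper are assumptions for illustration; only the "Persian required, three optional slots, empty slot means 404" rule comes from the description above.

```typescript
// Hypothetical shapes: field names are illustrative, not the production schema.
type Locale = "fa" | "en" | "ar" | "tr";

interface Translation {
  title: string;
  body: string;
}

interface Translations {
  fa: Translation;  // Persian is required
  en?: Translation; // the other three slots are optional
  ar?: Translation;
  tr?: Translation;
}

interface Article {
  slug: string;
  translations: Translations;
}

// Returns the requested slot, or null when it is empty.
// A route handler would turn null into a 404 for that locale URL.
function resolveTranslation(article: Article, locale: Locale): Translation | null {
  return article.translations[locale] ?? null;
}

const article: Article = {
  slug: "example-article",
  translations: {
    fa: { title: "عنوان", body: "متن" },
    en: { title: "Title", body: "Body" },
  },
};
```

Here resolveTranslation(article, "tr") yields null because the Turkish slot was never filled, while the Persian and English slots resolve from the same document.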

The trade-off is that the document gets wide for fully-translated articles and the update path requires knowing which locale slot you're writing into. Both are manageable. The atomicity guarantee was the deciding factor.

The full schema spans 9 Mongoose models: Article, Note, Media, Subscriber, SitePage, BackupConfig, ViewLog, ViewSnapshot, and Series. The Article and Note models use the nested translation slot approach described above. The remaining models support the analytics, backup, subscriber, media, and article series systems added post-launch.

Routing and RTL/LTR Switching

Next.js App Router [lang] dynamic routing reads the locale from the URL segment and selects the corresponding translation slot. The layout reads the lang param and sets dir="rtl" for Persian and Arabic, dir="ltr" for English and Turkish. Tailwind's rtl: modifier handles the directional variants in components.
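The direction switch described above comes down to a small lookup. This is a sketch under the assumption that the locale codes are "fa", "en", "ar", and "tr"; the function name dirForLocale is hypothetical.

```typescript
// Locales that render right-to-left. The codes are assumed, not taken from the codebase.
const RTL_LOCALES: ReadonlySet<string> = new Set(["fa", "ar"]);

// Maps the [lang] URL segment to the value for the dir attribute.
// Unknown locales fall back to "ltr" in this sketch.
function dirForLocale(lang: string): "rtl" | "ltr" {
  return RTL_LOCALES.has(lang) ? "rtl" : "ltr";
}
```

A layout would pass the result to the dir attribute of its root element, which is what lets Tailwind's rtl: variants take effect.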

In practice this works cleanly for layout — padding, margins, flexbox direction, text alignment. The harder part was the editor.

The TipTap Editor

The editor is TipTap v3 with four editor instances, one per locale, rendered as tabs. Switching tabs doesn't clear state — all four editors hold their content in memory simultaneously. This keeps the editing experience fluid: a writer can toggle between the Persian draft and the English draft without waiting for a fetch.

TipTap's default configuration assumes LTR. Cursor positioning, paragraph direction, and keyboard behavior all break when you put RTL text into an unconfigured editor. I added an explicit TextDirection extension that sets paragraph direction based on the active locale tab. The Persian and Arabic editors get dir="rtl" on the editor element itself. This covers the main case — writing in the primary language of that locale slot — but it does not handle a writer pasting English text into a Persian tab, where the cursor will misbehave on punctuation. That is a known limitation.

I also added explicit text alignment controls and a table extension with contextual insert and edit controls — the editorial workflow required structured content layouts and inline data tables without leaving the editor.

Beyond the standard image and link extensions, the client needed audio article support. Some articles have recorded audio versions. Files upload to the server with UUID filename generation, MIME-type and extension validation, and a 10MB size limit. On the published article view, audio playback uses wavesurfer.js waveform rendering rather than a plain <audio> element — the waveform gives readers a visual sense of the audio length and structure before playing.

Subscriber System

The subscriber system handles email signups as a public endpoint, which is why the rate limiting and deduplication checks matter more than they might on an authenticated route. Each signup goes through: rate limiting at 5 requests per 10 minutes per IP, duplicate email detection returning 409 rather than a generic error, and HTML-tag stripping before storage to prevent XSS via subscriber data. The editorial dashboard shows a paginated list of subscribers with delete controls — the full loop from acquisition to management without touching the database directly.
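The signup checks can be sketched as one function that runs them in order. This version keeps state in memory instead of MongoDB so it stays self-contained; the class name, the 201/400/429 status codes, and the tag-stripping regex are assumptions. Only the 5-per-10-minutes limit, the 409 on duplicates, and the tag stripping come from the description above.

```typescript
const SIGNUP_WINDOW_MS = 10 * 60 * 1000; // 10 minutes
const SIGNUP_MAX_REQUESTS = 5;

type SignupStatus = 201 | 400 | 409 | 429;

// Removes anything that looks like an HTML tag. A simple illustrative filter.
function stripTags(input: string): string {
  return input.replace(/<[^>]*>/g, "");
}

// In-memory stand-in for the real storage layer (hypothetical).
class SignupService {
  private hits = new Map<string, number[]>(); // ip -> timestamps of counted requests
  private emails = new Set<string>();

  signup(ip: string, rawEmail: string, now: number): SignupStatus {
    // 1. Rate limit: at most 5 requests per IP inside the sliding window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => now - t < SIGNUP_WINDOW_MS);
    if (recent.length >= SIGNUP_MAX_REQUESTS) {
      this.hits.set(ip, recent);
      return 429;
    }
    recent.push(now);
    this.hits.set(ip, recent);

    // 2. Sanitize before anything is stored.
    const email = stripTags(rawEmail).trim().toLowerCase();
    if (!email.includes("@")) return 400;

    // 3. Duplicate detection returns a specific 409, not a generic error.
    if (this.emails.has(email)) return 409;
    this.emails.add(email);
    return 201;
  }
}
```

In this sketch a rejected request does not extend the rate-limit window; whether it should is a policy decision the write-up does not specify.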

Analytics Dashboard

The analytics dashboard surfaces article performance via Recharts area and bar charts. The 30-day cumulative trend chart shows total view growth over the past month. The 24-hour breakdown chart shows hourly distribution in Tehran time (IRST, UTC+3:30) — the primary audience's timezone, not server time. Top content ranking appears below the charts, showing all-time and 24-hour views side by side.
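Bucketing by Tehran hour rather than server hour can be sketched as below. It assumes the fixed UTC+3:30 offset stated above and hypothetical function names; a production version might prefer a time zone database lookup over a hard-coded offset.

```typescript
// Fixed offset assumed from the text: IRST is UTC+3:30.
const TEHRAN_OFFSET_MINUTES = 3 * 60 + 30;

// Returns the hour of day (0-23) in Tehran for a given instant.
function tehranHour(instant: Date): number {
  const shifted = new Date(instant.getTime() + TEHRAN_OFFSET_MINUTES * 60_000);
  // Reading the UTC hour of the shifted instant avoids the server's local zone entirely.
  return shifted.getUTCHours();
}

// Counts views per Tehran hour. Index 0 is 00:00-00:59 Tehran time.
function hourlyBreakdown(views: Date[]): number[] {
  const buckets: number[] = new Array<number>(24).fill(0);
  for (const view of views) {
    const hour = tehranHour(view);
    buckets[hour] = (buckets[hour] ?? 0) + 1;
  }
  return buckets;
}
```

For example, a view at 20:45 UTC falls into bucket 0, because it is already 00:15 the next day in Tehran.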

Per-locale view tracking records which locale each view arrived from, so the editorial team can see which locales are gaining traction independently. A locale breakdown dashboard displays locale-by-locale totals, and locale-filtered chart views let editors switch the 30-day trend to show only Persian views, or only English: the same data sliced by audience. This was a post-launch addition that turned out to be the most-used part of the analytics panel. Knowing that Arabic was getting a tenth of Persian's traffic is directly actionable for editorial resource allocation.

The harder problem was bot suppression. Naive view counting produces numbers that are largely meaningless on a new publication — crawlers, monitoring services, and scrapers can account for a significant portion of early traffic. I implemented three layers: a client-side JavaScript gate (bots that don't execute JS don't count), a User-Agent regex filter on the server, and atomic IP + slug + locale deduplication via a ViewLog TTL collection with a 24-hour expiry.
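The server-side User-Agent layer might look like the following. The pattern is purely illustrative, since the production regex is not shown here, and treating a missing User-Agent as a bot is an assumption.

```typescript
// Illustrative pattern only. Real deployments usually maintain a longer list.
const BOT_UA_PATTERN = /bot|crawler|spider|headless|curl|wget|monitor|uptime/i;

// Returns true when the request should not be counted as a reader view.
function isLikelyBot(userAgent: string | null | undefined): boolean {
  // Assumption: a request with no User-Agent header is treated as automated.
  if (!userAgent) return true;
  return BOT_UA_PATTERN.test(userAgent);
}
```

This layer only catches clients that identify themselves honestly, which is why it sits between the JavaScript gate and the deduplication step rather than replacing either.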

The deduplication had a race condition. Two requests arriving simultaneously for the same IP + slug + locale combination would both pass the existence check before either write completed, causing a double-count. The fix was treating the ViewLog write as an insert rather than an upsert, which lets the compound unique index arbitrate. The first insert succeeds; the second fails with a duplicate-key error, so that request is rejected cleanly, with no double-count and no count lost. Those duplicate-key errors were also the proof that the race existed at all.
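The insert-only pattern can be sketched with an in-memory stand-in for the ViewLog collection. The stand-in class, the recordView function, and the counter map are hypothetical; the error code 11000 is the one MongoDB uses for duplicate-key errors. The real driver calls are asynchronous, but the control flow is the same.

```typescript
interface ViewKey {
  ip: string;
  slug: string;
  locale: string;
}

// Stand-in for a collection with a compound unique index on (ip, slug, locale).
class ViewLogStandIn {
  private keys = new Set<string>();

  insertOne(doc: ViewKey): void {
    const key = `${doc.ip}|${doc.slug}|${doc.locale}`;
    if (this.keys.has(key)) {
      // Mimics the duplicate-key error shape (code 11000).
      throw Object.assign(new Error("E11000 duplicate key"), { code: 11000 });
    }
    this.keys.add(key);
  }
}

const viewCounts = new Map<string, number>(); // "slug|locale" -> counted views

// Returns true when the view was counted, false when it was a duplicate.
function recordView(log: ViewLogStandIn, view: ViewKey): boolean {
  try {
    // No separate existence check: the insert itself is the check.
    log.insertOne(view);
  } catch (err) {
    if ((err as { code?: number }).code === 11000) return false;
    throw err; // anything else is a real failure
  }
  const counterKey = `${view.slug}|${view.locale}`;
  viewCounts.set(counterKey, (viewCounts.get(counterKey) ?? 0) + 1);
  return true;
}
```

Because the uniqueness test and the write are a single operation, there is no window between "check" and "write" for a second request to slip through.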

The 24-hour expiry is a TTL index on the ViewLog collection. MongoDB prunes expired documents automatically, which keeps the collection from growing unboundedly on a VPS with constrained disk.

Umami Analytics

The analytics dashboard covers editorial-facing metrics: views per article, per locale, per time window. Umami covers the complementary question: what are readers actually doing on the public site — which pages do they visit, where do they come from, how long do they stay.

Umami runs self-hosted on the same VPS. It is injected on public pages only. For logged-in dashboard users, a localStorage flag disables the Umami script tag — editor page views do not pollute the public reader metrics. The two analytics systems answer different questions and address different audiences (editorial staff vs. the platform operator), so keeping them separate was the right call rather than trying to build one system that serves both.

Backup and Restore

The backup system uses mongodump to produce .tar.gz archives with three retention tiers: daily (last 3 kept), weekly (last 1 kept), monthly (last 1 kept). Manual backups are never auto-deleted. Cron scheduling runs app-internally via Next.js instrumentation.ts — no external crontab or separate process required. A .backup.lock file prevents concurrent backup runs from racing each other.
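The retention rule can be sketched as a pure function that decides which archives to prune. The Backup shape and function name are assumptions; the keep counts (3 daily, 1 weekly, 1 monthly, manual never deleted) come from the description above.

```typescript
type Tier = "daily" | "weekly" | "monthly" | "manual";

interface Backup {
  file: string;
  tier: Tier;
  createdAt: number; // epoch milliseconds
}

// How many of the newest backups each automatic tier keeps.
const KEEP: Record<"daily" | "weekly" | "monthly", number> = {
  daily: 3,
  weekly: 1,
  monthly: 1,
};

// Returns the backups that fall outside their tier's retention window.
// Manual backups are never returned, so they are never auto-deleted.
function backupsToDelete(backups: Backup[]): Backup[] {
  const doomed: Backup[] = [];
  for (const tier of ["daily", "weekly", "monthly"] as const) {
    const newestFirst = backups
      .filter((b) => b.tier === tier) // filter copies, so the input is not reordered
      .sort((a, b) => b.createdAt - a.createdAt);
    doomed.push(...newestFirst.slice(KEEP[tier]));
  }
  return doomed;
}
```

Keeping the decision separate from the file deletion makes the policy easy to test without touching the disk.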

Restore accepts either a stored backup or an uploaded archive, so the system works for both routine recovery and server migration.

Google Drive offsite sync was added post-launch. Each backup can be pushed to the client's Google Drive automatically after it runs — OAuth2 credentials are stored in the application environment, and an auto-upload toggle in the panel enables or disables it without a deploy. The sync panel shows each backup with a status badge (local only, Drive only, or synced), and individual backups can be pushed to Drive, pulled back, or deleted from Drive without touching the local copy. This gives the client a point-in-time offsite copy without any additional infrastructure — no S3 bucket, no managed backup service.

The choice of mongodump over Atlas continuous backup or a managed service was deliberate: the deployment is a single Ubuntu VPS without MongoDB Atlas, and the retention policy fit the publication's risk tolerance. The trade-off is that backups are point-in-time snapshots rather than continuous, so a crash between runs loses every change made since the last backup.

Authentication

The auth requirements were specific: RS256 asymmetric key signing via jose, httpOnly Secure SameSite=Strict cookies, bcrypt at cost 12, and rate-limited login (5 attempts per 15 minutes).

RS256 uses an asymmetric keypair — a private key to sign, a public key to verify. The practical difference from HS256 in a single-service application is small. The architectural reason to prefer it: any service that can verify tokens with HS256 can also issue them, because the secret is shared. With RS256, you can publish the public key and downstream services can verify tokens without being able to sign new ones. The client was explicit about RS256; the separation was also the correct default for a system that might eventually need external token verification.
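The sign/verify asymmetry can be shown with Node's built-in crypto module. This sketch deliberately does not use jose, to stay dependency-free; it generates a throwaway keypair in memory, and the function names are hypothetical. It checks only the signature, not expiry or other claims, so it is not a substitute for a JWT library.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Throwaway keypair for the example. Real keys would be loaded from secure storage.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

const toBase64Url = (text: string): string => Buffer.from(text).toString("base64url");

// Signing requires the private key.
function signToken(payload: Record<string, unknown>): string {
  const header = toBase64Url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const body = toBase64Url(JSON.stringify(payload));
  const signingInput = `${header}.${body}`;
  // RSA with SHA-256 and the default PKCS#1 v1.5 padding is what RS256 denotes.
  const signature = sign("RSA-SHA256", Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Verification needs only the public key, so a verifier cannot mint tokens.
function verifyToken(token: string): boolean {
  const [header, body, signature] = token.split(".");
  if (!header || !body || !signature) return false;
  return verify(
    "RSA-SHA256",
    Buffer.from(`${header}.${body}`),
    publicKey,
    Buffer.from(signature, "base64url"),
  );
}
```

A downstream service holding only publicKey could run verifyToken but has nothing it could pass to signToken, which is the separation HS256 cannot offer.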

The rate limiter uses MongoDB TTL indexes on a loginAttempts collection rather than Redis. Each failed attempt creates a document with a 15-minute TTL. When a login arrives, the count of documents for that email in the TTL window determines whether the attempt is allowed. This avoids adding Redis to a deployment that was already running on a single VPS. The trade-off is slightly higher latency on the login path compared to a Redis counter. For 5 attempts per 15 minutes, that latency is not meaningful.
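A sketch of the counting logic, with an in-memory array standing in for the loginAttempts collection. The class and method names are hypothetical. One detail is worth making explicit: MongoDB removes TTL-expired documents in a periodic background pass rather than at the exact expiry instant, so this sketch filters by timestamp as well instead of relying on expired documents having already disappeared. Whether the production query does the same is an assumption.

```typescript
const LOGIN_WINDOW_MS = 15 * 60 * 1000; // 15 minutes
const LOGIN_MAX_ATTEMPTS = 5;

interface FailedAttempt {
  email: string;
  at: number; // epoch milliseconds
}

// In-memory stand-in for the loginAttempts collection.
class LoginAttempts {
  private docs: FailedAttempt[] = [];

  recordFailure(email: string, now: number): void {
    this.docs.push({ email, at: now });
  }

  // Counts only failures inside the window, even if older documents still exist.
  isAllowed(email: string, now: number): boolean {
    const recent = this.docs.filter(
      (d) => d.email === email && now - d.at < LOGIN_WINDOW_MS,
    );
    return recent.length < LOGIN_MAX_ATTEMPTS;
  }
}
```

With the explicit time filter, the TTL index is purely a cleanup mechanism and the limit stays exact regardless of when the background deletion runs.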

Deployment and Security

The platform runs on an Ubuntu VPS with PM2 for process management, Nginx as a reverse proxy with security headers, and Cloudflare Full Strict SSL using Origin Certificates. Full Strict means both legs of the connection are encrypted, browser to Cloudflare and Cloudflare to origin, and Cloudflare validates the certificate the origin presents. In Flexible mode, by contrast, the leg from Cloudflare to the origin is unencrypted.

Beyond SSL, the server is hardened with: UFW with minimal open ports, Fail2ban with 4 jails (SSH, Nginx HTTP auth, Nginx bad bots, and a custom jail for repeated API 4xx), auditd for system call auditing, rkhunter for rootkit detection, kernel sysctl hardening, and a dedicated unprivileged app user running the PM2 process.

This is more extensive than the hardening on other projects I've deployed. The reason is threat model. A news publication that publishes in Persian and Arabic operates in a different risk environment than a satellite company website or an e-commerce store — politically adjacent content is a more common target for defacement and credential theft. The hardening reflects that, not a general philosophy of over-engineering security.

Winston structured logging writes to MongoDB with a 30-day TTL. The TTL means the logs collection self-prunes without a maintenance job. Log records include request path, status code, response time, user ID if authenticated, and IP address.

Article Series System

Post-launch, the client needed a way to group related articles into ordered series — a sequence like "Understanding Conflict in the Middle East, Part 1–5" that readers can navigate linearly.

I added a Series Mongoose model with a multilingual slug and per-locale title and description. Each article can belong to one series with an ordered part number. The uniqueness constraint is at the (series, partNumber) level with a compound index — attempting to assign two articles the same position in a series returns a 409, and the write is rejected atomically rather than creating a data integrity problem.
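The position constraint can be sketched with a map keyed on (series, partNumber), standing in for the compound unique index. The class name, the Assignment shape, and the 200 success code are assumptions; the 409 on conflict comes from the description above.

```typescript
interface Assignment {
  articleId: string;
  seriesId: string;
  partNumber: number;
}

// Stand-in for a compound unique index on (series, partNumber).
class SeriesParts {
  private byPosition = new Map<string, string>(); // "seriesId:partNumber" -> articleId

  assign(a: Assignment): 200 | 409 {
    const key = `${a.seriesId}:${a.partNumber}`;
    const holder = this.byPosition.get(key);
    // A different article already holds this position: reject, change nothing.
    if (holder !== undefined && holder !== a.articleId) return 409;
    this.byPosition.set(key, a.articleId);
    return 200;
  }
}
```

Re-saving the same article at its existing position succeeds in this sketch, so an ordinary edit does not trip the conflict path.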

The public-facing SeriesPanel component renders an animated accordion using Framer Motion. It collapses by default and expands to show the full part list with the current article highlighted. Reduced-motion support is handled by checking prefers-reduced-motion and disabling the animation — the panel expands without transition on devices where the user has requested reduced motion.

The dashboard editor auto-fills the next available part number when a writer selects a series. This is a small UX detail that turns out to matter in practice: without it, the writer has to mentally track which part numbers are taken and pick the next one, which creates numbering errors on series with more than a few parts.
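The auto-fill reduces to a small function. This sketch assumes "next available" means the smallest unused positive part number, so a gap left by a removed article is filled first; the production rule may instead be "highest plus one".

```typescript
// Returns the smallest positive part number not yet taken in the series.
function nextPartNumber(taken: number[]): number {
  const used = new Set(taken);
  let candidate = 1;
  while (used.has(candidate)) candidate += 1;
  return candidate;
}
```

For a series with parts 1, 2, and 3 assigned this suggests 4, and for a series with parts 1 and 3 it suggests 2.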

What Shipped

The platform shipped to production in 9 days and iterated to v1.61 post-launch. It is live at signalb4noise.com. Articles publish in up to four locales with automatic RTL/LTR layout switching. Audio articles play inline with waveform display. Editorial staff manage all content through the TipTap editor without developer involvement. The post-launch iterations added: an analytics dashboard with per-locale view tracking, locale breakdown dashboard, and locale-filtered chart views with layered bot suppression; a tiered backup/restore system with Google Drive offsite sync; a subscriber management system with rate limiting and deduplication; an article series system for grouping related articles into ordered, navigable collections; Umami self-hosted analytics for public reader tracking; and a featured article hero system for homepage editorial highlighting.

Featured Article Hero

The homepage originally displayed articles in a flat chronological list. As the publication grew, the client wanted a way to surface one article prominently — a featured slot at the top of the page with a larger visual treatment and editorial framing separate from the regular feed.

The featured article hero system adds a featured boolean field to the Article schema. Only one article can be featured at a time; marking a new article as featured automatically clears the previous one at the database level rather than requiring the editor to manually unfeature the old article first. The dashboard shows which article is currently featured and lets editors reassign it in one click. On the public homepage, the featured article renders above the article list with a full-width header and a distinct layout that makes the editorial priority visible to readers without requiring any structural page changes.
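The single-featured invariant can be sketched as a pure function over article flags. The names are hypothetical, and an in-memory array replaces the database; in MongoDB the equivalent would need the clear and the set to happen together, and how the production code achieves that is not described here.

```typescript
interface ArticleFlag {
  id: string;
  featured: boolean;
}

// Marks one article as featured and clears the flag everywhere else.
// Unknown ids leave the current featured article untouched.
function setFeatured(articles: ArticleFlag[], id: string): ArticleFlag[] {
  if (!articles.some((a) => a.id === id)) return articles;
  return articles.map((a) => ({ ...a, featured: a.id === id }));
}
```

Because every flag is recomputed from the target id, the result can never contain two featured articles, whatever state the input was in.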

What I'd Change

Keeping all four editor instances' content in memory is convenient in the happy path but creates a risk: a browser crash or accidental navigation loses all four tabs simultaneously. The right answer is autosave per locale, debounced, writing drafts to the server. I built the editor before I built autosave, which meant drafts were client-side only in the first weeks of production use. Editorial staff were warned to save manually. That is not a sustainable answer, and autosave should have been the first thing built, not an afterthought.

The Fail2ban jail tuning also took longer than I expected. Getting the regex patterns right for the Nginx log format, testing them against real traffic, calibrating thresholds — I did this directly against production rather than against a staging environment with replayed traffic. The first version of the API jail threshold was too aggressive and briefly affected legitimate traffic. The correct process is: replay production logs against candidate patterns in a staging environment, tune thresholds there, then deploy. Doing it on production is faster initially and slower overall.