
Technical Decisions

This page is an ADR-lite — short, opinionated notes on the load-bearing choices that shape the codebase. Each entry follows the same shape: Decision, Context, Rationale, Trade-offs.

Wire for dependency injection

Decision. Use Google's Wire to assemble the backend's object graph at compile time. The provider set lives in backend/internal/wiring/providers.go; InitializeApp is generated and called once from main.go.

Context. A typical request touches a handler, a service, one or more repositories, the Redis client, the WebSocket Hub, the Asynq client, and configuration. Hand-wiring this in main becomes unreadable; a runtime container hides errors until startup.

Rationale. Wire validates the entire graph at go generate time. Missing or ambiguous dependencies surface as compile errors, with zero runtime overhead. The output is plain Go that a human can read.

Trade-offs. Requires running go generate after provider changes. The generated file is committed and reviewed like any other code. Constructor signatures become the public contract — refactors ripple, which is by design.
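Wire's output is ordinary Go, so the generated InitializeApp reads like a hand-written chain of constructor calls. The sketch below writes that shape out by hand for a small, hypothetical slice of the graph. Config, UserRepository, UserService, UserHandler, and App are illustrative names, not the actual providers in providers.go. The comment shows the wire.Build injector stub that Wire expands into code of this shape.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical providers standing in for the real ones in providers.go.
type Config struct{ DSN string }
type UserRepository struct{ dsn string }
type UserService struct{ repo *UserRepository }
type UserHandler struct{ svc *UserService }
type App struct{ Handler *UserHandler }

func NewUserRepository(cfg Config) (*UserRepository, error) {
	if cfg.DSN == "" {
		return nil, errors.New("empty DSN")
	}
	return &UserRepository{dsn: cfg.DSN}, nil
}

func NewUserService(repo *UserRepository) *UserService { return &UserService{repo: repo} }
func NewUserHandler(svc *UserService) *UserHandler   { return &UserHandler{svc: svc} }
func NewApp(h *UserHandler) *App                     { return &App{Handler: h} }

// The injector declaration Wire reads lives in a file tagged
// //go:build wireinject and looks like:
//
//	func InitializeApp(cfg Config) (*App, error) {
//		wire.Build(NewUserRepository, NewUserService, NewUserHandler, NewApp)
//		return nil, nil
//	}
//
// The generated version is plain code like the function below: each provider
// is called once, in dependency order, and errors short-circuit.
func InitializeApp(cfg Config) (*App, error) {
	repo, err := NewUserRepository(cfg)
	if err != nil {
		return nil, err
	}
	svc := NewUserService(repo)
	handler := NewUserHandler(svc)
	return NewApp(handler), nil
}

func main() {
	app, err := InitializeApp(Config{DSN: "postgres://localhost/app"})
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Println(app.Handler.svc.repo.dsn)
}
```

Because each constructor's parameters are its dependencies, adding a parameter to a constructor such as NewUserService is exactly the kind of signature change Wire reports at generation time if no provider supplies it.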

GORM + Postgres + PostGIS

Decision. Persist all durable state in a single PostGIS-enabled Postgres database, accessed through GORM with explicit repository interfaces in backend/internal/repository/.

Context. The product is geospatial (events, moments, locations) and relational (users, friendships, participants). A single ACID store keeps reasoning simple while leaving room to scale reads.

Rationale. PostGIS handles point geometry and proximity queries natively. GORM gives type-safe model declarations and migrations during development; the repository layer keeps service code free of ORM coupling. db.AutoMigrate runs only outside production — production migrations are deliberate.

Trade-offs. GORM's reflection-heavy query path can be slower than pgx for the hottest endpoints; repositories give us a place to drop down to raw SQL when needed. Logic is concentrated in services rather than the database — there are no triggers carrying business meaning.

Asynq for background work

Decision. Use Asynq on Redis for all background processing. Three weighted queues, critical (6), default (3), and low (1), are configured in backend/internal/worker/server.go.

Context. Email sends, webhook fan-out, cleanup sweeps, status transitions, and notification dispatch must not block HTTP handlers, and must survive process restart.

Rationale. Asynq gives us retries with exponential backoff, dead-letter queues, weighted priorities, scheduled jobs, and a usable web UI — all on top of Redis we already operate. The scheduler (backend/internal/scheduler/manager.go) enqueues recurring jobs (status updates every 5 min, message expiry every 30 s, daily purges, etc.) so the worker is the single execution point.

Trade-offs. Couples background processing to Redis availability. Tasks must be idempotent — Asynq guarantees at-least-once. Stateful long-running work (>~30 s) belongs elsewhere; Asynq is for short, retryable units.
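The numbers in parentheses are relative weights, not strict priorities. With weights 6, 3, and 1, Asynq processes critical roughly 60% of the time, default 30%, and low 10%. In Asynq this is configured as asynq.Config{Queues: map[string]int{"critical": 6, "default": 3, "low": 1}}. The sketch below illustrates only the proportion, by mapping a draw in [0, total) to a queue name. It is not Asynq's internal scheduling code, and pickQueue and totalWeight are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math/rand"
)

type weightedQueue struct {
	name   string
	weight int
}

// queues mirrors the weights described above.
var queues = []weightedQueue{{"critical", 6}, {"default", 3}, {"low", 1}}

// totalWeight sums the weights of all queues.
func totalWeight(qs []weightedQueue) int {
	t := 0
	for _, q := range qs {
		t += q.weight
	}
	return t
}

// pickQueue maps a draw in [0, total) onto a queue. With the weights above,
// draws 0-5 pick critical, 6-8 default, and 9 low, so each queue's share of
// draws equals weight/total.
func pickQueue(qs []weightedQueue, draw int) string {
	for _, q := range qs {
		if draw < q.weight {
			return q.name
		}
		draw -= q.weight
	}
	panic("draw out of range")
}

func main() {
	counts := map[string]int{}
	total := totalWeight(queues)
	for i := 0; i < 10000; i++ {
		counts[pickQueue(queues, rand.Intn(total))]++
	}
	fmt.Println(counts) // roughly 6000 / 3000 / 1000
}
```

Because the weighting is proportional rather than strict, low-priority work still makes progress while critical is busy.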

WebSocket Hub with Redis pub/sub fanout

Decision. Each pod runs an in-process WebSocket Hub (map[eventId]→Room) plus a Redis-backed PSUBSCRIBE on chat:event:* for cross-pod fanout. See backend/internal/websocket/hub.go.

Context. Chat in Tomoda is per-event: each event has a single room, and the cardinality of concurrent rooms is moderate. The backend now runs with multiple replicas (see Horizontal scaling).

Rationale. Each WS connection terminates on whichever pod the client reached, so some per-pod state is unavoidable. Redis pub/sub is the cheapest way to bridge those pods: one PUBLISH per broadcast, one PSUBSCRIBE per pod, no ingress session affinity, and no separate broker to operate. Each wire envelope carries the originating pod's ID, so a pod can drop the echo of its own broadcast when it arrives back through its own subscription.

Trade-offs. Best-effort delivery (at-most-once) across pods — Redis pub/sub doesn't persist or retry. We accept this because every chat message is also persisted via ChatService.SendMessage, so a missed pub/sub frame is recoverable on reconnect. Every pod's subscriber receives every event's traffic (one cheap channel-name match per inbound message); if a single Redis instance ever becomes the bottleneck, the next step is to shard the channel namespace, not change the data model. See Real-time.

Earlier decision. Until 2026-05 the Hub was process-local with no cross-pod fanout, justified by the single-replica backend. That constraint was removed alongside the api/async deployment split; see SCALING_PLAN.md at the repo root.
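A minimal sketch of the fanout and echo suppression described in this section. Only the chat:event:* channel pattern comes from this page. The envelope fields, the hub shape, and the publish callback (a stand-in for a Redis PUBLISH) are assumptions for illustration, not the types in hub.go. The sending pod delivers to its local room, publishes once, and ignores its own message when it comes back through PSUBSCRIBE. Locking and real WebSocket writes are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const channelPrefix = "chat:event:"

// envelope is a hypothetical wire format for cross-pod broadcasts.
type envelope struct {
	OriginPod string          `json:"origin_pod"`
	EventID   string          `json:"event_id"`
	Payload   json.RawMessage `json:"payload"`
}

// hub is a simplified per-pod hub: eventID -> payloads delivered locally.
type hub struct {
	podID   string
	rooms   map[string][]string
	publish func(channel string, msg []byte) error // stand-in for Redis PUBLISH
}

func channelFor(eventID string) string { return channelPrefix + eventID }

func (h *hub) deliver(eventID string, payload []byte) {
	h.rooms[eventID] = append(h.rooms[eventID], string(payload))
}

// Broadcast delivers locally first, then publishes once for the other pods.
// payload must be valid JSON because it is embedded as a RawMessage.
func (h *hub) Broadcast(eventID string, payload []byte) error {
	h.deliver(eventID, payload)
	msg, err := json.Marshal(envelope{OriginPod: h.podID, EventID: eventID, Payload: payload})
	if err != nil {
		return err
	}
	return h.publish(channelFor(eventID), msg)
}

// HandleInbound is called for every message matched by PSUBSCRIBE chat:event:*.
// It reports whether the message was delivered to a local room.
func (h *hub) HandleInbound(channel string, msg []byte) (bool, error) {
	if !strings.HasPrefix(channel, channelPrefix) {
		return false, nil
	}
	var env envelope
	if err := json.Unmarshal(msg, &env); err != nil {
		return false, err
	}
	if env.OriginPod == h.podID {
		return false, nil // echo of our own broadcast: already delivered locally
	}
	h.deliver(strings.TrimPrefix(channel, channelPrefix), env.Payload)
	return true, nil
}

func main() {
	var pods []*hub
	// fakeRedis fans a PUBLISH out to every pod, including the sender.
	fakeRedis := func(channel string, msg []byte) error {
		for _, p := range pods {
			if _, err := p.HandleInbound(channel, msg); err != nil {
				return err
			}
		}
		return nil
	}
	for _, id := range []string{"pod-a", "pod-b"} {
		pods = append(pods, &hub{podID: id, rooms: map[string][]string{}, publish: fakeRedis})
	}
	if err := pods[0].Broadcast("42", []byte(`{"text":"hi"}`)); err != nil {
		fmt.Println("broadcast failed:", err)
		return
	}
	fmt.Println(len(pods[0].rooms["42"]), len(pods[1].rooms["42"])) // 1 1
}
```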

Single image, multiple modes

Decision. Build one backend image. Select what to start at runtime via --mode (or SERVER_MODE env): full, multi-hub, api-hub, ws-hub, or async. See backend/cmd/server/main.go.

Context. Once horizontal scaling was on the table we needed separate scaling profiles for the API surface (CPU/RPS) and the async worker pool (queue depth) — and didn't want to maintain two Dockerfiles or two CI pipelines for it.

Rationale. Modes are cheap: one flag, a few booleans, a handful of if guards around Hub.Run, WorkerServer.Start, SchedulerManager.Start, and route registration. /health is always served so k8s probes work in every mode, including async.

Trade-offs. The Wire-built dependency graph is still wired in full on every startup — unused components are constructed but never started. This costs a few hundred milliseconds of init and a slightly larger heap, in exchange for not splitting the binary or DI graph along mode lines. Reserved modes (api-hub, ws-hub) ship now even though no deployment uses them yet, so the day we split WS out we don't need a code change.
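A minimal sketch of the mode switch, assuming that --mode takes precedence over SERVER_MODE and that an unset value falls back to full. Both are assumptions, not documented behaviour. The component booleans are filled in only for full and async, the two modes whose roles this page describes. The async entry also assumes the scheduler runs alongside the worker. The sets for multi-hub, api-hub, and ws-hub are not spelled out here, so they are left unmapped rather than guessed. resolveMode, components, and componentsFor are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

var knownModes = map[string]bool{
	"full": true, "multi-hub": true, "api-hub": true, "ws-hub": true, "async": true,
}

// resolveMode reads --mode from args, then falls back to the SERVER_MODE value,
// then to "full". The precedence order is an assumption.
func resolveMode(args []string, envMode string) (string, error) {
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	mode := fs.String("mode", "", "server mode")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	m := *mode
	if m == "" {
		m = envMode
	}
	if m == "" {
		m = "full"
	}
	if !knownModes[m] {
		return "", fmt.Errorf("unknown mode %q", m)
	}
	return m, nil
}

// components are the guards main would check before starting each part.
type components struct {
	Health, Routes, Hub, Worker, Scheduler bool
}

// componentsFor maps only the modes described on this page; ok is false otherwise.
func componentsFor(mode string) (c components, ok bool) {
	switch mode {
	case "full":
		return components{Health: true, Routes: true, Hub: true, Worker: true, Scheduler: true}, true
	case "async":
		// /health is always served so probes work; no API routes or WS hub.
		return components{Health: true, Worker: true, Scheduler: true}, true
	}
	return components{}, false
}

func main() {
	mode, err := resolveMode(os.Args[1:], os.Getenv("SERVER_MODE"))
	if err != nil {
		fmt.Println(err)
		os.Exit(2)
	}
	c, ok := componentsFor(mode)
	fmt.Printf("mode=%s mapped=%v components=%+v\n", mode, ok, c)
}
```

The Go flag package accepts both -mode and --mode, so the documented --mode spelling works without extra parsing.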

H3 spatial indexing for discovery

Decision. Index every Location with an Uber H3 cell at resolution 12 (roughly 10 m average edge length) in addition to its PostGIS point. See backend/internal/models/location.go.

Context. Discovery needs two distinct operations: (1) dedupe — "is this user-supplied lat/lng close enough to an existing place?", and (2) clustering — "group nearby pins for the map view".

Rationale. Hashing by H3 cell lets both operations be plain string equality on an indexed column. PostGIS still handles arbitrary-radius searches, but the common case becomes an index lookup. H3's hierarchical IDs also make zoom-out clustering trivial: a cell's parent at any coarser resolution is derived from its ID alone by discarding the finer-resolution digits.

Trade-offs. Two indices to keep aligned. The fixed resolution is a global choice: coarser cells can be derived on the fly from the stored ID, but the indexed base cell size is committed.
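The zoom-out step is pure bit manipulation on the 64-bit H3 index. A parent at a coarser resolution keeps the base cell and the leading digits, rewrites the 4-bit resolution field (bits 52 to 55), and sets every finer 3-bit digit to 7, the "unused" marker. The sketch below implements this in plain Go to show the mechanics. For brevity it uses resolution-5 sample cells rather than resolution-12 ones. clusterByParent is a hypothetical helper. In practice H3's cellToParent does this step and validates its input, which this sketch does not.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	resShift  = 52                      // resolution field occupies bits 52-55
	resMask   = uint64(0xF) << resShift // mask for the resolution field
	digitBits = 3                       // each per-resolution digit is 3 bits
	maxRes    = 15
)

// cellResolution extracts the resolution encoded in an H3 cell index.
func cellResolution(h uint64) int { return int((h & resMask) >> resShift) }

// parentCell returns the ancestor of h at parentRes (no validity checks on h).
func parentCell(h uint64, parentRes int) (uint64, error) {
	res := cellResolution(h)
	if parentRes < 0 || parentRes > res {
		return 0, fmt.Errorf("parent resolution %d out of range [0, %d]", parentRes, res)
	}
	p := (h &^ resMask) | uint64(parentRes)<<resShift
	for r := parentRes + 1; r <= res; r++ {
		// Digit r lives at bit offset (15-r)*3; 7 (0b111) marks it unused.
		p |= uint64(7) << uint((maxRes-r)*digitBits)
	}
	return p, nil
}

// clusterByParent counts how many hex-encoded cells fall under each parent.
func clusterByParent(cells []string, parentRes int) (map[string]int, error) {
	counts := make(map[string]int)
	for _, s := range cells {
		h, err := strconv.ParseUint(s, 16, 64)
		if err != nil {
			return nil, err
		}
		p, err := parentCell(h, parentRes)
		if err != nil {
			return nil, err
		}
		counts[strconv.FormatUint(p, 16)]++
	}
	return counts, nil
}

func main() {
	counts, err := clusterByParent([]string{"85283473fffffff", "85283463fffffff"}, 4)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(counts) // map[8428347ffffffff:2]
}
```

Because the per-resolution digits are 3 bits wide, they do not line up with hex characters, so truncating the hex string is not the same as taking the parent. Coarsening has to go through the bit fields as above.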

Photon first, Google Places as fallback

Decision. Use self-hosted Photon (OSM-backed) as the primary geocoder, with Google Places available as an optional secondary source. The dev stack runs Photon on port 2322 (docker-compose.dev.yml).

Context. Geocoding and autocomplete are hit on nearly every interactive map screen. Google Places is excellent but billed per request, and OSM coverage and quality vary by region.

Rationale. Self-hosting Photon brings the marginal per-request cost close to zero and gives us latency control. In regions where OSM coverage is thin, we still fall through to Google. The split keeps the bill bounded while preserving result quality where it matters.

Trade-offs. Operational ownership: Photon must be deployed, refreshed, and monitored. OSM point-of-interest data is typically less complete and less current than Google's commercial dataset; we accept that on the autocomplete path.

JWT + refresh tokens

Decision. Authenticate stateless requests with a 24-hour JWT (HS256), paired with a server-stored refresh token mirrored into Redis. See Authentication.

Context. Clients are mobile-first and need to survive long offline windows. Session lookup on every request would push extra load onto Postgres or Redis at high QPS.

Rationale. A short-lived JWT lets the API authorise from the bearer alone, with no per-request DB hit on the happy path. The refresh token gives us revocation: deleting the row and its Redis key invalidates the chain on the next refresh. Sessions and login history still live in Postgres for user-facing controls.

Trade-offs. JWTs cannot be invalidated mid-lifetime — a compromised access token is valid until expiry. We mitigate by keeping expiry to 24 h, supporting forced refresh, and offering per-session revocation. The system is more complex than pure server sessions; we have judged the latency win worth it.

Expo Router (file-based, single codebase)

Decision. Ship iOS, Android, and the web from one Expo SDK 55 codebase with Expo Router for navigation. Routes live under frontend/app/.

Context. The product surface is identical across platforms — a small team cannot afford three navigators or three feature implementations.

Rationale. File-based routing matches the way the team mentally groups screens ((moments), (social), auth/, home/, etc.) and removes a class of imperative navigation bugs. React 19 + react-native-web gives us a usable web build of the same screens.

Trade-offs. A few platform-specific concerns (passkey UX, native sign-in flows, map providers) require Platform.OS branching. Some libraries lag Expo SDK releases.

Context-only state

Decision. Manage app-wide state with React Context. Nine providers live in frontend/contexts/ — Auth, Theme, Friends, Location, CreateEvent, PageHeader, Toast, Sheet, MapDock. No Redux, Zustand, or React Query.

Context. Most cross-screen state is small (auth user, current location, ephemeral UI), and React Query's caching layer was judged unnecessary for a small set of imperative fetches.

Rationale. Keeps the bundle small, the learning curve flat, and the data flow easy to grep. Providers are composed at the root layout (app/_layout.tsx).

Trade-offs. Re-render scope is the developer's responsibility — Contexts are memoised manually with useMemo and useCallback. There is no built-in cache for server data; each service handles its own fetch and refresh.

Fetch (no axios)

Decision. Use the platform fetch API directly, wrapped by handleResponse in frontend/utils/api.ts and by thin per-domain wrappers in frontend/services/*.ts.

Context. Adding axios introduces a non-trivial dependency, a parallel error model, and an interceptor system that ends up duplicating logic that React Native/web already provide.

Rationale. fetch is universal across iOS, Android, and the web. The shared handleResponse centralises the bits we actually need: 401 → session-expired event, JSON parsing, error-message extraction. Token caching lives in frontend/utils/tokenManager.ts.

Trade-offs. No request/response interceptors as a first-class concept; we hand-roll them. No axios-style cancellation tokens; we rely on AbortController where it matters.

YAML + env layering

Decision. Backend config is loaded from layered YAML — config.yaml (base) plus config.<env>.yaml (overrides) — with environment variables and GCP Secret Manager filling in secrets via scripts/pull-secrets.sh.

Context. A pure .env approach scales poorly when configuration grows nested (Stripe price IDs, S3 endpoints, WebAuthn RP settings). Mixed-shape config also makes typed loading awkward.

Rationale. YAML expresses nested config naturally and maps cleanly onto the Go config structs in backend/config/config.go. Environment-specific files keep production overrides reviewable. Secrets stay out of files entirely — they are injected at runtime from Secret Manager.

Trade-offs. Two layers (file + env) to mentally merge. Local developers need a config.local.yaml for ergonomics. We judge the added clarity worth the cost of one additional load path.
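The layering order (base file, then the environment file, then environment variables) is easiest to see on decoded maps. This sketch assumes each YAML file has already been unmarshalled into map[string]any by a YAML library. That step is not shown, so the block stays standard-library only. deepMerge, setPath, applyEnv, and the example mapping from STRIPE_SECRET_KEY to stripe.secret_key are hypothetical, not the loader in backend/config/config.go.

```go
package main

import (
	"fmt"
	"strings"
)

// deepMerge returns a new map: base overlaid with override. Nested maps merge
// key by key; any other override value replaces the base value. Nested maps
// from base are copied so later writes to the result never mutate base.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		if m, ok := v.(map[string]any); ok {
			out[k] = deepMerge(m, nil)
		} else {
			out[k] = v
		}
	}
	for k, ov := range override {
		if om, ok := ov.(map[string]any); ok {
			if bm, ok := out[k].(map[string]any); ok {
				out[k] = deepMerge(bm, om)
				continue
			}
		}
		out[k] = ov
	}
	return out
}

// setPath writes value at a dotted path such as "stripe.secret_key",
// creating intermediate maps as needed.
func setPath(cfg map[string]any, path string, value any) {
	keys := strings.Split(path, ".")
	m := cfg
	for _, k := range keys[:len(keys)-1] {
		next, ok := m[k].(map[string]any)
		if !ok {
			next = map[string]any{}
			m[k] = next
		}
		m = next
	}
	m[keys[len(keys)-1]] = value
}

// applyEnv overlays values found via lookup (for example os.LookupEnv) onto
// cfg, using an explicit env-var -> config-path mapping.
func applyEnv(cfg map[string]any, mapping map[string]string, lookup func(string) (string, bool)) {
	for env, path := range mapping {
		if v, ok := lookup(env); ok {
			setPath(cfg, path, v)
		}
	}
}

func main() {
	base := map[string]any{"server": map[string]any{"port": 8080, "host": "0.0.0.0"}}
	prod := map[string]any{"server": map[string]any{"port": 80}}
	cfg := deepMerge(base, prod)
	fake := map[string]string{"STRIPE_SECRET_KEY": "sk_test_dummy"}
	applyEnv(cfg, map[string]string{"STRIPE_SECRET_KEY": "stripe.secret_key"},
		func(k string) (string, bool) { v, ok := fake[k]; return v, ok })
	fmt.Println(cfg)
}
```

Because secrets only enter through applyEnv, they never need to appear in config.yaml or config.<env>.yaml.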