Tomoda

The Tomoda application (Go backend plus Expo/React frontend) is deployed as a Kustomize base with dev and prod overlays under k8s/apps/tomoda/. Argo CD reconciles the overlay output into the tomoda namespace (dev) or the prod namespace (prod) of the single GKE cluster.

For backend code, schemas, and migrations, see the Tomoda backend docs.

Topology

The backend is one image, two Deployments: an API pool that serves HTTP + WebSocket (tomoda-api, mode multi-hub) and an async pool that runs the Asynq worker plus the Redis-leader-elected scheduler (tomoda-async, mode async). The mode is selected at startup via the SERVER_MODE env var on each Deployment. See the tomoda repo's SCALING_PLAN.md and architecture decision for the rationale.

flowchart LR
    user[User] --> cf[Cloudflare DNS<br/>tomoda.life]
    cf --> lb[GCP LB<br/>asia-east1]
    lb --> traefik[Traefik<br/>traefik-system ns]
    traefik -->|api.* / api-dev.*| api[tomoda-api<br/>SERVER_MODE=multi-hub<br/>replicas 2–6]
    traefik -->|app.* / app-dev.* / www.*| frontend[frontend<br/>Deployment :8081]
    api -->|publish chat:event:*| redis[(Redis<br/>data ns)]
    redis -->|subscribe chat:event:*| api
    async[tomoda-async<br/>SERVER_MODE=async<br/>replicas 2–6] -->|enqueue / consume| redis
    async -->|leader lock<br/>scheduler:leader| redis
    api --> pg[(Postgres<br/>data ns)]
    async --> pg
    api --> photon[photon.data:2322]

NetworkPolicies restrict ingress on every pod to the traefik-system namespace (api / frontend) or to nothing at all (tomoda-async, which has no inbound traffic — it pulls from Redis). See Network policies.

Base (k8s/apps/tomoda/base/)

kustomization.yaml aggregates five manifests and declares the dev image refs (tomoda-backend:latest / tomoda-frontend:latest from tomoda-dev-repo). Overlays rewrite the image registry for prod.
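
As a rough illustration of what such a base might look like, here is a minimal sketch. The resource order, the REGION and PROJECT_ID placeholders, and the way the image names are mapped are assumptions; only the five file names and the repo and tag names come from this page.

```yaml
# k8s/apps/tomoda/base/kustomization.yaml (illustrative sketch, not the repo's actual file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - backend-api-deployment.yaml
  - backend-async-deployment.yaml
  - frontend-deployment.yaml
  - ingress.yaml
  - network-policy.yaml

images:
  # REGION and PROJECT_ID are placeholders; the real registry path is not stated here.
  - name: tomoda-backend
    newName: REGION-docker.pkg.dev/PROJECT_ID/tomoda-dev-repo/tomoda-backend
    newTag: latest
  - name: tomoda-frontend
    newName: REGION-docker.pkg.dev/PROJECT_ID/tomoda-dev-repo/tomoda-frontend
    newTag: latest
```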

backend-api-deployment.yaml

Deployment tomoda-api plus the ClusterIP Service backend-service.

  • Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
  • Mode: SERVER_MODE=multi-hub — HTTP API + WebSocket Hub. Cross-pod WS fanout is via Redis pub/sub on chat:event:* (no ingress session affinity required).
  • Probes: /health for liveness (15s initial, 30s period) and readiness (10s initial, 10s period). The binary always serves /health regardless of mode.
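  • Resources: requests 64Mi / 10m, limits 512Mi / 500m (the workload is I/O-bound). Revisit these once steady-state utilisation is known; the HPA's utilisation targets are computed relative to the requests, so a low CPU request makes the CPU target easy to exceed.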

The Service stays named backend-service so the existing Ingress and any intra-cluster callers keep working unchanged. Its selector is app: tomoda-api (async pods don't serve traffic).
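
The settings above could be expressed roughly as follows. This is a sketch assembled from the values listed in this section, not a copy of the real manifest; the container name and the bare image reference are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tomoda-api
  template:
    metadata:
      labels:
        app: tomoda-api
    spec:
      containers:
        - name: backend                 # assumed container name
          image: tomoda-backend:latest  # assumed reference; Kustomize rewrites the registry
          ports:
            - containerPort: 8080
          env:
            - name: SERVER_MODE
              value: multi-hub
          livenessProbe:
            httpGet: { path: /health, port: 8080 }
            initialDelaySeconds: 15
            periodSeconds: 30
          readinessProbe:
            httpGet: { path: /health, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 10
          resources:
            requests: { memory: 64Mi, cpu: 10m }
            limits: { memory: 512Mi, cpu: 500m }
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service   # name kept so existing callers keep working
spec:
  type: ClusterIP
  selector:
    app: tomoda-api       # async pods are intentionally not selected
  ports:
    - port: 8080
      targetPort: 8080
```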

backend-async-deployment.yaml

Deployment tomoda-async — no Service.

  • Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
  • Mode: SERVER_MODE=async (Asynq worker + scheduler). The scheduler is leader-elected via Redis SetNX on scheduler:leader (10s TTL), so running multiple replicas is safe: one replica holds the lock and dispatches cron tasks, while all replicas consume the ready queue.
  • Probes: same /health endpoint (the binary always serves it).
  • Resources: requests 128Mi / 10m, limits 768Mi / 500m — more memory headroom than the api pool because purge/cleanup tasks can spike.

Both Deployments share the same configuration sources:

  1. backend-secrets (envFrom.secretRef) — overlay-renamed to backend-secrets-dev or backend-secrets-prod. Synced from GCP Secret Manager via ExternalSecrets. Contains JWT, encryption, DB and Redis passwords, OAuth client secrets (Google / LINE / Apple), Stripe keys, email API key, KLIPY API key.
  2. s3-uploader-secret (envFrom.secretRef) — synced from AWS Secrets Manager. Holds the AWS access key, bucket name, region, and base URL for user-uploaded media.
  3. backend-config-<env> (added by the overlay patch as a configMapRef) — non-secret runtime config: DB host/port/user/name, Redis host/port, WebAuthn RP ID, frontend URL, ENV.

GIN_MODE=debug (overlay flips to release for prod) and PHOTON_URL=http://photon.data.svc.cluster.local:2322 are set inline.
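
Put together, the container's configuration block in the base might look like the fragment below. This is a sketch; the ordering of the inline env entries is an assumption, while the position of backend-secrets at envFrom index 0 follows from the overlay patches described later.

```yaml
# Fragment of spec.template.spec.containers[0] in either backend Deployment (sketch)
envFrom:
  - secretRef:
      name: backend-secrets      # index 0; overlays rename this to backend-secrets-<env>
  - secretRef:
      name: s3-uploader-secret
  # The overlay appends a third entry at build time:
  # - configMapRef:
  #     name: backend-config-<env>
env:
  - name: SERVER_MODE
    value: multi-hub             # "async" on the tomoda-async Deployment
  - name: GIN_MODE
    value: debug                 # the prod overlay sets this to "release"
  - name: PHOTON_URL
    value: http://photon.data.svc.cluster.local:2322
```

Because everything arrives through envFrom and env, values are read only when a container starts, which is why configuration changes need a pod restart to take effect.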

frontend-deployment.yaml

Single-replica Deployment serving the Expo web bundle on port 8081. EXPO_PUBLIC_API_URL is baked into the base as https://api-dev.tomoda.life; the production build uses a different image (rebuilt against https://api.tomoda.life) rather than overriding the env var at runtime. Liveness and readiness both probe /.

ingress.yaml

A single Traefik-class Ingress with two rules in the base: api-dev.tomoda.life routes to backend-service:8080, and app-dev.tomoda.life routes to frontend-service:8081. Certificates are issued by cert-manager via the letsencrypt-prod ClusterIssuer and stored in the tomoda-app-tls Secret, which the Ingress references for TLS.
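
A sketch of an Ingress with this shape is shown below. The metadata name, the use of ingressClassName rather than a class annotation, and the path settings are assumptions; hosts, backends, the issuer, and the TLS Secret name are taken from the description above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomoda-ingress   # hypothetical name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik   # assumed class name
  tls:
    - hosts:
        - api-dev.tomoda.life
        - app-dev.tomoda.life
      secretName: tomoda-app-tls
  rules:
    - host: api-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 8080
    - host: app-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8081
```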

network-policy.yaml

Three NetworkPolicies:

  • tomoda-api-policy — ingress from traefik-system only, port 8080.
  • tomoda-async-policy — no ingress rules (default-deny). Async pods initiate connections to Redis and Postgres; nothing dials them.
  • frontend-policy — ingress from traefik-system only, port 8081.
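
For illustration, the first two policies could be written along these lines. This is a sketch: the app: tomoda-async pod label and the namespace label used in the selector are assumptions (kubernetes.io/metadata.name is set automatically on namespaces by Kubernetes, but the real policy may select on a different label).

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-api-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-async-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-async   # assumed label
  policyTypes:
    - Ingress
  # No ingress rules are listed, so all inbound traffic to these pods is denied.
  # Egress is not restricted because "Egress" is not in policyTypes.
```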

Dev overlay (k8s/apps/tomoda/overlays/dev/)

namespace: tomoda
commonLabels:
  env: dev

Adds two ExternalSecret resources, one ConfigMap, and two HPAs on top of the base:

  • backend-secrets-dev — references ClusterSecretStore/gsm-tomoda (GCP Secret Manager) and maps every key the backend needs from a tomoda-* GSM secret. Refresh interval: 1h.
  • s3-uploader-secret — references ClusterSecretStore/aws-sm-tomoda (AWS Secrets Manager) and pulls the tomoda-s3-uploader-dev JSON, splitting it into the AWS env vars the backend reads.
  • backend-config-dev — DB host postgres-postgresql.data.svc.cluster.local, DB name tomoda_dev, user tomoda_dev_user, DB_SSLMODE=disable; Redis host redis-master.data.svc.cluster.local; WebAuthn RP ID api-dev.tomoda.life; frontend URL https://app-dev.tomoda.life.
  • hpa-api.yaml / hpa-async.yaml — CPU 70%, memory 80%, bounds 1–3 (dev gets less traffic; lower floor saves cost).
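
The two ExternalSecrets might look roughly like the sketch below. The env var names, the GSM secret name, the JSON property name, and the refresh interval on the S3 secret are hypothetical placeholders; the apiVersion depends on the installed External Secrets Operator release.

```yaml
apiVersion: external-secrets.io/v1beta1   # may differ by operator version
kind: ExternalSecret
metadata:
  name: backend-secrets-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: backend-secrets-dev
  data:
    - secretKey: JWT_SECRET               # hypothetical env var name
      remoteRef:
        key: tomoda-jwt-secret            # hypothetical GSM secret name
    # one entry per key the backend needs
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: s3-uploader-secret
spec:
  refreshInterval: 1h                     # assumed; not stated for this secret
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-sm-tomoda
  target:
    name: s3-uploader-secret
  data:
    - secretKey: AWS_ACCESS_KEY_ID        # hypothetical env var name
      remoteRef:
        key: tomoda-s3-uploader-dev
        property: access_key_id           # hypothetical field inside the JSON secret
```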

JSON patches rewire envFrom[0] to the dev secret name and append the ConfigMap as an additional envFrom entry; the same patch is applied separately to tomoda-api and tomoda-async. Ingress hostnames stay as the base defaults.
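
A minimal sketch of how these patches could be declared in the overlay's kustomization.yaml follows. The ExternalSecret and ConfigMap file names are hypothetical, and the container is assumed to sit at index 0.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tomoda
commonLabels:
  env: dev

resources:
  - ../../base
  - backend-secrets-external.yaml   # hypothetical file name
  - s3-uploader-external.yaml       # hypothetical file name
  - backend-config.yaml             # hypothetical file name
  - hpa-api.yaml
  - hpa-async.yaml

patches:
  - target:
      kind: Deployment
      name: tomoda-api
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
  - target:
      kind: Deployment
      name: tomoda-async
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
```

Index-based paths are brittle: if the order of envFrom entries in the base changes, the replace operation will silently target a different entry.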

Prod overlay (k8s/apps/tomoda/overlays/prod/)

namespace: prod
commonLabels:
  env: prod

Adds two ExternalSecrets, one ConfigMap, two PodDisruptionBudgets, and two HPAs, and rewrites the images and the Ingress:

  • backend-secrets-prod + s3-uploader-secret — same shape as dev, but the S3 ESO pulls tomoda-s3-uploader-prod from AWS SM. Both still go through the same gsm-tomoda / aws-sm-tomoda ClusterSecretStores; only the source secret names differ.
  • backend-config-prod — DB host prod-postgres-postgresql.data.svc.cluster.local, DB name tomoda_prod, user tomoda_prod_user, DB_SSLMODE=require; Redis host prod-redis-master.data.svc.cluster.local; WebAuthn RP ID api.tomoda.life; frontend URL https://app.tomoda.life.
  • pdb.yaml — two PodDisruptionBudgets, one each for tomoda-api and tomoda-async, both minAvailable: 50%. With replicas ≥ 2 on both pools, voluntary disruption (node drain, GKE upgrade, spot preemption rebalancing) keeps at least one pod of each pool serving throughout.
  • hpa-api.yaml / hpa-async.yaml — CPU 70%, memory 80%, bounds 2–6. The async HPA scales down more slowly (600s window) than the api HPA (300s) so in-flight worker tasks don't get cut short. See Scaling for tuning guidance and the planned queue-depth metric.
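
As a sketch of the PDB and HPA settings just described, the async pool's objects might look like this. The object names and the app: tomoda-async label are assumptions, and the 600s window is assumed to map to the scale-down stabilization window.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tomoda-async-pdb   # hypothetical name
spec:
  minAvailable: 50%        # percentages round up, so 1 of 2 replicas must stay available
  selector:
    matchLabels:
      app: tomoda-async    # assumed label
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tomoda-async       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tomoda-async
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # the api HPA would use 300
```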

The Kustomize images block rewrites both image names to point at the prod Artifact Registry repo (tomoda-prod-repo instead of tomoda-dev-repo). Patches flip imagePullPolicy to IfNotPresent on both backend pools (matching tagged release images), set GIN_MODE=release, and rewrite the Ingress to add api.tomoda.life, app.tomoda.life, and a third rule for www.tomoda.life that points at the frontend.
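
The image rewrite and the backend patches could be sketched as below. REGION, PROJECT_ID, the release tag, and the container name are placeholders. A strategic-merge patch is used here for illustration and may not match the patch style in the repo; the Ingress rewrite is omitted.

```yaml
# Fragment of k8s/apps/tomoda/overlays/prod/kustomization.yaml (sketch)
images:
  # "name" must match the image string as it appears in the base output,
  # so if the base already points at the dev repo, that full path goes here.
  - name: REGION-docker.pkg.dev/PROJECT_ID/tomoda-dev-repo/tomoda-backend
    newName: REGION-docker.pkg.dev/PROJECT_ID/tomoda-prod-repo/tomoda-backend
    newTag: v0.0.0   # hypothetical release tag
  - name: REGION-docker.pkg.dev/PROJECT_ID/tomoda-dev-repo/tomoda-frontend
    newName: REGION-docker.pkg.dev/PROJECT_ID/tomoda-prod-repo/tomoda-frontend
    newTag: v0.0.0   # hypothetical release tag

patches:
  - target:
      kind: Deployment
      name: tomoda-api
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: tomoda-api
      spec:
        template:
          spec:
            containers:
              - name: backend               # must match the base container name (assumed)
                imagePullPolicy: IfNotPresent
                env:
                  - name: GIN_MODE
                    value: release
  # An equivalent patch would target tomoda-async.
```

Strategic-merge patches match containers and env entries by name, which avoids depending on list positions.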

Operations

  • Deploy — Argo CD Image Updater bumps the :latest tag on dev sync; prod deploys are gated on Cloud Build firing from a semver Git tag. See Argo CD.
  • Datastore deps: Postgres and Redis must exist in the data namespace before backend pods become ready. The DSNs are baked into the ConfigMaps above.
  • Secrets rotation — rotate the value in GSM or AWS SM; ExternalSecrets refreshes within an hour. Pod restart is needed for changes that are read at boot — restart both pools.
  • Adding env vars — non-secret defaults go in the overlay's ConfigMap; secrets go in the matching ExternalSecret + the corresponding GSM key. Avoid hardcoding values in the base.
  • Scaling — see Operations → Scaling for HPA tuning, queue-depth metrics roadmap, and the planned WS-pool split (forward-looking design).

Migration note (one-time)

Switching from the previous single backend Deployment to the tomoda-api / tomoda-async split is not a rolling update — the Deployment names change. Argo CD will:

  1. Delete the old backend Deployment.
  2. Create tomoda-api and tomoda-async.

There is a brief window (one Service-endpoint update cycle) where the old backend-service has no endpoints. Time the deploy outside peak traffic, or pre-create tomoda-api and the new Service selector under a parallel Argo CD app, then cut over the ingress.