Tomoda¶

The Tomoda application (Go backend plus Expo/React frontend) is deployed as a Kustomize base with dev and prod overlays under k8s/apps/tomoda/. Argo CD reconciles the overlay output into the tomoda-dev namespace (dev) or the prod namespace (prod) of the single GKE cluster.

For backend code, schemas, and migrations, see docs/backend/ in the tomoda repo.

Topology¶

The backend is one image, two Deployments: an API pool that serves HTTP + WebSocket (tomoda-api, mode multi-hub) and an async pool that runs the Asynq server + scheduler (tomoda-async, mode async). The mode is selected at startup via the SERVER_MODE env var on each Deployment. See SCALING_PLAN.md and docs/architecture/decisions.md (section "single-image-multiple-modes") in the tomoda repo for the rationale.

Diagram: the tomoda binary (one image, two roles via SERVER_MODE) fans out to two pools:

tomoda-api: SERVER_MODE=multi-hub · HTTP + WS
tomoda-async: SERVER_MODE=async · Asynq server + scheduler

Each pool gets its own HPA, Service (or none), and NetworkPolicy. Same image, no rebuild between them.
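As a rough illustration of the split, the two Deployments can differ in little more than their name, labels, and SERVER_MODE value. The fragment below is a minimal sketch, not the manifests from the repo: the container name, the bare image reference, and the app: tomoda-async label are assumptions, and only the fields relevant to mode selection are shown.

```yaml
# Sketch only. Container name, image reference, and the tomoda-async label are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-api
spec:
  selector:
    matchLabels:
      app: tomoda-api
  template:
    metadata:
      labels:
        app: tomoda-api
    spec:
      containers:
        - name: backend              # hypothetical container name
          image: tomoda-backend      # same image in both pools
          env:
            - name: SERVER_MODE
              value: multi-hub       # HTTP + WebSocket
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-async
spec:
  selector:
    matchLabels:
      app: tomoda-async              # assumed label
  template:
    metadata:
      labels:
        app: tomoda-async
    spec:
      containers:
        - name: backend
          image: tomoda-backend
          env:
            - name: SERVER_MODE
              value: async           # Asynq server + scheduler
```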

NetworkPolicies restrict ingress on every pod to the traefik-system namespace (api / frontend) or to nothing at all (tomoda-async, which has no inbound traffic — it pulls from Redis). See Network policies.

Base (`k8s/apps/tomoda/base/`)¶

kustomization.yaml aggregates five manifests and declares the dev image refs (tomoda-backend:latest / tomoda-frontend:latest from tomoda-dev-repo). Overlays rewrite the image registry for prod.
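A base kustomization of this shape might look roughly like the sketch below. The five resource file names come from the sections that follow; the images block layout and the registry path placeholders are assumptions, since the actual Artifact Registry host and project are not stated here.

```yaml
# k8s/apps/tomoda/base/kustomization.yaml (sketch; registry path is a placeholder)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - backend-api-deployment.yaml
  - backend-async-deployment.yaml
  - frontend-deployment.yaml
  - ingress.yaml
  - network-policy.yaml
images:
  - name: tomoda-backend
    newName: <registry>/<project>/tomoda-dev-repo/tomoda-backend    # placeholder
    newTag: latest
  - name: tomoda-frontend
    newName: <registry>/<project>/tomoda-dev-repo/tomoda-frontend   # placeholder
    newTag: latest
```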

`backend-api-deployment.yaml`¶

Deployment tomoda-api plus the ClusterIP Service backend-service.

Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
Mode: SERVER_MODE=multi-hub — HTTP API + WebSocket Hub. Cross-pod WS fanout is via Redis pub/sub on chat:event:* (no ingress session affinity required).
Probes: /health for liveness (15s initial, 30s period) and readiness (10s initial, 10s period). The binary always serves /health regardless of mode.
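Resources: requests 64Mi memory / 150m CPU, limits 512Mi / 500m; the workload is I/O-bound. HPA CPU utilisation is measured as a percentage of the request, so the 150m CPU request (raised from a lower value) keeps the utilisation figure stable and prevents flapping and node over-scaling.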

The Service stays named backend-service so the existing Ingress and any intra-cluster callers keep working unchanged. Its selector is app: tomoda-api (async pods don't serve traffic).
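The probe timings, resource figures, and Service selector described above could be expressed along these lines. This is a sketch under assumptions: the probe port and the Service targetPort are taken to be 8080 (the port the Ingress and NetworkPolicy use), and the first document is a container-level fragment rather than a complete manifest.

```yaml
# Container fragment for tomoda-api. Probe port 8080 is assumed.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
resources:
  requests:
    memory: 64Mi
    cpu: 150m
  limits:
    memory: 512Mi
    cpu: 500m
---
# Service keeps its old name; only api pods match the selector.
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP
  selector:
    app: tomoda-api
  ports:
    - port: 8080
      targetPort: 8080               # assumed container port
```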

`backend-async-deployment.yaml`¶

Deployment tomoda-async — no Service.

Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
Mode: SERVER_MODE=async — Asynq server + Asynq scheduler. Every replica runs both. The scheduler's asynq.Unique per cron entry dedups enqueues across replicas so only one wins per tick. All replicas consume from the shared Asynq queues.
Probes: same /health endpoint (the binary always serves it).
Resources: requests 128Mi memory / 150m CPU, limits 768Mi / 500m. The pool gets more memory headroom than the api pool because purge/cleanup tasks can spike. The 150m CPU request (raised from a lower value) serves the same purpose as on the api pool: it keeps HPA utilisation stable and prevents flapping and node over-scaling.

Both Deployments share the same configuration sources:

backend-secrets (envFrom.secretRef) — overlay-renamed to backend-secrets-dev or backend-secrets-prod. Synced from GCP Secret Manager via ExternalSecrets. Contains JWT, encryption, DB and Redis passwords, OAuth client secrets (Google / LINE / Apple), Stripe keys, email API key, KLIPY API key.
s3-uploader-secret (envFrom.secretRef) — synced from AWS Secrets Manager. Holds the AWS access key, bucket name, region, and base URL for user-uploaded media.
backend-config-<env> (added by the overlay patch as a configMapRef) — non-secret runtime config: DB host/port/user/name, Redis host/port, WebAuthn RP ID, frontend URL, ENV.

PHOTON_URL=http://photon.platform.svc.cluster.local:2322 is set inline.
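Put together, the shared configuration sources might appear in the base container spec roughly as follows. This is a sketch: backend-secrets sits at index 0 because the overlay patches target envFrom[0], while the position of the second entry is an assumption.

```yaml
# Container fragment shared by tomoda-api and tomoda-async (base, before overlay patches).
envFrom:
  - secretRef:
      name: backend-secrets          # index 0; overlays rename to backend-secrets-<env>
  - secretRef:
      name: s3-uploader-secret       # assumed to be the second entry
  # Overlays append a configMapRef for backend-config-<env> here.
env:
  - name: PHOTON_URL
    value: http://photon.platform.svc.cluster.local:2322
```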

`frontend-deployment.yaml`¶

Single-replica Deployment serving the Expo web bundle on port 8081. EXPO_PUBLIC_API_URL is baked into the base as https://api-dev.tomoda.life; the production build uses a different image (rebuilt against https://api.tomoda.life) rather than overriding the env var at runtime. Liveness and readiness both probe /.

`ingress.yaml`¶

A single Traefik-class Ingress with two rules in the base: api-dev.tomoda.life routes to backend-service:8080, and app-dev.tomoda.life routes to frontend-service:8081. The TLS certificate is issued by cert-manager via the letsencrypt-prod ClusterIssuer and stored in the tomoda-app-tls Secret; Traefik terminates TLS with it.
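For orientation, an Ingress with these two rules could be written as in the sketch below. The resource name, the ingressClassName value, the path settings, and the use of the cert-manager.io/cluster-issuer annotation are assumptions; the hosts, backends, ports, issuer, and Secret name come from the description above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomoda-ingress               # hypothetical name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik          # assumed class name
  tls:
    - hosts:
        - api-dev.tomoda.life
        - app-dev.tomoda.life
      secretName: tomoda-app-tls
  rules:
    - host: api-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 8080
    - host: app-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8081
```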

`network-policy.yaml`¶

Three NetworkPolicies:

tomoda-api-policy — ingress from traefik-system only, port 8080.
tomoda-async-policy — no ingress rules (default-deny). Async pods initiate connections to Redis and Postgres; nothing dials them.
frontend-policy — ingress from traefik-system only, port 8081.
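The api and async policies could look roughly like this. The sketch assumes the traefik-system namespace is matched through the automatic kubernetes.io/metadata.name label and that async pods carry an app: tomoda-async label; the real manifests may select differently.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-api-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system   # assumed selector
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-async-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-async              # assumed label
  policyTypes:
    - Ingress
  # No ingress rules: all inbound traffic to these pods is denied.
  # Egress is not listed in policyTypes, so outbound connections are unaffected.
```

Selecting the pods with policyTypes: [Ingress] and no ingress rules is what produces the default-deny behaviour; the frontend policy would mirror the api one with port 8081.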

Dev overlay (`k8s/apps/tomoda/overlays/dev/`)¶

namespace: tomoda
commonLabels:
  env: dev

Adds two ExternalSecret resources, one ConfigMap, and two HPAs on top of the base:

backend-secrets-dev — references ClusterSecretStore/gsm-tomoda (GCP Secret Manager) and maps every key the backend needs from a tomoda-* GSM secret. Refresh interval: 1h.
s3-uploader-secret — references the namespace-scoped SecretStore/aws-sm-dev (AWS Secrets Manager) and pulls the tomoda-s3-uploader-dev JSON, splitting it into the AWS env vars the backend reads. The store's own key is projected from GCP SM via aws-eso-credentials-dev — see Secrets Management.
backend-config-dev — DB host postgres-postgresql.data.svc.cluster.local, DB name tomoda_dev, user tomoda_dev_user, DB_SSLMODE=disable; Redis host redis-dev-master.data.svc.cluster.local; WebAuthn RP ID api-dev.tomoda.life; frontend URL https://app-dev.tomoda.life.
hpa-api.yaml / hpa-async.yaml — CPU 70%, memory 80%, bounds 1–3 (dev gets less traffic; lower floor saves cost).
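The two ExternalSecret resources might be shaped like the sketch below. The store kinds and names, the refresh interval, and the tomoda-s3-uploader-dev source come from the list above; the apiVersion (which depends on the installed External Secrets Operator release), the secretKey names, the GSM secret name, and the JSON property name are all hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1   # adjust to the installed ESO version
kind: ExternalSecret
metadata:
  name: backend-secrets-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: backend-secrets-dev
  data:
    - secretKey: JWT_SECRET               # hypothetical env var name
      remoteRef:
        key: tomoda-dev-jwt-secret        # hypothetical GSM secret name
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: s3-uploader-secret
spec:
  secretStoreRef:
    kind: SecretStore
    name: aws-sm-dev
  target:
    name: s3-uploader-secret
  data:
    - secretKey: AWS_ACCESS_KEY_ID        # hypothetical env var name
      remoteRef:
        key: tomoda-s3-uploader-dev
        property: access_key_id           # hypothetical field in the JSON secret
```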

JSON-patches rewire envFrom[0] to the dev secret name and append the ConfigMap as an additional envFrom entry — applied separately to both tomoda-api and tomoda-async. Ingress hostnames stay as the base defaults.
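The rewiring described here can be sketched as JSON 6902 patches in the overlay's kustomization. The example assumes the backend container is at index 0 of each Deployment; the secret and ConfigMap names are the ones given above.

```yaml
# overlays/dev/kustomization.yaml (fragment; container index 0 is assumed)
patches:
  - target:
      kind: Deployment
      name: tomoda-api
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
  - target:
      kind: Deployment
      name: tomoda-async
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
```

The trailing "-" in the add path appends to the end of the envFrom list, so the ConfigMap lands after both secret references.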

Prod overlay (`k8s/apps/tomoda/overlays/prod/`)¶

namespace: prod
commonLabels:
  env: prod

Adds the resources below and rewrites images plus the Ingress:

backend-secrets-prod + s3-uploader-secret — same shape as dev, but the S3 ESO pulls tomoda-s3-uploader-prod and uses the prod tenant's own SecretStore/aws-sm-prod. App secrets still come through the cluster-wide gsm-tomoda store; only the source secret names + the per-env AWS store differ.
backend-config-prod — DB host postgres-prod-postgresql.data.svc.cluster.local, DB name tomoda_prod, user tomoda_prod_user, DB_SSLMODE=require; Redis host redis-prod-master.data.svc.cluster.local; WebAuthn RP ID api.tomoda.life; frontend URL https://app.tomoda.life.
pdb.yaml — two PodDisruptionBudgets, one each for tomoda-api and tomoda-async, both minAvailable: 50%. With replicas ≥ 2 on both pools, voluntary disruption (node drain, GKE upgrade, spot preemption rebalancing) keeps at least one pod of each pool serving throughout.
hpa-api.yaml / hpa-async.yaml — CPU 70%, memory 80%, bounds 2–6. The async HPA scales down more slowly (600s window) than the api HPA (300s) so in-flight worker tasks don't get cut short. See Scaling for tuning guidance and the planned queue-depth metric.
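A PodDisruptionBudget and the async HPA with these settings could be sketched as follows. The resource names and the app: tomoda-api selector on the PDB are assumptions, and the 600s window is assumed to map to behavior.scaleDown.stabilizationWindowSeconds.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tomoda-api-pdb               # hypothetical name
spec:
  minAvailable: 50%                  # rounds up: with 2 replicas, 1 must stay available
  selector:
    matchLabels:
      app: tomoda-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tomoda-async-hpa             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tomoda-async
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # assumed field for the 600s window
```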

The Kustomize images block rewrites both image names to point at the prod Artifact Registry repo (tomoda-prod-repo instead of tomoda-dev-repo). Patches flip imagePullPolicy to IfNotPresent on both backend pools (matching tagged release images), and rewrite the Ingress to add api.tomoda.life, app.tomoda.life, and a third rule for www.tomoda.life that points at the frontend.
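The pull-policy and Ingress patches might be written along these lines. This is a sketch under assumptions: the container index, the order of the base Ingress rules (api first, frontend second), and the path settings are guesses, and the matching changes to the Ingress TLS hosts are omitted.

```yaml
# overlays/prod/kustomization.yaml (fragment; indices and rule order are assumed)
patches:
  - target:
      kind: Deployment
      name: tomoda-api               # repeat for tomoda-async
    patch: |-
      # "add" sets the field whether or not the base already defines it.
      - op: add
        path: /spec/template/spec/containers/0/imagePullPolicy
        value: IfNotPresent
  - target:
      kind: Ingress
    patch: |-
      - op: replace
        path: /spec/rules/0/host
        value: api.tomoda.life
      - op: replace
        path: /spec/rules/1/host
        value: app.tomoda.life
      - op: add
        path: /spec/rules/-
        value:
          host: www.tomoda.life
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: frontend-service
                    port:
                      number: 8081
```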

Operations¶

Deploy — Argo CD Image Updater bumps the :latest tag on dev sync; prod deploys are gated on Cloud Build firing from a semver Git tag. See Argo CD.
Datastore deps — Postgres and Redis must exist in the data namespace before backend pods become ready. Their hosts and database names are set in the overlay ConfigMaps above; the passwords come from backend-secrets.
Secrets rotation — rotate the value in GSM or AWS SM; ExternalSecrets refreshes within an hour. Pod restart is needed for changes that are read at boot — restart both pools.
Adding env vars — non-secret defaults go in the overlay's ConfigMap; secrets go in the matching ExternalSecret + the corresponding GSM key. Avoid hardcoding values in the base.
Scaling — see Operations → Scaling for HPA tuning, queue-depth metrics roadmap, and the planned WS-pool split (forward-looking design).

Migration note (one-time)¶

Switching from the previous single backend Deployment to the tomoda-api / tomoda-async split is not a rolling update — the Deployment names change. Argo CD will:

Delete the old backend Deployment.
Create tomoda-api and tomoda-async.

There is a brief window (one Service-endpoint update cycle) where the old backend-service has no endpoints. Time the deploy outside peak traffic, or pre-create tomoda-api and the new Service selector under a parallel Argo CD app, then cut over the ingress.
