Tomoda¶
The Tomoda application (Go backend plus Expo/React frontend) is deployed as a
Kustomize base with dev and prod overlays under k8s/apps/tomoda/. Argo CD
reconciles the overlay output into the tomoda namespace (dev) or the prod
namespace (prod) of the single GKE cluster.
For backend code, schemas, and migrations, see the Tomoda backend docs.
Topology¶
The backend is one image, two Deployments: an API pool that serves HTTP +
WebSocket (tomoda-api, mode multi-hub) and an async pool that runs the
Asynq worker plus the Redis-leader-elected scheduler (tomoda-async, mode
async). The mode is selected at startup via the SERVER_MODE env var on
each Deployment. See the tomoda repo's SCALING_PLAN.md and architecture
decision for the rationale.
flowchart LR
user[User] --> cf[Cloudflare DNS<br/>tomoda.life]
cf --> lb[GCP LB<br/>asia-east1]
lb --> traefik[Traefik<br/>traefik-system ns]
traefik -->|api.* / api-dev.*| api[tomoda-api<br/>SERVER_MODE=multi-hub<br/>replicas 2–6]
traefik -->|app.* / app-dev.* / www.*| frontend[frontend<br/>Deployment :8081]
api -->|publish chat:event:*| redis[(Redis<br/>data ns)]
redis -->|subscribe chat:event:*| api
async[tomoda-async<br/>SERVER_MODE=async<br/>replicas 2–6] -->|enqueue / consume| redis
async -->|leader lock<br/>scheduler:leader| redis
api --> pg[(Postgres<br/>data ns)]
async --> pg
api --> photon[photon.data:2322]
NetworkPolicies restrict ingress on every pod to the traefik-system
namespace (api / frontend) or to nothing at all (tomoda-async, which has no
inbound traffic because it pulls from Redis). See Network policies.
Base (k8s/apps/tomoda/base/)¶
kustomization.yaml aggregates five manifests and declares the dev image
refs (tomoda-backend:latest / tomoda-frontend:latest from
tomoda-dev-repo). Overlays rewrite the image registry for prod.
backend-api-deployment.yaml¶
Deployment tomoda-api plus the ClusterIP Service backend-service.
- Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
- Mode: SERVER_MODE=multi-hub runs the HTTP API + WebSocket Hub. Cross-pod WS fanout is via Redis pub/sub on chat:event:* (no ingress session affinity required).
- Probes: /health for liveness (15s initial, 30s period) and readiness (10s initial, 10s period). The binary always serves /health regardless of mode.
- Resources: requests 64Mi/10m, limits 512Mi/500m (I/O-bound). Tune via the HPA once steady-state utilisation is known.
The Service stays named backend-service so the existing Ingress and any
intra-cluster callers keep working unchanged. Its selector is
app: tomoda-api (async pods don't serve traffic).
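As a rough illustration, the settings above could map onto a manifest like the one below. This is a sketch, not a copy of the repository file: the probe timings, resource figures, mode, and Service selector come from the list above, while the container name, the container port (inferred from the Ingress backend `backend-service:8080`), and the unqualified image name are assumptions.

```yaml
# Sketch only: fields marked "assumed" are illustrative, not taken from the repo.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-api
spec:
  replicas: 2                      # base value; overlays add an HPA
  selector:
    matchLabels:
      app: tomoda-api
  template:
    metadata:
      labels:
        app: tomoda-api
    spec:
      containers:
        - name: backend            # assumed container name
          image: tomoda-backend:latest
          ports:
            - containerPort: 8080  # assumed; matches backend-service:8080
          env:
            - name: SERVER_MODE
              value: multi-hub
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          resources:
            requests:
              memory: 64Mi
              cpu: 10m
            limits:
              memory: 512Mi
              cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: ClusterIP
  selector:
    app: tomoda-api                # async pods are not selected
  ports:
    - port: 8080
      targetPort: 8080
```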
backend-async-deployment.yaml¶
Deployment tomoda-async — no Service.
- Replicas: 2 (base). Prod HPA scales 2–6; dev HPA scales 1–3.
- Mode: SERVER_MODE=async runs the Asynq worker + scheduler. The scheduler is leader-elected via Redis SetNX on scheduler:leader (10s TTL), so multiple replicas are safe: exactly one becomes leader and dispatches cron tasks, all replicas consume the ready queue.
- Probes: same /health endpoint (the binary always serves it).
- Resources: requests 128Mi/10m, limits 768Mi/500m. This gives more memory headroom than the api pool because purge/cleanup tasks can spike.
Both Deployments share the same configuration sources:
- backend-secrets (envFrom.secretRef): overlay-renamed to backend-secrets-dev or backend-secrets-prod. Synced from GCP Secret Manager via ExternalSecrets. Contains JWT, encryption, DB and Redis passwords, OAuth client secrets (Google / LINE / Apple), Stripe keys, email API key, KLIPY API key.
- s3-uploader-secret (envFrom.secretRef): synced from AWS Secrets Manager. Holds the AWS access key, bucket name, region, and base URL for user-uploaded media.
- backend-config-<env> (added by the overlay patch as a configMapRef): non-secret runtime config: DB host/port/user/name, Redis host/port, WebAuthn RP ID, frontend URL, ENV.
GIN_MODE=debug (overlay flips to release for prod) and
PHOTON_URL=http://photon.data.svc.cluster.local:2322 are set inline.
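The shared configuration sources could look roughly like this inside each container spec. This is a fragment under stated assumptions: the position of backend-secrets at index 0 follows from the overlay patch that rewires `envFrom[0]`, but the order of the remaining entries is a guess.

```yaml
# Fragment of the container spec shared by tomoda-api and tomoda-async (sketch).
envFrom:
  - secretRef:
      name: backend-secrets        # index 0; overlays rename to backend-secrets-<env>
  - secretRef:
      name: s3-uploader-secret     # assumed to be the second entry
  # An overlay patch appends a configMapRef for backend-config-<env> after these.
env:
  - name: GIN_MODE
    value: debug                   # the prod overlay sets this to release
  - name: PHOTON_URL
    value: http://photon.data.svc.cluster.local:2322
```

Variables set inline under `env` take precedence over the same key arriving through `envFrom`, which is why the mode-specific values can safely live next to the shared sources.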
frontend-deployment.yaml¶
Single-replica Deployment serving the Expo web bundle on port 8081.
EXPO_PUBLIC_API_URL is baked into the base as https://api-dev.tomoda.life;
the production build uses a different image (rebuilt against
https://api.tomoda.life) rather than overriding the env var at runtime.
Liveness and readiness both probe /.
ingress.yaml¶
A single Traefik-class Ingress with two rules in the base:
api-dev.tomoda.life routes to backend-service:8080, and
app-dev.tomoda.life routes to frontend-service:8081. The TLS certificate is
issued by cert-manager via the letsencrypt-prod ClusterIssuer and stored in
the tomoda-app-tls Secret.
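A minimal sketch of an Ingress matching that description follows. The hosts, backends, issuer, and TLS Secret name are from the text; the Ingress name, the `ingressClassName` value, and the use of the `cert-manager.io/cluster-issuer` annotation (rather than a separate Certificate resource) are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tomoda-ingress             # assumed name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik        # assumed class name
  tls:
    - hosts:
        - api-dev.tomoda.life
        - app-dev.tomoda.life
      secretName: tomoda-app-tls
  rules:
    - host: api-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 8080
    - host: app-dev.tomoda.life
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8081
```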
network-policy.yaml¶
Three NetworkPolicies:
- tomoda-api-policy: ingress from traefik-system only, port 8080.
- tomoda-async-policy: no ingress rules (default-deny). Async pods initiate connections to Redis and Postgres; nothing dials them.
- frontend-policy: ingress from traefik-system only, port 8081.
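The first two policies could be written along these lines. This is a sketch: the namespace is matched through the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace, which may differ from the selector the repository actually uses, and the `app: tomoda-async` pod label is assumed by analogy with `app: tomoda-api`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-api-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system   # assumed selector
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-async-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-async            # assumed pod label
  policyTypes:
    - Ingress                      # Ingress listed with no rules: all inbound traffic is denied
```

Neither policy lists `Egress` under `policyTypes`, so outbound connections to Redis and Postgres are left unrestricted by these policies.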
Dev overlay (k8s/apps/tomoda/overlays/dev/)¶
namespace: tomoda
commonLabels:
  env: dev
Adds two ExternalSecret resources, one ConfigMap, and two HPAs on top of the base:
- backend-secrets-dev: references ClusterSecretStore/gsm-tomoda (GCP Secret Manager) and maps every key the backend needs from a tomoda-* GSM secret. Refresh interval: 1h.
- s3-uploader-secret: references ClusterSecretStore/aws-sm-tomoda (AWS Secrets Manager) and pulls the tomoda-s3-uploader-dev JSON, splitting it into the AWS env vars the backend reads.
- backend-config-dev: DB host postgres-postgresql.data.svc.cluster.local, DB name tomoda_dev, user tomoda_dev_user, DB_SSLMODE=disable; Redis host redis-master.data.svc.cluster.local; WebAuthn RP ID api-dev.tomoda.life; frontend URL https://app-dev.tomoda.life.
- hpa-api.yaml / hpa-async.yaml: CPU 70%, memory 80%, bounds 1–3 (dev gets less traffic; lower floor saves cost).
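The two ExternalSecrets might be shaped as follows. Treat this as a sketch under several assumptions: the API version, the single `JWT_SECRET` / `tomoda-jwt-secret` mapping (hypothetical names standing in for the full key list), the refresh interval on the S3 secret, and the use of `dataFrom.extract` to split the JSON are all illustrative.

```yaml
apiVersion: external-secrets.io/v1beta1   # assumed API version
kind: ExternalSecret
metadata:
  name: backend-secrets-dev
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: backend-secrets-dev
  data:
    - secretKey: JWT_SECRET               # hypothetical env var name
      remoteRef:
        key: tomoda-jwt-secret            # hypothetical GSM secret following the tomoda-* pattern
---
apiVersion: external-secrets.io/v1beta1   # assumed API version
kind: ExternalSecret
metadata:
  name: s3-uploader-secret
spec:
  refreshInterval: 1h                     # assumed; the text states 1h only for backend-secrets-dev
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-sm-tomoda
  target:
    name: s3-uploader-secret
  dataFrom:
    - extract:
        key: tomoda-s3-uploader-dev       # each top-level JSON key becomes a key in the Secret
```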
JSON-patches rewire envFrom[0] to the dev secret name and append the
ConfigMap as an additional envFrom entry — applied separately to both
tomoda-api and tomoda-async. Ingress hostnames stay as the base
defaults.
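A sketch of how the dev overlay's kustomization.yaml could express this. The namespace, label, Deployment names, and secret/ConfigMap names are from the text; the resource file names other than hpa-api.yaml and hpa-async.yaml, and the assumption that the backend is the first container in each pod, are illustrative.

```yaml
# overlays/dev/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: tomoda
commonLabels:
  env: dev
resources:
  - ../../base
  - backend-secrets.yaml           # assumed file name (ExternalSecret)
  - s3-uploader-secret.yaml        # assumed file name (ExternalSecret)
  - backend-config.yaml            # assumed file name (ConfigMap)
  - hpa-api.yaml
  - hpa-async.yaml
patches:
  - target:
      kind: Deployment
      name: tomoda-api
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
  - target:
      kind: Deployment
      name: tomoda-async
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/envFrom/0/secretRef/name
        value: backend-secrets-dev
      - op: add
        path: /spec/template/spec/containers/0/envFrom/-
        value:
          configMapRef:
            name: backend-config-dev
```

The `replace` op depends on backend-secrets staying at index 0 of `envFrom` in the base, so reordering that list in the base would silently retarget the patch.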
Prod overlay (k8s/apps/tomoda/overlays/prod/)¶
namespace: prod
commonLabels:
  env: prod
Adds the resources listed below on top of the base and rewrites images plus the Ingress:
- backend-secrets-prod + s3-uploader-secret: same shape as dev, but the S3 ESO pulls tomoda-s3-uploader-prod from AWS SM. Both still go through the same gsm-tomoda / aws-sm-tomoda ClusterSecretStores; only the source secret names differ.
- backend-config-prod: DB host prod-postgres-postgresql.data.svc.cluster.local, DB name tomoda_prod, user tomoda_prod_user, DB_SSLMODE=require; Redis host prod-redis-master.data.svc.cluster.local; WebAuthn RP ID api.tomoda.life; frontend URL https://app.tomoda.life.
- pdb.yaml: two PodDisruptionBudgets, one each for tomoda-api and tomoda-async, both minAvailable: 50%. With replicas ≥ 2 on both pools, voluntary disruption (node drain, GKE upgrade, spot preemption rebalancing) keeps at least one pod of each pool serving throughout.
- hpa-api.yaml / hpa-async.yaml: CPU 70%, memory 80%, bounds 2–6. The async HPA scales down more slowly (600s window) than the api HPA (300s) so in-flight worker tasks don't get cut short. See Scaling for tuning guidance and the planned queue-depth metric.
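For the async pool, the HPA and PDB described above could be sketched as below. The thresholds, bounds, and `minAvailable` come from the text; the resource names, the `app: tomoda-async` label, and the mapping of the "600s window" onto `behavior.scaleDown.stabilizationWindowSeconds` are assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tomoda-async               # assumed resource name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tomoda-async
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # assumed field; the api HPA would use 300
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tomoda-async               # assumed resource name
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: tomoda-async            # assumed pod label
```

Percentage values of `minAvailable` are rounded up to a whole pod, so at the floor of 2 replicas the budget allows one eviction at a time.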
The Kustomize images block rewrites both image names to point at the
prod Artifact Registry repo (tomoda-prod-repo instead of
tomoda-dev-repo). Patches flip imagePullPolicy to IfNotPresent on
both backend pools (matching tagged release images), set GIN_MODE=release,
and rewrite the Ingress to add api.tomoda.life, app.tomoda.life, and a
third rule for www.tomoda.life that points at the frontend.
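The image rewrite and the backend patches could be sketched like this. The registry host, project ID, and tag are placeholders that are not given in this page, the `name` entries assume the base refers to the images by their short names, and the container name `backend` is hypothetical. A strategic-merge patch is used here as one possible form; the repository may use JSON patches instead.

```yaml
# overlays/prod/kustomization.yaml fragment (sketch)
images:
  - name: tomoda-backend
    newName: REGISTRY_HOST/PROJECT_ID/tomoda-prod-repo/tomoda-backend    # placeholder host and project
    newTag: v0.0.0                                                       # placeholder semver tag
  - name: tomoda-frontend
    newName: REGISTRY_HOST/PROJECT_ID/tomoda-prod-repo/tomoda-frontend   # placeholder host and project
    newTag: v0.0.0                                                       # placeholder semver tag
patches:
  - target:
      kind: Deployment
      name: tomoda-api
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: tomoda-api
      spec:
        template:
          spec:
            containers:
              - name: backend          # hypothetical container name; must match the base
                imagePullPolicy: IfNotPresent
                env:
                  - name: GIN_MODE
                    value: release
```

An equivalent patch targeting tomoda-async would cover the second pool. Strategic-merge patches match containers and env entries by `name`, so they do not depend on list positions the way index-based JSON patches do.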
Operations¶
- Deploy: Argo CD Image Updater bumps the :latest tag on dev sync; prod deploys are gated on Cloud Build firing from a semver Git tag. See Argo CD.
- Datastore deps: Postgres and Redis must exist in the data namespace before backend pods become ready. The DSNs are baked into the ConfigMaps above.
- Secrets rotation: rotate the value in GSM or AWS SM; ExternalSecrets refreshes within an hour. Pod restart is needed for changes that are read at boot, so restart both pools.
- Adding env vars: non-secret defaults go in the overlay's ConfigMap; secrets go in the matching ExternalSecret + the corresponding GSM key. Avoid hardcoding values in the base.
- Scaling: see Operations → Scaling for HPA tuning, queue-depth metrics roadmap, and the planned WS-pool split (forward-looking design).
Migration note (one-time)¶
Switching from the previous single backend Deployment to the
tomoda-api / tomoda-async split is not a rolling update — the
Deployment names change. Argo CD will:
- Delete the old backend Deployment.
- Create tomoda-api and tomoda-async.
There is a brief window (one Service-endpoint update cycle) where the
old backend-service has no endpoints. Time the deploy outside peak
traffic, or pre-create tomoda-api and the new Service selector under
a parallel Argo CD app, then cut over the ingress.