Tomoda — WS Pool Split (Planned)¶
This page describes a future deployment topology: splitting the WebSocket
Hub out of tomoda-api into its own pool (tomoda-ws). The backend binary
already supports the modes required (api-hub and ws-hub ship today, just
unused) — what's missing is the k8s wiring, an ingress path-based routing
rule, and a custom HPA metric.
This is not in the current deployment. Treat this page as the design we land on the day WS load actually warrants a separate pool. Don't pre-build it.
Why split¶
Today (tomoda-api, mode multi-hub):
- HTTP API and WebSocket termination share one pool.
- HPA scales on CPU + memory.
- One long-lived WS connection looks much cheaper on CPU than one bursty HTTP request, so CPU-driven HPA can under-react to WS-connection growth.
Splitting tomoda-api into tomoda-api (HTTP only) and tomoda-ws
(WebSocket only) gives us:
- Independent scaling shape. WS pool scales on connection count (long-lived, sticky, memory-bound). API pool keeps scaling on CPU/RPS.
- Different pod sizing. WS pods can be sized for connection state (more memory, less CPU). API pods stay lean.
- Blast-radius reduction. A WS-related rollout (Hub change, gorilla upgrade) doesn't touch the API pool, and vice versa.
The cost is more moving parts: another Deployment, another Service, an ingress-level routing rule, and a custom WS-connection metric for the HPA.
When to do this¶
Trigger criteria, any of which is sufficient:
- WS-connection count regularly exceeds 10k during peak (today: well under 1k). At that scale, CPU is no longer a good proxy for load.
- tomoda-api HPA spends meaningful time pinned at maxReplicas while the HTTP request-rate metrics show idle pods. That gap is WS-driven load not visible to CPU-based scaling.
- WS-related changes cause API-side incidents (a Hub bug bricks the API pool too) more than once.
Until one of these fires, the multi-hub topology is correct.
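The "pinned at maxReplicas" criterion can be watched for rather than noticed by accident. The rule below is a minimal sketch, assuming kube-state-metrics is scraped by Prometheus and that the API pool's HPA object is named tomoda-api-hpa (a guess by analogy with tomoda-ws-hpa). The group name, alert name, and 30m window are placeholders. It covers only the "pinned" half of the criterion; the "idle pods" half depends on the backend's own request-rate metrics, which this page doesn't name.

```yaml
# Prometheus rule file (sketch). The HPA name and the 30m window are assumptions.
groups:
  - name: tomoda-ws-split-triggers
    rules:
      - alert: TomodaApiHpaPinnedAtMax
        # Fires when current replicas have sat at the configured maximum.
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="tomoda-api-hpa"}
            >= on (namespace, horizontalpodautoscaler)
          kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="tomoda-api-hpa"}
        for: 30m
        labels:
          severity: info
        annotations:
          summary: tomoda-api HPA pinned at maxReplicas; compare against HTTP request rate
```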
Target topology¶
flowchart LR
user[User] --> traefik[Traefik]
traefik -->|/ws/*| ws[tomoda-ws<br/>SERVER_MODE=ws-hub<br/>HPA on ws_connections]
traefik -->|everything else| api[tomoda-api<br/>SERVER_MODE=api-hub<br/>HPA on CPU + RPS]
ws -->|publish/subscribe<br/>chat:event:*| redis[(Redis)]
api --> pg[(Postgres)]
api --> redis
ws --> pg
Both pools run the same image. The Hub on tomoda-ws continues to use
Redis pub/sub for cross-pod fanout — no architectural change to the WS
plane, just a packaging change.
tomoda-async (worker + scheduler) is untouched.
Manifest sketch¶
In k8s/apps/tomoda/base/:
# backend-ws-deployment.yaml — Deployment tomoda-ws (mode ws-hub)
# Service ws-service (type ClusterIP, port 8080, selector app: tomoda-ws)
# backend-api-deployment.yaml — change SERVER_MODE: multi-hub → api-hub
# (api-hub serves /api/v1/* + /health, no /ws/* routes, no Hub)
Update kustomization.yaml to add the new file.
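As a rough illustration of what backend-ws-deployment.yaml might contain, here is a minimal sketch. The container name, the image reference, and the readiness probe are assumptions. The Deployment and Service names, the label, the port, SERVER_MODE=ws-hub, and the /health path come from this page. Resource requests are omitted because the sizing has not been decided.

```yaml
# backend-ws-deployment.yaml (sketch, not the final manifest)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-ws
spec:
  replicas: 0                 # the migration plan lands the pool with no pods first
  selector:
    matchLabels:
      app: tomoda-ws
  template:
    metadata:
      labels:
        app: tomoda-ws
    spec:
      containers:
        - name: backend       # hypothetical container name
          image: REPLACE_WITH_TOMODA_API_IMAGE   # placeholder: both pools run the same image
          env:
            - name: SERVER_MODE
              value: ws-hub
          ports:
            - containerPort: 8080
          readinessProbe:     # assumption: probe the same /health path the plan checks
            httpGet:
              path: /health
              port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: ws-service
spec:
  type: ClusterIP
  selector:
    app: tomoda-ws
  ports:
    - port: 8080
      targetPort: 8080
```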
In network-policy.yaml, add:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-ws-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-ws
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
      ports:
        - protocol: TCP
          port: 8080
In ingress.yaml, replace the single-path backend-service rule with two
paths — /ws/* → ws-service, default → backend-service:
- host: api.tomoda.life
  http:
    paths:
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: ws-service
            port: { number: 8080 }
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port: { number: 8080 }
Traefik's default rule priority favours the longer rule, so the /ws prefix is
evaluated before /: /ws/chat/... lands on the WS pool and everything else
stays on the API pool.
HPA: custom WS connection metric¶
CPU is the wrong signal for the WS pool. We want to scale on WebSocket
connections per pod. The cheapest path is:
- Expose a tomoda_ws_connections Prometheus gauge from the Hub (Hub.GetGlobalConnectionCount(), the sum of all room sizes; add this when implementing). One scrape per pod, no Redis hop.
- Install prometheus-adapter to expose the gauge as a Kubernetes custom metric.
- Reference it from tomoda-ws-hpa:
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: tomoda_ws_connections
        target:
          type: AverageValue
          averageValue: "500"  # scale to keep <=500 conns/pod
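To make the target concrete: for a Pods metric with an AverageValue target, the HPA controller derives the desired replica count from the ratio of the observed per-pod average to the target. The figures below (4 pods holding 2,600 connections in total, so 650 per pod) are hypothetical and only illustrate the arithmetic.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentAverage}}{\text{targetAverage}} \right\rceil
  = \left\lceil 4 \times \frac{650}{500} \right\rceil
  = \lceil 5.2 \rceil
  = 6
\]
```

At 6 pods the same 2,600 connections average roughly 433 per pod, back under the 500 target. Existing sockets stay where they are, so the average only falls as new connections land on the new pods.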
Tune averageValue once we have steady-state telemetry. Until
prometheus-adapter is in place, the tomoda-ws HPA can fall back to
memory-based scaling (each WS connection has a known memory cost; a
memory utilisation target indirectly tracks connection count).
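The memory-based fallback could look roughly like the sketch below. The replica bounds and the 70% utilisation target are placeholders, not measured values. A Utilization target also requires the tomoda-ws container to declare a memory request, since utilisation is computed against it.

```yaml
# tomoda-ws-hpa, interim memory-based variant (sketch; all numbers are placeholders)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tomoda-ws-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tomoda-ws
  minReplicas: 2              # assumption: matches the 2 replicas used during migration
  maxReplicas: 10             # assumption
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # assumption: tune from steady-state telemetry
```

Swapping the Resource metric for the Pods metric later is a change to the metrics list only; scaleTargetRef and the replica bounds can stay as they are.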
Migration plan (when triggered)¶
Sequence so we don't drop connections:
1. Land the new tomoda-ws Deployment + Service in the cluster (replicas 0). Verify it builds and reconciles via Argo CD without any traffic.
2. Bring it to 2 replicas, confirm /health succeeds on the new pods.
3. Add the ingress /ws path pointing at ws-service. New WS connections start landing on the WS pool; existing connections remain on tomoda-api pods until they reconnect (Hub is per-pod; old pods still own their sockets).
4. Flip tomoda-api from multi-hub to api-hub. Existing sockets on api pods are torn down on the rollout; clients reconnect to the WS pool via the new ingress path.
5. Drop unused metrics / HPA from tomoda-api; deploy custom-metric HPA for tomoda-ws.
Pre-requisites (do before step 1):
- prometheus-adapter installed and tested.
- tomoda_ws_connections gauge exposed by the backend.
- Frontend reconnect path verified to handle the rolling pod swap gracefully (it already does; see the tomoda repo's realtime docs).
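For the prometheus-adapter prerequisite, a rule along these lines would surface the gauge through the custom metrics API. This is a sketch in the adapter's raw config-file form, and it assumes the scraped series carries namespace and pod labels (typical with Kubernetes pod service discovery, but worth confirming against the actual scrape config).

```yaml
# prometheus-adapter config (sketch). Assumes series labels "namespace" and "pod".
rules:
  - seriesQuery: 'tomoda_ws_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^tomoda_ws_connections$"
      as: "tomoda_ws_connections"
    # The gauge is already a per-pod count, so a plain sum grouped by pod is enough.
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

"Installed and tested" can then mean that the metric is listed by the custom.metrics.k8s.io API for the tomoda-ws pods before the HPA is pointed at it.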
Why this isn't done now¶
- Current scale doesn't need it (see "When to do this").
- The mode flags exist precisely so this becomes a deployment-only change, not a code change. We're trading some YAML complexity for the option to do this fast when it's needed.
- Ingress path-based routing adds operational surface (one more failure mode to debug) that isn't worth carrying until the scale signal justifies it.