Tomoda — WS Pool Split (Planned)

This page describes a future deployment topology: splitting the WebSocket Hub out of tomoda-api into its own pool (tomoda-ws). The backend binary already supports the modes required (api-hub and ws-hub ship today, just unused) — what's missing is the k8s wiring, an ingress path-based routing rule, and a custom HPA metric.

This is not in the current deployment. Treat this page as the design we land on the day WS load actually warrants a separate pool. Don't pre-build it.

Why split

Today (tomoda-api, mode multi-hub):

  • HTTP API and WebSocket termination share one pool.
  • HPA scales on CPU + memory.
  • One long-lived WS connection looks much cheaper on CPU than one bursty HTTP request, so CPU-driven HPA can under-react to WS-connection growth.

Splitting tomoda-api into tomoda-api (HTTP only) and tomoda-ws (WebSocket only) gives us:

  • Independent scaling shape. WS pool scales on connection count (long-lived, sticky, memory-bound). API pool keeps scaling on CPU/RPS.
  • Different pod sizing. WS pods can be sized for connection state (more memory, less CPU). API pods stay lean.
  • Blast-radius reduction. A WS-related rollout (Hub change, gorilla upgrade) doesn't touch the API pool, and vice versa.

The cost is more moving parts: another Deployment, another Service, an ingress-level routing rule, and a custom WS-connection metric for the HPA.

When to do this

Trigger criteria, any of which is sufficient:

  • WS-connection count regularly exceeds 10k during peak (today: well under 1k). At that scale, CPU is no longer a good proxy for load.
  • tomoda-api HPA spends meaningful time pinned at maxReplicas while the HTTP request-rate metrics show idle pods. That gap is WS-driven load that the HTTP request metrics don't show.
  • WS-related changes cause API-side incidents (a Hub bug bricks the API pool too) more than once.

Until one of these fires, the multi-hub topology is correct.

Target topology

flowchart LR
    user[User] --> traefik[Traefik]
    traefik -->|/ws/*| ws[tomoda-ws<br/>SERVER_MODE=ws-hub<br/>HPA on ws_connections]
    traefik -->|everything else| api[tomoda-api<br/>SERVER_MODE=api-hub<br/>HPA on CPU + RPS]
    ws -->|publish/subscribe<br/>chat:event:*| redis[(Redis)]
    api --> pg[(Postgres)]
    api --> redis
    ws --> pg

Both pools run the same image. The Hub on tomoda-ws continues to use Redis pub/sub for cross-pod fanout — no architectural change to the WS plane, just a packaging change.

tomoda-async (worker + scheduler) is untouched.

Manifest sketch

In k8s/apps/tomoda/base/:

# backend-ws-deployment.yaml — Deployment tomoda-ws (mode ws-hub)
# Service ws-service (ClusterIP, port 8080, selector app: tomoda-ws)

# backend-api-deployment.yaml — change SERVER_MODE: multi-hub → api-hub
# (api-hub serves /api/v1/* + /health, no /ws/* routes, no Hub)

Add backend-ws-deployment.yaml to the resources list in kustomization.yaml.
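
A minimal sketch of what backend-ws-deployment.yaml might contain, assuming the Service lives in the same file. The container name, image reference, and resource figures are placeholders rather than values from the current manifests; SERVER_MODE, port 8080, the /health path, and the app: tomoda-ws label come from this page.

```yaml
# Sketch only: image, container name, and resource figures are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-ws
  labels:
    app: tomoda-ws
spec:
  replicas: 0                      # landed at 0 first; raised to 2 during migration
  selector:
    matchLabels:
      app: tomoda-ws
  template:
    metadata:
      labels:
        app: tomoda-ws
    spec:
      containers:
        - name: backend            # hypothetical container name
          image: tomoda-backend:placeholder   # placeholder; use the same image as tomoda-api
          env:
            - name: SERVER_MODE
              value: ws-hub
          ports:
            - name: http
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: http
          resources:               # hypothetical: memory-heavy, CPU-light
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: ws-service
spec:
  type: ClusterIP
  selector:
    app: tomoda-ws
  ports:
    - name: http
      port: 8080
      targetPort: http
```

Setting a memory request matters beyond scheduling: any HPA that targets memory utilisation computes the percentage against this request.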

In network-policy.yaml, add:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-ws-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-ws
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
      ports:
        - protocol: TCP
          port: 8080
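
One consequence worth checking: NetworkPolicies are additive allow-lists, so if tomoda-ws-policy is the only policy selecting these pods, only Traefik can reach port 8080 and an in-cluster Prometheus could not scrape the pods. A possible fix is a second ingress entry. The sketch below assumes Prometheus runs in a namespace named monitoring and that metrics are served on the application port; neither is confirmed by the current manifests.

```yaml
# Fragment of tomoda-ws-policy: spec.ingress with a second, hypothetical entry.
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: traefik-system
    ports:
      - protocol: TCP
        port: 8080
  # Assumption: Prometheus lives in a namespace named "monitoring"
  # and scrapes metrics on the application port.
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: monitoring
    ports:
      - protocol: TCP
        port: 8080
```

If another policy in network-policy.yaml already admits the monitoring namespace, this entry is unnecessary.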

In ingress.yaml, replace the single-path backend-service rule with two paths (/ws/* → ws-service, default → backend-service):

- host: api.tomoda.life
  http:
    paths:
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: ws-service
            port: { number: 8080 }
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port: { number: 8080 }

Traefik picks the longer prefix first, so /ws/chat/... lands on the WS pool and everything else stays on the API pool.

HPA: custom WS connection metric

CPU is the wrong signal for the WS pool. We want to scale on WebSocket connections per pod. The cheapest path is:

  1. Expose a tomoda_ws_connections Prometheus gauge from the Hub (Hub.GetGlobalConnectionCount() — sum of all room sizes; add this when implementing). One scrape per pod, no Redis hop.
  2. Install prometheus-adapter to expose the gauge as a Kubernetes custom metric.
  3. Reference it from tomoda-ws-hpa:

spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: tomoda_ws_connections
        target:
          type: AverageValue
          averageValue: "500"   # target an average of ~500 conns/pod (not a hard cap)

Tune averageValue once we have steady-state telemetry. Until prometheus-adapter is in place, the tomoda-ws HPA can fall back to memory-based scaling (each WS connection has a known memory cost; a memory utilisation target indirectly tracks connection count).
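
The memory-based fallback could look roughly like the sketch below. The replica bounds, the 70% target, and the scale-down window are hypothetical placeholders to tune, not measured values. Utilization is computed against the pods' memory request, so the tomoda-ws Deployment has to set one.

```yaml
# Sketch of an interim tomoda-ws HPA; all numeric values are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tomoda-ws-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tomoda-ws
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # percent of the container memory request
  behavior:
    scaleDown:
      # Optional: scale down slowly, since removing a pod drops its sockets.
      stabilizationWindowSeconds: 600
```

Once the custom metric is available, only the metrics entry needs to be swapped for the Pods-type block shown earlier.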

Migration plan (when triggered)

Sequence so we don't drop connections:

  1. Land the new tomoda-ws Deployment + Service in the cluster (replicas 0). Verify it builds and reconciles via Argo CD without any traffic.
  2. Bring it to 2 replicas, confirm /health succeeds on the new pods.
  3. Add the ingress /ws path pointing at ws-service. New WS connections start landing on the WS pool; existing connections remain on tomoda-api pods until they reconnect (Hub is per-pod; old pods still own their sockets).
  4. Flip tomoda-api from multi-hub to api-hub. Existing sockets on api pods are torn down on the rollout; clients reconnect to the WS pool via the new ingress path.
  5. Drop unused metrics / HPA from tomoda-api; deploy custom-metric HPA for tomoda-ws.

Pre-requisites (do before step 1):

  • prometheus-adapter installed and tested.
  • tomoda_ws_connections gauge exposed by the backend.
  • Frontend reconnect path verified to handle the rolling pod swap gracefully (it already does — see the tomoda repo's realtime docs).
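
For the prometheus-adapter prerequisite, a rule along these lines would expose the gauge through the custom metrics API. It is a sketch that assumes the scraped tomoda_ws_connections series carry namespace and pod labels, which depends on how Prometheus discovers and relabels the pods.

```yaml
# prometheus-adapter rules configuration (sketch).
# Assumption: the series has "namespace" and "pod" labels.
rules:
  - seriesQuery: 'tomoda_ws_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }
        pod: { resource: pod }
    name:
      matches: "^tomoda_ws_connections$"
      as: "tomoda_ws_connections"
    # One gauge per pod, so summing by the grouping labels yields the per-pod value.
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

"Tested" can then mean querying the custom.metrics.k8s.io API for the metric on the tomoda-ws pods and confirming that it returns a value per pod before any HPA depends on it.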

Why this isn't done now

  • Current scale doesn't need it (see "When to do this").
  • The mode flags exist precisely so this becomes almost entirely a deployment change: apart from the tomoda_ws_connections gauge, no backend code has to change. We're trading some YAML complexity for the option to do this fast when it's needed.
  • Ingress path-based routing adds operational surface (one more failure mode to debug) that isn't worth carrying until the scale signal justifies it.