Tomoda: WS Pool Split (Planned)

This page describes a future deployment topology: splitting the WebSocket Hub out of tomoda-api into its own pool (tomoda-ws). The backend binary already supports the modes required (api-hub and ws-hub ship today, just unused) — what's missing is the k8s wiring, an ingress path-based routing rule, and a custom HPA metric.

This is not in the current deployment. Treat this page as the design we land on the day WS load actually warrants a separate pool. Don't pre-build it.

Why split

Today (tomoda-api, mode multi-hub):

HTTP API and WebSocket termination share one pool.
HPA scales on CPU + memory.
One long-lived WS connection looks much cheaper on CPU than one bursty HTTP request, so CPU-driven HPA can under-react to WS-connection growth.

Splitting tomoda-api into tomoda-api (HTTP only) and tomoda-ws (WebSocket only) gives us:

Independent scaling shape. WS pool scales on connection count (long-lived, sticky, memory-bound). API pool keeps scaling on CPU/RPS.
Different pod sizing. WS pods can be sized for connection state (more memory, less CPU). API pods stay lean.
Blast-radius reduction. A WS-related rollout (Hub change, gorilla upgrade) doesn't touch the API pool, and vice versa.

The cost is more moving parts: another Deployment, another Service, an ingress-level routing rule, and a custom WS-connection metric for the HPA.

When to do this

Trigger criteria, any of which is sufficient:

WS-connection count regularly exceeds 10k during peak (today: well under 1k). At that scale, CPU is no longer a good proxy for load.
tomoda-api HPA spends meaningful time pinned at maxReplicas while the HTTP request-rate metrics show idle pods. That gap is WS-driven load not visible to CPU-based scaling.
WS-related changes cause API-side incidents (a Hub bug bricks the API pool too) more than once.

Until one of these fires, the multi-hub topology is correct.

Target topology

Traefik routes by path:

/ws/* → tomoda-ws (HPA on ws_connections)
everything else → tomoda-api (HPA on CPU + RPS)

Path-based ingress split lets each pool scale on its own signal — long-lived WS connections (memory-bound) vs short bursty HTTP requests (CPU-bound).

Both pools run the same image. The Hub on tomoda-ws continues to use Redis pub/sub for cross-pod fanout — no architectural change to the WS plane, just a packaging change.

tomoda-async (worker + scheduler) is untouched.

Manifest sketch

In k8s/apps/tomoda/base/:

# backend-ws-deployment.yaml — Deployment tomoda-ws (mode ws-hub)
# Service ws-service (clusterIP, port 8080, selector app: tomoda-ws)

# backend-api-deployment.yaml — change SERVER_MODE: multi-hub → api-hub
# (api-hub serves /api/v1/* + /health, no /ws/* routes, no Hub)

Update kustomization.yaml to add the new file.
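
The tomoda-ws Deployment and Service described in the comments above might look roughly like the following. This is a sketch, not the final manifest: the container name, the image reference, and the resource figures are placeholders, and it assumes ws-hub mode answers /health on port 8080 the same way the migration plan expects.

```yaml
# backend-ws-deployment.yaml (sketch; image, container name, and resources are assumptions)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomoda-ws
spec:
  replicas: 0                      # the migration plan first lands the pool with no pods
  selector:
    matchLabels:
      app: tomoda-ws
  template:
    metadata:
      labels:
        app: tomoda-ws
    spec:
      containers:
        - name: backend            # hypothetical container name
          image: tomoda-backend    # placeholder: same image as tomoda-api
          env:
            - name: SERVER_MODE
              value: ws-hub
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health        # assumes ws-hub serves /health
              port: 8080
          resources:               # illustrative only: more memory, less CPU
            requests:
              cpu: 100m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: ws-service
spec:
  type: ClusterIP
  selector:
    app: tomoda-ws
  ports:
    - port: 8080
      targetPort: 8080
```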

In network-policy.yaml, add:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-ws-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-ws
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik-system
      ports:
        - protocol: TCP
          port: 8080
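
One thing to check when implementing: the policy above admits traffic from Traefik only. If Prometheus scrapes the tomoda-ws pods directly for the connection gauge, it probably needs its own ingress rule. A sketch of the extra entry, assuming Prometheus runs in a namespace called monitoring and that the metrics endpoint shares port 8080 (neither is confirmed here):

```yaml
# Extra entry under spec.ingress of tomoda-ws-policy (sketch).
# "monitoring" is an assumed namespace name; adjust to wherever Prometheus runs.
- from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
  ports:
    - protocol: TCP
      port: 8080   # assumes metrics are served on the same port as the Hub
```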

In ingress.yaml, replace the single-path backend-service rule with two paths — /ws/* → ws-service, default → backend-service:

- host: api.tomoda.life
  http:
    paths:
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: ws-service
            port: { number: 8080 }
      - path: /
        pathType: Prefix
        backend:
          service:
            name: backend-service
            port: { number: 8080 }

Traefik sorts routes by rule length by default, longest first, so /ws is evaluated before /: /ws/chat/... lands on the WS pool and everything else stays on the API pool.

HPA: custom WS connection metric

CPU is the wrong signal for the WS pool. We want to scale on WebSocket connections per pod. The cheapest path is:

1. Expose a tomoda_ws_connections Prometheus gauge from the Hub (Hub.GetGlobalConnectionCount(), the sum of all room sizes; add this when implementing). One scrape per pod, no Redis hop.
2. Install prometheus-adapter to expose the gauge as a Kubernetes custom metric.
3. Reference it from tomoda-ws-hpa:

spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: tomoda_ws_connections
        target:
          type: AverageValue
          averageValue: "500"   # scale to keep <=500 conns/pod

Tune averageValue once we have steady-state telemetry. Until prometheus-adapter is in place, the tomoda-ws HPA can fall back to memory-based scaling (each WS connection has a known memory cost; a memory utilisation target indirectly tracks connection count).
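
The prometheus-adapter step above needs a rule that maps the Prometheus series onto pods. A minimal sketch in prometheus-adapter's rules config format follows. It assumes the scraped series carries namespace and pod labels (typical for pod-level service discovery, but not verified for this cluster), and where the rules live (Helm values or a ConfigMap) depends on how the adapter is installed.

```yaml
# prometheus-adapter rules (sketch; label names are assumptions about the scrape config)
rules:
  - seriesQuery: 'tomoda_ws_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: namespace }   # map the "namespace" label to the Namespace resource
        pod: { resource: pod }               # map the "pod" label to the Pod resource
    name:
      matches: "^(.*)$"
      as: "${1}"                             # keep the metric name unchanged
    # A gauge needs no rate(); sum per pod so the HPA sees one value per pod.
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

With a Pods metric and an AverageValue target, the HPA aims for roughly ceil(total connections / averageValue) replicas, within minReplicas and maxReplicas. At the 10k-connection trigger and averageValue "500", that works out to about 20 pods.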

Migration plan (when triggered)

Sequence so we don't drop connections:

1. Land the new tomoda-ws Deployment + Service in the cluster (replicas: 0). Verify it builds and reconciles via Argo CD without any traffic.
2. Bring it to 2 replicas and confirm /health succeeds on the new pods.
3. Add the ingress /ws path pointing at ws-service. New WS connections start landing on the WS pool; existing connections remain on tomoda-api pods until they reconnect (Hub is per-pod; old pods still own their sockets).
4. Flip tomoda-api from multi-hub to api-hub. Existing sockets on api pods are torn down on the rollout; clients reconnect to the WS pool via the new ingress path.
5. Drop unused metrics / HPA from tomoda-api; deploy the custom-metric HPA for tomoda-ws.

Pre-requisites (do before step 1):

prometheus-adapter installed and tested.
tomoda_ws_connections gauge exposed by the backend.
Frontend reconnect path verified to handle the rolling pod swap gracefully (it already does — see docs/architecture/realtime.md in the tomoda repo).

Why this isn't done now

Current scale doesn't need it (see "When to do this").
The mode flags exist precisely so this becomes a deployment-only change, not a code change. We're trading some YAML complexity for the option to do this fast when it's needed.
Ingress path-based routing adds operational surface (one more failure mode to debug) that isn't worth carrying until the scale signal justifies it.