Dashboards¶

Grafana dashboards land via two paths — custom (Tomoda-specific metrics, hand-authored JSON in ConfigMaps) and community (infrastructure baselines, imported via gnetId from grafana.com). Visible at https://grafana.tomoda.life. Most sit in the Default folder; the Costs dashboard uses its own Costs folder via the grafana_folder annotation.

Inventory¶

Dashboard	Path	Source	What it shows
Tomoda — API RED	`k8s/.../manifests/dashboards/tomoda-api-red.yaml`	Custom	Request rate per route, 5xx rate per route, latency p50/p95/p99, in-flight count
Tomoda — WebSocket Hub	`k8s/.../manifests/dashboards/tomoda-ws-hub.yaml`	Custom	Active connections per pod + total, rooms per pod, message rate by direction (inbound/outbound/fanout), broadcast latency p50/p95/p99
Tomoda — Async Worker	`k8s/.../manifests/dashboards/tomoda-async.yaml`	Custom	Task rate by type, failure rate by type, duration p50/p95/p99
Tomoda — Business Metrics	`k8s/.../manifests/dashboards/tomoda-business.yaml`	Custom	Registrations / logins / OTP sends per hour, chat message rate, events created + joins, payments
Tomoda — Semantic Resolver	`k8s/.../manifests/dashboards/tomoda-semantic-{env}.yaml`	Custom	LLM (DeepSeek) call volume, success rate, latency p50/p95/p99, prompt-cache hit ratio, dollarized cost per hour + 24h burn, tokens by direction × cache. Also: search-enrichment provider request rate, latency, errors, and 24h query counts (billed on the provider's per-plan tier — check their billing page).
Tomoda — Costs	`k8s/.../manifests/dashboards/tomoda-costs.yaml`	Custom	AWS + GCP MTD by service with totals + month-end projection, DeepSeek balance gauge + trend, Serper credits + burn rate, prepaid runway. Folder: Costs. See Cost monitoring.
Redis	`monitoring/values.yaml` (`gnetId: 11835`)	Community	Memory usage, ops/sec, slow log, key count, connected clients
Kubernetes cluster overview	`monitoring/values.yaml` (`gnetId: 15760`)	Community	Node CPU/memory, pod counts, restart rates, PVC usage

How the loading works¶

Two parallel mechanisms — pick whichever fits the use case:

Custom dashboards — ConfigMap + sidecar¶

The custom Tomoda dashboards live as ConfigMap resources in the monitoring namespace, each carrying the label grafana_dashboard: "1". Grafana's sidecar container (enabled by default in kube-prometheus-stack) watches all namespaces for ConfigMaps with that label, loads the embedded JSON, and serves them under the Default folder unless a grafana_folder annotation names another one (as the Costs dashboard does).

Argo CD's sys-resources Application points at k8s/envs/platform/manifests/ with directory.recurse: true, so dropping any new dashboard in manifests/dashboards/ automatically deploys and loads it within a sync cycle.

ConfigMap (label: grafana_dashboard=1)
   └─► Grafana sidecar (watches all namespaces)
         └─► writes JSON to /var/lib/grafana/dashboards/default/
              └─► Grafana provisioner picks it up + reloads
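
The ConfigMap at the top of that chain is a thin wrapper around the exported dashboard JSON. A minimal sketch of building one in Python, assuming the label and annotation names described on this page. The function name, the "-dashboard" name suffix, the data key, and the example dashboard are hypothetical. The output is JSON, which kubectl accepts and which is also valid YAML:

```python
import json


def dashboard_configmap(name, dashboard, folder=None, namespace="monitoring"):
    """Wrap an exported Grafana dashboard (a dict) in a ConfigMap manifest."""
    metadata = {
        # Name suffix is an assumption, not a requirement of the sidecar.
        "name": f"{name}-dashboard",
        "namespace": namespace,
        # The sidecar selects ConfigMaps by this label; the value is the string "1".
        "labels": {"grafana_dashboard": "1"},
    }
    if folder is not None:
        # Optional: place the dashboard in a non-default folder (as Costs does).
        metadata["annotations"] = {"grafana_folder": folder}
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": metadata,
        # The data key is an assumption; a .json suffix is used for the file name.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


# Hypothetical dashboard, only to show the shape of the result.
example = dashboard_configmap("tomoda-example", {"title": "Tomoda Example", "panels": []})
print(json.dumps(example, indent=2))
```

Passing folder="Costs" would add the grafana_folder annotation; leaving it out keeps the dashboard in the Default folder.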

Why ConfigMap-per-dashboard instead of inlining JSON in values.yaml:

Keeps values.yaml small: six dashboards × ~150 lines of JSON each would add roughly 900 lines of escaped JSON noise to the Helm values.
Each dashboard is its own file: easier to diff, easier to edit, easier to drop a new one in.
The sidecar reload cycle is faster than a full Helm rollout — edit a dashboard, push, Argo CD syncs the ConfigMap, sidecar hot-reloads, you see the change in Grafana within ~30 seconds.

Community dashboards — `gnetId` in `values.yaml`¶

For infrastructure baselines (Redis, Kubernetes), we reference dashboards on grafana.com by gnetId. The chart turns each reference into a download that runs when the Grafana pod starts (not at Helm template time), then provisions the result into Grafana. No need to hand-author dashboards that already exist as well-maintained community work.

grafana:
  dashboards:
    default:
      redis:
        gnetId: 11835
        revision: 1
        datasource: Prometheus

The chart resolves gnetId: 11835 → generates a download script for https://grafana.com/api/dashboards/11835/revisions/1/download → an init container runs it when the Grafana pod starts → Grafana picks up the downloaded JSON.
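
To preview which URLs a set of pinned entries resolves to before merging a values.yaml change, the URL pattern above can be reproduced in a few lines. This is only a sketch: the function names are hypothetical, and the values are written as a Python dict with the same shape as the YAML fragment rather than parsed from the file.

```python
def gnet_download_url(gnet_id: int, revision: int) -> str:
    """Build the grafana.com download URL for one pinned dashboard revision."""
    return f"https://grafana.com/api/dashboards/{gnet_id}/revisions/{revision}/download"


def pinned_downloads(values: dict) -> dict:
    """Map 'provider/name' to the download URL for each gnetId entry."""
    urls = {}
    for provider, boards in values["grafana"]["dashboards"].items():
        for name, spec in boards.items():
            if "gnetId" in spec:
                urls[f"{provider}/{name}"] = gnet_download_url(
                    spec["gnetId"], spec["revision"]
                )
    return urls


# Same shape as the values.yaml fragment above, written as a Python dict.
values = {
    "grafana": {
        "dashboards": {
            "default": {
                "redis": {"gnetId": 11835, "revision": 1, "datasource": "Prometheus"},
            }
        }
    }
}

for key, url in pinned_downloads(values).items():
    print(key, url)
```

Opening the printed URL in a browser is a quick way to confirm that the pinned revision exists before the change reaches the cluster.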

Authoring a new custom dashboard¶

The high-level flow:

Build the dashboard in the Grafana UI (easiest — drag-and-drop panels, write PromQL with autocomplete).
Export: dashboard top-right → Share → Export → Save to file (or View JSON → Copy).
Drop the JSON into a new ConfigMap at k8s/envs/platform/manifests/dashboards/<name>.yaml. Use any existing dashboard YAML as a template — the wrapper is identical, only the JSON differs.
Set the datasource.uid to prometheus (or loki / tempo / sentry) so it works across all environments.
PR + Argo CD sync — appears in Grafana within ~30 seconds.

Datasource UID matters

When you export from the Grafana UI, the datasource often comes through as the literal in-cluster name (e.g., prometheus). Verify the datasource.uid field on each panel matches the data source UIDs we configure in values.yaml (prometheus, loki, tempo, sentry). If a panel shows "Datasource not found," that's the usual cause.
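
A pre-commit check for this can be sketched as a walk over the exported JSON that flags any datasource reference outside the expected set. The function names and the sample export are hypothetical. The three built-in UIDs are included on the assumption that Grafana's internal annotation, mixed, and dashboard data sources should be left alone.

```python
# UIDs configured in values.yaml, per this page.
CONFIGURED_UIDS = {"prometheus", "loki", "tempo", "sentry"}
# Grafana built-ins (assumed safe to ignore in this check).
BUILTIN_UIDS = {"-- Grafana --", "-- Mixed --", "-- Dashboard --"}


def iter_datasources(node, path="$"):
    """Yield (path, value) for every 'datasource' key anywhere in the JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "datasource":
                yield child, value
            else:
                yield from iter_datasources(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_datasources(item, f"{path}[{i}]")


def unexpected_datasources(dashboard):
    """Return (path, uid) pairs whose uid is not a configured or built-in one."""
    problems = []
    for path, ds in iter_datasources(dashboard):
        # Newer exports use {"type": ..., "uid": ...}; older ones use a bare string.
        uid = ds.get("uid") if isinstance(ds, dict) else ds
        if uid not in CONFIGURED_UIDS | BUILTIN_UIDS:
            problems.append((path, uid))
    return problems


# Hypothetical export with one instance-specific UID left in a query target.
exported = {
    "annotations": {"list": [{"datasource": {"type": "grafana", "uid": "-- Grafana --"}}]},
    "panels": [
        {
            "title": "Request rate",
            "datasource": {"type": "prometheus", "uid": "prometheus"},
            "targets": [{"datasource": {"type": "prometheus", "uid": "P1809F7CD0C75ACF3"}}],
        }
    ],
}

for path, uid in unexpected_datasources(exported):
    print(f"{path}: unexpected datasource uid {uid!r}")
```

A null datasource is reported as well, since it relies on whatever the default data source happens to be in each environment.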

Don't commit credential leaks

Some Grafana exports include the datasource username/password as plaintext fields — those should never get committed. The datasource.uid reference pattern we use avoids this entirely.
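
As a guard before committing, an export can be scanned for credential-looking fields. The sketch below is a heuristic, not a complete secret scanner: the key fragments and the function name are assumptions, and a dedicated secret-scanning tool in CI remains the stronger control.

```python
# Key-name fragments treated as suspicious (an assumed, deliberately broad list).
SUSPECT_FRAGMENTS = ("password", "secret", "token", "apikey")


def suspect_fields(node, path="$"):
    """Return JSON paths of non-empty fields whose key looks credential-like."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            looks_secret = any(frag in key.lower() for frag in SUSPECT_FRAGMENTS)
            if looks_secret and value not in (None, "", {}):
                hits.append(child)
            # Keep descending: secrets may be nested under harmless-looking keys.
            hits.extend(suspect_fields(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(suspect_fields(item, f"{path}[{i}]"))
    return hits


# Hypothetical export fragment containing a plaintext credential.
exported = {
    "title": "Example",
    "panels": [
        {"datasource": {"uid": "prometheus", "basicAuthPassword": "not-a-real-secret"}}
    ],
}

for path in suspect_fields(exported):
    print(f"possible credential at {path}")
```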

Editing an existing dashboard¶

You can edit in the Grafana UI directly — it stays editable (set in our dashboardProviders config). But edits in the UI are local to the running pod: they don't survive a Grafana restart, and they're not reflected in git.

For changes that should persist:

Edit in UI, get it right.
Share → Export → Save to file.
Replace the JSON in the corresponding manifests/dashboards/<name>.yaml.
PR.

For one-off / experimental tweaks, the UI is fine — just don't be surprised when a Grafana pod restart wipes them.

Verification¶

# 1. ConfigMaps for the custom dashboards exist in the monitoring namespace
kubectl get cm -n monitoring -l grafana_dashboard=1
#   NAME                                  DATA   AGE
#   tomoda-api-red-dashboard              1      ...
#   tomoda-async-dashboard                1      ...
#   tomoda-business-dashboard             1      ...
#   tomoda-ws-hub-dashboard               1      ...
#   ... (one row per custom dashboard in the inventory)

# 2. Grafana sidecar picked them up
kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard | tail -20
#   Should show one "Writing /var/lib/grafana/dashboards/default/<name>.json" line per custom dashboard in the Default folder

# 3. Browse to https://grafana.tomoda.life -> Dashboards -> Browse
#    Every dashboard in the inventory should appear under the Default folder,
#    except Costs, which has its own Costs folder.

If a custom dashboard doesn't appear:

Confirm the ConfigMap has the grafana_dashboard: "1" label exactly (case-sensitive).
Confirm the JSON is valid (python3 -c "import yaml, json; json.loads(yaml.safe_load(open('<file>'))['data']['<key>'])").
Check the sidecar logs for parse errors: kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard.
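
The first two checks can be combined into one function that takes an already-parsed manifest. This is a sketch with a hypothetical function name; it only covers the label and the JSON validity, not whether Grafana accepts the dashboard model.

```python
import json


def check_dashboard_configmap(manifest):
    """Return a list of problems found in a parsed ConfigMap manifest (a dict)."""
    problems = []
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    label = labels.get("grafana_dashboard")
    # An unquoted 1 in YAML parses as an integer, which is not the string "1".
    if label != "1":
        problems.append(f"label grafana_dashboard is {label!r}, expected the string '1'")
    data = manifest.get("data") or {}
    if not data:
        problems.append("ConfigMap has no data keys")
    for key, text in data.items():
        try:
            json.loads(text)
        except (TypeError, json.JSONDecodeError) as exc:
            problems.append(f"data[{key!r}] is not valid JSON: {exc}")
    return problems


# Hypothetical manifest with an unquoted label value and truncated JSON.
broken = {
    "metadata": {"labels": {"grafana_dashboard": 1}},
    "data": {"example.json": '{"title": "Example"'},
}
for problem in check_dashboard_configmap(broken):
    print(problem)

# Usage with a file on disk (requires PyYAML, as in the one-liner above):
#   import yaml
#   print(check_dashboard_configmap(yaml.safe_load(open("<file>"))))
```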

Manual setup, Prometheus, Loki, Tempo, Sentry, Alerting — the data sources these dashboards query
Backend metric definitions live in the tomoda repo: backend/internal/observability/