Dashboards
Six Grafana dashboards land via two paths — four custom (Tomoda-specific metrics, hand-authored JSON in ConfigMaps) and two community (infrastructure baselines, imported via gnetId from grafana.com). All visible at https://grafana.tomoda.life under the Default folder.
Inventory
| Dashboard | Path | Source | What it shows |
|---|---|---|---|
| Tomoda — API RED | k8s/.../manifests/dashboards/tomoda-api-red.yaml | Custom | Request rate per route, 5xx rate per route, latency p50/p95/p99, in-flight count |
| Tomoda — WebSocket Hub | k8s/.../manifests/dashboards/tomoda-ws-hub.yaml | Custom | Active connections per pod + total, rooms per pod, message rate by direction (inbound/outbound/fanout), broadcast latency p50/p95/p99 |
| Tomoda — Async Worker | k8s/.../manifests/dashboards/tomoda-async.yaml | Custom | Task rate by type, failure rate by type, duration p50/p95/p99 |
| Tomoda — Business Metrics | k8s/.../manifests/dashboards/tomoda-business.yaml | Custom | Registrations / logins / OTP sends per hour, chat message rate, events created + joins, payments |
| Redis | monitoring/values.yaml (gnetId: 11835) | Community | Memory usage, ops/sec, slow log, key count, connected clients |
| Kubernetes cluster overview | monitoring/values.yaml (gnetId: 15760) | Community | Node CPU/memory, pod counts, restart rates, PVC usage |
How the loading works
Two parallel mechanisms — pick whichever fits the use case:
Custom dashboards: ConfigMap + sidecar
The four Tomoda dashboards live as ConfigMap resources in the monitoring namespace, each carrying the label grafana_dashboard: "1". Grafana's sidecar container (enabled by default in kube-prometheus-stack) watches all namespaces for ConfigMaps with that label, loads the embedded JSON, and serves them under the Default dashboards folder.
Argo CD's sys-resources Application points at k8s/envs/dev/sys/manifests/ with directory.recurse: true, so dropping any new dashboard in manifests/dashboards/ automatically deploys and loads it within a sync cycle.
ConfigMap (label: grafana_dashboard=1)
  └─► Grafana sidecar (watches all namespaces)
        └─► writes JSON to /var/lib/grafana/dashboards/default/
              └─► Grafana provisioner picks it up + reloads
Why ConfigMap-per-dashboard instead of inlining JSON in values.yaml:
- Keeps values.yaml small: six dashboards × ~150 lines of JSON each would add roughly 900 lines of escaped JSON noise to the Helm values.
- Each dashboard is its own file: easier to diff, easier to edit, easier to drop a new one in.
- The sidecar reload cycle is faster than a full Helm rollout: edit a dashboard, push, Argo CD syncs the ConfigMap, the sidecar hot-reloads, and the change shows up in Grafana within ~30 seconds.
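The ConfigMap wrapper described above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual tooling: the helper name `dashboard_configmap`, the `<name>.json` data key, and the demo dashboard are all assumptions. The `-dashboard` name suffix and the `monitoring` namespace mirror what this page shows elsewhere.

```python
import json


def dashboard_configmap(name: str, dashboard: dict, namespace: str = "monitoring") -> dict:
    """Wrap a Grafana dashboard JSON model in a ConfigMap the sidecar can discover."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"{name}-dashboard",
            "namespace": namespace,
            # Label values are strings in Kubernetes: "1", not the integer 1.
            "labels": {"grafana_dashboard": "1"},
        },
        # Assumption: the data key is named <name>.json (a common convention).
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


# Hypothetical minimal dashboard model, for illustration only.
demo = {"title": "Demo", "uid": "demo", "panels": []}
print(json.dumps(dashboard_configmap("demo", demo), indent=2))
```

The sketch prints the manifest as JSON, which kubectl also accepts. The existing dashboard files in the repo are YAML, so treat this as a way to see the shape of the wrapper rather than a replacement for copying an existing file.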
Community dashboards: gnetId in values.yaml
For infrastructure baselines (Redis, Kubernetes), we use gnetId references to grafana.com — the chart fetches the dashboard JSON at Helm install/upgrade time and provisions it into Grafana. No need to hand-author dashboards that already exist as well-maintained community work.
grafana:
  dashboards:
    default:
      redis:
        gnetId: 11835
        revision: 1
        datasource: Prometheus
The chart resolves gnetId: 11835 → fetches https://grafana.com/api/dashboards/11835/revisions/1/download → bundles it into a Helm-generated ConfigMap → Grafana picks it up.
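To see which URLs a given `dashboards` block resolves to, the mapping above can be reproduced in Python. This is only a sketch of the URL pattern quoted in this section; the function name `gnet_download_urls` is hypothetical, and it requires an explicit `revision` rather than guessing a default.

```python
def gnet_download_urls(dashboards: dict) -> dict:
    """Map each gnetId-based entry to its grafana.com download URL."""
    urls = {}
    for name, spec in dashboards.items():
        if "gnetId" not in spec:
            continue  # entries without gnetId come from some other source
        urls[name] = (
            "https://grafana.com/api/dashboards/"
            f"{spec['gnetId']}/revisions/{spec['revision']}/download"
        )
    return urls


# The entry under grafana.dashboards.default from the values.yaml snippet above.
default_dashboards = {
    "redis": {"gnetId": 11835, "revision": 1, "datasource": "Prometheus"},
}
print(gnet_download_urls(default_dashboards))
```

Fetching the printed URL with curl is a quick way to inspect a community dashboard's JSON before adding it to values.yaml.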
Authoring a new custom dashboard
The high-level flow:
- Build the dashboard in the Grafana UI (easiest — drag-and-drop panels, write PromQL with autocomplete).
- Export: dashboard top-right → Share → Export → Save to file (or View JSON → Copy).
- Drop the JSON into a new ConfigMap at k8s/envs/dev/sys/manifests/dashboards/<name>.yaml. Use any existing dashboard YAML as a template: the wrapper is identical, only the JSON differs.
- Set the datasource.uid to prometheus (or loki / tempo / sentry) so it works across all environments.
- Open a PR; after the Argo CD sync, the dashboard appears in Grafana within ~30 seconds.
Datasource UID matters
When you export from the Grafana UI, the datasource often comes through as the literal in-cluster name (e.g., prometheus). Verify the datasource.uid field on each panel matches the data source UIDs we configure in values.yaml (prometheus, loki, tempo, sentry). If a panel shows "Datasource not found," that's the usual cause.
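A small check along these lines can catch mismatched UIDs before a PR. It is a sketch under assumptions: the function names are hypothetical, the allowed set is simply the four UIDs listed above, and it will also flag template placeholders or Grafana's built-in pseudo data sources, which may be fine in your dashboard.

```python
ALLOWED_UIDS = {"prometheus", "loki", "tempo", "sentry"}


def _uid(ref):
    """Return the UID from an object-style reference, or the legacy plain string."""
    if isinstance(ref, dict):
        return ref.get("uid")
    return ref  # older exports store the data source as a string (or None)


def find_unknown_datasources(dashboard: dict) -> list:
    """List 'panel title: uid' for every data source reference outside ALLOWED_UIDS."""
    problems = []

    def walk(panels):
        for panel in panels:
            title = panel.get("title", "<untitled>")
            refs = [panel.get("datasource")]
            refs += [target.get("datasource") for target in panel.get("targets", [])]
            for ref in refs:
                uid = _uid(ref)
                if uid is not None and uid not in ALLOWED_UIDS:
                    problems.append(f"{title}: {uid}")
            walk(panel.get("panels", []))  # row panels can nest child panels

    walk(dashboard.get("panels", []))
    return problems
```

Run it on the parsed export (for example `find_unknown_datasources(json.load(open("export.json")))`, with `export.json` standing in for your file) and fix any panel it reports.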
Don't commit credentials
Some Grafana exports include the datasource username/password as plaintext fields — those should never get committed. The datasource.uid reference pattern we use avoids this entirely.
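As a safety net, an export can be scanned for credential-looking keys before it is committed. The sketch below is illustrative only: `find_credential_fields` is a hypothetical helper and the key list is an assumption, so extend it to match whatever your exports actually contain.

```python
# Assumption: an illustrative, non-exhaustive list of suspicious key names.
SUSPECT_KEYS = {"password", "basicAuthPassword", "secureJsonData"}


def find_credential_fields(node, path="$"):
    """Return the paths of non-empty fields whose key looks like a credential."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in SUSPECT_KEYS and value not in (None, "", {}):
                hits.append(child)
            hits.extend(find_credential_fields(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_credential_fields(item, f"{path}[{index}]"))
    return hits
```

An empty result does not prove the export is clean; it only means none of the listed keys carried a value.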
Editing an existing dashboard
You can edit in the Grafana UI directly — it stays editable (set in our dashboardProviders config). But edits in the UI are local to the running pod: they don't survive a Grafana restart, and they're not reflected in git.
For changes that should persist:
- Edit in UI, get it right.
- Share → Export → Save to file.
- Replace the JSON in the corresponding manifests/dashboards/<name>.yaml.
- PR.
For one-off / experimental tweaks, the UI is fine — just don't be surprised when a Grafana pod restart wipes them.
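When replacing the JSON in git, it can help to drop fields that change on every save so the PR diff shows only real edits. The following is a sketch, assuming that the top-level `id` and `version` fields of the dashboard model are the noisy ones in your exports; `normalise_dashboard` is a hypothetical helper, not part of Grafana.

```python
import json

# Assumption: top-level fields that Grafana rewrites on save and that
# carry no meaning for a dashboard provisioned from a file.
VOLATILE_KEYS = ("id", "version")


def normalise_dashboard(raw: str) -> str:
    """Return the exported dashboard JSON with volatile fields removed."""
    dashboard = json.loads(raw)
    for key in VOLATILE_KEYS:
        dashboard.pop(key, None)
    return json.dumps(dashboard, indent=2) + "\n"
```

Two exports of the same dashboard that differ only in those fields then normalise to identical text, which keeps review focused on panel and query changes.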
Verification
# 1. ConfigMaps for the four custom dashboards exist in monitoring/
kubectl get cm -n monitoring -l grafana_dashboard=1
# NAME DATA AGE
# tomoda-api-red-dashboard 1 ...
# tomoda-async-dashboard 1 ...
# tomoda-business-dashboard 1 ...
# tomoda-ws-hub-dashboard 1 ...
# 2. Grafana sidecar picked them up
kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard | tail -20
# Should show four "Writing /var/lib/grafana/dashboards/default/<name>.json" lines
# 3. Browse to https://grafana.tomoda.life -> Dashboards -> Browse
# All six should appear under the Default folder.
If a custom dashboard doesn't appear:
- Confirm the ConfigMap has the grafana_dashboard: "1" label exactly (case-sensitive).
- Confirm the JSON is valid (python3 -c "import yaml, json; json.loads(yaml.safe_load(open('<file>'))['data']['<key>'])").
- Check the sidecar logs for parse errors: kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard.
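The first two checks can be combined into one function that takes an already-parsed manifest. This is a minimal sketch: `check_dashboard_configmap` is a hypothetical name, and it assumes the manifest was loaded into a dict first, for example with `yaml.safe_load` as in the one-liner above.

```python
import json


def check_dashboard_configmap(manifest: dict) -> list:
    """Return a list of problems found in a dashboard ConfigMap (empty if none)."""
    problems = []

    labels = (manifest.get("metadata") or {}).get("labels") or {}
    value = labels.get("grafana_dashboard")
    # An unquoted 1 in YAML parses as an integer, which is not the string "1".
    if value != "1":
        problems.append(f'label grafana_dashboard is {value!r}, expected the string "1"')

    data = manifest.get("data") or {}
    if not data:
        problems.append("ConfigMap has no data keys")
    for key, payload in data.items():
        try:
            json.loads(payload)
        except (TypeError, json.JSONDecodeError) as exc:
            problems.append(f"data[{key!r}] is not valid JSON: {exc}")

    return problems
```

If the function returns an empty list and the dashboard still does not appear, the sidecar logs are the next place to look.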
Related docs
- Manual setup, Prometheus, Loki, Tempo, Sentry, Alerting — the data sources these dashboards query
- Backend metric definitions live in the tomoda repo: backend/internal/observability/