Dashboards

Six Grafana dashboards land via two paths — four custom (Tomoda-specific metrics, hand-authored JSON in ConfigMaps) and two community (infrastructure baselines, imported via gnetId from grafana.com). All visible at https://grafana.tomoda.life under the Default folder.

Inventory

| Dashboard | Path | Source | What it shows |
| --- | --- | --- | --- |
| Tomoda — API RED | k8s/.../manifests/dashboards/tomoda-api-red.yaml | Custom | Request rate per route, 5xx rate per route, latency p50/p95/p99, in-flight count |
| Tomoda — WebSocket Hub | k8s/.../manifests/dashboards/tomoda-ws-hub.yaml | Custom | Active connections per pod + total, rooms per pod, message rate by direction (inbound/outbound/fanout), broadcast latency p50/p95/p99 |
| Tomoda — Async Worker | k8s/.../manifests/dashboards/tomoda-async.yaml | Custom | Task rate by type, failure rate by type, duration p50/p95/p99 |
| Tomoda — Business Metrics | k8s/.../manifests/dashboards/tomoda-business.yaml | Custom | Registrations / logins / OTP sends per hour, chat message rate, events created + joins, payments |
| Redis | monitoring/values.yaml (gnetId: 11835) | Community | Memory usage, ops/sec, slow log, key count, connected clients |
| Kubernetes cluster overview | monitoring/values.yaml (gnetId: 15760) | Community | Node CPU/memory, pod counts, restart rates, PVC usage |

How the loading works

Two parallel mechanisms — pick whichever fits the use case:

Custom dashboards — ConfigMap + sidecar

The four Tomoda dashboards live as ConfigMap resources in the monitoring namespace, each carrying the label grafana_dashboard: "1". Grafana's sidecar container (enabled by default in kube-prometheus-stack) watches all namespaces for ConfigMaps with that label, loads the embedded JSON, and serves them under the Default dashboards folder.

Argo CD's sys-resources Application points at k8s/envs/dev/sys/manifests/ with directory.recurse: true, so dropping any new dashboard in manifests/dashboards/ automatically deploys and loads it within a sync cycle.

ConfigMap (label: grafana_dashboard=1)
   └─► Grafana sidecar (watches all namespaces)
         └─► writes JSON to /var/lib/grafana/dashboards/default/
              └─► Grafana provisioner picks it up + reloads

Why ConfigMap-per-dashboard instead of inlining JSON in values.yaml:

  • Keeps values.yaml small: six dashboards at ~150 lines of JSON each would add roughly 900 lines of escaped JSON noise to the Helm values.
  • Each dashboard is its own file: easier to diff, easier to edit, easier to drop a new one in.
  • The sidecar reload cycle is faster than a full Helm rollout — edit a dashboard, push, Argo CD syncs the ConfigMap, sidecar hot-reloads, you see the change in Grafana within ~30 seconds.

Community dashboards — gnetId in values.yaml

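For infrastructure baselines (Redis, Kubernetes), we reference community dashboards on grafana.com by gnetId. The chart turns each reference into a download step, and the dashboard JSON is fetched and provisioned into Grafana when the Grafana pod starts (which a Helm install or upgrade that changes these values triggers). There is no need to hand-author dashboards that already exist as well-maintained community work.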

grafana:
  dashboards:
    default:
      redis:
        gnetId: 11835
        revision: 1
        datasource: Prometheus

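The chart resolves gnetId: 11835 to https://grafana.com/api/dashboards/11835/revisions/1/download and writes that URL into a download script stored in a Helm-generated ConfigMap. An init container in the Grafana pod runs the script and saves the JSON where the default dashboard provider reads it, so Grafana picks it up on startup.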
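To check that a gnetId/revision pair resolves before adding it to values.yaml, the URL pattern quoted above can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and the URL template is simply the one shown in this section.

```python
def gnet_download_url(gnet_id: int, revision: int = 1) -> str:
    """Build the grafana.com download URL for one community dashboard revision."""
    if gnet_id <= 0 or revision <= 0:
        raise ValueError("gnet_id and revision must be positive integers")
    return f"https://grafana.com/api/dashboards/{gnet_id}/revisions/{revision}/download"


if __name__ == "__main__":
    # The Redis dashboard from the inventory, pinned to revision 1.
    print(gnet_download_url(11835, 1))
```

Fetching the printed URL with curl or a browser is a quick way to confirm the revision exists before the pin lands in a PR.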

Authoring a new custom dashboard

The high-level flow:

  1. Build the dashboard in the Grafana UI (easiest — drag-and-drop panels, write PromQL with autocomplete).
  2. Export: dashboard top-right → Share → Export → Save to file (or View JSON → Copy).
  3. Drop the JSON into a new ConfigMap at k8s/envs/dev/sys/manifests/dashboards/<name>.yaml. Use any existing dashboard YAML as a template — the wrapper is identical, only the JSON differs.
  4. Set the datasource.uid to prometheus (or loki / tempo / sentry) so it works across all environments.
  5. PR + Argo CD sync — appears in Grafana within ~30 seconds.
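Step 3 can be scripted. The sketch below wraps an exported dashboard in a ConfigMap manifest carrying the grafana_dashboard: "1" label. The function name is hypothetical. The `<name>-dashboard` ConfigMap name follows the names shown in the Verification output, and the `<name>.json` data key is an assumption; check an existing dashboard YAML for the conventions actually in use.

```python
import json


def dashboard_configmap(name: str, dashboard: dict, namespace: str = "monitoring") -> str:
    """Render a ConfigMap manifest that embeds the dashboard JSON as a YAML block scalar."""
    body = json.dumps(dashboard, indent=2)
    # Indent every JSON line by four spaces so it sits inside the `|` block scalar.
    indented = "\n".join("    " + line for line in body.splitlines())
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        f"  name: {name}-dashboard\n"      # assumed naming convention
        f"  namespace: {namespace}\n"
        "  labels:\n"
        '    grafana_dashboard: "1"\n'     # quoted: label values must be strings
        "data:\n"
        f"  {name}.json: |\n"              # assumed data key
        f"{indented}\n"
    )


if __name__ == "__main__":
    # Hypothetical dashboard name and minimal JSON, for illustration only.
    print(dashboard_configmap("tomoda-example", {"title": "Example", "panels": []}))
```

Writing the JSON as a block scalar keeps it unescaped, so the dashboard remains readable and diffable inside the manifest.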

Datasource UID matters

When you export from the Grafana UI, the datasource often comes through as the literal in-cluster name (e.g., prometheus). Verify the datasource.uid field on each panel matches the data source UIDs we configure in values.yaml (prometheus, loki, tempo, sentry). If a panel shows "Datasource not found," that's the usual cause.
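A check like this can run before committing. The sketch walks an exported dashboard and reports every datasource reference whose UID is not one of the four named above. The function name and the JSON-path output format are illustrative assumptions.

```python
ALLOWED_UIDS = {"prometheus", "loki", "tempo", "sentry"}


def find_bad_datasources(node, path="$"):
    """Yield (json_path, uid) for each `datasource` entry whose uid is not allowed."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "datasource" and value is not None:
                # Exports may use {"type": ..., "uid": ...} or a bare string reference.
                uid = value.get("uid") if isinstance(value, dict) else value
                if not isinstance(uid, str) or uid not in ALLOWED_UIDS:
                    yield child, uid
            else:
                yield from find_bad_datasources(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_bad_datasources(item, f"{path}[{index}]")


if __name__ == "__main__":
    exported = {"panels": [{"title": "Latency", "datasource": {"uid": "abc123"}}]}
    for where, uid in find_bad_datasources(exported):
        print(f"{where}: unexpected datasource uid {uid!r}")
```

Built-in sources, such as the annotation datasource Grafana adds to most dashboards, will also be reported. Extend ALLOWED_UIDS with whatever values appear in your own exports.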

Don't commit credentials

Some Grafana exports include the datasource username/password as plaintext fields — those should never get committed. The datasource.uid reference pattern we use avoids this entirely.
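As a guard before committing, an export can be scanned for credential-looking fields. This is a rough sketch: the key names in SUSPECT_KEYS are assumptions about what such fields might be called, not a list taken from Grafana, and the function name is hypothetical.

```python
# Lower-cased key names treated as suspicious (assumed; adjust to what you see).
SUSPECT_KEYS = {"password", "basicauthpassword", "basicauthuser", "securejsondata"}


def find_credential_fields(node, path="$"):
    """Yield the JSON path of every non-empty field whose key looks like a credential."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in SUSPECT_KEYS and value:
                yield child  # report the location only, never the value
            yield from find_credential_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_credential_fields(item, f"{path}[{index}]")


if __name__ == "__main__":
    exported = {"__inputs": [{"name": "DS", "password": "example-only"}]}
    for where in find_credential_fields(exported):
        print(f"possible credential at {where}")
```

The scan reports paths rather than values so that its own output is safe to paste into a PR or CI log.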

Editing an existing dashboard

You can edit in the Grafana UI directly — it stays editable (set in our dashboardProviders config). But edits in the UI are local to the running pod: they don't survive a Grafana restart, and they're not reflected in git.

For changes that should persist:

  1. Edit in UI, get it right.
  2. Share → Export → Save to file.
  3. Replace the JSON in the corresponding manifests/dashboards/<name>.yaml.
  4. PR.

For one-off / experimental tweaks, the UI is fine — just don't be surprised when a Grafana pod restart wipes them.
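When replacing the JSON in step 3, stripping instance-local bookkeeping first can keep the PR diff limited to real changes. The sketch below clears the top-level id and drops version. Treating those two fields as safe to reset is an assumption to verify against your provisioning setup, and the function name is hypothetical.

```python
import json


def normalize_export(raw: str) -> str:
    """Return the exported dashboard JSON with instance-local fields reset."""
    dashboard = json.loads(raw)
    dashboard["id"] = None          # numeric id belongs to one Grafana instance
    dashboard.pop("version", None)  # save counter, changes on every UI save
    # Key order is preserved, so the result diffs cleanly against the committed copy.
    return json.dumps(dashboard, indent=2) + "\n"


if __name__ == "__main__":
    print(normalize_export('{"id": 42, "title": "Demo", "version": 7}'), end="")
```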

Verification

# 1. ConfigMaps for the four custom dashboards exist in monitoring/
kubectl get cm -n monitoring -l grafana_dashboard=1
#   NAME                                  DATA   AGE
#   tomoda-api-red-dashboard              1      ...
#   tomoda-async-dashboard                1      ...
#   tomoda-business-dashboard             1      ...
#   tomoda-ws-hub-dashboard               1      ...

# 2. Grafana sidecar picked them up
kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard | tail -20
#   Should show four "Writing /var/lib/grafana/dashboards/default/<name>.json" lines

# 3. Browse to https://grafana.tomoda.life -> Dashboards -> Browse
#    All six should appear under the Default folder.

If a custom dashboard doesn't appear:

  • Confirm the ConfigMap has the grafana_dashboard: "1" label exactly (case-sensitive).
  • Confirm the JSON is valid (python3 -c "import yaml, json; json.loads(yaml.safe_load(open('<file>'))['data']['<key>'])").
  • Check the sidecar logs for parse errors: kubectl logs -n monitoring deploy/monitoring-grafana -c grafana-sc-dashboard.
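The first two checks can be combined into one function that takes a ConfigMap as a dict, for example the parsed output of `kubectl get cm <name> -n monitoring -o json`. This is a minimal sketch: the function name and message wording are illustrative.

```python
import json


def diagnose_configmap(cm: dict) -> list[str]:
    """Return human-readable problems that would keep a dashboard from loading."""
    problems = []
    labels = (cm.get("metadata") or {}).get("labels") or {}
    value = labels.get("grafana_dashboard")
    if value is None:
        problems.append("missing label grafana_dashboard")
    elif value != "1":
        # An unquoted 1 in YAML parses as an integer, not the string "1".
        problems.append(f'label grafana_dashboard is {value!r}, expected the string "1"')
    data = cm.get("data") or {}
    if not data:
        problems.append("data is empty")
    for key, text in data.items():
        try:
            json.loads(text)
        except (TypeError, ValueError) as exc:
            problems.append(f"data[{key!r}] is not valid JSON: {exc}")
    return problems


if __name__ == "__main__":
    broken = {"metadata": {"labels": {}}, "data": {"demo.json": "{not json"}}
    for problem in diagnose_configmap(broken):
        print(problem)
```

An empty list means the label and the embedded JSON both look right, which leaves the sidecar logs as the next place to check.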