Observability Manual Setup
One-time, human-only setup needed before the observability stack can run end-to-end. Work through this list in order — code that depends on these values is rejected by Terraform / ESO / the Worker runtime if a secret is missing.
Why this page exists
A lot of these accounts (Sentry, Cloudflare, Discord) are managed outside the repo and outside Terraform. If they aren't recorded, the system breaks silently when a teammate rotates a key or someone new tries to repro the setup. Treat this page as the single checklist for "what humans need to click."
Checklist at a glance
- Sentry: org/project, DSN, auth tokens, Discord integration
- Cloudflare: API token (new scope), record account ID + zone ID
- Cloudflare Tunnel: for the synthetic Worker → Loki HTTP push path
- Discord: alerts webhook URL
- GCP Secret Manager: create 9 new secrets (values pasted from the steps above; `scripts/setup-gcp-secrets.sh` is the canonical creator)
- Synthetic test user: backend seed migration writes the row; the password lives in GCP SM
The full step-by-step is below. Each section ends with the secret(s) you should have produced; the GCP Secret Manager section is where you push them in.
Step 1: Sentry
For frontend crash reporting (@sentry/react-native) wired in tomoda-labs/tomoda PR #301.
1. Go to sentry.io and sign in (or sign up). Use a `@tomoda.life` Workspace account so org membership stays with the company, not a personal email.
2. Create an organization. The current slug is `tomoda-platforms-inc` (hardcoded as the default in `frontend/app.config.js` and in the Grafana data source `jsonData.orgSlug`). Pick a different slug only if you're setting up an isolated org for a new environment; override it via `SENTRY_ORG=<slug>` at build time and update the Grafana data source's `orgSlug` to match.
3. Create a new project:
    - Platform: React Native
    - Project slug: `tomoda-frontend` (or override with `SENTRY_PROJECT=<slug>`)
    - Default alert rules: skip (we create our own pointing at `#alerts-frontend`).
4. Copy the DSN from project Settings → Client Keys (DSN). It looks like `https://<key>@<id>.ingest.sentry.io/<project>`.
5. Create two Sentry auth tokens at Settings → Account → Auth Tokens (or org-level). Why two: source-map upload needs write scopes and rotates rarely; the in-cluster Grafana data source (`grafana-sentry-datasource` plugin) needs only read scopes and may rotate independently. Least-privilege + decoupled cadence.

    | # | Name | Scopes | Used by |
    |---|---|---|---|
    | 1 | `sourcemap-upload` | `project:releases`, `org:read` | `frontend/scripts/build-with-secrets.sh` at native + web release-build time |
    | 2 | `grafana-readonly` | `event:read`, `org:read`, `project:read`, `member:read` | Grafana's `grafana-sentry-datasource` plugin in the `monitoring` namespace |

    Copy each value as you create it; Sentry shows each only once. Both go into GCP SM in Step 5.
6. Install the Discord integration. Sentry routes frontend crash alerts directly to the `#alerts-frontend` channel via OAuth. There's no webhook URL and no GCP SM secret needed for this route; Sentry's servers post via the installed bot.
    - Project Settings → Integrations → Discord → Install.
    - Authorize on Discord, pick the Tomoda guild, pick the `#alerts-frontend` channel (separate from `#monitoring`, which is reserved for Alertmanager + synthetic webhook traffic; see Step 4).
    - Back in Sentry: Project → Alerts → Create Alert → Issue Alert → "When a new issue is created" → Action: send a Discord notification to `#alerts-frontend`. Sentry groups events into issues server-side, so one bug producing thousands of events still produces one Discord message. No further throttling needed.
7. Decide retention + sampling in Project Settings → Performance/Replay. Defaults are fine for v1.
Produces three values that go into GCP Secret Manager (in Step 5 below):
| Value | GCP SM key | Used for |
|---|---|---|
| DSN | `tomoda-sentry-dsn` | Baked into the JS bundle at build time; tells the SDK where to send events |
| `sourcemap-upload` token | `tomoda-sentry-auth-token` | Used by the Sentry CLI during the native + web release build to upload source maps |
| `grafana-readonly` token | `tomoda-sentry-grafana-token` | Projected via ESO into `monitoring/sentry-grafana-credentials` so Grafana's Sentry data source can query frontend metrics (see Sentry data source) |
GCP Secret Manager is the source of truth. Native release builds pull the DSN + upload token via frontend/scripts/build-with-secrets.sh using your gcloud auth login session. The Grafana token is read by ESO at sync time (every hour) and projected into the K8s Secret the Grafana pod mounts as SENTRY_AUTH_TOKEN. No values touch disk in either path.
Org slug (tomoda-platforms-inc) and project slug (tomoda-frontend) are hardcoded as defaults in frontend/app.config.js and in the Grafana data source jsonData.orgSlug — they're configuration, not secrets.
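Before storing the DSN, a quick shape check can catch a truncated paste. The sketch below is illustrative only: the function name `looks_like_sentry_dsn` is hypothetical, and the pattern follows the DSN shape quoted in this step, loosened to any `sentry.io` host. It never contacts Sentry, so it cannot tell whether the key is actually valid.

```shell
# Hypothetical helper: succeeds when the argument has the general DSN shape
# https://<key>@<host>.sentry.io/<project>. Shape check only, no network call.
looks_like_sentry_dsn() {
  case "$1" in
    https://?*@?*.sentry.io/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call; the value below is a made-up placeholder, not a real DSN.
if looks_like_sentry_dsn "https://examplekey@o0.ingest.sentry.io/0"; then
  echo "DSN shape looks right"
else
  echo "DSN shape looks wrong" >&2
fi
```

Once the secret exists, `gcloud secrets versions access latest --secret=tomoda-sentry-dsn` should print the same value back.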
Step 2: Cloudflare API token (expanded scope)
The existing `external-dns-cloudflare-secret` in-cluster is for external-dns and stays untouched. We're creating a new token that Terraform uses for DNS, Workers, and Tunnel management.
- Sign in to the Cloudflare dashboard.
- Profile (top right) → API Tokens → Create Token → Custom Token.
- Permissions (add all four rows):

    | Resource type | Resource | Permission |
    |---|---|---|
    | Zone | DNS | Edit |
    | Account | Workers Scripts | Edit |
    | Account | Cloudflare Tunnel | Edit |
    | Account | Account Settings | Read |

- Zone resources: Include → Specific zone → `tomoda.life` (do NOT pick "all zones").
- Account resources: Include → Specific account → your Tomoda account.
- TTL: optional. Leave blank (never expires) or set to a year and put a rotation reminder on the calendar.
- Create token → copy the value (shown once).
- While in the dashboard, record:
    - Cloudflare Account ID (visible on the right sidebar of any zone overview).
    - `tomoda.life` Zone ID (visible on the `tomoda.life` zone overview page).
Produces:
- Cloudflare API token → goes into GCP SM as `tomoda-cloudflare-api-token` in Step 5.
- Account ID + Zone ID → not secret; will be put in Terraform variables (we'll commit those as part of the synthetic / tunnel PR).
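A minimal sketch for sanity-checking what this step produced. It rests on two assumptions that are not stated in this page: that Cloudflare account and zone IDs are 32 lowercase hex characters, and that the token is exported as `CLOUDFLARE_API_TOKEN` (a variable name chosen for this example). The function names are hypothetical. `verify_cloudflare_token` calls Cloudflare's token verification endpoint and needs network access; it is only defined here, not run.

```shell
# Assumption: Cloudflare account and zone IDs are 32 lowercase hex characters.
is_cloudflare_id() {
  [ "${#1}" -eq 32 ] || return 1
  case "$1" in
    *[!0123456789abcdef]*) return 1 ;;
  esac
  return 0
}

# Ask Cloudflare whether the token is valid. Reads the token from the
# CLOUDFLARE_API_TOKEN environment variable; needs curl and network access.
verify_cloudflare_token() {
  curl -sS -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
    "https://api.cloudflare.com/client/v4/user/tokens/verify"
}
```

Running `is_cloudflare_id "<zone id>" && echo ok` before committing the Terraform variables catches a copy-paste that picked up extra characters.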
Step 3: Cloudflare Tunnel (for Worker → Loki push)
The synthetic Worker pushes probe results to Loki. Loki is in-cluster-only, so we expose a /loki/api/v1/push endpoint through a Cloudflare Tunnel protected by Cloudflare Access (service token auth, no humans involved).
- Cloudflare dashboard → Zero Trust → Networks → Tunnels → Create a tunnel.
- Connector type: Cloudflared (default).
- Name: `tomoda-prod-tunnel`.
- Copy the tunnel token (a long base64-ish string); it is shown once. This is what `cloudflared` running in-cluster uses to authenticate to the tunnel.
- Skip the "install connector" step. We'll run it as a Kubernetes Deployment via Argo CD; no need to install on a VM.
- Public hostname tab → Add a public hostname:
    - Subdomain: `loki-push`
    - Domain: `tomoda.life`
    - Service: `HTTP` → `loki.monitoring.svc.cluster.local:3100`
    - Save.
- Cloudflare Access (still in Zero Trust): Applications → Add an application → Self-hosted.
    - Application name: `loki-push`
    - Subdomain/domain: `loki-push.tomoda.life`
    - Identity providers: leave default (we won't use SSO; service-token-only).
    - Add a policy: name `synthetic-worker`, action `Service Auth`, include `Service token` → we'll generate the token next.
- Zero Trust → Access → Service Auth → Create Service Token.
    - Name: `synthetic-worker`
    - Duration: never expire (or 1 year if you prefer + put a rotation reminder).
    - Copy Client ID and Client Secret (shown once).
Produces three secrets:
- Tunnel token → GCP SM as `tomoda-cloudflare-tunnel-token`
- Access Client ID → GCP SM as `tomoda-cloudflare-access-client-id`
- Access Client Secret → GCP SM as `tomoda-cloudflare-access-client-secret`
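Once the tunnel and Access application are live, the push path can be smoke-tested from a laptop. The sketch below is an approximation of what the Worker does, not the Worker's actual code: the function names, the `synthetic-probe` job label, and the `CF_ACCESS_CLIENT_ID` / `CF_ACCESS_CLIENT_SECRET` variable names are all assumptions made for this example. It uses Loki's JSON push format and the `CF-Access-Client-Id` / `CF-Access-Client-Secret` headers that Cloudflare Access expects for service tokens.

```shell
# Build a Loki push body with one log line. $1 = job label, $2 = timestamp in
# nanoseconds since the epoch, $3 = log line. No JSON escaping is done, so the
# arguments must not contain quotes or backslashes.
loki_push_body() {
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}' "$1" "$2" "$3"
}

# POST one line through the tunnel hostname with the Access service-token
# headers. Needs curl, network access, and the two assumed env vars.
push_probe_result() {
  loki_push_body "synthetic-probe" "$(date +%s)000000000" "$1" |
    curl -sS -X POST "https://loki-push.tomoda.life/loki/api/v1/push" \
      -H "Content-Type: application/json" \
      -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
      -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}" \
      --data-binary @-
}
```

A request sent without the two Access headers should be rejected by Cloudflare before it reaches Loki, which is a useful negative test of the policy.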
Step 4: Discord alerts webhook
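The webhook URL is shared by Alertmanager and the synthetic Worker. Treat it as one shared "primary alerts pipe." Sentry does not use it; Sentry posts to `#alerts-frontend` through the OAuth integration installed in Step 1.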
- In Discord, create the `#monitoring` channel if it doesn't exist (reserved for Alertmanager + Cloudflare synthetic webhook traffic; separate from `#alerts-frontend`, which Sentry posts to via OAuth).
- Channel name → Edit Channel → Integrations → Webhooks → New Webhook.
- Name: `Tomoda Alerts Primary`. Avatar: optional.
- Copy Webhook URL. This is the value you'll store. Format: `https://discord.com/api/webhooks/<id>/<token>`.
Produces:
- Discord webhook URL → GCP SM as `tomoda-alert-webhook-primary` (provider-portable name; the value can later be a Slack/PagerDuty/Opsgenie URL without renaming the secret).
Why generic name
Naming the secret tomoda-alert-webhook-primary instead of tomoda-discord-webhook lets us swap providers (Slack, PagerDuty, Opsgenie) by changing the value only — no Terraform / manifest renames cascade through the cluster.
If you want a low-priority channel later, add a separate tomoda-alert-webhook-secondary secret pointing at the low-priority channel. Not needed for v1.
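To confirm the webhook works before storing it, a test message can be posted by hand. This is a sketch with hypothetical function names; the URL pattern mirrors the format given in this step, and the JSON body uses Discord's `content` field for a plain-text message. `send_test_alert` needs curl and network access and is only defined here, not run.

```shell
# Succeeds when the argument matches the webhook URL format given in Step 4.
is_discord_webhook_url() {
  case "$1" in
    https://discord.com/api/webhooks/?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Post a plain-text test message to the webhook URL passed as $1.
send_test_alert() {
  is_discord_webhook_url "$1" || { echo "not a Discord webhook URL" >&2; return 1; }
  curl -sS -X POST -H "Content-Type: application/json" \
    -d '{"content":"Test message from observability manual setup"}' "$1"
}
```

If the value is later swapped for another provider's URL, the shape check no longer applies and should be dropped.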
Step 5: GCP Secret Manager (create the 9 new secrets)
The canonical creator is scripts/setup-gcp-secrets.sh — same script that bootstraps the rest of the app's secrets. Pass the new values via env vars and the script creates them (or updates the existing version) idempotently. The tomoda-synthetic-probe-password auto-generates if you leave it unset.
gcloud auth login
gcloud config set project development-485000
# Pass the values gathered from Steps 1-4. Anything you leave out either
# stays at its current value (if already set) or remains empty (the script
# prints "Empty:" so you know what's outstanding).
TOMODA_SENTRY_DSN="<from Step 1>" \
TOMODA_SENTRY_AUTH_TOKEN="<from Step 1, sourcemap-upload token>" \
TOMODA_SENTRY_GRAFANA_TOKEN="<from Step 1, grafana-readonly token>" \
TOMODA_CLOUDFLARE_API_TOKEN="<from Step 2>" \
TOMODA_CLOUDFLARE_TUNNEL_TOKEN="<from Step 3>" \
TOMODA_CLOUDFLARE_ACCESS_CLIENT_ID="<from Step 3>" \
TOMODA_CLOUDFLARE_ACCESS_CLIENT_SECRET="<from Step 3>" \
TOMODA_ALERT_WEBHOOK_PRIMARY="<from Step 4>" \
./scripts/setup-gcp-secrets.sh
# tomoda-synthetic-probe-password auto-generates via openssl rand -base64 32.
# Verify all nine
gcloud secrets list --filter="name~tomoda-(sentry|cloudflare|alert|synthetic)"
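As an optional pre-flight before running the script, the sketch below lists which of the eight Step 1-4 variables are still unset or empty in the current shell. The function name is hypothetical and this is not part of `scripts/setup-gcp-secrets.sh`; the script's own "Empty:" output remains the authoritative report.

```shell
# Print the name of every Step 1-4 variable that is unset or empty.
missing_observability_vars() {
  for name in \
    TOMODA_SENTRY_DSN \
    TOMODA_SENTRY_AUTH_TOKEN \
    TOMODA_SENTRY_GRAFANA_TOKEN \
    TOMODA_CLOUDFLARE_API_TOKEN \
    TOMODA_CLOUDFLARE_TUNNEL_TOKEN \
    TOMODA_CLOUDFLARE_ACCESS_CLIENT_ID \
    TOMODA_CLOUDFLARE_ACCESS_CLIENT_SECRET \
    TOMODA_ALERT_WEBHOOK_PRIMARY
  do
    # Indirect lookup: copy the value of the variable named by $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}
```

An empty result from `missing_observability_vars` means every value gathered in Steps 1-4 is present.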
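If you'd rather skip the script and create each secret by hand (one at a time, no env vars), each command below takes the value on stdin via `--data-file=-`. Note that a value typed literally into the `echo` command is still recorded in shell history; to keep it out, read it from a silent prompt or disable history for the session before pasting.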
echo -n "<sentry-dsn>" | gcloud secrets create tomoda-sentry-dsn --replication-policy=automatic --data-file=-
echo -n "<sentry-auth-token>" | gcloud secrets create tomoda-sentry-auth-token --replication-policy=automatic --data-file=-
echo -n "<sentry-grafana-token>" | gcloud secrets create tomoda-sentry-grafana-token --replication-policy=automatic --data-file=-
echo -n "<cloudflare-api-token>" | gcloud secrets create tomoda-cloudflare-api-token --replication-policy=automatic --data-file=-
echo -n "<discord-webhook-url>" | gcloud secrets create tomoda-alert-webhook-primary --replication-policy=automatic --data-file=-
echo -n "<cloudflare-tunnel-token>" | gcloud secrets create tomoda-cloudflare-tunnel-token --replication-policy=automatic --data-file=-
echo -n "<cf-access-client-id>" | gcloud secrets create tomoda-cloudflare-access-client-id --replication-policy=automatic --data-file=-
echo -n "<cf-access-client-secret>" | gcloud secrets create tomoda-cloudflare-access-client-secret --replication-policy=automatic --data-file=-
echo -n "$(openssl rand -base64 32)" | gcloud secrets create tomoda-synthetic-probe-password --replication-policy=automatic --data-file=-
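One way to keep a pasted value out of shell history is to read it from a prompt instead of typing it into the command line. The following is a sketch with hypothetical function names; it uses `stty -echo` to hide the input when run from a terminal and passes the value to the same `gcloud secrets create` call used above. `create_secret_from_prompt` needs an authenticated gcloud session and is only defined here, not run.

```shell
# Read one line from stdin and print it without a trailing newline.
# When stdin is a terminal, prompt on stderr and hide what is typed.
read_secret_value() {
  if [ -t 0 ]; then
    printf 'Value: ' >&2
    stty -echo
    IFS= read -r value
    stty echo
    printf '\n' >&2
  else
    IFS= read -r value
  fi
  printf '%s' "$value"
}

# Create the secret named $1 from a prompted value.
create_secret_from_prompt() {
  read_secret_value | gcloud secrets create "$1" \
    --replication-policy=automatic --data-file=-
}
```

For example, `create_secret_from_prompt tomoda-sentry-dsn` prompts once and stores whatever is pasted. If the prompt is interrupted, run `stty echo` to restore terminal echo.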
After the secrets exist, the per-area Terraform / K8s code (committed in the corresponding PRs) wires them up via ESO into the right namespaces — you don't need to touch any namespace bindings manually.
Step 6: Synthetic test user (backend seed)
A dedicated user account exists in the database solely so the synthetic Worker can exercise the real POST /api/v1/auth/login endpoint end-to-end.
You decide the password now (or let the Step 5 script generate it). It gets stored in GCP SM as `tomoda-synthetic-probe-password` (Step 5), and a backend migration writes the hashed value to the DB on next deploy. The Worker reads the cleartext password from a Worker secret binding (sourced from the GCP SM secret at Terraform apply time).
- Pick an email address you control. Recommend `synthetic+probe@tomoda.life` (the `+probe` lets you filter it out in any analytics later). The email must accept mail (deliverability is verified in tests), but the account never logs in interactively.
- Generate a strong password and stash it in your password manager: `openssl rand -base64 32`
- Store it in GCP SM as `tomoda-synthetic-probe-password` (already covered in Step 5).
- On next deploy, the migration in `backend/internal/database/migrations/seed_synthetic_user.go` (added in the synthetics PR) reads the password from the env (via ESO) and either inserts the user or updates the password hash to match. Idempotent; safe to run repeatedly.
- MFA stance: the synthetic account is automatically flagged `account_type = 'synthetic'` in the DB and is excluded from MFA enforcement (the migration adds this filter to the MFA-required check). No human action needed.
Don't reuse a real user account
Tempting to use a personal account — don't. The synthetic account writes login history rows on every probe (one per minute), would pollute that user's session list, and would inflate "active users" metrics. The dedicated account also lets us suppress it from analytics, leaderboards, and anti-abuse heuristics in one place.
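After the seed migration has run, the account can be checked by hand with a single login request. This is a rough sketch under several assumptions: the JSON field names `email` and `password` are guesses (the real names live in the backend's login handler), the API base URL is passed in by the caller, and `SYNTHETIC_PROBE_PASSWORD` is a variable name invented for this example. Only the endpoint path and the recommended email come from this page.

```shell
# Hypothetical request body; the field names are assumptions. No JSON escaping
# is done, which is safe for base64 passwords (no quotes or backslashes).
login_body() {
  printf '{"email":"%s","password":"%s"}' "$1" "$2"
}

# Print only the HTTP status code of one login attempt. $1 = API base URL.
# Needs curl, network access, and SYNTHETIC_PROBE_PASSWORD in the environment.
probe_login() {
  login_body "synthetic+probe@tomoda.life" "${SYNTHETIC_PROBE_PASSWORD}" |
    curl -sS -o /dev/null -w '%{http_code}' -X POST \
      -H "Content-Type: application/json" --data-binary @- \
      "$1/api/v1/auth/login"
}
```

A success status here shows that the hash written by the migration matches the value in GCP SM, which is the same thing the Worker checks once a minute.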
Alert routing & throttling
Two Discord channels, one per source category. Each source has its own throttling story — the channels themselves don't enforce anything.
| Channel | Source | Mechanism | Throttling owner |
|---|---|---|---|
| `#alerts-frontend` | Sentry: every new React Native issue | Sentry Discord OAuth integration (no webhook URL) | Sentry server-side dedup: events → issues, one Discord msg per new issue, no further config |
| `#monitoring` | Alertmanager (Prometheus rules: cluster, DB, API health) + Cloudflare synthetic Worker (probe failures) | Webhook from `tomoda-alert-webhook-primary` | Alertmanager grouping + inhibition + Worker state-change-only notifications (details below) |
Sentry → #alerts-frontend
Defaults. The alert rule fires on "A new issue is created" only. Sentry's server-side fingerprinting collapses thousands of events into a single Issue, so a single bug ships one Discord message regardless of event volume. No tuning needed for v1.
If #alerts-frontend ever feels noisy later, the right knob is Settings → Notifications → Per-issue rate limit: at most once per N hours at the workspace level. Don't add rate-limiting at the alert-rule level — that drops new issues, not duplicates of existing ones.
Alertmanager → #monitoring (implemented in Phase 5)
Standard throttling profile that the Phase 5 PR will ship:
| Setting | Default | Effect |
|---|---|---|
| `group_by: [alertname, cluster, service]` | - | Related alerts batched into one Discord message |
| `group_wait: 30s` | - | Wait 30s after first alert before firing; gathers correlated alerts |
| `group_interval: 5m` | - | Within a group, batched update at most every 5 min |
| `repeat_interval: 4h` | - | Don't re-fire the same alert for 4h (warning severity) |
| `repeat_interval: 1h` (critical severity) | - | Critical re-fires every 1h, not 4h |
| Inhibition: `ClusterDown` → all warnings for that cluster | - | When cluster is down, suppress 50× "service unreachable" cascade |
Outage budget: roughly one Discord message every 5-15 minutes per incident, plus hourly reminders for critical alerts. A 6-hour outage produces ~12-15 messages instead of thousands.
Cloudflare synthetic Worker → #monitoring (implemented in the synthetics PR)
State-change-only notifications, stored in Cloudflare Workers KV:
- Probe goes healthy → failed: one Discord message with the failure detail.
- Probe stays failed → failed: silent.
- Probe goes failed → healthy: one "recovered" Discord message with downtime duration.
- Still-down reminder: every 1 hour while a probe remains failed (so a 6h outage produces "DOWN" + 5× "STILL DOWN" + "RECOVERED" = 7 messages per probe, not 360).
A per-probe override is possible if some endpoints are flappier than others (e.g., lengthen the reminder interval to 4h for a probe that's known to false-positive).
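The notification rules above amount to a small state machine. The sketch below models only the decision logic in shell, with in-memory variables standing in for the state the Worker keeps in Workers KV; it is not the Worker's code, and the function and action names are hypothetical. It assumes one probe per minute and a 60-minute reminder interval, as described above.

```shell
# Decide what to send for one probe result.
# $1 = previous state (healthy|failed), $2 = new state,
# $3 = minutes since the last message for this probe.
probe_action() {
  if [ "$1" = healthy ] && [ "$2" = failed ]; then
    echo down
  elif [ "$1" = failed ] && [ "$2" = healthy ]; then
    echo recovered
  elif [ "$1" = failed ] && [ "$2" = failed ] && [ "$3" -ge 60 ]; then
    echo still_down
  else
    echo silent
  fi
}

# Count messages for an outage of $1 minutes: the probe fails at minute 0,
# keeps failing every minute, and succeeds again at minute $1.
count_outage_messages() {
  prev=healthy; since=0; sent=0; minute=0
  while [ "$minute" -le "$1" ]; do
    if [ "$minute" -lt "$1" ]; then now=failed; else now=healthy; fi
    action=$(probe_action "$prev" "$now" "$since")
    if [ "$action" = silent ]; then
      since=$((since + 1))
    else
      sent=$((sent + 1))
      since=1
    fi
    prev=$now
    minute=$((minute + 1))
  done
  echo "$sent"
}
```

Running `count_outage_messages 360` prints 7, matching the 6-hour example above: one "DOWN", five "STILL DOWN", and one "RECOVERED".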
Verification
After you've completed all six steps and confirmed the secrets exist:
# Confirm all 9 new secrets are present in GCP SM
gcloud secrets list --project=development-485000 \
--filter="name~tomoda-(sentry|cloudflare|alert|synthetic)" \
--format="table(name,createTime)"
# Should show:
# tomoda-sentry-dsn
# tomoda-sentry-auth-token
# tomoda-sentry-grafana-token
# tomoda-cloudflare-api-token
# tomoda-alert-webhook-primary
# tomoda-cloudflare-tunnel-token
# tomoda-cloudflare-access-client-id
# tomoda-cloudflare-access-client-secret
# tomoda-synthetic-probe-password
Confirm to the team that manual setup is complete. The code PRs that depend on these values (Alertmanager Discord routing, synthetic Worker, Loki tunnel) can then land.
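Comparing the listing against the expected nine names by eye is error-prone, so the check can be scripted. The sketch below uses hypothetical function names; it reads secret names on stdin (bare names or full `projects/.../secrets/...` paths) and prints any expected name that is absent.

```shell
# The nine secret names this page creates, one per line.
expected_observability_secrets() {
  cat <<'EOF'
tomoda-sentry-dsn
tomoda-sentry-auth-token
tomoda-sentry-grafana-token
tomoda-cloudflare-api-token
tomoda-alert-webhook-primary
tomoda-cloudflare-tunnel-token
tomoda-cloudflare-access-client-id
tomoda-cloudflare-access-client-secret
tomoda-synthetic-probe-password
EOF
}

# Read secret names on stdin and print every expected name that is missing.
missing_observability_secrets() {
  # Strip any leading resource path so only the final name segment remains.
  present=$(sed 's|.*/||')
  expected_observability_secrets | while IFS= read -r name; do
    printf '%s\n' "$present" | grep -qxF -- "$name" || echo "$name"
  done
}
```

Usage: `gcloud secrets list --project=development-485000 --format="value(name)" | missing_observability_secrets`. No output means all nine secrets exist.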
Rotation
These secrets rotate independently. When you rotate one, only the consumers that read it need to be restarted — ESO syncs every hour, so a kubectl rollout restart accelerates picking up the new value if you need to verify immediately.
| Secret | When to rotate | Consumers to restart |
|---|---|---|
| `tomoda-cloudflare-api-token` | If leaked, or annually | Re-run `terraform plan`/`apply`; the Cloudflare provider uses the new value automatically |
| `tomoda-alert-webhook-primary` | If leaked, or if changing alert provider | Restart alertmanager; trigger Worker redeploy so its secret binding refreshes |
| `tomoda-cloudflare-tunnel-token` | If leaked, or annually | Restart the `cloudflared` Deployment in `monitoring` |
| `tomoda-cloudflare-access-client-*` | If leaked | Worker redeploy (it embeds the value at deploy time) |
| `tomoda-synthetic-probe-password` | Quarterly + on team turnover | Re-run the seed migration (it updates the hash) + Worker redeploy |
| `tomoda-sentry-dsn` | Rarely; when rotating the Sentry project DSN | Next native release build picks up the new value (it's pulled by `build-with-secrets.sh` at build time) |
| `tomoda-sentry-auth-token` | Annually, or if leaked | Same; next native release build pulls the new value |
| `tomoda-sentry-grafana-token` | Annually, or if leaked | ESO syncs within 1h; `kubectl rollout restart deploy/monitoring-grafana -n monitoring` to verify immediately |
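Rotating a value means adding a new version to a secret that already exists; `gcloud secrets create` fails once the name is taken. The sketch below pairs that with a lookup of the follow-up action from the table above. Both function names are hypothetical, and the follow-up strings are shortened paraphrases of the table rows. `rotate_secret` needs an authenticated gcloud session and is only defined here, not run.

```shell
# Add a new version to the existing secret named $1, reading the value on stdin.
rotate_secret() {
  gcloud secrets versions add "$1" --data-file=-
}

# Print the follow-up action from the rotation table for the secret named $1.
rotation_followup() {
  case "$1" in
    tomoda-cloudflare-api-token) echo "re-run terraform plan/apply" ;;
    tomoda-alert-webhook-primary) echo "restart alertmanager; redeploy Worker" ;;
    tomoda-cloudflare-tunnel-token) echo "restart cloudflared Deployment in monitoring" ;;
    tomoda-cloudflare-access-client-*) echo "redeploy Worker" ;;
    tomoda-synthetic-probe-password) echo "re-run seed migration; redeploy Worker" ;;
    tomoda-sentry-dsn|tomoda-sentry-auth-token) echo "picked up by next native release build" ;;
    tomoda-sentry-grafana-token) echo "wait for ESO sync or restart deploy/monitoring-grafana" ;;
    *) echo "unknown secret: $1" >&2; return 1 ;;
  esac
}
```

Calling `rotation_followup` right after `rotate_secret` is a cheap reminder of which consumer still holds the old value.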
Related docs
- Secrets management — overall secret-store architecture, ESO bridging, env-var reference
- Tempo, Prometheus, Loki — what these secrets feed into