Prometheus scrape coverage
This page tracks what the cluster's Prometheus actually scrapes. The deployment itself is documented in kubernetes/system/monitoring.md; this page focuses on who is scraped, how, and where the ServiceMonitor / PodMonitor / Probe CR lives.
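kube-prometheus-stack's serviceMonitorSelector is set to {}, which matches every ServiceMonitor regardless of its labels, so the ServiceMonitors below are picked up automatically once they exist. PodMonitor and Probe resources are matched by the operator's separate podMonitorSelector and probeSelector fields, and namespace scope is controlled by the corresponding namespace selectors (for example serviceMonitorNamespaceSelector).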
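For orientation, an empty selector in the kube-prometheus-stack Helm values usually looks roughly like the fragment below. This is a sketch, not a copy of the cluster's values file: the namespace selector and the `serviceMonitorSelectorNilUsesHelmValues` setting are assumptions about how "match everything" is achieved here.

```yaml
# Hypothetical excerpt of kube-prometheus-stack values; the real file may differ.
prometheus:
  prometheusSpec:
    # Empty label selector: every ServiceMonitor matches, whatever its labels.
    serviceMonitorSelector: {}
    # When left at true, the chart substitutes a release-label selector
    # for an empty one, so "match everything" needs this set to false.
    serviceMonitorSelectorNilUsesHelmValues: false
    # Empty namespace selector: discover ServiceMonitors in all namespaces
    # (assumed, not confirmed by this page).
    serviceMonitorNamespaceSelector: {}
```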
ServiceMonitor inventory

| Target | Resource type | Owned by | Namespace |
|---|---|---|---|
| Tomoda backend (tomoda-api) | ServiceMonitor tomoda-backend | k8s/apps/tomoda/base/servicemonitor.yaml | tomoda, prod |
| Tempo (own internal metrics) | ServiceMonitor (via Helm serviceMonitor.enabled: true) | k8s/envs/dev/sys/tempo/values.yaml | monitoring |
| Loki | ServiceMonitor (via Helm) | k8s/envs/dev/sys/loki/values.yaml | monitoring |
| Traefik | ServiceMonitor (via Helm) | k8s/envs/dev/sys/traefik/values.yaml | monitoring |
| Redis (dev) | ServiceMonitor (via Bitnami metrics.serviceMonitor.enabled: true) | k8s/envs/dev/middleware/redis/values.yaml | monitoring (cross-ns selector → data) |
| Redis (prod) | ServiceMonitor (via Bitnami) | k8s/envs/prod/middleware/redis/values.yaml | monitoring (cross-ns selector → data) |
| Argo CD (controller, server, repo-server, applicationSet, notifications) | ServiceMonitor (via chart, set in TF) | infrastructure/gcp/argocd.tf | argocd |
| Postgres (CNPG primary + standby) | PodMonitor (CNPG built-in) | k8s/envs/{dev,prod}/middleware/postgres/manifests/cluster.yaml | cnpg-system |
| Blackbox-exporter (own internal metrics) | ServiceMonitor (via Helm) | k8s/envs/dev/sys/blackbox-exporter/values.yaml | monitoring |
| Photon (/status via blackbox) | Probe photon | k8s/envs/dev/sys/manifests/photon-probe.yaml | monitoring |

The release: monitoring label is set on each of these — purely convention, since the operator's selector is empty.
Tomoda backend specifics
The backend exposes /metrics on the same listener as /health (port 8080) when OBSERVABILITY_METRICS_ENABLED=true. That env var is set in both the dev and prod overlay kustomizations:

    - name: OBSERVABILITY_METRICS_ENABLED
      value: "true"
    - name: OBSERVABILITY_TRACING_ENABLED
      value: "true"
    - name: OBSERVABILITY_OTLP_ENDPOINT
      value: "tempo.monitoring.svc.cluster.local:4317"
    - name: OBSERVABILITY_OTLP_INSECURE
      value: "true"
    - name: OBSERVABILITY_SAMPLE_RATE
      value: "0.1"

The base ServiceMonitor selects app: tomoda-api. The dev/prod overlays apply commonLabels: env: dev|prod, which kustomize propagates to both the resource labels and the selector — so the same base manifest correctly scopes to the right Service in each namespace.
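A base manifest along these lines would fit that description. It is a minimal sketch rather than the contents of k8s/apps/tomoda/base/servicemonitor.yaml: the port name `http` and the 30s interval are assumptions, and only the resource name, the `app: tomoda-api` selector, and the `release: monitoring` label come from this page.

```yaml
# Sketch of a base ServiceMonitor; field values marked "assumed" are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tomoda-backend
  labels:
    release: monitoring      # convention only, the operator's selector is empty
spec:
  selector:
    matchLabels:
      app: tomoda-api        # overlays are described as adding env: dev|prod here
  endpoints:
    - port: http             # assumed name of the Service port that maps to 8080
      path: /metrics
      interval: 30s          # assumed scrape interval
```

Because no `namespaceSelector` is set, a ServiceMonitor like this only looks for Services in its own namespace, which is consistent with one copy being deployed per environment.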
NetworkPolicy
tomoda-api-policy and tomoda-async-policy (k8s/apps/tomoda/base/network-policy.yaml) allow inbound :8080 from the monitoring namespace, alongside Traefik. Without that rule, scrape attempts would be silently dropped — up{job="backend-service"} would be zero with no obvious cause.
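The scrape-related part of such a policy could be sketched as follows. The pod selector label is an assumption, the Traefik rule is left out, and the namespace is matched through the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace; the real network-policy.yaml may select the monitoring namespace differently.

```yaml
# Sketch of the ingress rule that lets Prometheus reach :8080; not the actual manifest.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tomoda-api-policy
spec:
  podSelector:
    matchLabels:
      app: tomoda-api                 # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080
```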
Argo CD specifics
The Helm release in infrastructure/gcp/argocd.tf now enables per-component metrics. Five ServiceMonitors land in the argocd namespace — one each for:

- argocd-application-controller: sync queue depth, app health counts
- argocd-server: API request rates
- argocd-repo-server: repo cache stats, manifest generation latency
- argocd-applicationset-controller
- argocd-notifications-controller

These metrics can feed the standard Argo CD Grafana dashboard (grafana.com ID 14584) if it is added later.
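The Terraform resource itself is not reproduced on this page. As a rough sketch, the chart values that switch on per-component metrics would look something like the fragment below, assuming the community argo-cd Helm chart's value layout. How these reach the release in argocd.tf (a values block or individual set entries) is not something this page confirms.

```yaml
# Hypothetical Helm values equivalent to what argocd.tf is described as enabling.
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
applicationSet:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
notifications:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
```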
Redis prod
Prior to this change, dev had metrics.enabled: true but prod did not — Bitnami's chart simply didn't render the redis-exporter sidecar or the ServiceMonitor in prod. Prod values now mirror dev:

    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        namespace: monitoring
        namespaceSelector:
          matchNames:
            - data
        additionalLabels:
          release: monitoring

prod-redis-master now appears alongside redis-master under up{job=~"prod-redis.*"}.
Photon: blackbox probe
Photon doesn't expose Prometheus metrics. The prometheus-blackbox-exporter Helm chart in k8s/envs/dev/sys/blackbox-exporter/ deploys an HTTP prober, and a Probe CR in k8s/envs/dev/sys/manifests/photon-probe.yaml tells Prometheus to ask blackbox to hit http://photon.data.svc.cluster.local:2322/status every 30s.
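A Probe resource matching that description might look like the sketch below. The prober address, the `http_2xx` module, and the `service: photon` target label are assumptions (the label is inferred from the alert expression on this page); only the name, namespace, target URL, and 30s interval are taken from the text.

```yaml
# Sketch of a Probe CR; not a copy of photon-probe.yaml.
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: photon
  namespace: monitoring
  labels:
    release: monitoring
spec:
  interval: 30s
  module: http_2xx                       # assumed blackbox module
  prober:
    # Hypothetical in-cluster address of the blackbox-exporter Service.
    url: blackbox-exporter.monitoring.svc.cluster.local:9115
  targets:
    staticConfig:
      static:
        - http://photon.data.svc.cluster.local:2322/status
      labels:
        service: photon                  # assumed, so alerts can select on it
```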
The resulting metrics (with target=http://photon.data.svc.cluster.local:2322/status):

- probe_success: 1 if the request succeeded, 0 otherwise
- probe_duration_seconds: how long the request took
- probe_http_status_code: last response code

A "Photon down" alert keys off probe_success{service="photon"} == 0 for 2m.
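Expressed as a PrometheusRule, that alert could be sketched like this. The rule name, group name, severity, and summary text are hypothetical; the expression and the 2m hold time are the ones stated above.

```yaml
# Sketch of the "Photon down" alert; names and severity are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: photon-alerts                    # hypothetical
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: photon
      rules:
        - alert: PhotonDown
          expr: probe_success{service="photon"} == 0
          for: 2m                        # must fail continuously for 2 minutes
          labels:
            severity: critical           # assumed
          annotations:
            summary: Photon /status probe has been failing for 2 minutes
```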
Debugging "where are my metrics?"

    # Is the ServiceMonitor visible to Prometheus?
    kubectl get servicemonitor -A | grep <name>

    # Does Prometheus actually have the target?
    kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090
    # Open http://localhost:9090/targets → look for the job

    # Is the pod's /metrics endpoint working?
    # (port-forward blocks, so run curl from a second terminal)
    kubectl port-forward -n <ns> <pod> 8080:8080
    curl localhost:8080/metrics | head -20

    # Is a NetworkPolicy blocking the scrape?
    kubectl get networkpolicy -n <ns>

If a ServiceMonitor exists but the target is missing from Prometheus, the most common causes are: (a) wrong port name (endpoints[].port must match the Service's ports[].name, not the number), (b) the Service's selector doesn't actually match any pods, (c) a NetworkPolicy blocks ingress from monitoring.
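Cause (a) is the easiest to miss, so here is a small illustration of the name matching. Every name in it (`example-svc`, `example`, `metrics`) is hypothetical and not taken from this repository.

```yaml
# Illustrative pair: the ServiceMonitor refers to the Service port by name.
apiVersion: v1
kind: Service
metadata:
  name: example-svc
  labels:
    app: example
spec:
  selector:
    app: example             # must match the labels on running pods (cause b)
  ports:
    - name: metrics          # the name the ServiceMonitor must reference
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example           # matches the Service's labels, not the pods'
  endpoints:
    - port: metrics          # port name, not 8080
      path: /metrics
```

If the Service port has no name, the ServiceMonitor has nothing to reference and the target never shows up, so naming the port is usually the first fix to try.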