Tempo

Distributed-tracing backend. The tomoda backend emits OpenTelemetry spans (Go OTel SDK + OTLP gRPC exporter), Tempo stores them in GCS, and Grafana queries Tempo as a data source. Spans link back to Loki logs via trace_id.

Installed by k8s/envs/dev/sys/tempo/application.yaml, configured by k8s/envs/dev/sys/tempo/values.yaml. The GCS bucket and Workload Identity binding come from infrastructure/gcp/tempo.tf.

Chart and source

Field                  Value
Helm chart             tempo (monolithic single-binary mode)
Repository             https://grafana.github.io/helm-charts
Version                1.10.1
Destination namespace  monitoring (shared with kube-prometheus-stack and Loki)
Argo CD Application    tempo

We picked the monolithic chart over tempo-distributed deliberately: trace volume is low (10% sampling on a small backend), and a single replica with GCS storage is plenty. If write throughput grows past a few thousand spans/sec, switch to tempo-distributed (distributors + ingesters + queriers split).

Storage — GCS backend

Tempo stores trace blocks in ${project_id}-tomoda-traces (provisioned by infrastructure/gcp/tempo.tf). The bucket has a 30-day lifecycle rule that deletes old objects; Tempo's compactor block_retention is aligned to the same 720h window so the index doesn't reference deleted blocks.

resource "google_storage_bucket" "tomoda_traces" {
  name     = "${var.project_id}-tomoda-traces"
  location = var.region
  lifecycle_rule {
    condition { age = 30 }
    action    { type = "Delete" }
  }
}
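
The alignment is simple arithmetic, but the two settings live in different files and can drift apart. A minimal shell sketch of the check, assuming block_retention is written in whole hours (as in 720h); the function names are made up for illustration:

```shell
# Convert a GCS lifecycle age in days to an hour-based duration string.
days_to_hours() (
  printf '%dh\n' "$(( $1 * 24 ))"
)

# Succeed (exit 0) only when the lifecycle age and block_retention agree.
#   $1 = lifecycle age in days, $2 = block_retention value, e.g. 720h
retention_aligned() (
  [ "$(days_to_hours "$1")" = "$2" ]
)

days_to_hours 30
if retention_aligned 30 720h; then
  echo "lifecycle rule and block_retention agree"
else
  echo "retention drift: update both settings together" >&2
fi
```

With the values above, 30 days converts to 720h, so the check passes.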

Authentication is via Workload Identity — no service account key is mounted. The GCP SA tempo@${project_id}.iam.gserviceaccount.com is bound to the K8s SA monitoring/tempo, which is what the Tempo Helm chart creates via serviceAccount.name: tempo. The link is the annotation on the KSA:

serviceAccount:
  create: true
  name: tempo
  annotations:
    iam.gke.io/gcp-service-account: tempo@development-485000.iam.gserviceaccount.com
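
On the GCP side, a Workload Identity binding grants roles/iam.workloadIdentityUser on the GCP SA to a member string that names the KSA. The binding itself lives in infrastructure/gcp/tempo.tf and is not reproduced here; the sketch below only shows the member format, assuming the cluster's workload pool belongs to the same project. wi_member is a hypothetical helper name.

```shell
# Build the IAM member string that identifies a Kubernetes service account
# to GCP under Workload Identity.
#   $1 = GCP project ID, $2 = Kubernetes namespace, $3 = KSA name
wi_member() (
  project="$1"; namespace="$2"; ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project" "$namespace" "$ksa"
)

wi_member development-485000 monitoring tempo
```

If the namespace or KSA name in the binding does not match monitoring/tempo exactly, the pod cannot impersonate the GCP SA even when the annotation is correct.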

No secrets in GCP Secret Manager are required for Tempo.

Ingestion — OTLP gRPC

Tempo's receivers config exposes OTLP on both gRPC (:4317) and HTTP (:4318). The tomoda backend uses gRPC:

OBSERVABILITY_OTLP_ENDPOINT=tempo.monitoring.svc.cluster.local:4317
OBSERVABILITY_OTLP_INSECURE=true
OBSERVABILITY_SAMPLE_RATE=0.1

Apps in any namespace can reach Tempo via that FQDN, subject to NetworkPolicies. OBSERVABILITY_OTLP_INSECURE=true means plaintext gRPC with no TLS, which is acceptable here because the traffic stays inside the cluster and NetworkPolicies restrict which pods can connect.

The sampling rate is set on the application side (head sampling — Go SDK decides per-trace). Tempo itself doesn't sub-sample.
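
Head sampling works because the keep-or-drop decision is a pure function of the trace ID, so every span in a trace gets the same answer. The sketch below illustrates that idea in shell. It is not the Go SDK's algorithm (the SDK's TraceIDRatioBased sampler compares trace ID bits against a threshold), should_sample is a hypothetical name, and whether the backend uses that particular sampler is an assumption.

```shell
# Decide whether a trace is kept, using only its ID. The same trace ID always
# yields the same answer, so a trace is kept or dropped as a whole.
#   $1 = 32-hex-character trace ID
#   $2 = sample rate as an integer percent (10 corresponds to a rate of 0.1)
should_sample() (
  trace_id="$1"; percent="$2"
  # Use the last 8 hex digits (32 bits) as a roughly uniform number.
  tail_hex=$(printf '%s' "$trace_id" | tail -c 8)
  bucket=$(( $(printf '%d' "0x$tail_hex") % 100 ))
  [ "$bucket" -lt "$percent" ]
)

if should_sample 4bf92f3577b34da6a3ce929d0e0e4736 10; then
  echo "trace kept"
else
  echo "trace dropped"
fi
```

Because the decision happens before any span is exported, dropped traces never reach Tempo and cannot be recovered later.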

Grafana integration

The Tempo data source is added in k8s/envs/dev/sys/monitoring/values.yaml alongside Loki:

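- name: Tempo
  type: tempo
  uid: tempo
  url: http://tempo:3100
  jsonData:
    tracesToLogsV2:
      datasourceUid: loki
      filterByTraceID: true
      # Grafana only applies a custom `query` when customQuery is true.
      customQuery: true
      # `| json` parses the log body so trace_id can be filtered on.
      query: '{namespace="$${__span.tags["k8s.namespace.name"]}"} | json | trace_id="$${__span.traceId}"'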

Two cross-data-source jumps are wired up:

  • Trace → logs: click a span in the Tempo UI, jump to a Loki query filtered to the same trace_id in a ±5min window. Uses LogQL's | json stage to extract trace_id from the JSON log body at query time (Loki 2.x doesn't support structured metadata yet — see loki.md).
  • Logs → trace: the Loki data source has a derivedFields rule that matches "trace_id":"..." in JSON log lines and renders it as a clickable Tempo link.
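
The logs-to-trace match can be tried outside Grafana. This is a sketch under assumptions: the exact regex in the Loki data source is not shown here, so the pattern below (compact JSON with no space after the colon, and a 32-character lowercase hex ID) is a guess at its shape, and extract_trace_id is a hypothetical helper.

```shell
# Print the trace_id value found in a JSON log line read from stdin.
# Prints nothing when the line carries no trace_id.
extract_trace_id() (
  sed -n 's/.*"trace_id":"\([0-9a-f]\{32\}\)".*/\1/p'
)

line='{"level":"info","msg":"request done","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","span_id":"00f067aa0ba902b7"}'
printf '%s\n' "$line" | extract_trace_id
```

If this prints nothing for a real log line, the derived field will not render a link either; check whether the logger emits a differently named key or adds whitespace after the colon.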

Operational notes

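  • Single replica. Monolithic mode means one Tempo pod handles ingest, query, and compaction. Restarts cause brief ingest gaps: OTel exporters retry, but the SDK's span queue is bounded, so a burst during a restart can drop spans. This is acceptable at current volume. Note that otelcol_exporter_send_failed_spans_total is an OpenTelemetry Collector metric, so it only exists if a Collector sits between the apps and Tempo; when the Go SDK exports directly, export failures surface through the SDK's error handler, which logs them by default. If drops become frequent, bump replicas or switch to the tempo-distributed chart.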
  • GCS is the source of truth. The 5Gi PVC is just for WAL — actual trace data lives in GCS. PVC loss is recoverable; bucket loss is not.
  • No multi-tenant mode. multitenancy_enabled: false — all spans share one tenant ID (single-tenant). If we ever onboard a second team, flip it on and add tenant headers to the OTel exporter config.
  • 30-day retention. Driven by the GCS lifecycle rule. Bump the lifecycle age + Tempo's block_retention together if longer trace history is needed.
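
For the single-replica note above, a rough estimate of how long a Tempo restart the SDK can ride out is queue capacity divided by the sampled span rate. A sketch with assumed inputs: 2048 is the default maximum queue size of the OpenTelemetry SDK's batch span processor (the backend may override it), and 50 spans per second is a made-up rate. queue_headroom_seconds is a hypothetical helper.

```shell
# Seconds of exporter downtime the span queue can absorb before it overflows.
#   $1 = queue capacity in spans
#   $2 = sampled spans produced per second (must be positive)
queue_headroom_seconds() (
  capacity="$1"; rate="$2"
  if [ "$rate" -le 0 ]; then
    echo "span rate must be positive" >&2
    exit 1
  fi
  # Integer division: the result is rounded down to whole seconds.
  echo $(( capacity / rate ))
)

queue_headroom_seconds 2048 50
```

Treat the result as an upper bound: the exporter also gives up on a batch once its retry budget is exhausted, so spans can be lost before the queue is completely full.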

Debugging

# Verify Tempo is healthy
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

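# Check it's accepting traces (look for "received OTLP traces" lines).
# The tempo chart deploys a StatefulSet, so target sts/tempo rather than deploy/tempo.
kubectl logs -n monitoring sts/tempo --tail=200 | grep -i otlp

# Confirm Workload Identity is working.
# The Tempo image may not include curl; if the exec fails, run the same
# request from a throwaway pod that uses the tempo service account.
kubectl exec -n monitoring sts/tempo -- \
  curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email
# Expected: tempo@development-485000.iam.gserviceaccount.com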

# List recent trace blocks in GCS
gsutil ls -lh gs://development-485000-tomoda-traces/single-tenant/ | tail -20

# Tempo's /ready endpoint
kubectl port-forward -n monitoring svc/tempo 3100:3100
curl localhost:3100/ready

If the backend logs "failed to upload span batch: ... PermissionDenied", the Workload Identity link is broken: check the KSA annotation and the IAM binding.