Loki: storage, retention, and Promtail pipeline

The deployment is documented in kubernetes/system/loki.md. This page covers what's specific to our config:

Storage backend is GCS. Chunks + index live in the shared tomoda-observability-${project_id} bucket alongside Tempo blocks. PVC is disabled.
Retention is pinned to 7 days Loki-side, 14 days GCS-side (lifecycle).
The Promtail pipeline parses the tomoda backend's Zap JSON logs and surfaces trace_id for trace-to-log navigation from Grafana / Tempo.

Storage: shared GCS bucket

Loki writes chunks and the boltdb-shipper index to tomoda-observability-${project_id} (provisioned by infrastructure/gcp/tempo.tf), the same bucket Tempo uses. Two writers, one bucket, separated by prefix:

Writer	Prefix	What lands here
Loki	`loki/` (chart default, no explicit prefix needed)	Chunks + boltdb-shipper index + compactor working files
Tempo	`tempo/` (set explicitly in tempo values)	Trace blocks

Authentication uses Workload Identity: the KSA monitoring/loki impersonates the GCP SA observability@${project_id}.iam.gserviceaccount.com. No service account key is mounted. The link is a single annotation (shown here with development-485000 as the project ID):

serviceAccount:
  create: true
  name: loki
  annotations:
    iam.gke.io/gcp-service-account: observability@development-485000.iam.gserviceaccount.com

The SA holds roles/storage.objectAdmin on the bucket (full read/write/delete on objects). See tempo.md for the matching binding.

Retention math

Knob	Value	Why
`loki.config.limits_config.retention_period`	`168h` (7d)	Query-API retention. After this Loki refuses to query the chunks even if they still exist in GCS.
`loki.config.compactor.retention_enabled`	`true`	Without this the compactor only compacts the index and deletes nothing; retention would be left to the table manager on a best-effort basis.
`loki.config.compactor.retention_delete_delay`	`2h`	Soft window before old chunks are actually deleted by the compactor.
GCS lifecycle on the bucket	`age = 14` → Delete	Bucket-side safety net. Loki normally drops chunks at 7d via the compactor; this 14d backstop catches any chunks the compactor misses.

Estimated write rate (current cluster):

~10 backend / system pods continuously logging at ~5 lines/sec each
≈ 4.3 M lines/day raw
≈ 1.5 GB/day uncompressed
≈ 150 MB/day after Loki's gzip chunk compression
7 × 150 MB ≈ 1.05 GB of active chunk data in GCS at steady state

At GCS Standard pricing (~$0.020/GB/month in us-central1), 1 GB / month is ~$0.02 — well under the noise floor of the rest of the observability stack. The Loki migration from PVC to GCS is a net-zero cost change at this volume; the win is operational (no PVC to grow, snapshot, or recover) and architectural (Tempo and Loki share one storage primitive).
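The estimate above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the ~350 bytes/line average and the 10:1 compression ratio are assumptions back-derived from the 1.5 GB/day and 150 MB/day figures rather than measured values, and `estimate_storage` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage(pods=10, lines_per_sec=5.0, bytes_per_line=350,
                     compression_ratio=10.0, retention_days=7,
                     usd_per_gb_month=0.020):
    """Back-of-envelope steady-state chunk storage and monthly GCS cost.

    bytes_per_line and compression_ratio are assumptions chosen to match
    the figures on this page; replace them with measured values.
    """
    lines_per_day = pods * lines_per_sec * SECONDS_PER_DAY
    raw_gb_per_day = lines_per_day * bytes_per_line / 1e9
    stored_gb_per_day = raw_gb_per_day / compression_ratio
    # With N days of retention, roughly N days of chunks are live at once.
    steady_state_gb = stored_gb_per_day * retention_days
    return {
        "lines_per_day": lines_per_day,
        "raw_gb_per_day": raw_gb_per_day,
        "stored_gb_per_day": stored_gb_per_day,
        "steady_state_gb": steady_state_gb,
        "usd_per_month": steady_state_gb * usd_per_gb_month,
    }


if __name__ == "__main__":
    for name, value in estimate_storage().items():
        print(f"{name}: {value:,.3f}")
```

Re-running it with a different `pods` or `retention_days` value gives a quick sanity check before changing the retention settings.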

If retention needs ever change, bump:

limits_config.retention_period (Loki query API)
The GCS lifecycle rule in infrastructure/gcp/tempo.tf (deletion floor)

…together. They have to move in lockstep — Loki querying chunks that GCS deleted yields confusing 5xx.
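A small guard can catch the two values drifting apart, for example in CI. This is a minimal sketch, not part of the current tooling: the function names are hypothetical, and the duration parser only handles single-unit `h`/`d`/`w` values, which is narrower than what Loki itself accepts.

```python
import re

_UNIT_HOURS = {"h": 1, "d": 24, "w": 168}


def retention_hours(value: str) -> int:
    """Convert a duration such as '168h' or '7d' to whole hours.

    Simplified on purpose: only single-unit h/d/w values are supported.
    """
    match = re.fullmatch(r"(\d+)([hdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_HOURS[match.group(2)]


def lifecycle_covers_retention(retention_period: str,
                               lifecycle_age_days: int) -> bool:
    """True when GCS deletes objects no earlier than Loki stops serving them."""
    return lifecycle_age_days * 24 >= retention_hours(retention_period)
```

With the current values, `lifecycle_covers_retention("168h", 14)` holds; raising retention_period to `720h` without touching the lifecycle rule would fail the check.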

If you see write-rate spikes, check that:

The compactor is actually running. kubectl logs -n monitoring -l app.kubernetes.io/name=loki | grep compactor. If it's silent for hours, retention isn't being enforced.
One pod isn't spamming. Run topk(10, sum by (namespace, pod) (rate({namespace=~".+"}[5m]))) against the Loki data source in Grafana to find the loudest pod, then either fix it or drop its logs at the Promtail level. (This is LogQL; the selector matches every stream that carries a namespace label.)

Promtail pipeline

Two match stages run in order:

Traefik (unchanged behavior)

Selector: {container="traefik"}. Parses the JSON access log and promotes entryPointName, request_Host, status, method, level to labels. This is what powers the Traefik logs Grafana dashboard (grafana.com 13702).

Tomoda backend (new)

Selector: {app=~"tomoda-(api|async)"}. The backend uses Zap with a JSON encoder, so each log line is shaped like:

{
  "level": "info",
  "ts": 1716480000,
  "caller": "main.go:45",
  "msg": "request handled",
  "trace_id": "8a4f5e0e9b1b9c1f1e1d1c1b1a191817",
  "span_id": "0123456789abcdef"
}

The pipeline:

Parses the JSON.
Promotes level to a label. Cardinality is bounded (debug, info, warn, error, fatal, panic), so this is safe and gives free filtering: {app="tomoda-api", level="error"}.
trace_id and span_id stay in the raw log body — not promoted to a label, not lifted into structured metadata. They're queryable at query time via LogQL's json stage.

- match:
    selector: '{app=~"tomoda-(api|async)"}'
    stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
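To make the effect of this stage concrete, the sketch below approximates it in Python for a single line. It is a simplified model rather than Promtail's implementation: `apply_tomoda_stage` is a hypothetical name, and only string `level` values are handled.

```python
import json


def apply_tomoda_stage(line: str, labels: dict) -> dict:
    """Approximate the json + labels stages for one log line.

    Returns a new label set. If the line is not a JSON object, or has no
    string "level" field, the labels come back unchanged, so the line
    stays queryable but carries no level label.
    """
    out = dict(labels)
    try:
        parsed = json.loads(line)
    except json.JSONDecodeError:
        return out
    if isinstance(parsed, dict) and isinstance(parsed.get("level"), str):
        out["level"] = parsed["level"]
    return out


if __name__ == "__main__":
    base = {"app": "tomoda-api"}
    print(apply_tomoda_stage('{"level":"error","trace_id":"abc"}', base))
    print(apply_tomoda_stage("panic: runtime error", base))
```

Note that trace_id never appears in the returned labels; only level is promoted.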

Why not structured_metadata?

Loki 2.9.x (shipped by loki-stack 2.10.x) at schema v11 doesn't support the structured_metadata pipeline stage — that's a Loki 3.x + schema v13 feature. Promoting trace_id to a Loki label would multiply the index by per-request cardinality and is the canonical way to wreck a Loki cluster.

The interim pattern is query-time JSON parsing:

{app="tomoda-api"} | json | trace_id="8a4f5e0e..."

Slightly slower than indexed lookup but correct, and the same query syntax works in Grafana's tracesToLogsV2 link (configured to use this exact filter). When we bump to Loki 3.x, this stage gets a one-line upgrade.
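When scripting this lookup, it helps to build the query string in one place. The helper below is a hypothetical sketch; it assumes trace IDs are 32 lowercase hex characters, as in the sample log line above, and rejects anything else so a truncated ID fails loudly instead of silently matching nothing.

```python
import re

_TRACE_ID = re.compile(r"[0-9a-f]{32}")
_APP_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def logql_for_trace(app: str, trace_id: str) -> str:
    """Build the query-time JSON filter for one trace.

    Assumes 32-char lowercase hex trace IDs; adjust the pattern if the
    backend emits a different format.
    """
    if not _APP_NAME.fullmatch(app):
        raise ValueError(f"unexpected app name: {app!r}")
    if not _TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"not a 32-char hex trace id: {trace_id!r}")
    # Doubled braces are literal braces in an f-string.
    return f'{{app="{app}"}} | json | trace_id="{trace_id}"'
```

For example, `logql_for_trace("tomoda-api", "8a4f5e0e9b1b9c1f1e1d1c1b1a191817")` returns the same filter shown above with the full ID filled in.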

Label strategy: what's safe and what's not

Field	Promote to label?	Why
`namespace`	yes (Promtail does it automatically)	bounded by namespace count
`app`	yes (auto)	bounded by app count
`level`	yes	bounded enum (debug/info/warn/error/fatal/panic)
`status` (Traefik)	yes	bounded HTTP status codes
`method`	yes	bounded HTTP method set
`trace_id`	NO — query-time JSON parse	per-request, unbounded
`span_id`	NO — query-time JSON parse	per-request, unbounded
`caller`	NO — query-time JSON parse	many call sites; not useful as a label
`user_id` (if ever added)	NO — query-time JSON parse	per-user, unbounded
`request_id`	NO — query-time JSON parse	per-request

Rule of thumb: if a field has fewer than ~100 distinct values cluster-wide, promoting it to a label is OK. Anything per-request, per-user, or per-trace is queried either via | json | <field>="..." (today, on Loki 2.x) or via structured_metadata (after the Loki 3.x bump). It never becomes a label.
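One way to apply the rule of thumb to a new field is to count distinct values in a sample of log lines before touching the pipeline. The sketch below is an assumption-laden illustration: the function names are hypothetical, the 100-value limit is just the threshold from the rule above, and a sample can only under-count cluster-wide cardinality.

```python
import json
from collections import defaultdict

LABEL_CARDINALITY_LIMIT = 100  # the ~100 threshold from the rule of thumb


def field_cardinality(lines):
    """Count distinct values per top-level JSON field; skips non-JSON lines."""
    seen = defaultdict(set)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        for key, value in record.items():
            # Serialise so unhashable values (lists, dicts) can be counted.
            seen[key].add(json.dumps(value, sort_keys=True))
    return {key: len(values) for key, values in seen.items()}


def label_candidates(lines, limit=LABEL_CARDINALITY_LIMIT):
    """Fields whose observed cardinality is below the limit."""
    counts = field_cardinality(lines)
    return sorted(key for key, n in counts.items() if n < limit)
```

Low observed cardinality is necessary but not sufficient: a field still has to be useful as a filter before it earns a label.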

Trace-to-log jump

This is what makes the Tempo integration useful. From a Tempo span:

{namespace="tomoda"} | json | trace_id="8a4f5e0e9b1b9c1f1e1d1c1b1a191817"

is run automatically when you click "View logs" on a span. The tracesToLogsV2 config in monitoring/values.yaml wires this; see tempo.md. The | json stage is required because trace_id isn't a label (see "Label strategy" above); it's pulled from the JSON body at query time.

Going the other direction, the Loki data source has a derivedFields rule:

- name: TraceID
  matcherRegex: '"trace_id":"(\w+)"'
  url: '${__value.raw}'
  datasourceUid: tempo

Any log line containing "trace_id":"..." gets a clickable link in Grafana that opens the matching trace in Tempo.
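The matcher is whitespace-sensitive, which is easy to check offline. The sketch below uses Python's `re` as a stand-in for Grafana's own regex evaluation (an assumption that holds for a pattern this simple); `extract_trace_id` is a hypothetical helper.

```python
import re

# Same pattern as the derivedFields matcherRegex above.
TRACE_ID_PATTERN = re.compile(r'"trace_id":"(\w+)"')


def extract_trace_id(line: str):
    """Return the captured trace id, or None when the pattern does not match."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    compact = '{"level":"info","trace_id":"8a4f5e0e9b1b9c1f1e1d1c1b1a191817"}'
    spaced = '{"level": "info", "trace_id": "8a4f5e0e9b1b9c1f1e1d1c1b1a191817"}'
    print(extract_trace_id(compact))
    print(extract_trace_id(spaced))
```

The pattern matches compact single-line JSON but not a variant with a space after the colon, so an encoder that emits spaces would need `"trace_id":\s*"(\w+)"` instead.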

Debugging the pipeline

# Is Promtail seeing tomoda pods?
kubectl logs -n monitoring -l app.kubernetes.io/name=promtail | grep "tomoda" | head -20

# Port-forward Loki in the background, then list the label names it has indexed
kubectl port-forward -n monitoring svc/loki 3100:3100 &
curl -s -G 'http://localhost:3100/loki/api/v1/labels' | jq

# Sample query for a known trace (| json is required: trace_id is not a label)
# date -d is GNU date syntax
curl -s -G 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={app="tomoda-api"} | json | trace_id="abc..."' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" | jq '.data.result[0]'
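The same query_range request can be assembled from a script. The sketch below only builds the URL, leaving the HTTP call to whatever client is at hand; `query_range_url` is a hypothetical helper, and the base URL assumes the port-forward above is active.

```python
import time
from urllib.parse import urlencode


def query_range_url(base_url, query, lookback_seconds=3600, now=None):
    """Build a /loki/api/v1/query_range URL with a nanosecond start time.

    `now` is injectable (Unix seconds) so the result is reproducible.
    """
    now = time.time() if now is None else now
    start_ns = int(now - lookback_seconds) * 1_000_000_000
    params = {"query": query, "start": str(start_ns)}
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(query_range_url(
        "http://localhost:3100",
        '{app="tomoda-api"} | json | trace_id="abc"',
    ))
```

Building the start timestamp as an integer avoids the string concatenation used in the shell version and works on systems without GNU date.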

If a tomoda log line shows up in Grafana but level is missing as a label, it means the JSON parse failed — usually because the line isn't JSON (e.g. a panic stack trace, or a third-party library logging with a different format). The pipeline doesn't drop those — they're just queryable by {app="tomoda-api"} without structured filtering.