Loki

Cluster-wide log aggregation. Promtail tails every pod's logs and ships them to Loki; Grafana queries Loki through its built-in Loki datasource. There is no Elasticsearch, no Cloud Logging sink, and no Fluent Bit, just the loki-stack chart from Grafana's Helm repository.

Installed by k8s/envs/dev/sys/loki/application.yaml, configured by k8s/envs/dev/sys/loki/values.yaml.

Chart and source

Field                  Value
Helm chart             loki-stack
Repository             https://grafana.github.io/helm-charts
Version                2.10.2
Destination namespace  monitoring (shared with kube-prometheus-stack)
Argo CD Application    loki
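
For orientation, the fields above map onto an Argo CD Application roughly as follows. This is a minimal sketch, not the contents of application.yaml: the argocd namespace, the default project, and the in-cluster destination server are assumptions, and the wiring that passes values.yaml to the chart is omitted because it is not shown on this page.

```yaml
# Sketch only; the real manifest is k8s/envs/dev/sys/loki/application.yaml.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki
  namespace: argocd            # assumption: Argo CD's usual install namespace
spec:
  project: default             # assumption
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: loki-stack
    targetRevision: 2.10.2
    # How values.yaml is supplied to the chart is not shown on this page.
  destination:
    server: https://kubernetes.default.svc   # assumption: in-cluster destination
    namespace: monitoring
```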

The loki-stack chart bundles both Loki (the log store) and Promtail (the per-node log shipper). Both are enabled in values.yaml:

loki:
  enabled: true
  persistence:
    enabled: true
    size: 20Gi
promtail:
  enabled: true

Promtail

Promtail runs as a DaemonSet, one pod per node, and tails the container log files under /var/log/pods/*/*/*.log on that node. Workloads do not need to opt in: every pod's stdout/stderr lands in Loki by default.

The pipeline in values.yaml parses the Traefik JSON access log specifically:

promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}
        - json:
            expressions:
              entryPointName: entryPointName
              request_Host: RequestHost
              request_Path: RequestPath
              status: DownstreamStatus
              method: RequestMethod
              msg: msg
              level: level
        - labels:
            entryPointName:
            request_Host:
            status:
            method:
            level:

cri: {} strips the container-runtime envelope. The json stage extracts the listed fields, and the labels stage promotes five of them (entryPointName, request_Host, status, method, level) into Loki labels; request_Path and msg are extracted but not promoted. Those labels are what make the Traefik logs Grafana dashboard (gnetId: 13702) work out of the box.

Non-JSON logs are still ingested, just without the structured labels. They remain queryable through the Kubernetes-derived labels, for example {namespace="..."} or {app="..."}.
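
To make the pipeline concrete, here is a hypothetical Traefik access-log entry and what the json and labels stages would produce from it. The field values are invented for illustration; only the field names come from the configuration above.

```yaml
# Hypothetical log line after the cri stage has removed the runtime prefix:
#   {"entryPointName":"websecure","RequestHost":"example.test",
#    "RequestPath":"/api/items","DownstreamStatus":502,
#    "RequestMethod":"GET","level":"info","msg":""}

# Values extracted by the json stage (all seven expressions):
extracted:
  entryPointName: websecure
  request_Host: example.test
  request_Path: /api/items
  status: 502
  method: GET
  msg: ""
  level: info

# Subset promoted to Loki labels by the labels stage (label values are strings):
labels:
  entryPointName: websecure
  request_Host: example.test
  status: "502"
  method: GET
  level: info
```

With those labels in place, a selector such as {status=~"5.."} matches this entry without parsing the line body at query time.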

Loki

Loki runs as a single StatefulSet with a 20Gi PVC for chunk storage and a 7-day retention enforced by the compactor. It exposes:

  • http://loki:3100 — inside the cluster, used as a Grafana datasource.
  • A ServiceMonitor (label release: monitoring) — so Prometheus scrapes Loki's own metrics (loki_request_duration_seconds, etc.).
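
A ServiceMonitor like the one described is typically switched on through the chart values. The fragment below is a sketch that assumes the Loki subchart exposes serviceMonitor.enabled and serviceMonitor.additionalLabels; check the key names against the chart version in use before relying on it.

```yaml
loki:
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: monitoring   # label from this page; assumed to be what Prometheus selects on
```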

There is no S3 / GCS backend configured — chunks live on the PVC. That keeps the deployment simple but caps long-term log volume at the disk size.
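
Compactor-enforced retention in Loki 2.x generally comes down to two settings. The sketch below assumes they are passed through the chart's loki.config block and that 7 days is written as 168h; the configuration actually deployed is documented in operations/observability/loki.md and may differ.

```yaml
loki:
  config:
    compactor:
      retention_enabled: true   # let the compactor delete expired chunks
    limits_config:
      retention_period: 168h    # 7 days
```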

Querying

Logs are queried in Grafana via the Loki datasource (added in k8s/envs/dev/sys/monitoring/values.yaml):

grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki:3100
      access: proxy
      isDefault: false

Open Grafana at https://grafana.tomoda.life, switch to the Explore tab, pick the Loki datasource, and use LogQL:

{namespace="tomoda"} |= "error"
{namespace="traefik-system", status=~"5.."}

The Traefik-logs dashboard provides a pre-built view over the structured labels.

Operational notes

  • 20Gi PVC + 7-day compactor retention. Sized for the current write rate with ~13× headroom. The retention math and the compactor config live in operations/observability/loki.md.
  • No object-storage backend. Switching Loki to a GCS-backed schema is the next scaling step, not a current capability.
  • Promtail consumes CPU on busy nodes. Resource limits in values.yaml cap each Promtail pod at 200m CPU / 128Mi memory. That is usually fine, but a spike in log volume can show up in node-pressure events.
  • One Loki across all environments. Both the tomoda (dev) and prod namespaces ship logs to the same Loki. Filter by namespace in queries when looking at prod traffic only.
  • Backend Zap JSON parsing. The Promtail pipeline parses tomoda backend logs and promotes level as a label; trace_id stays in the body for trace→log navigation from Tempo. See the observability/loki page for the full label strategy.
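
The Promtail resource limits from the list above would look roughly like this in values.yaml. Only the limit values come from the text; the placement under promtail.resources follows the usual Helm convention and is an assumption, and requests are left out because they are not stated here.

```yaml
promtail:
  resources:
    limits:
      cpu: 200m       # per Promtail pod, i.e. per node
      memory: 128Mi
```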