Loki¶
Cluster-wide log aggregation. Promtail tails every pod's logs and ships them to Loki; Grafana queries Loki using its built-in Loki datasource. There is no Elasticsearch, no Cloud Logging sink, and no Fluent Bit: just the loki-stack chart from Grafana's Helm repository.
Installed by k8s/envs/dev/sys/loki/application.yaml, configured by k8s/envs/dev/sys/loki/values.yaml.
Chart and source¶
| Field | Value |
|---|---|
| Helm chart | loki-stack |
| Repository | https://grafana.github.io/helm-charts |
| Version | 2.10.2 |
| Destination namespace | monitoring (shared with kube-prometheus-stack) |
| Argo CD Application | loki |
The loki-stack chart bundles both Loki (the log store) and Promtail (the per-node log shipper) — they're enabled in values.yaml:
loki:
  enabled: true
  persistence:
    enabled: true
    size: 5Gi
promtail:
  enabled: true
Promtail¶
Promtail runs as a DaemonSet, one pod per node, and tails /var/log/pods/*/*/*.log (one directory per pod, one subdirectory per container), which covers every container on its node. Workloads do not need to opt in: every pod's stdout/stderr lands in Loki by default.
The pipeline in values.yaml parses the Traefik JSON access log specifically:
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}
        - json:
            expressions:
              entryPointName: entryPointName
              request_Host: RequestHost
              request_Path: RequestPath
              status: DownstreamStatus
              method: RequestMethod
              msg: msg
              level: level
        - labels:
            entryPointName:
            request_Host:
            status:
            method:
            level:
cri: {} strips the container-runtime envelope. The json stage extracts Traefik fields, and labels promotes a handful of them (host, status, method, level, entrypoint) into Loki labels — that's what makes the Traefik logs Grafana dashboard (gnetId: 13702) work out of the box.
Non-JSON logs (everything that isn't Traefik) still get ingested, just without the structured labels. They're queryable by {namespace="..."} / {app="..."} / similar.
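To make the stage order concrete, here is a minimal Python sketch that mimics what the three stages do to a single line. It is not Promtail's code: the `run_pipeline` helper and the sample log line are made up for illustration, and the regex only approximates the CRI line format (`<timestamp> <stream> <flags> <content>`). The `EXPRESSIONS` and `LABEL_NAMES` tables are copied from the values.yaml snippet above.

```python
import json
import re

# CRI log format: "<timestamp> <stream> <flags> <content>" (approximation).
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flags>\S+) (?P<content>.*)$",
    re.S,
)

# Mirrors the `expressions` map in values.yaml: extracted name -> JSON key.
EXPRESSIONS = {
    "entryPointName": "entryPointName",
    "request_Host": "RequestHost",
    "request_Path": "RequestPath",
    "status": "DownstreamStatus",
    "method": "RequestMethod",
    "msg": "msg",
    "level": "level",
}

# Mirrors the `labels` stage: only these extracted values become labels.
LABEL_NAMES = ["entryPointName", "request_Host", "status", "method", "level"]


def run_pipeline(raw_line):
    """Return (log_body, labels) for one raw container log line."""
    # Stage 1 (cri): strip the container-runtime envelope.
    match = CRI_LINE.match(raw_line)
    body = match.group("content") if match else raw_line

    # Stage 2 (json): extract fields; non-JSON bodies pass through untouched.
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        return body, {}

    extracted = {
        name: parsed[key] for name, key in EXPRESSIONS.items() if key in parsed
    }

    # Stage 3 (labels): promote a subset of the extracted values to labels.
    labels = {
        name: str(extracted[name]) for name in LABEL_NAMES if name in extracted
    }
    return body, labels


# Hypothetical Traefik access-log line wrapped in a CRI envelope.
traefik_line = (
    "2024-05-01T12:00:00.000000000Z stdout F "
    '{"entryPointName":"websecure","RequestHost":"example.test",'
    '"RequestPath":"/api","DownstreamStatus":502,"RequestMethod":"GET"}'
)
body, labels = run_pipeline(traefik_line)
print(labels)
```

Note that `request_Path` is extracted but never promoted: it is not listed under `labels`, so in this sketch (as in the real pipeline config) it stays in the log body only.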
Loki¶
Loki runs as a single StatefulSet with a 20Gi PVC for chunk storage and a 7-day compactor-enforced retention. It exposes:
- http://loki:3100 (inside the cluster), used as a Grafana datasource.
- A ServiceMonitor (label release: monitoring), so Prometheus scrapes Loki's own metrics (loki_request_duration_seconds, etc.).
There is no S3 / GCS backend configured — chunks live on the PVC. That keeps the deployment simple but caps long-term log volume at the disk size.
Querying¶
Logs are queried in Grafana via the Loki datasource (added in k8s/envs/dev/sys/monitoring/values.yaml):
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki:3100
      access: proxy
      isDefault: false
Open Grafana at https://grafana.tomoda.life, switch to the Explore tab, pick the Loki datasource, and use LogQL:
{namespace="tomoda"} |= "error"
{namespace="traefik-system", status=~"5.."}
The Traefik-logs dashboard provides a pre-built view over the structured labels.
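The same LogQL can also be sent to Loki directly, outside Grafana. The sketch below only builds the request URL for Loki's `/loki/api/v1/query_range` HTTP endpoint; it makes no network call. The `build_query_range_url` helper, the one-hour lookback, and the sample timestamp are illustrative assumptions, and http://loki:3100 resolves only from inside the cluster.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

LOKI_URL = "http://loki:3100"  # in-cluster address from the datasource config


def build_query_range_url(logql, end, lookback=timedelta(hours=1), limit=100):
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    start = end - lookback
    params = {
        "query": logql,
        # start and end are sent as Unix timestamps in nanoseconds.
        "start": int(start.timestamp()) * 1_000_000_000,
        "end": int(end.timestamp()) * 1_000_000_000,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical time window; the query is the 5xx example from above.
url = build_query_range_url(
    '{namespace="traefik-system", status=~"5.."}',
    end=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(url)
```

`urlencode` takes care of escaping the braces, quotes, and `=~` operator, which is the part that is easy to get wrong when assembling the URL by hand.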
Operational notes¶
- 20Gi PVC + 7-day compactor retention. Sized for current write rate with ~13× headroom. Retention math + the compactor config live in operations/observability/loki.md.
- No object-storage backend. Switching Loki to a GCS-backed schema is the next scaling step, not a current capability.
- Promtail consumes CPU on busy nodes. Resource limits in values.yaml cap it at 200m CPU / 128Mi memory per node. That is usually fine, but visible in node-pressure events if log volume spikes.
- One Loki across all environments. Both the tomoda (dev) and prod namespaces ship logs to the same Loki. Filter by namespace in queries when looking at prod traffic only.
- Backend Zap JSON parsing. The Promtail pipeline parses tomoda backend logs and promotes level as a label; trace_id stays in the body for trace→log navigation from Tempo. See the observability/loki page for the full label strategy.
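As a rough cross-check of the sizing note in the first bullet, the stated figures can be turned into an implied write rate. This is a back-of-the-envelope sketch, not the calculation from operations/observability/loki.md: it assumes "headroom" means PVC size divided by the data retained at steady state, and it ignores index and WAL overhead. The function names are made up for illustration.

```python
PVC_GIB = 20          # PVC size stated in the notes above
RETENTION_DAYS = 7    # compactor retention stated in the notes above
HEADROOM = 13         # "~13x headroom" stated in the notes above


def implied_daily_write_gib(pvc_gib, retention_days, headroom):
    """Daily chunk volume implied by the stated headroom (assumed definition)."""
    retained_gib = pvc_gib / headroom  # steady-state data kept on disk
    return retained_gib / retention_days


def headroom_for(daily_write_gib, pvc_gib=PVC_GIB, retention_days=RETENTION_DAYS):
    """Headroom factor for a given daily write rate."""
    return pvc_gib / (daily_write_gib * retention_days)


daily = implied_daily_write_gib(PVC_GIB, RETENTION_DAYS, HEADROOM)
print(f"implied write rate: {daily * 1024:.0f} MiB/day")
print(f"headroom if the rate triples: {headroom_for(daily * 3):.1f}x")
```

Under these assumptions, headroom scales inversely with both write rate and retention period, so doubling either one halves the margin.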