Postgres (CloudNativePG)

Tomoda's primary datastore. Postgres runs in-cluster as a CloudNativePG (CNPG) Cluster per environment, not on Cloud SQL or any other managed service. Both environments share one GKE cluster and one data namespace, with separate Cluster CRs (postgres-dev, postgres-prod) backed by separate PVCs, secrets, and backup paths.

Not Cloud SQL

Earlier iterations of this stack assumed Cloud SQL would be the database. That is no longer the case — the only Postgres in Tomoda's path is the CNPG cluster described here. The Photon Indexer design still references a possible warm Nominatim Postgres on Cloud SQL (see Photon indexing), but that is unrelated to application data.

Two layers

CNPG ships as two things: a cluster-scoped operator (the controller that owns the Cluster CRD) and the instances it manages. They are deployed by separate Argo CD Applications:

Operator — k8s/envs/platform/cnpg/application.yaml. Helm chart cloudnative-pg/cloudnative-pg, version 0.23.0, installed into cnpg-system. Pod monitor enabled so Prometheus scrapes the controller.
Instances — k8s/envs/<env>/postgres/manifests/cluster.yaml, applied by a per-env Argo CD Application that targets the data namespace.

The operator is dev-only in the path layout, but a single operator install reconciles Cluster CRs across all namespaces in the cluster — there is no separate prod operator.
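As an illustration of the operator layer, an Argo CD Application for the chart might look roughly like the sketch below. The chart name, version, and target namespace come from this page; the Application name, the argocd namespace, the project, and the sync options are assumptions, and helm.valuesObject assumes Argo CD 2.6 or newer. The actual application.yaml may differ.

```yaml
# Sketch only: not a copy of k8s/envs/platform/cnpg/application.yaml.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cnpg-operator              # hypothetical name
  namespace: argocd                # assumes Argo CD runs in "argocd"
spec:
  project: default                 # assumption
  source:
    repoURL: https://cloudnative-pg.github.io/charts
    chart: cloudnative-pg
    targetRevision: 0.23.0
    helm:
      valuesObject:
        monitoring:
          podMonitorEnabled: true  # lets Prometheus scrape the controller
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    syncOptions:                   # assumptions, commonly used with large CRDs
      - CreateNamespace=true
      - ServerSideApply=true
```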

Cluster CR

Both environments run the same custom operand image, built from k8s/envs/platform/postgres-image and tracked by :latest in each repo:

imageName: us-central1-docker.pkg.dev/development-485000/tomoda-<env>-repo/tomoda-postgres:latest

Postgres 17 with PostGIS 3.5, plus PGroonga (multilingual full-text search, including CJK) and pgvector (semantic search) for the location service. The bootstrap step runs CREATE EXTENSION for postgis, postgis_topology, pgroonga, vector, unaccent, and pg_trgm after initdb, so apps get spatial types and search on a fresh cluster with no extra step.
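A minimal sketch of how the bootstrap step could be expressed for dev, assuming the extensions are created through CNPG's postInitApplicationSQL hook (this page does not say which initdb hook the real manifest uses). Database, owner, and secret names are the dev values documented on this page; fields not related to bootstrap are omitted.

```yaml
# Fragment of a Cluster CR; sizing and backup fields are left out.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-dev
  namespace: data
spec:
  imageName: us-central1-docker.pkg.dev/development-485000/tomoda-dev-repo/tomoda-postgres:latest
  bootstrap:
    initdb:
      database: tomoda_dev
      owner: tomoda_dev_user
      secret:
        name: postgres-dev-credentials
      # Assumption: extensions are created in the application database.
      postInitApplicationSQL:
        - CREATE EXTENSION IF NOT EXISTS postgis;
        - CREATE EXTENSION IF NOT EXISTS postgis_topology;
        - CREATE EXTENSION IF NOT EXISTS pgroonga;
        - CREATE EXTENSION IF NOT EXISTS vector;
        - CREATE EXTENSION IF NOT EXISTS unaccent;
        - CREATE EXTENSION IF NOT EXISTS pg_trgm;
```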

A Cloud Build trigger (postgres-image-push-trigger) rebuilds the image on every merge to main that touches k8s/envs/platform/postgres-image/ and pushes :<version> and :latest to both repos. A fresh cluster pulls the newest :latest; an existing cluster rolls only on a deliberate pod restart, because CNPG does not hot-swap on a tag re-push. Bump _VERSION in cloudbuild-postgres.yaml when extension versions change; the :<version> tag is the rollback handle. On a cold start, build and push the image once with task postgres-image:push before the cluster syncs. Build details live next to the image in k8s/envs/platform/postgres-image/.

Dev (`postgres-dev`)

1 instance, 3 Gi PVC (room for the location catalog + its search indexes).
DB tomoda_dev, owner tomoda_dev_user.
max_connections=100, shared_buffers=128MB, maintenance_work_mem=256MB, log_statement=all (verbose for debugging).
Resources: 50m–250m CPU, 128Mi–256Mi memory.
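The dev settings above map onto the Cluster spec roughly as follows. This is a sketch: it assumes the resource ranges are requests (lower bound) and limits (upper bound), and that the storage class is the standard-rwo class named under Operations. CNPG expects Postgres parameter values as strings.

```yaml
# Fragment of the postgres-dev Cluster spec (sizing and tuning only).
spec:
  instances: 1
  storage:
    size: 3Gi
    storageClass: standard-rwo     # assumption: class named under Operations
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "128MB"
      maintenance_work_mem: "256MB"
      log_statement: "all"         # verbose, for debugging
  resources:
    requests:                      # assumption: lower bound of each range
      cpu: 50m
      memory: 128Mi
    limits:                        # assumption: upper bound of each range
      cpu: 250m
      memory: 256Mi
```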

Prod (`postgres-prod`)

1 instance, 20 Gi PVC. A comment in the manifest (# Increase to 2+ for HA when ready) marks HA as planned future work.
DB tomoda_prod, owner tomoda_prod_user.
shared_buffers=256MB, effective_cache_size=512MB, work_mem=4MB, maintenance_work_mem=256MB, log_statement=none, log_min_duration_statement=1000 (only log slow queries).
Resources: 100m–1000m CPU, 256Mi–1024Mi memory.

Both clusters set pg_hba to host all all 0.0.0.0/0 md5, so any client with network reach to the pods can attempt password authentication. NetworkPolicies in app namespaces and the absence of an external Service keep that surface small.
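For orientation, the pg_hba rule would sit in the Cluster spec along these lines. This is a sketch of the field placement only; the real manifests may carry additional rules.

```yaml
# Fragment of a Cluster spec: user-supplied pg_hba entries.
spec:
  postgresql:
    pg_hba:
      # TYPE  DATABASE  USER  ADDRESS    METHOD
      - host all all 0.0.0.0/0 md5
```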

Credentials

The bootstrap credentials are populated by ExternalSecrets pulling tomoda-db-password from the gsm-tomoda ClusterSecretStore (GCP Secret Manager). The ESO template formats them as a kubernetes.io/basic-auth Secret named postgres-<env>-credentials with username = the env owner and password from GSM. CNPG reads that secret during initdb.
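A sketch of what the dev ExternalSecret could look like, assuming the external-secrets.io/v1beta1 API and a one-hour refresh interval (neither is stated on this page). The store name, GSM key, secret type, and target name are the ones described above.

```yaml
# Sketch only: the real ExternalSecret may use a different API version or layout.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-dev-credentials
  namespace: data
spec:
  refreshInterval: 1h              # assumption
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: postgres-dev-credentials
    template:
      type: kubernetes.io/basic-auth
      data:
        username: tomoda_dev_user  # the env owner
        password: "{{ .password }}"
  data:
    - secretKey: password          # referenced by the template above
      remoteRef:
        key: tomoda-db-password
```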

The Tomoda backend's backend-secrets-<env> ExternalSecret pulls the same tomoda-db-password key separately and exposes it as DB_PASSWORD to the application. Because both secrets derive from the same GSM key, the two sides stay in sync as long as the GSM value isn't changed.

DSNs (application-facing)

CNPG creates postgres-<env>-rw, postgres-<env>-ro, and postgres-<env>-r Services automatically. To keep the backend's DB_HOST env var stable across CNPG renames, each Cluster manifest declares a Service of type ExternalName that aliases the legacy hostname to the CNPG -rw Service:

Env	Application connects to	Aliases to
dev	`postgres-postgresql.data.svc.cluster.local`	`postgres-dev-rw.data.svc.cluster.local`
prod	`postgres-prod-postgresql.data.svc.cluster.local`	`postgres-prod-rw.data.svc.cluster.local`

These exact hostnames are baked into the Tomoda backend-config-<env> ConfigMap (see Tomoda).
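The dev alias in the table above corresponds to a Service along these lines. This is a minimal sketch; labels and annotations in the real manifest are not shown here.

```yaml
# ExternalName alias: legacy hostname -> CNPG read-write Service (dev).
apiVersion: v1
kind: Service
metadata:
  name: postgres-postgresql        # resolves as postgres-postgresql.data.svc.cluster.local
  namespace: data
spec:
  type: ExternalName
  externalName: postgres-dev-rw.data.svc.cluster.local
```

An ExternalName Service is a DNS CNAME, so clients still connect on whatever port they are configured with; no port mapping happens at the alias.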

Backups

Each Cluster ships its WALs and base backups to GCS via Barman:

backup:
  barmanObjectStore:
    destinationPath: gs://tomoda-db-backups-development-485000/<env>/
    googleCredentials:
      gkeEnvironment: true
  retentionPolicy: "30d"

gkeEnvironment: true tells Barman to use the pod's ambient Workload Identity credentials. The Cluster CR sets a serviceAccountTemplate so every Postgres instance pod runs as a K8s ServiceAccount annotated to impersonate cnpg-backup-sa@development-485000.iam.gserviceaccount.com. That GCP SA has write access to the backup bucket — see Backup infra.
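A sketch of the serviceAccountTemplate described above, using the standard GKE Workload Identity annotation. The GCP service account is the one named on this page; the exact shape of the real manifest is an assumption.

```yaml
# Fragment of a Cluster spec: annotate the instance ServiceAccount for Workload Identity.
spec:
  serviceAccountTemplate:
    metadata:
      annotations:
        iam.gke.io/gcp-service-account: cnpg-backup-sa@development-485000.iam.gserviceaccount.com
```

The annotation alone is not sufficient: the GCP service account also needs a Workload Identity binding that allows the Kubernetes ServiceAccount in the data namespace to impersonate it, which is presumably part of the backup infra referenced above.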

A ScheduledBackup resource alongside the Cluster (manifests/backup.yaml) triggers a full base backup daily at 03:00 UTC, using the same barmanObjectStore method. Retention is 30 days.
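A ScheduledBackup matching that description might look like the sketch below. The resource name and backupOwnerReference are assumptions. Note that CNPG schedules use a six-field cron expression with a leading seconds field, so 03:00 daily is written as shown.

```yaml
# Sketch only: not a copy of manifests/backup.yaml.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-dev-daily         # hypothetical name
  namespace: data
spec:
  schedule: "0 0 3 * * *"          # sec min hour day-of-month month day-of-week
  backupOwnerReference: self       # assumption
  method: barmanObjectStore
  cluster:
    name: postgres-dev
```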

Operations

Runbook-level steps — base backup on demand, point-in-time recovery, storage expansion, major-version upgrades, switchover — are in the Postgres operations runbook. A quick orientation:

Manual base backup — apply a Backup CR pointing at the cluster; CNPG runs it immediately and uploads to the configured destinationPath.
Scaling storage — increase spec.storage.size on the Cluster CR. CNPG resizes the PVC; the StorageClass (standard-rwo) supports online expansion on GKE.
Version upgrade: bump imageName to a new tag. CNPG rolls minor upgrades in place, pod by pod. A major upgrade (e.g. 17-3.5 → 18-3.5) is not a simple image bump; it requires a Cluster.spec.bootstrap.recovery flow against a backup.
Connectivity check — kubectl -n data exec -it postgres-<env>-1 -- psql for ad-hoc shells; for UI access use pgAdmin.
Monitoring — monitoring.enablePodMonitor: true on both clusters means Prometheus (in the monitoring namespace) scrapes per-instance metrics; alerts on replication lag, WAL archive failure, and disk usage are wired into the platform alerting bundle.
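As an example of the manual base backup item above, an on-demand Backup CR for dev could look like this. The resource name is hypothetical; the runbook remains the authoritative procedure.

```yaml
# On-demand base backup for postgres-dev.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres-dev-manual-001    # hypothetical; use a fresh name per backup
  namespace: data
spec:
  method: barmanObjectStore
  cluster:
    name: postgres-dev
```

Once applied, progress can be followed through the phase reported in the Backup resource's status.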