Postgres (CloudNativePG)

Tomoda's primary datastore. Postgres runs in-cluster as a CloudNativePG (CNPG) Cluster per environment, not on Cloud SQL or any other managed service. Both environments share one GKE cluster and one data namespace, with separate Cluster CRs (postgres-dev, postgres-prod) backed by separate PVCs, secrets, and backup paths.

Not Cloud SQL

Earlier iterations of this stack assumed Cloud SQL would be the database. That is no longer the case — the only Postgres in Tomoda's path is the CNPG cluster described here. The Photon Indexer design still references a possible warm Nominatim Postgres on Cloud SQL (see Photon Indexer), but that is unrelated to application data.

Two layers

CNPG ships as two things: a cluster-scoped operator (the controller that owns the Cluster CRD) and the instances it manages. They are deployed by separate Argo CD Applications:

  • Operator: k8s/envs/dev/sys/cnpg/application.yaml. Helm chart cloudnative-pg/cloudnative-pg, version 0.23.0, installed into cnpg-system. Pod monitor enabled so Prometheus scrapes the controller.
  • Instances: k8s/envs/<env>/middleware/postgres/manifests/cluster.yaml, applied by a per-env Argo CD Application that targets the data namespace.

The operator is dev-only in the path layout, but a single operator install reconciles Cluster CRs across all namespaces in the cluster — there is no separate prod operator.
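
For orientation, the operator Application could look roughly like the sketch below. It is not a copy of the real application.yaml: the Application name, the argocd namespace, the project, the chart repository URL, and the sync options are assumptions, while the chart name, version, and target namespace come from the description above.

```yaml
# Sketch only: field values marked "assumption" are not taken from the repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cnpg-operator            # assumption: hypothetical name
  namespace: argocd              # assumption: Argo CD's own namespace
spec:
  project: default               # assumption
  source:
    repoURL: https://cloudnative-pg.github.io/charts   # assumption: upstream chart repo
    chart: cloudnative-pg
    targetRevision: 0.23.0
    helm:
      values: |
        monitoring:
          podMonitorEnabled: true   # lets Prometheus scrape the controller
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    syncOptions:                 # assumption: common choices for CRD-heavy charts
      - CreateNamespace=true
      - ServerSideApply=true
```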

Cluster CR

Both environments use the same image:

imageName: ghcr.io/cloudnative-pg/postgis:17-3.5

Postgres 17 with PostGIS 3.5 bundled. The bootstrap step runs CREATE EXTENSION postgis; CREATE EXTENSION postgis_topology; after initdb, so apps can use spatial types without any extra step.
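
A minimal sketch of how that bootstrap step is typically expressed on a Cluster CR, using the dev names from this page. Which of CNPG's postInit* fields the real manifest uses is not stated here, so postInitApplicationSQL (which runs against the application database) is an assumption, and the fragment omits required fields such as instances and storage.

```yaml
# Fragment of a Cluster CR, not a complete manifest.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-dev
  namespace: data
spec:
  imageName: ghcr.io/cloudnative-pg/postgis:17-3.5
  bootstrap:
    initdb:
      database: tomoda_dev
      owner: tomoda_dev_user
      secret:
        name: postgres-dev-credentials
      # assumption: the real manifest may use postInitSQL or
      # postInitTemplateSQL instead
      postInitApplicationSQL:
        - CREATE EXTENSION postgis;
        - CREATE EXTENSION postgis_topology;
```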

Dev (postgres-dev)

  • 1 instance, 10 Gi PVC.
  • DB tomoda_dev, owner tomoda_dev_user.
  • max_connections=100, shared_buffers=128MB, log_statement=all (verbose for debugging).
  • Resources: 100m–500m CPU, 256Mi–512Mi memory.

Prod (postgres-prod)

  • 1 instance, 20 Gi PVC. The comment # Increase to 2+ for HA when ready flags that HA is intentional future work.
  • DB tomoda_prod, owner tomoda_prod_user.
  • shared_buffers=256MB, effective_cache_size=512MB, work_mem=4MB, log_statement=none, log_min_duration_statement=1000 (only log slow queries).
  • Resources: 100m–1000m CPU, 256Mi–1024Mi memory.
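
As an illustration, the prod settings listed above map onto Cluster spec fields roughly as follows. This is a sketch, not the actual cluster.yaml: it assumes the resource ranges read as requests–limits, and that the storage class is the standard-rwo class mentioned under Operations.

```yaml
# Fragment of the postgres-prod Cluster spec (sketch).
spec:
  instances: 1                   # Increase to 2+ for HA when ready
  storage:
    size: 20Gi
    storageClass: standard-rwo   # assumption: set explicitly rather than defaulted
  postgresql:
    parameters:                  # CNPG expects parameter values as strings
      shared_buffers: "256MB"
      effective_cache_size: "512MB"
      work_mem: "4MB"
      log_statement: "none"
      log_min_duration_statement: "1000"   # milliseconds: only log slow queries
  resources:
    requests:                    # assumption: lower bound of each range
      cpu: 100m
      memory: 256Mi
    limits:                      # assumption: upper bound of each range
      cpu: 1000m
      memory: 1024Mi
```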

Both clusters expose pg_hba: host all all 0.0.0.0/0 md5 — in-cluster network reach is open, but NetworkPolicies in app namespaces and the absence of an external Service make the surface small.
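
For reference, a rule like this lives under spec.postgresql.pg_hba on the Cluster CR. The fragment below is a sketch of that placement; CNPG adds these lines to its own generated pg_hba.conf entries.

```yaml
# Fragment of a Cluster spec (sketch).
spec:
  postgresql:
    pg_hba:
      # password-authenticated access from any address that can reach the pod
      - host all all 0.0.0.0/0 md5
```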

Credentials

The bootstrap credentials are populated by ExternalSecrets pulling tomoda-db-password from the gsm-tomoda ClusterSecretStore (GCP Secret Manager). The ESO template formats them as a kubernetes.io/basic-auth Secret named postgres-<env>-credentials with username = the env owner and password from GSM. CNPG reads that secret during initdb.
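
A hedged sketch of what such an ExternalSecret might look like for dev. The apiVersion (which depends on the installed ESO release), the ExternalSecret's own name, the refresh interval, and the intermediate key name password are assumptions; the store, remote key, target Secret name, and Secret type come from the description above.

```yaml
# Sketch only, not the manifest from the repo.
apiVersion: external-secrets.io/v1beta1   # assumption: depends on ESO version
kind: ExternalSecret
metadata:
  name: postgres-dev-credentials          # assumption: named after its target
  namespace: data
spec:
  refreshInterval: 1h                     # assumption
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: postgres-dev-credentials
    template:
      type: kubernetes.io/basic-auth
      data:
        username: tomoda_dev_user
        password: "{{ .password }}"       # rendered from the key fetched below
  data:
    - secretKey: password                 # assumption: hypothetical local key name
      remoteRef:
        key: tomoda-db-password
```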

The Tomoda backend's backend-secrets-<env> ExternalSecret pulls the same tomoda-db-password key separately and exposes it as DB_PASSWORD to the application — both sides stay in sync as long as the GSM value isn't changed.

DSNs (application-facing)

CNPG creates postgres-<env>-rw, postgres-<env>-ro, and postgres-<env>-r Services automatically. To keep the backend's DB_HOST env var stable across CNPG renames, each Cluster manifest declares a Service of type ExternalName that aliases the legacy hostname to the CNPG -rw Service:

Env  | Application connects to                         | Aliases to
dev  | postgres-postgresql.data.svc.cluster.local      | postgres-dev-rw.data.svc.cluster.local
prod | prod-postgres-postgresql.data.svc.cluster.local | postgres-prod-rw.data.svc.cluster.local

These exact hostnames are baked into the Tomoda backend-config-<env> ConfigMap (see Tomoda).
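
The dev alias can be sketched as a plain ExternalName Service. The names follow from the hostnames above; anything else the real manifest carries (labels, annotations) is not shown.

```yaml
# Sketch: resolves postgres-postgresql.data.svc.cluster.local
# to the CNPG-managed read-write Service via a DNS CNAME.
apiVersion: v1
kind: Service
metadata:
  name: postgres-postgresql
  namespace: data
spec:
  type: ExternalName
  externalName: postgres-dev-rw.data.svc.cluster.local
```

Because an ExternalName Service is only a DNS alias, it needs no selector or ports; the backend still connects on the Postgres port exposed by the -rw Service.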

Backups

Each Cluster ships its WALs and base backups to GCS via Barman:

backup:
  barmanObjectStore:
    destinationPath: gs://tomoda-db-backups-development-485000/<env>/
    googleCredentials:
      gkeEnvironment: true
  retentionPolicy: "30d"

gkeEnvironment: true tells Barman to use the pod's ambient Workload Identity credentials. The Cluster CR sets a serviceAccountTemplate so every Postgres instance pod runs as a K8s ServiceAccount annotated to impersonate cnpg-backup-sa@development-485000.iam.gserviceaccount.com. That GCP SA has write access to the backup bucket — see Backup infra.
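
A sketch of the serviceAccountTemplate piece, assuming the standard GKE Workload Identity annotation is what the manifest uses:

```yaml
# Fragment of a Cluster spec (sketch).
spec:
  serviceAccountTemplate:
    metadata:
      annotations:
        # standard GKE Workload Identity annotation (assumed to be the one in use)
        iam.gke.io/gcp-service-account: cnpg-backup-sa@development-485000.iam.gserviceaccount.com
```

The GCP side also needs an IAM binding that allows this Kubernetes ServiceAccount to impersonate the GCP SA; that binding is not part of the Cluster CR.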

A ScheduledBackup resource alongside the Cluster (manifests/backup.yaml) triggers a full base backup daily at 03:00 UTC, using the same barmanObjectStore method. Retention is 30 days.
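
A minimal sketch of such a ScheduledBackup for dev. The resource name is hypothetical, and the real manifests/backup.yaml may set additional fields. Note that CNPG schedules use a six-field cron expression with seconds first.

```yaml
# Sketch only, not the contents of manifests/backup.yaml.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-dev-daily       # assumption: hypothetical name
  namespace: data
spec:
  schedule: "0 0 3 * * *"        # sec min hour day month weekday: daily at 03:00:00
  method: barmanObjectStore
  cluster:
    name: postgres-dev
```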

Operations

Runbook-level steps — base backup on demand, point-in-time recovery, storage expansion, major-version upgrades, switchover — are in the Postgres operations runbook. A quick orientation:

  • Manual base backup: apply a Backup CR pointing at the cluster; CNPG runs it immediately and uploads to the configured destinationPath.
  • Scaling storage: increase spec.storage.size on the Cluster CR. CNPG resizes the PVC; the StorageClass (standard-rwo) supports online expansion on GKE.
  • Version upgrade: bump imageName (e.g. 17-3.5 → 18-3.5). CNPG performs an in-place minor upgrade per pod; majors require a Cluster.spec.bootstrap.recovery flow against a backup.
  • Connectivity check: kubectl -n data exec -it postgres-<env>-1 -- psql for ad-hoc shells; for UI access use pgAdmin.
  • Monitoring: monitoring.enablePodMonitor: true on both clusters means Prometheus (in the monitoring namespace) scrapes per-instance metrics; alerts on replication lag, WAL archive failure, and disk usage are wired into the platform alerting bundle.
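
As an example of the first item, an on-demand base backup for dev could be requested with a Backup CR along these lines. The resource name is hypothetical; the runbook remains the authoritative procedure.

```yaml
# Sketch of an on-demand backup request.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres-dev-manual      # assumption: hypothetical name, must be unique per run
  namespace: data
spec:
  method: barmanObjectStore      # same method as the scheduled backups
  cluster:
    name: postgres-dev
```

Progress can be followed through the Backup resource's status, for example with kubectl -n data get backup.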