Postgres (CloudNativePG)¶
Tomoda's primary datastore. Postgres runs in-cluster as a CloudNativePG (CNPG) Cluster per environment, not on Cloud SQL or any other managed service. Both environments share one GKE cluster and one data namespace, with separate Cluster CRs (postgres-dev, postgres-prod) backed by separate PVCs, secrets, and backup paths.
Not Cloud SQL
Earlier iterations of this stack assumed Cloud SQL would be the database. That is no longer the case — the only Postgres in Tomoda's path is the CNPG cluster described here. The Photon Indexer design still references a possible warm Nominatim Postgres on Cloud SQL (see Photon Indexer), but that is unrelated to application data.
Two layers¶
CNPG ships as two things: a cluster-scoped operator (the controller that owns the Cluster CRD) and the instances it manages. They are deployed by separate Argo CD Applications:
- Operator: k8s/envs/dev/sys/cnpg/application.yaml. Helm chart cloudnative-pg/cloudnative-pg, version 0.23.0, installed into cnpg-system. Pod monitor enabled so Prometheus scrapes the controller.
- Instances: k8s/envs/<env>/middleware/postgres/manifests/cluster.yaml, applied by a per-env Argo CD Application that targets the data namespace.
The operator is dev-only in the path layout, but a single operator install reconciles Cluster CRs across all namespaces in the cluster — there is no separate prod operator.
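As a rough illustration of the operator layer, an Argo CD Application along these lines would install the chart described above. This is a sketch, not the contents of the real application.yaml: the Application name, Argo CD namespace, project, and sync options are assumptions, while the chart name, version, and target namespace come from the text.

```yaml
# Hypothetical sketch of the operator Application; values marked "assumed" are not from the source.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cnpg-operator            # assumed name
  namespace: argocd              # assumed Argo CD namespace
spec:
  project: default               # assumed project
  source:
    repoURL: https://cloudnative-pg.github.io/charts   # upstream chart repository
    chart: cloudnative-pg
    targetRevision: 0.23.0
    helm:
      valuesObject:
        monitoring:
          podMonitorEnabled: true   # lets Prometheus scrape the controller
  destination:
    server: https://kubernetes.default.svc
    namespace: cnpg-system
  syncPolicy:
    syncOptions:
      - CreateNamespace=true     # assumed
      - ServerSideApply=true     # assumed; commonly used for large CRDs
```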
Cluster CR¶
Both environments use the same image:
imageName: ghcr.io/cloudnative-pg/postgis:17-3.5
Postgres 17 with PostGIS 3.5 bundled. The bootstrap step runs CREATE EXTENSION postgis; CREATE EXTENSION postgis_topology; after initdb, so apps can use spatial types without any extra step.
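A minimal sketch of how the image and bootstrap step might look in the dev Cluster CR. The use of postInitApplicationSQL is an assumption; the real manifest may run the statements through a different initdb hook (for example postInitTemplateSQL).

```yaml
# Sketch only; the hook used for the extension statements is assumed.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-dev
  namespace: data
spec:
  imageName: ghcr.io/cloudnative-pg/postgis:17-3.5
  bootstrap:
    initdb:
      database: tomoda_dev
      owner: tomoda_dev_user
      secret:
        name: postgres-dev-credentials   # basic-auth Secret read during initdb
      postInitApplicationSQL:            # assumed hook; runs in the application database
        - CREATE EXTENSION postgis;
        - CREATE EXTENSION postgis_topology;
```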
Dev (postgres-dev)¶
- 1 instance, 10 Gi PVC.
- DB tomoda_dev, owner tomoda_dev_user.
- max_connections=100, shared_buffers=128MB, log_statement=all (verbose for debugging).
- Resources: 100m–500m CPU, 256Mi–512Mi memory.
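The dev settings above map onto the Cluster spec roughly as follows. This fragment assumes the CPU and memory ranges are request-to-limit pairs; the prod cluster would differ only in the values.

```yaml
# Fragment of the postgres-dev Cluster spec (sketch; request/limit split is assumed).
spec:
  instances: 1
  storage:
    size: 10Gi
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "128MB"
      log_statement: "all"     # verbose; suitable for debugging only
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
```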
Prod (postgres-prod)¶
- 1 instance, 20 Gi PVC. The comment "# Increase to 2+ for HA when ready" flags that HA is intentional future work.
- DB tomoda_prod, owner tomoda_prod_user.
- shared_buffers=256MB, effective_cache_size=512MB, work_mem=4MB, log_statement=none, log_min_duration_statement=1000 (only log queries slower than 1000 ms).
- Resources: 100m–1000m CPU, 256Mi–1024Mi memory.
Both clusters expose pg_hba: host all all 0.0.0.0/0 md5 — in-cluster network reach is open, but NetworkPolicies in app namespaces and the absence of an external Service make the surface small.
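For orientation, the rule quoted above would sit in the Cluster CR's postgresql section, as in this fragment (a sketch of placement, not a copy of the manifest):

```yaml
# Fragment: where the pg_hba rule lives in a CNPG Cluster spec.
spec:
  postgresql:
    pg_hba:
      - host all all 0.0.0.0/0 md5   # any source address, password (md5) auth
```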
Credentials¶
The bootstrap credentials are populated by ExternalSecrets pulling tomoda-db-password from the gsm-tomoda ClusterSecretStore (GCP Secret Manager). The ESO template formats them as a kubernetes.io/basic-auth Secret named postgres-<env>-credentials with username = the env owner and password from GSM. CNPG reads that secret during initdb.
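A sketch of what the dev ExternalSecret could look like under the description above. The API version, ExternalSecret name, and refresh interval are assumptions; the store, remote key, Secret name, and Secret type come from the text.

```yaml
# Hypothetical ExternalSecret for dev; values marked "assumed" are not from the source.
apiVersion: external-secrets.io/v1beta1   # assumed API version
kind: ExternalSecret
metadata:
  name: postgres-dev-credentials          # assumed to match the target Secret name
  namespace: data
spec:
  refreshInterval: 1h                     # assumed
  secretStoreRef:
    kind: ClusterSecretStore
    name: gsm-tomoda
  target:
    name: postgres-dev-credentials
    template:
      type: kubernetes.io/basic-auth
      data:
        username: tomoda_dev_user
        password: "{{ .password }}"       # filled from the GSM value below
  data:
    - secretKey: password
      remoteRef:
        key: tomoda-db-password
```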
The Tomoda backend's backend-secrets-<env> ExternalSecret pulls the same tomoda-db-password key separately and exposes it as DB_PASSWORD to the application — both sides stay in sync as long as the GSM value isn't changed.
DSNs (application-facing)¶
CNPG creates postgres-<env>-rw, postgres-<env>-ro, and postgres-<env>-r Services automatically. To keep the backend's DB_HOST env var stable across CNPG renames, each Cluster manifest declares a Service of type ExternalName that aliases the legacy hostname to the CNPG -rw Service:
| Env | Application connects to | Aliases to |
|---|---|---|
| dev | postgres-postgresql.data.svc.cluster.local | postgres-dev-rw.data.svc.cluster.local |
| prod | prod-postgres-postgresql.data.svc.cluster.local | postgres-prod-rw.data.svc.cluster.local |
These exact hostnames are baked into the Tomoda backend-config-<env> ConfigMap (see Tomoda).
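The dev alias from the table can be sketched as an ExternalName Service like the one below; the prod variant would swap in the prod names. Any labels or annotations in the real manifest are not shown.

```yaml
# Sketch of the dev alias: legacy hostname -> CNPG read-write Service.
apiVersion: v1
kind: Service
metadata:
  name: postgres-postgresql    # resolves as postgres-postgresql.data.svc.cluster.local
  namespace: data
spec:
  type: ExternalName
  externalName: postgres-dev-rw.data.svc.cluster.local
```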
Backups¶
Each Cluster ships its WALs and base backups to GCS via Barman:
backup:
  barmanObjectStore:
    destinationPath: gs://tomoda-db-backups-development-485000/<env>/
    googleCredentials:
      gkeEnvironment: true
  retentionPolicy: "30d"
gkeEnvironment: true tells Barman to use the pod's ambient Workload Identity credentials. The Cluster CR sets a serviceAccountTemplate so every Postgres instance pod runs as a K8s ServiceAccount annotated to impersonate cnpg-backup-sa@development-485000.iam.gserviceaccount.com. That GCP SA has write access to the backup bucket — see Backup infra.
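The Workload Identity binding described above would look roughly like this in the Cluster CR. The annotation key is the standard GKE Workload Identity one; that the real manifest uses exactly this form is an assumption.

```yaml
# Fragment: ServiceAccount annotation applied to every instance pod's ServiceAccount.
spec:
  serviceAccountTemplate:
    metadata:
      annotations:
        iam.gke.io/gcp-service-account: cnpg-backup-sa@development-485000.iam.gserviceaccount.com
```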
A ScheduledBackup resource alongside the Cluster (manifests/backup.yaml) triggers a full base backup daily at 03:00 UTC, using the same barmanObjectStore method. Retention is 30 days.
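A minimal sketch of a daily 03:00 ScheduledBackup for dev. The resource name and backupOwnerReference value are assumptions. Note that CNPG schedules use a six-field cron expression with seconds first.

```yaml
# Hypothetical manifests/backup.yaml for dev; name and owner reference are assumed.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-dev-daily       # assumed name
  namespace: data
spec:
  schedule: "0 0 3 * * *"        # second minute hour day-of-month month day-of-week
  backupOwnerReference: self     # assumed
  method: barmanObjectStore
  cluster:
    name: postgres-dev
```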
Operations¶
Runbook-level steps — base backup on demand, point-in-time recovery, storage expansion, major-version upgrades, switchover — are in the Postgres operations runbook. A quick orientation:
- Manual base backup: apply a Backup CR pointing at the cluster; CNPG runs it immediately and uploads to the configured destinationPath.
- Scaling storage: increase spec.storage.size on the Cluster CR. CNPG resizes the PVC; the StorageClass (standard-rwo) supports online expansion on GKE.
- Version upgrade: bump imageName. CNPG rolls out a minor image bump in place, pod by pod; a major bump (e.g. 17-3.5 → 18-3.5) requires a Cluster.spec.bootstrap.recovery flow against a backup.
- Connectivity check: kubectl -n data exec -it postgres-<env>-1 -- psql for ad-hoc shells; for UI access use pgAdmin.
- Monitoring: monitoring.enablePodMonitor: true on both clusters means Prometheus (in the monitoring namespace) scrapes per-instance metrics; alerts on replication lag, WAL archive failure, and disk usage are wired into the platform alerting bundle.
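As an example of the on-demand backup step, a Backup CR such as the following could be applied with kubectl. The resource name is hypothetical; the runbook remains the authoritative procedure.

```yaml
# Sketch of an on-demand base backup for dev; the name is a placeholder.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres-dev-manual      # hypothetical; pick a unique name per backup
  namespace: data
spec:
  method: barmanObjectStore
  cluster:
    name: postgres-dev
```

Progress can then be followed with kubectl -n data get backup, which reports the phase of each Backup resource.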