Photon Indexer
The CronJob at k8s/apps/photon-indexer/cronjob.yaml is the future home of the monthly Photon index rebuild. Today it is suspended; manual builds run on a developer or operator machine via scripts/photon-index-local.sh and are uploaded to the same GCS bucket the production Photon pod polls.
Currently suspended
spec.suspend: true — the CronJob does not execute. Flipping it to false is blocked on several pieces of infrastructure that have not been provisioned yet (see "Blockers" below). Until then, the manifest is a design document for the eventual in-cluster job.
Schedule (planned)
schedule: "0 2 1 * *" # 02:00 UTC, 1st of every month
concurrencyPolicy: Forbid # one build at a time
startingDeadlineSeconds: 86400 # tolerate up to 24h of cluster maintenance
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
backoffLimit: 1
activeDeadlineSeconds: 172800 # hard kill after 48h
Once enabled, the job runs at 02:00 UTC on the 1st of each month. concurrencyPolicy: Forbid skips a new run if the previous build is still going (planet builds can take days). startingDeadlineSeconds lets a run that missed its slot, for example during a node-pool maintenance window, still start up to 24h late.
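The settings listed above sit at two different levels of a batch/v1 CronJob: the scheduling fields belong to the CronJob's own spec, while backoffLimit and activeDeadlineSeconds belong to the Job template. The following is a minimal sketch of that nesting, not a copy of cronjob.yaml; the metadata name, container name, and restartPolicy are assumptions.

```yaml
# Sketch only: shows where each field lives in a batch/v1 CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: photon-indexer                  # assumed name
spec:
  suspend: true                         # current state: the controller creates no Jobs
  schedule: "0 2 1 * *"                 # 02:00 on the 1st of every month
  concurrencyPolicy: Forbid             # skip a run if the previous Job is still active
  startingDeadlineSeconds: 86400        # a missed run may still start up to 24h late
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 1                   # Job-level: at most one retry
      activeDeadlineSeconds: 172800     # Job-level: 48h cap across all attempts
      template:
        spec:
          restartPolicy: Never          # assumed; Jobs accept Never or OnFailure
          containers:
            - name: photon-indexer      # assumed container name
              image: asia-east1-docker.pkg.dev/development-485000/tomoda-prod-repo/photon-indexer:latest
```

Because activeDeadlineSeconds applies to the Job as a whole, the single retry allowed by backoffLimit has to fit inside the same 48h window as the first attempt.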
Resource budget
The build is RAM-bound (osm2pgsql cache + Photon JVM heap) and disk-heavy (Nominatim Postgres + tar archive):
resources:
  requests: { memory: "200Gi", cpu: "28", ephemeral-storage: "350Gi" }
  limits: { memory: "240Gi", cpu: "32", ephemeral-storage: "400Gi" }
env:
  - { name: JAVA_HEAP, value: "200g" }
This is sized for a planet build. It cannot run on the shared GKE node pools — it requires a dedicated photon-indexer spot node pool that is not yet terraformed. See Photon Indexer infra for the planned pool spec and Workload Identity binding.
The pod targets the dedicated pool through its nodeSelector and tolerates both the pool's dedicated taint and the spot taint:
nodeSelector:
  cloud.google.com/gke-spot: "true"
  workload: photon-indexer
tolerations:
  - { key: dedicated, value: photon-indexer, effect: NoSchedule }
  - { key: cloud.google.com/gke-spot, value: "true", effect: NoSchedule }
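For the selector and tolerations above to have any effect, nodes in the planned pool need matching labels and taints. The sketch below shows what such a node might look like. It is inferred from the pod spec, not taken from any Terraform, and the node name is hypothetical. GKE applies the cloud.google.com/gke-spot label to Spot nodes itself; the workload label and both taints are assumed to be configured on the pool.

```yaml
# Hypothetical node in the planned photon-indexer pool (illustration only).
apiVersion: v1
kind: Node
metadata:
  name: gke-example-photon-indexer-0000   # hypothetical node name
  labels:
    cloud.google.com/gke-spot: "true"     # set by GKE on Spot nodes
    workload: photon-indexer              # assumed pool label, matched by nodeSelector
spec:
  taints:
    # Assumed pool taints; each one is matched by a toleration on the indexer pod.
    - { key: dedicated, value: photon-indexer, effect: NoSchedule }
    - { key: cloud.google.com/gke-spot, value: "true", effect: NoSchedule }
```

The taints keep other workloads off the expensive nodes, while the nodeSelector keeps the indexer off the shared pools; the pod needs both halves to land only on this pool.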
What the job does
Container image: asia-east1-docker.pkg.dev/development-485000/tomoda-prod-repo/photon-indexer:latest (not yet built — see Blockers). When enabled, the indexer will:
- Read OSM data into a Nominatim Postgres (the warm DB referenced below).
- Run Photon's nominatim-import against that DB to produce a multilingual Lucene index covering 28 languages: en, ja, ko, zh, zh-Hans, zh-Hant, ar, he, hi, vi, th, id, tr, es, fr, de, it, pt, nl, pl, ru, sv, no, da, fi, el, cs, uk.
- tar + MD5 the result and upload to gs://development-485000-photon-index via Workload Identity (the K8s photon-indexer ServiceAccount → GCP SA photon-indexer@development-485000.iam.gserviceaccount.com).
The Photon Deployment polls the bucket every 24h and atomically swaps to the new index — there is no other handoff.
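On GKE, the Workload Identity mapping used for the upload is conventionally declared as an annotation on the Kubernetes ServiceAccount. A minimal sketch using the account names given above; the namespace is an assumption, and the GCP side additionally needs an IAM binding (roles/iam.workloadIdentityUser) that is not shown here.

```yaml
# Sketch of the K8s side of the Workload Identity mapping.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: photon-indexer
  namespace: default                      # assumed namespace
  annotations:
    # Links this K8s ServiceAccount to the GCP service account named in this doc.
    iam.gke.io/gcp-service-account: photon-indexer@development-485000.iam.gserviceaccount.com
```

The indexer pod would then reference this account through serviceAccountName: photon-indexer in its pod spec, so that the upload runs with the GCP service account's permissions on the bucket.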
Two source files describe the build pipeline in detail:
- k8s/apps/photon-indexer/cronjob.yaml: the future in-cluster CronJob shown here.
- k8s/apps/photon-indexer/docker-compose.build.yml: the local Nominatim + photon-indexer compose stack used by scripts/photon-index-local.sh today.
Blockers
The header comment in cronjob.yaml lists what needs to land before suspend: false is safe:
- Warm Nominatim Postgres: Cloud SQL or a StatefulSet, so the CronJob doesn't re-import OSM from scratch each month. Today the manual script imports OSM end-to-end every run.
- Nominatim incremental updates: Geofabrik diff replication into the warm DB, so each month is a diff, not a full reload.
- Slim CronJob: rewrite this manifest to run only Photon's nominatim-import step against the warm DB, then tar + upload to GCS.
- Indexer node pool: photon-indexer GKE spot pool, terraformed.
- Workload Identity binding: the K8s SA → GCP SA mapping is created during manual bootstrap (see bootstrap doc); the K8s side will activate when the pool exists.
- Indexer image in Artifact Registry: build and push.
Today's path
While the CronJob is suspended, indexes are built monthly by an operator running scripts/photon-index-local.sh on a beefy machine (developer laptop for regional builds, one-off GCE VM for planet). The script wraps docker-compose.build.yml and uploads to the same GCS bucket — so the production Photon pod is agnostic to which path produced the index. The Photon Multilang Rollout doc covers the current operator workflow.