Photon Indexer

Infrastructure for the multilingual Photon geocoder index: a GCS bucket that holds versioned index tarballs, plus a service account the indexer uses to upload them. The actual indexer job (a Kubernetes CronJob) is currently suspended and indexes are built manually on a developer's machine.

The Photon GCS bucket + service account + Workload Identity binding live outside Terraform — they're created manually via the bootstrap doc so terraform destroy can never delete the planet index (each rebuild costs ~$500 of compute). The K8s side is still GitOps-managed: workload manifests at k8s/apps/photon-indexer/, build script scripts/photon-index-local.sh.

Why this exists

rtuszik/photon-docker (the image we run for Photon itself) downloads its index over plain HTTP with no authentication. We can't serve indexes from a private GCS bucket, and we don't want to. So we host them at a public URL on GCS and rely on the fact that they are 100% derived from OSM data (no privacy concerns).

The bucket holds two kinds of objects:

  1. Versioned tarballs: photon-db-planet-multilang-2026-06.tar.bz2, etc.
  2. latest-* aliases: copies of the most recent good build, used by Photon pods that pin to "latest stable".
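To make the two object kinds concrete, here is a minimal sketch of how names could be generated and classified. It assumes the YYYY-MM suffix seen in the single example filename above is the general convention; the helper names are hypothetical and are not taken from scripts/photon-index-local.sh.

```python
from datetime import date
from typing import List, Optional

# Prefix taken from the example filename above; the YYYY-MM suffix
# convention is an assumption inferred from that one example.
PREFIX = "photon-db-planet-multilang"


def versioned_name(build_month: date) -> str:
    """Return the versioned tarball name for the given build month."""
    return f"{PREFIX}-{build_month:%Y-%m}.tar.bz2"


def is_latest_alias(object_name: str) -> bool:
    """True for latest-* alias objects, False for everything else."""
    return object_name.startswith("latest-")


def newest_versioned(object_names: List[str]) -> Optional[str]:
    """Pick the most recent versioned tarball.

    Zero-padded YYYY-MM suffixes sort lexicographically in date order,
    so a plain string sort is enough.
    """
    versioned = sorted(n for n in object_names if n.startswith(PREFIX + "-"))
    return versioned[-1] if versioned else None
```

A plain string sort only works because the month is zero-padded; a suffix like 2026-6 would break the ordering.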

Bucket

| Field | Value |
| --- | --- |
| Name | ${project_id}-photon-index (currently development-485000-photon-index) |
| Location | asia-east1 |
| Storage class | STANDARD |
| Uniform bucket-level access | On |
| Versioning | Off; filenames are versioned manually |
| Retention policy | Effectively infinite (100 years); set on the bucket itself, so even accidental gcloud storage rm calls are blocked |
| Not Terraform-managed | Deliberately; keeps the bucket out of terraform destroy's reach |

Lifecycle

| Age | Action |
| --- | --- |
| 35 days | Transition to Nearline (cost optimisation) |

No delete lifecycle rule. Planet indexes cost ~$500 of compute each to rebuild, so old versions stay forever. If you ever need to clear cruft manually, do it with a deliberate gcloud storage rm against specific objects, not a lifecycle-driven sweep. Note that the bucket's retention policy blocks such deletes for as long as it is in force.
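As an illustration, the single rule above can be written in the shape Cloud Storage uses for lifecycle rules, together with a guard that fails if a Delete action ever sneaks in. This is a sketch, not the configuration from the bootstrap doc; depending on the tool, the rule list may need to be wrapped in a top-level lifecycle key.

```python
import json

# One rule: move objects to Nearline once they are 35 days old.
# There is deliberately no rule with a "Delete" action.
LIFECYCLE = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 35},
        }
    ]
}


def has_delete_rule(config: dict) -> bool:
    """Return True if any lifecycle rule would delete objects."""
    return any(
        rule.get("action", {}).get("type") == "Delete"
        for rule in config.get("rule", [])
    )


if __name__ == "__main__":
    # Fail loudly if someone adds a delete rule to the config.
    assert not has_delete_rule(LIFECYCLE), "delete rules are not allowed here"
    print(json.dumps(LIFECYCLE, indent=2))
```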

Public read

allUsers is granted roles/storage.objectViewer on the bucket. This is intentional and required because rtuszik/photon-docker fetches over unauthenticated HTTP.
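A quick way to confirm the public binding still works is an unauthenticated HEAD request against an object. The sketch below uses only the standard library; the base URL is the one listed on this page, while the object name passed in is up to the caller (whether any particular tarball exists is an assumption).

```python
import urllib.error
import urllib.request

# Public base URL as listed on this page.
BASE_URL = "https://storage.googleapis.com/development-485000-photon-index"


def anonymous_head(object_name: str) -> urllib.request.Request:
    """Build a HEAD request with no credentials attached."""
    return urllib.request.Request(f"{BASE_URL}/{object_name}", method="HEAD")


def is_publicly_readable(object_name: str, timeout: float = 10.0) -> bool:
    """Return True if the object answers 200 to an anonymous HEAD request.

    HTTP errors (for example 403 or 404) are reported as False; network
    failures are left to propagate so they are not mistaken for "private".
    """
    try:
        with urllib.request.urlopen(anonymous_head(object_name), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

For example, is_publicly_readable("photon-db-planet-multilang-2026-06.tar.bz2") should return True if that tarball has been uploaded and the allUsers binding is intact.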

The org has the iam.allowedPolicyMemberDomains constraint set, which normally blocks allUsers bindings. The bootstrap doc provisions an org-policy override at the project level (iam.allowedPolicyMemberDomains set to allowAll: true, scoped to this project only):

spec:
  inheritFromParent: false
  rules:
    - allowAll: true

Org policy override is project-scoped

The override applies to the entire development-485000 project, not just this bucket. Any other bucket in this project could also be made publicly readable now. We accept that trade-off because the project is single-tenant. If we ever split prod into its own project, do not copy this override blindly — re-evaluate first.

The org policy API itself is enabled by Terraform (google_project_service.orgpolicy_api). Setting disable_on_destroy = false on it is deliberate: disabling the API on destroy would not roll back the override, it would only make the override unmanageable.

Service account

resource "google_service_account" "photon_indexer" {
  account_id   = "photon-indexer"
  display_name = "Photon Indexer"
}

  • Bound to the bucket with roles/storage.objectAdmin so it can both upload new tarballs and overwrite the latest-* aliases.
  • Bound via Workload Identity to the K8s SA data/photon-indexer, ready for the CronJob. Although the CronJob is currently suspended, the binding is already in place, so un-suspending it needs no IAM changes. The Workload Identity member is:

serviceAccount:${project_id}.svc.id.goog[data/photon-indexer]
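The member string follows the fixed pattern serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]. A small sketch that builds and parses it can help when checking a binding by eye; the function names are hypothetical.

```python
import re
from typing import Dict, Optional

_MEMBER_RE = re.compile(
    r"^serviceAccount:(?P<project>[^.\[\]]+)\.svc\.id\.goog"
    r"\[(?P<namespace>[^/\[\]]+)/(?P<ksa>[^/\[\]]+)\]$"
)


def workload_identity_member(project_id: str, namespace: str, ksa: str) -> str:
    """Build the IAM member string for a Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def parse_member(member: str) -> Optional[Dict[str, str]]:
    """Split a member string into project, namespace and KSA name.

    Returns None if the string does not match the expected pattern.
    """
    match = _MEMBER_RE.match(member)
    return match.groupdict() if match else None
```

With the values from this page, workload_identity_member("development-485000", "data", "photon-indexer") yields the member shown above with ${project_id} filled in.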

CronJob: currently SUSPENDED

Don't expect the cluster to be building indexes

The CronJob in k8s/apps/photon-indexer/ is suspended awaiting Nominatim setup. As of this writing, no automated index builds are happening. Until we stand up Nominatim and un-suspend the CronJob, indexes are built and uploaded by hand from a developer's machine using scripts/photon-index-local.sh. That script authenticates as the photon-indexer SA via ADC.

The CronJob was suspended (not deleted) because:

  • The SA, IAM bindings, and bucket are all in place — un-suspending should be a one-line change.
  • Re-creating the CronJob from scratch later carries more risk than leaving it dormant.

See the Photon multilang rollout runbook for how local builds work today, and the Photon K8s deployment page for how Photon pods consume the bucket.

Resource names (deterministic — no lookup needed)

Local scripts and K8s manifests hardcode these directly since they're stable contracts (never renamed):

| Resource | Value |
| --- | --- |
| Bucket | development-485000-photon-index |
| Public base URL | https://storage.googleapis.com/development-485000-photon-index |
| Indexer service account | photon-indexer@development-485000.iam.gserviceaccount.com |

The public base URL is what Photon pods are configured to fetch from.
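The names are deterministic because each one is a fixed function of the project ID. The sketch below spells out that derivation; the class is hypothetical and only illustrates the pattern, since the real scripts and manifests hardcode the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotonIndexResources:
    """Derive the resource names on this page from a GCP project ID."""

    project_id: str

    @property
    def bucket(self) -> str:
        # Matches the ${project_id}-photon-index naming used for the bucket.
        return f"{self.project_id}-photon-index"

    @property
    def public_base_url(self) -> str:
        return f"https://storage.googleapis.com/{self.bucket}"

    @property
    def indexer_service_account(self) -> str:
        return f"photon-indexer@{self.project_id}.iam.gserviceaccount.com"


if __name__ == "__main__":
    resources = PhotonIndexResources("development-485000")
    print(resources.bucket)
    print(resources.public_base_url)
    print(resources.indexer_service_account)
```

Hardcoding remains the simpler choice for a single project; a derivation like this only pays off if a second project ever needs the same layout.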