Bootstrap

One-time setup steps that must run before Terraform can manage anything. They create the resources that have to exist outside Terraform: the state bucket Terraform itself depends on, and the Photon index bucket, which is kept manual deliberately so terraform destroy can't touch it. They also apply the bootstrap Argo CD applications.

Read this before tearing down a cluster

The Photon index bucket is intentionally outside Terraform management. Each multilingual planet index costs roughly $500 worth of compute to rebuild (multiple hours of pipeline + storage). Detaching it from Terraform means terraform destroy can never accidentally delete it. The steps in this doc create those resources by hand once, then leave them alone forever.

What lives outside Terraform

Three permanent resources that Terraform never touches:

Resource | Why it's outside Terraform
gs://development-485000-tfstate | Hosts the Terraform state itself. Classic chicken-and-egg: Terraform can't manage the bucket that holds its own state
gs://development-485000-photon-index | $500 of compute per planet index. Survives any terraform destroy. Re-attachable via terraform import if needed later
photon-indexer@…iam.gserviceaccount.com | SA used by the indexer to write to the bucket. Bound to the K8s SA via Workload Identity; manually maintained alongside the bucket

Everything else (GKE, VPC, Cloud Build, Argo CD, ACM, S3, CloudFront, Cloudflare records) is Terraform-managed.

Prerequisites

# 1. gcloud authenticated for both shell + ADC (Application Default Credentials,
#    used by Terraform + Google client libraries).
gcloud auth login
gcloud config set project development-485000

gcloud auth application-default login
gcloud auth application-default set-quota-project development-485000

# 2. Required gcloud components — installed once per machine
gcloud components install gke-gcloud-auth-plugin
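
The later steps also call terraform and kubectl, so checking that every CLI is on PATH up front can save a run that fails halfway through. A minimal sketch, assuming a POSIX shell; missing_tools is a hypothetical helper name, and the tool list in the usage comment is inferred from the commands used in this doc:

```shell
# Print the name of every required CLI that is not on PATH.
# Prints nothing when all of them are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   missing_tools gcloud gke-gcloud-auth-plugin terraform kubectl
```

An empty result means the machine has everything the remaining steps invoke.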

Step 1 — Terraform state bucket

Run once per project. Creates the GCS bucket that holds Terraform state for both infrastructure/gcp/ and infrastructure/aws/.

gcloud storage buckets create gs://development-485000-tfstate \
  --project=development-485000 \
  --location=asia-east1 \
  --uniform-bucket-level-access \
  --public-access-prevention

# Object versioning so a bad apply doesn't lose state
gcloud storage buckets update gs://development-485000-tfstate --versioning

# Verify
gcloud storage buckets describe gs://development-485000-tfstate \
  --format="value(name,location,versioning.enabled)"

State is stored under two prefixes — terraform/state/ for the GCP stack and aws/state/ for the AWS stack. Splitting by prefix keeps state files separate; sharing the bucket means one set of permissions and one place to look for history.
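
To make the layout concrete: the GCS backend stores each workspace as an object named <prefix>/<workspace>.tfstate. The sketch below prints where each stack's state should land, assuming the two prefixes above match what backend.tf configures; state_object_url is a hypothetical helper, not part of the repo.

```shell
# Print the gs:// URL of the state object for a stack and workspace.
# Prefixes are taken from the paragraph above; the workspace defaults to "default".
state_object_url() {
  stack="$1"
  workspace="${2:-default}"
  case "$stack" in
    gcp) prefix="terraform/state" ;;
    aws) prefix="aws/state" ;;
    *)   echo "unknown stack: $stack" >&2; return 1 ;;
  esac
  printf 'gs://development-485000-tfstate/%s/%s.tfstate\n' "$prefix" "$workspace"
}

# Usage (after the first apply):
#   gcloud storage ls "$(state_object_url gcp)"
```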

Step 2 — Photon index bucket + service account

Run once per project. Creates the bucket where Photon multilingual indexes are uploaded and the service account the indexer uses to write to it.

PROJECT_ID="development-485000"
BUCKET="${PROJECT_ID}-photon-index"

# 2a. The bucket itself. STANDARD class with NEARLINE transition at 35 days
#     for cost savings. NO DELETE LIFECYCLE — planet indexes cost ~$500 of
#     compute each to rebuild. Old indexes stay forever; manual cleanup only.
#     Public read so the rtuszik/photon-docker image (which has no GCS auth)
#     can download index tarballs.
gcloud storage buckets create "gs://${BUCKET}" \
  --project="${PROJECT_ID}" \
  --location=asia-east1 \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access \
  --lifecycle-file=<(cat <<'EOF'
{
  "lifecycle": {
    "rule": [
      { "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 35} }
    ]
  }
}
EOF
)

# 2a-bis. Belt-and-suspenders: turn on bucket retention policy so even an
# accidental `gcloud storage rm` can't blow indexes away. Set effectively
# infinite (100 years) — adjust later if you ever genuinely want to delete.
# NOTE: while the policy is unlocked it can still be shortened or removed
# (`gcloud storage buckets update --clear-retention-period`), so it guards
# against accidental rm and rapid lifecycle reconfiguration, not against a
# deliberate change.
gcloud storage buckets update "gs://${BUCKET}" --retention-period=3155760000s

# 2b. Relax the iam.allowedPolicyMemberDomains org policy on this project so
#     the bucket can grant allUsers read. Scoped to the project — does not
#     loosen the org-level constraint anywhere else.
gcloud services enable orgpolicy.googleapis.com --project="${PROJECT_ID}"

gcloud org-policies set-policy --project="${PROJECT_ID}" /dev/stdin <<EOF
name: projects/${PROJECT_ID}/policies/iam.allowedPolicyMemberDomains
spec:
  inheritFromParent: false
  rules:
    - allowAll: true
EOF

# 2c. Public read on the bucket
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member="allUsers" --role="roles/storage.objectViewer"

# 2d. Service account for the indexer
gcloud iam service-accounts create photon-indexer \
  --display-name="Photon Indexer" \
  --description="Builds and uploads Photon multilingual indexes to GCS" \
  --project="${PROJECT_ID}"

# 2e. SA can write to the bucket
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member="serviceAccount:photon-indexer@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# 2f. Workload Identity binding so the GKE Job pod can impersonate the SA.
#     Namespace + KSA name must match the K8s manifests in k8s/.../photon-indexer/.
gcloud iam service-accounts add-iam-policy-binding \
  "photon-indexer@${PROJECT_ID}.iam.gserviceaccount.com" \
  --project="${PROJECT_ID}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[platform/photon-indexer]"

Bucket name (${PROJECT_ID}-photon-index) is the contract — backend code, K8s manifests, and the local index-build script all hardcode it. Don't rename.
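
Because every identifier in Step 2 derives from PROJECT_ID, it can help to print the derived values and compare them with the manifests before running anything. A minimal sketch, assuming a shell with 64-bit arithmetic; photon_bootstrap_values is a hypothetical helper. It also reproduces the retention figure: 100 years of 365.25 days, expressed in seconds.

```shell
# Print the identifiers Step 2 derives from a project ID, one key=value per line.
photon_bootstrap_values() {
  project_id="$1"
  echo "bucket=gs://${project_id}-photon-index"
  echo "service_account=photon-indexer@${project_id}.iam.gserviceaccount.com"
  echo "wi_member=serviceAccount:${project_id}.svc.id.goog[platform/photon-indexer]"
  # 365.25 days * 86400 s/day, written as 36525 * 864 to stay in integers.
  echo "retention_seconds=$(( 100 * 36525 * 864 ))"
}

photon_bootstrap_values development-485000
```

The retention_seconds line comes out as 3155760000, the value passed to --retention-period above.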

Step 3 — terraform init against the GCS backend

After Step 1, Terraform can initialise its remote state. Run in each Terraform directory:

cd infrastructure/gcp && terraform init
cd ../aws && terraform init

Both directories declare a backend "gcs" block in backend.tf; terraform init reads it, talks to GCS, and refuses to operate against local state. If you ever see Backend reinitialization required, run terraform init -reconfigure.
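
If you initialise both stacks often, the two commands can be wrapped in a loop. This is only a sketch: tf_init_all and the TF_BIN override are hypothetical, and it assumes the infrastructure/gcp and infrastructure/aws layout described above.

```shell
# Run `terraform init` in each stack directory under the given repo root.
# TF_BIN lets you substitute another binary (or `echo` for a dry run).
tf_init_all() {
  root="${1:-.}"
  for stack in gcp aws; do
    # The subshell keeps the caller's working directory unchanged.
    ( cd "${root}/infrastructure/${stack}" &&
      "${TF_BIN:-terraform}" init -input=false ) || return 1
  done
}

# Usage, from the repo root:
#   tf_init_all .
```

The -input=false flag makes init fail rather than prompt, which is usually what you want in a scripted run.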

Step 4 — Terraform apply (cluster + supporting resources)

# GCP — GKE cluster, VPC, Argo CD, Cloud Build, Artifact Registry, IAM, ...
cd infrastructure/gcp
terraform plan -out=plan.out
terraform apply plan.out

# AWS — S3 asset bucket, CloudFront distribution, ACM cert (Cloudflare-validated)
cd ../aws
terraform workspace select default
terraform plan -out=plan.out
terraform apply plan.out

Both should report 0 destroys when run against a fresh state (because Step 1 + Step 2 left the existing buckets out of Terraform's view).
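
One way to check the "0 destroys" expectation without reading the whole plan is to pull the number out of Terraform's summary line. A minimal sketch, assuming the usual "Plan: N to add, N to change, N to destroy." summary; plan_destroy_count is a hypothetical helper.

```shell
# Read plan text on stdin and print the number of planned destroys.
# Prints nothing if there is no "Plan:" summary line (e.g. "No changes.").
plan_destroy_count() {
  sed -n 's/^Plan:.* \([0-9][0-9]*\) to destroy\..*$/\1/p'
}

# Usage (plan.out is the file written by `terraform plan -out=plan.out`):
#   terraform show -no-color plan.out | plan_destroy_count
```

Anything other than 0 or an empty result is worth investigating before running terraform apply.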

Step 5 — GitHub App for ARC self-hosted runners

Optional unless you want the self-hosted ARC runner pool in k8s/envs/platform/arc-*/ to come up. Skip this step if you're fine relying on GitHub-hosted runners (or until you exhaust the GitHub free-tier minutes).

The runner pool authenticates to GitHub as a GitHub App, not a personal access token. Apps are scoped, rotatable, and don't tie CI to a single human's GitHub account.

  1. Go to https://github.com/organizations/tomoda-labs/settings/apps → New GitHub App.
  2. Fill in:
    • GitHub App name: Tomoda ARC Runners
    • Homepage URL: https://github.com/tomoda-labs/devops
    • Webhook URL: disable webhooks (uncheck "Active")
  3. Permissions — Repository permissions:
    • Actions: Read and write
    • Administration: Read and write
    • Checks: Read
    • Metadata: Read (auto-selected)
  4. Where can this GitHub App be installed? → Only on this account.
  5. Click Create GitHub App.
  6. On the next page, note the App ID (top of the page, ~6 digits).
  7. Click Generate a private key → downloads tomoda-arc-runners.<date>.private-key.pem. Treat this file like a password.
  8. Left sidebar → Install App → install on the tomoda-labs org → choose "All repositories" (or selected repos for tighter scope).
  9. After install, the URL is https://github.com/organizations/tomoda-labs/settings/installations/<installation-id>. Note the Installation ID (the numeric path segment).
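
Step 9 can be done by copying the installation URL from the browser and extracting the last path segment. A minimal sketch using only shell parameter expansion; installation_id_from_url is a hypothetical helper, and the ID in the usage comment is made up.

```shell
# Print the last path segment of an installation URL if it is purely numeric.
installation_id_from_url() {
  id="${1%/}"      # drop one trailing slash, if present
  id="${id##*/}"   # keep everything after the last slash
  case "$id" in
    ''|*[!0-9]*) echo "no numeric installation ID in: $1" >&2; return 1 ;;
    *)           printf '%s\n' "$id" ;;
  esac
}

# Usage (12345678 is a placeholder, not a real installation):
#   installation_id_from_url https://github.com/organizations/tomoda-labs/settings/installations/12345678
```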

Push all three values (App ID, Installation ID, private key) to GCP Secret Manager (GCP SM):

PROJECT_ID=development-485000

# App ID (numeric, not sensitive)
echo -n "<app-id-from-step-6>" | gcloud secrets create tomoda-github-app-id \
  --project="${PROJECT_ID}" --replication-policy=automatic --data-file=-

# Installation ID (numeric, not sensitive)
echo -n "<installation-id-from-step-9>" | gcloud secrets create tomoda-github-app-installation-id \
  --project="${PROJECT_ID}" --replication-policy=automatic --data-file=-

# Private key — the PEM file from step 7. Send the WHOLE FILE including the
# BEGIN/END markers.
gcloud secrets create tomoda-github-app-private-key \
  --project="${PROJECT_ID}" --replication-policy=automatic \
  --data-file=tomoda-arc-runners.<date>.private-key.pem

# Verify
gcloud secrets list --project="${PROJECT_ID}" --filter="name~tomoda-github-app"
# Expected: tomoda-github-app-id, tomoda-github-app-installation-id, tomoda-github-app-private-key
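
Since the private key must be uploaded whole, it may be worth confirming that the PEM file still has both markers before running the upload. A minimal sketch; pem_looks_complete is a hypothetical helper, and it checks only for the BEGIN/END lines, not that the key itself is valid.

```shell
# Succeed only if the file contains both a BEGIN and an END private-key marker.
pem_looks_complete() {
  grep -q -e '^-----BEGIN .*PRIVATE KEY-----' "$1" &&
  grep -q -e '^-----END .*PRIVATE KEY-----' "$1"
}

# Usage:
#   pem_looks_complete tomoda-arc-runners.<date>.private-key.pem || echo "PEM looks truncated"
```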

After uploading, delete the local PEM file — the only authoritative copy now lives in GCP SM:

shred -u tomoda-arc-runners.<date>.private-key.pem   # Linux
rm -P tomoda-arc-runners.<date>.private-key.pem      # macOS

If you rotate the App's private key in the GitHub UI later, push the new PEM via gcloud secrets versions add tomoda-github-app-private-key --data-file=…, then restart the ARC controller and listener pods so they pick up the new key without waiting for the 1h refresh of ESO (External Secrets Operator):

kubectl rollout restart deploy/arc-controller-gha-rs-controller -n arc-systems
kubectl delete pod -n arc-runners -l app.kubernetes.io/component=runner-set-listener

Step 6 — Apply the Argo CD bootstrap

Once Argo CD is running (created by infrastructure/gcp/argocd.tf), point it at this repo:

# Use the new GKE context
gcloud container clusters get-credentials gke-tomoda \
  --zone asia-east1-a --project development-485000

# App-of-apps root. Creates the platform/dev/prod Argo Applications that
# recurse through k8s/envs/ and bring up the rest of the cluster.
kubectl apply -f k8s/envs/bootstrap.yaml

Reconciliation takes ~5-10 minutes — kubectl get applications -n argocd -w shows progress.

To skip a tier (e.g. you only want platform + prod, not dev), after the bootstrap apply: kubectl delete application dev -n argocd.

Verification checklist

After all the steps above:

  • gcloud storage buckets list --filter="name~tfstate" shows development-485000-tfstate
  • gcloud storage buckets list --filter="name~photon-index" shows development-485000-photon-index
  • gcloud iam service-accounts list --filter="email~photon-indexer" shows the SA
  • terraform state list (in each dir) shows resources — no Photon entries
  • kubectl get applications -n argocd shows the bootstrap apps in Synced / Healthy
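
The last check can be scripted by filtering the kubectl output for anything that is not yet Synced and Healthy. A minimal sketch, assuming the default three-column output (name, sync status, health status) of kubectl get applications; unready_apps is a hypothetical helper.

```shell
# Read `kubectl get applications --no-headers` rows on stdin and print the
# name of every application that is not both Synced and Healthy.
unready_apps() {
  awk '$2 != "Synced" || $3 != "Healthy" { print $1 }'
}

# Usage:
#   kubectl get applications -n argocd --no-headers | unready_apps
```

An empty result means every bootstrap app has reconciled.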