Bootstrap

One-time setup steps that must run before Terraform can manage anything. They create the resources that have to exist outside Terraform (the state bucket it depends on and the Photon index bucket, both kept manual deliberately so terraform destroy can't touch them) and then apply the bootstrap Argo CD applications.

Read this before tearing down a cluster

The Photon index bucket is intentionally outside Terraform management. Each multilingual planet index costs roughly $500 worth of compute to rebuild (multiple hours of pipeline + storage). Detaching it from Terraform means terraform destroy can never accidentally delete it. The steps in this doc create those resources by hand once, then leave them alone forever.

What lives outside Terraform

Three permanent resources that Terraform never touches:

Resource	Why it's outside Terraform
`gs://development-485000-tfstate`	Hosts the Terraform state itself. Classic chicken-and-egg — Terraform can't manage the bucket that holds its own state
`gs://development-485000-photon-index-usc1`	$500 of compute per planet index. Survives any `terraform destroy`. Re-attachable via `terraform import` if needed later
`photon-indexer@…iam.gserviceaccount.com`	SA used by the indexer to write to the bucket. Bound to the K8s SA via Workload Identity — manually maintained alongside the bucket

Everything else (GKE, VPC, Cloud Build, Argo CD, ACM, S3, CloudFront, Cloudflare records) is Terraform-managed.

Prerequisites

# 1. gcloud authenticated for both shell + ADC (Application Default Credentials,
#    used by Terraform + Google client libraries).
gcloud auth login
gcloud config set project development-485000

gcloud auth application-default login
gcloud auth application-default set-quota-project development-485000

# 2. Required gcloud components — installed once per machine
gcloud components install gke-gcloud-auth-plugin
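Before starting Step 1 it can help to confirm that every CLI used in this doc is actually on PATH. The following is a minimal sketch, not part of the repo: missing_tools is a hypothetical helper name, and the tool list is an assumption based on the commands that appear in the steps below.

```shell
# Print the name of each given CLI that cannot be found on PATH.
# Prints nothing when everything is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: list anything that still needs installing (tool list is an assumption).
missing_tools gcloud terraform kubectl gke-gcloud-auth-plugin
```

An empty result means the prerequisites are in place; any name printed needs installing first.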

Step 1: Terraform state bucket

Run once per project. Creates the GCS bucket that holds Terraform state for both infrastructure/gcp/ and infrastructure/aws/.

gcloud storage buckets create gs://development-485000-tfstate \
  --project=development-485000 \
  --location=us-central1 \
  --uniform-bucket-level-access \
  --public-access-prevention

# Object versioning so a bad apply doesn't lose state
gcloud storage buckets update gs://development-485000-tfstate --versioning

# Verify
gcloud storage buckets describe gs://development-485000-tfstate \
  --format="value(name,location,versioning_enabled)"

State is stored under two prefixes — terraform/state/ for the GCP stack and aws/state/ for the AWS stack. Splitting by prefix keeps state files separate; sharing the bucket means one set of permissions and one place to look for history.
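As a small sketch of that layout, the helper below maps each stack to its state prefix so the state objects, and their older versions, can be listed in one place. The names state_prefix_url and TFSTATE_BUCKET are illustrative and not part of the repo; the bucket name and the two prefixes are the ones given above.

```shell
TFSTATE_BUCKET="development-485000-tfstate"

# Map a stack name (gcp or aws) to the GCS URL of its state prefix.
state_prefix_url() {
  case "$1" in
    gcp) printf 'gs://%s/terraform/state/\n' "$TFSTATE_BUCKET" ;;
    aws) printf 'gs://%s/aws/state/\n' "$TFSTATE_BUCKET" ;;
    *)   printf 'unknown stack: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Usage (needs gcloud and read access to the bucket):
#   gcloud storage ls --all-versions "$(state_prefix_url gcp)"
#   gcloud storage ls --all-versions "$(state_prefix_url aws)"
```

Because object versioning is enabled on the bucket, `--all-versions` should also show noncurrent generations of each state file, which is where to look after a bad apply.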

Step 2: Photon index bucket + service account

Run once per project. Creates the bucket where Photon multilingual indexes are uploaded and the service account the indexer uses to write to it.

PROJECT_ID="development-485000"
BUCKET="${PROJECT_ID}-photon-index-usc1"

# 2a. The bucket itself. STANDARD class with NEARLINE transition at 35 days
#     for cost savings. NO DELETE LIFECYCLE — planet indexes cost ~$500 of
#     compute each to rebuild. Old indexes stay forever; manual cleanup only.
#     Public read so the rtuszik/photon-docker image (which has no GCS auth)
#     can download index tarballs.
gcloud storage buckets create "gs://${BUCKET}" \
  --project="${PROJECT_ID}" \
  --location=us-central1 \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access \
  --lifecycle-file=<(cat <<'EOF'
{
  "lifecycle": {
    "rule": [
      { "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 35} }
    ]
  }
}
EOF
)

# 2a-bis. Belt-and-suspenders: set a bucket retention policy so that even an
# accidental `gcloud storage rm` can't delete indexes. The period is 100 years
# (effectively infinite); adjust later if you ever genuinely want to delete.
# NOTE: the policy is not locked, so it can still be removed deliberately with
# `gcloud storage buckets update --clear-retention-period`. Until then, objects
# can't be deleted or overwritten, whether by hand or by a lifecycle rule.
gcloud storage buckets update "gs://${BUCKET}" --retention-period=3155760000s

# 2b. Relax the iam.allowedPolicyMemberDomains org policy on this project so
#     the bucket can grant allUsers read. Scoped to the project — does not
#     loosen the org-level constraint anywhere else.
gcloud services enable orgpolicy.googleapis.com --project="${PROJECT_ID}"

gcloud org-policies set-policy --project="${PROJECT_ID}" /dev/stdin <<EOF
name: projects/${PROJECT_ID}/policies/iam.allowedPolicyMemberDomains
spec:
  inheritFromParent: false
  rules:
    - allowAll: true
EOF

# 2c. Public read on the bucket
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member="allUsers" --role="roles/storage.objectViewer"

# 2d. Service account for the indexer
gcloud iam service-accounts create photon-indexer \
  --display-name="Photon Indexer" \
  --description="Builds and uploads Photon multilingual indexes to GCS" \
  --project="${PROJECT_ID}"

# 2e. SA can write to the bucket
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member="serviceAccount:photon-indexer@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# 2f. Workload Identity binding so the GKE Job pod can impersonate the SA.
#     Namespace + KSA name must match the K8s manifests in k8s/.../photon-indexer/.
gcloud iam service-accounts add-iam-policy-binding \
  "photon-indexer@${PROJECT_ID}.iam.gserviceaccount.com" \
  --project="${PROJECT_ID}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[platform/photon-indexer]"
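The 3155760000s passed to --retention-period in 2a-bis is 100 years expressed in seconds, counting a year as 365.25 days. A quick sketch to reproduce the figure, or to derive a shorter period if that is ever wanted (years_to_seconds is a hypothetical helper name):

```shell
# Convert whole years to seconds using 365.25-day years
# (36525 days per century, 86400 seconds per day).
years_to_seconds() {
  echo $(( $1 * 36525 * 86400 / 100 ))
}

years_to_seconds 100   # prints 3155760000
```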

Bucket name (${PROJECT_ID}-photon-index-usc1) is the contract — backend code, K8s manifests, and the local index-build script all hardcode it. Don't rename.
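To see which files depend on that name before touching anything, a recursive search from the repo root is usually enough. This is a sketch under assumptions: find_bucket_refs is a hypothetical helper, and it searches for the name suffix only, so it would miss references that build the name from variables.

```shell
# List files under a directory (default: current directory) that mention the
# bucket-name suffix, skipping the .git directory.
find_bucket_refs() {
  grep -rl --exclude-dir=.git -- 'photon-index-usc1' "${1:-.}" | sort
}

# Usage, from the repo root:
#   find_bucket_refs .
```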

Step 3: `terraform init` against the GCS backend

After Step 1, Terraform can initialise its remote state. Run in each Terraform directory:

cd infrastructure/gcp && terraform init
cd ../aws && terraform init

Both directories have a backend "gcs" block in backend.tf — terraform init reads it, talks to GCS, and refuses to operate against local state. If you ever see Backend reinitialization required, run terraform init -reconfigure.

Step 4: Terraform apply (cluster + supporting resources)

# GCP — GKE cluster, VPC, Argo CD, Cloud Build, Artifact Registry, IAM, ...
cd infrastructure/gcp
terraform plan -out=plan.out
terraform apply plan.out

# AWS — S3 asset bucket, CloudFront distribution, ACM cert (Cloudflare-validated)
cd ../aws
terraform workspace select default
terraform plan -out=plan.out
terraform apply plan.out

Both plans should end with "0 to destroy" when run against a fresh state, because Steps 1 and 2 keep the hand-made buckets out of Terraform's view.
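The destroy count can also be checked mechanically rather than by eye. The sketch below pulls it out of the "Plan: ..." summary line in the human-readable rendering of a saved plan; plan_destroy_count is a hypothetical helper name, and it assumes the summary line keeps its usual "N to destroy" wording.

```shell
# Read `terraform show` output on stdin and print the number of resources the
# plan would destroy, taken from the "Plan: ..." summary line.
# Prints nothing if no summary line is present (for example, no changes).
plan_destroy_count() {
  sed -n 's/^Plan:.* \([0-9][0-9]*\) to destroy.*/\1/p'
}

# Usage, inside infrastructure/gcp or infrastructure/aws after
# `terraform plan -out=plan.out`:
#   terraform show -no-color plan.out | plan_destroy_count
```

Anything other than 0 is worth reading in full before running terraform apply.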

Step 5: GitHub App for repo access + ARC runners

Required. This single GitHub App serves two purposes: it is the credential Argo CD uses to clone this private repo (Step 6), and it authenticates the self-hosted ARC runner pool in k8s/envs/platform/arc-*/. Apps are scoped, rotatable, and don't tie access to a single human's GitHub account.

If you only want Argo (not ARC runners), you still need this App — Argo can't sync without repo read access.

1. Go to https://github.com/organizations/tomoda-labs/settings/apps → New GitHub App.
2. Fill in:
   - GitHub App name: Tomoda ARC Runners
   - Homepage URL: https://github.com/tomoda-labs/devops
   - Webhook URL: disable webhooks (uncheck "Active")
3. Permissions → Repository permissions:
   - Contents: Read-only (Argo CD clones the repo with this)
   - Actions: Read and write
   - Administration: Read and write
   - Checks: Read
   - Metadata: Read (auto-selected)
4. Where can this GitHub App be installed? → Only on this account.
5. Click Create GitHub App.
6. On the next page, note the App ID (top of the page, ~6 digits).
7. Click Generate a private key → downloads tomoda-arc-runners.<date>.private-key.pem. Treat this file like a password.
8. Left sidebar → Install App → install on the tomoda-labs org → choose "All repositories", or "Only select repositories" and include devops (Argo can't clone a repo outside the installation's scope).
9. After install, the URL is https://github.com/organizations/tomoda-labs/settings/installations/<installation-id>; note the Installation ID (the numeric path segment).

Push all three values (App ID, Installation ID, private key) to GCP Secret Manager (GCP SM):

PROJECT_ID=development-485000

# App ID from item 6 of the list above (numeric, not sensitive). printf is used
# instead of `echo -n` because it never adds a trailing newline, in any shell.
printf '%s' "<app-id-from-step-6>" | gcloud secrets create tomoda-github-app-id \
  --project="${PROJECT_ID}" --replication-policy=automatic --data-file=-

# Installation ID from item 9 (numeric, not sensitive)
printf '%s' "<installation-id-from-step-9>" | gcloud secrets create tomoda-github-app-installation-id \
  --project="${PROJECT_ID}" --replication-policy=automatic --data-file=-

# Private key: the PEM file from item 7. Send the WHOLE FILE including the
# BEGIN/END markers.
gcloud secrets create tomoda-github-app-private-key \
  --project="${PROJECT_ID}" --replication-policy=automatic \
  --data-file=tomoda-arc-runners.<date>.private-key.pem

# Verify
gcloud secrets list --project="${PROJECT_ID}" --filter="name~tomoda-github-app"
# Expected: tomoda-github-app-id, tomoda-github-app-installation-id, tomoda-github-app-private-key
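The same check can be scripted so that it names whichever secret is absent instead of relying on a visual comparison. A minimal sketch, assuming the three secret names above; missing_secrets is a hypothetical helper, and it strips any resource-path prefix because the listing may return full paths rather than bare names.

```shell
# Read secret names on stdin (bare names or full resource paths) and print
# each expected GitHub App secret that is not among them.
missing_secrets() {
  present="$(sed 's|.*/||')"
  for name in tomoda-github-app-id \
              tomoda-github-app-installation-id \
              tomoda-github-app-private-key; do
    printf '%s\n' "$present" | grep -qx -- "$name" || printf '%s\n' "$name"
  done
}

# Usage:
#   gcloud secrets list --project="${PROJECT_ID}" --format="value(name)" | missing_secrets
```

No output means all three secrets exist.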

After uploading, delete the local PEM file — the only authoritative copy now lives in GCP SM:

shred -u tomoda-arc-runners.<date>.private-key.pem   # Linux
rm -P tomoda-arc-runners.<date>.private-key.pem      # macOS (on recent releases -P is a no-op, so this is a plain rm)

If you rotate the App's private key in the GitHub UI later, push the new PEM via gcloud secrets versions add tomoda-github-app-private-key --data-file=…, then restart the ARC controller and listener pods so they pick up the new key without waiting for the External Secrets Operator (ESO) 1h refresh:

kubectl rollout restart deploy/arc-controller-gha-rs-controller -n arc-systems
kubectl delete pod -n arc-runners -l app.kubernetes.io/component=runner-set-listener

Step 6: Apply the Argo CD bootstrap

Once Argo CD is running (created by infrastructure/gcp/argocd.tf), give it a credential for this private repo, then apply the app-of-apps.

# Use the new GKE context. If kubectl errors with "gke-gcloud-auth-plugin not
# found", install it (`gcloud components install gke-gcloud-auth-plugin`) and
# make sure it's on PATH.
gcloud container clusters get-credentials gke-tomoda \
  --zone us-central1-a --project development-485000

Repo credential (seed secret). Argo can't clone the private repo until it has a credential, and that credential can't come from the manifests Argo can't read yet — so create it imperatively from the GitHub App keys already in GCP SM (Step 5). This is the one bootstrap secret that lives outside ESO; the App private key is streamed straight from Secret Manager and never written to disk.

PROJ=development-485000
kubectl create secret generic repo-devops -n argocd \
  --from-literal=type=git \
  --from-literal=url=https://github.com/tomoda-labs/devops.git \
  --from-literal=githubAppID="$(gcloud secrets versions access latest --secret=tomoda-github-app-id --project=$PROJ)" \
  --from-literal=githubAppInstallationID="$(gcloud secrets versions access latest --secret=tomoda-github-app-installation-id --project=$PROJ)" \
  --from-file=githubAppPrivateKey=<(gcloud secrets versions access latest --secret=tomoda-github-app-private-key --project=$PROJ)
kubectl label secret repo-devops -n argocd argocd.argoproj.io/secret-type=repository --overwrite

# App-of-apps root. Creates the platform/dev/prod Argo Applications that
# recurse through k8s/envs/ and bring up the rest of the cluster.
kubectl apply -f k8s/envs/bootstrap.yaml

Reconciliation takes ~5-10 minutes — kubectl get applications -n argocd -w shows progress.
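For unattended runs, the watch can be replaced by a loop that exits once every Application has settled. This is a sketch, not part of the repo: all_synced_healthy is a hypothetical helper that expects rows of the form "NAME SYNC HEALTH", and the 15-second interval is arbitrary.

```shell
# Read "NAME SYNC HEALTH" rows on stdin. Succeed only if there is at least one
# row and every row is both Synced and Healthy.
all_synced_healthy() {
  awk '$2 != "Synced" || $3 != "Healthy" { bad = 1 }
       END { exit (NR == 0 || bad) }'
}

# Usage: poll until everything has settled.
#   until kubectl get applications -n argocd --no-headers \
#       -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#       | all_synced_healthy; do sleep 15; done
```

An empty listing counts as "not ready", so the loop keeps waiting if the Applications have not been created yet.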

To bring up only some tiers, either apply bootstrap.yaml and delete the tiers you don't want (kubectl delete application dev -n argocd), or apply only the Application blocks you do want. Prod is intentionally left for the Production Launch runbook — don't apply the prod Application during beta.

Verification checklist

After all the steps above:

- gcloud storage buckets list --filter="name~tfstate" shows development-485000-tfstate
- gcloud storage buckets list --filter="name~photon-index-usc1" shows development-485000-photon-index-usc1
- gcloud iam service-accounts list --filter="email~photon-indexer" shows the SA
- terraform state list (in each dir) shows resources, with no Photon entries
- kubectl get applications -n argocd shows the bootstrap apps in Synced / Healthy

Environments — what dev vs prod means in this single-cluster setup
Photon Indexer — how the index pipeline runs
Secrets Management — the GCP SM + AWS SM bridge