ARC Self-Hosted Runners

GitHub Actions runners that live on the GKE cluster instead of on GitHub-hosted machines. Jobs on self-hosted runners don't count against the free plan's Actions-minutes quota, and CI gets the same cluster and network access as the rest of the workloads.

Powered by Actions Runner Controller (ARC) — the GitHub-blessed K8s operator. Runner pods are ephemeral: spawn on a queued job, register with GitHub, run the job, terminate. Scales to 0 between jobs.

Architecture

GitHub Actions queue
        │
        │  poll for jobs
        ▼
┌─────────────────────────────┐
│ ARC controller              │   (arc-systems namespace)
│ gha-runner-scale-set-       │   - Always-on, ~50m CPU / 64Mi RAM
│ controller                  │   - Leader-elected operator
└─────────────────────────────┘
        │  manages
        ▼
┌─────────────────────────────┐
│ Runner Scale Set listener   │   (arc-runners namespace)
│ Scale set: tomoda-arc       │   - Polls GitHub API
└─────────────────────────────┘
        │  spawns one pod per queued job, up to maxRunners
        ▼
┌─────────────────────────────┐
│ Runner pods (ephemeral)     │   (arc-runners namespace)
│ runs-on: tomoda-arc         │   - 200m CPU / 512Mi requested, 1500m CPU / 2Gi limit
│ ghcr.io/actions/actions-    │   - Lives only while a job is running
│ runner:latest               │   - Tears down between jobs
└─────────────────────────────┘

Cost (asia-east1 spot pricing)

Component                               | Idle cost                      | Active cost
----------------------------------------|--------------------------------|------------------------------------------------
ARC controller                          | ~$0.50/mo (always-on tiny pod) | same; does no real work between jobs
Listener pod                            | bundled in controller cost     | same
Runner pods                             | $0 (terminated between jobs)   | ~$0.002/min of e2-medium spot per concurrent job
Extra node capacity (autoscale-up when  | $0 most of the time            | ~$0.004/min of e2-medium when peaking
  concurrent jobs exceed current nodes) |                                |
Realistic monthly total                 | ~$0.50                         | ~$2-5/mo for 5-10 PRs/day

Math: 5 PRs/day × 4 workflows × 8 min ≈ 160 runner-min/day; over a 30-day month that's 4,800 runner-min ≈ 80 hrs of compute. At spot pricing in asia-east1, that's $1-2 of actual job time on top of the $0.50 controller baseline.
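The same arithmetic as a throwaway shell helper, handy for redoing the estimate when the PR rate changes. Both function names are hypothetical (nothing in the repo), and the per-minute rate is deliberately a parameter: plug in the spot rate you are actually seeing rather than trusting a hard-coded figure.

```shell
#!/bin/sh
# Hypothetical back-of-envelope helpers for the ARC cost estimate.

# runner_minutes_per_month PRS_PER_DAY WORKFLOWS_PER_PR MINUTES_PER_WORKFLOW [DAYS]
# Integer runner-minutes per month; DAYS defaults to 30.
runner_minutes_per_month() {
  days=${4:-30}
  echo $(( $1 * $2 * $3 * $days ))
}

# monthly_cost RUNNER_MINUTES RATE_PER_MINUTE BASELINE_PER_MONTH
# Dollars per month = minutes * rate + baseline; awk does the float math.
monthly_cost() {
  awk -v m="$1" -v r="$2" -v b="$3" 'BEGIN { printf "%.2f\n", m * r + b }'
}

# Example: runner_minutes_per_month 5 4 8    -> 4800 (i.e. 80 h/mo)
#          monthly_cost 4800 "$RATE" 0.50    with RATE = your current per-minute rate
```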

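The node pool already has room (high_mem_spot_nodes.max_node_count: 6 after PR #14). ARC runners get the same spot nodes as everything else, and preemption mid-job is acceptable: GitHub Actions doesn't retry the job on its own, but the failed job can be re-run from the run page (Re-run failed jobs) or with gh run rerun <run-id> --failed.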

Capping behaviour

  • minRunners: 0 in k8s/envs/platform/arc-runners/values.yaml — scales to zero between jobs
  • maxRunners: 3 — no more than 3 concurrent jobs. Two reasons:
    1. Bounds the worst-case compute spend
    2. Prevents starving prod (or dev tomoda) of node capacity under a CI storm
  • Per-pod limits: 1500m CPU / 2 Gi RAM — heavy Go builds + Node lint fit; native iOS / Android builds would need a larger pool
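To have CI or a pre-commit hook catch someone quietly raising the caps, a small guard over values.yaml is enough. This is a hypothetical sketch, not an existing script: it assumes minRunners and maxRunners appear as top-level `key: number` lines in k8s/envs/platform/arc-runners/values.yaml (where the gha-runner-scale-set chart expects them), and it does no real YAML parsing.

```shell
#!/bin/sh
# Hypothetical guard: fail if the ARC scale-set caps drift.
# Assumes top-level `minRunners: N` / `maxRunners: N` lines; not a YAML parser.

# check_runner_caps VALUES_FILE MAX_ALLOWED
check_runner_caps() {
  file=$1
  limit=$2
  # Default awk field splitting turns "maxRunners: 3  # note" into $2 == "3".
  min=$(awk '/^minRunners:/ { print $2; exit }' "$file")
  max=$(awk '/^maxRunners:/ { print $2; exit }' "$file")
  if [ -z "$min" ] || [ -z "$max" ]; then
    echo "minRunners/maxRunners not found in $file" >&2
    return 2
  fi
  if [ "$min" -ne 0 ]; then
    echo "minRunners=$min, expected 0 (scale to zero between jobs)" >&2
    return 1
  fi
  if [ "$max" -gt "$limit" ]; then
    echo "maxRunners=$max exceeds the agreed cap of $limit" >&2
    return 1
  fi
  echo "ok: minRunners=$min maxRunners=$max"
}

# Usage: check_runner_caps k8s/envs/platform/arc-runners/values.yaml 3
```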

Opting a workflow in

In any workflow YAML, replace the runs-on: value:

# Before
jobs:
  build:
    runs-on: ubuntu-latest

# After
jobs:
  build:
    runs-on: tomoda-arc

That's it. The next push that triggers the workflow queues a job; the listener picks it up within ~10s and ARC spawns a runner pod, which registers as tomoda-arc, runs the job, and terminates.
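For repos with many workflows, the swap can be scripted. A minimal sketch with a hypothetical helper name: it rewrites only plain `runs-on: ubuntu-latest` lines and leaves everything else untouched, including macos-latest jobs. Array or expression forms of runs-on: are deliberately not matched, so review those by hand.

```shell
#!/bin/sh
# Hypothetical helper: point ubuntu-latest jobs at the ARC scale set.

# migrate_runs_on WORKFLOW_FILE [LABEL]
migrate_runs_on() {
  f=$1
  label=${2:-tomoda-arc}
  # Group 1 keeps the original indentation; only exact `runs-on: ubuntu-latest`
  # lines change. Writing via a temp file works with both GNU and BSD sed.
  sed "s/^\([[:space:]]*runs-on:[[:space:]]*\)ubuntu-latest[[:space:]]*\$/\1$label/" \
    "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}

# Usage: for f in .github/workflows/*.yml; do migrate_runs_on "$f"; done
```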

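The runner label tomoda-arc matches the chart release name in arc-runners/application.yaml (releaseName: tomoda-arc); the gha-runner-scale-set chart uses the release name as the scale set name whenever runnerScaleSetName isn't set. If you ever rename the release (or set runnerScaleSetName), update every workflow's runs-on:.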
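Before a rename, it helps to know which workflows actually reference the label. A hypothetical sketch that lists them: it only scans the top level of the given directory and matches plain `runs-on: <label>` lines (optionally followed by a comment), so labels that merely share the prefix are not reported.

```shell
#!/bin/sh
# Hypothetical audit helper: which workflow files use a given runs-on label?

# workflows_using_label WORKFLOW_DIR LABEL
workflows_using_label() {
  dir=$1
  label=$2
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue  # an unmatched glob stays literal; skip it
    # Exact label, optionally followed by spaces and a trailing comment.
    if grep -Eq "^[[:space:]]*runs-on:[[:space:]]*$label[[:space:]]*(#.*)?\$" "$f"; then
      echo "$f"
    fi
  done
}

# Usage: workflows_using_label .github/workflows tomoda-arc
```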

Verification

# 1. Controller is running
kubectl get pods -n arc-systems
#   NAME                                READY   STATUS
#   arc-gha-rs-controller-...           1/1     Running

# 2. Runner Scale Set is registered with GitHub
kubectl get autoscalingrunnerset -n arc-runners
#   NAME         MINIMUM   MAXIMUM   CURRENT   STATE
#   tomoda-arc   0         3         0         <pending until first job>

# 3. Listener pod is polling GitHub
kubectl get pods -n arc-runners -l app.kubernetes.io/component=runner-set-listener
#   NAME                                READY   STATUS
#   tomoda-arc-...-listener             1/1     Running

# 4. The ESO-projected GitHub App secret exists
kubectl get secret arc-github-app-credentials -n arc-runners
#   NAME                          TYPE     DATA   AGE
#   arc-github-app-credentials    Opaque   3      <new>

# 5. Trigger a test job
#    In a workflow YAML, change runs-on to tomoda-arc + push.
#    Watch ARC spawn a pod:
kubectl get pods -n arc-runners -w
#   New pod appears within ~10s of the job being queued
#   Pod completes + terminates within seconds of the job finishing
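Checks 1 and 3 are easy to script, for example as a post-deploy smoke test. A hypothetical sketch that reads `kubectl get pods` table output on stdin and succeeds only when at least one pod is listed and every STATUS is Running. It assumes STATUS is the third column of the default output, and it is meant for the controller and listener, not for runner pods, which legitimately pass through other states.

```shell
#!/bin/sh
# Hypothetical smoke-test filter over `kubectl get pods` output.

# pods_all_running < kubectl-output
# Exit 0 only if there is at least one pod and every STATUS column says Running.
pods_all_running() {
  awk 'NR == 1 { next }                      # header: NAME READY STATUS ...
       { pods++; if ($3 != "Running") bad++ }
       END { if (pods == 0 || bad > 0) exit 1; exit 0 }'
}

# Usage:
#   kubectl get pods -n arc-systems | pods_all_running && echo "controller up"
```

If you would rather block until pods are Ready than check once, `kubectl wait --for=condition=Ready pod -l <selector> -n <namespace> --timeout=60s` does that natively.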

If a runner pod never spawns when a job is queued, check the listener pod logs:

kubectl logs -n arc-runners -l app.kubernetes.io/component=runner-set-listener --tail=50

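Most likely cause: the GitHub App's permissions are missing or wrong. For a repository-level scale set, re-verify the App has Repository: Actions: Read+Write and Administration: Read+Write. If the scale set is registered against an organization, the App also needs Organization: Self-hosted runners: Read+Write.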
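Auth problems usually stand out in those logs. A hypothetical grep filter for them; the patterns are heuristics based on common GitHub API failure messages (HTTP 401/403, "Bad credentials", "Resource not accessible by integration"), not ARC's documented log format, so widen them if nothing matches.

```shell
#!/bin/sh
# Hypothetical filter: keep only log lines that look like GitHub auth/permission errors.
# Patterns are heuristics, not ARC's documented log format.
# grep exits 1 when nothing matches, i.e. "no auth errors spotted".

auth_errors() {
  grep -iE '(^|[^0-9])(401|403)([^0-9]|$)|unauthorized|forbidden|bad credentials|not accessible by integration'
}

# Usage:
#   kubectl logs -n arc-runners -l app.kubernetes.io/component=runner-set-listener --tail=200 | auth_errors
```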

What runners CAN'T do (current limits)

  • macOS-only jobs: runner pods are Linux containers. iOS-build workflows that need a Mac runner stay on GitHub-hosted macOS runners (runs-on: macos-latest).
  • Docker-in-Docker: the runner pods don't run a Docker daemon, so docker build inside a job fails out of the box. Either switch the scale set to the chart's dind mode (containerMode.type: dind in values.yaml, which adds a privileged Docker daemon sidecar) or build with kaniko / buildah, which work in plain containers.
  • Persistent caches — runner pods are ephemeral. actions/cache@v4 still works (cache lives on GitHub-side), but local Docker layer caching, pip wheels in $HOME, etc. don't survive job-to-job.

For native macOS builds, keep using runs-on: macos-latest (and watch the GitHub minutes meter — those are billed at 10× the Linux rate). Most CI workloads (Go tests, ESLint, Playwright headless, docs build) run fine on Linux ARC pods.

Rotation

  • GitHub App private key: in the App settings page, generate a new key and push the new PEM to GCP SM. An App can have several active keys at once, so delete the old key only after ARC is running on the new one:

    gcloud secrets versions add tomoda-github-app-private-key \
      --project=development-485000 --data-file=new-key.pem
    

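    Then either wait for the next ESO sync (up to 1h) or force it. Restarting ARC alone doesn't help until ESO has refreshed the Kubernetes Secret, so trigger the sync first (kubectl get externalsecret -n arc-runners shows the ExternalSecret's name), then restart:

    kubectl annotate externalsecret <externalsecret-name> -n arc-runners force-sync="$(date +%s)" --overwrite
    kubectl rollout restart deploy -n arc-systems
    kubectl delete pod -n arc-runners -l app.kubernetes.io/component=runner-set-listener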
    
  • GitHub App itself: deleting the App invalidates all credentials immediately. Re-create per the bootstrap doc, push new IDs + key to GCP SM.
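Pushing the wrong file to GCP SM (the public key, an empty file) only surfaces later as a broken ESO-projected Secret, so a cheap pre-flight check before gcloud secrets versions add can save a round trip. A hypothetical sketch that only inspects the PEM header and footer lines; it does not prove the key material is valid or that it belongs to the App.

```shell
#!/bin/sh
# Hypothetical pre-flight check before uploading a GitHub App key to GCP SM.
# Only checks PEM framing; it does not validate the key material.

# looks_like_private_key PEM_FILE
looks_like_private_key() {
  [ -s "$1" ] || return 1   # missing or empty file
  head -n 1 "$1" | grep -Eq '^-----BEGIN (RSA )?PRIVATE KEY-----$' &&
    tail -n 1 "$1" | grep -Eq '^-----END (RSA )?PRIVATE KEY-----$'
}

# Usage:
#   looks_like_private_key new-key.pem &&
#     gcloud secrets versions add tomoda-github-app-private-key \
#       --project=development-485000 --data-file=new-key.pem
```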