When iOS and macOS teams ship on tight cadences, CI rarely looks like a flat line: you get short windows where many workflows stack in the same hour. GitHub-hosted macOS minutes can drain fast, so orgs add self-hosted runners on Apple hardware. From there, the fork that matters is this: an elastic pool of cloud Macs that scales in and out, or always-on nodes that stay warm. The right answer depends on queue shape, acceptable latency, and where caches live, not on vendor slogans.
Model the peak before you buy capacity
Elastic pools win when busy minutes are sparse but concurrency spikes are tall: a few days per month where you need six runners, and the rest of the time two would suffice. Always-on nodes win when work arrives continuously—nightlies, per-PR matrix builds, and bots that must never wait for provisioning. Plot seven days of runner timestamps from your Actions logs: median queue depth, p95 time from queued to in_progress, and how often two jobs contend for signing assets on the same host. If p95 queue time already exceeds your acceptable “developer idle thumb-twiddling” budget, elastic scale-out only helps if the added machines become ready faster than the backlog grows—otherwise you are paying for cold starts on top of queueing.
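If you do not already export runner metrics, a short script against the Actions REST API gets you that seven-day picture. Here is a minimal sketch, assuming a GITHUB_TOKEN with repo read access and using run_started_at minus created_at as a proxy for queued-to-in_progress time (per-job pickup needs the jobs API); the repo slug is a placeholder:

```python
# Sketch: estimate queue pickup latency over the last 7 days from the
# GitHub Actions REST API. The repo slug below is hypothetical.
import datetime as dt
import os
import statistics

import requests

REPO = "your-org/your-ios-app"  # placeholder; replace with your repo
since = (dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=7)).strftime("%Y-%m-%d")
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

queue_seconds = []
url = f"https://api.github.com/repos/{REPO}/actions/runs"
params = {"created": f">={since}", "per_page": 100}
while url:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    for run in resp.json()["workflow_runs"]:
        started = run.get("run_started_at")
        if not started:
            continue  # run has not left the queue yet
        created = dt.datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        picked_up = dt.datetime.fromisoformat(started.replace("Z", "+00:00"))
        queue_seconds.append((picked_up - created).total_seconds())
    url = resp.links.get("next", {}).get("url")  # follow Link-header pagination
    params = None  # the next-page URL already carries the query string

if queue_seconds:
    queue_seconds.sort()
    p95 = queue_seconds[int(0.95 * (len(queue_seconds) - 1))]
    print(f"runs={len(queue_seconds)} "
          f"median={statistics.median(queue_seconds):.0f}s p95={p95:.0f}s")
```

Run it weekly and keep the numbers next to your SLO targets; a rising p95 with flat medians is the classic signature of spiky demand that elastic capacity handles well.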
For App Store week pressure and “rent vs buy” framing, we wrote a separate matrix you can reuse as a finance checklist: Emergency builds & App Store review in 2026: buy a Mac or rent a cloud Mac by day or week?
Latency is three numbers, not one slogan
Separate control-plane latency (runner picks up the job), data-plane latency (git fetch, cache restore, artifact upload), and tool latency (Xcode compile). Elastic pools often improve control-plane contention by adding labels, but if every fresh VM repeats a five-minute dependency bootstrap, your wall clock barely moves. Always-on runners amortize that bootstrap across hundreds of jobs—at the cost of idle power and drift risk if you do not pin images.
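To see which of the three numbers dominates, per-step timings from the jobs API are usually enough. A sketch under the assumption that step names identify checkout, cache, and bootstrap work; the repo slug, run id, and keyword list are illustrative, not prescriptive:

```python
# Sketch: split one job's wall clock into control-plane pickup, data-plane
# work, and tool (build) time using step timings from the jobs API.
import datetime as dt
import os

import requests

REPO = "your-org/your-ios-app"  # hypothetical slug
RUN_ID = 123456789              # hypothetical run id; take one from your Actions UI
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def ts(stamp: str) -> dt.datetime:
    return dt.datetime.fromisoformat(stamp.replace("Z", "+00:00"))

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/actions/runs/{RUN_ID}/jobs",
    headers=headers,
)
resp.raise_for_status()
for job in resp.json()["jobs"]:
    # Control plane: how long the job sat before a runner picked it up.
    pickup = (ts(job["started_at"]) - ts(job["created_at"])).total_seconds()
    data_plane = tool = 0.0
    for step in job["steps"]:
        if not step.get("started_at") or not step.get("completed_at"):
            continue
        dur = (ts(step["completed_at"]) - ts(step["started_at"])).total_seconds()
        name = step["name"].lower()
        # Heuristic bucketing by step name; tune the keywords to your workflow.
        if any(k in name for k in ("checkout", "cache", "bootstrap", "pod install")):
            data_plane += dur  # data plane: fetches, cache restores, installs
        else:
            tool += dur        # tool latency: xcodebuild, tests, signing
    print(f"{job['name']}: pickup={pickup:.0f}s "
          f"data-plane={data_plane:.0f}s tool={tool:.0f}s")
```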
Network path matters: measure RTT and throughput from the runner to your Git host and to any remote cache (S3-compatible, Artifactory, or the Actions cache). A slow TLS handshake to a far-away region shows up as "slow Xcode" in screenshots. For headless persistence patterns, see Deploying OpenClaw on a cloud Mac in 2026: macOS checks vs Linux VPS, launchd persistence, and a reproducible FAQ.
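To put numbers on that network path, time the handshakes from the runner itself. A minimal sketch measuring TCP connect and TLS handshake latency; the cache hostname is a placeholder for whatever endpoint sits on your build's data path:

```python
# Sketch: TCP connect and TLS handshake timing from a runner to the hosts
# on the build's data path. The second host below is a placeholder.
import socket
import ssl
import time

HOSTS = ["github.com", "your-cache.example.com"]  # replace the placeholder

ctx = ssl.create_default_context()
for host in HOSTS:
    t0 = time.perf_counter()
    sock = socket.create_connection((host, 443), timeout=10)
    tcp_ms = (time.perf_counter() - t0) * 1000
    # wrap_socket performs the TLS handshake before returning
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        total_ms = (time.perf_counter() - t0) * 1000
        print(f"{host}: tcp={tcp_ms:.0f}ms tcp+tls={total_ms:.0f}ms "
              f"({tls.version()})")
```

If the cache endpoint's handshake is several times slower than the Git host's, fix region placement before touching runner counts.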
Caches: sticky disk vs shared object store
Apple builds are cache-sensitive. DerivedData, CocoaPods, and SwiftPM artifacts dominate restore time. Elastic nodes that discard disks on shutdown should push caches outward—versioned buckets or a read-heavy network share—with strict keys tied to Xcode minor version and lockfile hashes. Always-on nodes can keep hot caches locally, but you must evict deterministically so one branch does not poison another. In both models, treat cache misses as part of SLO budgeting, not as rare accidents.
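One way to make "strict keys" concrete is to derive them from the toolchain and lockfiles instead of hand-maintaining them. A sketch, assuming SwiftPM plus optional CocoaPods lockfiles and that xcodebuild is on PATH; the key layout mirrors the cache_key_prefix in the checklist further down:

```python
# Sketch: derive a deterministic cache key from the Xcode minor version and
# dependency lockfiles. Paths and the key layout are illustrative.
import hashlib
import pathlib
import subprocess

def xcode_minor() -> str:
    # "Xcode 16.2\nBuild version 16C5032a" -> "16.2"
    out = subprocess.run(
        ["xcodebuild", "-version"], capture_output=True, text=True, check=True
    )
    return out.stdout.split()[1]

def lockfile_digest(*patterns: str) -> str:
    # Hash every matching lockfile in a stable order so the key is deterministic.
    h = hashlib.sha256()
    for pattern in patterns:
        for path in sorted(pathlib.Path(".").rglob(pattern)):
            h.update(path.read_bytes())
    return h.hexdigest()[:16]

key = f"xcode-{xcode_minor()}-spm-{lockfile_digest('Package.resolved', 'Podfile.lock')}"
print(key)  # e.g. xcode-16.2-spm-3fa9c1d2e0b47a61 (digest value illustrative)
```

Because the key changes whenever the Xcode minor version or any lockfile changes, a branch can never restore a cache built under a different toolchain, which is exactly the cross-branch poisoning the paragraph above warns about.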
Decision matrix at a glance
Use the table as a first-pass filter; then validate with the parameter checklist below.
| Signal | Favor elastic pool | Favor always-on nodes |
|---|---|---|
| Duty cycle | Low average utilization, rare tall spikes | High sustained utilization across time zones |
| Queue SLO | Spikes tolerable if extra machines appear quickly | Strict pickup latency (<30s) most of the day |
| Cache strategy | Remote cache with good hit rate on cold runners | Large local SSD, predictable warm paths |
| Compliance | Ephemeral disks meet retention policies | Long-lived audit trail on fixed hosts |
Executable parameter checklist (copy into runbooks)
These are the knobs we actually write into YAML, Terraform variables, or internal wiki tables when we size a fleet. Adjust names to match your orchestration layer; the intent is what matters.
```yaml
# Workflow concurrency (serialize noisy paths)
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Matrix fan-out ceiling (avoid stampeding caches)
strategy:
  max-parallel: 4

# Runner fleet (document in ops repo, not only UI)
baseline_always_on_runners: 2  # minimum hot capacity
burst_elastic_runners_max: 8   # provider-supported ceiling
idle_shutdown_minutes: 45      # elastic only; avoid thrash

# Cache keys (must include toolchain + lockfiles)
cache_key_prefix: xcode-16_2-spm-${{ hashFiles('**/Package.resolved') }}

# SLO targets (alert when exceeded)
queue_pickup_p95_seconds: 60
cache_restore_p95_seconds: 120
```
Weekly, compare billed runner hours to merged PR throughput. If cost rises without shipping speed, tighten concurrency groups or cache keys before you add metal.
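A sketch of that weekly check, assuming hosted-runner minutes come from the org billing endpoint and merged-PR counts from the search API; note the billing figure is cycle-to-date rather than a rolling week, and self-hosted hours must come from your provider's metering, since GitHub only bills hosted-runner minutes:

```python
# Sketch: compare billed hosted macOS minutes against merged-PR throughput.
# Org and repo names are placeholders.
import datetime as dt
import os

import requests

ORG = "your-org"                # hypothetical org
REPO = "your-org/your-ios-app"  # hypothetical repo
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# GitHub-hosted macOS minutes, cycle-to-date (self-hosted time is not
# billed here; pull that from your provider instead).
billing = requests.get(
    f"https://api.github.com/orgs/{ORG}/settings/billing/actions", headers=headers
)
billing.raise_for_status()
macos_minutes = billing.json()["minutes_used_breakdown"].get("MACOS", 0)

# Merged PRs over the last seven days via the search API.
since = (dt.date.today() - dt.timedelta(days=7)).isoformat()
prs = requests.get(
    "https://api.github.com/search/issues",
    headers=headers,
    params={"q": f"repo:{REPO} is:pr is:merged merged:>={since}"},
)
prs.raise_for_status()
merged = prs.json()["total_count"]

print(f"hosted macOS minutes (cycle-to-date): {macos_minutes}")
print(f"merged PRs (7d): {merged}")
if merged:
    print(f"minutes per merged PR: {macos_minutes / merged:.1f}")
```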
Run those self-hosted macOS runners on hardware that stays out of the way
Sizing elastic pools and always-on baselines is easier when the underlying Macs are predictable: native macOS, Homebrew and Xcode without emulation layers, and Apple Silicon memory bandwidth that keeps Swift and linker spikes from turning into swap storms. A Mac mini M4 class host draws on the order of ~4W at idle, stays quiet on a desk or rack shelf, and pairs well with long-lived launchd supervised runners.
For unattended CI, stability and security matter as much as peak GHz: macOS crash rates stay low across months of uptime, while Gatekeeper, SIP, and FileVault reduce the attack surface compared with typical Windows build VMs. That combination lowers midnight pages and keeps signing environments trustworthy.
If you are standardizing self-hosted Actions capacity for 2026 peaks, VPSSpark cloud Mac mini M4 plans are a practical place to prototype both elastic burst and always-on tiers: explore plans now and match runner policy to real queue data, not guesswork.