When iOS and macOS teams ship on tight cadences, CI rarely looks like a flat line: you get short windows where many workflows stack in the same hour. GitHub-hosted macOS minutes can drain fast, so orgs add self-hosted runners on Apple hardware. From there, the fork that matters is this: an elastic pool of cloud Macs that scales in and out, or always-on nodes that stay warm. The right answer depends on queue shape, acceptable latency, and where caches live, not on vendor slogans.
Model the peak before you buy capacity
Elastic pools win when busy minutes are sparse but concurrency spikes are tall: a few days per month where you need six runners, and the rest of the time two would suffice. Always-on nodes win when work arrives continuously—nightlies, per-PR matrix builds, and bots that must never wait for provisioning. Plot seven days of runner timestamps from your Actions logs: median queue depth, p95 time from queued to in_progress, and how often two jobs contend for signing assets on the same host. If p95 queue time already exceeds your acceptable “developer idle thumb-twiddling” budget, elastic scale-out only helps if the added machines become ready faster than the backlog grows—otherwise you are paying for cold starts on top of queueing.
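If you do not already export runner metrics, a short script against the Actions REST API gets you that seven-day picture. Here is a minimal sketch, assuming a GITHUB_TOKEN with repo read access and using run_started_at minus created_at as a proxy for queued-to-in_progress time (per-job pickup needs the jobs API); the repo slug is a placeholder:

```python
# Sketch: estimate queue pickup latency over the last 7 days from the
# GitHub Actions REST API. The repo slug below is hypothetical.
import datetime as dt
import os
import statistics

import requests

REPO = "your-org/your-ios-app"  # placeholder; replace with your repo
since = (dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=7)).strftime("%Y-%m-%d")
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

queue_seconds = []
url = f"https://api.github.com/repos/{REPO}/actions/runs"
params = {"created": f">={since}", "per_page": 100}
while url:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    for run in resp.json()["workflow_runs"]:
        started = run.get("run_started_at")
        if not started:
            continue  # run has not left the queue yet
        created = dt.datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
        picked_up = dt.datetime.fromisoformat(started.replace("Z", "+00:00"))
        queue_seconds.append((picked_up - created).total_seconds())
    url = resp.links.get("next", {}).get("url")  # follow Link-header pagination
    params = None  # the next-page URL already carries the query string

if queue_seconds:
    queue_seconds.sort()
    p95 = queue_seconds[int(0.95 * (len(queue_seconds) - 1))]
    print(f"runs={len(queue_seconds)} "
          f"median={statistics.median(queue_seconds):.0f}s p95={p95:.0f}s")
```

Run it weekly and keep the numbers next to your SLO targets; a rising p95 with flat medians is the classic signature of spiky demand that elastic capacity handles well.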
For App Store week pressure and “rent vs buy” framing, we wrote a separate matrix you can reuse as a finance checklist: Emergency builds & App Store review in 2026: buy a Mac or rent a cloud Mac by day or week?
Latency is three numbers, not one slogan
Separate control-plane latency (runner picks up the job), data-plane latency (git fetch, cache restore, artifact upload), and tool latency (Xcode compile). Elastic pools often improve control-plane contention by adding labels, but if every fresh VM repeats a five-minute dependency bootstrap, your wall clock barely moves. Always-on runners amortize that bootstrap across hundreds of jobs—at the cost of idle power and drift risk if you do not pin images.
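To see which of the three numbers dominates, per-step timings from the jobs API are usually enough. A sketch under the assumption that step names identify checkout, cache, and bootstrap work; the repo slug, run id, and keyword list are illustrative, not prescriptive:

```python
# Sketch: split one job's wall clock into control-plane pickup, data-plane
# work, and tool (build) time using step timings from the jobs API.
import datetime as dt
import os

import requests

REPO = "your-org/your-ios-app"  # hypothetical slug
RUN_ID = 123456789              # hypothetical run id; take one from your Actions UI
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def ts(stamp: str) -> dt.datetime:
    return dt.datetime.fromisoformat(stamp.replace("Z", "+00:00"))

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/actions/runs/{RUN_ID}/jobs",
    headers=headers,
)
resp.raise_for_status()
for job in resp.json()["jobs"]:
    # Control plane: how long the job sat before a runner picked it up.
    pickup = (ts(job["started_at"]) - ts(job["created_at"])).total_seconds()
    data_plane = tool = 0.0
    for step in job["steps"]:
        if not step.get("started_at") or not step.get("completed_at"):
            continue
        dur = (ts(step["completed_at"]) - ts(step["started_at"])).total_seconds()
        name = step["name"].lower()
        # Heuristic bucketing by step name; tune the keywords to your workflow.
        if any(k in name for k in ("checkout", "cache", "bootstrap", "pod install")):
            data_plane += dur  # data plane: fetches, cache restores, installs
        else:
            tool += dur        # tool latency: xcodebuild, tests, signing
    print(f"{job['name']}: pickup={pickup:.0f}s "
          f"data-plane={data_plane:.0f}s tool={tool:.0f}s")
```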
Network path matters: measure RTT and throughput from the runner to your Git host and to any remote cache (S3-compatible, Artifactory, or the Actions cache). A slow TLS handshake to a far-away region shows up as "slow Xcode" in screenshots. For headless persistence patterns, see Deploying OpenClaw on a cloud Mac in 2026: macOS checks vs Linux VPS, launchd persistence, and a reproducible FAQ.
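To put numbers on that network path, time the handshakes from the runner itself. A minimal sketch measuring TCP connect and TLS handshake latency; the cache hostname is a placeholder for whatever endpoint sits on your build's data path:

```python
# Sketch: TCP connect and TLS handshake timing from a runner to the hosts
# on the build's data path. The second host below is a placeholder.
import socket
import ssl
import time

HOSTS = ["github.com", "your-cache.example.com"]  # replace the placeholder

ctx = ssl.create_default_context()
for host in HOSTS:
    t0 = time.perf_counter()
    sock = socket.create_connection((host, 443), timeout=10)
    tcp_ms = (time.perf_counter() - t0) * 1000
    # wrap_socket performs the TLS handshake before returning
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        total_ms = (time.perf_counter() - t0) * 1000
        print(f"{host}: tcp={tcp_ms:.0f}ms tcp+tls={total_ms:.0f}ms "
              f"({tls.version()})")
```

If the cache endpoint's handshake is several times slower than the Git host's, fix region placement before touching runner counts.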
Caches: sticky disk vs shared object store
Apple builds are cache-sensitive. DerivedData, CocoaPods, and SwiftPM artifacts dominate restore time. Elastic nodes that discard disks on shutdown should push caches outward—versioned buckets or a read-heavy network share—with strict keys tied to Xcode minor version and lockfile hashes. Always-on nodes can keep hot caches locally, but you must evict deterministically so one branch does not poison another. In both models, treat cache misses as part of SLO budgeting, not as rare accidents.
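One way to make "strict keys" concrete is to derive them from the toolchain and lockfiles instead of hand-maintaining them. A sketch, assuming SwiftPM plus optional CocoaPods lockfiles and that xcodebuild is on PATH; the key layout mirrors the cache_key_prefix in the checklist further down:

```python
# Sketch: derive a deterministic cache key from the Xcode minor version and
# dependency lockfiles. Paths and the key layout are illustrative.
import hashlib
import pathlib
import subprocess

def xcode_minor() -> str:
    # "Xcode 16.2\nBuild version 16C5032a" -> "16.2"
    out = subprocess.run(
        ["xcodebuild", "-version"], capture_output=True, text=True, check=True
    )
    return out.stdout.split()[1]

def lockfile_digest(*patterns: str) -> str:
    # Hash every matching lockfile in a stable order so the key is deterministic.
    h = hashlib.sha256()
    for pattern in patterns:
        for path in sorted(pathlib.Path(".").rglob(pattern)):
            h.update(path.read_bytes())
    return h.hexdigest()[:16]

key = f"xcode-{xcode_minor()}-spm-{lockfile_digest('Package.resolved', 'Podfile.lock')}"
print(key)  # e.g. xcode-16.2-spm-3fa9c1d2e0b47a61 (digest value illustrative)
```

Because the key changes whenever the Xcode minor version or any lockfile changes, a branch can never restore a cache built under a different toolchain, which is exactly the cross-branch poisoning the paragraph above warns about.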
Decision matrix at a glance
Use the table as a first-pass filter; then validate with the parameter checklist below.
| Signal | Favor elastic pool | Favor always-on nodes |
|---|---|---|
| Duty cycle | Low average utilization, rare tall spikes | High sustained utilization across time zones |
| Queue SLO | Spikes tolerable if extra machines appear quickly | Strict pickup latency (<30s) most of the day |
| Cache strategy | Remote cache with good hit rate on cold runners | Large local SSD, predictable warm paths |
| Compliance | Ephemeral disks meet retention policies | Long-lived audit trail on fixed hosts |
Executable parameter checklist (copy into runbooks)
These are the knobs we actually write into YAML, Terraform variables, or internal wiki tables when we size a fleet. Adjust names to match your orchestration layer; the intent is what matters.
```yaml
# Workflow concurrency (serialize noisy paths)
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Matrix fan-out ceiling (avoid stampeding caches)
strategy:
  max-parallel: 4

# Runner fleet (document in ops repo, not only UI)
baseline_always_on_runners: 2  # minimum hot capacity
burst_elastic_runners_max: 8   # provider-supported ceiling
idle_shutdown_minutes: 45      # elastic only; avoid thrash

# Cache keys (must include toolchain + lockfiles)
cache_key_prefix: xcode-16_2-spm-${{ hashFiles('**/Package.resolved') }}

# SLO targets (alert when exceeded)
queue_pickup_p95_seconds: 60
cache_restore_p95_seconds: 120
```
Weekly, compare billed runner hours to merged PR throughput. If cost rises without shipping speed, tighten concurrency groups or cache keys before you add metal.
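A sketch of that weekly check, assuming hosted-runner minutes come from the org billing endpoint and merged-PR counts from the search API; note the billing figure is cycle-to-date rather than a rolling week, and self-hosted hours must come from your provider's metering, since GitHub only bills hosted-runner minutes:

```python
# Sketch: compare billed hosted macOS minutes against merged-PR throughput.
# Org and repo names are placeholders.
import datetime as dt
import os

import requests

ORG = "your-org"                # hypothetical org
REPO = "your-org/your-ios-app"  # hypothetical repo
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# GitHub-hosted macOS minutes, cycle-to-date (self-hosted time is not
# billed here; pull that from your provider instead).
billing = requests.get(
    f"https://api.github.com/orgs/{ORG}/settings/billing/actions", headers=headers
)
billing.raise_for_status()
macos_minutes = billing.json()["minutes_used_breakdown"].get("MACOS", 0)

# Merged PRs over the last seven days via the search API.
since = (dt.date.today() - dt.timedelta(days=7)).isoformat()
prs = requests.get(
    "https://api.github.com/search/issues",
    headers=headers,
    params={"q": f"repo:{REPO} is:pr is:merged merged:>={since}"},
)
prs.raise_for_status()
merged = prs.json()["total_count"]

print(f"hosted macOS minutes (cycle-to-date): {macos_minutes}")
print(f"merged PRs (7d): {merged}")
if merged:
    print(f"minutes per merged PR: {macos_minutes / merged:.1f}")
```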
Run those self-hosted macOS runners on hardware that stays out of the way
Sizing elastic pools and always-on baselines is easier when the underlying Macs are predictable: native macOS, Homebrew and Xcode without emulation layers, and Apple Silicon memory bandwidth that keeps Swift and linker spikes from turning into swap storms. A Mac mini M4 class host draws on the order of ~4W at idle, stays quiet on a desk or rack shelf, and pairs well with long-lived launchd supervised runners.
For unattended CI, stability and security matter as much as peak GHz: macOS crash rates stay low across months of uptime, while Gatekeeper, SIP, and FileVault reduce the attack surface compared with typical Windows build VMs. That combination lowers midnight pages and keeps signing environments trustworthy.
If you are standardizing self-hosted Actions capacity for 2026 peaks, VPSSpark cloud Mac mini M4 plans are a practical place to prototype both elastic burst and always-on tiers: explore plans now and match runner policy to real queue data, not guesswork.