3 Cloud Macs, 500 iOS CI builds a day: our GitHub Actions Mac Runner setup (2026)

Q: How many Macs for 500 iOS builds per day?

Most teams need three Apple Silicon Cloud Macs (two Fast Mac runners + one Release runner) once jobs are tiered and caches are warm.

Q: How many jobs can a GitHub Actions Mac runner run at once?

On 16GB M4: two light jobs or one Archive—not two Archives in parallel.

Bottom line first: roughly 500 iOS CI builds per day does not require eight Mac minis for most product teams—three Apple Silicon Cloud Macs are enough if you tier GitHub Actions jobs and isolate fast PR work from Release work with dedicated self-hosted Mac runners, instead of letting every machine run three full Archives at once.

Three months ago we took over Xcode CI for an iOS engineering group. Their situation at handoff looked like this:

18 developers
2 apps in a shared monorepo
GitHub Actions as the only CI system
about 500 macOS jobs per day (PR checks, unit tests, nightly release)

The procurement plan on the table said: buy eight Mac minis for an on-prem runner rack. We split two weeks of workflow logs by job type first. The real mix looked nothing like the gut assumption: roughly 70% was PR validation and integration builds, 20% was Simulator unit tests, and only about 10% was Archive plus upload (if you count TestFlight upload as its own job type, that slice splits further into Archive and Upload, as in Figure 1).

What we shipped: 2 Fast Mac runners (PR + unit tests) + 1 Release Mac runner (Archive, notarization, TestFlight)—three Cloud Mac nodes carrying all iOS CI/CD. Below is the full capacity model, and why we skipped Xcode Cloud and full Bitrise in favor of self-hosted runners on dedicated hardware.

Figure 1 · Daily ~500 iOS CI build mix (two-week GitHub Actions sample for this team): PR build 65% · unit test 20%

Capacity math: from 91 Mac-hours down to 63

Five hundred jobs spread across twenty-four hours averages about twenty-one per hour. That average is misleading. What breaks SLOs is peak: US/EU morning merge windows, the day before a release when every squad opens a PR, and “fix the red build” storms that run three to five times the midnight baseline. Capacity planning has to model queue depth at peak, not mean load at 3 a.m.

The second variable is per-job Mac time. Simulator unit tests often land at four to eight minutes. A PR integration build with dependency resolution and scheme graph work is commonly twelve to twenty minutes. Release Archive plus notarization plus App Store Connect upload routinely runs twenty-five to forty-five minutes on a cold tree. For this customer we used two-week P50s as a planning shorthand: six minutes for light jobs, thirty-five minutes for heavy jobs. If you wrongly assume thirty percent of five hundred runs are Archives, the spreadsheet says you need eight Macs. After you split the taxonomy, heavy jobs are a thin slice of five hundred—not the majority.

Without caching, a conservative split of 325 light + 175 heavy jobs implies about 91 Mac-hours per day. Three Macs offer seventy-two physical Mac-hours in a calendar day (24 × 3), so the naive model looks under-provisioned. That is the trap that sells eight Mac minis.

Turn on DerivedData and SPM caches, serialize L2 work (Archive, notarization, upload) on a dedicated release node, and route with runner labels. Light jobs fell to about four minutes on average; heavy jobs on a warm release path averaged about twenty-two minutes. Total demand dropped to roughly 63 Mac-hours. The 72-hour ceiling also understates the pool: each fast node runs two light jobs at once, so three machines provide five concurrent job slots (2 + 2 + 1), not three. With a peak factor around 1.3, three Cloud Macs sit inside an acceptable queue band. That is the arithmetic behind “three is enough”: not optimism, but job mix plus cache plus queue isolation.
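A rough way to see why 63 Mac-hours fits and 91 does not is to convert daily demand into busy job slots at the peak hour. The sketch below assumes the five-slot topology (two per fast node, one on the release node) and the 1.3 peak factor used above, and it treats all slots as one shared pool. That overstates flexibility, because L2 can only use Mac-C, so read it as a first-pass sanity check rather than a queueing model.

```python
# Simplifying assumption: all job slots form one shared pool. In the real
# topology L2 jobs can only use Mac-C's single slot.

HOURS_PER_DAY = 24


def peak_slot_utilization(daily_mac_hours: float, peak_factor: float, slots: int) -> float:
    """Fraction of job slots busy during the peak hour.

    daily_mac_hours: total job run time per day, in hours
    peak_factor: peak-hour load divided by the 24-hour average
    slots: concurrent job slots across the pool
    """
    average_busy_slots = daily_mac_hours / HOURS_PER_DAY
    return average_busy_slots * peak_factor / slots


# Topology from this article: 2 fast nodes x 2 slots + 1 release node x 1 slot.
SLOTS = 2 * 2 + 1

for label, demand in [("no cache", 91.0), ("with cache", 63.0)]:
    utilization = peak_slot_utilization(demand, peak_factor=1.3, slots=SLOTS)
    print(f"{label}: {utilization:.0%} of {SLOTS} slots busy at peak")
```

With these inputs it prints 99% busy at peak without caching and 68% with caching. A pool running near 100% at peak builds queues without bound, which is why the uncached model looks like it needs more hardware.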

We also sanity-checked utilization: two fast nodes at two concurrent light jobs each gives four parallel PR/test slots during business hours. One release node at one L2 job respects Keychain and codesign reality. When peak factor spikes above 1.5, you feel it in L1 queue_wait first; L2 should stay bounded if you refuse to run double Archive on sixteen gigabytes of unified memory.

Teams shopping for a GitHub Actions Mac runner fleet often ask whether M4 Pro on the release node is mandatory. For this workload, M4 on fast nodes plus M4 Pro on release helped notarization and export steps that spike single-core and disk IO; fast nodes stayed on base M4 because L1 jobs are parallel but individually smaller. The lesson generalizes: buy core count and memory shape per tier, not one SKU for all three boxes.

Translate jobs into Mac-hours weekly, not monthly averages. Five hundred per day is 3,500 per week; at six minutes light and thirty-five heavy uncached, you are already past three Macs. Cached P50s are what make three nodes credible. Log run_duration by workflow name for fourteen days before you sign a hardware PO or a Cloud Mac annual plan.
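The fourteen-day log pull can be reduced to per-workflow Mac-hours and P50 durations with a few lines of standard-library Python. This sketch assumes you have already exported (workflow name, run duration in seconds) pairs, for example from the GitHub REST API or your metrics store; the export step and the workflow names in the sample are hypothetical. Concurrent jobs are counted separately, so the result is job-hours rather than physical machine-hours.

```python
from collections import defaultdict
from statistics import median


def summarize_runs(runs, window_days):
    """Return {workflow: (mac_hours_per_day, p50_minutes, runs_per_day)}.

    runs: iterable of (workflow_name, run_duration_seconds) pairs.
    Concurrent jobs are counted separately (job-hours, not machine-hours).
    """
    durations = defaultdict(list)
    for workflow, seconds in runs:
        durations[workflow].append(seconds)

    summary = {}
    for workflow, secs in durations.items():
        mac_hours_per_day = sum(secs) / 3600 / window_days
        p50_minutes = median(secs) / 60
        summary[workflow] = (mac_hours_per_day, p50_minutes, len(secs) / window_days)
    return summary


# Hypothetical two-day sample; workflow names are illustrative only.
sample = [("pr-build", 360)] * 40 + [("release-archive", 2100)] * 4
for name, (hours, p50, per_day) in sorted(summarize_runs(sample, window_days=2).items()):
    print(f"{name}: {per_day:.0f} runs/day, P50 {p50:.0f} min, {hours:.1f} Mac-hours/day")
```

Run it with `window_days=14` on the real export, then multiply by seven for the weekly view.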

Figure 3 · Daily macOS Mac-hour demand (same 500 jobs/day model): no cache 91 hrs · with cache 63 hrs

Xcode CI tiers: do not put Archive and PR in one Mac runner queue

Pipelines that survive 500 iOS builds per day on GitHub Actions almost always implement three Xcode CI tiers. Treat tiers as separate queues with labels, not as one FIFO pool where a forty-minute Archive blocks a six-minute lint job.

L0: SwiftLint, single-module compile checks—route to macos-fast.
L1: PR integration build without Archive—same Fast Mac runner pool; each machine can run two light jobs in parallel when memory headroom allows.
L2: Archive, notarization, TestFlight upload—route to macos-archive; at most one concurrent job per machine.

GitLab CI, Buildkite, and Jenkins macOS agents follow the same idea: tag routing instead of a single “macos” label that means everything. Buildkite burst elasticity and artifact retention patterns are covered in Buildkite self-hosted macOS agents on Cloud Mac. If you are still deciding buy vs rent for a runner pool, pair this article with your internal FinOps model for CapEx Mac minis vs OpEx Cloud Mac.

Why tiers matter when comparing Cloud Mac CI to hosted macOS: hosted runners are homogeneous; your self-hosted pool is not. You build in heterogeneity on purpose: fast silicon for throughput, one release-shaped machine for deterministic signing and upload.

In workflow YAML, express tiers with if: guards and path filters so monorepo changes do not fan out full iOS graphs on docs-only commits. L0 should be sub-five minutes when caches are warm; if L0 routinely exceeds ten minutes, fix scheme selection before you add a fourth Mac. L1 should produce an .app or test bundle without Archive; developers learn to trust green L1 as merge-ready while L2 remains nightly or release-branch gated.

Matrix builds multiply job count, but the extra Mac-hours stay small if matrix legs are mostly short Simulator-only jobs on fast labels. Watch for hidden L2 triggers inside matrix legs: one mistaken xcodebuild archive in a matrix template can dominate Mac-C for hours and look like a capacity emergency.
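One cheap guard against stray L2 work is a lint that scans workflow files for `xcodebuild archive` in jobs not pinned to macos-archive. The sketch below assumes the workflow YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). It inspects job templates without expanding matrices, so a job whose `runs-on` comes from a matrix expression is flagged whenever it contains an archive step.

```python
import re

# Matches an xcodebuild invocation using the lowercase `archive` action as a
# whole word, so -archivePath, -exportArchive and .xcarchive paths do not match.
ARCHIVE_CMD = re.compile(r"\bxcodebuild\b[^\n]*\barchive\b")
ARCHIVE_LABEL = "macos-archive"


def runner_labels(runs_on):
    """Normalize runs-on (string, list, or {group, labels} mapping) to a set."""
    if isinstance(runs_on, dict):
        runs_on = runs_on.get("labels", [])
    if isinstance(runs_on, str):
        return {runs_on}
    return set(runs_on or [])


def misrouted_archive_jobs(workflow):
    """Names of jobs that run an archive step without the macos-archive label."""
    offenders = []
    for name, job in (workflow.get("jobs") or {}).items():
        labels = runner_labels(job.get("runs-on"))
        scripts = [step.get("run", "") for step in job.get("steps") or []]
        if ARCHIVE_LABEL not in labels and any(ARCHIVE_CMD.search(s) for s in scripts):
            offenders.append(name)
    return offenders
```

Wired into a PR check on `.github/workflows`, a non-empty result can fail the check before the job ever reaches a fast node.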

How three Cloud Macs divide work (with topology)

We deployed a static three-node layout (names are flexible; roles should not be):

Node	Runner label	Role	Concurrency
Mac-A	`macos-fast`	PR build, unit test	2× L0/L1
Mac-B	`macos-fast`	Symmetric to Mac-A, fast queue	2× L0/L1
Mac-C	`macos-archive`	Archive, notarization, upload	1× L2

Figure 2 · Three-node queue topology (GitHub Actions self-hosted runner): Mac-A and Mac-B (Fast Mac runners) serve the fast queue (PR build · unit test · L0/L1); Mac-C (Release Mac runner) serves the archive queue (Archive · notarization · TestFlight upload)

Two fast machines carry throughput; one release machine carries deterministic delivery. On ship day, if L2 queue depth exceeds SLO, you may temporarily add macos-archive to Mac-A—but disable its L0 parallelism first. Otherwise Keychain contention and DerivedData lock fights produce flaky signatures that look like “Xcode is random” in Slack. Keep Xcode minor versions aligned across all three nodes per Apple Xcode release notes; a one-node drift during a dot release week is a common source of link errors on cache hit.

GitHub Actions Mac runner concurrency on 16GB M4

For light Xcode CI: two jobs per machine. For full Archive: one job per machine. Running two Archives in parallel on one 16GB Mac is the top reason a 500-jobs-per-day plan looks “saturated” when the real issue is mis-parallelism.

Why GitHub Actions Mac runners queue (and how to fix it)

Many teams start with minute bundles on GitHub-hosted macOS runners, then hit queue_wait spikes when PR volume rises. The bottleneck is usually not “GitHub is slow” in isolation. It is a stack of policy and workflow choices: (1) organization-level macOS concurrency caps; (2) workflows that Archive on every PR by default; (3) no split between fast and slow self-hosted Mac runner labels, so light jobs sit behind forty-minute release work.

Before this customer moved to three dedicated Cloud Mac nodes, hosted-runner queue_wait P95 exceeded forty minutes during EU morning merges. After fast/archive dual queues on self-hosted hardware, L1 P95 dropped under eight minutes for integration builds. L2 remained predictable because only Mac-C accepted archive-tagged jobs.

A practical hybrid for US/EU product teams: three Cloud Macs as the baseline self-hosted pool, plus hosted runners only for L0 during open-source contribution weeks or rare spikes. Cost stays bounded, and signing secrets and Match repos do not migrate weekly. The hybrid also shows why “500 builds per day” forum threads often mistake hosted-queue pain for a true capacity shortage: fix the taxonomy first, then count Mac-hours.

If you are evaluating GitHub Actions Mac runner labels, document them in the repo README: which workflows require macos-fast, which require macos-archive, and which are allowed on hosted macos-latest as overflow. Developers stop “just adding runs-on macos” once labels are socialized.

Org admins should set macOS concurrency explicitly and communicate it in #ios-ci. When queue_wait rises, the first question is “did we spawn extra workflows?”, not “did the provider fail?” We traced one spike on this team to a well-meaning “run all schemes on PR” change that doubled L1 count overnight: the labels were fine; the workflow fan-out was not.

Self-hosted runners also shift security responsibility: patch macOS monthly, rotate registration tokens, and restrict who can push workflow files that touch signing steps. Cloud Mac providers that offer snapshot or reinstall APIs make compromised-runner recovery faster than shipping a Mac mini back to the office.

Self-hosted runner vs Xcode Cloud

Xcode Cloud fits teams deeply tied to Apple’s pipeline who want minimal runner ops. Self-hosted Mac runners fit organizations that need queue control, custom cache keys, and the option to run Archive beside internal Jenkins or Buildkite. Comparison at a glance:

Dimension	Xcode Cloud	Cloud Mac + self-hosted runner
Billing	Minute bundles + concurrency caps	Cloud Mac subscription / daily rent, Mac-hours predictable
Queue	Platform-wide scheduling	You own fast/archive labels
Secrets	ASC integration is streamlined	Match / Keychain runbooks are yours
Best fit	Low-frequency release, little customization	500+ iOS CI/CD jobs per day with SLO targets

For signals that minute bundles are no longer comfortable, plus switchover FAQs and archive handoff patterns, see Xcode Cloud minute caps and Cloud Mac archive FAQ.

Why this team did not choose Xcode Cloud

They ran a serious Xcode Cloud evaluation and still landed on GitHub Actions + self-hosted runner. Core reasons: (1) backend and Android already live in GitHub—splitting CI doubles YAML and secret rotation; (2) they need custom cache keys and monorepo matrix dimensions Apple’s defaults do not expose the same way; (3) release-week job counts exceed the minute bundle comfort zone, and queue behavior is not self-service tunable. Xcode Cloud is not “bad”; it misaligns with a goal stated plainly in their RFC: ~500 jobs per day with owned queue SLOs.

Another subtle point for US/EU buyers comparing Cloud Mac CI: Xcode Cloud optimizes Apple-native happy paths. This team’s happy path is “PR fast, Archive rare, upload serialized”—which maps cleanly to runner labels, not to a single Apple-managed queue where PR and Archive compete anonymously.

They still use App Store Connect and notarytool the Apple way; only the compute schedule moved under their control. That hybrid—Apple distribution rails, owned Mac runners—is common among mid-size product companies that outgrew hobby CI but are not ready to operate a twelve-node metal fleet.

Bitrise vs self-hosted Mac runner cost intuition

Bitrise and similar mobile DevOps SaaS products remove agent patching and offer polished workflows. Pricing is concurrency-tiered. For eighteen developers, two apps, and ~500 jobs per day, annual spend often exceeds a three Cloud Mac subscription once you include peak concurrency add-ons—and Archive parallelism may still hit plan ceilings.

Bitrise shines when you will not touch runner plumbing and ship infrequently. Teams willing to invest two weeks wiring self-hosted Mac runners on Cloud Mac typically see node costs pay back in three to six months versus premium SaaS tiers. If you already use Bitrise, a phased move—keep orchestration, move only L2 to a dedicated release Cloud Mac—is viable; you do not need a big-bang migration to validate queue wins.

Buildkite Mac agent: pros and cons

Buildkite keeps queues and UI in the cloud while agents run on your metal. Burst scaling and artifact retention are excellent; the tradeoff is another orchestration surface small teams may resent for only three Macs. This customer PoC’d Buildkite: burst behavior was best-in-class, but engineering standardized on native GitHub Actions self-hosted runners because everyone already lives in Actions YAML.

If you already run Buildkite, mount three Cloud Mac agents with the same fast/archive tag strategy as this article’s topology. The Buildkite Cloud Mac burst guide walks through artifact upload paths and queue SLO metrics that carry over directly to Actions under different metric names.

Cloud Mac vs local Mac mini CI

Eight on-prem Mac minis mean high CapEx, depreciation schedules, rack power, and datacenter or closet access audits for signing keys. Three Cloud Mac nodes mean predictable OpEx, day-rent PoC before annual commit, and region choice near your Git remote or artifact bucket.

Local CI wins when one machine compiles more than fourteen hours a day every day and specs stay frozen for years. Cloud Mac CI wins when peaks swing with releases, contractor weeks, or App Review seasons that add notarization retries. This team kept two office Macs for daily dev work and moved heavy Xcode CI to the cloud so eight minis would not idle fifty weeks a year.

For short-cycle iOS teams comparing CircleCI hosted macOS to dedicated runners, see CircleCI cloud macOS vs self-hosted runner SLO FAQ. The SLO framing—queue_wait P95 by tier—is the same even when the vendor logo changes.

Queue SLO: scale on data, not gut feel

Instrument at minimum: queue_wait_seconds (P95), run_duration bucketed by L0/L1/L2, cache_hit_ratio, and l2_concurrent (should rarely exceed three org-wide). Example thresholds we used: L1 queue P95 under eight minutes; L2 queue P95 under twenty-five minutes. If L2 P95 is bad three days in a row while cache hit is already above sixty percent, discuss a fourth release node—do not buy hardware on one red Monday.
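The example thresholds above can be encoded as a small daily check. This sketch assumes you can pull one day's queue_wait_seconds samples per tier and a daily cache_hit_ratio; the nearest-rank P95 is one of several percentile definitions, so prefer your metrics store's own percentile if it has one. The function names are hypothetical.

```python
import math

L1_P95_LIMIT_S = 8 * 60    # example L1 queue SLO from the text
L2_P95_LIMIT_S = 25 * 60   # example L2 queue SLO from the text


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of seconds."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def slo_status(l1_waits, l2_waits):
    """Pass/fail per tier for one day's queue_wait_seconds samples."""
    return {"L1": p95(l1_waits) < L1_P95_LIMIT_S, "L2": p95(l2_waits) < L2_P95_LIMIT_S}


def discuss_fourth_release_node(daily_l2_waits, daily_cache_hit, days=3):
    """True only if L2 P95 missed its SLO on each of the last `days` days
    while cache_hit_ratio was already above 60% on those days."""
    recent = list(zip(daily_l2_waits, daily_cache_hit))[-days:]
    if len(recent) < days:
        return False
    return all(p95(waits) >= L2_P95_LIMIT_S and hit > 0.60 for waits, hit in recent)
```

A True result is a prompt for the capacity discussion, not a purchase order; one red Monday never satisfies it on its own.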

Export dashboards to the place on-call already looks, such as PagerDuty or a Slack webhook from your metrics store. Route “failed job” alerts separately from “slow queue” alerts; teams that only page on red builds miss the week queue_wait crept from six to twenty-two minutes before developers noticed.

Record runner version and Xcode build number as labels on every metric series. During Xcode 16.x dot releases, regressions often cluster on one node that upgraded early. Roll upgrades to Mac-C (release) first, then to the fast nodes after a green L2 smoke Archive.

SLO reviews should be weekly for the first month after migration, then monthly. Bring product release calendar into the review: if marketing adds a second ship day, model job count before hardware—not after Twitter complaints.

Caching: the lever that unlocks 63 Mac-hours

Key DerivedData by branch + Xcode version. Bust SPM and CocoaPods caches when lockfiles change. Fast nodes share read-heavy cache volumes; the release node keeps local NVMe for L2 DerivedData warmth across nightly Archives. Signing material lives in a vault path, never inside cache tarballs.

Cache keys must include macOS and Xcode minor versions. Otherwise you get “cache hit” builds that fail at link time after a silent Xcode bump on one runner. We also excluded main release keys from PR branches to prevent pollution, and ran a weekly cache prune job on fast nodes only.
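A DerivedData key that follows these rules can be assembled from a handful of inputs. The key layout below is hypothetical, not a GitHub Actions or Xcode convention; in a real setup step the inputs would come from `sw_vers -productVersion`, `xcodebuild -version`, and the contents of Package.resolved or Podfile.lock.

```python
import hashlib


def lockfile_digest(*lockfiles: bytes) -> str:
    """Short hash over all lockfile contents; any lockfile change changes it."""
    combined = hashlib.sha256()
    for content in lockfiles:
        # Hash each file separately so ("ab",) and ("a", "b") give different digests.
        combined.update(hashlib.sha256(content).digest())
    return combined.hexdigest()[:16]


def derived_data_key(branch: str, macos_version: str, xcode_version: str,
                     xcode_build: str, deps_hash: str) -> str:
    """Hypothetical key layout: branch + macOS + Xcode version and build + deps."""
    safe_branch = branch.replace("/", "-")  # keep the key path-friendly
    return f"dd-{safe_branch}-macos{macos_version}-xcode{xcode_version}-{xcode_build}-{deps_hash}"
```

Including the Xcode build number, not just the marketing version, is what stops a silent patch bump on one runner from restoring an incompatible cache.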

GitHub Actions cache entries and self-hosted rsync volumes can coexist: use the Actions cache for SPM tarballs, shared across forks only where your policy allows, and keep DerivedData on the runner’s local NVMe for the hot paths Apple tools expect locally. Document the maximum cache size per node so a rogue branch does not fill the disk and break all four parallel L1 slots.

Measure cache_hit_ratio per tier. Below forty percent on fast nodes after two weeks usually means keys are too narrow or workflows bust the cache too aggressively. Above eighty percent with rising link failures means keys are too broad: add the Xcode patch version (or build number) to the key.
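Those two rules of thumb fit in one helper that a weekly review script could call per tier. The function name and return strings are hypothetical; the thresholds are the ones stated above, and the two-week wait applies only to the low-hit rule, as in the text.

```python
def diagnose_cache(hit_ratio: float, days_observed: int, link_failures_rising: bool) -> str:
    """Apply the cache_hit_ratio rules of thumb for one tier."""
    if hit_ratio > 0.80 and link_failures_rising:
        return "keys too broad: add the Xcode patch version to the key"
    if hit_ratio < 0.40:
        if days_observed < 14:
            return "too early: wait for two weeks of data"
        return "keys too narrow, or workflows bust the cache too aggressively"
    return "no action"
```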

Minimal GitHub Actions self-hosted runner setup

Register runners on three machines: two fast nodes labeled macos-fast and one release node labeled macos-archive. Add a concurrency group on L2 so release runs wait for each other instead of cancelling one another mid-notarization:

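GitHub Actions · Release Mac runner

```yaml
on:
  push:
    branches: ["release/**"]
  workflow_dispatch:

# One run per ref at a time; never cancel an Archive mid-notarization.
# Note: GitHub keeps only the newest pending run in a concurrency group.
concurrency:
  group: ios-archive-${{ github.ref }}
  cancel-in-progress: false

jobs:
  archive:
    runs-on: [self-hosted, macos-archive]  # release node only
    steps:
      - uses: actions/checkout@v4
      - run: xcodebuild archive -scheme App -destination 'generic/platform=iOS' -archivePath build/App.xcarchive
```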

Use a dedicated macOS user on the release node with Match profiles installed. After reboot, run an unlock/login helper before enabling unattended night queues. Unattended Archive and export flags are documented in xcodebuild and Fastlane; treat those docs as the source of truth for CLI changes each Xcode major.

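On fast nodes, register two runner instances per machine (each in its own directory, both labeled macos-fast) to get two concurrent L0/L1 slots; a GitHub Actions runner process executes one job at a time, so per-machine concurrency equals the number of registered instances. On the release node, register exactly one instance regardless of CPU count: memory and Keychain dominate, not core count.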

Install Rosetta only if legacy dependencies require it; otherwise skip it to reduce variance. Keep Command Line Tools and Xcode.app paths consistent across the workflows in .github/workflows by checking xcode-select in a setup step that fails fast when drift is detected.
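A fail-fast drift check for that setup step might look like the sketch below. The pinned developer directory and Xcode version are hypothetical placeholders (in practice, read them from a file in the repo); `xcode-select -p` prints the active developer directory, and the first line of `xcodebuild -version` has the form "Xcode <version>".

```python
import subprocess

# Hypothetical pins; keep the real values in a file checked into the repo.
EXPECTED_DEVELOPER_DIR = "/Applications/Xcode_16.2.app/Contents/Developer"
EXPECTED_XCODE = "16.2"


def parse_xcode_version(output: str):
    """Extract the version from `xcodebuild -version` output ("Xcode <version>")."""
    for line in output.splitlines():
        if line.startswith("Xcode "):
            return line.split(" ", 1)[1].strip()
    return None


def drift_errors(developer_dir: str, xcode_version):
    """Return human-readable drift messages; an empty list means no drift."""
    errors = []
    if developer_dir != EXPECTED_DEVELOPER_DIR:
        errors.append(f"xcode-select points at {developer_dir}, expected {EXPECTED_DEVELOPER_DIR}")
    if xcode_version != EXPECTED_XCODE:
        errors.append(f"xcodebuild reports Xcode {xcode_version}, expected {EXPECTED_XCODE}")
    return errors


def check_runner() -> int:
    """Run on the macOS runner; a non-zero return should fail the setup step."""
    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    errors = drift_errors(run("xcode-select", "-p").strip(),
                          parse_xcode_version(run("xcodebuild", "-version")))
    for message in errors:
        print(f"Xcode drift: {message}")
    return 1 if errors else 0
```

In the setup step, call `sys.exit(check_runner())` before any build command so a drifted node fails in seconds instead of at link time.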

For TestFlight upload, isolate API key permissions per app in a monorepo—two apps sharing one key with overly broad roles is a common audit finding. The release node should be the only machine that holds upload credentials in a login keychain; fast nodes never need ASC upload keys.

Release day: when 500 jobs become 650

Playbook order: (1) pause non-critical L0 workflows; (2) burst hosted runners for L1 only if self-hosted fast queue P95 blows past SLO; (3) add a fourth Cloud Mac on daily rent for forty-eight hours if L2 backlog persists. Do not permanently raise L2 to two parallel Archives per machine—you pay with random notarization failures and angry release managers.
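The playbook order can be written down as a small decision helper so on-call applies it the same way every release. This is a sketch under stated assumptions: the snapshot fields are hypothetical stand-ins for what your dashboards expose, the 8-minute threshold reuses the example L1 SLO above, and what counts as a "persisting" L2 backlog is left to your team. The helper only recommends; it does not pause workflows or rent Macs.

```python
from dataclasses import dataclass

L1_P95_SLO_MIN = 8  # example L1 queue SLO from the SLO section


@dataclass
class ReleaseDaySnapshot:
    l0_paused: bool             # non-critical L0 workflows already paused?
    fast_queue_p95_min: float   # self-hosted fast queue_wait P95, minutes
    l2_backlog_persists: bool   # your own definition of "persists"


def release_day_actions(s: ReleaseDaySnapshot) -> list:
    """Playbook steps that apply right now, in the playbook's order."""
    actions = []
    if not s.l0_paused:
        actions.append("pause non-critical L0 workflows")
    if s.fast_queue_p95_min > L1_P95_SLO_MIN:
        actions.append("burst hosted runners for L1 only")
    if s.l2_backlog_persists:
        actions.append("rent a fourth Cloud Mac for 48 hours (one Archive per machine)")
    return actions
```

Note that no branch ever raises Archive concurrency on an existing machine; extra L2 capacity only arrives as another node.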

Communicate the playbook in #releases before ship week. Engineers tolerate slower PR checks for one day if Archive and TestFlight upload stay green.

Pre-warm release caches the afternoon before ship: run one dry L2 on release/* so Mac-C DerivedData is hot before the tagging flood. If App Store Connect is slow globally, queue_wait on upload steps rises even when Mac-hours are fine—do not misread that as “need more Macs.”

When you actually need a fourth Mac

L2 queue P95 stays above forty minutes for a full week after cache optimization.
Monorepo grows past five apps sharing one release node and the nightly window is exhausted.
Policy insists on full Archive per PR—fix job structure first; buying metal without taxonomy change only buys pain.

Anti-patterns that fake a capacity crisis

Archive on every PR; one undifferentiated GitHub Actions Mac runner label; two Archives on one 16GB M4; cache keys without branch; dashboards that only show pass/fail without queue_wait—all of these make three Cloud Macs look “too small” and resurrect the eight–Mac mini spreadsheet. Fix the workflow graph before you fix procurement.

Other frequent mistakes: storing provisioning profiles on shared SMB mounts with latency spikes; running UI tests on the release node “because it is idle”; mixing beta and release Xcode on fast nodes for “just one experiment”; and leaving cancel-in-progress off on feature branches, where superseded runs should be cancelled (keep it false only on release workflows). Each creates red herrings in capacity meetings.

When someone proposes “just add five more Mac minis,” ask for a fourteen-day histogram of job types and P95 queue_wait by label. If the histogram matches Figure 1 and L2 is ten percent, the proposal is usually emotion from one slow Friday, not math.

FAQ: short answers developers search for

How many Macs for 500 iOS builds per day?

With job tiers (PR / unit test / Archive split) and DerivedData caching in place, most teams need three Apple Silicon Cloud Macs: two Fast Mac runners + one Release Mac runner. If more than half your five hundred runs are full Archives, fix the pipeline or plan a fourth release node—not eight fast machines.

How many jobs can a GitHub Actions Mac runner run at once?

On 16GB M4: two light jobs (PR build, unit test) or one Archive job. Do not run two Archives in parallel long term.

Why not high concurrency on Archive?

Memory pressure triggers swap, disk queues stretch, and Keychain locks plus codesign contend—failures look like flaky timeouts, not reproducible compile errors.

Is Cloud Mac cheaper than Xcode Cloud?

For high-frequency long-term iOS CI/CD with dedicated self-hosted runners and stable signing secrets, a three-node Cloud Mac pool is usually more predictable than minute-metered Xcode Cloud. Low-frequency release teams optimizing for zero ops should still try Xcode Cloud first.

Bitrise or three Cloud Macs?

Bitrise saves ops time and is great for fast onboarding. At ~500 jobs per day with queue and cache control requirements, self-hosted Mac runners on Cloud Mac often cost less per year and let you define L2 concurrency yourself.