CI Is Dead and GitHub Hasn't Noticed | Agent Loop & Cloud Mac

Answers you may be looking for

GitHub Actions macOS runner is slow / stuck on Queued
iOS CI builds fail or keep retrying
Xcode build times out in CI but passes locally
Cloud Mac vs GitHub Actions — which to pick
CI pipeline results are unstable and hard to reproduce

If you run GitHub Actions or self-hosted macOS runners, the loudest complaints in 2026 are oddly specific: longer queues, drifting Xcode wall times, the same PR re-running workflows, and greens that flip red overnight. This post walks through four layers: concrete symptoms first, then how agents turn CI into a loop, then the structural shift from pipeline to retry loop, and finally where Cloud Mac fits as the execution substrate. The headline’s “CI is dead” angle is saved for the FAQ; the body starts with failures you can match to your own job logs.

1 · Layer 1: Why GitHub Actions feels slower, noisier, and flakier in 2026

Across iOS, Flutter, and macOS teams we’ve worked with over the past six months, the feedback clusters into four problems — not “CI is obsolete,” but these symptoms showing up at once.

Queued: macOS jobs waiting in line
±40%: Xcode wall-time swing on same branch
3×+: Workflow runs per PR

1.1 macOS runner queue

On hosted macos-latest, time in Queued often exceeds actual build time. Open a job timeline: 40 minutes waiting, 12 minutes running — while the team is still tuning xcodebuild flags. The diagnostic is still wait time >> run time; see our macOS runner queue runbook. In 2026, queues stretch partly because macOS concurrency caps haven’t kept up with job volume, and partly because Archive and heavy jobs hog slots, pulling fast-feedback PR checks into the same line.
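A rough way to check the wait time >> run time signal, assuming you have exported each job's created, started, and completed timestamps. The field names and the 2× threshold below are placeholders for illustration, not a confirmed schema or an official rule:

```python
from datetime import datetime

def parse(ts: str) -> datetime:
    # ISO 8601 timestamps with a trailing "Z", e.g. "2026-01-05T10:00:00Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def queue_vs_run(job: dict) -> tuple[float, float]:
    """Return (minutes queued, minutes running) for one job record."""
    created = parse(job["created_at"])
    started = parse(job["started_at"])
    completed = parse(job["completed_at"])
    wait = (started - created).total_seconds() / 60
    run = (completed - started).total_seconds() / 60
    return wait, run

def is_queue_bound(job: dict, ratio: float = 2.0) -> bool:
    # Flag jobs whose wait is at least `ratio` times their run time.
    wait, run = queue_vs_run(job)
    return run > 0 and wait >= ratio * run

# Hypothetical job matching the 40-minute wait / 12-minute run case above.
example = {
    "created_at": "2026-01-05T10:00:00Z",
    "started_at": "2026-01-05T10:40:00Z",
    "completed_at": "2026-01-05T10:52:00Z",
}
print(queue_vs_run(example), is_queue_bound(example))
```

If most macOS jobs come back flagged, tuning xcodebuild flags is unlikely to be the first thing to fix.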

1.2 Xcode build time variance

Re-run the same commit in CI: Xcode compile takes 18 minutes once, 31 minutes the next, while a local Mac stays around 15. Usual suspects stack up — cold runner startup, DerivedData cache miss, CocoaPods re-resolve, Xcode patch mismatch between image and laptop. Each alone looks like “caching isn’t wired right.” When every retry hits a cold environment, variance turns into “we can’t trust CI.”
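To put a number on that variance, one option is the largest deviation from the median across re-runs of a single commit. This is a sketch with made-up timings; the metric is one reasonable choice among several, not a standard:

```python
from statistics import median

def wall_time_swing(minutes: list[float]) -> float:
    """Largest deviation from the median, as a fraction of the median."""
    mid = median(minutes)
    return max(abs(m - mid) for m in minutes) / mid

# Hypothetical re-runs of one commit, in minutes.
runs = [18, 31, 22, 20, 25]
print(f"swing: {wall_time_swing(runs):.0%}")
```

Tracking this per commit rather than per branch keeps code growth out of the measurement, so what remains is mostly environment.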

1.3 Retry count spiraling

Beyond explicit retry-on-failure in workflows, teams now see implicit retries: an agent or bot reads logs → patches → pushes → triggers Actions again. A PR that used to mean “two human pushes” becomes “eight machine pushes,” and minutes billing plus queue pressure rise together. You blame flaky tests; the real issue is the same PR running multiple different commits through CI.
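A quick way to see whether a PR is being re-run or re-written is to count runs and distinct head commits side by side. The sketch below assumes a simple export where each run carries a PR number and a head SHA; both field names are hypothetical:

```python
from collections import defaultdict

def runs_per_pr(runs: list[dict]) -> dict[int, dict[str, int]]:
    """Count workflow runs and distinct head commits for each PR."""
    shas = defaultdict(set)
    counts = defaultdict(int)
    for run in runs:
        counts[run["pr"]] += 1
        shas[run["pr"]].add(run["head_sha"])
    return {pr: {"runs": counts[pr], "commits": len(shas[pr])} for pr in counts}

# Hypothetical export; "pr" and "head_sha" are assumed field names.
runs = [
    {"pr": 101, "head_sha": "a1"},
    {"pr": 101, "head_sha": "b2"},
    {"pr": 101, "head_sha": "b2"},
    {"pr": 102, "head_sha": "c3"},
]
print(runs_per_pr(runs))
```

Many runs on few commits points at retries of the same code; many runs on many commits points at an agent or bot pushing fixes.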

1.4 Flaky CI: green sometimes, red sometimes, unreproducible

Classic flaky: intermittent test timeouts, simulator boot failures, keychain locks on signing. In 2026 there’s another layer: each agent round ships a different diff, so you can’t tell whether the test is flaky or the agent broke something else. Release engineers hate this line: “CI was green last night; the same tag is red this morning.”
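One way to separate the two cases is to look only at commits that have both passed and failed: the code is identical there, so the difference has to come from the tests or the environment. A minimal sketch, with assumed field names and outcome labels:

```python
from collections import defaultdict

def mixed_outcome_commits(runs: list[dict]) -> list[str]:
    """Commits that both passed and failed: flakiness or env drift candidates."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[run["head_sha"]].add(run["conclusion"])
    return sorted(sha for sha, seen in outcomes.items()
                  if {"success", "failure"} <= seen)

# Hypothetical run history; "head_sha" and "conclusion" are assumed fields.
history = [
    {"head_sha": "a1", "conclusion": "success"},
    {"head_sha": "a1", "conclusion": "failure"},
    {"head_sha": "b2", "conclusion": "failure"},
    {"head_sha": "c3", "conclusion": "success"},
]
print(mixed_outcome_commits(history))
```

Failures on commits outside that list are not proven to be real bugs, but they are where an agent's own changes are the more likely suspect.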

What you see	Often mistaken for	Worth checking first
Job stuck in Queued	GitHub outage	macOS concurrency, Archive on PR paths
Xcode time swings	Project got bigger	Caches, image Xcode version, cold start
Many runs on one PR	Developers pushing too much	Agent/bot retries, missing `concurrency`
Green then red	Bad tests	Commit consistency per run, env drift

2 · Layer 2: CI didn’t break — it got loopified

Fix queue, caches, or tests in isolation and you usually buy one round of relief. Look at all four symptoms together and a shared thread appears: CI is no longer a one-shot check; it’s a loop of fail → fix → re-run. Before agents, that loop lived in engineers’ heads (change code, push again). Now it’s wired into infrastructure — models read logs, emit patches, and trigger workflows automatically.

The chain is already familiar:

PR opens → CI fails (tests or compile)
Agent reads Actions logs → generates a fix commit
Push → new workflow → may fail again → fix again
Repeat until green, or until a human stops it / tokens run out
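The chain above can be sketched as a small loop. Everything here is illustrative: `run_ci` stands in for a workflow run, `max_rounds` stands in for a human stopping the loop, and the token numbers are invented:

```python
from typing import Callable

def agent_loop(run_ci: Callable[[int], bool], max_rounds: int,
               token_budget: int, tokens_per_fix: int) -> tuple[str, int]:
    """Run CI; on failure, spend tokens on a fix and try again.

    Returns (outcome, rounds_used).
    """
    tokens = token_budget
    for round_no in range(1, max_rounds + 1):
        if run_ci(round_no):
            return "green", round_no
        if tokens < tokens_per_fix:
            return "out_of_tokens", round_no
        tokens -= tokens_per_fix  # agent reads logs and pushes a fix commit
    return "stopped", max_rounds

# Hypothetical CI that only passes on the fourth attempt.
passes_on_fourth = lambda round_no: round_no >= 4
print(agent_loop(passes_on_fourth, max_rounds=8, token_budget=10_000, tokens_per_fix=2_000))
```

Note that a "green" result here says nothing about which commit went green, only that some attempt did.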

It’s still called GitHub Actions / CI on the invoice; behavior-wise it’s an agent retry loop. You’re not debugging “one failed build” anymore — you’re looking at a probability distribution over many attempts. That’s how layer 1 shows up: queues stuffed with multi-round jobs, Xcode times swinging on cold starts, flakiness from shifting code-and-environment combos.

One useful framing (concrete, not philosophical): classic CI assumes a deterministic path — same commit, same environment, same outcome. After agent loops, the path is probabilistic — which attempt turns green, how many commits landed in between, whether Archive fired by accident. You don’t need a new platform tomorrow; this explains why “just add caching” doesn’t close every gap.
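A toy model makes the probabilistic framing concrete. It assumes every attempt passes independently with the same probability, which real agent loops violate (each round changes the code), so treat it as intuition rather than a forecast:

```python
def p_green_within(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries passes."""
    return 1 - (1 - p) ** attempts

# Hypothetical per-attempt pass rate of 50%.
for n in (1, 2, 3, 4):
    print(n, round(p_green_within(0.5, n), 4))
```

Under this model, enough retries will turn almost anything green eventually, which is why a green check alone stops being strong evidence.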

Dimension	Traditional CI (single pass)	Loopified CI (agent retry loop)
Trigger count	Human pushes, few runs	Auto re-runs on failure, ×N
Code revision	PR head stays fixed	Commits change through the loop
Failure meaning	Current code is wrong	Maybe “not enough tries yet”
Cost	minutes × rate	minutes + tokens + cold starts

GitHub keeps improving workflows, runners, and caching — all aimed at single deterministic jobs. When a repo already runs agent loops by default, those wins help but don’t cancel the multiplier of job count × retry rounds. That’s what “GitHub hasn’t noticed” points at: product story still says CI/CD; usage has drifted toward an auto-fix loop engine.
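The cost row of the table can be turned into a rough comparison. All rates and durations below are invented placeholders, and the model assumes every round pays a full cold start, which is the worst case:

```python
def single_pass_cost(minutes: float, rate: float) -> float:
    """Traditional CI: one run, billed per minute."""
    return minutes * rate

def loop_cost(rounds: int, minutes: float, rate: float,
              cold_start_minutes: float, token_cost_per_round: float) -> float:
    """Each round pays for the build, its cold start, and the agent's tokens."""
    per_round = (minutes + cold_start_minutes) * rate + token_cost_per_round
    return rounds * per_round

# Hypothetical numbers: 12-minute build, 4-minute cold start, 4 rounds.
print(single_pass_cost(12, 0.25))
print(loop_cost(rounds=4, minutes=12, rate=0.25,
                cold_start_minutes=4, token_cost_per_round=0.5))
```

Faster single jobs shrink `minutes`, but the `rounds` multiplier stays untouched, which is the gap the paragraph above describes.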

3 · Layer 3: Structure shift — from pipeline to retry loop + execution substrate

Align layer 1 symptoms with layer 2 cause and the structural change is one sentence: CI went from a linear pipeline to a feedback-loop execution system.

Fig. 1 · Loopified CI: agent decisions + runner execution + feedback loop

Developer intention: tests · lint · release boundaries
AI Agent: read logs · patch · re-trigger
Execution substrate: Runner / Cloud Mac · compile · sign
Retry loop: fail → fix → re-run

Old model: Code → Build → Test → Result — one failure stops the line. New model: Code → Agent → Modify → Execute → Retry → … → Result — failure is part of the loop, not the end. The green check on GitHub still feels good; the meaning shifted — maybe attempt four went green, and the commit in between isn’t what you reviewed.

Here’s a term that’s engineering, not hype: execution substrate — agents can rewrite code freely, but compile, sign, and upload must land on a stable, reproducible macOS surface. Serverless jobs are too short to carry state; local Macs sleep and upgrade Xcode; hosted runners queue and drift. Cloud Mac fills that layer: always on, toolchain pinned, environment snapshot-able — not “remote desktop,” but the one layer in a retry loop that should stay as deterministic as possible.

Control moves too: humans used to write workflows and machines obeyed; now humans set boundaries (what enters a PR, what may Archive, max retry rounds) while agents explore inside those rails and humans sign off on the outcome. macOS/iOS teams feel this hardest — signing and Archive shouldn’t spin forever inside a retry ring, or green no longer means shippable.

Core takeaway (layer 3)

CI didn’t vanish — it became a loop execution space. Agents decide what to change and how many tries; runners / Cloud Mac decide where it runs and whether the environment drifts. Split those layers and queue pain, Xcode variance, and flakiness start to make sense together.

4 · Layer 4: Stabilize the surface with Cloud Mac (actionable now)

You don’t need GitHub to rewrite its narrative or rip out Actions. For loopified CI, the four moves we see validated most often on VPSSpark — and where Cloud Mac beats hosted runners — are below.

4.1 Split pools: limit retry blast radius

Run L0/L1 on PRs only (analyze, unit tests, simulator builds); keep Archive off PR paths. Ship IPA and notarization on main/tags in an isolated pool. However wild the agent loop gets, a 35-minute Archive won’t clog the fast pool and queue everyone. Hard rules and examples: CI hard rules; Flutter dual-pool topology: 2 Cloud Macs — fast vs archive.
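The routing rule is simple enough to write down. In this sketch the `macos-archive` label is an assumed name for the isolated pool, and the function is an illustration of the policy, not how any runner actually dispatches jobs:

```python
def pick_pool(event: str, ref: str) -> str:
    """Route a workflow trigger to a runner pool label."""
    if event == "pull_request":
        return "macos-fast"      # analyze, unit tests, simulator builds
    if ref == "refs/heads/main" or ref.startswith("refs/tags/"):
        return "macos-archive"   # Archive, IPA, notarization
    return "macos-fast"          # everything else stays on the fast pool

print(pick_pool("pull_request", "refs/pull/42/merge"))
print(pick_pool("push", "refs/tags/v1.2.0"))
```

The point of the PR check coming first is that no PR-triggered run can reach the archive pool, however many times an agent pushes.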

4.2 Warm environment: don’t cold-start every retry

Persist PUB_CACHE, DerivedData, and Pods download caches; keep runners booted and online 24/7. The agent’s second retry shouldn’t spend 15 minutes on pod install again — or Xcode variance gets misread as “the project slowed down.” Cloud Mac buys environment retention, not just CPU minutes per job.
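A pre-build check can report which caches would start cold. The paths below are the usual default locations on macOS; they are assumptions about your setup and will differ if you have relocated any of them:

```python
import os
from pathlib import Path

def cache_dirs(home: Path, env: dict) -> dict[str, Path]:
    """Default cache locations; adjust if your setup relocates them."""
    return {
        "pub": Path(env.get("PUB_CACHE", home / ".pub-cache")),
        "derived_data": home / "Library/Developer/Xcode/DerivedData",
        "cocoapods": home / "Library/Caches/CocoaPods",
    }

def cold_caches(home: Path, env: dict) -> list[str]:
    """Names of caches that are missing or empty, i.e. would start cold."""
    cold = []
    for name, path in cache_dirs(home, env).items():
        if not path.is_dir() or not any(path.iterdir()):
            cold.append(name)
    return cold

print(cold_caches(Path.home(), dict(os.environ)))
```

Logging this list at the top of every run makes it easy to tell a slow build from a cold one when reading timings later.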

4.3 macOS execution substrate: pin the toolchain

Pin Flutter/Xcode majors via image or fvm; physically isolate Archive machines from the fast pool — no shared DerivedData. Every retry round should run on the same Distribution cert and CLT set — that’s what makes iOS CI auditable. Self-hosted boundaries: GitHub docs on self-hosted runners.
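A pin is only useful if something checks it. The sketch below parses the text printed by `xcodebuild -version` and compares it to a pinned version; the sample string and the pinned values are illustrative, and how you fail the job on a mismatch is left to your workflow:

```python
import re

def parse_xcode_version(output: str) -> tuple[int, ...]:
    """Extract the version tuple from `xcodebuild -version` output."""
    match = re.search(r"^Xcode (\d+(?:\.\d+)*)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no Xcode version line found")
    return tuple(int(part) for part in match.group(1).split("."))

def matches_pin(output: str, pinned: tuple[int, ...]) -> bool:
    # Compare only as many components as the pin specifies,
    # so (16,) pins the major and (16, 2) pins major and minor.
    return parse_xcode_version(output)[:len(pinned)] == pinned

# Sample text in the shape the command prints.
sample = "Xcode 16.2\nBuild version 16C5032a\n"
print(matches_pin(sample, (16,)), matches_pin(sample, (16, 1)))
```

Running the same check on laptops and runners is a cheap way to catch the image-versus-laptop mismatch before it shows up as timing noise.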

4.4 Retry isolation: cap the loop

Use concurrency in workflows to cancel stale runs on the same PR; set separate timeouts and max push counts for agent triggers; restrict signing machines to the release pool. Syntax reference: workflow syntax docs. Goal isn’t to kill agents — it’s to keep the loop inside guardrails.
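The two guardrails can be modeled in a few lines. The first function mirrors roughly what a concurrency group with cancel-in-progress enabled achieves (only the newest run per group survives); the second is a plain cap on agent pushes. Field names and the limit are hypothetical:

```python
def stale_runs(runs: list[dict]) -> list[int]:
    """IDs of runs superseded by a newer run in the same concurrency group."""
    latest = {}
    for run in runs:  # assumed ordered oldest to newest
        latest[run["group"]] = run["id"]
    return [run["id"] for run in runs if latest[run["group"]] != run["id"]]

def agent_may_push(pushes_so_far: int, max_pushes: int) -> bool:
    # Hard cap on agent-triggered pushes per PR.
    return pushes_so_far < max_pushes

# Hypothetical queue: three runs for PR 101, one for PR 102.
queue = [
    {"id": 1, "group": "pr-101"},
    {"id": 2, "group": "pr-101"},
    {"id": 3, "group": "pr-102"},
    {"id": 4, "group": "pr-101"},
]
print(stale_runs(queue), agent_may_push(3, max_pushes=5))
```

Cancelling stale runs frees runner slots; the push cap bounds how many new runs the loop can create in the first place. You likely want both.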

PoC suggestion (1–2 days)

Start with one Cloud Mac as a macos-fast warm pool; leave Archive on hosted runners for now. Once it is running, watch P95 queue time and Xcode wall-time variance for three days. Add a second machine for Archive once the fast pool is stable. This is the same approach as our “start with two” series, with the motivation extended from queue pain to agent retries.
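For the queue half of that measurement, a nearest-rank P95 is enough for a PoC. The sample values are made up, and other percentile definitions will give slightly different numbers on small samples:

```python
def p95(values: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical queue times in minutes collected during the PoC.
queue_minutes = [2, 3, 3, 4, 5, 5, 6, 8, 12, 40]
print(p95(queue_minutes))
```

Compare the figure before and after moving PR jobs to the warm pool, using the same job set on both sides.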

5 · FAQ

Is “CI is dead” just clickbait?

The title is loud, but the pattern is real: your pipeline may stay green and PRs still merge, while the same code and checks no longer produce a stable, predictable path or outcome. “CI is dead” doesn’t mean GitHub Actions is unusable — it means the classic assumption that build and verify are a deterministic process breaks when agents keep changing code and re-firing workflows. A fairer line: CI is still here; the semantics moved from continuous integration toward continuous attempt.

Will GitHub change the product for this?

Yes, gradually — probably slower than usage is shifting. You’ll still see workflow tweaks, runner capacity, cache improvements — all serving traditional CI. Agent-oriented pieces (sandboxing, call audit, token billing) will likely land incrementally rather than as an overnight narrative rewrite. Teams don’t have to wait for official definitions: split PR/Archive pools, cap retries, lock signing machines on today’s Actions — that’s the low-cost guardrail for the agent era.

How is Cloud Mac different from hosted macOS runners?

Short version: hosted runners rent execution time for one job; Cloud Mac buys a long-lived stable environment. Occasional releases, tolerable queue, short jobs — hosted macos-latest is often enough. Agent loops or high-frequency iOS CI need Xcode, certs, and caches resident, fast and Archive pools physically separated, and environments that don’t sleep or silently upgrade. Cloud Mac pays off there — especially for signing, Archive, and notarization, where every retry round should hit the same key and toolchain, not a cold-start lottery.

How does this relate to OpenClaw / local agents?

No conflict — different layers. OpenClaw, Cursor Agent, local Copilot handle what to change and how to orchestrate tasks (gateway / scheduling). Runners / Cloud Mac handle where and in what environment builds run — compile, test, sign, upload. Agents can patch locally or loop in CI; either way, macOS builds eventually need a reproducible execution surface. Design orchestration and execution separately, or you get “smart agent, unfamiliar CI environment every time.”