Tech leads, DevOps owners, and platform leads who must defend CI queue SLOs keep returning to the same three questions while scaling: dedicated nodes versus shared rotation, when to add burst capacity, and how long p95 wait must stay elevated before it signals a real capacity deficit. The problem appears when a Mac Mesh links remote Macs but the team lacks a shared vocabulary for isolation, idle cost, and queue observability. The remedy proposed here is three pool boundaries, 13-week rolling SLOs, and a symptom decision matrix, so that adding machines becomes auditable instead of intuitive. You get a hidden-tax breakdown, three-pool table, SLO metrics, six-step runbook, hard thresholds, and sizing matrix. Cross-read the articles on seat locks and mutex, Merge Queue routing, buy-vs-rent TCO, shared build pool topology, artifact fan-out, and private mesh access; order nodes via the order page or help center.
Linking remote Macs into a mesh does not automatically yield contract-grade CI capacity. Five recurring hidden taxes slow delivery by more than an extra runner can win back.
Measuring success by machine-hours: counting uptime while ignoring successful builds per month and queue p95—so dedicated nodes sit idle yet look “sufficient.”
No isolation SLO on shared pools: DerivedData, keychains, and login sessions bleed across tenants as noisy neighbors instead of traceable misconfigurations.
Burst without caps: elastic peaks become unauditable month-end surprises, and sharing labels with Merge Queue amplifies starvation.
Label mismatch masquerading as shortage: deep queues with runner CPU under 40% usually mean job→runner affinity errors, not a true capacity deficit.
Cross-region RTT plus seat hoarding: network-heavy steps retry more above ~150ms RTT while seats stay booked without entering the SLO denominator.
Deliverables: three-pool dictionary, 13-week wait/complete dashboards, shared-pool isolation counters, and a one-page burst preemption policy. Skip any of these and “scale the mesh” should not be an OKR.
Next: a table aligning Dedicated, Shared, and Burst by lease semantics, billing unit, and interruptibility.
These pools are not marketing labels—they are lease semantics, billing units, and interruptibility combined. Print the matrix and pick one default for the quarter.
| Pool | Lease & isolation | Cost profile | Best for | Main risk |
|---|---|---|---|---|
| Dedicated | Single-tenant lease; best cache locality | High idle cost; predictable bills | Release trains, signing hosts, compliance | Feels like CapEx when underutilized |
| Shared rotation | Time-sliced multiplex; needs seat locks | Often lowest cost per successful build/month | Daily PRs; default for small teams | Noisy neighbors |
| Burst | Preemptible; short lease | Peak delay traded for marginal cost | Timezone batches, release weeks | Runaway bills without caps |
Bottom line: every job class must state whether it is interruptible and how many weeks of cache locality it needs. A job class that cannot answer both should not enter shared rotation.
Section three aligns queue SLOs with the symptom matrix so label mismatch is not mistaken for shortage.
Minimum metric set (13-week rolling): Wait SLO (enqueue→assign p50/p95/p99), Complete SLO (standard job wall time), Isolation SLO (shared-pool failures from neighbors).
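As a rough illustration of the Wait SLO, the sketch below computes p50/p95/p99 of enqueue→assign wait over a 13-week window. The `(enqueued_at, assigned_at)` tuple layout, the function names, and the nearest-rank percentile method are assumptions made for this example; your CI system's export format and preferred percentile definition may differ.

```python
import math
from datetime import datetime, timedelta


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def wait_slo(jobs, now, window=timedelta(weeks=13)):
    """Return wait percentiles in minutes for jobs enqueued inside the window.

    jobs: iterable of (enqueued_at, assigned_at) datetime pairs (hypothetical layout).
    """
    cutoff = now - window
    waits = [
        (assigned - enqueued).total_seconds() / 60
        for enqueued, assigned in jobs
        if enqueued >= cutoff
    ]
    return {f"p{p}": percentile(waits, p) for p in (50, 95, 99)}
```

Segmenting the input by workflow before calling `wait_slo` gives the per-workflow p95 baseline that the runbook asks for.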
| Symptom | Runner CPU | Likely cause | First action |
|---|---|---|---|
| p95 wait >15 min sustained | >78% | Real capacity deficit | Add Dedicated or split pool |
| High wait, peaks only | <40% | Label mismatch | Audit job→runner affinity |
| Queue oscillates hourly | 55–70% | Timezone batches | Time-shift jobs or burst pre-book |
| Disk latency alerts | any | DerivedData churn | Cache key generation |
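The matrix above can be encoded as a small lookup so that triage is repeatable. This is a minimal sketch, not a definitive classifier: the function name, the boolean flags, the row precedence (top to bottom), and the reuse of the 15-minute figure as the "high wait" cutoff for the label-mismatch row are all assumptions.

```python
HIGH_WAIT_MIN = 15  # assumed cutoff for "high wait", taken from the first row


def diagnose(p95_wait_min, runner_cpu_pct,
             oscillates_hourly=False, disk_latency_alert=False):
    """Map observed symptoms to (likely cause, first action), or None if no row matches."""
    high_wait = p95_wait_min > HIGH_WAIT_MIN
    if high_wait and runner_cpu_pct > 78:
        return ("Real capacity deficit", "Add Dedicated or split pool")
    if high_wait and runner_cpu_pct < 40:
        return ("Label mismatch", "Audit job-to-runner affinity")
    if oscillates_hourly and 55 <= runner_cpu_pct <= 70:
        return ("Timezone batches", "Time-shift jobs or burst pre-book")
    if disk_latency_alert:
        return ("DerivedData churn", "Cache key generation")
    return None  # no row matches; gather more data before adding machines
```

Returning `None` for unmatched inputs is deliberate: a queue that fits no row should trigger investigation, not a purchase.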
After aligning seat locks, you can split wait into real queueing versus lock starvation.
Freeze the three-pool dictionary: document lease, billing, and interruptibility.
Export a 13-week baseline: segment p95 by workflow.
Bind runner labels: split heavy Xcode from light lint.
Write burst preemption: bill cap plus interruptible job allowlist.
Private mesh and artifacts: see private mesh topology.
Review preemption: choose Dedicated or continue burst.
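The burst preemption step (bill cap plus interruptible job allowlist) could be expressed as an admission check like the one below. The `BurstPolicy` type, its field names, and the idea of checking estimated cost against a monthly cap are hypothetical choices for illustration; the actual policy lives in whatever one-page document your team signs off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    monthly_bill_cap: float          # hard spending ceiling for the burst pool
    interruptible_jobs: frozenset    # job classes allowed to be preempted


def admit_to_burst(policy, job_class, spend_so_far, estimated_cost):
    """Admit a job to the burst pool only if it is allowlisted and fits under the cap."""
    if job_class not in policy.interruptible_jobs:
        return False  # non-interruptible work never lands on preemptible nodes
    return spend_so_far + estimated_cost <= policy.monthly_bill_cap


policy = BurstPolicy(monthly_bill_cap=500.0,
                     interruptible_jobs=frozenset({"lint", "unit-tests"}))
```

With this example policy, `admit_to_burst(policy, "release-signing", 0.0, 1.0)` is rejected regardless of remaining budget, which keeps signing hosts off preemptible capacity.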
wait_p95_business_hours_minutes
complete_p95_release_train_minutes
shared_pool_neighbor_fail_rate
burst_preempt_count / burst_successful_builds
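The last metric is a ratio, so it needs a guard for periods with no successful burst builds. A minimal sketch follows, assuming this ratio is what the sizing matrix means by "preemption rate" and that its 20% figure is the alert threshold; both function names are hypothetical.

```python
def burst_preemption_rate(burst_preempt_count, burst_successful_builds):
    """Return burst_preempt_count / burst_successful_builds, or None when undefined."""
    if burst_successful_builds == 0:
        return None  # no successful builds: the ratio is undefined, not zero
    return burst_preempt_count / burst_successful_builds


def preemption_breached(burst_preempt_count, burst_successful_builds, threshold=0.20):
    """True when the rate exceeds the threshold; preemptions with no successes also count."""
    rate = burst_preemption_rate(burst_preempt_count, burst_successful_builds)
    if rate is None:
        return burst_preempt_count > 0
    return rate > threshold
```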
| Size × volatility | Default pool | Burst role | Upgrade signal |
|---|---|---|---|
| Small team · low volatility | Shared | Optional | 13-week p95 breach |
| Small team · high volatility | Shared + Burst | Release-week overflow | Preemption rate >20% |
| Platform · multi-region | Dedicated + Shared | Interruptible jobs only | Isolation SLO breach |
Once pools and SLOs live in repo assets, laptops doubling as CI hosts and machines shared by verbal agreement rarely survive audits. For teams that need iOS CI and seat isolation on contract-grade cloud Mac Mini capacity, VpsMesh Mac Mini cloud rental is usually the better fit. See the pricing page, help center, and order page.
Most 5–15 person teams start on Shared with seat caps and lock TTL; move to Dedicated for release trains. See the seat-lock article.
Not if preemption caps and billing rules are in the change ticket; burst only absorbs interruptible overflow.
When p95 wait exceeds the threshold across the 13-week window and runner CPU stays above ~78%, or when the isolation SLO is breached, add dedicated nodes. See the pricing page.
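That rule can be sketched as a check over weekly samples. Several details here are assumptions: "for 13 weeks" is read as every one of the last 13 weeks breaching, the 15-minute p95 default is borrowed from the symptom matrix, and the `(p95_wait_min, avg_cpu_pct)` sample layout and function name are hypothetical.

```python
def should_add_dedicated(weekly, isolation_slo_breached=False,
                         p95_threshold_min=15.0, cpu_threshold_pct=78.0, weeks=13):
    """Decide whether to add dedicated nodes.

    weekly: list of (p95_wait_min, avg_cpu_pct) tuples, oldest first.
    """
    if isolation_slo_breached:
        return True  # an isolation breach is sufficient on its own
    recent = weekly[-weeks:]
    if len(recent) < weeks:
        return False  # not enough history to call the deficit sustained
    return all(p95 > p95_threshold_min and cpu > cpu_threshold_pct
               for p95, cpu in recent)
```

Requiring both conditions in every week keeps a low-CPU, high-wait queue (the label-mismatch case) from triggering a purchase.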