Q: Should shared-pool DerivedData be isolated per tenant directory or per workspace hash?

Default to workspace-hash buckets bound to seat leases, and trigger LRU eviction on a bucket when its lease ends. For parallel branches, see the worktree isolation article. Do not let the global ~/Library/Developer/Xcode/DerivedData grow without bounds.

Q: After local cleanup, do we still need golden-image drift checks?

Yes. Disk cleanup only reclaims runtime garbage; it does not replace snapshot drift checklists. Compare golden-image checksums weekly. Onboarding steps and plans are covered in the help center and on the pricing page.

2026 Mac Mesh Shared Build Pool Disk Waterline: DerivedData Cleanup and Three-Layer Cache Runbook

Q: What happens to jobs when the waterline hits the 92% hard stop?

Runners should fail fast and report disk_waterline_hard_stop to avoid half-written artifacts; schedulers then route jobs to nodes with headroom or trigger Burst. Seat and lock TTL semantics are covered in the seat-lock article.
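A minimal sketch of the fail-fast preflight described in this answer, assuming the 82/92 thresholds quoted in this article. The function names and the exit code 75 are hypothetical, not part of any Mac Mesh tooling.

```python
import shutil
import sys

WARN_PCT = 82.0  # waterline_warn_threshold quoted in the article
HARD_PCT = 92.0  # waterline_hard_threshold quoted in the article


def used_pct(total_bytes: int, free_bytes: int) -> float:
    """Percent of the volume that is not available for new writes."""
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def classify(pct: float) -> str:
    """Map a used percentage to 'ok', 'warn', or 'hard_stop'."""
    if pct >= HARD_PCT:
        return "hard_stop"
    if pct >= WARN_PCT:
        return "warn"
    return "ok"


def preflight(mount: str = "/") -> int:
    """Return 0 if the job may start, or a non-zero code to refuse it."""
    usage = shutil.disk_usage(mount)
    pct = used_pct(usage.total, usage.free)
    if classify(pct) == "hard_stop":
        # Report the reason before any artifact is written.
        print(f"disk_waterline_hard_stop=1 mount={mount} used_pct={pct:.1f}",
              file=sys.stderr)
        return 75  # hypothetical exit code the scheduler treats as "reroute"
    return 0
```

A runner wrapper could call `sys.exit(preflight("/"))` before checkout, so the job is refused before it writes anything.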

01

Five hidden taxes before a shared build pool disk fills up

In 2026 Mac Mesh tickets, disk issues are rarely "we needed 100GB more." More often, tenant rotation, cache locality, and artifact lifecycles share no common contract, so APFS looks fine while Xcode fails to write temp files.

01. Unbounded DerivedData sharing: multiple repos share ~/Library/Developer/Xcode/DerivedData, so indexes and module caches interleave by branch. One clean deletes a neighbor's ModuleCache, which shows up as random link failures rather than as a full disk.
02. CocoaPods / Gradle global caches with no TTL: ~/Library/Caches/CocoaPods and ~/.gradle/caches only grow. Old tarballs stay after Pod upgrades, and multi-branch worktree parallelism amplifies contention.
03. Artifacts "uploaded but still local": the upload to object storage succeeded, but $CI_ARTIFACTS_DIR has no retention policy and no hook is bound to rsync completion, so IPA and dSYM files slowly eat the disk.
04. APFS snapshots vs "available" space: local snapshots make df look healthy while real writable space runs out at compile peaks, and there are no per-volume, per-layer waterline_used_pct metrics to show it.
05. Cleanup vs seat-lock races: sweeping directories before the lease is released, or in conflict with the seat-lock TTL, causes "disk cleared but build red" secondary incidents.

Deliverables: a three-layer directory dictionary, warn/hard dual waterlines, LRU on lease end, and weekly golden-image drift checks kept separate from cleanup. If any of these is missing, do not promise "any monorepo can run in parallel" on a shared pool. The next section compares three cleanup philosophies so you can avoid "everyone SSHs in Friday night and runs rm -rf."

02

Table: manual sweeps vs waterline daemon vs golden-image reset

Disk governance is not "clean harder." Balance build hit rate, auditable cleanup, and tenant isolation. Pin this table in change review: each layer (L1/L2/L3) gets one default strategy only.

Strategy	L1 DerivedData	L2 Pods/Gradle	L3 Artifacts	Best for	Main risk
Manual cron	Weekend rm of the global dir	Occasional pod cache prune	find by age	Tiny teams, low parallelism	Neighbor deletes, no audit trail
Waterline daemon	LRU per workspace hash	Evict on capacity	Delete within 48h of rsync success	Shared pool default	Needs metrics and a lock contract
Image reset	Cleared by snapshot rollback	Refreshed with the image	Volume replaced	Uncontrolled drift, compliance snapshots	Cold-start compile slowdown

Bottom line: shared pools should default to the waterline daemon. Keep image reset as a quarterly fallback paired with the golden-image drift checklist, not as a replacement for daily LRU.

When Dedicated pools and Shared rotation coexist, L1 cache keys must carry a pool-type tag or shared-pool sweeps will evict dedicated-node locality.

Three-layer directory layout (attach to runbook)

L1: /var/mesh/cache/deriveddata/{workspace_hash}, bound per job through Xcode's DERIVED_DATA_DIR (with xcodebuild, the -derivedDataPath flag sets this location).
L2: /var/mesh/cache/cocoapods and /var/mesh/cache/gradle. Do not write back to the global caches in the user's home directory.
L3: /var/mesh/artifacts/{job_id}. After upload, keep only the checksum sidecar files.

Monitoring can then report layer_*_bytes per tier instead of a vague "/ partition 85%."
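One way to derive the {workspace_hash} bucket is sketched below. The article does not define the hash inputs, so the choice of repository URL, workspace path, and pool type (the last one to satisfy the pool-type tag requirement) is an assumption, as are the function names.

```python
import hashlib
from pathlib import PurePosixPath

L1_ROOT = PurePosixPath("/var/mesh/cache/deriveddata")  # L1 root from the article


def workspace_hash(repo_url: str, workspace_path: str, pool_type: str) -> str:
    """Stable 16-hex-char key. pool_type is part of the key (an assumption)
    so that shared and dedicated pools never land in the same bucket."""
    material = "\n".join([pool_type, repo_url, workspace_path]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


def derived_data_bucket(repo_url: str, workspace_path: str,
                        pool_type: str) -> PurePosixPath:
    """Full L1 path for one workspace on one pool type."""
    return L1_ROOT / workspace_hash(repo_url, workspace_path, pool_type)
```

The resulting path can be handed to the build step, for example through `xcodebuild -derivedDataPath`, so that no job ever writes to the global DerivedData directory.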

03

Six-step runbook: from waterline script to three-layer auto reclaim

These six steps assume runners carry Mac Mesh labels and that seats are acquired before the job and released after it. Do not reorder the steps: waterlines without metrics are blind deletes.

01. Freeze the three-layer dictionary and paths: write the L1/L2/L3 roots and the warn (82%) / hard (92%) thresholds into mesh-disk-policy.yaml in the repo, and register the default mount points in the image checklist.
02. Deploy the disk-waterline probe: every 60s, collect volume usage and per-layer bytes and export them to Prometheus/OpenTelemetry. At the hard threshold, runners enter drain and fail fast on new jobs.
03. Isolate DerivedData: CI injects DERIVED_DATA_DIR pointing at the workspace-hash bucket, and lease end triggers LRU eviction on that bucket. Never sweep the global DerivedData directory.
04. Evict the L2 dependency caches: wrap pod cache clean so that it is driven by capacity, not by time; point GRADLE_USER_HOME at the mesh directories and cap max-cache-size.
05. Bind artifacts to rsync hooks: the object-storage multipart-complete callback deletes the local L3 copy, and failed retries are kept for 7 days. Field names are aligned with the artifact runbook.
06. Run the weekly check and drill: compare golden-image checksums, simulate job rejection at the 90% waterline, and log the cleanup audit. When coordinating Burst overflow, clear L3 before accepting interruptible jobs.
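Step 05 can be sketched as a pure decision function. The record type and field names are hypothetical. The 48-hour figure is used here as the latest allowed deletion time after a successful upload; a completion callback may delete sooner.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600
SUCCESS_RETENTION_S = 48 * HOUR     # upper bound after a successful upload
FAILED_RETENTION_S = 7 * 24 * HOUR  # failed uploads are kept for 7 days


@dataclass(frozen=True)
class ArtifactDir:
    job_id: str
    created_ts: int               # when the job wrote /var/mesh/artifacts/{job_id}
    uploaded_ts: Optional[int]    # None while the upload has not succeeded


def l3_action(artifact: ArtifactDir, now: int) -> str:
    """Return 'keep', 'delete', or 'alert' for one L3 job directory."""
    if artifact.uploaded_ts is not None:
        age = now - artifact.uploaded_ts
        return "delete" if age >= SUCCESS_RETENTION_S else "keep"
    # Upload never succeeded: keep for 7 days, then alert instead of
    # deleting, so someone verifies the objects exist in object storage.
    age = now - artifact.created_ts
    return "alert" if age >= FAILED_RETENTION_S else "keep"
```

Keeping the decision separate from the actual deletion makes it easy to log every outcome as a cleanup audit record before anything is removed.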

Minimum disk-waterline probe fields

hostname
pool_type
volume_mount
waterline_used_pct
waterline_warn_threshold
waterline_hard_threshold
layer_l1_deriveddata_bytes
layer_l2_cocoapods_bytes
layer_l2_gradle_bytes
layer_l3_artifacts_bytes
seat_lease_id
last_cleanup_ts_unix
cleanup_evicted_bytes_1h
disk_waterline_hard_stop
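A minimal sketch of a probe that emits these fields as one record, using only the Python standard library. The directory roots come from the layout in this article; the function names, the default arguments, and the use of logical file sizes (not allocated blocks) are assumptions.

```python
import os
import shutil
import socket

DEFAULT_ROOTS = {
    "layer_l1_deriveddata_bytes": "/var/mesh/cache/deriveddata",
    "layer_l2_cocoapods_bytes": "/var/mesh/cache/cocoapods",
    "layer_l2_gradle_bytes": "/var/mesh/cache/gradle",
    "layer_l3_artifacts_bytes": "/var/mesh/artifacts",
}


def dir_bytes(root: str) -> int:
    """Sum logical file sizes under root; a missing directory counts as 0."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished mid-walk, normal on a busy runner
    return total


def probe_sample(mount="/", pool_type="shared", seat_lease_id="",
                 warn=82, hard=92, last_cleanup_ts_unix=0,
                 cleanup_evicted_bytes_1h=0, roots=None) -> dict:
    """Build one record with the minimum probe fields."""
    usage = shutil.disk_usage(mount)
    pct = round(100.0 * (usage.total - usage.free) / usage.total, 1)
    sample = {
        "hostname": socket.gethostname(),
        "pool_type": pool_type,
        "volume_mount": mount,
        "waterline_used_pct": pct,
        "waterline_warn_threshold": warn,
        "waterline_hard_threshold": hard,
        "seat_lease_id": seat_lease_id,
        "last_cleanup_ts_unix": last_cleanup_ts_unix,
        "cleanup_evicted_bytes_1h": cleanup_evicted_bytes_1h,
        "disk_waterline_hard_stop": 1 if pct >= hard else 0,
    }
    for field, root in (roots or DEFAULT_ROOTS).items():
        sample[field] = dir_bytes(root)
    return sample
```

Walking large cache trees every 60 seconds can itself be expensive, so a real probe might sample the per-layer sizes less often than the volume percentage.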

Note: Put probe output on the first row of your Grafana board instead of relying only on OS alerts. Plot cleanup_evicted_bytes_1h alongside the successful-build count to tell real cleanup from "fewer builds so disk looks better."

04

Symptom matrix: triage by layer or by pool first

Disk alerts often overlap queue SLO symptoms. Use the table to see whether the issue is capacity, cache keys, or artifact pile-up before choosing sweep scope.

Symptom	layer_* dominant	Likely root cause	First action
Only Xcode step fails	L1 high	DerivedData cross-talk or index corruption	Clear bucket by workspace hash
Mixed Android/iOS pool slow	L2 high	Pods/Gradle never evicted	Tighten L2 capacity cap
Upload OK, disk still full	L3 high	rsync hook not bound	Add object-storage callback
df OK, writes fail	snapshots	APFS local snapshots	Reduce snapshot retention + probe
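The matrix can be encoded as a small lookup so that an alert handler suggests the same first action every time. This is only a sketch: the keys `l1`, `l2`, `l3` and the function name are hypothetical, and the action strings are copied from the table.

```python
from typing import Dict

FIRST_ACTION = {
    "l1": "Clear bucket by workspace hash",
    "l2": "Tighten L2 capacity cap",
    "l3": "Add object-storage callback",
    "snapshots": "Reduce snapshot retention + probe",
}


def triage(layer_bytes: Dict[str, int],
           df_ok_but_writes_fail: bool = False) -> str:
    """Pick the first action from the symptom matrix.

    layer_bytes maps 'l1', 'l2', 'l3' to their current sizes and must
    not be empty. The snapshot row is checked first because it is not
    tied to any layer being dominant.
    """
    if df_ok_but_writes_fail:
        return FIRST_ACTION["snapshots"]
    dominant = max(layer_bytes, key=layer_bytes.get)
    return FIRST_ACTION[dominant]
```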

Warning: Do not run volume-level rm -rf while a seat lock is held. Cleanup scripts must confirm that seat_lease_id is empty or that the lease has expired; otherwise they can delete an in-flight ModuleCache.
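The guard in this warning reduces to one predicate. A minimal sketch, assuming the cleanup script can read the lease ID and its expiry timestamp; the parameter `lease_expires_ts` is hypothetical and does not appear in the probe field list.

```python
from typing import Optional


def may_sweep(seat_lease_id: str, lease_expires_ts: Optional[int],
              now: int) -> bool:
    """True only when no seat is held or the held lease has expired."""
    if not seat_lease_id:
        return True  # no lease on this node, safe to sweep
    if lease_expires_ts is None:
        return False  # lease present but expiry unknown: assume in flight
    return now >= lease_expires_ts
```

Treating an unknown expiry as "still held" is the conservative choice: a skipped sweep costs disk space for one more cycle, while a wrong sweep breaks a running build.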

If L1 refills within 24 hours of a bucket clear, check first whether missing worktree isolation is creating multiple full DerivedData trees on one node. Do that before buying more disk.

05

Three hard thresholds and quotable ops parameters

These values are field compromises from multiple 16GB/24GB shared pools. Attach them to change tickets as external SLO annexes; Dedicated pools may lower warn by 5 points for a more stable index hot cache.

Dual waterline: waterline_warn_threshold=82 triggers L3→L2→L1 evict order; waterline_hard_threshold=92 rejects new jobs and sets disk_waterline_hard_stop=1.
L1 max residency: shared pool per workspace bucket 14 days or 32GB, whichever hits first; Dedicated may use 28 days with a dedicated tag.
L3 local retention: delete within 48 hours after rsync/upload success; failed queue keeps 7 days, then alert and verify objects exist in object storage.
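The L1 residency rule can be sketched as a planner that only returns paths and leaves deletion to the caller. It assumes the limits apply to entries inside one workspace bucket (for example its top-level subdirectories) and that a last-used timestamp is available for each entry; both are interpretations, not something the article specifies.

```python
from dataclasses import dataclass
from typing import Iterable, List

DAY = 86400
GB = 1024 ** 3


@dataclass(frozen=True)
class Entry:
    path: str
    size_bytes: int
    last_used_ts: int


def plan_l1_eviction(entries: Iterable[Entry], now: int,
                     max_age_days: int = 14,
                     max_bytes: int = 32 * GB) -> List[str]:
    """Evict entries older than max_age_days, then the least recently
    used of the rest until the bucket fits under max_bytes."""
    evict: List[str] = []
    kept: List[Entry] = []
    for e in sorted(entries, key=lambda e: e.last_used_ts):  # oldest first
        if now - e.last_used_ts > max_age_days * DAY:
            evict.append(e.path)
        else:
            kept.append(e)
    total = sum(e.size_bytes for e in kept)
    for e in kept:  # still ordered oldest first
        if total <= max_bytes:
            break
        evict.append(e.path)
        total -= e.size_bytes
    return evict
```

For a Dedicated pool, the same function could be called with `max_age_days=28`, matching the longer residency quoted above.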

On 512GB system volumes with ~60% reserved for mesh, cap L2 at 80GB combined (a 40GB soft cap each for CocoaPods and Gradle) and cap each L3 per-job directory at 12GB, including dSYM.

A long-term plan of "weekend cron only" or "everyone SSH-deletes cache" usually lacks audit fields and seat contracts, so neighbor deletes, cold-start compiles, and half-written artifacts spike in release week. For teams that need iOS/Android CI and disk SLOs on contract-grade cloud Mac Mini capacity, VpsMesh Mac Mini cloud rental is usually the better fit. See the pricing page, help center, and order page.
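As a rough check of the 512GB budget quoted in this section, the arithmetic below estimates how many full 32GB L1 buckets fit beside the L2 and L3 caps. The concurrency of 4 jobs per node is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def mesh_budget_gb(volume_gb: float, mesh_fraction: float) -> float:
    """Space reserved for mesh, e.g. 512 * 0.6 = 307.2 GB."""
    return volume_gb * mesh_fraction


def max_l1_buckets(volume_gb: float, mesh_fraction: float,
                   l2_cap_gb: float, l3_per_job_gb: float,
                   concurrent_jobs: int, l1_bucket_gb: float) -> int:
    """Full L1 buckets that fit after L2 and worst-case L3 are reserved."""
    remaining = (mesh_budget_gb(volume_gb, mesh_fraction)
                 - l2_cap_gb
                 - l3_per_job_gb * concurrent_jobs)
    return max(0, math.floor(remaining / l1_bucket_gb))


# 307.2 - 80 (L2) - 4 * 12 (L3) = 179.2 GB, and 179.2 / 32 = 5.6
print(max_l1_buckets(512, 0.6, 80, 12, 4, 32))  # -> 5
```

Under these assumptions a node holds at most five workspace buckets at the full 32GB cap, which is a useful sanity check before promising parallel monorepo builds on one machine.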
