Self-hosted runners · workload identity · TTL and audit · decision matrix
For platform and mobile leads running CI across a mesh of remote Macs, failures rarely come from slow compilers. They come from long-lived PATs, deploy private keys, and cross-region copy scripts landing on every runner, while offboarding windows multiply the blast radius. This article breaks down three secret risk classes, compares OIDC workload identity with PATs and deploy keys, delivers a six-step runbook with exchange patterns, defines TTL and minimum audit fields, and closes with a hosting platform × compliance × outbound registry matrix. It cross-references shared build pools, observable task chains, and artifact and cache locality so identity boundaries and byte paths stay aligned.
SSH jump hosts, signing identities, and warm caches can all be green while incidents still trace back to a US-East disk image, a three-year-old PAT on a Singapore node, or one private key reused for staging and production. The root cause is that identity is still modeled around people instead of machine sessions scoped to pipelines and environments. That gap is tightly coupled to idempotency keys and the shared-pool mutex: without structured claims you can answer only who logged in, not which build consumed which audience.
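Structured claims are what let a log answer which build consumed which audience. As a minimal sketch, assuming bash and a `base64` that accepts `-d` (GNU coreutils does; older macOS builds may need `-D`), the hypothetical helper `jwt_payload` prints the decoded claims segment of an OIDC token. It only decodes and does not verify the signature, so treat its output as diagnostic input for logs, never as an authorization decision.

```shell
#!/usr/bin/env bash
# jwt_payload TOKEN
# Prints the decoded payload (claims) segment of a JWT. Decoding only: the
# signature is NOT verified, so never use this output to grant access.
jwt_payload() {
  local seg pad
  # A JWT is header.payload.signature; keep the middle segment.
  seg="$(printf '%s' "$1" | cut -d. -f2)"
  # base64url -> base64: map '_' to '/' and '-' to '+', then restore '=' padding.
  seg="$(printf '%s' "$seg" | tr '_-' '/+')"
  pad=$(( (4 - ${#seg} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    seg="${seg}="
    pad=$(( pad - 1 ))
  done
  printf '%s' "$seg" | base64 -d
}
```

Piping the result into a JSON tool, or simply printing it next to job_id in the build log, is enough to see repository, environment, and audience at a glance.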
Long-lived disk tax: organization PATs and kubeconfig files baked into image layers, plists, or dotfiles act as universal keys for any shell session; even with 0600 permissions, backups and forensic copies widen the set of readers.
Cross-region copy tax: rsyncing the same private material to three regions triples the exposure every time a clone leaves the fleet; once mixed with artifact sync plans, it becomes unclear which path actually leaked.
Rotation lag tax: when spreadsheets track who owns which secret, rotations get deferred whenever they collide with release trains, producing zombie credentials that everyone knows should die but nobody dares touch.
Environment bleed tax: when one runner serves both mainline gates and external contributions, multiple tokens stack up in its environment, and weak job isolation lets a staging audience ride into production publish steps.
Observability blind tax: build logs that omit token_issuer, subject, and ttl_remaining_sec cannot prove after the fact which trust chain minted a session.
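The observability gap closes with one structured line per session. A minimal sketch, assuming bash and that the caller already holds the token's issuer, subject, jti, and exp claims: the hypothetical `emit_audit_line` prints a JSON line carrying token_issuer, subject, jti, ttl_remaining_sec, job_id, and environment. It does no JSON escaping, so it assumes values without quotes or backslashes; a real pipeline should use a proper JSON encoder.

```shell
#!/usr/bin/env bash
# emit_audit_line ISSUER SUBJECT JTI EXP_EPOCH JOB_ID ENVIRONMENT
# Prints one JSON line; values are not escaped, so they must not contain '"' or '\'.
emit_audit_line() {
  local issuer="$1" subject="$2" jti="$3" exp="$4" job_id="$5" environment="$6"
  local ttl
  # Remaining lifetime from the token's exp claim (Unix seconds), floored at zero.
  ttl=$(( exp - $(date +%s) ))
  if [ "$ttl" -lt 0 ]; then
    ttl=0
  fi
  printf '{"token_issuer":"%s","subject":"%s","jti":"%s","ttl_remaining_sec":%d,"job_id":"%s","environment":"%s"}\n' \
    "$issuer" "$subject" "$jti" "$ttl" "$job_id" "$environment"
}
```

Indexing the jti field is what later lets a cloud audit trail be joined back to the pipeline run.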
Treat these five taxes as a preflight checklist before comparing OIDC with PATs, so you graduate from "works on my machine" to auditable mesh CI. Pair the checklist with the SSH versus VNC baselines to separate interactive assumptions from unattended refresh cadence.
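A preflight aimed at the long-lived disk tax can be a few lines in the runner's boot script. This sketch assumes bash; `find_plaintext_secrets` and `preflight_or_abort` are hypothetical names, and the patterns (kubeconfig files, `.git-credentials`, a `*.pat` naming convention, and GitHub-style `ghp_` / `github_pat_` token prefixes) are illustrative assumptions to adapt to your own fleet.

```shell
#!/usr/bin/env bash
# Print paths under the given directories that look like long-lived credentials.
find_plaintext_secrets() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    {
      # Filename patterns (assumed naming; extend for your fleet).
      find "$dir" -type f \( -name 'kubeconfig*' -o -path '*/.kube/config' \
        -o -name '.git-credentials' -o -name '*.pat' \) -print 2>/dev/null
      # Content patterns: GitHub classic and fine-grained PAT prefixes.
      grep -rlIE 'ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{20,}' "$dir" 2>/dev/null
    } | sort -u
  done
}

# Return non-zero (so the boot script can refuse registration) if anything matched.
preflight_or_abort() {
  local hits
  hits="$(find_plaintext_secrets "$@")"
  if [ -n "$hits" ]; then
    printf 'preflight: refusing registration, plaintext credentials found:\n%s\n' "$hits" >&2
    return 1
  fi
  echo "preflight: clean"
}
```

Called as `preflight_or_abort "$HOME" || exit 1` before the runner registers, it stops a tainted image from ever joining the pool; recursive content scans of large trees can be slow, so scope the directories deliberately.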
No option is universally best; each must match organization size, audit granularity, and outbound registry policy. OIDC binds sessions to repositories and environments, which fits multi-region meshes. PATs are fast to adopt but hard to audit. Deploy keys remain necessary for a narrow set of signing flows yet resist fine-grained revocation. Regional affinity must be written into trust policies; otherwise a US-East audience accidentally consumed in Singapore turns incident response into a timezone guessing game.
| Dimension | OIDC workload identity | Long-lived PAT | Deploy private key |
|---|---|---|---|
| Granularity | Repository, environment, and branch subjects with optional claims | Often org- or user-wide; splitting means more tokens | Usually one key pair per slot unless you multiply certs |
| Revocation speed | Disable trust policy or shorten TTL for global effect | Depends on platform UI plus client caches | Requires CRL or fingerprint deny lists plus client behavior |
| Multi-region fit | Strong; claims can carry region and runner fingerprint | Medium; copying equals broadcasting | Medium; signing is a hard requirement, but distribution is wide |
| Observability | Issuer, audience, and jti map cleanly to logs | Only hash prefixes and acting accounts | Needs extra hooks for key id and signature targets |
| Run cost | High upfront configuration, cheap rotations later | Low start, expensive audits and revokes | Medium; certificate lifecycle work is unavoidable |
Mesh CI security depends on whether sessions explain builds, not on whether green builds sometimes happen.
If you already operate shared build pool runners, paste this comparison into your architecture note so identity does not remain a hallway handshake.
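The OIDC column only holds if the trust side is pinned. Below is a minimal sketch of a runner-side gate for the runbook's first two steps (freeze issuers, isolate audiences), assuming bash, a placeholder issuer URL in `ALLOWED_ISSUERS`, and an audience convention of `vpsmesh-ci-<environment>-<region>` extrapolated from the exchange example in this article; the function, variable, and convention are all hypothetical. The cloud trust policy remains the real enforcement point; a gate like this mainly fails fast with a clearer error.

```shell
#!/usr/bin/env bash
# Placeholder: organization-controlled issuer URLs, exact strings only (no wildcards).
ALLOWED_ISSUERS="https://oidc.example.internal"

# trust_gate ISSUER AUDIENCE ENVIRONMENT REGION
# Returns 0 if allowed, 1 for an unknown issuer, 2 for a wrong audience.
trust_gate() {
  local issuer="$1" audience="$2" environment="$3" region="$4" allowed
  for allowed in $ALLOWED_ISSUERS; do
    if [ "$issuer" = "$allowed" ]; then
      # One audience per environment, pinned to the runner's region.
      if [ "$audience" = "vpsmesh-ci-${environment}-${region}" ]; then
        return 0
      fi
      echo "trust_gate: audience '$audience' not valid for $environment in $region" >&2
      return 2
    fi
  done
  echo "trust_gate: issuer '$issuer' is not allowlisted" >&2
  return 1
}
```

Distinct return codes keep issuer drift and audience reuse separable in alerts, which matters when the same template runs in several regions.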
The steps stay vendor-agnostic: GitHub Actions, GitLab, or a bespoke scheduler change only field names, not deliverables. Each step should map to a reviewable change ticket. When combining the steps with task chain handoffs, echo job_id and environment inside the envelope.
Freeze trusted issuers: allow only organization-controlled issuer URLs, reject wildcard hostnames, and record diffs in infrastructure change history.
Isolate audiences per environment: staging, production, and compliance partitions each get their own audience string; never reuse one audience across environments.
Fail runner boot on plaintext secrets: boot scripts scan for PAT filenames and kubeconfig patterns and abort registration on any match.
Exchange OIDC for cloud STS: follow each cloud's short-lived session pattern and write credentials to in-memory file descriptors instead of persistent paths.
Cap TTL and renewal: session length should cover 1.5× the build P95, capped by a hard ceiling; renewal failure must page someone and never silently fall back to long-lived keys.
Drill revocation: randomly drop one trust policy and verify every region refuses new sessions within one minute while in-flight jobs fail predictably.
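The TTL rule in the "Cap TTL and renewal" step is simple arithmetic once you have a duration histogram. A minimal sketch, assuming bash and awk, build durations in whole seconds one per line, and the nearest-rank definition of P95; `p95_seconds` and `session_ttl_seconds` are hypothetical helper names, and the ceiling is whatever hard limit your policy sets.

```shell
#!/usr/bin/env bash
# Nearest-rank P95 of build durations (integer seconds, one per line on stdin).
p95_seconds() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int((95 * NR + 99) / 100)   # ceil(0.95 * NR) in integer arithmetic
      print v[r]
    }'
}

# Session TTL = ceil(1.5 * P95), clamped to a hard ceiling (both in seconds).
session_ttl_seconds() {
  local p95="$1" ceiling="$2" ttl
  ttl=$(( (p95 * 3 + 1) / 2 ))
  if [ "$ttl" -gt "$ceiling" ]; then
    ttl="$ceiling"
  fi
  echo "$ttl"
}
```

For example, a P95 of 600 s yields a 900 s session, while with a 3600 s ceiling a P95 of 3000 s is clamped to 3600 s instead of 4500 s.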
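```shell
set -euo pipefail
: "${RUNNER_REGION:?RUNNER_REGION must be set by the runner group}"
# Stable per-host fingerprint: hash of the Mac's hardware profile (macOS only).
RUNNER_FINGERPRINT="$(system_profiler SPHardwareDataType | shasum | awk '{print $1}')"
# One audience per environment and region; never reuse it across environments.
OIDC_AUDIENCE="vpsmesh-ci-prod-${RUNNER_REGION}"
export RUNNER_FINGERPRINT OIDC_AUDIENCE
# ACTIONS_ID_TOKEN_REQUEST_URL is the GitHub Actions endpoint that mints the ID token; the issuer to allowlist is the token's iss claim.
node scripts/exchange-oidc-for-sts.mjs \
  --issuer "${ACTIONS_ID_TOKEN_REQUEST_URL}" \
  --audience "${OIDC_AUDIENCE}" \
  --runner-fingerprint "${RUNNER_FINGERPRINT}"
```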
Note: keep STS results in process memory or tmpfs and revoke in job teardown hooks; never write exchange output back into golden images.
The value of a mesh is running the same pipeline on Macs in different cities, but identity must be co-designed with regional affinity and registry egress policy. Otherwise a Singapore runner pulls images quickly while its STS region mismatches, or US-East tokens get 403 responses from buckets in Tokyo. Triage issuer and audience first, then runner fingerprints inside claims, and only then suspect compiler caches.
Claims first: verify repository, environment, and ref; drift usually means a workflow template was reused without parameters.
Affinity second: pick STS regions that align with artifact buckets and registries or satisfy compliance allow lists.
Caches last: when cache keys or staged publish drift, return to byte paths and checksum fields.
Log jti and remaining TTL: index jti in build logs to join cloud audit trails with pipelines.
Failure-domain drill: cut network to one region and confirm others never inherit its session files or tmpfs mounts.
Align with mutex: exchange credentials before acquiring leases so half-established sessions do not occupy pool seats.
Warning: decrypting long-lived material to disk and deleting it after use still risks residue after a crash; prefer memory and kernel keyrings with forced teardown at job boundaries.
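The claims, affinity, caches order can be made mechanical so on-call engineers stop at the first failing layer. A minimal sketch, assuming bash, claims already flattened into a `repository:environment:ref` string, and an optional space-separated `ALLOWED_STS_REGIONS` compliance allowlist; the function name, input format, and variable are hypothetical.

```shell
#!/usr/bin/env bash
# triage_layer WANT_CLAIMS GOT_CLAIMS STS_REGION BUCKET_REGION
# Claims are flattened as "repository:environment:ref". Prints the first layer to
# investigate: claims, affinity, or caches.
triage_layer() {
  local want="$1" got="$2" sts_region="$3" bucket_region="$4" r
  # 1. Claims first: drift here usually means an unparameterized workflow template.
  if [ "$got" != "$want" ]; then
    echo "claims"
    return 0
  fi
  # 2. Affinity second: STS region must match the bucket region or an allowlisted one.
  if [ "$sts_region" != "$bucket_region" ]; then
    for r in ${ALLOWED_STS_REGIONS:-}; do
      if [ "$r" = "$sts_region" ]; then
        # An allowlisted region counts as aligned, so move on to caches.
        echo "caches"
        return 0
      fi
    done
    echo "affinity"
    return 0
  fi
  # 3. Caches last: only now inspect cache keys, staged publish, and checksums.
  echo "caches"
}
```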
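Keeping session material in memory, with forced teardown at job boundaries, can be approximated in plain bash. This sketch assumes bash (for process substitution) and a hypothetical `REVOKE_HOOK` variable naming a command or function that performs your cloud's revocation or cleanup; `with_session_fd` is also a hypothetical name. The job command receives the material as a pipe-backed `/dev/fd/N` path rather than a file on disk, and the EXIT trap runs the hook on normal and error exits of the subshell (though not on SIGKILL).

```shell
#!/usr/bin/env bash
# with_session_fd SECRET CMD [ARGS...]
# Runs CMD with one extra argument: a /dev/fd path from which SECRET can be read.
with_session_fd() {
  local secret="$1"
  shift
  (
    # Teardown hook: runs when this subshell exits, whether CMD succeeded or failed.
    trap 'if [ -n "${REVOKE_HOOK:-}" ]; then "$REVOKE_HOOK"; fi' EXIT
    # printf is a bash builtin, so SECRET never appears in a process argument list;
    # <( ) hands CMD a pipe, so nothing is written to a persistent path.
    "$@" <(printf '%s' "$secret")
  )
}
```

For instance, `with_session_fd "$STS_JSON" ./publish.sh` would pass the path as the last argument to a hypothetical `publish.sh`, which reads it once and never copies it to disk.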
The three bands below come from cross-region iOS and macOS pipeline reviews; treat them as preflight checks, not guarantees. Replace them with your own histograms and attach raw distributions to architecture approvals.
job_id, environment, or jti; otherwise questionnaires cannot close.
| Platform | Compliance | Registry egress | First choice |
|---|---|---|---|
| GitHub Actions | Standard | Public registry allowed | OIDC to cloud STS with per-environment audiences on runner groups |
| GitLab | Standard | Private registry required | GitLab ID tokens (id_tokens; CI_JOB_JWT is deprecated) bound to IdP; pulls via same-region cache |
| Custom scheduler | High | Restricted outbound | Partitioned signing service with mTLS; PAT only as break-glass |
| Heavy fork traffic | Medium | Mixed | Separate audiences for forks versus internal repos; forbid shared runner workspaces |
Laptops, borrowed machines, and the habit of SSHing into whichever machine is free keep failing audit isolation and refresh cadence. Even perfect OIDC cannot compensate for sleep and patch windows that break token renewal.
Common mistake: optimizing interactive convenience while ignoring unattended refresh and disk residue; the two modes need opposite controls.
Teams that ship iOS and macOS continuously while aligning OIDC sessions with auditable fields often stall on procurement and multi-site cabling. Borrowed hardware rarely supports forced revocation and seat isolation. For production-grade mesh CI with rotatable identity boundaries, VpsMesh Mac Mini cloud rental is usually the better fit: elastic billing cycles, selectable regions, and dedicated nodes you can reference in contracts so policy debates rest on measurable uptime instead of informal promises.
Align runner groups and environment audiences before designing task chain envelopes and lease fields; cross-read the shared build pool and observable task chain articles. For capacity, review Order regions and sizes.
Add rotation, two-person review, and log retention labor to per-build cost, then compare Pricing with the three-year TCO article.
Start with the Help Center and cross-read SSH versus VNC; when credentials misbehave, return here for audience and claims checks.