OpenClaw on a Linux VPS
with systemd User Services

linger · XDG_RUNTIME_DIR · daemon install and verify · layered triage · API region vs egress checks


Platform engineers, SREs, and self-hosted agent operators hit the same family of failure modes in 2026: the service looks fine during an SSH session, then user-level systemd stops after logout; XDG_RUNTIME_DIR is missing on non-interactive paths; gateway logs are read as one blob that mixes channel and model issues; and console-selected API regions disagree with the VPS egress path. This article covers five pre-production taxes, a three-way comparison of bare-metal systemd versus systemd-in-container versus Docker-only, a six-step reproducible runbook with commands, a checklist plus three citeable technical facts, and a decision matrix. Pair it with the install and doctor checklist and the Docker Compose production baseline. Order flows live on the order page.

01

Why “I can start it manually” is not the same as unattended

Running OpenClaw on Linux moves long-lived processes, socket directories, logs, and restart semantics from personal habit into auditable units. The five items below usually arrive together and all point to one gate: put linger and XDG_RUNTIME_DIR on the acceptance sheet before debating Docker.

  1. Session binding: Without linger, ending an interactive SSH session can stop the user systemd manager, so units go quiet overnight while tickets only say “it worked yesterday.”

  2. Missing runtime dir: Cron, minimal shells, or wrong service types can leave XDG_RUNTIME_DIR empty, so sockets and state paths fail with errors split between the app and systemd.

  3. Skipped layering: Gateway not listening, channel credentials, model routing, and upstream HTTP 429 are merged into one story called “OpenClaw is broken” without per-layer samples.

  4. Region vs egress drift: The console or env vars point to region A while the VPS path presents region B hints in headers, which looks like flaky auth instead of a stable 403.

  5. Mixed boundaries: Docker stacks plus user units on one host disagree on restart order and health semantics, so rollbacks are unclear about which layer to stop first.
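The five taxes above can be probed before any fix is applied. The sketch below is a minimal preflight, assuming a POSIX shell on a systemd host; the checks degrade to "unknown" or "no-evidence" where a tool is absent, so it is safe to run on any box:

```shell
# Minimal preflight sketch for the failure modes above. Assumptions:
# POSIX shell, systemd host; every check degrades gracefully when a
# command is missing instead of aborting.
preflight() {
  user="${USER:-$(id -un)}"
  # Taxes 1-2: session binding and missing runtime dir
  if command -v loginctl >/dev/null 2>&1; then
    echo "linger=$(loginctl show-user "$user" -p Linger --value 2>/dev/null || echo unknown)"
  else
    echo "linger=unknown"
  fi
  echo "runtime_dir=${XDG_RUNTIME_DIR:-MISSING}"
  # Tax 5: mixed boundaries (Docker plus user units on one host)
  if command -v docker >/dev/null 2>&1 && systemctl --user list-units >/dev/null 2>&1; then
    echo "mixed_boundaries=possible"
  else
    echo "mixed_boundaries=no-evidence"
  fi
}
preflight
```

Save the output with the ticket; a "linger=no" or "runtime_dir=MISSING" line is the earliest signal that an overnight exit was session binding, not an upstream outage.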

If you are comparing host user units with container PID 1, treat the next table as a review slide, not a slogan.

02

Three residency models: bare-metal systemd, systemd-in-container, Docker-only

Decide who owns restart semantics, log rotation, linger semantics, and the boundary between sockets and host ports. There is no universal winner, only an ops boundary that matches your skills.

Model | Typical fit | Main benefit | Main cost
Bare-metal systemd (user) | Single VPS, tight work with host firewall and loopback | Aligns with distro tooling, units line up with journal | Must handle linger and login session edges
systemd-in-container | Multi-process supervision inside an image | Feels like a classic Linux service host | Image and privilege edges are sharper, debug spans host and container
Docker-only | Compose or an orchestrator already owns health and restart | Versioned artifacts and rollback paths are obvious | Host user linger semantics are not automatic
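For the bare-metal row, the unit itself stays small. The sketch below installs a hypothetical user unit; the unit name, binary path, and config path are assumptions to replace with your real install layout:

```shell
# Sketch: install a hypothetical user-scope unit for the bare-metal row.
# The unit name, ExecStart path, and config path are assumptions.
mkdir -p "$HOME/.config/systemd/user"
cat > "$HOME/.config/systemd/user/openclaw-gateway.service" <<'EOF'
[Unit]
Description=OpenClaw gateway (user scope)

[Service]
ExecStart=%h/openclaw/bin/openclaw-gateway --config %h/.config/openclaw/gateway.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
EOF
```

WantedBy=default.target is the user-scope analogue of multi-user.target; after placing the file, daemon-reload and enable --now complete the install, as in the runbook in section three.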

Reproducible acceptance is not “it runs on my laptop,” it is “the unit survives SSH logout, journal reasons are legible, and region hints are captured with the same commands twice.”

03

Six-step runbook: linger, runtime dir, install, verify, layered signals

Order the work as follows: keep the user manager alive unattended, confirm the runtime directory, install the units, triage in layers, then capture egress snapshots. Each step should ship saved command output. Gateway baselines belong in the install and doctor checklist.

  1. Pick the service user: Fix the account and primary group; avoid mixing root with a deploy user. Deliverable: id output plus a short loginctl user-status snippet.

  2. Enable linger: Turn on linger for the deploy user so user@.service can run without an active login. Deliverable: loginctl show-user prints Linger=yes.

  3. Validate XDG_RUNTIME_DIR: Print the variable from the same profile path your unit uses; expect a /run/user/<uid> shaped value.

  4. Install and enable: Place the unit in the user scope, run daemon-reload and enable --now, then confirm the active state and main PID with status.

  5. Sample by layer: Check gateway listen and config parse first, then channel tokens and webhook reachability, then upstream model quotas and region headers. Keep the last two hundred journal lines per layer.

  6. Egress consistency: Resolve the same hostname and capture TLS-visible metadata before and after changes. Do not promote a single RTT sample to a performance claim.

linger, runtime dir, user units (example)
loginctl show-user "${USER}" -p Linger
sudo loginctl enable-linger "${USER}"
systemctl --user show-environment | grep XDG_RUNTIME_DIR || true
echo "${XDG_RUNTIME_DIR}"
systemctl --user daemon-reload
systemctl --user enable --now openclaw-gateway.service
systemctl --user status openclaw-gateway.service --no-pager
journalctl --user -u openclaw-gateway.service -n 200 --no-pager

Note: Replace openclaw-gateway.service with your real unit name. If your deployment ships a different gateway binary, the unit file's ExecStart line remains the source of truth.
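Step six can be a tiny script rather than ad-hoc curls. The sketch below is a hypothetical snapshot helper: the default hostname and the header allowlist are assumptions, and failures are tolerated so a dated snapshot file is always written:

```shell
# Hypothetical egress-snapshot helper for step six. The default HOST and
# the header allowlist are assumptions; failed lookups still produce a
# dated file so before/after comparisons stay possible.
HOST="${1:-api.example.test}"
OUT="snapshot-$(date -u +%Y%m%dT%H%M%SZ).txt"
{
  echo "## host: $HOST"
  echo "## resolver"
  getent ahosts "$HOST" 2>/dev/null || echo "resolve failed"
  echo "## headers"
  curl -sSI --max-time 10 "https://$HOST/" 2>/dev/null \
    | grep -iE '^(server|via|cf-ray|x-served-by|x-amz-cf-pop):' \
    || echo "no region hints captured"
} > "$OUT"
echo "wrote $OUT"
```

Run it before and after a change window and diff the two files; matching resolver and header sets are repeatable evidence, where a single lucky curl is not.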

04

Go-live checklist and three citeable technical facts

Map every item to an owner and a review cadence. Region checks collect repeatable TLS and response metadata only, not invented throughput rankings.

  1. (S1) Linger gate: Change records must attach loginctl show-user output with Linger=yes, as text or a screenshot.

  2. (S2) Unit boundary: State which ports user units bind versus which ports containers publish, then sync the firewall docs.

  3. (S3) Log retention: Document journal persistence or remote forwarding so debug logs cannot fill the disk and mimic a crash.

  4. (S4) Layered runbook: For the gateway, channel, and model layers, keep at least three “pass to advance” checks as commands or URLs.

  5. (S5) Region snapshots: Store resolver output and header samples before and after each release window for rollback contrast.
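Item S3 can be enforced with a journald drop-in rather than policy text alone. A sketch, assuming systemd-journald; the size and retention limits here are hypothetical values to tune per host:

```ini
# /etc/systemd/journald.conf.d/openclaw.conf  (hypothetical limits)
[Journal]
Storage=persistent
SystemMaxUse=500M
MaxRetentionSec=14day
```

Apply with systemctl restart systemd-journald and record the chosen limits in the change ticket so disk-full crashes stay distinguishable from real ones.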

  • Linger semantics: loginctl enable-linger controls whether the per-user systemd manager stays alive without an active login; it is a host-level setting that choosing Docker does not replace.
  • XDG_RUNTIME_DIR: Under a systemd user session it usually resolves to /run/user/<uid>. When it is missing off-login, sockets fall back to non-writable or unstable paths.
  • Region hint headers: Many upstreams return region or edge hints in headers or error bodies. Pin tool versions and flags so cached proxy responses are not mistaken for origin truth.
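The second fact above is mechanically checkable. A minimal sketch, assuming a POSIX shell; the function only classifies the current value against the /run/user/<uid> shape and does not create or repair the directory:

```shell
# Classify XDG_RUNTIME_DIR against the /run/user/<uid> shape described above.
check_runtime_dir() {
  uid="$(id -u)"
  case "${XDG_RUNTIME_DIR:-}" in
    "/run/user/$uid") echo "ok:$XDG_RUNTIME_DIR" ;;      # expected shape
    "")               echo "missing" ;;                   # off-login path
    *)                echo "unexpected:$XDG_RUNTIME_DIR" ;; # wrong or stale
  esac
}
check_runtime_dir
```

Wire this into the same non-interactive path the unit uses (cron entry or ExecStartPre) so a "missing" result is caught before sockets fall back to unstable paths.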

Warning: One successful curl is not a durable region proof after a CDN change. A fixed hostname with repeatable commands beats a single lucky sample.

05

Decision matrix and stable cloud egress: when to leave raw bash

If linger, unit names, port matrix, and region snapshots are not versioned, Linux residency is only half done. The other half is sharing the same responsibility language as gateway triage. Use the matrix as a review slide.

Team posture | Default pick | Acceptance signal | Common trap
Solo maintainer, fast iteration | Docker Compose baseline | Health checks and restart policy are reviewable in compose | Ignoring mem_limit and log rotation causes false hangs
Multi-tenant host | Container boundary plus isolated project names | Each stack has its own data directory | Mixing with user units creates restart races
Host-tight coupling | User systemd plus linger | Journal stays continuous after SSH ends | Skipping XDG_RUNTIME_DIR on non-interactive paths

Interactive bash sessions, linger-free nohup, and hand-rolled watchdog loops are debts that usually come due during change review and audits. Upstream region policy shifts are also harder to explain without egress snapshots. By contrast, dedicated cloud Mac capacity with selectable regions and predictable network tiers makes stable egress and golden images easier to own alongside iOS builds or desktop handoff.

Common trap: Assuming Docker removes all systemd semantics. If a user unit still fronts the gateway outside Compose, linger and the runtime directory remain hard gates.

Personal scripts and unversioned environment exports rarely survive handoff, compliance, or rollback with an external SLA. When OpenClaw must ship with upstream region policy, TLS fingerprints, and a fixed egress narrative, bash-only paths usually lack auditable change tickets. For teams that need iOS handoff, CI regression, and automation agents in one acceptance story, and want ordering and region tiers instead of self-managed egress games, VpsMesh Mac Mini cloud rental is usually the better fit: dedicated nodes simplify ACLs and hostnames, collaboration stays close to high-churn loops, and ops language can align with the team private network build-node runbook. See pricing for region mixes, and treat connection boundaries per the help center.

FAQ

Three questions readers ask most

Why do user-level OpenClaw units stop after I log out?

After interactive sessions end, systemd --user can stop and take user-level OpenClaw units with it. Verify linger before production and align connection plus residency guidance in the help center so overnight exits are not misread as upstream outages.

How do I prove the console API region matches the VPS egress path?

Pin the hostname and tool versions, store resolver output, and capture TLS-visible metadata for the same endpoint. Compare console region settings with environment variables. For Compose-level health semantics, read the Docker Compose production baseline article.

When is Docker-only the simpler choice?

When restart, log rotation, limits, and health checks are fully declared in Compose or an orchestrator, and you do not rely on host user sockets plus linger semantics, Docker-only is often simpler. Finish the section-three layering before mixing with user units.