OpenClaw on Docker Compose in 2026
Networking and CLI Reachability

Namespace matrix · network_mode choices · reverse-proxy WebSocket · allowedOrigins · six-step runbook


Operators self-hosting OpenClaw on a VPS with Compose often hit a mirage: docker compose ps is green and Gateway logs scroll, yet the openclaw CLI on the host or in a sidecar container keeps timing out or returning 502. The root cause usually sits in how network namespaces combine with bind addresses, not in the model API key. This article maps who has which problem, uses a three-layer symptom tree to separate process health from true reachability, compares bridge, host, and network_mode: service in a matrix, walks a six-step reproducible runbook across published ports, loopback binds, reverse-proxy WebSocket upgrades, and allowedOrigins, and closes with auditable technical facts plus a decision matrix. Cross-read the Docker Compose production baseline, the multi-instance isolation checklist, and the Gateway hardening checklist; for stable nodes and predictable egress, see the order page.

01

Translate healthcheck green into network terms: five frequent misreads

A Compose healthcheck often probes loopback inside the container or process liveness. It does not automatically prove that DNS resolution, iptables, and user-space proxies all cooperate on the path from a CLI container to the Gateway service name. The five patterns below arrive together in tickets; separating them thins your incident log immediately.

  1. Listening on 127.0.0.1 only: when the Gateway binds loopback, sibling services on the same bridge get connection refused via the service name; it feels like random timeouts even though nothing ever left that network namespace.

  2. CLI on the host with a container hostname: copying openclaw-gateway:18789 into a host shell profile misaligns resolution and routing instantly.

  3. Reverse proxy forwards HTTP but not Upgrade: browsers or CLIs using WSS see 400 or silent drops while application logs still show Gateway ready.

  4. allowedOrigins drift from real origins: mixing production domains, internal aliases, and MagicDNS-style names rejects handshakes at the app layer while packet captures look fine.

  5. network_mode: service restart races: after restart ordering changes, downstreams still hit old container IPs or stale port mappings, producing intermittent success.
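Pattern 1 is easiest to see in the Compose file itself. A minimal sketch, assuming a service named openclaw-gateway on port 18789; image tags, the bind environment variable, and ports are placeholders, so match them to your repository:

```yaml
services:
  openclaw-gateway:
    image: openclaw/gateway:latest   # placeholder image tag
    # Publishing on the host's loopback is a separate decision from the
    # in-container listen address, which is what sibling services see.
    ports:
      - "127.0.0.1:18789:18789"
    # If the process inside binds 127.0.0.1:18789, siblings that dial
    # openclaw-gateway:18789 get connection refused. Bind 0.0.0.0 when
    # cross-service access over the bridge is required.
    environment:
      OPENCLAW_GATEWAY_BIND: "0.0.0.0:18789"   # hypothetical variable name

  cli:
    image: openclaw/cli:latest       # placeholder image tag
    depends_on:
      - openclaw-gateway
```

The published-port line and the in-container bind can disagree silently, which is exactly the pattern-1 mirage: host curls succeed while sibling containers fail.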

Print the next section as a review handout: allow only one matrix cell to change per architecture change, and attach paired outputs for curl inside the same namespace versus curl from the outer namespace.

Add a time dimension: during rolling updates, Compose briefly runs old and new containers together. If DNS caches and client connection pools diverge, the first request succeeds and then requests fail for minutes. Before raising timeouts, refresh resolution on the initiator and compare connection reuse against the current endpoint from docker inspect. If you also chain user-space or corporate proxies, log CONNECT tunnel targets separately from direct targets so a 407 from the proxy is not misread as application auth failure.

Another easy miss is MTU and fragmentation on cross-cloud or cross-carrier paths, which surfaces as sporadic timeouts. When large payloads fail while tiny health checks stay green, narrow captures to WebSocket frame sizes and TLS record boundaries instead of rewriting application routes first.
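A quick way to confirm the MTU suspicion is a don't-fragment ping sized to the candidate MTU. A sketch, assuming a Linux initiator; the peer hostname is a placeholder, and the helper just does the IPv4/ICMP header arithmetic:

```shell
# ICMP payload size that exactly fills a given MTU:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
mtu_probe_size() {
  echo $(( $1 - 28 ))
}

mtu_probe_size 1500   # prints 1472

# With the DF bit set, a reply at this size means the path carries the full
# MTU; "message too long" or silence points at fragmentation trouble.
# (peer.example.internal is a placeholder.)
# ping -c 3 -M do -s "$(mtu_probe_size 1500)" peer.example.internal
```

Walk the size down (1472, 1452, 1400, ...) until replies return; the first working size plus 28 is the effective path MTU to record in the ticket.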

Once those signals live in the change ticket, align timestamps between openclaw logs and edge access logs. Most teams can then collapse a mystifying networking incident into a single configuration field within thirty minutes, which is also the context depth a minimal repro package should carry when asking for outside help.

02

Compose network models: default bridge, host, and shared network namespaces

When you pick a model, write down who initiates the connection, what name resolves to which address, and which NAT layers sit in between. Without that table the team ping-pongs between changing ports, extra_hosts, and reverse-proxy upstreams.

| Model | Typical listen pattern | Other services in the same compose file | Host processes |
| --- | --- | --- | --- |
| Default bridge with published ports | 0.0.0.0 inside the container, or explicit publishes | Use the Compose service name and internal port | Use 127.0.0.1:published-port or a host NIC IP |
| Host networking | Shares the host stack; binds are host-visible | Containers still on the bridge cannot keep the old service-name path | Check port collisions and INPUT firewall chains alongside containers |
| network_mode: service:gateway | Shares the Gateway netns; loopback semantics align | Sidecars may call 127.0.0.1:gateway-port | The host still needs published ports or a proxy; nothing is inherited automatically |
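The third row can be sketched as a Compose fragment; a minimal sketch, with service and image names as placeholders:

```yaml
services:
  openclaw-gateway:
    image: openclaw/gateway:latest   # placeholder
    ports:
      - "18789:18789"                # host access still needs a publish or proxy

  cli-sidecar:
    image: openclaw/cli:latest       # placeholder
    # Joins the Gateway's network namespace: 127.0.0.1 here *is* the Gateway's
    # loopback, so the sidecar can dial 127.0.0.1:18789 directly.
    network_mode: "service:openclaw-gateway"
    depends_on:
      - openclaw-gateway
```

Restart ordering matters here: if openclaw-gateway is recreated, the sidecar's network namespace goes away with it, which is the restart race listed as pattern 5.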

True reachability means repeating the same hostname, port, and TLS parameters inside the initiator network namespace and getting consistent responses, not a one-off curl from a laptop.

03

Six-step runbook: from process liveness to WebSocket upgrades and origins

The sequence keeps the cheapest observations first; stop and save output whenever a step fails. If field names diverge from your install, cross-check the install and doctor troubleshooting checklist.

  1. Label the initiator: note whether commands run on the host, in a Gateway sidecar, or in a standalone CLI container; capture hostname and a short ip route summary.

  2. HTTP probe inside the netns: from the initiator namespace, GET or HEAD the target hostname and port, verify status codes and body prefixes, and rule out pure DNS failure.

  3. WebSocket probe: exercise the upgrade path you actually use, record edge versus application response headers, and align timestamps with logs.

  4. Listen matrix inside the Gateway: if listeners bind 127.0.0.1 only and cross-service access is required, move to 0.0.0.0 or adopt a shared netns, and update the runbook accordingly.

  5. Reverse-proxy quadruple: confirm whether the upstream points at a container IP or a published port, verify Connection and Upgrade forwarding, and ensure idle timeouts are not clipping long-lived sessions.

  6. Origins checklist: enumerate the real Origin strings or equivalents for browsers, CLIs, and CI; every missing row is a release blocker.

Inside bridge stack versus host (examples)
docker compose ps
docker compose exec cli sh -lc 'getent hosts openclaw-gateway; curl -fsS -o /dev/null -w "%{http_code}\n" http://openclaw-gateway:18789/health || true'
curl -fsS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:18789/health || true
docker compose logs --no-color --tail=200 openclaw-gateway

Note: replace service names, ports, and health paths with the values from your repository; keep the pattern of running the same URL from two namespaces.
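Step 3's WebSocket probe can be hand-rolled with curl. A sketch, reusing the placeholder hostname and port from the examples above; the helper reproduces the RFC 6455 accept-key derivation so you can check the Sec-WebSocket-Accept header the edge actually returns:

```shell
# Manual WebSocket upgrade probe; a healthy path answers HTTP/1.1 101.
ws_key=$(openssl rand -base64 16)
curl -i -N --max-time 5 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: ${ws_key}" \
  "http://openclaw-gateway:18789/" || true

# Expected Sec-WebSocket-Accept for a given key, per RFC 6455:
# base64( SHA-1( key + fixed GUID ) )
ws_accept() {
  printf '%s258EAFA5-E914-47DA-95CA-C5AB0DC85B11' "$1" \
    | openssl dgst -sha1 -binary | openssl base64
}
ws_accept "${ws_key}"
```

If the accept header is missing or wrong while the status is 101, suspect an edge that rewrites or buffers the handshake rather than the Gateway itself.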

04

Reverse proxies and Gateway parameters: three facts that belong in the change ticket

This section lists facts you can name in configuration files, not vibes about CDNs misbehaving. For memory limits and log rotation language, return to the Compose production baseline.

  • Published ports and iptables: ports: mappings create DNAT rules on the host; misunderstanding ordering between local firewall policies and Docker chains yields container-to-container success while the host path fails, or the opposite.
  • 127.0.0.1 bind semantics: listening on loopback inside a container netns only accepts loopback ingress for that netns; this does not automatically contradict publishing ports, but it often contradicts sibling containers addressing you by service name.
  • WebSocket coexisting with HTTP/2: when an edge buffer mis-handles upgrade or keep-alive timeouts, streams die quietly tens of seconds after the handshake; align timestamps on both sides before rotating API keys.
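The Upgrade and keep-alive facts map to a handful of directives at an nginx edge. A minimal sketch, assuming nginx in front of the placeholder upstream from earlier; location path, upstream address, and timeout values are placeholders:

```nginx
location /gateway/ {
    # Whether this points at a container IP or a published port is the
    # reverse-proxy quadruple check from the runbook.
    proxy_pass http://openclaw-gateway:18789/;

    # WebSocket upgrade needs HTTP/1.1 plus explicit header forwarding;
    # hop-by-hop headers are not passed through by default.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Long-lived sessions die quietly when idle timeouts clip them.
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;

    # Preserve the client origin so allowedOrigins checks see real values.
    proxy_set_header Host $host;
    proxy_set_header Origin $http_origin;
}
```

Record the timeout values in the change ticket alongside the Gateway's own idle settings so the two cannot drift apart unnoticed.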

Warning: do not simultaneously change reverse-proxy upstreams, Gateway binds, and CLI configuration without a recorded baseline; triangular changes make bisection impossible.

05

Decision matrix: when a CLI sidecar or host networking is worth the complexity

Write down a boolean for whether the CLI must share loopback semantics with the Gateway before choosing network_mode: service:gateway or host networking. Use the matrix for design review, not slogans.

| Constraint | Safer default | Key acceptance signal | Main risk |
| --- | --- | --- | --- |
| CLI and Gateway in one compose file | Bridge plus explicit 0.0.0.0 binds | Service-name resolution matches internal-port curl | Firewall and published-port documentation drifts from reality |
| Must share localhost semantics | Sidecar with network_mode: service:gateway | Sidecar restart does not resurrect stale connection pools | Upgrade ordering couples with volume mount permissions |
| Mature host reverse proxy already exists | Publish to loopback only plus TLS termination at the edge | Packet or log proof shows consistent WS upgrade | allowedOrigins misses URL shapes the CLI actually uses |
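The allowedOrigins risk in the last row is cheap to audit from the initiator namespace: replay the handshake once per real Origin string. A sketch; the Gateway URL and the origin list are placeholders for your own inventory:

```shell
origin_row() {
  # Format one checklist row: origin, then the status the edge returned.
  printf '%s -> %s\n' "$1" "$2"
}

gateway_url="http://openclaw-gateway:18789/"   # placeholder; use your URL
for origin in "https://app.example.com" "https://internal.alias"; do
  # 000 means the connection never completed; a per-origin 4xx while others
  # succeed points at allowedOrigins drift rather than routing.
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 5 \
    -H "Origin: ${origin}" \
    -H "Connection: Upgrade" -H "Upgrade: websocket" \
    -H "Sec-WebSocket-Version: 13" \
    -H "Sec-WebSocket-Key: AAAAAAAAAAAAAAAAAAAAAA==" \
    "${gateway_url}" || true)
  origin_row "${origin}" "${code:-ERR}"
done
```

Paste the resulting rows into the design-review handout; every origin a real client sends that is missing from the list is the release blocker named in the runbook.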

Relying on ad-hoc tunnel scripts or hand-edited hosts files ties mean time to recovery to individual memory. When upstream certificates or internal DNS change, triage regresses into all-hands meetings.

Common pitfall: seeing 502 and rotating model keys first; finish HTTP and WebSocket probes from section three before touching credentials.

Opening ports temporarily without a checklist rarely satisfies a default-deny, explicit-allow posture under audit. When OpenClaw must ship alongside fixed egress, stable hostnames, and a mutual TLS policy, ad-hoc VPS networking rarely leaves signable change records. For teams that need iOS builds, desktop handoff, and persistent agents on dedicated machines with predictable regions and network tiers, and fewer loops spent guessing host versus container netns, VpsMesh Mac Mini cloud rental is usually the better fit: dedicated nodes let you describe listeners and ACLs in the same language as the team private-network runbook; pricing lives on the pricing page and connectivity guidance in the help center.

FAQ

Three questions readers ask first

The CLI times out while docker compose ps is green; where do I start?
Run HTTP probes inside the same netns as the CLI, then validate the WebSocket upgrade, and only then revisit CLI hostnames for accidental host loopback or stale aliases. For hardening, see the production hardening checklist.

When is network_mode: service:gateway worth it for the CLI?
Use it when helpers must share the Gateway stack and loopback view; document restart ordering for the shared service and cross-check official connectivity notes in the help center.

What usually causes 502 through the reverse proxy?
Upstream timeouts, missing Upgrade header forwarding, or allowedOrigins gaps for the real client origin; align edge and application logs before rotating keys or model routes. For plans and egress needs, see the pricing page.