Namespace matrix · network_mode choices · reverse-proxy WebSocket · allowedOrigins · six-step runbook
Operators self-hosting OpenClaw on a VPS with Compose often hit a mirage: docker compose ps is green and Gateway logs scroll, yet the openclaw CLI on the host or in a sidecar container keeps timing out or returns 502. The root cause usually lies in how network namespaces combine with bind addresses, not in the model API key. This article names who hits which problem, uses a three-layer symptom tree to separate process health from true reachability, compares bridge, host, and network_mode: service in a matrix, walks through a six-step reproducible runbook covering published ports, loopback binds, reverse-proxy WebSocket upgrades, and allowedOrigins, and closes with auditable technical facts plus a decision matrix. Cross-read the Docker Compose production baseline, the multi-instance isolation checklist, and the Gateway hardening checklist; for stable nodes and predictable egress, use the order page.
A Compose healthcheck often probes loopback inside the container or process liveness. It does not automatically prove that DNS resolution, iptables, and user-space proxies all cooperate on the path from a CLI container to the Gateway service name. The five patterns below arrive together in tickets; separating them thins your incident log immediately.
Listening on 127.0.0.1 only: when the Gateway binds loopback, sibling services on the same bridge get connection refused via the service name; it feels like random timeouts even though nothing ever left that network namespace.
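CLI on the host with a container hostname: copying openclaw-gateway:18789 into a host shell profile fails because Compose service names resolve only through Docker's embedded DNS inside the project network; the host resolver does not know them, and the host routes to the container only through published ports.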
Reverse proxy forwards HTTP but not Upgrade: browsers or CLIs using WSS see 400 or silent drops while application logs still show Gateway ready.
allowedOrigins drift from real origins: mixing production domains, internal aliases, and MagicDNS-style names rejects handshakes at the app layer while packet captures look fine.
network_mode: service restart races: after restart ordering changes, downstreams still hit old container IPs or stale port mappings, producing intermittent success.
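Of the five patterns, allowedOrigins drift is the one that can be checked offline before any packet capture. The sketch below is a minimal illustration, assuming the allowlist is compared by exact scheme, host, and port; how OpenClaw actually matches allowedOrigins is not stated here, so treat the comparison rule and the hostnames as hypothetical.

```python
from urllib.parse import urlsplit

# Default ports per scheme, used so "https://x" and "https://x:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def normalize_origin(origin: str) -> str:
    """Reduce an Origin string to scheme://host:port with an explicit port."""
    parts = urlsplit(origin.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if not scheme or not host:
        raise ValueError(f"not an origin: {origin!r}")
    port = parts.port or DEFAULT_PORTS.get(scheme)
    if port is None:
        raise ValueError(f"no port and unknown scheme: {origin!r}")
    return f"{scheme}://{host}:{port}"


def missing_origins(observed, allowed):
    """Return observed origins that have no normalized match in the allowlist.

    Assumption: exact scheme/host/port matching, which may differ from the
    Gateway's real rule.
    """
    allowed_set = {normalize_origin(a) for a in allowed}
    return [o for o in observed if normalize_origin(o) not in allowed_set]
```

Feeding it the Origin values seen in edge logs next to the configured list gives one line per missing origin, which keeps production domains, internal aliases, and MagicDNS-style names from being mixed up by eye.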
Print the next section as a review handout: allow only one matrix cell to change per architecture change, and attach paired outputs for curl inside the same namespace versus curl from the outer namespace.
Add a time dimension: while containers are being recreated or rolled, old and new instances can briefly coexist. If DNS caches and client connection pools diverge, the first request succeeds and the next few minutes of requests fail. Before raising timeouts, refresh resolution on the initiator and compare the addresses held by reused connections against the current endpoint from docker inspect. If you also chain user-space or corporate proxies, log CONNECT tunnel targets separately from direct targets so a 407 from the proxy is not misread as an application auth failure.
Another easy miss is MTU and fragmentation on cross-cloud or cross-carrier paths, which surface as sporadic timeouts. When large payloads fail while tiny health checks stay green, narrow captures to WebSocket frame sizes and TLS record boundaries instead of rewriting application routes first.
Once those signals live in the change ticket, align timestamps between openclaw logs and edge access logs. Most teams can then trace a seemingly mysterious networking problem to a single configuration field within thirty minutes, which is also the context depth a minimal repro package should carry when asking for outside help.
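The comparison against docker inspect can be made mechanical. The following is a rough sketch, assuming you have saved the JSON output of docker inspect for the Gateway container and a list of addresses your client currently holds; the function names and the sample network name are illustrative only.

```python
import json


def current_ips(inspect_json: str) -> set:
    """Collect container IPs from `docker inspect` JSON output.

    Reads NetworkSettings.Networks.<name>.IPAddress for every container
    in the array and skips empty values.
    """
    ips = set()
    for container in json.loads(inspect_json):
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        for net in networks.values():
            ip = net.get("IPAddress")
            if ip:
                ips.add(ip)
    return ips


def stale_targets(client_targets, inspect_json: str) -> list:
    """Return addresses the client still uses that no live container owns."""
    return sorted(set(client_targets) - current_ips(inspect_json))
```

Any address returned by stale_targets points at a cached resolution or a pooled connection that should be refreshed before timeouts are touched.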
When you pick a model, write down who initiates the connection, what name resolves to which address, and which NAT layers sit in between. Without that table the team ping-pongs between changing ports, extra_hosts, and reverse-proxy upstreams.
| Model | Typical listen pattern | Other services in the same compose file | Host processes |
|---|---|---|---|
| Default bridge with published ports | 0.0.0.0 inside the container or explicit publishes | Use the Compose service name and internal port | Use 127.0.0.1:published or a host NIC IP |
| Host networking | Shares the host stack; binds are host-visible | Containers that remain on the bridge network can no longer reach the Gateway by its old service name | Check port collisions and INPUT firewall chains alongside containers |
| network_mode: service:gateway | Shares the Gateway netns; loopback semantics align | Sidecars may call 127.0.0.1:gateway-port | The host still needs published ports or a proxy; nothing is inherited automatically |
True reachability means repeating the same hostname, port, and TLS parameters inside the initiator network namespace and getting consistent responses, not a one-off curl from a laptop.
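The matrix can also be written as a small lookup, which is handy in design review. This is a sketch under stated assumptions: the service name openclaw-gateway and port 18789 are taken from the examples in this article, the function name is hypothetical, and loopback_only=False is read as a wildcard bind such as 0.0.0.0.

```python
from typing import Optional


def target_for(model: str, initiator: str, loopback_only: bool,
               service: str = "openclaw-gateway", internal_port: int = 18789,
               published_port: Optional[int] = None) -> Optional[str]:
    """Return the host:port the initiator should dial, or None if no path exists.

    model:     "bridge", "host", or "service" (initiator container shares
               the Gateway netns via network_mode: service:<name>)
    initiator: "container" or "host"
    """
    if model not in ("bridge", "host", "service"):
        raise ValueError(f"unknown model: {model}")
    if initiator not in ("container", "host"):
        raise ValueError(f"unknown initiator: {initiator}")

    if initiator == "container":
        if model == "service":
            # Shared netns: loopback inside the sidecar is the Gateway's loopback.
            return f"127.0.0.1:{internal_port}"
        if model == "host":
            # The Gateway left the bridge, so the service-name path is gone.
            return None
        # Bridge: a loopback-only bind refuses sibling containers.
        return None if loopback_only else f"{service}:{internal_port}"

    # Initiator is a host process.
    if model == "host":
        return f"127.0.0.1:{internal_port}"
    # Bridge or shared netns: the host needs a published port, and published
    # traffic arrives on the container interface, not on its loopback.
    if loopback_only or published_port is None:
        return None
    return f"127.0.0.1:{published_port}"
```

A None result is the useful part: it marks combinations where no amount of timeout tuning will help because the path does not exist.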
The sequence keeps the cheapest observations first; stop and save output whenever a step fails. If field names diverge from your install, cross-check the install and doctor troubleshooting checklist.
Label the initiator: note whether commands run on the host, in a Gateway sidecar, or in a standalone CLI container; capture hostname and a short ip route summary.
HTTP probe inside the netns: from the initiator namespace, GET or HEAD the target hostname and port, verify status codes and body prefixes, and rule out pure DNS failure.
WebSocket probe: exercise the upgrade path you actually use, record edge versus application response headers, and align timestamps with logs.
Listen matrix inside Gateway: if listeners bind 127.0.0.1 only and cross-service access is required, move to 0.0.0.0 or adopt a shared netns and update the runbook accordingly.
Reverse-proxy quadruple: confirm upstream points at container IP versus published port, verify Connection and Upgrade forwarding, and ensure idle timeouts are not clipping long-lived sessions.
Origins checklist: enumerate real Origin strings or equivalents for browsers, CLIs, and CI; every missing row is a release blocker.
docker compose ps
docker compose exec cli sh -lc 'getent hosts openclaw-gateway; curl -fsS -o /dev/null -w "%{http_code}\n" http://openclaw-gateway:18789/health || true'
curl -fsS -o /dev/null -w "%{http_code}\n" http://127.0.0.1:18789/health || true
docker compose logs --no-color --tail=200 openclaw-gateway
Note: replace service names, ports, and health paths with the values from your repository; keep the pattern of running the same URL from two namespaces.
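To keep the paired outputs comparable across tickets, the two curl results can be reduced to a coarse hint. The sketch below assumes the raw stdout of the container-side and host-side commands above is passed in as strings; curl writes 000 for %{http_code} when no HTTP response was received. The layer names and hint wording are illustrative, and a 502 to 504 is only presumed to come from a proxy or upstream hop.

```python
def parse_code(curl_output: str) -> int:
    """Take the last non-empty line of the output as the HTTP status code."""
    last_line = curl_output.strip().splitlines()[-1]
    return int(last_line.strip())


def layer(code: int) -> str:
    """Map a status code to a coarse symptom layer."""
    if code == 0:
        return "no-connection"      # DNS failure, refused, or timeout
    if 200 <= code < 400:
        return "ok"
    if code in (502, 503, 504):
        return "upstream-error"     # presumably a proxy or upstream hop
    return "application"


def hint(container_out: str, host_out: str) -> str:
    """Compare the container-side and host-side probes and suggest a next step."""
    inner = layer(parse_code(container_out))
    outer = layer(parse_code(host_out))
    if inner == "ok" and outer == "ok":
        return "both ok: probe the WebSocket upgrade next"
    if inner == "no-connection" and outer == "no-connection":
        return "no connection anywhere: check the Gateway bind address and process"
    if inner == "ok":
        return "container ok, host failing: check published ports, firewall, or host proxy"
    if outer == "ok":
        return "host ok, container failing: check service-name resolution and network membership"
    return "error status on at least one path: align codes with Gateway and proxy logs"
```

The point of the design is that the hint never mentions credentials: key rotation stays off the table until both namespaces have been probed.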
This section lists facts you can name in configuration files, not vibes about CDNs misbehaving. For memory limits and log rotation language, return to the Compose production baseline.
ports: mappings create DNAT rules on the host; misunderstanding the ordering between local firewall policies and Docker chains yields container-to-container success while the host path fails, or the opposite.
Warning: do not simultaneously change reverse-proxy upstreams, Gateway binds, and CLI configuration without a recorded baseline; triangular changes make bisection impossible.
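When recording that baseline, it helps to note which host interface each ports: entry actually publishes on. The sketch below parses the Compose short syntax [HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL] for IPv4 entries only; bracketed IPv6 addresses, port ranges, and the long mapping syntax are deliberately out of scope, and the function names are hypothetical.

```python
import ipaddress


def parse_port(mapping: str) -> dict:
    """Split a Compose short-syntax port mapping (IPv4 only) into its fields."""
    spec, _, proto = mapping.partition("/")
    parts = spec.split(":")
    if len(parts) == 1:
        # Container port only: the host port is chosen at start time.
        host_ip, host_port, container_port = "0.0.0.0", None, parts[0]
    elif len(parts) == 2:
        host_ip = "0.0.0.0"
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported mapping: {mapping!r}")
    return {
        "host_ip": host_ip,
        "host_port": host_port,
        "container_port": container_port,
        "protocol": proto or "tcp",
    }


def exposure(mapping: str) -> str:
    """Classify where on the host the mapping is published."""
    ip = ipaddress.ip_address(parse_port(mapping)["host_ip"])
    if ip.is_loopback:
        return "loopback-only"
    if ip.is_unspecified:
        return "all-interfaces"
    return "specific-interface"
```

An entry classified as all-interfaces is the one to check against host firewall expectations first, since that is where container success and host-path surprises tend to diverge.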
Before choosing network_mode: service:gateway or network_mode: host, record an explicit yes or no for whether the CLI must share loopback semantics with the Gateway. Use the matrix for design review, not slogans.
| Constraint | Safer default | Key acceptance signal | Main risk |
|---|---|---|---|
| CLI and Gateway in one compose file | bridge plus explicit 0.0.0.0 binds | service-name resolution matches internal-port curl | Firewall and published-port documentation drifts from reality |
| Must share localhost semantics | sidecar with network_mode: service:gateway | sidecar restart does not resurrect stale connection pools | upgrade ordering couples with volume mount permissions |
| Mature host reverse proxy already exists | published loopback only plus TLS termination at the edge | packet or log proof shows consistent WS upgrade | allowedOrigins misses URL shapes the CLI actually uses |
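The "consistent WS upgrade" acceptance signal can be verified from captured response headers rather than by eye. The sketch below follows the handshake rules of RFC 6455: a 101 status, Upgrade and Connection headers, and a Sec-WebSocket-Accept value derived from the request key. The function names are hypothetical and the check says nothing about OpenClaw-specific subprotocols or authentication.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_problems(status: int, headers: dict, key: str) -> list:
    """List what is wrong with a handshake response; empty means it upgraded."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    problems = []
    if status != 101:
        problems.append(f"status {status}, expected 101")
    if h.get("upgrade", "").lower() != "websocket":
        problems.append("missing or wrong Upgrade header")
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        problems.append("Connection header lacks 'upgrade'")
    if h.get("sec-websocket-accept") != expected_accept(key):
        problems.append("Sec-WebSocket-Accept mismatch")
    return problems
```

Running the same check on headers captured at the edge and at the application shows which hop dropped the Upgrade, which is the proof the acceptance column asks for.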
Relying on ad-hoc tunnel scripts or hand-edited hosts files binds mean time to recovery to individual memory. When upstream certificates or internal DNS change, triage regresses into all-hands meetings.
Common pitfall: seeing 502 and rotating model keys first; finish HTTP and WebSocket probes from section three before touching credentials.
Opening ports temporarily without a checklist rarely demonstrates a default-deny, explicit-allow posture to auditors. When OpenClaw must ship alongside fixed egress, hostnames, and mutual TLS policy, ad-hoc VPS networking layers often lack signable change records. For teams that need iOS builds, desktop handoff, and persistent agents on dedicated machines with predictable regions and network tiers, and that want fewer rounds of guessing between host and container netns, VpsMesh Mac Mini cloud rental is usually the better fit: dedicated nodes let you describe listeners and ACLs in the same language as the team's private-network runbook. Pricing lives on the pricing page and connectivity guidance in the help center.
Run HTTP probes inside the same netns as the CLI, then validate WebSocket upgrade, and only then revisit CLI hostnames for accidental host loopback or stale aliases. For hardening, see the production hardening checklist.
Use network_mode: service when helpers must share the Gateway stack and loopback view; document restart ordering for the shared service and cross-check official connectivity notes in the help center.
Behind a reverse proxy, the usual causes are upstream timeouts, missing Upgrade header forwarding, or allowedOrigins gaps for the real client origin; align edge and application logs before rotating keys or model routes. For plans and egress needs, see the pricing page.