Edge ingress · Idempotency · CI payloads · Three-tier Gateway logs · Pairs with the non-IM triggers guide
Small teams running OpenClaw Gateway 24/7 on VPS / Docker often drive work through Inbound Webhooks or CI completion callbacks. Two failure modes dominate: duplicate deliveries amplify side effects, and logs blur whether the fault sits at the edge, Gateway, or downstream tools. This guide provides an auth and timing-window matrix, a six-step rollout runbook, a copy-paste curl probe, and a three-stage Gateway logging split for triage. For trigger-source architecture, pair with non-IM trigger automation; for channel and model triage, see runtime troubleshooting; align TLS and reverse-proxy defaults with production domain reverse proxy checklist.
IM channels ship platform retries and visible context, so duplicate messages are easy to reason about. Webhooks and CI callbacks usually spike during unattended windows; duplicates and out-of-order arrivals look like model instability. CI reruns, merge queues, or manual job retries often emit multiple terminal notifications for one logical build. If Gateway ignores pipeline_run_id, stage, and conclusion in the idempotency key, you get double spend, duplicate tickets, or duplicate downstream merges overnight.
Clock and timeout stacking is subtle: container drift versus host NTP can break signed timestamp windows; an edge proxy_read_timeout shorter than CI retry backoff surfaces as Gateway 502 while the edge actually tears the connection. Without splitting logs into edge / Gateway / tools, you tune the wrong layer.
Duplicate blind spots: logging only HTTP 200 without idempotency hits hides doubled side effects until finance or customers complain.
Edge versus Gateway gap: access logs lack request ids, so Gateway traces cannot join and triage becomes guesswork.
Payload drift: CI templates rename JSON fields while Gateway still routes on old keys, looking like flaky branching.
IP allowlists versus dynamic egress: hosted runner IPs rotate into 403s that are misread as OpenClaw auth bugs.
Maintenance misalignment: announced downtime plus nightly CI batches create retry storms that fill Gateway queues.
Pick auth based on threat model: shared secrets plus IP allowlists for internal CI; HMAC or mTLS when exposure grows—verify at the edge or a sidecar, not scattered across app code.
| Option | Ops cost | Typical threat | Best for |
|---|---|---|---|
| Shared Key + Header | Low | Low forgery cost after key leakage | Intranet CI, single team, quick key rotation |
| HMAC (payload digest) | Medium | Implementation errors can lead to false rejections or bypasses | Webhooks that require tamper resistance and simple replay windows |
| Bearer Token (short-term) | Medium | Token leakage window depends on TTL and distribution chain | A team that already has OIDC/issuance services |
| mTLS | high | Certificate rotation and revocation chain is complex | Multi-tenant, strong compliance, long-term fixed integration solution |
Assume TLS termination and reverse-proxy wiring follow the production checklist and Gateway binds locally; fix mixed content or WebSocket drops before turning on Webhooks or logs stay noisy.
Freeze URLs and paths: Use separate path prefixes for Webhooks and CI callbacks, and prohibit mixing the same Location with Control UI static resources.
Inject request id: Edge generation or transparent transmission of X-Request-Id, Gateway log must print the same field to facilitate three-segment correlation.
Define idempotent key: combine delivery_id or CI's check_suite_id + job_id + conclusion , write to deduplication storage and set TTL.
Alignment time window: The signature or timestamp verification window should be larger than the maximum CI retry interval and reserve NTP drift margin.
Fault injection self-test: Deliberately send duplicate payloads and out-of-order conclusions to verify whether the side effect count is always 1.
Archive audit fields: Record the caller IP, key version number, and idempotent key hit results. The retention period meets internal compliance instead of only storing 200.
curl -sS -X POST "https://YOUR_DOMAIN/openclaw/hooks/ci" \
-H "Content-Type: application/json" \
-H "X-Webhook-Signature: sha256=REPLACE" \
-H "X-Idempotency-Key: pipeline-12345-success" \
-H "X-Request-Id: probe-$(date +%s)" \
--data '{"event":"workflow_job","conclusion":"success","repository":"acme/app"}'
Tip: Please only trigger the probe in the test warehouse; the production environment should be configured with an independent rotation window for the key to avoid mixing it with the IM channel key.
Tier 1 — Edge: Read reverse-proxy access logs for upstream status, bytes, and request_time before touching Gateway; 499/502 here means timeouts or TLS/SNI issues, not model routing.
Tier 2 — Gateway: Track auth failures, idempotent hits, queue depth, and JSON parse errors; compare with runtime troubleshooting to see whether requests never reached handlers.
Tier 3 — Tools/models: Pair tool-call IDs with outbound HTTP status even when CI sees HTTP 200, or you miss empty-response failures.
Note: Do not enter the complete key or signature raw material into the info level log; auditing can use the key version number and truncated fingerprint instead.
One-click association: For any customer service ticket, first ask for X-Request-Id, search each of the three logs once, and then write a conclusion.
Layered alarm: Edge 5xx and Gateway 4xx must not share the same silence rule, otherwise no one will know about the retry storm.
Review template: Record the four columns of "failure level, first exception field, idempotent key, and caller IP version" for reuse.
Defaults for internal reviews—replace with your CI callback SLA before promising anything externally.
| Team maturity stage | Webhook entry strategy | First priority improvement |
|---|---|---|
| 0→1 Verification | Shared key + manual curl | Complement idempotent keys and request ids, prohibiting the production of manual key distribution |
| 1→10 collaboration | HMAC + IP/CIDR allow list | Split two sets of keys and audit buckets between CI and external suppliers |
| 10→100 scale | Short-lived token or mTLS + rate limiting | Integrate gateway observations into unified alarm and on-call layering |
Ephemeral VPS egress and hand-rolled keys amplify webhook flakes during migrations. Teams that want OpenClaw, CI callbacks, and production-grade Mac capacity in one ops story usually benefit from VpsMesh cloud Mac Mini rental: predictable regions and networking so webhook triage and build triage share the same vocabulary.
First check reverse-proxy access logs for upstream status and latency, then search Gateway logs with the same X-Request-Id, then drill into tool calls. When ordering stable egress nodes, use the order page.
Finish the install and doctor checklist before enabling Webhooks; otherwise the same signature failures appear in all three log tiers.
Remote access guidance lives in the help center; compare plans on the pricing page.