Running OpenClaw on a small Linux cloud VPS while keeping Ollama on a home machine or private LAN is a common 2026 layout: public channels and webhooks land on the VPS, but weights and GPU stay where power and policy are friendly. The failure modes are almost always networking, not the model itself—wrong upstream URL, TLS terminated twice, a corporate HTTP_PROXY stealing localhost traffic, or an SSH tunnel that drops silently after sleep. This note collects a reproducible wiring pattern and a three-layer triage so you stop guessing whether a 502 came from Nginx, the Gateway, or Ollama still loading a model into VRAM.
Topology: who owns which hop
Picture a straight line: browser or messenger channel → TLS on 443 → reverse proxy (Caddy or Nginx) → OpenClaw Gateway loopback port → HTTP client to Ollama. Ollama may listen on the same VPS (127.0.0.1:11434), on another container bridge IP, or on a port forwarded from your house through SSH. Write that list on paper before editing config; the moment you mix localhost inside a container with localhost on the host without publishing ports, you get “it worked yesterday” bugs.
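A quick way to pin that list down before editing anything is to probe each hop in order from the VPS itself. The ports and hostname below reuse this note's examples (11434 for Ollama, 18789 for the Gateway bind, gw.example.com at the edge), and the sketch assumes the Gateway serves the same /healthz on loopback that the proxy later forwards; swap in your own values.

```bash
# Probe each hop in order; the first failure localizes the broken segment.
curl -sS http://127.0.0.1:11434/api/tags | head -c 200   # Ollama (co-located or tunneled)
curl -sS http://127.0.0.1:18789/healthz                  # Gateway on its loopback bind
curl -sS https://gw.example.com/healthz                  # full path through the TLS edge
```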
Gateway upstream to Ollama
Point the Gateway’s model backend at a single canonical base URL: typically http://127.0.0.1:11434 when Ollama is co-located, or http://127.0.0.1:18080 when you park a tunnel on a dedicated local port. Avoid chaining through another HTTPS hop inside the machine unless you enjoy double encryption and certificate-mismatch errors. After any change, hit /api/tags with curl as the same user the systemd unit runs under; if curl works but the Gateway does not, you are looking at a different user, a different network namespace, or a stale unit override. For production TLS and rollback around the Gateway, we keep a longer checklist in 2026 OpenClaw Gateway production on Linux: onboard wizard, doctor --fix, HTTPS reverse proxy, upgrade and rollback.
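A minimal sketch of that same-user check, assuming a unit named openclaw-gateway.service running as a dedicated openclaw user; both names are placeholders for whatever your install uses.

```bash
# See which user the unit actually runs as, and what environment it inherits
systemctl show openclaw-gateway.service --property=User,Environment

# Repeat the probe as that user, not as root or your login shell
sudo -u openclaw curl -sS http://127.0.0.1:11434/api/tags | head
```

If the sudo -u probe fails while your own shell succeeds, the difference between those two environments is the bug.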
TLS split: public edge vs internal HTTP
Terminate TLS only at the reverse proxy. Let Caddy or Nginx speak HTTPS to the world and plain HTTP to 127.0.0.1:18789 (example Gateway bind). If you instead point the proxy at an https:// upstream on loopback, you must maintain certificates for that inner hop and renew them on the same schedule as the outer edge, which is usually not worth it. Enable HTTP/1.1 keep-alive on the proxied hop so streamed completions work, and disable response buffering; many default proxy templates buffer upstream output, so verify you are not accidentally accumulating the entire completion before forwarding it. When tightening exposure, compare SSH-only access versus public HTTPS using the matrix in 2026 OpenClaw Linux cloud hosts: minimal attack surface — firewall templates, Gateway loopback binding, SSH tunnel management vs public HTTPS (matrix + FAQ).
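As one concrete shape of that split, here is a minimal Nginx sketch, assuming the example names from this note (gw.example.com, Gateway on 127.0.0.1:18789) and certificates already provisioned at standard Let's Encrypt paths; Caddy achieves the same with a two-line reverse_proxy block.

```nginx
server {
    listen 443 ssl;
    server_name gw.example.com;
    ssl_certificate     /etc/letsencrypt/live/gw.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/gw.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:18789;   # plain HTTP on loopback, no inner TLS
        proxy_http_version 1.1;              # keep-alive for streamed completions
        proxy_set_header Connection "";      # do not force "close" on the upstream
        proxy_buffering off;                 # forward tokens as they arrive
        proxy_read_timeout 300s;             # survive cold model loads
    }
}
```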
Reproducible SSH tunnel (home Ollama → VPS)
From an always-on home host running Ollama, reverse-forward to the VPS so the cloud sees Ollama as loopback: ssh -N -o ServerAliveInterval=30 -o ExitOnForwardFailure=yes -R 127.0.0.1:18080:127.0.0.1:11434 user@vps. On the VPS, point the Gateway upstream to http://127.0.0.1:18080. Keep GatewayPorts no, use keys, and run the client under systemd on the home side so sleep does not drop the tunnel. Document which side owns port 18080 before opening tickets.
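To keep that tunnel alive across sleep and network blips, a home-side systemd unit like the sketch below works; the unit name and host alias are placeholders, and the ssh flags are exactly those from the command above.

```ini
# /etc/systemd/system/ollama-tunnel.service (home host; name is a placeholder)
[Unit]
Description=Reverse tunnel: home Ollama -> VPS loopback 18080
After=network-online.target
Wants=network-online.target

[Service]
# Run as the user whose key is authorized on the VPS
ExecStart=/usr/bin/ssh -N \
  -o ServerAliveInterval=30 \
  -o ExitOnForwardFailure=yes \
  -R 127.0.0.1:18080:127.0.0.1:11434 user@vps
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Restart=always plus ExitOnForwardFailure=yes means a stolen port or a dropped link surfaces as a visible restart loop instead of a silently dead tunnel.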
```bash
# Ollama reachable on forwarded port
curl -sS http://127.0.0.1:18080/api/tags | head
# Gateway health after proxy (replace host)
curl -sS https://gw.example.com/healthz
```
NO_PROXY and friends (matrix)
Many VPS images export HTTP_PROXY for package mirrors; align the upper- and lowercase variants or unset them for the Gateway unit. Use the matrix below when Ollama must stay on loopback; an override sketch follows it.
| Traffic target | Symptom if wrong | Fix |
|---|---|---|
| 127.0.0.1 / localhost | Gateway logs show proxy connect to corporate gateway; 502 in milliseconds | NO_PROXY=127.0.0.1,localhost,::1 on the systemd unit |
| Docker bridge (172.17.0.2) | Intermittent timeouts when compose restarts reorder IPs | Use a stable service name + user-defined bridge; add that name to NO_PROXY |
| Unix socket upstream | ECONNREFUSED despite socket file present | Match UID inside the unit; check ProxyCommand is not wrapping local tools |
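For the first row, a drop-in override is the least invasive fix; this sketch again assumes a unit named openclaw-gateway.service.

```bash
# Opens an editor on /etc/systemd/system/openclaw-gateway.service.d/override.conf
sudo systemctl edit openclaw-gateway.service
# Add, then save:
#   [Service]
#   Environment=NO_PROXY=127.0.0.1,localhost,::1
#   Environment=no_proxy=127.0.0.1,localhost,::1
sudo systemctl restart openclaw-gateway.service
```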
If requests fail only while a model is cold-loading, raise the proxy’s read_timeout and confirm GPU memory headroom. An instant 502 with a tiny response body usually means nothing is listening on the upstream socket yet.
Layered triage: 502 vs hang vs reset
- Layer A, the edge: read Nginx or Caddy error logs; a connect() failed (111: Connection refused) in the edge log means nothing was listening on the upstream port.
- Layer B, the Gateway: confirm the outbound URL matches what curl used from the same unit.
- Layer C, Ollama: check its logs for model load; the first token after a cold start can exceed default proxy read timers.

If only long prompts fail, suspect edge body limits before blaming Ollama.
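Worked as commands, assuming Nginx at the edge and units named openclaw-gateway.service and ollama.service (both placeholders for your actual names):

```bash
# Layer A: did the edge reach any upstream at all?
sudo tail -n 50 /var/log/nginx/error.log

# Layer B: what upstream URL is the Gateway actually dialing?
journalctl -u openclaw-gateway.service -n 50 --no-pager

# Layer C: is Ollama up, and is a model still loading into VRAM?
journalctl -u ollama.service -n 50 --no-pager
curl -sS http://127.0.0.1:11434/api/tags | head
```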
When escalating, attach the edge error line, the Gateway’s configured upstream URL, and a curl proof from the VPS loopback. That sequence ends most debates in one round.
On a cloud Mac mini, local Ollama stays in the same room as your IDE
When you are iterating on prompts, tools, and Gateway plugins, co-locating Ollama with a desktop-class machine removes SSH tail latency and makes streaming completions feel instant. Apple Silicon’s unified memory architecture is especially friendly to mid-size models that would thrash on a small VPS with no GPU, and macOS gives you a native Unix shell plus Homebrew without container networking surprises.
A Mac mini M4 idles at roughly 4 W, stays whisper-quiet, and pairs well with long-running tunnels or local reverse proxies: fewer random crashes than a cobbled-together Windows box and far less operational drag than babysitting PCI passthrough on self-built hardware.
If you want that stability without buying metal upfront, VPSSpark cloud Mac mini M4 is a practical place to start — explore plans now and keep OpenClaw, Ollama, and Xcode-class tooling on one coherent host.