title: @kloss_xyz: I've been running multi agent orchestration with my OpenClaw for over a week. It...
author: kloss_xyz
content_type: twitter_article
published: 2026-02-13T00:10:23+00:00
source_url: https://x.com/kloss_xyz/status/2022101005064974600
word_count: 4637
I've been running multi-agent orchestration with my OpenClaw for over a week. It's getting much, much closer to where I want it to be. The issue, though, is that everyone's telling you theirs is perfect and that they did it in a day.
They're lying. I've been building mine for a week plus.
Every single thing that could break when I built mine literally broke.
What follows is every annoying ass problem I hit, every issue power users have reported, and every config that actually made this thing work reliably. Whether you're running one agent or ten with subagents, whether you're technical or just getting started, this is the reference guide I wish existed when I began. This isn't a setup guide. It's an optimization one.
This isn't something you skim and forget. Please bookmark it and save it. Come back to it every time something goes sideways with OpenClaw.
Let's get into it.
1. Upgrading Your OpenClaw
If you're coming from an old OpenClaw setup, the good news is that the installer actually handles the migration pretty well. It moves your .clawdbot directory to .openclaw and creates a symlink back so nothing breaks, and your config, soul files, memory, and workspace all carry over. The clawdbot package also stays as a compatibility shim.
The real problem is what it doesn't clean up. The old clawdbot-gateway.service can keep running alongside the new openclaw-gateway.service in some cases, and when both try to grab port 18789 you'll get a restart loop where the new gateway fails over and over with "another gateway instance is already listening." It looks like OpenClaw is broken, but it's actually just fighting its own predecessor for the port.
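Before you install OpenClaw, stop and disable the old service completely, for example with systemctl stop and systemctl disable on clawdbot-gateway.service (add --user if it was installed as a user-level unit).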
Then uninstall the old npm package (npm uninstall -g clawdbot) and check for leftover files in /usr/local/bin/clawdbot or /usr/local/lib/node_modules/clawdbot since residual packages can silently interfere with the new install. After that, install OpenClaw fresh and your existing workspace files will be picked up automatically.
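Before reinstalling, it can help to confirm that nothing is still holding the gateway port. A minimal Python sketch, assuming the gateway listens on 127.0.0.1; the port number comes from the error described above, while the host and the helper name port_in_use are assumptions for illustration:

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

Calling port_in_use("127.0.0.1", 18789) after stopping the old service should return False. If it still returns True, some gateway process is alive and the new install will likely hit the same restart loop.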
2. Agent Stability (Hangs, Crashes, and Silent Deaths)
Your agent will hang, it will crash, and it will go completely silent for minutes with no explanation. This is normal when you're running agents 24/7, and the fix is building around it.
Write a simple watchdog script that pings the gateway health endpoint every 15 minutes. If it doesn't respond, auto-restart the whole thing. You shouldn't be babysitting this manually.
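A minimal watchdog sketch in Python, under stated assumptions: the health URL and the restart command below are placeholders, not confirmed OpenClaw endpoints or commands, so substitute whatever your install actually exposes. The check and restart steps are injectable so the logic can be tested without touching a real service.

```python
import http.client
import subprocess
import urllib.request

# Assumptions for illustration only; adjust both to your setup.
HEALTH_URL = "http://127.0.0.1:18789/health"  # hypothetical health path
RESTART_CMD = ["systemctl", "--user", "restart", "openclaw-gateway.service"]


def gateway_healthy(url: str = HEALTH_URL, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def restart_gateway() -> None:
    """Issue the restart command; failures are left for the next run to catch."""
    subprocess.run(RESTART_CMD, check=False)


def watchdog(check=gateway_healthy, restart=restart_gateway) -> bool:
    """Run one health check and restart on failure. Returns True if restarted."""
    if check():
        return False
    restart()
    return True
```

Calling watchdog() at the bottom of the script and scheduling it every 15 minutes (cron or a systemd timer) gives the behavior described above. Keeping it to a single check per run means the scheduler, not the script, owns the timing.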
OpenClaw has a built-in diagnostic command called openclaw doctor that checks your config, gateway, channels, workspace, and permissions in one shot. Run it with the --fix flag (openclaw doctor --fix) and it will auto-repair common issues like permission problems, missing directories, legacy config keys, and outdated service paths. It backs up your config before making changes and won't touch your API keys or credentials, so it's safe to run whenever something feels off or after an upgrade. You can also run it without the --fix flag, and some users have found better results that way.
3. Security (Lock This Down Immediately)
If your server is internet-facing, assume someone is already trying to brute force their way in. This isn't paranoia, it's what happens to every exposed server within hours of going live.
At minimum:
But honestly, the better move is to not expose it at all. Use Cloudflare Tunnel or Tailscale to access your server without opening ports to the internet. In your OpenClaw config, set gateway.bind: "loopback" so the gateway only listens locally. No exposed ports means no attack surface for nefarious actors.
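To audit what a listener is actually bound to (for example, the local address column from ss -ltn), a small helper can flag anything that is not loopback. This is a sketch using Python's standard ipaddress module; the function name is hypothetical and it checks address literals only, not OpenClaw's config values.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """True if bind_host is a loopback address such as 127.0.0.1 or ::1."""
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); not provably loopback.
        return False
```

A bind address of 0.0.0.0 fails this check, which is the case to worry about: it means the port is listening on every interface.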
4. Plugins Breaking Your Gateway
Plugins are powerful but they can take your entire gateway down. If something dies right after you install a plugin, that plugin is almost certainly the cause.
The fix is simple:
Check gateway.err.log for the actual error
Disable the plugin: openclaw plugins disable, followed by the id of the plugin you just installed
Restart the gateway
The habit to build here is verifying your gateway restarts cleanly after every single plugin install. Don't install three plugins at once and then wonder which one broke everything, just go one at a time, verify, and move on.
5. The Autonomy Problem
This is the issue I see more than any other, where the agent doesn't listen, leaves tasks unfinished, or says "done" when the work is clearly broken.
Here's the thing: the agent is exactly as autonomous as your instructions make it. If your instructions are vague, the agent's behavior will be vague. If you don't define what "done" means, the agent will decide for itself, and its definition will be generous.
Put explicit rules in your AGENTS.md file. Every time the agent claims "done," it must include the repo name, branch, and commit hash. It must verify its work with actual commands, not just say "I checked." Design heartbeat loops that catch incomplete tasks before they sit there rotting for hours.
The agent isn't being lazy. It's following the looseness in your instructions. Tighten those up and the behavior changes immediately.
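One way to back the "done" rule with something mechanical is to check the agent's completion message before accepting it. A minimal sketch, assuming the agent reports its evidence as "repo:", "branch:", and "commit:" lines; that report format and the function name are assumptions for illustration, not an OpenClaw convention:

```python
import re

REQUIRED = {
    "repo": re.compile(r"^repo:[ \t]*(\S+)[ \t]*$", re.MULTILINE | re.IGNORECASE),
    "branch": re.compile(r"^branch:[ \t]*(\S+)[ \t]*$", re.MULTILINE | re.IGNORECASE),
    # Accept abbreviated (7+ characters) or full 40-character hex hashes.
    "commit": re.compile(r"^commit:[ \t]*([0-9a-f]{7,40})[ \t]*$", re.MULTILINE | re.IGNORECASE),
}


def missing_done_fields(report: str) -> list[str]:
    """Return the names of required fields absent from a completion report."""
    return [name for name, pattern in REQUIRED.items() if not pattern.search(report)]
```

A heartbeat loop could call missing_done_fields on every "done" message and bounce the task back to the agent whenever the list is non-empty. It does not prove the work is correct, only that the agent supplied something you can verify.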
6. Model Configuration
Too many models in your fallback list create unpredictable behavior. The agent switches between different reasoning styles mid-task and the output quality becomes inconsistent.
Keep your model list to 2 or 3 maximum and stay within the same family, either all Anthropic or all OpenAI. Don't mix providers in the same fallback chain.
Configure your models in the actual config file, not through the TUI or GUI. Those interfaces sometimes don't persist settings correctly, and you'll wonder why your changes disappeared after a restart.
If you're using free models, put them last in the fallback chain and never as the primary model for anything critical since they're a safety net, not the foundation.
A solid failover config follows from those rules: one primary model plus one or two fallbacks from the same provider. The gateway auto-switches on failures so your agent never goes dark.
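The rules in this section (two or three models, one provider, free models last) are easy to check before a config goes live. A minimal sketch, assuming model ids are written as "provider/model" strings and that free models are identified by a caller-supplied set; the function name and the ids in any example are placeholders, not a recommended list:

```python
def check_fallback_chain(models, free=frozenset(), max_models=3):
    """Return a list of problems with an ordered chain (primary model first)."""
    if not models:
        return ["chain is empty"]
    problems = []
    if len(models) > max_models:
        problems.append(f"{len(models)} models listed; keep it to {max_models} or fewer")
    # The text before the first "/" is treated as the provider.
    providers = sorted({m.split("/", 1)[0] for m in models})
    if len(providers) > 1:
        problems.append("mixed providers: " + ", ".join(providers))
    if models[0] in free:
        problems.append("primary model is a free model")
    # Free models belong at the tail: no paid model may follow a free one.
    first_free = next((i for i, m in enumerate(models) if m in free), len(models))
    if any(m not in free for m in models[first_free:]):
        problems.append("a paid model appears after a free model")
    return problems
```

An empty list means the chain passes all three rules. Running this against the list you actually put in the config file is a cheap guard against a fallback chain that quietly grew over time.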
7. TUI Shows "(no output)"
This one drove me crazy before I figured it out. The TUI shows "(no output)" for every reply and nothing seems to work.
If you configured Telegram delivery with the --deliver flag but you've never actually sent a direct message to your bot on Telegram, the delivery failure kills the entire output pipeline: not just Telegram delivery, but everything.
The fix is absurdly simple: open Telegram, send one message to your bot, and the pipeline unblocks and everything starts flowing.
8. Messages Getting Dropped
When your agent is busy processing a request and new messages come in, they can get silently dropped. You'll never know they existed and the sender thinks they were heard, but they weren't.
Enable queue mode in your config.
Every message gets queued and processed in order with nothing lost, even if 5 messages pile up while the agent is in the middle of a long tool call.
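Conceptually, queue mode trades dropping for buffering. The sketch below is not OpenClaw's implementation, just a minimal asyncio illustration of the behavior: messages that arrive while a slow handler is busy wait in FIFO order instead of being discarded. All names are hypothetical.

```python
import asyncio


async def run_queue(messages, handle):
    """Feed messages into a FIFO queue while one worker drains it in order."""
    queue: asyncio.Queue = asyncio.Queue()
    processed = []

    async def worker():
        while True:
            msg = await queue.get()
            processed.append(await handle(msg))
            queue.task_done()

    task = asyncio.create_task(worker())
    for msg in messages:
        queue.put_nowait(msg)  # arrivals never block on the busy worker
    await queue.join()         # wait until every queued message is handled
    task.cancel()
    return processed


async def slow_handler(msg: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a long tool call
    return msg.upper()
```

Running asyncio.run(run_queue(["a", "b", "c", "d", "e"], slow_handler)) handles all five messages in arrival order, which is the property the setting is meant to give you.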
9. Local Memory with QMD
If you want to avoid paying for embedding APIs, OpenClaw's QMD backend does BM25 keyword search, vector search, and reranking entirely on your local machine. It requires the qmd binary (install via bun install -g github.com/tobi/qmd) and runs local GGUF models for embeddings and reranking.
The default builtin backend already does hybrid BM25 + vector search, but QMD adds a local reranker on top and can index multiple external folders beyond your workspace. The tradeoff is more moving parts and heavier CPU/disk usage, so if your memory set is small and mostly workspace Markdown, the builtin hybrid search is already solid without QMD. First search will be slow since QMD downloads local models (~300MB+) on the first query.
10. Making Responses Feel Human
Instant replies on Telegram and Discord feel robotic. Real people don't respond in 200 milliseconds. It immediately signals "this is a bot" to anyone paying attention.
With a humanizing delay configured, responses arrive with natural timing, somewhere between 0.8 and 2.5 seconds of delay. It feels like a person typing instead of a machine firing back instantly, and that small detail makes a big difference in how people interact with your agent.
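The timing itself is simple to picture. A minimal sketch of the idea in Python, assuming a uniform random delay over the 0.8 to 2.5 second range mentioned above; the function names are hypothetical and the real feature may shape its delays differently:

```python
import random
import time


def human_delay(min_s: float = 0.8, max_s: float = 2.5) -> float:
    """Pick a reply delay in seconds, uniformly between min_s and max_s."""
    return random.uniform(min_s, max_s)


def send_with_delay(send, text: str, sleep=time.sleep) -> float:
    """Wait a human-looking interval, then call send(text). Returns the delay."""
    delay = human_delay()
    sleep(delay)
    send(text)
    return delay
```

The sleep function is a parameter so the behavior can be exercised in tests without actually waiting.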
11. Controlling Who Can Spawn What
First, understand the difference between agents and subagents because most people conflate them.
Agents are your team, where each one is a distinct personality with its own workspace, its own SOUL.md file, its own model, and its own role. Think of them as employees: your CEO agent handles strategy and external communication, your CTO agent handles technical decisions, and your CMO handles content and marketing. They're all top-level, persistent, and always available.
Subagents are temporary background workers that any agent on your team can spawn to handle a task without blocking its main conversation. The subagent runs in an isolated session, does its work, and reports back when finished before getting archived. They're one-off workers, not permanent team members.
The important distinction: by default, subagents cannot spawn other subagents. This is intentional. It prevents runaway delegation chains that burn tokens exponentially. There's a feature request to make this configurable, but for now, the nesting stops at one level per spawn.
To control which agents your team members can delegate to, set up allowlists:
Your CEO can spawn subagents under the CTO, CMO, or CRO agent identities. Your CMO can't spawn work under the CTO's identity unless you explicitly allow it. This gives you a clean organizational hierarchy without runaway cross-delegation.
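The allowlist logic described here boils down to a lookup. A minimal sketch of that rule in Python; the agent ids, the dictionary shape, and the assumption that an agent may always spawn under its own identity are all illustrative, not OpenClaw's actual config schema:

```python
# Hypothetical allowlists: caller agent id -> agent ids it may spawn under.
SPAWN_ALLOWLIST = {
    "ceo": {"cto", "cmo", "cro"},
    "cmo": set(),
}


def can_spawn(caller: str, target: str, allowlist=SPAWN_ALLOWLIST) -> bool:
    """True if caller may spawn a subagent under target's identity."""
    if caller == target:
        return True  # assumption: spawning under your own identity is allowed
    return target in allowlist.get(caller, set())
```

Agents missing from the table get an empty allowlist, so the default is deny. That matches the intent above: cross-delegation only happens where you wrote it down.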
12. Different Models for Different Agents
Not every agent on your team needs the most expensive model. Your CEO needs strong reasoning for orchestration and decision-making. Your CTO agent needs a code-optimized model. Your COO running operational tasks can use something lighter and cheaper.
The pattern:
Your CEO runs on Opus for complex reasoning and strategic decisions. Your CTO runs on Codex for code generation and technical work. Your COO runs on Haiku for quick operational tasks and routine coordination.
Set global defaults in your config and then override per agent. This is how you control costs without sacrificing quality where it matters. A subagent doing a quick research task doesn't need Opus. Set a cheaper model as the subagent default via agents.defaults.subagents.model and keep your top-level agents on the higher-quality models.
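The "global default, then override per agent" rule can be sketched as a small resolution function. The model ids below are placeholders rather than real identifiers, and the dictionary layout is an assumption for illustration; only the precedence order reflects the text above.

```python
# Placeholder ids; substitute the identifiers your provider actually uses.
DEFAULTS = {
    "model": "provider/default-model",
    "subagent_model": "provider/cheap-model",
}
AGENT_MODELS = {
    "ceo": "provider/opus-class",
    "cto": "provider/codex-class",
    "coo": "provider/haiku-class",
}


def resolve_model(agent_id: str, is_subagent: bool = False) -> str:
    """Per-agent override wins for top-level agents; subagents use the cheap default."""
    if is_subagent:
        return DEFAULTS["subagent_model"]
    return AGENT_MODELS.get(agent_id, DEFAULTS["model"])
```

An agent without an override falls back to the global default, and any subagent gets the cheaper model regardless of who spawned it. That is where most of the cost savings come from.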
13. Concurrency Settings
OpenClaw defaults are conservative on purpose. Two settings control parallel processing:
maxConcurrent: 4 controls how many top-level sessions can run at the same time. subagents.maxConcurrent: 8 controls how many subagent sessions can run in parallel.
If your hardware can handle it, crank these up. More concurrency means your multi-agent system processes work faster instead of queuing everything behind a bottleneck.
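To see what a concurrency cap does in practice, here is a small asyncio sketch in which a semaphore plays the role of a maxConcurrent-style limit. It illustrates the queuing effect only; it is not how OpenClaw schedules sessions, and the names are hypothetical.

```python
import asyncio


async def run_limited(jobs, max_concurrent: int):
    """Run coroutine factories with at most max_concurrent in flight.

    Returns (results, peak), where peak is the highest observed concurrency.
    """
    gate = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return list(results), peak
```

With ten jobs and a limit of four, six jobs wait at the gate while four run. Raising the limit shortens the wait only as long as the machine and your provider's rate limits can absorb the extra parallel work.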
14. Nesting and Delegation Depth
By default, subagents cannot spawn other subagents. OpenClaw hardcodes this restriction to prevent runaway fan-out where delegation chains spiral and burn tokens exponentially.
The architecture that works is having your top-level agents (CEO, CTO, CMO) each spawn subagents for background work, and those subagents complete their tasks in a single shot and report back. No deeper nesting and no subagent-spawning-subagent chains.
There's a configurable override being discussed (via allowNesting: true or adding sessions_spawn to subagent tool allowlists), but the default behavior exists for a good reason. Keep your delegation flat where the top-level agent breaks work into atomic units that a subagent completes and returns.
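The flat-delegation rule amounts to a depth check at spawn time. A minimal sketch, assuming depth 0 is a top-level agent and depth 1 is a subagent; the class and function names are hypothetical and this is not OpenClaw's internal code:

```python
class NestingNotAllowed(RuntimeError):
    """Raised when a session at the depth limit tries to spawn a child."""


def spawn_subagent(parent_depth: int, task: str, max_depth: int = 1) -> dict:
    """Return a child session record, or refuse if the parent is at max_depth."""
    if parent_depth >= max_depth:
        raise NestingNotAllowed(f"depth {parent_depth} sessions may not spawn")
    return {"task": task, "depth": parent_depth + 1}
```

A top-level agent (depth 0) can spawn, and the resulting subagent (depth 1) cannot. Fan-out is therefore bounded by how many atomic tasks the top-level agent creates, not by how eagerly each worker delegates.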