OpenClaw Engineering, Chapter 10: Multi-Agent Systems

In Chapter 9, we covered how to schedule and orchestrate work. But we've mostly talked about single agents. The real power of agent frameworks emerges when you move beyond single-agent systems to multi-agent architectures. A solo consultant brings expertise in one domain. A consulting firm has specialists in strategy, operations, technology, finance, and more. They tackle multiple complex problems simultaneously and assemble teams tailored to specific needs. The same principle applies to agents: a team of specialized agents outperforms a single generalist.


The case for agent specialization

Each agent in a multi-agent system is a specialist with its own personality, tools, and knowledge base. One agent excels at planning and breaking down complex tasks. Another is expert in implementation and coding. A third is critical and analytical, good at spotting flaws. A fourth is observant and good at synthesizing information. The OpenClaw framework coordinates the team, routes work to the right specialists, facilitates communication, and orchestrates collaboration.

The benefits of this heterogeneous approach are substantial. First, specialization improves quality. A Planner agent trained for decomposition makes better plans than a generalist. A Coder optimized for implementation produces cleaner code. Second, parallel processing increases throughput. While the Planner thinks about strategy, the Coder works on implementation. Third, the system is more robust. If one agent fails, others continue. Fourth, debugging is easier. Each agent is focused and easier to understand than a monolithic system.

💡 Key idea: Multi-agent systems increase complexity. Don't jump to multi-agent architecture just because it sounds good. Start with a single agent, understand its failure modes, and scale to multiple agents when a single agent becomes the bottleneck or when specialization becomes valuable.

Routing via channel binding

A single Gateway can host multiple agent instances simultaneously. Each agent runs in its own context with its own memory, tools, and configuration. The Gateway acts as a router and message broker, directing incoming messages to the appropriate agents based on channel subscriptions. Channels are semantic strings describing a domain or topic of work: "content-generation," "code-review," "data-analysis," "customer-support."

When configuring your Gateway with multiple agents, each agent declares which channels it listens to. When a message arrives with a specified channel, the Gateway looks up all agents subscribed to that channel and delivers the message. The beauty of this pub-sub model is flexibility and decoupling. Adding a new agent is trivial—it just subscribes to the appropriate channels. Retiring an agent is also trivial—unsubscribe it. The central system doesn't need to know about all agents.
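The subscribe-and-deliver flow described above can be sketched in a few lines. This is a minimal illustration, not the OpenClaw Gateway API: the `Gateway` class, its `subscribe`, `unsubscribe`, and `publish` methods, and the agent names are all hypothetical stand-ins chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict

Handler = Callable[[str], str]  # takes a message, returns a reply


class Gateway:
    """Toy channel router (hypothetical; not the real OpenClaw Gateway)."""

    def __init__(self) -> None:
        # channel name -> {agent name -> handler}
        self._subs: Dict[str, Dict[str, Handler]] = defaultdict(dict)

    def subscribe(self, channel: str, agent: str, handler: Handler) -> None:
        self._subs[channel][agent] = handler

    def unsubscribe(self, channel: str, agent: str) -> None:
        self._subs[channel].pop(agent, None)

    def publish(self, channel: str, message: str) -> Dict[str, str]:
        """Deliver the message to every subscriber; return replies by agent name."""
        subscribers = self._subs.get(channel, {})
        return {name: handler(message) for name, handler in subscribers.items()}


gateway = Gateway()
gateway.subscribe("code-review", "critic", lambda m: f"critic saw: {m}")
gateway.subscribe("code-review", "coder", lambda m: f"coder saw: {m}")
replies = gateway.publish("code-review", "please review PR")
```

Adding or retiring an agent touches only the subscription table; the publisher's call to `publish` does not change.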

This routing pattern also enables graceful degradation. If an agent goes down or is overloaded, other agents listening to the same channel still receive messages. For example, if two CodeAgents both subscribe to "code-generation" and one goes down, requests are still routed to the other, so the channel loses capacity rather than failing outright.
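One way to picture the failover behaviour is a router that tries each subscriber in turn and skips any that are down. The sketch below assumes a simple in-order retry policy; `AgentSlot`, `AgentUnavailable`, and `route_with_failover` are hypothetical names, and a real Gateway may detect failure and balance load differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentSlot:
    name: str
    handle: Callable[[str], str]
    healthy: bool = True


class AgentUnavailable(Exception):
    """Raised when no subscriber on the channel can take the request."""


def route_with_failover(subscribers: List[AgentSlot], message: str) -> str:
    """Try subscribers in order, skipping any that are unhealthy or that raise."""
    for slot in subscribers:
        if not slot.healthy:
            continue
        try:
            return slot.handle(message)
        except Exception:
            slot.healthy = False  # take the failing agent out of rotation
    raise AgentUnavailable("no healthy subscriber on this channel")


def crashing(message: str) -> str:
    raise RuntimeError("agent is down")


# Two agents subscribed to the same channel; the first one is failing.
pool = [
    AgentSlot("code-agent-1", crashing),
    AgentSlot("code-agent-2", lambda m: f"agent-2 handled {m}"),
]
result = route_with_failover(pool, "write parser")
```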


Four roles: planner, coder, critic, surveyor

The most common multi-agent pattern divides responsibility into complementary roles. Each can be optimized independently. A Planner breaks down complex problems into actionable steps, reasons about sequence and dependencies, identifies what information is needed, and creates execution plans. Think of the Planner as a project manager or architect. When given a complex goal like "build a web application that processes user uploads," the Planner breaks it into phases: design database schema, build upload endpoint, implement storage, create reports.

A Coder executes: writes code that implements the Planner's design, debugs issues, integrates with existing systems, and produces working software. The Coder is a specialist in syntax, APIs, best practices, and making things work. When the Planner says "build a REST endpoint that accepts file uploads," the Coder writes the actual code.

A Critic reviews plans for logical flaws, reviews code for bugs and security issues, identifies missing requirements, and spots edge cases. Critics are skeptical by nature and assume things can go wrong. They ask hard questions: "What happens if this fails? Are we idempotent? Can we detect double charges? Is this secure?"

A Surveyor gathers and synthesizes information: searches for relevant information, pulls data from multiple sources, and keeps the team informed about the current state of knowledge. When the team needs to know "what's the best way to handle this edge case?", the Surveyor searches the codebase, reviews documentation, and brings back a comprehensive summary.
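To make the division of labour concrete, here is a toy pipeline in which each role is a plain function with canned behaviour. Everything in it is an assumption made for illustration: in a real deployment each role would be a separately configured agent, and the function names, hard-coded plan, and review rule are invented for the example.

```python
from typing import Dict, List


def planner(goal: str) -> List[str]:
    # Stub: a fixed two-step decomposition of the goal.
    return [f"design schema for {goal}", f"build endpoint for {goal}"]


def surveyor(step: str) -> str:
    # Stub: pretend to gather background material for the step.
    return f"notes on '{step}'"


def coder(step: str, notes: str) -> str:
    # Stub: pretend to produce an implementation from the step and notes.
    return f"# code for: {step} (using {notes})"


def critic(code: str) -> List[str]:
    # Stub: one hard-coded review rule.
    issues = []
    if "endpoint" in code:
        issues.append("validate upload size")
    return issues


def run_team(goal: str) -> List[Dict[str, object]]:
    """Planner decomposes; Surveyor, Coder, and Critic handle each step in turn."""
    results = []
    for step in planner(goal):
        notes = surveyor(step)
        code = coder(step, notes)
        results.append({"step": step, "code": code, "issues": critic(code)})
    return results


report = run_team("uploads")
```

Because each role sits behind its own function boundary, any one of them can be swapped for a stronger implementation without touching the others.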

⚠️ Warning: Don't assign too many roles to a single agent. A single agent trying to be Planner+Coder usually does both poorly. Specialization and focus improve quality. If you must combine roles, be explicit about the constraints and document what quality you expect to lose.

Direct agent-to-agent communication

Multi-agent systems need communication mechanisms. The simplest is for agents to communicate through the central Gateway—Agent A sends a message, the Gateway routes it to Agent B. But OpenClaw also provides direct agent-to-agent communication that bypasses the Gateway: agentToAgent for synchronous conversations and sessions_send for asynchronous messaging.

agentToAgent enables synchronous, context-aware communication. When Agent A calls agentToAgent("CodeAgent", message), the message goes directly to the CodeAgent, carrying full context about the caller, and the calling agent blocks until the CodeAgent responds. This is high-bandwidth communication: agents can hold a real back-and-forth conversation. The Critic reviewing code might call agentToAgent("CodeAgent", "This function looks like it could have a race condition. Can you explain your synchronization strategy?") and get an answer in the same exchange, which is much richer than waiting on an asynchronous reply.
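The blocking, caller-aware shape of this call can be sketched as follows. This is a stand-in written for illustration, not the actual agentToAgent implementation or signature: `agent_to_agent`, `Envelope`, `register`, and the in-process registry are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Envelope:
    sender: str  # context about the caller travels with the message
    body: str


# Hypothetical in-process registry of agent handlers.
AGENTS: Dict[str, Callable[[Envelope], str]] = {}


def register(name: str, handler: Callable[[Envelope], str]) -> None:
    AGENTS[name] = handler


def agent_to_agent(sender: str, target: str, body: str) -> str:
    """Blocking call: returns only once the target has produced a reply."""
    if target not in AGENTS:
        raise LookupError(f"unknown agent: {target}")
    return AGENTS[target](Envelope(sender=sender, body=body))


def code_agent(envelope: Envelope) -> str:
    return f"To {envelope.sender}: the counter is guarded by a lock."


register("CodeAgent", code_agent)
reply = agent_to_agent("Critic", "CodeAgent", "Explain your synchronization strategy.")
```

The caller gets its answer as a return value, so it can ask a follow-up question immediately, at the cost of being stuck if the target is slow or unavailable.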

sessions_send enables asynchronous, durable messaging. When Agent A calls sessions_send("CodeAgent", message), the message is persisted in a queue and the function returns immediately. The CodeAgent processes it when ready. This is like email: fire-and-forget with eventual delivery. The advantage is decoupling: Agent A doesn't depend on the CodeAgent being available. It sends a message and moves on, and the CodeAgent processes it whenever it has capacity. This enables scale and graceful degradation.
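The fire-and-forget pattern can be illustrated with a per-agent mailbox. The sketch below is a simplification under stated assumptions: `send_async` and `drain` are hypothetical names rather than the sessions_send API, and the queue lives in memory, so unlike the persisted queue described above it would not survive a restart.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

# Hypothetical in-memory mailboxes: agent name -> pending messages.
MAILBOXES: Dict[str, Deque[str]] = defaultdict(deque)


def send_async(target: str, body: str) -> int:
    """Enqueue the message and return immediately with the queue depth."""
    MAILBOXES[target].append(body)
    return len(MAILBOXES[target])


def drain(target: str, handler: Callable[[str], str]) -> List[str]:
    """Run by the receiving agent whenever it has capacity."""
    box = MAILBOXES[target]
    results = []
    while box:
        results.append(handler(box.popleft()))
    return results


# The sender queues two tasks and moves on without waiting.
depth = send_async("CodeAgent", "refactor module A")
send_async("CodeAgent", "add tests for module B")

# Later, the receiver works through its backlog in arrival order.
processed = drain("CodeAgent", lambda task: f"done: {task}")
```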

💡 Key idea: Use agentToAgent for immediate clarifications and high-stakes decisions where you need confirmation. Use sessions_send for bulk work distribution and when you're comfortable with eventual consistency. As a rule of thumb, aim for roughly a 70/30 split: mostly async, with sync reserved for the exchanges that genuinely need an immediate answer.

Adversarial collaboration and taste gates

One of the most powerful patterns in multi-agent systems is adversarial collaboration: agents deliberately take opposing positions to stress-test ideas and find flaws. The idea draws on a long intellectual tradition: the best way to develop a strong thesis is to have intelligent people argue against it. Designate one agent as the Proposer (argues for the idea) and another as the Skeptic (argues against it). Let them have a conversation. The output is typically higher quality than either could produce alone.
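A Proposer/Skeptic exchange can be sketched as a bounded loop: the Skeptic raises an objection, the Proposer revises, and the loop ends when the Skeptic is satisfied or the round limit is reached. The `debate` function, the round limit, and the canned toy agents below are assumptions made for illustration; real agents would be model-backed.

```python
from typing import Callable, List, Optional, Tuple

Proposer = Callable[[str, List[str]], str]  # (idea, objections so far) -> revised idea
Skeptic = Callable[[str], Optional[str]]    # idea -> objection, or None if satisfied


def debate(
    idea: str, proposer: Proposer, skeptic: Skeptic, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Alternate objection and revision until the skeptic has nothing left to raise."""
    objections: List[str] = []
    for _ in range(max_rounds):
        objection = skeptic(idea)
        if objection is None:
            break
        objections.append(objection)
        idea = proposer(idea, objections)
    return idea, objections


def toy_skeptic(idea: str) -> Optional[str]:
    if "retry" not in idea:
        return "What happens if this fails?"
    if "idempotency key" not in idea:
        return "Are we idempotent?"
    return None


def toy_proposer(idea: str, objections: List[str]) -> str:
    fixes = {
        "What happens if this fails?": "retry with backoff",
        "Are we idempotent?": "idempotency key",
    }
    # Address the most recent objection by extending the proposal.
    return f"{idea}; {fixes[objections[-1]]}"


final_idea, raised = debate("charge the card", toy_proposer, toy_skeptic)
```

The round limit matters in practice: without it, two agents that never converge would argue indefinitely.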

A related pattern is Taste Gates: subjective filters that measure quality according to explicit criteria. Different agents might have different tastes: one values simplicity, another values robustness, another values performance. Rather than merging these conflicting objectives, implement separate Taste Gates. Each gate evaluates output according to its criteria and produces a score. The best output is one that scores well across multiple gates. When generating code, you might have SimplicityGate, RobustnessGate, PerformanceGate, and SecurityGate. Generate multiple proposals, run each through all gates, and pick the proposal that optimizes the right tradeoff.
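A minimal sketch of the gate-and-select step follows, assuming each gate returns a score between 0 and 1 and the tradeoff is expressed as a weighted sum. The two gates use deliberately crude heuristics, and the function names, weights, and proposals are hypothetical.

```python
from typing import Callable, Dict

Gate = Callable[[str], float]  # each gate returns a score in [0, 1]


def simplicity_gate(code: str) -> float:
    # Toy heuristic: fewer lines scores higher, with a floor at 0.
    return max(0.0, 1.0 - len(code.splitlines()) / 20)


def robustness_gate(code: str) -> float:
    # Toy heuristic: reward explicit error handling.
    return 1.0 if "try:" in code else 0.0


def scorecard(code: str, gates: Dict[str, Gate]) -> Dict[str, float]:
    """Run one proposal through every gate."""
    return {name: gate(code) for name, gate in gates.items()}


def pick_best(
    proposals: Dict[str, str], gates: Dict[str, Gate], weights: Dict[str, float]
) -> str:
    """Return the name of the proposal with the highest weighted score."""
    def total(code: str) -> float:
        scores = scorecard(code, gates)
        return sum(weights[name] * scores[name] for name in gates)

    return max(proposals, key=lambda name: total(proposals[name]))


GATES = {"simplicity": simplicity_gate, "robustness": robustness_gate}
PROPOSALS = {
    "short": "return parse(x)",
    "guarded": "try:\n    return parse(x)\nexcept ValueError:\n    return None",
}
best = pick_best(PROPOSALS, GATES, {"simplicity": 0.4, "robustness": 0.6})
```

Shifting the weights changes the winner: with all the weight on simplicity, the one-line proposal is selected instead. Writing the weights down is what forces the team to state which tradeoff it actually wants.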

The beauty of this approach is that it works for any domain. Software architecture decisions, data pipeline design, API design, deployment strategy—all have tradeoffs. All benefit from evaluation through multiple lenses. Taste Gates force you to articulate what you care about and measure against it. Most organizations don't do this—they just build and hope. Better organizations use Taste Gates to make decisions systematically.


A real-world consulting analogy

Think about how consulting firms work. A solo consultant brings expertise in one domain. They can solve problems in their specialty but can't cover multiple specialties and their bandwidth is limited. A consulting firm is different. They have specialists in strategy, operations, technology, finance. They tackle multiple complex problems simultaneously. They assemble teams tailored to specific client needs—pairing different experts based on the problem domain. Clients get richer insights because they hear from multiple perspectives.

Multi-agent systems work the same way. Each agent is a specialist. The OpenClaw framework acts as the consulting firm's management. It coordinates the team, routes work, facilitates communication, and orchestrates collaboration. This heterogeneous approach is more powerful than any single agent because it leverages specialization, parallelism, diversity of perspective, and fault isolation.


Best practices for multi-agent systems

The best multi-agent systems have role clarity (every agent knows what it's supposed to do), explicit communication patterns (every agent knows how to talk to others), and shared evaluation criteria (every agent knows what good output looks like). These aren't nice-to-haves; they're essential. If you design your agents with adversarial roles and clear evaluation criteria from the start, the system tends to produce good output. If you try to bolt these patterns on afterward, they feel forced and are often abandoned.

Also, don't overengineer. Start with the roles you actually need. A simple coding task might only need a Coder and a Critic. A research task might only need a Planner and a Surveyor. Only build all four roles when you genuinely need planning, implementation, quality assurance, and research working in concert.


What's next

We've covered the architecture of multi-agent systems and how to coordinate specialized agents. But agents are only as good as what they learn over time. Chapter 11 covers OpenClaw-RL: continuous learning in the background, training signals, and how agents improve their decision-making through experience.



2026-03-25

Sho Shimoda
