What orchestration actually is

Every agent has an orchestration layer. Most teams just never decided what theirs does, so the model improvises it every turn, invisibly. That is the bug behind half your weird traces.

The orchestration layer is the function that decides what happens next when your agent is invoked. Get it right and the agent stays coherent across turns, tools, and workflows. Get it wrong and every other problem gets blamed on "the model": the hallucination, the wrong tool, the dropped context.

This is the opening guide in an ongoing series on agent orchestration. We start with what the layer is, the five questions it has to answer on every turn, and the moment a single agent stops being enough.

Key takeaways

Orchestration is a function ("what happens next"), not a component. It can live inside one LLM or in a separate coordinator.
Every turn, the layer answers five questions: where are we, what does the user want, who acts next, what context they need, how the answer returns.
Orchestration is not routing. Routing picks the next actor; orchestration is that plus state, sequencing, failure recovery, and human gates.
The orchestration layer owns the coordination semantics and trusted commit path for active workflow state, which may be durably persisted; long-term user memory remains an input.
Move to multi-agent only when a single agent demonstrably fails on a measurable failure mode, not for architecture aesthetics.

01 FOUNDATIONS

It is a function, not a component

Orchestration is a function, not a box in your architecture diagram. The implementation is wide open. The same job can be done by any of these:

A single LLM that orchestrates itself through prompting
A separate router that picks a specialist
A state machine with deterministic transitions, calling an LLM only at slot-filling and generation nodes
A graph of nodes with conditional edges
A coordinator agent that delegates to worker agents
A plan-then-execute system: a planner emits a graph, an executor runs it

The implementation varies. The function, deciding what happens next, does not.
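One way to make "a function, not a component" concrete is to write down the function's shape and notice that very different implementations satisfy it. The sketch below is illustrative only: `State`, `Decision`, `Orchestrator`, the actor names, and `ask_model` are hypothetical names introduced here, not part of any framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class State:
    active_workflow: Optional[str] = None  # None means nothing is in progress


@dataclass(frozen=True)
class Decision:
    next_actor: str  # a tool, a specialist, or the user-facing reply
    reason: str


# The orchestration function: given state and a new message, decide what happens next.
Orchestrator = Callable[[State, str], Decision]


def state_machine_orchestrator(state: State, message: str) -> Decision:
    """Deterministic transitions keyed on the active workflow (hypothetical table)."""
    transitions: Dict[Optional[str], str] = {
        None: "intent_classifier",
        "order_status": "order_lookup",
    }
    actor = transitions.get(state.active_workflow, "fallback_reply")
    return Decision(actor, f"transition from {state.active_workflow!r}")


def make_llm_orchestrator(ask_model: Callable[[str], str]) -> Orchestrator:
    """Wrap any text-in/text-out model call; `ask_model` is a stand-in, not a real API."""
    def orchestrate(state: State, message: str) -> Decision:
        prompt = f"state={state.active_workflow}; message={message}; next actor?"
        return Decision(ask_model(prompt).strip(), "chosen by model")
    return orchestrate
```

Both callables have the same type, so the rest of a system could swap one for the other without caring which implementation sits behind it.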

A bare LLM handles single-shot queries fine. You need an orchestration layer the moment the agent has to handle multiple workflows in one conversation, switch between tools and specialists, pause one workflow and resume another, recover when something fails, and stay coherent across many turns. Real chat agents need all five.

One distinction up front, because it causes more confusion than any other: orchestration is not routing. Routing picks the next actor, which is one of the five questions below. Orchestration is all five, plus failure recovery, retries, sequencing across actors, and human-in-the-loop gates. Reaching for "a router" when you actually need orchestration is a category error that surfaces later as missing state.
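The gap between the two is easier to see side by side. In this minimal sketch, `route` is routing and nothing else, while `orchestrate` wraps it with session state, a retry, and a human gate. All names, the keyword matching, and the retry count are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def route(message: str) -> str:
    """Routing only: pick the next actor. Keyword matching stands in for a classifier."""
    return "order_status" if "order" in message.lower() else "faq"


@dataclass
class Session:
    history: List[str] = field(default_factory=list)  # state a bare router never sees


def orchestrate(session: Session, message: str,
                actors: Dict[str, Callable[[str], str]],
                max_attempts: int = 2) -> str:
    """Routing plus state, retries, and a human gate when recovery fails."""
    actor_name = route(message)
    for attempt in range(1, max_attempts + 1):
        try:
            result = actors[actor_name](message)
            session.history.append(f"{actor_name}:ok")
            return result
        except RuntimeError:
            # Record the failure so the next turn knows what already went wrong.
            session.history.append(f"{actor_name}:failed#{attempt}")
    session.history.append("escalated_to_human")
    return "A teammate will follow up shortly."
```

If only `route` existed, the failed attempt and the escalation would have nowhere to be recorded, which is the "missing state" the category error produces.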

02 THE FIVE QUESTIONS

The five questions every orchestrator answers

At every turn (or, for plan-then-execute agents, at every plan boundary), the orchestration layer has to answer five questions. Most orchestration-layer failures trace back to one of these answered badly.

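Q1 · Where are we? What workflow is active, what has already been decided, what is paused, and what is still pending. This is the orchestrator's read of the board before it moves a piece. The other four follow from that read: what the user wants (Q2), who acts next (Q3), what context that actor needs (Q4), and how the answer returns (Q5).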

A note on scope: this guide is primarily about chat-bound agents with a human in the loop. The same five questions apply to autonomous long-horizon agents; they just get answered at plan time rather than per turn.
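One way to keep the five questions honest is to make the orchestrator write its answers down as a record on every turn (or at every plan boundary). This is a sketch under assumptions: `TurnDecision`, its field names, and the rule that only Q1 may be empty are choices made here, not something the framework prescribes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class TurnDecision:
    active_workflow: Optional[str]  # Q1: where are we? None means no active workflow
    intent: str                     # Q2: what does the user want?
    next_actor: str                 # Q3: who acts next?
    actor_context: Dict[str, Any]   # Q4: what context do they need?
    return_mode: str                # Q5: how does the answer return?


def unanswered(decision: TurnDecision) -> List[str]:
    """Return the questions left blank. Q1 is exempt: 'no active workflow' is a valid answer."""
    checks = {
        "Q2": decision.intent,
        "Q3": decision.next_actor,
        "Q4": decision.actor_context,
        "Q5": decision.return_mode,
    }
    return [question for question, answer in checks.items() if not answer]
```

Logging a record like this per turn gives a trace reviewer something explicit to inspect instead of inferring the answers from model output.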

03 A WORKED TURN

One turn, five questions

Here is a single turn through a customer support chat agent. Watch the five questions get answered in order.

The user says: "Hey, I ordered some headphones last Tuesday. Where's my package?"

Trace

Support chat agent: one turn

Thought
Q1: Where are we? First turn this session; no active workflow yet.
Thought
Q2: Intent is an order-status lookup. Cues: "headphones," "last Tuesday," "package."
Tool call: order_lookup (220 ms)
Q3 / Q4: Call the order system with structured args, not the raw message.
Input
```
{
  "item": "headphones",
  "ordered": "2026-05-26",
  "customer": "<session>"
}
```

Observation

Order found: shipped and in transit.

Output

```
{
  "status": "shipped",
  "stage": "in_transit",
  "eta_days": 2
}
```

Final (900 ms)
Q5: reply in the same conversational voice, streamed: "your headphones shipped and are in transit, about 2 days out."
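The Q4 and Q5 steps of this trace can be sketched in a few lines. Treat everything here as an assumption made for illustration: the keyword extraction is a toy stand-in for whatever parsing the agent really does, `today` is a parameter because the trace does not say when the conversation happened, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def last_weekday(today: date, name: str) -> date:
    """Most recent occurrence of the named weekday strictly before `today`."""
    delta = (today.weekday() - WEEKDAYS.index(name.lower())) % 7
    return today - timedelta(days=delta or 7)


def build_lookup_args(message: str, today: date, session_id: str) -> dict:
    """Q4: turn the raw message into structured args (toy keyword extraction)."""
    text = message.lower()
    item: Optional[str] = "headphones" if "headphones" in text else None
    day = next((d for d in WEEKDAYS if f"last {d}" in text), None)
    ordered = last_weekday(today, day).isoformat() if day else None
    return {"item": item, "ordered": ordered, "customer": session_id}


def render_reply(item: str, result: dict) -> str:
    """Q5: one conversational sentence built from the tool's structured output."""
    stage = result["stage"].replace("_", " ")
    return f"your {item} {result['status']} and are {stage}, about {result['eta_days']} days out."
```

Assuming the message arrives on Friday 2026-05-29, `build_lookup_args` resolves "last Tuesday" to 2026-05-26, the date shown in the trace input. The tool never sees the raw sentence.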

Each question is also a failure mode, and an audit signal. When you read a real conversation trace, this is the table you are scoring it against. If you can call out which signals are healthy and which are broken, you have internalized the framework.

Question	Healthy signal	Broken signal
Q1 · Where are we?	Knows its place mid-conversation	Re-asks what you want, right after you have already said it
Q2 · What do they want?	Follows pivots; unpacks multi-intent messages	Routes to FAQ when you needed Order Status
Q3 · Who acts next?	Reaches for the right tool	Answers from training data instead of looking up
Q4 · What context?	Tool gets clean structured input	Tool gets the raw message and returns junk
Q5 · How does it return?	Voice stays consistent	Voice lurches; user feels handed off
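If you score traces regularly, the table can be turned into a small checklist so that no question is skipped. This is a hypothetical helper, not part of any tooling the guide describes: a human reviewer still judges each signal, and the code only insists that all five were judged.

```python
from enum import Enum
from typing import Dict, List


class Signal(Enum):
    HEALTHY = "healthy"
    BROKEN = "broken"


# Question labels copied from the table above.
QUESTIONS = {
    "Q1": "Where are we?",
    "Q2": "What do they want?",
    "Q3": "Who acts next?",
    "Q4": "What context?",
    "Q5": "How does it return?",
}


def broken_questions(scores: Dict[str, Signal]) -> List[str]:
    """List the questions a reviewer marked broken; an unscored question is an error."""
    missing = [q for q in QUESTIONS if q not in scores]
    if missing:
        raise ValueError(f"trace not scored for: {', '.join(missing)}")
    return [f"{q} · {QUESTIONS[q]}" for q in QUESTIONS if scores[q] is Signal.BROKEN]
```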

One caveat worth stating plainly: failures inside specialists, tools, or data layers (model hallucinations, tool API errors, stale retrieval) are separate concerns. The orchestrator can route around them; it cannot fix them.

04 TWO SYSTEMS

Two systems, same questions, different public patterns

There is no single right architecture. The only question that matters is whether the one you picked answers the five questions reliably for your workload. Two production systems make the point: the same questions can sit behind different product-level descriptions.

Intercom Fin is publicly presented as a single customer-facing agent that switches roles based on conversation context. That supports the broad single-agent pattern; it does not establish that orchestration lives inside one LLM. Intercom's documented engine includes retrieval, workflow checks, safety checks, and output validation, so the five questions may be handled across the runtime while the user experiences one agent.

Does the unified-agent pattern work? The published numbers say yes, within limits. Anthropic's own support deployment went from 36% at launch to 50.8% resolution within about a month (and roughly 58% a year on). Intercom reports individual customers such as Lightspeed resolving up to 65% and Sharesies almost 70%, against a fleet-wide average around 67% across 7,000+ customers, with 40M+ conversations resolved to date. Expect somewhere in the 40–70% band depending on knowledge-base quality and deployment maturity.

Product-level "single agent" and runtime topology are different claims. Fin shows that one customer-facing agent can present a unified experience; its public materials do not establish where every orchestration decision runs. Fountain publicly describes an explicit multi-agent execution layer. In either case, the five questions still need answers.

05 STATE VS MEMORY

What the orchestrator does not own

The fastest way to wreck an orchestration layer is to let it absorb everything. When the orchestrator owns domain work, business logic, and permanent storage, it becomes a context bottleneck and a single point of failure.

Some things belong elsewhere by default:

Domain work belongs to the specialist.
Final authorization and policy enforcement belong at a deterministic boundary: the tool, API, policy middleware, or approval gate. A specialist's prompt can guide behavior, but it is not a security boundary.
Business logic belongs in the tool or specialist, not in the prompt.
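A deterministic boundary can be as plain as a check inside the tool itself. The sketch below assumes a hypothetical refund tool, with made-up roles and limits; the point is only that the check runs in code, whatever the calling prompt said.

```python
class PermissionDenied(Exception):
    """Raised when the caller is not allowed to perform the action."""


# Hypothetical policy table: which caller roles may issue refunds, and up to what amount.
REFUND_LIMITS = {"support_agent": 50.0, "billing_specialist": 500.0}


def issue_refund(caller_role: str, amount: float) -> dict:
    """The tool enforces policy itself; a prompt cannot widen these limits."""
    limit = REFUND_LIMITS.get(caller_role)
    if limit is None or not 0 < amount <= limit:
        raise PermissionDenied(f"{caller_role} may not refund {amount:.2f}")
    return {"status": "refunded", "amount": amount}
```

A specialist prompted to "never refund more than $50" may still try. With the check in the tool, the attempt fails the same way every time and shows up in the trace.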

The subtlest line is between long-term user memory and active workflow state. The orchestration layer consumes one and owns the coordination semantics and trusted commit path for the other. That ownership does not require in-memory or short-lived storage: active workflow state may be durably persisted and linked to runtime checkpoints.

Long-term user memory: an input

User history, preferences, and prior conversations live in a dedicated memory store under this responsibility split. The orchestration layer queries that memory as an input; it does not treat it as the workflow ledger.

Active workflow state: coordinated & committed

What workflow is active, what is paused, what confirmations are pending, and what gates are in progress. This is the data layer behind Q1. The orchestration layer owns its schema, coordination rules, and trusted commit path, while a state store or checkpoint system may persist it durably.
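The split can be sketched as two objects with different access rules: memory is handed to the orchestrator read-only, while workflow state changes only through one commit path. Everything here is an assumption for illustration, including the field names, the version check, and the in-memory dict standing in for a durable state store.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass
class WorkflowState:
    active: Optional[str] = None
    paused: List[str] = field(default_factory=list)
    pending_confirmations: List[str] = field(default_factory=list)
    reply_channel: str = "chat"
    version: int = 0


class WorkflowLedger:
    """Single commit path for active workflow state; `store` stands in for a durable store."""

    def __init__(self, store: Dict[str, WorkflowState]) -> None:
        self._store = store

    def load(self, session_id: str) -> WorkflowState:
        return self._store.get(session_id, WorkflowState())

    def commit(self, session_id: str, new: WorkflowState, expected_version: int) -> WorkflowState:
        # Reject writes based on a stale read, so two actors cannot silently clobber each other.
        if self.load(session_id).version != expected_version:
            raise RuntimeError("stale write: state changed since it was read")
        new.version = expected_version + 1
        self._store[session_id] = new
        return new


def read_only(memory: Dict[str, str]) -> Mapping[str, str]:
    """Expose long-term memory to the orchestrator as a read-only view."""
    return MappingProxyType(memory)


def start_workflow(ledger: WorkflowLedger, session_id: str, workflow: str,
                   user_memory: Mapping[str, str]) -> WorkflowState:
    """Memory is read as an input; only the ledger is written."""
    state = ledger.load(session_id)
    paused = state.paused + ([state.active] if state.active else [])
    new = WorkflowState(
        active=workflow,
        paused=paused,
        reply_channel=user_memory.get("preferred_channel", "chat"),
    )
    return ledger.commit(session_id, new, expected_version=state.version)
```

Starting a second workflow pauses the first rather than overwriting it, and a write attempted against an old version fails loudly instead of corrupting the Q1 answer.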

06 WHEN ONE AGENT BREAKS

When a single agent stops being enough

Start with one agent. Escalate only when it demonstrably fails. The trigger is evidence, not aesthetics, so it helps to know exactly which limits force the split. If no limit fires, ship a single agent and re-evaluate when failures emerge.

Five empirical pressures tell you when to reconsider a single agent, in rough order of strength. Not every pressure forces a split on its own:

Permissions / policy divergence. Workflows that need different capabilities, approvals, or regulatory handling are the strongest reason to isolate specialist contexts and tool sets. Enforce the distinction at the tool or API boundary: topology improves least privilege and auditability, but a separate prompt is not itself a security boundary.
Instruction-set bloat. When domain instructions and tool descriptions no longer fit coherently in one prompt, routing accuracy degrades. The ceiling is workload-specific, not a fixed workflow count.
Context-window saturation. Long sessions with many tool results overflow even large windows. Scoped per-specialist context is a workaround.
Parallel work. Start with concurrent tool calls under one agent. Split when independent tasks also benefit from separate context windows, ownership, retry and recovery loops, or token budgets.
Trace legibility. When many domains mash into one prompt, "why did the agent do that?" gets harder to answer.
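The five pressures above can be written as an evidence checklist. This sketch assumes boolean observations that a team would have to measure for itself; the field names are hypothetical, and the output deliberately says "reconsider" rather than "split", since not every pressure forces a split on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkloadEvidence:
    """Observed facts about a single-agent deployment (hypothetical fields)."""
    permissions_diverge: bool = False       # workflows need different capabilities or approvals
    prompt_overflows: bool = False          # instructions and tool descriptions no longer fit coherently
    context_saturates: bool = False         # long sessions overflow the window
    parallel_needs_isolation: bool = False  # concurrent tasks need their own context or budgets
    traces_illegible: bool = False          # "why did the agent do that?" is hard to answer


def pressures(evidence: WorkloadEvidence) -> List[str]:
    """Return the pressures that fired, in the rough order of strength used above."""
    ordered = [
        ("permissions / policy divergence", evidence.permissions_diverge),
        ("instruction-set bloat", evidence.prompt_overflows),
        ("context-window saturation", evidence.context_saturates),
        ("parallel work", evidence.parallel_needs_isolation),
        ("trace legibility", evidence.traces_illegible),
    ]
    return [name for name, fired in ordered if fired]


def recommendation(evidence: WorkloadEvidence) -> str:
    fired = pressures(evidence)
    if not fired:
        return "single agent"
    return "reconsider: " + ", ".join(fired)
```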

What does not justify a split: persona or tone alone. Prompt templates with policy variables handle that. Persona is only a real trigger when it is paired with different permissions or escalation paths.
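A prompt template with policy variables might look like the sketch below, using Python's standard `string.Template`. The brand names, tones, and refund wording are invented placeholders; the point is that two personas share one agent and one prompt skeleton.

```python
from string import Template

# One prompt skeleton; persona and policy text are filled in per deployment.
SUPPORT_PROMPT = Template(
    "You are $brand support. Tone: $tone. Refund questions: $refund_policy"
)

# Hypothetical personas: same agent, same tools, different voice and policy text.
PERSONAS = {
    "retail": {
        "brand": "Acme",
        "tone": "warm and brief",
        "refund_policy": "explain the 30-day window.",
    },
    "enterprise": {
        "brand": "Acme Business",
        "tone": "formal",
        "refund_policy": "refer to the account manager.",
    },
}


def render_prompt(persona: str) -> str:
    """Fill the shared template; a missing variable raises KeyError instead of shipping a gap."""
    return SUPPORT_PROMPT.substitute(PERSONAS[persona])
```

If the "enterprise" persona also needed different tool permissions or its own escalation path, that would be the pairing that turns persona into a real trigger.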

Read any agent trace in one pass

Q1: does it know where it is mid-conversation, or does each turn start fresh?
Q2: when you pivot or pack two intents into one message, does it keep up?
Q3: does it reach for the right tool, or improvise from training data?
Q4: does the tool get clean structured input, or a dump of the raw chat?
Q5: does the user-facing voice stay consistent?

References

Anthropic, How we built our multi-agent research system
Anthropic, Building Effective Agents
OpenAI, A Practical Guide to Building Agents
OpenAI Agents SDK, Running agents
Anthropic, How we built Claude Code auto mode: a safer way to skip permissions
Intercom, Fin AI Agent explained
Intercom, The Fin AI Engine
Intercom, Fin financial-services resolution benchmarks
Fountain, Fountain Launches Cue (April 14 2026)