Agent Decomposition

What orchestration actually is

Every agent has an orchestration layer deciding what happens next. The five questions it must answer, and when one agent stops being enough.

12 min read
By Echo Theory Labs
agents · architecture · orchestration · topology · decomposition

Every agent has an orchestration layer. Most teams just never decided what theirs does, so the model improvises it every turn, invisibly. That is the bug behind many of your weird traces.

The orchestration layer is the function that decides what happens next when your agent is invoked. Get it right and the agent stays coherent across turns, tools, and workflows. Get it wrong and every other problem gets blamed on "the model": the hallucination, the wrong tool, the dropped context.

This is the first of three guides on agent orchestration. We start with what the layer is, the five questions it has to answer on every turn, and the moment a single agent stops being enough.

Key takeaways

  • Orchestration is a function ("what happens next"), not a component. It can live inside one LLM or in a separate coordinator.
  • Every turn, the layer answers five questions: where are we, what does the user want, who acts next, what context they need, how the answer returns.
  • Orchestration is not routing. Routing picks the next actor; orchestration is that plus state, sequencing, failure recovery, and human gates.
  • The orchestrator owns ephemeral conversation state, not persistent memory. It queries memory, it does not store it.
  • Move to multi-agent only when a single agent demonstrably fails on a measurable failure mode, not for architecture aesthetics.
01 · FOUNDATIONS

It is a function, not a component

Orchestration is a function, not a box in your architecture diagram. The implementation is wide open. The same job can be done by any of these:

  • A single LLM that orchestrates itself through prompting
  • A separate router that picks a specialist
  • A state machine with deterministic transitions, calling an LLM only at slot-filling and generation nodes
  • A graph of nodes with conditional edges
  • A coordinator agent that delegates to worker agents
  • A plan-then-execute system: a planner emits a graph, an executor runs it

The implementation varies. The function, deciding what happens next, does not.
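To make "a function, not a component" concrete, here is a minimal Python sketch. It assumes a hypothetical signature, `decide(state, message) -> Step`, that any of the implementations listed above could sit behind. Two toy implementations share it: a deterministic state machine and a keyword router. The names `Step`, `decide_by_state_machine`, `decide_by_router`, and `run_turn` are illustrative, not taken from any framework.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Step:
    """What happens next: which actor runs, and with what instruction."""
    actor: str
    instruction: str


# The function every orchestration layer implements, however it is built.
Orchestrator = Callable[[Dict[str, str], str], Step]


def decide_by_state_machine(state: Dict[str, str], message: str) -> Step:
    # Deterministic transitions; an LLM would only be called at the
    # slot-filling and generation nodes (represented here by actor names).
    if state.get("workflow") == "refund" and "order_id" not in state:
        return Step("llm_slot_filler", "ask for the order id")
    if state.get("workflow") == "refund":
        return Step("refund_tool", f"refund order {state['order_id']}")
    return Step("llm_generator", "answer directly")


def decide_by_router(state: Dict[str, str], message: str) -> Step:
    # A toy keyword router that picks a specialist; a real one might be an LLM call.
    text = message.lower()
    if "refund" in text:
        return Step("billing_specialist", message)
    if "package" in text or "order" in text:
        return Step("order_specialist", message)
    return Step("general_specialist", message)


def run_turn(orchestrate: Orchestrator, state: Dict[str, str], message: str) -> Step:
    # Callers depend only on the function, not on how it is implemented.
    return orchestrate(state, message)
```

Swapping `decide_by_router` for `decide_by_state_machine` changes nothing for the caller of `run_turn`; only the answer to "what happens next" is computed differently.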

A bare LLM handles single-shot queries fine. You need an orchestration layer the moment the agent has to do any of these: handle multiple workflows in one conversation, switch between tools and specialists, pause one workflow and resume another, recover when something fails, or stay coherent across many turns. Real chat agents need all five.

One distinction up front, because it causes more confusion than any other: orchestration is not routing. Routing picks the next actor, which is one of the five questions below. Orchestration is all five, plus failure recovery, retries, sequencing across actors, and human-in-the-loop gates. Reaching for "a router" when you actually need orchestration is a category error that surfaces later as missing state.
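The difference is easiest to see side by side. In this hypothetical sketch, `route` answers only Q3 (who acts next). `Orchestrator.handle` wraps that same routing decision with what routing leaves out: conversation state, a retry on failure, and a human-in-the-loop gate before a sensitive action. The specialist callables, the retry count, and the `needs_approval` set are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Specialist = Callable[[str], str]


def route(message: str) -> str:
    """Routing: pick the next actor, nothing more."""
    return "billing" if "refund" in message.lower() else "support"


@dataclass
class Orchestrator:
    specialists: Dict[str, Specialist]
    needs_approval: set = field(default_factory=lambda: {"billing"})
    max_retries: int = 1
    history: List[str] = field(default_factory=list)  # ephemeral conversation state
    pending_approval: Optional[str] = None

    def handle(self, message: str, approved: bool = False) -> str:
        actor = route(message)  # routing is one step of orchestration
        self.history.append(f"user:{message}")
        # Human-in-the-loop gate: hold sensitive actions until approved.
        if actor in self.needs_approval and not approved:
            self.pending_approval = message
            return "Waiting for human approval."
        self.pending_approval = None
        # Failure recovery: retry the specialist before giving up.
        for _ in range(self.max_retries + 1):
            try:
                reply = self.specialists[actor](message)
                self.history.append(f"{actor}:{reply}")
                return reply
            except RuntimeError:
                continue
        return "Sorry, something went wrong; escalating to a human."
```

Everything in `handle` beyond the call to `route` is orchestration. With routing alone, the pending approval and the history have nowhere to live.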

02 · THE FIVE QUESTIONS

The five questions every orchestrator answers

At every turn (or, for plan-then-execute agents, at every plan boundary), the orchestration layer has to answer five questions. Most orchestration-layer failures trace back to one of these answered badly.

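  • Q1 · Where are we? What workflow is active, what has already been decided, what is paused, and what is still pending. This is the orchestrator's read of the board before it moves a piece.
  • Q2 · What does the user want? The intent behind the latest message, including pivots away from the active workflow and messages that pack in more than one request.
  • Q3 · Who acts next? The agent itself, a tool, or a specialist.
  • Q4 · What context do they need? The clean, structured input the next actor receives, rather than the raw conversation.
  • Q5 · How does the answer return? How the result gets back to the user, in a voice that stays consistent across the conversation.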

A note on scope: this guide is primarily about chat-bound agents with a human in the loop. The same five questions also apply to autonomous long-horizon agents; they just get answered at plan time rather than per turn.
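One way to stop the five questions from being answered invisibly is to have the orchestrator emit an explicit record of all five every turn, so a trace can be scored against it later. A minimal sketch; the `TurnDecision` fields simply mirror Q1 to Q5 and are not taken from any framework, and the gap checks are deliberately simple.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TurnDecision:
    # Q1: where are we? (None is legitimate on the first turn)
    active_workflow: Optional[str]
    paused_workflows: List[str]
    # Q2: what does the user want? (possibly more than one intent)
    intents: List[str]
    # Q3: who acts next? ("self" means the agent answers directly)
    next_actor: str
    # Q4: what context does that actor need?
    actor_input: Dict[str, Any]
    # Q5: how does the answer return?
    return_mode: str = "stream_in_agent_voice"

    def unanswered(self) -> List[str]:
        """List the questions this turn left blank."""
        gaps = []
        if not self.intents:
            gaps.append("Q2")
        if not self.next_actor:
            gaps.append("Q3")
        if self.next_actor != "self" and not self.actor_input:
            gaps.append("Q4")
        return gaps
```

Logging one `TurnDecision` per turn (or per plan boundary, for plan-then-execute agents) turns "why did the agent do that?" into a lookup rather than a guess.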

03 · A WORKED TURN

One turn, five questions

Here is a single turn through a customer support chat agent. Watch the five questions get answered in order.

The user says: "Hey, I ordered some headphones last Tuesday. Where's my package?"

Trace

Support chat agent: one turn

  1. Thought

    Q1: Where are we? First turn this session; no active workflow yet.

  2. Thought

    Q2: Intent is an order-status lookup. Cues: "headphones," "last Tuesday," "package."

  3. Tool call · order_lookup · 220ms

    Q3 / Q4: Call the order system with structured args, not the raw message.

    Input

    {
      "item": "headphones",
      "ordered": "2026-05-26",
      "customer": "<session>"
    }
  4. Observation

    Order found: shipped and in transit.

    Output

    {
      "status": "shipped",
      "stage": "in_transit",
      "eta_days": 2
    }
  5. Final · 900ms

    Q5: Reply in the same conversational voice, streamed: "Your headphones shipped and are in transit, about 2 days out."
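The trace above can be replayed as a toy pipeline. This sketch stands in for the LLM with keyword rules and replaces the real order system with a hypothetical stub, `order_lookup`, that returns the observation shown. The intent cues, the resolution of "last Tuesday" to a date, and the reply template are all assumptions; the point is the order in which the five questions get answered.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Any, Dict, List, Optional


def last_weekday(today: date, weekday: int) -> date:
    """Most recent given weekday strictly before `today` (Monday=0, Tuesday=1)."""
    days_back = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=days_back)


def order_lookup(item: str, ordered: Optional[str], customer: str) -> Dict[str, Any]:
    # Hypothetical stub standing in for the real order system.
    return {"status": "shipped", "stage": "in_transit", "eta_days": 2}


def support_turn(message: str, today: date, session: str) -> Dict[str, Any]:
    trace: List[str] = []
    text = message.lower()

    # Q1: where are we? First turn this session, so no workflow is active.
    active_workflow: Optional[str] = None
    trace.append(f"Q1 active_workflow={active_workflow}")

    # Q2: what does the user want? Keyword cues stand in for the model here.
    cues = ("package", "ordered", "where's")
    intent = "order_status" if any(c in text for c in cues) else "other"
    trace.append(f"Q2 intent={intent}")
    if intent != "order_status":
        return {"trace": trace, "args": None, "reply": "What can I help you with?"}

    # Q3 / Q4: the order tool acts next, with structured args, not the raw message.
    args = {
        "item": "headphones" if "headphones" in text else "unknown",
        "ordered": last_weekday(today, 1).isoformat() if "last tuesday" in text else None,
        "customer": session,
    }
    trace.append(f"Q3/Q4 order_lookup {args}")
    result = order_lookup(**args)

    # Q5: return the observation in the same conversational voice.
    stage = result["stage"].replace("_", " ")
    reply = (f"Your {args['item']} {result['status']} and are {stage}, "
             f"about {result['eta_days']} days out.")
    return {"trace": trace, "args": args, "reply": reply}
```

Called on Thursday 2026-05-28 with the user's message, `support_turn` resolves "last Tuesday" to 2026-05-26, sends the tool the same structured input as the trace, and returns "Your headphones shipped and are in transit, about 2 days out."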

Each question is also a failure mode, and an audit signal. When you read a real conversation trace, this is the table you are scoring it against. If you can call out which signals are healthy and which are broken, you have internalized the framework.

| Question | Healthy signal | Broken signal |
| --- | --- | --- |
| Q1 · Where are we? | Knows its place mid-conversation | Re-asks what you want, right after you have already said it |
| Q2 · What do they want? | Follows pivots; unpacks multi-intent messages | Routes to FAQ when you needed Order Status |
| Q3 · Who acts next? | Reaches for the right tool | Answers from training data instead of looking up |
| Q4 · What context? | Tool gets clean structured input | Tool gets the raw message and returns junk |
| Q5 · How does it return? | Voice stays consistent | Voice lurches; user feels handed off |
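Parts of this table can be checked mechanically. The sketch below flags three broken signals from a single logged turn using deliberately crude heuristics: Q1 if the reply asks what the user wants after an intent was already detected, Q3 if a lookup intent got no tool call, and Q4 if a tool argument is just the raw user message. The `LoggedTurn` shape, the intent set, and the phrase list are assumptions; Q2 and Q5 usually need a human or model grader.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

LOOKUP_INTENTS = {"order_status", "account_balance"}  # hypothetical intent names
REASK_PHRASES = ("what can i help", "how can i help", "what do you need")


@dataclass
class LoggedTurn:
    user_message: str
    detected_intent: Optional[str]
    tool_name: Optional[str]
    tool_args: Dict[str, Any] = field(default_factory=dict)
    agent_reply: str = ""


def audit(turn: LoggedTurn) -> List[str]:
    """Return the broken signals this turn shows, e.g. ['Q4']."""
    broken = []
    reply = turn.agent_reply.lower()
    if turn.detected_intent and any(p in reply for p in REASK_PHRASES):
        broken.append("Q1")  # re-asks right after the user already said it
    if turn.detected_intent in LOOKUP_INTENTS and turn.tool_name is None:
        broken.append("Q3")  # answered from training data instead of looking up
    if any(v == turn.user_message for v in turn.tool_args.values()):
        broken.append("Q4")  # tool got the raw message
    return broken
```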

One caveat worth stating plainly: failures inside specialists, tools, or data layers (model hallucinations, tool API errors, stale retrieval) are separate concerns. The orchestrator can route around them; it cannot fix them.

04 · TWO SYSTEMS

Two systems, same questions, different architectures

There is no single right architecture. The only question that matters is whether the one you picked answers the five questions reliably for your workload. Two production systems make the point: same questions, opposite implementations.

Intercom Fin is a customer-service product built as a single agent that switches roles based on conversation context. Its orchestration layer is implicit: it lives inside one LLM's reasoning. The five questions still get answered every turn, by the same model that writes the reply.

Does implicit orchestration work? The published numbers say yes, within limits. Anthropic's own Fin deployment for customer support went from 36% resolution at launch to 50.8% within about a month (and roughly 58% a year on). Intercom reports individual customers such as Lightspeed resolving up to 65% and Sharesies almost 70%, with a fleet-wide average of around 67% across 7,000+ customers and 40M+ conversations resolved to date. Expect somewhere in the 40–70% band depending on knowledge-base quality and deployment maturity.

Implicit or explicit, the work is the same. The architecture only decides where the five questions get answered: inside one model's head, or in a component you can name and inspect.

05 · STATE VS MEMORY

What the orchestrator does not own

The fastest way to wreck an orchestration layer is to let it absorb everything. When the orchestrator owns domain work, business logic, and permanent storage, it becomes a context bottleneck and a single point of failure.

Some things belong elsewhere by default:

  • Domain work belongs to the specialist.
  • Final permission enforcement belongs to the tool or specialist itself: defense in depth, not a prompt instruction.
  • Business logic belongs in the tool or specialist, not in the prompt.
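Enforcing permissions in the tool, rather than trusting the prompt, can look like this hypothetical sketch: the refund tool checks the caller's granted scopes itself, so a mis-routed call from a tech-support flow fails even if the orchestrator gets the routing wrong. The scope names and the `CallerContext` type are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet


class PermissionDenied(Exception):
    pass


@dataclass(frozen=True)
class CallerContext:
    workflow: str
    scopes: FrozenSet[str]


def issue_refund(ctx: CallerContext, order_id: str, amount_cents: int) -> str:
    # Defense in depth: the tool enforces its own policy, regardless of
    # what the orchestrator or the prompt decided.
    if "payments:write" not in ctx.scopes:
        raise PermissionDenied(f"{ctx.workflow} may not issue refunds")
    # Business logic also lives here, not in the prompt.
    if amount_cents <= 0:
        raise ValueError("refund amount must be positive")
    return f"refunded {amount_cents} cents on {order_id}"
```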

The subtlest line is between persistent memory and ephemeral state. The orchestrator consumes the first and owns the second.

Persistent memory: an input

Long-term user history, preferences, and prior conversations. These live in a dedicated memory store that the orchestrator queries but does not own: persistent memory is something the orchestrator consumes.

Ephemeral state: owned & mutated

What workflow is active, what is paused, what confirmations are pending, what gates are in progress. This is the data layer behind Q1. No single specialist can see across the whole conversation, so the orchestrator holds it.
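A minimal sketch of the split, assuming a hypothetical `MemoryStore` interface: the orchestrator only calls `recall` on persistent memory, while it creates and mutates `ConversationState` itself, including pausing one workflow and resuming another. A real memory store would be a database or index behind its own service, not a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


class MemoryStore(Protocol):
    def recall(self, user_id: str) -> Dict[str, str]: ...


class DictMemory:
    """Toy stand-in for a dedicated memory store (read-only to the orchestrator)."""

    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self._data = data

    def recall(self, user_id: str) -> Dict[str, str]:
        return dict(self._data.get(user_id, {}))


@dataclass
class ConversationState:
    """Ephemeral, per-conversation state the orchestrator owns (the data behind Q1)."""
    active: Optional[str] = None
    paused: List[str] = field(default_factory=list)
    pending_confirmations: List[str] = field(default_factory=list)

    def switch_to(self, workflow: str) -> None:
        # Pause the current workflow instead of dropping it.
        if self.active and self.active != workflow:
            self.paused.append(self.active)
        if workflow in self.paused:
            self.paused.remove(workflow)
        self.active = workflow

    def finish(self) -> None:
        # Resume the most recently paused workflow, if any.
        self.active = self.paused.pop() if self.paused else None


def start_turn(memory: MemoryStore, user_id: str, state: ConversationState) -> Dict[str, str]:
    prefs = memory.recall(user_id)  # consume persistent memory; never write it here
    return {"active": state.active or "none", **prefs}
```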

06 · WHEN ONE AGENT BREAKS

When a single agent stops being enough

Start with one agent. Escalate only when it demonstrably fails. The trigger is evidence, not aesthetics, so it helps to know exactly which limits force the split. If none of them fires for your workload (check tool permissions across workflows, system-prompt fit, and the shape of the work), ship a single agent and re-evaluate when failures emerge.

Five empirical limits push you off a single agent, in rough order of strength:

  • Permissions / policy divergence. Workflows with genuinely different tool permissions (a billing flow that needs payment-API access; a tech-support flow that must not have it) cannot safely share an agent. This is the strongest case for splitting.
  • Instruction-set bloat. When domain instructions and tool descriptions no longer fit coherently in one prompt, routing accuracy degrades. The ceiling is workload-specific, not a fixed workflow count.
  • Context-window saturation. Long sessions with many tool results overflow even large windows. Giving each specialist its own scoped context is one workaround.
  • Parallel work. A single agent reasons sequentially. Simultaneous lookups across independent systems need parallel tool calling or parallel specialists.
  • Trace legibility. When many domains mash into one prompt, "why did the agent do that?" gets harder to answer.
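These limits can be written down as an explicit checklist so the split decision stays evidence-driven. The sketch is hypothetical: the `Workload` fields and especially the thresholds for prompt size, routing accuracy, and context use are placeholders to replace with numbers measured on your own workload, not recommended values. Trace legibility is left out because it is a judgment call rather than a metric.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Workload:
    # Tool permissions each workflow needs, e.g. {"billing": frozenset({"payments:write"})}.
    permissions: Dict[str, FrozenSet[str]]
    prompt_tokens: int              # domain instructions + tool descriptions
    routing_accuracy: float         # measured on a labeled eval set
    peak_context_fraction: float    # peak share of the context window used
    needs_parallel_lookups: bool
    parallel_tool_calls_available: bool


# Placeholder thresholds: measure your own.
MAX_PROMPT_TOKENS = 8_000
MIN_ROUTING_ACCURACY = 0.90
MAX_CONTEXT_FRACTION = 0.80


def fired_limits(w: Workload) -> List[str]:
    """Return the limits that fire, in the order listed above."""
    fired = []
    if len(set(w.permissions.values())) > 1:
        fired.append("permissions")  # workflows need different tool access
    if w.prompt_tokens > MAX_PROMPT_TOKENS and w.routing_accuracy < MIN_ROUTING_ACCURACY:
        fired.append("instruction_bloat")  # big prompt AND measurably worse routing
    if w.peak_context_fraction > MAX_CONTEXT_FRACTION:
        fired.append("context_saturation")
    if w.needs_parallel_lookups and not w.parallel_tool_calls_available:
        fired.append("parallel_work")
    return fired
```

An empty list means ship a single agent; a non-empty one names the failure mode the split is supposed to fix, which is also what you should measure after splitting.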

What does not justify a split: persona or tone alone. Prompt templates with policy variables handle that. Persona is only a real trigger when it is paired with different permissions or escalation paths.
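Handling persona with a template instead of a separate agent can be as simple as this sketch, which uses Python's standard `string.Template`. The variable names (`brand`, `tone`, `escalation_contact`) and the persona values are illustrative assumptions.

```python
from string import Template

SYSTEM_PROMPT = Template(
    "You are the support assistant for $brand. "
    "Write in a $tone tone. "
    "If the user asks for a human, hand off to $escalation_contact."
)

# Hypothetical personas: same agent, same tools, same permissions.
PERSONAS = {
    "retail": {"brand": "Acme Store", "tone": "friendly, casual",
               "escalation_contact": "the store team"},
    "enterprise": {"brand": "Acme Cloud", "tone": "concise, formal",
                   "escalation_contact": "your account manager"},
}


def build_prompt(persona: str) -> str:
    # Only the policy variables change; substitute() raises KeyError if one is missing.
    return SYSTEM_PROMPT.substitute(PERSONAS[persona])
```

If the enterprise flow also needed, say, payment-API access that the retail flow must not have, the split would be justified by permissions, not by persona.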

Read any agent trace in one pass

  • Q1: does it know where it is mid-conversation, or does each turn start fresh?
  • Q2: when you pivot or pack two intents into one message, does it keep up?
  • Q3: does it reach for the right tool, or improvise from training data?
  • Q4: does the tool get clean structured input, or a dump of the raw chat?
  • Q5: does the user-facing voice stay consistent?

Next: Part 2, "How to decompose an agent," turns these five questions into a decision procedure for choosing your topology.

References