When a single agent wins

A framework that always says "build the sophisticated thing" is not a framework. It is a sales pitch. The real test of the decomposition methodology from Part 2 is whether it will tell you not to split when splitting is the wrong call.

So let's run it, without bias, on a domain that is structurally different from the ATS recruiting agent we decomposed in Part 2: a consumer ecommerce support agent. The interesting question is whether the methodology lands where the industry (Intercom Fin, Gorgias, Zendesk AI) has already converged empirically: on a single agent.

Key takeaways

Run on consumer ecommerce support, the framework lands on a single agent, matching the Intercom Fin pattern from Part 1.
Zero split criteria fire decisively; one (instruction-set bloat) is marginal. No evidence, no split.
Even a single agent makes real placement decisions, exercising three of the four patterns; placement is topology-independent.
Same five limits as the Part 2 recruiting agent, opposite verdict: the framework reads the evidence, it does not pattern-match.
Change the shape (B2B, regulated, marketplace) and the framework flips to multi-agent.

The domain

An AI support agent for a mid-market consumer ecommerce site: the workload that human Tier-1 support handled before automation. It is consumer-facing, high-volume, and typically limited to one issue per session. This is the Intercom Fin pattern from Part 1: one customer-facing agent that switches roles based on conversation context. That describes the public product experience, not Fin's private model or runtime topology.

The ATS recruiting agent from Part 2 is the foil here. That domain has 15+ workflows, FCRA- and TCPA-regulated tools, bulk-write blast radius, and compound "advance these, then schedule them, but skip conflicts" utterances. Multiple split criteria fire, with capability and policy divergence decisive, and a coordinator-plus-specialists topology is justified. Ecommerce support looks superficially similar ("a support agent with lots of workflows") and resolves completely differently. That contrast is the whole point.

The inputs

Applying the pre-work from Part 2, #1, to each input shows what the domain actually looks like.

Workload inventory: ~10 clusters, ~25 sub-workflows

Order operations: status, modification, cancellation
Returns and refunds
Product information
Account management
Promo and pricing
Shipping issues
Pre-purchase advice
Policy questions
Escalation to a human
Fraud and dispute intake

Vocabulary stays consistent throughout: orders, products, customers.

Tool surface: 10 tools, 5 of them Type 2

Order management: Type 2 (orders have lifecycles)
Returns management: Type 2
Payments / refunds: Type 2 boundary (the refund record has a lifecycle)
Inventory: Type 1
Customer profile / CRM: Type 1
Knowledge base: Type 1
Shipping carrier: Type 2 (tracking lifecycle)
Communications: Type 1
Loyalty: Type 1
Human handoff: Type 1 (creates a Type 2 ticket elsewhere)

User patterns: consumer, single-issue, short

Consumers with variable technical literacy
Single-issue sessions, typically 1–10 turns
High emotional valence in problem cases
Compound intent is less common than in recruiting, but it happens ("cancel my order and refund the discount I lost")
Anonymous → identified mid-session is common
High volume, so cost per interaction matters more than per-user delight

Constraints: PCI, privacy, fraud, cost, latency

PCI: card data must not flow through prompts or logs
Privacy: GDPR and CCPA
Fraud and chargeback risk on refunds and address changes
Brand voice
Cost-sensitive: high volume, low margin
Latency-sensitive
Multi-channel: web chat, email, sometimes SMS
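
The PCI constraint above ("card data must not flow through prompts or logs") is the kind of rule that belongs in code at the boundary rather than in the prompt. As a minimal sketch, assuming a hypothetical `redact_card_numbers` helper applied to text before it is logged, a redaction pass might look like the following. The pattern, the placeholder string, and the function names are illustrative assumptions, not part of any product described here, and this is a backstop rather than a substitute for tokenizing payment data upstream.

```python
import re

# Assumed pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Replace digit runs that look like card numbers with a placeholder."""

    def _mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave non-Luhn digit runs (order IDs, tracking numbers) untouched.
        return "[REDACTED_CARD]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_mask, text)


if __name__ == "__main__":
    print(redact_card_numbers("My card is 4111 1111 1111 1111, please"))
```

The Luhn check is there to reduce false positives on long order or tracking numbers; it will still miss card data in unusual formats, which is why upstream tokenization remains the primary control.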

Does anything force a split?

The decision flow from Part 2 says: try a single agent, and only escalate if a named limit fires. Here is each limit, evaluated honestly against the inputs.

Limit	Fires?	Evidence
Permissions / policy divergence	No (mostly)	All workflows operate on the same customer’s data with similar permission shape. The runtime passes externally verified auth state, and refund or fraud tools enforce their own authorization, limits, and approval rules. The single agent reasons within those boundaries; it does not enforce them.
Instruction-set bloat	Marginal	10 clusters, ~25 workflows, consistent vocabulary. Manageable in one system prompt with policy templates.
Context saturation	No	A customer’s order history is small (often fewer than 10 active orders). KB retrieval is RAG-scoped, not a full-context load.
Parallel work	No	Customers rarely need parallel multi-system reads. An order lookup plus a tracking call is sequential and trivially fast.
Trace legibility	Adequate	Most decisions are simple lookups or single-policy applications. Audit needs are modest at this scale.

Zero criteria fire decisively. One, instruction-set bloat, is marginal. Per the single-agent-first heuristic from Part 2, the evidence to justify multi-agent simply is not here.
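
The rule applied in the table above (stay with a single agent unless a named limit fires decisively) can be written down as a small check. The sketch below is one possible encoding, not the methodology's canonical form: the `Signal` levels, the `LimitFinding` type, and the choice to record "No (mostly)" and "Adequate" as not firing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Signal(Enum):
    NO = "no"
    MARGINAL = "marginal"
    YES = "yes"


@dataclass(frozen=True)
class LimitFinding:
    limit: str
    signal: Signal


def recommend_topology(findings: List[LimitFinding]) -> Tuple[str, List[str]]:
    """Escalate past a single agent only when at least one limit fires decisively."""
    fired = [f.limit for f in findings if f.signal is Signal.YES]
    if fired:
        return "consider splitting", fired
    return "single agent", []


# Assumed encoding of the ecommerce support column from the table above.
ECOMMERCE_FINDINGS = [
    LimitFinding("permissions / policy divergence", Signal.NO),
    LimitFinding("instruction-set bloat", Signal.MARGINAL),
    LimitFinding("context saturation", Signal.NO),
    LimitFinding("parallel work", Signal.NO),
    LimitFinding("trace legibility", Signal.NO),
]

if __name__ == "__main__":
    print(recommend_topology(ECOMMERCE_FINDINGS))
```

Marginal findings deliberately do not trigger a split on their own; they are worth tracking so the decision can be revisited when the evidence changes.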

Now put it beside the ATS recruiting agent from Part 2. Same five limits, opposite verdict, on the strength of the domain's shape alone:

Limit	Ecommerce support	ATS recruiting
Permissions / policy divergence	No (mostly)	Yes, strongly
Instruction-set bloat	Marginal	Yes
Context saturation	No	Moderate
Parallel work	No	Supporting
Trace legibility	Adequate	Needed

Same nominal job description, "a support agent with lots of workflows," and the framework reads the evidence to one answer here and the opposite there. That is the discrimination it is supposed to do.

Verdict: single agent

A single self-orchestrating agent, with the workflow clusters available via tools and a comprehensive system prompt. This lands on the same broad single-agent pattern Intercom publicly describes for Fin; Fin's implementation details are not public. Here is why each alternative loses:

Topology	Why it is not the right choice here
Router + specialists	Workflows are not independent enough: a return naturally pulls in order, customer, and shipping data. A flat router would route each turn somewhere different and lose continuity.
Coordinator + specialists	No multi-step workflows that pause/resume across many turns, no transition gates spanning specialists, no compound intent beyond what one capable model handles. A coordinator buys complexity without solving a problem.
Peer agents with handoffs	The domains are not peers; they are sub-areas of a single domain (customer support).
Plan-then-execute	Customer issues are conversational and exploratory; they do not support pre-computed plans.

The shape is the mirror image of the Part 2 recruiting agent: one reasoning node instead of a coordinator over specialists, with the tools hanging directly off it. The agent self-orchestrates the cross-tool sequences a coordinator would otherwise own.

[Diagram: the verdict, a single self-orchestrating agent calling its tools directly. A representative subset of the ten tools is shown.]

The framework's value here is what it prevents: over-engineering. A team excited about multi-agent could easily land on a "specialist per cluster" architecture and pay multi-agent's token premium (Part 1's reminder that multi-agent systems run several times the token cost of a single agent) without a single failure mode to justify it.

Cross-cutting concerns still need placement

A single agent does not mean a single decision. Even here, every cross-cutting concern needs a deliberate placement. Three of Part 2's four patterns show up: embedded guidance, deterministic policy, and external encapsulation. Deterministic policy survives without a separate coordinator because the runtime and tools can enforce it around a single agent.

Concern	Placement	Why
Identity verification	Externally verified + tool-enforced	The runtime supplies verified auth state. Sensitive tools check it again before acting; the prompt only tells the agent when to request verification.
Brand voice / tone	Embedded	Single specialist, stable rule, low audit need. The canonical embedded case.
Refund policy (window, caps)	Prompt guidance + tool enforcement	Policy variables help the agent explain and apply the rules, while the refund tool enforces windows, caps, authorization, and approval thresholds.
Fraud screening	Tool-side policy + escalation	The fraud service or policy middleware scores and blocks actions; the agent gathers context and escalates. Add an auditor when cross-incident correlation or an immutable trail is required.
PII handling / PCI	Externally encapsulated	Payment data is tokenized before it reaches the agent; logging redacts PII before persistence.
Escalation rules	Embedded + handoff tool	Rules are stable; the actual handoff is a Type 1 tool call.
Channel-specific behavior	Prompt variable + transport controls	The agent adapts length and formality, while the channel boundary handles identity binding, consent, retries, delivery guarantees, and channel capabilities.

Five of the ten tools are Type 2. In a single-agent system the agent itself orchestrates cross-tool sequences: a return that triggers a refund that triggers a restock notification. The tools still enforce their own authorization, policy, and lifecycle invariants. In a multi-agent system the coordinator would own the sequence; the enforcement boundaries would remain the same.
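
The division of labor described above, where the prompt explains the refund policy and the tool enforces it, can be sketched as follows. Everything named here is hypothetical: `RefundPolicy`, `RefundRequest`, `issue_refund`, the 30-day window, and the 100.00 auto-approve cap are placeholder assumptions chosen for illustration, not values from any real system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RefundPolicy:
    window_days: int = 30            # assumed refund window
    auto_approve_cap: float = 100.0  # assumed cap; larger refunds need a human


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount: float
    delivered_on: date
    identity_verified: bool  # supplied by the runtime, never asserted by the model


def issue_refund(req: RefundRequest, policy: RefundPolicy, today: date) -> dict:
    """Tool-side checks that run no matter what the agent decided to attempt."""
    if not req.identity_verified:
        return {"status": "denied", "reason": "identity_not_verified"}
    if today - req.delivered_on > timedelta(days=policy.window_days):
        return {"status": "denied", "reason": "outside_refund_window"}
    if req.amount > policy.auto_approve_cap:
        return {"status": "needs_approval", "reason": "over_auto_approve_cap"}
    return {"status": "approved", "reason": "within_policy"}


if __name__ == "__main__":
    request = RefundRequest("A-1001", 42.50, date(2024, 5, 1), identity_verified=True)
    print(issue_refund(request, RefundPolicy(), today=date(2024, 5, 10)))
```

Because the checks live in the tool, the same function would sit unchanged behind a coordinator and specialists; only the caller differs.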

When the answer flips

Here is the proof the framework discriminates rather than pattern-matching "ecommerce → single agent." Change the shape of the business and the same methodology returns a different topology. The B2B variant shows the flip:

Multi-user accounts, contracts, NET-30 terms. Permissions diverge (procurement vs. finance vs. end-user roles), and contract management is a workflow genuinely distinct from order ops. → Router or coordinator + specialists.

In every variant, topology isolates reasoning contexts and least-privileged tool sets; the underlying tools and APIs still enforce roles, consent, and regulatory preconditions.
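
The two layers in that sentence can be kept separate in code. The sketch below is a hypothetical illustration of the B2B case: the specialist names, tool names, and role sets are invented for the example. Topology decides which tools a reasoning context can see, and each tool still checks the caller's role on its own.

```python
# Hypothetical specialists and the tools each one is allowed to see.
SPECIALIST_TOOLS = {
    "order_ops": frozenset({"order_management", "shipping_carrier"}),
    "contracts": frozenset({"contract_management"}),
}

# Hypothetical role checks enforced by the tools themselves.
TOOL_ROLES = {
    "order_management": frozenset({"end_user", "procurement"}),
    "shipping_carrier": frozenset({"end_user", "procurement"}),
    "contract_management": frozenset({"procurement", "finance"}),
}


def visible_tools(specialist: str) -> frozenset:
    """Least-privileged tool set exposed to one specialist's reasoning context."""
    return SPECIALIST_TOOLS.get(specialist, frozenset())


def authorize_call(specialist: str, tool: str, caller_role: str) -> bool:
    """Both layers must pass: the topology exposes the tool and the tool accepts the role."""
    exposed = tool in visible_tools(specialist)
    permitted = caller_role in TOOL_ROLES.get(tool, frozenset())
    return exposed and permitted


if __name__ == "__main__":
    print(authorize_call("order_ops", "order_management", "end_user"))
    print(authorize_call("contracts", "contract_management", "end_user"))
```

Removing the first layer would widen what a specialist can attempt; removing the second would leave enforcement to the prompt, which is the situation the series argues against.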

Same nominal domain, different shapes, different answers. That is what a discriminating framework looks like: it reasons about the inputs instead of matching a label.

What held, and where it strained

Honest stress-testing means reporting what strained, not just what held. One nuance surfaced.

What held up better than expected: externally verified authentication works as a meta-gate even in a single-agent system; multi-channel does not automatically fragment the reasoning topology, though channel identity, consent, delivery, and capabilities extend beyond Q5 formatting; high emotional valence changes the prompt, not the architecture; and PCI is a clean external-encapsulation case. The framework generalized to a structurally different domain and produced sensible results; its real value was the over-engineering it talked us out of.

The foundation in one line each

Part 1: every agent has an orchestration layer answering five questions; name it or the model improvises it.
Part 2: decompose on evidence; single agent first, capability divergence is the strongest architectural signal, with enforcement outside the prompt.
Part 3: the framework will tell you NOT to split when splitting is wrong, and flips when the shape changes.

Parts 1–3 form the foundational arc: what orchestration is, how to choose a system shape, and how to verify that a single agent really wins. The next parts move from that decision to the machinery it requires. Part 4 defines contracts for the boundaries that survived decomposition; Part 5 separates conversation history from the explicit workflow state needed to coordinate those contracts. Failure recovery, human gates, observability, and cross-session continuity build on that foundation later in the series.

References

Intercom, Fin AI Agent overview and customer stories
OpenAI, A Practical Guide to Building Agents
Anthropic, Building Effective Agents
Anthropic, How we built Claude Code auto mode: a safer way to skip permissions
Anthropic, How we built our multi-agent research system