When a single agent wins
Running the decomposition framework on ecommerce support, and landing on one agent. When the answer flips to multi-agent, and where the framework strained.
A framework that always says "build the sophisticated thing" is not a framework. It is a sales pitch. The real test of the decomposition methodology from Part 2 is whether it will tell you not to split when splitting is the wrong call.
So let's run it, without bias, on a domain that is structurally different from the ATS recruiting agent we decomposed in Part 2: a consumer ecommerce support agent. The interesting question is whether the methodology lands where the industry (Intercom Fin, Gorgias, Zendesk AI) has already empirically converged: on a single agent.
Key takeaways
- Run on consumer ecommerce support, the framework lands on a single agent, matching the Intercom Fin pattern from Part 1.
- Zero split criteria fire decisively; one (instruction-set bloat) is marginal. No evidence, no split.
- Even a single agent makes real placement decisions, exercising three of the four patterns; placement is topology-independent.
- Same five limits as the Part 2 recruiting agent, opposite verdict: the framework reads the evidence, it does not pattern-match.
- Change the shape (B2B, regulated, marketplace) and the framework flips to multi-agent.
The domain
An AI support agent for a mid-market consumer ecommerce site: the workload that human Tier-1 support handled before automation. It is consumer-facing, high-volume, and typically one issue per session. This is the Intercom Fin pattern from Part 1: a single agent that switches roles based on conversation context, with orchestration living implicitly inside one model's reasoning.
The ATS recruiting agent from Part 2 is the foil here. That domain has 15+ workflows, FCRA- and TCPA-regulated tools, bulk-write blast radius, and compound utterances such as "advance these, then schedule them, but skip conflicts." Multiple split criteria fire (permissions divergence decisively), so a coordinator-plus-specialists topology is justified. Ecommerce support looks superficially similar ("a support agent with lots of workflows") and resolves completely differently. That contrast is the whole point.
The inputs
Applying the pre-work from Part 2, #1, here is what the domain actually looks like, one input at a time.
Workload inventory: ~10 clusters, ~25 sub-workflows
- Order operations: status, modification, cancellation
- Returns and refunds
- Product information
- Account management
- Promo and pricing
- Shipping issues
- Pre-purchase advice
- Policy questions
- Escalation to a human
- Fraud and dispute intake
Vocabulary stays consistent throughout: orders, products, customers.
Tool surface: 10 tools, 5 of them Type 2
- Order management: Type 2 (orders have lifecycles)
- Returns management: Type 2
- Payments / refunds: Type 2 boundary (the refund record has a lifecycle)
- Inventory: Type 1
- Customer profile / CRM: Type 1
- Knowledge base: Type 1
- Shipping carrier: Type 2 (tracking lifecycle)
- Communications: Type 1
- Loyalty: Type 1
- Human handoff: Type 1 (creates a Type 2 ticket elsewhere)
User patterns: consumer, single-issue, short
- Consumers with variable technical literacy
- Single-issue sessions, typically 1–10 turns
- High emotional valence in problem cases
- Compound intent is less common than in recruiting, but it happens ("cancel my order and refund the discount I lost")
- Anonymous → identified mid-session is common
- High volume, so cost per interaction matters more than per-user delight
Constraints: PCI, privacy, fraud, cost, latency
- PCI: card data must not flow through prompts or logs
- Privacy: GDPR and CCPA
- Fraud and chargeback risk on refunds and address changes
- Brand voice
- Cost-sensitive: high volume, low margin
- Latency-sensitive
- Multi-channel: web chat, email, sometimes SMS
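The PCI constraint above ("card data must not flow through prompts or logs") can be illustrated with a last-line-of-defense log scrubber. This is a minimal sketch under stated assumptions, not a compliance control: the pattern `CARD_LIKE`, the placeholder `[REDACTED_PAN]`, and the names `redact` and `RedactingFilter` are all hypothetical, and the primary protection would still be tokenizing payment data before it reaches the agent.

```python
import logging
import re

# Assumption: any run of 13-19 digits, optionally separated by single spaces
# or hyphens, is treated as a possible card number. This will also catch
# other long digit runs; a real deployment would likely add a Luhn check.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact(text: str) -> str:
    """Replace card-like digit runs with a fixed placeholder."""
    return CARD_LIKE.sub("[REDACTED_PAN]", text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before any handler persists it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first, then redact, so card data passed as an
        # argument is covered as well as card data in the template string.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now scrubbed
```

Attaching the filter with `logger.addFilter(RedactingFilter())` applies it to every record that logger emits. The same `redact` function could be applied to customer text before it is templated into a prompt.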
Does anything force a split?
The decision flow from Part 2 says: try a single agent, and only escalate if a named limit fires. Here is each limit, evaluated honestly against the inputs.
| Limit | Fires? | Evidence |
|---|---|---|
| Permissions / policy divergence | No (mostly) | All workflows operate on the same customer’s data with similar permission shape. Refunds have higher blast radius but are policy-capped. Fraud needs stricter auth, but auth is a precondition, not a separate permission surface. |
| Instruction-set bloat | Marginal | 10 clusters, ~25 workflows, consistent vocabulary. Manageable in one system prompt with policy templates. |
| Context saturation | No | A customer’s order history is small (often <10 active). KB retrieval is RAG-scoped, not full-context load. |
| Parallel work | No | Customers rarely need parallel multi-system reads. An order lookup plus a tracking call is sequential and trivially fast. |
| Trace legibility | Adequate | Most decisions are simple lookups or single-policy applications. Audit needs are modest at this scale. |
Zero criteria fire decisively. One, instruction-set bloat, is marginal. Per the heuristic, the evidence to justify multi-agent simply is not here.
Now put it beside the ATS recruiting agent from Part 2. Same five limits, opposite verdict, on the strength of the domain's shape alone:
| Limit | Ecommerce support | ATS recruiting |
|---|---|---|
| Permissions / policy divergence | No (mostly) | Yes, strongly |
| Instruction-set bloat | Marginal | Yes |
| Context saturation | No | Moderate |
| Parallel work | No | Yes |
| Trace legibility | Adequate | Needed |
Same nominal job description, "a support agent with lots of workflows," and the framework reads the evidence to one answer here and the opposite there. That is the discrimination it is supposed to do.
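The decision rule behind both columns can be written down in a few lines. The sketch below is illustrative only: the three-level `Signal` scale and the hand mapping of table cells onto it (for example "Adequate" as `NO`, "Moderate" as `MARGINAL`, "Needed" as `FIRES`) are assumptions made for this example, not part of the Part 2 methodology.

```python
from enum import Enum


class Signal(Enum):
    NO = 0        # the limit does not fire
    MARGINAL = 1  # some pressure, but not decisive
    FIRES = 2     # decisive evidence for a split


# Assumption: table cells mapped by hand onto the three levels above.
ECOMMERCE = {
    "permissions_divergence": Signal.NO,       # "No (mostly)"
    "instruction_set_bloat": Signal.MARGINAL,  # "Marginal"
    "context_saturation": Signal.NO,           # "No"
    "parallel_work": Signal.NO,                # "No"
    "trace_legibility": Signal.NO,             # "Adequate"
}

ATS_RECRUITING = {
    "permissions_divergence": Signal.FIRES,    # "Yes, strongly"
    "instruction_set_bloat": Signal.FIRES,     # "Yes"
    "context_saturation": Signal.MARGINAL,     # "Moderate"
    "parallel_work": Signal.FIRES,             # "Yes"
    "trace_legibility": Signal.FIRES,          # "Needed"
}


def decisive_limits(limits: dict) -> list:
    """Names of the limits that fire decisively, in table order."""
    return [name for name, signal in limits.items() if signal is Signal.FIRES]


def verdict(limits: dict) -> str:
    """Single agent by default; escalate only when a named limit fires."""
    return "multi-agent candidate" if decisive_limits(limits) else "single agent"
```

Note that a marginal signal does not change the verdict on its own: `verdict(ECOMMERCE)` stays at the single-agent default, while `verdict(ATS_RECRUITING)` escalates on four named limits.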
Verdict: single agent
A single self-orchestrating agent, with the workflow clusters available via tools and a comprehensive system prompt. This matches the Intercom Fin architecture exactly: the framework independently lands on what the industry has already validated. Here is why each alternative loses:
| Topology | Why it is not the right choice here |
|---|---|
| Router + specialists | Workflows are not independent enough: a return naturally pulls in order, customer, and shipping data. A flat router would route each turn somewhere different and lose continuity. |
| Coordinator + specialists | No multi-step workflows that pause/resume across many turns, no transition gates spanning specialists, no compound intent beyond what one capable model handles. A coordinator buys complexity without solving a problem. |
| Peer agents with handoffs | The domains are not peers; they are sub-areas of a single domain (customer support). |
| Plan-then-execute | Customer issues are conversational and exploratory; they do not support pre-computed plans. |
The shape is the mirror image of the Part 2 recruiting agent: one reasoning node instead of a coordinator over specialists, with the tools hanging directly off it. The agent self-orchestrates the cross-tool sequences a coordinator would otherwise own.
[Diagram: one agent node, the support agent, connected directly to five tool nodes: Orders, Returns, Payments, Knowledge, and Shipping.]
The framework's value here is what it prevents: over-engineering. A team excited about multi-agent could easily land on a "specialist per cluster" architecture and pay multi-agent's token premium without a single failure mode to justify it. (Part 1 noted that multi-agent systems run several times the token cost of a single agent.)
Cross-cutting concerns still need placement
A single agent does not mean a single decision. Even here, every cross-cutting concern needs a deliberate placement. Three of Part 2's four patterns show up; the fourth, coordinator policy, collapses into embedding the rule in the prompt when there is only one agent.
| Concern | Placement | Why |
|---|---|---|
| Identity verification | Embedded (prompt gate) | Auth state is part of ephemeral session state; the agent refuses to act before verification. |
| Brand voice / tone | Embedded | Single specialist, stable rule, low audit need. The canonical embedded case. |
| Refund policy (window, caps) | Embedded as policy variables | Stable rules; variables live in config and template into the prompt. |
| Fraud screening | Embedded for v1 | Trigger for auditor extraction: when fraud rules need independent versioning, cross-incident correlation, or an immutable trail. |
| PII handling / PCI | Externally encapsulated | Payment data is tokenized before it reaches the agent; logging redacts PII before persistence. |
| Escalation rules | Embedded + handoff tool | Rules are stable; the actual handoff is a Type 1 tool call. |
| Channel-specific behavior | Embedded (channel as prompt variable) | The channel context is injected; the agent adapts output length and formality. |
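Several rows in the table share one mechanism: a stable rule lives in the prompt, and the values that vary (policy variables, channel, auth state) are injected at session start. A minimal sketch of that templating step follows. Every name and value here is hypothetical, including the 30-day window, the $200 cap, and the channel style strings; real values would come from versioned config.

```python
from string import Template

# Hypothetical policy variables; assumed to be loaded from config in practice.
POLICY = {"refund_window_days": 30, "refund_cap_usd": 200}

# Hypothetical per-channel output guidance.
CHANNEL_STYLE = {
    "web_chat": "Keep replies short and conversational.",
    "email": "Use full sentences with a greeting and a sign-off.",
    "sms": "Keep replies under 160 characters.",
}

SYSTEM_PROMPT = Template(
    "You are a support agent for an ecommerce store.\n"
    "Channel: $channel. $channel_style\n"
    "Identity verified: $verified. If not verified, do not read or change "
    "account, order, or payment data; ask the customer to verify first.\n"
    "Refunds: allowed within $refund_window_days days of delivery, "
    "up to $$$refund_cap_usd without human approval.\n"
)


def build_system_prompt(channel: str, verified: bool) -> str:
    """Render the prompt for one session from config and session state."""
    return SYSTEM_PROMPT.substitute(
        channel=channel,
        channel_style=CHANNEL_STYLE[channel],
        verified="yes" if verified else "no",
        **POLICY,
    )
```

Changing the refund window then means editing config, not the prompt text, which is part of what keeps these rules cheap to leave embedded.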
Five of the ten tools are Type 2. In a single-agent system the agent itself orchestrates cross-tool sequences: a return that triggers a refund that triggers a restock notification. In a multi-agent system the coordinator would. The work does not disappear; its owner changes.
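To make the ownership point concrete, here is the return, refund, and restock sequence written out against in-memory stand-ins. This is a sketch under loud assumptions: `FakeBackends`, its methods, the ID formats, and the state names are all invented for illustration. In a real single-agent system the model would emit these tool calls one at a time; the order is hard-coded here only to show which component owns the sequence.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackends:
    """Hypothetical in-memory stand-ins for returns, payments, and inventory."""

    returns: dict = field(default_factory=dict)
    refunds: dict = field(default_factory=dict)
    restock_notices: list = field(default_factory=list)

    def open_return(self, order_id: str, sku: str) -> str:
        return_id = f"ret-{len(self.returns) + 1}"
        # The "state" field is the lifecycle that makes this a Type 2 record.
        self.returns[return_id] = {"order_id": order_id, "sku": sku, "state": "received"}
        return return_id

    def issue_refund(self, return_id: str, amount: float) -> str:
        refund_id = f"ref-{len(self.refunds) + 1}"
        self.refunds[refund_id] = {"return_id": return_id, "amount": amount, "state": "pending"}
        self.returns[return_id]["state"] = "refunded"
        return refund_id

    def notify_restock(self, sku: str) -> None:
        self.restock_notices.append(sku)


def process_return(tools: FakeBackends, order_id: str, sku: str, amount: float) -> dict:
    """The cross-tool sequence; a coordinator would own this in a multi-agent system."""
    return_id = tools.open_return(order_id, sku)
    refund_id = tools.issue_refund(return_id, amount)
    tools.notify_restock(sku)
    return {"return_id": return_id, "refund_id": refund_id}
```

Moving to a coordinator topology would relocate `process_return`, not delete it.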
When the answer flips
Here is the proof that the framework discriminates rather than pattern-matching "ecommerce → single agent." Change the shape of the business and the same methodology returns a different topology:
- B2B: multi-user accounts, contracts, NET-30 terms. Permissions diverge (procurement vs. finance vs. end-user roles), and contract management is a workflow genuinely distinct from order ops. → Router or coordinator + specialists.
- Regulated: pharmacy, alcohol, firearms, financial products. Regulatory weight (age verification, prescription verification, KYC) creates real permission divergence. → Coordinator + specialists, with the regulated workflow isolated.
- Marketplace: consumer-to-consumer or consumer-to-merchant. Two personas with genuinely different permissions, escalation paths, and fraud surfaces. → Coordinator + specialists.
Same nominal domain, different shapes, different answers. That is what a discriminating framework looks like: it reasons about the inputs instead of matching a label.
What held, and where it strained
Honest stress-testing means reporting what strained, not just what held. One nuance surfaced.
What held up better than expected: authentication works as a meta-gate even in a single-agent system (it lives in the prompt's reasoning); multi-channel does not fragment the topology (it is a Q5 output-formatting concern, not a Q3 routing concern); high emotional valence changes the prompt, not the architecture; and PCI is a clean external-encapsulation case. The framework generalized to a structurally different domain and produced sensible results; its real value was the over-engineering it talked us out of.
The series in one line each
- Part 1: every agent has an orchestration layer answering five questions; name it or the model improvises it.
- Part 2: decompose on evidence; single agent first, permissions divergence is the strongest split signal.
- Part 3: the framework will tell you NOT to split when splitting is wrong, and flips when the shape changes.
This series covered the decision: what shape, and why. It deliberately stopped short of the machinery that comes after the boundaries are drawn: handoff protocols, failure recovery and fallback, the implementation of HITL and transition gates, observability of orchestration decisions, and cross-session continuity. Each is a substantial topic in its own right. Get the decomposition decision right first; the machinery is far easier to build on top of boundaries you can defend.
References
- Intercom, Fin AI Agent overview and customer stories
- OpenAI, A Practical Guide to Building Agents
- Anthropic, Building Effective Agents
- Anthropic, How we built our multi-agent research system