6 orchestration patterns · LangGraph vs CrewAI · MCP + A2A · MAST observability · production runbook
A demo with a single agent in Cursor works; production demands parallel research, tool isolation, and human approval gates under a shared token budget. A monolithic agent runs into context limits, jack-of-all-trades drift, zero concurrency, and a single point of failure. This guide is for AI engineers and tech leads moving to Multi-Agent Systems (MAS): six orchestration patterns, a LangGraph vs CrewAI vs AutoGen matrix, the MCP + A2A protocol stack, a 6-step production runbook (PostgresSaver, HITL interrupts, circuit breakers), MAST data from 1,642 traces, pitfalls, and a decision tree for 2026.
A monolithic agent (one system prompt, one tool list, one thread) makes for a demo you can build in a day. Under real load it becomes the bottleneck. Google's internal Agent Bake-Off benchmark (MLflow production guide 2026) found that a decomposed multi-agent architecture cut processing time from 60 to 10 minutes, a 6x speedup. Separately, the AdaptOrch study (2026) formally showed that orchestration topology explains 12–23% more variance in task success than changing the underlying model.
Before choosing a framework, pin down the structural limits that force a split into a MAS.
Context window saturation: Research, code, logs, and tool outputs pile up in one thread. Retrieval quality drops, and the agent forgets constraints set ten turns ago.
Jack-of-all-trades prompting: One persona cannot excel at SQL tuning, legal review, and UI copy at the same time. Instruction interference raises the hallucination rate.
No true concurrency: Sequential tool calls block one another. Independent subtasks (scraping three sites, running three test suites) burn wall-clock time.
Single point of failure: One bad tool result or runaway loop kills the whole session. There is no isolation domain for retries and rollbacks.
Opaque cost attribution: Finance cannot tell which step burned the tokens. Without per-agent budgets, one verbose researcher agent drains the monthly cap.
Topology beats model. Per AdaptOrch, orchestration structure drives 12–23% more outcome variance than model choice, so design the graph before upgrading GPT tiers.
A Multi-Agent System (MAS) is a coordinated set of LLM-powered agents that share state, delegate subtasks, and expose specialized capabilities. Each agent is not just a prompt variant but a bounded runtime with its own tools, memory scope, and termination policy.
| Trait | Meaning in LLM agents | Production signal |
|---|---|---|
| Autonomy | Chooses the next action without per-step human input | Needs guardrails: max iterations, budget caps |
| Reactivity | Responds to tool results and peer messages | Structured message schema, not just free text |
| Proactivity | Initiates subtasks when goals are incomplete | Runaway loops without supervisor checks |
| Social ability | Delegates and negotiates with other agents | A2A discovery and clear handoff contracts |

| Topology | Control flow | Best for | Risk |
|---|---|---|---|
| Centralized | One orchestrator routes all messages | Predictable audit trails, strict policy | Orchestrator context bloat; SPOF at router |
| Decentralized | Peers message each other directly; no single boss | Resilient swarms, emergent collaboration | Hard to debug; termination not guaranteed |
| Hierarchical | Supervisor delegates to workers; workers report up | Enterprise workflows with approval tiers | Supervisor prompt complexity; latency stacking |
Most 2026 production stacks default to hierarchical with a thin centralized router for auth and budget enforcement: a hybrid of the first and third rows.
Patterns are composable. A customer-support stack might use a supervisor, fan out to parallel researchers, and pipeline the synthesis into a writer. Choose the minimum pattern set that fits your dependency structure; the six patterns here cover 95%+ of production scenarios.
Sequential Pipeline: Stages run in a fixed order: ingest → analyze → draft → review. State passes through the shared graph. Ideal when each step depends on the prior output (ETL, report generation). In LangGraph this is a linear StateGraph with typed state reducers.
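from langgraph.graph import StateGraph, START, END

# PipelineState (a TypedDict) and the three agent callables are defined elsewhere.
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
# Linear edges: each stage consumes the state written by the previous one
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()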
Fan-out / Fan-in: The orchestrator spawns N independent branches and a collector aggregates the results. Total latency is max(T1..Tn), not the sum. The LangGraph Send API dispatches dynamic worker nodes and a reducer merges their outputs. Typical uses: multi-source research, ensemble voting, shard-level code review.
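from langgraph.types import Send

# Used as a conditional-edge function: one Send per query spawns a parallel research_worker
def fan_out(state):
    return [Send("research_worker", {"query": q}) for q in state["queries"]]

# Collector node: worker_results needs a list reducer, e.g. Annotated[list, operator.add]
def fan_in(state):
    return {"report": synthesize(state["worker_results"])}  # synthesize() is defined elsewhere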
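The latency claim can be checked without any framework. The following is a minimal sketch that uses plain `asyncio` rather than LangGraph; `research_worker`, the query names, and the per-branch delays are made up for illustration.

```python
import asyncio
import time


async def research_worker(query: str, delay: float) -> str:
    """Stand-in for a research agent; the sleep simulates LLM and tool latency."""
    await asyncio.sleep(delay)
    return f"findings for {query}"


async def fan_out_fan_in(jobs: dict[str, float]) -> tuple[list[str], float]:
    """Run all workers concurrently and return (results, elapsed seconds)."""
    started = time.perf_counter()
    results = await asyncio.gather(
        *(research_worker(query, delay) for query, delay in jobs.items())
    )
    return list(results), time.perf_counter() - started


# Hypothetical per-branch latencies in seconds
JOBS = {"site-a": 0.05, "site-b": 0.10, "site-c": 0.15}

if __name__ == "__main__":
    results, elapsed = asyncio.run(fan_out_fan_in(JOBS))
    print(results)
    print(f"elapsed {elapsed:.2f}s vs sequential {sum(JOBS.values()):.2f}s")
```

Elapsed time should track the slowest branch rather than the sum of all three, which is the property the fan-out pattern relies on.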
Supervisor-Worker: The supervisor classifies intent and routes to specialists (coder, DBA, reviewer). Add a keyword fast path: a regex or embedding match on high-confidence intents skips the LLM routing call and takes <1ms on FAQ-style queries.
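A keyword fast path of this kind might look like the sketch below. The patterns, the specialist names beyond those mentioned above, and `fake_llm_router` are illustrative assumptions; a real supervisor would plug in its own LLM classification call as the fallback.

```python
import re
from typing import Callable

# Hypothetical high-confidence intents; tune the patterns to your own traffic
FAST_PATHS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(slow query|explain plan|index)\b", re.I), "dba"),
    (re.compile(r"\b(stack ?trace|traceback|unit test)\b", re.I), "coder"),
    (re.compile(r"\b(review|pull request)\b", re.I), "reviewer"),
]


def route(message: str, llm_router: Callable[[str], str]) -> tuple[str, str]:
    """Return (specialist, path); call the LLM router only when no pattern matches."""
    for pattern, specialist in FAST_PATHS:
        if pattern.search(message):
            return specialist, "fast"
    return llm_router(message), "llm"


def fake_llm_router(message: str) -> str:
    """Placeholder for the supervisor's LLM classification call."""
    return "coder"
```

Returning the path alongside the specialist makes it easy to log how often the fast path fires and to spot patterns that misroute.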
Swarm: Agents hand off conversation control peer-to-peer with no central coordinator. AutoGen excels here, at multi-round negotiation (code review, proposal evaluation). Caveat: non-determinism is high, so in production these systems more often ship as hierarchical.
Blackboard: Agents read and write a shared artifact store (the blackboard) instead of messaging each other directly. A planner posts goals; specialists append sections. Suits long-running async tasks (hours to days) and heterogeneous services owned by different teams.
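To make the blackboard idea concrete, here is a minimal in-memory sketch. The `Blackboard` class, its method names, and the sample artifacts are assumptions for illustration; a production version would sit on a database or object store rather than a dict.

```python
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """In-memory artifact store shared by all agents."""
    _artifacts: dict[str, dict] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def post(self, author: str, kind: str, content: str) -> str:
        """Store an artifact and return a short handle instead of the payload."""
        handle = f"{kind}-{uuid.uuid4().hex[:8]}"
        with self._lock:
            self._artifacts[handle] = {"author": author, "kind": kind, "content": content}
        return handle

    def read(self, handle: str) -> str:
        with self._lock:
            return self._artifacts[handle]["content"]

    def handles(self, kind: str) -> list[str]:
        """List handles of a given kind, in insertion order."""
        with self._lock:
            return [h for h, a in self._artifacts.items() if a["kind"] == kind]


board = Blackboard()
goal = board.post("planner", "goal", "Draft the Q3 incident report")
section = board.post("log-analyst", "section", "Root cause: connection pool exhaustion")
```

Agents exchange only the handles, so large payloads never enter another agent's context unless it explicitly reads them.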
Hybrid: Real systems combine patterns: hierarchical supervisor → parallel fan-out for research → sequential pipeline for packaging. Explicitly mark sync vs async segments before writing code.
| Pattern | Concurrency | Debuggability | Typical framework |
|---|---|---|---|
| Sequential Pipeline | Low | High | LangGraph, CrewAI sequential |
| Fan-out / Fan-in | High | Medium | LangGraph Send |
| Supervisor-Worker | Medium | High | LangGraph, CrewAI hierarchical |
| Swarm | Medium | Low | AutoGen, Swarm SDK |
| Blackboard | Medium | Medium | Custom + shared store |
| Hybrid | Variable | Medium | LangGraph (most common) |
All three have production users in 2026, but each optimizes for a different control style. Match the framework to your topology, not to brand affinity.
| Dimension | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Mental model | Stateful directed graph | Role-based crew with tasks | Conversable agents + handoffs |
| State persistence | First-class checkpoints (PostgresSaver) | Memory backends, less graph-native | Chat history per agent |
| Human-in-the-loop | Native interrupt() nodes | Task-level human input hooks | UserProxyAgent pattern |
| Parallelism | Send API, subgraphs | Async task execution | Group chat parallelism |
| Production readiness | Regulated industries, long workflows | Rapid crew prototypes | Exploratory multi-agent chat |
| Watch out | Steeper graph DSL learning curve | Less fine-grained control at scale | Non-deterministic handoff chains |
Need durable checkpoints + HITL approval gates? → LangGraph.
Need a demo crew in an afternoon with readable role YAML? → CrewAI.
Need open-ended agent-to-agent negotiation? → AutoGen (or Swarm).
Need graph control and chat handoffs? → LangGraph orchestrator wrapping AutoGen workers.
LangGraph is the default for regulated industries and long-running systems: deterministic graph execution, native state persistence, LangSmith tracing. CrewAI and AutoGen do reach production but require more custom work.
Tool integration and agent collaboration are different problems. 2026 stacks treat them as a two-layer protocol cake: vertical tool access at the bottom, horizontal agent delegation on top. Both are governed by the Linux Foundation's Agentic AI Foundation.
| Layer | Protocol | Connects | Analogy |
|---|---|---|---|
| Vertical | MCP (Model Context Protocol) | Agent ↔ tools, data, prompts | USB-C for tool discovery |
| Horizontal | A2A (Agent-to-Agent) | Agent ↔ agent delegation | HTTP for a service mesh |
Each agent publishes an Agent Card: a JSON document with capabilities, input schemas, and endpoint URLs. Peers call discover_and_delegate to route subtasks without hard-coded agent lists. A2A v1.0 (early 2026) has 50+ partners, including Atlassian, Salesforce, and SAP.
{
  "name": "sql-analyst-agent",
  "description": "Read-only Postgres analysis and explain plans",
  "url": "https://agents.internal/a2a/sql-analyst",
  "capabilities": ["query", "explain", "schema-introspect"],
  "input_schema": {
    "type": "object",
    "properties": { "question": { "type": "string" } }
  }
}
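# AgentRegistry, NoAgentError and a2a_client are application-level helpers defined elsewhere.
async def discover_and_delegate(task: str, registry: AgentRegistry):
    # Pick the Agent Card whose capabilities best match the task
    card = await registry.find_best_match(task)
    if not card:
        raise NoAgentError(task)
    payload = {"task": task, "caller": "supervisor-01"}
    # Delegate to the endpoint advertised in the card
    return await a2a_client.send(card.url, payload)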
MCP handles tools/list inside each agent; A2A handles which agent owns a task. The vertical layer is covered in our MCP protocol guide.
Demos use in-memory state. Production needs crash recovery, human approval for high-risk actions, and cost ceilings. Four primitives cover most teams before they need custom infra.
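The delegation helper above assumes a registry with a `find_best_match` method. One way to picture it is the toy sketch below, which scores cards by capability keywords found in the task text. The `AgentCard` dataclass, the scoring rule, and the `copy-editor-agent` entry are hypothetical; this is not the A2A SDK, and a real registry would more likely use embeddings or structured skill metadata.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    name: str
    url: str
    capabilities: tuple[str, ...]


class AgentRegistry:
    """Toy registry: scores cards by how many capability keywords appear in the task."""

    def __init__(self, cards: list[AgentCard]):
        self._cards = cards

    async def find_best_match(self, task: str) -> Optional[AgentCard]:
        words = set(task.lower().split())
        best, best_score = None, 0
        for card in self._cards:
            score = sum(1 for cap in card.capabilities if cap in words)
            if score > best_score:
                best, best_score = card, score
        return best  # None when no capability keyword matches


registry = AgentRegistry([
    AgentCard("sql-analyst-agent", "https://agents.internal/a2a/sql-analyst",
              ("query", "explain", "schema-introspect")),
    # Hypothetical second agent, added only to show routing between cards
    AgentCard("copy-editor-agent", "https://agents.internal/a2a/copy-editor",
              ("rewrite", "proofread")),
])
```

Returning `None` on no match lets the caller raise its own error, as `discover_and_delegate` does with `NoAgentError`.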
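MAX_ITERATIONS = 25

# TokenBudgetManager, CircuitBreaker and RunawayLoopError are application classes defined elsewhere.
class ProductionGuardrails:
    def __init__(self, budget: TokenBudgetManager, breaker: CircuitBreaker):
        self.budget = budget
        self.breaker = breaker
        self.iterations = 0

    def before_step(self, agent_id: str, est_tokens: int):
        # Stop runaway loops before spending more tokens
        self.iterations += 1
        if self.iterations > MAX_ITERATIONS:
            raise RunawayLoopError()
        self.budget.charge(agent_id, est_tokens)  # expected to raise once the agent's cap is hit
        self.breaker.check()  # expected to raise while the circuit is open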
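The two collaborators used by `ProductionGuardrails` are not shown in the guide. A minimal sketch of what they could look like follows; the constructor arguments, the exception names, and the alert list are assumptions, and the breaker deliberately omits the half-open state and cool-down timers that a production breaker would need.

```python
class BudgetExceededError(RuntimeError):
    pass


class CircuitOpenError(RuntimeError):
    pass


class TokenBudgetManager:
    """Per-agent token caps with a soft alert threshold (80% by default)."""

    def __init__(self, caps: dict[str, int], alert_ratio: float = 0.8):
        self.caps = caps
        self.alert_ratio = alert_ratio
        self.spent: dict[str, int] = {agent: 0 for agent in caps}
        self.alerts: list[str] = []

    def charge(self, agent_id: str, tokens: int) -> None:
        cap = self.caps[agent_id]
        if self.spent[agent_id] + tokens > cap:
            raise BudgetExceededError(f"{agent_id} would exceed its cap of {cap} tokens")
        self.spent[agent_id] += tokens
        if self.spent[agent_id] >= self.alert_ratio * cap and agent_id not in self.alerts:
            self.alerts.append(agent_id)  # in production: page or post to a channel


class CircuitBreaker:
    """Opens after max_failures consecutive failures; a success resets the count."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

    def check(self) -> None:
        if self.failures >= self.max_failures:
            raise CircuitOpenError("too many consecutive failures")
```

Charging before the step runs means an over-budget call is refused rather than billed, at the cost of relying on a token estimate.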
Draw graph on paper first: Mark sync edges, parallel branches, and HITL interrupt points before writing LangGraph nodes.
Wire PostgresSaver: Use managed Postgres for checkpoints; verify resume after a process kill.
Register MCP tools per agent: Least-privilege tool subsets; never share one mega tool list.
Add interrupt nodes: Gate deploy, delete, payment, and PII-export tools behind human approval.
Enable TokenBudgetManager + CircuitBreaker: Per-agent daily caps; alert at 80% burn rate.
Ship observability before features: OpenTelemetry spans per agent step; a CORE_METRICS dashboard before agent #7.
Chaos drill: Kill a worker mid-graph, restart, and confirm PostgresSaver resumes from the last checkpoint without duplicate side effects.
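Resuming from a checkpoint can replay a node whose side effect already happened. One common way to avoid duplicates is an idempotency key per step, sketched below. `IdempotentExecutor`, the key layout, and `send_email` are illustrative assumptions, and the in-memory dict stands in for a durable table that would have to survive the restart.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentExecutor:
    """Runs each side effect at most once per (thread, step, args) key."""

    def __init__(self) -> None:
        self._done: dict[str, Any] = {}  # stand-in for a durable store

    @staticmethod
    def key(thread_id: str, step: str, args: dict) -> str:
        raw = json.dumps([thread_id, step, args], sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def run(self, thread_id: str, step: str, args: dict, effect: Callable[..., Any]) -> Any:
        k = self.key(thread_id, step, args)
        if k in self._done:  # replay after a resume: return the recorded result
            return self._done[k]
        result = effect(**args)
        self._done[k] = result
        return result


sent: list[str] = []


def send_email(to: str) -> str:
    """Fake side effect that records each delivery."""
    sent.append(to)
    return f"queued:{to}"


executor = IdempotentExecutor()
first = executor.run("thread-42", "notify", {"to": "ops@example.com"}, send_email)
replay = executor.run("thread-42", "notify", {"to": "ops@example.com"}, send_email)
```

A crash between running the effect and recording its key can still cause one duplicate, so external systems that accept their own idempotency keys remain the safer option.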
You cannot fix what you cannot attribute. The MAST study analyzed 1,642 multi-agent execution traces: failure modes cluster predictably, and most are design issues, not model IQ gaps. 57% of organizations already run agents in production, but only 8% have finished implementing observability, so dashboards stay green (HTTP 200) while the output is wrong.
Wrap every agent invocation in OpenTelemetry spans: agent_id, parent_span, tool_name, token_in/out, latency_ms. Export to your existing APM. The minimum dashboard is CORE_METRICS:
| Metric | Why it matters |
|---|---|
| task_success_rate | End-to-end goal completion, not per-step accuracy (target: >85%) |
| tokens_per_success | Cost efficiency; spikes reveal runaway loops |
| p95_agent_latency | Pinpoints a slow specialist or tool (target: <30s e2e) |
| handoff_error_rate | A2A schema mismatches and dropped messages |
| hitl_queue_depth | Approval bottlenecks blocking graph progress |
Add LLM-as-Judge on sampled traces: an evaluator agent scores goal alignment and factual consistency. Run it offline for regression tests, not inline on every request (cost).
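import uuid
from opentelemetry import trace

tracer = trace.get_tracer(__name__)
# agent_registry maps agent name -> object with .run(task); it is defined elsewhere.

def traced_agent_call(agent_name: str, task: dict, correlation_id: str | None = None):
    # Generate a correlation id when the caller did not pass one
    if not correlation_id:
        correlation_id = str(uuid.uuid4())
    with tracer.start_as_current_span(f"agent.{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("correlation.id", correlation_id)
        result = agent_registry[agent_name].run(task)
        span.set_attribute("tokens_used", result.get("tokens", 0))
        return result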
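Once spans are exported, the first three CORE_METRICS can be derived from per-task records. The sketch below assumes a simplified `TaskTrace` record and made-up sample values; the nearest-rank percentile is one of several valid definitions, and `handoff_error_rate` and `hitl_queue_depth` are left out because they need handoff and queue events rather than task totals.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskTrace:
    success: bool
    tokens: int
    latency_s: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def core_metrics(traces: list[TaskTrace]) -> dict[str, float]:
    successes = sum(1 for t in traces if t.success)
    total_tokens = sum(t.tokens for t in traces)
    return {
        "task_success_rate": successes / len(traces),
        # Tokens spent on failed runs still count toward the cost of each success
        "tokens_per_success": total_tokens / successes if successes else float("inf"),
        "p95_agent_latency": p95([t.latency_s for t in traces]),
    }


# Made-up sample data: one failed task that burned many tokens
traces = [
    TaskTrace(True, 1200, 4.0),
    TaskTrace(True, 900, 6.5),
    TaskTrace(False, 5000, 31.0),
    TaskTrace(True, 900, 5.0),
]
metrics = core_metrics(traces)
```

Dividing total tokens by successes, not by attempts, is what makes a runaway failed task show up as a spike in `tokens_per_success`.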
Context pollution: Workers return full raw HTML dumps upstream. Truncate, summarize, or store in the blackboard; pass handles, not payloads. Otherwise a hallucination from Agent A becomes a "fact" for B and C, all with HTTP 200.
Runaway loops: Agents re-delegate indefinitely. Enforce MAX_ITERATIONS, per-edge visit counts, and supervisor stop tokens. The bill for a single task can go from $0.02 to $47.
Over-engineering: Fifteen agents for a three-step workflow. The production sweet spot is 3–8 agents unless domains are truly isolated.
Demo-to-prod gap: In-memory state, no budgets, no input guardrails. Wrap graphs with ProductionGuardrails before customer exposure.
Parallel branch sync: Fan-in runs before all branches finish. In LangGraph, set defer=True on the supervisor node as an explicit synchronization barrier.
builder.add_node("supervisor", supervisor_node, defer=True)
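For the runaway-loop pitfall in the list above, per-edge visit counts can be sketched independently of any framework. `EdgeVisitGuard`, its limit, and the simulated planner/researcher ping-pong are illustrative assumptions.

```python
from collections import Counter


class EdgeLimitExceeded(RuntimeError):
    pass


class EdgeVisitGuard:
    """Counts traversals of each (source, target) edge and stops ping-pong delegation."""

    def __init__(self, max_visits_per_edge: int = 3):
        self.max_visits = max_visits_per_edge
        self.visits: Counter[tuple[str, str]] = Counter()

    def traverse(self, source: str, target: str) -> None:
        edge = (source, target)
        self.visits[edge] += 1
        if self.visits[edge] > self.max_visits:
            raise EdgeLimitExceeded(f"{source} -> {target} visited {self.visits[edge]} times")


guard = EdgeVisitGuard(max_visits_per_edge=2)
hops = 0
stopped_on = ""
try:
    while True:  # simulated loop: planner and researcher keep re-delegating
        guard.traverse("planner", "researcher")
        guard.traverse("researcher", "planner")
        hops += 2
except EdgeLimitExceeded as exc:
    stopped_on = str(exc)
```

A per-edge limit catches two agents bouncing work between each other well before a global iteration cap would, and the error message names the offending edge.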
Expensive mistake: adding agents to fix prompt issues. Tune specialist prompts and handoff schemas before spawning a new node. Treat every handoff as a versioned API, with schema validation and confidence thresholds at the boundaries.
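Treating a handoff as a versioned API could look like the sketch below. The field names, the supported version set, and the 0.7 threshold are assumptions chosen for illustration, not values from any framework.

```python
from dataclasses import dataclass

SUPPORTED_SCHEMA_VERSIONS = {"1.0", "1.1"}  # hypothetical version set
MIN_CONFIDENCE = 0.7                        # hypothetical threshold


class HandoffRejected(ValueError):
    pass


@dataclass(frozen=True)
class Handoff:
    schema_version: str
    sender: str
    summary: str
    confidence: float


def validate_handoff(raw: dict) -> Handoff:
    """Reject malformed or low-confidence handoffs at the agent boundary."""
    missing = {"schema_version", "sender", "summary", "confidence"} - raw.keys()
    if missing:
        raise HandoffRejected(f"missing fields: {sorted(missing)}")
    if raw["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        raise HandoffRejected(f"unsupported schema_version {raw['schema_version']!r}")
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise HandoffRejected("confidence must be in [0, 1]")
    if confidence < MIN_CONFIDENCE:
        raise HandoffRejected(f"confidence {confidence:.2f} below {MIN_CONFIDENCE}")
    return Handoff(raw["schema_version"], raw["sender"], raw["summary"], confidence)
```

Rejecting at the boundary keeps a low-confidence claim from one agent out of the next agent's context, where it would otherwise be treated as settled.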
AdaptOrch's finding is that topology choice matters more than a model swap. Walk the decision tree before committing to a framework; it lowers the risk of over-engineering and wrong-pattern lock-in.
Strict sequential dependencies between steps? Yes → are subtasks concurrent? No → Sequential Pipeline. Yes → Hybrid: Pipeline + Fan-out.
Does one agent hold decision authority? Yes → does scale require sub-teams? No → Supervisor-Worker. Yes → Hierarchical (Supervisors of Supervisors).
Is the task long-running and async (hours to days)? Yes → Blackboard Architecture. No → continue.
Agent count ≤ 5 and termination well-defined? Yes → Swarm with hard round/time limits. No → refactor into Hierarchical.
Need crash-safe resume? Yes → LangGraph + PostgresSaver. No → CrewAI rapid path.
Cross-team agent discovery? Yes → Agent Cards + A2A. Tools only → MCP per agent.
Start with a sequential pipeline. Add agents only on measurable evidence: context overflow, concurrency requirements, or a sub-agent that needs an independent upgrade. Discipline beats creativity in production graphs.
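The pattern questions in the decision tree can be encoded as a small function so the choice is reviewable in code. This is one reading of the tree, under the assumption that a "No" on any top-level question falls through to the next one; the `Workload` fields are hypothetical names, and the framework and protocol questions are left out.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    strict_sequential_deps: bool
    concurrent_subtasks: bool
    single_decision_authority: bool
    needs_sub_teams: bool
    long_running_async: bool
    agent_count: int
    termination_well_defined: bool


def choose_pattern(w: Workload) -> str:
    """Walk the pattern questions in the order the guide lists them."""
    if w.strict_sequential_deps:
        return "Hybrid: Pipeline + Fan-out" if w.concurrent_subtasks else "Sequential Pipeline"
    if w.single_decision_authority:
        return "Hierarchical" if w.needs_sub_teams else "Supervisor-Worker"
    if w.long_running_async:
        return "Blackboard"
    if w.agent_count <= 5 and w.termination_well_defined:
        return "Swarm with hard round/time limits"
    return "Hierarchical"  # the tree's fallback: refactor into Hierarchical


report_job = Workload(True, False, False, False, False, 3, True)
support_desk = Workload(False, False, True, False, False, 4, True)
```

For example, `choose_pattern(report_job)` selects the sequential pipeline, while `choose_pattern(support_desk)` selects Supervisor-Worker.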
Laptop-hosted agents sleep when the lid closes, lack process supervision for long LangGraph checkpoints, and struggle with macOS-native toolchains (Xcode, Keychain). A pure Linux VPS suits stateless API workers, not iOS build farms. Teams running multi-agent graphs 24/7 alongside MCP tool servers need predictable uptime. VpsMesh Mac Mini M4 cloud rental bundles uptime, remote KVM, and predictable monthly OpEx. See Mac Mini M4 rental pricing, the help center, or place an order: a one-month pilot lets you validate checkpoint latency and token burn before committing to an orchestration stack.
Most production systems land between 3 and 8 specialized agents. Below three, orchestration overhead is rarely justified; above eight, you are over-engineering unless you have clear domain boundaries and per-agent observability. Start with a supervisor plus two workers, measure tokens_per_success, and split only when one agent's context consistently overflows.
MCP is the vertical layer: each agent connects to tools and data via tools/list and JSON Schema descriptors. A2A is the horizontal layer: agents discover peers through Agent Cards and delegate subtasks. MCP lives inside each agent; A2A runs between agents. For the vertical layer, see the MCP guide; for delegation patterns, see Section 05 of this guide.
Not always. Stateless LangGraph workers and remote MCP over HTTP+SSE run fine on Linux cloud VMs. When agents depend on macOS toolchains, Xcode builds, or Keychain secrets, or need uninterrupted checkpoint sessions, a rented Mac Mini M4 has lower friction than laptop sleep cycles. A one-month trial covers checkpoint latency and token burn. Mac Mini M4 rental pricing, help center, place an order.