Multi-Agent Architecture in Practice: Patterns, Frameworks, and Production (2026 Guide)

6 orchestration patterns · LangGraph vs CrewAI · MCP + A2A · MAST observability · production runbook

A demo with a single agent in Cursor works. Production requires parallel research, tool isolation, and human approval gates under a shared token budget. A monolithic agent runs into context limits, jack-of-all-trades drift, zero concurrency, and a single point of failure. This guide is for AI engineers and tech leads moving to Multi-Agent Systems (MAS): six orchestration patterns, a LangGraph vs CrewAI vs AutoGen matrix, the MCP + A2A protocol stack, a 6-step production runbook (PostgresSaver, HITL interrupts, circuit breakers), MAST data from 1,642 traces, pitfalls, and a decision tree for 2026.

01

Why a single agent stops scaling in production

A monolithic agent (one system prompt, one tool list, one thread) makes for a demo you can build in a day. Under real load it becomes a bottleneck. Google's internal Agent Bake-Off benchmark (cited in the MLflow production guide 2026) showed that a decomposed multi-agent architecture cut processing time from 60 to 10 minutes, a 6x speedup. Separately, the AdaptOrch study (2026) reports that orchestration topology explains 12–23% more variance in task success than swapping the underlying model.

Before choosing a framework, pin down the structural limits that force a split into a MAS.

  1. Context window saturation: Research, code, logs, and tool outputs pile up in one thread. Retrieval quality drops, and the agent forgets constraints set ten turns ago.

  2. Jack-of-all-trades prompting: One persona cannot excel at SQL tuning, legal review, and UI copy at the same time. Instruction interference raises the hallucination rate.

  3. No true concurrency: Sequential tool calls block each other. Independent subtasks (scraping three sites, running three test suites) burn wall-clock time.

  4. Single point of failure: One bad tool result or runaway loop kills the whole session. There is no isolation domain for retries and rollbacks.

  5. Opaque cost attribution: Finance cannot tell which step burned the tokens. Without per-agent budgets, one verbose researcher agent drains the monthly cap.

Topology beats model. AdaptOrch: orchestration structure drives 12–23% more outcome variance than model choice, so design the graph before upgrading GPT tiers.

02

MAS fundamentals: agent traits and control topologies

A Multi-Agent System (MAS) is a coordinated set of LLM-powered agents that share state, delegate subtasks, and expose specialized capabilities. Each agent is not just a prompt variant but a bounded runtime with its own tools, memory scope, and termination policy.

Core agent traits

Trait | Meaning in LLM agents | Production signal
--- | --- | ---
Autonomy | Chooses the next action without per-step human input | Needs guardrails: max iterations, budget caps
Reactivity | Reacts to tool results and peer messages | Structured message schema, not only free text
Proactivity | Initiates subtasks when goals are incomplete | Runaway loops without supervisor checks
Social ability | Delegates to and negotiates with other agents | A2A discovery and clear handoff contracts

Three control topologies

Topology | Control flow | Best for | Risk
--- | --- | --- | ---
Centralized | One orchestrator routes all messages | Predictable audit trails, strict policy | Orchestrator context bloat; SPOF at router
Decentralized | Peers message each other directly; no single boss | Resilient swarms, emergent collaboration | Hard to debug; termination not guaranteed
Hierarchical | Supervisor delegates to workers; workers report up | Enterprise workflows with approval tiers | Supervisor prompt complexity; latency stacking

Most 2026 production stacks default to hierarchical with a thin centralized router for auth and budget enforcement: a hybrid of the first and third rows.

03

Six orchestration design patterns

Patterns are composable. A customer-support stack can use a supervisor, fan out to parallel researchers, and pipeline the synthesis into a writer. Choose the minimum pattern set that fits your dependency structure; by this guide's estimate, the six patterns below cover 95%+ of production scenarios.

1. Sequential Pipeline

Stages run in a fixed order: ingest → analyze → draft → review. State is passed through the shared graph state. Ideal when each step depends on the prior output (ETL, report generation). In LangGraph this is a linear StateGraph with typed state reducers.

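python · sequential pipeline
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict, total=False):
    docs: list      # illustrative fields (assumption); use what your stages exchange
    analysis: str
    report: str

# retrieval_agent, analysis_agent, writer_agent are your own node functions:
# each takes the state and returns a partial state update.
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()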

2. Parallel Fan-out / Fan-in

The orchestrator spawns N independent branches and a collector aggregates them. Total latency is roughly max(T1..Tn), not the sum. LangGraph's Send API dispatches dynamic worker nodes; a reducer merges the outputs. Use it for multi-source research, ensemble voting, and shard-level code review.

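python · LangGraph Send fan-out
from langgraph.types import Send

def fan_out(state):
    # Conditional-edge function: one Send per query spawns one worker run.
    return [Send("research_worker", {"query": q}) for q in state["queries"]]

def fan_in(state):
    # worker_results needs a list reducer (e.g. Annotated[list, operator.add])
    # so parallel writes are appended; synthesize() is your own helper.
    return {"report": synthesize(state["worker_results"])}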
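The latency claim (max rather than sum) can be checked without any framework. The following is a minimal sketch using plain asyncio, not LangGraph; `research_worker`, `fan_out_fan_in`, the query names, and the delays are illustrative assumptions.

```python
import asyncio
import time


async def research_worker(query: str, delay: float) -> str:
    # Stand-in for an LLM or tool call; the sleep simulates I/O latency.
    await asyncio.sleep(delay)
    return f"result:{query}"


async def fan_out_fan_in(jobs: dict[str, float]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # Fan-out: start every worker concurrently.
    # Fan-in: gather waits for all of them and keeps the input order.
    results = await asyncio.gather(
        *(research_worker(query, delay) for query, delay in jobs.items())
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


if __name__ == "__main__":
    jobs = {"pricing": 0.1, "reviews": 0.2, "docs": 0.3}
    results, elapsed = asyncio.run(fan_out_fan_in(jobs))
    # Elapsed time should sit near the slowest branch (0.3 s), not the 0.6 s sum.
    print(results, round(elapsed, 2))
```

The same shape applies to LangGraph workers: the collector cannot finish before the slowest branch, so tail latency of one worker sets the latency of the whole fan-out.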

3. Hierarchical Supervisor-Worker

The supervisor classifies intent and routes to specialists (coder, DBA, reviewer). Add a keyword fast path: a regex or embedding match on high-confidence intents skips the LLM routing call; a regex match typically resolves in under 1 ms on FAQ-style queries.
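A keyword fast path can be as small as an ordered list of compiled patterns checked before the LLM router. This is a sketch under assumptions: the patterns, the specialist names, and the `llm_router` callable are hypothetical and stand in for your own routing call.

```python
import re
from typing import Callable

# Hypothetical high-confidence intents; tune patterns to your own traffic.
FAST_PATHS: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\b(reset|forgot)\b.*\bpassword\b", re.I), "account_agent"),
    (re.compile(r"\b(slow|explain)\b.*\b(query|sql)\b", re.I), "dba_agent"),
    (re.compile(r"\b(refund|invoice|billing)\b", re.I), "billing_agent"),
]


def route(message: str, llm_router: Callable[[str], str]) -> tuple[str, str]:
    """Return (specialist, path), where path is 'fast' or 'llm'."""
    for pattern, specialist in FAST_PATHS:
        if pattern.search(message):
            # High-confidence match: skip the LLM routing call entirely.
            return specialist, "fast"
    # No pattern matched: fall back to the (slower, paid) LLM classifier.
    return llm_router(message), "llm"
```

Logging the returned path alongside the specialist makes it easy to see what share of traffic never reaches the LLM router.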

4. Swarm (AutoGen-style)

Agents hand off conversation control peer-to-peer without a central coordinator. AutoGen excels at multi-round negotiation (code review, proposal evaluation). Caveat: non-determinism is high, so in production these systems more often ship as hierarchical.

5. Blackboard

Agents read and write a shared artifact store (the blackboard) instead of messaging each other directly. A planner posts goals; specialists append sections. Suited to long-running async tasks (hours to days) and heterogeneous services owned by different teams.
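To make the blackboard idea concrete, here is a minimal in-process sketch. A real deployment would back it with a database or object store; the `Blackboard` class, `planner`, and `specialist` are illustrative names, not part of any framework.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Blackboard:
    """Shared artifact store: agents never message each other directly."""
    goals: list[str] = field(default_factory=list)
    sections: dict[str, str] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def post_goal(self, goal: str) -> None:
        with self._lock:
            self.goals.append(goal)

    def claim_goal(self) -> Optional[str]:
        # Pop under the lock so two specialists never claim the same goal.
        with self._lock:
            return self.goals.pop(0) if self.goals else None

    def append_section(self, goal: str, text: str) -> None:
        with self._lock:
            self.sections[goal] = text


def planner(board: Blackboard, topics: list[str]) -> None:
    # The planner only posts goals; it does not know who will pick them up.
    for topic in topics:
        board.post_goal(topic)


def specialist(board: Blackboard, name: str) -> None:
    # A specialist drains open goals and writes its section back to the board.
    while (goal := board.claim_goal()) is not None:
        board.append_section(goal, f"{name} wrote about {goal}")
```

Because the planner and specialists share only the store, they can run in separate processes or on different schedules, which is the property that makes the pattern fit hours-to-days workloads.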

6. Hybrid

Real systems combine patterns: hierarchical supervisor → parallel fan-out for research → sequential pipeline for packaging. Explicitly mark sync vs async segments before writing code.

Pattern | Concurrency | Debuggability | Typical framework
--- | --- | --- | ---
Sequential Pipeline | Low | High | LangGraph, CrewAI sequential
Fan-out / Fan-in | High | Medium | LangGraph Send
Supervisor-Worker | Medium | High | LangGraph, CrewAI hierarchical
Swarm | Medium | Low | AutoGen, Swarm SDK
Blackboard | Medium | Medium | Custom + shared store
Hybrid | Variable | Medium | LangGraph (most common)

04

Framework matrix: LangGraph vs CrewAI vs AutoGen

All three have production users in 2026 but optimize for different control styles. Match the framework to your topology, not to brand affinity.

Dimension | LangGraph | CrewAI | AutoGen
--- | --- | --- | ---
Mental model | Stateful directed graph | Role-based crew with tasks | Conversable agents + handoffs
State persistence | First-class checkpoints (PostgresSaver) | Memory backends, less graph-native | Chat history per agent
Human-in-the-loop | Native interrupt() calls inside nodes | Task-level human input hooks | UserProxyAgent pattern
Parallelism | Send API, subgraphs | Async task execution | Group chat parallelism
Production readiness | Regulated industries, long workflows | Rapid crew prototypes | Exploratory multi-agent chat
Watch out | Steeper graph DSL learning curve | Less fine-grained control at scale | Non-deterministic handoff chains

Decision guide

  A. Need durable checkpoints + HITL approval gates? → LangGraph.

  B. Need a demo crew in an afternoon with readable role YAML? → CrewAI.

  C. Need open-ended agent-to-agent negotiation? → AutoGen (or Swarm).

  D. Need graph control and chat handoffs? → LangGraph orchestrator wrapping AutoGen workers.

LangGraph is the default for regulated industries and long-running systems: deterministic graph execution, native state persistence, LangSmith tracing. CrewAI and AutoGen do reach production but require more custom work.

05

MCP + A2A: dual protocol layer

Tool integration and agent collaboration are different problems. 2026 stacks treat them as a two-layer protocol cake: vertical tool access at the bottom, horizontal agent delegation on top. Both sit under the governance of the Linux Foundation's Agentic AI Foundation.

Layer | Protocol | Connects | Analogy
--- | --- | --- | ---
Vertical | MCP (Model Context Protocol) | Agent ↔ tools, data, prompts | USB-C for tool discovery
Horizontal | A2A (Agent-to-Agent) | Agent ↔ agent delegation | HTTP for a service mesh

Each agent publishes an Agent Card: a JSON document with capabilities, input schemas, and endpoint URLs. Peers call a helper such as discover_and_delegate to route subtasks without hard-coded agent lists. A2A v1.0 (early 2026): 50+ partners, including Atlassian, Salesforce, and SAP. The card below is a simplified illustration, not the full A2A schema.

json · Agent Card
{
  "name": "sql-analyst-agent",
  "description": "Read-only Postgres analysis and explain plans",
  "url": "https://agents.internal/a2a/sql-analyst",
  "capabilities": ["query", "explain", "schema-introspect"],
  "input_schema": {
    "type": "object",
    "properties": { "question": { "type": "string" } }
  }
}
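
python · discover_and_delegate
# AgentRegistry and a2a_client are application-level placeholders,
# not a published A2A SDK API.
class NoAgentError(LookupError):
    """No registered agent can handle the task."""

async def discover_and_delegate(task: str, registry: "AgentRegistry"):
    # Pick the agent whose card best matches the task; None means no match.
    card = await registry.find_best_match(task)
    if card is None:
        raise NoAgentError(task)
    payload = {"task": task, "caller": "supervisor-01"}
    # Delegate to the endpoint advertised in the Agent Card.
    return await a2a_client.send(card.url, payload)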
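The registry behind `find_best_match` is left open in the text. One way to picture it is a toy in-memory registry that scores cards by keyword overlap; `AgentCard`, `InMemoryRegistry`, the second card, and the scoring rule are all assumptions for illustration, and a production registry would more likely use embeddings or declared skills.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    name: str
    url: str
    capabilities: tuple[str, ...]


class InMemoryRegistry:
    """Toy registry: score = number of capability keywords found in the task."""

    def __init__(self, cards: list[AgentCard]):
        self._cards = cards

    async def find_best_match(self, task: str) -> Optional[AgentCard]:
        words = set(task.lower().split())
        best, best_score = None, 0
        for card in self._cards:
            score = sum(1 for cap in card.capabilities if cap in words)
            if score > best_score:
                best, best_score = card, score
        return best  # None when no capability keyword overlaps


REGISTRY = InMemoryRegistry([
    AgentCard("sql-analyst-agent", "https://agents.internal/a2a/sql-analyst",
              ("query", "explain", "schema-introspect")),
    # Hypothetical second agent, added only to make the match non-trivial.
    AgentCard("copywriter-agent", "https://agents.internal/a2a/copywriter",
              ("headline", "rewrite")),
])


async def match_name(task: str) -> Optional[str]:
    card = await REGISTRY.find_best_match(task)
    return card.name if card else None
```

Returning None instead of a low-scoring card keeps the "no agent owns this task" case explicit, so the caller can raise or escalate rather than delegate to a poor match.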

MCP handles tools/list inside each agent; A2A handles which agent owns the task. The vertical layer is covered in our MCP protocol guide.

06

Production engineering: checkpoints, HITL, guardrails

Demos use in-memory state. Production needs crash recovery, human approval on high-risk actions, and cost ceilings. Four primitives cover most teams before custom infra is needed.

Core production primitives

  • PostgresSaver: LangGraph checkpoints in Postgres, so workers survive restarts and you get time-travel debugging.
  • interrupt() HITL: Pause the graph before destructive tools; resume after Slack or dashboard approval.
  • CircuitBreaker (application-level class, not a LangGraph built-in): Trip after N consecutive tool failures; fail fast instead of burning tokens on a dead dependency.
  • TokenBudgetManager (application-level class, not a LangGraph built-in): Per-agent and per-run token ceilings; hard-stop or downgrade the model when the budget is exhausted.
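
python · production guardrails sketch
MAX_ITERATIONS = 25

class RunawayLoopError(RuntimeError):
    """Raised when a run exceeds MAX_ITERATIONS steps."""

class ProductionGuardrails:
    # TokenBudgetManager and CircuitBreaker are your own classes; the quoted
    # annotations let this class be defined before they are imported.
    def __init__(self, budget: "TokenBudgetManager", breaker: "CircuitBreaker"):
        self.budget = budget
        self.breaker = breaker
        self.iterations = 0

    def before_step(self, agent_id: str, est_tokens: int) -> None:
        # Call before every agent step; any exception aborts the run.
        self.iterations += 1
        if self.iterations > MAX_ITERATIONS:
            raise RunawayLoopError(f"exceeded {MAX_ITERATIONS} steps")
        self.budget.charge(agent_id, est_tokens)  # expected to raise when over budget
        self.breaker.check()                      # expected to raise when the circuit is open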
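The guardrails sketch above leaves `TokenBudgetManager` and `CircuitBreaker` undefined. One possible shape for them is shown here, assuming both signal problems by raising; the class internals, the error names, and the default thresholds are illustrative, not a library API.

```python
class BudgetExceededError(RuntimeError):
    """A charge would push an agent or the whole run over its token ceiling."""


class CircuitOpenError(RuntimeError):
    """Too many consecutive tool failures; stop calling the dependency."""


class TokenBudgetManager:
    def __init__(self, per_agent_limit: int, run_limit: int):
        self.per_agent_limit = per_agent_limit
        self.run_limit = run_limit
        self.spent: dict[str, int] = {}

    def charge(self, agent_id: str, tokens: int) -> None:
        agent_total = self.spent.get(agent_id, 0) + tokens
        run_total = sum(self.spent.values()) + tokens
        # Reject before recording, so a refused charge leaves state unchanged.
        if agent_total > self.per_agent_limit or run_total > self.run_limit:
            raise BudgetExceededError(agent_id)
        self.spent[agent_id] = agent_total


class CircuitBreaker:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def record(self, success: bool) -> None:
        # Any success resets the streak; only consecutive failures count.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    def check(self) -> None:
        if self.consecutive_failures >= self.max_failures:
            raise CircuitOpenError(f"{self.consecutive_failures} consecutive failures")
```

Instead of a hard stop, the caller can catch `BudgetExceededError` and retry the step on a cheaper model, which matches the "downgrade model" option in the primitives list.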

6-step production runbook

  1. Draw the graph on paper first: Mark sync edges, parallel branches, and HITL interrupt points before creating LangGraph nodes.

  2. Wire PostgresSaver: Use managed Postgres for checkpoints; verify resume after a process kill.

  3. Register MCP tools per agent: Least-privilege tool subsets; never share one mega tool list.

  4. Add interrupt nodes: Gate deploy, delete, payment, and PII-export tools behind human approval.

  5. Enable TokenBudgetManager + CircuitBreaker: Per-agent daily caps; alert when 80% of the budget is consumed.

  6. Ship observability before features: OpenTelemetry spans per agent step; have the CORE_METRICS dashboard in place before you add agent #7.

Tip

Chaos drill: Kill a worker mid-graph, restart it, and confirm PostgresSaver resumes from the last checkpoint without duplicate side effects.

07

Observability: MAST traces, OpenTelemetry, LLM-as-Judge

You cannot fix what you cannot attribute. The MAST study analyzed 1,642 multi-agent execution traces: failure modes cluster predictably, and most are design issues, not model IQ gaps. 57% of organizations already run agents in production, but only 8% have finished their observability implementation: dashboards are green (HTTP 200) while the output is wrong.

MAST failure breakdown

  • 41.77%: system design flaws (wrong topology, missing handoff contracts, missing termination)
  • 36.94%: inter-agent misalignment (context lost at handoffs, a hallucination becomes ground truth)
  • 21.30%: verification gaps (premature termination, incomplete validation)

Instrumentation stack

Wrap every agent invocation in OpenTelemetry spans: agent_id, parent_span, tool_name, token_in/out, latency_ms. Export to your existing APM. The minimum dashboard is CORE_METRICS:

Metric | Why it matters
--- | ---
task_success_rate | End-to-end goal completion, not per-step accuracy (target: above 85%)
tokens_per_success | Cost efficiency; spikes reveal runaway loops
p95_agent_latency | Pinpoints a slow specialist or tool (target: under 30 s end to end)
handoff_error_rate | A2A schema mismatches and dropped messages
hitl_queue_depth | Approval bottlenecks blocking graph progress
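Four of these metrics can be derived offline from per-run records (hitl_queue_depth is a live gauge and is left out). The sketch below assumes a hypothetical `RunRecord` shape exported from your traces, uses a nearest-rank p95, and counts tokens from failed runs in tokens_per_success, so wasted spend shows up in the number.

```python
import math
from dataclasses import dataclass


@dataclass
class RunRecord:
    success: bool
    tokens: int
    latency_s: float
    handoffs: int
    handoff_errors: int


def core_metrics(runs: list[RunRecord]) -> dict[str, float]:
    if not runs:
        raise ValueError("need at least one run")
    successes = sum(1 for r in runs if r.success)
    total_tokens = sum(r.tokens for r in runs)
    handoffs = sum(r.handoffs for r in runs)
    errors = sum(r.handoff_errors for r in runs)
    latencies = sorted(r.latency_s for r in runs)
    # Nearest-rank percentile: smallest value with at least 95% of runs at or below it.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "task_success_rate": successes / len(runs),
        # All tokens (including failed runs) divided by successful runs.
        "tokens_per_success": total_tokens / successes if successes else float("inf"),
        "p95_agent_latency": latencies[p95_index],
        "handoff_error_rate": errors / handoffs if handoffs else 0.0,
    }
```

Computing these from stored records rather than inside the request path keeps the dashboard logic out of the agents themselves.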

Add LLM-as-Judge on sampled traces: an evaluator agent scores goal alignment and factual consistency. Run it offline for regression tests, not inline on every request (cost).

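python · traced_agent_call
import uuid
from typing import Optional
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# agent_registry is your own dict: agent name -> object whose .run(task) returns a dict.
def traced_agent_call(agent_name: str, task: dict, correlation_id: Optional[str] = None):
    if not correlation_id:
        correlation_id = str(uuid.uuid4())
    # One span per agent invocation; calls made inside it become child spans.
    with tracer.start_as_current_span(f"agent.{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("correlation.id", correlation_id)
        result = agent_registry[agent_name].run(task)
        span.set_attribute("tokens_used", result.get("tokens", 0))
        return result

08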

Pitfalls: what breaks demo-to-prod migrations

  1. Context pollution: Workers return full raw HTML dumps upstream. Truncate, summarize, or store them in a blackboard; pass handles, not payloads. Otherwise Agent A's hallucination becomes a "fact" for B and C, all with HTTP 200.

  2. Runaway loops: Agents re-delegate indefinitely. Enforce MAX_ITERATIONS, per-edge visit counts, and supervisor stop tokens. The bill for a single task can go from $0.02 to $47.

  3. Over-engineering: Fifteen agents for a three-step workflow. The production sweet spot is 3–8 agents unless domains are truly isolated.

  4. Demo-to-prod gap: In-memory state, no budgets, no input guardrails. Wrap graphs with ProductionGuardrails before customer exposure.

  5. Parallel branch sync: The fan-in node runs before all branches have finished. In LangGraph, set defer=True on the node that performs the fan-in (the supervisor in this example) as an explicit synchronization barrier.

python · defer parallel sync
# defer=True delays this node until all other pending branches have finished.
builder.add_node("supervisor", supervisor_node, defer=True)

Alert

Expensive mistake: Adding agents to fix prompt issues. Tune specialist prompts and handoff schemas before spawning a new node. Treat every handoff as a versioned API, with schema validation and confidence thresholds at the boundaries.
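Treating a handoff as a versioned API can start with a small validator at the receiving boundary. This is a sketch using only the standard library; the field names, `SCHEMA_VERSION`, and the 0.7 threshold are assumptions, and in practice a schema library would likely replace the hand-written checks.

```python
from dataclasses import dataclass

SCHEMA_VERSION = "1.0"   # assumed contract version
MIN_CONFIDENCE = 0.7     # assumed threshold; tune per boundary


class HandoffRejected(ValueError):
    """The payload does not satisfy the handoff contract."""


@dataclass(frozen=True)
class Handoff:
    schema_version: str
    sender: str
    summary: str
    artifact_ref: str   # a handle to stored output, not the raw payload
    confidence: float


REQUIRED_FIELDS = {
    "schema_version": str,
    "sender": str,
    "summary": str,
    "artifact_ref": str,
    "confidence": (int, float),
}


def validate_handoff(raw: dict) -> Handoff:
    # 1. Shape: every required field is present with the expected type.
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(raw.get(key), expected):
            raise HandoffRejected(f"missing or mistyped field: {key}")
    # 2. Version: reject payloads written against another contract.
    if raw["schema_version"] != SCHEMA_VERSION:
        raise HandoffRejected(f"unsupported schema_version: {raw['schema_version']}")
    # 3. Confidence: must be a probability and clear the threshold.
    if not 0.0 <= raw["confidence"] <= 1.0:
        raise HandoffRejected("confidence must be within [0, 1]")
    if raw["confidence"] < MIN_CONFIDENCE:
        raise HandoffRejected("confidence below threshold")
    return Handoff(**{key: raw[key] for key in REQUIRED_FIELDS})
```

A rejected handoff is a natural place to increment handoff_error_rate and route the task back to the sender or to a human, instead of letting a low-confidence result propagate downstream.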

09

Decision framework: topology selection tree

AdaptOrch found that topology choice matters more than a model swap. Use the decision tree before committing to a framework; it lowers the risk of over-engineering and wrong-pattern lock-in.

  1. Strict sequential dependencies between steps? Yes → are subtasks concurrent? No → Sequential Pipeline. Yes → Hybrid: Pipeline + Fan-out.

  2. Does one agent hold decision authority? Yes → does scale require sub-teams? No → Supervisor-Worker. Yes → Hierarchical (Supervisors of Supervisors).

  3. Is the task long-running and async (hours to days)? Yes → Blackboard Architecture. No → continue.

  4. Agent count ≤ 5 and termination well-defined? Yes → Swarm with hard round/time limits. No → refactor into Hierarchical.

  5. Need crash-safe resume? Yes → LangGraph + PostgresSaver. No → CrewAI rapid path.

  6. Cross-team agent discovery? Yes → Agent Cards + A2A. Tools only → MCP per agent.

Start with a sequential pipeline. Add agents only on measurable evidence: context overflow, concurrency requirements, or a sub-agent that needs an independent upgrade. Discipline beats creativity in production graphs.
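The six questions can be written down as a small function, which makes the choice reviewable in a design doc. This sketch is one possible reading of the tree: it assumes the first four questions are checked in order and the first match wins, while questions 5 and 6 are answered independently. The `Workload` fields and `recommend` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    sequential_deps: bool        # Q1: strict step-to-step dependencies
    concurrent_subtasks: bool    # Q1 follow-up
    single_authority: bool       # Q2: one agent holds decision authority
    needs_sub_teams: bool        # Q2 follow-up
    long_running_async: bool     # Q3: hours to days
    agent_count: int             # Q4
    termination_defined: bool    # Q4
    crash_safe_resume: bool      # Q5
    cross_team_discovery: bool   # Q6


def recommend(w: Workload) -> dict[str, str]:
    # Questions 1-4 pick the pattern; the first matching question wins.
    if w.sequential_deps:
        pattern = ("Hybrid: Pipeline + Fan-out" if w.concurrent_subtasks
                   else "Sequential Pipeline")
    elif w.single_authority:
        pattern = ("Hierarchical (Supervisors of Supervisors)" if w.needs_sub_teams
                   else "Supervisor-Worker")
    elif w.long_running_async:
        pattern = "Blackboard"
    elif w.agent_count <= 5 and w.termination_defined:
        pattern = "Swarm with hard round/time limits"
    else:
        pattern = "Hierarchical"
    # Questions 5 and 6 are independent of the pattern choice.
    framework = "LangGraph + PostgresSaver" if w.crash_safe_resume else "CrewAI rapid path"
    protocol = "Agent Cards + A2A" if w.cross_team_discovery else "MCP per agent"
    return {"pattern": pattern, "framework": framework, "protocol": protocol}
```

Keeping the inputs as explicit booleans forces the team to answer each question before a framework is chosen, which is the point of the tree.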

10

Takeaways, hard data, and 2026 trends

Five key takeaways

  • 1. Orchestration topology explains more outcome variance (12–23%) than model swaps: design first.
  • 2. Six patterns cover most production graphs; hybrids are the norm, not a smell.
  • 3. MCP vertical + A2A horizontal is the emerging standard protocol stack under AAIF governance.
  • 4. MAST: 41.77% of failures are system design; the 49-point gap between "agents in prod" (57%) and "observability done" (8%) is where $47K bills happen.
  • 5. Cap agents at 3–8, cap iterations, cap tokens: guardrails beat bigger prompts.

Citable hard data

  • Agent Bake-Off: Multi-agent teams took 10 min vs 60 min (6x) on an internal Google benchmark.
  • AdaptOrch (arXiv 2602.16873): Topology drives 12–23% more variance than LLM selection.
  • MAST (1,642 traces): 41.77% system design, 36.94% misalignment, 21.30% verification gaps.
  • Observability gap: 57% of orgs run agents in prod, 8% have finished observability.

2026 trends

  • Federated orchestration: Sub-orchestrators owned by different teams share learned routing policies, a form of federated learning for control logic.
  • Multimodal workers: Vision and audio specialists inside existing supervisor graphs.
  • Adaptive topology: Rewire fan-out width at runtime based on task characteristics (the AdaptOrch direction).
  • EU AI Act compliance: Complete decision audit trails, HITL evidence, and risk-tiered tool access are a hard requirement, not a nice-to-have.

Laptop-hosted agents sleep when the lid closes, lack process supervision for long LangGraph checkpoint sessions, and struggle with macOS-native toolchains (Xcode, Keychain). A pure Linux VPS suits stateless API workers, not iOS build farms. Teams running multi-agent graphs 24/7 alongside MCP tool servers need predictable uptime. VpsMesh Mac Mini M4 cloud rental bundles uptime, remote KVM, and predictable monthly OpEx. Mac Mini M4 rental pricing, help center, place an order: a one-month pilot lets you validate checkpoint latency and token burn before committing to an orchestration stack.

FAQ

Three questions before moving to multi-agent

Most production systems land between 3 and 8 specialized agents. With fewer than three, orchestration overhead is rarely justified; with more than eight, you get over-engineering without clear domain boundaries and per-agent observability. Start with a supervisor plus two workers, measure tokens_per_success, and split only when one agent's context consistently overflows.

MCP is the vertical layer: each agent connects to tools and data via tools/list and JSON Schema descriptors. A2A is the horizontal layer: agents discover peers through Agent Cards and delegate subtasks. MCP lives inside each agent; A2A runs between agents. The vertical layer is covered in the MCP guide; delegation patterns are in Section 05 of this guide.

Not always. Stateless LangGraph workers and remote MCP over HTTP+SSE run fine on Linux cloud VMs. When agents depend on macOS toolchains, Xcode builds, or Keychain secrets, or need uninterrupted checkpoint sessions, a rented Mac Mini M4 has lower friction than laptop sleep cycles. Use a one-month trial to measure checkpoint latency and token burn. Mac Mini M4 rental pricing, help center, place an order.