Multi-Agent Architecture in Practice: Patterns, Frameworks, and Production (2026 Guide)

Q: How many agents does a production multi-agent system need?

Most production systems fit within 3–8 specialized agents. Below three, orchestration overhead is rarely justified; above eight is a sign of over-engineering unless there are clear domain boundaries and per-agent observability.

Why a single agent stops scaling in production

A monolithic agent (one system prompt, one tool list, one thread) gets you a demo in a day. Under real load it becomes the bottleneck. Google's internal Agent Bake-Off benchmark (MLflow production guide, 2026) showed a decomposed multi-agent architecture cutting processing time from 60 to 10 minutes, a 6x speedup. Separately, the AdaptOrch study (2026) found that orchestration topology explains 12–23% more variance in task success than changing the underlying model.

Before choosing a framework, pin down the structural limits that force a split into a MAS.

01
Context window saturation: Research, code, logs, and tool outputs pile up in one thread. Retrieval quality drops; the agent forgets constraints set ten turns earlier.
02
Jack-of-all-trades prompting: A single persona cannot excel at SQL tuning, legal review, and UI copy at the same time. Instruction interference raises the hallucination rate.
03
No true concurrency: Sequential tool calls block one another. Independent subtasks (scraping three sites, running three test suites) burn wall-clock time.
04
Single point of failure: One bad tool result or runaway loop kills the whole session. There is no isolation domain for retries and rollbacks.
05
Opaque cost attribution: Finance cannot tell which step burned the tokens. Without per-agent budgets, one verbose researcher agent drains the monthly cap.

Topology beats model. AdaptOrch: orchestration structure drives 12–23% more outcome variance than model choice, so design the graph before upgrading GPT tiers.

MAS fundamentals: agent traits and control topologies

A Multi-Agent System (MAS) is a coordinated set of LLM-powered agents that share state, delegate subtasks, and expose specialized capabilities. Each agent is not just a prompt variant but a bounded runtime with its own tools, memory scope, and termination policy.

Core agent traits

Trait	Meaning for LLM agents	Production signal
Autonomy	Chooses the next action without per-step human input	Needs guardrails: max iterations, budget caps
Reactivity	Responds to tool results and peer messages	Structured message schema, not just free text
Proactivity	Initiates subtasks when goals are incomplete	Runaway loops without supervisor checks
Social ability	Delegates to and negotiates with other agents	A2A discovery and clear handoff contracts

Three control topologies

Topology	Control flow	Best for	Risk
Centralized	One orchestrator routes all messages	Predictable audit trails, strict policy	Orchestrator context bloat; SPOF at the router
Decentralized	Peers message each other directly; no single boss	Resilient swarms, emergent collaboration	Hard to debug; termination not guaranteed
Hierarchical	Supervisor delegates to workers; workers report up	Enterprise workflows with approval tiers	Supervisor prompt complexity; latency stacking

Most 2026 production stacks default to hierarchical control with a thin centralized router for auth and budget enforcement: a hybrid of the first and third rows.

Six orchestration design patterns

Patterns are composable. A customer-support stack might use a supervisor, fan out to parallel researchers, and pipeline the synthesis into a writer. Pick the minimum pattern set that fits your dependency structure; these six patterns cover 95%+ of production scenarios.

1. Sequential Pipeline

Stages run in a fixed order: ingest → analyze → draft → review. State is passed through the shared graph state. Ideal when each step depends on the prior output (ETL, report generation). In LangGraph this is a linear StateGraph with typed state reducers.

python · sequential pipeline

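from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PipelineState(TypedDict):  # illustrative fields
    documents: list[str]
    analysis: str
    draft: str

# retrieval_agent, analysis_agent, writer_agent: node functions (state -> partial update)
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()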

2. Parallel Fan-out / Fan-in

The orchestrator spawns N independent branches and a collector aggregates them. Total latency is max(T1..Tn), not the sum. LangGraph's Send API dispatches dynamic worker nodes; a reducer merges their outputs. Use cases: multi-source research, ensemble voting, shard-level code review.

python · LangGraph Send fan-out

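import operator
from typing import Annotated, TypedDict
from langgraph.types import Send

class ResearchState(TypedDict):
    queries: list[str]
    worker_results: Annotated[list, operator.add]  # reducer concatenates branch outputs
    report: str

def fan_out(state: ResearchState):  # one Send per query -> parallel research_worker runs
    return [Send("research_worker", {"query": q}) for q in state["queries"]]

def fan_in(state: ResearchState):
    return {"report": synthesize(state["worker_results"])}  # synthesize(): app-specific helper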
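To see why fan-out latency tracks max(T1..Tn) rather than the sum, here is a framework-free sketch using asyncio.gather. The worker function, query names, and delays are hypothetical; asyncio.sleep stands in for an LLM or tool call.

python · asyncio fan-out latency
```python
import asyncio
import time


async def research_worker(query: str, delay_s: float) -> str:
    # Hypothetical worker: the sleep simulates a slow LLM or tool call.
    await asyncio.sleep(delay_s)
    return f"notes for {query!r}"


async def fan_out_fan_in(jobs: dict[str, float]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # Fan-out: every branch starts at once. Fan-in: gather waits for all of them
    # and returns results in the same order as the inputs.
    results = await asyncio.gather(
        *(research_worker(query, delay) for query, delay in jobs.items())
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    jobs = {"pricing": 0.1, "competitors": 0.2, "reviews": 0.3}
    results, elapsed = asyncio.run(fan_out_fan_in(jobs))
    # Elapsed should be close to the slowest branch (0.3 s), not the 0.6 s sum.
    print(len(results), f"{elapsed:.2f}s")
```

The same reasoning explains why a single slow branch dominates: fan-out buys nothing if one worker is ten times slower than the rest.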

3. Hierarchical Supervisor-Worker

The supervisor classifies intent and routes to specialists (coder, DBA, reviewer). Add a keyword fast path: a regex or precomputed embedding match on high-confidence intents skips the LLM routing call, so FAQ-style queries are routed in under 1 ms.
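A minimal sketch of that fast path, assuming regex rules only (no embeddings). The intents, patterns, specialist names, and the llm_router callable are hypothetical; the point is that the LLM is consulted only when no high-confidence rule fires.

python · keyword fast path
```python
import re
from typing import Callable

# Hypothetical high-confidence intents mapped to specialist agents.
FAST_PATHS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(reset|forgot)\b.*\bpassword\b", re.IGNORECASE), "faq"),
    (re.compile(r"\b(explain|slow)\b.*\bquery\b", re.IGNORECASE), "dba"),
    (re.compile(r"\b(stack ?trace|traceback|exception)\b", re.IGNORECASE), "coder"),
]


def route(message: str, llm_router: Callable[[str], str]) -> tuple[str, str]:
    """Return (specialist, source); llm_router is called only on a fast-path miss."""
    for pattern, specialist in FAST_PATHS:
        if pattern.search(message):
            return specialist, "fast_path"
    return llm_router(message), "llm"
```

Returning the route source alongside the specialist makes it easy to log the fast-path hit rate, which tells you whether the rules are actually saving LLM calls.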

4. Swarm (AutoGen-style)

Agents hand off conversation control peer-to-peer without a central coordinator. AutoGen excels at multi-round negotiation (code review, proposal evaluation). Caveat: non-determinism is high, so production teams more often ship this as a hierarchical design.
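AutoGen has its own handoff machinery; the sketch below is a framework-free illustration of the one control a production swarm cannot skip, a hard cap on peer-to-peer handoff rounds. The Swarm class and the agent callables are hypothetical, not AutoGen or OpenAI Swarm APIs.

python · swarm round cap
```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# An agent reads the transcript and returns (reply, next_agent or None to stop).
AgentFn = Callable[[list[str]], tuple[str, Optional[str]]]


class SwarmLimitExceeded(RuntimeError):
    """Raised when the handoff chain hits the round cap without terminating."""


@dataclass
class Swarm:
    agents: dict[str, AgentFn]
    max_rounds: int = 8  # hard cap: peer-to-peer termination is not guaranteed
    transcript: list[str] = field(default_factory=list)

    def run(self, first_agent: str, task: str) -> list[str]:
        self.transcript = [f"user: {task}"]
        current: Optional[str] = first_agent
        for _ in range(self.max_rounds):
            if current is None:
                return self.transcript  # an agent chose to stop
            reply, next_agent = self.agents[current](self.transcript)
            self.transcript.append(f"{current}: {reply}")
            current = next_agent  # control passes peer-to-peer, no coordinator
        if current is None:
            return self.transcript  # the last allowed round ended the chain
        raise SwarmLimitExceeded(f"no termination after {self.max_rounds} rounds")
```

Raising instead of silently truncating forces the caller to decide what a non-terminating negotiation means, for example escalating to a supervisor or a human.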

5. Blackboard

Agents read and write a shared artifact store (the blackboard) instead of messaging each other directly. A planner posts goals; specialists append sections. It suits long-running async tasks (hours to days) and heterogeneous services owned by different teams.
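A minimal sketch of the blackboard, assuming a single process and an in-memory store (a production system would back this with a shared database or object store). Class and method names are hypothetical. Agents never call each other; the store is the only coordination point.

python · blackboard
```python
import threading
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """In-memory blackboard: the planner posts goals, specialists append sections."""
    goals: list[str] = field(default_factory=list)
    sections: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def post_goal(self, goal: str) -> None:
        with self._lock:
            self.goals.append(goal)
            self.sections.setdefault(goal, [])

    def append_section(self, goal: str, author: str, text: str) -> None:
        with self._lock:
            if goal not in self.sections:
                raise KeyError(f"unknown goal: {goal}")
            self.sections[goal].append((author, text))

    def open_goals(self) -> list[str]:
        # In this sketch a goal stays open until some specialist writes to it.
        with self._lock:
            return [g for g in self.goals if not self.sections[g]]
```

Specialists poll open_goals() at their own pace, which is what lets services from different teams join or leave without changing any routing code.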

6. Hybrid

Real systems combine patterns: hierarchical supervisor → parallel fan-out for research → sequential pipeline for packaging. Mark sync vs async segments explicitly before writing any code.

Pattern	Concurrency	Debuggability	Typical framework
Sequential Pipeline	Low	High	LangGraph, CrewAI sequential
Fan-out / Fan-in	High	Medium	LangGraph Send
Supervisor-Worker	Medium	High	LangGraph, CrewAI hierarchical
Swarm	Medium	Low	AutoGen, Swarm SDK
Blackboard	Medium	Medium	Custom + shared store
Hybrid	Variable	Medium	LangGraph (most common)

Framework matrix: LangGraph vs CrewAI vs AutoGen

All three have production users in 2026, but they optimize for different control styles. Match the framework to the topology, not to brand affinity.

Dimension	LangGraph	CrewAI	AutoGen
Mental model	Stateful directed graph	Role-based crew with tasks	Conversable agents + handoffs
State persistence	First-class checkpoints (PostgresSaver)	Memory backends, less graph-native	Chat history per agent
Human-in-the-loop	Native `interrupt()` nodes	Task-level human input hooks	UserProxyAgent pattern
Parallelism	Send API, subgraphs	Async task execution	Group chat parallelism
Production readiness	Regulated industries, long workflows	Rapid crew prototypes	Exploratory multi-agent chat
Watch out	Steeper graph DSL learning curve	Less fine-grained control at scale	Non-deterministic handoff chains

Decision guide

A
Need durable checkpoints + HITL approval gates? → LangGraph.
B
Need a demo crew in an afternoon, with readable role YAML? → CrewAI.
C
Need open-ended agent-to-agent negotiation? → AutoGen (or Swarm).
D
Need both graph control and chat handoffs? → A LangGraph orchestrator wrapping AutoGen workers.

LangGraph is the default for regulated industries and long-running systems: deterministic graph execution, native state persistence, LangSmith tracing. CrewAI and AutoGen do reach production, but they require more custom work.

MCP + A2A: dual protocol layer

Tool integration and agent collaboration are different problems. 2026 stacks treat them as a two-layer protocol cake: vertical tool access at the bottom, horizontal agent delegation on top. Both are governed by the Linux Foundation's Agentic AI Foundation (AAIF).

Layer	Protocol	Connects	Analogy
Vertical	MCP (Model Context Protocol)	Agent ↔ tools, data, prompts	USB-C для tool discovery
Horizontal	A2A (Agent-to-Agent)	Agent ↔ agent delegation	HTTP для service mesh

Each agent publishes an Agent Card: a JSON document listing its capabilities, input schema, and endpoint URL. Peers look up cards to route subtasks (for example with a discover_and_delegate helper) instead of relying on hard-coded agent lists. A2A v1.0 (early 2026) has 50+ partners, including Atlassian, Salesforce, and SAP.

json · Agent Card (simplified)

{
  "name": "sql-analyst-agent",
  "description": "Read-only Postgres analysis and explain plans",
  "url": "https://agents.internal/a2a/sql-analyst",
  "capabilities": ["query", "explain", "schema-introspect"],
  "input_schema": {
    "type": "object",
    "properties": { "question": { "type": "string" } }
  }
}

python · discover_and_delegate

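class NoAgentError(LookupError):
    """Raised when no registered Agent Card matches the task."""


async def discover_and_delegate(task: str, registry: "AgentRegistry"):
    # AgentRegistry and a2a_client are hypothetical app objects, not official A2A SDK types.
    card = await registry.find_best_match(task)
    if card is None:
        raise NoAgentError(task)
    payload = {"task": task, "caller": "supervisor-01"}
    return await a2a_client.send(card.url, payload)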
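discover_and_delegate assumes a registry whose find_best_match returns the most suitable card, or None. A hypothetical in-memory version, assuming simple word overlap between the task text and each card's description and capabilities; a production registry might use embeddings or explicit skill tags instead. AgentCard here mirrors only the fields in the example card above, not the full A2A schema.

python · AgentRegistry sketch
```python
import asyncio
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentCard:
    # Only the fields used in the Agent Card example above.
    name: str
    description: str
    url: str
    capabilities: tuple[str, ...]


class InMemoryAgentRegistry:
    """Hypothetical registry that scores cards by word overlap with the task."""

    def __init__(self, cards: list[AgentCard]):
        self.cards = cards

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    async def find_best_match(self, task: str, min_score: int = 1) -> Optional[AgentCard]:
        task_words = self._words(task)
        best: Optional[AgentCard] = None
        best_score = 0
        for card in self.cards:
            card_words = self._words(card.description)
            for capability in card.capabilities:
                card_words |= self._words(capability)
            score = len(task_words & card_words)
            if score > best_score:  # ties keep the earlier-registered card
                best, best_score = card, score
        return best if best_score >= min_score else None


if __name__ == "__main__":
    registry = InMemoryAgentRegistry([
        AgentCard(
            name="sql-analyst-agent",
            description="Read-only Postgres analysis and explain plans",
            url="https://agents.internal/a2a/sql-analyst",
            capabilities=("query", "explain", "schema-introspect"),
        ),
    ])
    card = asyncio.run(registry.find_best_match("Explain why this Postgres query is slow"))
    print(card.name if card else "no match")
```

The min_score threshold is what turns "no good match" into None, which discover_and_delegate then surfaces as NoAgentError instead of delegating to an arbitrary agent.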

MCP handles tools/list inside each agent; A2A decides which agent owns a task. The vertical layer is covered in our MCP protocol guide.

Production engineering: checkpoints, HITL, guardrails

Demos use in-memory state. Production needs crash recovery, human approval for high-risk actions, and cost ceilings. Four primitives cover most teams before they need custom infrastructure.

Core production primitives

PostgresSaver: LangGraph checkpoints stored in Postgres, so workers survive restarts and you get time-travel debugging.
interrupt() HITL: Pause the graph before destructive tools; resume after approval in Slack or a dashboard.
CircuitBreaker: Trip after N consecutive tool failures; fail fast instead of burning tokens on a dead dependency.
TokenBudgetManager: Per-agent and per-run token ceilings; hard-stop or downgrade the model when the budget is exhausted.

python · production guardrails sketch

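MAX_ITERATIONS = 25  # hard cap on agent steps per run


class RunawayLoopError(RuntimeError):
    """Raised when a run exceeds MAX_ITERATIONS steps."""


class ProductionGuardrails:
    # TokenBudgetManager and CircuitBreaker are application-level classes
    # (see the primitives above), not library imports.
    def __init__(self, budget: "TokenBudgetManager", breaker: "CircuitBreaker"):
        self.budget = budget
        self.breaker = breaker
        self.iterations = 0

    def before_step(self, agent_id: str, est_tokens: int) -> None:
        self.iterations += 1
        if self.iterations > MAX_ITERATIONS:
            raise RunawayLoopError(f"exceeded {MAX_ITERATIONS} steps")
        self.breaker.check()  # fail fast on a dead dependency before spending tokens
        self.budget.charge(agent_id, est_tokens)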
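ProductionGuardrails only needs charge() and check(). TokenBudgetManager and CircuitBreaker are application-level patterns rather than LangGraph classes, so here is a minimal sketch of both, assuming a single process with limits held in memory. The method names, exception types, and the 80% alert (from runbook step 05) are illustrative choices; a production breaker would usually add a half-open cool-down state, omitted here.

python · budget and breaker sketch
```python
from collections import defaultdict


class BudgetExceededError(RuntimeError):
    """Raised when a charge would push an agent past its token ceiling."""


class CircuitOpenError(RuntimeError):
    """Raised while too many consecutive tool failures keep the breaker open."""


class TokenBudgetManager:
    """Per-agent token ceilings held in memory (a real system would persist them)."""

    def __init__(self, per_agent_limit: int, alert_percent: int = 80):
        self.per_agent_limit = per_agent_limit
        self.alert_percent = alert_percent
        self.spent: defaultdict[str, int] = defaultdict(int)
        self.alerted: list[str] = []  # agents that crossed the alert threshold

    def charge(self, agent_id: str, tokens: int) -> None:
        if self.spent[agent_id] + tokens > self.per_agent_limit:
            # Hard stop; a softer policy could downgrade the model instead.
            raise BudgetExceededError(f"{agent_id} would exceed {self.per_agent_limit} tokens")
        self.spent[agent_id] += tokens
        # Integer comparison avoids float rounding at the threshold.
        crossed = self.spent[agent_id] * 100 >= self.alert_percent * self.per_agent_limit
        if crossed and agent_id not in self.alerted:
            self.alerted.append(agent_id)  # hook a pager or Slack alert in here


class CircuitBreaker:
    """Opens after N consecutive failures; any success closes it again."""

    def __init__(self, max_consecutive_failures: int = 3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def check(self) -> None:
        if self.consecutive_failures >= self.max_consecutive_failures:
            raise CircuitOpenError(f"{self.consecutive_failures} consecutive tool failures")
```

The tool-calling code is responsible for calling record_success() and record_failure(); the guardrails only read the breaker's state through check().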

6-step production runbook

01
Draw the graph on paper first: Mark sync edges, parallel branches, and HITL interrupt points before writing any LangGraph nodes.
02
Wire up PostgresSaver: Use managed Postgres for checkpoints; verify that runs resume after a process kill.
03
Register MCP tools per agent: Give each agent a least-privilege tool subset; never share one mega tool list.
04
Add interrupt nodes: Gate deploy, delete, payment, and PII-export tools behind human approval.
05
Enable TokenBudgetManager + CircuitBreaker: Set per-agent daily caps; alert when 80% of a budget is burned.
06
Ship observability before features: OpenTelemetry spans per agent step; a CORE_METRICS dashboard before you add agent #7.
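Step 04 can be sketched independently of any framework, assuming a synchronous request_approval callback. In a LangGraph graph this is the point where you would call interrupt() and later resume the run with Command(resume=...) once Slack or the dashboard returns a decision. The tool names and helpers below are hypothetical.

python · HITL approval gate
```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical tool names for the categories listed in step 04.
HIGH_RISK_TOOLS = frozenset({"deploy", "delete_records", "create_payment", "export_pii"})


class ApprovalDenied(PermissionError):
    """Raised when a human rejects a high-risk tool call."""


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def gated_execute(
    call: ToolCall,
    tools: dict[str, Callable[..., Any]],
    request_approval: Callable[[ToolCall], bool],
) -> Any:
    """Run low-risk tools directly; block high-risk ones until a human approves."""
    if call.name in HIGH_RISK_TOOLS and not request_approval(call):
        raise ApprovalDenied(f"human rejected {call.name} with args {call.args}")
    return tools[call.name](**call.args)
```

Keeping the high-risk list as data, rather than as checks scattered across agents, gives auditors one place to see which actions require a human.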

Tip

Chaos drill: Kill a worker mid-graph, restart it, and confirm that PostgresSaver resumes from the last checkpoint without duplicate side effects.
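Resuming from a checkpoint can re-run the node that was in flight, so side effects need to be idempotent. A minimal sketch, assuming each effect is keyed by run ID, step name, and a JSON-serializable payload; the in-memory dict stands in for a durable table (for example in the same Postgres database as the checkpoints), and all names are hypothetical. A crash between the effect and the record can still duplicate it, so the external system should ideally accept the same key as well.

python · idempotent side effects
```python
import hashlib
import json
from typing import Any, Callable


class IdempotentEffects:
    """Record side-effect results by key so a resumed graph does not repeat them."""

    def __init__(self) -> None:
        self._done: dict[str, Any] = {}  # stand-in for a durable table

    @staticmethod
    def key(run_id: str, step: str, payload: dict[str, Any]) -> str:
        raw = json.dumps({"run": run_id, "step": step, "payload": payload}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def run_once(
        self,
        run_id: str,
        step: str,
        payload: dict[str, Any],
        effect: Callable[[dict[str, Any]], Any],
    ) -> Any:
        k = self.key(run_id, step, payload)
        if k in self._done:
            return self._done[k]  # replay after resume: reuse the recorded result
        result = effect(payload)
        self._done[k] = result
        return result
```

In the chaos drill, the assertion to make is that the external system (email, payment, ticket) saw each effect once even though the node ran twice.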

Observability: MAST traces, OpenTelemetry, LLM-as-Judge

You cannot fix what you cannot attribute. The MAST study analyzed 1,642 multi-agent execution traces: failure modes cluster predictably, and most are design issues rather than gaps in model IQ. 57% of organizations already run agents in production, but only 8% have finished implementing observability, so dashboards stay green (HTTP 200) while the output is wrong.

MAST failure breakdown

41.77%: system design flaws (wrong topology, missing handoff contracts, missing termination)
36.94%: inter-agent misalignment (context lost at handoffs, one agent's hallucination becomes ground truth)
21.30%: verification gaps (premature termination, incomplete validation)

Instrumentation stack

Wrap every agent invocation in an OpenTelemetry span carrying agent_id, parent_span, tool_name, token_in/out, and latency_ms. Export to your existing APM. The minimum dashboard is CORE_METRICS:

Metric	Why it matters
task_success_rate	End-to-end goal completion, not per-step accuracy (target: >85%)
tokens_per_success	Cost efficiency; spikes reveal runaway loops
p95_agent_latency	Pinpoints a slow specialist or tool (target: <30 s end to end)
handoff_error_rate	A2A schema mismatches and dropped messages
hitl_queue_depth	Approval bottlenecks blocking graph progress
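Four of the five metrics can be computed from per-run summaries; hitl_queue_depth is a live gauge read from the approval queue instead. A sketch assuming a hypothetical RunRecord shape aggregated from your spans. It divides all tokens, including those of failed runs, by the number of successes, which is one reasonable reading of tokens_per_success, and uses the nearest-rank definition of p95.

python · CORE_METRICS
```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical per-run summary aggregated from the agent spans.
    succeeded: bool
    tokens: int
    agent_latencies_ms: list[float]
    handoffs: int
    handoff_errors: int


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math for the rank."""
    if not values:
        raise ValueError("p95 of an empty list")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without float error
    return ordered[rank - 1]


def core_metrics(runs: list[RunRecord]) -> dict[str, float]:
    if not runs:
        raise ValueError("no runs to summarize")
    successes = sum(1 for r in runs if r.succeeded)
    handoffs = sum(r.handoffs for r in runs)
    return {
        "task_success_rate": successes / len(runs),
        # Tokens from failed runs count too: they are part of the cost of each success.
        "tokens_per_success": sum(r.tokens for r in runs) / max(successes, 1),
        "p95_agent_latency_ms": p95([ms for r in runs for ms in r.agent_latencies_ms]),
        "handoff_error_rate": sum(r.handoff_errors for r in runs) / max(handoffs, 1),
    }
```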

Add LLM-as-Judge on sampled traces: an evaluator agent scores goal alignment and factual consistency. Run it offline for regression tests rather than inline on every request, because of cost.

python · traced_agent_call

import uuid

from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def traced_agent_call(agent_name: str, task: dict, correlation_id: str | None = None) -> dict:
    # agent_registry: application-level mapping of agent name -> object with .run(task) -> dict
    correlation_id = correlation_id or str(uuid.uuid4())
    with tracer.start_as_current_span(f"agent.{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("correlation.id", correlation_id)
        result = agent_registry[agent_name].run(task)
        span.set_attribute("tokens_used", result.get("tokens", 0))
        return result
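traced_agent_call records a correlation.id on every span, which gives a cheap way to choose the offline LLM-as-Judge sample deterministically: hash the ID instead of calling random(), so every service agrees on which traces are in the sample and reruns are stable. A minimal sketch; the 5% default rate and the function names are assumptions.

python · judge sampling
```python
import hashlib


def sampled_for_judge(correlation_id: str, rate: float = 0.05) -> bool:
    """Pick roughly `rate` of traces, deterministically, for offline LLM-as-Judge scoring."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    # 48 bits map exactly onto a float in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate


def build_judge_batch(correlation_ids: list[str], rate: float = 0.05) -> list[str]:
    # The batch feeds an offline evaluator job (e.g. a nightly regression run),
    # never the request path.
    return [cid for cid in correlation_ids if sampled_for_judge(cid, rate)]
```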

Pitfalls: what breaks demo-to-prod migrations

01
Context pollution: Workers return full raw HTML dumps upstream. Truncate, summarize, or store them on a blackboard; pass handles, not payloads. Otherwise Agent A's hallucination becomes a "fact" for B and C, and every call still returns HTTP 200.
02
Runaway loops: Agents re-delegate indefinitely. Enforce MAX_ITERATIONS, per-edge visit counts, and supervisor stop tokens; otherwise a single task's bill can jump from $0.02 to $47.
03
Over-engineering: Fifteen agents for a three-step workflow. The production sweet spot is 3–8 agents unless domains are truly isolated.
04
Demo-to-prod gap: In-memory state, no budgets, no input guardrails. Wrap graphs in ProductionGuardrails before exposing them to customers.
05
Parallel branch sync: Fan-in runs before all branches have finished. In LangGraph, set defer=True on the supervisor node as an explicit synchronization barrier.

python · defer parallel sync

builder.add_node("supervisor", supervisor_node, defer=True)
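Per-edge visit counts from pitfall 02 are easy to bolt onto any router: count each (source, target) handoff and stop the run once one edge fires too often. A hypothetical helper, independent of any framework; call record() wherever your supervisor or routing function picks the next node.

python · per-edge visit counts
```python
from collections import Counter


class EdgeLimitExceeded(RuntimeError):
    """Raised when one handoff edge is traversed more often than allowed."""


class EdgeVisitCounter:
    """Caps how often each (source, target) handoff may fire within one run."""

    def __init__(self, max_visits_per_edge: int = 3):
        self.max_visits = max_visits_per_edge
        self.visits: Counter[tuple[str, str]] = Counter()

    def record(self, source: str, target: str) -> None:
        edge = (source, target)
        self.visits[edge] += 1
        if self.visits[edge] > self.max_visits:
            raise EdgeLimitExceeded(f"{source} -> {target} fired {self.visits[edge]} times")
```

Unlike a global MAX_ITERATIONS, this catches a tight two-agent ping-pong early while still allowing long runs that make progress across many different edges.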

Alert

Expensive mistake: adding agents to fix prompt issues. Tune specialist prompts and handoff schemas before spawning a new node. Treat every handoff as a versioned API, with schema validation and confidence thresholds at the boundaries.
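Treating a handoff as a versioned API can be as small as a validation function at the receiving boundary. A sketch under stated assumptions: the field names, the accepted schema versions, and the 0.7 confidence threshold are all hypothetical and would be set per boundary.

python · handoff contract check
```python
from dataclasses import dataclass
from typing import Any

SUPPORTED_VERSIONS = frozenset({"1.0", "1.1"})  # hypothetical accepted schema versions
MIN_CONFIDENCE = 0.7  # hypothetical threshold; tune per boundary
REQUIRED_FIELDS = frozenset({"schema_version", "sender", "task", "confidence", "payload_ref"})


class HandoffRejected(ValueError):
    """Raised when an incoming handoff violates the contract."""


@dataclass(frozen=True)
class Handoff:
    schema_version: str
    sender: str
    task: str
    confidence: float
    payload_ref: str  # a handle into shared storage, not the raw payload


def validate_handoff(raw: dict[str, Any]) -> Handoff:
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise HandoffRejected(f"missing fields: {sorted(missing)}")
    if raw["schema_version"] not in SUPPORTED_VERSIONS:
        raise HandoffRejected(f"unsupported schema_version {raw['schema_version']!r}")
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise HandoffRejected("confidence must be between 0 and 1")
    if confidence < MIN_CONFIDENCE:
        # Low-confidence output goes back for review instead of becoming a downstream "fact".
        raise HandoffRejected(f"confidence {confidence:.2f} below {MIN_CONFIDENCE}")
    return Handoff(raw["schema_version"], raw["sender"], raw["task"], confidence, raw["payload_ref"])
```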

Decision framework: topology selection tree

AdaptOrch showed that topology choice matters more than a model swap. Walk through the decision tree before committing to a framework; it lowers the risk of over-engineering and of locking into the wrong pattern.

?
Strict sequential dependencies between steps? Yes → are any subtasks concurrent? No → Sequential Pipeline. Yes → Hybrid: Pipeline + Fan-out.
?
Does one agent hold decision authority? Yes → does scale require sub-teams? No → Supervisor-Worker. Yes → Hierarchical (Supervisors of Supervisors).
?
Is the task long-running and async (hours to days)? Yes → Blackboard Architecture. No → continue.
?
Agent count ≤ 5 and termination well defined? Yes → Swarm with hard round/time limits. No → refactor into Hierarchical.
?
Need crash-safe resume? Yes → LangGraph + PostgresSaver. No → CrewAI rapid path.
?
Cross-team agent discovery? Yes → Agent Cards + A2A. Tools only → MCP per agent.
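The tree reads top to bottom, and it can be encoded as a checklist function for design reviews. This sketch assumes first-match semantics (a "No" on the first two questions falls through to the next one, which the tree implies but does not state); the Workload field names are hypothetical.

python · topology decision tree
```python
from dataclasses import dataclass


@dataclass
class Workload:
    strict_sequential: bool
    concurrent_subtasks: bool
    single_decision_authority: bool
    needs_sub_teams: bool
    long_running_async: bool
    agent_count: int
    termination_well_defined: bool


def choose_topology(w: Workload) -> str:
    """Walk the topology questions in order; the first matching rule wins."""
    if w.strict_sequential:
        return "Hybrid: Pipeline + Fan-out" if w.concurrent_subtasks else "Sequential Pipeline"
    if w.single_decision_authority:
        if w.needs_sub_teams:
            return "Hierarchical (Supervisors of Supervisors)"
        return "Supervisor-Worker"
    if w.long_running_async:
        return "Blackboard"
    if w.agent_count <= 5 and w.termination_well_defined:
        return "Swarm (hard round/time limits)"
    return "Hierarchical"


def choose_stack(crash_safe_resume: bool, cross_team_discovery: bool) -> tuple[str, str]:
    # The last two questions pick the framework and the protocol layer independently.
    framework = "LangGraph + PostgresSaver" if crash_safe_resume else "CrewAI"
    protocol = "Agent Cards + A2A" if cross_team_discovery else "MCP per agent"
    return framework, protocol
```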

Start with a sequential pipeline. Add agents only on measurable evidence: context overflow, concurrency requirements, or a sub-agent that needs to be upgraded independently. Discipline beats creativity in production graphs.

Summary, hard data, and 2026 trends

Five key takeaways

1. Orchestration topology explains 12–23% more outcome variance than model swaps: design first.
2. Six patterns cover most production graphs; hybrids are the norm, not a smell.
3. MCP (vertical) + A2A (horizontal) is the emerging standard protocol stack under Agentic AI Foundation (AAIF) governance.
4. MAST: 41.77% of failures come from system design; the 49-point gap between "agents in prod" (57%) and "observability done" (8%) is where $47K bills happen.
5. Cap agents at 3–8, cap iterations, cap tokens: guardrails beat bigger prompts.

Citable hard data

Agent Bake-Off: Multi-agent teams took 10 min vs 60 min (6x) on an internal Google benchmark.
AdaptOrch (arXiv 2602.16873): Topology drives 12–23% more variance than LLM selection.
MAST (1,642 traces): 41.77% system design, 36.94% misalignment, 21.30% verification gaps.
Observability gap: 57% of orgs run agents in prod; 8% have finished observability.

2026 trends

Federated orchestration: Sub-orchestrators owned by different teams share learned routing policies, i.e. federated learning for control logic.
Multimodal workers: Vision and audio specialists plugged into existing supervisor graphs.
Adaptive topology: Runtime rewiring of fan-out width based on task characteristics (the AdaptOrch direction).
EU AI Act compliance: Complete decision audit trails, HITL evidence, and risk-tiered tool access are hard requirements, not nice-to-haves.

Laptop-hosted agents go to sleep when the lid closes, lack process supervision for long checkpointed LangGraph runs, and struggle with macOS-native toolchains (Xcode, Keychain). A plain Linux VPS suits stateless API workers, not iOS build farms. Teams running multi-agent graphs 24/7 alongside MCP tool servers need predictable uptime. VpsMesh Mac Mini M4 cloud rental bundles uptime, remote KVM, and predictable monthly OpEx; a one-month pilot lets you validate checkpoint latency and token burn before committing to an orchestration stack.

FAQ

Three questions before moving to multi-agent

How many agents does a production multi-agent system need?

Most production systems land between 3 and 8 specialized agents. Below three, orchestration overhead is rarely justified; above eight is over-engineering unless there are clear domain boundaries and per-agent observability. Start with a supervisor plus two workers, measure tokens_per_success, and split only when one agent's context consistently overflows.

How does MCP differ from A2A in a multi-agent stack?

MCP is the vertical layer: each agent connects to tools and data via tools/list and JSON Schema descriptors. A2A is the horizontal layer: agents discover peers through Agent Cards and delegate subtasks. MCP lives inside each agent; A2A runs between agents. Together they form a two-layer protocol stack, an analogue of TCP/HTTP for the agent internet. For the vertical layer, see our MCP guide; for delegation patterns, see the MCP + A2A section of this guide.

Do you need a Mac Mini for 24/7 multi-agent systems?

Not always. Stateless LangGraph workers and remote MCP over HTTP+SSE run fine on Linux cloud VMs. When agents depend on macOS toolchains, Xcode builds, or Keychain secrets, or need uninterrupted checkpoint sessions, a rented Mac Mini M4 gives predictable uptime with less friction than laptop sleep cycles. Start with a one-month trial to validate checkpoint latency and token burn.