The Definitive Guide to Agentic Design Patterns in 2026


Agentic design patterns have moved from research curiosity to production necessity. This guide covers six core design patterns, provides working code for each, and offers a decision framework for choosing the right pattern for a given problem.
Table of Contents
- Why 2026 Is the Year of the Agent
- From Prompt Engineering to Flow Engineering: What Actually Changed
- The Core Agentic Design Patterns: A Taxonomy
- Prerequisites
- Pattern 1: Reflection (Self-Critique Loops)
- Pattern 2: Tool Use (Grounding Agents in the Real World)
- Pattern 3: Planning (Decompose, Then Execute)
- Pattern 4: Multi-Agent Collaboration
- Pattern 5: Orchestrator-Worker (Dynamic Task Decomposition)
- Pattern 6: Evaluator-Optimizer (Test-Driven Agent Development)
- Interactive Decision Tree: Which Agent Pattern Do You Need?
- Production Considerations for Agentic Systems
- What's Coming Next: The 2026 to 2027 Horizon
- Think in Flows, Not Prompts
Why 2026 Is the Year of the Agent
Agentic design patterns are no longer a research curiosity; they are a production necessity. Chain-of-thought prompting gave way to ReAct-style interleaved reasoning, which researchers and framework authors then extended toward fully autonomous agent loops capable of planning, executing, reflecting, and recovering without human intervention at every step. In 2026, the frameworks underpinning these systems, most notably LangGraph and LangGraph.js, have reached stable semver releases and handle production workloads across teams running dozens of concurrent agent instances.
This guide is for experienced developers who are building or evaluating agentic systems. It covers six core design patterns, provides working code for each, and offers a decision framework for choosing the right pattern for a given problem. The central thesis is straightforward: mastering a handful of composable design patterns matters far more than mastering any single framework. Frameworks change. Patterns persist.
Mastering a handful of composable design patterns matters far more than mastering any single framework. Frameworks change. Patterns persist.
From Prompt Engineering to Flow Engineering: What Actually Changed
Why Prompt Engineering Hit Its Ceiling
Single-turn prompt optimization is inherently fragile. A carefully crafted prompt can produce excellent results on one input distribution and fail catastrophically on another. Prompts lack state, offer no mechanism for error recovery, and cannot adapt their behavior based on intermediate results. Enterprise teams deploying AI between 2024 and 2025 saw the same outcome repeatedly: accuracy plateaued even as prompt length doubled, and teams spent weeks tuning instructions that remained brittle in production.
The fundamental limitation is architectural. Optimizing the content of an LLM call is useful but insufficient when the real challenge is deciding what calls to make, in what order, with what data, and what to do when things go wrong.
What Flow Engineering Actually Means
Flow engineering is the discipline of designing the control flow, state transitions, and decision boundaries around LLM calls rather than optimizing the calls themselves. It treats agent construction as a software architecture problem. The questions shift from "How do I phrase this prompt?" to "What is the state machine governing this agent's behavior?" and "Where are the decision points, fallback paths, and termination conditions?"
Frameworks like LangGraph encode flow engineering as first-class abstractions. Developers define agents as directed graphs with typed state, conditional edges, checkpointing for persistence and resumability, and explicit interrupt points for human oversight. This is not prompt craft. This is software engineering applied to stochastic computation.
Implications for Teams and Roles
The emergence of "agent architect" as a distinct role reflects this shift. The skill set required combines traditional software engineering fundamentals, including state management, error handling, concurrency control, and observability, with an understanding of LLM capabilities and limitations. Prompt tricks still matter, but flow design has overtaken them as the highest-leverage work.
The Core Agentic Design Patterns: A Taxonomy
This guide covers six canonical patterns. They are not mutually exclusive; production systems routinely compose two or three together. Each pattern addresses a different aspect of agent behavior, from self-correction to multi-agent coordination.
| Pattern | Complexity | Primary Use Case | Key Risk |
|---|---|---|---|
| Reflection | Low | Self-correction | Infinite loops |
| Tool Use | Low to Medium | External integration | Tool misuse |
| Planning | Medium | Multi-step tasks | Plan drift |
| Multi-Agent | High | Complex workflows | Coordination overhead |
| Orchestrator-Worker | High | Dynamic subtasking | Bottleneck at orchestrator |
| Evaluator-Optimizer | Medium to High | Quality-critical output | Cost amplification |
These patterns layer naturally. You can add reflection to any agent, and tool use appears in almost every production system. Planning and orchestration provide higher-level structure, while evaluation ensures output quality.
Prerequisites
All Python examples below assume the following:
```bash
pip install langgraph langchain-openai langchain-core
export OPENAI_API_KEY="sk-..."
```
For TypeScript examples:
```bash
npm install @langchain/langgraph @langchain/openai @langchain/core zod
```
Ensure "type": "module" is set in your package.json if using top-level await in TypeScript/Node.js, or wrap invocation code in an async IIFE.
All examples use gpt-4o as the model; any sufficiently capable chat model may be substituted.
The ChatOpenAI client is instantiated at module level in the examples below to benefit from connection pooling and avoid unnecessary overhead in loops. This is the recommended practice for production deployments.
Pattern 1: Reflection (Self-Critique Loops)
How It Works
The reflection pattern is the simplest agentic loop: generate output, evaluate that output against criteria, and either accept or revise. The agent becomes its own reviewer. This pattern typically produces passing output within two to three iterations rather than requiring manual revision, making it well suited for code generation, long-form writing, and structured data extraction where quality can be assessed programmatically or by the LLM itself.
Implementation with LangGraph (Python)
The following LangGraph graph implements a reflection loop with a generate node, a critique node, and a conditional edge that either loops back for revision or exits based on a quality score.
```python
import os
import re
import logging
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

if not os.environ.get("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY must be set before importing this module.")

_llm = ChatOpenAI(model="gpt-4o", timeout=30)


def extract_score(text: str) -> int:
    """
    Match 'score: N', 'rating: N', or 'N/10' before
    falling back to first standalone integer.
    """
    patterns = [
        r'(?:score|rating)[:\s]+([1-9]|10)\b',
        r'\b([1-9]|10)\s*/\s*10\b',
        r'\b([1-9]|10)\b',
    ]
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return int(match.group(1))
    logging.warning("Score extraction failed; defaulting to 0. Response: %s", text[:100])
    return 0


class ReflectionState(TypedDict):
    task: str
    draft: str
    critique: str
    score: int
    iteration: int


def generate(state: ReflectionState) -> dict:
    feedback = f"\nFeedback: {state['critique']}" if state["critique"] else ""
    msg = _llm.invoke(f"Write a response for: {state['task']}{feedback}")
    return {"draft": msg.content, "iteration": state["iteration"] + 1}


def critique(state: ReflectionState) -> dict:
    msg = _llm.invoke(
        f"Score this draft 1-10 and provide critique.\nDraft: {state['draft']}"
    )
    score = extract_score(msg.content)
    return {"critique": msg.content, "score": score}


def should_continue(state: ReflectionState) -> str:
    if state["score"] >= 8 or state["iteration"] >= 3:
        return END
    return "generate"


graph = StateGraph(ReflectionState)
graph.add_node("generate", generate)
graph.add_node("critique", critique)
graph.set_entry_point("generate")
graph.add_edge("generate", "critique")
graph.add_conditional_edges("critique", should_continue, {END: END, "generate": "generate"})
app = graph.compile()

result = app.invoke({
    "task": "Write a summary of the benefits of containerization",
    "draft": "",
    "critique": "",
    "score": 0,
    "iteration": 0
})
```
This demonstrates typed state via TypedDict, conditional edges for loop control, and explicit termination criteria (score threshold and maximum iteration cap). The extract_score function uses a prioritized sequence of regex patterns to avoid false positives from ordinal numbers or other incidental digits in the LLM response.
Pitfalls and Guardrails
Without a maximum iteration cap, reflection loops can cycle indefinitely, burning tokens without improving output. In the author's experience, returns diminish after two to three iterations for code generation and summarization tasks, though analytical or research-heavy domains sometimes benefit from a fourth pass. Every loop iteration multiplies cost linearly. Teams should implement token budget tracking per reflection cycle and set hard caps on iteration count.
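One way to enforce those guardrails is a small budget object checked at the top of each reflection cycle. The sketch below is illustrative: the class name, thresholds, and token counts are assumptions, not part of any framework API.

```python
# Illustrative per-loop budget guard for a reflection cycle. Thresholds and
# token counts here are placeholder values, not framework defaults.
class TokenBudget:
    def __init__(self, max_tokens: int, max_iterations: int):
        self.max_tokens = max_tokens
        self.max_iterations = max_iterations
        self.spent = 0
        self.iterations = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Call once per generate/critique cycle with the usage the API reported.
        self.spent += prompt_tokens + completion_tokens
        self.iterations += 1

    def exhausted(self) -> bool:
        # Stop when either the token budget or the iteration cap is hit.
        return self.spent >= self.max_tokens or self.iterations >= self.max_iterations


budget = TokenBudget(max_tokens=20_000, max_iterations=3)
budget.record(prompt_tokens=1_200, completion_tokens=800)
print(budget.exhausted())  # → False (one cycle, well under budget)
```

In a LangGraph loop, `exhausted()` would be one more condition inside `should_continue`, alongside the score threshold.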
Pattern 2: Tool Use (Grounding Agents in the Real World)
Anatomy of a Tool-Using Agent
Tool use follows a four-phase cycle: define available tools with structured schemas, let the LLM select and parameterize a tool call, invoke the tool, and integrate results back into the conversation. The 2026 state of the art includes structured tool calling with schema validation across OpenAI, Anthropic, and capable open-source models. This means the LLM returns structured JSON matching a defined schema rather than free-text instructions that must be parsed.
Building a Tool-Augmented Agent in TypeScript
The following TypeScript example uses LangGraph.js to define two tools, bind them to an LLM, and execute a tool-use loop with automatic retry on malformed calls.
```typescript
import { StateGraph, END, Annotation } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { AIMessage, BaseMessage } from "@langchain/core/messages";

const webSearch = tool(async ({ query }: { query: string }) => {
  return `Search results for: ${query} — [simulated results]`;
}, {
  name: "web_search",
  description: "Search the web",
  schema: z.object({ query: z.string() })
});

// ⚠️ SECURITY: This tool enforces a SELECT-only allowlist. Never pass LLM-generated SQL
// directly to a real database without parameterized queries, a read-only DB user, and
// an allowlist of permitted query patterns.
const ALLOWED_SQL_PATTERN = /^SELECT\s+[\w\s,.*()]+\s+FROM\s+\w+(\s+WHERE\s+[\w\s=<>!'"]+)?;?$/i;

const dbQuery = tool(async ({ sql }: { sql: string }) => {
  if (!ALLOWED_SQL_PATTERN.test(sql.trim())) {
    return "Error: Only simple SELECT queries are permitted.";
  }
  // In production, pass sql to a parameterized query layer, not string interpolation.
  // e.g., db.query("SELECT * FROM table WHERE id = $1", [sanitizedId])
  return `DB result for: ${sql} — [simulated rows]`;
}, {
  name: "db_query",
  description: "Query the database (SELECT only)",
  schema: z.object({ sql: z.string().max(500) })
});

const tools = [webSearch, dbQuery];
const model = new ChatOpenAI({
  model: "gpt-4o",
  timeout: 30_000,
}).bindTools(tools);
const toolNode = new ToolNode(tools);

const AgentAnnotation = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (a, b) => a.concat(b),
    default: () => [],
  }),
});

async function agent(state: typeof AgentAnnotation.State) {
  const response = await model.invoke(state.messages);
  return { messages: [response] };
}

function shouldUseTool(state: typeof AgentAnnotation.State): string {
  if (!state.messages || state.messages.length === 0) {
    return END;
  }
  const last = state.messages[state.messages.length - 1];
  // Only AIMessages carry tool_calls; route to the tool node when any are present.
  if (last instanceof AIMessage && (last.tool_calls?.length ?? 0) > 0) {
    return "tools";
  }
  return END;
}

const graph = new StateGraph(AgentAnnotation)
  .addNode("agent", agent)
  .addNode("tools", toolNode)
  .setEntryPoint("agent")
  .addConditionalEdges("agent", shouldUseTool, { tools: "tools", [END]: END })
  .addEdge("tools", "agent");

const app = graph.compile();

(async () => {
  const result = await app.invoke({
    messages: [{
      role: "user",
      content: "Find recent trends in AI agents and check our database for related projects"
    }]
  });
  console.log(result);
})();
```
This demonstrates tool schema definition using Zod, binding tools to the LLM via bindTools, and the standard agent-tool loop pattern with conditional routing. The dbQuery tool enforces a SELECT-only allowlist to prevent SQL injection from LLM-generated queries. The shouldUseTool function guards against empty message arrays to prevent runtime crashes.
Tool Selection at Scale
When an agent has access to 50 or more tools, passing all schemas in every request becomes impractical due to context window limits. Anecdotally, selection accuracy degrades noticeably past this threshold as the model struggles to distinguish between similar tool descriptions. You address this by embedding tool descriptions, retrieving the top-k relevant tools based on the current query, and presenting only those to the LLM. Dynamic tool loading, where tools register and deregister based on task context, further reduces noise and improves selection precision.
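The retrieval step itself is ordinary nearest-neighbor search over tool descriptions. The sketch below uses a bag-of-words cosine as a stand-in for a real embedding model, and the tool registry contents are invented for illustration; in production you would embed each description once with your embedding provider and store the vectors alongside the registry.

```python
import math
from collections import Counter

# Hypothetical tool registry: name -> description. In production, embed each
# description once and cache the vectors.
TOOLS = {
    "web_search": "Search the web for recent articles and news",
    "db_query": "Query the internal projects database with SQL",
    "send_email": "Send an email notification to a user",
}

def _vec(text: str) -> Counter:
    # Stand-in for a real embedding: token counts.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_tools(query: str, k: int = 2) -> list[str]:
    # Rank tools by similarity to the query; present only the top k to the LLM.
    qv = _vec(query)
    ranked = sorted(TOOLS, key=lambda name: _cosine(qv, _vec(TOOLS[name])), reverse=True)
    return ranked[:k]

print(top_k_tools("find recent news articles about AI agents"))
```

Only the returned subset's schemas then go into the model's tool-binding call, keeping the context window small regardless of registry size.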
Pattern 3: Planning (Decompose, Then Execute)
Plan-and-Execute vs. ReAct
ReAct interleaves reasoning and action at each step: think, act, observe, repeat. Plan-and-execute separates the phases: first generate a complete plan, then execute steps sequentially, with replanning triggered only on failure. ReAct excels at exploratory tasks where the next step depends on observations. Plan-and-execute is superior for well-defined multi-step tasks where upfront decomposition reduces wasted LLM calls.
Implementing Plan-and-Execute with LangGraph
```python
import os
import re
from typing import TypedDict, List, Optional

from pydantic import BaseModel
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

if not os.environ.get("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY must be set before importing this module.")

_llm = ChatOpenAI(model="gpt-4o", timeout=30)


class Step(BaseModel):
    description: str
    status: str = "pending"
    result: Optional[str] = None


class PlanState(TypedDict):
    objective: str
    steps: List[dict]
    current_index: int
    final_result: str


def planner(state: PlanState) -> dict:
    msg = _llm.invoke(
        f"Break this objective into 3-5 concrete steps. "
        f"Return ONLY a numbered list, one step per line, no preamble.\n"
        f"Objective: {state['objective']}"
    )
    steps = [
        {
            "description": re.sub(r'^\d+[\.\)]\s*', '', line.strip()),
            "status": "pending",
            "result": None,
        }
        for line in msg.content.strip().split("\n")
        if re.match(r'^\d+[\.\)]', line.strip())
    ]
    if not steps:
        raise ValueError(f"Planner returned no parseable steps: {msg.content[:200]}")
    return {"steps": steps, "current_index": 0}


def executor(state: PlanState) -> dict:
    idx = state["current_index"]
    step = state["steps"][idx]
    context = "\n".join(s["result"] for s in state["steps"][:idx] if s["result"])
    steps = list(state["steps"])
    try:
        msg = _llm.invoke(f"Execute this step: {step['description']}\nPrior context: {context}")
        steps[idx] = {**step, "status": "done", "result": msg.content}
    except Exception as e:
        steps[idx] = {**step, "status": "failed", "result": str(e)}
    return {"steps": steps, "current_index": idx + 1}


def replanner(state: PlanState) -> dict:
    completed = [s for s in state["steps"] if s["status"] == "done"]
    formatted = "\n".join(
        f"- {s['description']}: {s['result'][:100]}" for s in completed
    )
    msg = _llm.invoke(
        f"Revise remaining steps given progress:\n{formatted}\n"
        f"Objective: {state['objective']}"
    )
    new_steps = [
        {
            "description": re.sub(r'^\d+[\.\)]\s*', '', l.strip()),
            "status": "pending",
            "result": None,
        }
        for l in msg.content.strip().split("\n")
        if l.strip() and re.match(r'^\d+[\.\)]', l.strip())
    ]
    all_steps = completed + new_steps
    return {
        "steps": all_steps,
        "current_index": len(completed),
    }


def route_after_execute(state: PlanState) -> str:
    idx = state["current_index"]
    if idx >= len(state["steps"]):
        return "aggregator"
    last_step = state["steps"][idx - 1]
    if last_step.get("status") == "failed":
        return "replanner"
    return "executor"


def aggregator(state: PlanState) -> dict:
    """Synthesizes all completed step results into final_result."""
    context = "\n".join(
        f"Step {i+1}: {s['result']}"
        for i, s in enumerate(state["steps"])
        if s["status"] == "done" and s["result"]
    )
    msg = _llm.invoke(
        f"Synthesize these results for objective '{state['objective']}':\n{context}"
    )
    return {"final_result": msg.content}


graph = StateGraph(PlanState)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_node("replanner", replanner)
graph.add_node("aggregator", aggregator)
graph.set_entry_point("planner")
graph.add_edge("planner", "executor")
graph.add_conditional_edges("executor", route_after_execute,
    {"aggregator": "aggregator", "replanner": "replanner", "executor": "executor"})
graph.add_edge("replanner", "executor")
graph.add_edge("aggregator", END)
app = graph.compile()
```
This demonstrates structured plan output with validation (only numbered lines are accepted as steps), sequential step execution with prior context accumulation, dynamic replanning on failure with correct index reset to the first new pending step, and a final aggregation node that populates final_result. The executor wraps the LLM call in a try/except block so that failures set the step status to "failed", enabling the replanner branch to activate.
Handling Plan Drift and Recovery
Plans degrade as execution context diverges from initial assumptions. Effective strategies include replanning thresholds (trigger replanning after N consecutive step failures or when step output contradicts plan assumptions), human-in-the-loop checkpoints at critical plan stages using LangGraph's interrupt mechanism, and plan versioning to enable rollback to earlier plan states. What does "contradicts plan assumptions" look like in practice? Usually, the executor's output references entities or constraints the planner never accounted for, a signal you can detect by comparing step output against the original objective with a lightweight LLM check.
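Before spending an LLM call on that check, a cheap lexical pre-filter can catch the obvious cases. The heuristic below is deliberately crude and purely illustrative: it flags a step whose output shares almost no content words with the objective, and anything it flags would then go to the lightweight LLM check described above. The stopword list and threshold are assumptions.

```python
# Crude, illustrative drift pre-filter: flag a step output that shares almost
# no content words with the objective. Threshold and stopwords are placeholders.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "with"}

def content_words(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def looks_like_drift(objective: str, step_output: str, threshold: float = 0.1) -> bool:
    obj = content_words(objective)
    out = content_words(step_output)
    if not obj or not out:
        return True
    # Fraction of the objective's vocabulary that the step output touches.
    overlap = len(obj & out) / len(obj)
    return overlap < threshold

print(looks_like_drift(
    "Summarize the benefits of containerization for deployment",
    "Containerization improves deployment consistency and resource isolation.",
))  # → False (substantial overlap with the objective)
```

A `True` result would route the graph to the replanner (or to a human checkpoint) instead of the next executor step.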
Pattern 4: Multi-Agent Collaboration
Topologies: Peer-to-Peer, Hierarchical, and Debate
Three topologies dominate. Peer-to-peer agents share state and contribute to a common output, a natural fit for collaborative writing or pair programming scenarios. Unlike peer-to-peer, hierarchical delegation introduces a manager agent that assigns tasks to specialists, trading flexibility for tighter control over subtask allocation. Adversarial debate takes a different approach entirely: it pits agents against each other to stress-test reasoning, which tends to surface blind spots that collaborative topologies miss.
Multi-Agent Chat with LangGraph
```python
import os
import operator
from typing import TypedDict, List, Annotated

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

if not os.environ.get("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY must be set before importing this module.")

_llm = ChatOpenAI(model="gpt-4o", timeout=30)


class MultiAgentState(TypedDict):
    messages: Annotated[List[dict], operator.add]
    turn_count: int
    consensus: bool


def researcher(state: MultiAgentState) -> dict:
    history = "\n".join(f"{m['role']}: {m['content']}" for m in state["messages"])
    msg = _llm.invoke(f"You are a Researcher. Provide analysis.\nHistory:\n{history}")
    return {
        "messages": [{"role": "researcher", "content": msg.content}],
        "turn_count": state["turn_count"] + 1
    }


def critic(state: MultiAgentState) -> dict:
    history = "\n".join(f"{m['role']}: {m['content']}" for m in state["messages"])
    msg = _llm.invoke(f"You are a Critic. Challenge weaknesses and suggest improvements.\nHistory:\n{history}")
    return {
        "messages": [{"role": "critic", "content": msg.content}],
        "turn_count": state["turn_count"] + 1
    }


def supervisor(state: MultiAgentState) -> dict:
    history = "\n".join(f"{m['role']}: {m['content']}" for m in state["messages"][-4:])
    msg = _llm.invoke(f"Have these agents reached consensus? Reply YES or NO.\n{history}")
    consensus = "YES" in msg.content.upper()
    return {"consensus": consensus}


def route_supervisor(state: MultiAgentState) -> str:
    if state["consensus"] or state["turn_count"] >= 6:
        return END
    return "researcher"


graph = StateGraph(MultiAgentState)
graph.add_node("researcher", researcher)
graph.add_node("critic", critic)
graph.add_node("supervisor", supervisor)
graph.set_entry_point("researcher")
graph.add_edge("researcher", "critic")
graph.add_edge("critic", "supervisor")
graph.add_conditional_edges("supervisor", route_supervisor, {END: END, "researcher": "researcher"})
app = graph.compile()

result = app.invoke({
    "messages": [{"role": "user", "content": "Analyze the trade-offs of microservices vs monoliths"}],
    "turn_count": 0,
    "consensus": False
})
```
This demonstrates multiple agent nodes with distinct roles, message passing via shared annotated state using operator.add for accumulation, role-attributed history so agents can distinguish their own prior output from other agents' output, and a supervisor-driven termination condition.
Coordination Challenges
State consistency is the primary challenge in multi-agent systems. Message ordering must be deterministic; without it, agents can diverge or deadlock. A single typed state object, as LangGraph implements, eliminates message-ordering races within a single process and gives you stronger consistency than message-passing architectures for most production use cases. Deadlock prevention requires explicit turn limits and timeout mechanisms. LangGraph's sequential node execution within a single graph guarantees ordering; true concurrency across separate processes would require additional synchronization.
Pattern 5: Orchestrator-Worker (Dynamic Task Decomposition)
How It Differs from Static Planning
Where the planning pattern creates a fixed plan upfront, the orchestrator-worker pattern dynamically spawns and delegates subtasks based on evolving context. This addresses the same autonomy goals as early autonomous agent experiments, which teams now implement reliably using LangGraph's typed state and Send API for dynamic graph branching at runtime.
Implementation Sketch
```python
import os
import operator
from typing import TypedDict, List, Annotated

from langgraph.graph import StateGraph, END
from langgraph.types import Send
from langchain_openai import ChatOpenAI

if not os.environ.get("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY must be set before importing this module.")

_llm = ChatOpenAI(model="gpt-4o", timeout=30)


class OrchestratorState(TypedDict):
    request: str
    subtasks: List[str]
    results: Annotated[List[str], operator.add]


class WorkerState(TypedDict):
    subtask: str


def orchestrator(state: OrchestratorState) -> dict:
    msg = _llm.invoke(f"Decompose into parallel subtasks (one per line):\n{state['request']}")
    subtasks = [line.strip() for line in msg.content.strip().split("\n") if line.strip()]
    if not subtasks:
        return {"subtasks": [], "results": ["No subtasks generated"]}
    return {"subtasks": subtasks}


def dispatch(state: OrchestratorState) -> list:
    """Returns a list of Send objects for LangGraph's fan-out mechanism."""
    if not state["subtasks"]:
        return [Send("aggregator", state)]
    return [Send("worker", {"subtask": task}) for task in state["subtasks"]]


def worker(state: WorkerState) -> dict:
    msg = _llm.invoke(f"Complete this subtask thoroughly:\n{state['subtask']}")
    return {"results": [msg.content]}


def aggregator(state: OrchestratorState) -> dict:
    combined = "\n---\n".join(state["results"])
    msg = _llm.invoke(f"Synthesize these results into a coherent response:\n{combined}")
    return {"results": [msg.content]}


graph = StateGraph(OrchestratorState)
graph.add_node("orchestrator", orchestrator)
graph.add_node("worker", worker)
graph.add_node("aggregator", aggregator)
graph.set_entry_point("orchestrator")
# Send-based fan-out: dispatch returns Send objects that route to "worker" nodes.
# After all workers complete, execution continues to "aggregator".
graph.add_conditional_edges("orchestrator", dispatch, then="aggregator")
graph.add_edge("aggregator", END)
app = graph.compile()
```
This demonstrates dynamic graph branching with Send, parallel worker execution, and result aggregation. Note that the dispatch function returns a list of Send objects, and the then parameter specifies where execution continues after all spawned workers complete. The orchestrator guards against empty subtask lists to prevent undefined fan-out behavior.
Designing agent control flow is now the highest-leverage skill in AI engineering.
When to Use (and When Not To)
The orchestrator-worker pattern cuts wall-clock time by running N heterogeneous subtasks in parallel rather than sequentially. It is overkill for linear workflows where steps depend on prior outputs. Parallel LLM calls multiply cost; latency is bounded by the slowest parallel worker rather than summed across workers. Before adopting this pattern, ask: do the subtasks actually differ in nature, and does the latency reduction justify paying for concurrent calls?
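The arithmetic behind that trade-off is worth making explicit: fanning out N workers leaves total token cost unchanged, while wall-clock latency drops from the sum of worker latencies to the maximum. The numbers below are illustrative, not measurements.

```python
# Back-of-the-envelope check for the fan-out trade-off. Latencies and costs
# are made-up illustrative values.
worker_latencies_s = [4.0, 6.5, 3.2]      # per-subtask LLM latency, seconds
worker_costs_usd = [0.012, 0.019, 0.008]  # per-subtask token cost

sequential_latency = sum(worker_latencies_s)  # run one after another
parallel_latency = max(worker_latencies_s)    # bounded by the slowest worker
total_cost = sum(worker_costs_usd)            # identical in both modes

print(f"speedup: {sequential_latency / parallel_latency:.1f}x at ${total_cost:.3f}")
# → speedup: 2.1x at $0.039
```

If the subtasks were homogeneous or dependent on one another, the speedup would evaporate while the orchestration overhead remained, which is exactly the "overkill" case described above.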
Pattern 6: Evaluator-Optimizer (Test-Driven Agent Development)
The Pattern
This pattern separates the "doer" agent from the "judge" agent. The evaluator uses rubrics, reference outputs, or an LLM-as-judge approach to score output. The optimizer adjusts strategy, prompt, or parameters based on evaluation feedback. It is the agentic equivalent of test-driven development.
Practical Application
```python
import os
import re
import logging
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

if not os.environ.get("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY must be set before importing this module.")

_llm = ChatOpenAI(model="gpt-4o", timeout=30)


def extract_score(text: str) -> int:
    """
    Match 'score: N', 'rating: N', or 'N/10' before
    falling back to first standalone integer.
    """
    patterns = [
        r'(?:score|rating)[:\s]+([1-9]|10)\b',
        r'\b([1-9]|10)\s*/\s*10\b',
        r'\b([1-9]|10)\b',
    ]
    for pattern in patterns:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            return int(match.group(1))
    logging.warning("Score extraction failed; defaulting to 0. Response: %s", text[:100])
    return 0


class EvalState(TypedDict):
    task: str
    output: str
    score: int
    feedback: str
    iteration: int


def generate(state: EvalState) -> dict:
    context = f"\nFeedback to address: {state['feedback']}" if state["feedback"] else ""
    msg = _llm.invoke(f"{state['task']}{context}")
    return {"output": msg.content, "iteration": state.get("iteration", 0) + 1}


def evaluate(state: EvalState) -> dict:
    msg = _llm.invoke(
        f"Score 1-10 on accuracy, clarity, completeness. Give a single integer score and feedback.\n"
        f"Output: {state['output']}"
    )
    score = extract_score(msg.content)
    return {"score": score, "feedback": msg.content}


def route(state: EvalState) -> str:
    return END if state["score"] >= 8 or state.get("iteration", 0) >= 3 else "generate"


graph = StateGraph(EvalState)
graph.add_node("generate", generate)
graph.add_node("evaluate", evaluate)
graph.set_entry_point("generate")
graph.add_edge("generate", "evaluate")
graph.add_conditional_edges("evaluate", route, {END: END, "generate": "generate"})
app = graph.compile()

result = app.invoke({
    "task": "Explain the benefits of containerization",
    "output": "",
    "score": 0,
    "feedback": "",
    "iteration": 0
})
```
This demonstrates LLM-as-judge with structured scoring using the prioritized extract_score function, conditional routing based on a score threshold, and an iteration cap that prevents infinite loops if the evaluator consistently scores below the threshold.
Interactive Decision Tree: Which Agent Pattern Do You Need?
How to Use the Decision Tree
The following decision logic guides pattern selection:
- Is the task single-step or multi-step? If single-step, consider Tool Use (when external data is required) or Reflection (when the task is self-contained and quality is the bottleneck).
- Does the task require multiple perspectives or roles? If yes, use Multi-Agent collaboration or the debate topology.
- For multi-step tasks, consider whether decomposition is known upfront. If it is, Planning gives you predictable execution. If subtasks emerge dynamically during execution, reach for Orchestrator-Worker.
- Is output quality the primary constraint? If yes, layer in Evaluator-Optimizer.
For non-interactive contexts, a static flowchart following this logic serves as a fallback. Each leaf node maps directly to the corresponding pattern section above.
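The same decision logic can be expressed as a plain function, which is handy for documenting the choice in a design review. The boolean encoding is an illustrative simplification of the questions above, and Evaluator-Optimizer is omitted because it layers on top of whichever base pattern is chosen.

```python
# The decision logic above as a function. The boolean flags are a simplified
# encoding of the questions in the list; this is a discussion aid, not a rule engine.
def pick_pattern(
    multi_step: bool,
    needs_external_data: bool,
    multiple_roles: bool,
    decomposition_known_upfront: bool,
) -> str:
    if multiple_roles:
        return "Multi-Agent"
    if not multi_step:
        return "Tool Use" if needs_external_data else "Reflection"
    return "Planning" if decomposition_known_upfront else "Orchestrator-Worker"


# A multi-step task with emergent subtasks and no role separation:
print(pick_pattern(True, True, False, False))  # → Orchestrator-Worker
```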
Combining Patterns
Real-world systems rarely use a single pattern in isolation. A production research agent might combine Orchestrator-Worker for task decomposition, Reflection within each worker for self-correction, and Tool Use for grounding outputs in external data. Start with the simplest pattern that addresses the core problem, then layer additional patterns only when a specific failure mode demands it. Over-engineering agent architectures introduces coordination complexity that can outweigh the benefits.
Production Considerations for Agentic Systems
Observability and Debugging
Traditional logging fails for non-deterministic, multi-step agent flows because the same input can produce different execution paths. LangSmith (a commercial SaaS product; free tier available) provides trace-level visibility into every LLM call, tool invocation, and state transition within a LangGraph execution. Enabling it transmits LLM inputs and outputs to LangChain servers; review data retention policies before use with sensitive data. Phoenix offers similar capabilities for open-source stacks. Custom tracing that captures full state snapshots at each node is essential for debugging production agent failures.
Cost Control
Every agent loop iteration incurs LLM costs. Token budgets per agent loop, circuit breakers that terminate execution after exceeding cost thresholds, and semantic caching of repeated tool calls or LLM responses are the primary levers. Without these, a reflection loop or multi-agent debate can silently exhaust budgets.
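A circuit breaker of this kind can be a few lines wrapped around every LLM call. The sketch below is a minimal illustration; the class name, pricing constant, and budget are placeholders you would replace with your provider's actual per-token rates.

```python
# Minimal cost circuit breaker: abort the run once accumulated spend exceeds
# a hard budget. Pricing and budget values are placeholders.
class CostCircuitBreaker:
    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_million_tokens: float) -> None:
        # Call after every LLM response with the usage the API reported.
        self.spent_usd += tokens / 1_000_000 * usd_per_million_tokens
        if self.spent_usd > self.max_usd:
            raise RuntimeError(
                f"Agent run aborted: ${self.spent_usd:.4f} exceeds ${self.max_usd:.2f} budget"
            )


breaker = CostCircuitBreaker(max_usd=0.05)
breaker.charge(tokens=8_000, usd_per_million_tokens=2.50)  # fine: $0.02 so far
```

Raising from inside the loop, rather than logging and continuing, is the point: a multi-agent debate that overruns its budget should fail loudly, not silently.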
Human-in-the-Loop Checkpoints
LangGraph's interrupt and resume model enables approval gates at any node in the graph. This matters most for high-stakes workflows where you need to bound agent autonomy. Designing graceful degradation, where an agent can pause, present its current state to a human, and resume from that exact checkpoint, turns brittle autonomous systems into collaborative ones.
Testing Agentic Workflows
The shift from unit testing prompts to integration testing flows is fundamental. Test agent behavior end-to-end with benchmark datasets and regression suites rather than asserting on individual prompt outputs. This means checking final outcomes, execution paths, and resource consumption across representative inputs.
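In test code, that shift looks like asserting on the whole run rather than a single completion. The sketch below stubs the agent entirely (`run_agent` is a hypothetical stand-in for your compiled graph's `invoke` plus trace collection) to show the shape of a flow-level assertion: outcome, execution path, and resource budget.

```python
# Flow-level regression test sketch. `run_agent` is a hypothetical stub standing
# in for app.invoke(...) plus a collected execution trace.
def run_agent(task: str) -> dict:
    return {
        "final_output": f"Summary of: {task}",
        "path": ["planner", "executor", "executor", "aggregator"],
        "total_tokens": 3_450,
    }


def test_research_flow():
    result = run_agent("containerization benefits")
    # Assert on the outcome, not on any intermediate prompt's exact wording.
    assert "containerization" in result["final_output"]
    # Assert on the shape of the execution path.
    assert result["path"][0] == "planner" and result["path"][-1] == "aggregator"
    # Assert on resource consumption.
    assert result["total_tokens"] < 10_000


test_research_flow()
print("flow regression test passed")
```

With a real agent, the only change is that `run_agent` invokes the compiled graph against a benchmark input and reads the path from your tracing layer; the assertions stay the same.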
What's Coming Next: The 2026 to 2027 Horizon
Agent protocol standards are converging faster than expected. OpenAI's Agents SDK, Google's Agent2Agent (A2A) protocol, and Anthropic's Model Context Protocol (MCP) are moving toward interoperable standards. Memory-augmented agents with persistent, queryable long-term memory are moving from experimental to practical. Agent-to-agent marketplaces and composable agent APIs will enable assembling agent capabilities from third-party providers. Open-weight models continue to drive down per-call costs, with per-million-token pricing dropping by roughly an order of magnitude since early 2024, making agentic patterns viable for broader use cases.
Think in Flows, Not Prompts
The six patterns covered here, Reflection, Tool Use, Planning, Multi-Agent Collaboration, Orchestrator-Worker, and Evaluator-Optimizer, form a complete toolkit for building agentic systems. Each addresses a distinct class of problems, and they compose naturally for complex use cases. Designing agent control flow is now the highest-leverage skill in AI engineering. The decision tree above maps any task to the right starting pattern. Pick one, build a prototype, and iterate from there.
Start with the simplest pattern that addresses the core problem, then layer additional patterns only when a specific failure mode demands it.