Terminal-Based Agent Engineering: The 'Claude Code' Workflow


Table of Contents
- Why the Terminal Is the Natural Habitat for AI Agents
- Anatomy of a Terminal-Based Agent Loop
- Defining Your Tool Surface
- Execution, Sandboxing, and the Trust Boundary
- Building a Working CLI Agent, End to End
- Extending the Pattern: Where to Go Next
Why the Terminal Is the Natural Habitat for AI Agents
After a wave of chat UIs, copilot sidebars, and IDE plugins, terminal-based agent engineering is pulling the center of gravity back to the command line. The reason is structural, not nostalgic. Terminals offer what GUI wrappers cannot: unrestricted filesystem access, process spawning, piping between tools, and native composability with the Unix toolchains developers already use. A chat window can suggest a shell command. A terminal agent can execute it, read the output, and decide what to do next.
Anthropic's release of Claude Code makes this architectural shift concrete. Claude Code is not just another chatbot in a terminal emulator. It implements an agentic read-eval-print loop where the model reasons, invokes structured tools (file reads, writes, shell commands), observes results, and continues, all within a persistent CLI session. The pattern it reveals is reproducible. Any developer with access to an LLM API that supports tool-use can engineer this workflow from scratch, without waiting for a vendor's packaged tool.
Anatomy of a Terminal-Based Agent Loop
The Core REPL Pattern
The classical REPL (read-eval-print loop) adapts cleanly to agent orchestration. The cycle becomes: Prompt, LLM Call, Tool Resolution, Execution, Observation, Loop. The user provides input, and the model reasons over it. Instead of returning only text, it emits structured tool_use blocks specifying which tool to call and with what arguments. The agent runtime executes those tools locally, collects the results, and appends them to the conversation as tool_result messages before re-querying the model. This continues until the model responds with plain text rather than another tool invocation.
Tool-Use as the Orchestration Layer
Modern LLM APIs from Anthropic and OpenAI expose structured tool-use (sometimes called function calling) as a first-class response type. This is the critical differentiator between a chatbot that happens to run in a terminal and a genuine agent. Without structured tool-use, the model can only suggest actions as text. With it, the model emits machine-parseable instructions that the runtime can dispatch deterministically.
Here is a minimal Python agent loop skeleton using the Anthropic SDK. This is a structural skeleton only — execute_tool and tools are defined in the complete script in the next section. Do not run this snippet standalone.
import anthropic

client = anthropic.Anthropic()
tools = []  # defined in the complete script below
messages = []
system_prompt = "You are a CLI coding agent. Use tools to help the user."

while True:
    user_input = input("\n> ")
    if user_input.lower() in ("exit", "quit"):
        break
    messages.append({"role": "user", "content": user_input})
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",  # Check https://docs.anthropic.com/en/docs/about-claude/models for the latest identifier
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:
            final_texts = [b for b in response.content if hasattr(b, "text")]
            for block in final_texts:
                print(f"\n{block.text}")
            if not final_texts:
                print("\n(no response)")
            break
        tool_results = []
        for tc in tool_calls:
            result = execute_tool(tc.name, tc.input)  # defined in the complete script
            tool_results.append(
                {"type": "tool_result", "tool_use_id": tc.id, "content": result}
            )
        messages.append({"role": "user", "content": tool_results})
The inner while True loop is key. It allows the model to chain multiple tool calls before producing a final text response, which is exactly how multi-step reasoning works in practice.
Defining Your Tool Surface
Essential Tools for a Code Agent
A minimum viable code agent needs five tools: read_file, write_file, run_shell, list_directory, and search_files. This set covers navigation, inspection, mutation, and arbitrary command execution. Keep the tool surface small for two reasons: safety (fewer capabilities means a smaller blast radius) and token efficiency (each tool schema adds roughly 100-300 tokens to every request, and the model processes every definition on every call).
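That per-schema overhead is easy to sanity-check. The sketch below uses a rough rule of thumb (about four characters per token for English and JSON text; real tokenizer counts vary by model, so treat the result as an estimate only) to approximate what one schema costs on every request:

```python
import json

def rough_token_cost(tool_schema: dict) -> int:
    # Heuristic: roughly 4 characters per token for English/JSON text.
    # Real tokenizer counts vary by model; this is an estimate only.
    return len(json.dumps(tool_schema)) // 4

read_file_schema = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative or absolute file path."}
        },
        "required": ["path"],
    },
}

print(rough_token_cost(read_file_schema))  # a few dozen tokens for a small schema
```

Multiplying that estimate by the number of tools and the number of model calls per session makes the cost of a bloated tool surface concrete.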
Tool Schema Design
You define tools as JSON schemas that the LLM reasons about at inference time. Rich descriptions help the model call tools correctly; overly verbose descriptions waste tokens. The balance is to describe what the tool does, what each parameter means, and any constraints.
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at the given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Relative or absolute file path."}
            },
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Write content to a file, creating or overwriting it.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to write to."},
                "content": {"type": "string", "description": "Full file content."},
            },
            "required": ["path", "content"],
        },
    },
    {
        "name": "run_shell",
        "description": "Execute a shell command and return stdout/stderr.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Shell command to execute."}
            },
            "required": ["command"],
        },
    },
    {
        "name": "list_directory",
        "description": "List files and subdirectories at the given path. Path defaults to current directory if omitted.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory path. Defaults to current directory if omitted."}
            },
            "required": [],
        },
    },
    {
        "name": "search_files",
        "description": "Search for a regex pattern across files in a directory tree.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex pattern to search for."},
                "path": {"type": "string", "description": "Root directory to search."},
            },
            "required": ["pattern"],
        },
    },
]
Note: the "default" field in JSON Schema is advisory only. The Anthropic API does not inject default values into model-generated arguments. Always use args.get('path', '.') in your dispatch code to handle cases where the model omits optional parameters.
Execution, Sandboxing, and the Trust Boundary
Running Shell Commands Safely
The hardest design decision in a terminal agent is the trust boundary for shell execution. Auto-executing every command the model emits is powerful but dangerous. Claude Code addresses this with permission prompts and allowlists for known-safe commands. A practical middle ground for custom agents is a human-in-the-loop confirmation gate: show the command, ask the user to approve, and only then execute. The same principle applies to file writes — any destructive action should require explicit user confirmation.
Beyond the confirmation gate, the implementation below uses shell=False by default via shlex.split(), falling back to shell=True only when the command contains shell metacharacters (pipes, redirects, etc.). This limits the shell interpreter's involvement to cases where it is genuinely needed.
You also need to manage the context window. Long-running sessions accumulate tool outputs that consume the context window rapidly, especially when shell commands dump large amounts of text. Practical strategies include truncating tool output to a fixed character limit, implementing summarization checkpoints where the model condenses the conversation so far, and maintaining a sliding window that always pins the system prompt. The complete script below implements truncation and a sliding-window eviction policy.
import subprocess
import shlex

MAX_OUTPUT = 10000

def run_shell_confirmed(command: str, timeout: int = 30) -> str:
    print(f"\n⚠ Agent wants to run: {command}")
    confirm = input("Allow? [y/N]: ").strip().lower()
    if confirm != "y":
        return "Command rejected by user."
    try:
        # Use shell=False where possible; fall back to shell=True only for
        # pipes, redirects, and other shell metacharacters.
        if any(ch in command for ch in ("|", ">", "<", ";", "&", "$", "`")):
            result = subprocess.run(
                command, shell=True, capture_output=True,
                text=True, timeout=timeout
            )
        else:
            result = subprocess.run(
                shlex.split(command), shell=False, capture_output=True,
                text=True, timeout=timeout
            )
        output = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout} seconds."
    if len(output) > MAX_OUTPUT:
        output = output[:MAX_OUTPUT] + f"\n... [truncated to {MAX_OUTPUT} chars]"
    return output or "(no output)"
⚠ Warning: When shell=True is used (for commands containing pipes, redirects, or other metacharacters), the command string is passed directly to the shell interpreter. The human confirmation gate is the primary safeguard. Always review the full command before approving.
The 30-second timeout prevents runaway processes. Adjust timeout based on expected command duration; 30 seconds is appropriate for interactive one-liners but too short for build or install commands. The truncation cap of 10,000 characters keeps tool results from blowing out the context window while still providing enough output for the model to reason about.
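The summarization-checkpoint strategy mentioned above is not implemented in the complete script below, but it can be sketched as a pure function. Here summarize_fn stands in for a real LLM call that condenses the evicted turns into short text; the function name and signature are illustrative, not part of any SDK:

```python
def checkpoint(messages: list, summarize_fn, keep_last: int = 6) -> list:
    """Fold all but the last keep_last messages into one summary turn.

    summarize_fn is a placeholder for a real LLM call that condenses the
    evicted messages into short text.
    """
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # The Messages API expects the list to start with a user turn, so shift
    # any leading assistant messages into the summarized portion.
    while recent and recent[0]["role"] != "user":
        old.append(recent.pop(0))
    summary = summarize_fn(old)
    return [{"role": "user",
             "content": f"[Summary of earlier conversation]\n{summary}"}] + recent
```

In the agent loop, you would call this whenever estimated token usage crosses a threshold, passing a function that asks the model itself to produce the summary.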
Building a Working CLI Agent, End to End
Project Setup and Dependencies
Requirements are minimal: Python 3.11+, the anthropic SDK (pip install "anthropic>=0.40.0", quoted so the shell does not interpret > as a redirect; check the SDK changelog for breaking changes), and a valid ANTHROPIC_API_KEY set as an environment variable. The script assumes a Unix-like environment (Linux or macOS); Windows users should use WSL or Git Bash, since the search_files tool depends on grep. The entire agent fits in a single file.
Set the following environment variables before running:
export ANTHROPIC_API_KEY="your-key-here"
# Optionally override the model identifier:
export CLAUDE_MODEL="claude-sonnet-4-5"
The Complete Agent Script
import os, sys, shutil, subprocess, shlex, pathlib, logging

import anthropic

logging.basicConfig(level=logging.WARNING)

if not os.environ.get("ANTHROPIC_API_KEY"):
    sys.exit("Error: ANTHROPIC_API_KEY environment variable not set.")

client = anthropic.Anthropic()
MAX_OUTPUT = 10000
MAX_MESSAGES = 40  # sliding window size; tune to model context budget
MODEL = os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-5")
SESSION_ROOT = pathlib.Path(os.getcwd()).resolve()

SYSTEM = """You are a terminal-based coding agent. Use tools to explore and modify
the user's project. Be concise. Ask before making destructive changes."""

tools = [
    {"name": "read_file", "description": "Read a file's contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "run_shell", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "list_directory", "description": "List directory contents. Path defaults to current directory if omitted.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": []}},
    {"name": "search_files", "description": "Regex search across files.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}, "path": {"type": "string"}}, "required": ["pattern"]}},
]

def _safe_path(raw: str) -> pathlib.Path:
    """Resolve a path and ensure it stays within SESSION_ROOT."""
    resolved = (SESSION_ROOT / raw).resolve()
    # Path.is_relative_to avoids the prefix-matching pitfall of str.startswith
    # (e.g. /home/user/project would wrongly admit /home/user/project2).
    if not resolved.is_relative_to(SESSION_ROOT):
        raise PermissionError(
            f"Path '{raw}' escapes session root '{SESSION_ROOT}'"
        )
    return resolved

def execute_tool(name: str, args: dict) -> str:
    try:
        if name == "read_file":
            safe = _safe_path(args["path"])
            with safe.open(encoding="utf-8", errors="replace") as f:
                return f.read()[:MAX_OUTPUT]
        elif name == "write_file":
            safe = _safe_path(args["path"])
            print(f"\n⚠ Write {len(args['content'])} chars to: {safe}")
            if input("Allow? [y/N]: ").strip().lower() != "y":
                return "Rejected by user."
            safe.parent.mkdir(parents=True, exist_ok=True)
            with safe.open("w", encoding="utf-8") as f:
                f.write(args["content"])
            return f"Wrote {len(args['content'])} chars to {safe}"
        elif name == "run_shell":
            command = args["command"]
            print(f"\n⚠ Run: {command}")
            if input("Allow? [y/N]: ").strip().lower() != "y":
                return "Rejected by user."
            # Use shell=False where possible; fall back to shell=True only for
            # pipes, redirects, and other shell metacharacters.
            if any(ch in command for ch in ("|", ">", "<", ";", "&", "$", "`")):
                r = subprocess.run(
                    command, shell=True, capture_output=True,
                    text=True, timeout=30
                )
            else:
                r = subprocess.run(
                    shlex.split(command), shell=False, capture_output=True,
                    text=True, timeout=30
                )
            out = (r.stdout + r.stderr)[:MAX_OUTPUT]
            return out or "(no output)"
        elif name == "list_directory":
            safe = _safe_path(args.get("path", "."))
            return "\n".join(os.listdir(str(safe)))
        elif name == "search_files":
            if not shutil.which("grep"):
                return "Error: grep not found. On Windows, install Git Bash or use WSL."
            search_root = str(_safe_path(args.get("path", ".")))
            r = subprocess.run(
                ["grep", "-rn", "-E", args["pattern"], search_root],
                capture_output=True, text=True, timeout=30
            )
            if r.returncode == 2:
                return f"grep error:\n{r.stderr[:MAX_OUTPUT]}"
            return (r.stdout or "No matches.")[:MAX_OUTPUT]
        else:
            return f"Error: unknown tool '{name}'"
    except subprocess.TimeoutExpired:
        return "Error: command timed out after 30 seconds."
    except PermissionError as e:
        return f"Error: permission denied — {e}"
    except FileNotFoundError as e:
        return f"Error: file not found — {e}"
    except KeyboardInterrupt:
        raise  # Propagate for clean session exit
    except Exception as e:
        logging.exception("Unexpected error in tool '%s' with args %s", name, args)
        return f"Error: unexpected failure — {type(e).__name__}: {e}"

def _evict_messages(messages: list) -> list:
    """
    Retain the last MAX_MESSAGES messages.
    Always ensure the returned list starts with a 'user' role message
    for Anthropic API compliance.
    """
    if len(messages) <= MAX_MESSAGES:
        return messages
    tail = messages[-MAX_MESSAGES:]
    # Ensure tail starts with 'user' role for API compliance
    while tail and tail[0]["role"] != "user":
        tail = tail[1:]
    return tail

def main():
    messages = []
    system = SYSTEM
    memory_file = "AGENT.md"
    if os.path.exists(memory_file):
        with open(memory_file, encoding="utf-8") as f:
            project_context = f.read()
        # Append project context to the system prompt rather than injecting
        # synthetic message turns, which avoids API message-order issues.
        system = SYSTEM + f"\n\n# Project Context (AGENT.md)\n{project_context}"
    print("CLI Agent ready. Type 'exit' to quit.\n")
    while True:
        user_input = input("> ")
        if user_input.lower() in ("exit", "quit"):
            break
        messages.append({"role": "user", "content": user_input})
        while True:
            messages = _evict_messages(messages)
            response = client.messages.create(
                model=MODEL, max_tokens=4096,
                system=system, tools=tools, messages=messages,
            )
            messages.append({"role": "assistant", "content": response.content})
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            if not tool_calls:
                final_texts = [b for b in response.content if hasattr(b, "text")]
                if final_texts:
                    for block in final_texts:
                        print(f"\n{block.text}")
                else:
                    print("\n(no response)")
                break
            results = []
            for tc in tool_calls:
                print(f"\n🔧 {tc.name}({tc.input})")
                result = execute_tool(tc.name, tc.input)
                results.append({"type": "tool_result", "tool_use_id": tc.id, "content": result})
            messages.append({"role": "user", "content": results})

if __name__ == "__main__":
    main()
Running It and What to Expect
A typical session: the user asks the agent to find all TODO comments. The agent calls search_files with pattern TODO, returns matching lines, and summarizes. The user then asks it to refactor a specific function, and the agent chains read_file, reasons about the code, then calls write_file with the updated version, prompting for confirmation before writing. This kind of multi-step tool chaining works well with Claude Sonnet in chains under roughly five tool calls. Where it breaks down: complex multi-step reasoning across many files, and sessions exceeding around 50 tool calls where accumulated tool output approaches the context window limit. The sliding-window eviction policy mitigates this but does not eliminate it entirely for extremely long sessions.
Extending the Pattern: Where to Go Next
Adding Persistent Memory
The script above already demonstrates the simplest form of persistent memory: an AGENT.md file that the script loads at startup and appends to the system prompt. This mirrors Claude Code's CLAUDE.md convention. Teams can store project conventions, architectural decisions, and known constraints in this file so the agent starts every session with relevant context.
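A hypothetical AGENT.md for a small project might look like the following; the structure is purely illustrative, since the script simply appends the file's text to the system prompt:

```markdown
# Project Context

- Python 3.11, pytest for tests (run with: pytest -q)
- Source lives in src/; do not edit generated files in build/
- Prefer small, focused diffs; ask before deleting files
- Known constraint: the public API must stay backwards compatible with v1
```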
Multi-Model and Local LLM Support
Because the agent loop is decoupled from the model provider, swapping the Anthropic client for any OpenAI-compatible API (Ollama, vLLM, llama.cpp's server mode) requires two changes: replace the anthropic.Anthropic() constructor and adapt the response content-block parsing. The trade-off is real, though: smaller open-weight models frequently emit malformed JSON tool calls or reference nonexistent tools. Evaluate any candidate model against your specific tool schemas before production use.
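As a sketch of what the parsing adaptation involves: in the OpenAI Chat Completions format, tool calls arrive on the assistant message with arguments serialized as a JSON string rather than a parsed object. A small helper (the name is hypothetical) can normalize them into the (id, name, args) tuples the loop dispatches:

```python
import json

def parse_openai_tool_calls(assistant_message: dict) -> list:
    """Normalize OpenAI-style tool calls to (call_id, tool_name, args) tuples.

    In the Chat Completions format, function arguments arrive as a JSON
    string and must be parsed before dispatch.
    """
    calls = []
    for tc in assistant_message.get("tool_calls") or []:
        fn = tc["function"]
        calls.append((tc["id"], fn["name"], json.loads(fn["arguments"])))
    return calls

msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "read_file", "arguments": "{\"path\": \"main.py\"}"},
    }],
}
print(parse_openai_tool_calls(msg))
# [('call_1', 'read_file', {'path': 'main.py'})]
```

Note that small models sometimes emit arguments that are not valid JSON, so production code should wrap the json.loads call and return an error string the model can react to.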
MCP and External Tool Servers
The Model Context Protocol (MCP) is an open standard for externalizing tool surfaces, with published SDKs and adoption across multiple LLM providers and tooling ecosystems. Rather than hardcoding tool definitions in the agent script, MCP servers advertise available tools dynamically. This would let the agent above discover and use tools from any MCP-compliant server without code changes.
The terminal agent loop is a primitive, like the Unix pipe. Five tools and a compact Python script produce a working agent that reads, writes, searches, and executes across an entire codebase. Everything beyond that is extension, not reinvention.