Terminal-Based Agent Engineering: The 'Claude Code' Workflow


Table of Contents
- Why the Terminal Is the Natural Habitat for AI Agents
- Anatomy of a Terminal-Based Agent Loop
- Defining Your Tool Surface
- Execution, Sandboxing, and the Trust Boundary
- Building a Working CLI Agent, End to End
- Extending the Pattern: Where to Go Next
Why the Terminal Is the Natural Habitat for AI Agents
After a wave of chat UIs, copilot sidebars, and IDE plugins, terminal-based agent engineering is pulling the center of gravity back to the command line. The reason is structural, not nostalgic. Terminals offer what GUI wrappers cannot: unrestricted filesystem access, process spawning, piping between tools, and native composability with the Unix toolchains developers already use. A chat window can suggest a shell command. A terminal agent can execute it, read the output, and decide what to do next.
Anthropic's release of Claude Code makes this architectural shift concrete. Claude Code is not just another chatbot in a terminal emulator. It implements an agentic read-eval-print loop where the model reasons, invokes structured tools (file reads, writes, shell commands), observes results, and continues, all within a persistent CLI session. The pattern it reveals is reproducible. Any developer with access to an LLM API that supports tool-use can engineer this workflow from scratch, without waiting for a vendor's packaged tool.
Anatomy of a Terminal-Based Agent Loop
The Core REPL Pattern
The classical REPL (read-eval-print loop) adapts cleanly to agent orchestration. The cycle becomes: Prompt, LLM Call, Tool Resolution, Execution, Observation, Loop. The user provides input, and the model reasons over it. Instead of returning only text, it emits structured tool_use blocks specifying which tool to call and with what arguments. The agent runtime executes those tools locally, collects the results, and appends them to the conversation as tool_result messages before re-querying the model. This continues until the model responds with plain text rather than another tool invocation.
Tool-Use as the Orchestration Layer
Modern LLM APIs from Anthropic and OpenAI expose structured tool-use (sometimes called function calling) as a first-class response type. This is the critical differentiator between a chatbot that happens to run in a terminal and a genuine agent. Without structured tool-use, the model can only suggest actions as text. With it, the model emits machine-parseable instructions that the runtime can dispatch deterministically.
Here is a minimal Python agent loop skeleton using the Anthropic SDK. This is a structural skeleton only — execute_tool and tools are defined in the complete script in the next section. Do not run this snippet standalone.
import anthropic

client = anthropic.Anthropic()
tools = []  # defined in the complete script below
messages = []
system_prompt = "You are a CLI coding agent. Use tools to help the user."

while True:
    user_input = input("\n> ")
    if user_input.lower() in ("exit", "quit"):
        break
    messages.append({"role": "user", "content": user_input})
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",  # Check https://docs.anthropic.com/en/docs/about-claude/models for the latest identifier
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:
            final_texts = [b for b in response.content if hasattr(b, "text")]
            for block in final_texts:
                print(f"\n{block.text}")
            if not final_texts:
                print("\n(no response)")
            break
        tool_results = []
        for tc in tool_calls:
            result = execute_tool(tc.name, tc.input)  # defined in the complete script
            tool_results.append(
                {"type": "tool_result", "tool_use_id": tc.id, "content": result}
            )
        messages.append({"role": "user", "content": tool_results})
The inner while True loop is key. It allows the model to chain multiple tool calls before producing a final text response, which is exactly how multi-step reasoning works in practice.
Defining Your Tool Surface
Essential Tools for a Code Agent
A minimum viable code agent needs five tools: read_file, write_file, run_shell, list_directory, and search_files. This set covers navigation, inspection, mutation, and arbitrary command execution. Keep the tool surface small for two reasons: safety (fewer capabilities means a smaller blast radius) and token efficiency (each tool schema adds roughly 100-300 tokens to every request, and the model processes every definition on every call).
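That per-schema overhead is easy to sanity-check. The sketch below uses a rough rule of thumb (about four characters per token for English and JSON text; real tokenizer counts vary by model, so treat the result as an estimate only) to approximate what one schema costs on every request:

```python
import json

def rough_token_cost(tool_schema: dict) -> int:
    # Heuristic: roughly 4 characters per token for English/JSON text.
    # Real tokenizer counts vary by model; this is an estimate only.
    return len(json.dumps(tool_schema)) // 4

read_file_schema = {
    "name": "read_file",
    "description": "Read the contents of a file at the given path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative or absolute file path."}
        },
        "required": ["path"],
    },
}

print(rough_token_cost(read_file_schema))  # a few dozen tokens for a small schema
```

Multiplying that estimate by the number of tools and the number of model calls per session makes the cost of a bloated tool surface concrete.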
Tool Schema Design
You define tools as JSON schemas that the LLM reasons about at inference time. Rich descriptions help the model call tools correctly; overly verbose descriptions waste tokens. The balance is to describe what the tool does, what each parameter means, and any constraints.
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at the given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Relative or absolute file path."}
            },
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Write content to a file, creating or overwriting it.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to write to."},
                "content": {"type": "string", "description": "Full file content."},
            },
            "required": ["path", "content"],
        },
    },
    {
        "name": "run_shell",
        "description": "Execute a shell command and return stdout/stderr.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Shell command to execute."}
            },
            "required": ["command"],
        },
    },
    {
        "name": "list_directory",
        "description": "List files and subdirectories at the given path. Path defaults to current directory if omitted.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory path. Defaults to current directory if omitted."}
            },
            "required": [],
        },
    },
    {
        "name": "search_files",
        "description": "Search for a regex pattern across files in a directory tree.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex pattern to search for."},
                "path": {"type": "string", "description": "Root directory to search."},
            },
            "required": ["pattern"],
        },
    },
]
Note: the "default" field in JSON Schema is advisory only. The Anthropic API does not inject default values into model-generated arguments. Always use args.get('path', '.') in your dispatch code to handle cases where the model omits optional parameters.
Execution, Sandboxing, and the Trust Boundary
Running Shell Commands Safely
The hardest design decision in a terminal agent is the trust boundary for shell execution. Auto-executing every command the model emits is powerful but dangerous. Claude Code addresses this with permission prompts and allowlists for known-safe commands. A practical middle ground for custom agents is a human-in-the-loop confirmation gate: show the command, ask the user to approve, and only then execute. The same principle applies to file writes — any destructive action should require explicit user confirmation.
Beyond the confirmation gate, the implementation below uses shell=False by default via shlex.split(), falling back to shell=True only when the command contains shell metacharacters (pipes, redirects, etc.). This limits the shell interpreter's involvement to cases where it is genuinely needed.
You also need to manage the context window. Long-running sessions accumulate tool outputs that consume the context window rapidly, especially when shell commands dump large amounts of text. Practical strategies include truncating tool output to a fixed character limit, implementing summarization checkpoints where the model condenses the conversation so far, and maintaining a sliding window that always pins the system prompt. The complete script below implements truncation and a sliding-window eviction policy.
import subprocess
import shlex

MAX_OUTPUT = 10000

def run_shell_confirmed(command: str, timeout: int = 30) -> str:
    print(f"\n⚠ Agent wants to run: {command}")
    confirm = input("Allow? [y/N]: ").strip().lower()
    if confirm != "y":
        return "Command rejected by user."
    try:
        # Use shell=False where possible; fall back to shell=True only for
        # pipes, redirects, and other shell metacharacters.
        if any(ch in command for ch in ("|", ">", "<", ";", "&", "$", "`")):
            result = subprocess.run(
                command, shell=True, capture_output=True,
                text=True, timeout=timeout
            )
        else:
            result = subprocess.run(
                shlex.split(command), shell=False, capture_output=True,
                text=True, timeout=timeout
            )
        output = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout} seconds."
    if len(output) > MAX_OUTPUT:
        output = output[:MAX_OUTPUT] + f"\n... [truncated to {MAX_OUTPUT} chars]"
    return output or "(no output)"
⚠ Warning: When shell=True is used (for commands containing pipes, redirects, or other metacharacters), the command string is passed directly to the shell interpreter. The human confirmation gate is the primary safeguard. Always review the full command before approving.
The 30-second timeout prevents runaway processes. Adjust timeout based on expected command duration; 30 seconds is appropriate for interactive one-liners but too short for build or install commands. The truncation cap of 10,000 characters keeps tool results from blowing out the context window while still providing enough output for the model to reason about.
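The summarization-checkpoint strategy mentioned above is not implemented in the complete script below, but it can be sketched as a pure function. Here summarize_fn stands in for a real LLM call that condenses the evicted turns into short text; the function name and signature are illustrative, not part of any SDK:

```python
def checkpoint(messages: list, summarize_fn, keep_last: int = 6) -> list:
    """Fold all but the last keep_last messages into one summary turn.

    summarize_fn is a placeholder for a real LLM call that condenses the
    evicted messages into short text.
    """
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # The Messages API expects the list to start with a user turn, so shift
    # any leading assistant messages into the summarized portion.
    while recent and recent[0]["role"] != "user":
        old.append(recent.pop(0))
    summary = summarize_fn(old)
    return [{"role": "user",
             "content": f"[Summary of earlier conversation]\n{summary}"}] + recent
```

In the agent loop, you would call this whenever estimated token usage crosses a threshold, passing a function that asks the model itself to produce the summary.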
Building a Working CLI Agent, End to End
Project Setup and Dependencies
Requirements are minimal: Python 3.11+, the anthropic SDK (pip install "anthropic>=0.40.0", quoted so the shell does not interpret > as a redirect; check the SDK changelog for breaking changes), and a valid ANTHROPIC_API_KEY set as an environment variable. The script assumes a Unix-like environment (Linux or macOS); Windows users should use WSL or Git Bash, since the search_files tool depends on grep. The entire agent fits in a single file.
Set the following environment variables before running:
export ANTHROPIC_API_KEY="your-key-here"
# Optionally override the model identifier:
export CLAUDE_MODEL="claude-sonnet-4-5"
The Complete Agent Script
import os, sys, shutil, subprocess, shlex, pathlib, logging

import anthropic

logging.basicConfig(level=logging.WARNING)

if not os.environ.get("ANTHROPIC_API_KEY"):
    sys.exit("Error: ANTHROPIC_API_KEY environment variable not set.")

client = anthropic.Anthropic()
MAX_OUTPUT = 10000
MAX_MESSAGES = 40  # sliding window size; tune to model context budget
MODEL = os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-5")
SESSION_ROOT = pathlib.Path(os.getcwd()).resolve()

SYSTEM = """You are a terminal-based coding agent. Use tools to explore and modify
the user's project. Be concise. Ask before making destructive changes."""

tools = [
    {"name": "read_file", "description": "Read a file's contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "run_shell", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "list_directory", "description": "List directory contents. Path defaults to current directory if omitted.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": []}},
    {"name": "search_files", "description": "Regex search across files.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}, "path": {"type": "string"}}, "required": ["pattern"]}},
]

def _safe_path(raw: str) -> pathlib.Path:
    """Resolve a path and ensure it stays within SESSION_ROOT."""
    resolved = (SESSION_ROOT / raw).resolve()
    # Path.is_relative_to avoids the prefix-matching pitfall of str.startswith
    # (e.g. /home/user/project would wrongly admit /home/user/project2).
    if not resolved.is_relative_to(SESSION_ROOT):
        raise PermissionError(
            f"Path '{raw}' escapes session root '{SESSION_ROOT}'"
        )
    return resolved

def execute_tool(name: str, args: dict) -> str:
    try:
        if name == "read_file":
            safe = _safe_path(args["path"])
            with safe.open(encoding="utf-8", errors="replace") as f:
                return f.read()[:MAX_OUTPUT]
        elif name == "write_file":
            safe = _safe_path(args["path"])
            print(f"\n⚠ Write {len(args['content'])} chars to: {safe}")
            if input("Allow? [y/N]: ").strip().lower() != "y":
                return "Rejected by user."
            safe.parent.mkdir(parents=True, exist_ok=True)
            with safe.open("w", encoding="utf-8") as f:
                f.write(args["content"])
            return f"Wrote {len(args['content'])} chars to {safe}"
        elif name == "run_shell":
            command = args["command"]
            print(f"\n⚠ Run: {command}")
            if input("Allow? [y/N]: ").strip().lower() != "y":
                return "Rejected by user."
            # Use shell=False where possible; fall back to shell=True only for
            # pipes, redirects, and other shell metacharacters.
            if any(ch in command for ch in ("|", ">", "<", ";", "&", "$", "`")):
                r = subprocess.run(
                    command, shell=True, capture_output=True,
                    text=True, timeout=30
                )
            else:
                r = subprocess.run(
                    shlex.split(command), shell=False, capture_output=True,
                    text=True, timeout=30
                )
            out = (r.stdout + r.stderr)[:MAX_OUTPUT]
            return out or "(no output)"
        elif name == "list_directory":
            safe = _safe_path(args.get("path", "."))
            return "\n".join(os.listdir(str(safe)))
        elif name == "search_files":
            if not shutil.which("grep"):
                return "Error: grep not found. On Windows, install Git Bash or use WSL."
            search_root = str(_safe_path(args.get("path", ".")))
            r = subprocess.run(
                ["grep", "-rn", "-E", args["pattern"], search_root],
                capture_output=True, text=True, timeout=30
            )
            if r.returncode == 2:
                return f"grep error:\n{r.stderr[:MAX_OUTPUT]}"
            return (r.stdout or "No matches.")[:MAX_OUTPUT]
        else:
            return f"Error: unknown tool '{name}'"
    except subprocess.TimeoutExpired:
        return "Error: command timed out after 30 seconds."
    except PermissionError as e:
        return f"Error: permission denied — {e}"
    except FileNotFoundError as e:
        return f"Error: file not found — {e}"
    except KeyboardInterrupt:
        raise  # Propagate for clean session exit
    except Exception as e:
        logging.exception("Unexpected error in tool '%s' with args %s", name, args)
        return f"Error: unexpected failure — {type(e).__name__}: {e}"

def _evict_messages(messages: list) -> list:
    """
    Retain the last MAX_MESSAGES messages.
    Always ensure the returned list starts with a 'user' role message
    for Anthropic API compliance.
    """
    if len(messages) <= MAX_MESSAGES:
        return messages
    tail = messages[-MAX_MESSAGES:]
    # Ensure tail starts with 'user' role for API compliance
    while tail and tail[0]["role"] != "user":
        tail = tail[1:]
    return tail

def main():
    messages = []
    system = SYSTEM
    memory_file = "AGENT.md"
    if os.path.exists(memory_file):
        with open(memory_file, encoding="utf-8") as f:
            project_context = f.read()
        # Append project context to the system prompt rather than injecting
        # synthetic message turns, which avoids API message-order issues.
        system = SYSTEM + f"\n\n# Project Context (AGENT.md)\n{project_context}"
    print("CLI Agent ready. Type 'exit' to quit.\n")
    while True:
        user_input = input("> ")
        if user_input.lower() in ("exit", "quit"):
            break
        messages.append({"role": "user", "content": user_input})
        while True:
            messages = _evict_messages(messages)
            response = client.messages.create(
                model=MODEL, max_tokens=4096,
                system=system, tools=tools, messages=messages,
            )
            messages.append({"role": "assistant", "content": response.content})
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            if not tool_calls:
                final_texts = [b for b in response.content if hasattr(b, "text")]
                if final_texts:
                    for block in final_texts:
                        print(f"\n{block.text}")
                else:
                    print("\n(no response)")
                break
            results = []
            for tc in tool_calls:
                print(f"\n🔧 {tc.name}({tc.input})")
                result = execute_tool(tc.name, tc.input)
                results.append({"type": "tool_result", "tool_use_id": tc.id, "content": result})
            messages.append({"role": "user", "content": results})

if __name__ == "__main__":
    main()
Running It and What to Expect
A typical session: the user asks the agent to find all TODO comments. The agent calls search_files with pattern TODO, returns matching lines, and summarizes. The user then asks it to refactor a specific function, and the agent chains read_file, reasons about the code, then calls write_file with the updated version, prompting for confirmation before writing. This kind of multi-step tool chaining works well with Claude Sonnet in chains under roughly five tool calls. Where it breaks down: complex multi-step reasoning across many files, and sessions exceeding around 50 tool calls where accumulated tool output approaches the context window limit. The sliding-window eviction policy mitigates this but does not eliminate it entirely for extremely long sessions.
Extending the Pattern: Where to Go Next
Adding Persistent Memory
The script above already demonstrates the simplest form of persistent memory: an AGENT.md file that the script loads at startup and appends to the system prompt. This mirrors Claude Code's CLAUDE.md convention. Teams can store project conventions, architectural decisions, and known constraints in this file so the agent starts every session with relevant context.
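A hypothetical AGENT.md for a small project might look like the following; the structure is purely illustrative, since the script simply appends the file's text to the system prompt:

```markdown
# Project Context

- Python 3.11, pytest for tests (run with: pytest -q)
- Source lives in src/; do not edit generated files in build/
- Prefer small, focused diffs; ask before deleting files
- Known constraint: the public API must stay backwards compatible with v1
```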
Multi-Model and Local LLM Support
Because the agent loop is decoupled from the model provider, swapping the Anthropic client for any OpenAI-compatible API (Ollama, vLLM, llama.cpp's server mode) requires two changes: replace the anthropic.Anthropic() constructor and adapt the response content-block parsing. The trade-off is real, though: smaller open-weight models frequently emit malformed JSON tool calls or reference nonexistent tools. Evaluate any candidate model against your specific tool schemas before production use.
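As a sketch of what the parsing adaptation involves: in the OpenAI Chat Completions format, tool calls arrive on the assistant message with arguments serialized as a JSON string rather than a parsed object. A small helper (the name is hypothetical) can normalize them into the (id, name, args) tuples the loop dispatches:

```python
import json

def parse_openai_tool_calls(assistant_message: dict) -> list:
    """Normalize OpenAI-style tool calls to (call_id, tool_name, args) tuples.

    In the Chat Completions format, function arguments arrive as a JSON
    string and must be parsed before dispatch.
    """
    calls = []
    for tc in assistant_message.get("tool_calls") or []:
        fn = tc["function"]
        calls.append((tc["id"], fn["name"], json.loads(fn["arguments"])))
    return calls

msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "read_file", "arguments": "{\"path\": \"main.py\"}"},
    }],
}
print(parse_openai_tool_calls(msg))
# [('call_1', 'read_file', {'path': 'main.py'})]
```

Note that small models sometimes emit arguments that are not valid JSON, so production code should wrap the json.loads call and return an error string the model can react to.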
MCP and External Tool Servers
The Model Context Protocol (MCP) is an open standard for externalizing tool surfaces, with published SDKs and adoption across multiple LLM providers and tooling ecosystems. Rather than hardcoding tool definitions in the agent script, MCP servers advertise available tools dynamically. This would let the agent above discover and use tools from any MCP-compliant server without code changes.
The terminal agent loop is a primitive, like the Unix pipe. Five tools and a compact Python script produce a working agent that reads, writes, searches, and executes across an entire codebase. Everything beyond that is extension, not reinvention.