Security Patterns for Autonomous Agents: Lessons from Pentagi

Autonomous AI agents are no longer confined to chat windows. They execute code, call APIs, modify filesystems, and chain together multi-step workflows without waiting for human approval at each stage — and this fundamentally changes the threat model.

Why Autonomous Agents Are a Security Inflection Point

This autonomy fundamentally changes the threat model: an agent can chain filesystem writes, API calls, and shell commands with no per-step human review. A misconfigured agent with shell access can escalate privileges, exfiltrate sensitive data, or pivot across an internal network within a single agent loop iteration. Local agent infrastructure therefore demands containment strategies that match the capabilities being granted.

The OWASP Top 10 for LLM Applications (2025 edition) highlights risks such as improper output handling, excessive agency, and sensitive information disclosure as critical vulnerabilities in LLM-powered systems. These aren't theoretical. They describe exactly what happens when an agent's execution environment lacks proper boundaries.

Pentagi, an open-source AI-powered penetration testing agent from vxcontrol, is a useful reference precisely because it is built to attack: it probes networks and exploits vulnerabilities autonomously, so its containment patterns are tested against genuinely adversarial behavior. Yet its architecture treats self-containment as a first-class concern. The patterns it employs for sandboxing, permission scoping, human oversight, and audit logging transfer directly to any team shipping autonomous agents to production.

A misconfigured agent with shell access can escalate privileges, exfiltrate sensitive data, or pivot across an internal network within a single agent loop iteration.

What Is Pentagi and Why It Matters for Agent Security

Pentagi (GitHub: vxcontrol/pentagi) is a fully autonomous penetration testing agent that uses LLM orchestration to plan and execute security assessments. Its stack combines a Go backend, a React frontend for monitoring, an LLM planning layer, and a Docker-based execution environment where all offensive operations run. The architecture is deliberately modular: the LLM generates a plan, tasks flow into a queue, a sandboxed executor carries them out, and results feed back into a reporting layer.

That danger is exactly what makes it an ideal reference architecture for agent security. A pen-testing agent must have access to network scanning tools, exploit frameworks, and shell execution. If any of those capabilities leaks outside its intended boundaries, the agent becomes a threat to its own host. Pentagi's design demonstrates that even the most hazardous agent use case can be contained through architecture rather than as an afterthought.

Pentagi's Core Architecture at a Glance

Pentagi decomposes its workload across specialized sub-agents, each scoped to a narrow set of capabilities. Based on the project's published architecture, a searcher agent handles web lookups and OSINT. A coder agent writes scripts and tooling. An installer agent manages package dependencies. A pentester agent runs the actual offensive operations. Each sub-agent has access only to the tools required for its role, and the LLM planner orchestrates their sequencing without granting any single component a flat, unrestricted tool namespace. Refer to the Pentagi repository for current sub-agent definitions and tool scopes, as these may evolve across releases.

Pattern 1: Container-Based Sandboxing for Agent Execution

Pentagi runs all offensive operations inside isolated Docker containers. This is the single most critical containment boundary: regardless of what the agent does inside the container (scanning ports, running exploits, writing to disk), the container's namespace limits the blast radius. Filesystem isolation prevents the agent from reaching host resources. Network segmentation controls which targets are reachable. Resource limits prevent a runaway process from consuming host CPU or memory.

For any autonomous agent, treat this pattern as a hard requirement. Without container isolation, a prompt-injected agent running as host root can read /etc/shadow and pivot to adjacent services. Mount only what the agent needs, and drop all Linux capabilities the agent doesn't require. Enforce a read-only root filesystem where possible — this single constraint blocks an entire class of persistence attacks where a compromised agent writes backdoors into its own image layers. Restrict network access to the minimum viable scope.

Prerequisites: This pattern requires Docker Engine 20.10+ and Docker Compose CLI v2+ (docker compose, not legacy docker-compose v1). Resource limits under the deploy key are silently ignored by legacy Compose v1 in non-Swarm mode — no error is raised, and no limits are applied. If you must support Compose v1, use top-level mem_limit and cpus keys instead. The tmpfs size enforcement and Linux capability semantics described below are Linux-host specific; Docker Desktop on macOS and Windows may silently ignore tmpfs size limits and behave differently with capability restrictions.

Implementing a Sandboxed Agent Executor

The following Docker Compose configuration demonstrates a locked-down execution container modeled on patterns found in Pentagi's deployment architecture:

services:
  agent-executor:
    image: agent-sandbox:1.0.0  # Build locally from ./Dockerfile.sandbox; no public image available
    user: "65534:65534"          # Run as nobody:nobody — never as root
    read_only: true
    tmpfs:
      - /tmp:size=100M
    cap_drop:
      - ALL
    cap_add:
      - NET_RAW    # Only if raw-socket scanning is required; verify with your specific tools — some modes also require NET_ADMIN or a custom seccomp profile
    security_opt:
      - no-new-privileges:true
      - seccomp:/etc/docker/seccomp/agent-executor.json  # Required: custom seccomp profile restricting syscalls beyond capability drops
    networks:
      - agent-isolated
    volumes:
      - ./task-input:/data/input:ro
      - ./task-output:/data/output
    restart: on-failure:3        # Explicit restart policy; Docker's default 'no' is invisible to operators
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
    environment:
      - AGENT_ROLE=executor
      - ALLOWED_TARGETS=10.0.1.0/24
      # Do not place secrets (API keys, tokens) in environment variables.
      # Environment variables are visible via `docker inspect` to any process with Docker socket access.
      # Use Docker secrets or an external vault for sensitive values.

networks:
  agent-isolated:
    driver: bridge
    internal: true   # Blocks external internet egress; does NOT restrict inter-container traffic on this network

Key decisions here: cap_drop: ALL removes every Linux capability, then only the specific ones needed are re-added. Capability drops restrict high-level privileges but do not restrict individual syscalls. The seccomp security option applies a custom seccomp profile that restricts the set of permitted syscalls — pair this with cap_drop: ALL for defense-in-depth. Without a seccomp profile, NET_RAW plus Docker's default permissive syscall filter still permits approximately 300+ syscalls, including primitives relevant to container escape on unpatched kernels. The user: "65534:65534" directive ensures the container process runs as the nobody user rather than root; even with no-new-privileges, running as root inside the container grants root-level write access to writable layers and maximizes the impact of any container escape. The internal: true network flag prevents the container from reaching the public internet, but it does not prevent containers on the same bridge network from communicating with each other. For lateral isolation between containers, use separate Docker networks or per-container network policies. The restart: on-failure:3 policy ensures a crashed executor restarts up to three times rather than silently disappearing. Resource limits cap CPU and memory to prevent denial-of-service against the host; these require Docker Compose CLI v2+ to take effect.
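For orientation, a custom profile like the one mounted at /etc/docker/seccomp/agent-executor.json follows Docker's seccomp profile JSON format: deny by default, then allowlist named syscalls. The allowlist below is an illustrative skeleton, not a vetted profile; real agent tooling needs a far longer list.

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "openat", "close", "fstat",
        "mmap", "munmap", "brk", "socket", "connect",
        "sendto", "recvfrom", "nanosleep", "exit_group"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

In practice, start from Docker's published default profile and remove entries you can prove unused (tracing your tools with strace or similar), rather than building a deny-by-default allowlist from scratch; an incomplete allowlist will break most binaries in hard-to-diagnose ways.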

Without container isolation, a prompt-injected agent running as host root can read /etc/shadow and pivot to adjacent services.

Pattern 2: Scoped Tool Permissions and Least-Privilege Orchestration

Pentagi's sub-agent architecture enforces a strict principle: each agent role sees only the tools it needs. The searcher agent can query the web and read files. The coder agent can write and execute scripts. The pentester agent can run exploitation tools. None of them share a flat namespace where any agent can invoke any function.

This matters because LLM function calling is essentially a privilege boundary. When a model receives a list of available tools, it will use whatever is available to accomplish its goal. A flat tool namespace means a coding agent can accidentally (or through prompt injection) invoke execute_shell with destructive commands. Scoping tool visibility per role eliminates this class of failure entirely.

Defining Permission Boundaries per Agent Role

// File: internal/agentperm/permissions.go
package agentperm

import (
	"log/slog"
	"sync"
)

// registry maps each role to a set of allowed tools.
// Initialized once in init() and protected by mu for safe concurrent reads.
// Uses map[string]struct{} for O(1) tool lookups with minimal memory.
var (
	mu       sync.RWMutex
	registry map[string]map[string]struct{}
)

func init() {
	registry = map[string]map[string]struct{}{
		"searcher":  toSet("web_search", "read_file", "dns_lookup"),
		"coder":     toSet("write_file", "execute_script", "read_file"),
		"pentester": toSet("nmap_scan", "run_exploit", "read_file"),
	}
}

func toSet(tools ...string) map[string]struct{} {
	s := make(map[string]struct{}, len(tools))
	for _, t := range tools {
		s[t] = struct{}{}
	}
	return s
}

// GetToolsForRole returns a snapshot slice of allowed tools for the given role.
// Safe for concurrent use. Unknown roles return nil (fail-closed).
func GetToolsForRole(role string) []string {
	mu.RLock()
	defer mu.RUnlock()
	toolSet, ok := registry[role]
	if !ok {
		slog.Warn("unknown role requested tools; returning nil (fail-closed)",
			"role", role)
		return nil
	}
	out := make([]string, 0, len(toolSet))
	for t := range toolSet {
		out = append(out, t)
	}
	return out
}

// IsToolAllowed checks whether a specific tool is permitted for the given role.
// O(1) per lookup after map initialization. Safe for concurrent use.
func IsToolAllowed(role, tool string) bool {
	mu.RLock()
	defer mu.RUnlock()
	toolSet, ok := registry[role]
	if !ok {
		return false
	}
	_, allowed := toolSet[tool]
	return allowed
}

// GetAllRoles returns a list of all configured roles for operational introspection.
func GetAllRoles() []string {
	mu.RLock()
	defer mu.RUnlock()
	roles := make([]string, 0, len(registry))
	for r := range registry {
		roles = append(roles, r)
	}
	return roles
}

This is a partial implementation; integrate it into your package structure and module (e.g., yourmodule/internal/agentperm). The critical detail: unknown roles return nil, meaning zero tools. This fail-closed default prevents a misconfigured or newly added agent role from silently inheriting broad access. The structured warning log ensures operations teams have visibility when an unrecognized role is encountered. A sync.RWMutex protects the registry so that concurrent goroutines can safely read tool permissions without data races. Every tool invocation should pass through IsToolAllowed before execution, and any violation should be logged and blocked.

Pattern 3: Human-in-the-Loop Gates and Approval Workflows

Pentagi's React frontend includes a flow screen and monitoring interface that lets operators observe agent actions in real time and intervene when necessary. This isn't just a convenience feature. For irreversible actions (deleting resources, escalating privileges, making external network calls), human approval should be a hard gate rather than an optional review.

The design challenge is preventing these gates from bottlenecking agent throughput. Async approval workflows solve this: the agent queues the high-risk action, continues working on non-blocked tasks, and resumes the gated task only after approval arrives.

Adding an Approval Gate to an Agent Pipeline

# File: agent/approval_gate.py
import asyncio
import json
import logging
import os
from typing import Awaitable, Callable

HIGH_RISK_TOOLS: frozenset[str] = frozenset({
    "execute_shell",
    "run_exploit",
    "delete_resource",
    "escalate_privileges",
})

_DEFAULT_TIMEOUT = int(os.getenv("APPROVAL_TIMEOUT_SECONDS", "300"))


async def approval_gate(
    agent_id: str,
    tool_name: str,
    parameters: dict,
    log_approval_request: Callable[[str, str, dict], Awaitable[str]],
    check_approval_status: Callable[[str], Awaitable[str]],
    timeout_seconds: int = _DEFAULT_TIMEOUT,
) -> bool:
    """
    Returns True if execution is approved, False if denied or timed out.

    Callers must provide two async callables:
        log_approval_request(agent_id, tool_name, parameters) -> request_id
            Persist the approval request and return a unique request ID.
        check_approval_status(request_id) -> 'approved' | 'denied' | 'pending'
            Query the current decision status for a request.

    Non-high-risk tools are auto-approved immediately.
    Timeout always returns False (fail-closed) — no exception is raised.
    """
    if tool_name not in HIGH_RISK_TOOLS:
        return True

    request_id = await log_approval_request(agent_id, tool_name, parameters)
    safe_params = json.dumps(parameters)  # Prevent log injection from parameter values
    logging.warning(
        "Approval required",
        extra={
            "agent_id": agent_id,
            "tool_name": tool_name,
            "request_id": request_id,
            "parameters": safe_params,
        },
    )

    # Poll for human decision (webhook or CLI-based)
    loop = asyncio.get_running_loop()  # Preferred over get_event_loop() inside a coroutine
    deadline = loop.time() + timeout_seconds
    while loop.time() < deadline:
        decision = await check_approval_status(request_id)
        if decision == "approved":
            logging.info("Request approved", extra={"request_id": request_id})
            return True
        if decision == "denied":
            logging.info("Request denied", extra={"request_id": request_id})
            return False
        await asyncio.sleep(2)  # Cooperative yield; does not block the event loop

    logging.error(
        "Approval request timed out — denying (fail-closed)",
        extra={"request_id": request_id, "timeout_seconds": timeout_seconds},
    )
    return False  # Fail-closed: deny on timeout, never raise

This middleware intercepts any tool call in HIGH_RISK_TOOLS, logs the request with serialized parameters for audit, and blocks execution until a human responds or the timeout expires. The log_approval_request and check_approval_status callables are injected as parameters so that callers can provide their own persistence and query backends (database, message queue, REST API, etc.). The function is async so it yields cooperatively during polling — it will not block the event loop in asyncio-based agent frameworks. The timeout always returns False rather than raising an exception, ensuring the gate is truly fail-closed: any caller using if await approval_gate(...): will correctly deny execution on timeout without needing exception handling. The timeout default is configurable via the APPROVAL_TIMEOUT_SECONDS environment variable for deployment and testing flexibility. The polling approach shown here is suitable for prototyping; production systems would replace it with a webhook callback or message queue subscription.
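To show how the injected backends wire together, here is a usage sketch with a hypothetical in-memory store standing in for a real database or queue; the gate is reproduced in trimmed form (shorter poll interval, no logging) so the example runs standalone. The asyncio.gather call also illustrates the queue-and-continue idea from earlier: the gated coroutine awaits approval while other coroutines keep running.

```python
# File: agent/approval_demo.py  (hypothetical demo, not part of Pentagi)
import asyncio
import itertools

HIGH_RISK_TOOLS = frozenset({"execute_shell", "run_exploit",
                             "delete_resource", "escalate_privileges"})

async def approval_gate(agent_id, tool_name, parameters,
                        log_approval_request, check_approval_status,
                        timeout_seconds=5.0, poll_interval=0.05):
    """Trimmed version of the gate above: fail-closed on timeout."""
    if tool_name not in HIGH_RISK_TOOLS:
        return True                              # auto-approve low-risk tools
    request_id = await log_approval_request(agent_id, tool_name, parameters)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_seconds
    while loop.time() < deadline:
        decision = await check_approval_status(request_id)
        if decision in ("approved", "denied"):
            return decision == "approved"
        await asyncio.sleep(poll_interval)       # cooperative yield while waiting
    return False                                 # fail-closed on timeout

# --- In-memory backend: swap for your persistence layer ---
_requests: dict = {}                             # request_id -> status
_ids = itertools.count(1)

async def log_approval_request(agent_id, tool_name, parameters):
    rid = f"req-{next(_ids)}"
    _requests[rid] = "pending"
    return rid

async def check_approval_status(request_id):
    return _requests.get(request_id, "denied")   # unknown IDs fail closed

async def operator(decision):
    await asyncio.sleep(0.1)                     # simulated human latency
    for rid, status in list(_requests.items()):
        if status == "pending":
            _requests[rid] = decision

async def main():
    # Low-risk tool: auto-approved immediately, nothing is queued.
    assert await approval_gate("coder-01", "read_file", {"path": "a.txt"},
                               log_approval_request, check_approval_status)
    # High-risk tool: blocks until the simulated operator approves, while
    # gather lets other coroutines (here, the operator) keep running.
    approved, _ = await asyncio.gather(
        approval_gate("pentester-01", "run_exploit", {"cve": "CVE-2024-0001"},
                      log_approval_request, check_approval_status),
        operator("approved"),
    )
    print("high-risk approved:", approved)       # prints: high-risk approved: True

if __name__ == "__main__":
    asyncio.run(main())
```

Swapping the in-memory dict for a database or message queue requires no change to the gate itself, which is the point of injecting the two callables.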

Pattern 4: Audit Logging and Observability for Agent Actions

Pentagi logs every LLM prompt, tool invocation, and container action. This level of observability is essential because traditional logging doesn't account for agentic systems. An agent's behavior emerges from a chain of LLM reasoning steps, tool calls, and intermediate results. Without chain-of-thought traceability, you cannot reconstruct which LLM reasoning step triggered a destructive action.

Structured Logging Schema for Agent Actions

{
    "timestamp": "2025-01-15T14:32:01Z",
    "agent_id": "pentester-01",
    "session_id": "sess-abc123",
    "tool_called": "nmap_scan",
    "parameters": {"target": "10.0.1.5", "flags": "-sV -p 1-1000"},
    "result_summary": "3 open ports detected",
    "risk_score": 0.4,
    "approval_status": "auto-approved",
    "execution_time_ms": 12340,
    "container_id": "7f3a9b2c1d4e..."
}

Every field serves a purpose. session_id links actions across a multi-step task. risk_score is a float in the range [0.0, 1.0] that enables automated alerting thresholds — define a computation method appropriate to your threat model (e.g., based on tool category, target sensitivity, and parameter analysis) and validate that the value falls within range before logging. Configure alert thresholds accordingly (e.g., values above 0.7 trigger human review). container_id allows correlation with Docker-level logs. Enforce the schema at the middleware layer so that no agent action bypasses structured capture.
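A sketch of that middleware-layer enforcement might look like the following. The field names match the schema above; score_risk, TOOL_RISK, and audit_record are hypothetical names, and the per-tool base-risk table is a placeholder for a computation matched to your own threat model.

```python
# File: agent/audit_log.py  (illustrative sketch, assumed names)
import json
import time

REQUIRED_FIELDS = frozenset({
    "timestamp", "agent_id", "session_id", "tool_called", "parameters",
    "result_summary", "risk_score", "approval_status",
    "execution_time_ms", "container_id",
})

# Hypothetical per-tool base risk; unknown tools default high (fail-closed).
TOOL_RISK = {"web_search": 0.1, "read_file": 0.2, "nmap_scan": 0.4,
             "run_exploit": 0.9}

def score_risk(tool_called: str) -> float:
    return TOOL_RISK.get(tool_called, 0.9)

def audit_record(agent_id, session_id, tool_called, parameters,
                 result_summary, approval_status, execution_time_ms,
                 container_id) -> str:
    """Build, validate, and serialize one structured audit entry."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "agent_id": agent_id,
        "session_id": session_id,
        "tool_called": tool_called,
        "parameters": parameters,
        "result_summary": result_summary,
        "risk_score": score_risk(tool_called),
        "approval_status": approval_status,
        "execution_time_ms": execution_time_ms,
        "container_id": container_id,
    }
    # Enforce the schema: every field present, risk_score within [0.0, 1.0].
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"audit record missing fields: {sorted(missing)}")
    if not 0.0 <= record["risk_score"] <= 1.0:
        raise ValueError("risk_score out of range [0.0, 1.0]")
    return json.dumps(record)
```

Because the record is validated before serialization, an out-of-range risk score or a missing field fails loudly at the middleware boundary instead of producing a silently incomplete log line.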

Without chain-of-thought traceability, you cannot reconstruct which LLM reasoning step triggered a destructive action.

Putting It All Together: A Security Checklist for Autonomous Agents

| Pattern | Threat Mitigated | Implementation Priority* | Pentagi Reference |
|---|---|---|---|
| Container sandboxing | Host compromise, lateral movement | Critical | Docker-based executor with dropped capabilities |
| Scoped tool permissions | Privilege escalation, prompt injection | Critical | Role-specific sub-agents with isolated tool sets |
| Human-in-the-loop gates | Irreversible destructive actions | High | Flow screen with operator intervention |
| Structured audit logging | Undetected misbehavior, forensic gaps | High | Full prompt and action logging |

*Critical = must exist before any production deployment. High = must exist before handling sensitive data or external networks.

These patterns follow OWASP LLM Top 10 guidance (particularly LLM06:2025, Excessive Agency), the NIST AI Risk Management Framework (specifically the Govern and Map functions outlined in NIST AI 100-1, 2023) and its emphasis on governable AI systems, and Anthropic's published recommendations for constraining agent autonomy through layered controls.

Security as a First-Class Architectural Concern

Sandboxing, permission scoping, approval gates, and audit logging are all architectural decisions that constrain an agent's blast radius regardless of what the LLM decides to do at runtime. Pentagi demonstrates that even an agent purpose-built for offensive security operations can be contained when you build these patterns in from the start. Start by auditing your agent's Docker Compose file against the configuration in Pattern 1, then verify your tool registry enforces fail-closed defaults per Pattern 2. The question isn't whether an agent will attempt something unexpected. It's whether the architecture limits the damage when it does.