State Management for Long-Running Agents: Redis vs. Postgres

Agent state management separates production AI systems from demos. A single-shot LLM call can lean on its context window, but long-running agents—those spanning multiple sessions, running for days, or waking on events—need something more durable than a token buffer.
Redis vs. Postgres for Agent State Comparison
| Dimension | Redis | Postgres (+ pgvector) |
|---|---|---|
| Ideal memory type | Working memory (current task, recent tool outputs) | Episodic & semantic memory (session history, embedding recall) |
| Read latency (p50) | <1 ms for in-memory key lookups | ~4 ms; higher under connection contention |
| Durability | Configurable (RDB snapshots or AOF); tolerates loss windows | Full ACID; survives crashes without data loss |
| Semantic recall (precision@5, 500 turns) | <70% — degrades as older context is evicted | >94% via HNSW index on embeddings |
Table of Contents
- The "Context Window Isn't Memory" Problem
- Redis as Agent Working Memory
- Postgres as Agent Episodic and Semantic Memory
- Benchmark: Latency vs. Recall for Hybrid Memory Systems
- The Hybrid Architecture: Bridging Redis and Postgres
- Decision Framework: When to Use What
- Key Takeaways
The "Context Window Isn't Memory" Problem
Why Long-Running Agents Need Persistent State
Agent state management separates production AI systems from demos. A single-shot LLM call can lean on its context window, but long-running agents, those spanning multiple sessions, running for days, or waking on events, need something more durable than a token buffer. The context window functions like volatile RAM: a scratchpad that vanishes the moment the process ends. It is not a filing cabinet, and treating it as one leads to context hallucination, redundant API calls, and broken continuity across sessions.
Understanding the categories of AI agent memory clarifies the storage problem. Working memory holds the current task, active goals, and recent tool outputs. Episodic memory is the agent's lived experience: conversation logs, session history, decision traces that answer "what happened and when?" Semantic memory stores learned knowledge as embeddings, enabling recall by meaning rather than timestamp. Each category has different latency requirements, durability expectations, and cost profiles. Picking one storage layer for all three is where most teams go wrong.
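As a rough illustration of that separation, the three memory categories can be routed to different stores with different lifetimes. The routing table below is a hypothetical sketch of such a policy, not an API from either system; the names `MemoryPolicy` and `route` are mine:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryPolicy:
    store: str         # where this memory type should live
    ttl_s: int | None  # None = durable, no expiry
    recall: str        # how it is retrieved

# Illustrative routing for the three memory categories discussed above
MEMORY_POLICIES = {
    "working":  MemoryPolicy(store="redis",    ttl_s=3600, recall="key lookup"),
    "episodic": MemoryPolicy(store="postgres", ttl_s=None, recall="time/session query"),
    "semantic": MemoryPolicy(store="postgres", ttl_s=None, recall="vector similarity"),
}

def route(memory_type: str) -> MemoryPolicy:
    """Return the storage policy for a memory category."""
    return MEMORY_POLICIES[memory_type]
```

The point of the sketch is only that latency-sensitive working memory and durability-sensitive episodic/semantic memory end up in different stores.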
Redis as Agent Working Memory
What Redis Does Well for Agents
Redis delivers sub-millisecond reads for hot state (on localhost or sub-1ms RTT networks), specifically for in-memory key lookups. When an agent needs its current goal, the last tool output, or a sliding window of recent messages, Redis returns that data faster than any disk-backed system. Native TTL support allows automatic expiry of stale context without manual cleanup. For multi-agent coordination, Redis Pub/Sub and Streams provide messaging primitives that avoid deploying and operating a Kafka or RabbitMQ cluster. Redis Stack, a separate distribution that extends the core engine, includes JSON and Search modules that enable semi-structured agent state storage without serialization gymnastics.
Redis Schema Pattern for Agent State
Prerequisite: This example requires Redis Stack (or Redis with the RedisJSON module loaded) and the redis Python package ≥4.0. A plain Redis instance does not expose the .json() API. Install the client with: pip install "redis[hiredis]>=4.2.0".
A practical key design uses the agent ID as a namespace prefix, with Redis JSON storing the full working memory document:
import redis
import json
import time
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
agent_id = "agent:task-planner:a1b2c3"
working_memory = {
    "agent_id": "a1b2c3",
    "current_task": {
        "goal": "Research Q2 revenue trends",
        "status": "in_progress",
        "step": 3
    },
    "tool_results": [
        {"tool": "web_search", "query": "Q2 2025 SaaS revenue", "ts": time.time()},
        {"tool": "sql_query", "result_summary": "Revenue up 12% QoQ", "ts": time.time()}
    ],
    "recent_messages": [
        {"role": "user", "content": "Find Q2 revenue trends", "ts": time.time()},
        {"role": "assistant", "content": "Searching now...", "ts": time.time()}
    ],
    "metadata": {"session_start": time.time(), "turn_count": 12}
}
# Store with 1-hour TTL for automatic expiry
r.json().set(agent_id, "$", working_memory)
r.expire(agent_id, 3600)
# Fast read of current task only
current_task = r.json().get(agent_id, "$.current_task")
This pattern keeps the hot path to a single key lookup. The sliding window of recent messages stays bounded, and TTL handles cleanup when agents go idle.
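One way to keep that sliding window bounded is an append-then-trim on each turn. The helper below is plain-Python logic of my own naming (`bounded_append` is not a Redis API); the comments show the equivalent server-side sequence with redis-py's `JSON.ARRAPPEND` and `JSON.ARRTRIM` commands:

```python
# Server-side equivalent with redis-py and Redis Stack (sketch, assumes the
# working-memory document from above is stored at `key`):
#   r.json().arrappend(key, "$.recent_messages", msg)
#   r.json().arrtrim(key, "$.recent_messages", -max_len, -1)
def bounded_append(messages: list, msg: dict, max_len: int = 20) -> list:
    """Append msg and keep only the newest max_len entries."""
    return (messages + [msg])[-max_len:]
```

Trimming on every write keeps the hot document a constant size, so read latency stays flat regardless of how long the session runs.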
Where Redis Breaks Down
Although Redis 7.x ships with RDB snapshot persistence enabled by default, it tolerates data loss up to the last snapshot interval. Zero-loss guarantees require AOF with fsync=always, which adds write latency. Neither mode provides ACID durability equivalent to Postgres. Without relational joins, answering "What did this agent do last Tuesday?" requires scanning keys or maintaining secondary indexes manually. Memory cost scales linearly with stored data; keeping full conversation histories for thousands of agents in Redis drives 4-6x the RAM cost of the hybrid approach per 1,000 agents (see the cost comparison below). RediSearch provides vector search, though benchmark comparisons of recall quality against pgvector vary by workload and version. Evaluate both for your specific query patterns, particularly for the nuanced similarity queries that vector database architecture demands.
Postgres as Agent Episodic and Semantic Memory
What Postgres Does Well for Agents
Postgres provides ACID durability for state that must survive crashes and restarts without question. Its query engine supports rich filtering across sessions, agents, and time ranges, the kind of cross-referencing that episodic memory requires. The pgvector extension enables embedding-based semantic recall, allowing agents to find similar past experiences by meaning rather than keyword. JSONB columns accommodate flexible, schema-evolving agent metadata without requiring migrations for additive schema changes (destructive changes still require migration).
Postgres Schema Pattern for Agent Memory
A three-table schema covers session tracking, episodic events with embeddings, and durable knowledge:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_sessions (
    session_id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_id         VARCHAR(128) NOT NULL,
    started_at       TIMESTAMPTZ DEFAULT now(),
    ended_at         TIMESTAMPTZ,
    session_metadata JSONB DEFAULT '{}'
);

CREATE TABLE memory_events (
    event_id        BIGSERIAL PRIMARY KEY,
    session_id      UUID REFERENCES agent_sessions(session_id),
    agent_id        VARCHAR(128) NOT NULL,
    event_type      VARCHAR(64) NOT NULL, -- 'message', 'tool_call', 'decision', 'observation'
    payload         JSONB NOT NULL,
    embedding       vector(1536), -- 1536 = OpenAI text-embedding-ada-002 dimensions; adjust for your embedding model
    idempotency_key UUID NOT NULL DEFAULT gen_random_uuid(),
    created_at      TIMESTAMPTZ DEFAULT now(),
    CONSTRAINT uq_memory_event UNIQUE (idempotency_key)
);

CREATE TABLE agent_knowledge (
    knowledge_id BIGSERIAL PRIMARY KEY,
    agent_id     VARCHAR(128) NOT NULL,
    topic        VARCHAR(256),
    content      TEXT NOT NULL,
    embedding    vector(1536), -- adjust dimensions to match your embedding model
    confidence   FLOAT DEFAULT 1.0,
    learned_at   TIMESTAMPTZ DEFAULT now()
);

-- HNSW index for fast approximate nearest neighbor search
-- ef_construction >= 128 recommended for higher recall; tune m and ef_construction to dataset size
CREATE INDEX idx_memory_embedding ON memory_events
    USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 128);

-- Index for agent_id lookups used by recall and cold-start queries
CREATE INDEX idx_memory_agent_id ON memory_events(agent_id);

-- Episodic recall: find the 5 most relevant past memories
-- Verify that cosine distance matches your embedding model's training objective
SELECT event_id, event_type, payload, created_at,
       1 - (embedding <=> $1::vector) AS similarity
FROM memory_events
WHERE agent_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 5;
The HNSW index on the embedding column delivers approximate nearest neighbor search without a full table scan. The vector_cosine_ops operator class matches the distance metric used by most embedding models (verify this matches your model's training objective). The JSONB payload column allows each event type to carry its own structure without schema proliferation.
Where Postgres Breaks Down
Under high-frequency polling, where an agent checks state every 100ms, Postgres read latency becomes a bottleneck. Connection overhead compounds with many concurrent agents unless a pooler like PgBouncer sits in front. The Postgres engine was never optimized for ephemeral, throwaway data. Every write, even for transient scratchpad state, incurs WAL overhead. Using Postgres as both scratchpad and archive means paying durability costs for data that does not need durability.
Benchmark: Latency vs. Recall for Hybrid Memory Systems
Test Methodology
Important: The following numbers are illustrative of expected relative performance between the three configurations, not absolute guarantees. They were produced on a single Docker Compose environment and have not been independently verified. Treat them as directional guidance; always profile against your own infrastructure and workload.
The benchmark workload models 50 concurrent agents, each carrying 500-turn conversation histories. Agents perform three operation types: hot state reads (current task from working memory), episodic lookups (recent session events), and semantic searches (vector similarity across all past memories). The environment runs on Docker Compose with Redis 7.2 and Postgres 16 with pgvector 0.6.0 (verify current stable at github.com/pgvector/pgvector/releases). Metrics tracked include p50 and p95 read latency, precision@5 (the fraction of the top-5 retrieved memories judged relevant against a ground-truth set), and memory cost per agent.
The Results
Three configurations were tested: Redis-only (all state in Redis, including conversation history), Postgres-only (all state in Postgres, including hot working memory), and Hybrid (Redis for working memory, Postgres for episodic and semantic storage).
For hot state reads, Redis-only and Hybrid configurations both achieved p50 latency under 1ms, with p95 under 2ms. Postgres-only p50 latency sat around 4ms, with p95 spiking to 15ms under concurrent load as connection contention increased.
On precision@5, Redis-only performance degraded past ~50 turns as older context was evicted or the search index quality dropped; precision@5 fell below 70% on deep history queries. Postgres-only maintained precision@5 above 94% across the full 500-turn history thanks to the HNSW index. The Hybrid configuration captured both advantages: a sub-5ms hot path with precision@5 above 92%.
Cost and Operational Complexity Tradeoff
Memory cost per 1,000 agents diverges sharply. Redis-only, storing full conversation histories, consumed 8-12 GB of RAM under our 1-2 KB per-turn assumption (actual consumption varies with message size and Redis configuration). Postgres-only used about 2 GB of disk with standard compression. The Hybrid approach used around 500 MB of Redis memory (working memory only) plus about 2 GB of Postgres disk.
The tradeoff is operational: one system versus two. The Hybrid pattern introduces a background worker, failure modes around consolidation lag, and two monitoring surfaces. For teams running fewer than ~10 agents with sub-100-turn sessions, the added complexity likely outweighs the latency gains.
The Hybrid Architecture: Bridging Redis and Postgres
Architecture Pattern: Write-Through with Async Consolidation
The pattern works as follows. On every turn, the agent writes synchronously to Redis: current task state, tool outputs, and recent messages. A background worker then flushes completed turns and state snapshots to Postgres and stores them with embedding vectors for later recall. When an agent wakes up, it hydrates working memory from Redis if available. Episodic context comes from Postgres via pgvector similarity search. The assembled context is then injected into the agent's prompt.
The data flow is linear: Agent reads and writes hot state in Redis. A background worker periodically reads completed turns from Redis and batch-inserts them into Postgres with computed embeddings. When the agent needs to recall past experience, it queries Postgres. Context assembly merges both sources.
Note: The consolidation worker implementation is out of scope for this article. In production, you would implement a loop that periodically reads pending turns from Redis, computes embeddings, and calls consolidate_to_long_term. Instrument the worker with a lag metric (keys pending flush vs. TTL remaining) and alert if flush lag exceeds half the TTL.
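A minimal shape for one cycle of that loop, with the fetch, embed, and flush steps injected as callables, might look like the following sketch. All three parameter names are placeholders for your own implementations (`flush` would typically wrap `consolidate_to_long_term`):

```python
import asyncio
from typing import Awaitable, Callable

async def consolidation_pass(
    fetch_pending: Callable[[], Awaitable[list]],  # read completed turns from Redis
    embed: Callable[[list], Awaitable[list]],      # compute embedding vectors
    flush: Callable[[list, list], Awaitable[None]],  # batch-insert into Postgres
) -> int:
    """Run one consolidation cycle; return the number of events flushed."""
    events = await fetch_pending()
    if not events:
        return 0  # nothing pending; the worker sleeps until the next cycle
    embeddings = await embed(events)
    await flush(events, embeddings)
    return len(events)
```

A production worker would call this in a loop on a fixed interval (well under the Redis TTL) and export the returned count plus pending-key lag as metrics.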
Prerequisites
- Python ≥3.9 (required for PEP 585 type hints like `list[dict]`)
- Install dependencies: `pip install "redis[hiredis]>=4.2.0" "asyncpg>=0.28.0" pgvector`
- Redis Stack (not plain Redis) for JSON module support
- pgvector 0.6.x installed in your Postgres instance
- An embedding source to generate `vector(1536)` embeddings (e.g., OpenAI `text-embedding-ada-002`)
Implementation: The Bridge Layer
# requires redis>=4.2.0: pip install "redis[hiredis]>=4.2.0"
import redis.asyncio as aioredis
import asyncpg
from pgvector.asyncpg import register_vector
import json
import hashlib
import uuid
import time
class AgentMemoryBridge:
    def __init__(self, redis_url: str, postgres_dsn: str):
        self.redis = aioredis.from_url(redis_url, decode_responses=True)
        self.pg_pool = None
        self.postgres_dsn = postgres_dsn

    async def init_pg(self):
        self.pg_pool = await asyncpg.create_pool(
            self.postgres_dsn,
            min_size=5,
            max_size=20,
            init=register_vector,  # called on every connection in the pool
        )

    async def _require_pool(self):
        if self.pg_pool is None:
            raise RuntimeError(
                "Postgres pool not initialized. "
                "Await AgentMemoryBridge.init_pg() before use."
            )

    async def close(self):
        if self.pg_pool is not None:
            await self.pg_pool.close()
        await self.redis.aclose()

    async def save_working_state(self, agent_id: str, state: dict, ttl: int = 3600):
        key = f"agent:memory:{agent_id}"  # canonical format — sync callers must use the same key scheme
        await self.redis.set(key, json.dumps(state), ex=ttl)

    @staticmethod
    def _idempotency_key(session_id: str, agent_id: str, event_type: str, payload: dict) -> str:
        raw = f"{session_id}:{agent_id}:{event_type}:{json.dumps(payload, sort_keys=True)}"
        return str(uuid.UUID(hashlib.sha256(raw.encode()).hexdigest()[:32]))

    async def consolidate_to_long_term(self, agent_id: str, events: list[dict],
                                       embeddings: list[list[float]], session_id: str):
        await self._require_pool()
        if len(events) != len(embeddings):
            raise ValueError(
                f"events ({len(events)}) and embeddings ({len(embeddings)}) length mismatch"
            )
        async with self.pg_pool.acquire() as conn:
            await conn.executemany(
                """INSERT INTO memory_events
                       (session_id, agent_id, event_type, payload, embedding, idempotency_key)
                   VALUES ($1, $2, $3, $4::jsonb, $5::vector, $6::uuid)
                   ON CONFLICT ON CONSTRAINT uq_memory_event DO NOTHING""",
                [
                    (session_id, agent_id, evt["event_type"],
                     json.dumps(evt["payload"]), emb,
                     self._idempotency_key(session_id, agent_id, evt["event_type"], evt["payload"]))
                    for evt, emb in zip(events, embeddings)
                ],
            )

    async def recall_relevant_memories(self, agent_id: str,
                                       query_embedding: list[float], top_k: int = 5):
        await self._require_pool()
        async with self.pg_pool.acquire() as conn:
            rows = await conn.fetch(
                """WITH ranked AS (
                       SELECT event_type, payload, created_at,
                              (embedding <=> $1::vector) AS dist
                       FROM memory_events
                       WHERE agent_id = $2
                       ORDER BY dist
                       LIMIT $3
                   )
                   SELECT event_type, payload, created_at, 1 - dist AS similarity
                   FROM ranked
                   ORDER BY dist""",
                query_embedding, agent_id, top_k,
            )
            return [dict(r) for r in rows]

    async def hydrate_agent(self, agent_id: str, query_embedding: list[float]):
        key = f"agent:memory:{agent_id}"
        raw = await self.redis.get(key)
        working = {}
        if raw:
            try:
                working = json.loads(raw)
            except (json.JSONDecodeError, ValueError):
                # Corrupt Redis value; treat as cold start
                working = {}
        episodic = await self.recall_relevant_memories(agent_id, query_embedding)
        if not working:
            # Cold start: backfill from Postgres.
            # Note: an empty result here means either a brand-new agent or a data-loss event.
            # Callers should check for "_cold_start" to distinguish this case.
            async with self.pg_pool.acquire() as conn:
                recent = await conn.fetch(
                    """SELECT payload FROM memory_events
                       WHERE agent_id = $1 ORDER BY created_at DESC LIMIT 10""",
                    agent_id,
                )
            backfilled = []
            for r in recent:
                payload = r["payload"]
                if isinstance(payload, str):
                    payload = json.loads(payload)
                backfilled.append(payload)
            working = {"backfilled_context": backfilled, "_cold_start": True}
        return {"working_memory": working, "episodic_recall": episodic}
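Wiring the bridge into a single agent turn might look like the sketch below. The `get_embedding` parameter is a stand-in for your embedding client, `example_turn` is a name introduced here for illustration, and nothing in the coroutine runs without live Redis and Postgres instances:

```python
import asyncio

def redis_key(agent_id: str) -> str:
    """Key scheme shared by all writers (must match save_working_state)."""
    return f"agent:memory:{agent_id}"

async def example_turn(bridge, agent_id: str, get_embedding) -> dict:
    # 1. Hydrate: working memory from Redis, episodic recall from Postgres
    query_emb = await get_embedding("current user query")  # stand-in embedding call
    context = await bridge.hydrate_agent(agent_id, query_emb)
    # 2. ...run the model with the assembled context...
    # 3. Write the updated hot state back with a 1-hour TTL
    await bridge.save_working_state(agent_id, {"turn": "done"}, ttl=3600)
    return context
```

The important invariant is that every writer uses the same key scheme; a sync producer that writes `agent:{id}` while the bridge reads `agent:memory:{id}` would silently force a cold start on every hydration.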
Docker Compose for Local Development
# Docker Compose v2+ — version key deprecated; omit or ignore
services:
  redis:
    image: redis/redis-stack:7.2.0-v10
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: agent_memory
      POSTGRES_USER: agent
      # WARNING: Replace with a secrets manager or .env variable before any non-local use.
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env before running}
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U agent -d agent_memory"]
      interval: 5s
      timeout: 3s
      retries: 5
  worker:
    build: ./worker
    restart: unless-stopped
    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy
    environment:
      REDIS_URL: redis://redis:6379
      POSTGRES_DSN: postgresql://agent:${POSTGRES_PASSWORD}@postgres:5432/agent_memory

volumes:
  redis_data:
  pg_data:
Create a .env file alongside your docker-compose.yml:
# WARNING: Do not commit this file to version control.
POSTGRES_PASSWORD=your-secure-password-here
Handling Failures and Edge Cases
TTL safety margins matter. If the consolidation worker lags and Redis evicts a key before its data flushes to Postgres, that state is lost. Set the Redis TTL well above the consolidation interval (e.g., 1 hour TTL with a 5-minute flush cycle) to buffer against worker lag. In production, instrument the consolidation worker with a lag metric (keys pending flush vs. TTL remaining). Alert if flush lag exceeds half the TTL. Consider a dead-letter store for keys approaching expiry before consolidation.
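The half-TTL alert rule above reduces to a small predicate. The function name `key_at_risk` is introduced here for illustration, and the PTTL-scanning caller is sketched in the comment (it assumes a live redis-py client `r`):

```python
# Caller sketch (requires a live Redis client `r`):
#   for key in r.scan_iter("agent:memory:*"):
#       if key_at_risk(r.pttl(key) / 1000, ttl_total_s=3600):
#           ...alert / move to dead-letter store...
def key_at_risk(ttl_remaining_s: float, ttl_total_s: float = 3600.0) -> bool:
    """True if a key has burned through more than half its TTL without a flush or refresh."""
    flush_lag_s = ttl_total_s - ttl_remaining_s  # time since last write/refresh
    return flush_lag_s > ttl_total_s / 2
```

With a 1-hour TTL and a 5-minute flush cycle, a healthy key should never come close to the half-TTL threshold; any key that does indicates a stalled or backlogged worker.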
The ON CONFLICT ON CONSTRAINT uq_memory_event DO NOTHING clause in the Postgres insert handles duplicate flushes idempotently, preventing duplicate memory events when the worker retries. This relies on the UNIQUE (idempotency_key) constraint defined in the schema above, which uses a deterministic content-based key to avoid silent event drops that can occur with timestamp-only uniqueness constraints under high-throughput batch inserts.
On cold start, when Redis is empty because the agent has been idle or Redis restarted, the hydrate_agent method falls back to Postgres, pulling recent events to reconstruct a working context. This backfill path is slower but prevents the agent from starting with zero context. Note that a cold-start return (indicated by the _cold_start flag) is indistinguishable from a data-loss scenario at the data level; callers should handle accordingly.
Decision Framework: When to Use What
Redis alone fits when agents are short-lived with fewer than 50 turns, stateless between sessions, or when latency is the overriding constraint with no need for historical recall.
Postgres alone makes sense when agents operate at low concurrency, when auditability and queryability matter more than sub-millisecond reads, or when the team cannot justify operating two data stores.
The hybrid earns its complexity when agents are long-running, span multiple sessions, require semantic recall over deep history, and the system operates at moderate-to-high scale where both latency and recall quality are material constraints.
| Criteria | Redis | Postgres | Hybrid |
|---|---|---|---|
| Hot read latency | <1ms p50 | ~4ms p50 | <1ms p50 |
| Durability | Configurable: RDB (snapshot loss window), AOF fsync=always (near-durable, higher write cost), or no persistence | ACID | ACID for long-term |
| Precision@5 (500 turns) | <70% | >94% | >92% |
| Operational cost | Low | Low | Medium |
| Best agent type | Short-lived, fast | Auditable, low-concurrency | Long-running, multi-session |
Key Takeaways
The context window is not memory. Treating it as such guarantees broken continuity and wasted compute. Redis is a scratchpad; Postgres is a brain. Production agent systems increasingly need both. The hybrid write-through pattern with async consolidation is a common approach for agent state management at scale, though the right architecture depends on your workload profile. Teams forced to pick one store should start with Postgres: durability and recall quality are harder to bolt on after the fact than low-latency reads.
