Context Engineering: The New Prompt Engineering


Static prompts were enough when tasks were single-turn and single-document. Once AI agents must operate across files, ingest documentation, reason about recent code changes, and execute multi-step plans, traditional prompt engineering collapses under its own weight. This tutorial builds a Node.js context engine that intelligently assembles files, documentation, and git diffs for an AI agent, plus a React frontend that makes the assembled context transparent and debuggable.
Table of Contents
- Why Prompt Engineering Hit a Wall
- Context Engineering vs. Prompt Engineering: What Actually Changed
- Anatomy of a Context Payload
- Building a Context Engine in Node.js
- Integrating Context with an AI Agent (React + Node.js)
- Lessons from the Agent-Skills-for-Context-Engineering Repo
- Implementation Checklist: Your Context Engineering Playbook
- Common Pitfalls and How to Avoid Them
- What Matters Now: Assembling Context, Not Crafting Prompts
Why Prompt Engineering Hit a Wall
Static prompts served developers well when the task was single-turn and single-document: classify this text, summarize that paragraph, generate a React component from a description. But the moment AI agents need to operate across files, ingest documentation, reason about recent code changes, and execute multi-step plans, prompt engineering as traditionally practiced collapses under its own weight. Token windows have grown from 4K to 128K tokens or more, yet larger windows do not solve the fundamental problem. Dumping an entire codebase into a prompt introduces noise that degrades accuracy, wastes tokens on irrelevant content, and makes agent behavior unpredictable.
This is where context engineering enters the picture. Context engineering is the practice of dynamically curating, filtering, ranking, and assembling the right information at the right time for an AI agent. Rather than handcrafting a static prompt template, developers build systems that programmatically decide which files, docs, git diffs, and runtime state belong in a given context window, and which do not. Community repositories such as Agent-Skills-for-Context-Engineering on GitHub have helped codify patterns for skill-based context routing that treat context assembly as a first-class engineering concern.
This tutorial walks through building a Node.js context engine that intelligently assembles files, documentation, and git diffs for an AI agent, with a React frontend that makes the assembled context transparent and debuggable.
Context Engineering vs. Prompt Engineering: What Actually Changed
The Prompt Engineering Mental Model
Prompt engineering centers on crafting static templates: system prompts with persona instructions, few-shot examples to steer output format, chain-of-thought directives to improve reasoning. These techniques work well for single-turn, narrow tasks. A classification prompt with five examples and a clear instruction can achieve strong results. A chain-of-thought prompt can help a model reason through a math problem.
But the model breaks down when agents operate across multiple files, invoke tools, and execute multi-step plans. A static template cannot anticipate which of 200 source files matter for the current task. Few-shot examples cannot encode the state of a git repository, and chain-of-thought instructions add nothing when the agent lacks the relevant code to reason about in the first place.
The Context Engineering Mental Model
Context engineering treats the prompt not as a string to be written but as a payload to be assembled. The context is dynamic, constructed at runtime based on the current task, the state of the codebase, and the available knowledge base. Four operational pillars define the discipline: retrieval (finding relevant information), filtering (excluding noise), ranking (ordering by relevance), and compression (fitting within token limits). These pillars describe the operations the engine performs; the payload itself is organized into layers (instruction, knowledge, state, task) described in the next section. Give a perfectly prompted agent irrelevant context, and it will produce irrelevant output. The context window constrains the ceiling on agent performance more than the prompt wording does.
When to Use Which
Use static prompts for single LLM calls where the input is known at design time: UI copy generation, text classification, one-shot code generation. Switch to context engineering once the call requires runtime data beyond the user's input, such as repository state, documentation lookups, or multi-step tool use. The distinction applies most directly to agentic workflows where the set of relevant information is not known until execution begins.
Anatomy of a Context Payload
The Four Layers of Agent Context
Separate the context payload into four layers. The instruction layer carries the system prompt, persona definition, and behavioral constraints. The knowledge layer includes documentation, READMEs, API references, and any static information the agent may need. The state layer contains git diffs, file trees, runtime variables, and conversation history, the dynamic information that changes between invocations. The task layer holds the current user request and any decomposed sub-tasks.
Token Budgeting
Each layer competes for space within the token limit. Allocating token budgets per layer prevents any single category from monopolizing the window. The state layer is where most developers waste tokens, including entire file contents when only a diff or a function signature would suffice. A disciplined budget forces prioritization.
/**
* Context payload schema with token budget annotations.
* This is a documentation schema illustrating the structural contract
* for the context engine — not runtime-validated code.
*
* This tutorial uses a 4,000-token budget for simplicity. Production
* deployments should scale maxTokens to match the target model's
* context window (e.g., 128K for GPT-4o).
*/
const ContextPayload = {
instruction: {
content: '', // System prompt, persona, constraints
maxTokens: 600, // ~15% of a 4K budget
},
knowledge: {
items: [], // { path, content, tokens } — docs, READMEs, API refs
maxTokens: 1000, // ~25% of budget
},
state: {
diffs: [], // { file, hunks, tokens } — git diffs
files: [], // { path, content, tokens } — current file contents
variables: {}, // Runtime state, conversation history
maxTokens: 1600, // ~40% of budget
},
task: {
userRequest: '', // Original user prompt
subTasks: [], // Decomposed steps
maxTokens: 800, // ~20% of budget
},
};
The percentages above (15/25/40/20) serve as an illustrative starting allocation, not prescriptive guidance. The state layer receives the largest share because code agents depend most heavily on current repository state. These ratios should be adjusted based on task type, a pattern explored later in the skill-based routing discussion.
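One way to make those ratios task-aware is a small preset table. The task-type names and shares below are illustrative assumptions, not part of the engine built in this tutorial:

```javascript
// Hypothetical per-task-type budget presets. The ratios are
// illustrative starting points, not prescriptive guidance; each
// preset's shares sum to 1.
const BUDGET_PRESETS = {
  refactor: { instruction: 0.15, knowledge: 0.15, state: 0.5, task: 0.2 },
  docs: { instruction: 0.15, knowledge: 0.45, state: 0.2, task: 0.2 },
  default: { instruction: 0.15, knowledge: 0.25, state: 0.4, task: 0.2 },
};

// Converts a preset's fractional shares into absolute token budgets.
function allocateBudget(taskType, maxTokens) {
  const preset = BUDGET_PRESETS[taskType] || BUDGET_PRESETS.default;
  return Object.fromEntries(
    Object.entries(preset).map(([layer, share]) => [layer, Math.floor(maxTokens * share)]),
  );
}
```

With a 4,000-token budget, allocateBudget('refactor', 4000) gives the state layer 2,000 tokens, while the docs preset shifts that weight toward the knowledge layer.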
Building a Context Engine in Node.js
Prerequisites
Before starting, ensure the following:
- Node.js 18 LTS or later. Verify with node --version. The tiktoken WASM bindings and glob@10 require Node.js ≥16; Node.js 18 LTS is recommended.
- Git installed and accessible in your PATH.
- OPENAI_API_KEY set in your environment. Export it directly (export OPENAI_API_KEY=sk-...) or load it from a .env file via the dotenv package before starting the server.
- A git repository with at least 3 commits (for the default diff range). Newly initialized or shallow-cloned repos need the commit-count guard shown in diffs.js.
- A docs/ directory containing at least one .md file (for the knowledge layer glob). Adjust the glob pattern if your project uses a different structure.
- A src/ directory with JS/TS files (for the default filePatterns).
Project Setup and Dependencies
The context engine relies on six packages: express and cors for the HTTP server, and simple-git, glob, tiktoken, and openai for core context engineering operations.
{
"name": "context-engine",
"version": "1.0.0",
"type": "module",
"dependencies": {
"cors": "^2.8.5",
"express": "^4.18.2",
"glob": "^10.3.10",
"openai": "^4.52.0",
"simple-git": "^3.25.0",
"tiktoken": "^1.0.15"
},
"scripts": {
"start": "node src/server.js"
}
}
The src/ directory follows a simple structure: src/encoder.js for the shared tiktoken singleton, src/harvest.js for file harvesting, src/diffs.js for git diff extraction, src/assemble.js for context assembly, and src/server.js for the Express endpoint.
Shared Encoder Singleton
All modules that perform token counting share a single tiktoken encoder instance. This avoids allocating multiple WASM heaps and registering multiple process.on('exit') handlers.
// src/encoder.js
import { get_encoding } from 'tiktoken';
const encoder = get_encoding('cl100k_base');
// Register exactly once for the entire process
process.on('exit', () => encoder.free());
/**
* Counts tokens in a string, properly freeing the WASM-allocated
* Uint32Array returned by encoder.encode().
*/
export function countTokens(text) {
const arr = encoder.encode(text);
const length = arr.length;
arr.free();
return length;
}
export { encoder };
Harvesting File Context with Glob Patterns
Rather than including every file in the repository, the context engine selects files based on glob patterns scoped to the current task. A refactoring task targeting authentication middleware needs src/middleware/auth.* and related test files, not the entire component library.
// src/harvest.js
import { glob } from 'glob';
import { readFile } from 'fs/promises';
import path from 'path';
import { countTokens } from './encoder.js';
/**
* Harvests file contents matching glob patterns.
* Returns an array of { path, content, tokens } objects.
*/
export async function harvestFiles(patterns, options = {}) {
const { maxFileTokens = 500, cwd = process.cwd() } = options;
const paths = await glob(patterns, { cwd, nodir: true, absolute: true });
const results = [];
for (const filePath of paths) {
let content;
try {
content = await readFile(filePath, 'utf-8');
} catch (err) {
// Skip files that cannot be read (permissions, broken symlinks, etc.)
continue;
}
const tokens = countTokens(content);
// Skip files exceeding per-file token cap to prevent any
// single file from dominating the context budget
if (tokens <= maxFileTokens) {
results.push({
path: path.relative(cwd, filePath),
content,
tokens,
});
}
}
return results;
}
The maxFileTokens parameter acts as a per-file cap. The engine excludes files exceeding this limit entirely, a deliberate choice that forces the developer to either raise the cap for known large files or rely on summarization strategies.
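The summarization strategies mentioned above can start as simple truncation. The helper below is a sketch, not part of the engine: it approximates tokens at roughly four characters each instead of calling the tokenizer, and marks the cut explicitly.

```javascript
// Sketch: cap oversized content at an approximate token budget using
// the rough four-characters-per-token heuristic. The marker keeps the
// truncation visible to the model and in the context inspector.
function truncateToBudget(content, maxTokens, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken;
  if (content.length <= maxChars) return content;
  return content.slice(0, maxChars) + '\n/* truncated to fit token budget */';
}
```

A caller could use this in place of the tokens <= maxFileTokens check when dropping a file entirely is too aggressive for the task at hand.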
Note on tiktoken: The get_encoding('cl100k_base') call returns the encoder used by GPT-4 and GPT-4o. The encoder object allocates WASM memory that must be released via .free() — the shared encoder.js module handles this with a single process.on('exit', ...) handler. The Uint32Array returned by each encoder.encode() call must also be freed after use to prevent WASM heap leaks; the countTokens helper handles this automatically.
Extracting Git Diffs as State Context
Git diffs are the highest-signal context for most code agent tasks. They capture what has changed recently, what is staged for commit, and what the developer is actively working on. For code review, refactoring, and bug fixing, this signal outweighs static file contents.
// src/diffs.js
import simpleGit from 'simple-git';
import { encoder, countTokens } from './encoder.js';
/**
* Extracts staged diffs and recent commit diffs.
* Returns structured objects with file, hunks, and token count.
*
* Note: The same file may appear in both staged and recent results.
* Downstream consumers should deduplicate by file name, preferring
* the staged version.
*/
export async function getRelevantDiffs(options = {}) {
const { cwd = process.cwd(), commitRange } = options;
const git = simpleGit(cwd, { timeout: { block: 5000 } });
// Guard against repos with fewer than 3 commits
let effectiveRange = commitRange;
if (!effectiveRange) {
const countRaw = await git.raw(['rev-list', '--count', 'HEAD']).catch(() => '0');
const count = parseInt(countRaw.trim(), 10);
if (count >= 3) {
effectiveRange = 'HEAD~3..HEAD';
} else if (count >= 1) {
effectiveRange = 'HEAD~1..HEAD';
} else {
effectiveRange = null;
}
}
const staged = await git.diff(['--staged']);
const recent = effectiveRange ? await git.diff([effectiveRange]) : '';
function parseDiffOutput(raw, source) {
if (!raw) return [];
return raw.split('diff --git').filter(Boolean).map((chunk) => {
// Greedy match handles spaces in filenames; trim handles trailing whitespace
const fileMatch = chunk.match(/a\/(.+) b\//);
const file = fileMatch ? fileMatch[1].trim() : 'unknown';
const tokens = countTokens(chunk);
return { file, hunks: chunk.trim(), tokens, source };
});
}
return [
...parseDiffOutput(staged, 'staged'),
...parseDiffOutput(recent, 'recent'),
];
}
The function distinguishes between staged and recent diffs via the source field. During ranking, weight staged changes higher than recent commit history: as a heuristic, they signal immediate intent (accidentally staged files being the rare exception).
Important: getRelevantDiffs defaults cwd to process.cwd(), meaning it runs git operations against whatever directory the server process is started from. Always start the server from the root of the target repository, or pass cwd explicitly.
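The docblock in diffs.js asks downstream consumers to deduplicate by file name, preferring the staged version. Because getRelevantDiffs emits staged entries before recent ones, keeping the first occurrence per file is enough; a minimal sketch:

```javascript
// Deduplicates diff entries by file name. Staged entries come first in
// the array returned by getRelevantDiffs, so keeping the first
// occurrence per file automatically prefers the staged version.
function dedupeDiffs(diffs) {
  const seen = new Set();
  return diffs.filter((diff) => {
    if (seen.has(diff.file)) return false;
    seen.add(diff.file);
    return true;
  });
}
```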
Assembling and Compressing the Payload
The assembler combines all four layers, scores each item by relevance, and fills the token budget using a greedy approach. The assembler tracks items that do not fit in an exclusion manifest, which is critical for debugging and the transparency UI built later.
// src/assemble.js
import { countTokens } from './encoder.js';
/**
* Scores a context item against task keywords.
* Returns a 0–1 relevance score.
*
* This is a simplified heuristic — it splits the user request on
* whitespace and filters to words longer than 2 characters. For
* production use, consider adding stopword removal or TF-IDF weighting.
*/
function scoreRelevance(item, taskKeywords) {
const text = (item.path || '') + ' ' + (item.content || item.hunks || '');
const lower = text.toLowerCase();
let matches = 0;
for (const kw of taskKeywords) {
if (lower.includes(kw.toLowerCase())) matches++;
}
// Boost staged diffs — they represent active intent
const sourceBoost = item.source === 'staged' ? 0.2 : 0;
return Math.min((matches / Math.max(taskKeywords.length, 1)) + sourceBoost, 1);
}
/**
* Assembles the final context string within a token budget.
* Returns { context, manifest } where manifest tracks included/excluded items.
*/
export function assembleContext({ instruction, knowledge, state, task, maxTokens = 4000 }) {
const taskKeywords = task.userRequest.split(/\s+/).filter((w) => w.length > 2);
const allItems = [
...knowledge.items.map((i) => ({ ...i, layer: 'knowledge' })),
...state.files.map((i) => ({ ...i, layer: 'state-file' })),
...state.diffs.map((i) => ({ ...i, layer: 'state-diff' })),
];
// Score and sort descending by relevance
allItems.forEach((item) => { item.score = scoreRelevance(item, taskKeywords); });
allItems.sort((a, b) => b.score - a.score);
const instructionTokens = countTokens(instruction.content);
const taskTokens = countTokens(task.userRequest);
let remaining = maxTokens - instructionTokens - taskTokens;
if (remaining < 0) {
throw new Error(
`Instruction (${instructionTokens}t) + task (${taskTokens}t) exceed maxTokens (${maxTokens}).`
);
}
const included = [];
const excluded = [];
for (const item of allItems) {
if (item.tokens <= remaining) {
included.push(item);
remaining -= item.tokens;
} else {
excluded.push({ ...item, reason: 'token_budget_exceeded' });
}
}
// Build context body (knowledge + state + task) separately from instruction
const contextBodyParts = [
...included.map((i) => `[${i.layer.toUpperCase()}: ${i.path || i.file}]\n${i.content || i.hunks}`),
`[TASK]\n${task.userRequest}`,
];
return {
context: contextBodyParts.join('\n\n'),
instruction: instruction.content,
manifest: { included, excluded, totalTokens: maxTokens - remaining },
};
}
The scoring function is intentionally simple: keyword overlap plus a boost for staged diffs. More sophisticated approaches, such as embedding-based similarity, can replace it. Benchmark both for your specific workload before committing to either approach.
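The docblock on scoreRelevance suggests stopword removal as a first upgrade to keyword extraction. A sketch, with a deliberately tiny stopword list (a real one would be much larger):

```javascript
// Sketch: keyword extraction with stopword removal. The stopword set
// here is a small illustrative sample, not an exhaustive list.
const STOPWORDS = new Set(['the', 'and', 'for', 'with', 'that', 'use', 'this', 'into']);

function extractKeywords(userRequest) {
  return userRequest
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 2 && !STOPWORDS.has(word));
}
```

For the request "Refactor the auth middleware to use JWT", this yields refactor, auth, middleware, and jwt, dropping the filler words that the plain whitespace-split version would have scored against.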
Integrating Context with an AI Agent (React + Node.js)
Server-Side Agent Endpoint
The Express endpoint ties the context engine together. It receives a user task, runs the harvesting and assembly pipeline, and streams the OpenAI response back to the client.
Security note: The endpoint below is a tutorial scaffold. Before deploying to any shared or production environment, add request authentication, validate and allowlist filePatterns against permitted base directories, and add rate limiting to control API costs.
// src/server.js
import express from 'express';
import cors from 'cors';
import path from 'path';
import OpenAI from 'openai';
import { harvestFiles } from './harvest.js';
import { getRelevantDiffs } from './diffs.js';
import { assembleContext } from './assemble.js';
const app = express();
app.use(cors({
origin: process.env.ALLOWED_ORIGIN || 'http://localhost:5173',
methods: ['POST'],
}));
app.use(express.json({ limit: '10kb' }));
if (!process.env.OPENAI_API_KEY) {
console.error('ERROR: OPENAI_API_KEY environment variable is not set.');
process.exit(1);
}
const openai = new OpenAI();
const REPO_ROOT = process.cwd();
const ALLOWED_PATTERN = /^[a-zA-Z0-9_\-\/.*{}!,]+$/;
function validatePatterns(patterns) {
if (!Array.isArray(patterns)) return false;
return patterns.every((p) => {
if (!ALLOWED_PATTERN.test(p)) return false;
// Resolve a sample path to confirm it stays within REPO_ROOT
const resolved = path.resolve(REPO_ROOT, p.replace(/[*{}!,]/g, ''));
return resolved.startsWith(REPO_ROOT);
});
}
app.post('/api/agent', async (req, res) => {
const { userRequest, filePatterns = ['src/**/*.{js,ts,jsx,tsx}'] } = req.body;
if (!userRequest || typeof userRequest !== 'string') {
return res.status(400).json({ error: 'userRequest is required and must be a string.' });
}
if (userRequest.length > 2000) {
return res.status(400).json({ error: 'userRequest exceeds maximum length.' });
}
if (!validatePatterns(filePatterns)) {
return res.status(400).json({ error: 'Invalid filePatterns.' });
}
const instruction = { content: 'You are a senior engineer. Refactor code precisely. Explain changes.' };
const knowledge = { items: await harvestFiles(['docs/**/*.md'], { maxFileTokens: 300 }) };
const files = await harvestFiles(filePatterns, { maxFileTokens: 500 });
const diffs = await getRelevantDiffs();
const state = { files, diffs, variables: {} };
const task = { userRequest, subTasks: [] };
const { context, instruction: systemPrompt, manifest } = assembleContext({
instruction, knowledge, state, task, maxTokens: 4000,
});
res.setHeader('Content-Type', 'text/event-stream');
// Send manifest as the first SSE event instead of an HTTP header
// (HTTP headers have size limits — typically 8–16KB — that large
// manifests would exceed, causing 502 errors or silent truncation).
res.write('data: ' + JSON.stringify({ type: 'manifest', payload: manifest }) + '\n\n');
try {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: context },
],
stream: true,
});
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || '';
if (text) res.write(`data: ${JSON.stringify({ type: 'chunk', text })}\n\n`);
}
} catch (err) {
console.error('OpenAI stream error:', err.message);
res.write(`data: ${JSON.stringify({ type: 'error', message: 'Stream failed.' })}\n\n`);
} finally {
res.end();
}
});
app.listen(3001, () => console.log('Context engine running on :3001'));
The server sends the manifest as the first SSE event in the response stream, so the frontend can display exactly what the agent received and what was excluded. In production, log this manifest for observability.
React Frontend for Context Visibility
Transparency in what an agent sees is critical for trust and debugging. A simple React component renders the manifest, showing included files, per-layer token usage, and excluded items with their exclusion reasons.
To set up the frontend:
- Scaffold a React app: npm create vite@latest context-ui -- --template react, then cd context-ui && npm install.
- The app connects to the Express server at http://localhost:3001/api/agent via fetch with SSE streaming. The server already includes CORS middleware.
- Parse each SSE data: line: events with type: 'manifest' provide the manifest object; events with type: 'chunk' provide streamed response text.
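That parsing step is easy to get subtly wrong, so it helps to isolate it. The helper below is a sketch that assumes the data: {json} framing emitted by the server in this tutorial:

```javascript
// Parses one SSE line into an event object, or returns null for blank
// lines, comments, and malformed JSON. Assumes the server's
// 'data: {json}' framing.
function parseSseLine(line) {
  if (!line.startsWith('data: ')) return null;
  try {
    return JSON.parse(line.slice('data: '.length));
  } catch {
    return null;
  }
}
```

Each line read from the fetch response's stream goes through this helper, and the component that follows dispatches on the returned type field ('manifest', 'chunk', or 'error').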
import { useState } from 'react';
export function ContextInspector({ manifest }) {
const [showExcluded, setShowExcluded] = useState(false);
if (!manifest) return null;
const layerTokens = manifest.included.reduce((acc, item) => {
acc[item.layer] = (acc[item.layer] || 0) + item.tokens;
return acc;
}, {});
return (
<div className="context-inspector">
<h3>Context Manifest — {manifest.totalTokens} tokens used</h3>
<ul>
{Object.entries(layerTokens).map(([layer, tokens]) => (
<li key={layer}><strong>{layer}:</strong> {tokens} tokens</li>
))}
</ul>
<h4>Included ({manifest.included.length} items)</h4>
<ul>
{manifest.included.map((item) => (
<li key={item.path || item.file || item.layer + item.tokens}>
{item.path || item.file} — {item.tokens}t (score: {item.score.toFixed(2)})
</li>
))}
</ul>
<button onClick={() => setShowExcluded(!showExcluded)}>
{showExcluded ? 'Hide' : 'Show'} Excluded ({manifest.excluded.length})
</button>
{showExcluded && (
<ul>
{manifest.excluded.map((item) => (
<li key={item.path || item.file || item.layer + item.tokens}>
{item.path || item.file} — {item.reason}
</li>
))}
</ul>
)}
</div>
);
}
End-to-End Flow Walkthrough
When a user submits "Refactor the auth middleware to use JWT," the context engine matches glob patterns to select src/middleware/auth.js and related files, pulls recent auth-related diffs from the last three commits, and scores all items against the keywords "refactor," "auth," "middleware," and "JWT." The assembler fills the 4,000-token budget with the highest-scoring items. For a 50-file repo averaging 200 tokens per file, a naive full-repo dump would produce roughly 10,000 tokens. The curated context delivers around 4K tokens of focused, relevant material instead. The response is grounded in actual code, references real file paths, and proposes actionable changes.
Lessons from the Agent-Skills-for-Context-Engineering Repo
Key Patterns Worth Adopting
The Agent-Skills-for-Context-Engineering repository codifies several patterns that reduce hallucinated file paths and improve the accuracy of generated code.
Different task types require fundamentally different context strategies. A documentation task needs README files and API references; a bug fix needs stack traces and recent diffs. Routing context assembly based on task classification lets each task type receive only the context it needs.
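A minimal sketch of that routing idea follows; the task labels, regexes, and glob patterns are illustrative assumptions, and a production classifier might be an LLM call rather than keyword regexes:

```javascript
// Hypothetical routing table: each task type gets its own glob
// patterns and a flag for whether git diffs belong in its context.
const STRATEGIES = {
  docs: { patterns: ['README.md', 'docs/**/*.md'], includeDiffs: false },
  bugfix: { patterns: ['src/**/*.{js,ts}'], includeDiffs: true },
  default: { patterns: ['src/**/*.{js,ts}'], includeDiffs: true },
};

// Naive keyword classifier; swap in an LLM-based classifier for
// production use.
function routeTask(userRequest) {
  const text = userRequest.toLowerCase();
  if (/readme|documentation|\bdocs?\b/.test(text)) return STRATEGIES.docs;
  if (/\bbug\b|\bfix\b|\berror\b|\bcrash\b/.test(text)) return STRATEGIES.bugfix;
  return STRATEGIES.default;
}
```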
When an agent decomposes a task into sub-tasks that share overlapping context, re-reading the same files and re-counting tokens wastes time. Implement this as an in-memory or filesystem cache keyed by file path and last-modified timestamp to avoid redundant work.
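A sketch of that cache, keyed by path plus last-modified time so that editing a file changes the key and stale entries are simply never hit again. The computeFn parameter is a stand-in for whatever expensive work (reading, token counting) the caller wants memoized:

```javascript
import { stat } from 'fs/promises';

// In-memory cache keyed by file path and mtime. Long-lived processes
// should periodically evict entries, since superseded keys linger.
const contextCache = new Map();

async function cachedCompute(filePath, computeFn) {
  const { mtimeMs } = await stat(filePath);
  const key = `${filePath}:${mtimeMs}`;
  if (!contextCache.has(key)) {
    contextCache.set(key, await computeFn(filePath));
  }
  return contextCache.get(key);
}
```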
Large files benefit from hierarchical summarization: summarize them to fit the budget, while including focal files (those directly referenced by the task) at full resolution. This preserves detail where it matters and keeps the budget under control everywhere else.
Patterns to Approach with Caution
Two patterns from the broader context engineering ecosystem deserve caution.
Many teams assume embedding-based retrieval outperforms simpler methods for code, but for some code-centric tasks, keyword matching and file path matching can be competitive. Benchmark both approaches for your specific workload before committing. Embeddings excel at semantic similarity in natural language but can miss structural relationships in code, such as two files with no shared vocabulary but connected by an import chain or a call hierarchy.
Context window stuffing, including extra files "just in case" the agent might need them, wastes tokens and introduces noise that distracts the model from the actual task.
Implementation Checklist: Your Context Engineering Playbook
- ☐ Define your context payload schema (instruction, knowledge, state, task)
- ☐ Set per-layer token budgets (e.g., 15% instruction, 25% knowledge, 40% state, 20% task; adjust per task type)
- ☐ Implement file harvesting with glob patterns scoped to task type
- ☐ Implement git diff extraction (staged + recent commits)
- ☐ Build a relevance scoring function (keyword match, file proximity, recency)
- ☐ Implement greedy token-budget assembler with manifest output
- ☐ Add a context inspector UI for debugging and transparency
- ☐ Cache context for repeated sub-tasks within a session
- ☐ Log included/excluded items for every agent call (observability)
- ☐ Benchmark: compare agent output with full-repo context vs. curated context
- ☐ Iterate: review excluded items that should have been included (false negatives)
- ☐ Add skill-based routing so different task types trigger different context strategies
Items 1 through 6 form the minimum viable context engine. Items 7 through 12 address the operational concerns that surface once agents handle real workloads.
Common Pitfalls and How to Avoid Them
Pitfall 1: Treating All Files as Equal
Including every matched file with equal priority floods the context window with low-value content. Score each file by relevance to the current task keywords, recency of modification, and proximity to recently changed files. A utility file last modified six months ago should not compete equally with the middleware file the developer just edited.
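Recency can be folded into the score as an exponential decay added to the keyword-overlap score from assemble.js. The 0.3 weight and seven-day half-life below are illustrative assumptions:

```javascript
// Sketch: exponential recency boost. A file modified right now earns
// the full weight; one modified halfLifeDays ago earns half, and so
// on. Add the result to the keyword-overlap score before sorting.
function recencyBoost(mtimeMs, nowMs = Date.now(), halfLifeDays = 7, weight = 0.3) {
  const ageDays = (nowMs - mtimeMs) / 86_400_000;
  return weight * Math.pow(0.5, ageDays / halfLifeDays);
}
```

Under these constants, the six-month-old utility file contributes essentially nothing, while the file edited minutes ago gets nearly the full boost.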
Pitfall 2: Ignoring Token Counting Until Production
Many developers discover token limits only when the API rejects an oversized request. Count tokens at assembly time using a library like tiktoken that matches the target model's tokenizer. This makes budget allocation deterministic rather than reactive and prevents runtime failures.
Pitfall 3: Static Context Strategies for Dynamic Tasks
Applying the same glob patterns and the same token budgets to every task type produces mediocre results across the board. Route context assembly based on task classification, the skill-based routing pattern from the Agent-Skills-for-Context-Engineering repo, so each task type receives a context strategy optimized for its specific needs.
What Matters Now: Assembling Context, Not Crafting Prompts
Prompt engineering taught developers how to talk to models. Context engineering teaches them what to show models. This distinction directly determines whether an agent produces a useful code refactoring or a hallucinated mess. The techniques covered here (structured payloads, token budgeting, file harvesting, diff extraction, greedy assembly, and manifest-driven transparency) scale from local developer tools to production agent platforms.
Start with the implementation checklist above and build out the Node.js context engine. Measure the difference in agent output between full-repo context dumps and curated context payloads; the improvement becomes obvious quickly. The Agent-Skills-for-Context-Engineering repository was, at the time of writing, an actively maintained resource for patterns beyond what this tutorial covers, particularly around advanced skill routing and multi-agent context coordination.