Vibe Coding: The Controversial New Way to Build Software with AI


What Is Vibe Coding?
Vibe coding is an AI-assisted development workflow where the developer describes desired functionality in natural language, an AI model generates the implementation code, and the developer iterates by running the output and feeding errors back as prompts. Unlike traditional autocomplete tools, the AI leads implementation while the human leads intent, review, and architectural decisions.
Vibe coding has become one of the most polarizing terms in software development since Andrej Karpathy coined it in February 2025. The concept is straightforward: describe what you want in natural language, let an AI generate the code, accept the output, run it, and iterate by feel. No line-by-line authorship. No deep reading of every function. Just vibes. For some engineers, this represents a genuine productivity breakthrough. For others, it is a recipe for unmaintainable, insecure software shipped by people who never understood what they built. The truth, as usual, sits somewhere in the middle, and the specifics matter far more than the hot takes.
Table of Contents
- What Is Vibe Coding?
- The Tools Powering Vibe Coding
- Vibe Coding in Practice: A Hands-On Walkthrough
- When Vibe Coding Works and When It Doesn't
- Best Practices for Responsible Vibe Coding
- The Bigger Picture: What Vibe Coding Means for Developers
What Is Vibe Coding?
Origin of the Term
Karpathy, the former head of AI at Tesla and a founding researcher at OpenAI, posted about vibe coding in early February 2025. His description was deliberately informal: he talked about surrendering to the vibes of the AI, accepting code suggestions wholesale, running the result, and pasting errors back into the prompt when things broke. The philosophy is less about carefully crafting software and more about steering an AI toward a working outcome through iterative natural language conversation.
This is fundamentally different from the AI-assisted coding most developers already use. GitHub Copilot's inline suggestions, for instance, operate as sophisticated autocomplete. The developer remains the author. Copilot fills in gaps. Vibe coding flips that relationship. The AI becomes the primary author, and the developer becomes the director, reviewer, and sometimes just the person pressing "accept."
The Spectrum of AI-Assisted Development
It helps to think of AI involvement in coding as a spectrum with roughly four levels.
At the first level, autocomplete, tools like GitHub Copilot predict the next few lines while the developer writes the code. The second level, chat-assisted coding, has the developer asking an AI to explain a function, suggest a refactor, or debug a specific error. The human still drives the architecture and flow.
Vibe coding sits at the third level. Here, the developer writes natural language prompts describing entire features. The AI generates multi-file changes. The developer reviews, runs, and iterates. The fourth level, fully autonomous agents, includes tools like Devin or OpenHands that attempt to complete entire tasks from issue to pull request with minimal human intervention. (Devin's benchmark claims have been subject to independent scrutiny; real-world performance varies significantly.)
Vibe coding is not autocomplete on steroids. It is a workflow where the AI leads implementation and the developer leads intent.
That distinction between levels matters. Vibe coding is not autocomplete on steroids. It is a workflow where the AI leads implementation and the developer leads intent. The human still needs to evaluate correctness, but the mechanics of writing code shift dramatically.
The Tools Powering Vibe Coding
Cursor IDE
Cursor is a VS Code fork rebuilt around native AI integration. It ships three primary interaction surfaces: Tab completion (similar to Copilot), Chat (for asking questions about code), and Composer. Composer is the primary vibe coding surface. It accepts a natural language prompt and generates multi-file edits in a single pass, creating new files, modifying existing ones, and wiring components together.
Cursor supports multiple model backends, including GPT-4o, Claude 3.5 Sonnet, Claude Sonnet 4, and custom model configurations. The choice of model produces noticeably different output. Many developers report that Claude models more consistently follow project import styles and naming conventions in web development tasks, while GPT-4o sometimes generates more concise but less convention-aware code. Evaluate both for your specific use case. Developers can switch models mid-session depending on the task.
Windsurf (formerly Codeium)
Windsurf takes a different architectural approach through what it calls Cascade, an agentic multi-step reasoning flow. Rather than generating a single batch of edits like Cursor's Composer, Cascade breaks a request into sub-tasks, reasons through dependencies, and executes steps sequentially. This makes Windsurf particularly strong at navigating large codebases with complex interdependencies, where understanding the ripple effects of a change matters.
The practical difference: Cursor's Composer tends to be faster for isolated feature generation, while Windsurf's Cascade more reliably catches cross-file import updates and generates related files (like database migrations alongside schema changes) that Composer sometimes misses. Neither tool is strictly superior. The choice depends on project structure and developer preference.
Other Notable Tools
The vibe coding ecosystem extends well beyond IDE-based tools. Replit Agent operates as a full-stack development agent within Replit's browser-based environment. Bolt.new and Lovable target rapid web app generation from prompts, skewing toward front-end-heavy applications. Claude Code CLI brings agentic coding to the terminal, letting developers issue natural language commands that result in file creation and modification; shell command execution is supported but requires user confirmation by default.
Each tool fills a different gap. Replit Agent works well for quick deployable prototypes, and Bolt.new excels at generating front-end scaffolding. Claude Code CLI appeals to developers who prefer terminal workflows and want fine-grained control over which commands execute. All of them share the core vibe coding pattern: natural language in, working code out.
Here is a representative prompt that could be issued to either Cursor Composer or Windsurf Cascade:
Prompt: "Create a REST API endpoint in Express that accepts a JSON
body with a user's name and email, validates both fields, stores
them in a SQLite database, and returns the created user with a 201 status."
Both tools generate complete, runnable endpoint code from this single prompt, though they differ in how they structure middleware, handle database initialization, and organize file output. Cursor tends to produce a more compact single-file result unless instructed otherwise, while Windsurf's Cascade more reliably scaffolds separate modules for database, routes, and validation.
Vibe Coding in Practice: A Hands-On Walkthrough
Prerequisites
This walkthrough assumes the following environment (behavior may differ on other versions):
- Node.js 20 LTS (ESM support requires ≥14.13; 20 LTS recommended)
- Express 4.18
- better-sqlite3 9.x (requires native compilation; on Linux you need python3, make, and g++; on Windows you need Visual Studio Build Tools)
- express-validator 7.x
- Vitest 1.x
- supertest (for HTTP assertions in tests)
- Git (for version control between AI-generated changes)
- Cursor IDE (or Windsurf/alternative; Cursor-specific instructions below assume ≥0.45)
Your package.json must include "type": "module" for ESM imports to work. Without it, Node treats .js files as CommonJS and the first import statement throws SyntaxError: Cannot use import statement outside a module.
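For reference, a minimal package.json satisfying these prerequisites might look like the following (the package name and version ranges are illustrative, not prescribed by the walkthrough):

```json
{
  "name": "task-manager-api",
  "type": "module",
  "scripts": {
    "test": "vitest run"
  },
  "dependencies": {
    "express": "^4.18.0",
    "better-sqlite3": "^9.0.0",
    "express-validator": "^7.0.0"
  },
  "devDependencies": {
    "vitest": "^1.0.0",
    "supertest": "^6.0.0"
  }
}
```

The "type": "module" line is the one that matters for the ESM imports used throughout the walkthrough; the rest mirrors the dependency list above.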
Project Structure
The walkthrough assumes the following directory layout. All import paths in the code snippets below are relative to this structure:
project-root/
├── package.json
├── .cursorrules
├── tasks.db (created automatically by better-sqlite3)
├── src/
│ ├── app.js (Express application setup)
│ ├── db.js (database connection)
│ └── routes/
│ └── tasks.js (task CRUD endpoints)
└── tests/
└── tasks.test.js (Vitest test suite)
Setting Up a Vibe Coding Session
Not every project benefits from vibe coding. The approach works best for well-understood, pattern-heavy applications: CRUD APIs, landing pages, CLI tools, internal dashboards. Projects requiring novel algorithmic work or deep domain-specific logic are poor fits.
Configuration matters. In Cursor, a .cursorrules file at the project root acts as a system prompt that persists across all interactions (note: Cursor 0.45+ introduced .cursor/rules as the preferred mechanism; .cursorrules is supported for backward compatibility). Windsurf uses .windsurfrules for equivalent functionality, and other tools have analogous system prompt mechanisms. This is where developers encode their standards, stack preferences, and constraints before the first real prompt:
You are a senior full-stack developer.
Tech stack: Node.js, Express, SQLite, vanilla HTML/CSS.
Always include input validation with express-validator.
Write tests with Vitest.
Use ESM imports. No TypeScript.
Ensure package.json includes "type": "module".
Always add authentication middleware to routes that modify data.
Never hardcode secrets. Use environment variables for all credentials.
This file does real work. Without it, AI-generated code drifts toward whatever patterns dominated the model's training data, often mixing CommonJS and ESM imports, introducing TypeScript in a JavaScript project, or choosing an unfamiliar testing framework. Establishing context upfront reduces the number of correction cycles downstream. Note the security-oriented rules at the end: without explicit instructions about authentication and secret handling, AI-generated code frequently omits both, and every endpoint produced in the session inherits that gap.
Building a Task Manager API Step by Step
A typical vibe coding session follows an incremental prompt sequence. For a task manager API, that sequence might look like this:
- "Scaffold the project structure with package.json, entry point, and folder organization."
- "Create a SQLite database connection and a tasks table with id, title, status, and created_at columns."
- "Create CRUD endpoints for tasks."
- "Add input validation and proper error handling to all endpoints."
- "Generate Vitest tests for all task endpoints."
The application entry point wires Express middleware and mounts the router. This file must register express.json() before the router, or req.body will be undefined on every POST request, causing silent null insertions into the database:
// src/app.js
import express from 'express';
import taskRouter from './routes/tasks.js';
const app = express();
app.use(express.json()); // REQUIRED: must precede router mount
app.use('/api', taskRouter);
export default app;
The router imports a shared database connection. The db.js module initializes SQLite with proper error handling, WAL mode for concurrent access, and a busy timeout to avoid SQLITE_BUSY errors under load. The database path is resolved relative to the module file rather than the process working directory, preventing accidental creation of a second database file when the server is started from a different directory:
// src/db.js
import Database from 'better-sqlite3';
import { fileURLToPath } from 'url';
import path from 'path';
const dbPath = path.resolve(
path.dirname(fileURLToPath(import.meta.url)),
'../tasks.db'
);
let db;
try {
db = new Database(dbPath);
db.pragma('journal_mode = WAL');
db.pragma('busy_timeout = 3000');
} catch (err) {
console.error(JSON.stringify({ event: 'db_init_failed', error: err.message }));
process.exit(1);
}
// Ensure the database is closed cleanly on shutdown
process.on('exit', () => db.close());
process.on('SIGTERM', () => process.exit(0));
export default db;
Without the try/catch, a missing native binding or unwritable file path causes an opaque crash at import time with no log context. Without WAL mode and busy_timeout, even two near-simultaneous write requests can trigger SQLITE_BUSY exceptions that surface as unhandled 500 errors.
Here is what Prompt 3 typically produces in Cursor with the .cursorrules file above (shown as a partial excerpt; the full generation includes GET, PUT, and DELETE endpoints following the same pattern). The version below includes corrections that a responsible review pass would catch: prepared statements hoisted to module scope for performance, a length cap on title to prevent storage-amplification attacks, extracted validation arrays for readability, structured error logging, and BigInt-safe serialization of the inserted row ID:
// src/routes/tasks.js
// AI-generated from Prompt 3, reviewed and corrected
import { Router } from 'express';
import { body, validationResult } from 'express-validator';
import db from '../db.js';
const router = Router();
const VALID_STATUSES = ['todo', 'in-progress', 'done'];
// Prepared statement hoisted to module scope: compiled once, reused per request.
// better-sqlite3 docs explicitly recommend this over preparing inside handlers.
const insertTask = db.prepare(
'INSERT INTO tasks (title, status) VALUES (?, ?)'
);
const taskValidators = [
body('title')
.trim() // sanitise first so whitespace-only titles fail notEmpty
.notEmpty()
.isLength({ max: 255 }), // prevent storage-amplification DoS
body('status').isIn(VALID_STATUSES),
];
router.post('/tasks', taskValidators, (req, res) => {
const errors = validationResult(req);
if (!errors.isEmpty()) {
console.error(JSON.stringify({ event: 'validation_failed', path: req.path, errors: errors.array() }));
return res.status(400).json({ errors: errors.array() });
}
const { title, status } = req.body;
const result = insertTask.run(title, status);
res.status(201).json({
id: Number(result.lastInsertRowid), // avoid BigInt serialisation TypeError
title,
status,
});
});
// GET, PUT, DELETE endpoints omitted for brevity — all generated in same pass
export default router;
Several things to note in this corrected version. The insertTask prepared statement is compiled once when the module loads rather than on every request. better-sqlite3 compiles SQL into a native prepared statement object, and reusing it avoids repeated compilation overhead under load. The title field has an .isLength({ max: 255 }) constraint; without it, an attacker could POST a multi-megabyte title string that gets stored verbatim, exhausting disk space. The Number() wrapper around result.lastInsertRowid prevents a TypeError: Do not know how to serialize a BigInt that JSON.stringify throws when the row ID exceeds Number.MAX_SAFE_INTEGER. Unlikely in a small app, but a latent crash in any long-lived database. Parameterized queries protect against SQL injection at the database layer, while express-validator provides defense-in-depth by constraining input before it reaches the query. Both layers matter.
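The BigInt pitfall is easy to reproduce in isolation. This standalone snippet needs no database; the literal simply stands in for an oversized lastInsertRowid:

```javascript
// A row ID beyond Number.MAX_SAFE_INTEGER arrives as a BigInt.
const rowid = 9007199254740993n; // stand-in for an oversized lastInsertRowid

// JSON.stringify has no BigInt serialiser and throws a TypeError.
let threw = false;
try {
  JSON.stringify({ id: rowid });
} catch (err) {
  threw = err instanceof TypeError; // "Do not know how to serialize a BigInt"
}

// Converting first avoids the crash. Precision above 2^53 is lost,
// which is an acceptable trade-off for auto-increment IDs in a small app.
const body = JSON.stringify({ id: Number(rowid) });
console.log(threw, body);
```

The try/catch here exists only to demonstrate the failure; in the route code above, the Number() conversion prevents the throw from ever happening.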
The iterative loop continues: run the server, test an endpoint manually or via generated tests, paste any errors back into the next prompt.
When the Vibe Breaks: Debugging AI Output
AI-generated code fails in a few predictable ways. Hallucinated imports are common: the AI references a module method that does not exist in the installed version, or invents a package name entirely. Deprecated API usage is another frequent issue, particularly with fast-moving libraries. A third pattern is silent misconfiguration, where the code runs without errors but behaves incorrectly because a default value or option flag does not match the developer's intent.
A concrete example: when generating SQLite interaction code, AI models sometimes generate db.run(sql, params, callback) using the callback-based pattern from the sqlite3 package. better-sqlite3's Database object has no run() method at all, so the call throws a TypeError at runtime. The correct form is db.prepare(sql).run(params), which executes synchronously and returns an info object instead of invoking a callback.
The fix prompt is straightforward: "The project uses better-sqlite3, which is synchronous. Replace any callback-style db.run() calls with db.prepare().run()." The AI corrects the code, and development continues. This "accept diff, run, paste error back" loop is the heartbeat of vibe coding. It works well for surface-level errors. It fails when the bug is a subtle logic issue that does not produce an obvious error message, which is precisely where experienced developers earn their keep.
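The shape difference can be illustrated without the native module. The stub below mimics only the call signatures involved; it is not the real better-sqlite3 API surface, just the shapes relevant to this failure mode:

```javascript
// Minimal stub of better-sqlite3's synchronous call shape (illustration only).
class StubStatement {
  constructor(sql) { this.sql = sql; }
  run(...params) {
    // Synchronous: returns an info object immediately — no callback is ever invoked.
    return { changes: 1, lastInsertRowid: 1 };
  }
}

const db = {
  prepare: (sql) => new StubStatement(sql),
  // Like the real better-sqlite3 Database, this object has no run() method,
  // so sqlite3-style db.run(sql, params, callback) fails with a TypeError.
};

// sqlite3-style (WRONG here): db.run('INSERT ...', params, cb) → TypeError
// better-sqlite3-style (correct): prepare, then run synchronously
const info = db
  .prepare('INSERT INTO tasks (title, status) VALUES (?, ?)')
  .run('Demo task', 'todo');
console.log(info.changes); // 1
```

Spotting this class of error quickly, rather than after a confusing runtime crash, is exactly the review skill the fix prompt relies on.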
Knowing when to stop prompting and start typing manually is a skill that comes only with experience. If the third correction prompt for the same issue produces another broken result, it is almost always faster to open the file and fix it by hand.
When Vibe Coding Works and When It Doesn't
Ideal Use Cases
Vibe coding delivers genuine value in scenarios where speed matters more than long-term maintainability: rapid prototyping, MVP development, internal tools, one-off scripts, hackathon projects, and proof-of-concept demos. It also works well as a learning accelerator. A developer unfamiliar with Express can prompt for a middleware chain, then read the generated output to understand registration order, error-handling flow, and how middleware passes control via next(). Front-end scaffolding and boilerplate generation are another strong fit, where the code is largely structural and the patterns are well-established.
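The middleware mechanics worth studying reduce to a few lines. This toy dispatcher (plain Node, not Express itself) mirrors how Express walks a middleware chain: each function either calls next() to pass control onward or ends the chain by responding:

```javascript
// Toy dispatcher illustrating Express-style middleware control flow.
function runChain(middlewares, req, res) {
  function dispatch(i) {
    if (i >= middlewares.length) return;
    // Each middleware receives a next() that advances to the following one.
    middlewares[i](req, res, () => dispatch(i + 1));
  }
  dispatch(0);
}

const log = [];
const chain = [
  (req, res, next) => { log.push('parse body'); req.body = { ok: true }; next(); },
  (req, res, next) => { log.push('auth check'); next(); },
  (req, res, next) => { log.push('handler'); res.sent = req.body; /* no next(): chain ends */ },
];

const res = {};
runChain(chain, {}, res);
console.log(log.join(' -> ')); // parse body -> auth check -> handler
```

Registration order determines execution order, which is why express.json() must be mounted before any router that reads req.body.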
Where Vibe Coding Falls Apart
Production systems with strict security, compliance, or performance requirements are poor candidates for vibe coding. The generated code may work, but working code and secure, performant, maintainable code meet different standards. Complex algorithmic logic, concurrency patterns, distributed systems coordination, and cryptographic implementations all require the kind of precise reasoning that current AI models handle unreliably.
Large existing codebases with deep domain context also resist vibe coding. Even with tools like Windsurf that emphasize context awareness, AI models struggle with the implicit knowledge embedded in mature projects: why the team chose a particular pattern, what edge cases a workaround addresses, which modules are fragile.
If a developer cannot review the generated code and assess its correctness, they should not ship it.
The most important constraint is evaluative: if a developer cannot review the generated code and assess its correctness, they should not ship it. This is the "if you can't review it, you can't ship it" rule, and it applies regardless of how the code was produced.
The Technical Debt Question
Simon Willison, co-creator of Django and creator of Datasette, and a prominent voice on AI-assisted development, has offered a useful framing: vibe coding is fine if the output is treated as disposable. The problems emerge when vibe-coded prototypes quietly become production systems maintained by people who never understood the generated code. That gap between "it works" and "I understand why it works" is where technical debt accumulates silently.
Best Practices for Responsible Vibe Coding
Establish Guardrails Before You Start
Create a .cursorrules file (or .cursor/rules on Cursor 0.45+) or equivalent system prompt first in any vibe coding session. Define the tech stack, coding conventions, security requirements, and testing expectations upfront. Include explicit instructions about authentication and secret handling; without them, every AI-generated endpoint will silently lack auth middleware. Use version control aggressively. Commit before each major AI-generated change so that rolling back a bad generation is trivial rather than painful.
Review, Test, Understand
Always read the generated diffs before accepting. This sounds obvious, but the velocity of vibe coding creates pressure to skip review, and that pressure is the source of most vibe coding failures. Have the AI generate tests, then verify the tests are meaningful. AI models frequently produce tests that pass but assert nothing useful:
// tests/tasks.test.js
import { describe, test, expect } from 'vitest';
import request from 'supertest';
import app from '../src/app.js';
// Meaningless test (AI-generated, passes but tests nothing useful)
test('should return response', async () => {
const res = await request(app).get('/api/tasks');
expect(res).toBeDefined();
});
// Meaningful test (after human refinement)
test('should return 201 and created task with valid input', async () => {
const res = await request(app)
.post('/api/tasks')
.send({ title: 'Write article', status: 'todo' });
expect(res.status).toBe(201);
expect(res.body).toMatchObject({ title: 'Write article', status: 'todo' });
expect(typeof res.body.id).toBe('number');
expect(res.body.id).toBeGreaterThan(0);
});
The first test will never catch a regression. The second one will. Generating tests is easy. Generating tests that actually validate behavior requires human judgment.
Know Your Exit Strategy
Identify the point where vibe coding velocity drops below manual coding velocity. This crossover point exists in every project, usually when the accumulated context exceeds what the AI model can reliably track, or when changes require understanding cross-cutting concerns the AI was never explicitly told about. Refactor AI-generated code into maintainable modules before building further. Document the architectural decisions you accepted from the AI, because six months later, nobody will remember why the code is structured the way it is.
The Bigger Picture: What Vibe Coding Means for Developers
A Shift in the Developer Skill Set
Vibe coding does not eliminate the need for programming knowledge. It shifts which skills matter most. Prompting effectively is a real skill, but it layers on top of, not in place of, understanding code. The developers who get the most from vibe coding are those who can evaluate output quickly, spot subtle errors, and know when the AI is confidently wrong. Code review, architecture, and system design become more important, not less, when an AI generates the implementation rather than a human writing it.
Junior developers should learn fundamentals before adopting vibe coding as a primary workflow. Using AI to generate code you cannot evaluate is not productivity. It is risk deferral.
Where This Is Headed
The trend so far suggests that IDE-integrated AI agents are becoming the default development environment. Cursor, Windsurf, and their competitors are converging on a model where the AI handles implementation and the developer handles intent, review, and architectural decisions. Today's vibe coding workflows will likely evolve into more structured agentic workflows, with better guardrails, richer context management, and more reliable output.
Vibe coding is a tool in the toolbox. One that compresses hours of scaffolding into minutes, but carries real constraints.
Vibe coding is a tool in the toolbox. One that compresses hours of scaffolding into minutes, but carries real constraints. Engineers who understand both its capabilities and its failure modes will build faster without building worse.