Getting Started with Chrome's window.ai Prompt API


How to Use the Chrome Prompt API
- Install Chrome 138+ (Dev or Canary channel) on a device with a supported GPU.
- Enable the optimization-guide-on-device-model and prompt-api-for-gemini-nano flags in chrome://flags.
- Confirm the Gemini Nano model download completed at chrome://on-device-internals.
- Check availability with LanguageModel.availability() and handle all four status values.
- Create a session via LanguageModel.create() with a system prompt and optional parameters.
- Send prompts using session.prompt() for full responses or session.promptStreaming() for real-time output.
- Monitor token usage with session.countPromptTokens() and session.tokensLeft before each call.
- Destroy the session in a finally block to free GPU and RAM resources.
The Chrome Prompt API gives frontend developers the ability to run LLM inference directly in the browser with no server infrastructure and no user data leaving the device. The API connects to Gemini Nano running on-device, offering a promise-based and streaming JavaScript interface for text generation. It ships in Chrome 138+ (Dev or Canary channel) behind a pair of feature flags, and it works right now.
This tutorial walks through everything needed to go from enabling the flags to building a working summarization tool.
Table of Contents
- What Is the Chrome Prompt API?
- Prerequisites and Environment Setup
- Checking Browser and Model Availability
- Creating a Language Model Session
- Sending Prompts and Handling Responses
- Tutorial: Build a Simple Summarization Tool
- Tips, Limitations, and Best Practices
- What's Next for Built-in AI in the Browser
What Is the Chrome Prompt API?
Chrome's Built-in AI initiative bundles several on-device AI capabilities directly into the browser. The Prompt API, sometimes referred to as the Language Model API, is the core text-generation interface within that initiative. It lives under the LanguageModel global object and provides a standardized way to send natural language prompts to Gemini Nano, a lightweight LLM that runs entirely on the user's machine. In code, reference it directly as LanguageModel (e.g., LanguageModel.create()), not via window.ai.LanguageModel.
Because inference happens on-device, there is no network round-trip for generation. Local development needs no API key, though Origin Trial users will manage a trial token instead. The model ships as part of Chrome's component updater infrastructure, and the API is available in Chrome 138 and later (Dev or Canary channel) behind the chrome://flags/#optimization-guide-on-device-model and chrome://flags/#prompt-api-for-gemini-nano flags.
Developers can also access it through an Origin Trial for broader testing. Check the Chrome Origin Trials page for current availability, and inject the token via <meta http-equiv="origin-trial" content="YOUR_TOKEN"> in your HTML. On the standards track, the API is a W3C Web Incubator Community Group (WICG) proposal, with the explicit goal of cross-browser standardization over time.
Prerequisites and Environment Setup
System Requirements
- Chrome 138 or newer in the Dev or Canary channel. These flags do not appear in Chrome Stable. Download from chrome.com/chrome/dev or google.com/chrome/canary.
- Windows, macOS, Linux, or ChromeOS. Support scope varies by release.
- The on-device model requires a supported GPU and enough RAM to hold the model in memory. Exact minimums are not published. Navigate to chrome://on-device-internals to check whether your device qualifies; if it does not, the availability check will return "unavailable".
- The model download is approximately 1.7 GB as of early Chrome 138 releases. Check chrome://on-device-internals for the current size in your version.
- A network connection is required for the initial model download. Subsequent inference is fully offline.
Enable the Required Chrome Flags
Open Chrome (Dev or Canary, version 138+) and navigate to chrome://flags. Search for and enable these two flags:
- optimization-guide-on-device-model — set to "Enabled BypassPerfRequirement" (this relaxes hardware checks during development).
- prompt-api-for-gemini-nano — set to "Enabled".
Relaunch Chrome when prompted.
Confirm the Model Is Downloaded
After enabling the flags, Chrome begins downloading Gemini Nano in the background. The download takes a few minutes depending on connection speed.
To verify progress, navigate to chrome://on-device-internals. On this page, look for the model status field. The status "Installed" or a displayed file path confirms the download succeeded. Alternatively, calling the availability check (covered next) can trigger the download if it has not started yet.
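If you prefer to trigger and observe the download programmatically, recent Chrome builds accept a monitor option on session creation that emits downloadprogress events. A hedged sketch — verify the exact event shape (here assumed to expose `loaded` as a 0–1 fraction) in your Chrome version:

```javascript
// Hedged sketch: create a session while reporting model download progress.
// Assumes the monitor option fires "downloadprogress" events whose
// `loaded` property is a fraction between 0 and 1 (recent Chrome builds).
async function createWithProgress(onProgress) {
  return LanguageModel.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        onProgress(Math.round(e.loaded * 100)); // percent complete
      });
    }
  });
}
```

Wiring onProgress to a status element explains to users why the very first session takes noticeably longer than subsequent ones.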
Checking Browser and Model Availability
Not every Chrome installation will have the model ready. The device might not meet hardware requirements, the download might still be in progress, or the flags might be disabled. Graceful degradation is essential.
The LanguageModel.availability() method returns one of four strings: "available" (model is ready), "downloadable" (model can be fetched but has not been yet), "downloading" (fetch in progress), or "unavailable" (not supported on this device/browser).
async function checkPromptAPI() {
  // 'self' refers to the global object (window in main thread, or the worker global in workers)
  if (!("LanguageModel" in self)) {
    return { status: "unavailable", message: "Prompt API not supported in this browser." };
  }

  const availability = await LanguageModel.availability();

  const messages = {
    available: "Model is ready. You can start prompting.",
    downloadable: "Model is available for download. Creating a session will trigger it.",
    downloading: "Model is currently downloading. Please wait.",
    unavailable: "Model is not available on this device."
  };

  if (!messages[availability]) {
    console.warn(`Unexpected availability status: "${availability}"`);
  }

  return { status: availability, message: messages[availability] || "Unknown status." };
}
This standalone snippet is illustrative — it demonstrates the availability check in isolation. The full application in the tutorial below integrates the same logic into its init() function, so there is no need to use both.
Checking for "LanguageModel" in self first prevents reference errors in browsers that have no awareness of the API at all.
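The same guard enables progressive enhancement: use the on-device model when present, and fall back to your own server otherwise. A minimal sketch, assuming a hypothetical /api/summarize endpoint on your backend (not part of the Prompt API):

```javascript
// Sketch: progressive enhancement around the feature check.
// "/api/summarize" is a hypothetical server endpoint, not part of the API.
async function summarizeAnywhere(text) {
  if ("LanguageModel" in globalThis) {
    // On-device path: no data leaves the machine
    const session = await LanguageModel.create({
      systemPrompt: "You are a helpful summarizer."
    });
    try {
      return await session.prompt(`Summarize this: ${text}`);
    } finally {
      session.destroy();
    }
  }
  // Server fallback for browsers without the API
  const res = await fetch("/api/summarize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text })
  });
  return (await res.json()).summary;
}
```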
Creating a Language Model Session
All interaction with the Prompt API flows through a session object, created via LanguageModel.create(). A session encapsulates the model configuration, conversation state, and token budget.
Key options passed to create():
- systemPrompt: sets the model's persona and behavioral constraints.
- temperature: controls randomness (lower values produce more focused, less varied output; higher values increase creativity).
- topK: limits token sampling to the top K candidates. These parameters use model defaults if omitted.
- signal: an AbortSignal for cancellation support (also accepted by promptStreaming() for per-call cancellation).
const session = await LanguageModel.create({
  systemPrompt: "You are a helpful summarizer. Respond in two sentences or fewer."
});
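When you do want to pin the sampling behavior, pass the optional parameters alongside the system prompt. The values below are illustrative, not recommendations; omit them to keep the model defaults:

```javascript
// Sketch: session creation with explicit sampling parameters and an
// AbortSignal. The temperature/topK values are illustrative only.
async function createFocusedSession(signal) {
  return LanguageModel.create({
    systemPrompt: "You are a terse technical assistant.",
    temperature: 0.3, // low randomness: focused, repeatable phrasing
    topK: 3,          // sample only from the 3 most likely tokens
    signal            // lets the caller cancel a slow creation
  });
}
```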
Understanding Token Limits
Every session exposes three properties for managing context: maxTokens (the total context window), tokensSoFar (tokens consumed by the system prompt and conversation history), and tokensLeft (remaining capacity). The context window varies by model version; read session.maxTokens at runtime for the authoritative value (commonly around 6,144 tokens in early releases). For longer inputs, pre-checking token count with session.countPromptTokens(text) before sending a prompt avoids silent truncation or errors.
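That pre-check can be wrapped into a small guard. A sketch, assuming the session properties described above; the 256-token output reserve is an arbitrary illustrative margin, not an API value:

```javascript
// Sketch: pre-flight budget check so a prompt is only sent when the
// input plus a response margin fits the session's remaining tokens.
const RESERVE_FOR_OUTPUT = 256; // illustrative safety margin

async function fitsBudget(session, text) {
  const needed = await session.countPromptTokens(text);
  return needed + RESERVE_FOR_OUTPUT <= session.tokensLeft;
}
```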
Sending Prompts and Handling Responses
Non-Streaming Response
The simplest path: call session.prompt() with a string. It returns a promise that resolves to the complete generated text.
const result = await session.prompt("Summarize this paragraph: The rapid growth of on-device AI...");
console.log(result);
This blocks until the full response is generated, which works fine for short outputs but creates noticeable UI lag for longer generations.
Streaming Response
For real-time rendering, session.promptStreaming() returns a ReadableStream. The API returns the full accumulated response as each chunk, not a delta. Verify this behavior in your Chrome version, as streaming semantics may change. The application code below includes auto-detection logic for both modes.
const stream = session.promptStreaming("Summarize this paragraph: The rapid growth of on-device AI...");

let accumulated = "";
let isCumulative = null;
let chunkIndex = 0;

for await (const chunk of stream) {
  chunkIndex++;
  if (chunkIndex === 2) {
    // If the second chunk starts with the first chunk's content, the API is cumulative
    isCumulative = chunk.startsWith(accumulated);
  }
  if (isCumulative === false) {
    accumulated += chunk;
    outputEl.textContent = accumulated;
  } else {
    // Cumulative (default assumption) or not yet determined
    outputEl.textContent = chunk;
    accumulated = chunk;
  }
}
If chunks are cumulative, directly assigning chunk to textContent produces a smooth typewriter effect without any string accumulation logic. If you know that chunks are deltas in your Chrome version, switch to outputEl.textContent += chunk.
Tutorial: Build a Simple Summarization Tool
The goal: a minimal single-page application where users paste text into a textarea and receive an on-device summary streamed into the page. No build tools. No dependencies.
HTML Structure
Save the following as index.html:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>On-Device Summarizer</title>
  <style>
    body { font-family: system-ui, sans-serif; max-width: 640px; margin: 2rem auto; padding: 0 1rem; }
    textarea { width: 100%; height: 150px; margin-bottom: 0.5rem; }
    #output { white-space: pre-wrap; padding: 1rem; background: #f5f5f5; min-height: 3rem; margin-top: 1rem; }
    #status { font-size: 0.85rem; color: #666; }
    button { margin-right: 0.5rem; }
  </style>
</head>
<body>
  <h1>On-Device Summarizer</h1>
  <span id="status">Checking model availability...</span>
  <textarea id="input" maxlength="100000" placeholder="Paste text to summarize..."></textarea>
  <button id="summarize" disabled>Summarize</button>
  <button id="cancel" hidden>Cancel</button>
  <div id="output"></div>
  <script src="app.js"></script>
</body>
</html>
JavaScript Logic
Save the following as app.js in the same folder as index.html:
const inputEl = document.getElementById("input");
const outputEl = document.getElementById("output");
const statusEl = document.getElementById("status");
const summarizeBtn = document.getElementById("summarize");
const cancelBtn = document.getElementById("cancel");

const SYSTEM_PROMPT = "You are a helpful summarizer. Respond in two sentences or fewer.";
const INPUT_MAX_LENGTH = 50_000;
const SESSION_TIMEOUT_MS = 30_000;

let controller = null;
let isRunning = false;

/**
 * Returns a promise that races `promise` against a timeout.
 * Rejects with a descriptive error if the timeout fires first.
 */
function withTimeout(promise, ms, label) {
  const timer = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  );
  return Promise.race([promise, timer]);
}

async function init() {
  if (!("LanguageModel" in self)) {
    statusEl.textContent = "Prompt API not supported. Enable flags in chrome://flags.";
    return;
  }
  try {
    const availability = await LanguageModel.availability();
    if (availability === "available") {
      statusEl.textContent = "Model ready.";
      summarizeBtn.disabled = false;
    } else if (availability === "downloadable" || availability === "downloading") {
      statusEl.textContent = `Model status: ${availability}. Please wait for the download to finish, then refresh this page.`;
    } else {
      statusEl.textContent = "Model unavailable on this device.";
    }
  } catch (err) {
    console.error("LanguageModel.availability() failed:", err);
    statusEl.textContent = "Could not check model status. Reload the page to retry.";
  }
}

async function summarize() {
  if (isRunning) return;
  isRunning = true;

  const text = inputEl.value.trim();
  if (!text || text.length > INPUT_MAX_LENGTH) {
    outputEl.textContent = text.length > INPUT_MAX_LENGTH
      ? "Input too large. Please shorten your text."
      : "";
    isRunning = false;
    return;
  }

  summarizeBtn.disabled = true;
  cancelBtn.hidden = false;
  outputEl.textContent = "";

  const localController = new AbortController();
  controller = localController;
  let session = null;

  try {
    session = await withTimeout(
      LanguageModel.create({
        systemPrompt: SYSTEM_PROMPT,
        signal: localController.signal
      }),
      SESSION_TIMEOUT_MS,
      "Session creation"
    );

    const tokenCount = await session.countPromptTokens(text);
    statusEl.textContent = `Input tokens: ${tokenCount} | Budget: ${session.tokensLeft} remaining`;
    if (tokenCount > session.tokensLeft) {
      outputEl.textContent = "Input exceeds token budget. Please shorten your text.";
      return;
    }

    const stream = session.promptStreaming(`Summarize this:
${text}`, {
      signal: localController.signal
    });

    let accumulated = "";
    let isCumulative = null;
    let chunkIndex = 0;

    for await (const chunk of stream) {
      chunkIndex++;
      if (chunkIndex === 2) {
        isCumulative = chunk.startsWith(accumulated);
      }
      if (isCumulative === false) {
        accumulated += chunk;
        outputEl.textContent = accumulated;
      } else {
        outputEl.textContent = chunk;
        accumulated = chunk;
      }
    }

    const used = session.tokensSoFar;
    const max = session.maxTokens;
    statusEl.textContent = `Tokens used: ${used} / ${max}`;
  } catch (err) {
    if (err.name === "AbortError") {
      outputEl.textContent = "[Generation cancelled]";
    } else {
      console.error("Summarization failed:", err);
      outputEl.textContent = "Something went wrong. Check the console for details.";
    }
    statusEl.textContent = "Ready.";
  } finally {
    if (session) {
      session.destroy();
    }
    isRunning = false;
    summarizeBtn.disabled = false;
    cancelBtn.hidden = true;
    controller = null;
  }
}
summarizeBtn.addEventListener("click", summarize);
cancelBtn.addEventListener("click", () => controller?.abort());
init();
Enhancing the UX
The script above handles several UX concerns that go beyond the basic API wiring.
The isRunning flag acts as a re-entrancy guard. It prevents overlapping calls to summarize(), even if the user clicks the button rapidly before the first await suspends execution. This ensures only one session is ever active at a time.
Cancellation works through AbortController wiring. The Cancel button calls controller.abort(), and because localController is scoped per invocation, aborting always targets the correct in-flight session. Session creation is also wrapped in a timeout so that a stalled model download or initialization does not lock the UI indefinitely. If the timeout fires, the error is caught and the button re-enables.
Before any API call fires, the input size guard rejects extremely large pastes, preventing the tab from hanging on token counting. Once the input passes that check, countPromptTokens() verifies the text fits within the session's remaining token budget, avoiding wasted inference on inputs that would fail anyway.
The streaming loop detects whether the API emits cumulative chunks or deltas by comparing the second chunk against the first, then adjusts rendering accordingly. Raw API error strings go to the console for debugging, while the output area shows a generic user-facing message. Finally, session.destroy() runs in the finally block on every exit path (success, error, or cancellation), freeing GPU and RAM resources.
Note on model download state: If the model is still downloading when you first load the page, the Summarize button will remain disabled. Refresh the page after the download completes (check progress at chrome://on-device-internals) to enable the button.
Tips, Limitations, and Best Practices
For multi-turn conversations, clone an existing session with session.clone() instead of recreating one from scratch. This preserves the system prompt configuration without re-initializing the session:
const cloned = await session.clone();
Note that the exact state preserved by clone() (e.g., whether conversation history carries over or only the system prompt configuration) should be verified against the current API behavior.
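One pattern this enables, assuming clone() preserves at least the system prompt configuration: fan a single configured base session out into independent short conversations. The helper below is a hedged sketch, not an official pattern:

```javascript
// Sketch: run several independent prompts against clones of one
// pre-configured base session, destroying each clone afterwards.
async function askEach(baseSession, questions) {
  return Promise.all(
    questions.map(async (q) => {
      const s = await baseSession.clone();
      try {
        return await s.prompt(q);
      } finally {
        s.destroy(); // each clone holds its own resources
      }
    })
  );
}
```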
Call session.destroy() when a session is no longer needed. Gemini Nano's memory footprint is nontrivial; check Chrome Task Manager (Shift+Esc) to see per-tab memory usage after session creation. Orphaned sessions retain those resources. Always place destroy() in a finally block to ensure cleanup on error and abort paths.
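A small wrapper makes that discipline hard to forget by guaranteeing destroy() on success, error, and abort paths alike. A sketch:

```javascript
// Sketch: scope a session to a callback so it can never leak.
async function withSession(options, fn) {
  const session = await LanguageModel.create(options);
  try {
    return await fn(session);
  } finally {
    session.destroy(); // runs on every exit path
  }
}
```

Usage: `const summary = await withSession({ systemPrompt: "..." }, (s) => s.prompt(text));`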
Always pre-check with session.countPromptTokens(text) before sending a prompt. The context window fills quickly with longer documents. Read session.maxTokens at runtime for the exact budget.
All processing stays on-device. No data is transmitted to external servers during inference, making this well-suited for applications handling sensitive user content.
The model supports English only, handles text only (no images or audio), and runs exclusively in Chrome. Its quality suits focused single-turn tasks like summarization, reformulation, and classification. Multi-step chain-of-thought prompts or requests exceeding roughly 2,000 input tokens degrade noticeably, and code generation pushes well beyond what Gemini Nano handles reliably.
What's Next for Built-in AI in the Browser
The Prompt API is one piece of a broader effort. Companion APIs in development include the Summarizer API and the Writer and Rewriter APIs, each targeting a specific task with an optimized interface. A Translator API and Language Detector API are also in progress.
Full documentation lives at the Chrome Built-in AI documentation site. The WICG tracks proposals on GitHub. The API is in active development and the core LanguageModel.create() and session.prompt() patterns work for local development and prototyping today.
Try it yourself: Save the HTML block as index.html and the JavaScript block as app.js in the same folder. Serve them with a local server (e.g., npx serve . or VS Code Live Server) — do not open index.html directly via file://, as script loading may fail. Enable the Chrome flags, and run an on-device summarizer with zero dependencies. No API key needed.