How Browsergent works
Browsergent is an AI agent for the browser. You give it a task in plain English; it reasons with an LLM, generates JavaScript, runs that JS in a sandboxed runtime, observes the result, and iterates until the task is done.
The core principle
LLM does reasoning, JS does acting. The model’s only browser tool is run_js. It never calls Chrome APIs or touches the DOM directly: it writes JavaScript, and the @pi-oxide/extension-js runtime turns that into typed page.* commands that are executed against the active tab.
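A minimal sketch of this split follows. It assumes a hypothetical createPage helper whose methods only build command objects and hand them to a dispatcher. The real @pi-oxide/extension-js API, its method names, and whether each call is awaited may all differ; the calls here are synchronous for brevity.

```typescript
// Hypothetical sketch: a page facade whose methods never touch the DOM.
// Each call only builds a typed command and forwards it to `dispatch`.
type RefId = string;

type BrowserCommand =
  | { kind: "page.click"; refId: RefId }
  | { kind: "page.fill"; refId: RefId; text: string }
  | { kind: "page.goto"; url: string };

type Dispatch = (cmd: BrowserCommand) => void;

function createPage(dispatch: Dispatch) {
  return {
    click: (refId: RefId) => dispatch({ kind: "page.click", refId }),
    fill: (refId: RefId, text: string) => dispatch({ kind: "page.fill", refId, text }),
    goto: (url: string) => dispatch({ kind: "page.goto", url }),
  };
}

// Stand-in for a run_js payload: ordinary JS that only calls page.* methods.
function modelScript(page: ReturnType<typeof createPage>): void {
  page.goto("https://example.com/login");
  page.fill("e12", "alice@example.com");
  page.click("e15");
}

// Record commands instead of executing them, to show what would reach the tab.
const sent: BrowserCommand[] = [];
modelScript(createPage((cmd) => { sent.push(cmd); }));
console.log(sent.map((c) => c.kind).join(", ")); // page.goto, page.fill, page.click
```

The model's code only ever sees the facade. Everything that can affect the tab is visible as plain data in sent, which makes it easy to log, validate, or replay.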
The brain is a Rust state machine compiled to WASM (@pi-oxide/pi-host-web) running in a Web Worker. The LLM is called over the Anthropic Messages API. Side effects happen only through the typed command protocol.
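Because side effects go only through the command protocol, the boundary between the worker and whatever executes commands can be pictured as a request/response channel. The sketch below is not the actual @pi-oxide/pi-host-web interface. It assumes hypothetical message shapes (CommandRequest, CommandResponse) and a hypothetical CommandChannel class that matches replies to requests by id. In a Web Worker, post would typically wrap postMessage. Here an in-process executor stands in, so the sketch runs anywhere.

```typescript
// Hypothetical message shapes for a worker <-> executor channel.
type BrowserCommand =
  | { kind: "page.goto"; url: string }
  | { kind: "page.wait"; ms: number };

type BrowserResult =
  | { ok: true; value: unknown }
  | { ok: false; code: string; message: string };

interface CommandRequest { id: number; command: BrowserCommand }
interface CommandResponse { id: number; result: BrowserResult }

// Brain side: `send` is the only way to request a side effect.
class CommandChannel {
  private nextId = 1;
  private pending = new Map<number, (r: BrowserResult) => void>();
  private readonly post: (req: CommandRequest) => void;

  // `post` stands in for worker.postMessage; replies come back via `receive`.
  constructor(post: (req: CommandRequest) => void) {
    this.post = post;
  }

  send(command: BrowserCommand): Promise<BrowserResult> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.post({ id, command });
    });
  }

  receive(res: CommandResponse): void {
    const resolve = this.pending.get(res.id);
    if (!resolve) return; // unknown or already-answered id: ignore it
    this.pending.delete(res.id);
    resolve(res.result);
  }
}

// In-process executor standing in for the side that owns the tab.
const channel: CommandChannel = new CommandChannel((req) => {
  const result: BrowserResult =
    req.command.kind === "page.goto"
      ? { ok: true, value: `navigated to ${req.command.url}` }
      : { ok: true, value: null };
  // Reply on a later microtask, as an asynchronous postMessage round-trip would.
  Promise.resolve().then(() => channel.receive({ id: req.id, result }));
});

channel
  .send({ kind: "page.goto", url: "https://example.com" })
  .then((r) => console.log(JSON.stringify(r))); // {"ok":true,"value":"navigated to https://example.com"}
```

Correlating by id lets several commands be in flight at once without their results getting crossed.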
The agent loop
- Observe: snapshot the active tab into a structured element list (refs, roles, labels).
- Reason: send the task, history, and snapshot to the LLM.
- Act: the model returns JavaScript via run_js, and the runtime dispatches page.* commands.
- Observe again: the result feeds back into the next turn.
- Stop: the model signals completion, or you stop the run.
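The loop can be sketched in a few lines. This is a simplified illustration, not the Rust state machine. snapshot, callModel, and runJs are hypothetical injected functions. The model's reply is reduced to either a run_js call or a done signal. A maxTurns cap stands in for the user stopping the run.

```typescript
// Hypothetical reply shape: either run some JS or declare the task finished.
type ModelReply =
  | { type: "run_js"; code: string }
  | { type: "done"; answer: string };

interface Turn {
  code: string; // JS the model asked to run
  output: string; // what running it returned
}

// Hypothetical injected dependencies, one per phase of the loop.
interface AgentDeps {
  snapshot: () => Promise<string>; // Observe: serialized element list
  callModel: (task: string, history: Turn[], snapshot: string) => Promise<ModelReply>; // Reason
  runJs: (code: string) => Promise<string>; // Act
}

async function runAgent(task: string, deps: AgentDeps, maxTurns = 10): Promise<string> {
  const history: Turn[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const snap = await deps.snapshot(); // Observe (again, on later turns)
    const reply = await deps.callModel(task, history, snap); // Reason
    if (reply.type === "done") return reply.answer; // Stop: model signals completion
    const output = await deps.runJs(reply.code); // Act
    history.push({ code: reply.code, output }); // result feeds the next turn
  }
  return "stopped: turn limit reached"; // Stop: run cut off from outside
}

// Scripted stubs: the "model" clicks once, then reports completion.
const deps: AgentDeps = {
  snapshot: async () => "e1 button 'Accept cookies'",
  callModel: async (_task, history): Promise<ModelReply> =>
    history.length === 0
      ? { type: "run_js", code: "page.click('e1')" }
      : { type: "done", answer: `done after ${history.length} action(s)` },
  runJs: async (code) => `ok: ${code}`,
};

runAgent("Dismiss the cookie banner", deps).then((answer) => console.log(answer));
// prints: done after 1 action(s)
```

Taking a fresh snapshot at the top of every turn means the model always reasons over the page as it is now, not as it was before its last action.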
The command protocol
Every browser action is a typed BrowserCommand. A few examples:
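```typescript
// Illustrative subset of the BrowserCommand union.
type RefId = string; // assumed: snapshot refs such as "e12"
type SnapshotOptions = Record<string, unknown>; // placeholder: fields not documented here

type BrowserCommand =
  | { kind: "page.snapshot"; options?: SnapshotOptions }
  | { kind: "page.click"; refId: RefId }
  | { kind: "page.fill"; refId: RefId; text: string }
  | { kind: "page.select"; refId: RefId; value: string }
  | { kind: "page.scroll"; direction: "up" | "down" }
  | { kind: "page.goto"; url: string }
  | { kind: "page.wait"; ms: number };
```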
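Because the union is discriminated on kind, an executor can switch over it and have the compiler check that every command is handled. The sketch below uses a hypothetical describeCommand function that returns a description instead of acting on a tab. RefId and SnapshotOptions are placeholder types, since their real definitions aren't given here.

```typescript
type RefId = string; // placeholder
type SnapshotOptions = Record<string, unknown>; // placeholder

type BrowserCommand =
  | { kind: "page.snapshot"; options?: SnapshotOptions }
  | { kind: "page.click"; refId: RefId }
  | { kind: "page.fill"; refId: RefId; text: string }
  | { kind: "page.select"; refId: RefId; value: string }
  | { kind: "page.scroll"; direction: "up" | "down" }
  | { kind: "page.goto"; url: string }
  | { kind: "page.wait"; ms: number };

// Compile-time exhaustiveness check: `cmd` narrows to `never` only if every
// kind is handled. At runtime it also rejects malformed input.
function assertNever(x: never): never {
  throw new Error(`unhandled command: ${JSON.stringify(x)}`);
}

// Hypothetical executor stub: describes each command instead of acting on a tab.
function describeCommand(cmd: BrowserCommand): string {
  switch (cmd.kind) {
    case "page.snapshot":
      return "take a snapshot";
    case "page.click":
      return `click ${cmd.refId}`;
    case "page.fill":
      return `fill ${cmd.refId} with ${JSON.stringify(cmd.text)}`;
    case "page.select":
      return `select ${JSON.stringify(cmd.value)} in ${cmd.refId}`;
    case "page.scroll":
      return `scroll ${cmd.direction}`;
    case "page.goto":
      return `navigate to ${cmd.url}`;
    case "page.wait":
      return `wait ${cmd.ms} ms`;
    default:
      return assertNever(cmd);
  }
}

console.log(describeCommand({ kind: "page.fill", refId: "e7", text: "hello" })); // fill e7 with "hello"
```

If a new kind is added to the union, the assertNever call stops compiling until a matching case is written.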
Results are a BrowserResult: either an ok value or a typed error with a machine-readable code (e.g. E_STALE, E_NOT_FOUND). The model sees these errors on its next turn and can recover from them.
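A BrowserResult can be modeled as a union discriminated on ok. In the sketch below, the field names (ok, value, error, code, message) are assumptions. Only E_STALE and E_NOT_FOUND come from the text, and the real set of codes is probably larger. The hypothetical toFeedback function turns a result into the text handed back to the model, which is where recovery happens.

```typescript
// Assumed shape; the text names only these two codes.
type ErrorCode = "E_STALE" | "E_NOT_FOUND";

type BrowserResult<T = unknown> =
  | { ok: true; value: T }
  | { ok: false; error: { code: ErrorCode; message: string } };

// Turn a result into the text fed back to the model on its next turn.
function toFeedback(result: BrowserResult): string {
  if (result.ok) return `ok: ${JSON.stringify(result.value)}`;
  const { code, message } = result.error;
  // The machine-readable code allows branching without parsing `message`.
  // The hint wording is hypothetical.
  const hint = code === "E_STALE" ? " (hint: take a fresh snapshot)" : "";
  return `error ${code}: ${message}${hint}`;
}

console.log(toFeedback({ ok: true, value: { clicked: "e12" } })); // ok: {"clicked":"e12"}
console.log(toFeedback({ ok: false, error: { code: "E_STALE", message: "element e12 is detached" } }));
// error E_STALE: element e12 is detached (hint: take a fresh snapshot)
```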
Configuration
Open Settings in the side panel and provide:
| Field | Example |
|---|---|
| API Key | sk-ant-api03-… |
| Base URL | https://api.anthropic.com |
| Model | claude-sonnet-4-6 |
Compatible providers: Anthropic, DeepSeek (api.deepseek.com/anthropic), z.ai / GLM, or any endpoint that implements the Anthropic Messages API. Your key stays in the browser and is only ever sent to the base URL you configure.
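The Settings fields map onto an ordinary Messages API call. The sketch below builds such a request but does not send it. The Settings shape and buildMessagesRequest are hypothetical, and max_tokens: 1024 is an arbitrary example value. The /v1/messages path and the x-api-key and anthropic-version headers are those of Anthropic's public Messages API. An Anthropic-compatible provider is expected to accept the same shape under its own base URL.

```typescript
// Hypothetical config shape mirroring the Settings fields.
interface Settings {
  apiKey: string;
  baseUrl: string;
  model: string;
}

interface MessagesRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a Messages API request from the settings.
function buildMessagesRequest(s: Settings, userText: string): MessagesRequest {
  const base = s.baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return {
    url: `${base}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": s.apiKey, // the key appears only here, bound for `base`
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: s.model,
        max_tokens: 1024, // arbitrary example value
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

const req = buildMessagesRequest(
  { apiKey: "YOUR_API_KEY", baseUrl: "https://api.anthropic.com/", model: "claude-sonnet-4-6" },
  "Summarize this page.",
);
console.log(req.url); // https://api.anthropic.com/v1/messages
// fetch(req.url, req.init) would send it; the key travels only in the x-api-key header.
```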
Skills & files
- /skill:<name> activates a reusable skill at compose time; its instructions are folded into the prompt.
- @[file:name.ext] attaches a session file (stored in OPFS) to the task.
- The Files panel lets you upload, edit, and manage session files.
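The compose-time syntax can be pictured as a small tokenizer. In this hypothetical sketch, parseTask and ComposedTask are illustrative names, and the characters allowed in skill and file names are assumptions. It extracts skill and file names and leaves the remaining task text. It does not show how skill instructions are folded into the prompt or how files are read from OPFS.

```typescript
// Hypothetical result of composing a task.
interface ComposedTask {
  skills: string[]; // names from /skill:<name>
  files: string[]; // names from @[file:name.ext]
  text: string; // remaining task text
}

function parseTask(input: string): ComposedTask {
  const skills: string[] = [];
  const files: string[] = [];
  const text = input
    // Assumed: skill names are letters, digits, underscores, or hyphens.
    .replace(/\/skill:([\w-]+)/g, (_m, name: string) => { skills.push(name); return " "; })
    // Assumed: a file name is anything up to the closing bracket.
    .replace(/@\[file:([^\]]+)\]/g, (_m, name: string) => { files.push(name); return " "; })
    .replace(/\s+/g, " ") // collapse the gaps left by removed tokens
    .trim();
  return { skills, files, text };
}

const task = parseTask("/skill:summarize Summarize @[file:notes.md] in 3 bullets");
console.log(JSON.stringify(task));
// {"skills":["summarize"],"files":["notes.md"],"text":"Summarize in 3 bullets"}
```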
Known limitations
- Chrome only. MV3 side panel + content scripts; not ported to Firefox/Safari.
- Anthropic Messages API only. OpenAI-native function-calling isn’t supported on the wire.
- No headless mode. It drives your real tab.
- Context-bound. Long sessions are compacted, but very long tasks may lose earlier detail.
- Single tab. The agent operates on one active tab at a time.
- Stale refIds. Snapshot refs (eNNN) are valid only within a single observation; reusing one from an earlier snapshot is the most common failure.
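Here is one plausible way to picture why stale refs fail. In this hypothetical ref table, each snapshot starts a new generation and only refs from the latest snapshot resolve. This is an assumption about the mechanism, not Browsergent's implementation. The E_STALE and E_NOT_FOUND codes are the ones named in the command protocol section.

```typescript
type Resolve =
  | { ok: true; element: string }
  | { ok: false; code: "E_STALE" | "E_NOT_FOUND" };

// Hypothetical ref table: each snapshot starts a new generation,
// and only refs issued by the latest snapshot resolve.
class RefTable {
  private generation = 0;
  private counter = 0; // never reset, so ref names are never reused
  private refs = new Map<string, { gen: number; element: string }>();

  // Record a fresh snapshot and return its refs in element order.
  snapshot(elements: string[]): string[] {
    this.generation++;
    return elements.map((element) => {
      const ref = `e${++this.counter}`;
      this.refs.set(ref, { gen: this.generation, element });
      return ref;
    });
  }

  resolve(ref: string): Resolve {
    const entry = this.refs.get(ref);
    if (!entry) return { ok: false, code: "E_NOT_FOUND" };
    if (entry.gen !== this.generation) return { ok: false, code: "E_STALE" };
    return { ok: true, element: entry.element };
  }
}

const table = new RefTable();
const [button] = table.snapshot(["button 'Next'"]); // button === "e1"
console.log(table.resolve(button).ok); // true
table.snapshot(["button 'Next'"]); // page observed again: the same button is now e2
console.log(JSON.stringify(table.resolve(button))); // {"ok":false,"code":"E_STALE"}
```

Using a counter that never restarts, instead of numbering each snapshot from e1, means an old ref can never silently point at a different element in a newer snapshot. It fails loudly instead.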
Experimental & safety
Browsergent is an experimental v0.1 release. Its current philosophy is to expose everything the Chrome extension can access, so we can learn where the boundaries of browser-agent capability lie. It may read page content, cookies, auth headers, and request/response metadata. Always review its actions, and avoid using it on accounts where that level of access is unacceptable.