Documentation

How Browsergent works

Browsergent is an AI agent for the browser. You give it a task in plain English; it reasons with an LLM, generates JavaScript, runs that JS in a sandboxed runtime, observes the result, and iterates until the task is done.

The core principle

The LLM reasons and JavaScript acts. The model’s only browser tool is run_js. It never calls Chrome APIs or touches the DOM directly. Instead, it writes JavaScript, and the @pi-oxide/extension-js runtime turns that into typed page.* commands executed against the active tab.

The brain is a Rust state machine compiled to WASM (@pi-oxide/pi-host-web) running in a Web Worker. The LLM is called over the Anthropic Messages API. Side effects happen only through the typed command protocol.
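
To make the split between reasoning and acting concrete, here is a minimal sketch of how a script-facing page object could translate method calls into typed commands. It is an illustration under assumptions, not the @pi-oxide/extension-js implementation: `makePage`, `Cmd`, `modelScript`, the method signatures, and the ref values are all hypothetical.

```typescript
// Hypothetical sketch: a facade that turns method calls into typed commands.
type RefId = string;
type Cmd =
  | { kind: "page.click"; refId: RefId }
  | { kind: "page.fill"; refId: RefId; text: string }
  | { kind: "page.goto"; url: string };

// The script only ever sees this object; every side effect goes through `send`.
function makePage(send: (cmd: Cmd) => void) {
  return {
    click: (refId: RefId) => send({ kind: "page.click", refId }),
    fill: (refId: RefId, text: string) => send({ kind: "page.fill", refId, text }),
    goto: (url: string) => send({ kind: "page.goto", url }),
  };
}

// Stand-in for JavaScript the model might submit through run_js (refs are made up).
function modelScript(p: ReturnType<typeof makePage>): void {
  p.fill("e12", "hello");
  p.click("e13");
}

// Here the commands are only recorded; a real runtime would execute them against the tab.
const sent: Cmd[] = [];
modelScript(makePage((cmd) => sent.push(cmd)));
```

The point of the design is that the script never holds a DOM or Chrome handle, so everything it does is visible as a list of commands.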

The agent loop

  1. Observe — snapshot the active tab into a structured element list (refs, roles, labels).
  2. Reason — send the task + history + snapshot to the LLM.
  3. Act — the model returns JavaScript via run_js; the runtime dispatches page.* commands.
  4. Observe again — the result feeds back into the next turn.
  5. Stop — the model signals completion or you stop the run.
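
The five steps above can be sketched as a small loop. This is a simplified illustration rather than Browsergent's actual state machine: `runAgent`, `AgentDeps`, `ModelTurn`, and `maxTurns` are hypothetical names, and the callbacks are synchronous only to keep the sketch short (real model and tab calls would be asynchronous).

```typescript
// Hypothetical shapes for illustration; none of these names come from Browsergent.
type Snapshot = { elements: { ref: string; role: string; label: string }[] };
type ModelTurn =
  | { kind: "run_js"; code: string } // the model wants to act
  | { kind: "done"; summary: string }; // the model signals completion

interface AgentDeps {
  observe: () => Snapshot;
  reason: (task: string, history: string[], snapshot: Snapshot) => ModelTurn;
  act: (code: string) => string; // runs the JS and returns a textual result
  shouldStop: () => boolean; // true once the user has stopped the run
}

function runAgent(task: string, deps: AgentDeps, maxTurns = 20): string {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    if (deps.shouldStop()) return "stopped by user"; // step 5 (user stop)
    const snapshot = deps.observe(); // step 1
    const next = deps.reason(task, history, snapshot); // step 2
    if (next.kind === "done") return next.summary; // step 5 (model stop)
    const result = deps.act(next.code); // step 3
    history.push(`ran: ${next.code}`, `result: ${result}`); // step 4
  }
  return "turn limit reached";
}
```

The turn limit is an assumption added so the sketch always terminates; the source does not say whether Browsergent caps the number of turns.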

The command protocol

Every browser action is a typed BrowserCommand. A few examples:

type BrowserCommand =
  | { kind: "page.snapshot"; options?: SnapshotOptions }
  | { kind: "page.click";   refId: RefId }
  | { kind: "page.fill";    refId: RefId; text: string }
  | { kind: "page.select";  refId: RefId; value: string }
  | { kind: "page.scroll";  direction: "up" | "down" }
  | { kind: "page.goto";    url: string }
  | { kind: "page.wait";    ms: number };

Each command returns a BrowserResult: either an ok value or a typed error with a machine-readable code (e.g. E_STALE, E_NOT_FOUND). The model sees these errors and can recover from them.
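
The result type is described only in prose, so the following is a guess at one possible shape, meant to show how a typed error code can be turned into feedback for the model. The `status`, `value`, and `message` fields and the `describeResult` helper are assumptions; only the codes E_STALE and E_NOT_FOUND come from the text above.

```typescript
// Assumed shape: the docs only say "an ok value or a typed error with a code".
type ErrorCode = "E_STALE" | "E_NOT_FOUND"; // the two codes named in the docs; the real set may be larger
type BrowserResult<T = unknown> =
  | { status: "ok"; value: T }
  | { status: "error"; code: ErrorCode; message: string };

// Hypothetical helper: render a result as text the model could read on its next turn.
function describeResult(result: BrowserResult): string {
  if (result.status === "ok") return `ok: ${JSON.stringify(result.value)}`;
  switch (result.code) {
    case "E_STALE":
      return `error E_STALE: ${result.message}. Take a new snapshot and use fresh refs.`;
    case "E_NOT_FOUND":
      return `error E_NOT_FOUND: ${result.message}. Check the ref or the page state.`;
  }
}
```

Because the code is a closed union, the compiler flags any error code the switch forgets to handle.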

Configuration

Open Settings in the side panel and provide:

Field      Example
API Key    sk-ant-api03-…
Base URL   https://api.anthropic.com
Model      claude-sonnet-4-6

Compatible providers: Anthropic, DeepSeek (api.deepseek.com/anthropic), z.ai / GLM, or any endpoint implementing the Anthropic Messages API. Your key stays in the browser — it’s only ever sent to the base URL you configure.
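
As a rough illustration of how these three settings could map onto a Messages API call, the sketch below builds the request without sending it. The `/v1/messages` path and the `x-api-key` and `anthropic-version` headers follow Anthropic's public API; whether Browsergent builds its requests this way is an assumption, and `Settings`, `buildMessagesRequest`, and the `max_tokens` value are hypothetical.

```typescript
// Hypothetical settings shape mirroring the three fields in the table above.
interface Settings {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Build (but do not send) a Messages API request for a single user prompt.
function buildMessagesRequest(settings: Settings, prompt: string) {
  const base = settings.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": settings.apiKey, // sent only to the configured base URL
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: settings.model,
        max_tokens: 1024, // arbitrary value chosen for the sketch
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. With a compatible provider, only `baseUrl` and `model` would change.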

Skills & files

  • /skill:<name> activates a reusable skill at compose time — its instructions are folded into the prompt.
  • @[file:name.ext] attaches a session file (stored in OPFS) into the task.
  • The Files panel lets you upload, edit, and manage session files.
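
The two compose-time tokens above could be extracted with a couple of regular expressions. This sketch is an assumption about the token grammar (which characters a skill or file name may contain), not Browsergent's parser; `parseCompose` and `ParsedPrompt` are hypothetical names.

```typescript
interface ParsedPrompt {
  skills: string[]; // names from /skill:<name>
  files: string[]; // names from @[file:name.ext]
  text: string; // the task with skill tokens removed and file tokens reduced to the file name
}

function parseCompose(input: string): ParsedPrompt {
  const skills: string[] = [];
  const files: string[] = [];
  const text = input
    // /skill:<name> at the start of the input or after whitespace; the name charset is assumed
    .replace(/(^|\s)\/skill:([\w-]+)/g, (_match, lead: string, name: string) => {
      skills.push(name);
      return lead;
    })
    // @[file:name.ext]; anything up to the closing bracket counts as the file name
    .replace(/@\[file:([^\]]+)\]/g, (_match, name: string) => {
      files.push(name);
      return name;
    })
    .replace(/\s+/g, " ")
    .trim();
  return { skills, files, text };
}
```

For example, `parseCompose("/skill:summarize compare @[file:notes.md] with the page")` yields the skill `summarize`, the file `notes.md`, and the text `compare notes.md with the page`.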

Known limitations

  • Chrome only. MV3 side panel + content scripts; not ported to Firefox/Safari.
  • Anthropic Messages API only. OpenAI-native function-calling isn’t supported on the wire.
  • No headless mode. It drives your real tab.
  • Context-bound. Long sessions are compacted, but very long tasks may lose earlier detail.
  • Single tab. The agent operates on one active tab at a time.
  • Stale refIds. Snapshot refs (eNNN) are single-use within an observation — reusing one is the most common failure.
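
One way to work around the stale-ref limitation is to look the element up in a fresh snapshot immediately before acting, and to retry once if the ref has gone stale in between. The sketch assumes that a new snapshot issues new refs. `PageLike`, `clickByLabel`, and the synchronous signatures are hypothetical and do not reflect the real page.* API.

```typescript
type RefId = string; // e.g. "e12"; format assumed from the "eNNN" note above
interface ElementInfo {
  ref: RefId;
  role: string;
  label: string;
}
type ClickOutcome = "ok" | "E_STALE" | "E_NOT_FOUND";

// Hypothetical, simplified stand-in for the page object a script would receive.
interface PageLike {
  snapshot: () => ElementInfo[];
  click: (ref: RefId) => ClickOutcome;
}

// Resolve the target from a fresh snapshot on every attempt instead of caching its ref.
function clickByLabel(page: PageLike, label: string, maxAttempts = 2): ClickOutcome {
  let outcome: ClickOutcome = "E_NOT_FOUND";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const target = page.snapshot().find((el) => el.label === label);
    if (!target) return "E_NOT_FOUND";
    outcome = page.click(target.ref);
    if (outcome !== "E_STALE") return outcome; // success or a non-retryable error
  }
  return outcome; // still stale after the last attempt
}
```

Matching by label is only one possible lookup strategy; role plus label, or position in the list, would work the same way.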

Experimental & safety

Browsergent is experimental (v0.1). Its current philosophy is to expose everything the Chrome extension can access so we can learn the boundary of browser-agent capability. It may read page content, cookies, auth headers, and request/response metadata. Always review its actions and avoid using it on accounts where that level of access is unacceptable.

Going deeper