Getting started

browxai is an MCP server that gives an AI agent a curated browser-control surface. It runs over stdio and is driven by any MCP client.

npm install -g browxai
npx playwright-core install chromium # one-time, ~150 MB

A global install puts the browxai binary on your PATH, so an MCP client can launch it by name (command: "browxai"). The binary is the MCP server on the stdio transport.
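To make "MCP server on the stdio transport" concrete: MCP messages are JSON-RPC 2.0 objects, and on stdio each message is written as a single line of JSON. The sketch below only builds the first message a client would send (`initialize`). The protocol revision string and the client name are placeholder assumptions, and in practice an MCP client library handles this framing for you.

```python
import json


def frame_initialize(request_id: int = 1) -> bytes:
    """Build a newline-delimited JSON-RPC `initialize` request for an MCP stdio server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumption: use whichever protocol revision your client targets.
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    # json.dumps escapes newlines inside strings, so the frame is exactly one line.
    return (json.dumps(message) + "\n").encode("utf-8")


frame = frame_initialize()
print(frame.decode("utf-8"), end="")
```

A client would write these bytes to the server's stdin and read the reply from its stdout, one JSON object per line.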

Add browxai to your client’s MCP server config. For example, an .mcp.json:

{
  "mcpServers": {
    "browxai": {
      "command": "browxai"
    }
  }
}

By default the server launches a managed Chromium with its own profile, headed, with the default capability set (read, navigation, action, human). Everything dangerous is opt-in.

Variable            Purpose
BROWX_WORKSPACE     Where all transient state lives (default ~/.browxai/). Never cwd.
BROWX_HEADLESS      1 launches headless.
BROWX_CAPABILITIES  Comma-separated capability set. Add eval, network-body, clipboard, or file-io to opt into gated tools.
BROWX_ENGINE        Browser engine: chromium (default), firefox, webkit, android, or safari. Also --engine <kind>.
BROWX_ATTACH_CDP    Loopback CDP endpoint to attach to an existing Chrome (BYOB).

See the tool reference for the full configuration surface. Capabilities are resolved once at server start, so changing them means restarting the server.
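The variables above are read from the server's environment, so they have to be set wherever the client launches it. Here is a minimal sketch of generating such a config. It assumes your client accepts an `env` map per server entry (many do, but check your client's documentation). It also assumes BROWX_CAPABILITIES holds the complete set rather than additions to the default, which is why the defaults are repeated.

```python
import json

# Default capability set as listed in the text above.
DEFAULT_CAPABILITIES = ["read", "navigation", "action", "human"]


def build_mcp_config(extra_capabilities: list[str], headless: bool = False) -> dict:
    """Return an .mcp.json-shaped dict that launches browxai with env overrides."""
    env = {
        # Assumption: the variable holds the full set, so defaults are repeated.
        "BROWX_CAPABILITIES": ",".join(DEFAULT_CAPABILITIES + extra_capabilities),
    }
    if headless:
        env["BROWX_HEADLESS"] = "1"
    # Assumption: the client config supports an "env" key next to "command".
    return {"mcpServers": {"browxai": {"command": "browxai", "env": env}}}


config = build_mcp_config(["eval"], headless=True)
print(json.dumps(config, indent=2))
```

Generating the file with `json.dumps` also guarantees strictly valid JSON, which hand-edited configs often are not (trailing commas are a common culprit).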

A typical agent loop:

  1. navigate to a URL.
  2. snapshot to get the accessibility tree plus DOM-walk; every node has a stable [ref=eN].
  3. find to describe the target in natural language; get ranked candidates with a stability flag, an actionable verdict, and a visible-rect bbox.
  4. click / fill / … to act by ref; each returns a structured ActionResult describing what navigated, what structure changed, and a console/network slice.
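The four steps above can be sketched as a single function. This is not browxai's API: `call_tool` stands in for whatever your MCP client exposes, and the argument names (`url`, `query`, `ref`) and the shape of the `find` result (a `candidates` list whose items carry a `ref`) are assumptions made for illustration. The tool reference has the real schemas. A stub replaces the server so the sketch runs without a browser.

```python
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> structured result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def click_described_target(call_tool: ToolCaller, url: str, description: str) -> dict[str, Any]:
    """navigate -> snapshot -> find -> click, acting on the top-ranked candidate."""
    call_tool("navigate", {"url": url})
    call_tool("snapshot", {})  # the returned tree tags every node with a [ref=eN]
    found = call_tool("find", {"query": description})
    candidates = found.get("candidates", [])
    if not candidates:
        raise LookupError(f"no candidate matched {description!r}")
    # Candidates are ranked, so take the first and act on it by ref.
    return call_tool("click", {"ref": candidates[0]["ref"]})


calls: list[str] = []  # records the order in which tools were invoked


def stub_call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real MCP client call; returns canned data."""
    calls.append(name)
    if name == "find":
        return {"candidates": [{"ref": "e12"}, {"ref": "e40"}]}
    return {"tool": name, "arguments": arguments}


result = click_described_target(stub_call_tool, "https://example.com", "the sign-in button")
print(calls, result)
```

A real loop would also check the stability flag and actionable verdict on each candidate before acting, and read the returned ActionResult to decide the next step.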

For verification use text_search, inspect, and the read tools; for flaky or transient UI use wait_for, sample, and act_and_sample.

  • Tool reference: every tool, its inputs and outputs, example calls, the configuration and session model, and the stability policy.
  • Agent guidance: the reach-for-this-not-that map, covering the footguns agents hit and the curated tool that avoids each one.
  • Security and threat model: the capability model, what browxai defends against, and what it explicitly does not.
Made by Kalebtec · GitHub · MIT licensed