Designed for Agents - Browser-act

Browser-act is built for models, not for humans writing browser scripts by hand. The interface is designed around one question: how can an AI agent understand the browser state and choose the next safe action?

Compact state output

Agents do not need the full DOM or verbose JSON for every step. The `state` command returns a compact, indexed view:

url=https://example.com/login
title=Login

*[1]<div id=login-form />
  *[2]<input type=email placeholder=Email address />
  *[3]<input type=password placeholder=Password />
  *[4]<button id=submit />
    Sign In

The `*` marker flags elements that are new or changed since the previous `state` call, so the agent can focus on what its last action affected.

| State field | How the agent uses it |
| --- | --- |
| `url` | Confirms the current page before taking action |
| `title` | Checks whether navigation reached the expected screen |
| `[N]` index | Provides the target for commands such as `click 4` or `input 2` |
| `*` marker | Points attention to elements that changed after the last action |
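To make the fields above concrete, here is a minimal Python sketch of how an agent harness might parse the compact output. It assumes the output follows exactly the shape of the sample shown earlier (two-space indentation per nesting level, free text on the line after its element); the names `Element`, `PageState`, and `parse_state` are hypothetical and not part of Browser-act.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches one element line, e.g. "  *[2]<input type=email placeholder=Email address />".
# The format is inferred from the sample output only (assumption).
ELEMENT_RE = re.compile(
    r"^(?P<indent>\s*)(?P<changed>\*?)\[(?P<index>\d+)\]"
    r"<(?P<tag>[\w-]+)(?P<attrs>[^>]*?)\s*/?>\s*$"
)


@dataclass
class Element:
    index: int      # the [N] target used by click / input
    tag: str
    attrs: str      # raw attribute text, left unparsed on purpose
    changed: bool   # True when the line carried the * marker
    depth: int      # nesting level, assuming two spaces per level
    text: str = ""  # visible text found on following lines


@dataclass
class PageState:
    url: Optional[str]
    title: Optional[str]
    elements: List[Element]


def parse_state(output: str) -> PageState:
    url = title = None
    elements: List[Element] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        if line.startswith("url="):
            url = line[len("url="):]
        elif line.startswith("title="):
            title = line[len("title="):]
        else:
            m = ELEMENT_RE.match(line)
            if m:
                elements.append(Element(
                    index=int(m["index"]),
                    tag=m["tag"],
                    attrs=m["attrs"].strip(),
                    changed=bool(m["changed"]),
                    depth=len(m["indent"]) // 2,
                ))
            elif elements:
                # A non-element line is treated as text of the previous element.
                prev = elements[-1]
                prev.text = (prev.text + " " + line.strip()).strip()
    return PageState(url, title, elements)


SAMPLE = """url=https://example.com/login
title=Login

*[1]<div id=login-form />
  *[2]<input type=email placeholder=Email address />
  *[3]<input type=password placeholder=Password />
  *[4]<button id=submit />
    Sign In
"""

page = parse_state(SAMPLE)
changed_indexes = [e.index for e in page.elements if e.changed]
```

In this sketch, `changed_indexes` gives the harness a short list of elements to look at first after an action, which is the role the `*` marker plays for the model itself.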

Indexed interaction

Agents act by element index:

browser-act --session s1 click 4
browser-act --session s1 input 2 "hello@example.com"

No selector guessing

Agents do not need to generate XPath or CSS selectors for routine actions.

Same view, same actions

The agent acts on the same indexed elements it sees in `state`.

Refresh after change

Calling `state` again refreshes indexes after navigation or page updates.
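The act-then-refresh pattern can be sketched as a small command builder. This is an illustration only: `click` and `input` with `--session` come from the examples above, while invoking `state` with the same `--session` flag is an assumption made by analogy. The helper names are hypothetical.

```python
import shlex
from typing import List

BINARY = "browser-act"


def build_command(session: str, action: str, *args: str) -> List[str]:
    # Mirrors the documented shape: browser-act --session <name> <action> [args...]
    return [BINARY, "--session", session, action, *args]


def click(session: str, index: int) -> List[str]:
    return build_command(session, "click", str(index))


def input_text(session: str, index: int, text: str) -> List[str]:
    return build_command(session, "input", str(index), text)


def state(session: str) -> List[str]:
    # Assumption: state accepts --session like click and input do.
    return build_command(session, "state")


# Fill the email field, submit, then refresh the view before choosing
# the next index, because indexes may change after the page updates.
plan = [
    input_text("s1", 2, "hello@example.com"),
    click("s1", 4),
    state("s1"),
]

for argv in plan:
    print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, capture_output=True, text=True)` would execute it without shell quoting issues. The point of the sketch is the ordering: every action that can change the page is followed by a fresh `state` call before another index is used.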

Semantic browser descriptions

Each browser has a `desc` field that tells the agent what the browser is for:

Logged-in shopping account for price monitoring.

Match tasks

Use `desc` to match a new request to an existing browser.

Avoid duplicates

Reuse known browsers instead of creating new ones for the same job.

Improve over time

Append useful context when a browser becomes associated with more workflows.

Update descriptions with:

browser-act browser update <browser_id> --desc-append "Also used for order tracking"
browser-act browser update <browser_id> --desc "New complete description"
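One way a Skill layer could match a task to an existing browser by `desc` is simple keyword overlap. The sketch below is a hypothetical illustration, not Browser-act's matching logic: the `Browser` record, the stopword list, and the `min_overlap` threshold are all assumptions. Only the `browser update ... --desc-append` command shape is taken from the examples above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set

# Small, arbitrary stopword list (assumption) so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "and", "in", "on", "my", "with"}


@dataclass
class Browser:
    browser_id: str
    desc: str


def tokens(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def match_browser(task: str, browsers: List[Browser], min_overlap: int = 2) -> Optional[Browser]:
    """Return the browser whose desc shares the most keywords with the task,
    or None when no browser reaches min_overlap shared keywords."""
    task_tokens = tokens(task)
    best, best_score = None, 0
    for browser in browsers:
        score = len(task_tokens & tokens(browser.desc))
        if score > best_score:
            best, best_score = browser, score
    return best if best_score >= min_overlap else None


def desc_append_command(browser_id: str, note: str) -> List[str]:
    # Same shape as the documented update command with --desc-append.
    return ["browser-act", "browser", "update", browser_id, "--desc-append", note]


browsers = [
    Browser("b1", "Logged-in shopping account for price monitoring."),
    Browser("b2", "Internal dashboard for weekly reports."),
]

chosen = match_browser("Check price changes on the shopping account", browsers)
no_match = match_browser("Book a flight", browsers)
```

Returning `None` when nothing matches leaves room for the agent to ask the user rather than guess, after which the chosen browser's description can be extended with `desc_append_command`.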

Browser selection priority

When multiple browsers exist, the agent should follow this order:

This selection logic belongs to the Skill layer. After a user chooses, the agent should update `desc` so future tasks can match more directly.

Safety by default

Browser automation can affect real accounts and real data. Browser-act uses confirmation rules to keep the user in control.

Confirmation gates

Agents should ask before sensitive operations:

| Operation | Why confirmation matters |
| --- | --- |
| Create any browser | Creates a new automation endpoint |
| Delete a browser | Destroys persistent browser state |
| Import a profile | Copies login state into a managed browser |
| Change proxy settings | Changes network identity |
| Change privacy mode | Changes fingerprint and persistence behavior |
| Change `confirm_before_use` | Changes safety behavior |
| Open a `confirm_before_use` browser | Uses a browser marked as sensitive |

[!WARNING] These confirmation gates are agent instructions, not a replacement for platform-level security. Their effectiveness depends on the agent runtime and model following the Skill instructions.

Example confirmation:

Agent: I plan to create a stealth browser for price monitoring.
       Type: stealth
       Name: price-monitor
       Proxy: US dynamic proxy

       Continue?

User: Yes.

Agent: Running browser create.

Rules:

- Prior approval does not carry over to a new sensitive operation.
- Every sensitive operation needs its own confirmation.
- Strong wording in the user’s original prompt does not replace confirmation.
- The agent should explain what it will do before doing it.
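The rules above can be sketched as a gate that asks every time and never caches an earlier "yes". This is a hypothetical illustration of the policy, not Browser-act code: the operation labels other than `browser create` are invented names for the rows in the table, and `ask` and `execute` are caller-supplied callbacks.

```python
from typing import Callable

# Labels for the sensitive operations in the table above. Only "browser create"
# appears in the source; the other names are placeholders (assumption).
SENSITIVE_OPERATIONS = {
    "browser create",
    "browser delete",
    "profile import",
    "proxy change",
    "privacy mode change",
    "confirm_before_use change",
    "open confirm_before_use browser",
}


def run_with_confirmation(
    operation: str,
    summary: str,
    ask: Callable[[str], bool],
    execute: Callable[[], str],
) -> str:
    """Explain the plan, ask, then act. No approval is stored between calls,
    so each sensitive operation triggers its own question."""
    if operation in SENSITIVE_OPERATIONS:
        prompt = f"I plan to run '{operation}'.\n{summary}\n\nContinue?"
        if not ask(prompt):
            return "cancelled"
    return execute()
```

Because the function keeps no memory of earlier answers, a "yes" to creating a browser cannot leak into a later delete or proxy change, which is the behavior the rules call for.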

Local data handling

| Data | Location | Leaves the machine? |
| --- | --- | --- |
| Cookies | Browser-act local storage | No |
| Login sessions | Isolated browser profile | No |
| Page content | In memory during the task | No |
| Screenshots | Local file system when saved | No |
| Network captures | Memory or local HAR files | No |
| Browser profiles | Isolated local directories | No |

The exception is `solve-captcha`, which sends CAPTCHA challenge images to Browser-act cloud for solving. That request should not include cookies, page content, or full URLs.

Advanced capabilities

Browser-act includes more than simple click and input commands:

Network capture and HAR

Find API endpoints, debug auth flows, capture XHR-loaded data, and analyze page loading.

JavaScript evaluation

Run `eval` for complex extraction or page-local operations.

Cookie import and export

Move session state between browser types, machines, or CI jobs.

Offline mode

Test forms and button flows without making real network requests.

The full command list is in Command Reference.
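As an illustration of the "find API endpoints" use of network capture, the sketch below summarizes JSON responses in a saved HAR file. It relies only on the standard HAR layout (`log.entries[].request` and `log.entries[].response.content.mimeType`) and assumes nothing about how Browser-act writes the file; `load_har` and `api_endpoints` are hypothetical helper names.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    # A HAR file is a JSON document with a top-level "log" object.
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def api_endpoints(har: dict) -> Counter:
    """Count (method, URL without query string) pairs for JSON responses."""
    counts: Counter = Counter()
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if "json" not in mime.lower():
            continue  # skip HTML, images, scripts, and other non-API traffic
        request = entry["request"]
        parts = urlsplit(request["url"])
        endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
        counts[(request["method"], endpoint)] += 1
    return counts


# Tiny in-memory capture for demonstration (invented data).
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://example.com/login"},
             "response": {"content": {"mimeType": "text/html"}}},
            {"request": {"method": "GET", "url": "https://example.com/api/items?page=1"},
             "response": {"content": {"mimeType": "application/json"}}},
            {"request": {"method": "GET", "url": "https://example.com/api/items?page=2"},
             "response": {"content": {"mimeType": "application/json; charset=utf-8"}}},
        ]
    }
}

endpoints = api_endpoints(sample_har)
```

Dropping the query string groups paginated calls under one endpoint, which tends to be the more useful view when an agent is looking for the API behind XHR-loaded data.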

Learn more

Command Reference

Open the full Browser-act CLI command index.

Anti-detection & Blocking

Understand blocking, CAPTCHA handling, and handoff.

Concurrency & Isolation

Run parallel agent work without mixing state.
