Browser-act is built for models, not for humans writing browser scripts by hand. The interface is designed around one question: how can an AI agent understand the browser state and choose the next safe action?

Compact state output

Agents do not need the full DOM or verbose JSON for every step. The `state` command returns a compact, indexed view:

```
url=https://example.com/login
title=Login

*[1]<div id=login-form />
  *[2]<input type=email placeholder=Email address />
  *[3]<input type=password placeholder=Password />
  *[4]<button id=submit />
    Sign In
```
The * marker highlights new or changed elements since the previous state call. This helps the agent focus on what changed.
| State field | How the agent uses it |
| --- | --- |
| `url` | Confirms the current page before taking action |
| `title` | Checks whether navigation reached the expected screen |
| `[N]` index | Provides the target for commands such as `click 4` or `input 2` |
| `*` marker | Points attention to elements that changed after the last action |
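If an agent runtime needs to consume this output programmatically rather than just reading it, a parser only has to recover the four fields in the table. The following is a minimal Python sketch that assumes the exact layout shown above: one `url=` line, one `title=` line, element lines of the form `[N]<tag attrs />` with an optional leading `*`, two spaces of indentation per nesting level, and any non-indexed line holding visible text for the preceding element. The names `parse_state`, `PageState`, and `Element` are hypothetical and are not part of Browser-act.

```python
import re
from dataclasses import dataclass, field

# Assumed element-line layout, inferred from the example above:
#   optional indentation, optional "*", "[N]", then "<tag attrs />"
ELEMENT_RE = re.compile(
    r"^(?P<indent>\s*)(?P<changed>\*?)\[(?P<index>\d+)\]"
    r"<(?P<tag>\w+)(?P<attrs>[^>]*?)\s*/?>$"
)


@dataclass
class Element:
    index: int
    tag: str
    attrs: str          # raw attribute text; values may contain spaces
    changed: bool       # True when the line carried the * marker
    depth: int          # nesting level, assuming two spaces per level
    text: str = ""      # visible text lines that followed the element


@dataclass
class PageState:
    url: str = ""
    title: str = ""
    elements: dict[int, Element] = field(default_factory=dict)

    def changed(self) -> list[Element]:
        """Elements marked with * (new or changed since the previous state call)."""
        return [e for e in self.elements.values() if e.changed]


def parse_state(output: str) -> PageState:
    state = PageState()
    last = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if line.startswith("url="):
            state.url = line[len("url="):]
        elif line.startswith("title="):
            state.title = line[len("title="):]
        elif (m := ELEMENT_RE.match(line)) is not None:
            last = Element(
                index=int(m["index"]),
                tag=m["tag"],
                attrs=m["attrs"].strip(),
                changed=m["changed"] == "*",
                depth=len(m["indent"]) // 2,
            )
            state.elements[last.index] = last
        elif last is not None:
            # Any other line is treated as visible text of the preceding element.
            last.text = f"{last.text} {line.strip()}".strip()
    return state


# The example output from above, used as sample input.
SAMPLE = """url=https://example.com/login
title=Login

*[1]<div id=login-form />
  *[2]<input type=email placeholder=Email address />
  *[3]<input type=password placeholder=Password />
  *[4]<button id=submit />
    Sign In"""
```

With this sketch, `parse_state(SAMPLE).elements[4].text` is `"Sign In"`, which is how an agent could confirm that index 4 is the button to pass to `click`.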

Indexed interaction

Agents act by element index:

```shell
browser-act --session s1 click 4
browser-act --session s1 input 2 "hello@example.com"
```

No selector guessing

Agents do not need to generate XPath or CSS selectors for routine actions.

Same view, same actions

The agent acts on the same indexed elements it sees in state.

Refresh after change

Calling state again refreshes indexes after navigation or page updates.
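The act-then-refresh loop can be sketched in Python by driving the CLI through `subprocess`. The sketch reuses the `click` and `input` invocations shown above. One assumption is not confirmed on this page: that `state` is invoked the same way, as `browser-act --session <name> state`. The helper names are hypothetical, and the runner is injectable so the flow can be checked without a real browser.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run one browser-act command; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def command(session: str, *args: str) -> list[str]:
    """Build an argv list such as: browser-act --session s1 click 4"""
    return ["browser-act", "--session", session, *args]


def act_then_refresh(session: str, action: str, index: int, *values: str,
                     run: Runner = run_cli) -> str:
    """Act on an element index, then call state again so later indexes are fresh."""
    run(command(session, action, str(index), *values))
    # Assumption: state accepts the same --session flag as click and input.
    return run(command(session, "state"))
```

A call such as `act_then_refresh("s1", "input", 2, "hello@example.com")` types into element 2 and returns the new state output. Any index used afterwards should come from that fresh output, not from the earlier one.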

Semantic browser descriptions

Each browser has a `desc` field that tells the agent what the browser is for. For example:

```
Logged-in shopping account for price monitoring.
```

Match tasks

Use desc to match a new request to an existing browser.

Avoid duplicates

Reuse known browsers instead of creating new ones for the same job.

Improve over time

Append useful context when a browser becomes associated with more workflows.
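Update descriptions with `--desc-append`, which adds to the existing description, or `--desc`, which replaces it entirely:

```shell
browser-act browser update <browser_id> --desc-append "Also used for order tracking"
browser-act browser update <browser_id> --desc "New complete description"
```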
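Matching a request against `desc` can be as simple or as elaborate as the Skill layer needs. As a hypothetical illustration, the Python sketch below scores each browser by word overlap between the task and its description. When no browser is a clear winner, it returns `None` so the agent can ask the user instead of guessing. The scoring rule, the stopword list, the threshold, and the sample browser IDs are all assumptions. Only the `browser update ... --desc-append` invocation comes from the commands above.

```python
from __future__ import annotations

import re

# Hypothetical sample data: browser IDs mapped to their desc fields.
BROWSERS = {
    "b1": "Logged-in shopping account for price monitoring.",
    "b2": "Test account for newsletter signup forms.",
}

# Words too common to signal a match (an assumption; tune per deployment).
STOPWORDS = {"a", "an", "the", "for", "and", "of", "to", "on", "my", "with", "used", "also"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def best_browser(task: str, browsers: dict[str, str], min_overlap: int = 2) -> str | None:
    """Return the single best-matching browser ID, or None if there is no clear winner."""
    task_words = tokens(task)
    scores = {bid: len(task_words & tokens(desc)) for bid, desc in browsers.items()}
    if not scores:
        return None
    top = max(scores.values())
    leaders = [bid for bid, score in scores.items() if score == top]
    if top < min_overlap or len(leaders) > 1:
        return None  # weak or ambiguous match: ask the user rather than guess
    return leaders[0]


def desc_append_argv(browser_id: str, note: str) -> list[str]:
    """Build the argv for: browser-act browser update <id> --desc-append <note>."""
    return ["browser-act", "browser", "update", browser_id, "--desc-append", note]
```

Returning `None` on a tie keeps the agent from silently picking between two plausible browsers. After the user chooses, `desc_append_argv` produces the update command so the next similar task can match directly.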

Browser selection priority

When multiple browsers exist, the agent should follow this order:
This selection logic belongs to the Skill layer. After a user chooses, the agent should update desc so future tasks can match more directly.

Safety by default

Browser automation can affect real accounts and real data. Browser-act uses confirmation rules to keep the user in control.

Confirmation gates

Agents should ask before sensitive operations:
| Operation | Why confirmation matters |
| --- | --- |
| Create any browser | Creates a new automation endpoint |
| Delete a browser | Destroys persistent browser state |
| Import a profile | Copies login state into a managed browser |
| Change proxy settings | Changes network identity |
| Change privacy mode | Changes fingerprint and persistence behavior |
| Change `confirm_before_use` | Changes safety behavior |
| Open a `confirm_before_use` browser | Uses a browser marked as sensitive |
> [!WARNING]
> These confirmation gates are agent instructions, not a replacement for platform-level security. Their effectiveness depends on the agent runtime and model following the Skill instructions.
Example confirmation:

```
Agent: I plan to create a stealth browser for price monitoring.
       Type: stealth
       Name: price-monitor
       Proxy: US dynamic proxy

       Continue?

User: Yes.

Agent: Running browser create.
```
Rules:
  • prior approval does not carry over to a new sensitive operation
  • every sensitive operation needs its own confirmation
  • strong wording in the user’s original prompt does not replace confirmation
  • the agent should explain what it will do before doing it
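The rules above amount to treating every approval as single-use. One way to express that in Python is a wrapper that asks immediately before each sensitive operation and never caches a "yes". The sketch assumes the agent runtime exposes an `ask` callback that shows the plan to the user and returns their answer. The operation labels mirror the table above but are hypothetical identifiers, not Browser-act command names. Like the gates themselves, this is agent-side logic, not platform security.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical labels for the operations in the confirmation table above.
SENSITIVE_OPERATIONS = {
    "create_browser",
    "delete_browser",
    "import_profile",
    "change_proxy",
    "change_privacy_mode",
    "change_confirm_before_use",
    "open_confirm_before_use_browser",
}


def guarded(operation: str, plan: str, action: Callable[[], T],
            ask: Callable[[str], bool]) -> T:
    """Run action(), asking the user first if the operation is sensitive.

    Approval is never stored: each sensitive call triggers a fresh question,
    so an earlier "yes" cannot carry over to a new operation.
    """
    if operation in SENSITIVE_OPERATIONS:
        # Explain what will happen before doing it, then wait for an answer.
        if not ask(f"I plan to {plan}.\nContinue?"):
            raise PermissionError(f"User declined: {operation}")
    return action()
```

The user's original prompt is not an input to `guarded`, so strong wording in that prompt cannot stand in for confirmation. Only the answer returned by `ask` counts.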

Local data handling

| Data | Location | Leaves the machine? |
| --- | --- | --- |
| Cookies | Browser-act local storage | No |
| Login sessions | Isolated browser profile | No |
| Page content | In memory during the task | No |
| Screenshots | Local file system when saved | No |
| Network captures | Memory or local HAR files | No |
| Browser profiles | Isolated local directories | No |
The one exception is `solve-captcha`, which sends CAPTCHA challenge images to the Browser-act cloud for solving. These requests should not include cookies, page content, or full URLs.

Advanced capabilities

Browser-act includes more than simple click and input commands:

Network capture and HAR

Find API endpoints, debug auth flows, capture XHR-loaded data, and analyze page loading.

JavaScript evaluation

Run eval for complex extraction or page-local operations.

Cookie import and export

Move session state between browser types, machines, or CI jobs.

Offline mode

Test forms and button flows without making real network requests.
The full command list is in Command Reference.
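As a sketch of the "find API endpoints" use case: HAR files are JSON documents in the standard HAR 1.2 layout, with `log.entries`, each holding a `request` and a `response`. A saved capture can therefore be scanned with plain Python. The example assumes a HAR file already saved by a network capture. It treats any response whose MIME type contains `json` as a likely API call. That heuristic, the helper names, and the file name in the usage note are assumptions. The command that writes the HAR file is documented in Command Reference, not here.

```python
import json
from urllib.parse import urlsplit


def load_har(path: str) -> dict:
    """Read a HAR file (plain JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def api_endpoints(har: dict) -> list[tuple[str, str, int]]:
    """List unique (method, host+path, status) for responses that look like JSON APIs.

    Heuristic (an assumption): a response counts as an API call when its
    content.mimeType contains "json". Query strings are dropped so paginated
    calls to the same endpoint collapse into one entry.
    """
    found: list[tuple[str, str, int]] = []
    for entry in har.get("log", {}).get("entries", []):
        response = entry.get("response", {})
        mime = response.get("content", {}).get("mimeType", "")
        if "json" not in mime.lower():
            continue
        request = entry.get("request", {})
        parts = urlsplit(request.get("url", ""))
        key = (request.get("method", ""), parts.netloc + parts.path, response.get("status", 0))
        if key not in found:
            found.append(key)
    return found
```

For a capture saved as, say, `capture.har` (a hypothetical name), `api_endpoints(load_har("capture.har"))` lists the JSON endpoints the page called, in the order they first appeared.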

Learn more

Command Reference

Open the full Browser-act CLI command index.

Anti-detection & Blocking

Understand blocking, CAPTCHA handling, and handoff.

Concurrency & Isolation

Run parallel agent work without mixing state.