How does this work?
Now I can see the structure. Before diving deeper, what aspect interests you most? Or should I start with the README to give you a high-level overview?
what's the strategy used to make the agent use the website
Perfect! Now I have a complete understanding. Let me create a comprehensive summary:
## Strategy: How Agent-Browser Makes Agents Use Websites
The core strategy is elegantly simple but powerful. Here's how it works:
### 1. Accessibility Tree Snapshot with Reference IDs
When an AI agent needs to interact with a website, it first calls `agent-browser snapshot`, which:
- Generates an ARIA accessibility tree using Playwright's built-in `ariaSnapshot()` method
- Assigns deterministic reference IDs (like `@e1`, `@e2`, etc.) to interactive and important elements
- Stores metadata about each ref (role, name, selector) in a `RefMap`
Example output:

    - button "Submit" [ref=e2]
    - textbox "Email" [ref=e3]
    - link "Help" [ref=e1]

This logic lives in `src/snapshot.ts`, in the `getEnhancedSnapshot()` function that processes the ARIA tree.
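
To make the ref assignment concrete, here is a minimal sketch of the idea, not the project's actual code. It assumes the snapshot arrives as text with one `- role "name"` entry per line and that refs are numbered in document order. The names `assignRefs`, `RefEntry`, and `INTERACTIVE_ROLES` are hypothetical, chosen for illustration.

```typescript
// Hypothetical sketch: tag interactive lines of an ARIA snapshot with refs.
// The real getEnhancedSnapshot() in src/snapshot.ts may work differently.

interface RefEntry {
  role: string;
  name: string;
}

type RefMap = Record<string, RefEntry>;

// Assumed set of roles worth tagging; a real list is likely longer.
const INTERACTIVE_ROLES = new Set(["button", "link", "textbox", "checkbox"]);

function assignRefs(ariaTree: string): { tree: string; refs: RefMap } {
  const refs: RefMap = {};
  let counter = 0;

  const lines = ariaTree.split("\n").map((line) => {
    // Matches lines such as `- button "Submit"`, allowing leading indentation.
    const match = /^(\s*)- (\w+) "([^"]*)"/.exec(line);
    if (!match) return line;
    const role = match[2];
    const name = match[3];
    if (!INTERACTIVE_ROLES.has(role)) return line;

    // Sequential numbering in document order: the same tree yields the same refs.
    counter += 1;
    const ref = `e${counter}`;
    refs[ref] = { role, name };
    return `${line} [ref=${ref}]`;
  });

  return { tree: lines.join("\n"), refs };
}

// The heading is left untagged; the textbox and button receive e1 and e2.
const snapshot = assignRefs('- heading "Sign in"\n- textbox "Email"\n- button "Submit"');
```

Because numbering depends only on the order of lines in the tree, running the function twice on the same input produces identical refs, which is the property the agent relies on within one snapshot.
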
### 2. Dual Selector Support
The `BrowserManager` in `src/browser.ts` implements `getLocator()`, which supports both types of selectors:

    getLocator(selectorOrRef: string): Locator {
      // Check whether it is a ref first (@e1, @e2, etc.)
      const locator = this.getLocatorFromRef(selectorOrRef);
      if (locator) return locator;
      // Otherwise treat it as a regular CSS selector
      return page.locator(selectorOrRef);
    }

This means agents can use:
- Refs: `agent-browser click @e2` (AI-friendly)
- CSS selectors: `agent-browser click "#submit"` (traditional)
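
The ref-or-selector decision can be sketched without Playwright as a plain string lookup. This is an illustrative sketch under stated assumptions: refs always look like `@e` followed by digits, and each stored entry carries a selector string. The names `resolveSelector` and `StoredRef` are hypothetical, and throwing on an unknown ref is a design choice of this sketch, not confirmed behavior of the tool.

```typescript
// Hypothetical sketch of resolving "@e2" or "#submit" to a selector string.
// Not the project's code; getLocatorFromRef() may behave differently.

interface StoredRef {
  role: string;
  name: string;
  selector: string;
}

// Assumes refs are always "@e" followed by one or more digits.
const REF_PATTERN = /^@(e\d+)$/;

function resolveSelector(selectorOrRef: string, refMap: Map<string, StoredRef>): string {
  const match = REF_PATTERN.exec(selectorOrRef);
  if (!match) {
    // Not a ref: pass the string through untouched as a CSS selector.
    return selectorOrRef;
  }
  const entry = refMap.get(match[1]);
  if (!entry) {
    // Sketch-only choice: fail loudly so the agent knows to re-snapshot.
    throw new Error(`Unknown ref ${selectorOrRef}; take a new snapshot first`);
  }
  return entry.selector;
}

// Illustrative ref map with a single entry.
const demoRefs = new Map<string, StoredRef>([
  ["e2", { role: "button", name: "Submit", selector: 'role=button[name="Submit"]' }],
]);

const fromRef = resolveSelector("@e2", demoRefs);     // selector stored for e2
const fromCss = resolveSelector("#submit", demoRefs); // returned unchanged
```
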
### 3. Command Dispatch & Execution
All commands flow through a large switch statement in `src/actions.ts` that maps each action to a handler:

    switch (command.action) {
      case 'click': return await handleClick(command, browser);
      case 'fill': return await handleFill(command, browser);
      case 'snapshot': return await handleSnapshot(command, browser);
      // ... 100+ more commands
    }

Each handler:
- Gets the locator (supporting both refs and selectors)
- Executes the Playwright action
- Returns AI-friendly errors if something fails
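
The three handler steps above can be sketched as a small wrapper that runs an action and translates low-level failures into advice an agent can act on. This is a hedged sketch: `runCommand`, `toAgentMessage`, and the `Command` and `Result` shapes are hypothetical, and the matched error patterns are assumptions modeled on typical Playwright error text rather than the project's actual messages.

```typescript
// Hypothetical sketch of a handler wrapper; names and messages are illustrative.

interface Command {
  action: string;
  target: string;
}

interface Result {
  success: boolean;
  message: string;
}

type Handler = (command: Command) => Promise<void>;

// Translate a raw error into a hint that tells the agent what to try next.
function toAgentMessage(error: unknown, target: string): string {
  const raw = error instanceof Error ? error.message : String(error);
  if (/timeout/i.test(raw)) {
    return `Element ${target} was not found or not ready. Run snapshot again and retry with a current ref.`;
  }
  if (/strict mode violation/i.test(raw)) {
    return `Selector ${target} matched several elements. Use a ref from snapshot instead.`;
  }
  return `Action failed on ${target}: ${raw}`;
}

async function runCommand(command: Command, handlers: Record<string, Handler>): Promise<Result> {
  const handler = handlers[command.action];
  if (!handler) {
    return { success: false, message: `Unknown action "${command.action}".` };
  }
  try {
    // In the real tool this is where the locator is resolved and Playwright acts.
    await handler(command);
    return { success: true, message: "ok" };
  } catch (error) {
    return { success: false, message: toAgentMessage(error, command.target) };
  }
}
```

A lookup table stands in for the switch statement here only to keep the sketch short; the dispatch idea is the same.
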
### 4. Why This Works for AI Agents
Problem it solves:
- AI agents struggle with fragile CSS selectors that break when page structure changes
- Accessibility trees are semantic and more stable across page variations
Solution provided:
- ✅ Semantic references: Use meaningful element names instead of fragile selectors
- ✅ Deterministic: Same elements always get the same refs in a snapshot
- ✅ AI-optimized: Agents can reason about `@e2` (the "Submit" button) instead of `.btn-primary.mt-3[data-id="..."]`
- ✅ Fallback support: Still works with regular CSS/XPath selectors if needed
### 5. The Full Loop

    AI Agent thinks: "I need to fill the email field and click submit"
    ↓
    Agent calls: agent-browser snapshot
    ↓
    Gets: { tree: "- textbox \"Email\" [ref=e3]", refs: { e3: {...} } }
    ↓
    Agent calls: agent-browser fill @e3 "test@example.com"
    ↓
    BrowserManager.getLocator("@e3") → resolves to Playwright locator
    ↓
    Playwright executes: locator.fill("test@example.com")
    ↓
    Agent calls: agent-browser click @e2
    ↓
    Success!

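
Seen from the agent's side, the loop amounts to reading the snapshot text, picking refs by role and name, and composing the next commands. The sketch below assumes the `- role "name" [ref=eN]` line format shown in the example output; `findRef` is a hypothetical helper, not part of the tool.

```typescript
// Hypothetical agent-side helper: find the ref for a given role and accessible name.
function findRef(tree: string, role: string, name: string): string | null {
  for (const line of tree.split("\n")) {
    // Assumes each tagged line looks like: - role "name" [ref=eN]
    const match = /^\s*- (\w+) "([^"]*)" \[ref=(e\d+)\]/.exec(line);
    if (match && match[1] === role && match[2] === name) {
      return `@${match[3]}`;
    }
  }
  return null;
}

const tree = '- textbox "Email" [ref=e3]\n- button "Submit" [ref=e2]';
const emailRef = findRef(tree, "textbox", "Email");
const submitRef = findRef(tree, "button", "Submit");

// Commands the agent would issue next, in order.
const commands = [
  `agent-browser fill ${emailRef} "test@example.com"`,
  `agent-browser click ${submitRef}`,
];
```
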
This is a brilliant design because it bridges the gap between human-readable UI semantics (what accessibility trees provide) and deterministic element selection (what AI agents need).