The page.perform() method executes single, granular actions on a web page. It's optimized for speed and reliability, using the accessibility tree instead of screenshots.
Overview
| Characteristic | Description |
|---|---|
| Speed | Fast - uses the accessibility tree (no screenshots) |
| Cost | Cheap - a single LLM call per action |
| Reliability | Direct element finding and execution |
| Efficiency | Text-based DOM analysis with automatic ad-frame filtering |
Basic Usage
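A minimal sketch of the call pattern, assuming only what this page states: a page object exposes perform(), which takes a natural-language instruction and returns a promise. The PerformPage interface, the signIn function, and the recording stub are hypothetical names introduced here so the snippet runs without a browser; in real use the page object would come from a HyperAgent instance.

```typescript
// Hypothetical structural type: only the method this page documents.
// The real page object exposes far more than this.
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

// One granular action per call, each phrased as a plain-language instruction.
async function signIn(page: PerformPage): Promise<void> {
  await page.perform("fill the email input with user@example.com");
  await page.perform("click the 'Sign In' button");
}

// Stand-in page that records instructions instead of driving a browser.
function createRecordingPage(): { page: PerformPage; calls: string[] } {
  const calls: string[] = [];
  const page: PerformPage = {
    perform: async (instruction: string) => {
      calls.push(instruction);
      return undefined;
    },
  };
  return { page, calls };
}
```

Passing the recording page to signIn leaves two entries in calls, one per action, which mirrors the one-instruction-per-call model described above.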
Common Actions
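As a rough illustration of the categories listed in this section, the sketch below pairs each one with a single instruction and runs them in sequence. The instruction wording is an assumption (perform() takes free-form text, not a fixed command syntax), and PerformPage, exampleInstructions, and runAll are hypothetical names, not part of the library.

```typescript
// Hypothetical structural type: only the method this page documents.
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

// Illustrative instructions, one per category in this section.
// The wording is an assumption, not a fixed command syntax.
const exampleInstructions: Record<string, string> = {
  click: "click the 'Add to Cart' button",
  fill: "fill the search box with 'wireless headphones'",
  form: "select 'Canada' from the country dropdown",
  scroll: "scroll to the bottom of the page",
  hover: "hover over the 'Products' menu item",
  keyboard: "press Enter",
};

// Run the instructions in order, one perform() call per action.
async function runAll(page: PerformPage, instructions: string[]): Promise<number> {
  let completed = 0;
  for (const instruction of instructions) {
    await page.perform(instruction);
    completed += 1;
  }
  return completed;
}
```

Awaiting each call before issuing the next keeps the actions sequential, so a later instruction always sees the page state left by the earlier one.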
Click Elements
Fill or Type Inputs
Form Interactions
Scrolling
Hover
Keyboard Actions
When to Use perform() vs ai()
Use page.perform()
- Single, specific actions
- When you know exactly what action is needed
- Fast, reliable execution
- Lower token cost
Use page.ai()
- Complex multi-step workflows
- When visual context is needed
- Tasks requiring decision making
- When the next action depends on page state
Example: Combining Both
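One way to combine the two, sketched under assumptions: perform() handles the steps that are known in advance, and ai() takes over once the next step depends on what the page shows. The AgentPage interface and searchAndCompare function are hypothetical, and the ai(task: string) signature is assumed to mirror perform(); this page only confirms that both methods exist.

```typescript
// Hypothetical structural type covering the two methods compared above.
// Return types are left as unknown; the ai() signature is an assumption.
interface AgentPage {
  perform(instruction: string): Promise<unknown>;
  ai(task: string): Promise<unknown>;
}

// perform() for steps known in advance, ai() for the open-ended part.
async function searchAndCompare(page: AgentPage, query: string): Promise<unknown> {
  await page.perform(`type '${query}' into the search box`);
  await page.perform("press Enter");
  // What happens next depends on the results page, so hand over to ai().
  return page.ai(`compare the top three results for '${query}' and report the cheapest one`);
}
```

The split keeps the cheap, single-call path for the predictable steps and reserves the multi-step path for the part that needs decision making.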
Return Value
page.perform() returns a TaskOutput object.
Checking Success
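The fields of TaskOutput are not listed on this page, so the sketch below works against an assumed shape: an optional status string and an optional output string. Both field names and the "completed" status value are assumptions; check the TaskOutput type exported by the installed version before relying on them. TaskOutputSketch, succeeded, and describe are hypothetical names.

```typescript
// Assumed shape: these field names and status strings are NOT taken from this
// page. Check the TaskOutput type exported by your installed version.
interface TaskOutputSketch {
  status?: string; // assumed values such as "completed" or "failed"
  output?: string; // assumed free-text result or summary
}

// Treat only an explicit "completed" status as success.
function succeeded(result: TaskOutputSketch): boolean {
  return result.status === "completed";
}

// Produce a short log line for either outcome.
function describe(result: TaskOutputSketch): string {
  return succeeded(result)
    ? `ok: ${result.output ?? "(no output)"}`
    : `not completed (status: ${result.status ?? "unknown"})`;
}
```

Checking for an explicit success value, rather than the absence of a failure value, means a missing or unexpected status is treated as not completed.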
Error Handling
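A minimal sketch of defensive calling, assuming that perform() may reject its promise when an action cannot be carried out (for example, when no matching element is found). Whether a failure surfaces as a thrown error or as a failed status on the returned object is not stated on this page, so both paths are worth handling. PerformPage, PerformOutcome, and tryPerform are hypothetical names.

```typescript
// Hypothetical structural type: only the method this page documents.
interface PerformPage {
  perform(instruction: string): Promise<unknown>;
}

type PerformOutcome =
  | { ok: true; result: unknown }
  | { ok: false; instruction: string; error: Error };

// Wrap a single action so a thrown error becomes a value the caller can inspect.
async function tryPerform(page: PerformPage, instruction: string): Promise<PerformOutcome> {
  try {
    const result = await page.perform(instruction);
    return { ok: true, result };
  } catch (err) {
    // Normalize non-Error throwables so callers always get an Error.
    const error = err instanceof Error ? err : new Error(String(err));
    return { ok: false, instruction, error };
  }
}
```

Returning the failed instruction alongside the error makes it straightforward to log which step broke or to retry it with more specific wording.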
Tips for Writing Effective Instructions
Be specific about the target element
Good: "click the blue 'Sign Up' button at the bottom of the form"
Bad: "click the button"
Include context when elements look similar
Good: "fill the email input in the login form with [email protected]"
Bad: "fill email"
Use visible text for identification
Good: "click the link that says 'Learn More'"
Bad: "click the third link"
Specify the action clearly
Good: "type 'search query' into the search box"
Bad: "search for something"
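The tips above amount to naming an action, a visibly identifiable target, and enough context to disambiguate. As an illustration only, the hypothetical helper below assembles an instruction from those parts; nothing like it is required by perform(), which accepts any free-form string.

```typescript
// Hypothetical helper: composes an instruction from the parts the tips call for.
interface InstructionParts {
  action: string; // what to do, e.g. "click" or "fill"
  target: string; // visible description of the element
  context?: string; // where it is, when similar elements exist
  value?: string; // text to enter, for fill or type actions
}

function buildInstruction(parts: InstructionParts): string {
  const pieces = [parts.action, parts.target];
  if (parts.context) pieces.push(parts.context);
  if (parts.value !== undefined) pieces.push(`with '${parts.value}'`);
  return pieces.join(" ");
}

// Rebuilds the first "Good" example from the tips.
const signUpInstruction = buildInstruction({
  action: "click",
  target: "the blue 'Sign Up' button",
  context: "at the bottom of the form",
});
```

Making target a required field is the point of the sketch: an instruction such as "click the button" cannot be built without saying which button.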
CDP Actions
HyperAgent uses the Chrome DevTools Protocol (CDP) for precise element interactions by default. This provides:
- Exact coordinate-based clicks
- Deep iframe support
- Auto-filtering of ad frames