page.perform()

The page.perform() method executes single, granular actions on a web page. It’s optimized for speed and reliability, using the accessibility tree instead of screenshots.

Overview

Characteristic	Description
Speed	⚡ Fast - Uses accessibility tree (no screenshots)
Cost	💰 Cheap - Single LLM call per action
Reliability	🎯 Direct element finding and execution
Efficiency	📊 Text-based DOM analysis with automatic ad-frame filtering

Basic Usage

import { HyperAgent } from "@hyperbrowser/agent";

const agent = new HyperAgent({
  llm: { provider: "openai", model: "gpt-4o" },
});

const page = await agent.newPage();
await page.goto("https://example.com/login");

// Execute single actions
await page.perform("fill email with user@example.com");
await page.perform("fill password with mypassword");
await page.perform("click the login button");

await agent.closeAgent();
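
When several single actions must run in a fixed order, it can help to stop at the first one that does not complete. The following is a minimal sketch, not part of HyperAgent: `PerformResult`, `PerformPage`, and `performInOrder` are hypothetical names, and the interfaces mirror only the `status` and `output` fields described under Return Value.

```typescript
// Hypothetical minimal shapes (assumptions), mirroring only what this page documents.
interface PerformResult {
  status: string; // "completed" | "failed" per the Return Value section
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Runs instructions in order and stops at the first one that does not complete.
// Returns how many instructions completed.
async function performInOrder(
  page: PerformPage,
  instructions: string[],
): Promise<number> {
  let completed = 0;
  for (const instruction of instructions) {
    const result = await page.perform(instruction);
    if (result.status !== "completed") {
      break;
    }
    completed += 1;
  }
  return completed;
}
```

With a real page, a call might look like `await performInOrder(page, ["fill email with user@example.com", "click the login button"])`. Any object with a compatible `perform` method satisfies the structural type, which also makes the helper easy to exercise with a stub.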

Common Actions

Click Elements

await page.perform("click the login button");
await page.perform("click the first search result");
await page.perform("click the 'Add to Cart' button");
await page.perform("click the menu icon in the top right");

Fill or Type Inputs

await page.perform("fill email with test@example.com");
await page.perform("type 'mechanical keyboard' into the search box");
await page.perform("fill the password field with MySecurePass123");

Form Interactions

await page.perform("check the 'Remember me' checkbox");
await page.perform("uncheck the newsletter subscription");
await page.perform("select 'United States' from the country dropdown");

Scrolling

// Scroll to a specific element
await page.perform("scroll to the pricing section");
await page.perform("scroll the reviews section into view");

// Scroll by percentage
await page.perform("scroll to 50% of the page");
await page.perform("scroll to the bottom of the page");

// Chunk-based scrolling (useful for infinite scroll or long pages)
await page.perform("scroll to the next chunk");
await page.perform("scroll to the previous chunk");
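
For long or infinitely scrolling pages, chunk-based scrolling is typically driven from a loop. The sketch below is one possible shape under stated assumptions: `scrollByChunks` and the `visit` callback are hypothetical, and how `perform` reports reaching the end of the page is not specified here, so the loop is bounded by `maxChunks` and by the caller's own stop condition.

```typescript
// Hypothetical minimal shapes (assumptions), mirroring only what this page documents.
interface PerformResult {
  status: string;
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Scrolls one chunk at a time and calls `visit` after each scroll that completes.
// Stops when `visit` returns false, a scroll does not complete, or maxChunks is reached.
// Returns the number of chunks visited.
async function scrollByChunks(
  page: PerformPage,
  maxChunks: number,
  visit: (chunkIndex: number) => Promise<boolean>,
): Promise<number> {
  let visited = 0;
  while (visited < maxChunks) {
    const result = await page.perform("scroll to the next chunk");
    if (result.status !== "completed") {
      break;
    }
    visited += 1;
    const keepGoing = await visit(visited);
    if (!keepGoing) {
      break;
    }
  }
  return visited;
}
```

The `visit` callback is where per-chunk work would go, for example collecting items that have just scrolled into view and returning `false` once enough have been gathered.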

Hover

await page.perform("hover over the user profile menu");
await page.perform("hover over the dropdown to reveal options");

Keyboard Actions

await page.perform("press Enter");
await page.perform("press Escape to close the modal");
await page.perform("press Tab to move to the next field");

When to Use perform() vs ai()

Use page.perform()

Single, specific actions
When you know exactly what action is needed
Fast, reliable execution
Lower token cost

Use page.ai()

Complex multi-step workflows
When visual context is needed
Tasks requiring decision making
When next action depends on page state

Example: Combining Both

// Assumes `agent` is the HyperAgent instance created in Basic Usage
const page = await agent.newPage();
await page.goto("https://amazon.com");

// Use perform for known, simple actions
await page.perform("click the search box");
await page.perform("type 'laptop' into the search box");
await page.perform("click the search button");

// Use ai() when complex decision-making is needed
await page.ai("find the best-rated laptop under $1000 and add it to cart");
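
Another way to combine the two methods is to try the cheaper `perform()` first and hand the same instruction to `ai()` only when it does not complete. This is a sketch of that idea rather than a documented HyperAgent pattern: `performOrEscalate`, `TaskResult`, and `AgentPage` are hypothetical names, and the code assumes `ai()` resolves to the same result shape as `perform()`.

```typescript
// Hypothetical minimal shapes (assumptions). ai() is assumed to return
// the same result shape as perform().
interface TaskResult {
  status: string;
  output: string;
}

interface AgentPage {
  perform(instruction: string): Promise<TaskResult>;
  ai(task: string): Promise<TaskResult>;
}

// Tries perform() first; if it throws or does not complete, retries the
// same instruction with ai().
async function performOrEscalate(
  page: AgentPage,
  instruction: string,
): Promise<TaskResult> {
  try {
    const result = await page.perform(instruction);
    if (result.status === "completed") {
      return result;
    }
  } catch {
    // Ignore the error here and fall through to ai() below.
  }
  return page.ai(instruction);
}
```

The trade-off is cost: a failed `perform()` followed by `ai()` spends more than calling `ai()` directly, so this is likely only worthwhile when most instructions succeed on the first path.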

Return Value

page.perform() returns a TaskOutput object:

interface TaskOutput {
  taskId: string;
  status: TaskStatus;  // "completed" | "failed"
  output: string;      // Result message
  steps: AgentStep[];  // Steps taken (usually 1 for perform)
}

Checking Success

const result = await page.perform("click the submit button");

if (result.status === "completed") {
  console.log("Action successful:", result.output);
} else {
  console.error("Action failed:", result.output);
}
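
If the same status check is repeated after every call, it can be folded into a small helper that turns a non-completed result into an exception. A minimal sketch, assuming only the `status` and `output` fields shown above; `requireCompleted` and `PerformResult` are hypothetical names, not part of the library.

```typescript
// Hypothetical minimal shape (assumption), mirroring the fields shown in TaskOutput.
interface PerformResult {
  status: string;
  output: string;
}

// Returns the output of a completed result, or throws with the status and
// output of a result that did not complete.
function requireCompleted(result: PerformResult, instruction: string): string {
  if (result.status !== "completed") {
    throw new Error(
      `perform("${instruction}") ended with status "${result.status}": ${result.output}`,
    );
  }
  return result.output;
}
```

A call site would then read `requireCompleted(await page.perform("click the submit button"), "click the submit button")`, so failed statuses and thrown errors can share a single `catch` block.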

Error Handling

try {
  await page.perform("click the non-existent button");
} catch (error) {
  console.error("Failed to perform action:", error);
}
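
Because a call can either throw or return a non-completed status, a retry wrapper needs to handle both. The following is a sketch under that assumption; `performWithRetry`, `PerformResult`, and `PerformPage` are hypothetical names, and the attempt count is an arbitrary default.

```typescript
// Hypothetical minimal shapes (assumptions), mirroring only what this page documents.
interface PerformResult {
  status: string;
  output: string;
}

interface PerformPage {
  perform(instruction: string): Promise<PerformResult>;
}

// Retries an instruction until it completes or maxAttempts is exhausted.
// Both thrown errors and non-completed statuses count as a failed attempt.
// Rethrows the last failure when every attempt fails.
async function performWithRetry(
  page: PerformPage,
  instruction: string,
  maxAttempts = 3,
): Promise<PerformResult> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await page.perform(instruction);
      if (result.status === "completed") {
        return result;
      }
      lastError = new Error(`status "${result.status}": ${result.output}`);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

Retrying suits transient conditions such as an element that has not rendered yet. An instruction that names an element that does not exist will fail on every attempt, and each attempt costs another LLM call.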

Tips for Writing Effective Instructions

Be specific about the target element

Good: “click the blue ‘Sign Up’ button at the bottom of the form”
Bad: “click the button”

Include context when elements look similar

Good: “fill the email input in the login form with user@example.com”
Bad: “fill email”

Use visible text for identification

Good: “click the link that says ‘Learn More’”
Bad: “click the third link”

Specify the action clearly

Good: “type ‘search query’ into the search box”
Bad: “search for something”
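
When instructions are generated in code rather than written by hand, the tips above can be enforced by building each instruction from explicit parts. This is an illustrative sketch only: `InstructionParts` and `buildInstruction` are hypothetical, and the resulting string is ordinary input to `page.perform()`.

```typescript
// Hypothetical helper types (assumptions) for composing specific instructions.
interface InstructionParts {
  action: string; // the verb, e.g. "click" or "fill"
  target: string; // the element, ideally described by its visible text
  context?: string; // where the element is, e.g. "in the login form"
  value?: string; // the text to enter, for fill or type actions
}

// Joins the parts into a single instruction string:
// action, target, optional context, then "with <value>" when a value is given.
function buildInstruction(parts: InstructionParts): string {
  const pieces = [parts.action, parts.target];
  if (parts.context) {
    pieces.push(parts.context);
  }
  if (parts.value !== undefined) {
    pieces.push(`with ${parts.value}`);
  }
  return pieces.join(" ");
}

const instruction = buildInstruction({
  action: "fill",
  target: "the email input",
  context: "in the login form",
  value: "user@example.com",
});
// instruction === "fill the email input in the login form with user@example.com"
```

Making `action` and `target` required means a vague instruction such as “click the button” at least has to be written deliberately.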

CDP Actions

HyperAgent uses Chrome DevTools Protocol (CDP) for precise element interactions by default. This provides:

Exact coordinate-based clicks
Deep iframe support
Auto-filtering of ad frames

To disable CDP and use Playwright locators instead:

const agent = new HyperAgent({
  cdpActions: false,
});

Next Steps

page.ai()

Complex multi-step automation

page.extract()

Extract structured data

Action Caching

Record and replay automations
