OpenAI’s Computer-Using Agent (CUA) is the AI model powering Operator—OpenAI’s agent that can navigate websites, fill forms, and complete multi-step workflows in a browser. CUA interacts with web interfaces like a human, clicking buttons, typing text, and handling complex tasks without needing specialized APIs. With Hyperbrowser, you can leverage CUA to automate browser tasks with a simple API call. Simply provide a task description, and CUA handles the rest—navigating pages, extracting information, and completing your objective. You can view your CUA tasks in the dashboard.

How It Works

You can use CUA in two ways:
  1. Start and Wait: SDKs provide a startAndWait() method that blocks until the task completes and returns the result
  2. Async Pattern: Start a task, get a job ID, then poll for status and results—useful for long-running tasks or when you want more control

Installation

npm install @hyperbrowser/sdk dotenv

Quick Start

The simplest way to run a CUA task is with the startAndWait() method, which handles everything for you:
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

async function main() {
  const result = await client.agents.cua.startAndWait({
    task: "Go to Hacker News and tell me the title of the top post",
    maxSteps: 20,
  });

  console.log(`Output:\n${result.data?.finalResult}`);
}

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});

Async Pattern

When you need more control, use the async pattern to start a task and poll for results:
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

async function main() {
  try {
    // Start the task
    const task = await client.agents.cua.start({
      task: "What is the title of the first post on Hacker News today?",
      maxSteps: 20,
    });

    console.log(`Task started: ${task.jobId}`);
    console.log(`Watch live: ${task.liveUrl}`);

    // Poll for completion
    let result;
    while (true) {
      result = await client.agents.cua.getStatus(task.jobId);
      console.log(`Status: ${result.status}`);

      if (result.status === "completed" || result.status === "failed") {
        break;
      }

      await new Promise((resolve) => setTimeout(resolve, 5000)); // Wait 5s
    }

    const fullResult = await client.agents.cua.get(task.jobId);

    if (fullResult.status === "completed") {
      console.log("Result:", fullResult.data?.finalResult);
      console.log("Steps taken:", fullResult.data?.steps?.length);
    } else {
      console.error("Task failed:", fullResult.error);
    }
  } catch (err) {
    console.error(`Error: ${err.message}`);
  }
}

main();
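The loop above keeps polling if the job never reaches a terminal status. A minimal sketch of a bounded version follows. `pollUntilDone` and its options are hypothetical helpers, not part of the SDK, and the status function is passed in as an argument so the sketch runs without network access. Only the `completed` and `failed` statuses are taken from the example above.

```typescript
// Hypothetical helper (not part of @hyperbrowser/sdk): polls a status
// function until a terminal status appears or the time budget runs out.
interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number; // overall time budget for the job
  terminal?: readonly string[]; // statuses that end the loop
}

async function pollUntilDone(
  getStatus: () => Promise<string>,
  { intervalMs, timeoutMs, terminal = ["completed", "failed"] }: PollOptions,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (terminal.includes(status)) return status;
    // Give up if the next check would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms (last status: ${status})`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the client from the example, the status function could be `async () => (await client.agents.cua.getStatus(task.jobId)).status`. If the helper throws on timeout, calling `client.agents.cua.stop(task.jobId)` ends the job instead of leaving it running.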

Stop a Running Task

Stop a task before it completes:
await client.agents.cua.stop("job-id");

Parameters

task
string
required
Natural language description of what you want CUA to accomplish. Be specific for best results.
maxSteps
number
default:"20"
Maximum number of actions CUA can take (clicks, typing, navigation, etc.). Increase for complex tasks.
maxFailures
number
default:"3"
Maximum consecutive failures before the task is aborted.
sessionId
string
ID of an existing browser session to reuse. Useful for multi-step workflows that need to maintain the same browser session.
keepBrowserOpen
boolean
default:"false"
Keep the browser session alive after task completion.
useComputerAction
boolean
default:"false"
Let the agent execute actions on the computer itself, not just within the page. This also lets the agent see the entire screen instead of only the page contents.
sessionOptions
object
Session configuration (proxy, stealth, captcha solving, etc.). Only applies when creating a new session. If you provide an existing sessionId, these options are ignored.
useCustomApiKeys
boolean
default:"false"
Use your own OpenAI API key instead of consuming Hyperbrowser credits for LLM calls. You will only be charged for browser usage.
apiKeys
object
Your OpenAI API key, passed under the openai field. Required when useCustomApiKeys is true.
{
  openai: "..."
}
The agent may not complete the task within the specified maxSteps. If that happens, try increasing the maxSteps parameter.
Additionally, the browser session used by the AI Agent will time out based on your team’s default Session Timeout settings or the session’s timeoutMinutes parameter if provided. You can adjust the default Session Timeout in the Settings page.
useComputerAction can often be better at completing tasks but may require more steps. It is especially useful when the agent needs to interact with elements that are not accessible by or visible to Playwright. Playwright can only interact with the page via CDP, whereas computer actions let the agent see the entire screen and work directly with computer primitives (direct clicks, typing, scrolling, etc.).
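Two of the rules above are easy to miss: apiKeys is required when useCustomApiKeys is true, and sessionOptions is ignored when a sessionId is provided. The sketch below checks them before a task is submitted. The `CuaTaskParams` interface and `checkCuaParams` function are illustrative names that mirror the documented parameters; the types exported by the SDK may differ.

```typescript
// Illustrative shape mirroring the parameters documented above.
// This is an assumption for the example, not the SDK's own type.
interface CuaTaskParams {
  task: string;
  maxSteps?: number;
  maxFailures?: number;
  sessionId?: string;
  keepBrowserOpen?: boolean;
  useComputerAction?: boolean;
  sessionOptions?: Record<string, unknown>;
  useCustomApiKeys?: boolean;
  apiKeys?: { openai?: string };
}

// Returns a list of problems; an empty list means the params look consistent.
function checkCuaParams(p: CuaTaskParams): string[] {
  const problems: string[] = [];
  if (p.task.trim() === "") {
    problems.push("task is required and must not be empty");
  }
  if (p.useCustomApiKeys && !p.apiKeys?.openai) {
    problems.push("apiKeys.openai is required when useCustomApiKeys is true");
  }
  if (p.sessionId && p.sessionOptions) {
    problems.push("sessionOptions is ignored when sessionId is provided");
  }
  return problems;
}
```

Running the check before `startAndWait()` and logging any returned messages surfaces a misconfiguration before the task starts.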

Reuse Browser Sessions

Pass an existing sessionId to run the CUA task on that session. To keep the session open after the task finishes, also set the keepBrowserOpen parameter.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const session = await client.sessions.create();

  try {
    const result = await client.agents.cua.startAndWait({
      task: "What is the title of the first post on Hacker News today?",
      sessionId: session.id,
      keepBrowserOpen: true,
    });

    console.log(`Output:\n${result.data?.finalResult}`);

    const result2 = await client.agents.cua.startAndWait({
      task: "Tell me how many upvotes the first post has.",
      sessionId: session.id,
    });

    console.log(`\nOutput:\n${result2.data?.finalResult}`);
  } catch (err) {
    console.error(`Error: ${err}`);
  } finally {
    await client.sessions.stop(session.id);
  }
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
Always set keepBrowserOpen: true on any task whose session you want to reuse afterward. Otherwise, the session is closed automatically when the task completes.
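When several tasks share one session, the flag is easy to forget on an intermediate task. A small sketch of one way to avoid that: build the parameter objects up front so every task except the last keeps the browser open. `planSessionTasks` and `SessionTask` are hypothetical names, not SDK exports.

```typescript
interface SessionTask {
  task: string;
  sessionId: string;
  keepBrowserOpen: boolean;
}

// Builds one parameter object per task. Every task except the last keeps the
// browser open so the next task can reuse the session. Pass
// keepOpenAfterLast = true to leave the session open after the final task too.
function planSessionTasks(
  sessionId: string,
  tasks: string[],
  keepOpenAfterLast = false,
): SessionTask[] {
  return tasks.map((task, i) => ({
    task,
    sessionId,
    keepBrowserOpen: i < tasks.length - 1 || keepOpenAfterLast,
  }));
}
```

Each returned object can then be passed to `client.agents.cua.startAndWait()` in order, awaiting one call before starting the next.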

Using Your Own API Keys

Bring your own OpenAI API key to avoid consuming Hyperbrowser credits for LLM calls. You’ll still be charged for browser session usage, but save on token costs.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await client.agents.cua.startAndWait({
    task: "What is the title of the first post on Hacker News today?",
    useCustomApiKeys: true,
    apiKeys: {
      openai: "<OPENAI_API_KEY>",
    },
  });

  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
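The example above uses a placeholder string for the key. In practice the key is usually read from the environment rather than written into source code. The sketch below builds the two key-related parameters from an environment-like object; `customKeyParams` is a hypothetical helper, and the `OPENAI_API_KEY` variable name is an assumption rather than something the SDK requires.

```typescript
// Hypothetical helper: builds the key-related task parameters from an
// environment-like object so the key never appears in source code.
function customKeyParams(
  env: Record<string, string | undefined>,
  varName = "OPENAI_API_KEY", // assumed variable name
) {
  const key = env[varName]?.trim();
  if (!key) {
    throw new Error(`${varName} is not set`);
  }
  return { useCustomApiKeys: true, apiKeys: { openai: key } };
}
```

The result can be spread into the task parameters, for example `{ task: "...", ...customKeyParams(process.env) }`, once `config()` has loaded the `.env` file.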

Session Configuration

Customize the browser session used by CUA with session options.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await client.agents.cua.startAndWait({
    task: "What is the title of the first post on Hacker News today?",
    sessionOptions: {
      acceptCookies: true,
    }
  });

  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
sessionOptions only applies when creating a new session. If you provide a sessionId, these options are ignored.
Proxies and CAPTCHA solving add latency to page navigation. Only enable them when necessary for your use case.

Best Practices

Be explicit about what you want CUA to do. Instead of “check the website”, say “go to example.com, find the pricing page, and extract the cost of the Enterprise plan”.
Simple tasks typically need 10 to 20 steps. Complex multi-page workflows might need 50 or more. Monitor failed tasks and raise maxSteps when the agent runs out of steps.
It is usually better to split up complex tasks into smaller, more manageable ones and execute them as separate agent calls on the same session.
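One way to apply the step guidance above is to start with a small budget and retry with a larger one only if the task does not complete. The sketch below is an assumption-laden illustration: `runWithStepBudgets` is a hypothetical helper, the task runner is passed in as a function, and the default budgets of 20 and 50 simply echo the figures above. A retry repeats the task from the beginning, so this fits read-only tasks better than tasks with side effects, and a failure may have causes that more steps will not fix.

```typescript
interface RunResult {
  status: string;
  finalResult?: string | null;
}

// Tries each step budget in order and stops at the first run that completes.
// Returns the last attempt together with the budget it used.
async function runWithStepBudgets(
  run: (maxSteps: number) => Promise<RunResult>,
  budgets: readonly number[] = [20, 50],
): Promise<{ maxSteps: number; result: RunResult }> {
  if (budgets.length === 0) {
    throw new Error("budgets must not be empty");
  }
  let attempt = { maxSteps: budgets[0], result: await run(budgets[0]) };
  for (const maxSteps of budgets.slice(1)) {
    if (attempt.result.status === "completed") break;
    attempt = { maxSteps, result: await run(maxSteps) };
  }
  return attempt;
}
```

With the SDK, the runner could wrap `client.agents.cua.startAndWait({ task, maxSteps })` and return its status and final result.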