# Computer-Use Agents SOTA Challenge

Congrats on joining the Cua + HUD hackathon at Hack The North 2025!

This notebook will show you how to create a computer use agent with Cua and evaluate it using HUD.

## 💻 Prerequisites

The easiest way to get started is to set up the Cua development repository: clone it and install the project dependencies.

Install [Docker](https://www.docker.com/products/docker-desktop/) and [pdm](https://pdm-project.org/en/latest/#recommended-installation-method).

Clone the Cua repository:

`git clone https://github.com/trycua/cua`

Install the project dependencies:

`cd cua && pdm install`

Now, you should be able to run the `notebooks/hud_hackathon.ipynb` notebook in VS Code with the `.venv` virtual environment selected.

## ☁️ Connect to cloud services

Create a free HUD account and load your API keys.

1. Create a HUD account at https://www.hud.so/
2. Create a .env file:

In [None]:
# Create a .env file if it doesn't exist

ENV_TEMPLATE = """# Required environment variables:
HUD_API_KEY=

# Any LLM provider will work:
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
"""

import os
if not os.path.exists(".env"):
    # Use a context manager so the file is flushed and closed right away
    with open(".env", "w") as f:
        f.write(ENV_TEMPLATE)
    print("A .env file was created! Fill in the empty values.")

3. Fill in all missing values in the .env file

In [None]:
# Read the .env file
# HUD requires the .env file to be in the same directory as this notebook

from dotenv import load_dotenv
load_dotenv(dotenv_path='.env', override=True)

assert os.getenv("HUD_API_KEY"), "HUD_API_KEY is not set; fill it in the .env file"
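
The check above only covers `HUD_API_KEY`. A minimal sketch of a slightly broader check follows. It assumes that at least one of the two LLM provider keys from the template should be set; `missing_env_keys` is a hypothetical helper, not part of Cua or HUD.

```python
import os


def missing_env_keys(required, any_of=(), env=None):
    """Return a list of human-readable problems; an empty list means OK.

    `required` names must all be set. At least one name in `any_of` must be
    set. `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in required if not env.get(name)]
    if any_of and not any(env.get(name) for name in any_of):
        problems.append("set at least one of: " + ", ".join(any_of))
    return problems


# Assumption: the agent needs HUD plus one of the providers listed in the template.
for problem in missing_env_keys(
    required=["HUD_API_KEY"],
    any_of=["ANTHROPIC_API_KEY", "OPENAI_API_KEY"],
):
    print(f"Warning: {problem}")
```

Printing warnings instead of asserting lets you see every missing value at once before you edit the .env file.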

## 🤖 Create a computer use agent

Create a computer use agent using the Cua SDK.

In [None]:
import logging
from pathlib import Path
from agent import ComputerAgent

# Here you can set the model and tools for your agent.
# Computer use models: https://www.trycua.com/docs/agent-sdk/supported-agents/computer-use-agents
# Composed agent models: https://www.trycua.com/docs/agent-sdk/supported-agents/composed-agents
# Custom tools: https://www.trycua.com/docs/agent-sdk/custom-tools
agent_config = {
    "model": "openai/computer-use-preview",
    "trajectory_dir": str(Path("trajectories")),
    "only_n_most_recent_images": 3,
    "verbosity": logging.INFO
}

## 🖱️ Test your agent

Run your agent on a test scenario in a Docker container.

Make sure Docker is running to launch the computer.

You can view the live VNC stream from the Docker container at `http://localhost:8006/`

In [None]:
from computer import Computer, VMProviderType
import webbrowser

# Launch a Linux computer in a local Docker container
computer = Computer(
    os_type="linux",
    provider_type=VMProviderType.DOCKER,
    verbosity=logging.INFO
)
await computer.run()

agent_config["tools"] = [ computer ]

webbrowser.open("http://localhost:8006/", new=0, autoraise=True)

Try running the computer use agent on a simple task.

Trajectories are saved in the format: `trajectories/YYYY-MM-DD_computer-use-pre_XXX`.

In [None]:
# Create agent
agent = ComputerAgent(**agent_config)

tasks = [
    "Open the web browser and search for a repository named trycua/cua on GitHub."
]

for i, task in enumerate(tasks, start=1):
    print(f"\nExecuting task {i}/{len(tasks)}: {task}")
    # Stream and print each intermediate result as the agent works
    async for result in agent.run(task):
        print(result)

    print(f"\n✅ Task {i}/{len(tasks)} completed: {task}")
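
After a run, it can help to see which trajectory folders were written most recently. The sketch below uses only the standard library and assumes trajectories are stored as subdirectories of `trajectories/`, as configured in `agent_config`; `latest_trajectories` is a hypothetical helper.

```python
from pathlib import Path


def latest_trajectories(root="trajectories", limit=5):
    """Return up to `limit` trajectory directories under `root`, newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    runs = [p for p in root.iterdir() if p.is_dir()]
    # Sort by last-modified time so the most recent run comes first
    runs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return runs[:limit]


for run in latest_trajectories():
    n_files = sum(1 for f in run.rglob("*") if f.is_file())
    print(f"{run.name}: {n_files} files")
```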

## 🧐 Benchmark your agent

Test your agent's performance on a selection of tasks from the OSWorld benchmark.

In [None]:
import uuid
from pprint import pprint
from agent.integrations.hud import run_full_dataset

job_name = f"osworld-test-{str(uuid.uuid4())[:4]}"

# Full dataset evaluation (runs via HUD's run_dataset under the hood)
# See the documentation here: https://docs.trycua.com/docs/agent-sdk/integrations/hud#running-a-full-dataset
results = await run_full_dataset(
    dataset="ddupont/OSWorld-Tiny-Public",
    job_name=job_name,
    **agent_config,
    max_concurrent=20,
    max_steps=50,
    #split="train[:5]"
)

# results is a list from hud.datasets.run_dataset; inspect/aggregate as needed
print(f"Job: {job_name}")
print(f"Total results: {len(results)}")
pprint(results[:3])
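
To turn the raw list into a single score, you can average the per-task rewards. The sketch below assumes each result exposes a numeric `reward`, either as an attribute or as a dict key; that shape is an assumption, so check it against the `pprint` output above first. `summarize_rewards` is a hypothetical helper.

```python
def summarize_rewards(results):
    """Aggregate per-task rewards into a small summary dict.

    Assumption: each result carries a numeric `reward` (attribute or dict
    key). Results without one are counted under "missing".
    """
    rewards = []
    missing = 0
    for r in results:
        reward = r.get("reward") if isinstance(r, dict) else getattr(r, "reward", None)
        if reward is None:
            missing += 1
        else:
            rewards.append(float(reward))
    mean = sum(rewards) / len(rewards) if rewards else 0.0
    return {"scored": len(rewards), "missing": missing, "mean_reward": mean}


# Stand-in data for illustration only; these are not real benchmark results.
sample = [{"reward": 1.0}, {"reward": 0.0}, {}]
print(summarize_rewards(sample))
```

In the notebook, call `summarize_rewards(results)` on the list returned by `run_full_dataset`.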

## 🦾 Improve your agent

To improve your agent for OSWorld-Verified, experiment with different models and add custom tools that fit your use case. You can also dive into the ComputerAgent source code to design an improved version or subclass tailored to your needs.

Learn more about [Customizing Your ComputerAgent](https://docs.trycua.com/docs/agent-sdk/customizing-computeragent) in the docs.
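
As a starting point for a custom tool, here is a minimal sketch of a plain Python function with type hints and a docstring. It assumes that the Cua SDK accepts such functions in the `tools` list alongside the computer; confirm this against the custom tools documentation linked earlier. `current_utc_time` is a hypothetical example tool.

```python
from datetime import datetime, timezone


def current_utc_time() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


print(current_utc_time())

# Assumed wiring (not verified here): pass the function next to the computer.
# agent_config["tools"] = [computer, current_utc_time]
```

Note that a function like this runs in the notebook's Python process, not inside the Docker container.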