# ComputerAgent HUD Integration for OSWorld

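This notebook demonstrates how to use the ComputerAgent with HUD for OSWorld benchmarking. It first runs a single task step by step (the recorded outputs use a SheetBench task), then launches a parallel job over OSWorld-Verified.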
The ComputerAgent integration provides the same interface as OperatorAgent but works with both Claude and OpenAI models.

In [None]:
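# Install dependencies if needed.
# Note: each `!` command runs in its own subshell, so `source` does not carry over
# to the kernel. Prefer running these in a terminal, then select .venv as the kernel.
# !uv venv
# !source .venv/bin/activate
# !uv sync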

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Required environment variables:
# - HUD_API_KEY (for HUD access)
# - ANTHROPIC_API_KEY (for Claude models)
# - OPENAI_API_KEY (for OpenAI models)

from hud import gym, load_taskset
from pprint import pprint
import asyncio
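
Before running the cells below, it can help to confirm that the keys are actually set. The following is a minimal sketch and not part of HUD or ComputerAgent: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and only the variable names themselves come from the comment above.

```python
import os

# Variable names taken from the comment above; trim the list to the
# provider you actually plan to use.
REQUIRED_VARS = ["HUD_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (os.environ by default)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Since the Anthropic and OpenAI keys are each needed only for their own models, passing a shorter list (for example `["HUD_API_KEY", "ANTHROPIC_API_KEY"]`) avoids a false alarm when only Claude is used.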

In [3]:
# Import the HUD-integrated ComputerAgent
from agent.integrations.hud import ComputerAgent



In [5]:
# Load OSWorld taskset
taskset = await load_taskset("OSWorld-Verified")
print(f"Total tasks in OSWorld: {len(taskset)}")

# Select a test task
test = taskset[148]
print(f"Task prompt: {test.prompt}")

Total tasks in OSWorld: 367
Task prompt: Can you make my computer bring back the last tab I shut down?


In [4]:
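# Alternatively, load the SheetBench taskset. This rebinds `taskset` and `test`,
# so the cells below run against this task (as in the recorded outputs).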
taskset = await load_taskset("SheetBench-V2")
print(f"Total tasks in SheetBench: {len(taskset)}")

# Select a test task
test = taskset[0]
print(f"Task prompt: {test.prompt}")

Total tasks in SheetBench: 50
Task prompt: Given the Input data, determine the ticker with the greatest correlation between volume and next day price change.
- in ANSWER tab put the Ticker in A1 and the correlation in B1
  - use CORREL to determine correlation
- be sure to first sort the date by ticker z to a and then date ascending before calculating nextdaypricechange %
Correlation should be rounded to 2 decimal points


In [5]:
# Create environment (takes ~2.5 minutes to start)
env = await gym.make(test)
print("Environment ready!")

[INFO] 2025-08-08 15:16:46,133 | hud.environment | View the live trace at https://app.hud.so/trace/662fd59f-5a8d-4205-9b88-32c00d0feab0


Environment ready!


In [6]:
await env.stream()  # returns an HTML snippet embedding the live VNC view

'\n    <div style="width: 960px; height: 540px; overflow: hidden;">\n        <div style="transform: scale(0.5); transform-origin: top left;">\n            <iframe src="https://live.anchorbrowser.io?sessionId=7486a5f7-d7eb-458e-b1b1-a11852e0e217" width="1920" height="1080" style="border: 1px solid #ddd;">\n            </iframe>\n        </div>\n    </div>\n    '
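
The output above is the raw HTML string rather than a rendered viewer. In a notebook, wrapping the string in `IPython.display.HTML` should render the iframe. If only the session link is needed, the sketch below pulls the iframe `src` out of the snippet with the standard library. `extract_stream_url` is a hypothetical helper, and it assumes `env.stream()` returns a string shaped like the one shown above; the sample URL is a placeholder.

```python
import re


def extract_stream_url(html):
    """Return the src of the first <iframe> in an HTML snippet, or None if absent."""
    match = re.search(r'<iframe\b[^>]*\bsrc="([^"]+)"', html)
    return match.group(1) if match else None


# Placeholder snippet with the same shape as the recorded output.
sample = '<div><iframe src="https://example.com/live?sessionId=abc" width="1920"></iframe></div>'
print(extract_stream_url(sample))
```

With a live environment this would be called as `extract_stream_url(await env.stream())`.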

## Test with Claude Model

The ComputerAgent can use Claude models just like the original ClaudeAgent:

In [7]:
import logging
# Create ComputerAgent with Claude
claude_agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    environment="linux",  # OSWorld typically uses Linux
    verbosity=logging.INFO,
)

print(f"Created Claude agent: {claude_agent.name}")

Created Claude agent: computeragent-claude-3-5-sonnet-20241022


In [8]:
# Initial observation
obs, _ = await env.reset()
print("Initial observation complete")

# Agent loop with Claude
for i in range(8):
    print(f"========= Step {i + 1} ==========")
    
    try:
        action, done = await claude_agent.predict(obs)
        print(f"Agent's action: {action}")

        obs, reward, terminated, info = await env.step(action)

        if done or terminated:
            print(f"Task completed after {i + 1} steps")
            break
            
    except Exception as e:
        print(f"Error in step {i + 1}: {e}")
        break

Initial observation complete


2025-08-08 15:17:04,030 - agent.ComputerAgent - INFO - LLM processing started with 1 messages


Agent's action: [ResponseAction(type='response', reasoning='I\'ll help you complete this task step by step, but I notice that I don\'t have any input data or access to Excel through the available functions. The only function I have access to is the "computer" function which allows for basic desktop interaction.\n\nTo properly assist you, I would need:\n1. The actual input data you want to analyze\n2. Access to Excel or another spreadsheet tool to perform the calculations\n\nCould you please provide the input data and confirm if there\'s a specific way to access Excel or the data file on this system?\n\nOnce provided, I can help calculate correlations between volume and next day price changes, sort the data as specified, and format the results according to your requirements.', logs={'conversation_length': 2}, text='I\'ll help you complete this task step by step, but I notice that I don\'t have any input data or access to Excel through the available functions. The only function I have ac
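
The loop above can be wrapped in a small reusable coroutine. This is a sketch under the assumption that the agent and environment expose the same `predict`, `reset`, and `step` signatures used in the cell above; `run_episode` is a hypothetical name, not part of the HUD or ComputerAgent API. Unlike the cell above, it lets exceptions propagate to the caller instead of printing them.

```python
async def run_episode(agent, env, max_steps=8):
    """Drive the predict/step loop and return (steps_taken, finished).

    finished is True when the agent reports done or the environment
    reports terminated before max_steps is exhausted.
    """
    obs, _ = await env.reset()
    for step in range(1, max_steps + 1):
        action, done = await agent.predict(obs)
        obs, _reward, terminated, _info = await env.step(action)
        if done or terminated:
            return step, True
    return max_steps, False
```

In the notebook this would be invoked as `steps, finished = await run_episode(claude_agent, env)`.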

## Evaluate Results

In [None]:
# Evaluate environment state
result = await env.evaluate()
print("=== Final Evaluation ===")
pprint(result)

In [9]:
# Clean up
await env.close()
print("Environment closed")

Environment closed
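
If a cell raises before the clean-up cell runs, the remote environment stays open. One way to guard against that is an async context manager that always calls `close()`. This is a minimal sketch: `managed_env` is a hypothetical helper, and it assumes only that the environment has an awaitable `close()` as used above.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def managed_env(make_env):
    """Create an environment via the zero-argument callable make_env and always close it."""
    env = await make_env()
    try:
        yield env
    finally:
        # Runs on normal exit and when the body raises.
        await env.close()
```

Usage in this notebook would look like `async with managed_env(lambda: gym.make(test)) as env: ...`, with the reset, agent loop, and evaluation inside the block.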


## Run OSWorld-Verified in parallel

In [None]:
from agent.integrations.hud import run_job
from hud import load_taskset
import logging

# Load taskset
taskset = await load_taskset("OSWorld-Verified")
taskset = taskset[:10]  # limit to 10 tasks instead of the full set (367 when loaded above)

# Run benchmark job
job = await run_job(
    model="openai/computer-use-preview",
    task_or_taskset=taskset,
    job_name="test-computeragent-job",
    max_concurrent_tasks=5,
    # add any extra ComputerAgent kwargs:
    verbosity=logging.INFO,  # Enable logging
    # trajectory_dir=".."       # Save trajectories locally
)

# Get results OR view them at app.hud.so
print(await job.get_analytics())
print(f"View results at: https://app.hud.so/jobs/{job.id}")
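
For intuition about what a cap like `max_concurrent_tasks=5` means, the sketch below bounds concurrency with `asyncio.Semaphore`. It illustrates the general technique only and is not `run_job`'s actual implementation, which is not shown in this notebook; `run_with_limit` is a hypothetical name.

```python
import asyncio


async def run_with_limit(coro_fns, max_concurrent=5):
    """Run zero-argument coroutine functions with at most max_concurrent in flight.

    Results are returned in the same order as coro_fns.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(fn):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await fn()

    return await asyncio.gather(*(guarded(fn) for fn in coro_fns))
```

Each element of `coro_fns` is a callable such as `lambda: some_coroutine(task)`, so the coroutine is created only once a slot is available.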