# ComputerAgent HUD Integration for OSWorld

This notebook shows how to run ComputerAgent through HUD on the OSWorld benchmark.
The HUD integration of ComputerAgent exposes the same interface as HUD's OperatorAgent, and it works with both Claude and OpenAI models.

In [None]:
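# Install dependencies if needed.
# Note: each `!` command runs in its own subshell, so `source .venv/bin/activate`
# would not carry over to later commands; select the .venv interpreter as the
# notebook kernel instead.
# !uv venv
# !uv sync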

In [1]:
# Required environment variables:
# - HUD_API_KEY (for HUD access)
# - ANTHROPIC_API_KEY (for Claude models)
# - OPENAI_API_KEY (for OpenAI models)

from hud import gym, load_taskset
from pprint import pprint
import asyncio
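
The required keys listed above can be checked before any HUD call is made. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper written for this notebook, not part of the HUD SDK, and it treats empty or whitespace-only values as missing.

```python
import os
from typing import Iterable, Mapping


def missing_env_vars(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or blank in `environ`."""
    return [name for name in names if not environ.get(name, "").strip()]


# HUD_API_KEY is needed for HUD access; the model keys depend on which agent runs.
missing = missing_env_vars(["HUD_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Failing early here is cheaper than discovering a missing key after the environment has spent minutes starting up.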

In [2]:
# Import the HUD-integrated ComputerAgent
from agent.integrations.hud import ComputerAgent


In [3]:
# Load OSWorld taskset
taskset = await load_taskset("OSWorld-Verified")
print(f"Total tasks in OSWorld: {len(taskset)}")

# Select a test task
test = taskset[144]
print(f"Task prompt: {test.prompt}")

Total tasks in OSWorld: 367
Task prompt: Make the background color of slide 2 same as the color of its title.
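
A hard-coded index such as `144` is easy to get wrong if the taskset order changes. As a sketch, tasks can instead be located by a keyword in their prompt. It relies only on what the cell above already uses (`len(...)`, integer indexing, and a `.prompt` attribute); `find_task_indices` and the demo data are hypothetical stand-ins so the block runs without HUD access.

```python
from types import SimpleNamespace


def find_task_indices(tasks, keyword: str) -> list[int]:
    """Return indices of tasks whose prompt contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [i for i in range(len(tasks)) if needle in tasks[i].prompt.lower()]


# Stand-in tasks for illustration only; the second prompt is made up.
demo_tasks = [
    SimpleNamespace(
        prompt="Make the background color of slide 2 same as the color of its title."
    ),
    SimpleNamespace(prompt="Rename the file to report.txt."),
]
print(find_task_indices(demo_tasks, "Slide"))  # prints [0]
```

With the real taskset, the call would presumably be `find_task_indices(taskset, "slide")`.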


In [4]:
# Create environment (takes ~2.5 minutes to start)
env = await gym.make(test)
print("Environment ready!")

[ERROR] 2025-08-08 12:42:12,634 | hud.exceptions | HTTP error from HUD SDK: Request failed: Environment is in error state, cannot invoke functions | URL: https://orchestration.hud.so/hud-gym/api/v2/environments/525ea26c-096d-41bc-b968-54c62a7f1b9d/invoke | Status: 400 | Response: {"detail":"Environment is in error state, cannot invoke functions"}


GymMakeException: Failed to create environment | Data: {'gym_name': 'OSWorld-Ubuntu', 'environment_prompt': None, 'exception': 'Request failed: Environment is in error state, cannot invoke functions | Status: 400 | Response Text: {"detail":"Environment is in error state, cannot invoke functions"} | Response JSON: {\'detail\': \'Environment is in error state, cannot invoke functions\'} | Headers: {\'content-length\': \'67\', \'content-type\': \'application/json\', \'date\': \'Fri, 08 Aug 2025 16:42:11 GMT\', \'server\': \'railway-edge\', \'x-railway-edge\': \'railway/us-east4\', \'x-railway-request-id\': \'cH9FJpMKQIGTcIome6l53A\'}'}
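
The run above failed while the environment was being created. If such failures turn out to be transient (an assumption; the log does not say why the environment entered an error state), one option is to retry creation with a growing delay. The sketch below is generic: `retry_async` is a hypothetical helper, not a HUD API, and it assumes each new `gym.make` call provisions a fresh environment.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 5.0,
) -> T:
    """Call `make()` up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises


# In the notebook, where top-level await is available:
# env = await retry_async(lambda: gym.make(test))
```

Catching `Exception` broadly keeps the sketch short; in practice it may be better to retry only on the specific creation error.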

## Test with Claude Model

The ComputerAgent can use Claude models just like the original ClaudeAgent:

In [None]:
# Create ComputerAgent with Claude
claude_agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    environment="linux",  # OSWorld typically uses Linux
)

print(f"Created Claude agent: {claude_agent.name}")

In [None]:
# Initial observation
obs, _ = await env.reset()
print("Initial observation complete")

# Agent loop with Claude
for i in range(8):
    print(f"========= Step {i + 1} ==========")
    
    try:
        action, done = await claude_agent.predict(obs)
        print(f"Agent's action: {action}")

        obs, reward, terminated, info = await env.step(action)

        if done or terminated:
            print(f"Task completed after {i + 1} steps")
            break
            
    except Exception as e:
        print(f"Error in step {i + 1}: {e}")
        break
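
The loop above depends only on three calls: `env.reset()`, `agent.predict(obs)`, and `env.step(action)`. A minimal sketch of the same loop as a reusable function follows; `run_episode` is a hypothetical helper, and unlike the cell above it lets exceptions propagate instead of printing them.

```python
async def run_episode(agent, env, max_steps: int = 8):
    """Run one episode and return (steps_taken, finished).

    Assumes the call shapes used in this notebook:
      env.reset()        -> (obs, extra)
      agent.predict(obs) -> (action, done)
      env.step(action)   -> (obs, reward, terminated, info)
    """
    obs, _ = await env.reset()
    for step in range(1, max_steps + 1):
        action, done = await agent.predict(obs)
        obs, _reward, terminated, _info = await env.step(action)
        if done or terminated:
            return step, True
    # Step budget exhausted without the agent or environment signalling the end.
    return max_steps, False


# In the notebook:
# steps, finished = await run_episode(claude_agent, env)
```

Returning `finished` separately makes it possible to tell an episode that ended on its own from one that simply ran out of steps.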

## Test with OpenAI Model

The same ComputerAgent can also use OpenAI models:

In [None]:
# Create ComputerAgent with OpenAI
# (the environment is reset in the next cell, right before the agent loop)
openai_agent = ComputerAgent(
    model="openai/computer-use-preview",
    environment="linux",
)

print(f"Created OpenAI agent: {openai_agent.name}")

In [None]:
# Initial observation
obs, _ = await env.reset()
print("Initial observation complete")

# Agent loop with OpenAI
for i in range(8):
    print(f"========= Step {i + 1} ==========")
    
    try:
        action, done = await openai_agent.predict(obs)
        print(f"Agent's action: {action}")

        obs, reward, terminated, info = await env.step(action)

        if done or terminated:
            print(f"Task completed after {i + 1} steps")
            break
            
    except Exception as e:
        print(f"Error in step {i + 1}: {e}")
        break

## Evaluate Results

In [None]:
# Evaluate environment state
result = await env.evaluate()
print("=== Final Evaluation ===")
pprint(result)

In [None]:
# Clean up
await env.close()
print("Environment closed")
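
If a cell fails between creation and cleanup, `env.close()` never runs and the remote environment may stay allocated. One way to guard against that is an async context manager. This is a sketch: `closing_env` is a hypothetical helper, and it assumes only that the environment has the awaitable `close()` used above.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_env(env):
    """Yield `env` and always await `env.close()` afterwards, even on error."""
    try:
        yield env
    finally:
        await env.close()


# In the notebook (names taken from the cells above):
# async with closing_env(await gym.make(test)) as env:
#     ...run the agent loop...
#     result = await env.evaluate()
```

The standard library's `contextlib.aclosing` is not a drop-in replacement here, because it awaits `aclose()` rather than `close()`.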

## Comparison with Original Agents

The ComputerAgent provides the same interface as ClaudeAgent and OperatorAgent:

In [None]:
# Compare with original HUD agents
from hud.agent import ClaudeAgent, OperatorAgent

# Original agents
original_claude = ClaudeAgent()
original_operator = OperatorAgent(environment="linux")

# ComputerAgent versions
computer_claude = ComputerAgent(model="anthropic/claude-3-5-sonnet-20241022", environment="linux")
computer_openai = ComputerAgent(model="openai/computer-use-preview", environment="linux")

print("Original agents:")
print(f"  ClaudeAgent: {original_claude.name}")
print(f"  OperatorAgent: {original_operator.name}")
print("\nComputerAgent versions:")
print(f"  ComputerAgent (Claude): {computer_claude.name}")
print(f"  ComputerAgent (OpenAI): {computer_openai.name}")

print("\nAll agents have the same interface and can be used interchangeably!")
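
Printing names does not by itself show that the agents are interchangeable. As a sketch, the part of the interface this notebook actually uses (a `name` attribute and an async `predict`) can be written down as a `typing.Protocol` and checked at runtime. `AgentLike` and `looks_like_agent` are hypothetical names, and the real HUD agent classes may expose more than this.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class AgentLike(Protocol):
    """Hypothetical protocol: only the members the cells in this notebook touch."""

    name: str

    async def predict(self, observation: Any) -> Tuple[Any, bool]: ...


def looks_like_agent(agent: object) -> bool:
    """True if `agent` has the members named in AgentLike.

    A runtime protocol check only verifies that the attributes exist; it does
    not check signatures or whether `predict` is really a coroutine function.
    """
    return isinstance(agent, AgentLike)


# In the notebook:
# for agent in (original_claude, original_operator, computer_claude, computer_openai):
#     print(agent.name, looks_like_agent(agent))
```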