- 2026-05-17: We release the code of AgentHijack. Check it out!
- 2026-05-01: AgentHijack is accepted to ICML 2026!
This repository is built on OSWorld; refer to it for installation instructions. We recommend running experiments with VMware or Docker, as we have verified both setups.
If you wish to run the baseline agent used in our paper, you can execute the following command, using GPT-4o under pop_ups as an example:
python run.py --path_to_vm vmware_vm_data/Ubuntu0/Ubuntu0.vmx --headless --observation_type screenshot --model openai/chatgpt-4o-latest --noise_type pop_ups --result_dir ./results
The results, which include screenshots, actions, and summaries of the agent's task completion, will be saved in the ./results directory in this case. You can then run the following command to obtain the result:
python show_result.py
For convenience, we use OpenRouter to access the APIs of different LLMs. Write your api_key in mm_agents/agent.py, or change the client there to another interface.
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="", # put your api_key here
)
We provide the deployment code in /vllm_server; deploy the corresponding agent before running experiments. For the UI-TARS-7B-DPO and UI-TARS-1.5-7B models, we recommend using 1×A100 GPU. For UI-TARS-72B-DPO, 4×A100 GPUs are needed for inference.
nohup bash vllm_server/ui-tars-1.5-7b.sh > server.log &
After successful deployment, run the following command to obtain the result:
python run_uitars.py --path_to_vm vmware_vm_data/Ubuntu0/Ubuntu0.vmx --headless --observation_type screenshot --model ui-tars --noise_type pop_ups --result_dir ./results
You can also use run_multienv_uitars.py for parallel execution. Note that the "network_error" corruption currently cannot run in the Docker environment, so we recommend using VMware for network_error evaluation.
python run_multienv_uitars.py --path_to_vm "" --headless --observation_type screenshot --model ui-tars --noise_type pop_ups --num_envs 4 --result_dir ./results
Download the AgentHijack-Agent from Hugging Face, then deploy it to run the evaluation experiments.
nohup bash vllm_server/agenthijack-agent.sh > server.log &
python run_agenthijack_agent.py --path_to_vm vmware_vm_data/Ubuntu0/Ubuntu0.vmx --headless --observation_type screenshot --model ui-tars --noise_type pop_ups --result_dir ./results
To support flexible setups for different corruptions, we offer configurable parameters in the YAML file /vllm_server/default.yaml. Please refer to our paper for detailed explanations of these parameters.
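If you want to evaluate several corruptions in a row, the commands above can be scripted. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `run_all` are hypothetical, the per-corruption result subdirectory is an assumed layout, and only the two noise types named in this README are listed. It defaults to a dry run that just prints each command.

```python
import shlex
import subprocess
import sys

# Noise types mentioned in this README; extend the list as needed.
NOISE_TYPES = ["pop_ups", "network_error"]


def build_command(noise_type, result_root="./results",
                  vm_path="vmware_vm_data/Ubuntu0/Ubuntu0.vmx",
                  script="run_uitars.py", model="ui-tars"):
    """Assemble the argument list for one run, using the flags shown above."""
    return [
        sys.executable, script,
        "--path_to_vm", vm_path,
        "--headless",
        "--observation_type", "screenshot",
        "--model", model,
        "--noise_type", noise_type,
        # Assumption: one subdirectory per corruption keeps results apart.
        "--result_dir", f"{result_root}/{noise_type}",
    ]


def run_all(noise_types=NOISE_TYPES, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in sequence."""
    for noise_type in noise_types:
        cmd = build_command(noise_type)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` once the agent server is deployed. Since network_error is not supported in Docker, run that corruption with a VMware VM path.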
If you find this environment useful, please consider citing our work:
@inproceedings{sun2026agenthijack,
title = {AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions},
author = {Jingwei Sun and Jianing Zhu and Yuanyi Li and Tongliang Liu and Xia Hu and Bo Han},
booktitle = {Forty-third International Conference on Machine Learning},
year = {2026},
url = {https://openreview.net/forum?id=0H5Im3Xvuf}
}
Parts of the code are borrowed from OSWorld and PopupAttack. We express our great thanks to them for their wonderful work.
