Important
[2025-06-25] We released a technical preview of a CLI. Introducing Agent TARS Beta: a multimodal AI agent that explores a way of working closer to how humans complete tasks, through rich multimodal capabilities (such as GUI Agent and Vision) and seamless integration with various real-world tools.
UI-TARS Desktop is a GUI Agent application built on UI-TARS, a Vision-Language Model, that lets you control your computer using natural language.
Paper | Hugging Face Models | Discord | ModelScope
Desktop Application | Midscene (use in browser)
Instruction | Local Operator | Remote Operator |
---|---|---|
Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | computer-use-triple-speed.mp4 | remote-computer-operators.mp4 |
Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | browser-use-triple-speed.mp4 | remote-browser-operators.mp4 |
- [2025-06-12] - We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features, Remote Computer Operator and Remote Browser Operator, both completely free. No configuration is required: simply click to remotely control any computer or browser.
- [2025-04-17] - We're thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application improves the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
- [2025-02-20] - Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
- [2025-01-23] - We updated the Cloud Deployment section of the Chinese-language guide (GUI Model Deployment Tutorial) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
- Natural language control powered by Vision-Language Model
- Screenshot and visual recognition support
- Precise mouse and keyboard control
- Cross-platform support (Windows/macOS/Browser)
- Real-time feedback and status display
- Private and secure - fully local processing
- Effortless setup and intuitive remote operators
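The features above suggest a perceive-predict-act loop: capture a screenshot, ask the vision-language model for the next action, report status, then drive the mouse or keyboard. The following is a minimal conceptual sketch of such a loop, not the UI-TARS Desktop implementation or the `@ui-tars/sdk` API. Every name in it (`Action`, `run_agent`, and the `take_screenshot`, `predict`, `execute`, and `on_status` callbacks) is hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record: kind is "click", "type", or "finished"."""
    kind: str
    payload: dict = field(default_factory=dict)


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    predict: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    on_status: Callable[[str], None] = lambda message: None,
    max_steps: int = 10,
) -> List[Action]:
    """Run a screenshot -> model -> action loop until the model reports
    "finished" or max_steps is reached. Returns the predicted actions."""
    history: List[Action] = []
    for step in range(max_steps):
        screenshot = take_screenshot()                       # visual input
        action = predict(instruction, screenshot, history)   # model decides next step
        on_status(f"step {step + 1}: {action.kind}")         # real-time feedback
        history.append(action)
        if action.kind == "finished":
            break
        execute(action)                                      # mouse/keyboard control
    return history


if __name__ == "__main__":
    # Stand-in callbacks: a scripted "model" and an executor that only logs.
    scripted = iter([
        Action("click", {"x": 120, "y": 48}),
        Action("type", {"text": "500"}),
        Action("finished"),
    ])
    run_agent(
        "Set the VS Code AutoSave delay to 500 milliseconds",
        take_screenshot=lambda: b"",
        predict=lambda instruction, shot, hist: next(scripted),
        execute=lambda action: print("execute:", action),
        on_status=print,
    )
```

Passing the screenshot, model, and executor in as callbacks keeps the loop independent of any particular operator (local desktop, browser, or remote), and `max_steps` bounds the run if the model never reports completion.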
See Quick Start.
See Deployment.
See CONTRIBUTING.md.
See @ui-tars/sdk
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving a star and citation.
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}