A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

nanshan-bladesman/UI-TARS-desktop

This branch is 62 commits behind bytedance/UI-TARS-desktop:main.

Latest commit: fb2932a · Mar 23, 2025

Important

[2025-03-18] We released a technical preview of a new desktop app, Agent TARS, a multimodal AI agent that operates the browser by visually interpreting web pages and integrates with the command line and file system.

UI-TARS

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)

Showcases

Instruction | Video
Get the current weather in SF using the web browser | new_mac_action_weather.mp4
Send a tweet with the content "hello world" | new_send_twitter_windows.mp4

News

  • [2025-02-20] - πŸ“¦ Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - πŸš€ We updated the Cloud Deployment section in the δΈ­ζ–‡η‰ˆ: GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹ with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Features

  • 🤖 Natural language control powered by a Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure: fully local processing
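
The features above suggest a perceive-decide-act loop: take a screenshot, ask the model for the next action, then drive the mouse or keyboard. The TypeScript sketch below illustrates that loop under stated assumptions. Every name in it (`Action`, `Model`, `Operator`, `runAgent`, `scriptedModel`, `recordingOperator`) is hypothetical and is not taken from UI-TARS Desktop or `@ui-tars/sdk`. The model and the operator are stubs, so no real screen is captured or controlled.

```typescript
// Hypothetical action vocabulary; the real UI-TARS action format may differ.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

// Assumed interface: turns an instruction plus a screenshot into one action.
interface Model {
  nextAction(instruction: string, screenshot: string): Promise<Action>;
}

// Assumed interface: captures the screen and performs input actions.
interface Operator {
  screenshot(): Promise<string>;
  execute(action: Action): Promise<void>;
}

// Perceive-decide-act loop: stops when the model reports "done"
// or after maxSteps iterations, whichever comes first.
async function runAgent(
  instruction: string,
  model: Model,
  operator: Operator,
  maxSteps: number = 10,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await operator.screenshot();
    const action = await model.nextAction(instruction, shot);
    history.push(action);
    if (action.kind === "done") break;
    await operator.execute(action);
  }
  return history;
}

// Stub model that replays a fixed script, then reports "done".
function scriptedModel(script: Action[]): Model {
  let i = 0;
  return {
    async nextAction(): Promise<Action> {
      const next = script[i++];
      return next ?? { kind: "done" };
    },
  };
}

// Stub operator that records actions instead of touching the real screen.
function recordingOperator(): Operator & { log: Action[] } {
  const log: Action[] = [];
  return {
    log,
    async screenshot(): Promise<string> {
      return `frame-${log.length}`;
    },
    async execute(action: Action): Promise<void> {
      log.push(action);
    },
  };
}
```

In a real application the stub model would be replaced by a call to a deployed Vision-Language Model, and the stub operator by code that captures the screen and sends input events. The `maxSteps` bound is a design choice of this sketch, meant to keep a confused model from looping indefinitely.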

Quick Start

See Quick Start.

Deployment

See Deployment.

Contributing

See CONTRIBUTING.md.

SDK (Experimental)

See @ui-tars/sdk

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work 📝

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

Languages

  • TypeScript 94.0%
  • JavaScript 3.1%
  • Less 0.9%
  • SCSS 0.9%
  • HTML 0.6%
  • CSS 0.3%
  • Other 0.2%