
Midscene.js

Your AI Operator for Web, Android, Automation & Testing.


Midscene.js allows AI to serve as your web and Android operator 🤖. Simply describe what you want to achieve in natural language, and it will assist you in operating the interface, validating content, and extracting data. Whether you seek a quick experience or in-depth development, you'll find it easy to get started.
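
For example, with the Puppeteer integration you can hand the agent a plain-language instruction. A minimal sketch, assuming Midscene is installed, a model API key is configured in the environment, and the target site is reachable:

import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

// Launch a regular Puppeteer page first.
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.ebay.com');

// Wrap the page with a Midscene agent and describe the goal in natural language;
// the agent plans the steps and operates the UI for you.
const agent = new PuppeteerAgent(page);
await agent.aiAction('type "Headphones" in the search box and press Enter');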

Showcases

The following example videos were recorded with the UI-TARS-1.5-7B model, and they have not been sped up.

  • Post a Tweet: twitter-video-1080p.mp4
  • Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs: google-doc-1080p.mp4

📢 Feb 2025: New open-source model choices - UI-TARS and Qwen2.5-VL

Besides the default model GPT-4o, we have added two new recommended open-source models to Midscene.js: UI-TARS and Qwen2.5-VL. (Yes, open-source models!) They are dedicated models for image recognition and UI automation, and they are known for performing well in UI automation scenarios. Read more in Choose a model.

💡 Features

  • Natural Language Interaction 👆: Just describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • Chrome Extension Experience 🖥️: Start the in-browser experience immediately through the Chrome extension; no coding required.
  • Puppeteer/Playwright Integration 🔧: Supports Puppeteer and Playwright integration, letting you combine AI capabilities with these powerful automation tools.
  • Support Open-Source Models 🤖: Supports private deployment of UI-TARS and Qwen2.5-VL, which outperform closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
  • Support General Models 🌟: Supports general-purpose models like GPT-4o and Claude to fit a wide range of scenarios.
  • Visual Reports for Debugging 🎞️: Through our test reports and Playground, you can easily understand, replay, and debug the entire process.
  • Support Caching 🔄: The first time a task is executed through AI, it is cached; subsequent runs of the same task become significantly faster.
  • Completely Open Source 🔥: Experience a whole new approach to automation development. Enjoy!
  • Understand UI, JSON Format Responses 🔍: You can specify the data format you need and receive responses in JSON (see the sketch after this list).
  • Intuitive Assertions 🤔: Express assertions in natural language, and the AI will understand and verify them (also shown in the sketch below).
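
As referenced in the list above, here is a hedged sketch of JSON-formatted extraction and a natural-language assertion, reusing the agent from the earlier sketch; the query schema and wording are illustrative:

// Describe the shape of the data you want; the response comes back as matching JSON.
const items = await agent.aiQuery(
  '{title: string, price: number}[], the product items visible in the result list',
);
console.log(items);

// State the expectation in natural language; the AI checks it against the current page.
await agent.aiAssert('there is at least one pair of headphones in the search results');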

✨ Model Choices

  • You can use general-purpose LLMs like gpt-4o, which works well for most cases. gemini-1.5-pro and qwen-vl-max-latest are also supported.
  • You can also use the UI-TARS model, an open-source model dedicated to UI automation. Deploying it on your own server can dramatically improve performance and data privacy.
  • Read more in Choose a model (a configuration sketch follows this list).
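
A minimal configuration sketch for pointing Midscene at a self-hosted UI-TARS deployment; the endpoint, model name, and the MIDSCENE_* variable names below are assumptions and should be checked against the Choose a model guide:

// Assumption: variable names follow the "Choose a model" guide; verify before use.
process.env.OPENAI_BASE_URL = 'http://localhost:8000/v1'; // hypothetical self-hosted endpoint
process.env.OPENAI_API_KEY = 'your-api-key';
process.env.MIDSCENE_MODEL_NAME = 'ui-tars'; // illustrative deployed model name
process.env.MIDSCENE_USE_VLM_UI_TARS = '1';  // switch Midscene to the UI-TARS prompt format

// Set these before constructing the agent so they take effect.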

👀 Comparing to ...

There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

  • Debugging Experience: You will soon find that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo is, you still need to debug the process to make it stable over time. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to debug the entire process. This is what most developers really need, and we're continuing to improve the debugging experience.

  • Open Source, Free, Deploy as You Want: Midscene.js is an open-source project. It's decoupled from any cloud service and model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.

  • Integrate with JavaScript: You can always bet on JavaScript 😎

📄 Resources

๐Ÿค Community

๐Ÿ“ Credits

We would like to thank the following projects:

  • Rsbuild for the build tool.
  • UI-TARS for the open-source UI agent model.
  • Qwen2.5-VL for the open-source vision-language model.
  • scrcpy and yume-chan for allowing us to control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • YADB, a tool that improves the performance of text input.

Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Zhou, Xiao and Yu, Tao},
  title = {Midscene.js: Let AI be your browser operator.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

๐Ÿ“ License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a ⭐️