OpenOperator - Open-Source LLM Agent for Web Operations in Browser

OpenOperator is an LLM agent that uses web browser to complete tasks, originally based on BrowserUse. This fork is mainly focused on moving the agent backend to LangGraph and LangChain.

Key Features

Vision-Based Navigation: The agent fully relies on vision inputs to navigate web pages. Vision-first approach gives predictable results even with JS-heavy websites.
File Handling: Introduced file handling capabilities, currently supporting PDF files. With vision-driven reasoning, the agent uses universal workflow to access and read data with any layout.
Use with your favorite LLM: Supports any LLM that can handle tool calling and multimodal inputs.

Changes from BrowserUse

Refactored Agent Backbone: Transitioned from vanilla Python to LangGraph and LangChain.
Moved to Full-Page Viewport: Implements full-page screenshots to minimize the need for scrolling tools.
Added File Handling: Currently supports only PDF, but adding handlers is easy.
Adjustable Focus: Agent architecture lets you adjust prompts for task categories (i.e. "shopping", "searching", "applying", etc.). Current version is focused on finding answers on given websites, generalist mode is coming soon.

Why LangGraph and LangChain?

Some developers say LangChain introduces unnecessary abstractions that add complexity to the project. However, each LLM-driven project that is built in vanilla Python eventually brings the same abstractions. It all starts with "Well, I'll just add this tiny little object to handle messages," and then you end up with some hardcore system design.

LangGraph was picked as a backbone for this fork because:

It scales well. LangGraph is really good at parallelizing tasks and completely solves race condition problems.
The LangGraph Server is a really good way to deploy your agent both for users and as a subagent for other LLM agents.
They have a huge community, they ship fast, and their documentation is good.

Prerequisites

LLM API compatible with LangChain
LangSmith account if you want observability (free for small projects)

Installation

Clone the repository
Install Playwright
Create a .env file based on the provided .env.example, setting necessary configurations like API keys, model parameters, etc.

Usage

The current usage is limited to a single URL and a single data extraction task. However, all other tools are available to the agent, and just require you to modify the prompt and graph.

After installation, you can start using OpenOperator by running the main agent script:

uv run src/main.py --url <URL> --query <QUERY>

This will initialize the agent, set up the browser context, and begin the workflow to extract data from specified websites.

Acknowledgements

This project is a fork of BrowserUse. It is a great project, and I'm grateful for the work done by the original authors. I hope this fork will help the original project with inspiration, ideas, and adoption.

Playwright and Chromium are a killer combination for web automation.

LangGraph and LangChain are a great way to build LLM agents.

LangSmith is a great way to observability.

MyPyPDF2 is a great way to handle PDF files.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your enhancements.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 694 Commits
openoperator		openoperator
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
LICENSE.md		LICENSE.md
README.md		README.md
conftest.py		conftest.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OpenOperator - Open-Source LLM Agent for Web Operations in Browser

Key Features

Changes from BrowserUse

Why LangGraph and LangChain?

Prerequisites

Installation

Usage

Acknowledgements

Contributing

License

About

Uh oh!

Releases

Packages

Languages

License

NickSherrow/openoperator

Folders and files

Latest commit

History

Repository files navigation

OpenOperator - Open-Source LLM Agent for Web Operations in Browser

Key Features

Changes from BrowserUse

Why LangGraph and LangChain?

Prerequisites

Installation

Usage

Acknowledgements

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages