🔥[2025-03-12] The code has been updated.
🔥[2025-02-21] We have released an updated version of PC-Agent. Check the paper for details. The code will be updated soon.
🔥[2024-08-23] We have released the code of PC-Agent, supporting both Mac and Windows platforms.
Demo videos: "Download paper from Chrome", "Search NBA FMVP and send to friend", and "Write an introduction of Alibaba in Word".
- PC-Agent is a multi-agent collaboration system that automates productivity applications (e.g., Chrome, Word, and WeChat) based on user instructions.
- An active perception module, designed for dense and diverse interactive elements, is better adapted to the PC platform.
- The hierarchical multi-agent cooperative structure improves the success rate of more complex task sequences.
Now Windows is supported.
conda create --name pcagent python=3.10
conda activate pcagent
# For Windows
pip install -r requirements.txt
git clone https://github.com/Topdu/OpenOCR.git
pip install openocr-python
Edit config.json to add your API key and customize settings. Replace the "token" value with your actual API key, and keep the file free of annotations, since standard JSON does not allow comments:
{
  "vl_model_name": "GPT-4o",
  "llm_model_name": "GPT-4o",
  "token": "sk-...",
  "url": "https://api.openai.com/v1"
}
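A missing key or a leftover placeholder token usually only surfaces once a run has started, so a quick check of config.json beforehand can help. The following is a minimal sketch and not part of PC-Agent: `validate_config` and `load_config` are hypothetical helpers, and the list of required keys is an assumption taken from the example above.

```python
import json
from pathlib import Path

# Keys shown in the example config above; assumed (not confirmed) to all be required.
REQUIRED_KEYS = ("vl_model_name", "llm_model_name", "token", "url")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    token = config.get("token", "")
    if token in ("", "sk-..."):
        problems.append("token is empty or still the placeholder")
    return problems


def load_config(path: str = "config.json") -> dict:
    """Parse the file as strict JSON and fail early if something is off."""
    # json.loads raises json.JSONDecodeError on '#' comments,
    # so an annotated file is rejected here rather than mid-run.
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config()` from the repository root before launching a task reports every problem at once rather than one per run.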
- Run run.py with your instruction and your GPT-4o API token. For example:
python run.py --instruction="Create a new doc on Word, write a brief introduction of Alibaba, and save the document."
- Optionally, you can add specific operational knowledge via the --add_info option to help PC-Agent operate more accurately.
- To further improve the operating efficiency of PC-Agent, you can set --disable_reflection to skip the reflection process. Note that this may reduce the success rate of the operation.
- If the task is not very complex, you can set --simple 1 to skip the task decomposition.