
HandGPT


HandGPT is a tool for developing and testing Large Language Models (LLMs) that control devices such as smartphones and computers. The project combines software applications with intelligent hardware so that LLMs can interact with devices through simulated vision (seeing the screen) and simulated action (issuing input).

HandGPT has recently integrated OpenAI's Operator, an AI agent that automates web tasks by browsing, clicking, and filling in forms. Through this integration, HandGPT users can apply Operator's capabilities to device and web interaction. Examples include submitting reports, managing online services, and running multi-step web workflows.


Key Features

  • Cross-Platform Support: iOS is supported first; Android, macOS, and Windows apps are in development.
  • Device Control via LLMs: Enable LLMs to "see" the device screen and provide action commands.
  • Integrated Intelligent Hardware: A compact Bluetooth mouse and keyboard combination that allows LLMs to interact with devices physically.
  • Web Automation with Operator: Automate web tasks using OpenAI’s Operator, empowering users to navigate the internet and execute actions autonomously.
  • Developer-Friendly APIs: Simplified APIs to facilitate quick development and testing of LLM-based agents for device control.
  • Customizable Agent Development: Freedom to define agent capabilities and explore the performance of different models.
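The README does not document HandGPT's action format. The following is a minimal sketch of what "providing action commands" can involve. It assumes, hypothetically, that the model replies with a JSON object in which `tap` coordinates are normalized to the screenshot's width and height. The helper validates the reply and converts it to device pixels. `parse_action`, `Action`, and the JSON field names are illustrative only and are not part of the HandGPT API.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical command vocabulary; HandGPT's real schema may differ.
SUPPORTED_ACTIONS = {"tap", "type"}


@dataclass
class Action:
    kind: str
    x_px: Optional[int] = None  # absolute pixel column on the device screen
    y_px: Optional[int] = None  # absolute pixel row on the device screen
    text: Optional[str] = None  # text to enter for "type" actions


def parse_action(reply: str, screen_w: int, screen_h: int) -> Action:
    """Parse a model's JSON reply and map normalized coordinates to pixels."""
    data = json.loads(reply)
    kind = data.get("action")
    if kind not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    if kind == "type":
        text = data.get("text")
        if not isinstance(text, str):
            raise ValueError("'type' action requires a string 'text' field")
        return Action(kind, text=text)
    # "tap": coordinates are fractions of the screenshot's width and height.
    x, y = float(data["x"]), float(data["y"])
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"coordinates out of range: ({x}, {y})")
    # Scale to pixels, clamping so 1.0 maps to the last column/row.
    x_px = min(int(x * screen_w), screen_w - 1)
    y_px = min(int(y * screen_h), screen_h - 1)
    return Action(kind, x_px=x_px, y_px=y_px)


if __name__ == "__main__":
    # Example screen size chosen for illustration.
    reply = '{"action": "tap", "x": 0.5, "y": 0.25}'
    print(parse_action(reply, screen_w=1170, screen_h=2532))
```

Normalized coordinates keep the model's output independent of screen resolution. The range check rejects out-of-bounds targets before any input reaches the device.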

Why HandGPT?

LLMs are poised to transform productivity across industries, yet building practical applications remains challenging. HandGPT focuses on one of the most impactful and scalable AI applications: enabling AGI to control devices.

Importance of Device and Web Control

  • Massive Usage Potential: Devices such as smartphones account for a large share of both users and time spent interacting with technology, presenting unparalleled opportunities for AI.
  • Core AI Application: Comparable in significance to embodied robotics, device and web control is a fundamental AI application.
  • Collaborative Innovation: By uniting developers and fostering innovation, HandGPT aims to accelerate the integration of AI into daily life and generate significant societal and economic value.

Challenges

While LLMs promise seamless device and web control, significant challenges remain:

  1. Information Gain:

    • Human instructions are often simple yet encode complex implicit information.
    • For example, a command like "move from A to B" in autonomous driving implies avoiding accidents, traffic violations, and battery depletion. Device control involves even more diverse and nuanced implicit information.
  2. Spatial Understanding:

    • LLMs exhibit advanced image recognition, but their spatial reasoning is still immature. Device control depends on it, for example to locate the exact position of a button on screen.
  3. Logical Reasoning:

    • Effective device control demands robust reasoning, but longer reasoning chains have higher error rates because a mistake at any step can derail the whole task.
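A back-of-the-envelope model makes this compounding effect concrete. As a simplifying assumption, suppose each step succeeds independently with the same probability p. A chain of n steps then succeeds with probability p^n:

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed, each with probability p_step."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Illustrative per-step success rate, not a measured value.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} steps at 95% per step -> {chain_success_probability(0.95, n):.1%}")
```

With an illustrative 95% per-step success rate, a 10-step chain finishes correctly only about 60% of the time (0.95^10 ≈ 0.599). Real steps are rarely independent, so treat this as intuition rather than a measurement.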

Getting Started

Installation and Usage

Install HandGPT using pip:

pip install handgpt

Refer to the Getting Started Guide for detailed setup instructions.

Intelligent Hardware Setup

For hardware setup and configuration, visit the Hardware Guide.
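The README does not specify the protocol used by the Bluetooth mouse-and-keyboard hardware. The following is a hypothetical sketch. It assumes the device acts as a standard HID mouse using the 3-byte boot-protocol report: a button bitmask, then signed 8-bit X and Y deltas. Under that assumption, a large pointer move must be split into several reports whose deltas fit in [-127, 127]. The function names are illustrative, and actually transmitting the reports over Bluetooth is out of scope.

```python
import struct
from typing import List, Tuple

# Assumption: boot-protocol mouse reports carry signed 8-bit X/Y deltas.
MAX_DELTA = 127
LEFT_BUTTON = 0x01  # bit 0 of the button byte


def split_move(dx: int, dy: int) -> List[Tuple[int, int]]:
    """Split a relative move into steps whose X and Y each fit in [-127, 127]."""
    steps: List[Tuple[int, int]] = []
    while dx != 0 or dy != 0:
        step_x = max(-MAX_DELTA, min(MAX_DELTA, dx))
        step_y = max(-MAX_DELTA, min(MAX_DELTA, dy))
        steps.append((step_x, step_y))
        dx -= step_x
        dy -= step_y
    return steps


def boot_mouse_report(buttons: int, dx: int, dy: int) -> bytes:
    """Pack a 3-byte report: button bitmask (unsigned), X and Y deltas (signed)."""
    return struct.pack("<Bbb", buttons, dx, dy)


def reports_for_move(dx: int, dy: int) -> List[bytes]:
    """Reports that move the pointer by (dx, dy) with no buttons held."""
    return [boot_mouse_report(0, sx, sy) for sx, sy in split_move(dx, dy)]


def click_reports() -> List[bytes]:
    """A left click: one report pressing the button, one releasing it."""
    return [boot_mouse_report(LEFT_BUTTON, 0, 0), boot_mouse_report(0, 0, 0)]


if __name__ == "__main__":
    for report in reports_for_move(300, -10) + click_reports():
        print(report.hex())
```

Hosts may apply pointer acceleration, so a delta of 127 counts does not necessarily move the cursor 127 pixels. A real controller would need calibration or screen feedback to land on a target.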


Examples

HandGPT provides example agents to help developers get started quickly. For instance, the Post to Twitter example demonstrates how to automate posting tweets using an LLM-powered agent. Watch the demo video:

Post to Twitter Demo (video hosted on YouTube)


Roadmap

  • Upcoming Features:

    • Android, macOS, and Windows platform support.
    • Agent Marketplace: Allow developers to monetize their agents through a marketplace for end-users.
    • Identity: Assign unique identities to agents, akin to human identification.
  • Ongoing Development:

    • Improving spatial reasoning and implicit information processing.
    • Enhancing LLM reasoning capabilities for complex control tasks.

Contributing

We welcome contributions to HandGPT! Please check out our Contribution Guidelines for information on how to get started.


Support


License

HandGPT is licensed under the MIT License. See the LICENSE file for more details.


Intelligent Hardware

Purchase the intelligent hardware for full functionality: https://handgpt.app


Acknowledgments

HandGPT is a collective effort by a passionate team dedicated to advancing AI applications. We thank our contributors and supporters for helping bring this vision to life.
