GUI Agent Skill

A mobile GUI automation extension for Claude Code and Codex. It drives Android phones and emulators through multi-step tasks, captures screen state, and returns normalized JSON output.

Docs: English | 中文

Highlights

Multi-provider support: local (Ollama GELab), stepfun, zhipu, qwen
Stateful sessions: run execute then continue with session_id
Stateless mode: one-shot minimal actions without local session persistence
Runtime timeout control: --timeout-sec on execute and continue
Direct coordinate tap mode: model-free tap/click using ADB
Unified outputs: session_id, next_action, caption, screenshot_path, timeout metadata

Install

cd path/to/gui_agent_skill
python install.py

The default install target is --target auto:

If ~/.claude and/or ~/.codex already exist, install to detected targets.
If neither exists, install to both by default.
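
The auto-detection rule above can be sketched in Python as follows. This is a minimal illustration of the two stated rules, not the installer's actual code: the function name `resolve_targets` and its `home` parameter are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def resolve_targets(target: str = "auto", home: Optional[Path] = None) -> List[str]:
    """Return the install targets implied by --target (hypothetical helper)."""
    if home is None:
        home = Path.home()
    if target == "both":
        return ["claude", "codex"]
    if target in ("claude", "codex"):
        return [target]
    if target != "auto":
        raise ValueError(f"unknown target: {target}")
    # auto: install to whichever of ~/.claude and ~/.codex already exist.
    detected = [name for name in ("claude", "codex") if (home / f".{name}").is_dir()]
    # Neither directory exists: fall back to both targets.
    return detected or ["claude", "codex"]
```

Passing `home` explicitly makes the rule easy to check against a temporary directory without touching the real home folder.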

Specify target explicitly:

python install.py --target claude
python install.py --target codex
python install.py --target both

Set default provider and keys during install:

python install.py --provider zhipu --zhipu-api-key "your-zhipu-api-key"
python install.py --provider qwen --dashscope-api-key "your-dashscope-api-key"
python install.py --provider local --non-interactive
# Providerless mode (Codex controls coordinates directly):
python install.py --tap-only --non-interactive

--tap-only enables providerless mode and disables execute/continue; use tap/click for step-by-step control.

After installing to Codex, restart Codex so new prompts/skills are loaded.

Prerequisites

Python 3.10+
gui_agent_forge
Android adb (platform-tools)
At least one connected Android device/emulator

Check ADB:

adb devices

If adb is not in PATH, configure device.adb_path in ~/.gui_agent_skill/config.yaml.
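
For scripted checks, the text printed by adb devices can be parsed with a few lines of Python. The sketch below is an illustration only: `parse_adb_devices` is a hypothetical helper, not part of the skill, and the serial numbers in the sample are made up.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines, and adb daemon startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
ready = [serial for serial, state in parse_adb_devices(sample).items() if state == "device"]
print(ready)  # ['emulator-5554']
```

In practice the input would come from `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`. A device in the unauthorized state usually means the USB-debugging prompt on the phone has not been accepted yet.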

Quick Start

Claude Code

/gui-agent:execute --task "Open WeChat and enter chat list"
/gui-agent:continue --reply "Select the first contact"
/gui-agent:status
/gui-agent:config

Codex

Use CLI commands directly (recommended):

# Stateful flow
python -m gui_agent_skill.cli execute --task "Open WeChat and enter chat list" --provider local --timeout-sec 60
python -m gui_agent_skill.cli continue --reply "Select the first contact" --timeout-sec 60

# Stateless flow (run execute repeatedly)
python -m gui_agent_skill.cli execute --task "Open WeChat search" --stateless --timeout-sec 45
python -m gui_agent_skill.cli execute --task "Search AI and sample top 3 official-account posts" --stateless --timeout-sec 45

# Direct coordinate tap
python -m gui_agent_skill.cli tap --x 0.5 --y 0.82 --coord-space ratio --timeout-sec 20
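
With --coord-space ratio, --x and --y presumably read as fractions of the screen width and height. The sketch below shows how such a tap could map onto a plain adb shell input tap command. The helper names and the 1080x2400 screen size are assumptions for illustration, and the skill's own conversion (rounding, clamping) may differ.

```python
from typing import List, Optional, Tuple


def ratio_to_pixels(x_ratio: float, y_ratio: float, width: int, height: int) -> Tuple[int, int]:
    """Convert ratio coordinates in [0, 1] to pixel coordinates (hypothetical helper)."""
    if not (0.0 <= x_ratio <= 1.0 and 0.0 <= y_ratio <= 1.0):
        raise ValueError("ratio coordinates must be within [0, 1]")
    # Round to the nearest pixel, then clamp so a ratio of 1.0 stays on screen.
    x = min(round(x_ratio * width), width - 1)
    y = min(round(y_ratio * height), height - 1)
    return x, y


def adb_tap_command(x: int, y: int, device_id: Optional[str] = None) -> List[str]:
    """Build the argv for `adb [-s ID] shell input tap X Y`."""
    cmd = ["adb"]
    if device_id:
        cmd += ["-s", device_id]
    return cmd + ["shell", "input", "tap", str(x), str(y)]


x, y = ratio_to_pixels(0.5, 0.82, width=1080, height=2400)
print(adb_tap_command(x, y, "emulator-5554"))
# ['adb', '-s', 'emulator-5554', 'shell', 'input', 'tap', '540', '1968']
```

Ratio coordinates keep the same command usable across devices with different resolutions, which is the likely reason the example uses them.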

You can also mention $gui-agent-mobile in conversation to trigger the skill workflow.

CLI Commands

python cli.py execute --task "task" [--provider local] [--device-id ID] [--max-steps 20] [--stateless] [--timeout-sec 60]
python cli.py continue [--session-id ID] [--reply "text"] [--task "task"] [--timeout-sec 60]
python cli.py status [--device-id ID]
python cli.py tap --x 0.5 --y 0.82 --coord-space ratio [--timeout-sec 20]
python cli.py devices
python cli.py sessions
python cli.py providers
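
When driving the CLI from a script, the execute invocation can be assembled as an argument list. The sketch below uses only the flags listed above and assumes it runs from the repository root, where cli.py lives. `build_execute_args` is a hypothetical helper, not part of the skill.

```python
import sys
from typing import List, Optional


def build_execute_args(
    task: str,
    provider: Optional[str] = None,
    device_id: Optional[str] = None,
    max_steps: Optional[int] = None,
    stateless: bool = False,
    timeout_sec: Optional[int] = None,
) -> List[str]:
    """Build the argv for `python cli.py execute ...` (hypothetical helper)."""
    args = [sys.executable, "cli.py", "execute", "--task", task]
    if provider:
        args += ["--provider", provider]
    if device_id:
        args += ["--device-id", device_id]
    if max_steps is not None:
        args += ["--max-steps", str(max_steps)]
    if stateless:
        args.append("--stateless")
    if timeout_sec is not None:
        args += ["--timeout-sec", str(timeout_sec)]
    return args


args = build_execute_args("Open WeChat search", stateless=True, timeout_sec=45)
print(args[1:])
# ['cli.py', 'execute', '--task', 'Open WeChat search', '--stateless', '--timeout-sec', '45']
```

The list can be passed to `subprocess.run(args, capture_output=True, text=True)`. Passing an argument list rather than a shell string avoids quoting problems when the task text contains spaces or quotes.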

Notes:

execute, continue, status, and tap all validate device connectivity before running.
If no ADB device is connected, the CLI returns a clear error with USB-debugging guidance.
When tap_only_mode is set to true in the config, execute and continue return a clear error, and only direct coordinate mode (tap) is allowed.

Output Schema (Example)

{
  "success": true,
  "session_id": "abc12345",
  "task": "Open WeChat",
  "provider": "local",
  "device_id": "emulator-5554",
  "step_count": 1,
  "caption": "WeChat home screen is visible with bottom tabs.",
  "screenshot_path": "~/.gui_agent_skill/outputs/abc12345/screenshot.png",
  "next_action": "continue",
  "current_app": "com.tencent.mm/.ui.LauncherUI",
  "message": "Task in progress. Current state: ..."
}
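
A caller can branch on this result to decide what to do next. The sketch below is a minimal illustration based only on the fields in the example above. The helper `next_step` is hypothetical, and since "continue" is the only next_action value shown here, any other value is reported generically rather than interpreted.

```python
import json


def next_step(raw: str) -> str:
    """Summarize what a caller might do with one result (hypothetical helper)."""
    result = json.loads(raw)
    if not result.get("success"):
        return f"failed: {result.get('message', 'no message')}"
    if result.get("next_action") == "continue":
        return f"continue session {result['session_id']}"
    # Other next_action values are not documented in this example.
    return f"stop: next_action={result.get('next_action')!r}"


raw = json.dumps({
    "success": True,
    "session_id": "abc12345",
    "next_action": "continue",
    "caption": "WeChat home screen is visible with bottom tabs.",
})
print(next_step(raw))  # continue session abc12345
```

The returned session_id is what a follow-up continue call would pass via --session-id.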

Practical Demo Scenarios

WeChat Daily Official Account Trend Analysis (read-only)
Xiaohongshu Keyword Content Research (read-only)
Cross-platform Product Price Comparison (JD/Taobao/Pinduoduo)
Stable live demo pipeline (status -> execute --stateless -> tap)

Ready-made demo prompts are provided in prompt.txt.

Demo Videos and Matching Prompts

Local media/*.mp4 files have been removed to keep the repository lightweight. Use the Google Drive links below.

Compare Demo (Price Comparison)

Video (Google Drive): compare.mp4

Prompt (from prompt.txt, compare scenario):

Use GUI Agent Skill to compare prices for one product across JD, Taobao, and Pinduoduo.

Product:
- "iPhone 17 128G, China version, brand new"

Required output table columns:
- Platform
- Product title
- Final price (after coupons if visible)
- Store type (official/flagship/individual)
- Estimated delivery time
- Return/refund info (if visible)
- Notes (spec mismatch risk)

Constraints:
- Stop before checkout/payment.
- Exclude non-comparable variants (activated/imported/refurbished/spec mismatch).

WeChat Demo (Daily Official Account Trend Analysis)

Video (Google Drive): wechat.mp4

Prompt (from prompt.txt, wechat scenario):

Use GUI Agent Skill to complete a daily WeChat official-account trend scan in read-only mode.

Goal:
- Collect article samples and produce a daily trend summary.

Keywords:
- AI Agent
- Cross-border e-commerce
- Private-domain operations

Required output:
- Top 3 high-frequency themes
- Common title patterns
- 3 follow-up content angles

Constraints:
- Read-only. No like/comment/share/follow.
- Use execute --stateless step by step with timeout on each call.

Configuration

User config path: ~/.gui_agent_skill/config.yaml

Common fields:

default_provider
tap_only_mode
default_device_id
default_operation_timeout_sec
providers.<name>.api_key
output/session settings
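
Dotted names such as providers.<name>.api_key and device.adb_path suggest a nested layout in config.yaml. The sketch below mirrors that layout as a Python dict and looks up settings by dotted key. Every value is a placeholder, the exact nesting is an assumption, and `get_setting` is a hypothetical helper rather than the skill's own loader.

```python
from typing import Any

# Placeholder values only; the real file is ~/.gui_agent_skill/config.yaml.
config = {
    "default_provider": "local",
    "tap_only_mode": False,
    "default_device_id": "emulator-5554",
    "default_operation_timeout_sec": 60,
    "device": {"adb_path": "/path/to/platform-tools/adb"},
    "providers": {"zhipu": {"api_key": "your-zhipu-api-key"}},
}


def get_setting(cfg: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict using a dotted key such as 'providers.zhipu.api_key'."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


print(get_setting(config, "providers.zhipu.api_key"))  # your-zhipu-api-key
print(get_setting(config, "providers.qwen.api_key", "<unset>"))  # <unset>
```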

Uninstall

python install.py --uninstall
python install.py --uninstall --target both

Maintenance

If you make major capability changes, update both AGENTS.md and README.md.

License

MIT License
