Mobile GUI automation extension for Claude Code and Codex. It controls Android phones/emulators for multi-step task execution, state capture, and normalized JSON output.
- Multi-provider support: local (Ollama GELab), stepfun, zhipu, qwen
- Stateful sessions: run execute, then continue with session_id
- Stateless mode: one-shot minimal actions without local session persistence
- Runtime timeout control: --timeout-sec on execute and continue
- Direct coordinate tap mode: model-free tap/click using ADB
- Unified outputs: session_id, next_action, caption, screenshot_path, timeout metadata
cd D:\project\gui_agent_skill
python install.py
Default install target is --target auto:
- If ~/.claude and/or ~/.codex already exist, install to detected targets.
- If neither exists, install to both by default.
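The auto-detection rule above can be sketched as follows. This is a minimal illustration of the described behavior, not the code in install.py; the function name `detect_targets` is hypothetical.

```python
from pathlib import Path


def detect_targets(home: Path) -> list[str]:
    """Pick install targets the way `--target auto` is described above.

    Hypothetical helper: the real install.py may detect targets differently.
    """
    # A target counts as detected when its dot-directory exists under home.
    found = [name for name in ("claude", "codex") if (home / f".{name}").exists()]
    # If neither directory exists, fall back to installing for both.
    return found or ["claude", "codex"]
```

Calling `detect_targets(Path.home())` would then return the list of targets an auto install is expected to touch on the current machine.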
Specify target explicitly:
python install.py --target claude
python install.py --target codex
python install.py --target both
Set default provider and keys during install:
python install.py --provider zhipu --zhipu-api-key "your-zhipu-api-key"
python install.py --provider qwen --dashscope-api-key "your-dashscope-api-key"
python install.py --provider local --non-interactive
# Providerless mode (Codex controls coordinates directly):
python install.py --tap-only --non-interactive
--tap-only enables providerless mode and disables execute/continue; use tap/click for step-by-step control.
After installing to Codex, restart Codex so new prompts/skills are loaded.
- Python 3.10+
- gui_agent_forge
- Android adb (platform-tools)
- At least one connected Android device/emulator
Check ADB:
adb devices
If adb is not in PATH, configure device.adb_path in ~/.gui_agent_skill/config.yaml.
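The same check can be scripted. The sketch below parses the standard `adb devices` output and keeps only serials whose state is `device` (ready for use); the helper names are illustrative, and the skill's own connectivity check may work differently.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return serials of ready devices from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device rows look like "<serial>\tdevice"; the header line, daemon
        # messages, and "unauthorized"/"offline" rows do not match.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices(adb_path: str = "adb") -> list[str]:
    """Run adb (optionally from an explicit path) and list ready devices."""
    result = subprocess.run(
        [adb_path, "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

An empty list from `connected_devices()` means no device is ready, which is the situation where USB debugging or the emulator should be checked first.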
/gui-agent:execute --task "Open WeChat and enter chat list"
/gui-agent:continue --reply "Select the first contact"
/gui-agent:status
/gui-agent:config
Use CLI commands directly (recommended):
# Stateful flow
python -m gui_agent_skill.cli execute --task "Open WeChat and enter chat list" --provider local --timeout-sec 60
python -m gui_agent_skill.cli continue --reply "Select the first contact" --timeout-sec 60
# Stateless flow (run execute repeatedly)
python -m gui_agent_skill.cli execute --task "Open WeChat search" --stateless --timeout-sec 45
python -m gui_agent_skill.cli execute --task "Search AI and sample top 3 official-account posts" --stateless --timeout-sec 45
# Direct coordinate tap
python -m gui_agent_skill.cli tap --x 0.5 --y 0.82 --coord-space ratio --timeout-sec 20
You can also mention $gui-agent-mobile in conversation to trigger the skill workflow.
python cli.py execute --task "task" [--provider local] [--device-id ID] [--max-steps 20] [--stateless] [--timeout-sec 60]
python cli.py continue [--session-id ID] [--reply "text"] [--task "task"] [--timeout-sec 60]
python cli.py status [--device-id ID]
python cli.py tap --x 0.5 --y 0.82 --coord-space ratio [--timeout-sec 20]
python cli.py devices
python cli.py sessions
python cli.py providers
Notes:
- execute/continue/status/tap all validate device connectivity before running.
- If no ADB devices are connected, CLI returns a clear error with USB-debugging guidance.
- When tap_only_mode=true in config, execute and continue return a clear error and only direct coordinate mode is allowed.
{
"success": true,
"session_id": "abc12345",
"task": "Open WeChat",
"provider": "local",
"device_id": "emulator-5554",
"step_count": 1,
"caption": "WeChat home screen is visible with bottom tabs.",
"screenshot_path": "~/.gui_agent_skill/outputs/abc12345/screenshot.png",
"next_action": "continue",
"current_app": "com.tencent.mm/.ui.LauncherUI",
"message": "Task in progress. Current state: ..."
}
- WeChat Daily Official Account Trend Analysis (read-only)
- Xiaohongshu Keyword Content Research (read-only)
- Cross-platform Product Price Comparison (JD/Taobao/Pinduoduo)
- Stable live demo pipeline (status -> execute --stateless -> tap)
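The live demo pipeline in the last item (status, then a stateless execute, then a direct tap) could be driven from a script along these lines. This is a sketch under two assumptions that the README does not confirm: that each command prints a single JSON object to stdout, and that every command's output carries the `success` and `message` fields shown in the sample output. The function names are illustrative.

```python
import json
import subprocess
import sys

CLI = [sys.executable, "-m", "gui_agent_skill.cli"]


def pipeline_commands(task: str, x: float, y: float, timeout_sec: int = 45) -> list[list[str]]:
    """Build the three pipeline commands in order: status, execute, tap."""
    return [
        CLI + ["status"],
        CLI + ["execute", "--task", task, "--stateless", "--timeout-sec", str(timeout_sec)],
        CLI + ["tap", "--x", str(x), "--y", str(y), "--coord-space", "ratio", "--timeout-sec", "20"],
    ]


def parse_result(stdout: str) -> dict:
    """Parse one CLI result and raise if it reports failure."""
    result = json.loads(stdout)
    if not result.get("success"):
        raise RuntimeError(result.get("message", "command failed"))
    return result


def run_pipeline(task: str, x: float, y: float) -> list[dict]:
    """Run the commands in sequence, stopping at the first failure."""
    results = []
    for cmd in pipeline_commands(task, x, y):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(parse_result(proc.stdout))
    return results
```

Running status first surfaces connectivity problems before any model call is made, which is likely what makes this ordering stable for demos.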
Ready-made demo prompts are provided in prompt.txt.
Local media/*.mp4 files are removed to keep the repository lightweight. Use Google Drive links below.
Video (Google Drive): compare.mp4
Prompt (from prompt.txt, compare scenario):
Use GUI Agent Skill to compare prices for one product across JD, Taobao, and Pinduoduo.
Product:
- "iPhone 17 128G, China version, brand new"
Required output table columns:
- Platform
- Product title
- Final price (after coupons if visible)
- Store type (official/flagship/individual)
- Estimated delivery time
- Return/refund info (if visible)
- Notes (spec mismatch risk)
Constraints:
- Stop before checkout/payment.
- Exclude non-comparable variants (activated/imported/refurbished/spec mismatch).
Video (Google Drive): wechat.mp4
Prompt (from prompt.txt, wechat scenario):
Use GUI Agent Skill to complete a daily WeChat official-account trend scan in read-only mode.
Goal:
- Collect article samples and produce a daily trend summary.
Keywords:
- AI Agent
- Cross-border e-commerce
- Private-domain operations
Required output:
- Top 3 high-frequency themes
- Common title patterns
- 3 follow-up content angles
Constraints:
- Read-only. No like/comment/share/follow.
- Use execute --stateless step by step with timeout on each call.
User config path: ~/.gui_agent_skill/config.yaml
Common fields:
- default_provider
- tap_only_mode
- default_device_id
- default_operation_timeout_sec
- providers.<name>.api_key
- output/session settings
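To illustrate how these fields nest, the sketch below models a loaded config as a plain Python dict and reads fields by dotted path. Only the key names come from the list above; every value is a placeholder, and `get_field` is a hypothetical helper, not part of the skill.

```python
from typing import Any

# Placeholder values; only the key names are taken from the field list.
EXAMPLE_CONFIG = {
    "default_provider": "local",
    "tap_only_mode": False,
    "default_device_id": "emulator-5554",
    "default_operation_timeout_sec": 60,
    "providers": {"zhipu": {"api_key": "your-zhipu-api-key"}},
}


def get_field(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Look up a nested field such as 'providers.zhipu.api_key'."""
    node: Any = config
    for part in dotted_key.split("."):
        # Stop early if the path leaves the nested-dict structure.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


print(get_field(EXAMPLE_CONFIG, "providers.zhipu.api_key"))
```

Returning a default instead of raising lets a caller treat a missing provider key the same way as an unset one.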
python install.py --uninstall
python install.py --uninstall --target both
If you make major capability changes, update both AGENTS.md and README.md.
MIT License