Desktop automation CLI for macOS. Controls the mouse and keyboard, takes screenshots, finds text via OCR, and automates Chrome via CDP (Chrome DevTools Protocol).
cargo build --release
Binary: target/release/crusty
# Take a screenshot
crusty screenshot --logical -o shot.png
# Find text on screen (OCR) and get click coordinates
crusty find-text "Submit"
# Output: Submit 450 312
# Click at those coordinates
crusty mouse move-to 450 312
crusty mouse click left
# Type text
crusty keyboard type "hello world"
# Open Chrome and navigate
crusty browser open "https://example.com"

crusty screenshot # Print base64 PNG to stdout
crusty screenshot -o shot.png # Save to file
crusty screenshot --logical # 1 pixel = 1 mouse coordinate
crusty screenshot --grid 50 # Overlay coordinate grid
crusty screenshot --max-dim 1280 # Resize

find-text uses the macOS Vision framework to find text on screen and return exact click coordinates.
crusty find-text "About This Mac"
# Output: About This Mac 94 58 (text, center_x, center_y in logical px)

crusty mouse move-to 500 300 # Absolute move (logical px)
crusty mouse click left # Click (left/right/middle)
crusty mouse position # Print current position
crusty mouse scroll -3 # Scroll down

crusty keyboard type "hello world" # Type text
crusty keyboard combo "meta+c" # Key combination
crusty keyboard tap return # Single key tap

crusty browser open # Launch Chrome with CDP
crusty browser open "https://x.com" # Launch and navigate
crusty browser navigate "https://x.com" # Navigate active tab
crusty browser tabs # List open tabs
crusty browser eval "document.title" # Execute JavaScript
crusty browser find-text "Post" # Find element by text
crusty browser find-selector ".btn" # Find element by CSS selector

crusty permissions # Check macOS permissions
crusty onboard # Interactive config wizard
crusty run "open browser and go to google" # Agent mode (requires LLM config)
crusty mcp # Start MCP server on stdio

crusty mcp exposes desktop automation tools over stdio for use with Claude Code or other MCP clients.
{
  "mcpServers": {
    "crusty": {
      "command": "/path/to/crusty",
      "args": ["mcp"]
    }
  }
}

Crusty includes a Claude Code skill at .claude/skills/crusty/SKILL.md for desktop automation directly from Claude Code. The workflow:
- Screenshot to see the screen
- find-text to measure exact coordinates via OCR
- mouse move-to + mouse click at the measured coordinates
- Screenshot to verify
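The find-text and click steps of this workflow could be scripted roughly as follows. This is only a sketch: parse_coords and click_text are hypothetical helper names, not part of crusty. It assumes find-text prints one match per line in the form "text center_x center_y" with integer coordinates (as in the example output above), and that only the first match is wanted. What find-text prints when nothing matches is not documented here, so the helper simply treats any line without two trailing integers as "not found".

```shell
# Hypothetical helper: print the last two fields (center_x center_y) of a
# find-text output line such as "About This Mac 94 58". The matched text may
# itself contain spaces, so the coordinates are taken from the end of the line.
parse_coords() {
  printf '%s\n' "$1" | awk '
    NF >= 3 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ { print $(NF-1), $NF }'
}

# Hypothetical helper: locate text via OCR, then move the mouse there and click.
click_text() {
  local line coords x y
  # Assumption: the first output line is the match we want.
  line=$(crusty find-text "$1" | head -n 1)
  coords=$(parse_coords "$line")
  if [ -z "$coords" ]; then
    echo "text not found: $1" >&2
    return 1
  fi
  x=${coords% *}   # part before the space
  y=${coords#* }   # part after the space
  crusty mouse move-to "$x" "$y" && crusty mouse click left
}
```

A full pass would then be crusty screenshot --logical -o before.png, click_text "Submit", and crusty screenshot --logical -o after.png to verify the result.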
Config file: ~/.crusty/config.toml
[browser]
profile = "claude"
cdp_port = 9222
[desktop]
screenshot_max_dim = 1280
action_delay_ms = 100

macOS uses logical pixel coordinates. On Retina (2x) displays, physical pixel dimensions are twice the logical ones. All mouse commands use logical pixels. Use the --logical flag on screenshots to get images where 1 pixel = 1 mouse coordinate.
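As a worked example of the 2x relationship: a point measured at (900, 624) in a full-resolution Retina screenshot would correspond to mouse coordinate (450, 312). The helper below is a hypothetical sketch, not a crusty command. It assumes the screenshot was taken without --logical and was not also resized (for example by --max-dim), and that the scale factor is known (2 for the Retina case described above).

```shell
# Hypothetical helper: convert a physical-pixel coordinate from a screenshot
# into the logical coordinate that mouse commands expect.
# Usage: to_logical X Y [SCALE]   (SCALE defaults to 2, the Retina case)
to_logical() {
  local x=$1 y=$2 scale=${3:-2}
  # Integer division: odd physical coordinates round down.
  echo "$(( x / scale )) $(( y / scale ))"
}
```

Taking screenshots with --logical avoids this conversion entirely, since image pixels then map one-to-one to mouse coordinates.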
- macOS (Accessibility + Screen Recording permissions)
- Rust toolchain
- Chrome (for browser automation)
License: MIT