Eye2byte

Your AI coding agent can read every file in your repo.
It just can't see what's on your screen.



Screen-context sidecar for AI coding agents. Captures your screen, voice, and annotations, feeds them to any vision model, and produces structured Context Packs your coding agent can act on — via MCP.

Screen / Voice / Annotations  ──>  Vision Model + Whisper  ──>  Context Pack  ──>  Coding Agent
                                   (Ollama, Gemini,              (goal, errors,    (Claude Code,
                                    OpenRouter, Hyperbolic)       signals, next)     Codex, Gemini CLI)

Use Cases

"Debug what I'm looking at" — Capture your screen + voice-describe the bug. Your agent gets full visual context instead of you copy-pasting error messages.

"See all my monitors" — Agent captures your IDE, browser, and terminal simultaneously. Multi-monitor support: active, specific, or all displays at once.

"Annotate the problem" — Freeze your screen, draw arrows and circles on the exact bug. Agent sees precisely what you mean.

"Watch my phone" — Capture your Android device screen via ADB while developing mobile apps.

"Give remote agents eyes" — SSE server lets cloud agents, CI runners, or SSH dev boxes see your local screen. Bearer token auth included.

"Voice-first workflow" — Hold spacebar, describe what you want while looking at your screen. Agent sees + hears simultaneously.

"Monitor dashboards" — Point it at Grafana, production logs, or any dashboard. Agent captures and analyzes what's on screen.

"Context switch instantly" — Capture your screen state when switching tasks. Agent knows your new context without explanation.

"Click what you see" — OCR finds text elements with coordinates, then click/type/scroll to interact. The full see-locate-act loop for agent automation.

"What did I see before?" — Search past Context Packs by keyword. Agent recalls "last time I saw this error, the fix was X" from your observation history.


Quick Start

1. Install

pip install eye2byte[all]
Granular install options
pip install eye2byte             # Core + MCP server (Pillow + fastmcp)
pip install eye2byte[voice]      # + local voice transcription (openai-whisper)
pip install eye2byte[ui]         # + control panel (customtkinter + pystray)
pip install eye2byte[ocr]        # + coordinate-aware OCR (easyocr)
pip install eye2byte[interact]   # + mouse/keyboard automation (pyautogui)
pip install eye2byte[all]        # Everything

ffmpeg is required for voice/clips — install via your package manager.

2. Configure a vision provider

| Provider | Setup | Cost |
|----------|-------|------|
| Ollama | Install Ollama, `ollama pull qwen3-vl:8b` | Free (local) |
| Gemini | Set `GEMINI_API_KEY` in `.env` | Free tier |
| OpenRouter | Set `OPENROUTER_API_KEY` in `.env` | Free models available |
| Hyperbolic | Set `HYPERBOLIC_API_KEY` in `.env` | Pay per use |
# .env file — place in project dir, cwd, or ~/.eye2byte/.env
GEMINI_API_KEY=your-key-here

3. Run

eye2byte capture                   # Screenshot + analysis
eye2byte capture --voice           # + voice narration
eye2byte capture --mode window     # Active window only
eye2byte-ui                        # Launch control panel

Or run the scripts directly:

python eye2byte.py capture
python eye2byte_ui.py

How It Works

Eye2byte sits between your screen and your coding agent.

  1. Capture — takes a screenshot (full screen, window, region, or all monitors), optionally records voice and annotations
  2. Process — optimizes the image (~5x smaller with no perceptible quality loss), cleans audio (noise removal + normalization), transcribes speech locally via Whisper
  3. Analyze — sends everything to your chosen vision model
  4. Output — produces a structured Context Pack the agent can act on
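The four stages above can be sketched as a simple function chain. The stage functions here are hypothetical stand-ins for illustration, not Eye2byte's actual API:

```python
from typing import Callable

def run_pipeline(
    capture: Callable[[], dict],
    process: Callable[[dict], dict],
    analyze: Callable[[dict], str],
    pack: Callable[[str], str],
) -> str:
    """Illustrative orchestration of the four stages; each stage
    consumes the previous stage's output."""
    raw = capture()            # 1. Capture: screenshot (+ voice, annotations)
    media = process(raw)       # 2. Process: optimize image, clean/transcribe audio
    analysis = analyze(media)  # 3. Analyze: send media to the vision model
    return pack(analysis)      # 4. Output: wrap the analysis as a Context Pack
```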

Context Pack Format

Every analysis produces a markdown document with structured sections:

## Goal           — what the user appears to be doing
## Environment    — OS, editor, repo, branch, language
## Screen State   — visible panels, files, terminal output
## Signals        — verbatim errors, stack traces, warnings
## Likely Situation — what's probably happening
## Suggested Next Info — what a coding agent needs next

The agent receives this as actionable context — not a raw image dump.
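If a downstream tool needs individual sections rather than the whole document, the pack can be split on its `## ` headings. A minimal illustrative parser (not part of the Eye2byte API):

```python
import re

def parse_context_pack(markdown: str) -> dict[str, str]:
    """Split a Context Pack into a {section name: body} dict
    keyed by its '## ' headings."""
    result: dict[str, str] = {}
    current = None
    lines: list[str] = []
    for line in markdown.splitlines():
        m = re.match(r"^## (.+)$", line)
        if m:
            if current is not None:
                result[current] = "\n".join(lines).strip()
            current, lines = m.group(1).strip(), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        result[current] = "\n".join(lines).strip()
    return result
```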


MCP Integration

Eye2byte exposes 12 tools via the Model Context Protocol. Any MCP-compatible agent can use them.

| Tool | What it does | Install |
|------|--------------|---------|
| `capture_and_summarize` | Screenshot + vision analysis (monitor selection, delay, window targeting) | core |
| `capture_with_voice` | Screenshot + voice recording + transcription + analysis | core |
| `record_clip_and_summarize` | Screen clip with keyframe extraction and sequence analysis | core |
| `summarize_screenshot` | Analyze an existing image file | core |
| `transcribe_audio` | Local Whisper transcription of any audio file | core |
| `get_recent_context` | Retrieve recent Context Pack summaries | core |
| `get_screen_elements` | OCR with coordinates — find text elements and their screen positions | `[ocr]` |
| `search_context_history` | Full-text search across all past Context Packs | core |
| `click_element` | Click at screen coordinates (from `get_screen_elements` output) | `[interact]` |
| `type_text` | Type text at current cursor position | `[interact]` |
| `press_key` | Press keyboard key or combo (e.g. "ctrl+a", "enter") | `[interact]` |
| `scroll_screen` | Scroll at a screen position | `[interact]` |

OpenClaw

Eye2byte works with OpenClaw out of the box. Add to your openclaw.json:

{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["eye2byte_mcp.py"]
    }
  }
}

Now your OpenClaw agent can see your screen from any channel — WhatsApp, Telegram, Slack, Discord. An Eye2byte skill is also available on ClawHub.

Local agents (stdio)

For agents running on the same machine (Claude Code, Codex CLI, etc.). Add to .mcp.json:

{
  "mcpServers": {
    "eye2byte": {
      "command": "python",
      "args": ["C:/path/to/eye2byte_mcp.py"]
    }
  }
}

That's it. The agent auto-starts the server. Use full absolute paths.

Remote agents (SSE)

For agents on a different machine (cloud VM, SSH dev box, CI runner).

On your local machine (the one with the screen):

python eye2byte_mcp.py --sse                         # No auth (LAN only)
python eye2byte_mcp.py --sse --token mysecret123     # Bearer token auth
python eye2byte_mcp.py --sse --port 9000 --token abc # Custom port + auth

On the remote machine (where the agent runs) — add to MCP config:

{
  "mcpServers": {
    "eye2byte": {
      "url": "http://YOUR_LOCAL_IP:8808/sse",
      "headers": {"Authorization": "Bearer mysecret123"}
    }
  }
}

Omit headers if the server was started without --token.

Firewall note (Windows)
netsh advfirewall firewall add rule name="Eye2byte MCP" dir=in action=allow protocol=TCP localport=8808

Find your local IP: ipconfig (Windows) or ip addr (Linux/macOS).

Multi-monitor

capture_and_summarize(monitor=0)    # active monitor (default)
capture_and_summarize(monitor=1)    # first monitor
capture_and_summarize(monitor=2)    # second monitor
capture_and_summarize(monitor=-1)   # ALL monitors at once

Control Panel

eye2byte-ui          # or: python eye2byte_ui.py

A small always-on-top floating panel. Drag it anywhere. Global hotkeys work even when the panel isn't focused.

Global Hotkeys (Windows)

| Hotkey | Action |
|--------|--------|
| `Ctrl+Shift+1` | Capture screenshot (uses current mode) |
| `Ctrl+Shift+2` | Annotate (freeze screen, open drawing overlay) |
| `Ctrl+Shift+3` | Toggle voice recording |
| `Ctrl+Shift+5` | Grab clipboard image |

All shortcuts are customizable from Settings > Keyboard Shortcuts.

Panel Controls

| Control | Action |
|---------|--------|
| Space (hold) | Push-to-talk — hold to record, release to stop |
| Mode selector | Cycle between Full Screen / Window / Region |
| Settings | Provider, model, image quality, shortcuts, cleanup |
| Copy @path | Copy session path for @-mentioning in chat |

Settings Tabs

| Tab | What you configure |
|-----|--------------------|
| Provider | Vision provider, model selection, API keys |
| Media | Image quality, max size, voice cleaning |
| Shortcuts | All keyboard shortcuts with key capture UI |
| Maintenance | Auto-cleanup days, cache management |

Features Reference

Annotation Overlay

Press Ctrl+Shift+2 or click Annotate to freeze the screen and draw on it.

| Key | Tool | How to use |
|-----|------|------------|
| X | Arrow | Click and drag |
| C | Circle | Click and drag |
| V | Rectangle | Click and drag |
| B | Freehand | Click and drag |
| T | Text | Click to place, type your text |

| Action | How |
|--------|-----|
| Save | Enter — commits annotations, sends to vision model |
| Cancel | Escape — discards all annotations |
| Undo | Right-click near an annotation to remove it |
| Newline | Shift+Enter (Enter alone commits) |
| Multi-line | Text box auto-grows up to 6 lines |

Voice Recording

Three ways to record:

| Method | How |
|--------|-----|
| Toggle | `Ctrl+Shift+3` to start, press again to stop |
| Push-to-talk | Hold Space while panel is focused |
| Mouse PTT | Hold click on the Record button |

While recording, any captures you take are automatically bundled with the voice note into a single session.

Platforms

| Platform | Screenshot | Voice | Annotation | Hotkeys |
|----------|------------|-------|------------|---------|
| Windows | PowerShell .NET | ffmpeg | Pillow | Ctrl+Shift+1-5 |
| macOS | screencapture | ffmpeg | Pillow | |
| Linux | scrot/maim/flameshot | ffmpeg | Pillow | |
| Android | ADB (Termux) | Termux:API | | |

Configuration

Config file: ~/.eye2byte/config.json (created on first run or via eye2byte init)

| Setting | Default | Description |
|---------|---------|-------------|
| `provider` | `"ollama"` | Vision provider: ollama, gemini, openrouter, hyperbolic |
| `model` | `"auto"` | Model name or "auto" for auto-detection |
| `voice_clean` | `true` | Noise removal + pause trimming + volume normalization |
| `auto_cleanup_days` | `7` | Delete old captures/summaries after N days (0 = disabled) |
| `image_max_size` | `1920` | Max image dimension before LLM processing |
| `image_quality` | `90` | JPEG quality (1-100) |
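A minimal `~/.eye2byte/config.json` using only the documented keys (the real schema may contain additional fields):

```json
{
  "provider": "gemini",
  "model": "auto",
  "voice_clean": true,
  "auto_cleanup_days": 7,
  "image_max_size": 1920,
  "image_quality": 90
}
```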

Files

| File | Purpose |
|------|---------|
| `eye2byte.py` | Core engine — capture, voice, clip, summarize |
| `eye2byte_ui.py` | Control panel with hotkeys and annotation overlay |
| `eye2byte_mcp.py` | MCP server for coding agent integration |
| `eye2byte_ocr.py` | Coordinate-aware OCR via easyocr |
| `eye2byte_interact.py` | Mouse/keyboard automation via pyautogui |
| `eye2byte_history.py` | Searchable context history via SQLite FTS5 |

License

MIT
