This repository documents an event-driven computer vision system designed for efficient computer interaction. Inspired by Anthropic's Computer Use research, it replaces continuous screenshot analysis with event-based processing to reduce resource utilization.
My initial implementation focuses on efficient screen monitoring. The system performs native OS-level screen capture via ScreenCaptureKit on macOS and the Win32 API on Windows. I developed pixel-level delta detection based on NumPy grayscale intensity analysis, using memory-mapped arrays for efficient data handling, and applied spatial-temporal filtering with scipy.signal to turn raw pixel changes into events.
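A minimal sketch of how this delta-detection stage could be wired up; the frame shape, threshold, kernel size, and memmap path below are illustrative assumptions, not values taken from the actual implementation.

```python
import numpy as np
from scipy import signal

# Assumed parameters for this sketch, not the project's real configuration.
FRAME_SHAPE = (1080, 1920)      # height, width of the captured screen
DELTA_THRESHOLD = 12.0          # mean smoothed intensity change that counts as an event

# Memory-mapped buffer holding the previous grayscale frame, so large frames
# are not copied around between captures (illustrative file path).
prev_frame = np.memmap("prev_frame.dat", dtype=np.float32,
                       mode="w+", shape=FRAME_SHAPE)

def to_grayscale(frame_rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB frame to float32 grayscale intensities."""
    return frame_rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                                   dtype=np.float32)

def detect_change(frame_rgb: np.ndarray) -> bool:
    """Return True when the spatially smoothed per-pixel delta exceeds the threshold."""
    gray = to_grayscale(frame_rgb)
    delta = np.abs(gray - prev_frame)
    # Spatial smoothing with a small box kernel suppresses single-pixel noise.
    kernel = np.full((5, 5), 1 / 25, dtype=np.float32)
    smoothed = signal.convolve2d(delta, kernel, mode="same", boundary="symm")
    prev_frame[:] = gray            # update the memory-mapped state in place
    return float(smoothed.mean()) > DELTA_THRESHOLD
```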
Building on the event detection system, I developed a cursor control system that translates high-level intentions into precise movements. Commands are parsed with regex-based pattern matching, coordinates are validated with NumPy-based boundary checking, and basic screen-space transformations map them onto the display. The system integrates with external LLM services over HTTP.
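A simplified sketch of the parsing and boundary-checking step; the command grammar, screen resolution, and function names here are hypothetical stand-ins for the real command set.

```python
import re
import numpy as np

SCREEN_BOUNDS = np.array([1920, 1080])   # assumed screen resolution for this sketch

# Hypothetical command grammar: "move to 640, 360" or "click at 100,200".
COMMAND_RE = re.compile(r"(move to|click at)\s+(\d+)\s*,\s*(\d+)", re.I)

def parse_command(text: str):
    """Parse a high-level cursor command into (action, clamped x/y coordinates)."""
    match = COMMAND_RE.search(text)
    if match is None:
        return None
    action = match.group(1).lower()
    coords = np.array([int(match.group(2)), int(match.group(3))])
    # Boundary validation: clamp coordinates into the visible screen area.
    coords = np.clip(coords, 0, SCREEN_BOUNDS - 1)
    return action, tuple(int(c) for c in coords)

print(parse_command("move to 2500, 300"))   # ('move to', (1919, 300))
```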
The inference system is built on PyTorch and modern vision-language models. Running on RTX 3090 hardware, it integrates BakLLaVA-1 and other Ollama-served models for visual understanding and command generation. The system exposes basic API endpoints for model inference and includes error handling for robust operation.
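A hedged sketch of what such an inference call might look like against an Ollama-served BakLLaVA model; the endpoint is Ollama's default, and the model tag, prompt, and timeout are illustrative assumptions rather than the project's actual values.

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"   # default Ollama endpoint

def describe_screenshot(image_path: str, prompt: str, model: str = "bakllava") -> str:
    """Send a screenshot plus a text prompt to a locally served vision-language model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "model": model,            # assumed local model tag
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,
    }
    try:
        resp = requests.post(OLLAMA_URL, json=payload, timeout=60)
        resp.raise_for_status()
        return resp.json().get("response", "")
    except requests.RequestException as exc:
        # Basic error handling: surface a readable message instead of crashing.
        return f"inference failed: {exc}"
```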
I developed an event processing system that maintains temporal context in circular buffers, tracks activity, and applies basic rate limiting. It also manages state transitions and includes error recovery mechanisms for stable operation.
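A small sketch of how a circular-buffer event context with rate limiting might be structured; the class name, buffer length, and intervals are assumptions made for illustration.

```python
import time
from collections import deque

class EventContext:
    """Keeps a bounded window of recent events and enforces a simple rate limit."""

    def __init__(self, maxlen: int = 128, min_interval_s: float = 0.05):
        self.events = deque(maxlen=maxlen)   # circular buffer of (timestamp, event)
        self.min_interval_s = min_interval_s
        self._last_accepted = 0.0

    def push(self, event) -> bool:
        """Record an event unless it arrives faster than the rate limit allows."""
        now = time.monotonic()
        if now - self._last_accepted < self.min_interval_s:
            return False                      # dropped by the rate limiter
        self.events.append((now, event))
        self._last_accepted = now
        return True

    def recent(self, window_s: float = 2.0):
        """Return events from the last `window_s` seconds of temporal context."""
        cutoff = time.monotonic() - window_s
        return [event for t, event in self.events if t >= cutoff]
```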
Event-driven processing makes the system an efficient alternative to screenshot-based approaches. By focusing on pixel-level changes and careful memory management, the architecture provides a foundation for lightweight computer interaction systems.
The system integrates with modern vision-language models, using BakLLaVA's ViT-L/14 vision encoder for visual understanding. I implemented custom Ollama model serving with basic API communication and response handling, enabling flexible model deployment.
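As a rough illustration of the response-handling side, the sketch below streams a reply from a locally served Ollama model and assembles it into a single string; the endpoint is the Ollama default, and the `bakllava` model tag is an assumption about how the model is registered locally.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"   # default Ollama endpoint

def stream_response(prompt: str, model: str = "bakllava") -> str:
    """Stream tokens from an Ollama-served model and assemble the full reply."""
    payload = {"model": model, "prompt": prompt, "stream": True}
    chunks = []
    with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            part = json.loads(line)           # Ollama streams one JSON object per line
            chunks.append(part.get("response", ""))
            if part.get("done"):
                break
    return "".join(chunks)
```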
Input processing combines pixel-level delta detection with basic thresholding. I developed event stream processing using temporal filtering and spatial analysis, and added ROI detection so that further processing concentrates on the relevant screen regions.
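A compact sketch of ROI detection over consecutive grayscale frames; the threshold, dilation kernel, and minimum area are assumed values, and the function expects 8-bit grayscale inputs.

```python
import cv2
import numpy as np

def find_change_regions(prev_gray: np.ndarray, curr_gray: np.ndarray,
                        thresh: int = 25, min_area: int = 100):
    """Return bounding boxes (x, y, w, h) around regions that changed between frames.

    Both inputs are expected to be 8-bit single-channel (grayscale) frames.
    """
    delta = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(delta, thresh, 255, cv2.THRESH_BINARY)
    # Dilation merges nearby changed pixels into contiguous regions of interest.
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```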
- PyTorch with CUDA support
- FastAPI for basic API endpoints
- Transformers library for model integration
- NumPy for numerical operations
- OpenCV for image processing
- Scikit-learn for basic analysis
The project roadmap includes:
- Implementation of OSWorld benchmark testing
- Development of LLM fine-tuning for cursor commands
- Enhancement of event processing algorithms
- Integration with additional vision-language models
- Implementation of comprehensive performance metrics