
ERGO

Emergency Responsive Ground Observer

Real-time distress detection and emergency guidance system powered by computer vision and AI

Python YOLOv8 Gemini ElevenLabs Raspberry Pi


Overview

ERGO is a real-time emergency response system that uses a Raspberry Pi 5 with a NoIR camera to continuously monitor a scene for distress postures. When a person is detected in distress (fallen, collapsed, hands on head, etc.), the system triggers an audible alarm and provides AI-generated first-aid guidance through a Bluetooth speaker — all hands-free.


How It Works

Raspberry Pi 5                          Laptop
+---------------------------+           +----------------------------------+
| NoIR Camera (MJPEG stream)|  --SSH--> | YOLOv8n-Pose (real-time)         |
| GPIO Button (trigger)     |  --SSH--> | Distress Detection Engine        |
| Bluetooth Speaker (output)|  <--SSH-- | Gemini 2.5 Flash (scene analysis)|
+---------------------------+           | ElevenLabs TTS (voice guidance)  |
                                        +----------------------------------+

Detection Flow

MONITORING ──[5 distress frames]──> ALARM ──[distress clears]──> MONITORING
                                      │
                                      │ [button press]
                                      v
                                   PIPELINE
                                      │
                          ┌───────────┼───────────┐
                          v           v           v
                     Frame Saved   Gemini     ElevenLabs
                      as .jpg     Analysis   TTS -> MP3
                                      │
                                      v
                              Play guidance on
                              Bluetooth speaker
                                      │
                                      v
                                  MONITORING
  1. Continuous Monitoring — Live MJPEG stream from Pi camera analyzed frame-by-frame with YOLOv8 Pose estimation
  2. Distress Detection — 5 consecutive distress frames trigger the alarm (prevents false positives)
  3. Emergency Alarm — Pre-recorded alert loops on the Bluetooth speaker
  4. Auto-Recovery — Alarm stops automatically when distress posture clears
  5. Button Trigger — Physical GPIO button activates the AI guidance pipeline
  6. AI Analysis — Gemini 2.5 Flash analyzes the captured frame with pose context
  7. Voice Guidance — ElevenLabs converts Gemini's medical instructions to natural speech
  8. Audio Playback — Guidance plays through the Bluetooth speaker on the Pi
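
A minimal sketch of the debounce behind steps 2-4, assuming illustrative helper callables (get_frame, is_distress, start_alarm, stop_alarm) rather than the actual functions in live_view.py:

DISTRESS_FRAMES_REQUIRED = 5   # consecutive distress frames needed before the alarm (step 2)

def monitor(get_frame, is_distress, start_alarm, stop_alarm):
    """MONITORING <-> ALARM loop; the button-triggered PIPELINE state is omitted here."""
    consecutive = 0
    alarm_active = False
    while True:
        frame = get_frame()                          # next frame from the Pi's MJPEG stream
        if is_distress(frame):                       # YOLOv8 pose checks (see table below)
            consecutive += 1
        else:
            consecutive = 0
            if alarm_active:                         # step 4: auto-recovery once distress clears
                stop_alarm()
                alarm_active = False
        if consecutive >= DISTRESS_FRAMES_REQUIRED and not alarm_active:
            start_alarm()                            # step 3: loop alarm.mp3 on the Bluetooth speaker
            alarm_active = True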

Distress Postures Detected

Check            | Condition                             | Indicator
Fall Detection   | Head below hip level                  | Possible fall or collapse
Horizontal Torso | Shoulder-hip line is flat             | Person lying down
Body Spread      | Width-to-height ratio > 1.8           | On the ground
Hands Above Head | Wrists above nose                     | Distress signal
Face Covering    | Both hands near face                  | Panic or pain response
Crouched/Curled  | Compressed torso + head               | Fetal position
Low in Frame     | Body center below 75% of frame height | Collapsed
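
A sketch of how a few of these checks can be computed from YOLOv8's 17 COCO keypoints (0 = nose, 5/6 = shoulders, 9/10 = wrists, 11/12 = hips); the thresholds mirror the table, but the exact logic in pose_detector.py may differ:

import numpy as np

NOSE = 0
L_WRIST, R_WRIST, L_HIP, R_HIP = 9, 10, 11, 12

def check_posture(kpts: np.ndarray, frame_h: int) -> dict:
    """kpts: (17, 2) array of pixel keypoints; y grows downward in image coordinates."""
    head_y = kpts[NOSE, 1]
    hip_y = kpts[[L_HIP, R_HIP], 1].mean()
    xs, ys = kpts[:, 0], kpts[:, 1]
    width, height = xs.max() - xs.min(), ys.max() - ys.min()
    return {
        "fall": head_y > hip_y,                                        # head below hip level
        "body_spread": height > 0 and width / height > 1.8,            # spread out on the ground
        "hands_above_head": bool((kpts[[L_WRIST, R_WRIST], 1] < head_y).all()),  # wrists above nose
        "low_in_frame": ys.mean() > 0.75 * frame_h,                    # body center below 75% of frame
    }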

Anatomical Landmark Mapping

ERGO maps medical landmarks from pose keypoints for precise first-aid guidance:

Landmark     | Location                    | Use Case
Sternum      | Midpoint between shoulders  | CPR compression target
Outer Thigh  | Mid hip-to-knee             | EpiPen injection site
Neck/Carotid | Between ear and shoulder    | Pulse check
Chest Center | Between sternum and mid-hip | AED pad placement
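
A sketch of the landmark interpolation, assuming simple midpoints between COCO keypoints; pose_detector.py may weight these differently:

L_EAR, L_SHOULDER, R_SHOULDER = 3, 5, 6
L_HIP, R_HIP, L_KNEE = 11, 12, 13

def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def map_landmarks(kpts):
    """kpts: sequence of 17 (x, y) COCO keypoints from YOLOv8n-Pose."""
    sternum = midpoint(kpts[L_SHOULDER], kpts[R_SHOULDER])        # CPR compression target
    mid_hip = midpoint(kpts[L_HIP], kpts[R_HIP])
    return {
        "sternum": sternum,
        "outer_thigh": midpoint(kpts[L_HIP], kpts[L_KNEE]),       # EpiPen injection site
        "neck_carotid": midpoint(kpts[L_EAR], kpts[L_SHOULDER]),  # pulse check
        "chest_center": midpoint(sternum, mid_hip),               # AED pad placement
    }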

Tech Stack

Component       | Technology                       | Purpose
Pose Estimation | YOLOv8n-Pose (Ultralytics)       | Real-time body keypoint detection (~12ms inference)
Scene Analysis  | Google Gemini 2.5 Flash          | Multimodal AI for emergency scene understanding
Text-to-Speech  | ElevenLabs Multilingual v2       | Natural voice synthesis for medical instructions
Edge Device     | Raspberry Pi 5                   | Camera capture, GPIO input, Bluetooth audio output
Camera          | Pi NoIR Camera Module            | Night-capable video streaming
Audio Output    | Bluetooth Speaker via PulseAudio | Wireless audio playback
Connectivity    | Paramiko (SSH/SFTP)              | Persistent, low-latency Pi communication
Computer Vision | OpenCV                           | Frame processing, MJPEG decoding, display
Hardware Input  | GPIO Button (gpiozero)           | Physical panic/trigger button
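
Keypoints come from the standard Ultralytics inference call; a minimal example (the image path is illustrative, and the ~12ms figure assumes a GPU):

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")                        # nano pose weights bundled with the repo

frame = cv2.imread("captures/example.jpg")             # any BGR frame; path is illustrative
results = model(frame, verbose=False)
keypoints = results[0].keypoints
if keypoints is not None and keypoints.xy.shape[0] > 0:
    kpts = keypoints.xy[0].cpu().numpy()               # (17, 2) pixel coordinates for the first person
    print(kpts)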

Project Structure

ERGO/
├── live_view.py          # Main application — state machine, alarm, live display
├── pose_detector.py      # YOLOv8 Pose distress detection + anatomical mapping
├── vision.py             # Gemini API integration for scene analysis
├── speech.py             # ElevenLabs TTS + Pi audio playback
├── pi_connection.py      # Persistent SSH/SFTP connection manager
├── pipeline.py           # Standalone pipeline (button mode / single-shot)
├── alarm.mp3             # Pre-generated emergency alert audio
├── yolov8n-pose.pt       # YOLOv8 Nano Pose model weights
├── requirements.txt      # Python dependencies
├── .env                  # API keys and Pi credentials (not committed)
└── captures/             # Saved frames, Gemini responses, and TTS audio
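
vision.py wraps the Gemini request; a rough sketch of that kind of call, assuming the google-generativeai client and an illustrative prompt (the real prompt and response handling in vision.py will differ):

import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")

def analyze_scene(image_path: str, pose_context: str) -> str:
    """Send the captured frame plus pose context; return short spoken first-aid guidance."""
    prompt = (
        "A person appears to be in distress. Pose analysis reports: "
        f"{pose_context}. Give brief first-aid instructions a bystander can follow."
    )
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text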

Hardware Architecture

Components

Component             | Model / Spec                     | Role
Single-Board Computer | Raspberry Pi 5                   | Edge device: camera capture, GPIO input, Bluetooth audio output
Camera                | Pi NoIR Camera Module v2         | Night-vision capable, infrared-sensitive MJPEG streaming
Panic Button          | Tactile push button on GPIO 17   | Physical trigger for AI guidance pipeline
Audio Output          | Bluetooth speaker via PulseAudio | Wireless alarm and voice guidance playback
Processing Unit       | Laptop/PC (GPU recommended)      | Runs YOLOv8 pose inference, Gemini API, ElevenLabs TTS

Wiring Diagram

Raspberry Pi 5 GPIO Header
─────────────────────────────────────────
              3V3  (1) (2)  5V
            GPIO2  (3) (4)  5V
            GPIO3  (5) (6)  GND
            GPIO4  (7) (8)  GPIO14
              GND  (9) (10) GPIO15
     ┌───► GPIO17 (11) (12) GPIO18
     │    GPIO27 (13) (14) GND
     │    GPIO22 (15) (16) GPIO23
     │      3V3 (17) (18) GPIO24
     │    GPIO10 (19) (20) GND
     │
     │    ┌──────────┐
     └────┤  Button   ├──── GND (Pin 9)
          └──────────┘
          (internal pull-up enabled)

NoIR Camera ──► CSI Ribbon Cable ──► Pi Camera Port
BT Speaker  ──► Bluetooth A2DP  ──► PulseAudio Sink
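
A sketch of the button listener, assuming gpiozero on GPIO 17 with the internal pull-up; the 5-second cooldown mirrors the one noted under Architecture Highlights, and the handler body is illustrative:

from signal import pause
from time import monotonic

from gpiozero import Button

COOLDOWN_S = 5.0                       # ignore presses within 5 s of the last trigger
last_press = 0.0

def on_press():
    global last_press
    now = monotonic()
    if now - last_press < COOLDOWN_S:
        return                         # suppress duplicate/accidental presses
    last_press = now
    print("Button pressed: run Gemini + TTS guidance pipeline")

button = Button(17)                    # GPIO 17 to GND; gpiozero enables the internal pull-up by default
button.when_pressed = on_press
pause()                                # keep the script alive waiting for presses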

Camera Specifications

Parameter    | Value
Sensor       | Sony IMX219 NoIR (no infrared filter)
Resolution   | 640x480 (default, configurable)
Frame Rate   | 15 FPS (default, configurable)
Codec        | MJPEG (Motion JPEG)
Stream Tool  | rpicam-vid (native Pi camera stack)
Transport    | MJPEG over SSH pipe
Rotation     | 90° counter-clockwise (software)
Night Vision | Yes: IR-sensitive sensor for low-light monitoring
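
A sketch of how the laptop can consume this stream: launch rpicam-vid over SSH and split the MJPEG bytes on the JPEG start/end markers. The credentials are the placeholders from the .env example below, and live_view.py may buffer differently:

import cv2
import numpy as np
import paramiko

CMD = "rpicam-vid -t 0 --codec mjpeg --width 640 --height 480 --framerate 15 -o -"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("your_pi_hostname.local", username="your_pi_username", password="your_pi_password")
_, stdout, _ = ssh.exec_command(CMD)

buf = b""
while True:
    buf += stdout.read(4096)                                    # raw MJPEG bytes over the SSH pipe
    start, end = buf.find(b"\xff\xd8"), buf.find(b"\xff\xd9")   # JPEG SOI/EOI markers
    if start != -1 and end > start:
        jpg, buf = buf[start:end + 2], buf[end + 2:]
        frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
        if frame is None:
            continue
        frame = cv2.rotate(frame, cv2.ROTATE_90_COUNTERCLOCKWISE)   # matches the 90° software rotation
        cv2.imshow("ERGO", frame)
        if cv2.waitKey(1) & 0xFF in (ord("q"), 27):             # Q or ESC to quit
            break

cv2.destroyAllWindows()
ssh.close()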

Audio Pipeline

┌────────────┐      SFTP      ┌─────────────┐    PulseAudio    ┌──────────────────┐
│   Laptop   │ ─────────────► │  Raspberry  │ ───────────────► │    Bluetooth     │
│            │   .mp3 file    │    Pi 5     │     -o pulse     │     Speaker      │
│ ElevenLabs │                │   mpg123    │                  │   (A2DP Sink)    │
│    TTS     │                │             │                  │                  │
└────────────┘                └─────────────┘                  └──────────────────┘
  • Player: mpg123 with -o pulse for PulseAudio routing
  • Alarm: Pre-generated alarm.mp3 — uploaded once at startup, loops continuously during alert state
  • Guidance: Dynamically generated MP3 via ElevenLabs, uploaded per-event via SFTP
  • Latency: No re-upload needed for alarm — only guidance audio transfers per trigger
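
A sketch of the guidance playback step, assuming an already-connected paramiko.SSHClient and an illustrative remote path:

import paramiko

def play_on_pi(ssh: paramiko.SSHClient, local_mp3: str, remote_mp3: str = "/tmp/ergo_guidance.mp3"):
    """Upload a generated MP3 over SFTP, then play it through the Pi's PulseAudio Bluetooth sink."""
    sftp = ssh.open_sftp()
    sftp.put(local_mp3, remote_mp3)                     # per-event guidance transfer
    sftp.close()
    # -o pulse routes mpg123 through PulseAudio to the paired A2DP speaker
    _, stdout, _ = ssh.exec_command(f"mpg123 -q -o pulse {remote_mp3}")
    stdout.channel.recv_exit_status()                   # block until playback finishes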

Edge-Cloud Split

Runs on Pi                  | Runs on Laptop
Camera capture (rpicam-vid) | YOLOv8n-Pose inference (~12ms)
GPIO button listener        | Distress detection logic
Audio playback (mpg123)     | Gemini 2.5 Flash scene analysis
Bluetooth audio routing     | ElevenLabs TTS generation
                            | OpenCV frame processing

The Pi handles all I/O (camera, button, speaker) while the laptop handles all compute (ML inference, API calls). Communication is over a single persistent SSH connection via Paramiko, eliminating the ~3s handshake overhead per operation.
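
A sketch of that persistent-connection pattern (connect once, then reuse the same client and SFTP session for every operation); pi_connection.py's actual interface may differ:

import os

import paramiko

class PiConnection:
    """One long-lived SSH/SFTP session instead of a fresh handshake per operation."""

    def __init__(self):
        self.client = paramiko.SSHClient()
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(
            os.environ["PI_HOST"],
            username=os.environ["PI_USER"],
            password=os.environ.get("PI_PASSWORD"),
        )
        self.sftp = self.client.open_sftp()      # reused for every file transfer

    def run(self, command: str) -> str:
        _, stdout, _ = self.client.exec_command(command)
        return stdout.read().decode()

    def upload(self, local: str, remote: str) -> None:
        self.sftp.put(local, remote)

    def close(self) -> None:
        self.sftp.close()
        self.client.close()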


Prerequisites

Hardware

  • Raspberry Pi 5 with Raspberry Pi OS
  • Pi NoIR Camera Module v2 — connected via CSI ribbon cable
  • Bluetooth Speaker — paired and connected via PulseAudio (A2DP profile)
  • Tactile Push Button — connected between GPIO 17 and GND (internal pull-up used)
  • Jumper Wires — for button to GPIO header connection
  • Laptop/PC — Python 3.10+, GPU recommended for real-time YOLOv8 inference

Raspberry Pi Setup

# Install audio and Bluetooth packages
sudo apt-get update
sudo apt-get install -y pulseaudio pulseaudio-module-bluetooth mpg123

# Start PulseAudio
pulseaudio --start

# Pair Bluetooth speaker
bluetoothctl scan on
bluetoothctl pair <MAC_ADDRESS>
bluetoothctl trust <MAC_ADDRESS>
bluetoothctl connect <MAC_ADDRESS>

# Set as default audio sink
pactl set-default-sink bluez_sink.<MAC_UNDERSCORED>.a2dp_sink

Installation

1. Clone the Repository

git clone https://github.com/tanish2007/Ergo.git
cd Ergo

2. Install Dependencies

pip install -r requirements.txt
pip install ultralytics opencv-python paramiko

3. Configure Environment Variables

Create a .env file in the project root:

GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
PI_USER=your_pi_username
PI_HOST=your_pi_hostname.local
PI_PASSWORD=your_pi_password
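
The Python side can load these with python-dotenv, assuming it is listed in requirements.txt (plain environment variables work the same if it is not):

import os
from dotenv import load_dotenv

load_dotenv()                                    # reads .env from the project root
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
PI_HOST = os.getenv("PI_HOST")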

4. Set Up Passwordless SSH (Recommended)

ssh-copy-id your_pi_username@your_pi_hostname.local

Usage

Live Monitoring Mode (Primary)

python live_view.py

Options:

--width    Frame width (default: 640)
--height   Frame height (default: 480)
--fps      Camera framerate (default: 15)
--pin      GPIO pin for button (default: 17)
--out-dir  Output directory (default: captures)

Standalone Pipeline Mode

# Button-triggered mode (waits for GPIO press)
python pipeline.py --mode button

# Single-shot with existing image
python pipeline.py --mode once --image test_photo.jpg

# Text-only (no audio playback)
python pipeline.py --mode once --image test_photo.jpg --text-only

Controls

Key         | Action
Q / ESC     | Quit application
GPIO Button | Trigger Gemini + TTS pipeline

State Machine

State      | Description                   | Transitions
MONITORING | Analyzing frames for distress | -> ALARM (5 consecutive distress frames)
ALARM      | Emergency audio looping       | -> MONITORING (distress clears) / -> PIPELINE (button press)
PIPELINE   | Running Gemini analysis + TTS | -> MONITORING (pipeline complete)
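
A compact way to express the same transitions, assuming a simple Enum-based implementation (live_view.py may structure its state handling differently):

from enum import Enum, auto

class State(Enum):
    MONITORING = auto()
    ALARM = auto()
    PIPELINE = auto()

# (state, event) -> next state, mirroring the table above
TRANSITIONS = {
    (State.MONITORING, "distress_x5"): State.ALARM,
    (State.ALARM, "distress_cleared"): State.MONITORING,
    (State.ALARM, "button_press"): State.PIPELINE,
    (State.PIPELINE, "pipeline_done"): State.MONITORING,
}

def next_state(state: State, event: str) -> State:
    return TRANSITIONS.get((state, event), state)   # events that don't apply leave the state unchanged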

Architecture Highlights

  • Persistent SSH Connections — Single paramiko connection per component eliminates ~3s handshake overhead per operation
  • Edge-Cloud Hybrid — Heavy ML inference on laptop, lightweight I/O on Pi
  • Bluetooth Audio via PulseAudio — Wireless speaker support with automatic sink routing
  • Debounced Detection — 5-frame consecutive threshold prevents false alarm triggers
  • Auto-Recovery — Alarm automatically stops when the person is no longer in distress
  • 5s Button Cooldown — Prevents duplicate pipeline triggers from accidental presses
  • Pre-generated Alarm — Static MP3 avoids API calls during emergencies

License

This project is licensed under the MIT License — see the LICENSE file for details.

Copyright (c) 2026 Ergo
