
ERGO

Emergency Responsive Ground Observer

Real-time distress detection and emergency guidance system powered by computer vision and AI

Python YOLOv8 Gemini ElevenLabs Raspberry Pi


Overview

ERGO is a real-time emergency response system that uses a Raspberry Pi 5 with a NoIR camera to continuously monitor a scene for distress postures. When a person is detected in distress (fallen, collapsed, hands on head, etc.), the system triggers an audible alarm and provides AI-generated first-aid guidance through a Bluetooth speaker — all hands-free.


How It Works

Raspberry Pi 5                          Laptop
+---------------------------+           +----------------------------------+
| NoIR Camera (MJPEG stream)|  --SSH--> | YOLOv8n-Pose (real-time)         |
| GPIO Button (trigger)     |  --SSH--> | Distress Detection Engine        |
| Bluetooth Speaker (output)|  <--SSH-- | Gemini 2.5 Flash (scene analysis)|
+---------------------------+           | ElevenLabs TTS (voice guidance)  |
                                        +----------------------------------+

Detection Flow

MONITORING ──[5 distress frames]──> ALARM ──[distress clears]──> MONITORING
                                      │
                                      │ [button press]
                                      v
                                   PIPELINE
                                      │
                          ┌───────────┼───────────┐
                          v           v           v
                     Frame Saved   Gemini     ElevenLabs
                      as .jpg     Analysis   TTS -> MP3
                                      │
                                      v
                              Play guidance on
                              Bluetooth speaker
                                      │
                                      v
                                  MONITORING
  1. Continuous Monitoring — Live MJPEG stream from Pi camera analyzed frame-by-frame with YOLOv8 Pose estimation
  2. Distress Detection — 5 consecutive distress frames trigger the alarm (prevents false positives)
  3. Emergency Alarm — Pre-recorded alert loops on the Bluetooth speaker
  4. Auto-Recovery — Alarm stops automatically when distress posture clears
  5. Button Trigger — Physical GPIO button activates the AI guidance pipeline
  6. AI Analysis — Gemini 2.5 Flash analyzes the captured frame with pose context
  7. Voice Guidance — ElevenLabs converts Gemini's medical instructions to natural speech
  8. Audio Playback — Guidance plays through the Bluetooth speaker on the Pi
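
A minimal sketch of the debounce behind steps 2-4, assuming illustrative helper callables (get_frame, is_distress, start_alarm, stop_alarm) rather than the actual functions in live_view.py:

DISTRESS_FRAMES_REQUIRED = 5   # consecutive distress frames needed before the alarm (step 2)

def monitor(get_frame, is_distress, start_alarm, stop_alarm):
    """MONITORING <-> ALARM loop; the button-triggered PIPELINE state is omitted here."""
    consecutive = 0
    alarm_active = False
    while True:
        frame = get_frame()                          # next frame from the Pi's MJPEG stream
        if is_distress(frame):                       # YOLOv8 pose checks (see table below)
            consecutive += 1
        else:
            consecutive = 0
            if alarm_active:                         # step 4: auto-recovery once distress clears
                stop_alarm()
                alarm_active = False
        if consecutive >= DISTRESS_FRAMES_REQUIRED and not alarm_active:
            start_alarm()                            # step 3: loop alarm.mp3 on the Bluetooth speaker
            alarm_active = True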

Distress Postures Detected

Check            | Condition                             | Indicator
Fall Detection   | Head below hip level                  | Possible fall or collapse
Horizontal Torso | Shoulder-hip line is flat             | Person lying down
Body Spread      | Width-to-height ratio > 1.8           | On the ground
Hands Above Head | Wrists above nose                     | Distress signal
Face Covering    | Both hands near face                  | Panic or pain response
Crouched/Curled  | Compressed torso + head               | Fetal position
Low in Frame     | Body center below 75% of frame height | Collapsed
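
A sketch of how a few of these checks can be computed from YOLOv8's 17 COCO keypoints (0 = nose, 5/6 = shoulders, 9/10 = wrists, 11/12 = hips); the thresholds mirror the table, but the exact logic in pose_detector.py may differ:

import numpy as np

NOSE = 0
L_WRIST, R_WRIST, L_HIP, R_HIP = 9, 10, 11, 12

def check_posture(kpts: np.ndarray, frame_h: int) -> dict:
    """kpts: (17, 2) array of pixel keypoints; y grows downward in image coordinates."""
    head_y = kpts[NOSE, 1]
    hip_y = kpts[[L_HIP, R_HIP], 1].mean()
    xs, ys = kpts[:, 0], kpts[:, 1]
    width, height = xs.max() - xs.min(), ys.max() - ys.min()
    return {
        "fall": head_y > hip_y,                                        # head below hip level
        "body_spread": height > 0 and width / height > 1.8,            # spread out on the ground
        "hands_above_head": bool((kpts[[L_WRIST, R_WRIST], 1] < head_y).all()),  # wrists above nose
        "low_in_frame": ys.mean() > 0.75 * frame_h,                    # body center below 75% of frame
    }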

Anatomical Landmark Mapping

ERGO maps medical landmarks from pose keypoints for precise first-aid guidance:

Landmark     | Location                    | Use Case
Sternum      | Midpoint between shoulders  | CPR compression target
Outer Thigh  | Mid hip-to-knee             | EpiPen injection site
Neck/Carotid | Between ear and shoulder    | Pulse check
Chest Center | Between sternum and mid-hip | AED pad placement
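
A sketch of the landmark interpolation, assuming simple midpoints between COCO keypoints; pose_detector.py may weight these differently:

L_EAR, L_SHOULDER, R_SHOULDER = 3, 5, 6
L_HIP, R_HIP, L_KNEE = 11, 12, 13

def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def map_landmarks(kpts):
    """kpts: sequence of 17 (x, y) COCO keypoints from YOLOv8n-Pose."""
    sternum = midpoint(kpts[L_SHOULDER], kpts[R_SHOULDER])        # CPR compression target
    mid_hip = midpoint(kpts[L_HIP], kpts[R_HIP])
    return {
        "sternum": sternum,
        "outer_thigh": midpoint(kpts[L_HIP], kpts[L_KNEE]),       # EpiPen injection site
        "neck_carotid": midpoint(kpts[L_EAR], kpts[L_SHOULDER]),  # pulse check
        "chest_center": midpoint(sternum, mid_hip),               # AED pad placement
    }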

Tech Stack

Component       | Technology                       | Purpose
Pose Estimation | YOLOv8n-Pose (Ultralytics)       | Real-time body keypoint detection (~12ms inference)
Scene Analysis  | Google Gemini 2.5 Flash          | Multimodal AI for emergency scene understanding
Text-to-Speech  | ElevenLabs Multilingual v2       | Natural voice synthesis for medical instructions
Edge Device     | Raspberry Pi 5                   | Camera capture, GPIO input, Bluetooth audio output
Camera          | Pi NoIR Camera Module            | Night-capable video streaming
Audio Output    | Bluetooth Speaker via PulseAudio | Wireless audio playback
Connectivity    | Paramiko (SSH/SFTP)              | Persistent, low-latency Pi communication
Computer Vision | OpenCV                           | Frame processing, MJPEG decoding, display
Hardware Input  | GPIO Button (gpiozero)           | Physical panic/trigger button
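
Keypoints come from the standard Ultralytics inference call; a minimal example (the image path is illustrative, and the ~12ms figure assumes a GPU):

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")                        # nano pose weights bundled with the repo

frame = cv2.imread("captures/example.jpg")             # any BGR frame; path is illustrative
results = model(frame, verbose=False)
keypoints = results[0].keypoints
if keypoints is not None and keypoints.xy.shape[0] > 0:
    kpts = keypoints.xy[0].cpu().numpy()               # (17, 2) pixel coordinates for the first person
    print(kpts)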

Project Structure

ERGO/
├── live_view.py          # Main application — state machine, alarm, live display
├── pose_detector.py      # YOLOv8 Pose distress detection + anatomical mapping
├── vision.py             # Gemini API integration for scene analysis
├── speech.py             # ElevenLabs TTS + Pi audio playback
├── pi_connection.py      # Persistent SSH/SFTP connection manager
├── pipeline.py           # Standalone pipeline (button mode / single-shot)
├── alarm.mp3             # Pre-generated emergency alert audio
├── yolov8n-pose.pt       # YOLOv8 Nano Pose model weights
├── requirements.txt      # Python dependencies
├── .env                  # API keys and Pi credentials (not committed)
└── captures/             # Saved frames, Gemini responses, and TTS audio
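
vision.py wraps the Gemini request; a rough sketch of that kind of call, assuming the google-generativeai client and an illustrative prompt (the real prompt and response handling in vision.py will differ):

import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-flash")

def analyze_scene(image_path: str, pose_context: str) -> str:
    """Send the captured frame plus pose context; return short spoken first-aid guidance."""
    prompt = (
        "A person appears to be in distress. Pose analysis reports: "
        f"{pose_context}. Give brief first-aid instructions a bystander can follow."
    )
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text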

Hardware Architecture

Components

Component             | Model / Spec                     | Role
Single-Board Computer | Raspberry Pi 5                   | Edge device: camera capture, GPIO input, Bluetooth audio output
Camera                | Pi NoIR Camera Module v2         | Night-vision capable, infrared-sensitive MJPEG streaming
Panic Button          | Tactile push button on GPIO 17   | Physical trigger for AI guidance pipeline
Audio Output          | Bluetooth speaker via PulseAudio | Wireless alarm and voice guidance playback
Processing Unit       | Laptop/PC (GPU recommended)      | Runs YOLOv8 pose inference, Gemini API, ElevenLabs TTS

Wiring Diagram

Raspberry Pi 5 GPIO Header
─────────────────────────────────────────
              3V3  (1) (2)  5V
            GPIO2  (3) (4)  5V
            GPIO3  (5) (6)  GND
            GPIO4  (7) (8)  GPIO14
              GND  (9) (10) GPIO15
     ┌───► GPIO17 (11) (12) GPIO18
     │    GPIO27 (13) (14) GND
     │    GPIO22 (15) (16) GPIO23
     │      3V3 (17) (18) GPIO24
     │    GPIO10 (19) (20) GND
     │
     │    ┌──────────┐
     └────┤  Button   ├──── GND (Pin 9)
          └──────────┘
          (internal pull-up enabled)

NoIR Camera ──► CSI Ribbon Cable ──► Pi Camera Port
BT Speaker  ──► Bluetooth A2DP  ──► PulseAudio Sink
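
A sketch of the button listener, assuming gpiozero on GPIO 17 with the internal pull-up; the 5-second cooldown mirrors the one noted under Architecture Highlights, and the handler body is illustrative:

from signal import pause
from time import monotonic

from gpiozero import Button

COOLDOWN_S = 5.0                       # ignore presses within 5 s of the last trigger
last_press = 0.0

def on_press():
    global last_press
    now = monotonic()
    if now - last_press < COOLDOWN_S:
        return                         # suppress duplicate/accidental presses
    last_press = now
    print("Button pressed: run Gemini + TTS guidance pipeline")

button = Button(17)                    # GPIO 17 to GND; gpiozero enables the internal pull-up by default
button.when_pressed = on_press
pause()                                # keep the script alive waiting for presses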

Camera Specifications

Parameter    | Value
Sensor       | Sony IMX219 NoIR (no infrared filter)
Resolution   | 640x480 (default, configurable)
Frame Rate   | 15 FPS (default, configurable)
Codec        | MJPEG (Motion JPEG)
Stream Tool  | rpicam-vid (native Pi camera stack)
Transport    | MJPEG over SSH pipe
Rotation     | 90° counter-clockwise (software)
Night Vision | Yes: IR-sensitive sensor for low-light monitoring
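
A sketch of how the laptop can consume this stream: launch rpicam-vid over SSH and split the MJPEG bytes on the JPEG start/end markers. The credentials are the placeholders from the .env example below, and live_view.py may buffer differently:

import cv2
import numpy as np
import paramiko

CMD = "rpicam-vid -t 0 --codec mjpeg --width 640 --height 480 --framerate 15 -o -"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("your_pi_hostname.local", username="your_pi_username", password="your_pi_password")
_, stdout, _ = ssh.exec_command(CMD)

buf = b""
while True:
    buf += stdout.read(4096)                                    # raw MJPEG bytes over the SSH pipe
    start, end = buf.find(b"\xff\xd8"), buf.find(b"\xff\xd9")   # JPEG SOI/EOI markers
    if start != -1 and end > start:
        jpg, buf = buf[start:end + 2], buf[end + 2:]
        frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
        if frame is None:
            continue
        frame = cv2.rotate(frame, cv2.ROTATE_90_COUNTERCLOCKWISE)   # matches the 90° software rotation
        cv2.imshow("ERGO", frame)
        if cv2.waitKey(1) & 0xFF in (ord("q"), 27):             # Q or ESC to quit
            break

cv2.destroyAllWindows()
ssh.close()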

Audio Pipeline

┌────────────┐      SFTP      ┌─────────────┐    PulseAudio    ┌──────────────────┐
│   Laptop   │ ─────────────► │  Raspberry  │ ───────────────► │    Bluetooth     │
│            │   .mp3 file    │    Pi 5     │     -o pulse     │     Speaker      │
│ ElevenLabs │                │   mpg123    │                  │   (A2DP Sink)    │
│    TTS     │                │             │                  │                  │
└────────────┘                └─────────────┘                  └──────────────────┘
  • Player: mpg123 with -o pulse for PulseAudio routing
  • Alarm: Pre-generated alarm.mp3 — uploaded once at startup, loops continuously during alert state
  • Guidance: Dynamically generated MP3 via ElevenLabs, uploaded per-event via SFTP
  • Latency: No re-upload needed for alarm — only guidance audio transfers per trigger
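
A sketch of the guidance playback step, assuming an already-connected paramiko.SSHClient and an illustrative remote path:

import paramiko

def play_on_pi(ssh: paramiko.SSHClient, local_mp3: str, remote_mp3: str = "/tmp/ergo_guidance.mp3"):
    """Upload a generated MP3 over SFTP, then play it through the Pi's PulseAudio Bluetooth sink."""
    sftp = ssh.open_sftp()
    sftp.put(local_mp3, remote_mp3)                     # per-event guidance transfer
    sftp.close()
    # -o pulse routes mpg123 through PulseAudio to the paired A2DP speaker
    _, stdout, _ = ssh.exec_command(f"mpg123 -q -o pulse {remote_mp3}")
    stdout.channel.recv_exit_status()                   # block until playback finishes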

Edge-Cloud Split

Runs on Pi                  | Runs on Laptop
Camera capture (rpicam-vid) | YOLOv8n-Pose inference (~12ms)
GPIO button listener        | Distress detection logic
Audio playback (mpg123)     | Gemini 2.5 Flash scene analysis
Bluetooth audio routing     | ElevenLabs TTS generation
                            | OpenCV frame processing

The Pi handles all I/O (camera, button, speaker) while the laptop handles all compute (ML inference, API calls). Communication is over a single persistent SSH connection via Paramiko, eliminating the ~3s handshake overhead per operation.
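
A sketch of that persistent-connection pattern (connect once, then reuse the same client and SFTP session for every operation); pi_connection.py's actual interface may differ:

import os

import paramiko

class PiConnection:
    """One long-lived SSH/SFTP session instead of a fresh handshake per operation."""

    def __init__(self):
        self.client = paramiko.SSHClient()
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(
            os.environ["PI_HOST"],
            username=os.environ["PI_USER"],
            password=os.environ.get("PI_PASSWORD"),
        )
        self.sftp = self.client.open_sftp()      # reused for every file transfer

    def run(self, command: str) -> str:
        _, stdout, _ = self.client.exec_command(command)
        return stdout.read().decode()

    def upload(self, local: str, remote: str) -> None:
        self.sftp.put(local, remote)

    def close(self) -> None:
        self.sftp.close()
        self.client.close()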


Prerequisites

Hardware

  • Raspberry Pi 5 with Raspberry Pi OS
  • Pi NoIR Camera Module v2 — connected via CSI ribbon cable
  • Bluetooth Speaker — paired and connected via PulseAudio (A2DP profile)
  • Tactile Push Button — connected between GPIO 17 and GND (internal pull-up used)
  • Jumper Wires — for button to GPIO header connection
  • Laptop/PC — Python 3.10+, GPU recommended for real-time YOLOv8 inference

Raspberry Pi Setup

# Install audio and Bluetooth packages
sudo apt-get update
sudo apt-get install -y pulseaudio pulseaudio-module-bluetooth mpg123

# Start PulseAudio
pulseaudio --start

# Pair Bluetooth speaker
bluetoothctl scan on
bluetoothctl pair <MAC_ADDRESS>
bluetoothctl trust <MAC_ADDRESS>
bluetoothctl connect <MAC_ADDRESS>

# Set as default audio sink
pactl set-default-sink bluez_sink.<MAC_UNDERSCORED>.a2dp_sink

Installation

1. Clone the Repository

git clone https://github.com/tanish2007/Ergo.git
cd Ergo

2. Install Dependencies

pip install -r requirements.txt
pip install ultralytics opencv-python paramiko

3. Configure Environment Variables

Create a .env file in the project root:

GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
PI_USER=your_pi_username
PI_HOST=your_pi_hostname.local
PI_PASSWORD=your_pi_password
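
The Python side can load these with python-dotenv, assuming it is listed in requirements.txt (plain environment variables work the same if it is not):

import os
from dotenv import load_dotenv

load_dotenv()                                    # reads .env from the project root
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
PI_HOST = os.getenv("PI_HOST")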

4. Set Up Passwordless SSH (Recommended)

ssh-copy-id your_pi_username@your_pi_hostname.local

Usage

Live Monitoring Mode (Primary)

python live_view.py

Options:

--width    Frame width (default: 640)
--height   Frame height (default: 480)
--fps      Camera framerate (default: 15)
--pin      GPIO pin for button (default: 17)
--out-dir  Output directory (default: captures)

Standalone Pipeline Mode

# Button-triggered mode (waits for GPIO press)
python pipeline.py --mode button

# Single-shot with existing image
python pipeline.py --mode once --image test_photo.jpg

# Text-only (no audio playback)
python pipeline.py --mode once --image test_photo.jpg --text-only

Controls

Key         | Action
Q / ESC     | Quit application
GPIO Button | Trigger Gemini + TTS pipeline

State Machine

State      | Description                   | Transitions
MONITORING | Analyzing frames for distress | -> ALARM (5 consecutive distress frames)
ALARM      | Emergency audio looping       | -> MONITORING (distress clears) / -> PIPELINE (button press)
PIPELINE   | Running Gemini analysis + TTS | -> MONITORING (pipeline complete)
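
A compact way to express the same transitions, assuming a simple Enum-based implementation (live_view.py may structure its state handling differently):

from enum import Enum, auto

class State(Enum):
    MONITORING = auto()
    ALARM = auto()
    PIPELINE = auto()

# (state, event) -> next state, mirroring the table above
TRANSITIONS = {
    (State.MONITORING, "distress_x5"): State.ALARM,
    (State.ALARM, "distress_cleared"): State.MONITORING,
    (State.ALARM, "button_press"): State.PIPELINE,
    (State.PIPELINE, "pipeline_done"): State.MONITORING,
}

def next_state(state: State, event: str) -> State:
    return TRANSITIONS.get((state, event), state)   # events that don't apply leave the state unchanged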

Architecture Highlights

  • Persistent SSH Connections — Single paramiko connection per component eliminates ~3s handshake overhead per operation
  • Edge-Cloud Hybrid — Heavy ML inference on laptop, lightweight I/O on Pi
  • Bluetooth Audio via PulseAudio — Wireless speaker support with automatic sink routing
  • Debounced Detection — 5-frame consecutive threshold prevents false alarm triggers
  • Auto-Recovery — Alarm automatically stops when the person is no longer in distress
  • 5s Button Cooldown — Prevents duplicate pipeline triggers from accidental presses
  • Pre-generated Alarm — Static MP3 avoids API calls during emergencies

License

This project is licensed under the MIT License — see the LICENSE file for details.

Copyright (c) 2026 Ergo
