Emergency Responsive Ground Observer
Real-time distress detection and emergency guidance system powered by computer vision and AI
ERGO is a real-time emergency response system that uses a Raspberry Pi 5 with a NoIR camera to continuously monitor a scene for distress postures. When a person is detected in distress (fallen, collapsed, hands on head, etc.), the system triggers an audible alarm and provides AI-generated first-aid guidance through a Bluetooth speaker — all hands-free.
   Raspberry Pi 5                               Laptop
+---------------------------+            +----------------------------------+
| NoIR Camera (MJPEG stream)| --SSH-->   | YOLOv8n-Pose (real-time)         |
| GPIO Button (trigger)     | --SSH-->   | Distress Detection Engine        |
| Bluetooth Speaker (output)| <--SSH--   | Gemini 2.5 Flash (scene analysis)|
+---------------------------+            | ElevenLabs TTS (voice guidance)  |
                                         +----------------------------------+
MONITORING ──[5 distress frames]──> ALARM ──[distress clears]──> MONITORING
                                      │
                                      │ [button press]
                                      v
                                  PIPELINE
                                      │
                          ┌───────────┼───────────┐
                          v           v           v
                     Frame Saved   Gemini     ElevenLabs
                       as .jpg    Analysis    TTS -> MP3
                                      │
                                      v
                              Play guidance on
                              Bluetooth speaker
                                      │
                                      v
                                 MONITORING
- Continuous Monitoring — Live MJPEG stream from Pi camera analyzed frame-by-frame with YOLOv8 Pose estimation
- Distress Detection — 5 consecutive distress frames trigger the alarm (prevents false positives)
- Emergency Alarm — Pre-recorded alert loops on the Bluetooth speaker
- Auto-Recovery — Alarm stops automatically when distress posture clears
- Button Trigger — Physical GPIO button activates the AI guidance pipeline
- AI Analysis — Gemini 2.5 Flash analyzes the captured frame with pose context
- Voice Guidance — ElevenLabs converts Gemini's medical instructions to natural speech
- Audio Playback — Guidance plays through the Bluetooth speaker on the Pi
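The 5-frame debounce and auto-recovery described above can be sketched as a small counter. `DistressDebouncer` and its names are illustrative, not the project's actual code:

```python
THRESHOLD = 5  # consecutive distress frames required to raise the alarm

class DistressDebouncer:
    """Raises the alarm only after THRESHOLD consecutive distress frames."""

    def __init__(self, threshold: int = THRESHOLD):
        self.threshold = threshold
        self.count = 0
        self.alarm_active = False

    def update(self, frame_in_distress: bool) -> bool:
        """Feed one frame's verdict; return True while the alarm should sound."""
        if frame_in_distress:
            self.count += 1
            if self.count >= self.threshold:
                self.alarm_active = True
        else:
            # Any clear frame resets the streak and auto-recovers the alarm.
            self.count = 0
            self.alarm_active = False
        return self.alarm_active
```

A single clear frame both resets the streak and silences an active alarm, which is what makes recovery automatic.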
| Check | Condition | Indicator |
|---|---|---|
| Fall Detection | Head below hip level | Possible fall or collapse |
| Horizontal Torso | Shoulder-hip line is flat | Person lying down |
| Body Spread | Width-to-height ratio > 1.8 | On the ground |
| Hands Above Head | Wrists above nose | Distress signal |
| Face Covering | Both hands near face | Panic or pain response |
| Crouched/Curled | Compressed torso + head | Fetal position |
| Low in Frame | Body center below 75% of frame height | Collapsed |
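Several of these checks reduce to simple keypoint comparisons. The sketch below assumes COCO-ordered keypoints as `(x, y)` pairs in image coordinates (y grows downward); the function names and the 1.8 ratio mirror the table, but the code is illustrative rather than the project's exact implementation:

```python
# COCO keypoint indices used below (illustrative subset)
NOSE, L_SHOULDER, R_SHOULDER = 0, 5, 6
L_WRIST, R_WRIST, L_HIP, R_HIP = 9, 10, 11, 12

def head_below_hips(kp) -> bool:
    """Fall detection: nose lower in the frame than the hip midpoint."""
    hip_y = (kp[L_HIP][1] + kp[R_HIP][1]) / 2
    return kp[NOSE][1] > hip_y

def hands_above_head(kp) -> bool:
    """Distress signal: both wrists above the nose."""
    return kp[L_WRIST][1] < kp[NOSE][1] and kp[R_WRIST][1] < kp[NOSE][1]

def body_spread(kp, ratio: float = 1.8) -> bool:
    """On the ground: keypoint bounding box much wider than tall."""
    xs = [p[0] for p in kp]
    ys = [p[1] for p in kp]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return h > 0 and w / h > ratio
```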
ERGO maps medical landmarks from pose keypoints for precise first-aid guidance:
| Landmark | Location | Use Case |
|---|---|---|
| Sternum | Midpoint between shoulders | CPR compression target |
| Outer Thigh | Mid hip-to-knee | EpiPen injection site |
| Neck/Carotid | Between ear and shoulder | Pulse check |
| Chest Center | Between sternum and mid-hip | AED pad placement |
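Each landmark in the table is a simple midpoint over pose keypoints. The helpers below are an illustrative sketch (COCO keypoint indices; the left side of the body is chosen arbitrarily for the thigh):

```python
# COCO keypoint indices (illustrative subset)
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP, L_KNEE = 5, 6, 11, 12, 13

def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def sternum(kp):
    """CPR compression target: midpoint between the shoulders."""
    return midpoint(kp[L_SHOULDER], kp[R_SHOULDER])

def outer_thigh(kp):
    """EpiPen injection site: halfway from hip to knee."""
    return midpoint(kp[L_HIP], kp[L_KNEE])

def chest_center(kp):
    """AED pad placement: between the sternum and the mid-hip point."""
    return midpoint(sternum(kp), midpoint(kp[L_HIP], kp[R_HIP]))
```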
| Component | Technology | Purpose |
|---|---|---|
| Pose Estimation | YOLOv8n-Pose (Ultralytics) | Real-time body keypoint detection (~12ms inference) |
| Scene Analysis | Google Gemini 2.5 Flash | Multimodal AI for emergency scene understanding |
| Text-to-Speech | ElevenLabs Multilingual v2 | Natural voice synthesis for medical instructions |
| Edge Device | Raspberry Pi 5 | Camera capture, GPIO input, Bluetooth audio output |
| Camera | Pi NoIR Camera Module | Night-capable video streaming |
| Audio Output | Bluetooth Speaker via PulseAudio | Wireless audio playback |
| Connectivity | Paramiko (SSH/SFTP) | Persistent, low-latency Pi communication |
| Computer Vision | OpenCV | Frame processing, MJPEG decoding, display |
| Hardware Input | GPIO Button (gpiozero) | Physical panic/trigger button |
ERGO/
├── live_view.py # Main application — state machine, alarm, live display
├── pose_detector.py # YOLOv8 Pose distress detection + anatomical mapping
├── vision.py # Gemini API integration for scene analysis
├── speech.py # ElevenLabs TTS + Pi audio playback
├── pi_connection.py # Persistent SSH/SFTP connection manager
├── pipeline.py # Standalone pipeline (button mode / single-shot)
├── alarm.mp3 # Pre-generated emergency alert audio
├── yolov8n-pose.pt # YOLOv8 Nano Pose model weights
├── requirements.txt # Python dependencies
├── .env # API keys and Pi credentials (not committed)
└── captures/ # Saved frames, Gemini responses, and TTS audio
| Component | Model / Spec | Role |
|---|---|---|
| Single-Board Computer | Raspberry Pi 5 | Edge device — camera capture, GPIO input, Bluetooth audio output |
| Camera | Pi NoIR Camera Module v2 | Night-vision capable, infrared-sensitive MJPEG streaming |
| Panic Button | Tactile push button on GPIO 17 | Physical trigger for AI guidance pipeline |
| Audio Output | Bluetooth speaker via PulseAudio | Wireless alarm and voice guidance playback |
| Processing Unit | Laptop/PC (GPU recommended) | Runs YOLOv8 pose inference, Gemini API, ElevenLabs TTS |
Raspberry Pi 5 GPIO Header
─────────────────────────────────────────
          3V3  (1)  (2)  5V
        GPIO2  (3)  (4)  5V
        GPIO3  (5)  (6)  GND
        GPIO4  (7)  (8)  GPIO14
          GND  (9)  (10) GPIO15
 ┌───► GPIO17  (11) (12) GPIO18
 │     GPIO27  (13) (14) GND
 │     GPIO22  (15) (16) GPIO23
 │        3V3  (17) (18) GPIO24
 │     GPIO10  (19) (20) GND
 │
 │    ┌──────────┐
 └────┤  Button  ├──── GND (Pin 9)
      └──────────┘
      (internal pull-up enabled)
NoIR Camera ──► CSI Ribbon Cable ──► Pi Camera Port
BT Speaker ──► Bluetooth A2DP ──► PulseAudio Sink
| Parameter | Value |
|---|---|
| Sensor | Sony IMX219 NoIR (no infrared filter) |
| Resolution | 640x480 (default, configurable) |
| Frame Rate | 15 FPS (default, configurable) |
| Codec | MJPEG (Motion JPEG) |
| Stream Tool | rpicam-vid (native Pi camera stack) |
| Transport | MJPEG over SSH pipe |
| Rotation | 90° counter-clockwise (software) |
| Night Vision | Yes — IR-sensitive sensor for low-light monitoring |
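Because the transport is raw MJPEG over an SSH pipe, frames can be recovered by scanning the byte stream for JPEG start/end markers. `extract_frames` below is an illustrative helper, not the project's code; the real loop would decode each complete frame with OpenCV's `cv2.imdecode` and apply the 90° rotation:

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_frames(buffer: bytes):
    """Return (complete JPEG frames found in buffer, leftover bytes)."""
    frames = []
    while True:
        start = buffer.find(SOI)
        end = buffer.find(EOI, start + 2)
        if start == -1 or end == -1:
            break  # no complete frame yet; keep the remainder for next read
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
    return frames, buffer

# Each frame would then be decoded and rotated, e.g.:
#   img = cv2.imdecode(np.frombuffer(frame, np.uint8), cv2.IMREAD_COLOR)
#   img = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
```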
┌────────────┐ SFTP ┌─────────────┐ PulseAudio ┌──────────────────┐
│ Laptop │ ──────────► │ Raspberry │ ──────────────► │ Bluetooth │
│ │ .mp3 file │ Pi 5 │ -o pulse │ Speaker │
│ ElevenLabs │ │ mpg123 │ │ (A2DP Sink) │
│ TTS │ │ │ │ │
└────────────┘ └─────────────┘ └──────────────────┘
- Player: `mpg123` with `-o pulse` for PulseAudio routing
- Alarm: Pre-generated `alarm.mp3` — uploaded once at startup, loops continuously during alert state
- Guidance: Dynamically generated MP3 via ElevenLabs, uploaded per-event via SFTP
- Latency: No re-upload needed for alarm — only guidance audio transfers per trigger
| Runs on Pi | Runs on Laptop |
|---|---|
| Camera capture (rpicam-vid) | YOLOv8n-Pose inference (~12ms) |
| GPIO button listener | Distress detection logic |
| Audio playback (mpg123) | Gemini 2.5 Flash scene analysis |
| Bluetooth audio routing | ElevenLabs TTS generation |
| | OpenCV frame processing |
The Pi handles all I/O (camera, button, speaker) while the laptop handles all compute (ML inference, API calls). Communication is over a single persistent SSH connection via Paramiko, eliminating the ~3s handshake overhead per operation.
- Raspberry Pi 5 with Raspberry Pi OS
- Pi NoIR Camera Module v2 — connected via CSI ribbon cable
- Bluetooth Speaker — paired and connected via PulseAudio (A2DP profile)
- Tactile Push Button — connected between GPIO 17 and GND (internal pull-up used)
- Jumper Wires — for button to GPIO header connection
- Laptop/PC — Python 3.10+, GPU recommended for real-time YOLOv8 inference
# Install audio and Bluetooth packages
sudo apt-get update
sudo apt-get install -y pulseaudio pulseaudio-module-bluetooth mpg123
# Start PulseAudio
pulseaudio --start
# Pair Bluetooth speaker
bluetoothctl scan on
bluetoothctl pair <MAC_ADDRESS>
bluetoothctl trust <MAC_ADDRESS>
bluetoothctl connect <MAC_ADDRESS>
# Set as default audio sink
pactl set-default-sink bluez_sink.<MAC_UNDERSCORED>.a2dp_sink
git clone https://github.com/yourusername/aegis.git
cd aegis
pip install -r requirements.txt
pip install ultralytics opencv-python paramiko
Create a .env file in the project root:
GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
PI_USER=your_pi_username
PI_HOST=your_pi_hostname.local
PI_PASSWORD=your_pi_password
ssh-copy-id your_pi_username@your_pi_hostname.local
python live_view.py
Options:
--width Frame width (default: 640)
--height Frame height (default: 480)
--fps Camera framerate (default: 15)
--pin GPIO pin for button (default: 17)
--out-dir Output directory (default: captures)
# Button-triggered mode (waits for GPIO press)
python pipeline.py --mode button
# Single-shot with existing image
python pipeline.py --mode once --image test_photo.jpg
# Text-only (no audio playback)
python pipeline.py --mode once --image test_photo.jpg --text-only

| Key | Action |
|---|---|
| Q / ESC | Quit application |
| GPIO Button | Trigger Gemini + TTS pipeline |
| State | Description | Transitions |
|---|---|---|
| MONITORING | Analyzing frames for distress | -> ALARM (5 consecutive distress frames) |
| ALARM | Emergency audio looping | -> MONITORING (distress clears) / -> PIPELINE (button press) |
| PIPELINE | Running Gemini analysis + TTS | -> MONITORING (pipeline complete) |
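The state table reduces to a small transition function. `State` and `next_state` below are illustrative names, with the inputs simplified to a frame count and two booleans:

```python
from enum import Enum, auto

class State(Enum):
    MONITORING = auto()
    ALARM = auto()
    PIPELINE = auto()

def next_state(state, distress_streak=0, button=False, pipeline_done=False):
    """One step of the simplified ERGO state machine."""
    if button and state is not State.PIPELINE:
        return State.PIPELINE              # button press starts guidance
    if state is State.MONITORING and distress_streak >= 5:
        return State.ALARM                 # 5 consecutive distress frames
    if state is State.ALARM and distress_streak == 0:
        return State.MONITORING            # auto-recovery when distress clears
    if state is State.PIPELINE and pipeline_done:
        return State.MONITORING            # guidance finished
    return state
```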
- Persistent SSH Connections — Single paramiko connection per component eliminates ~3s handshake overhead per operation
- Edge-Cloud Hybrid — Heavy ML inference on laptop, lightweight I/O on Pi
- Bluetooth Audio via PulseAudio — Wireless speaker support with automatic sink routing
- Debounced Detection — 5-frame consecutive threshold prevents false alarm triggers
- Auto-Recovery — Alarm automatically stops when the person is no longer in distress
- 5s Button Cooldown — Prevents duplicate pipeline triggers from accidental presses
- Pre-generated Alarm — Static MP3 avoids API calls during emergencies
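The 5s button cooldown, for example, can be expressed as a guard around the trigger. `Cooldown` is an illustrative sketch; the injectable clock exists only to make it testable:

```python
import time

class Cooldown:
    """Ignores triggers that arrive within `seconds` of the last accepted one."""

    def __init__(self, seconds: float = 5.0, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self.last = float("-inf")

    def fire(self) -> bool:
        now = self.clock()
        if now - self.last < self.seconds:
            return False  # still cooling down: ignore duplicate press
        self.last = now
        return True
```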
This project is licensed under the MIT License — see the LICENSE file for details.
Copyright (c) 2026 Ergo