A real-time multi-modal interview analysis system that combines pose detection, speech-to-text, and emotion recognition to provide comprehensive behavioral insights during interviews.
For CPU-only systems:

`python interview_system_v6_optimized.py`

Delivers 18-22 FPS (3-4x faster than baseline) with frame skipping and lightweight models.

For GPU systems:

`python interview_system_onnx_full.py`

Delivers 35-60 FPS with ONNX Runtime and hardware acceleration.
NEW Performance Optimized Versions Available!
| Version | FPS (CPU) | FPS (GPU) | Memory | Best For |
|---|---|---|---|---|
| v5 (baseline) | 6-10 | 30-35 | 900 MB | Reference |
| v6 optimized | 18-22 | 30-35 | 550 MB | CPU users |
| ONNX full | 12-15 | 35-60 | 280 MB | GPU users |
Key Improvements:
- 3-6x FPS improvement (6-10 → 18-60 FPS)
- 40% memory reduction (900 → 280-550 MB)
- 60% faster startup (10s → 3-5s)
- Configurable performance vs accuracy
- GPU acceleration (CUDA, DirectML)

Performance Results & Benchmarks
- Real-time Pose Detection: Detects 10+ body language actions
  - Arms crossed, hands clasped, chin rest
  - Lean forward/backward, head down
  - Touch face/nose, fix hair, fidget hands
- Speech-to-Text: Live transcription using Whisper
  - Multi-language support
  - Accurate transcription with timestamps
- Emotion Analysis:
  - Facial emotion detection (optional, configurable)
  - Voice emotion analysis (energy-based)
- Multi-threading: Concurrent video and audio processing
- Comprehensive Logging: JSON exports with timestamps
  - Individual logs per feature
  - Combined log for integrated analysis
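
This README does not show how actions are derived from pose keypoints. As a rough illustration only, the sketch below checks for "arms crossed" using the COCO 17-keypoint layout that YOLO pose models output. The rule itself (wrists swapped left-to-right and positioned between shoulders and hips) and the function name `arms_crossed` are assumptions for this example, not the project's actual logic.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels

# COCO keypoint indices (the 17-point layout used by YOLO pose models).
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12


def arms_crossed(kpts: Sequence[Point]) -> bool:
    """Hypothetical rule: wrists have swapped sides and sit on the torso."""
    ls, rs = kpts[L_SHOULDER], kpts[R_SHOULDER]
    lw, rw = kpts[L_WRIST], kpts[R_WRIST]
    shoulder_y = (ls[1] + rs[1]) / 2
    hip_y = (kpts[L_HIP][1] + kpts[R_HIP][1]) / 2

    # Wrists are "swapped" when their left/right order is the reverse
    # of the shoulders' left/right order.
    swapped = (lw[0] - rw[0]) * (ls[0] - rs[0]) < 0
    # Image y grows downward, so the torso spans shoulder_y..hip_y.
    on_torso = all(shoulder_y < w[1] < hip_y for w in (lw, rw))
    return swapped and on_torso
```

A real detector would likely also check keypoint confidence and smooth the result over several frames to avoid flicker.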
| Version | Description | FPS | Use Case |
|---|---|---|---|
| v6_optimized.py | Performance optimized with frame skipping | 18-22 | CPU-only systems |
| onnx_full.py | ONNX Runtime with GPU acceleration | 35-60 | GPU systems |

Earlier versions:

| Version | Description | FPS | Status |
|---|---|---|---|
| interview_system.py (v1) | Basic pose + STT | 6-10 | Stable |
| interview_system_v2.py | + Facial emotion | 3-4 | Laggy |
| interview_system_v3.py | Enhanced emotion | 3-4 | Laggy |
| interview_system_v4.py | + Voice emotion | 6-10 | Stable |
| interview_system_v5.py | Separate logs | 6-10 | Stable |
`pip install opencv-python numpy ultralytics`

`pip install faster-whisper sounddevice`

`pip install deepface` (optional, for facial emotion)

CPU or Any GPU (Windows):

`pip install onnxruntime-directml` (DirectML support)

NVIDIA GPU (Best Performance):

`pip install onnxruntime-gpu` (CUDA support)

CPU Only:

`pip install onnxruntime` (fallback)

- `yolov8n-pose.pt`: Lightweight pose model (6.6 MB)
- `yolo11m-pose.pt`: Standard pose model (41 MB)
- `yolo11m-pose.onnx`: ONNX pose model (81 MB)

Models are downloaded automatically on first run.

`python interview_system_v6_optimized.py`

Configuration Options:
- `USE_LIGHTWEIGHT_MODEL = True`: use YOLOv8n-pose (3x faster)
- `SKIP_FRAMES = 2`: process every 3rd frame
- `EMOTION_CHECK_INTERVAL = 30`: check emotion every 30 frames
- `ENABLE_FACIAL_EMOTION = True`: set to `False` to disable

Performance: 18-22 FPS on CPU
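
To make the effect of `SKIP_FRAMES` and `EMOTION_CHECK_INTERVAL` concrete, here is a minimal sketch of a frame loop that runs pose detection on every third frame and the emotion check every 30 frames, reusing the most recent result in between. The function `run_pipeline` and its callback parameters are hypothetical names for illustration; the actual loop in `interview_system_v6_optimized.py` may be organized differently.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

SKIP_FRAMES = 2               # process every 3rd frame
EMOTION_CHECK_INTERVAL = 30   # run the emotion check every 30 frames


def run_pipeline(
    frames: Iterable[Any],
    detect_pose: Callable[[Any], Any],
    detect_emotion: Callable[[Any], Any],
    skip_frames: int = SKIP_FRAMES,
    emotion_interval: int = EMOTION_CHECK_INTERVAL,
) -> Iterator[Tuple[int, Optional[Any], Optional[Any]]]:
    """Yield (frame_index, pose, emotion), reusing stale results on skipped frames."""
    last_pose = None
    last_emotion = None
    for idx, frame in enumerate(frames):
        # With skip_frames = 2, only indices 0, 3, 6, ... reach the pose model.
        if idx % (skip_frames + 1) == 0:
            last_pose = detect_pose(frame)
        # The (slower) emotion model runs far less often.
        if idx % emotion_interval == 0:
            last_emotion = detect_emotion(frame)
        yield idx, last_pose, last_emotion
```

Every frame can still be displayed, so the video looks smooth while the models run on only a fraction of the frames.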

`python interview_system_onnx_full.py`

Auto-Detects:
- CUDA (NVIDIA GPU) - Fastest
- DirectML (Any GPU on Windows) - Fast
- CPU - Fallback
Performance: 35-60 FPS with GPU
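
The provider priority above (CUDA, then DirectML, then CPU) could be expressed roughly as follows. This is a sketch, not the code from `interview_system_onnx_full.py`: `pick_providers` and `available_providers` are hypothetical helpers, while the provider strings and `onnxruntime.get_available_providers()` are standard ONNX Runtime names.

```python
from typing import List, Sequence

# Preference order: CUDA (fastest), then DirectML, then CPU as fallback.
PROVIDER_PRIORITY = (
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the available providers sorted by preference, never empty."""
    chosen = [p for p in PROVIDER_PRIORITY if p in available]
    return chosen or ["CPUExecutionProvider"]


def available_providers() -> List[str]:
    """Ask ONNX Runtime what it supports; assume CPU-only if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return ["CPUExecutionProvider"]
    return list(ort.get_available_providers())


if __name__ == "__main__":
    print("Using providers:", pick_providers(available_providers()))
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession`, which tries the entries in order.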

`python interview_system_v5.py`

Performance: 6-10 FPS on CPU (baseline)

In `v6_optimized.py`:
- `USE_LIGHTWEIGHT_MODEL = True`
- `SKIP_FRAMES = 2`
- `ENABLE_FACIAL_EMOTION = False`

Expected: 15-18 FPS, 250 MB RAM

In `v6_optimized.py` or `onnx_full.py`:
- `SKIP_FRAMES = 1`
- `ENABLE_FACIAL_EMOTION = True`

Expected: 20-30 FPS, 280-350 MB RAM

In `onnx_full.py`:
- `SKIP_FRAMES = 0` (no skipping needed)
- `ENABLE_FACIAL_EMOTION = True`

Expected: 40-60 FPS, 250-300 MB RAM

All versions produce JSON logs with timestamps:
- `action_log.json`: Detected body language actions
- `transcription_log.json`: Speech-to-text results
- `voice_emotion_log.json`: Voice emotion analysis
- `facial_emotion_log.json`: Facial emotion analysis
- `combined_log.json`: Merged data per second

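
One way to picture the per-second merge behind `combined_log.json` is the sketch below. It assumes each feature log can be read as a list of `(timestamp_seconds, value)` pairs, which is an assumption about the individual log formats; the output field names follow the combined-log entry format documented in this README. `merge_logs` is a hypothetical helper, not a function from the project.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp_seconds, value)


def merge_logs(
    actions: Sequence[Event],
    texts: Sequence[Event],
    facial: Sequence[Event],
    voice: Sequence[Event],
) -> List[Dict]:
    """Group events from all four logs into one entry per whole second."""
    keys = ("actions", "texts", "facial_emotions", "voice_emotions")
    buckets = defaultdict(lambda: {k: [] for k in keys})

    for key, events in zip(keys, (actions, texts, facial, voice)):
        for ts, value in events:
            second = int(ts)  # truncate to the containing second
            if value not in buckets[second][key]:
                buckets[second][key].append(value)

    combined = []
    for second in sorted(buckets):
        entry = {
            "time": f"{second // 60:02d}:{second % 60:02d}",
            "timestamp_seconds": float(second),
        }
        entry.update(buckets[second])
        combined.append(entry)
    return combined
```

The returned list can be written out with `json.dump(combined, f, indent=2)`.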
Example:
{
"time": "00:05",
"timestamp_seconds": 5.0,
"actions": ["arms_crossed", "lean_back"],
"texts": ["I'm ready for the interview"],
"facial_emotions": ["neutral"],
"voice_emotions": ["calm"]
}

Test your system's performance:

`python benchmark_performance.py`

Output:
- YOLOv8n-pose vs YOLO11m-pose comparison
- PyTorch vs ONNX comparison
- CPU vs GPU performance
- Improvement percentages
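
For readers who want to time their own model or configuration, the kind of measurement such a benchmark reports can be approximated with a few lines of standard-library Python. This is a generic sketch and not the contents of `benchmark_performance.py`; `measure_fps` and `improvement_percent` are hypothetical helpers, and `step` stands in for whatever per-frame work is being timed.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], object], iterations: int = 50, warmup: int = 5) -> float:
    """Time `iterations` calls of `step` and return calls per second."""
    for _ in range(warmup):
        step()  # warm-up runs are excluded (model loading, caches)
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return iterations / max(elapsed, 1e-9)  # guard against a zero interval


def improvement_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative FPS gain of `new_fps` over `baseline_fps`, in percent."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0


if __name__ == "__main__":
    fps = measure_fps(lambda: sum(range(100_000)))
    print(f"{fps:.1f} iterations/s")
    print(f"{improvement_percent(8.0, 20.0):.0f}% improvement from 8 to 20 FPS")
```

Warm-up iterations matter here because the first few inferences of a model are usually slower than steady state.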
- Performance Optimized Usage Guide
- Performance Results & Benchmarks
- Performance Analysis
- Improvement Plan
- Code Examples
- Architecture Overview
- Model Inference
- Motion Recognition
- Adding New Actions
- Configuration
- ONNX Acceleration

| Key | Action |
|---|---|
| q | Quit and save logs |
| ESC | Alternative quit |
- CPU: Intel i5-8th gen or AMD Ryzen 5
- RAM: 8 GB
- Storage: 500 MB for models
- OS: Windows 10+, Linux, macOS
Configuration: v6 with YOLOv8n + skip=2
Performance: 15-18 FPS
- CPU: Intel i7-10th gen or AMD Ryzen 7
- GPU: NVIDIA GTX 1650+ or AMD RX 6600+
- RAM: 8 GB
- VRAM: 2 GB
Configuration: ONNX full with DirectML/CUDA
Performance: 35-40 FPS
- CPU: Intel i7-12th gen or AMD Ryzen 7 5000+
- GPU: NVIDIA RTX 3060+ or AMD RX 6700+
- RAM: 16 GB
- VRAM: 4 GB
Configuration: ONNX full with CUDA
Performance: 50-60 FPS
- Switch to v6 optimized
- Enable lightweight model: `USE_LIGHTWEIGHT_MODEL = True`
- Increase frame skipping: `SKIP_FRAMES = 3`
- Disable facial emotion: `ENABLE_FACIAL_EMOTION = False`

- Install the correct ONNX Runtime package:
  - NVIDIA: `pip install onnxruntime-gpu`
  - Windows (any GPU): `pip install onnxruntime-directml`
- Update GPU drivers
- Check providers at startup (printed in console)

- Use lightweight model: `USE_LIGHTWEIGHT_MODEL = True`
- Disable facial emotion: `ENABLE_FACIAL_EMOTION = False`
- Use ONNX version (more efficient)

- Lightweight model (YOLOv8n-pose):
  - 3-5% lower accuracy vs YOLO11m-pose
  - Sufficient for interview body language
  - Not suitable for fine-grained hand gestures
- Frame skipping:
  - 33-66 ms delay in action updates
  - Acceptable for sitting subjects
  - Not suitable for fast-paced activities
- Facial emotion detection:
  - Requires good lighting
  - Can be slow on CPU (120 ms per check)
  - Consider disabling on low-end hardware
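
The 33-66 ms delay quoted for frame skipping is consistent with a 30 FPS camera: each skipped frame adds one frame period (about 33 ms) before the next pose update. The camera frame rate is an assumption here, since this README does not state it, and `update_delay_ms` is a hypothetical helper used only to show the arithmetic.

```python
def update_delay_ms(skip_frames: int, camera_fps: float = 30.0) -> float:
    """Extra delay before the next pose update, in milliseconds.

    Assumes the camera delivers frames at a steady `camera_fps`.
    """
    frame_period_ms = 1000.0 / camera_fps
    return skip_frames * frame_period_ms


if __name__ == "__main__":
    for skip in (1, 2, 3):
        print(f"SKIP_FRAMES = {skip}: ~{update_delay_ms(skip):.0f} ms")
```

Under the same assumption, raising `SKIP_FRAMES` to 3 would push the delay to roughly 100 ms.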
- Adaptive frame skipping (motion-based)
- Async emotion detection (non-blocking)
- FP16 quantized models (2x speedup)
- TensorRT optimization (NVIDIA)
- Batch processing support
- Multi-person interview support
- Action heatmap visualization
- Real-time feedback dashboard
- Cloud integration options
| Optimization | Speedup | Memory Saved | Effort |
|---|---|---|---|
| Frame Skipping | 3x | - | Low |
| Lightweight Model | 3x | 200 MB | Minimal |
| ONNX Runtime | 1.5-6x | 100 MB | Low |
| Emotion Frequency | Fixes v2-v3 | - | Low |
| Combined (CPU) | 3-4x | 300-350 MB | Low |
| Combined (GPU) | 4-6x | 400-600 MB | Low |
- Profile with `benchmark_performance.py`
- Implement optimization
- Measure improvement
- Document in performance docs
- Update relevant markdown files
- Include performance metrics
- Provide code examples
- Note any trade-offs
This project is available for educational and research purposes.
If you use this system in your research, please cite:
Interview System - Real-time Multi-modal Interview Analysis
https://github.com/yb235/Interview_System
- Frame skipping implementation (3x effective speedup)
- Lightweight model option (YOLOv8n-pose)
- Configurable emotion detection frequency
- Lazy model loading (50% faster startup)
- 18-22 FPS on CPU (vs 6-10 baseline)
- ONNX Runtime integration
- Hardware acceleration (CUDA, DirectML)
- Auto-detect best provider
- Optimized preprocessing
- 35-60 FPS with GPU (vs 30-35 PyTorch)
- Comprehensive performance analysis
- Improvement plan with metrics
- Code examples and templates
- Performance results with benchmarks
- Usage guide with troubleshooting

For questions, issues, or contributions:
- Open an issue on GitHub
- Include system specs and benchmark results
- Reference relevant documentation

Status: Production Ready
Latest Version: v6 Optimized + ONNX Full
Last Updated: 2025-11-23