Skip to content

yb235/Interview_System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

17 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Interview System

A real-time multi-modal interview analysis system that combines pose detection, speech-to-text, and emotion recognition to provide comprehensive behavioral insights during interviews.

πŸš€ Quick Start

Performance Optimized Versions (NEW!)

For CPU-only systems:

python interview_system_v6_optimized.py

Delivers 18-22 FPS (3-4x faster than baseline) with frame skipping and lightweight models

For GPU systems:

python interview_system_onnx_full.py

Delivers 35-60 FPS with ONNX Runtime and hardware acceleration

πŸ“– Full Performance Guide


⚑ Performance Improvements

NEW Performance Optimized Versions Available!

Version FPS (CPU) FPS (GPU) Memory Best For
v5 (baseline) 6-10 30-35 900 MB Reference
v6 optimized 18-22 30-35 550 MB CPU users
ONNX full 12-15 35-60 280 MB GPU users

Key Improvements:

  • βœ… 3-6x FPS improvement (6-10 β†’ 18-60 FPS)
  • βœ… 40% memory reduction (900 β†’ 280-550 MB)
  • βœ… 60% faster startup (10s β†’ 3-5s)
  • βœ… Configurable performance vs accuracy
  • βœ… GPU acceleration (CUDA, DirectML)

πŸ“Š Performance Results & Benchmarks


Features

Core Capabilities

  • Real-time Pose Detection: Detects 10+ body language actions

    • Arms crossed, hands clasped, chin rest
    • Lean forward/backward, head down
    • Touch face/nose, fix hair, fidget hands
  • Speech-to-Text: Live transcription using Whisper

    • Multi-language support
    • Accurate transcription with timestamps
  • Emotion Analysis:

    • Facial emotion detection (optional, configurable)
    • Voice emotion analysis (energy-based)
  • Multi-threading: Concurrent video and audio processing

  • Comprehensive Logging: JSON exports with timestamps

    • Individual logs per feature
    • Combined log for integrated analysis

Available Versions

Production Versions

Version Description FPS Use Case
v6_optimized.py Performance optimized with frame skipping 18-22 CPU-only systems
onnx_full.py ONNX Runtime with GPU acceleration 35-60 GPU systems

Legacy Versions

Version Description FPS Status
interview_system.py (v1) Basic pose + STT 6-10 Stable
interview_system_v2.py + Facial emotion 3-4 Laggy
interview_system_v3.py Enhanced emotion 3-4 Laggy
interview_system_v4.py + Voice emotion 6-10 Stable
interview_system_v5.py Separate logs 6-10 Stable

Installation

Basic Dependencies

pip install opencv-python numpy ultralytics
pip install faster-whisper sounddevice
pip install deepface  # Optional, for facial emotion

For ONNX Acceleration (Recommended)

CPU or Any GPU (Windows):

pip install onnxruntime-directml  # DirectML support

NVIDIA GPU (Best Performance):

pip install onnxruntime-gpu  # CUDA support

CPU Only:

pip install onnxruntime  # Fallback

Model Files

  • yolov8n-pose.pt - Lightweight pose model (6.6 MB)
  • yolo11m-pose.pt - Standard pose model (41 MB)
  • yolo11m-pose.onnx - ONNX pose model (81 MB)

Models are downloaded automatically on first run.


Usage

v6 Optimized (CPU-Friendly)

python interview_system_v6_optimized.py

Configuration Options:

USE_LIGHTWEIGHT_MODEL = True   # YOLOv8n-pose (3x faster)
SKIP_FRAMES = 2                # Process every 3rd frame
EMOTION_CHECK_INTERVAL = 30    # Check emotion every 30 frames
ENABLE_FACIAL_EMOTION = True   # Set False to disable

Performance: 18-22 FPS on CPU


ONNX Full (GPU-Accelerated)

python interview_system_onnx_full.py

Auto-Detects:

  • CUDA (NVIDIA GPU) - Fastest
  • DirectML (Any GPU on Windows) - Fast
  • CPU - Fallback

Performance: 35-60 FPS with GPU


Legacy Version (v5)

python interview_system_v5.py

Performance: 6-10 FPS on CPU (baseline)


Configuration Guide

For Different Hardware

Budget Laptop (CPU-only)

# v6_optimized.py
USE_LIGHTWEIGHT_MODEL = True
SKIP_FRAMES = 2
ENABLE_FACIAL_EMOTION = False

Expected: 15-18 FPS, 250 MB RAM

Mid-Range Desktop

# v6_optimized.py or onnx_full.py
SKIP_FRAMES = 1
ENABLE_FACIAL_EMOTION = True

Expected: 20-30 FPS, 280-350 MB RAM

Gaming Laptop/Desktop (GPU)

# onnx_full.py
SKIP_FRAMES = 0  # No skipping needed
ENABLE_FACIAL_EMOTION = True

Expected: 40-60 FPS, 250-300 MB RAM


Output Files

Generated Logs

All versions produce JSON logs with timestamps:

  • action_log.json - Detected body language actions
  • transcription_log.json - Speech-to-text results
  • voice_emotion_log.json - Voice emotion analysis
  • facial_emotion_log.json - Facial emotion analysis
  • combined_log.json - Merged data per second

Example:

{
  "time": "00:05",
  "timestamp_seconds": 5.0,
  "actions": ["arms_crossed", "lean_back"],
  "texts": ["I'm ready for the interview"],
  "facial_emotions": ["neutral"],
  "voice_emotions": ["calm"]
}

Performance Benchmarking

Test your system's performance:

python benchmark_performance.py

Output:

  • YOLOv8n-pose vs YOLO11m-pose comparison
  • PyTorch vs ONNX comparison
  • CPU vs GPU performance
  • Improvement percentages

Documentation

Performance Optimization

Video Processing


Keyboard Controls

Key Action
q Quit and save logs
ESC Alternative quit

System Requirements

Minimum (CPU-only)

  • CPU: Intel i5-8th gen or AMD Ryzen 5
  • RAM: 8 GB
  • Storage: 500 MB for models
  • OS: Windows 10+, Linux, macOS

Configuration: v6 with YOLOv8n + skip=2
Performance: 15-18 FPS

Recommended (GPU)

  • CPU: Intel i7-10th gen or AMD Ryzen 7
  • GPU: NVIDIA GTX 1650+ or AMD RX 6600+
  • RAM: 8 GB
  • VRAM: 2 GB

Configuration: ONNX full with DirectML/CUDA
Performance: 35-40 FPS

Optimal (High-end)

  • CPU: Intel i7-12th gen or AMD Ryzen 7 5000+
  • GPU: NVIDIA RTX 3060+ or AMD RX 6700+
  • RAM: 16 GB
  • VRAM: 4 GB

Configuration: ONNX full with CUDA
Performance: 50-60 FPS


Troubleshooting

Low FPS (< 10 FPS)

  1. Switch to v6 optimized
  2. Enable lightweight model: USE_LIGHTWEIGHT_MODEL = True
  3. Increase frame skipping: SKIP_FRAMES = 3
  4. Disable facial emotion: ENABLE_FACIAL_EMOTION = False

GPU Not Being Used (ONNX)

  1. Install correct ONNX Runtime:
    • NVIDIA: pip install onnxruntime-gpu
    • Windows (any GPU): pip install onnxruntime-directml
  2. Update GPU drivers
  3. Check providers at startup (printed in console)

High Memory Usage

  1. Use lightweight model: USE_LIGHTWEIGHT_MODEL = True
  2. Disable facial emotion: ENABLE_FACIAL_EMOTION = False
  3. Use ONNX version (more efficient)

Known Limitations

YOLOv8n-pose (Lightweight Model)

  • 3-5% lower accuracy vs YOLO11m-pose
  • Sufficient for interview body language
  • Not suitable for fine-grained hand gestures

Frame Skipping

  • 33-66ms delay in action updates
  • Acceptable for sitting subjects
  • Not suitable for fast-paced activities

Facial Emotion

  • Requires good lighting
  • Can be slow on CPU (120ms per check)
  • Consider disabling on low-end hardware

Future Enhancements

Planned Optimizations

  • Adaptive frame skipping (motion-based)
  • Async emotion detection (non-blocking)
  • FP16 quantized models (2x speedup)
  • TensorRT optimization (NVIDIA)
  • Batch processing support

Additional Features

  • Multi-person interview support
  • Action heatmap visualization
  • Real-time feedback dashboard
  • Cloud integration options

Performance Summary

Optimization Speedup Memory Saved Effort
Frame Skipping 3x - Low
Lightweight Model 3x 200 MB Minimal
ONNX Runtime 1.5-6x 100 MB Low
Emotion Frequency Fixes v2-v3 - Low
Combined (CPU) 3-4x 300-350 MB Low
Combined (GPU) 4-6x 400-600 MB Low

Contributing

Adding New Actions

See: Adding New Actions Guide

Improving Performance

  1. Profile with benchmark_performance.py
  2. Implement optimization
  3. Measure improvement
  4. Document in performance docs

Documentation

  • Update relevant markdown files
  • Include performance metrics
  • Provide code examples
  • Note any trade-offs

License

This project is available for educational and research purposes.


Citation

If you use this system in your research, please cite:

Interview System - Real-time Multi-modal Interview Analysis
https://github.com/yb235/Interview_System

Changelog

v6 Optimized (2025-11-23)

  • ✨ Frame skipping implementation (3x effective speedup)
  • ✨ Lightweight model option (YOLOv8n-pose)
  • ✨ Configurable emotion detection frequency
  • ✨ Lazy model loading (50% faster startup)
  • πŸ“Š 18-22 FPS on CPU (vs 6-10 baseline)

ONNX Full (2025-11-23)

  • ✨ ONNX Runtime integration
  • ✨ Hardware acceleration (CUDA, DirectML)
  • ✨ Auto-detect best provider
  • ✨ Optimized preprocessing
  • πŸ“Š 35-60 FPS with GPU (vs 30-35 PyTorch)

Performance Documentation (2025-11-23)

  • πŸ“ Comprehensive performance analysis
  • πŸ“ Improvement plan with metrics
  • πŸ“ Code examples and templates
  • πŸ“ Performance results with benchmarks
  • πŸ“ Usage guide with troubleshooting

Contact

For questions, issues, or contributions:

  • Open an issue on GitHub
  • Include system specs and benchmark results
  • Reference relevant documentation

Status: βœ… Production Ready
Latest Version: v6 Optimized + ONNX Full
Last Updated: 2025-11-23

About

It is designed for get people's action, facial emotion, voice emotion and transcription by mic and camera in your own PC.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages