A comprehensive real-time object detection and tracking system built with deep learning. It supports multiple state-of-the-art detection models (YOLO, SSD, Faster R-CNN) and object tracking algorithms, includes a custom model training pipeline, and provides both a REST API and a web interface for easy integration.
**Multiple Detection Models**
- YOLOv8 (Nano, Small, Medium, Large, XL variants)
- Faster R-CNN with ResNet-50 FPN backbone
- SSD300 with VGG16 backbone
**Object Tracking**
- SORT (Simple Online and Realtime Tracking)
- DeepSORT with appearance features
- Multi-object tracking with trajectory visualization
**Real-time Processing**
- Live video stream processing
- Webcam support
- Video file processing with output
- Frame-by-frame image detection
**Analytics & Counting**
- Object counting and tracking
- Class-wise statistics
- Trajectory heatmap generation
- FPS monitoring
**Custom Model Training**
- YOLO model training pipeline
- Dataset organization utilities
- Model export (ONNX, TensorRT, TorchScript)
**Model Evaluation**
- Precision and Recall metrics
- mAP (mean Average Precision) calculation
- FPS benchmarking
**Performance Optimization**
- ONNX export for cross-platform deployment
- TensorRT optimization for NVIDIA GPUs
- FP16 half-precision support
- Model quantization
**REST API**
- Image detection endpoint
- Video processing endpoint
- WebSocket streaming support
- Analytics API
**Web Interface**
- Live webcam detection
- Image upload and visualization
- Video processing with download
- Real-time analytics dashboard
**Requirements**

- Python 3.8 or higher
- CUDA-capable GPU (recommended for real-time performance)
- CUDA Toolkit 11.8+ and cuDNN (for GPU acceleration)
# Clone the repository
git clone https://github.com/xclusivecyberdev/Computer-Vision---Object-Detection-Tracking.git
cd Computer-Vision---Object-Detection-Tracking
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the package
pip install -e .

# Build and run with Docker
docker build -t object-detection .
docker run -p 8000:8000 object-detection

Detect objects in a single image:
python demo_image.py path/to/image.jpg --output result.jpg

Options:
- `--model`: Model type (`yolov8`, `faster_rcnn`, `ssd`)
- `--variant`: Model variant for YOLO (`n`, `s`, `m`, `l`, `x`)
- `--conf`: Confidence threshold (default: 0.5)
- `--device`: Device (`cuda` or `cpu`)
Process a video file:
python demo_video.py path/to/video.mp4 --output output.mp4

Use webcam (camera index 0):
python demo_video.py 0

Options:
- `--tracker`: Tracker type (`sort`, `deepsort`)
- `--no-display`: Disable video display
- `--model`, `--variant`, `--conf`, `--device`: Same as for image detection
Start the API server:
python run_api.py

The server will start at http://localhost:8000.
- API Documentation: http://localhost:8000/docs
- Web Interface: http://localhost:8000/static/index.html
from models import create_detector
from tracking import create_tracker
from processor import VideoProcessor
import cv2
# Create detector
detector = create_detector(
    model_type='yolov8',
    variant='n',
    confidence_threshold=0.5,
    device='cuda'
)
# Detect objects in image
image = cv2.imread('image.jpg')
detections = detector.detect(image)
print(f"Detected {len(detections['boxes'])} objects")
# Process video with tracking
processor = VideoProcessor()
stats = processor.process_video(
    source='video.mp4',
    output_path='output.mp4',
    show_display=True
)
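To drive detection and tracking yourself instead of going through `VideoProcessor`, you can combine a detector and a tracker in a plain frame loop. The following is a minimal sketch: the exact `update()` signature and track format are assumptions, so check `src/tracking/` for the real interface.

```python
# Standalone detect-and-track loop (sketch).
# Assumption: tracker.update(detections) accepts the detector's output
# dict and returns the current tracks -- verify against src/tracking/.
from models import create_detector
from tracking import create_tracker
import cv2

detector = create_detector(model_type='yolov8', variant='n')
tracker = create_tracker('sort')

cap = cv2.VideoCapture('video.mp4')
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    detections = detector.detect(frame)   # boxes, scores, classes
    tracks = tracker.update(detections)   # assumed signature
    print(tracks)                         # e.g. track IDs and boxes
cap.release()
```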
Edit `configs/config.yaml` to customize:

- Model settings (type, confidence, IOU thresholds)
- Tracking parameters (max age, min hits)
- Video processing options
- Analytics settings
- API configuration
Example configuration:
model:
  type: "yolov8"
  variant: "n"
  confidence_threshold: 0.5
  device: "cuda"

tracking:
  type: "sort"
  max_age: 30
  min_hits: 3

video:
  output_path: "outputs/videos"
  save_output: true
  display_fps: true
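The rest of the system reads these values at startup. As a sketch (assuming the project parses the file with PyYAML), loading and inspecting the config looks like:

```python
# Load and inspect the YAML config (sketch; assumes PyYAML is installed).
import yaml

with open('configs/config.yaml') as f:
    cfg = yaml.safe_load(f)

print(cfg['model']['type'])        # "yolov8"
print(cfg['tracking']['max_age'])  # 30
```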
Organize your dataset in YOLO format:

dataset/
├── images/
│ ├── train/
│ ├── val/
│ └── test/
└── labels/
├── train/
├── val/
└── test/
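Each image gets a matching `.txt` file under `labels/` with one line per object in the standard YOLO format: `class_id x_center y_center width height`, with all four coordinates normalized to [0, 1]. For example (class IDs follow your dataset config):

```
0 0.481 0.634 0.210 0.385
2 0.156 0.420 0.098 0.170
```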
Create dataset configuration:
from training import ModelTrainer
trainer = ModelTrainer()
trainer.create_dataset_config(
    dataset_path='data/my_dataset',
    class_names=['person', 'car', 'bike'],
    output_path='data/dataset.yaml'
)

Then launch training:

python demo_train.py data/dataset.yaml --model yolov8n.pt --epochs 100

Options:
- `--model`: Base model to start from
- `--epochs`: Number of training epochs
- `--batch`: Batch size
- `--imgsz`: Input image size
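Since the training pipeline builds on Ultralytics' YOLOv8 (see the acknowledgments below), roughly the same run can also be launched from Python. This sketch assumes the `ultralytics` package is installed and uses its documented `train()` API:

```python
# Direct Ultralytics training run (sketch; requires `pip install ultralytics`).
from ultralytics import YOLO

model = YOLO('yolov8n.pt')        # base model, as with --model
model.train(
    data='data/dataset.yaml',     # dataset config created above
    epochs=100,                   # as with --epochs
    batch=16,                     # as with --batch
    imgsz=640,                    # as with --imgsz
)
```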
from training import ModelTrainer
trainer = ModelTrainer()
# Export to ONNX
trainer.export_model('runs/train/exp/weights/best.pt', format='onnx')
# Export to TensorRT
trainer.export_model('runs/train/exp/weights/best.pt', format='tensorrt')

from evaluation import ModelEvaluator
from models import create_detector
evaluator = ModelEvaluator()
detector = create_detector('yolov8')
# Prepare test data
test_data = [
    {
        'image': image,
        'annotations': [
            {'box': [x1, y1, x2, y2], 'class_id': 0},
            # ...
        ]
    }
]
# Evaluate model
results = evaluator.evaluate_model(detector, test_data)
print(f"mAP@0.5: {results['mAP']['mAP@0.5']:.3f}")
print(f"Precision: {results['precision']:.3f}")
print(f"Recall: {results['recall']:.3f}")
print(f"FPS: {results['fps']['mean_fps']:.1f}")POST /detect/image
POST /detect/image
- Upload image and get detection results (JSON)
POST /detect/image/visualized
- Upload image and get visualized result (image)
POST /detect/video
- Upload video for processing
- Returns statistics and download link
GET /analytics
- Get current analytics data
POST /reset
- Reset analytics counters
WS /ws/stream
- Real-time video streaming endpoint
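Calling the API from Python takes a few lines with `requests`. The sketch below assumes the upload field is named `file`; confirm the exact schema at http://localhost:8000/docs.

```python
# Minimal REST client (sketch; the field name "file" is an assumption).
import requests

with open('image.jpg', 'rb') as f:
    resp = requests.post(
        'http://localhost:8000/detect/image',
        files={'file': f},
    )
resp.raise_for_status()
print(resp.json())  # detection results
```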
from optimization import ModelOptimizer
optimizer = ModelOptimizer()
# Export to ONNX
onnx_path = optimizer.export_to_onnx(
    'yolov8n.pt',
    'model.onnx',
    img_size=640
)

# Export to TensorRT (NVIDIA GPUs)
trt_path = optimizer.export_to_tensorrt(
    'yolov8n.pt',
    'model.engine',
    img_size=640,
    fp16=True
)

# Benchmark the optimized model against the original
results = optimizer.benchmark_optimization(
    original_model='yolov8n.pt',
    optimized_model='model.engine',
    test_images=images,
    num_runs=100
)

print(f"Speedup: {results['speedup']:.2f}x")
Computer-Vision---Object-Detection-Tracking/
├── src/
│ ├── models/ # Detection models
│ │ ├── yolo_detector.py
│ │ ├── faster_rcnn_detector.py
│ │ └── ssd_detector.py
│ ├── tracking/ # Tracking algorithms
│ │ ├── sort_tracker.py
│ │ └── deepsort_tracker.py
│ ├── training/ # Training pipeline
│ ├── evaluation/ # Evaluation metrics
│ ├── api/ # REST API
│ ├── utils/ # Utilities
│ ├── processor.py # Video processor
│ └── optimization.py # Performance optimization
├── static/ # Web interface assets
├── templates/ # HTML templates
├── configs/ # Configuration files
├── data/ # Data directory
├── outputs/ # Output directory
├── demo_image.py # Image detection demo
├── demo_video.py # Video processing demo
├── demo_train.py # Training demo
├── run_api.py # API server
└── requirements.txt # Dependencies
**YOLOv8 Variants**

- YOLOv8n: Nano - Fastest, lowest accuracy
- YOLOv8s: Small - Balanced
- YOLOv8m: Medium - Good accuracy
- YOLOv8l: Large - High accuracy
- YOLOv8x: Extra Large - Best accuracy, slowest
**Faster R-CNN**
- ResNet-50 FPN backbone
- Pre-trained on COCO dataset
**SSD**
- SSD300 with VGG16 backbone
- Pre-trained on COCO dataset
Tested on NVIDIA RTX 3080 with CUDA 11.8:
| Model | Input Size | FPS (TensorRT FP16) | mAP@0.5 |
|---|---|---|---|
| YOLOv8n | 640x640 | 120+ | 0.52 |
| YOLOv8s | 640x640 | 85+ | 0.60 |
| YOLOv8m | 640x640 | 55+ | 0.67 |
| Faster R-CNN | 800x800 | 25+ | 0.58 |
| SSD300 | 300x300 | 90+ | 0.50 |
**CUDA out of memory**
- Reduce batch size
- Use smaller model variant (e.g., YOLOv8n instead of YOLOv8x)
- Lower input image resolution
**Low FPS**
- Use GPU acceleration (set `device: cuda`)
- Export model to TensorRT
- Enable FP16 precision
- Reduce input resolution
- Skip frames (`frame_skip` in config)
**Low detection accuracy**
- Adjust the confidence threshold
- Use larger model variant
- Train custom model on your specific dataset
- Check lighting and image quality
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
If you use this project in your research, please cite:
@software{object_detection_tracking,
  title={Real-time Object Detection and Tracking System},
  author={Computer Vision Team},
  year={2024},
  url={https://github.com/xclusivecyberdev/Computer-Vision---Object-Detection-Tracking}
}

This project builds on:

- YOLOv8 by Ultralytics
- PyTorch and TorchVision
- OpenCV
- FastAPI
- SORT and DeepSORT algorithms
For issues and questions:
- Create an issue on GitHub
- Check documentation and examples
- Review configuration options
**Roadmap**

- Support for additional models (YOLOv9, YOLOv10)
- Multi-camera support
- Real-time video streaming from RTSP
- Docker containerization
- Cloud deployment guides (AWS, GCP, Azure)
- Mobile deployment (TensorFlow Lite, ONNX Runtime Mobile)
- Advanced analytics (people counting, zone crossing)
- Database integration for analytics storage
Built with ❤️ by the Computer Vision Team