This project explores improving Yolo-v4 performance with Scalar Matrix Multiplication (MM) implemented in oneAPI. We focus on the convolution layers of the Yolo-v4 model and call the oneAPI code from Python, aiming for improved efficiency and accuracy in object detection.
Our approach covers:
- Scalar Matrix Multiplication in oneAPI
- Python-C++ Integration
- Convolution Layer Wrapper
- Yolo-v4 Modification and Implementation
- Performance Analysis
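The idea behind the convolution wrapper can be sketched in plain Python: a convolution layer is unfolded into a matrix (im2col) and then computed with a scalar, one-multiply-add-at-a-time matrix multiplication. This is a minimal sketch under stated assumptions, not the code in `smm.cpp` or `wrapper.py`; the function names `scalar_matmul`, `im2col`, and `conv2d_mm` are illustrative, and padding, bias, and batching are left out.

```python
import numpy as np

def scalar_matmul(a, b):
    """Naive triple-loop matrix multiply: one scalar multiply-add per step."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((m, n), dtype=np.float64)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            c[i, j] = acc
    return c

def im2col(x, kh, kw, stride=1):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    col = 0
    for i in range(out_h):
        for j in range(out_w):
            # Each output position becomes one column holding its input patch.
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[:, col] = patch.reshape(-1)
            col += 1
    return cols, out_h, out_w

def conv2d_mm(x, weights, stride=1):
    """Convolution without padding, expressed as a single matrix multiply."""
    out_c, in_c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    w_mat = weights.reshape(out_c, in_c * kh * kw)  # one row per output channel
    return scalar_matmul(w_mat, cols).reshape(out_c, out_h, out_w)
```

Expressing the layer this way means only the matrix multiply has to be moved to oneAPI; the unfolding step can stay in Python.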
Build and run each component with the following commands:
- Scalar MM:
icpx -fsycl smm.cpp -o smm
- Python-C++ Integration:
icpx -fsycl -fPIC -shared -o libsmm.so shared.cpp
- Convolution Wrapper:
python3 wrapper.py
- Yolo-v4:
python3 yoto.py
python3 yolo.py
- Performance Analysis:
python3 compare.py
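Once `libsmm.so` is built, Python can reach it through `ctypes`. The sketch below shows one way this could look. It assumes the library exports an `extern "C"` function named `smm` taking three `float*` buffers and the dimensions `M`, `K`, `N`; that name and signature are hypothetical and should be adjusted to whatever `shared.cpp` actually exports. If the library cannot be loaded, the sketch falls back to NumPy.

```python
import ctypes
import numpy as np

def load_smm(path="./libsmm.so"):
    """Return a handle to the shared library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None

def matmul(a, b, lib=None):
    """Multiply two 2-D arrays, using the shared library when one is given."""
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    if lib is None:
        return a @ b  # NumPy fallback when libsmm.so is unavailable

    c = np.zeros((m, n), dtype=np.float32)
    fptr = ctypes.POINTER(ctypes.c_float)
    # ASSUMPTION: hypothetical export
    #   extern "C" void smm(const float* A, const float* B, float* C,
    #                       int M, int K, int N);
    lib.smm.argtypes = [fptr, fptr, fptr,
                        ctypes.c_int, ctypes.c_int, ctypes.c_int]
    lib.smm.restype = None
    lib.smm(a.ctypes.data_as(fptr), b.ctypes.data_as(fptr),
            c.ctypes.data_as(fptr), m, k, n)
    return c
```

A call such as `matmul(w_mat, cols, lib=load_smm())` would then use the oneAPI kernel when the library is present and NumPy otherwise, which also gives a convenient reference result for checking correctness.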
The performance analysis demonstrates minimal speedup in the convolution layers of the Yolo-v4 model on CPU devices.
It also discusses the challenges of calculating FLOPs for complex models with custom implementations.
Overall, the work highlights the potential of Scalar MM in oneAPI for deep learning optimization.
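For a single convolution layer, the FLOP count and the speedup can be estimated by hand, which is useful when profiling tools do not see inside a custom implementation. The sketch below is an illustration and not the contents of `compare.py`: it assumes the common convention of counting one multiply-add as 2 FLOPs and ignores bias and activation, and the helper names are hypothetical.

```python
import time

def conv2d_flops(in_c, out_c, kh, kw, out_h, out_w):
    """Estimate FLOPs for one convolution layer (bias and activation ignored)."""
    # One multiply-accumulate per kernel element, per output element.
    macs = out_c * out_h * out_w * in_c * kh * kw
    return 2 * macs  # convention: 1 multiply-add = 2 FLOPs

def best_time(fn, *args, repeats=5):
    """Return the fastest of several wall-clock runs of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(baseline_seconds, optimized_seconds):
    """Speedup factor: values above 1.0 mean the optimized version is faster."""
    return baseline_seconds / optimized_seconds
```

Taking the best of several runs reduces the effect of warm-up and scheduling noise, which matters when the measured speedup is small.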
- Vikash Singh (vxs465)
- Thomas Bornhorst (thb34)
Case Western Reserve University