Rutgers University - ECE - Parallel and Distributed Computing - Project 1
Updated Oct 18, 2018 - C++
What features do your CPU and OS support?
Implementation of the 2D convolution operation for neural networks using Intel x86 (i386)/x86-64 (amd64) 256-bit AVX instructions. All three dataflow methods (input stationary, weight stationary, and output stationary) are implemented, and the forward pass of the AlexNet architecture is built on top of them.
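The three dataflows named above differ mainly in which operand stays fixed in the innermost loop. The following is a minimal scalar sketch in 1D, not code from the repository: the function names are hypothetical, the convolution is the unflipped "valid" form commonly used in neural networks, and an AVX version would presumably vectorize the inner loops rather than change the loop order.

```cpp
#include <cstddef>
#include <vector>

// 1D "valid" convolution as used in neural networks (no kernel flip):
//   out[o] = sum over k of in[o + k] * w[k]
// All names below are illustrative assumptions, not the repository's API.

// Number of valid outputs; 0 if the kernel is empty or longer than the input.
std::size_t output_len(std::size_t n_in, std::size_t n_w) {
    return (n_w == 0 || n_in < n_w) ? 0 : n_in - n_w + 1;
}

// Output stationary: one output is accumulated locally until it is complete.
std::vector<float> conv_output_stationary(const std::vector<float>& in,
                                          const std::vector<float>& w) {
    std::vector<float> out(output_len(in.size(), w.size()), 0.0f);
    for (std::size_t o = 0; o < out.size(); ++o) {
        float acc = 0.0f;  // the stationary operand
        for (std::size_t k = 0; k < w.size(); ++k) acc += in[o + k] * w[k];
        out[o] = acc;
    }
    return out;
}

// Weight stationary: one weight is held fixed and applied to every output.
std::vector<float> conv_weight_stationary(const std::vector<float>& in,
                                          const std::vector<float>& w) {
    std::vector<float> out(output_len(in.size(), w.size()), 0.0f);
    for (std::size_t k = 0; k < w.size(); ++k) {
        const float wk = w[k];  // the stationary operand
        for (std::size_t o = 0; o < out.size(); ++o) out[o] += in[o + k] * wk;
    }
    return out;
}

// Input stationary: one input is held fixed and scattered to every output
// it contributes to (out[i - k] for each kernel position k).
std::vector<float> conv_input_stationary(const std::vector<float>& in,
                                         const std::vector<float>& w) {
    std::vector<float> out(output_len(in.size(), w.size()), 0.0f);
    if (out.empty()) return out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float x = in[i];  // the stationary operand
        for (std::size_t k = 0; k < w.size(); ++k) {
            if (i >= k && i - k < out.size()) out[i - k] += x * w[k];
        }
    }
    return out;
}
```

All three functions compute the same result; only the order of memory accesses changes, which is what determines how much reuse each operand gets from registers or cache.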
AVX2 and SSE2 use cases and benchmarks
An exercise in SIMD optimization
Performance-portable, length-agnostic SIMD with runtime dispatch
Chromium fork named after radioactive element No. 90. Windows and macOS/Raspi/Android/Special builds are in different repositories; links are towards the top of the README.md.