An academic project on accelerating Neural Network training by optimizing the GEMM kernel on multi-core CPUs and GPUs. (NTUA)
python machine-learning deep-learning multiprocessing parallel-computing cuda tiling neural-networks high-performance-computing numba shared-memory gemm performance-optimization gpu-programming memory-coalescing
Updated Dec 16, 2025 - Python