matmul
Here are 4 public repositories matching this topic...
Raspberry Pi Pico (RP2040) and Adafruit Metro M7 (NXP IMXRT10XX) benchmark
Updated Jan 12, 2024 - Python
This project integrates a custom CUDA matrix multiplication kernel into a PyTorch deep learning model so that the model's matrix operations run on the GPU. The goal is to compare the performance of this custom kernel with PyTorch's built-in matrix multiplication and to demonstrate how custom CUDA kernels can speed up compute-intensive operations.
Updated Aug 26, 2024 - Python
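The comparison described in the second project (a custom matrix multiplication against a built-in baseline) usually comes down to two checks: do both implementations return the same result, and how long does each take. The sketch below illustrates that pattern in plain Python, not the repository's actual code. `matmul_reference`, `matmul_candidate`, `best_time`, and `compare` are hypothetical names, and both implementations are CPU stand-ins for the CUDA kernel and the PyTorch built-in.

```python
import random
import time


def matmul_reference(a, b):
    """Textbook triple-loop product; stands in for the built-in baseline."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def matmul_candidate(a, b):
    """Hypothetical 'custom' variant: transpose b once, then take row-by-row dot products."""
    b_t = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def best_time(fn, a, b, repeats=3):
    """Return the fastest of several runs, in seconds, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(u - v) for rx, ry in zip(x, y) for u, v in zip(rx, ry))


def compare(n=32, seed=0):
    """Check that both implementations agree on random input, then time each."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    return {
        "max_abs_diff": max_abs_diff(matmul_reference(a, b), matmul_candidate(a, b)),
        "reference_s": best_time(matmul_reference, a, b),
        "candidate_s": best_time(matmul_candidate, a, b),
    }


if __name__ == "__main__":
    print(compare())
```

The correctness check uses a tolerance instead of exact equality because different summation orders can change the last bits of a floating-point result. On a GPU the timing step also needs care: CUDA kernel launches are asynchronous, so the device must be synchronized (for example with `torch.cuda.synchronize()`) before the clock is read.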