LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Spiral's Machine Learning Library
🚀 TensorRT-YOLO: Supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE using TensorRT acceleration with EfficientNMS, CUDA Kernels and CUDA Graphs!
Kernel Tuner
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
A cross-chip, cross-platform collection of operators and a unified neural network library.
SParry is a Python tool for computing shortest paths that uses CUDA to speed up several algorithms.
CNNs for spectrogram-based music recommendation (Undergraduate dissertation)
PyTorch implementation of a message passing neural network with RNN sub-units
Implementation of the conjugate gradient method using C and NVIDIA CUDA
Heuristic Reasoning Agent with Python interface (PyHERA)
This repository contains examples of CUDA usage in Cython code.
Cloud Computing
Object Tracking using GPU acceleration.