A General-purpose Parallel and Heterogeneous Task Programming System
Updated Jun 10, 2024 - C++
CUDA C++ Core Libraries
Thin, unified, C++-flavored wrappers for the CUDA APIs
TinyChatEngine: On-Device LLM Inference Library
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
Reconstruct a mesh from point cloud data generated by a 3D scanner
Converts an RGB image to greyscale using parallel programming.
A simple ray-tracing program implemented with CUDA.
CUDA Programming Starter Kit for VSCode and CLion
Parallel LiDAR Point Cloud Preprocessing for Autonomous Driving Applications
A hand-built network for YOLOv5-v5.0, based on TensorRT v8.2, that speeds up YOLOv5-v5.0 inference
A simple image filter example for those who study GPU/CUDA programming
📀 NVIDIA DeepStream-integrated GStreamer plugin. It can blur objects with CUDA cores on Jetson boards. Fast and smooth since everything is done on NVMM. 🏎
CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.
CUDA GEMM convolution implementation
High-Performance Memory Optimal CNN