LeetGPU

A collection of exercises from LeetGPU (GitHub), featuring implementations in CUDA, PyTorch, and Triton.

Prerequisites

Tested on WSL2 Ubuntu 24.04.

System Requirements

CUDA Toolkit
CMake 4
Python 3.12+, GTest, NVBench
uv

sudo apt-get install -y python3.12-dev libgtest-dev
uv sync
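
Before building, it can help to confirm that the command-line tools listed above are actually on PATH. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the executable names (nvcc, cmake, python3.12, uv, make) are assumptions about how each requirement is installed. GTest and NVBench are libraries rather than commands, so this check does not cover them.

```shell
#!/bin/sh
# Hypothetical helper: print each given command that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      printf '%s\n' "$_tool"
    fi
  done
}

# Executable names below are assumptions based on the requirements list.
missing="$(missing_tools nvcc cmake python3.12 uv make)"
if [ -n "$missing" ]; then
  printf 'Missing from PATH:\n%s\n' "$missing"
else
  echo 'All listed tools were found on PATH.'
fi
```

The script only reports; it never exits with an error, so it is safe to run in any environment.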

Usage

Run CUDA Tests

make build && make test

Run CUDA Benchmarks

make build-release && make bench

Run Python Tests

make py-sync && make py-test

Clean

make clean

NCU

Follow the Windows section (it applies to WSL2) of NVIDIA Developer Tools Solutions: Permission Issue with Performance Counters to grant all users access to the GPU performance counters.
Restart WSL by running wsl --shutdown in PowerShell.
Run ncu (without sudo):

ncu \
  --set=full \
  -f \
  --kernel-name-base demangled \
  --kernel-name 'regex:vector_add' \
  -o vector_add \
  ./001_vector_addition_benchmark \
  --profile \
  --axis "N=67108864"

A shell comment cannot follow a line-continuation backslash, so the arguments are explained here instead:

--set=full: the most comprehensive profiling
-f: force overwriting the output file if it already exists
--kernel-name-base demangled: use human-readable (demangled) kernel names
--kernel-name 'regex:vector_add': only profile kernels matching the regex pattern vector_add
-o vector_add: write results to a file with the vector_add prefix (creates a .ncu-rep file)
./001_vector_addition_benchmark: the executable to profile, here an NVBench program; its flags are documented at https://github.com/NVIDIA/nvbench/blob/main/docs/cli_help.md
--profile: run once only
--axis "N=67108864": run the benchmark with N=67108864

This generates vector_add.ncu-rep, which can be opened in:

Nsight Compute GUI (Windows): For interactive analysis with charts and recommendations
Command line: ncu -i vector_add.ncu-rep for text-based analysis
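
To profile other exercises with the same settings, the invocation can be wrapped in a small function. This is a sketch under assumptions, not something the repository provides: the function name profile_kernel and the DRY_RUN variable are hypothetical, and it assumes every benchmark exposes an N axis like the vector addition one. With DRY_RUN=1 it only prints the command it would run, which is useful for checking the arguments on a machine without ncu.

```shell
#!/bin/sh
# Hypothetical helper: run (or, with DRY_RUN=1, print) the ncu command for one benchmark.
# Usage: profile_kernel <benchmark-executable> <kernel-regex> <N> [output-prefix]
profile_kernel() {
  bench=$1
  kernel=$2
  n=$3
  out=${4:-$kernel}   # the output prefix defaults to the kernel regex

  # Rebuild the positional parameters as the full command line.
  set -- ncu --set=full -f \
    --kernel-name-base demangled \
    --kernel-name "regex:$kernel" \
    -o "$out" \
    "$bench" --profile --axis "N=$n"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Printed for inspection only; shell quoting is not preserved.
    echo "$*"
  else
    "$@"
  fi
}

# Dry run with the values used in this section.
DRY_RUN=1 profile_kernel ./001_vector_addition_benchmark vector_add 67108864
```

Dropping DRY_RUN=1 executes ncu with the same arguments as the command shown earlier in this section.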

txfs19260817/LeetGPU

Folders and files

Latest commit

History

Repository files navigation

LeetGPU

Prerequisites

System Requirements

Usages

Run CUDA Tests

Run CUDA Benchmarks

Run Python Tests

Clean

NCU

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages