Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space IO
Updated Apr 22, 2025 - C++
row-major matmul optimization
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis.
Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.
Set of examples written for hardware acceleration via TornadoVM
Inline PTX Assembly in CUDA example
Bloch's equations and Optimal Control for MRI and NMR applications
FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with automatic differentiation
Visual Studio Code extension with PTX assembly syntax support
Unofficial Golang client library for the Public Transport Data eXchange (PTX) integrated information service platform