Stars
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
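As a generic illustration of the transformer attention structure such notes typically cover, here is a minimal scaled dot-product attention sketch in NumPy. This is not code from the repository; the function name and shapes are assumptions for illustration only.

```python
# Generic scaled dot-product attention sketch (illustrative, not from the notes).
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: arrays of shape (seq_len, d_head)."""
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)             # (seq_len, seq_len) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ v                             # weighted sum of value vectors

q = k = v = np.random.rand(4, 8).astype(np.float32)
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 8)
```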
📚 200+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (reaching 98%–100% of cuBLAS/FlashAttention-2 TFLOPS 🎉🎉).
DeepEP: an efficient expert-parallel communication library
Examples for using ONNX Runtime for machine learning inferencing.
ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator.
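A minimal sketch of running inference with the ONNX Runtime Python API, in the spirit of the examples repository above. The model path "model.onnx", the input shape, and the use of the CPU execution provider are placeholder assumptions, not tied to any particular example.

```python
# Minimal ONNX Runtime inference sketch; "model.onnx" and the input shape are placeholders.
import numpy as np
import onnxruntime as ort

# Create an inference session; providers selects the execution backend.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Read the model's declared input name to build a matching feed dict.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed NCHW image input

# Run the model; passing None requests every declared output.
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```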