Starred repositories

Learning about CUDA by writing PTX code.

Python 122 4 Updated Feb 27, 2024
Cuda 19 Updated Feb 10, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 3,900 320 Updated Mar 5, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 7,623 666 Updated Mar 7, 2025

DuckDB is an analytical in-process SQL database management system

C++ 27,160 2,131 Updated Mar 6, 2025

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 17,273 1,432 Updated Feb 25, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 8,670 569 Updated Mar 7, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 11,552 1,171 Updated Mar 7, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 4,442 414 Updated Mar 7, 2025

Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL

Python 853 51 Updated Mar 4, 2025

Utility for finding long Tokio polls

Rust 7 Updated Feb 17, 2025

Low level access to RISC-V processors

Rust 912 169 Updated Mar 6, 2025

Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems

Shell 460 49 Updated Feb 6, 2025

[ETH Course] Exercise materials of the Machine Learning on Microcontrollers course

C 6 1 Updated Mar 3, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 6,989 1,141 Updated Feb 28, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,522 243 Updated Mar 5, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,074 64 Updated Feb 28, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,253 4,283 Updated Mar 7, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,817 469 Updated Mar 5, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,059 608 Updated Mar 6, 2025

Source code for "Building Cryptographic Proofs from Hash Functions"

TeX 182 27 Updated Feb 21, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,184 779 Updated Mar 1, 2025

Runtime for executing procedural macros as WebAssembly

Rust 1,354 28 Updated Mar 3, 2025

Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper

Python 511 19 Updated Mar 7, 2025

Pretraining code for a large-scale depth-recurrent language model

Python 661 54 Updated Mar 5, 2025

Shh! Alerts you when you are too loud.

Rust 2 Updated Jan 23, 2025
Rust 39 14 Updated Feb 22, 2025

An open source deep research clone. An AI agent that reasons over large amounts of web data extracted with Firecrawl.

TypeScript 4,790 574 Updated Feb 23, 2025