A web based graphical editor of ZMK keymaps.

JavaScript 1,489 386 Updated Mar 3, 2025

Efficient and optimized tokenizer engine for LLM inference serving

C++ 42 1 Updated Mar 29, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 3,367 232 Updated Mar 28, 2025

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 242 17 Updated Aug 31, 2024

A collection of 500+ real-world ML & LLM system design case studies from 100+ companies. Learn how top tech firms implement GenAI in production.

200 36 Updated Mar 9, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,945 188 Updated Mar 29, 2025

aider is AI pair programming in your terminal

Python 30,188 2,735 Updated Mar 29, 2025

✨ AI-powered coding, seamlessly in Neovim

Lua 2,959 179 Updated Mar 27, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,396 814 Updated Mar 27, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,440 389 Updated Mar 5, 2025
Python 3 Updated Oct 31, 2024

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,109 536 Updated Mar 28, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,329 684 Updated Mar 28, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,386 811 Updated Mar 1, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,948 230 Updated Mar 4, 2025

A high-performance and efficient message queue developed in Rust

Rust 72 5 Updated Feb 19, 2025

Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

Python 21 3 Updated Mar 21, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,518 261 Updated Mar 29, 2025

Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch

Python 1,247 109 Updated Mar 14, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python 27,245 1,670 Updated Mar 21, 2025

Use Neovim like the Cursor AI IDE!

Lua 11,771 476 Updated Mar 29, 2025

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,223 73 Updated Mar 6, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Python 1,109 119 Updated Mar 23, 2025

LLM KV cache compression made easy

Python 444 31 Updated Mar 19, 2025
Python 400 33 Updated Mar 26, 2025

Experiments on speculative sampling with Llama models

Python 125 6 Updated Jun 8, 2023

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Python 277 23 Updated Feb 24, 2025

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 3,848 330 Updated Feb 20, 2025

Caliptra IP and firmware for integrated Root of Trust block

275 40 Updated Mar 29, 2025