Gang Liao gangliao

🏠

Working from home

ML/LLM Serving at Scale

171 followers · 1 following

Lists (1)

🚀 My stack

Stars

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ 11,281 792 Updated Mar 1, 2025

deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…

Python 19,766 2,094 Updated Mar 13, 2025

mynameisfiber / high_performance_python_2e

Code for the book "High Performance Python 2e" by Micha Gorelick and Ian Ozsvald with O'Reilly

Python 427 142 Updated Jan 18, 2023

karpathy / LLM101n

LLM101n: Let's build a Storyteller

32,355 1,749 Updated Aug 1, 2024

google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 10,681 1,202 Updated Mar 1, 2025

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,170 652 Updated Mar 9, 2025

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,619 377 Updated Dec 4, 2024

google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Python 5,379 524 Updated Mar 12, 2025

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,481 899 Updated Jul 1, 2024

facebook / buck2

Build system, successor to Buck

Rust 3,742 241 Updated Mar 13, 2025

zwegner / zp7

ZP7: Zach's Peppy Parallel-Prefix-Popcountin' PEXT/PDEP Polyfill

C 50 4 Updated Aug 14, 2024

S-LoRA / S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,800 104 Updated Jan 21, 2024

harvardnlp / annotated-transformer

An annotated implementation of the Transformer paper.

Jupyter Notebook 6,076 1,288 Updated Apr 7, 2024

meta-llama / codellama

Inference code for CodeLlama models

Python 16,238 1,900 Updated Aug 12, 2024

PKU-YuanGroup / ChatLaw

ChatLaw: A Powerful LLM Tailored for the Chinese Legal Domain. A Chinese legal large language model

7,173 565 Updated Jan 4, 2025

facebook / squangle

SQuangLe is a C++ API for accessing MySQL servers

C++ 125 54 Updated Mar 7, 2025

google / re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

C++ 9,185 1,150 Updated Feb 28, 2025

oprecomp / FloatX

Header-only C++ library for low precision floating point type emulation.

C++ 169 26 Updated Jan 24, 2020

karpathy / llama2.c

Inference Llama 2 in one file of pure C

C 18,155 2,215 Updated Aug 6, 2024

cwida / FastLanes

Towards a New File Format

C++ 209 13 Updated Mar 4, 2025

powturbo / TurboPFor-Integer-Compression

Fastest Integer Compression

C 791 112 Updated Mar 1, 2024

erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python

Python 5,158 785 Updated Mar 3, 2025

facebook / wdt

Warp speed Data Transfer (WDT) is an embeddable library (and command line tool) aiming to transfer data between 2 systems as fast as possible over multiple TCP paths.

C++ 2,895 390 Updated Jan 24, 2025

TsinghuaDatabaseGroup / AIDB

ai4db and db4ai work

751 90 Updated Dec 26, 2024

weaviate / weaviate

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …

Go 12,745 891 Updated Mar 13, 2025

pgvector / pgvector

Open-source vector similarity search for Postgres

C 14,572 698 Updated Feb 20, 2025

Hannibal046 / Awesome-LLM

Awesome-LLM: a curated list of Large Language Models

22,055 1,809 Updated Mar 4, 2025

google / cuckoo-index

Cuckoo Index: A Lightweight Secondary Index Structure

C++ 129 17 Updated Dec 2, 2021

facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

C++ 33,627 3,787 Updated Mar 11, 2025

janestreet / magic-trace

magic-trace collects and displays high-resolution traces of what a process is doing

OCaml 4,831 98 Updated Nov 22, 2024