FlashMLA: Efficient MLA decoding kernels

C++ 11,281 792 Updated Mar 1, 2025

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…

Python 19,766 2,094 Updated Mar 13, 2025

Code for the book "High Performance Python 2e" by Micha Gorelick and Ian Ozsvald with O'Reilly

Python 427 142 Updated Jan 18, 2023

LLM101n: Let's build a Storyteller

32,355 1,749 Updated Aug 1, 2024

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 10,681 1,202 Updated Mar 1, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,170 652 Updated Mar 9, 2025

AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,619 377 Updated Dec 4, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,379 524 Updated Mar 12, 2025

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,481 899 Updated Jul 1, 2024

Build system, successor to Buck

Rust 3,742 241 Updated Mar 13, 2025

ZP7: Zach's Peppy Parallel-Prefix-Popcountin' PEXT/PDEP Polyfill

C 50 4 Updated Aug 14, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,800 104 Updated Jan 21, 2024

An annotated implementation of the Transformer paper.

Jupyter Notebook 6,076 1,288 Updated Apr 7, 2024

Inference code for CodeLlama models

Python 16,238 1,900 Updated Aug 12, 2024

ChatLaw: A powerful LLM tailored for the Chinese legal domain (a Chinese legal large language model).

7,173 565 Updated Jan 4, 2025

SQuangLe is a C++ API for accessing MySQL servers

C++ 125 54 Updated Mar 7, 2025

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

C++ 9,185 1,150 Updated Feb 28, 2025

Header-only C++ library for low precision floating point type emulation.

C++ 169 26 Updated Jan 24, 2020

Inference Llama 2 in one file of pure C

C 18,155 2,215 Updated Aug 6, 2024

Towards a New File Format

C++ 209 13 Updated Mar 4, 2025

Fastest Integer Compression

C 791 112 Updated Mar 1, 2024

Benchmarks of approximate nearest neighbor libraries in Python

Python 5,158 785 Updated Mar 3, 2025

Warp speed Data Transfer (WDT) is an embeddable library (and command line tool) aiming to transfer data between 2 systems as fast as possible over multiple TCP paths.

C++ 2,895 390 Updated Jan 24, 2025

ai4db and db4ai work

751 90 Updated Dec 26, 2024

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …

Go 12,745 891 Updated Mar 13, 2025

Open-source vector similarity search for Postgres

C 14,572 698 Updated Feb 20, 2025

Awesome-LLM: a curated list of Large Language Models

22,055 1,809 Updated Mar 4, 2025

Cuckoo Index: A Lightweight Secondary Index Structure

C++ 129 17 Updated Dec 2, 2021

A library for efficient similarity search and clustering of dense vectors.

C++ 33,627 3,787 Updated Mar 11, 2025

magic-trace collects and displays high-resolution traces of what a process is doing

OCaml 4,831 98 Updated Nov 22, 2024