High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Updated May 20, 2024 · C++
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and more LLMs
TinyChatEngine: On-Device LLM Inference Library
Fast Multimodal LLM on Mobile Devices
Tiny C++11 GPT-2 inference implementation from scratch
PyTorch library for cost-effective, fast, and easy serving of MoE models
CUDA implementation of Extended Long Short-Term Memory (xLSTM), with C++ and PyTorch ports
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
A special PyTorch library for those who can't afford a big GPU