LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Updated Nov 18, 2024 - Python
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.