Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
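Because such servers mirror the OpenAI Chat Completions schema, any OpenAI client can talk to them by switching the base URL. A minimal sketch, assuming a hypothetical server at `localhost:3000` serving a model named `llama-2-7b-chat` (both are placeholders, not values from any specific project):

```python
import json

# Assumption: an OpenAI-compatible server is running locally and
# exposes the standard /v1/chat/completions route.
BASE_URL = "http://localhost:3000/v1"

# The request body follows the OpenAI Chat Completions schema.
payload = {
    "model": "llama-2-7b-chat",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
}

# With the official openai client, the same request would be sent as:
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key="not-needed-locally")
#   resp = client.chat.completions.create(**payload)
print(json.dumps(payload))
```

The point is that only `base_url` (and possibly the model name) changes between providers; application code written against the OpenAI client stays the same.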
Pretrain, fine-tune, and deploy 20+ LLMs on your own data, using state-of-the-art techniques: flash attention, FSDP, 4-bit quantization, LoRA, and more.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
Sparsity-aware deep learning inference runtime for CPUs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Code examples and resources for DBRX, a large language model developed by Databricks
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
RayLLM - LLMs on Ray
LLMFlows - Simple, Explicit and Transparent LLM Apps
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
irresponsible innovation. Try now at https://chat.dev/
Efficient AI Inference & Serving
GPU environment and cluster management with LLM support
LLMs and Machine Learning done easily
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.
A tool for generating function arguments and choosing what function to call with local LLMs