A high-throughput and memory-efficient inference and serving engine for LLMs
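For context, offline batch inference with vLLM's Python API looks roughly like this (a minimal sketch; the model name is just an example):

    from vllm import LLM, SamplingParams

    # Load an example model; vLLM manages KV-cache memory efficiently via PagedAttention.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Prompts are batched together for high throughput.
    outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
    for out in outputs:
        print(out.prompt, "->", out.outputs[0].text)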
The easiest way to serve AI apps and models - build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more!
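As a sketch of the service-definition style BentoML uses (assuming BentoML 1.2+; the toy echo endpoint stands in for a real model):

    import bentoml

    @bentoml.service
    class Echo:
        # A trivial API endpoint; a real service would wrap a model here.
        @bentoml.api
        def generate(self, prompt: str) -> str:
            return prompt.upper()

The service is then started with the bentoml CLI, which exposes the method as an HTTP endpoint.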
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
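A minimal sketch of Ray's core task API, which its AI libraries build on:

    import ray

    ray.init()  # start a local Ray runtime

    @ray.remote
    def square(x: int) -> int:
        return x * x

    # Tasks run in parallel across the cluster; futures are resolved with ray.get.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]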
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
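For illustration, a job can be launched programmatically with SkyPilot's Python API (a sketch; the command, accelerator, and cluster name are examples):

    import sky

    # Define a task and the resources it needs; SkyPilot picks an available
    # cloud/region that satisfies them at the lowest cost.
    task = sky.Task(run="python train.py")
    task.set_resources(sky.Resources(accelerators="A100:1"))
    sky.launch(task, cluster_name="demo")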
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
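The key idea is that each request selects a fine-tuned adapter at inference time while replicas share one base model. Assuming a LoRAX-style REST endpoint, a request might look like this (the URL, adapter name, and payload shape are illustrative assumptions):

    import requests

    # Hypothetical local endpoint; adapter_id selects which LoRA adapter
    # is applied on top of the shared base model for this request.
    resp = requests.post(
        "http://localhost:8080/generate",
        json={
            "inputs": "Summarize: LLM serving is ...",
            "parameters": {"adapter_id": "my-org/my-lora-adapter", "max_new_tokens": 64},
        },
    )
    print(resp.json())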
A production-ready REST API for vLLM.
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
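Because the endpoint is OpenAI-compatible, it can be queried with the standard openai client by pointing base_url at the deployment (the URL and model name below are placeholders):

    from openai import OpenAI

    # Point the client at your deployment instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)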
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
Friendli: the fastest serving engine for generative AI
Okik is a serving framework for deploying LLMs and much more.
A ChatGPT (GPT-3.5) & GPT-4 workload trace to optimize LLM serving systems.
A library to benchmark LLMs via their APIs.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2K lines of code (2% of vLLM).
Stitch simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.
RayLLM - LLMs on Ray
A Framework For Intelligence Farming
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
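One way such an integration might look (a sketch, not the project's actual code; the model and resource settings are examples):

    from ray import serve
    from vllm import LLM, SamplingParams

    @serve.deployment(ray_actor_options={"num_gpus": 1})
    class VLLMServer:
        def __init__(self):
            # Each replica holds its own vLLM engine.
            self.llm = LLM(model="facebook/opt-125m")

        async def __call__(self, request):
            prompt = (await request.json())["prompt"]
            out = self.llm.generate([prompt], SamplingParams(max_tokens=64))
            return {"text": out[0].outputs[0].text}

    # Ray Serve handles replication and HTTP routing; vLLM handles batching.
    serve.run(VLLMServer.bind())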
Building static web applications using Large Language Models: from hand-sketched documents, images, and screenshots to proper web pages.