Stitch simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.
You can run any large language model on your local machine with this repository.
A framework for few-shot evaluation of autoregressive language models.
A library to benchmark LLMs via their exposed APIs
Automating the deployment of the Takeoff Server on AWS for LLMs
A guide on how to run LLMs on Intel CPUs
Okik is a serving framework for deploying LLMs and much more.
A Framework For Intelligence Farming
EmbeddedLLM: API server for embedded-device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
Streaming of LLM responses in real time using FastAPI and Streamlit.
Building static web applications using a large language model: from hand-sketched documents, images, and screenshots to proper web pages.
A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App
A production-ready REST API for vLLM
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
A ChatGPT (GPT-3.5) & GPT-4 workload trace to optimize LLM serving systems
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Friendli: the fastest serving engine for generative AI
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
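Several of the entries above (the FastAPI/Streamlit service, the vLLM REST API, the Ray Serve integration) share one core pattern: stream tokens to the client as the model produces them, rather than waiting for the full completion. A minimal, stdlib-only sketch of that pattern follows; `fake_llm_stream` and `serve_once` are hypothetical names, and the generator stands in for a real model call (a production server would yield tokens from an engine such as vLLM).

```python
import http.server
import threading
import urllib.request
from typing import Iterator


def fake_llm_stream(prompt: str) -> Iterator[str]:
    """Placeholder for a real model call; yields tokens one at a time."""
    for token in ["Hello", ", ", "world", "!"]:
        yield token


class StreamHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 is required for chunked transfer encoding.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Chunked transfer lets the client render tokens as they arrive
        # instead of waiting for the whole response.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for token in fake_llm_stream(self.path):
            data = token.encode("utf-8")
            # Each chunk: hex length, CRLF, payload, CRLF.
            self.wfile.write(f"{len(data):X}\r\n".encode() + data + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the stream

    def log_message(self, *args):
        pass  # silence per-request logging


def serve_once(port: int = 0) -> http.server.HTTPServer:
    """Handle a single request on a background thread; returns the server."""
    server = http.server.HTTPServer(("127.0.0.1", port), StreamHandler)
    threading.Thread(target=server.handle_request, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_once()
    port = server.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/generate") as resp:
        print(resp.read().decode())  # urllib reassembles the chunks for us
```

The frameworks listed above hide this plumbing: FastAPI's `StreamingResponse` or Ray Serve's streaming handles accept a generator directly, but the wire-level behavior is the same chunked (or server-sent-event) delivery shown here.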