Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
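A minimal sketch of Ray's core task API, which is what the libraries above build on: decorate a plain Python function with @ray.remote and it can be scheduled across a cluster (or local cores).

```python
import ray

ray.init()  # connects to an existing cluster, or starts one locally

@ray.remote
def square(x):
    return x * x

# Each .remote() call is scheduled asynchronously and returns a future.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```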
A high-throughput and memory-efficient inference and serving engine for LLMs
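A minimal sketch of vLLM's offline batch-inference entry point; the model name is just an illustrative placeholder, any Hugging Face causal LM works.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```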
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
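A minimal sketch of a BentoML 1.2-style inference service; the transformers pipeline and model name are illustrative placeholders, not part of the project's own examples.

```python
import bentoml
from transformers import pipeline

@bentoml.service(resources={"cpu": "2"})
class Summarizer:
    def __init__(self):
        # placeholder summarization model
        self.pipe = pipeline("summarization", model="sshleifer/distilbart-cnn-6-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.pipe(text)[0]["summary_text"]
```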
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
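Because servers like this expose OpenAI-compatible endpoints, the stock openai client can be pointed at them; the base URL, port, and model name below are assumptions that depend on how the server was started.

```python
from openai import OpenAI

# assumed local endpoint; adjust to wherever the server is listening
client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```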
🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly on your existing data infrastructure, without moving your data, including streaming inference, scalable model training, and vector search.
SkyPilot: Run LLMs, AI, and batch jobs on any cloud. Get maximum savings, the highest GPU availability, and managed execution, all through a simple interface.
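A hedged sketch of SkyPilot's Python API for launching a serving job on a cloud GPU; the setup/run commands, accelerator spec, and cluster name are placeholders.

```python
import sky

task = sky.Task(
    setup="pip install vllm",
    run="python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m",
)
# request one A100; SkyPilot picks the cheapest cloud/region that has it
task.set_resources(sky.Resources(accelerators="A100:1"))
sky.launch(task, cluster_name="llm-serve")
```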
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
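A hedged sketch of querying a multi-LoRA server of this kind, which follows a text-generation-inference-style REST API where the request names the adapter to apply; the endpoint, port, and adapter id are assumptions.

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "Write a haiku about GPUs.",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "my-org/my-lora",  # hypothetical fine-tuned adapter
        },
    },
)
print(resp.json()["generated_text"])
```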
RayLLM - LLMs on Ray
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
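A minimal sketch of a Mosec worker: forward() receives one deserialized request body (or a batch, if dynamic batching is enabled) and returns the reply; the echo logic is a stand-in for real model inference.

```python
from mosec import Server, Worker

class Echo(Worker):
    def forward(self, data: dict) -> dict:
        # replace with actual model inference
        return {"echo": data.get("text", "")}

if __name__ == "__main__":
    server = Server()
    server.append_worker(Echo, num=1)
    server.run()
```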
Efficient AI Inference & Serving
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
Friendli: the fastest serving engine for generative AI
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
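A minimal sketch of what such an integration looks like: a Ray Serve deployment that owns a vLLM engine. For brevity this uses vLLM's blocking LLM class rather than the async engine a production setup would want; the model name is a placeholder.

```python
from ray import serve
from vllm import LLM, SamplingParams

@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMServer:
    def __init__(self):
        self.llm = LLM(model="facebook/opt-125m")  # placeholder model

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        out = self.llm.generate([prompt], SamplingParams(max_tokens=64))
        return {"text": out[0].outputs[0].text}

app = VLLMServer.bind()
# serve.run(app)  # then POST {"prompt": "..."} to http://localhost:8000/
```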
A GPT-3.5 & GPT-4 Workload Trace to Optimize LLM Serving Systems
A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App
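A schematic sketch of the RAG pattern an app like this implements: embed the documents, retrieve the nearest ones for a query, and place them in the LLM prompt. The corpus and embedding model here are illustrative placeholders.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Ray scales Python workloads.", "vLLM serves LLMs efficiently."]
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedder
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

query = "How do I serve an LLM?"
q_vec = encoder.encode([query], normalize_embeddings=True)[0]
best = docs[int(np.argmax(doc_vecs @ q_vec))]  # cosine similarity on unit vectors

prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
# ...pass `prompt` to any of the serving engines above
```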
A production-ready REST API for vLLM.
Stitch simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.
A framework for few-shot evaluation of autoregressive language models.
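A hedged sketch of the harness's Python entry point for scoring a served or local model on benchmark tasks; the model arguments, task, and sample limit are illustrative.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                      # evaluate a Hugging Face model
    model_args="pretrained=gpt2",    # placeholder checkpoint
    tasks=["hellaswag"],
    limit=10,                        # small sample for a quick smoke test
)
print(results["results"])
```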
Automating the deployment of the Takeoff Server on AWS for LLMs