A high-throughput and memory-efficient inference and serving engine for LLMs
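For context, offline batch inference with vLLM's Python API looks roughly like this (a minimal sketch; the model name is just an example):

    from vllm import LLM, SamplingParams

    # Load an example model; vLLM manages KV-cache memory efficiently via PagedAttention.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Prompts are batched together for high throughput.
    outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
    for out in outputs:
        print(out.prompt, "->", out.outputs[0].text)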
The easiest way to serve AI apps and models - build reliable inference APIs, LLM apps, multi-model chains, RAG services, and much more!
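As a sketch of the service-definition style BentoML uses (assuming BentoML 1.2+; the toy echo endpoint stands in for a real model):

    import bentoml

    @bentoml.service
    class Echo:
        # A trivial API endpoint; a real service would wrap a model here.
        @bentoml.api
        def generate(self, prompt: str) -> str:
            return prompt.upper()

The service is then started with the bentoml CLI, which exposes the method as an HTTP endpoint.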
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
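A minimal sketch of Ray's core task API, which its AI libraries build on:

    import ray

    ray.init()  # start a local Ray runtime

    @ray.remote
    def square(x: int) -> int:
        return x * x

    # Tasks run in parallel across the cluster; futures are resolved with ray.get.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]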
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
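For illustration, a job can be launched programmatically with SkyPilot's Python API (a sketch; the command, accelerator, and cluster name are examples):

    import sky

    # Define a task and the resources it needs; SkyPilot picks an available
    # cloud/region that satisfies them at the lowest cost.
    task = sky.Task(run="python train.py")
    task.set_resources(sky.Resources(accelerators="A100:1"))
    sky.launch(task, cluster_name="demo")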
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
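The key idea is that each request selects a fine-tuned adapter at inference time while replicas share one base model. Assuming a LoRAX-style REST endpoint, a request might look like this (the URL, adapter name, and payload shape are illustrative assumptions):

    import requests

    # Hypothetical local endpoint; adapter_id selects which LoRA adapter
    # is applied on top of the shared base model for this request.
    resp = requests.post(
        "http://localhost:8080/generate",
        json={
            "inputs": "Summarize: LLM serving is ...",
            "parameters": {"adapter_id": "my-org/my-lora-adapter", "max_new_tokens": 64},
        },
    )
    print(resp.json())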
A production-ready REST API for vLLM.
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
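Because the endpoint is OpenAI-compatible, it can be queried with the standard openai client by pointing base_url at the deployment (the URL and model name below are placeholders):

    from openai import OpenAI

    # Point the client at your deployment instead of api.openai.com.
    client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(resp.choices[0].message.content)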
EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
Friendli: the fastest serving engine for generative AI
Okik is a serving framework for deploying LLMs and much more.
A ChatGPT (GPT-3.5) & GPT-4 workload trace to optimize LLM serving systems.
A library to benchmark LLMs via their APIs.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2K lines of code (2% of vLLM).
Stitch simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.
RayLLM - LLMs on Ray
A Framework For Intelligence Farming
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
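One way such an integration might look (a sketch, not the project's actual code; the model and resource settings are examples):

    from ray import serve
    from vllm import LLM, SamplingParams

    @serve.deployment(ray_actor_options={"num_gpus": 1})
    class VLLMServer:
        def __init__(self):
            # Each replica holds its own vLLM engine.
            self.llm = LLM(model="facebook/opt-125m")

        async def __call__(self, request):
            prompt = (await request.json())["prompt"]
            out = self.llm.generate([prompt], SamplingParams(max_tokens=64))
            return {"text": out[0].outputs[0].text}

    # Ray Serve handles replication and HTTP routing; vLLM handles batching.
    serve.run(VLLMServer.bind())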
Building static web applications using Large Language Models: from hand-sketched documents, images, and screenshots to proper web pages.