serving

Here are 104 public repositories matching this topic...

deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution

deep-learning deployment inference pytorch serving djl

Updated Jun 18, 2024
Java

ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Jun 18, 2024
Python

vespa-engine / vespa

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated Jun 18, 2024
Java

A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

structured-data serving unstructured-data unified-sql vector-database mysql-compatibility embedding-search embedding-store key-value-distributed-store vector-ocean real-time-semantic-search

Updated Jun 18, 2024
Java

Lightning-AI / LitServe

Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.

api ai serving

Updated Jun 18, 2024
Python

openvinotoolkit / model_server

A scalable inference server for models optimized with OpenVINO™

kubernetes machine-learning cloud ai deep-learning inference edge dag model-serving serving openvino

Updated Jun 18, 2024
C++

friendliai / friendli-client

Friendli: the fastest serving engine for generative AI

ai ml inference gpt inference-server mistral inference-engine serving mlops gpt3 llm stable-diffusion llms generative-ai llmops llm-serving llm-inference llama2 llm-ops

Updated Jun 18, 2024
Python

SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

kubernetes machine-learning deployment serving aiops production-machine-learning mlops machine-learning-operations

Updated Jun 17, 2024
HTML

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

performance gpu model production cuda efficiency inference transformer llama speculative serving llm llm-inference llama3

Updated Jun 17, 2024
C++

intel / intel-ai-inference-samples

Intel® AI Inference Samples provide example code for deploying optimized inference on Intel platforms.

sample ai intel inference bert serving ipex openvino

Updated Jun 17, 2024
Python

pytorch / serve

Serve, optimize and scale PyTorch models in production

docker kubernetes machine-learning cpu deep-learning metrics gpu optimization pytorch serving mlops

Updated Jun 18, 2024
Java

tensorflow / serving

A flexible, high-performance serving system for machine learning models

python machine-learning deep-neural-networks deep-learning neural-network cpp tensorflow ml serving

Updated Jun 15, 2024
C++

polyaxon / haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Updated Jun 11, 2024
Python

France-Travail / happy_vllm

A REST API for vLLM, production ready

production transformers api-rest serving mlops llm llm-serving vllm

Updated Jun 11, 2024
Python

torchpipe / torchpipe

Boosting DL Service Throughput 1.5-4x by Ensemble Pipeline Serving with Concurrent CUDA Streams for PyTorch/LibTorch Frontend and TensorRT/CVCUDA, etc., Backends