Serving Inside PyTorch With Multiple Threads
Topics: deployment, inference, pytorch, ray, serve, tensorrt, serving, pipeline-parallelism, torch2trt, triton-inference-server, llm-serving
Updated Jul 19, 2024 - C++