tensorrt-llm

Here are 3 public repositories matching this topic...

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan

Nitro is a C++ inference server on top of TensorRT-LLM. OpenAI-compatible API. Run blazing fast inference on NVIDIA GPUs. Used in Jan

Whisper in TensorRT-LLM
