Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
-
Updated
May 30, 2024 - C++
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
Nitro is an C++ inference server on top of TensorRT-LLM. OpenAI-compatible API. Run blazing fast inference on Nvidia GPUs. Used in Jan
Whisper in TensorRT-LLM
Add a description, image, and links to the tensorrt-llm topic page so that developers can more easily learn about it.
To associate your repository with the tensorrt-llm topic, visit your repo's landing page and select "manage topics."