This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA's TensorRT-LLM for GPU-accelerated inference on NVIDIA GPUs.


menloresearch/cortex.tensorrt-llm




Languages

  • C++ 99.3%
  • Python 0.5%
  • Cuda 0.2%
  • CMake 0.0%
  • Shell 0.0%
  • Smarty 0.0%