Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
-
Updated
Mar 15, 2024 - C++
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Add a description, image, and links to the moe topic page so that developers can more easily learn about it.
To associate your repository with the moe topic, visit your repo's landing page and select "manage topics."