llm-inference
Here are 16 public repositories matching this topic...
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs (C++, updated May 20, 2024)
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference (C++, updated May 29, 2024; see the usage sketch after this list)
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage (C++, updated May 28, 2024; a sharding sketch follows the list)
LLMs as Copilots for Theorem Proving in Lean (C++, updated May 29, 2024)
A high-performance inference system for large language models, designed for production environments (C++, updated May 28, 2024)
Inferflow is an efficient and highly configurable inference engine for large language models (C++, updated Mar 15, 2024)
LLM in Godot (C++, updated May 29, 2024)
Local LLM inference library (C++, updated May 20, 2024)
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9 (C++, updated May 29, 2024)
Local LLM inference (C++, updated May 26, 2024)
An easy-to-use library for LLaMA/GPT-J inference. Mirror of https://gitlab.com/niansa/libjustlm (C++, updated Mar 25, 2024)
A multi-model, multi-tasking LLaMA Discord bot. Mirror of https://gitlab.com/niansa/discord_llama (C++, updated Mar 27, 2024)
Leverage tensor parallelism techniques to run large language models in the CPU memory of edge devices (C++, updated May 28, 2024; see the sharding sketch below)
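For the OpenVINO entry above, here is a minimal sketch (not code from the repository) of how a model is typically loaded and run through the OpenVINO 2.x C++ API; the file name "model.xml" and the "CPU" device string are placeholders, and error handling is omitted.

```cpp
// Sketch only: typical OpenVINO 2.x C++ inference flow.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");               // IR or ONNX model file (placeholder path)
    ov::CompiledModel compiled = core.compile_model(model, "CPU");
    ov::InferRequest request = compiled.create_infer_request();

    ov::Tensor input = request.get_input_tensor();
    // ... fill input.data<float>() with application data ...
    request.infer();                                          // synchronous inference

    ov::Tensor output = request.get_output_tensor();
    // ... read results from output.data<float>() ...
    return 0;
}
```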
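The two tensor-parallelism entries above describe splitting a model across workers so that each one holds only a fraction of the weights. Below is a self-contained illustrative sketch of that idea (not code from either repository): a matrix-vector product sharded row-wise across threads, where each worker allocates and reads only its own weight shard, so per-worker memory shrinks roughly with the number of shards.

```cpp
// Illustrative row-wise tensor parallelism for y = W x.
#include <cstddef>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical shard type: rows [row_begin, row_end) of the full weight matrix W.
struct Shard {
    std::size_t row_begin, row_end, cols;
    std::vector<float> weights;  // (row_end - row_begin) * cols values, row-major
};

// Each worker computes only its own rows of y.
void matvec_shard(const Shard& s, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t r = s.row_begin; r < s.row_end; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < s.cols; ++c)
            acc += s.weights[(r - s.row_begin) * s.cols + c] * x[c];
        y[r] = acc;  // disjoint output slice per worker, so no locking is needed
    }
}

// One worker thread per shard; the memory for W is divided across the shards.
void parallel_matvec(const std::vector<Shard>& shards,
                     const std::vector<float>& x, std::vector<float>& y) {
    std::vector<std::thread> workers;
    for (const Shard& s : shards)
        workers.emplace_back(matvec_shard, std::cref(s), std::cref(x), std::ref(y));
    for (std::thread& t : workers) t.join();
}

int main() {
    // Toy 4x2 weight matrix split into two 2-row shards; x = (1, 1).
    std::vector<Shard> shards = {
        {0, 2, 2, {1.f, 2.f, 3.f, 4.f}},
        {2, 4, 2, {5.f, 6.f, 7.f, 8.f}},
    };
    std::vector<float> x = {1.0f, 1.0f}, y(4, 0.0f);
    parallel_matvec(shards, x, y);
    for (float v : y) std::cout << v << ' ';  // expected: 3 7 11 15
    std::cout << '\n';
}
```

In a real multi-device setup the shards would live on different GPUs or hosts and partial results would be exchanged over an interconnect, but the memory-splitting principle is the same.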