Multi-Model and multi-tasking llama Discord Bot. Mirror of: https://gitlab.com/niansa/discord_llama (C++, updated Mar 27, 2024)
Leverage tensor parallelism techniques to run large language models in the CPU memory of edge devices.
An easy-to-use library for LLaMA/GPT-J inference. Mirror of: https://gitlab.com/niansa/libjustlm
Local LLM Inference
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Local LLM inference library
LLM in Godot
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
A high-performance inference system for large language models, designed for production environments.
LLMs as Copilots for Theorem Proving in Lean
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
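The idea behind these tensor-parallel projects can be illustrated with a minimal sketch (NumPy only, not the API of any repository listed here): a layer's weight matrix is split column-wise across devices, so each device stores only its shard, computes a partial output, and the shards are concatenated to recover the full result. Memory per device shrinks roughly in proportion to the number of shards.

```python
import numpy as np

# Hypothetical 4-"device" setup: split the weight matrix column-wise
# so each device holds 1/4 of the parameters.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))          # activation for one token
W = rng.standard_normal((8, 16))         # full layer weight

n_devices = 4
shards = np.split(W, n_devices, axis=1)  # each shard is 8x4

# Each device multiplies against its own shard independently
# (this is the part that runs in parallel across machines),
# then the partial outputs are concatenated.
partials = [x @ s for s in shards]
y_parallel = np.concatenate(partials, axis=1)

# Matches the single-device computation.
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In a real deployment the concatenation step is a network collective (e.g. all-gather) rather than a local `np.concatenate`, which is why these projects target fast local networks between edge devices.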
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs