GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Find secrets with Gitleaks 🔑
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and putting LLM applications into production).
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud (see the example sketch after this list).
Official inference library for Mistral models
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving for Local Deployment
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
🚀 A best-in-class mobile real-time conversational digital human. Supports local deployment and multimodal interaction (voice, text, facial expressions) with response latency under 1.5 seconds, suited to livestreaming, education, customer service, finance, and government scenarios with strict privacy and real-time requirements. Works out of the box and is developer friendly.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Superduper: End-to-end framework for building custom AI applications and agents.
Standardized Serverless ML Inference Platform on Kubernetes
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
FlashInfer: Kernel Library for LLM Serving
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
The edge and AI gateway for agentic apps. Arch handles the messy low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to any LLM. It is a language- and framework-friendly infrastructure layer designed to help you build and ship agentic apps faster.
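Several of the projects above serve open-source models behind an OpenAI-compatible HTTP API. The following is a minimal sketch of querying such an endpoint; the base URL http://localhost:8000 and the model name "llama-3" are placeholder assumptions, not values taken from any specific project.

    # Minimal sketch of calling an OpenAI-compatible chat completions endpoint.
    # Assumes a local server is already running at http://localhost:8000 and
    # serving a model registered under the placeholder name "llama-3".
    import requests

    response = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={
            "model": "llama-3",
            "messages": [
                {"role": "user", "content": "Summarize what LLM inference is in one sentence."}
            ],
            "max_tokens": 64,
        },
        timeout=60,
    )
    response.raise_for_status()
    # The response follows the OpenAI chat completions schema.
    print(response.json()["choices"][0]["message"]["content"])

Because the request and response shapes follow the OpenAI schema, the same snippet works unchanged against any of the servers above that advertise OpenAI compatibility; only the base URL and model name need to change.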