An Easy-to-use, Scalable and High-performance RLHF Framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you can run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.
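The "single line of code" works because servers like Xinference and vLLM expose an OpenAI-compatible HTTP API, so a client only needs to point at a different base URL. A minimal stdlib sketch of building such a request, assuming a hypothetical local server on port 9997 (Xinference's commonly documented default) and a hypothetical registered model name:

```python
import json
from urllib.request import Request

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style /chat/completions request for any
    OpenAI-compatible server (Xinference, vLLM, or api.openai.com)."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Swapping providers means changing only this base URL (and model name);
# both values here are assumptions for illustration.
req = build_chat_request(
    "http://localhost:9997/v1",
    "llama-2-chat",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)  # http://localhost:9997/v1/chat/completions
```

The same request shape is what official OpenAI SDKs emit, which is why a one-line `base_url` change is enough to redirect an existing app.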
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
A production-ready REST API for vLLM
Examples of serving LLM on Modal.
Run code inference-only benchmarks quickly using vLLM
Chat with Lex! A RAG app using HyDE, with Milvus as the vector store, vLLM for LLM inference, and FastEmbed for embeddings.
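HyDE (Hypothetical Document Embeddings) retrieves by embedding an LLM-drafted hypothetical answer rather than the raw question, on the idea that an answer-shaped text lands closer to the relevant documents in embedding space. A minimal sketch of the pattern, with `generate` and `embed` as toy stand-ins for the real LLM (e.g. vLLM) and embedder (e.g. FastEmbed), and an in-memory dict standing in for a vector DB like Milvus:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def hyde_search(question, generate, embed, index):
    """Return the doc id whose stored vector best matches the embedded
    hypothetical answer. `index` maps doc id -> embedding vector."""
    hypothetical_answer = generate(question)  # LLM drafts a plausible answer
    query_vec = embed(hypothetical_answer)    # embed the draft, not the question
    return max(index, key=lambda doc_id: cosine(query_vec, index[doc_id]))
```

In a real deployment the dict lookup becomes a Milvus similarity query, but the control flow (generate, then embed, then search) is the same.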
Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.
Genshin Impact character chat models, LoRA-tuned on LLMs
Carbon Limiting Auto Tuning for Kubernetes
Standardized spec and vendor-specific transforms for ChatML
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
A large-scale simulation framework for LLM inference
Evaluate open-source language models on agent, formatted-output, instruction-following, long-text, multilingual, coding, and custom task capabilities.
Accelerating LLM inference frameworks to make LLMs fly
A collection of completed LLM projects and a good starting point for learning about LLMs.