Model Inference and Deployment

We mainly provide the following ways to run inference and deploy the models locally.

llama.cpp

A tool for quantizing models and deploying them on a local CPU or GPU.

Link: llama.cpp-Deployment
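
If you prefer calling the quantized model from Python rather than the llama.cpp CLI, a minimal sketch using the llama-cpp-python bindings might look like this (the GGML file name is a placeholder, not an official release artifact):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path below is a placeholder for your own quantized GGML file.
from llama_cpp import Llama

llm = Llama(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder file name

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```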

🤗Transformers

The native Transformers inference method; supports CPU and GPU.

Link: Inference-with-Transformers
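
For reference, a minimal inference sketch with plain Transformers could look like the following; the model directory is a placeholder for your own merged checkpoint:

```python
# Minimal sketch of plain Transformers inference; the model directory is a placeholder
# for a merged checkpoint on disk. Requires torch and transformers (and accelerate
# for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/merged-chinese-alpaca"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("请介绍一下北京的景点。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```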

text-generation-webui

A tool for deploying models behind a web UI.

Link: text-generation-webui

LlamaChat

LlamaChat is a macOS app that allows you to chat with LLaMA, Alpaca, and similar models. It supports GGML (.bin) and PyTorch (.pth) formats.

Link: Using-LlamaChat-Interface

LangChain

LangChain is a framework for developing LLM-driven applications, designed to help developers build end-to-end applications on top of LLMs.

With the components and interfaces provided by LangChain, developers can easily design and build various LLM-powered applications such as question-answering systems, summarization tools, chatbots, code comprehension tools, information extraction systems, and more.

Link: Integrated-with-LangChain
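
As a rough sketch, a local model can be wrapped in a simple LangChain chain as shown below (import paths follow LangChain releases around the time of writing; the model path and prompt are placeholders):

```python
# Minimal sketch of wiring a local llama.cpp model into a LangChain LLMChain.
# The GGML model path is a placeholder.
from langchain import LLMChain, PromptTemplate
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder path

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the question concisely.\nQuestion: {question}\nAnswer:",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What does quantization do to a model?"))
```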

privateGPT

privateGPT is an open-source project built on llama-cpp-python, LangChain, and other libraries. It provides an interface for fully local document analysis and interactive Q&A with large models: users can analyze local documents and ask questions about their content using GPT4All- or llama.cpp-compatible model files, keeping all data on the local machine for privacy.

Link: Use-privateGPT-for-multi-document-QA
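
For illustration only, the kind of local document-QA pipeline that privateGPT automates could be sketched with LangChain components as follows (this is not privateGPT's own code; file names and model paths are placeholders):

```python
# Illustrative sketch of a fully local document-QA pipeline, similar in spirit to
# what privateGPT automates. File names and model paths are placeholders.
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and split a local document into chunks.
docs = TextLoader("local_notes.txt").load()  # placeholder document
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Build a local vector index with local embeddings (no data leaves the machine).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_documents(chunks, embeddings)

# Answer questions over the indexed documents with a local llama.cpp model.
llm = LlamaCpp(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder path
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.run("What are the main points of the document?"))
```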
