Model Inference and Deployment

We mainly provide the following ways to run inference and deploy the models locally.

llama.cpp

A tool for quantizing models and deploying them on a local CPU or GPU.

Link: llama.cpp-Deployment
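
If you prefer calling the quantized model from Python rather than the llama.cpp CLI, a minimal sketch using the llama-cpp-python bindings might look like this (the GGML file name is a placeholder, not an official release artifact):

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python).
# The model path below is a placeholder for your own quantized GGML file.
from llama_cpp import Llama

llm = Llama(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder file name

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```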

🤗Transformers

The native Transformers inference method; supports CPU and GPU.

Link: Inference-with-Transformers
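
For reference, a minimal inference sketch with plain Transformers could look like the following; the model directory is a placeholder for your own merged checkpoint:

```python
# Minimal sketch of plain Transformers inference; the model directory is a placeholder
# for a merged checkpoint on disk. Requires torch and transformers (and accelerate
# for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/merged-chinese-alpaca"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("请介绍一下北京的景点。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```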

text-generation-webui

A tool for deploying models behind a web UI.

Link: text-generation-webui

LlamaChat

LlamaChat is a macOS app that allows you to chat with LLaMA, Alpaca, and similar models. It supports GGML (.bin) and PyTorch (.pth) formats.

Link: Using-LlamaChat-Interface

LangChain

LangChain is a framework for developing LLM-driven applications, designed to help developers build end-to-end applications on top of LLMs.

With the components and interfaces provided by LangChain, developers can easily design and build various LLM-powered applications such as question-answering systems, summarization tools, chatbots, code comprehension tools, information extraction systems, and more.

Link: Integrated-with-LangChain
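
As a rough sketch, a local model can be wrapped in a simple LangChain chain as shown below (import paths follow LangChain releases around the time of writing; the model path and prompt are placeholders):

```python
# Minimal sketch of wiring a local llama.cpp model into a LangChain LLMChain.
# The GGML model path is a placeholder.
from langchain import LLMChain, PromptTemplate
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder path

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the question concisely.\nQuestion: {question}\nAnswer:",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What does quantization do to a model?"))
```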

privateGPT

privateGPT is an open-source project built on llama-cpp-python, LangChain, and other libraries. It provides an interface for fully local document analysis and interactive Q&A with large models: users can analyze local documents and ask questions about their content using GPT4All- or llama.cpp-compatible model files, keeping all data on the local machine for privacy.

Link: Use-privateGPT-for-multi-document-QA
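
For illustration only, the kind of local document-QA pipeline that privateGPT automates could be sketched with LangChain components as follows (this is not privateGPT's own code; file names and model paths are placeholders):

```python
# Illustrative sketch of a fully local document-QA pipeline, similar in spirit to
# what privateGPT automates. File names and model paths are placeholders.
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load and split a local document into chunks.
docs = TextLoader("local_notes.txt").load()  # placeholder document
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Build a local vector index with local embeddings (no data leaves the machine).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_documents(chunks, embeddings)

# Answer questions over the indexed documents with a local llama.cpp model.
llm = LlamaCpp(model_path="zh-alpaca-ggml-q4_0.bin", n_ctx=2048)  # placeholder path
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.run("What are the main points of the document?"))
```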
