# LangChain QA Application with Xinference and Milvus


This demo walks through how to build an LLM-driven question-answering (QA) application with Xinference, Milvus, and LangChain.

## Deploy Xinference Locally or in a Distributed Cluster

For local deployment, run `xinference`. It will log an endpoint for you to use.

To deploy Xinference in a cluster, first start an Xinference supervisor with `xinference-supervisor`. Use the `-p` option to specify the port and `-H` to specify the host. The default port is 9997; if it is already in use, Xinference will choose an unused port for you. The supervisor also logs the endpoint for you to use.

Then, start the Xinference workers using `xinference-worker` on each server you want to run them on. 

You can consult the README file from [Xinference](https://github.com/xorbitsai/inference) for more information.
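
The port fallback described above can be pictured with a small sketch. This is only an illustration of the general idea (try the preferred port, otherwise let the operating system pick a free one), not Xinference's actual implementation; `pick_port` is a hypothetical helper name.

```python
import socket

DEFAULT_PORT = 9997  # default port stated in this demo


def pick_port(host: str = "127.0.0.1", preferred: int = DEFAULT_PORT) -> int:
    """Return `preferred` if it can be bound on `host`, else an OS-chosen free port."""
    # Port 0 asks the operating system for any currently unused port.
    for candidate in (preferred, 0):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue  # this candidate is unavailable; try the next one
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")


port = pick_port()
print(f"endpoint: http://127.0.0.1:{port}")
```

The probe socket is closed before the function returns, so another process could still claim the port in the meantime; a real server would keep the socket it bound.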

## Start a Model

To use Xinference with LangChain, you first need to launch a model. You can do so from the command line interface (CLI):

In [2]:
!xinference launch --model-name "falcon-instruct" --model-format pytorch --size-in-billions 40 -e "http://127.0.0.1:56256"

Model uid: 19c73cee-3506-11ee-b286-fa163e74fa2d


The command will return a model UID for you to use.
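
If you capture the CLI output in a script, the UID can be pulled out with a regular expression. The sketch below assumes the output contains a line of the form `Model uid: <uid>`, as in the cell above; `extract_model_uid` is a hypothetical helper, not part of Xinference.

```python
import re
from typing import Optional

# Matches a line such as "Model uid: 19c73cee-3506-11ee-b286-fa163e74fa2d".
UID_PATTERN = re.compile(r"^Model uid:\s*(\S+)\s*$", re.MULTILINE)


def extract_model_uid(cli_output: str) -> Optional[str]:
    """Return the model UID found in the CLI output, or None if there is none."""
    match = UID_PATTERN.search(cli_output)
    return match.group(1) if match else None


sample_output = "Model uid: 19c73cee-3506-11ee-b286-fa163e74fa2d\n"
model_uid = extract_model_uid(sample_output)
print(model_uid)
```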

## Prepare the Documents

In [3]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("/home/nijiayi/inference/examples/state_of_the_union.txt") # Replace with the path of the document you want to query from

documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 512,
    chunk_overlap  = 100,
    length_function = len,
)
docs = text_splitter.split_documents(documents)
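
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified, character-only sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); `split_with_overlap` is a hypothetical function used only to illustrate overlapping windows.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Tiny values make the overlap visible: each chunk repeats the previous chunk's last character.
print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.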


## Set Up an Embedding Model

In [4]:
from langchain.embeddings import XinferenceEmbeddings

xinference_embeddings = XinferenceEmbeddings(
    server_url="http://127.0.0.1:56256", 
    model_uid = "19c73cee-3506-11ee-b286-fa163e74fa2d" # model_uid is the uid returned from launching the model
)

## Connect to the Vector Database

For the vector store, we use Milvus. [Milvus](https://milvus.io/docs/overview.md) is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning models. To run it, you can either [Install Milvus Standalone with Docker Compose](https://milvus.io/docs/install_standalone-docker.md) or use Milvus Lite as follows:

In [None]:
$ pip install milvus

$ milvus-server

In [None]:
from langchain.vectorstores import Milvus

vector_db = Milvus.from_documents(
    docs,
    xinference_embeddings,
    connection_args={"host": "0.0.0.0", "port": "19530"},
)

## Query about the Document

In [6]:
query = "what does the president say about Ketanji Brown Jackson"
docs = vector_db.similarity_search(query, k=10)
print(docs[0].page_content) 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
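
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The following is a brute-force sketch with toy two-dimensional vectors and cosine similarity; the metric and index Milvus actually uses depend on its configuration, and the names and vectors here are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], indexed: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Return the texts of the k stored vectors most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones come from the embedding model and have many more dimensions.
toy_index = [
    ("nomination", [1.0, 0.0]),
    ("economy", [0.0, 1.0]),
    ("court", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], toy_index, k=2))
```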


### Model Inference Based on the Document

Now we use the Llama 2 Chat model, supported by Xinference, for the inference task.

In [7]:
!xinference launch --model-name "llama-2-chat" --model-format ggmlv3 --size-in-billions 70 -e "http://127.0.0.1:56256"

Model uid: 333e1d68-3507-11ee-a0d6-fa163e74fa2d


In [17]:
from langchain.llms import Xinference

xinference_llm = Xinference(
    server_url="http://127.0.0.1:56256",
    model_uid = "333e1d68-3507-11ee-a0d6-fa163e74fa2d" # model_uid is the uid returned from launching the model
)

First, we can query the LLM without using the document:

In [18]:
xinference_llm(prompt="What did the president say about Ketanji Brown Jackson?")

'\nWhat did the president say about Ketanji Brown Jackson?\nPresident Joe Biden called Judge Ketanji Brown Jackson a "historic" and "inspiring" nominee when he introduced her as his pick to replace retiring Supreme Court Justice Stephen Breyer. He highlighted her experience as a public defender and her commitment to justice and equality, saying that she would bring a unique perspective to the court.\n\nBiden also praised Jackson\'s reputation for being a "fair-minded" and "thoughtful" jurist who is known for her ability to build'

We now query using the document to compare the results. First, we create a memory object to track the chat history.

In [19]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

Now we create a `ConversationalRetrievalChain` from the LLM and the vector store.

In [20]:
from langchain.chains import ConversationalRetrievalChain

chain = ConversationalRetrievalChain.from_llm(
    llm=xinference_llm,
    retriever=vector_db.as_retriever(),
    memory=memory)

Now, we can query information from the document. Instead of simply returning identical sentences from the document, the model generates responses by summarizing relevant content. Furthermore, it can relate a new query to the chat history, creating a chain of responses that build upon each other. 

In [21]:
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query})
result["answer"]

' According to the provided text, President Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago, and that she is one of our nation’s top legal minds who will continue Justice Breyer’s legacy of excellence.'
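
The answer is grounded in the document because the retrieved chunks are placed into the prompt alongside the question. A minimal sketch of that idea follows; the template wording and the `build_context_prompt` helper are assumptions for illustration and not LangChain's actual prompt.

```python
from typing import List


def build_context_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and a question into a single prompt string."""
    context = "\n\n".join(passage.strip() for passage in passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


retrieved = [
    "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.",
    "One of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.",
]
prompt = build_context_prompt("What did the president say about Ketanji Brown Jackson", retrieved)
print(prompt)
```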

We can see that the LLM is capable of using the provided document to answer questions and summarize content. We can ask a few more questions:

In [22]:
query = "Did he mention who she succeeded"
result = chain({"question": query})
result["answer"]

'  According to the given text, President Biden said that Ketanji Brown Jackson succeeded Justice Breyer on the Supreme Court.'

The LLM accurately recognizes that "he" refers to "the president", and "she" refers to "Ketanji Brown Jackson" mentioned in the previous query. 
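
Pronouns can be resolved because the chat history travels with each follow-up question: before retrieval, the chain asks the LLM to rewrite the follow-up as a standalone question. The sketch below shows one way such a rewriting prompt could be assembled; the wording and the `build_condense_prompt` helper are hypothetical, not LangChain's own template.

```python
from typing import List, Tuple


def build_condense_prompt(chat_history: List[Tuple[str, str]], follow_up: str) -> str:
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone question."""
    history_text = "\n".join(
        f"Human: {question}\nAssistant: {answer}" for question, answer in chat_history
    )
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Follow-up question: {follow_up}\nStandalone question:"
    )


history = [
    (
        "What did the president say about Ketanji Brown Jackson",
        "He said he nominated her to serve on the United States Supreme Court.",
    )
]
condense_prompt = build_condense_prompt(history, "Did he mention who she succeeded")
print(condense_prompt)
```

With the history in the prompt, the model has what it needs to replace "he" and "she" with the names from the earlier turn.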

In [24]:
query = "Summarize the President's opinion on COVID-19"
result = chain({"question": query})
result['answer']

'  According to the text, the president views COVID-19 as a "God-awful disease" and wants to move forward in addressing it in a unified manner, rather than allowing it to continue being a partisan dividing line.'

These results show that the LLM can answer questions grounded in the retrieved document, and that LangChain's "chaining" feature, together with the chat-history memory, allows for more coherent and context-aware interactions with the model.