## RAG example with LangChain, Milvus, and vLLM

Requirements:
- A Milvus instance, either standalone or cluster.
- Milvus connection credentials, available as the environment variables `MILVUS_USERNAME` and `MILVUS_PASSWORD`.
- A vLLM inference endpoint. This example uses vLLM's OpenAI-compatible API.

### Needed packages and imports

In [1]:
!pip install -q einops==0.7.0 langchain==0.1.9 pymilvus==2.3.6 sentence-transformers==2.4.0 openai==1.13.3




In [2]:
import os
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import RetrievalQA
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import VLLMOpenAI
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import Milvus

#### Base parameters, inference server and Milvus info

In [3]:
# Replace values according to your vLLM and Milvus deployments
INFERENCE_SERVER_URL = "http://vllm.vllm.svc.cluster.local:8000/v1/"
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"
MAX_TOKENS = 1024
TOP_P = 0.95
TEMPERATURE = 0.01
PRESENCE_PENALTY = 1.03
MILVUS_HOST = "vectordb-milvus.milvus.svc.cluster.local"
MILVUS_PORT = 19530
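# Read the credentials from the environment (see Requirements); empty if unset
MILVUS_USERNAME = os.environ.get("MILVUS_USERNAME", "")
MILVUS_PASSWORD = os.environ.get("MILVUS_PASSWORD", "")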
MILVUS_COLLECTION = "estatutos_galicia"

#### Initialize the connection

In [4]:
model_kwargs = {'trust_remote_code': True}
embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs=model_kwargs,
    show_progress=False
)

store = Milvus(
    embedding_function=embeddings,
    connection_args={"host": MILVUS_HOST, "port": MILVUS_PORT, "user": MILVUS_USERNAME, "password": MILVUS_PASSWORD},
    collection_name=MILVUS_COLLECTION,
    metadata_field="metadata",
    text_field="page_content",
    drop_old=False
    )
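
Before wiring the store into a chain, it can be useful to confirm that the collection actually returns documents for a test query. The following is a minimal sketch, not part of the original notebook: `preview_hits` is a hypothetical helper, and it assumes only that the store exposes LangChain's `similarity_search(query, k=...)` method and that each document carries a `source` entry in its metadata, as the later cells suggest.

```python
def preview_hits(store, query, k=2, width=80):
    """Return (source, snippet) pairs for the top-k similarity hits.

    Hypothetical helper: `store` is any object with a LangChain-style
    `similarity_search(query, k=...)` method, such as the Milvus store above.
    """
    docs = store.similarity_search(query, k=k)
    return [
        # Fall back to "unknown" when a document has no source metadata.
        (doc.metadata.get("source", "unknown"), doc.page_content[:width])
        for doc in docs
    ]
```

For example, `preview_hits(store, "ARTÍCULO SEGUNDO")` would return up to two pairs if the collection is populated. An empty list may indicate that the collection name or the embedding model does not match the ingested data.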



#### Initialize query chain

In [5]:
template="""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant named HatBot answering questions.
You will be given a question you need to answer, and a context to provide you with information. You must answer the question based as much as possible on this context.
Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Context: 
{context}

Question: {question} [/INST]
"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=INFERENCE_SERVER_URL,
    model_name=MODEL_NAME,
    max_tokens=MAX_TOKENS,
    top_p=TOP_P,
    temperature=TEMPERATURE,
    presence_penalty=PRESENCE_PENALTY,
    streaming=True,
    verbose=False,
    callbacks=[StreamingStdOutCallbackHandler()]
)

qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=store.as_retriever(
            search_type="similarity",
            search_kwargs={"k": 2}
            ),
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
        return_source_documents=True
        )

# Disable Hugging Face tokenizers parallelism to avoid its fork-related warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"
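
To make the role of `{context}` and `{question}` concrete: with its default chain type, `RetrievalQA` concatenates the text of the retrieved documents and substitutes it into the prompt. The sketch below imitates that step in plain Python so it can be run without a server. `build_prompt`, the shortened template, and the blank-line separator are illustrative assumptions rather than LangChain's actual implementation.

```python
# Shortened stand-in for the notebook's template (assumption, for illustration).
TEMPLATE = "[INST] Context:\n{context}\n\nQuestion: {question} [/INST]"


def build_prompt(chunks, question, template=TEMPLATE, separator="\n\n"):
    """Join retrieved text chunks and fill the prompt placeholders."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


if __name__ == "__main__":
    chunks = ["First retrieved chunk.", "Second retrieved chunk."]
    print(build_prompt(chunks, "What does the second chunk say?"))
```

Because the retriever is configured with `k=2`, at most two chunks are placed in the context.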

#### Query example

In [35]:
# English: "Explain the SECOND ARTICLE of the Loma Negra statute."
question = "Explícame el ARTÍCULO SEGUNDO del estatuto de Loma Negra."
result = qa_chain.invoke({"query": question})

The second article of the Loma Negra Statute, as provided in the context you've given, outlines the requirements for publishing notices for Assemblies. According to this article, notices for Assemblies must be published in the Official Bulletin and one of Argentina's major newspapers at least 10 days before the event and no more than 30 days prior. The notice should include the character of the Assembly, date, time, and location, as well as the Agenda. If a second Assembly is needed due to the failure of the first, it must be held within 30 days and notices published for three days with at least eight days' advance notice. An Ordinary Assembly can be called simultaneously for the first and second convocation. If the second Assembly is scheduled for the same day as the first, there must be an hour-long interval between them.

Additionally, Article 14 requires consent or ratification from the affected Classes when the Assembly needs to adopt resolutions affecting their rights. This conse

In [9]:
question = "By whom will the Ordinary and/or Extraordinary Assemblies be convened?"
result = qa_chain.invoke({"query": question})

The context provided does not contain information on who is responsible for convening the Ordinary and/or Extraordinary Assemblies for Molinos Río de la Plata Sociedad Anónima. According to the statute social, the Asamblea (Assembly) can be modified in terms of its convocation by the Comisión Nacional de Valores (National Securities Commission), but it does not specify who initiates the process.

#### Retrieve source

In [7]:
def remove_duplicates(input_list):
    """Return the unique `source` metadata values, in first-seen order."""
    unique_list = []
    for item in input_list:
        source = item.metadata['source']
        if source not in unique_list:
            unique_list.append(source)
    return unique_list

results = remove_duplicates(result['source_documents'])

for s in results:
    print(s)

https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/getting_started_with_red_hat_openshift_ai_self-managed/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/openshift_ai_tutorial_-_fraud_detection_example/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/working_on_data_science_projects/index
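
When an answer looks off, it can also help to see how many of the retrieved chunks came from each source. This is a small sketch under the same assumption as the cell above, namely that each document has a `metadata['source']` entry. `count_sources` is a hypothetical name, and the stand-in `Doc` class exists only so the example runs without a vector store.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_sources(docs):
    """Count retrieved chunks per source, keeping first-seen order."""
    return Counter(doc.metadata["source"] for doc in docs)


if __name__ == "__main__":
    docs = [
        Doc("chunk 1", {"source": "a.pdf"}),
        Doc("chunk 2", {"source": "b.pdf"}),
        Doc("chunk 3", {"source": "a.pdf"}),
    ]
    for source, n in count_sources(docs).items():
        print(f"{source}: {n} chunk(s)")
```

With the notebook's chain, the call would be `count_sources(result['source_documents'])`.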
