# Question-Answering RAG Chatbot

We can use an LLM to answer questions about our own dataset. This technique is called retrieval-augmented generation (RAG): the pipeline retrieves the data relevant to a question and passes it to the LLM as additional context. Instead of relying solely on knowledge from its training data, a RAG workflow connects a static LLM to a data retrieval source, such as a vector database.
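To make the retrieve-then-augment idea concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration, not the pipeline used later in this notebook: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for the vector database, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with the retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "Qdrant stores document embeddings as vectors.",
    "Gradio builds web interfaces for Python functions.",
]
prompt = build_prompt("Where are document embeddings stored?", documents)
print(prompt)
```

In a real pipeline the prompt would be sent to the LLM, and both the embedding function and the similarity search would be handled by the embedding model and the vector database.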

In the following example we create a chatbot with Gradio and use a self-hosted Qdrant vector database to store the document embeddings used in the RAG pipeline.

**Note:** For embeddings, we strongly recommend creating an account on Cohere's website (https://dashboard.cohere.com/welcome/login?redirect_uri=%2Fapi-keys) and generating a `Trial key`.

We will start by installing the prerequisites:

In [None]:
!python3 -m pip install openai gradio langchain_qdrant langchain-openai pypdf langchain_cohere==0.1.9 langchain-huggingface langchain_experimental langchain-nvidia-ai-endpoints

Start the application and click on the public URL to open the application in a new tab.

In [10]:
from gradio_chatbot import setup_chatbot, create_empty_collections, clean_up_vector_db
from langchain_cohere import CohereEmbeddings
from langchain_openai import ChatOpenAI
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from openai import OpenAI
from langchain import hub


QDRANT_URL = "http://qdrant:6333"

embedding_models = {
    "1-nv-embedqa-e5-v5": {
        "model": "nvidia/nv-embedqa-e5-v5",
        "embedding_function": lambda api_key: NVIDIAEmbeddings(
            base_url = "http://nemo-embedding-ms:8080/v1",
            model="nvidia/nv-embedqa-e5-v5"
        ),
        "size": 1024
    }
}

llm_models = { 
    "1-mistral-7b-instruct-with-stained-glass": {
        "model": "mistral-7b-instruct",
        "llm_function": lambda api_key, kwargs={}: ChatOpenAI(
            base_url="http://stained-glass-engine-proxy.default.svc.cluster.local/v1",  # URL of the Stained Glass Proxy service
            model="mistral-7b-instruct",  # Only the `mistral-7b-instruct` model is available
            api_key="anything",  # Any API key can be used
        )
    }
}

clean_up_vector_db(QDRANT_URL)
create_empty_collections(QDRANT_URL, embedding_models)
demo = setup_chatbot(llm_models, embedding_models, QDRANT_URL)
demo.queue().launch(share=True)

Discovered model is: meta/llama3-8b-instruct




Running on local URL:  http://127.0.0.1:7872
Running on public URL: https://c50696e01b772eab3a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




File saved to: /home/jovyan/examples/data/Node Name Research - Kubernetes Engine - OCI Confluence.pdf
File saved to: /home/jovyan/examples/data/Jabra Elite 3 Active User Manual_EN_English_RevB.pdf
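
The `embedding_models` and `llm_models` dictionaries in the cell above map a display name to a factory function, so a client is only constructed once a model is selected. The sketch below illustrates that pattern in isolation. It assumes nothing about the internals of `gradio_chatbot`: `FakeClient`, `registry`, and `load_model` are hypothetical stand-ins.

```python
class FakeClient:
    """Hypothetical stand-in for a real client such as ChatOpenAI."""

    def __init__(self, model, api_key):
        self.model = model
        self.api_key = api_key


# Same shape as `llm_models`: display name -> metadata plus a factory.
registry = {
    "1-example-model": {
        "model": "example-model",
        "llm_function": lambda api_key, kwargs=None: FakeClient(
            model="example-model",
            api_key=api_key,
        ),
    }
}


def load_model(registry, name, api_key):
    """Look up the selected entry and build its client on demand."""
    entry = registry[name]
    return entry["llm_function"](api_key)


client = load_model(registry, "1-example-model", api_key="anything")
print(client.model)
```

Keeping the constructor behind a function means the API key entered in the interface can be supplied at load time rather than when the dictionary is defined.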


How the application works:

1. Select the LLM and fill in the LLM API key.
2. Select the embedding model and fill in the embedding API key (if required).
3. Click Load Model.

   If the models load successfully, you should see: `'<selected llm model>' and '<selected embeddings model>' models loaded`

4. Upload a document to use for RAG (txt and pdf are supported).
5. Click Create Vector Store.
6. Use the chatbot interface to interact with the LLM.
7. If you change any Text Generation parameter, click Load Model again.
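
Before a document can be embedded and stored, it is typically split into chunks. How `gradio_chatbot` does this when you click Create Vector Store is not shown in this notebook, so the following is only a sketch under stated assumptions: fixed-size character chunks with overlap, with `chunk_text` and its parameter values chosen purely for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_text(sample, chunk_size=12, overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk would then be embedded and written to the vector store. The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.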