# Build EdgeDB RAG

## Overview

To create a question-answering chatbot, we start by **building the index**.

The index is made of two components:

- A vectorstore, which stores document embeddings and performs vector search.
- A docstore, which stores the full documents.

To build an index, we need the documents stored as Markdown files, along with a metadata file in JSON Lines format.
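
How the two components work together can be sketched with a toy example. This is only an illustration and not how `Index` is implemented: the `ToyIndex` class, the `toy_embed` function (letter frequencies instead of a real embedding model), and the ID that links an embedding to its full document are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass, field


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: normalized letter counts."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


@dataclass
class ToyIndex:
    # vectorstore: (embedding, doc_id) pairs used for similarity search
    vectorstore: list[tuple[list[float], str]] = field(default_factory=list)
    # docstore: doc_id -> full document text
    docstore: dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        self.docstore[doc_id] = text
        self.vectorstore.append((toy_embed(text), doc_id))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = toy_embed(query)
        # Rank by dot product (cosine similarity, since vectors are normalized).
        scored = sorted(
            self.vectorstore,
            key=lambda item: sum(a * b for a, b in zip(q, item[0])),
            reverse=True,
        )
        # Search runs over embeddings; the docstore returns the full text.
        return [self.docstore[doc_id] for _, doc_id in scored[:k]]


toy_index = ToyIndex()
toy_index.add("a", "access policies restrict objects")
toy_index.add("b", "zzz zzz zzz")
print(toy_index.search("zzz", k=1))
```

The point of the split is that similarity search only needs the embeddings, while the generator needs the full text, so each store can be kept and persisted separately.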


The process of responding to a user message includes the following stages:

1. **Contextualization**: Turning a message into a search query using chat history.
2. **Query analysis**: Breaking down the search query into a similarity search part and a filter.
3. **Retrieval**: Using vectorstore to retrieve relevant documents with similarity search.
4. **Generation**: Producing the final answer based on documents and chat history.
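
The data flow through these four stages can be sketched with plain functions. This is a rough illustration only: every function below is a hypothetical stub, the LLM calls are replaced by trivial string logic, and the `changelog` category name is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SearchTerms:
    query: str     # text used for similarity search
    category: str  # metadata filter


def contextualize(message: str, history: list[str]) -> str:
    # A real implementation would ask an LLM to rewrite the message as a
    # standalone query; this stub just prepends the previous turn.
    return f"{history[-1]} {message}" if history else message


def analyze(query: str) -> SearchTerms:
    # A real implementation would use function calling; this stub picks
    # the category with a keyword check.
    category = "changelog" if "changelog" in query.lower() else "edgedb_general"
    return SearchTerms(query=query, category=category)


def retrieve(terms: SearchTerms, docs: list[dict]) -> list[dict]:
    # Filter by category, then rank by word overlap as a stand-in for vector search.
    words = set(terms.query.lower().split())
    candidates = [d for d in docs if d["category"] == terms.category]
    return sorted(
        candidates, key=lambda d: -len(words & set(d["text"].lower().split()))
    )


def generate(message: str, docs: list[dict]) -> str:
    # A real implementation would prompt an LLM with the documents and chat history.
    return f"Answer to {message!r} based on {len(docs)} document(s)."


docs = [
    {"text": "Access policies control object visibility", "category": "edgedb_general"},
    {"text": "Release notes for version 4", "category": "changelog"},
]
history = ["Tell me about access policies."]
message = "How do I enable them?"

search_terms = analyze(contextualize(message, history))
answer = generate(message, retrieve(search_terms, docs))
print(answer)
```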


In [None]:
from dotenv import load_dotenv, find_dotenv
from pathlib import Path
from tqdm import tqdm
import sys

_ = load_dotenv(find_dotenv())  # read local .env file
sys.path.append(Path("..").resolve().as_posix())

## Setting up components

In [None]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

from src.core.index import Index, ID_KEY
from src.core.retriever import build_retriever
from src.core.generator import build_generator

Building an index of the documentation goes like this:

1. Read the metadata file. It stores information about the docs in JSON Lines format, one JSON object per line:

    ```json
    {"url":"relative/path/to/doc.md","category":"edgedb_general"}
    ```

    The retriever later uses the category to filter out irrelevant documents,
    such as changelogs or integrations for languages the user didn't ask about.

2. Based on the metadata file, load the documents and create their embeddings.
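
The loading half of these steps might look roughly like the sketch below. It is not the code behind `Index.from_metadata`: the `load_documents` helper and the shape of the returned dictionaries are assumptions, and the embedding step is left out. The demo writes a throwaway metadata file and Markdown document so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_documents(metadata_path: Path, lib_path: Path) -> list[dict]:
    """Read a JSON Lines metadata file and load the Markdown file behind each entry."""
    documents = []
    with metadata_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            # "url" is a path relative to the documentation root
            text = (lib_path / entry["url"]).read_text(encoding="utf-8")
            documents.append(
                {"source": entry["url"], "category": entry["category"], "text": text}
            )
    return documents


# Demo on a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp)
    (lib / "intro.md").write_text("# Intro\n", encoding="utf-8")
    meta = lib / "doc_metadata.jsonl"
    meta.write_text(
        '{"url":"intro.md","category":"edgedb_general"}\n', encoding="utf-8"
    )
    loaded = load_documents(meta, lib)

print(loaded)
```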

In [None]:
# build the index or load it from disk

persist_path = Path("index_storage").resolve()
embedding_function = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002", api_version="2023-07-01-preview"
)

if persist_path.exists():
    index = Index.from_persist_path(
        persist_path=persist_path,
        embedding_function=embedding_function,
    )
else:
    # build from scratch using a metadata file
    index = Index.from_metadata(
        metadata_path=Path("../resources/doc_metadata.jsonl").resolve(),
        lib_path=Path("../../docs_md").resolve(),
        persist_path=persist_path,
        embedding_function=embedding_function,
    )

For each user message, we use three LLM requests to synthesise the answer: one each for contextualization, query analysis, and final generation.

Throughout this notebook we're going to use the GPT-4o API via Azure.
However, the first two steps arguably don't need such a heavy model, and neither of them needs a model provided by OpenAI.
The only requirement for the model is that it needs to have **function calling** capabilities.

LangChain supports several such models, enabling us to use them as drop-in replacements.
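
To make the function calling requirement concrete, here is a minimal sketch of what query analysis asks of the model: given a tool definition, the model returns structured arguments instead of free text. The `search_docs` tool, its fields, and the `parse_tool_call` helper are hypothetical and not taken from `build_retriever`; the model output is simulated with a hard-coded string.

```python
import json

# Hypothetical tool definition in the JSON Schema style used by function-calling APIs.
search_tool = {
    "name": "search_docs",
    "description": "Search the documentation.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Text for similarity search."},
            "category": {"type": "string", "description": "Category to filter by."},
        },
        "required": ["query", "category"],
    },
}


def parse_tool_call(arguments_json: str) -> dict:
    """Check that the arguments returned for `search_docs` include every required field."""
    args = json.loads(arguments_json)
    missing = [k for k in search_tool["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return args


# Simulated model output; a real call would come from the chat model.
simulated_arguments = '{"query": "row-level security", "category": "edgedb_general"}'
parsed_terms = parse_tool_call(simulated_arguments)
print(parsed_terms["query"], "|", parsed_terms["category"])
```

A model without function calling would return free text here, which would have to be parsed heuristically into a query and a filter.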



In [None]:
# create a LangChain chat model instance

llm = AzureChatOpenAI(
    temperature=0.0,
    azure_deployment="gpt4o",
    openai_api_version="2023-07-01-preview",
    max_retries=0,
)

In [None]:
# build a retriever that finds relevant documents via query analysis and vector search

retriever = build_retriever(
    llm=llm, vectorstore=index.vectorstore, docstore=index.docstore
)

In [None]:
# build a generator that responds to user questions in a conversation using documentation

generator = build_generator(llm=llm, retriever=retriever)

## Generating a response

Here we use a basic LangChain `invoke` call to generate the answer to a query. It can be replaced by a `stream` call, which returns the response incrementally so that the user doesn't have to wait for the entire response to be generated.

**Important**: the exact contents of the `config` argument vary depending on how a particular application handles chat history.

To see an example of non-default chat history management and handling the streaming response, see `demo.py`.
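
As a rough illustration of why `config` carries a session ID, the sketch below keeps one message list per session in memory. This is an assumption about a common pattern, not the generator's actual history handling: the `store` dictionary, `get_session_history`, and `record_turn` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical in-memory store: session_id -> list of (role, content) messages.
store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def get_session_history(session_id: str) -> list[tuple[str, str]]:
    """Return the message list for a session, creating it on first use."""
    return store[session_id]


def record_turn(session_id: str, user_message: str, answer: str) -> None:
    """Append one question/answer exchange to the session's history."""
    history = get_session_history(session_id)
    history.append(("human", user_message))
    history.append(("ai", answer))


record_turn("1", "Can I use row-level security on EdgeDB?", "(answer text)")
record_turn("2", "What is EdgeQL?", "(answer text)")
print(len(get_session_history("1")))
```

An application that keeps history in a database or in a browser session would pass different keys in `config`, which is why its contents are application-specific.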

In [None]:
from langchain.globals import set_debug

set_debug(False)  # set to True to see all under-the-hood operations performed by LangChain

# generate a response
response = generator.invoke(
    {"input": "Can I use row-level security on EdgeDB?"},
    config={
        "configurable": {"session_id": "1"}
    },  # the session ID "1" selects the chat history for this conversation
)

In [None]:
# unwrap the response for ease of reading

def pretty_print_response(response):
    print(f"************PROMPT************\n\n{response['input']}\n\n")
    print(f"************ANSWER************\n\n{response['answer'].answer}\n\n")

    print("***********CITATIONS**********\n\n")

    for citation in response["answer"].citations:
        doc = response["retrieval_result"]["documents"][citation.source_id]
        print(f"Source {citation.source_id}: {doc.metadata['source']}\n")
        print(f"Category: {doc.metadata['category']}\n")
        print(f"Quote:\n{citation.quote}\n\n")

    print("*********SEARCH TERMS*********\n")
    search_terms = response["retrieval_result"]["search_terms"]
    print(f"Query: {search_terms.query}")
    print(f"Filter: {search_terms.category}\n\n")

    print("************SOURCES***********\n\n")

    for i, doc in enumerate(response["retrieval_result"]["documents"]):
        print(f"Source {i}: {doc.metadata['source']}\n")
        print(f"Category: {doc.metadata['category']}\n")
        print(f"Content:\n{doc.page_content}\n\n")

In [None]:
pretty_print_response(response)