# Understanding Retrieval Question Answering

## Parsing documents

We will use a small sample of markdown documents in this notebook. Let's find them and make sure we can stuff them into the prompt. That means they may need to be chunked so that each piece stays under a set number of tokens.

In [None]:
import os
from pprint import pprint

from langchain.docstore.document import Document

# Find all the markdown files in a directory and load each one as a Document
def find_md_files(directory):
    md_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".md"):
                file_path = os.path.join(root, file)
                with open(file_path, "r") as f:
                    md_files.append(Document(page_content=f.read(), metadata={"source": file_path}))
    return md_files

documents = find_md_files('docs_sample')
len(documents)

In [57]:
import tiktoken

# We will need to count tokens in the documents, and for that we need to know the encoding
encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')
encoding_name = encoding.name
encoding_name

'cl100k_base'

In [58]:
# function to count the number of tokens in each document
def count_tokens(documents):
    token_counts = [len(encoding.encode(document.page_content)) for document in documents]
    return token_counts

count_tokens(documents)

[4179, 365, 1206, 2596, 2940, 537, 956, 803, 1644, 2529, 2093]

In [59]:
from langchain.text_splitter import MarkdownTextSplitter

# Let's use the Langchain markdown splitter to split the documents into sections
md_text_splitter = MarkdownTextSplitter()
document_sections = md_text_splitter.split_documents(documents)
len(document_sections), max(count_tokens(document_sections))

(32, 987)
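
To make the idea of splitting by markdown structure concrete, here is a minimal sketch that starts a new section at every heading line. This is an illustration only and not the algorithm `MarkdownTextSplitter` uses; the helper name `split_on_headings` and the sample text are hypothetical, and the sketch does not handle `#` lines inside fenced code blocks.

```python
import re

def split_on_headings(markdown_text):
    """Split markdown into sections, starting a new section at each heading line."""
    sections = []
    current = []
    for line in markdown_text.splitlines():
        # A heading is 1 to 6 '#' characters followed by a space
        if re.match(r"^#{1,6} ", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that are empty after stripping
    return [s for s in sections if s]

sample = "# Title\nIntro text.\n\n## Setup\nInstall it.\n\n## Usage\nRun it."
sections = split_on_headings(sample)
print(len(sections))  # 3
```

Sections produced this way follow the document's structure, but nothing bounds their size, which is why a token-based split may still be needed.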

In [60]:
from langchain.text_splitter import TokenTextSplitter

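# The markdown sections can still be large (up to 987 tokens above), so we use the TokenTextSplitter
# for more granular chunks. Note that it splits the original documents, replacing the markdown sections.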
token_splitter = TokenTextSplitter(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap=0,
    allowed_special={"<|endoftext|>"},
)
document_sections = token_splitter.split_documents(documents)
len(document_sections), max(count_tokens(document_sections))

(45, 512)
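
The token-based split above can be pictured as a fixed-size window sliding over each document's token ids. The following is a simplified sketch under that assumption, not Langchain's implementation; `chunk_tokens` is a hypothetical helper and plain integers stand in for real token ids.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the token list
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Stand-in for the token ids of the 1206-token document counted earlier
tokens = list(range(1206))
chunks = chunk_tokens(tokens, chunk_size=512)
print([len(c) for c in chunks])  # [512, 512, 182]

# Per-document token counts reported earlier in the notebook
counts = [4179, 365, 1206, 2596, 2940, 537, 956, 803, 1644, 2529, 2093]
total = sum(len(chunk_tokens(list(range(n)), 512)) for n in counts)
print(total)  # 45
```

With no overlap, each document yields the ceiling of its token count divided by 512 chunks, which is consistent with the 45 chunks reported above.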

## Embeddings

Let's now use embeddings with a vector database retriever to find relevant documents for a query. 

In [65]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# We will use the OpenAIEmbeddings to embed the text, and Chroma to store the vectors
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(document_sections, embeddings)
retriever = db.as_retriever()
query = "How can I share my W&B report with my team members in a public W&B project?"
docs = retriever.get_relevant_documents(query)

# Let's see the results
for doc in docs:
    print(doc.metadata["source"])

docs_sample/collaborate-on-reports.md
docs_sample/teams.md
docs_sample/teams.md
docs_sample/teams.md
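
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates this with cosine similarity over toy vectors. It is an assumption-laden illustration: the file names and 3-dimensional vectors are made up, and the vector store may use a different distance metric and an approximate index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed, k=4):
    """Return the k (source, score) pairs most similar to query_vector."""
    scored = [(source, cosine_similarity(query_vector, vec)) for source, vec in indexed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions
indexed = [
    ("reports.md", [0.9, 0.1, 0.0]),
    ("teams.md", [0.6, 0.4, 0.1]),
    ("billing.md", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.2, 0.0]
results = top_k(query_vector, indexed, k=2)
for source, score in results:
    print(source, round(score, 3))
```

Because chunks are scored independently, several chunks from the same file can appear in the results, as with `teams.md` above.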


## Stuff Prompt

We'll now take the content of the retrieved documents, stuff it into a prompt template along with the query, and pass the result to an LLM to obtain the answer.

In [66]:
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

context = "\n\n".join([doc.page_content for doc in docs])
prompt = PROMPT.format(context=context, question=query)

In [67]:
# Use Langchain to call the OpenAI completion API with the stuffed prompt
from langchain.llms import OpenAI
llm = OpenAI()
response = llm.predict(prompt)
pprint(response)

(' To share a report, select the **Share** button on the upper right hand '
 'corner. You can either provide an email account or copy the magic link. '
 'Users invited by email will need to log into Weights & Biases to view the '
 'report. Users who are given a magic link to not need to log into Weights & '
 'Biases to view the report. Shared reports are view-only.')
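
Stuffing only works while the combined chunks fit in the model's context window. One way to guard against overflow is to add chunks in ranked order until a token budget is reached. This is a minimal sketch: `stuff_context` is a hypothetical helper, the whitespace word count is a rough stand-in for a real tokenizer such as the `tiktoken` encoding used earlier, and the tokens contributed by the separator are ignored.

```python
def stuff_context(chunks, max_tokens, count_tokens, separator="\n\n"):
    """Join chunks in order, stopping before the token budget is exceeded."""
    selected = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return separator.join(selected)

def word_count(text):
    # Rough, hypothetical proxy for a tokenizer: count whitespace-separated words
    return len(text.split())

chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
context = stuff_context(chunks, max_tokens=5, count_tokens=word_count)
print(repr(context))  # 'alpha beta gamma\n\ndelta epsilon'
```

In practice the budget should also leave room for the prompt template, the question, and the model's answer.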


## Using Langchain

Langchain gives us tools to do this efficiently in a few lines of code. Let's do the same using the `RetrievalQA` chain. We will also pass a W&B Tracer callback into the chain so we can analyze and debug it.

In [69]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

from wandb.integration.langchain import WandbTracer

wandb_config = {"project": "llmapps"}

qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)
result = qa.run(query, callbacks=[WandbTracer(wandb_config)])
pprint(result)

WandbTracer.finish()

[34m[1mwandb[0m: Streaming LangChain activity to W&B at https://wandb.ai/darek/llmapps/runs/ga1z0jg1
[34m[1mwandb[0m: `WandbTracer` is currently in beta.
[34m[1mwandb[0m: Please report any issues to https://github.com/wandb/wandb/issues with the tag `langchain`.


(' You can share your report in a public W&B project by selecting the "Share" '
 'button on the upper right hand corner. You can then provide an email account '
 'or copy the magic link. Users invited by email will need to log into Weights '
 '& Biases to view the report. Users who are given a magic link do not need to '
 'log into Weights & Biases to view the report. Shared reports are view-only.')
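
Putting the pieces together, a "stuff" style retrieval chain amounts to three steps: retrieve chunks, format them into a prompt, and call the model once. The sketch below shows that flow with stand-in components so it runs without network access. It is not how `RetrievalQA` is implemented; `retrieval_qa`, `fake_retrieve`, `fake_llm`, and the shortened template are all hypothetical.

```python
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

def retrieval_qa(question, retrieve, llm, template=TEMPLATE):
    """Retrieve chunks, stuff them into the template, and call the model once."""
    chunks = retrieve(question)
    prompt = template.format(context="\n\n".join(chunks), question=question)
    return llm(prompt)

# Hypothetical stand-ins for the vector store retriever and the LLM
def fake_retrieve(question):
    return ["Select the Share button to share a report."]

seen_prompts = []

def fake_llm(prompt):
    # Record the prompt and return a placeholder instead of calling a real model
    seen_prompts.append(prompt)
    return f"(model received {len(prompt)} characters)"

answer = retrieval_qa("How do I share a report?", fake_retrieve, fake_llm)
print(answer)
```

Keeping the retriever and the model as plain callables makes each step easy to swap or inspect, which is the kind of visibility the tracer callback provides for the real chain.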
