# LangChain: Q&A over Documents

One example is a tool that lets you query a product catalog for items of interest.

In [None]:
#!pip install -U docarray

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [None]:
from langchain.document_loaders import CSVLoader
from IPython.display import display, Markdown

### Retrieving answers from documents using LangChain

In [None]:
file = '../OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [None]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [None]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch 
).from_loaders([loader])

# Note: OpenAIEmbeddings is the default embedding

In [None]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response = index.query(query)

In [None]:
display(Markdown(response))

## What's happening underneath

### LLMs on Documents
![LLM intro](../images/llm_intro.png)
  - LLMs can inspect only a few thousand words at a time
  - If we have large documents, how can we get LLMs to answer questions about them?
  - This is where **vector stores** and **embeddings** come into play

### Embeddings
![Embeddings](../images/embeddings.png)
  - An embedding is a numerical representation (a vector) of a piece of text
  - This numerical representation captures the semantic meaning of the text
  - Pieces of text with similar content have similar vectors
  - This allows us to compare pieces of text in the vector space

##### Consider the example below
  1. My dog Rover likes to chase squirrels.
  1. Fluffy, my cat, refuses to eat from a can.
  1. The Chevy Bolt accelerates to 60 mph in 6.7 seconds.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
text1 = "My dog Rover likes to chase squirrels."
text2 = "Fluffy, my cat, refuses to eat from a can."
text3 = "The Chevy Bolt accelerates to 60 mph in 6.7 seconds."

In [None]:
embed1 = embeddings.embed_query(text1)
embed2 = embeddings.embed_query(text2)
embed3 = embeddings.embed_query(text3)

In [None]:
# print(len(embed1)) 
# 1536

In [None]:
# print(embed1[:5])
# [-0.02120191603899002, -0.010773591697216034, 0.004235907457768917, 0.006397020071744919, -0.024629006162285805]

# print(embed2[:5])
# [-0.02368231862783432, -0.027932319790124893, -0.006375002674758434, -0.002569616539403796, -0.0007376205176115036]

# print(embed3[:5])
# [0.004903301130980253, 0.014738245867192745, 0.002021784894168377, -0.013944647274911404, -0.022346707060933113]


##### If you look at the vector space
  - Sentences 1 and 2 are similar because they both refer to pets.
  - But sentence 3 is not similar to either 1 or 2, since it's about a car.

##### Hence
- This allows us to easily figure out which pieces of text are similar.
- So we can choose which pieces of text to pass into the LLM to answer a question.
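A common way to measure how similar two embedding vectors are is cosine similarity: the cosine of the angle between them, close to 1 for vectors pointing the same way and closer to 0 for unrelated ones. The minimal sketch below uses made-up 3-dimensional vectors as hypothetical stand-ins for the three sentences; the real OpenAI embeddings here have 1536 dimensions (see the commented `len(embed1)` output above) and entirely different values.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors (NOT real embeddings), one per example sentence.
dog = [0.9, 0.8, 0.1]    # "My dog Rover likes to chase squirrels."
cat = [0.85, 0.75, 0.2]  # "Fluffy, my cat, refuses to eat from a can."
car = [0.1, 0.2, 0.95]   # "The Chevy Bolt accelerates to 60 mph in 6.7 seconds."

print(round(cosine_similarity(dog, cat), 3))  # the two pet sentences
print(round(cosine_similarity(dog, car), 3))  # pet sentence vs. car sentence
```

With the notebook's real vectors you could compare `cosine_similarity(embed1, embed2)` against `cosine_similarity(embed1, embed3)`; following the reasoning above, we'd expect the first to come out higher, though the exact numbers depend on the embedding model.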

### Vector Database

 ![Vector Database](../images/vector_databse.png)

 - Stores the vectors generated by the embedding model
 - We break large documents into smaller chunks and store them along with
   their embeddings.
 - This is what happens when we create an index.
 - When a query comes in, we first create an embedding for that query.
 - Then we compare it with all vectors in the vector database and pick
   the n most similar chunks.

 ![Vector index](../images/vector_index.png)

 - These are then passed in the prompt to the language model to
   get back the final answer. 
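To make these steps concrete, here is a minimal in-memory vector store in plain Python: split text into chunks, store each chunk with its embedding, embed the query, and return the n most similar chunks by cosine similarity. This is a sketch under stated assumptions, not how `DocArrayInMemorySearch` is implemented. The method names only echo LangChain's, fixed-size character chunking is just one simple chunking strategy, and `toy_embed` is a hypothetical keyword-count function standing in for a real embedding model.

```python
import math
from typing import Callable

Vector = list[float]


def chunk_text(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # treat zero vectors as dissimilar


class InMemoryVectorStore:
    """Brute-force store of (text, embedding) pairs; `embed` is any text -> vector function."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.entries: list[tuple[str, Vector]] = []

    def add_texts(self, texts: list[str]) -> None:
        # Indexing: embed each chunk once and keep the vector next to its text.
        for text in texts:
            self.entries.append((text, self.embed(text)))

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        # Querying: embed the query, score every stored vector, return the k best texts.
        query_vector = self.embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Usage with a hypothetical keyword-count "embedding" (a real model would go here).
def toy_embed(text: str) -> Vector:
    words = text.lower().replace(".", "").split()
    return [float(words.count(word)) for word in ("shirt", "sun", "boots")]


store = InMemoryVectorStore(toy_embed)
store.add_texts(["Linen shirt with sun protection.", "Waterproof hiking boots.", "Cotton shirt."])
print(store.similarity_search("shirt for sun", k=2))
```

Comparing the query against every stored vector is simple but grows linearly with the number of chunks; that trade-off is fine for a small catalog like the one in this notebook.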

## A detailed step-by-step way to get answers from documents using LangChain

 - This will enable us to learn what's happening under the hood

### 1. Create a document loader from the CSV file containing the descriptions of all the products we want to do question answering over

In [2]:
from langchain.document_loaders import CSVLoader
file = '../OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
docs = loader.load()
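To see what `loader.load()` gives back: in the LangChain versions this notebook targets, `CSVLoader` produces one `Document` per CSV row, with `page_content` made of `column: value` lines and metadata recording the source file and row number. The sketch below imitates that with the standard `csv` module. It is an approximation for illustration, not LangChain's code; `SketchDocument` and `load_csv_rows` are hypothetical names, and the two sample rows are made up rather than taken from the real catalog file.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_csv_rows(csv_text: str, source: str) -> list[SketchDocument]:
    """Approximate CSVLoader: one document per row, one 'column: value' line per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        content = "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        documents.append(
            SketchDocument(page_content=content, metadata={"source": source, "row": row_number})
        )
    return documents


# Two made-up rows (not taken from the real catalog file).
sample_csv = "name,description\nSun Shirt,UPF 50+ sun protection\nRain Jacket,Waterproof shell\n"
sample_docs = load_csv_rows(sample_csv, source="sample.csv")
print(len(sample_docs))
print(sample_docs[0].page_content)
print(sample_docs[1].metadata)
```

In the notebook itself, `len(docs)` and `docs[0]` show the real equivalents produced by `loader.load()`.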

### 2. Create embeddings

In [3]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

### 3. Create embeddings for all pieces of text and store them in a vector store

In [4]:
from langchain.vectorstores import DocArrayInMemorySearch

In [5]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

# this will create a vector store

#### 3.1 We can query the vector store to get entries similar to the query

In [None]:
query = "Please suggest a shirt with sunblocking"

In [None]:
relevent_docs = db.similarity_search(query)

In [None]:
#len(relevent_docs)

In [None]:
#relevent_docs[0]

### 4. How do we use this vector store to perform question answering over the documents?

#### 4.1 We create a retriever
 - A retriever is a generic interface that can be underpinned by
   any method that takes a query and returns documents

 - A vector store with embeddings is one such method
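The idea of a retriever as a generic interface can be sketched with a `typing.Protocol`: downstream code only needs "query in, documents out", so retrieval that uses no embeddings at all fits just as well. The method name `get_relevant_documents` echoes the one on LangChain retrievers of this era, but `Retriever`, `KeywordRetriever`, and `build_context` are hypothetical illustrations returning plain strings, not LangChain classes.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical minimal retriever interface: query in, documents out."""

    def get_relevant_documents(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """A retriever with no embeddings at all: ranks documents by shared query words."""

    def __init__(self, documents: list[str], k: int = 2):
        self.documents = documents
        self.k = k

    def get_relevant_documents(self, query: str) -> list[str]:
        terms = set(query.lower().split())

        def overlap(doc: str) -> int:
            return len(terms & set(doc.lower().split()))

        ranked = sorted(self.documents, key=overlap, reverse=True)
        return [doc for doc in ranked if overlap(doc) > 0][: self.k]


def build_context(retriever: Retriever, query: str) -> str:
    # Depends only on the interface, so any object with a matching method can be swapped in.
    return "\n".join(retriever.get_relevant_documents(query))


keyword_retriever = KeywordRetriever(["sun shirt with upf 50", "wool hiking socks", "sun hat"])
print(keyword_retriever.get_relevant_documents("shirt for sun"))
```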

In [None]:
retriever = db.as_retriever()

### 5. Since we want text generation and a natural language response, we need a language model

In [None]:
from langchain.chat_models import ChatOpenAI

In [None]:
llm = ChatOpenAI(temperature=0.0)  # temperature 0 reduces randomness so answers are more reproducible

### 6. Answers to a question over the documents can be obtained by:

 - Combining all retrieved documents into a single piece of text

 - Passing that text along with the question to the language model

In [None]:
# Join the documents retrieved by the similarity search in step 3.1, separated by newlines
qdocs = "\n".join(doc.page_content for doc in relevent_docs)

In [None]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [None]:
display(Markdown(response))

## All the above steps can be encapsulated in a LangChain chain

### A `RetrievalQA` chain does retrieval and Q&A over the retrieved documents.

#### To create a `RetrievalQA` chain we need to pass

 - A language model for text generation

 - A chain type: the method that determines how the context is built
   and how the language model is called for the answer. With the `stuff`
   method, all documents are stuffed into the context and the language
   model is called once

 - A retriever that fetches the documents to pass to the language model
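Conceptually, a `stuff`-type retrieval QA chain does three things: retrieve documents, stuff them into one prompt together with the question, and make a single LLM call. The sketch below wires those steps together with plain callables. `retrieve` and `llm` are hypothetical stand-ins (any function from query to documents, and from prompt to text), and the prompt wording is an assumption loosely modeled on a typical QA prompt, not LangChain's exact `stuff` prompt.

```python
from typing import Callable


def stuff_prompt(documents: list[str], question: str) -> str:
    # "Stuff" every retrieved document into a single context block.
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def retrieval_qa_stuff(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    documents = retrieve(question)              # 1. fetch relevant documents
    prompt = stuff_prompt(documents, question)  # 2. build one prompt from all of them
    return llm(prompt)                          # 3. exactly one model call


# Usage with a hypothetical fake model that records the prompts it receives.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "Sun Shirt: UPF 50+"


stuff_answer = retrieval_qa_stuff(
    "Which shirts block UV?", lambda q: ["Sun Shirt: UPF 50+", "Rain Jacket"], fake_llm
)
print(stuff_answer, len(calls))
```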

In [None]:
from langchain.chains import RetrievalQA # chain that retrieves relevant documents and answers questions over them

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

### Now we can create a query and run the chain on this query

In [None]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
response = qa_stuff.run(query)

In [None]:
display(Markdown(response))

### We can also build the index with a customized vector store and embeddings

In [None]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch, # customized vectorstore
    embedding=embeddings, # customized embeddings
).from_loaders([loader])

### Stuff Method

![Stuff Method](../images/suff_method.png)

 - Simple, but unsuitable for large documents, since all the text must fit into a single prompt

### Map_reduce

![Map_reduce](../images/map_reduce.png)

- Takes each chunk and passes it, along with the question, to the LLM

- Then uses another LLM call to summarize all the individual
  responses into a final answer

- The individual per-chunk calls can run in parallel

- But it takes more calls

- Treats all documents independently, which may not always be desirable
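The map and reduce steps can be sketched as follows, assuming `llm` is any hypothetical callable from prompt to text. The map calls run on a thread pool to show that they are independent of one another, and one extra reduce call combines the partial answers. The prompt wording is an assumption; this is an illustration of the technique, not LangChain's `map_reduce` implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def map_reduce_answer(chunks: list[str], question: str, llm: Callable[[str], str]) -> str:
    # Map: one independent call per chunk, so these can run in parallel.
    def ask(chunk: str) -> str:
        return llm(f"{chunk}\n\nQuestion: {question}\nRelevant answer:")

    with ThreadPoolExecutor() as pool:
        partial_answers = list(pool.map(ask, chunks))  # results keep the chunk order

    # Reduce: one more call that combines the per-chunk answers.
    combined = "\n".join(f"- {answer}" for answer in partial_answers)
    return llm(f"Combine these partial answers into one final answer.\n{combined}\n\nQuestion: {question}")


# Usage with a hypothetical fake model that records every prompt.
prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return "final answer" if prompt.startswith("Combine") else "partial answer"


map_reduce_result = map_reduce_answer(["chunk A", "chunk B", "chunk C"], "Which shirts block UV?", fake_llm)
print(map_reduce_result, len(prompts))
```

For three chunks this makes four model calls (three map, one reduce), which is the "more calls" cost noted above.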

### Refine

![Refine](../images/refine.png)

- It loops over many documents

- But unlike map_reduce, it does so iteratively, building on the answer
  from the previous documents

- Good for combining information and building up an answer over time

- Generally leads to longer answers

- Not very fast, because the calls are made one after another and cannot run in parallel
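A refine loop can be sketched like this, again with `llm` as a hypothetical prompt-to-text callable and assumed prompt wording (not LangChain's `refine` prompts). Each call receives the previous answer plus one new chunk, which is why the calls must happen in order.

```python
from typing import Callable


def refine_answer(chunks: list[str], question: str, llm: Callable[[str], str]) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    # First call: answer from the first chunk alone.
    answer = llm(f"{chunks[0]}\n\nQuestion: {question}\nAnswer:")
    # Each later call depends on the previous answer, so the loop is inherently sequential.
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            f"Refine the existing answer to the question '{question}' using the new context."
        )
    return answer


# Usage with a hypothetical fake model that appends each new context to the running answer.
refine_calls: list[str] = []


def fake_llm(prompt: str) -> str:
    refine_calls.append(prompt)
    lines = prompt.split("\n")
    if lines[0].startswith("Existing answer: "):
        previous = lines[0].removeprefix("Existing answer: ")
        new = lines[1].removeprefix("New context: ")
        return f"{previous} + {new}"
    return lines[0]


refined = refine_answer(["A", "B", "C"], "Which shirts block UV?", fake_llm)
print(refined)
```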

### Map_rerank

![Map_rerank](../images/map_rerank.png)

 - Do a single call to a language model for each document and ask it
   to return a score.

 - This relies on the language model to know what the score should be

 - Return the answer with the highest score
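A map_rerank pass can be sketched as one call per chunk that asks for an answer plus a self-reported score, followed by picking the best-scoring answer. `llm` is a hypothetical prompt-to-text callable, and the `Score: N` reply format and prompt wording are assumptions made for this illustration; since parsing depends on the model following the format, the sketch ranks replies without a parseable score last.

```python
from typing import Callable


def map_rerank_answer(chunks: list[str], question: str, llm: Callable[[str], str]) -> str:
    if not chunks:
        raise ValueError("need at least one chunk")
    scored: list[tuple[int, str]] = []
    for chunk in chunks:
        # One independent call per chunk, asking for an answer plus a self-reported score.
        reply = llm(
            f"{chunk}\n\nQuestion: {question}\n"
            "Give your answer, then a last line 'Score: N' (0-100) rating how well "
            "the context above answers the question."
        )
        answer_text, found, score_text = reply.rpartition("Score:")
        try:
            score = int(score_text.strip()) if found else 0
        except ValueError:
            score = 0  # a reply without a parseable score ranks last
        scored.append((score, (answer_text if found else reply).strip()))
    # max() returns the first of equally scored items, so ties go to the earlier chunk.
    return max(scored, key=lambda pair: pair[0])[1]


# Usage with a hypothetical fake model that rates chunks mentioning "UPF" highly.
def fake_llm(prompt: str) -> str:
    chunk = prompt.split("\n")[0]
    return f"{chunk}\nScore: {90 if 'UPF' in chunk else 10}"


best = map_rerank_answer(["Rain Jacket", "Sun Shirt UPF 50+", "Wool socks"], "Which shirt blocks UV?", fake_llm)
print(best)
```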