# LangChain: Q&A over Documents

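This notebook builds a question-answering tool over a product catalog: it loads the catalog from a CSV file, indexes the rows in an in-memory vector store, and uses an LLM to answer queries about items of interest.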

In [1]:
# pip install --upgrade langchain

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [2]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [15]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [16]:
from langchain.indexes import VectorstoreIndexCreator

In [5]:
# pip install docarray

In [39]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

100%|██████████| 1/1 [00:01<00:00,  1.08s/it]


In [49]:
query = "Please list what hoodies you have \
in a table in markdown "

In [50]:
response = index.query(query)

In [51]:
display(Markdown(response))

 

| Name | Brand | Size |
|------|-------|------|
| Mens WordPress Hoodie | WordPress | Large, Medium |
| Purple Divi Engine Text Zipper Hoodie | Divi Engine |  |

In [52]:
loader = CSVLoader(file_path=file)

In [53]:
docs = loader.load()

In [54]:
docs[0]

Document(page_content='ID: 66\nType: simple\nSKU: \nName: Divi Engine String Bag (Big Logo)\nPublished: 1\nIs featured?: 0\nVisibility in catalog: visible\nShort description: This fashionable string bag is made of 100% cotton. It is the perfect size for carrying your everyday essentials.\nDescription: \nDate sale price starts: \nDate sale price ends: \nTax status: taxable\nTax class: \nIn stock?: 1\nStock: \nLow stock amount: \nBackorders allowed?: 0\nSold individually?: 0\nWeight (kg): \nLength (cm): \nWidth (cm): \nHeight (cm): \nAllow customer reviews?: 1\nPurchase note: \nSale price: \nRegular price: 19.99\nCategories: Accessories\nTags: \nShipping class: \nImages: https://ajax-filters-bc.diviengine.com/sampledata/images/Bag1.jpg, https://ajax-filters-bc.diviengine.com/sampledata/images/Bag1-1.jpg, https://ajax-filters-bc.diviengine.com/sampledata/images/Bag1-2.jpg\nDownload limit: \nDownload expiry days: \nParent: \nGrouped products: \nUpsells: \nCross-sells: \nExternal URL: \nBut
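
The `page_content` above suggests that each CSV row becomes one document, with every column rendered as a `column: value` line. A minimal sketch of that formatting using only the standard library follows. The helper `row_to_page_content` and the three-column sample are hypothetical, and this only approximates what `CSVLoader` does.

```python
import csv
import io


def row_to_page_content(row: dict) -> str:
    """Render one CSV row as newline-separated 'column: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Hypothetical sample standing in for the catalog file (assumption:
# the real file has many more columns, as the output above shows).
sample_csv = "ID,Name,Regular price\n66,Divi Engine String Bag (Big Logo),19.99\n"

# DictReader maps each data row to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
page_contents = [row_to_page_content(row) for row in rows]
print(page_contents[0])
```

Because every column is included, empty fields such as `SKU: ` still appear in the text that later gets embedded.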

In [55]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [56]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [57]:
print(len(embed))

1536


In [58]:
print(embed[:5])

[-0.021913960576057434, 0.006774206645786762, -0.018190348520874977, -0.039148248732089996, -0.014089343138039112]


In [59]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

100%|██████████| 1/1 [00:00<00:00,  1.10it/s]


In [60]:
query = "Please suggest a shirt with sunblocking"

In [61]:
docs = db.similarity_search(query)

In [62]:
len(docs)

4

In [63]:
docs[0]

Document(page_content='ID: 89\nType: variable\nSKU: \nName: Divi Engine Tee\nPublished: 1\nIs featured?: 0\nVisibility in catalog: visible\nShort description: This comfortable cotton t-shirt that features the Divi Engine logo on the front is perfect for any occasion. The shirt is available in three colors.\nDescription: \nDate sale price starts: \nDate sale price ends: \nTax status: taxable\nTax class: \nIn stock?: 1\nStock: \nLow stock amount: \nBackorders allowed?: 0\nSold individually?: 0\nWeight (kg): \nLength (cm): \nWidth (cm): \nHeight (cm): \nAllow customer reviews?: 1\nPurchase note: \nSale price: \nRegular price: \nCategories: Men, Men > Shirts\nTags: \nShipping class: \nImages: https://ajax-filters-bc.diviengine.com/sampledata/images/Shirt-3-yellow-front.jpg, https://ajax-filters-bc.diviengine.com/sampledata/images/Shirt-3-blue-front.jpg, https://ajax-filters-bc.diviengine.com/sampledata/images/Shirt-3-white-front.jpg\nDownload limit: \nDownload expiry days: \nParent: \nGrou
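
Conceptually, `similarity_search` embeds the query and returns the documents whose vectors lie closest to it. The sketch below illustrates that idea with cosine similarity on toy vectors. It assumes cosine similarity as the metric and `k=4` (consistent with the four results above); `cosine_similarity` and `top_k` are hypothetical helpers, not the vector store's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]
query_vec = [1.0, 0.05, 0.0]

nearest = top_k(query_vec, doc_vecs, k=4)
print(nearest)
```

Note that the search always returns the `k` nearest documents, even when none of them is actually relevant to the query.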

In [64]:
retriever = db.as_retriever()

In [65]:
llm = ChatOpenAI(temperature=0.0)


In [66]:
# Concatenate the text of every retrieved document into one context string
qdocs = "".join(doc.page_content for doc in docs)


In [67]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.") 


In [68]:
display(Markdown(response))

| ID | Name                  | Description                                                                                                 |
|----|-----------------------|-------------------------------------------------------------------------------------------------------------|
| 89 | Divi Engine Tee       | This comfortable cotton t-shirt features the Divi Engine logo on the front. Available in three colors.       |
| 105| Divi Tee              | This comfortable cotton t-shirt features the Divi logo on the front and back.                                 |
| 151| Divi Simplified Crop-top | This comfortable cotton crop-top features the Divi and Divi Engine logos on the front and back.              |
| 111| WordPress Tee         | This comfortable cotton t-shirt features the WordPress logo on the front and back.                            |

Each shirt is made of comfortable cotton and is suitable for any occasion. The Divi Engine Tee features the Divi Engine logo on the front and is available in three colors. The Divi Tee features the Divi logo on the front and back. The Divi Simplified Crop-top features both the Divi and Divi Engine logos on the front and back. The WordPress Tee features the WordPress logo on the front and back.

In [69]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
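
The `"stuff"` chain type places all retrieved documents into a single prompt and makes one LLM call. A rough sketch of that prompt assembly is shown below. The function `build_stuff_prompt` and the template wording are hypothetical and are not the prompt LangChain actually uses.

```python
def build_stuff_prompt(documents, question):
    """Join all retrieved texts into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieved snippets, shortened from the catalog rows.
retrieved = [
    "Name: Divi Engine Tee\nCategories: Men, Men > Shirts",
    "Name: WordPress Tee\nCategories: Men, Men > Shirts",
]
prompt = build_stuff_prompt(retrieved, "Which shirts do you have?")
print(prompt)
```

This approach is simple and needs only one call, but the combined text must fit within the model's context window.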

In [70]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [71]:
response = qa_stuff.run(query)



> Entering new  chain...

> Finished chain.


In [72]:
display(Markdown(response))

I'm sorry, but I don't have any information about sun protection for the shirts.

In [73]:
response = index.query(query, llm=llm)

In [74]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

100%|██████████| 1/1 [00:01<00:00,  1.11s/it]
