## QnA over Documents (doc RAG)
Extract information from specialized data that the LLM was not trained on.<br><br>

* Embeddings<br>
![embeddings.png](../img/embeddings.png)<br><br>

* Vector Database<br>
![vectordb.png](../img/vectordb.png)
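
To make the two ideas above concrete, here is a minimal sketch of what a vector store does at query time: it compares the query embedding with each stored embedding and returns the closest entries. The 3-dimensional vectors and item names below are invented for illustration. Real embeddings have hundreds of dimensions, and an actual store may use a different metric or index structure than the plain cosine similarity assumed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": made-up 3-d vectors, not produced by any real model.
toy_embeddings = {
    "sun shirt": [0.9, 0.1, 0.0],
    "rain jacket": [0.1, 0.9, 0.1],
    "UV protective tee": [0.8, 0.2, 0.1],
}

# Pretend this vector is the embedding of a query about sun protection.
query_vec = [1.0, 0.0, 0.0]


def rank(query, embeddings):
    """Return item names sorted from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )


ranked = rank(query_vec, toy_embeddings)
print(ranked)
```

The two sun-related items rank ahead of the rain jacket because their vectors point in nearly the same direction as the query vector.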

In [None]:
# required imports
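# loaders and vector stores live in langchain_community in recent LangChain releases
from langchain_community.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import DocArrayInMemorySearch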
from IPython.display import display, Markdown
# classes specific to our local Ollama models
from langchain_ollama import ChatOllama
from langchain_ollama.llms import OllamaLLM
from langchain_ollama import OllamaEmbeddings

In [None]:
# get data
filepath = "./data/clothing.csv"
loader = CSVLoader(filepath)
docs = loader.load()

In [None]:
# setup the embedding model
embeddings = OllamaEmbeddings(model="nomic-embed-text")

In [None]:
# we want to create embeddings for all the docs and store them in the vector store
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [None]:
# then query the vector store for the documents most similar to the question
query = "Please suggest a shirt with sunblocking"
results = db.similarity_search(query)
print(list(results))
print(len(results))

[Document(metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 255}, page_content=': 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.'), Document(metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 619}, page_content=": 619\nname: Tropical Breeze Shirt\n

In [None]:
# to setup a QnA capability we first need to create a retriever
ir = db.as_retriever()
# then we need an llm to produce natural language answers
model = "gemma3:12b"
llm = ChatOllama(model=model, temperature=0)

#### Manually exploiting docs

In [None]:
# by concatenating all the results into a single string, separated by blank lines
qresults = "\n\n".join(doc.page_content for doc in results)
# and then passing the concatenated string to the llm before asking the question
response = llm.invoke(f"{qresults} Question: Please list all your shirts with sun protection in a table in markdown and summarize each one.")

In [None]:
display(Markdown(response.content))

Okay, here's a table summarizing the shirts with sun protection, based on the provided descriptions.

| Shirt Name | Description Summary | Fabric Composition | UPF Rating | Key Features |
|---|---|---|---|---|
| **Sun Shield Shirt** | High-performance sun shirt designed to block harmful UV rays.  Slightly fitted style. | 78% Nylon, 22% Lycra Xtra Life | 50+ | Wicks moisture, fits over swimsuit, abrasion resistant, handwash/line dry. |
| **Tropical Breeze Shirt** | Lightweight, breathable long-sleeve shirt for sun protection. Originally designed for fishing. Traditional fit. | 71% Nylon, 29% Polyester (Shell), 100% Polyester (Cape Lining) | 50+ | Wrinkle-resistant, moisture-wicking, front & back cape venting, two front pockets, machine wash/dry. |
| **Men's Plaid Tropic Shirt, Short-Sleeve** | Ultracomfortable sun protection shirt, originally designed for fishing.  | 52% Polyester, 48% Nylon | 50+ | Wrinkle-resistant, moisture-wicking, front & back cape venting, two front pockets, machine wash/dry, imported design. |
| **Sunrise Tee** | Women's UV-protective button-down shirt. Lightweight and designed to beat the heat. | 71% Nylon, 29% Polyester (Shell), 100% Polyester (Cape Lining) | 50+ | Wrinkle-resistant, moisture-wicking, front & back cape venting, smoother buttons, low-profile pockets, side shaping, machine wash/dry, imported. |

**Common Themes & Notes:**

*   **UPF 50+:** All shirts boast the highest possible UPF rating, blocking 98% of harmful UV rays.
*   **Origin:** Many are originally designed for fishing, highlighting their practicality for extended outdoor use.
*   **Moisture-Wicking & Quick-Drying:** A consistent feature across all shirts, ensuring comfort in hot weather.
*   **Wrinkle Resistance:**  A desirable quality for travel and ease of care.
*   **Ventilation:** Front and back cape venting is a common feature for increased breathability.
*   **Fit:**  Varying fits are offered, from slightly fitted to traditional/relaxed.

#### Exploiting docs with langchain
The `RetrievalQA` class encapsulates all of these steps. This example uses the simplest chain type, the "stuff" method, which places all retrieved documents into a single prompt:<br>

![stuff.png](../img/stuffmethod.png)
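
As a rough illustration of the stuff method, the sketch below builds one prompt from all retrieved documents plus the question, which is then sent to the LLM in a single call. The `Doc` class, the `build_stuff_prompt` helper, and the prompt wording are hypothetical and are not the template LangChain uses internally.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def build_stuff_prompt(docs, question):
    """Stuff every document into one context block, then append the question."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


docs = [Doc("name: Sun Shield Shirt"), Doc("name: Tropical Breeze Shirt")]
prompt = build_stuff_prompt(docs, "Which shirts offer sun protection?")
print(prompt)
```

Because everything goes into one prompt, this approach needs only one LLM call, but the combined documents must fit within the model's context window.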

In [None]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff",
    retriever=ir, 
    verbose=True
)

In [None]:
query =  "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [None]:
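# .run() is deprecated in recent LangChain releases; invoke() returns a dict
response = qa_stuff.invoke({"query": query})["result"]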


> Entering new RetrievalQA chain...

> Finished chain.


In [None]:
display(Markdown(response))

Okay, here's a table summarizing the shirts with sun protection, based on the provided context:

| Shirt Name | Summary |
|---|---|
| Sun Shield Shirt by | High-performance sun shirt with UPF 50+ protection (blocking 98% of UV rays). Slightly fitted, made of 78% nylon and 22% Lycra Xtra Life fiber. Wicks moisture, abrasion resistant, and designed to fit over swimsuits. Handwash/line dry. |
| Men's Plaid Tropic Shirt, Short-Sleeve | UPF 50+ protection (blocks 98% of UV rays) with SunSmart technology. Designed for fishing and travel. Wrinkle-free, quick-drying, and machine washable. Made of 52% polyester and 48% nylon. Includes front/back cape venting and pockets. |
| Men's Tropical Plaid Short-Sleeve Shirt | UPF 50+ protection (blocks 98% of UV rays). Traditional, relaxed fit. Made of 100% polyester, wrinkle-resistant. Features front/back cape venting and pockets. |
| Sunrise Tee | Women's UV-protective button-down shirt with UPF 50+ protection. Lightweight, wrinkle-free, and quick-drying. Made of 71% nylon and 29% polyester with a 100% polyester cape lining. Includes cape venting, pockets, and an eyewear loop. |



Hopefully, this table provides a clear overview of the shirts and their sun protection features.

In [None]:
# we can also customize the indexing process, e.g. with our own embedding model
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])
response = index.query(query, llm=llm)

In [None]:
display(Markdown(response))

Okay, here's a table summarizing the shirts with sun protection, based on the provided context:

| Name | Summary |
|---|---|
| Sun Shield Shirt by | A high-performance sun shirt guaranteed to protect from harmful UV rays. It's slightly fitted, falls at the hip, and is made of 78% nylon and 22% Lycra Xtra Life fiber. It's UPF 50+ rated, wicks moisture, and is abrasion resistant. |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable shirt with UPF 50+ coverage, originally designed for fishing. It's wrinkle-free, quickly evaporates perspiration, and is made of 52% polyester and 48% nylon. It features cape venting, pockets, and is machine washable/dryable. |
| Men's Tropical Plaid Short-Sleeve Shirt | A lightweight shirt with a traditional, relaxed fit and UPF 50+ protection. Made of 100% polyester, it's wrinkle-resistant and features cape venting and pockets. |
| Women's Tropical Tee, Sleeveless | A sleeveless button-up shirt with a flattering fit and SunSmart™ UPF 50+ protection. It's made of 71% nylon and 29% polyester, with a 100% polyester cape lining. It's wrinkle-resistant and features cape venting, pockets, and a eyewear loop. |

All of these shirts are rated UPF 50+ and block 98% of the sun's harmful rays.

#### Additional chain methods
* Map_reduce
    - Can operate over any number of docs
    - Can process the individual docs in parallel
    - But takes many more LLM calls than the stuff method
    - And treats every doc as independent of the others<br>
![map_reduce.png](../img/map_reducemethod.png)<br>

* Refine
    - Loops over many docs iteratively
    - Very good for combining information as it builds up an answer over multiple doc inputs
    - Takes as many calls as Map_reduce
    - Outputs longer answers
    - Slower, because each call depends on the result of the previous one and so cannot run in parallel<br>
![refine.png](../img/refinemethod.png)<br>

* Map_rerank
    - More experimental
    - Makes a single LLM call for every doc
    - Asks the LLM to return a score with each answer, then selects the highest-scoring answer
    - Relies on the LLM to know what the score should be
    - Similar to Map_reduce, docs are independent
    - Takes a lot of calls to the LLM<br>
![map_rerank.png](../img/map_rerankmethod.png)<br>
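
The control flow of the three methods above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `CountingLLM` and `fake_scored_llm` are hypothetical stand-ins for a real model, the prompts are invented, and the word-overlap score only imitates the score an LLM would be asked to produce.

```python
class CountingLLM:
    """Hypothetical stand-in for an LLM: counts calls and returns a canned reply."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"answer #{self.calls}"


def map_reduce(docs, question, llm):
    # Map: one independent call per doc (these could run in parallel).
    partials = [llm(f"Context: {doc}\nQuestion: {question}") for doc in docs]
    # Reduce: one extra call that combines the partial answers.
    summary = "\n".join(partials)
    return llm(f"Partial answers:\n{summary}\nQuestion: {question}")


def refine(docs, question, llm):
    # Each call sees the previous answer, so the calls must run sequentially.
    answer = ""
    for doc in docs:
        answer = llm(
            f"Current answer: {answer}\nNew context: {doc}\nQuestion: {question}"
        )
    return answer


def fake_scored_llm(doc, question):
    # Stand-in for "answer and rate your confidence": score is word overlap.
    score = len(set(doc.lower().split()) & set(question.lower().split()))
    return f"Based on: {doc}", score


def map_rerank(docs, question, scored_llm):
    # One call per doc, each returning (answer, score); keep the best-scored answer.
    candidates = [scored_llm(doc, question) for doc in docs]
    best_answer, _ = max(candidates, key=lambda pair: pair[1])
    return best_answer


docs = ["sun shirt with UPF 50+", "warm wool sweater", "rain jacket"]
question = "which shirt blocks the sun"

mr_llm = CountingLLM()
map_reduce(docs, question, mr_llm)

refine_llm = CountingLLM()
refine(docs, question, refine_llm)

best = map_rerank(docs, question, fake_scored_llm)
print(mr_llm.calls, refine_llm.calls, best)
```

In this sketch, Map_reduce makes one call per doc plus one combining call, Refine makes one call per doc in strict sequence, and Map_rerank keeps only the answer with the highest score.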