## Evaluation
Leveraging LLMs and chains to evaluate the capabilities of other LLMs and chains.

In [None]:
from langchain_ollama import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [None]:
# reset data
file = './data/clothing.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [None]:
# reset index with custom local embedding model
embeddings = OllamaEmbeddings(model="nomic-embed-text")
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])



In [None]:
# reset retriever with custom local llm model
model = "gemma3:12b"
llm = ChatOllama(temperature = 0.0, model=model)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

Next, we need examples against which to evaluate the relevance of the system's answers.
1. Find data points that make good examples:

In [None]:
# these could be good ones
print(data[10])
print(data[11])

page_content=': 10
name: Cozy Comfort Pullover Set, Stripe
description: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.

Size & Fit
- Pants are Favorite Fit: Sits lower on the waist.
- Relaxed Fit: Our most generous fit sits farthest from the body.

Fabric & Care
- In the softest blend of 63% polyester, 35% rayon and 2% spandex.

Additional Features
- Relaxed fit top with raglan sleeves and rounded hem.
- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.

Imported.' metadata={'source': './data/clothing.csv', 'row': 10}
page_content=': 11
name: Ultra-Lofty 850 Stretch Down Hooded Jacket
description: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style

2. Manually prepare a "ground truth" for these examples:

In [None]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

However, this is time-consuming. One way to automate it is to use an LLM itself through the `QAGenerateChain` class:<br>
We want it to create a question-answer pair for each parsed document.

In [None]:
from langchain.evaluation.qa import QAGenerateChain

In [None]:
# instantiate the QAGen object
example_gen_chain = QAGenerateChain.from_llm(ChatOllama(model=model))

In [None]:
# parse and generate QnA pairs over the first five docs
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)



In [None]:
# check for generated QnA pairs
for new_example in new_examples:
    print(new_example)

{'qa_pairs': {'query': "According to the provided description, what material is used for the innersole of the Women's Campside Oxfords, and what specific feature does it include for odor control?", 'answer': "The Women's Campside Oxfords feature a comfortable EVA innersole that includes Cleansport NXT® antimicrobial odor control."}}
{'qa_pairs': {'query': 'According to the provided document, what percentage of the fabric used to construct the Recycled Waterhog Dog Mat is comprised of recycled materials?', 'answer': '94% of the fabric is made from recycled materials.'}}
{'qa_pairs': {'query': "According to the provided description, what level of sun protection does the Infant and Toddler Girls' Coastal Chill Swimsuit offer, and what percentage of the sun's harmful rays does this protection block?", 'answer': "The swimsuit offers UPF 50+ rated sun protection, which blocks 98% of the sun's harmful rays."}}
{'qa_pairs': {'query': 'According to the document, what is the fabric composition o

In [None]:
# cross-check with the original data
for content in data[:5]:
    print(content.page_content)

: 0
name: Women's Campside Oxfords
description: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. 

Size & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. 

Specs: Approx. weight: 1 lb.1 oz. per pair. 

Construction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. 

Questions? Please contact us for any inquiries.
: 1
name: Recycled Waterhog Dog Mat, Chevron Weave
description: Protect your floors from spills and splashing with our ultradurable recycled Waterhog dog mat made right here in the USA. 

Specs
Small - Dimensions: 18" x 28". 
M

In [None]:
# add generated examples to manual ones
examples = examples + [new_example["qa_pairs"] for new_example in new_examples]
examples

[{'query': 'Do the Cozy Comfort Pullover Set have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection'},
 {'query': "According to the provided description, what material is used for the innersole of the Women's Campside Oxfords, and what specific feature does it include for odor control?",
  'answer': "The Women's Campside Oxfords feature a comfortable EVA innersole that includes Cleansport NXT® antimicrobial odor control."},
 {'query': 'According to the provided document, what percentage of the fabric used to construct the Recycled Waterhog Dog Mat is comprised of recycled materials?',
  'answer': '94% of the fabric is made from recycled materials.'},
 {'query': "According to the provided description, what level of sun protection does the Infant and Toddler Girls' Coastal Chill Swimsuit offer, and what percentage of the sun's harmful rays does this protection block?",
  'answer': 

Now we can actually evaluate. First, run one example manually through the chain:

In [None]:
qa.run(examples[0]["query"])





> Entering new RetrievalQA chain...

> Finished chain.


'Yes, the Cozy Comfort Pullover Set has side pockets on the pants. The description states: "Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg."'

This gives little visibility into what is happening inside the chain:
* What is the actual prompt?
* What are the docs retrieved?

In [None]:
import langchain
langchain.debug = True
qa.invoke(examples[0]["query"])
langchain.debug = False

[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set have side pockets?",
  "context": ": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- 

A context is built from the documents returned by the retriever.
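
As a rough illustration of the "stuff" strategy, the sketch below joins the text of each retrieved document with the configured `document_separator` and drops the result into a prompt. The `build_context` helper, the shortened sample strings and the prompt template are hypothetical stand-ins, not LangChain's actual implementation or its default prompt.

```python
# Hypothetical stand-in for the "stuff" strategy: concatenate retrieved texts.
DOCUMENT_SEPARATOR = "<<<<>>>>>"  # same separator passed via chain_type_kwargs


def build_context(page_contents, separator=DOCUMENT_SEPARATOR):
    """Join the text of every retrieved document into one context string."""
    return separator.join(page_contents)


# Illustrative template only; the prompt really used by the chain may differ.
PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"

# Shortened sample texts standing in for retrieved page_content values.
retrieved = [
    "name: Cozy Comfort Pullover Set, Stripe",
    "name: Ultra-Lofty 850 Stretch Down Hooded Jacket",
]
context = build_context(retrieved)
prompt = PROMPT_TEMPLATE.format(
    context=context,
    question="Do the Cozy Comfort Pullover Set have side pockets?",
)
print(prompt)
```

A distinctive separator makes it easy to spot in the debug output where one retrieved document ends and the next begins.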

Now let's automate the LLM-assisted evaluation using the `QAEvalChain` class:

In [None]:
# store the predictions for all examples; batch() runs the chain once per example
predictions = qa.batch(examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...


ResponseError: llama runner process has terminated: exit status 2 (status code: 500)

In [None]:
# failed with a 500 status code, which suggests a token limit or, more likely, a local hardware limitation
# iterate over the examples one at a time to find the failing one
test_predictions = []
for example in examples:
    try:
        prediction = qa.invoke(example)
        test_predictions.append(prediction)
    except Exception as e:
        print(f"Failed on example: {example['query']}\n{e}")
print(test_predictions)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...
Failed on example: According to the provided document, what is a key benefit of the "TEK O2" technology incorporated into the EcoFlex 3L Storm Pants?
llama runner process has terminated: exit status 2 (status code: 500)
[{'query': 'Do the Cozy Comfort Pullover Set have side pockets?', 'answer': 'Yes', 'result': 'Yes, the Cozy Comfort Pullover Set has side pockets on the pants. The description states: "Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg."'}, {'query': '

In [None]:
from langchain.evaluation.qa import QAEvalChain
llm = ChatOllama(temperature=0, model=model)
eval_chain = QAEvalChain.from_llm(llm)

In [None]:
# we omit the last example that crashed in the predictions due to local hardware limitations
graded_outputs = eval_chain.evaluate(examples[:-1], test_predictions)
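
Slicing with `examples[:-1]` only works because the failed example happened to be the last one. A slightly more defensive sketch keeps only the examples that have a matching prediction. It assumes that each prediction dict still carries its original `query` key, as in the printed `test_predictions`, and that queries are unique. The `align_examples` helper and the toy data are hypothetical and not part of LangChain.

```python
def align_examples(examples, predictions):
    """Return (examples, predictions) restricted to queries that were answered.

    Assumes every prediction dict carries the original "query" key
    and that no two examples share the same query.
    """
    answered = {p["query"]: p for p in predictions}
    kept_examples = [ex for ex in examples if ex["query"] in answered]
    kept_predictions = [answered[ex["query"]] for ex in kept_examples]
    return kept_examples, kept_predictions


# Toy data standing in for the notebook's examples and test_predictions.
toy_examples = [
    {"query": "q1", "answer": "a1"},
    {"query": "q2", "answer": "a2"},  # pretend this one crashed
    {"query": "q3", "answer": "a3"},
]
toy_predictions = [
    {"query": "q1", "answer": "a1", "result": "r1"},
    {"query": "q3", "answer": "a3", "result": "r3"},
]

kept_examples, kept_predictions = align_examples(toy_examples, toy_predictions)
print([ex["query"] for ex in kept_examples])  # ['q1', 'q3']
```

The two returned lists could then be passed to `eval_chain.evaluate` in place of `examples[:-1]` and `test_predictions`, whichever position the failing example occupies.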

In [None]:
# and print a report with the evaluation for every QnA pair
for i, eg in enumerate(examples[:-1]):
    print(f"Example {i}:")
    print("Question: " + test_predictions[i]['query'])
    print("Real Answer: " + test_predictions[i]['answer'])
    print("Predicted Answer: " + test_predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set has side pockets on the pants. The description states: "Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg."
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: According to the provided description, what material is used for the innersole of the Women's Campside Oxfords, and what specific feature does it include for odor control?
Real Answer: The Women's Campside Oxfords feature a comfortable EVA innersole that includes Cleansport NXT® antimicrobial odor control.
Predicted Answer: According to the description, the Women's Campside Oxfords have a comfortable E
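
To condense the per-example report into a single figure, the grades can be tallied. The sketch below assumes each graded output is a dict whose `results` value is a grade string such as `CORRECT` or `INCORRECT`, as in the report above. The `summarize_grades` helper and the toy grades are hypothetical.

```python
from collections import Counter


def summarize_grades(graded_outputs):
    """Count each grade and compute the share graded exactly CORRECT."""
    # Normalize whitespace and case so " correct " and "CORRECT" match.
    grades = [g["results"].strip().upper() for g in graded_outputs]
    counts = Counter(grades)
    accuracy = counts["CORRECT"] / len(grades) if grades else 0.0
    return counts, accuracy


# Toy grades standing in for the notebook's graded_outputs.
toy_graded = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
    {"results": " correct "},
]
counts, accuracy = summarize_grades(toy_graded)
print(dict(counts), f"accuracy={accuracy:.2f}")
```

An exact match is used rather than a substring test because `INCORRECT` contains `CORRECT`.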