## LangChain: Evaluation

### Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation

In [67]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [68]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"
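
The date check above can only be exercised on the day the notebook runs. As a minimal sketch, the same rule can be wrapped in a function that takes the date as an argument, so both sides of the cutoff can be checked. The name `pick_model` and its `cutoff` parameter are hypothetical and not part of the notebook.

```python
import datetime


def pick_model(today, cutoff=datetime.date(2024, 6, 12)):
    """Return the model name the cell above would choose on `today`.

    `pick_model` is an illustrative helper, not part of the notebook.
    """
    # Strictly after the cutoff: use the newer alias; otherwise the pinned snapshot.
    return "gpt-3.5-turbo" if today > cutoff else "gpt-3.5-turbo-0301"


on_cutoff = pick_model(datetime.date(2024, 6, 12))
day_after = pick_model(datetime.date(2024, 6, 13))
print(on_cutoff, day_after)
```

Because the comparison is a strict `>`, the cutoff day itself still selects the pinned snapshot.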

In [69]:
# For creating vector store
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

In [84]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [85]:
type(data)

list

### Index Creation
This code builds an in-memory vector store index (`index`) over the documents produced by `loader`. Each document is embedded with `OpenAIEmbeddings`, so a search query can be matched against the documents by vector similarity rather than by exact keywords.

In [86]:
index=VectorstoreIndexCreator(vectorstore_cls=DocArrayInMemorySearch, embedding=OpenAIEmbeddings()).from_loaders([loader])
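
To make "search by vector representation" concrete, here is a toy sketch of similarity search in plain Python. It is not how `DocArrayInMemorySearch` is implemented. The three-dimensional vectors, `toy_store`, and `toy_search` are made up for illustration; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical "embeddings" for three catalog items (illustrative values only).
toy_store = {
    "dog mat": [0.9, 0.1, 0.0],
    "down jacket": [0.1, 0.9, 0.2],
    "swimsuit": [0.0, 0.2, 0.9],
}


def toy_search(query_vector, store, k=1):
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query_vector, store[name]),
        reverse=True,
    )
    return ranked[:k]


top = toy_search([0.8, 0.2, 0.1], toy_store, k=2)
print(top)
```

The query vector points mostly along the first axis, so the item whose vector does the same ranks first.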

### Retrieval QA 

This code creates a `RetrievalQA` chain that uses the vector store index (`index`) for retrieval and the LLM (`llm`) for answering. With `chain_type="stuff"`, all retrieved documents are placed into a single prompt, joined by the separator string given in `document_separator` (`"<<<<>>>>>"`), and the LLM answers the user's query from that combined context.

In [87]:
llm=ChatOpenAI(temperature=0,model=llm_model)

In [88]:
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
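
As a rough sketch of what the "stuff" strategy does with the separator, the retrieved texts are joined into one context string and placed in a prompt alongside the question. The function `stuff_documents` and the prompt wording below are assumptions for illustration; LangChain's actual prompt template differs.

```python
def stuff_documents(docs, question, separator="<<<<>>>>>"):
    """Join retrieved texts with `separator` and build a single prompt.

    Illustrative only: the prompt wording here is invented, not LangChain's.
    """
    context = separator.join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up stand-ins for two retrieved catalog entries.
retrieved = [
    "name: Recycled Waterhog Dog Mat, Chevron Weave",
    "name: Women's Campside Oxfords",
]
prompt = stuff_documents(retrieved, "Which item is a dog mat?")
print(prompt)
```

A distinctive separator makes it easier to see where one document ends and the next begins when reading the prompt in debug output.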

In [89]:
#### Test data points
data[1]

Document(page_content=': 1\nname: Recycled Waterhog Dog Mat, Chevron Weave\ndescription: Protect your floors from spills and splashing with our ultradurable recycled Waterhog dog mat made right here in the USA. \n\nSpecs\nSmall - Dimensions: 18" x 28". \nMedium - Dimensions: 22.5" x 34.5".\n\nWhy We Love It\nMother nature, wet shoes and muddy paws have met their match with our Recycled Waterhog mats. Ruggedly constructed from recycled plastic materials, these ultratough mats help keep dirt and water off your floors and plastic out of landfills, trails and oceans. Now, that\'s a win-win for everyone.\n\nFabric & Care\nVacuum or hose clean.\n\nConstruction\n24 oz. polyester fabric made from 94% recycled materials.\nRubber backing.\n\nAdditional Features\nFeatures an -exclusive design.\nFeatures thick and thin fibers for scraping dirt and absorbing water.\nDries quickly and resists fading, rotting, mildew and shedding.\nUse indoors or out.\nMade in the USA.\n\nHave questions? Reach out to

#### Hard coded examples

In [106]:
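# Note: each backslash continuation below keeps the next line's leading
# spaces inside the string, so the queries contain a run of spaces.
examples = [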
    {'qa_pairs':
        {"query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
        }
    },
    {'qa_pairs':
         {"query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
         }
    }
]

#### LLM Generated Examples

In [107]:
from langchain.evaluation.qa import QAGenerateChain

In [104]:
# QA Generation

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

# auto generated Q/A
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:3]]
)



In [109]:
new_examples

[{'qa_pairs': {'query': "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
   'answer': "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."}},
 {'qa_pairs': {'query': 'What are the dimensions of the small and medium sizes for the Recycled Waterhog Dog Mat, Chevron Weave?',
   'answer': 'The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".'}},
 {'qa_pairs': {'query': "What features does the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece have according to the document?",
   'answer': 'The swimsuit features bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom, and it is recommended to be machine washed and line dried for best results.'}}]

In [110]:
new_examples[0]['qa_pairs']['query']

"According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?"

In [111]:
examples

[{'qa_pairs': {'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
   'answer': 'Yes'}},
 {'qa_pairs': {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
   'answer': 'The DownTek collection'}}]

#### Combine examples

In [112]:
examples= examples + new_examples

In [114]:
qa.run(examples[2]['qa_pairs']["query"])

> Entering new RetrievalQA chain...

> Finished chain.


"The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."

### Manual Evaluation

In [115]:
import langchain
langchain.debug = True

In [116]:
qa.run(examples[2]['qa_pairs']["query"])

[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?"
}
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
{
  "question": "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
  "context": ": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: So

"The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."

In [117]:
# Turn off the debug mode
langchain.debug = False

In [121]:
examples

[{'qa_pairs': {'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
   'answer': 'Yes'}},
 {'qa_pairs': {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
   'answer': 'The DownTek collection'}},
 {'qa_pairs': {'query': "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
   'answer': "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."}},
 {'qa_pairs': {'query': 'What are the dimensions of the small and medium sizes for the Recycled Waterhog Dog Mat, Chevron Weave?',
   'answer': 'The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".'}},
 {'qa_pairs': {'query': "What features does the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece have according to the document?",
   'answer': 'The swimsuit features bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fa

### Removing one level of nesting 'qa_pairs'

In [122]:
for item in examples:
    item.update(item.pop('qa_pairs'))
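
The flattening is presumably needed because the later steps (`qa.apply` and `QAEvalChain.evaluate`) look for `query` and `answer` as top-level keys of each example. Here is a small sketch of the same transformation on made-up data. It builds new dictionaries rather than mutating in place; `nested` and `flat` are illustrative names.

```python
# Made-up examples in the nested shape shown above.
nested = [
    {"qa_pairs": {"query": "Q1", "answer": "A1"}},
    {"qa_pairs": {"query": "Q2", "answer": "A2"}},
]

# Copy the inner dict of each example; the originals are left untouched.
flat = [dict(item["qa_pairs"]) for item in nested]
print(flat)
```

The in-place loop above gives the same result but changes `examples` itself, so re-running that cell raises a `KeyError` once the `qa_pairs` key is gone.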

In [123]:
examples

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection'},
 {'query': "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
  'answer': "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."},
 {'query': 'What are the dimensions of the small and medium sizes for the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".'},
 {'query': "What features does the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece have according to the document?",
  'answer': 'The swimsuit features bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, f

### LLM-assisted evaluation

In [124]:
predictions=qa.apply(examples)

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.


In [125]:
from langchain.evaluation.qa import QAEvalChain

In [126]:
llm=ChatOpenAI(temperature=0,model=llm_model)
eval_chain=QAEvalChain.from_llm(llm=llm)


In [127]:
graded_outputs= eval_chain.evaluate(examples,predictions)

In [128]:
predictions

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?',
  'answer': 'Yes',
  'result': 'The Cozy Comfort Pullover Set, Stripe does have side pockets.'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection',
  'result': 'The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.'},
 {'query': "According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?",
  'answer': "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.",
  'result': "The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz."},
 {'query': 'What are the dimensions of the small and medium sizes for the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".',
  'result': 'The small size of the Recycled Waterhog Dog Mat, Chevro

In [129]:
graded_outputs

[{'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'}]
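
With more than a handful of examples, reading the grades one by one stops being practical. Here is a minimal sketch of tallying them, assuming each entry has the `results` key shown above. The list `toy_grades` is made-up data, not output from this notebook.

```python
from collections import Counter

# Made-up grades in the same shape as `graded_outputs`.
toy_grades = [
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
    {"results": "CORRECT"},
]

# Normalize whitespace and case before counting, in case the grader's text varies.
counts = Counter(grade["results"].strip().upper() for grade in toy_grades)
accuracy = counts["CORRECT"] / len(toy_grades)
print(counts, round(accuracy, 2))
```

The grades come from an LLM, so a high score here says the grader judged the answers equivalent to the references. It is not an independent check of correctness.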

In [130]:
# Print each question with its reference answer, the chain's prediction, and the grade
for i, prediction in enumerate(predictions):
    print(f"Example {i}:")
    print("Question: " + prediction['query'])
    print("Real Answer: " + prediction['answer'])
    print("Predicted Answer: " + prediction['result'])
    print("Grade: " + graded_outputs[i]['results'])
    print()


Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: The Cozy Comfort Pullover Set, Stripe does have side pockets.
Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Grade: CORRECT

Example 2:
Question: According to the document, what is the approximate weight of the Women's Campside Oxfords per pair?
Real Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Grade: CORRECT

Example 3:
Question: What are the dimensions of the small and medium sizes for the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 3