## LangChain: Evaluation

When building a complex application with an LLM, an important but sometimes tricky step is evaluating how well the application performs. Does it meet your accuracy criteria?

### Outline:
- Example generation
- Manual evaluation (and debugging)
- LLM-assisted evaluation
- LangChain evaluation platform

In [1]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_KEY = os.getenv('OPENAI_KEY')
openai.api_key = OPENAI_KEY

In [2]:
FILENAME = "OutdoorClothingCatalog_1000.csv"
LLM_MODEL = "gpt-3.5-turbo"

In [3]:
from langchain.document_loaders import CSVLoader

loader = CSVLoader(file_path=FILENAME)
data = loader.load()
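
The documents printed further down suggest how `CSVLoader` shapes its output: one document per row, with `page_content` built from `column: value` lines and `metadata` recording the source file and row index. The sketch below imitates that layout with the standard `csv` module. `rows_to_documents` and the sample CSV are hypothetical and are not the loader's actual implementation.

```python
import csv
import io

def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict resembling a loaded document (illustrative only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, mirroring the printed page_content.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({
            "page_content": page_content,
            "metadata": {"source": source, "row": row_index},
        })
    return documents

# Hypothetical two-row catalog, not the real OutdoorClothingCatalog_1000.csv.
sample_csv = "name,description\nTrail Hat,Wide brim\nCamp Mug,Enamel steel\n"
docs = rows_to_documents(sample_csv, source="sample.csv")
print(docs[1]["page_content"])
print(docs[1]["metadata"])
```

Keeping the row index in the metadata makes it possible to trace an answer back to the catalog row it came from.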

#### Creating our Q&A Application

In [4]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [5]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [6]:
llm = ChatOpenAI(temperature=0.0, model=LLM_MODEL)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        "document_separator": "<<<<>>>>>"
    }
)
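
With `chain_type='stuff'`, all retrieved documents are placed into a single prompt, and `document_separator` is the string that joins them. Below is a minimal sketch of that joining step in plain Python. `build_context`, `build_prompt`, and the prompt wording are hypothetical stand-ins, not LangChain's actual template.

```python
def build_context(documents, separator="<<<<>>>>>"):
    """Join retrieved document texts into one context string."""
    return separator.join(documents)

def build_prompt(question, documents):
    # Illustrative prompt wording; the real chain uses its own template.
    context = build_context(documents)
    return (
        "Use the context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

retrieved = ["name: Trail Hat", "name: Camp Mug"]  # hypothetical retrieved texts
prompt = build_prompt("Which item is a hat?", retrieved)
print(prompt)
```

A distinctive separator makes document boundaries easy to spot in debug output. Because every document goes into one prompt, this approach only works while the combined text fits in the model's context window.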

First we need some data points to evaluate the application on. The next steps look at a few ways to come up with them.

In [11]:
# Two documents we can use as a basis for hand-written test cases

print(data[10])
print()
print(data[11])

page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported." metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 10}

page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion

In [12]:
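## Hardcoded examples
## Note: the backslash continuation keeps the next line's indentation inside
## the string, which is why the queries in the outputs below contain extra spaces.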
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

#### LLM Generated Examples

In [13]:
from langchain.evaluation.qa import QAGenerateChain

example_gen_chain = QAGenerateChain.from_llm(
    ChatOpenAI()
)

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)



In [14]:
print(new_examples)

[{'qa_pairs': {'query': "What is the description of the Women's Campside Oxfords?", 'answer': "The Women's Campside Oxfords are ultracomfortable lace-to-toe Oxford shoes made of super-soft canvas. They have thick cushioning and are of high quality construction, providing a broken-in feel from the first time they are worn."}}, {'qa_pairs': {'query': 'What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?', 'answer': 'The small size has dimensions of 18" x 28" and the medium size has dimensions of 22.5" x 34.5".'}}, {'qa_pairs': {'query': "What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit?", 'answer': "The Infant and Toddler Girls' Coastal Chill Swimsuit features bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom, and it is machine washable and line dry for best results."}}, {'qa

In [15]:
print(new_examples[0])

{'qa_pairs': {'query': "What is the description of the Women's Campside Oxfords?", 'answer': "The Women's Campside Oxfords are ultracomfortable lace-to-toe Oxford shoes made of super-soft canvas. They have thick cushioning and are of high quality construction, providing a broken-in feel from the first time they are worn."}}


In [16]:
print(data[0])

page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries." metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0}


In [22]:
## Merging LLM-generated examples with our hand-written examples
examples += new_examples
print(examples)

## Redefine the hand-written examples (this replaces the merged list)
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

[{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?', 'answer': 'Yes'}, {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?', 'answer': 'The DownTek collection'}, {'qa_pairs': {'query': "What is the description of the Women's Campside Oxfords?", 'answer': "The Women's Campside Oxfords are ultracomfortable lace-to-toe Oxford shoes made of super-soft canvas. They have thick cushioning and are of high quality construction, providing a broken-in feel from the first time they are worn."}}, {'qa_pairs': {'query': 'What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?', 'answer': 'The small size has dimensions of 18" x 28" and the medium size has dimensions of 22.5" x 34.5".'}}, {'qa_pairs': {'query': "What are some features of the Infant and Toddler Girls' Coastal Chill Swimsuit?", 'answer': "The Infant and Toddler Girls' Coastal Chill Swimsuit features bright colors, ruffles, exclusive
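
The merged list printed above mixes two shapes: the hand-written entries have `query` and `answer` at the top level, while the generated ones nest them under `qa_pairs`. Code that reads `example["query"]` directly would fail on the nested entries. One possible fix, sketched here with a hypothetical helper `flatten_example` and made-up sample data, is to unwrap the generated entries before merging:

```python
def flatten_example(example):
    """Return a flat {'query', 'answer'} dict, unwrapping 'qa_pairs' if present."""
    return example.get("qa_pairs", example)

# Sample data for illustration only.
hand_written = [{"query": "Is it waterproof?", "answer": "Yes"}]
generated = [{"qa_pairs": {"query": "What color is it?", "answer": "Blue"}}]

merged = hand_written + [flatten_example(ex) for ex in generated]
print(merged)
```

After flattening, every entry exposes the same keys, so the whole list can be passed through the same evaluation code.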

In [23]:
qa.run(examples[0]["query"])



> Entering new RetrievalQA chain...

> Finished chain.


'Yes, the Cozy Comfort Pullover Set does have side pockets.'

#### Manual Evaluation
Turning on debug mode shows what happens inside the chain at each step.

In [24]:
import langchain
langchain.debug = True

qa.run(examples[0]["query"])

## turning off the debug mode
langchain.debug = False

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": ": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\

#### LLM Assisted Evaluation

In [25]:
predictions = qa.apply(examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.
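
Before bringing in an LLM grader, it helps to see why plain string comparison falls short. In the earlier run, the reference answer is `Yes` while the chain replied `Yes, the Cozy Comfort Pullover Set does have side pockets.` The two agree in meaning but not character for character. The `exact_match` helper below is a hypothetical baseline, shown only to illustrate the problem:

```python
def exact_match(reference, prediction):
    """Case-insensitive comparison after trimming whitespace and trailing periods."""
    def normalize(text):
        return text.strip().rstrip(".").lower()
    return normalize(reference) == normalize(prediction)

reference = "Yes"
prediction = "Yes, the Cozy Comfort Pullover Set does have side pockets."

# The answers agree semantically, but string comparison cannot see that.
print(exact_match(reference, prediction))
# Light normalization only rescues near-identical strings.
print(exact_match("The DownTek collection", "the downtek collection."))
```

Grading by meaning rather than by surface form is the gap that LLM-assisted evaluation is meant to close.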


In [27]:
from langchain.evaluation.qa import QAEvalChain

llm = ChatOpenAI(temperature=0, model=LLM_MODEL)
eval_chain = QAEvalChain.from_llm(llm)

graded_outputs = eval_chain.evaluate(examples, predictions)
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print(eg)
    # print("Question: " + predictions[i]['query'])
    # print("Real Answer: " + predictions[i]['answer'])
    # print("Predicted Answer: " + predictions[i]['result'])
    # print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

Example 0:
{'query': 'Do the Cozy Comfort Pullover Set        have side pockets?', 'answer': 'Yes'}

Example 1:
{'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?', 'answer': 'The DownTek collection'}
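
Once graded, the results can be reduced to a single score. The sketch below assumes each graded output is a dict whose grade text is `CORRECT` or `INCORRECT`, stored under a key such as `text` (as in the commented-out line above). The key name may differ between LangChain versions, so it is a parameter here. `summarize_grades` and the sample data are hypothetical.

```python
def summarize_grades(graded_outputs, key="text"):
    """Count CORRECT grades and return (correct, total, accuracy)."""
    # Assumption: the grade is the whole string, apart from case and whitespace.
    grades = [str(output[key]).strip().upper() for output in graded_outputs]
    correct = sum(grade == "CORRECT" for grade in grades)
    total = len(grades)
    accuracy = correct / total if total else 0.0
    return correct, total, accuracy

# Made-up grades for illustration, not results from this notebook.
sample = [{"text": "CORRECT"}, {"text": "INCORRECT"}, {"text": " correct "}]
correct, total, accuracy = summarize_grades(sample)
print(f"{correct}/{total} correct ({accuracy:.0%})")
```

A single accuracy figure is convenient for tracking changes over time, but the individual incorrect cases are still worth reading, since the grader itself is an LLM and can be wrong.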



#### LangChain Evaluation platform

The LangChain evaluation platform, LangChain Plus, is available at https://www.langchain.plus/. Use the invite code lang_learners_2023 to sign up.