### LangChain for LLM Application Development

How do you evaluate how the application is doing? Is it meeting certain criteria? And if you change something, how do you know whether you are making it better or worse?

You can also use LLMs and chains to evaluate other language models and chains.
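Both questions come down to one loop: run every example through a version of the app, grade each answer, and compare the scores. Below is a minimal plain-Python sketch of that idea. It is not LangChain code: `accuracy`, `compare`, the toy apps, and the substring grader are all hypothetical. In the rest of this notebook, the grader role is played by an LLM.

```python
from typing import Callable, Dict, List

# An "app" maps a question to an answer; a "grader" decides whether a
# prediction agrees with the reference answer. Both are hypothetical stand-ins.
App = Callable[[str], str]
Grader = Callable[[str, str, str], bool]  # (query, reference, prediction) -> correct?


def accuracy(app: App, examples: List[Dict[str, str]], grader: Grader) -> float:
    """Fraction of examples that the grader marks as correct."""
    correct = sum(
        grader(ex["query"], ex["answer"], app(ex["query"])) for ex in examples
    )
    return correct / len(examples)


def compare(old: App, new: App, examples: List[Dict[str, str]], grader: Grader) -> str:
    """Say whether a change made the app better, worse, or left it unchanged."""
    before = accuracy(old, examples, grader)
    after = accuracy(new, examples, grader)
    if after > before:
        return f"better ({before:.2f} -> {after:.2f})"
    if after < before:
        return f"worse ({before:.2f} -> {after:.2f})"
    return f"unchanged ({before:.2f})"


# Toy data and toy versions of the app, for illustration only
toy_examples = [
    {"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"},
    {"query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
     "answer": "The DownTek collection"},
]


def old_app(query: str) -> str:
    return "Yes" if "pockets" in query else "I don't know"


def new_app(query: str) -> str:
    return "Yes" if "pockets" in query else "It is from the DownTek collection."


def contains(query: str, reference: str, prediction: str) -> bool:
    # Crude grader: the reference answer must appear inside the prediction
    return reference.lower() in prediction.lower()


print(compare(old_app, new_app, toy_examples, contains))  # better (0.50 -> 1.00)
```

The substring grader is the weak point: it fails as soon as a correct answer is worded differently from the reference, which is the gap an LLM-based grader is meant to fill.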

In [60]:
from langchain.chains import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

# Local model served by Ollama; temperature 0 keeps answers as deterministic as possible
llm_model = "llama2:7b"
llm = ChatOllama(temperature=0.0, model=llm_model)

# Each row of the CSV becomes one Document
file = "OutdoorClothingCatalog_1000.csv"
loader = CSVLoader(file_path=file, encoding="utf-8")
data = loader.load()

# Uses the default sentence-transformers model from Hugging Face
embeddings = HuggingFaceEmbeddings()

# Embed the documents and store the vectors in an in-memory vector store
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])
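`VectorstoreIndexCreator` embeds every document and keeps the vectors in `DocArrayInMemorySearch`; at query time the question is embedded too, and the closest documents are returned. A toy sketch of that nearest-neighbour lookup, using cosine similarity. The `ToyInMemoryIndex` class and the 3-dimensional vectors are made up for illustration (real `HuggingFaceEmbeddings` vectors have hundreds of dimensions), and the `k=4` default is an assumption meant to mirror LangChain's usual number of retrieved documents.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyInMemoryIndex:
    """Keeps (text, vector) pairs in a list and ranks them by similarity to a query."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        self.items.append((text, vector))

    def search(self, query_vector: List[float], k: int = 4) -> List[str]:
        # Exhaustive search: score every stored vector, keep the k best
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Made-up 3-dimensional "embeddings", for illustration only
toy_index = ToyInMemoryIndex()
toy_index.add("Ultra-Lofty 850 Stretch Down Hooded Jacket", [1.0, 0.0, 0.0])
toy_index.add("Recycled Waterhog Dog Mat", [0.0, 1.0, 0.0])
toy_index.add("Cozy Comfort Pullover Set", [0.9, 0.1, 0.0])
print(toy_index.search([1.0, 0.0, 0.0], k=2))
```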





In [61]:


qa = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff",
    retriever = index.vectorstore.as_retriever(),
    verbose = True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
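With `chain_type="stuff"`, all retrieved documents are "stuffed" into a single prompt, joined by `document_separator`, and sent to the LLM in one call. A sketch of that assembly step; `stuff_prompt` is a hypothetical helper and the template wording is a stand-in, not LangChain's actual default prompt.

```python
from typing import List

# Hypothetical prompt wording; LangChain's real "stuff" prompt is phrased differently
TEMPLATE = (
    "Use the following pieces of context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(docs: List[str], question: str, separator: str = "<<<<>>>>>") -> str:
    """Join all retrieved documents into one context block and fill in the template."""
    context = separator.join(docs)
    return TEMPLATE.format(context=context, question=question)


# Abbreviated, made-up document snippets
retrieved = [
    "name: Cozy Comfort Pullover Set ...",
    "name: Cozy Cuddles Knit Pullover Set ...",
]
print(stuff_prompt(retrieved, "Do the Cozy Comfort Pullover Set have side pockets?"))
```

Because everything goes into one call, this chain type only works while the retrieved documents fit in the model's context window.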

In [62]:
# Inspect a couple of documents to base hand-written test examples on
# (only the last expression, data[11], is displayed)
data[10]
data[11]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 11}, page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.')

In [None]:
# Hand-written examples: a query plus its reference answer
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set "
                 "have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty "
                 "850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
# Writing examples by hand does not scale well

In [64]:
# We can automate example generation with language models themselves
from langchain.evaluation.qa import QAGenerateChain  # Takes in documents and creates a question-answer pair from each one

In [65]:
# Create the generation chain by passing in the language model
example_gen_chain = QAGenerateChain.from_llm(ChatOllama(model=llm_model))

In [66]:
# Generate one example per document for the first five documents;
# apply_and_parse runs the chain's output parser, so each result is a dictionary
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
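Under the hood, the generation chain asks the model to reply in a `QUESTION: ... / ANSWER: ...` layout and parses that text into a dictionary, which is why each result comes back as `{'qa_pairs': {'query': ..., 'answer': ...}}`. A regex-based sketch of the parsing step, assuming that layout; `parse_qa` is a hypothetical helper, not LangChain's own parser, which may behave differently on malformed replies.

```python
import re
from typing import Dict, Optional

# Assumed reply layout: "QUESTION: ...", then one or more newlines, then "ANSWER: ..."
QA_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)", re.DOTALL)


def parse_qa(text: str) -> Optional[Dict[str, str]]:
    """Turn a generated QUESTION/ANSWER reply into a {'query', 'answer'} dict."""
    match = QA_PATTERN.search(text)
    if match is None:
        return None  # the model ignored the requested layout
    return {"query": match.group(1).strip(), "answer": match.group(2).strip()}


generated = (
    "QUESTION: What percentage of the Recycled Waterhog Dog Mat is made up of "
    "recycled materials?\nANSWER: 94%"
)
print({"qa_pairs": parse_qa(generated)})
```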



In [68]:
#new_examples[0]
new_examples[1]

{'qa_pairs': {'query': 'What percentage of the Recycled Waterhog Dog Mat is made up of recycled materials?',
  'answer': 'According to the document, 94% of the mat is made up of recycled materials.'}}

In [69]:
#data[0]
data[1]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 1}, page_content=': 1\nname: Recycled Waterhog Dog Mat, Chevron Weave\ndescription: Protect your floors from spills and splashing with our ultradurable recycled Waterhog dog mat made right here in the USA. \n\nSpecs\nSmall - Dimensions: 18" x 28". \nMedium - Dimensions: 22.5" x 34.5".\n\nWhy We Love It\nMother nature, wet shoes and muddy paws have met their match with our Recycled Waterhog mats. Ruggedly constructed from recycled plastic materials, these ultratough mats help keep dirt and water off your floors and plastic out of landfills, trails and oceans. Now, that\'s a win-win for everyone.\n\nFabric & Care\nVacuum or hose clean.\n\nConstruction\n24 oz. polyester fabric made from 94% recycled materials.\nRubber backing.\n\nAdditional Features\nFeatures an -exclusive design.\nFeatures thick and thin fibers for scraping dirt and absorbing water.\nDries quickly and resists fading, rotting, mildew and shedding.\nUse

In [None]:
# Add the generated examples to the hand-written ones (the generated entries are nested under 'qa_pairs')
examples += new_examples

In [71]:
# How exactly do we evaluate what is going on?
qa.run(examples[0]["query"])  # First, run one example through the chain

# For debugging: check that the model responds at all
# print(llm.predict("Hello, are you ready?"))



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'Based on the information provided in the context, the Cozy Comfort Pullover Set does have side pockets. According to the description, the pants have "side pockets" as one of their additional features.'

**This output is limited. What prompts go into the model? Which documents does the retriever return? And if this were a more complex chain, what would the intermediate results be?**
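LangChain's debug mode answers these questions by logging when each nested step starts and ends, together with its inputs and outputs. The toy tracer below illustrates the idea in plain Python; it is not how LangChain implements its callbacks, and `traced`, `retrieve`, `answer`, and `qa_chain` are hypothetical stand-ins for the retriever, the LLM, and the chain.

```python
import functools
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # every start/end event, in the order it happened


def traced(step_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that logs a step's input when it starts and its output when it ends."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            TRACE.append({"event": "start", "step": step_name, "input": args})
            result = fn(*args, **kwargs)
            TRACE.append({"event": "end", "step": step_name, "output": result})
            return result
        return wrapper
    return decorator


@traced("retriever")
def retrieve(question: str) -> List[str]:
    # Hypothetical retriever: always returns the same document
    return ["name: Cozy Comfort Pullover Set ... side pockets ..."]


@traced("llm")
def answer(prompt: str) -> str:
    # Hypothetical LLM: returns a canned reply
    return "Yes, the pants have side pockets."


@traced("qa_chain")
def qa_chain(question: str) -> str:
    docs = retrieve(question)
    prompt = "\n\n".join(docs) + "\n\nQuestion: " + question
    return answer(prompt)


qa_chain("Do the Cozy Comfort Pullover Set have side pockets?")
for event in TRACE:
    print(f"[{event['event']}] {event['step']}")
```

The nesting of start and end events is what makes the intermediate results of a multi-step chain visible.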

In [72]:
import langchain
langchain.debug = True

In [None]:
qa.run(examples[0]["query"]) #Now it prints out a lot more information

[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": ": 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \

'Based on the information provided in the context, the Cozy Comfort Pullover Set does have side pockets. According to the description, the pants have "side pockets" as one of their additional features.'

**How can we evaluate all the examples we have created?** With LLM-assisted evaluation.
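Why not just compare strings? The reference answer for the first example is `Yes`, while the chain replied with a full sentence that is correct but shares no normalized text with it. A small sketch (hypothetical `normalize` and `exact_match` helpers) showing that both exact match and substring containment would mark this correct answer as wrong; that gap is what an LLM grader fills.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def exact_match(reference: str, prediction: str) -> bool:
    return normalize(reference) == normalize(prediction)


reference = "Yes"
prediction = (
    "Based on the information provided in the context, the Cozy Comfort "
    "Pullover Set does have side pockets."
)
print(exact_match(reference, prediction))             # False
print(normalize(reference) in normalize(prediction))  # False
```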

In [77]:
langchain.debug = False
# Flatten the examples first: the chain expects a top-level "query" key,
# but the generated examples are nested under "qa_pairs"
fixed_examples = []
for e in examples:
    if "query" in e:
        fixed_examples.append(e)
    elif "qa_pairs" in e and "query" in e["qa_pairs"]:
        # Flatten nested qa_pairs
        fixed_examples.append(e["qa_pairs"])
    else:
        # Skip anything with an unexpected shape
        pass

predictions = qa.apply(fixed_examples)



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [78]:
from langchain.evaluation.qa import QAEvalChain  # Chain that grades predictions against the reference answers

In [80]:
llm = ChatOllama(temperature=0, model=llm_model) 
eval_chain = QAEvalChain.from_llm(llm) #Create the chain with a language model

In [84]:
graded_outputs = eval_chain.evaluate(fixed_examples, predictions)
graded_outputs

[{'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT. The student answer is factually accurate, as it matches the true answer in terms of the materials used to make the tankini top. The student answer mentions "nylon" and "Lycra Xtra Life" for the fabric, which is the same as the true answer\'s mention of "recycled nylon" and "Lycra® spandex." Additionally, the student answer includes the full front lining material, which is also mentioned in the true answer. Well done!'},
 {'results': 'CORRECT'}]
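Most grades are a bare `CORRECT`, but the sixth is a whole paragraph that merely starts with the label. Before computing a score, it helps to reduce each grade to a label. A sketch, assuming the label is the first word of the reply; `grade_label` and `grade_accuracy` are hypothetical helpers, the sample grades (including the `INCORRECT` one) are toy data, and replies in any other layout fall into `UNKNOWN`.

```python
from typing import Dict, List


def grade_label(result: str) -> str:
    """Reduce a free-text grade to CORRECT, INCORRECT, or UNKNOWN using its first word."""
    words = result.strip().split()
    first = words[0].strip(".,:;!").upper() if words else ""
    return first if first in {"CORRECT", "INCORRECT"} else "UNKNOWN"


def grade_accuracy(graded: List[Dict[str, str]]) -> float:
    """Share of grades labelled CORRECT (0.0 for an empty list)."""
    labels = [grade_label(g["results"]) for g in graded]
    return labels.count("CORRECT") / len(labels) if labels else 0.0


# Toy grades in the same shape as graded_outputs
graded = [
    {"results": "CORRECT"},
    {"results": "CORRECT. The student answer is factually accurate ..."},
    {"results": "INCORRECT"},
]
print([grade_label(g["results"]) for g in graded])  # ['CORRECT', 'CORRECT', 'INCORRECT']
print(grade_accuracy(graded))
```

Applied to `graded_outputs`, a helper like `grade_accuracy` turns each evaluation run into a single number, so "better or worse?" becomes a comparison of two scores.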

In [85]:
# Print each question with its reference answer, prediction, and grade
for i, (pred, grade) in enumerate(zip(predictions, graded_outputs)):
    print(f"Example {i}:")
    print("Question: " + pred["query"])
    print("Real Answer: " + pred["answer"])
    print("Predicted Answer: " + pred["result"])
    print("Predicted Grade: " + grade["results"])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: Based on the information provided in the context, the Cozy Comfort Pullover Set does have side pockets. According to the description, the pants have "side pockets" as one of their additional features.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: Based on the information provided in the context, the Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What is the approximate weight of these Women's Campside Oxfords per pair?
Real Answer: According to the document, the approximate weight of these Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Answer: According to the context, the approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicte

In [86]:
graded_outputs[0]

{'results': 'CORRECT'}