# Evaluation

### Outline:
- Example generation
- Manual evaluation (debugging)
- LLM-assisted evaluation
- LangChain evaluation platform

### Create our Q&A application

In [5]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOllama
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.embeddings import HuggingFaceEmbeddings

In [6]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [9]:
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # Lightweight, effective model
)

index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embedding_model,
).from_loaders([loader])

In [11]:
llm = ChatOllama(model="llama3.2", temperature=0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
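For intuition, the "stuff" chain type places all retrieved documents into a single prompt, joined by `document_separator`. The sketch below imitates only that joining step in plain Python. `stuff_documents` is a hypothetical helper, not a LangChain function, and the two sample strings are shortened stand-ins for catalog rows.

```python
# Hypothetical helper: illustrates how retrieved texts are "stuffed" into one context string.
def stuff_documents(page_contents, separator="<<<<>>>>>"):
    """Join document texts into a single context string."""
    return separator.join(page_contents)


# Shortened stand-ins for two retrieved catalog rows (assumed for illustration).
docs = [
    "ProductID: P0011\nCategory: Chaquetas",
    "ProductID: P0012\nCategory: Pantalones",
]
context = stuff_documents(docs)
print(context)
```

A distinctive separator makes it easy to see in debug output where one retrieved document ends and the next begins.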

### Coming up with test datapoints

In [12]:
data[10]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 10}, page_content='ProductID: P0011\nProductName: Outdoor Gear 11\nCategory: Chaquetas\nPrice: 121.05\nStock: 36\nRating: 3.2')

In [13]:
data[11]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 11}, page_content='ProductID: P0012\nProductName: Outdoor Gear 12\nCategory: Pantalones\nPrice: 155.66\nStock: 1\nRating: 4.0')
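Each `Document` above holds one CSV row flattened into `key: value` lines. The sketch below reproduces that layout with the standard `csv` module, assuming the column names seen in the output. `row_to_page_content` is an illustrative name, and the loader's actual implementation may differ in details.

```python
import csv
import io


def row_to_page_content(row):
    """Render a CSV row (a dict) as newline-separated 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# In-memory sample with the same columns as the catalog row shown above.
sample_csv = io.StringIO(
    "ProductID,ProductName,Category,Price,Stock,Rating\n"
    "P0011,Outdoor Gear 11,Chaquetas,121.05,36,3.2\n"
)
rows = list(csv.DictReader(sample_csv))
page_content = row_to_page_content(rows[0])
print(page_content)
```

Knowing this layout helps when writing test questions, since the model only ever sees these fields.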

### Hard-coded examples

In [14]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]
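One detail worth noting: a backslash continuation inside a string literal keeps the next line's leading indentation, so the query ends up with a run of spaces in the middle. A sketch of an alternative, using adjacent string literals inside parentheses, follows. `query_backslash` and `query_joined` are illustrative names.

```python
# Backslash continuation: the indentation of the next line becomes part of the string.
query_backslash = "Do the Cozy Comfort Pullover Set\
        have side pockets?"

# Adjacent literals inside parentheses are concatenated without stray whitespace.
query_joined = (
    "Do the Cozy Comfort Pullover Set "
    "have side pockets?"
)

print(repr(query_backslash))
print(repr(query_joined))
```

Both forms express the same question, but the second keeps the text sent to the retriever and the model free of accidental padding.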

### LLM-Generated examples

In [15]:
from langchain.evaluation.qa import QAGenerateChain

In [17]:
example_gen_chain = QAGenerateChain.from_llm(llm)

In [18]:
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)



In [19]:
new_examples[0]

{'qa_pairs': {'query': 'What is the ProductID, ProductName, and Category of the first product in the catalog?',
  'answer': 'The ProductID is P0001, the ProductName is Outdoor Gear 1, and the Category is Chaquetas.'}}

In [20]:
data[0]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0}, page_content='ProductID: P0001\nProductName: Outdoor Gear 1\nCategory: Chaquetas\nPrice: 105.88\nStock: 47\nRating: 4.5')

### Combine examples

In [21]:
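# Each generated item wraps its pair under "qa_pairs"; unwrap it so every
# example has top-level "query" and "answer" keys like the hard-coded ones.
examples += [ex["qa_pairs"] for ex in new_examples]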

In [22]:
qa.run(examples[0]["query"])

> Entering new RetrievalQA chain...

> Finished chain.


'I don\'t know, I don\'t see any information about a "Cozy Comfort Pullover Set" in the provided context. The products listed are all outdoor gear and clothing items, but none of them match that name.'

### Manual evaluation

In [23]:
import langchain
langchain.debug = True

In [24]:
qa.run(examples[0]["query"])

[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set        have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set        have side pockets?",
  "context": "ProductID: P0689\nProductName: Outdoor Gear 689\nCategory: Chaquetas\nPrice: 253.64\nStock: 54\nRating: 4.2<<<<>>>>>ProductID: P0122\nProductName: Outdoor Gear 122\nCategory: Chaquetas\nPrice: 281.45\nStock: 171\nRating: 3.2<<<<>>>>>ProductID: P0457\nProductName: Outdoor Gear 457\nCategory: Chaquetas\nPrice: 253.2\nStock: 189\nRating: 3.7<<<<>>>>>ProductID: P0476\nProductName: Outdoor Gear 476\nCategory: Pantalones\nPrice: 273.85\nStock: 94\nRating: 4.6"
}
[32;1m[1;3m[llm/s

'I don\'t know, I don\'t see any information about a "Cozy Comfort Pullover Set" in the provided context. The products listed are all outdoor gear and clothing items, but none of them match that name.'

In [26]:
langchain.debug = False

### LLM-assisted evaluation

In [32]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set\
        have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

In [33]:
predictions = qa.apply(examples)



> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


In [34]:
from langchain.evaluation.qa import QAEvalChain

In [35]:
eval_chain = QAEvalChain.from_llm(llm)

In [36]:
graded_outputs = eval_chain.evaluate(examples, predictions)
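Grader output is free text, and a local model may append an explanation after the verdict. A minimal sketch for reducing each graded output to a single label and computing accuracy follows. It assumes the verdict is the first word of the `results` field; `extract_grade` is a hypothetical helper rather than part of LangChain, and the sample data is a stand-in, not real model output.

```python
def extract_grade(graded_output, key="results"):
    """Return 'CORRECT', 'INCORRECT', or 'UNKNOWN' from a grader's free-text output."""
    text = graded_output.get(key, "").strip()
    # Assumption: the verdict, if present, is the first whitespace-delimited word.
    first_word = text.split()[0].upper().strip(".:,") if text else ""
    return first_word if first_word in {"CORRECT", "INCORRECT"} else "UNKNOWN"


# Stand-in data shaped like the evaluator's output (not real model results).
sample_graded = [
    {"results": "INCORRECT\n\nThe student's answer ..."},
    {"results": "CORRECT"},
]
grades = [extract_grade(g) for g in sample_graded]
accuracy = grades.count("CORRECT") / len(grades)
print(grades, accuracy)
```

Mapping anything unexpected to `UNKNOWN` keeps malformed grader responses visible instead of silently counting them as right or wrong.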

In [39]:
# Print each question with its reference answer, the chain's answer, and the grade.
for i, (prediction, graded) in enumerate(zip(predictions, graded_outputs)):
    print(f"Example {i}:")
    print("Question: " + prediction['query'])
    print("Real Answer: " + prediction['answer'])
    print("Predicted Answer: " + prediction['result'])
    print("Predicted Grade: " + graded['results'])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set        have side pockets?
Real Answer: Yes
Predicted Answer: I don't know, I don't see any information about a "Cozy Comfort Pullover Set" in the provided context. The products listed are all outdoor gear and clothing items, but none of them match that name.
Predicted Grade: INCORRECT

The student's answer is not INCORRECT because they answered correctly based on their lack of knowledge about the specific product. However, since we're grading solely on factual accuracy regarding the existence of side pockets in the Cozy Comfort Pullover Set, and the set does indeed have side pockets according to the provided context, the correct grade would be:

INCORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: I don't know what the "Ultra-Lofty 850 Stretch Down Hooded Jacket" is, or if it's even in your dataset. Can you please provide