### **Coming soon...**

# **Evaluating Natural Language Generation w/ RAGAS**

### Overview
In this notebook you will explore RAGAS (by ExplodingGradients), an open-source framework for evaluating natural language generation (NLG). RAGAS aims to create an open standard, giving developers the tools and techniques to apply continual learning to their RAG applications. With RAGAS you can evaluate each component of your RAG pipeline (retrieval and generation) in isolation. RAGAS primarily uses four core metrics:
1. Faithfulness: How factually consistent the generated answer is with the retrieved context
2. Answer Relevance: How relevant the answer is to the question
3. Context Precision: The signal-to-noise ratio of the retrieved context
4. Context Recall: Whether all the relevant information required to answer the question was retrieved (_requires ground truth_)
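To make the four definitions above concrete, here is a minimal sketch of how each score can be computed once the judge LLM's verdicts are available. It assumes the per-claim, per-chunk, and per-sentence verdicts have already been produced (in RAGAS that step is done by prompting the evaluation LLM), and every function name here is hypothetical, not part of the RAGAS API. The formulas follow the metric descriptions in the RAGAS documentation, but the library's internals (including version-specific refinements) may differ.

```python
import math
from typing import Sequence


def faithfulness_score(claims_supported: Sequence[bool]) -> float:
    """Share of the answer's claims that the judge found supported by the retrieved context."""
    if not claims_supported:
        return float("nan")
    return sum(claims_supported) / len(claims_supported)


def context_recall_score(sentences_attributed: Sequence[bool]) -> float:
    """Share of ground-truth sentences the judge could attribute to the retrieved context."""
    if not sentences_attributed:
        return float("nan")
    return sum(sentences_attributed) / len(sentences_attributed)


def context_precision_score(chunks_relevant: Sequence[bool]) -> float:
    """Average of precision@k taken at every rank k that holds a relevant chunk.

    Relevant chunks ranked near the top raise the score; noise ranked above them lowers it.
    """
    relevant_so_far = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(chunks_relevant, start=1):
        if is_relevant:
            relevant_so_far += 1
            precision_sum += relevant_so_far / k
    return precision_sum / relevant_so_far if relevant_so_far else 0.0


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_relevancy_score(
    question_embedding: Sequence[float],
    generated_question_embeddings: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity between the real question and questions the judge
    generated back from the answer (an off-topic answer yields dissimilar questions)."""
    sims = [_cosine(question_embedding, g) for g in generated_question_embeddings]
    return sum(sims) / len(sims) if sims else float("nan")


# Hand-written verdicts, for illustration only
print(faithfulness_score([True, True, True, False, False]))  # 0.6
print(context_precision_score([True, False, True]))  # (1/1 + 2/3) / 2, about 0.833
```

Context precision is rank-aware by design: the same set of relevant chunks scores higher when they appear before the irrelevant ones.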

_Notes_  
- This notebook uses 30 samples from the public [FIQA](https://sites.google.com/view/fiqa/) dataset, as published by ExplodingGradients on Hugging Face
  - _Schema_ = `question`, `ground_truths`, `answer`, `contexts`
- This notebook uses the previously established Azure OpenAI connection; a standard OpenAI connection can also be used
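To run the same evaluation on your own RAG outputs, each record needs the four fields listed in the schema above. The helper below is a hypothetical sketch (not part of RAGAS) that checks a record's shape before you hand it to `evaluate`; the expected types are inferred from the FIQA sample, where `ground_truths` and `contexts` are lists of strings.

```python
from typing import Any, Dict, List

# Expected field types, inferred from the FIQA "ragas_eval" sample used in this notebook
EXPECTED_FIELDS = {
    "question": str,
    "answer": str,
    "ground_truths": list,  # list[str]; newer RAGAS versions expect a single `ground_truth` string
    "contexts": list,  # list[str] of retrieved passages
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with one evaluation record (an empty list means OK)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            problems.append(
                f"'{field}' should be {expected_type.__name__}, got {type(value).__name__}"
            )
        elif expected_type is list and not all(isinstance(item, str) for item in value):
            problems.append(f"'{field}' should contain only strings")
    return problems


# Placeholder values; only the shapes matter here
record = {
    "question": "<user question>",
    "answer": "<generated answer>",
    "ground_truths": ["<reference answer>"],
    "contexts": "<retrieved passage>",  # wrong: a bare string instead of a list
}
print(schema_problems(record))  # ["'contexts' should be list, got str"]
```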

 **_Go Deeper_**  
[RAGAS Documentation](https://docs.ragas.io/en/stable/index.html)  
[RAGAS Project GitHub](https://github.com/explodinggradients/ragas)
  
**_Prerequisites_**  
  
Ensure that your environment is set up by completing the steps outlined in [0_setup.ipynb](./0_setup.ipynb).
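The cells below read their configuration from a `.env` file via `python-dotenv`. A quick pre-flight check, sketched here with only the standard library, can catch a missing variable before `MLClient` or the Azure OpenAI clients fail with a less obvious error. The variable names are the ones this notebook reads; the helper itself is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Environment variables read later in this notebook
REQUIRED_VARS = (
    "SUBSCRIPTION_ID",
    "RESOURCE_GROUP_NAME",
    "WORKSPACE_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
)


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Call this after load_dotenv(); an empty list means every variable is present.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```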

In [1]:
# Import Libraries
import os
import pandas as pd
from datasets import load_dataset
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from dotenv import load_dotenv, find_dotenv
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)



In [3]:
# Setup environment
load_dotenv(find_dotenv(), override=True)
print(os.getenv("WORKSPACE_NAME"))

# Get a handle to the workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name=os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name=os.environ.get("WORKSPACE_NAME"),
)

nlg-eval-aml


In [4]:
# Set config variables
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
]

azure_configs = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "aoai_key": os.environ.get("AZURE_OPENAI_KEY"),
    "model_deployment": "aoai-gpt4",
    "model_name": "gpt-4",
    "embedding_deployment": "aoai-ada",
    "embedding_name": "text-embedding-ada-002"
}

# Print the config with the API key masked so the secret is not saved in the notebook output
print({k: ("***" if k == "aoai_key" else v) for k, v in azure_configs.items()})

{'azure_endpoint': 'https://nlg-eval-aoai.openai.azure.com/', 'aoai_key': '***', 'model_deployment': 'aoai-gpt4', 'model_name': 'gpt-4', 'embedding_deployment': 'aoai-ada', 'embedding_name': 'text-embedding-ada-002'}


In [26]:
# Load dataset
fiqa = load_dataset("explodinggradients/fiqa", "ragas_eval")
display(fiqa)

DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

In [20]:
# Create model instances to be used for evaluation

azure_model = AzureChatOpenAI(
    openai_api_version="2023-07-01-preview",
    azure_endpoint=azure_configs["azure_endpoint"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    openai_api_type="azure",
    openai_api_key=azure_configs["aoai_key"],
    validate_base_url=False,
)

# Initialize the embeddings model (used by answer_relevancy; answer_correctness and answer_similarity also require it)
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-07-01-preview",
    azure_endpoint=azure_configs["azure_endpoint"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
    openai_api_type="azure",
    openai_api_key=azure_configs["aoai_key"],
)

In [23]:
# Evaluate - this may take several minutes
result = evaluate(
    fiqa["baseline"],
    metrics=metrics,
    llm=azure_model,
    embeddings=azure_embeddings,
    raise_exceptions=False
)

passing column names as 'ground_truths' is deprecated and will be removed in the next version, please use 'ground_truth' instead. Note that `ground_truth` should be of type string and not Sequence[string] like `ground_truths`
Evaluating:  20%|██        | 24/120 [02:34<05:14,  3.28s/it]  Invalid response format. Expected a list of dictionaries with keys 'verdict'
Evaluating:  21%|██        | 25/120 [03:25<10:47,  6.81s/it]Invalid response format. Expected a list of dictionaries with keys 'verdict'
Evaluating:  22%|██▎       | 27/120 [03:38<10:42,  6.91s/it]Invalid JSON response. Expected dictionary with key 'Attributed'
Invalid JSON response. Expected dictionary with key 'Attributed'
Invalid JSON response. Expected dictionary with key 'Attributed'
Invalid JSON response. Expected dictionary with key 'Attributed'
Invalid JSON response. Expected dictionary with key 'Attributed'
Evaluating:  29%|██▉       | 35/120 [03:45<05:14,  3.70s/it]Invalid response format. Expected a list of dictionar
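The first warning above means this dataset still uses the deprecated `ground_truths` column (a list of strings), while RAGAS now expects a single `ground_truth` string. A minimal sketch of migrating one record follows; keeping the first reference answer by default is an assumption (joining all references is an equally valid choice), and the function name is hypothetical.

```python
from typing import Any, Dict


def with_ground_truth(record: Dict[str, Any], join_all: bool = False) -> Dict[str, Any]:
    """Return a copy of `record` with a string `ground_truth` derived from `ground_truths`.

    By default only the first reference answer is kept; set join_all=True to
    concatenate every reference answer, separated by spaces, instead.
    """
    references = record.get("ground_truths") or []
    if join_all:
        ground_truth = " ".join(references)
    else:
        ground_truth = references[0] if references else ""
    # Build a new dict so the caller's record is left unchanged
    return {**record, "ground_truth": ground_truth}
```

Because the function takes and returns a plain dict, it can be applied row by row with the Hugging Face `datasets` `map` method, for example `fiqa["baseline"].map(with_ground_truth)`, after which the old column can be dropped with `.remove_columns("ground_truths")`.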

In [27]:
# View Results
display(result.to_pandas())

| | question | ground_truths | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision |
|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... | Have the check reissued to the proper payee.Ju... | 0.6 | 0.980001 | NaN | 1.0 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | Sure you can. You can fill in whatever you wa... | NaN | 0.935905 | NaN | 1.0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | You're confusing a lot of things here. Company... | 0.833333 | 0.923934 | NaN | 1.0 |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... | "I'm afraid the great myth of limited liabilit... | 1.0 | 0.883902 | NaN | NaN |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | \nIf your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... | You should probably consult an attorney. Howev... | 0.0 | 0.898197 | NaN | 0.0 |
| 5 | What are the ins/outs of writing equipment pur... | [Most items used in business have to be deprec... | \nWriting equipment purchases off as business ... | [You would report it as business income on Sch... | Most items used in business have to be depreci... | 1.0 | 0.948321 | NaN | 1.0 |
| 6 | Can a entrepreneur hire a self-employed busine... | [Yes. I can by all means start my own company ... | \nYes, an entrepreneur can hire a self-employe... | [Yes. I can by all means start my own company ... | Yes. I can by all means start my own company a... | 1.0 | 0.964149 | 0.75 | 1.0 |
| 7 | Intentions of Deductible Amount for Small Busi... | ["If your sole proprietorship losses exceed al... | \nThe intention of deductible amounts for smal... | ["Short answer, yes. But this is not done thro... | "If your sole proprietorship losses exceed all... | 1.0 | 0.858976 | NaN | 0.0 |
| 8 | How can I deposit a check made out to my busin... | [You should have a separate business account. ... | \nYou can deposit a check made out to your bus... | ["I have checked with Bank of America, and the... | You should have a separate business account. M... | 1.0 | 0.990136 | NaN | 1.0 |
| 9 | Filing personal with 1099s versus business s-c... | [Depends whom the 1099 was issued to. If it wa... | \nFiling personal taxes with 1099s versus fili... | [Depends whom the 1099 was issued to. If it wa... | Depends whom the 1099 was issued to. If it was... | 0.833333 | 0.952563 | 0.857143 | 1.0 |
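Several scores in the results are missing. These most likely correspond to rows where the judge LLM's reply could not be parsed (the `Invalid response format` / `Invalid JSON response` warnings during evaluation); because `raise_exceptions=False`, RAGAS left the score empty instead of stopping the run. When you report an aggregate, it is worth stating how many rows each metric actually covers. A standard-library sketch with a hypothetical helper name is below; it works on the list of dicts returned by `result.to_pandas().to_dict("records")`.

```python
import math
from typing import Dict, Iterable, List, Mapping


def summarize_scores(
    rows: Iterable[Mapping[str, float]], metrics: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per-metric mean over the rows that have a score, plus scored/missing row counts."""
    rows = list(rows)
    summary = {}
    for metric in metrics:
        values = [row.get(metric) for row in rows]
        # Treat both absent keys (None) and NaN as "not scored"
        scored = [v for v in values if v is not None and not math.isnan(v)]
        summary[metric] = {
            "mean": sum(scored) / len(scored) if scored else float("nan"),
            "scored": len(scored),
            "missing": len(values) - len(scored),
        }
    return summary


# Small hand-made example; with real results, pass
# result.to_pandas().to_dict("records") and the four metric column names instead.
example = [
    {"faithfulness": 0.6, "context_recall": float("nan")},
    {"faithfulness": 1.0, "context_recall": 0.75},
]
print(summarize_scores(example, ["faithfulness", "context_recall"]))
```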


Note the following known issues with the RAGAS framework, which may surface while running this notebook:
- [#555 Index Errors](https://github.com/explodinggradients/ragas/issues/555)
- [#395 Dictionary Format Outputs](https://github.com/explodinggradients/ragas/issues/395)
- [#536 OpenAI Integration Broken](https://github.com/explodinggradients/ragas/issues/536)
- [#449 Azure Content Filter Triggered](https://github.com/explodinggradients/ragas/issues/449)
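Because failures such as the parsing errors seen during evaluation are intermittent, one pragmatic workaround (not an official RAGAS feature) is to re-evaluate only the rows that came back incomplete. The sketch below uses a hypothetical helper to collect their positions; with Hugging Face `datasets` you could then pass them to `fiqa["baseline"].select(indices)` and call `evaluate` again on that subset.

```python
import math
from typing import Iterable, List, Mapping, Sequence


def incomplete_row_indices(
    rows: Iterable[Mapping[str, float]], metric_names: Sequence[str]
) -> List[int]:
    """Positions of rows where at least one metric is absent or NaN."""

    def is_missing(value) -> bool:
        # numpy.float64 subclasses float, so pandas-derived values are handled too
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        i for i, row in enumerate(rows) if any(is_missing(row.get(m)) for m in metric_names)
    ]


# Possible use in this notebook (not executed here):
# names = ["faithfulness", "answer_relevancy", "context_recall", "context_precision"]
# indices = incomplete_row_indices(result.to_pandas().to_dict("records"), names)
# retry = evaluate(fiqa["baseline"].select(indices), metrics=metrics,
#                  llm=azure_model, embeddings=azure_embeddings, raise_exceptions=False)
```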