# Evaluations

Run 100 examples at a time using the "Load Dataset and batch run some examples" section below.

Answers from earlier runs are already saved in the CSV, so nothing is lost between batches.

Then score every example that has an answer.
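Because earlier answers are already in the CSV, the next batch range can be derived from the file rather than edited by hand. The following is a minimal sketch, not part of the original notebook: `next_batch_bounds` is a hypothetical helper, and it assumes rows are answered in order from the top and that unanswered rows have an empty `pulse_answer`.

```python
import pandas as pd

BATCH_SIZE = 100  # matches the 100-example batches used in this notebook


def next_batch_bounds(df: pd.DataFrame,
                      answer_col: str = "pulse_answer",
                      batch_size: int = BATCH_SIZE) -> tuple[int, int]:
    """Return (start, end) row positions of the next unanswered batch.

    Assumption: rows are filled in order, so the first missing value in
    `answer_col` marks where the previous run stopped.
    """
    missing = df[answer_col].isna().to_numpy()
    if not missing.any():
        return len(df), len(df)  # every row already has an answer
    start = int(missing.argmax())  # position of the first unanswered row
    end = min(start + batch_size, len(df))
    return start, end
```

The returned pair could then be assigned to `START_I` and `END_I` before running the batch cell.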

## Load Dataset and batch run some examples

In [6]:
START_I, END_I = 400, 500  # positional row range of the batch to run

In [7]:
import pandas as pd
import aiohttp 

In [8]:
dataset = pd.read_csv('mle_screening_dataset.csv')

In [9]:
from tqdm.asyncio import tqdm_asyncio
import aiohttp
import asyncio

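queries_to_run = dataset.iloc[START_I:END_I]['question'].tolist()
# One request payload per question; `payload` avoids shadowing the `json` module name.
payloads = [{'user_id': '123e4567-e89b-12d3-a456-426614174000', 'user_name': 'Test User', 'query': query} for query in queries_to_run]

async with aiohttp.ClientSession() as session:
    # All requests in the batch are sent concurrently; gather keeps them in input order.
    tasks = [session.post('http://localhost:8000/pulse/answer', json=payload) for payload in payloads]
    responses = await tqdm_asyncio.gather(*tasks)
    # Read the bodies while the session is still open.
    results = [await response.json() for response in responses]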

100%|██████████| 100/100 [00:41<00:00,  2.40it/s]
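The cell above sends the whole batch at once. If the service needs a lower ceiling on in-flight requests, a semaphore is one common way to cap it. This is a sketch only: `gather_bounded` and its `limit` parameter are hypothetical names, not part of the notebook, and it takes zero-argument coroutine factories so that no request starts before a slot is free.

```python
import asyncio


async def gather_bounded(coro_factories, limit=10):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        # The coroutine is only created once a slot is available.
        async with semaphore:
            return await factory()

    # asyncio.gather preserves input order, so results line up with the inputs.
    return await asyncio.gather(*(run_one(f) for f in coro_factories))
```

In the notebook, each factory would wrap one POST plus the read of its JSON body, and the returned list would take the place of `results`.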


In [12]:
# Write each answer back to its row. Column position 2 is assumed to be `pulse_answer`.
for i, result in enumerate(results):
    dataset.iloc[START_I + i, 2] = result['answer']

In [13]:
dataset.to_csv('mle_screening_dataset.csv', index=False)

## Scoring

In [16]:
from evaluate import load

In [17]:
df = pd.read_csv('mle_screening_dataset.csv')
# Score only the rows that already have an answer.
df.dropna(subset=['pulse_answer'], inplace=True)

In [18]:
predictions = df['pulse_answer'].to_list()
references = df['answer'].to_list()


In [20]:
metrics = ['meteor', 'bleu', 'rouge']
scores = {}
for metric in metrics:
    metric_fn = load(metric)
    # Score each prediction/reference pair separately to get one result dict per example.
    metric_score = [metric_fn.compute(predictions=[p], references=[r]) for p, r in zip(predictions, references)]
    scores[metric] = metric_score



[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/ymubarak/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/ymubarak/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/ymubarak/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [None]:
scores_df = pd.concat([pd.DataFrame.from_records(scores['bleu']),
                       pd.DataFrame.from_records(scores['meteor']),
                       pd.DataFrame.from_records(scores['rouge'])], axis=1)


In [50]:
scores_i_care_about = ['rouge1', 'meteor', 'bleu', 'brevity_penalty', 'translation_length', 'reference_length']
# dropna left gaps in df's index; reset it so rows align with scores_df's 0..n-1 index.
final_df = pd.concat([df.reset_index(drop=True), scores_df[scores_i_care_about]], axis=1)

## Final Results

In [77]:
final_df[scores_i_care_about].describe()

| | rouge1 | meteor | bleu | brevity_penalty | translation_length | reference_length |
|---|---|---|---|---|---|---|
| count | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 |
| mean | 0.241258 | 0.20788 | 0.012774 | 0.72535 | 216.985 | 301.1325 |
| std | 0.076754 | 0.084174 | 0.023488 | 0.391603 | 66.113935 | 340.737869 |
| min | 0.028302 | 0.044125 | 0.0 | 6.8e-05 | 85.0 | 22.0 |
| 25% | 0.187951 | 0.140892 | 0.0 | 0.321474 | 163.75 | 77.0 |
| 50% | 0.234329 | 0.211554 | 0.0 | 1.0 | 213.0 | 141.5 |
| 75% | 0.295786 | 0.268289 | 0.021785 | 1.0 | 257.0 | 430.0 |
| max | 0.448276 | 0.429674 | 0.241653 | 1.0 | 415.0 | 2206.0 |
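The `brevity_penalty` column in the summary above follows from the two length columns: BLEU applies no penalty when the generated answer is longer than the reference, and an exponential penalty when it is shorter. A minimal sketch of the standard formula, with `brevity_penalty` as an illustrative helper rather than the library's own function:

```python
import math


def brevity_penalty(translation_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty for one prediction/reference pair."""
    if translation_length > reference_length:
        return 1.0  # answers longer than the reference are not penalized
    if translation_length == 0:
        return 0.0  # an empty answer gets the maximum penalty
    # exp(1 - r/c), where r is the reference length and c the answer length
    return math.exp(1 - reference_length / translation_length)
```

For instance, an answer of 158 tokens against a reference of 1369 tokens gives a penalty of about 0.00047, which drives BLEU to nearly zero regardless of how much of the answer overlaps with the reference.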


In [78]:
import textwrap

def print_example(i):
    print(i, final_df.iloc[i]['question'])
    print([f"{score} : {final_df.iloc[i][score]}" for score in scores_i_care_about])
    print("**Answer:**")
    print('\n'.join(textwrap.wrap(final_df.iloc[i]['answer'], width=100)))
    print("**Pulse Answer:**")
    print('\n'.join(textwrap.wrap(final_df.iloc[i]['pulse_answer'], width=150)))

    print('-'*100)


In [79]:
# 10 highest METEOR scores, best first
inds_sorted = final_df.sort_values(by='meteor', ascending=True).index
highest_inds = inds_sorted[-10:]
for i in highest_inds[::-1]:
    print_example(i)
    print("")


205 What are the symptoms of COPD ?
['rouge1 : 0.3913043478260869', 'meteor : 0.4296735298625657', 'bleu : 0.08255016567473047', 'brevity_penalty : 1.0', 'translation_length : 137', 'reference_length : 75']
**Answer:**
The most common symptoms of COPD are a cough that does not go away and coughing up a lot of sputum
(mucus). These symptoms may occur years before lung damage has reduced the flow of air in and out of
the lungs. Other symptoms of COPD include shortness of breath, especially with exercise; wheezing (a
whistling sound when you breathe); and tightness in the chest.
**Pulse Answer:**
The symptoms of Chronic Obstructive Pulmonary Disease (COPD) typically develop gradually and can include shortness of breath during physical
activities, a persistent cough that may produce mucus (often called a smoker's cough), wheezing (a whistling sound when breathing), chest tightness,
fatigue, and frequent respiratory infections such as the flu or pneumonia. As the disease progresses, some pe

In [80]:
# 10 lowest METEOR scores, worst first
lowest_inds = inds_sorted[:10]

for i in lowest_inds:
    print_example(i)
    print("")

61 What are the symptoms of Osteoarthritis ?
['rouge1 : 0.11178247734138973', 'meteor : 0.044124662108443305', 'bleu : 0.0', 'brevity_penalty : 0.0004691645669596856', 'translation_length : 158', 'reference_length : 1369']
**Answer:**
Pain and Stiffness in Joints Different types of arthritis have different symptoms. In general,
people with most forms of arthritis have pain and stiffness in their joints. Osteoarthritis usually
develops slowly and can occur in any joint, but often occurs in weight-bearing joints. Early in the
disease, joints may ache after physical work or exercise. Most often, osteoarthritis occurs in the
hands, hips, knees, neck, or low back. Common Signs Common signs of osteoarthritis include - joint
pain, swelling, and tenderness  - stiffness after getting out of bed  - a crunching feeling or sound
of bone rubbing on bone. joint pain, swelling, and tenderness stiffness after getting out of bed a
crunching feeling or sound of bone rubbing on bone. Not everyone with os

In [81]:
scores_df

| | bleu | precisions | brevity_penalty | length_ratio | translation_length | reference_length | meteor | rouge1 | rouge2 | rougeL | rougeLsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.046139 | [0.288, 0.08032128514056225, 0.024193548387096... | 1.000000 | 1.923077 | 250 | 130 | 0.317266 | 0.369048 | 0.119760 | 0.196429 | 0.202381 |
| 1 | 0.000000 | [0.0431266846361186, 0.016216216216216217, 0.0... | 1.000000 | 16.863636 | 371 | 22 | 0.139373 | 0.082353 | 0.035503 | 0.052941 | 0.076471 |
| 2 | 0.030208 | [0.21739130434782608, 0.05339805825242718, 0.0... | 1.000000 | 2.049505 | 207 | 101 | 0.254108 | 0.266667 | 0.089552 | 0.162963 | 0.162963 |
| 3 | 0.241653 | [0.46153846153846156, 0.27184466019417475, 0.1... | 1.000000 | 1.083333 | 104 | 96 | 0.419630 | 0.448276 | 0.325581 | 0.413793 | 0.413793 |
| 4 | 0.000000 | [0.20625, 0.012578616352201259, 0.0, 0.0] | 1.000000 | 1.684211 | 160 | 95 | 0.187008 | 0.200000 | 0.008772 | 0.086957 | 0.086957 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 395 | 0.090778 | [0.41245136186770426, 0.1171875, 0.05098039215... | 1.000000 | 1.066390 | 257 | 241 | 0.275829 | 0.401848 | 0.106729 | 0.180139 | 0.272517 |
| 396 | 0.044323 | [0.45108695652173914, 0.09836065573770492, 0.0... | 0.547025 | 0.623729 | 184 | 295 | 0.197991 | 0.285024 | 0.082524 | 0.169082 | 0.169082 |
| 397 | 0.000000 | [0.3382352941176471, 0.06403940886699508, 0.00... | 1.000000 | 1.214286 | 204 | 168 | 0.297680 | 0.356495 | 0.072948 | 0.181269 | 0.181269 |
| 398 | 0.000000 | [0.4090909090909091, 0.09714285714285714, 0.01... | 0.715176 | 0.748936 | 176 | 235 | 0.233911 | 0.322034 | 0.079545 | 0.169492 | 0.169492 |
