# Context Quality Comparison 

**Overview**: This notebook compares the quality of the context retrieved with Cohere's and OpenAI's embedding models. For each question, the context is retrieved from the given corpus using cosine similarity.

We use the UpTrain standard eval Context Relevance to judge which model is better suited to context retrieval.


Embed v3 is Cohere's latest and most advanced embedding model. It offers state-of-the-art performance on the trusted MTEB and BEIR benchmarks. One of its key improvements is the ability to evaluate both how well a query matches a document's topic and the overall quality of the content. This lets it rank the highest-quality documents at the top, which is especially helpful when dealing with noisy datasets.

OpenAI's text-embedding-ada-002, on the other hand, outperforms OpenAI's earlier embedding models on text search, code search, and sentence similarity tasks, and achieves comparable performance on text classification.

For our evaluations, we use the Financial QA (FiQA) dataset, which has roughly 6,000 questions and 57,000 answers. Financial QA is hard because the vocabulary is context-specific. In this experiment, we randomly picked 100 questions and ran our evaluations on them.
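The retrieval code itself is not part of this notebook, so the following is only a minimal sketch of cosine-similarity retrieval under stated assumptions. The helper names `cosine_similarity` and `top_k_by_cosine` and the toy three-dimensional vectors are hypothetical; real embeddings would come from the Cohere or OpenAI embedding APIs and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, doc_vecs, k=1):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy "embeddings", made up purely for illustration.
doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query_vector = [2.0, 0.1, 0.0]
best = top_k_by_cosine(query_vector, doc_vectors, k=2)
print(best)
```

Because cosine similarity ignores vector length, the query's magnitude does not affect the ranking; only its direction does.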

In [1]:
import os

import polars as pl

url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/context_quality_analysis.jsonl"
dataset_path = os.path.join('./', "context_quality_analysis.jsonl")

# Download the dataset only if it is not already cached locally.
if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(dataset_path, "wb") as f:
        f.write(r.content)


In [2]:
dataset = pl.read_ndjson(dataset_path)

In [3]:
dataset

| question (str) | ground_truth (str) | context (str) | embedding_model (str) |
|---|---|---|---|
| Am I exposed t… | "Yes, you're s… | The value of a… | cohere |
| Am I exposed t… | "Yes, you're s… | Your definitio… | openai |
| What happen in… | "But what happ… | If you sold bo… | cohere |
| What happen in… | "But what happ… | 1) Yes, both o… | openai |
| How to use a c… | "You must buy … | "You must buy … | cohere |
| How to use a c… | "You must buy … | "You must buy … | openai |
| Where do I fin… | I agree that a… | "These warrant… | cohere |
| Where do I fin… | I agree that a… | "These warrant… | openai |
| How do we know… | For a company … | Generally the … | cohere |
| How do we know… | For a company … | Generally the … | openai |


In [5]:
len(dataset)

200

In [4]:
dataset = dataset.to_dicts()
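
The 200 rows correspond to the 100 sampled questions, each paired with one context per embedding model. A quick sanity check, sketched here on made-up rows, is to confirm that every model contributes the same number of rows. The helper `rows_per_model` and `sample_rows` are hypothetical; on the real data you would pass `dataset` instead.

```python
from collections import Counter


def rows_per_model(rows, column="embedding_model"):
    """Count how many rows each embedding model contributes."""
    return Counter(row[column] for row in rows)


# Made-up rows that mimic the shape of the list of dicts above.
sample_rows = [
    {"question": "q1", "context": "c1", "embedding_model": "cohere"},
    {"question": "q1", "context": "c2", "embedding_model": "openai"},
    {"question": "q2", "context": "c3", "embedding_model": "cohere"},
    {"question": "q2", "context": "c4", "embedding_model": "openai"},
]

counts = rows_per_model(sample_rows)
print(dict(counts))

# A balanced comparison needs the same number of rows for every model.
is_balanced = len(set(counts.values())) == 1
print("balanced:", is_balanced)
```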

## Running UpTrain Eval - Context Relevance

### OpenAI Embeddings

In [6]:
from uptrain import APIClient, Evals, ResponseMatching
UPTRAIN_API_KEY = "up-*********************" ## INSERT YOUR UPTRAIN KEY HERE

client = APIClient(uptrain_api_key=UPTRAIN_API_KEY)


In [7]:
dataset[1].keys()

dict_keys(['question', 'ground_truth', 'context', 'embedding_model'])

In [65]:
results = client.evaluate_experiments(
    project_name="context_analysis",
    data = dataset, 
    checks = [Evals.CONTEXT_RELEVANCE], 
    exp_columns = ['embedding_model']
)

2023-11-15 00:30:15.145 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 0 to <50 to the Uptrain server
2023-11-15 00:30:31.045 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 50 to <100 to the Uptrain server
2023-11-15 00:30:44.670 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 100 to <150 to the Uptrain server
2023-11-15 00:30:57.262 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 150 to <200 to the Uptrain server

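The keys used in the analysis that follows (for example `context_embedding_model_cohere`) suggest that the results hold one row per question, with the per-model columns suffixed by the experiment column and its value. The sketch below illustrates that reshaping on made-up rows. It is a hypothetical helper written for this explanation, not UpTrain's implementation, and the score values are invented.

```python
def pivot_by_experiment(rows, exp_column, shared_columns):
    """Merge long-format rows into one dict per shared key.

    Columns that are neither shared nor the experiment column are renamed
    to "<name>_<exp_column>_<value>".
    """
    merged = {}
    for row in rows:
        key = tuple(row[c] for c in shared_columns)
        out = merged.setdefault(key, {c: row[c] for c in shared_columns})
        suffix = f"{exp_column}_{row[exp_column]}"
        for name, value in row.items():
            if name not in shared_columns and name != exp_column:
                out[f"{name}_{suffix}"] = value
    return list(merged.values())


# Made-up long-format rows: one row per (question, model) pair.
long_rows = [
    {"question": "q1", "context": "ctx A", "score": 1.0, "embedding_model": "cohere"},
    {"question": "q1", "context": "ctx B", "score": 0.5, "embedding_model": "openai"},
]

wide_rows = pivot_by_experiment(long_rows, "embedding_model", ["question"])
print(sorted(wide_rows[0].keys()))
```

With this layout, the two models' contexts and scores for the same question sit side by side in a single dict, which is what makes the per-question comparison convenient.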

## Comparative Analysis

In [92]:
results[7]['question']

'How do I pay my estimated income tax?'

In [None]:
print('Question:', results[7]['question'])
print('Cohere context:', results[7]['context_embedding_model_cohere'])
print()
print('OpenAI context:', results[7]['context_embedding_model_openai'])

Question: How do I pay my estimated income tax?
Cohere context: "From the IRS page on Estimated Taxes (emphasis added): Taxes must be paid as you earn or receive income during the year, either through withholding or estimated tax payments. If the amount of income tax withheld from your salary or pension is not enough, or if you receive income such as interest, dividends, alimony, self-employment income, capital gains, prizes and awards, you may have to make estimated tax payments. If you are in business for yourself, you generally need to make estimated tax payments. Estimated tax is used to pay not only income tax, but other taxes such as self-employment tax and alternative minimum tax. I think that is crystal clear that you're paying income tax as well as self-employment tax. To expand a bit, you seem to be confusing self-employment tax and estimated tax, which are not only two different things, but two different kinds of things.  One is a tax, and the other is just a means of paying

In this example, the context retrieved with Cohere's Embed v3 model contains all the content required to answer the question. For instance, self-employment tax appears only in the context retrieved with Cohere's model; the context retrieved with OpenAI's model does not mention it.

The OpenAI context contains information relevant to the question, but the Cohere context provides a more complete basis for answering it.


In [96]:
print('Context Relevance Score using Embed v3 model:', results[7]['score_context_relevance_embedding_model_cohere'])
print('Context Relevance Score using OpenAI model:', results[7]['score_context_relevance_embedding_model_openai'])

Context Relevance Score using Embed v3 model: 1.0
Context Relevance Score using OpenAI model: 0.5


The UpTrain standard eval Context Relevance reaches the same conclusion: the context retrieved with the Embed v3 model scores 1.0, while the context retrieved with the OpenAI embedding model scores 0.5.


## Conclusion

In [97]:
# Build the DataFrame once and pull out each model's score column.
results_df = pl.DataFrame(results)
score_context_relevance_openai = results_df['score_context_relevance_embedding_model_openai'].to_list()
score_context_relevance_cohere = results_df['score_context_relevance_embedding_model_cohere'].to_list()

In [99]:
print('Average Context Relevancy Score using OpenAI model:', sum(score_context_relevance_openai)/len(score_context_relevance_openai))
print('Average Context Relevancy Score using Embed v3 model:', sum(score_context_relevance_cohere)/len(score_context_relevance_cohere))

Average Context Relevancy Score using OpenAI model: 0.49
Average Context Relevancy Score using Embed v3 model: 0.595


Empirically, the context retrieved with the Embed v3 model is of higher quality than the context retrieved with the OpenAI embedding model: on the 100 sampled questions, its average context relevance score is 0.595 versus 0.49.

The same comparison is shown in the graph below, which can be obtained from the UpTrain dashboard by providing your UpTrain API key.
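
An average can hide how the scores are distributed across questions, so a per-question tally is a useful complement. The sketch below counts how often each model scores higher. The helper `compare_scores` and the sample scores are hypothetical and made up for illustration; on the real data you would pass the two score lists computed above.

```python
def compare_scores(scores_a, scores_b):
    """Tally per-question wins, losses, and ties between two score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    a_better = sum(a > b for a, b in zip(scores_a, scores_b))
    b_better = sum(a < b for a, b in zip(scores_a, scores_b))
    ties = len(scores_a) - a_better - b_better
    return {"a_better": a_better, "b_better": b_better, "ties": ties}


# Made-up scores, NOT the notebook's actual results.
sample_cohere = [1.0, 0.5, 0.0, 1.0]
sample_openai = [0.5, 0.5, 0.5, 0.0]

summary = compare_scores(sample_cohere, sample_openai)
print(summary)
```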

<img src="https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/context_analysis.png" alt="Context analysis graph from the UpTrain dashboard" />
