# Context Quality Comparison 

**Overview**: In this notebook, we compare the quality of the context retrieved using Cohere's and OpenAI's embedding models. Context is retrieved from the given corpus using cosine similarity.

We use the Uptrain Standard Eval - Context Relevance to judge which model is better suited for context retrieval.


Embed v3 is Cohere's latest and most advanced embeddings model. It offers state-of-the-art performance on the trusted MTEB and BEIR benchmarks. One of its key improvements is the ability to evaluate how well a query matches a document's topic and to assess the overall quality of the content. This means it can rank the highest-quality documents at the top, which is especially helpful when dealing with noisy datasets.

On the other hand, OpenAI's text-embedding-ada-002 outperforms OpenAI's earlier embedding models on text search, code search, and sentence similarity tasks, and achieves comparable performance on text classification.

For our evaluations, we use the Financial QA (FiQA) dataset, which has roughly 6,000 questions and 57,000 answers. Financial QA is hard because the vocabulary is context-specific. For this experiment, we randomly picked 150 questions and ran our evaluations on them.
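
The retrieval step described in the overview (ranking corpus passages by cosine similarity to the query embedding) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the helper names `cosine_similarity` and `retrieve_top_k` and the toy 3-dimensional vectors are assumptions made for the example, standing in for real Cohere or OpenAI embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, doc_vecs, k=1):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings have hundreds or thousands of dimensions.
toy_docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
toy_query = [0.5, 0.5, 0.0]
top = retrieve_top_k(toy_query, toy_docs, k=2)
print(top)
```

With both embedding models, the same ranking procedure is applied; only the vectors differ, so any difference in retrieved context comes from the embeddings themselves.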

In [44]:
import json
import os

import polars as pl

url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/context_quality_comparison.jsonl"
dataset_path = os.path.join("./", "context_quality_comparison.jsonl")

# Download the dataset only if it is not already present locally
if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    with open(dataset_path, "wb") as f:
        f.write(r.content)

dataset = pl.read_ndjson(dataset_path)

In [45]:
print(dataset)

shape: (150, 4)
┌────────────────────────┬────────────────────────┬────────────────────────┬───────────────────────┐
│ question               ┆ ground_truth           ┆ context_cohere         ┆ context_openai        │
│ ---                    ┆ ---                    ┆ ---                    ┆ ---                   │
│ str                    ┆ str                    ┆ str                    ┆ str                   │
╞════════════════════════╪════════════════════════╪════════════════════════╪═══════════════════════╡
│ Am I exposed to        ┆ "Yes, you're still     ┆ The value of a foreign ┆ Your definition of    │
│ currency risk wh…      ┆ exposed to cu…         ┆ stock is …             ┆ 'outside your…        │
│ What happen in this    ┆ "But what happen if    ┆ If you sold bought a   ┆ 1) Yes, both of your  │
│ selling call…          ┆ the stock pr…          ┆ call option…           ┆ scenarios w…          │
│ How to use a companion ┆ "You must buy both     ┆ "You must buy both     

## Running Uptrain Eval - Context Relevance

### OpenAI Embeddings

In [5]:
from uptrain import APIClient, Evals, ResponseMatching
UPTRAIN_API_KEY = "up-*********************" ## INSERT YOUR UPTRAIN KEY HERE

client = APIClient(uptrain_api_key=UPTRAIN_API_KEY)


In [21]:
results_openai = client.log_and_evaluate(
    project_name="benchmark",
    # Expose the OpenAI-retrieved context under the "context" column expected by the eval
    data=dataset.with_columns([pl.col("context_openai").alias("context")]).to_dicts(),
    checks=[Evals.CONTEXT_RELEVANCE]
)


2023-11-05 15:26:49.650 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 0 to <50 to the Uptrain server
2023-11-05 15:27:02.894 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 50 to <100 to the Uptrain server
2023-11-05 15:27:31.809 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 100 to <150 to the Uptrain server


### Cohere Embed v3 Embeddings

In [22]:
results_cohere = client.log_and_evaluate(
    project_name="benchmark",
    # Expose the Cohere-retrieved context under the "context" column expected by the eval
    data=dataset.with_columns([pl.col("context_cohere").alias("context")]).to_dicts(),
    checks=[Evals.CONTEXT_RELEVANCE]
)


2023-11-05 15:27:45.397 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 0 to <50 to the Uptrain server
2023-11-05 15:27:56.904 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 50 to <100 to the Uptrain server
2023-11-05 15:28:09.748 | INFO     | uptrain.framework.remote:log_and_evaluate:455 - Sending evaluation request for rows 100 to <150 to the Uptrain server


## Comparative Analysis

In [40]:
print('Question:', dataset[31]['question'][0])
print('Cohere context:', dataset[31]['context_cohere'][0])
print()
print('OpenAI context:', dataset[31]['context_openai'][0])

Question: Determining the minimum dividend that should be paid from my S corporation
Cohere context: @littleadv is right, this depends on your country. Furthermore, this is likely to depend on the type of business you own (in the US: LLC, S-corp, C-corp). In some countries you have to provide yourself a minimum wage if you are classified as a major shareholder and work for the company. When there is a minimum level of wage you have to pay yourself the tax rate on wages is typically higher than on dividends. The wage you then receive is taxed in line with normal wage taxation rules. Above the minimum wage you can pay yourself in dividends.
"If you have an S-Corp with several shareholders - you probably also have a tax adviser who suggested using S-Corp to begin with.  You're probably best off asking that adviser about this issue. If you decided to use S-Corp for multiple shareholders without a professional guiding you, you should probably start looking for such a professional, or you ma

In this example, the context retrieved with Cohere's Embed v3 model is clearly much more relevant to the given question.

In [43]:
print('Context Relevance Score using Embed v3 model:', results_cohere[31]['score_context_relevance'])
print('Context Relevance Score using OpenAI model:', results_openai[31]['score_context_relevance'])

Context Relevance Score using Embed v3 model: 1.0
Context Relevance Score using OpenAI model: 0.0


The Uptrain Standard Eval - Context Relevance leads to the same conclusion: the context retrieved with the Embed v3 model gets a score of 1.0, while the context retrieved with the OpenAI embedding model gets a score of 0.0.


## Conclusion

In [52]:
score_context_relevance_openai = list(pl.DataFrame(results_openai)['score_context_relevance'])
score_context_relevance_cohere = list(pl.DataFrame(results_cohere)['score_context_relevance'])

In [57]:
print('Average Context Relevancy Score using OpenAI model:', sum(score_context_relevance_openai)/len(score_context_relevance_openai))
print('Average Context Relevancy Score using Embed v3 model:', sum(score_context_relevance_cohere)/len(score_context_relevance_cohere))

Average Context Relevancy Score using OpenAI model: 0.51
Average Context Relevancy Score using Embed v3 model: 0.59


Empirically, on these 150 questions the context retrieved with the Embed v3 model scores higher on average (0.59 vs. 0.51), which indicates better context quality than the OpenAI embedding model.
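
An average alone does not show how the two models compare question by question. The following is a minimal sketch of a paired comparison, assuming both score lists are ordered by the same questions (as they are when both runs use the same dataset rows). The helper name `paired_summary` and the toy scores are hypothetical and are not taken from the experiment above.

```python
def paired_summary(scores_a, scores_b):
    """Compare two equal-length lists of per-question scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")
    # Per-question difference: positive means model A scored higher
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return {
        "a_wins": sum(d > 0 for d in diffs),
        "b_wins": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
        "mean_diff": sum(diffs) / len(diffs),
    }


# Hypothetical toy scores, used only to illustrate the calculation.
toy_cohere = [1.0, 0.5, 0.0, 1.0]
toy_openai = [0.0, 0.5, 0.5, 0.5]
summary = paired_summary(toy_cohere, toy_openai)
print(summary)
```

In the notebook, the same function could be called as `paired_summary(score_context_relevance_cohere, score_context_relevance_openai)` to see on how many of the 150 questions each model's context scored higher.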