<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/benchmarks/claude_3_vs_gpt_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

# Claude 3 vs GPT-4
Claude 3 was recently launched by Anthropic as a competitor to OpenAI's GPT-4. In this notebook, we will compare the two models to see if you should make the switch from GPT-4 to Claude 3.

To do this comparison, we will use UpTrain's Response Matching operator. This operator takes two values, response and ground_truth, and returns a score between 0 and 1. The score is 1 if the response is very similar to the ground_truth and 0 if the response is completely different from the ground_truth.

We have curated a dataset of 25 question and context pairs. For each question, we will get responses from both GPT-4 and Claude 3. We will treat the response from GPT-4 as the ground_truth and compare the response from Claude 3 against it using the Response Matching operator.
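
To make the idea of a 0-to-1 similarity score concrete, here is a minimal sketch that scores a response against a ground truth using token-overlap F1. This is only an illustrative stand-in: it is not how UpTrain's Response Matching operator works (the notebook uses its LLM-based method), and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Toy similarity score in [0, 1] based on shared words.

    Illustrative only: this is NOT UpTrain's Response Matching logic.
    """
    resp_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not resp_tokens or not truth_tokens:
        return 0.0

    # Count the words the two texts share (respecting multiplicity).
    overlap = sum((Counter(resp_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(resp_tokens)   # share of the response found in the ground truth
    recall = overlap / len(truth_tokens)     # share of the ground truth covered by the response
    return 2 * precision * recall / (precision + recall)


print(token_f1("create a budget", "create a budget"))  # identical texts
print(token_f1("create a budget", "track your spending"))  # no shared words
```

A word-overlap score like this misses paraphrases, which is one reason an LLM-based comparison is used for the actual evaluation in this notebook.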

# Import the required libraries

In [1]:
from uptrain import Settings
from uptrain.operators import TextCompletion, JsonReader

import os
import polars as pl
import nest_asyncio
nest_asyncio.apply()



# Download the dataset

In [2]:
url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/uptrain_benchmark.jsonl"
dataset_path = os.path.join('./', "uptrain_benchmark.jsonl")

if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    r.raise_for_status()  # fail early instead of saving an error page as the dataset
    with open(dataset_path, "wb") as f:
        f.write(r.content)

dataset = pl.read_ndjson(dataset_path)
print(dataset)

shape: (25, 3)
┌───────────────────────────────────┬───────────────────────────────────┬─────┐
│ question                          ┆ context                           ┆ idx │
│ ---                               ┆ ---                               ┆ --- │
│ str                               ┆ str                               ┆ i64 │
╞═══════════════════════════════════╪═══════════════════════════════════╪═════╡
│ How to get a grip on finance?'    ┆ Try downloading a finance app li… ┆ 1   │
│ How do “held” amounts appear on … ┆ "The ""hold"" is just placeholde… ┆ 2   │
│ Does negative P/E ratio mean sto… ┆ P/E is the number of years it wo… ┆ 3   │
│ Should a retail trader choose a … ┆ "That\'s like a car dealer adver… ┆ 4   │
│ Possibility to buy index funds a… ┆ "As user quid states in his answ… ┆ 5   │
│ …                                 ┆ …                                 ┆ …   │
│ Discuss the role of inflation in… ┆ Inflation is a pervasive economi… ┆ 21  │
│ Explain the concept of 

# Get responses from Claude 3

In [5]:
dataset_path="./uptrain_benchmark.jsonl"
claude_settings = Settings(model="claude-3-opus-20240229", rpm_limit=4)
dataset = JsonReader(fpath=dataset_path).setup(settings=claude_settings).run()["output"]

dataset = dataset.with_columns([pl.lit("claude-3-opus-20240229").alias("model")])
dataset_with_claude_responses = TextCompletion(col_in_prompt="question", col_out_completion="claude_3_response").setup(settings=claude_settings).run(dataset)["output"]
dataset_with_claude_responses

100%|██████████| 25/25 [05:31<00:00, 13.25s/it]


question,context,idx,model,claude_3_response
str,str,i64,str,str
"""How to get a g…","""Try downloadin…",1,"""claude-3-opus-…","""To get a grip …"
"""How do “held” …","""""The """"hold"""" …",2,"""claude-3-opus-…","""When a credit …"
"""Does negative …","""P/E is the num…",3,"""claude-3-opus-…","""A negative P/E…"
"""Should a retai…","""""That\'s like …",4,"""claude-3-opus-…","""Dark pools are…"
"""Possibility to…","""""As user quid …",5,"""claude-3-opus-…","""Yes, it is pos…"
…,…,…,…,…
"""Discuss the ro…","""Inflation is a…",21,"""claude-3-opus-…","""Inflation is a…"
"""Explain the co…",""" The Earth's …",22,"""claude-3-opus-…","""Plate tectonic…"
"""How did the su…",""" The Surreal…",23,"""claude-3-opus-…","""The Surrealist…"
"""Discuss the im…",""" Globalizatio…",24,"""claude-3-opus-…","""Globalization …"



# Get Responses from GPT-4

In [7]:
gpt_settings = Settings(model="gpt-4", rpm_limit=100)
dataset = dataset_with_claude_responses.with_columns([pl.lit("gpt-4").alias("model")])
experiment_dataset = TextCompletion(col_in_prompt="question", col_out_completion="gpt_4_response").setup(settings=gpt_settings).run(dataset)["output"]
experiment_dataset

100%|██████████| 25/25 [00:35<00:00,  1.44s/it]


question,context,idx,model,claude_3_response,gpt_4_response
str,str,i64,str,str,str
"""How to get a g…","""Try downloadin…",1,"""gpt-4""","""To get a grip …","""1. Education: …"
"""How do “held” …","""""The """"hold"""" …",2,"""gpt-4""","""When a credit …","""When you use a…"
"""Does negative …","""P/E is the num…",3,"""gpt-4""","""A negative P/E…","""No, a negative…"
"""Should a retai…","""""That\'s like …",4,"""gpt-4""","""Dark pools are…","""Whether a reta…"
"""Possibility to…","""""As user quid …",5,"""gpt-4""","""Yes, it is pos…","""Yes, it is pos…"
…,…,…,…,…,…
"""Discuss the ro…","""Inflation is a…",21,"""gpt-4""","""Inflation is a…","""Inflation is a…"
"""Explain the co…",""" The Earth's …",22,"""gpt-4""","""Plate tectonic…","""Plate tectonic…"
"""How did the su…",""" The Surreal…",23,"""gpt-4""","""The Surrealist…","""The Surrealist…"
"""Discuss the im…",""" Globalizatio…",24,"""gpt-4""","""Globalization …","""Globalization …"


# Use the Response Matching operator to get the scores

In [9]:
from uptrain import EvalLLM, ResponseMatching

settings = Settings(evaluate_locally=False)

# Drop the "context" and "model" columns, which the Response Matching check does not use
experiment_dataset = experiment_dataset.drop(["context", "model"])

eval_llm = EvalLLM(settings=settings)
results = eval_llm.evaluate(
    data=experiment_dataset,
    checks=[
        ResponseMatching(
            method="llm",
        )
    ],
    schema={
        "question": "question",
        "response": "claude_3_response",
        "ground_truth": "gpt_4_response",
    }
)

2024-03-07 00:31:13.158 | INFO     | uptrain.framework.evalllm:evaluate_on_server:341 - Sending evaluation request for rows 0 to <50 to the Uptrain

2024-03-07 00:31:51.691 | INFO     | uptrain.framework.evalllm:evaluate:330 - Server is not running!


# Analysis

Let's take the first example and look at both responses along with the Response Matching score for Claude 3 against GPT-4.

In [14]:
row = results[0]
print("Question:", row["question"])

Question: How to get a grip on finance?'


In [15]:
print("GPT-4 Response:\n\n")
print(row["gpt_4_response"])

GPT-4 Response:


1. Education: Start by learning the basics of financial concepts. This includes savings, investment, loans, interest rates, taxes, etc. You can learn about finance through online courses, books, blogs, podcasts and webinars.

2. Budgeting: Develop a habit of budgeting your income and expenses. This will help you manage your money more effectively, avoid unnecessary expenses, and save more.

3. Investment: Learn about different types of investment options such as stocks, bonds, mutual funds, real estate, etc., and make informed decisions based on your financial goals and risk tolerance.

4. Financial Planning: Understand the importance of financial planning. This includes setting financial goals, planning for retirement, tax planning, estate planning, etc.

5. Credit Management: Understand how credit works, the importance of a good credit score, and how to manage credit effectively.

6. Consult Experts: Consider consulting a financial advisor or expert for professional

In [16]:
print("Claude-3 Response:\n\n")
print(row["claude_3_response"])

Claude-3 Response:


To get a grip on your finances, you can follow these steps:

1. Track your income and expenses: Start by keeping a record of all your income sources and expenses. This will help you understand your spending patterns and identify areas where you can cut back.

2. Create a budget: Based on your income and expenses, create a realistic budget that allocates your money towards essential expenses, savings, and discretionary spending.

3. Set financial goals: Establish short-term and long-term financial goals, such as paying off debt, saving for a down payment on a house, or building an emergency fund.

4. Pay off debt: If you have any outstanding debts, make a plan to pay them off as quickly as possible. Start with high-interest debts like credit card balances.

5. Save and invest: Allocate a portion of your income towards savings and investments. Consider setting up automatic transfers to your savings account or retirement fund.

6. Cut unnecessary expenses: Look for ar

In [17]:
print("Response Matching Score:", row["score_response_match"])

Response Matching Score: 0.7567567567567567
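
A single row says little about the overall comparison, so a natural next step is to summarize the score across all rows. The sketch below assumes `results` is a list of dictionaries with a `score_response_match` key, as the cells above suggest. The helper name `summarize_scores` and the sample values are hypothetical and are not results from this experiment.

```python
from statistics import mean


def summarize_scores(rows, key="score_response_match"):
    """Summarize one score column over a list of result dicts, skipping missing values."""
    scores = [row[key] for row in rows if row.get(key) is not None]
    if not scores:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }


# Made-up rows for illustration only; replace with `results` from the evaluation above.
sample_rows = [
    {"score_response_match": 0.5},
    {"score_response_match": 1.0},
    {"score_response_match": None},  # e.g. a row whose evaluation failed
]
summary = summarize_scores(sample_rows)
print(summary)
```

In the notebook, calling `summarize_scores(results)` would give the average agreement between Claude 3 and GPT-4 over the 25 questions, provided the assumption about the shape of `results` holds.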
