In [1]:
text1 = """
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. It's often used in machine learning and data science to find the parameters that minimize a particular cost function. 
 
Here's a more detailed explanation: 
 
To start with, you have a cost or loss function, which measures how well your algorithm performs. This function depends on your algorithm's parameters. You want to adjust the parameters to minimize the cost function. But the function is complex and it's not clear how to adjust the parameters to make the cost lower. 
 
The idea behind gradient descent is to look at the slope of the cost function at your current point. The slope, or gradient, points in the direction of the steepest increase in the function. So if you adjust your parameters in the opposite direction to the gradient, you'll move towards a minimum of the function. 
 
You iteratively adjust your parameters in the direction of steepest descent, until you hopefully reach the global minimum of the function. There's a parameter called the learning rate which controls how big a step you take at each iteration. 
 
One challenge with gradient descent is that it can get stuck in local minima - points where the function is lower than all nearby points, but not lower than some other point further away. There are various versions of gradient descent (like stochastic gradient descent or mini-batch gradient descent) that try to overcome this issue. 
"""

text2 = """
Gradient descent is an optimization algorithm used in machine learning and deep learning to minimize a loss function by iteratively moving towards the minimum value. It is a popular method for training artificial neural networks, and is particularly useful for solving problems where the objective function has multiple local minima. 
 
The main idea behind gradient descent is to determine the direction in which the function decreases the steepest at each point in the parameter space, and then adjust the parameters (such as weights and biases in a neural network) accordingly. This is done by calculating the gradient of the loss function with respect to each parameter. The gradient is a vector of partial derivatives that points in the direction of the greatest increase of the function. By taking steps in the opposite direction of the gradient, we aim to move towards the minimum of the function. 
 
There are three main types of gradient descent: 
 
1. Batch gradient descent: This method computes the gradient using the entire dataset, and updates the parameters accordingly. This can be computationally expensive, especially for large datasets. 
 
2. Stochastic gradient descent (SGD): This method computes the gradient and updates the parameters using a single training example at a time. This can be faster than batch gradient descent, but may result in a more erratic path towards the minimum due to the noisy nature of the individual training examples. 
 
3. Mini-batch gradient descent: This method is a compromise between batch gradient descent and SGD. It computes the gradient and updates the parameters using a subset (mini-batch) of the training examples. This can provide a balance between computational efficiency and convergence stability. 
 
Gradient descent has a few hyperparameters, such as the learning rate, which determines the step size in the parameter updates. Choosing the right learning rate is crucial for convergence to the global minimum. If the learning rate is too small, the algorithm will take too long to converge; if it's too large, the algorithm may overshoot the minimum and fail to converge. 
 
In summary, gradient descent is an optimization algorithm used to minimize a loss function by iteratively adjusting the parameters in the direction of the steepest decrease, allowing for the training of machine learning models like artificial neural networks. 
"""

In [2]:
text1 = """
### Introduction\n\nIn the realm of economics, understanding individuals' attitudes towards risk is crucial for explaining various economic behaviors, especially in the context of investment, consumption, and insurance decisions. The concept of risk aversion and its relationship with the utility of income plays a pivotal role in this understanding. I will discuss whether an individual being risk-averse is synonymous with having diminishing marginal utility of income.\n\n### Key Concepts\n\n- **Risk Aversion**: A risk-averse individual prefers to avoid uncertainty, choosing a certain outcome over a gamble with a potentially higher expected value. This trait is quantitatively measured by the curvature of the utility function: a concave utility function indicates risk aversion.\n- **Utility**: Utility represents satisfaction or happiness that a consumer obtains from consumption of goods and services. The utility function maps levels of wealth or income to a real number indicating levels of utility.\n- **Marginal Utility of Income**: This is the additional satisfaction or utility a consumer gains from receiving an additional unit of income. It is represented by the first derivative of the utility function with respect to income.\n\n### Proofs\n\n#### Proposition: An individual is risk averse if and only if he/she has diminishing marginal utility of income.\n\n- **Forward Direction (If an individual is risk-averse, then he/she has diminishing marginal utility of income)**: To demonstrate this, consider a utility function \\(U(I)\\), where \\(I\\) represents income. Risk aversion implies that \\(U(I)\\) is concave. Mathematically, a function is concave if its second derivative is negative, i.e., \\(U''(I) < 0\\). 
The negative second derivative indicates that the slope of the utility function (the marginal utility of income) is decreasing in income, which by definition means diminishing marginal utility of income.\n\n- **Reverse Direction (If an individual has diminishing marginal utility of income, then he/she is risk-averse)**: Assuming that the marginal utility of income diminishes as income increases (\\(U''(I) < 0\\)), it follows that the utility function is concave. A concave utility function means that the individual prefers a certain outcome over a gamble with the same expected income, which is the definition of risk aversion.\n\n### Examples\n\nConsider an individual with a utility function \\(U(I) = \\sqrt{I}\\), where \\(I\\) is income.\n\n- To check for concavity, we derive the first and second derivatives of \\(U(I)\\):\n  - The first derivative \\(U'(I) = \\frac{1}{2\\sqrt{I}}\\), which is positive and decreases as \\(I\\) increases, indicating that marginal utility diminishes with income.\n  - The second derivative \\(U''(I) = -\\frac{1}{4I\\sqrt{I}}\\), which is negative, indicating that \\(U(I)\\) is concave, and thus the individual is risk-averse.\n  \nThis utility function exhibits both diminishing marginal utility of income and risk aversion, consistent with our proposition.\n\n### Conclusion\n\nThe proposition that an individual is risk averse if and only if he/she has diminishing marginal utility of income holds true under the standard assumptions of utility theory. The diminishing marginal utility of income, represented by a concave utility function, inherently captures the essence of risk aversion. This relationship is foundational in economics, helping to explain why individuals make certain financial decisions, including savings, insurance, and investment choices, based on their levels of income and attitudes towards risk.
"""

text2 = """
### Introduction\n\nThe relationship between risk aversion and diminishing marginal utility of income is a fundamental concept in both economics and decision theory. To explore whether an individual is risk averse if and only if they have diminishing marginal utility of income, we need to delve into the definitions and theories underpinning these concepts.\n\n### Key Concepts\n\n**Risk Aversion**: A risk-averse individual prefers a certain outcome over a gamble with an equivalent expected value. This preference indicates a concave utility function, where the utility gained from an incremental increase in wealth decreases as wealth increases.\n\n**Marginal Utility of Income**: This refers to the additional satisfaction (utility) a person gains from receiving an additional unit of income. The concept of diminishing marginal utility suggests that as income increases, the additional satisfaction gained from an extra unit of income decreases.\n\n**Utility Function**: A mathematical representation of an individual's preference. 
For risk-averse individuals, the utility function is concave due to diminishing marginal utility.\n\n### Proofs\n\n**Forward Direction (If an individual is risk averse, then they have diminishing marginal utility of income):**\n\n- Given a risk-averse individual, their utility function \\(U(W)\\) is concave, where \\(W\\) is wealth.\n- For a utility function to be concave, its second derivative must be negative \\(U''(W) < 0\\).\n- A negative second derivative implies that the slope (or the marginal utility) of the utility function is decreasing as wealth increases.\n- Hence, if an individual is risk averse (preferring certainty over equivalent risk), this implies that the individual has diminishing marginal utility of income.\n\n**Reverse Direction (If an individual has diminishing marginal utility of income, then they are risk averse):**\n\n- Assume an individual has a utility function \\(U(W)\\) with diminishing marginal utility, implying \\(U''(W) < 0\\).\n- Consider a choice between a certain income \\(W_c\\) and a gamble with expected wealth \\(E[W_g] = W_c\\).\n- Due to diminishing marginal utility, the utility of the expected wealth of the gamble \\(U(E[W_g])\\) is greater than the expected utility of the gamble \\(E[U(W_g)]\\), because the concavity of \\(U(W)\\) means that the utility of average wealth exceeds the average utility across different states of wealth (Jensen's inequality).\n- Therefore, since \\(U(W_c) = U(E[W_g]) > E[U(W_g)]\\), the individual prefers the certain outcome \\(W_c\\) over the gamble, indicating risk aversion.\n\n### Examples\n\n**Example 1: Insurance Purchase**\n\n- Assume an individual can either face a loss with some probability or pay an insurance premium to avoid the loss.\n- A risk-averse individual, due to diminishing marginal utility of income, is willing to pay a premium that can exceed the expected loss for insurance to avoid the risk, showing a preference for certainty.\n\n**Example 2: Investment Choices**\n\n- Consider two investment options: a risk-free government bond with a certain return and a risky stock with higher potential returns but also potential losses.\n- An individual with diminishing marginal utility of income would prefer the bond over the stock, even if the expected returns are the same, due to the preference for certainty over risk.\n\n### Conclusion\n\nThe proposition that an individual is risk averse if and only if they have diminishing marginal utility of income is supported both theoretically and through practical examples. The concavity of the utility function, indicating both risk aversion and diminishing marginal utility, serves as the crucial link between these concepts. Thus, these characteristics are inherently connected, providing a foundational principle in understanding economic behavior towards risk and income.
"""

In [3]:
text1 = "The bottle is empty."
text2 = "There is nothing in the bottle."

In [4]:
import numpy as np


def stylistic_features(text):
    """
    Extracts stylistic features from a given text, including text length.

    Parameters:
    - text (str): The input text from which stylistic features are extracted.

    Returns:
    - dict: A dictionary containing the sentence count, average word length, punctuation count, and text length.
    """
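    text_length = len(text)
    # Approximate sentences by counting periods ("!" and "?" are not counted);
    # equivalent to len(text.split(".")) - 1
    sentence_count = text.count(".")
    # Words are whitespace-separated tokens, so attached punctuation counts toward word length
    word_lengths = [len(word) for word in text.split()]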
    average_word_length = sum(word_lengths) / len(word_lengths) if word_lengths else 0
    punctuation_count = sum(1 for char in text if char in ".,;:!?")

    return {
        "text_length": text_length,
        "sentence_count": sentence_count,
        "average_word_length": average_word_length,
        "punctuation_count": punctuation_count,
    }


def stylistic_similarity(
    text1,
    text2,
    features_to_compare=("text_length", "sentence_count", "average_word_length", "punctuation_count"),
):
    """
    Calculates the stylistic similarity between two texts based on customizable features, including consideration of text length.

    Parameters:
    - text1 (str): The first text.
    - text2 (str): The second text.
    - features_to_compare (list or tuple): The features to compare for similarity. Options include 'sentence_count', 'average_word_length', 'punctuation_count', and 'text_length'; unknown names are ignored.

    Returns:
    - float: The overall stylistic similarity between the two texts based on the selected features.
    """
    features1 = stylistic_features(text1)
    features2 = stylistic_features(text2)
    similarities = []

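    for feature in features_to_compare:
        if feature in features1 and feature in features2:
            # 1 minus the relative difference, which lies in [0, 1] for non-negative features;
            # max(..., 1) avoids division by zero when both values are 0
            similarity = 1 - abs(features1[feature] - features2[feature]) / max(features1[feature], features2[feature], 1)
            similarities.append(similarity)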

    if similarities:
        average_similarity = np.mean(similarities)
    else:
        average_similarity = 0

    return average_similarity


def extract_structural_features(text):
    """
    Extracts structural features from a given text, including headers, bullet points, and numbered lists.

    Parameters:
    - text (str): The input text from which structural features are extracted.

    Returns:
    - dict: A dictionary containing the header count, bullet points count, and numbered list count.
    """
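    features = {}
    lines = text.split("\n")
    features["header_count"] = sum(1 for line in lines if line.startswith("#"))
    # Only unindented bullets directly after a newline are counted, so a bullet on the first line
    # or an indented sub-bullet is not counted
    features["bullet_point_count"] = text.count("\n- ") + text.count("\n* ")
    # A numbered-list line is either all digits or starts with one digit followed by "." (e.g. "1.");
    # two-digit items such as "10." are not matched
    features["numbered_list_count"] = sum(
        1
        for stripped in (line.strip() for line in lines)
        if stripped.isdigit() or (len(stripped) > 1 and stripped[0].isdigit() and stripped[1] == ".")
    )
    return features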


def structural_similarity(text1, text2):
    """
    Calculates the structural similarity between two texts, considering headers, bullet points, and numbered lists.

    Parameters:
    - text1 (str): The first text.
    - text2 (str): The second text.

    Returns:
    - float: The overall structural similarity between the two texts.
    """
    features1 = extract_structural_features(text1)
    features2 = extract_structural_features(text2)

    # Per-feature similarity: 1 minus the relative difference; max(..., 1) avoids division by zero
    keys = ("header_count", "bullet_point_count", "numbered_list_count")
    similarities = [1 - abs(features1[k] - features2[k]) / max(features1[k], features2[k], 1) for k in keys]

    average_similarity = sum(similarities) / len(similarities)

    return average_similarity


def format_similarity(text1, text2):
    """
    Calculates a comprehensive format similarity between two texts, combining stylistic and structural scores, including the consideration of text length.

    Parameters:
    - text1 (str): The first text.
    - text2 (str): The second text.

    Returns:
    - float: The overall format similarity between the two texts.
    """
    features_to_compare = ["text_length", "sentence_count", "average_word_length", "punctuation_count"]
    stylistic_score = stylistic_similarity(text1, text2, features_to_compare[:2])
    structural_score = structural_similarity(text1, text2)

    overall_score = stylistic_score * 0.5 + structural_score * 0.5
    return overall_score


print(f"format_similarity: {format_similarity(text1, text2)}")

format_similarity: 0.9112903225806451
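`format_similarity` passes only `text_length` and `sentence_count` to `stylistic_similarity`, so the printed score for the two bottle sentences can be reproduced by hand. The standalone sketch below re-derives the feature values instead of calling the functions above. `rel_sim` is a hypothetical helper that restates the notebook's per-feature formula.

```python
t1, t2 = "The bottle is empty.", "There is nothing in the bottle."


def rel_sim(a, b):
    # Same per-feature formula as the notebook: 1 - |a - b| / max(a, b, 1)
    return 1 - abs(a - b) / max(a, b, 1)


# text_length: 20 vs 31 characters; sentence_count: one "." in each text
stylistic = (rel_sim(len(t1), len(t2)) + rel_sim(t1.count("."), t2.count("."))) / 2  # (20/31 + 1) / 2
structural = 1.0  # no headers, bullets, or numbered lists: each of the three terms is 1 - 0/1
overall = 0.5 * stylistic + 0.5 * structural
print(overall)  # about 0.9113
```

The text-length term alone (20/31, about 0.645) accounts for the whole gap from 1.0. Sentences of similar meaning but different length are therefore penalized by this metric.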


In [5]:
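from functools import lru_cache

from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer


@lru_cache(maxsize=None)
def load_embedding_model(name="all-mpnet-base-v2"):
    """Loads a sentence-transformer model once and caches it, so repeated similarity calls do not reload it."""
    return SentenceTransformer(name)


def BERTsimilarity(text1, text2):
    """
    Calculates the cosine similarity between two texts using transformer sentence embeddings
    from the MPNet-based `all-mpnet-base-v2` model.

    Parameters:
    - text1 (str): The first text.
    - text2 (str): The second text.

    Returns:
    - float: The cosine similarity between the two embeddings (1.0 means identical direction).
    """
    model = load_embedding_model()
    embedding1 = model.encode(text1)
    embedding2 = model.encode(text2)
    # scipy's `cosine` returns the cosine distance, so 1 - distance is the cosine similarity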
    similarity = 1 - cosine(embedding1, embedding2)
    return similarity
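`scipy.spatial.distance.cosine` returns the cosine *distance*, 1 - (u·v)/(‖u‖‖v‖). The expression `1 - cosine(...)` above is therefore the cosine similarity of the two embeddings. Below is a dependency-free sketch of that quantity on made-up 3-dimensional vectors; real `all-mpnet-base-v2` embeddings are 768-dimensional.

```python
import math


def cosine_similarity(u, v):
    """(u . v) / (||u|| * ||v||); equals 1 - scipy.spatial.distance.cosine(u, v) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, for illustration only
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))  # 1.0 up to rounding: only direction matters, not magnitude
print(cosine_similarity(a, c))  # 0.0 for orthogonal vectors
```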

In [6]:
import json

import pandas as pd

# Assuming the JSON structure is a list of dictionaries with keys: question, model, provider, response
# Load the JSON data
with open("responses.json", "r", encoding="utf-8") as file:
    data = json.load(file)

# Convert to DataFrame
df = pd.DataFrame(data)


# Calculate similarities
# Initialize the similarity columns with NaN so that a skipped pair is not mistaken for a score of 0
df["BERT_similarity"] = float("nan")
df["format_similarity"] = float("nan")

# Iterate over each question and model to calculate similarities between OpenAI and Azure responses
for question in df["question"].unique():
    for model in df["model"].unique():
        pair_mask = (df["question"] == question) & (df["model"] == model)
        openai_rows = df.loc[pair_mask & (df["provider"] == "openai"), "response"]
        azure_rows = df.loc[pair_mask & (df["provider"] == "azure"), "response"]
        if openai_rows.empty or azure_rows.empty:
            continue  # skip pairs that lack a response from either provider
        openai_response = openai_rows.iloc[0]
        azure_response = azure_rows.iloc[0]

        # Update the DataFrame with the calculated similarities (both provider rows get the same scores)
        df.loc[(df["question"] == question) & (df["model"] == model), "BERT_similarity"] = BERTsimilarity(openai_response, azure_response)
        df.loc[(df["question"] == question) & (df["model"] == model), "format_similarity"] = format_similarity(openai_response, azure_response)

In [7]:
def clean_data(df):
    """
    Cleans the DataFrame by removing specific columns and duplicates, and resetting the index.

    Parameters:
    - df (pd.DataFrame): The DataFrame to be cleaned.

    Returns:
    - pd.DataFrame: The cleaned DataFrame.
    """
    df = df.drop(columns=["provider", "response"])
    df = df.drop_duplicates()
    # Use factorize to encode the unique questions, starting with 1
    # df["question"], _ = pd.factorize(df["question"])
    # Increment by 1 to start numbering from 1 instead of 0
    # df["question"] = df["question"] + 1
    df.reset_index(drop=True, inplace=True)
    return df


df_similarity = clean_data(df.copy(deep=True))
df_similarity

,question,model,BERT_similarity,format_similarity
0,Tell me some useful info and tips about AI.,gpt-4,0.846239,0.681717
1,Tell me some useful info and tips about AI.,gpt-4-turbo,0.892335,0.907324
2,Are you aware of Phantom Liberty? Please brief...,gpt-4,0.790551,0.962236
3,Are you aware of Phantom Liberty? Please brief...,gpt-4-turbo,0.862975,0.879685
4,\nCyberpunk 2077 — Never Fade Away by P. T. Ad...,gpt-4,0.933726,0.934048
5,\nCyberpunk 2077 — Never Fade Away by P. T. Ad...,gpt-4-turbo,0.887388,0.934703
6,\nA bird in the hand is worth two in the bush\...,gpt-4,0.913258,0.731834
7,\nA bird in the hand is worth two in the bush\...,gpt-4-turbo,0.909161,0.581083
8,\nProve or disprove: “An individual is risk av...,gpt-4,0.924695,0.764237
9,\nProve or disprove: “An individual is risk av...,gpt-4-turbo,0.953328,0.854919


In [8]:
df_similarity["BERT_similarity"].describe()

count    14.000000
mean      0.889601
std       0.045067
min       0.790551
25%       0.866600
50%       0.889862
75%       0.921836
max       0.953328
Name: BERT_similarity, dtype: float64

In [9]:
df_similarity["format_similarity"].describe()

count    14.000000
mean      0.837807
std       0.130243
min       0.581083
25%       0.739935
50%       0.893505
75%       0.943998
max       0.962236
Name: format_similarity, dtype: float64
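Two details matter when reading these summaries. Pandas' `describe()` reports the sample standard deviation (`ddof=1`), and it computes percentiles by linear interpolation. The standard-library sketch below reproduces the same statistics on made-up scores, not the notebook's data.

```python
import statistics

# Made-up similarity scores, for illustration only
scores = [0.80, 0.85, 0.90, 0.95]

# method="inclusive" interpolates linearly between order statistics,
# which matches the default "linear" percentile method used by pandas' describe()
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "std": statistics.stdev(scores),  # sample standard deviation (divides by n - 1), like ddof=1
    "min": min(scores),
    "25%": q1,
    "50%": median,
    "75%": q3,
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name:<6}{value:.6f}")
```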