# Unit 3

## Semantic Evaluation with Embeddings

#### Introduction to Semantic Evaluation with Embeddings

Welcome to the next step in your journey of benchmarking Large Language Models (LLMs) for text generation. In the previous lesson, you learned how to evaluate text generation models using the ROUGE metric, which focuses on string similarity. Now, we will explore **semantic evaluation**, which goes beyond surface-level text comparison to understand the meaning behind the words. This is where **embeddings** come into play.

Embeddings are numerical representations of text that capture semantic meaning, allowing us to measure how similar two pieces of text are in terms of their underlying concepts. They transform words, phrases, or even entire documents into vectors in a continuous vector space. This transformation enables the comparison of texts based on their meanings rather than just their literal content.

To evaluate semantic similarity, we use **cosine similarity** as a metric. Cosine similarity measures the cosine of the angle between two vectors, providing a value between -1 and 1. A value of 1 indicates that the vectors are identical in direction, meaning the texts are semantically similar. A value of 0 indicates orthogonality, meaning no similarity, and -1 indicates completely opposite meanings. This lesson will guide you through the process of using embeddings to assess the quality of generated summaries, providing a deeper understanding of model performance.

-----

#### Understanding Cosine Similarity: The Math Behind Semantic Evaluation

Cosine similarity is a key metric for comparing the semantic similarity between two text embeddings. It measures the cosine of the angle between two vectors in a multi-dimensional space, providing a value between -1 and 1.

The mathematical formula for cosine similarity between two vectors $A$ and $B$ is:

$$\operatorname{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\| \|B\|}$$

  - **Dot product** ($A \cdot B$): This is the sum of the products of the corresponding elements of the two vectors. If $A=[a_1, a_2, \dots, a_n]$ and $B=[b_1, b_2, \dots, b_n]$, then:

$$A \cdot B = a_1b_1 + a_2b_2 + \dots + a_nb_n$$

  - **Norm** ($\|A\|$): This is the length (or magnitude) of the vector, calculated as:

$$\|A\| = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2}$$

The resulting cosine similarity value will be:

  - **1** if the vectors are identical in direction (maximum similarity),
  - **0** if the vectors are orthogonal (no similarity),
  - **-1** if the vectors are diametrically opposed (opposite meaning).

Understanding this formula helps clarify how embeddings are compared based on their meaning, not just their literal content.
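
To make the formula concrete, the following sketch works through it step by step in plain Python, using only the standard-library `math` module rather than the NumPy calls used later in the lesson. The vectors `a` and `b` are made-up values chosen so the arithmetic is easy to check by hand; they are not real embeddings.

```python
import math

# Hypothetical 3-dimensional vectors, chosen for easy arithmetic
a = [1.0, 2.0, 2.0]
b = [2.0, 1.0, 2.0]

# Dot product: 1*2 + 2*1 + 2*2 = 8
dot_product = sum(x * y for x, y in zip(a, b))

# Norms: sqrt(1 + 4 + 4) = 3 and sqrt(4 + 1 + 4) = 3
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))

# Cosine similarity: 8 / (3 * 3) = 0.888...
similarity = dot_product / (norm_a * norm_b)
print(f"{similarity:.4f}")  # prints 0.8889
```

The two vectors share most of their direction but are not parallel, so the score lands close to, but below, 1.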

-----

#### Setting Up the Environment

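Before we dive into the code, let's ensure your environment is ready. You will need the `openai` and `numpy` libraries, plus Python's built-in `csv` module, which ships with the standard library and needs no installation. If you're working on your local machine, you can install the two third-party packages using `pip`: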

```bash
pip install openai numpy
```

On CodeSignal, these libraries are pre-installed, so you can focus on the code without worrying about setup. This setup will allow us to interact with the OpenAI API, perform mathematical operations, and handle CSV files.
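
On a local machine, the `OpenAI()` client used later in this lesson reads the API key from the `OPENAI_API_KEY` environment variable when no key is passed explicitly. As a minimal sketch, a check like the following can catch a missing key before any request is sent. The helper name `api_key_configured` is hypothetical and is not part of the `openai` library.

```python
import os

def api_key_configured(env=None):
    """Return True if a non-empty OPENAI_API_KEY is present (hypothetical helper)."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())

if not api_key_configured():
    print("OPENAI_API_KEY is not set; set it before creating the OpenAI client.")
```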

-----

#### Example: Calculating Semantic Similarity

Now, let's walk through the code example to see how semantic similarity is calculated.

We start by defining the `cosine_similarity` function, which uses `dot` from `numpy` and `norm` from `numpy.linalg` to compute the similarity between two vectors. This function is crucial for comparing the embeddings of the generated and reference summaries. Next, the `get_embedding` function calls the OpenAI API to obtain an embedding for a given text, using the `embeddings.create` method with the `text-embedding-3-small` model and the input text.

The main part of the code reads a CSV file containing articles and their reference summaries. For each article, a prompt is created to generate a summary using the GPT-4 model. The embeddings for both the generated summary and the reference summary are obtained using `get_embedding`, and the cosine similarity between them is calculated and stored. Finally, the average semantic similarity score is printed, providing a quantitative measure of the model's performance.

```python
import csv
from openai import OpenAI
from numpy import dot
from numpy.linalg import norm

client = OpenAI()

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def get_embedding(text):
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

with open("cnn_dailymail_subset.csv", newline="") as f:  # newline="" is recommended by the csv module docs
    rows = list(csv.DictReader(f))

scores = []
for r in rows:
    prompt = f"Summarize the following article:\n{r['article']}"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content.strip()
    
    ref_embed = get_embedding(r["summary"])
    resp_embed = get_embedding(response)

    score = cosine_similarity(resp_embed, ref_embed)
    scores.append(score)

print(f"Average Semantic Similarity Score: {sum(scores)/len(scores):.3f}")
```

The output of this code will be an average semantic similarity score, which indicates how closely the generated summaries match the reference summaries in terms of meaning.
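
The script reads each row with `csv.DictReader` and looks up the `article` and `summary` keys, so the CSV file needs a header row with those two column names. The sketch below shows that layout using an in-memory file with made-up contents; the real `cnn_dailymail_subset.csv` is not reproduced here.

```python
import csv
import io

# Made-up two-row sample with the column names the script expects
sample_csv = io.StringIO(
    "article,summary\n"
    '"The city council met on Monday, and approved a new budget.",Council approves budget.\n'
    '"Heavy rain flooded several streets overnight.",Rain floods streets.\n'
)

rows = list(csv.DictReader(sample_csv))

for r in rows:
    # Each row is a dict keyed by the header names
    print(r["article"], "->", r["summary"])
```

Fields that contain commas, such as the first article here, must be wrapped in double quotes so they are parsed as a single column.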

-----

#### Interpreting the Results and Troubleshooting

The semantic similarity scores you obtain provide insight into the quality of the generated summaries. A higher score indicates a closer match to the reference summary, suggesting that the model has captured the essential meaning of the text. Conversely, a lower score may indicate that the generated summary is missing key concepts or includes irrelevant information. When interpreting these scores, consider the context and complexity of the text being summarized. If you encounter issues such as API errors or unexpected results, ensure that your API key is correctly configured and that the input text is formatted properly. Debugging these issues will help you achieve accurate and meaningful results.
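
Because an average can hide individual failures, it can help to look at the spread of scores and at the lowest-scoring rows. The following is a minimal sketch using only the standard library; the `scores` values are made up for illustration, and the helper name `summarize_scores` is hypothetical.

```python
from statistics import mean

def summarize_scores(scores, worst_n=2):
    """Return the mean, min, max, and the indices of the lowest-scoring items."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Sort item indices by score, lowest first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return {
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "worst_indices": ranked[:worst_n],
    }

# Made-up similarity scores, one per summarized article
scores = [0.82, 0.64, 0.91, 0.77]
summary = summarize_scores(scores)
print(summary["worst_indices"])  # prints [1, 3]
```

The returned indices point back to rows in the input data, so the corresponding articles and summaries can be inspected by hand.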

-----

#### Summary and Preparation for Practice Exercises

In this lesson, you learned how to use embeddings and cosine similarity to evaluate the semantic quality of text summaries. We covered the setup of the environment, the structure of the evaluation code, the mathematical foundation of cosine similarity, and how to interpret the results. This knowledge will be invaluable as you move on to the practice exercises, where you'll apply these concepts to assess the performance of text generation models. Remember, semantic evaluation provides a deeper understanding of model performance by focusing on meaning rather than just surface-level text similarity. Good luck with the exercises, and continue to explore the fascinating world of text generation!

## Implementing Cosine Similarity for Vector Comparison

Now that you've learned about semantic evaluation and how embeddings capture meaning, let's focus on one of the core components: the cosine similarity function. This mathematical operation is essential for comparing vector representations of text.

In this exercise, you'll implement the `cosine_similarity` function, which calculates how similar two vectors are in terms of their direction. The function should:

  - Take two vectors (`a` and `b`) as input
  - Calculate the similarity using NumPy's `dot` and `norm` functions
  - Handle edge cases properly, such as zero vectors
  - Return a value between -1 and 1, where 1 means identical direction

The test code is already set up with various vector pairs to verify that your implementation works correctly. By mastering this fundamental calculation, you'll build a solid foundation for evaluating semantic similarity in more complex text generation scenarios.

```python
from numpy import dot
from numpy.linalg import norm
import numpy as np

def cosine_similarity(a, b):
    # TODO: Implement the cosine similarity function
    # Remember to handle edge cases like zero vectors
    # The function should return a value between -1 and 1
    pass

# Identical vectors
identical_vector_1 = np.array([0.5, 0.5, 0.5, 0.5])
identical_vector_2 = np.array([0.5, 0.5, 0.5, 0.5])

# Orthogonal vectors
orthogonal_vector_1 = np.array([1, 0, 0, 0])
orthogonal_vector_2 = np.array([0, 1, 0, 0])

# Vectors with moderate similarity
moderate_vector_1 = np.array([1, 0, 1, 0])
moderate_vector_2 = np.array([0.5, 0.5, 0.5, 0.5])

# Test with identical vectors (should be 1.0)
similarity = cosine_similarity(identical_vector_1, identical_vector_2)
print(f"Similarity between identical vectors: {similarity:.4f}")

# Test with orthogonal vectors (should be 0.0)
similarity = cosine_similarity(orthogonal_vector_1, orthogonal_vector_2)
print(f"Similarity between orthogonal vectors: {similarity:.4f}")

# Test with vectors with moderate similarity (should be about 0.7071, i.e. 1 / sqrt(2))
similarity = cosine_similarity(moderate_vector_1, moderate_vector_2)
print(f"Similarity between moderately similar vectors: {similarity:.4f}")
```

Here is one completed solution, with the `cosine_similarity` function implemented:

```python
from numpy import dot
from numpy.linalg import norm
import numpy as np

def cosine_similarity(a, b):
    # Calculate the dot product of the two vectors
    dot_product = dot(a, b)
    
    # Calculate the norms (magnitudes) of the two vectors
    norm_a = norm(a)
    norm_b = norm(b)
    
    # Handle the edge case where one or both vectors are zero vectors.
    # A zero vector has norm 0, so the division below would be 0 / 0,
    # which NumPy evaluates to nan (with a RuntimeWarning) rather than a score.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    
    # Calculate the cosine similarity
    similarity = dot_product / (norm_a * norm_b)
    
    return similarity

# Identical vectors
identical_vector_1 = np.array([0.5, 0.5, 0.5, 0.5])
identical_vector_2 = np.array([0.5, 0.5, 0.5, 0.5])

# Orthogonal vectors
orthogonal_vector_1 = np.array([1, 0, 0, 0])
orthogonal_vector_2 = np.array([0, 1, 0, 0])

# Vectors with moderate similarity
moderate_vector_1 = np.array([1, 0, 1, 0])
moderate_vector_2 = np.array([0.5, 0.5, 0.5, 0.5])

# Test with identical vectors (should be 1.0)
similarity = cosine_similarity(identical_vector_1, identical_vector_2)
print(f"Similarity between identical vectors: {similarity:.4f}")

# Test with orthogonal vectors (should be 0.0)
similarity = cosine_similarity(orthogonal_vector_1, orthogonal_vector_2)
print(f"Similarity between orthogonal vectors: {similarity:.4f}")

# Test with vectors with moderate similarity (should be about 0.7071, i.e. 1 / sqrt(2))
similarity = cosine_similarity(moderate_vector_1, moderate_vector_2)
print(f"Similarity between moderately similar vectors: {similarity:.4f}")
```

This solution fills in the `cosine_similarity` function using the `dot` and `norm` functions from the NumPy library. It also checks for zero vectors and returns `0.0` in that case, since dividing by a zero norm would otherwise produce `nan` instead of a valid score.
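
Beyond zero vectors, two other edge cases can come up: vectors of different lengths, and floating-point rounding that pushes a result slightly outside the range from -1 to 1. The sketch below shows one way to guard against both. It is written in plain Python so it runs without NumPy, and the name `safe_cosine_similarity` is hypothetical; it is an illustration, not part of the exercise's expected solution.

```python
import math

def safe_cosine_similarity(a, b):
    """Cosine similarity with length, zero-vector, and range guards (illustrative)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Zero vectors have no direction; return 0.0, matching the solution's convention
    if norm_a == 0 or norm_b == 0:
        return 0.0

    # Clamp to [-1, 1] in case rounding error overshoots the bounds
    return max(-1.0, min(1.0, dot_product / (norm_a * norm_b)))

print(safe_cosine_similarity([1, 0], [-1, 0]))  # prints -1.0
print(safe_cosine_similarity([0, 0], [1, 2]))   # prints 0.0
```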

## Creating Text Embeddings with OpenAI

In this exercise, you'll create a function that transforms text into numerical vectors using OpenAI's embedding API. Your task is to complete the get_embedding() function that converts a text input into its vector representation. This is a crucial step in semantic evaluation, as these embeddings capture the meaning of text in a way that allows for mathematical comparison.

To complete this exercise:

  - Use the `client.embeddings.create()` method to call the OpenAI API and create an embedding using the `"text-embedding-3-small"` model.
  - Extract the embedding vector from the response using `response.data[0].embedding`.
  - Test your function with a simple input text.

We've provided a setup to interact with the OpenAI API, so you can focus on the implementation. Once you've mastered this technique, you'll be able to compare the meanings of different texts by analyzing their embedding vectors.

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # Initialize the OpenAI client

def get_embedding(text):
    """
    Get the embedding for a given text using OpenAI's API.
    
    Args:
        text (str): The text to get an embedding for
        
    Returns:
        list: The embedding vector
    """
    # TODO: Call the client.embeddings.create method to get the embedding
    # Use the "text-embedding-3-small" model and the input text
    
    # TODO: Extract and return the embedding from the response using response.data[0].embedding
    pass

# Test the function with a simple input
test_text = "The president addressed the nation, highlighting the importance of economic reforms."
embedding = get_embedding(test_text)

# Print the first 5 values of the embedding
print(f"First 5 values of the embedding: {embedding[:5]}")

# Print the length of the embedding vector
print(f"Embedding dimension: {len(embedding)}")
```

Here is one completed solution:

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # Initialize the OpenAI client

def get_embedding(text):
    """
    Get the embedding for a given text using OpenAI's API.
    
    Args:
        text (str): The text to get an embedding for
        
    Returns:
        list: The embedding vector
    """
    # Call the client.embeddings.create method to get the embedding
    # Use the "text-embedding-3-small" model and the input text
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    
    # Extract and return the embedding from the response using response.data[0].embedding
    return response.data[0].embedding

# Test the function with a simple input
test_text = "The president addressed the nation, highlighting the importance of economic reforms."
embedding = get_embedding(test_text)

# Print the first 5 values of the embedding
print(f"First 5 values of the embedding: {embedding[:5]}")

# Print the length of the embedding vector
print(f"Embedding dimension: {len(embedding)}")
```
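
Each call to `get_embedding` sends a request to the API, so embedding the same text repeatedly (for example, a reference summary reused across several comparisons) costs extra time and tokens. One option is to memoize the function. The sketch below wraps any embedding function with `functools.lru_cache`; `make_cached_embedder` and the stand-in `fake_embedding` are hypothetical names, used so the example runs without an API key.

```python
from functools import lru_cache

def make_cached_embedder(embed_fn, maxsize=1024):
    """Wrap an embedding function so repeated texts are only embedded once."""
    @lru_cache(maxsize=maxsize)
    def cached(text):
        # Store a tuple so the cached value cannot be mutated by callers
        return tuple(embed_fn(text))
    return cached

# Stand-in for get_embedding that records how often it is called
calls = []
def fake_embedding(text):
    calls.append(text)
    return [float(len(text)), 1.0]

cached_embedding = make_cached_embedder(fake_embedding)
cached_embedding("hello")
cached_embedding("hello")  # served from the cache
print(len(calls))  # prints 1
```

With the lesson's own function, the analogous call would be `make_cached_embedder(get_embedding)`.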

## Building a Semantic Comparison Pipeline

Excellent work on implementing both the cosine similarity function and the embedding generator! Now it's time to bring these components together to create a complete semantic evaluation pipeline.

In this exercise, you'll build a mini-system that compares two text summaries based on their meaning rather than just their words. You'll need to:

  - Implement the `cosine_similarity` function that handles edge cases like zero vectors.
  - Create the `get_embedding` function to transform text into vector representations.
  - Apply both functions to compare a reference summary with a generated summary.

The code includes example summaries for you to test your implementation. This exercise ties together everything you've learned about semantic evaluation, giving you a practical tool you can use to compare any two texts based on their meaning.

By completing this pipeline, you'll have a solid understanding of how modern NLP systems evaluate text similarity beyond simple word matching — a key skill for anyone working with language models.

```python
from openai import OpenAI
from numpy import dot
from numpy.linalg import norm

# Initialize the OpenAI client
client = OpenAI()

def cosine_similarity(a, b):
    """
    Calculate the cosine similarity between two vectors.
    
    Args:
        a (list): First vector
        b (list): Second vector
        
    Returns:
        float: Similarity score between -1 and 1
    """
    # TODO: Handle edge case of zero vectors
    
    # TODO: Calculate and return the cosine similarity between vectors a and b
    pass

def get_embedding(text):
    """
    Get the embedding for a given text using OpenAI's API.
    
    Args:
        text (str): The text to get an embedding for
        
    Returns:
        list: The embedding vector
    """
    # TODO: Call the client.embeddings.create method to get the embedding
    # Use the "text-embedding-3-small" model and the input text
    
    # TODO: Extract and return the embedding from the response
    pass

# Example text strings
reference_summary = "The company announced a new product that will revolutionize the market."
generated_summary = "A new revolutionary product was unveiled by the company."

# Get embeddings for both summaries
print("Getting embeddings for both summaries...")
# TODO: Get the embedding for the reference summary

# TODO: Get the embedding for the generated summary

# Calculate similarity between embeddings
# TODO: Calculate the cosine similarity between the two embeddings

# Print results
print(f"\nReference summary: \"{reference_summary}\"")
print(f"Generated summary: \"{generated_summary}\"")
print(f"\nSimilarity score: {similarity_score:.4f}")

```

To complete the semantic comparison pipeline, you need to fill in the `cosine_similarity` and `get_embedding` functions, and then use them to compare the example summaries.

Here's the completed code:

```python
from openai import OpenAI
from numpy import dot
from numpy.linalg import norm
import numpy as np

# Initialize the OpenAI client
client = OpenAI()

def cosine_similarity(a, b):
    """
    Calculate the cosine similarity between two vectors.
    
    Args:
        a (list): First vector
        b (list): Second vector
        
    Returns:
        float: Similarity score between -1 and 1
    """
    # Handle edge case of zero vectors (a zero vector has norm 0)
    norm_a, norm_b = norm(a), norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0

    # Calculate and return the cosine similarity between vectors a and b
    return dot(a, b) / (norm_a * norm_b)

def get_embedding(text):
    """
    Get the embedding for a given text using OpenAI's API.
    
    Args:
        text (str): The text to get an embedding for
        
    Returns:
        list: The embedding vector
    """
    # Call the client.embeddings.create method to get the embedding
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    
    # Extract and return the embedding from the response
    return response.data[0].embedding

# Example text strings
reference_summary = "The company announced a new product that will revolutionize the market."
generated_summary = "A new revolutionary product was unveiled by the company."

# Get embeddings for both summaries
print("Getting embeddings for both summaries...")
reference_embedding = get_embedding(reference_summary)
generated_embedding = get_embedding(generated_summary)

# Calculate similarity between embeddings
similarity_score = cosine_similarity(reference_embedding, generated_embedding)

# Print results
print(f"\nReference summary: \"{reference_summary}\"")
print(f"Generated summary: \"{generated_summary}\"")
print(f"\nSimilarity score: {similarity_score:.4f}")

```
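
The embeddings endpoint also accepts a list of strings as `input` and returns one item in `response.data` per string, so both summaries can be embedded in a single request. The following is a sketch under that assumption; `get_embeddings` is a hypothetical helper, and the `SimpleNamespace` stub stands in for a real `OpenAI()` client so the example runs offline.

```python
from types import SimpleNamespace

def get_embeddings(client, texts, model="text-embedding-3-small"):
    """Embed several texts in one request and return vectors in input order."""
    response = client.embeddings.create(model=model, input=list(texts))
    # Each item carries the index of the input it belongs to
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

# Offline stub mimicking the response shape (not the real API)
def _fake_create(model, input):
    data = [SimpleNamespace(index=i, embedding=[float(len(t))])
            for i, t in enumerate(input)]
    # Return items out of order to show that sorting by index restores it
    return SimpleNamespace(data=list(reversed(data)))

stub_client = SimpleNamespace(embeddings=SimpleNamespace(create=_fake_create))

reference_vec, generated_vec = get_embeddings(stub_client, ["reference text", "generated"])
print(reference_vec, generated_vec)  # prints [14.0] [9.0]
```

With a real client, the two returned vectors could be passed straight to `cosine_similarity` in place of the two separate `get_embedding` calls.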