A great commentator shows excitement, emotion, and attention to the game. A poor commentator tends to sound flat, which can diminish the fan experience of watching games. In this notebook, we aim to grade the excitement levels of the commentators in our data.

Instead of using the NLI-based zero-shot classifier, we can try a pretrained classification model that has been fine-tuned specifically for detecting emotions.

Pretrained models such as `j-hartmann/emotion-english-distilroberta-base` are fine-tuned to capture emotions such as joy, sadness, fear, anger, and surprise. This model is trained specifically for emotion detection. It is also "distilled", meaning it is lighter than comparable full-size models. We want to measure commentator excitement, or the lack of it, and use that to score the commentators. A mundane broadcast is likely to decrease fan engagement and will receive a lower score.
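To make the scoring idea concrete, here is a minimal sketch of how per-label probabilities could be collapsed into a single excitement value. It assumes the classifier returns one `{"label", "score"}` dict per emotion and that `joy` and `surprise` are a reasonable proxy for excitement, which is the choice made later in this notebook. The numbers in `example_scores` are invented for illustration and are not real model output; `excitement_from_scores` is a hypothetical helper name.

```python
# Hypothetical per-label output for one text chunk. The scores are made up
# for illustration and are NOT real predictions from the model.
example_scores = [
    {"label": "anger", "score": 0.02},
    {"label": "disgust", "score": 0.01},
    {"label": "fear", "score": 0.03},
    {"label": "joy", "score": 0.40},
    {"label": "neutral", "score": 0.30},
    {"label": "sadness", "score": 0.04},
    {"label": "surprise", "score": 0.20},
]

# Assumption: these two labels stand in for "excitement".
EXCITED_LABELS = ("joy", "surprise")


def excitement_from_scores(scores, labels=EXCITED_LABELS):
    """Sum the probabilities of the labels treated as 'excited'."""
    return sum(item["score"] for item in scores if item["label"] in labels)


print(round(excitement_from_scores(example_scores), 2))
```

Because the scores for one chunk sum to roughly 1, the result stays between 0 and 1, which keeps chunks comparable with each other.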



In [None]:
# First, set Keras backend
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

# Install required packages (run these if needed)
!pip install -U transformers
!pip install -U tensorflow
!pip install -U torch
!pip install tf-keras
!pip install hf_xet
!pip install scipy  # for find_peaks

# Imports
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.signal import find_peaks
from transformers import pipeline



Let's load the data produced by the preprocessing notebook in this repo.

In [21]:
# Load the preprocessed data
data = pd.read_json("../dataset/preprocessed_data.json")

In [22]:
data.head()

Unnamed: 0,game_id,teams,transcript,year,tokens,doc_embedding,broadcaster
0,1962-houston_oilers-dallas_texans.txt,"[houston_oilers, dallas_texans]",gilson well defend the goal on your left theyl...,1962,"[gilson, well, defend, the, goal, on, your, le...","[0.0272845495, 0.0167274754, 0.0260243993, 0.0...",ABC
1,1969-chicago_bears-green_bay_packers.txt,"[chicago_bears, green_bay_packers]",cbs television sports presents the national fo...,1969,"[cbs, television, sports, presents, the, natio...","[0.0302205924, 0.014963325100000001, 0.0228471...",CBS
2,1969-cleveland_browns-minnesota_vikings-1.txt,"[cleveland_browns, minnesota_vikings]",the nfl today brought to you by the foundation...,1969,"[the, nfl, today, brought, to, you, by, the, f...","[0.027876755200000002, 0.0162593126, 0.0226585...",CBS
3,1969-cleveland_browns-minnesota_vikings.txt,"[cleveland_browns, minnesota_vikings]",the nfl today brought to you by the foundation...,1969,"[the, nfl, today, brought, to, you, by, the, f...","[0.028167814000000003, 0.016339412, 0.02250985...",CBS
4,1969-new_york_jets-baltimore_colts.txt,"[new_york_jets, baltimore_colts]",&gt;&gt; nbc sports presents the third nflafl ...,1969,"[gtgt, nbc, sports, presents, the, third, nfla...","[0.0310913976, 0.0153203607, 0.024078829200000...",NBC


In [52]:
emotion_classifier = pipeline(
    task="text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    return_all_scores=True
)


Device set to use mps:0


In [49]:
def excite(row):
    """ detect excitement in the game transcript """
    try:
        labels = ['joy', 'surprise']
        transcript = str(row['transcript'])
        
        # Break into chunks
        chunk_size = 256
        chunks = [transcript[i:i+chunk_size] for i in range(0, len(transcript), chunk_size)]
        
        # Process these chunks
        emotion_scores = []
        for chunk in chunks:
            try:
                results = emotion_classifier(chunk)
                scores = results[0]
                excitement_score = sum(
                    score['score'] for score in scores 
                    if score['label'] in labels
                )
                emotion_scores.append(excitement_score)
            except Exception:
                # Skip any chunk the classifier fails on and keep going
                continue

        # Calculate metrics
        avg_emotion = np.mean(emotion_scores) if emotion_scores else 0
        max_emotion = max(emotion_scores) if emotion_scores else 0
        
        # detect excitement peaks
        if len(emotion_scores) > 1:
            min_distance = max(1, len(emotion_scores)//10)
            peaks, _ = find_peaks(emotion_scores, distance=min_distance)
            excitement_moments = len(peaks)
        else:
            excitement_moments = 0
        
        # Calculate composite excitement score as a weighted sum (weights sum to 1).
        # np.mean over the weighted terms would divide the result by 3.
        excitement_score = (
            avg_emotion * 0.4
            + max_emotion * 0.4
            + (excitement_moments / 10) * 0.2
        )
        
        return {
            'game_id': row['game_id'],
            'year': row['year'],
            'teams': row['teams'],
            'broadcaster': row['broadcaster'],
            'excitement_score': excitement_score,
            'emotion_score': avg_emotion,
            'max_emotion': max_emotion,
            'excitement_peaks': excitement_moments,
            'excitement_timeline': emotion_scores
        }
    
    except Exception as e:
        print(f"Error processing game {row['game_id']}: {str(e)}")
        return None
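
The fixed 256-character slices used in `excite` can cut a word in half at a chunk boundary. As a possible alternative, the sketch below splits on whitespace so that each chunk ends on a word boundary. `chunk_by_words` is a hypothetical helper, not part of the notebook, and it is untested against the real transcripts.

```python
def chunk_by_words(text, max_chars=256):
    """Split text into chunks no longer than max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk,
    so that chunk alone may exceed the limit.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            # Either the word still fits, or the chunk is empty and we
            # must take the word regardless of its length.
            current = candidate
        else:
            # The word does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "and he breaks free down the sideline " * 20
pieces = chunk_by_words(sample, max_chars=64)
print(len(pieces), max(len(p) for p in pieces))
```

Swapping this in for the slicing expression would change the number of chunks per game slightly, so the scores would not be directly comparable with a run that used character slices.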


In [50]:
def get_excitement(data):
    """
    Process all games in the DataFrame
    """
    results = []
    
    print(f"Starting analysis of {len(data)} games...")
    
    for idx, row in data.iterrows():
        result = excite(row)  # Pass the row, not the entire dataframe
        if result is not None:
            results.append(result)

            # Print progress every 10 successfully processed games
            if len(results) % 10 == 0:
                print(f"Processed {len(results)} games...")
    
    return pd.DataFrame(results)
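
Calling the classifier once per chunk, as `excite` does, can be slow across many full-game transcripts. One option is to pass all of a game's chunks in a single call and let the pipeline batch them. The sketch below assumes the classifier accepts a list of strings plus a `batch_size` argument and returns one list of `{"label", "score"}` dicts per input text, which is how a `transformers` text-classification pipeline configured to return all scores is expected to behave. `score_chunks` is a hypothetical helper, and `fake_classifier` is a stand-in so the example runs without downloading the model.

```python
def score_chunks(chunks, classifier, labels=("joy", "surprise"), batch_size=32):
    """Return one excitement score per chunk from a single batched call."""
    if not chunks:
        return []
    # Assumption: for a list of texts, the classifier returns a list with
    # one list of {"label": ..., "score": ...} dicts per text.
    all_results = classifier(chunks, batch_size=batch_size)
    return [
        sum(item["score"] for item in result if item["label"] in labels)
        for result in all_results
    ]


def fake_classifier(texts, batch_size=1):
    """Stand-in for the real pipeline; returns fixed, made-up scores."""
    return [
        [
            {"label": "joy", "score": 0.5},
            {"label": "neutral", "score": 0.3},
            {"label": "surprise", "score": 0.2},
        ]
        for _ in texts
    ]


print(score_chunks(["what a catch", "first and ten"], fake_classifier))
```

A suitable `batch_size` depends on the device and available memory, so it is worth trying a few values on a handful of games before running the full dataset.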

Now that we have the functions to compute an excitement score with our pretrained emotion classifier, let's apply them to our data.

In [51]:
# Run the analysis
print("Analyzing excitement levels in broadcasts...")
excitement_results = get_excitement(data)

# Display the results
print(f"\nAnalyzed {len(excitement_results)} games")
print("\nTop 5 most exciting games:")
top_exciting = excitement_results.sort_values(by='excitement_score', ascending=False).head(5)
print(top_exciting[['game_id', 'teams', 'broadcaster', 'excitement_score']])

print("\nBottom 5 least exciting games:")
bottom_exciting = excitement_results.sort_values(by='excitement_score').head(5)
print(bottom_exciting[['game_id', 'teams', 'broadcaster', 'excitement_score']])

Analyzing excitement levels in broadcasts...
Starting analysis of 1455 games...


Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0


Processed 10 games...


Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0


Processed 20 games...


Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0
Device set to use mps:0


KeyboardInterrupt: 