# Momentum Shift Score: Quantifying Career-Altering Moments in Major League Baseball

### Objectives:
This analysis focuses on identifying and evaluating pivotal moments in baseball games—specifically, events involving a pitched ball that result in the greatest changes in Win Probability Added (WPA). The goal is to assess how these moments influence the career trajectories of both the pitcher and the hitter. By quantifying this phenomenon into a single metric, we aim to provide a comprehensive narrative of how critical moments can shape the careers of the players involved.

### Table of Contents
* **Analysis 1**: Pete Alonso HR off Devin Williams
* **Sentiment Analysis 1**
* **Finding Pivotal Moments**

### Definitions:
* A "moment" to which a Momentum Shift Score (MSS) can be assigned is a single pitch with an outcome. A moment therefore comprises all of that pitch's attributes as returned by a Baseball Savant query.

# Introduction

Baseball, often referred to as America’s pastime, is a sport deeply rooted in tradition and statistical analysis. Over the years, advanced metrics have changed how the game is understood, evaluated, and appreciated. Among these metrics, Win Probability Added (WPA) has emerged as a powerful tool for quantifying the impact of individual plays on the outcome of a game. However, while WPA provides a game-level perspective, its implications for the career trajectories of players remain largely unexplored.

This project seeks to bridge that gap by introducing the concept of the Momentum Shift Score (MSS). The MSS is designed to quantify career-altering moments in Major League Baseball (MLB), focusing on pivotal events that result in significant changes in WPA. By analyzing these moments, we aim to uncover their influence on the long-term narratives of the players involved—both pitchers and hitters.

Through a combination of data-driven analysis and contextual storytelling, this project will highlight how singular moments, such as a critical home run or a game-changing strikeout, can define or redefine a player’s career. By leveraging tools like Statcast data, sentiment analysis, and historical context, we aim to provide a comprehensive framework for understanding the intersection of performance, pressure, and legacy in professional baseball. 

Ultimately, this research aspires to contribute to the broader discourse on sports analytics, offering a novel perspective on how individual moments resonate beyond the confines of a single game, shaping the legacies of players and the narratives of the sport itself.

In [1]:
import pandas as pd # for data manipulation
import numpy as np # for numerical operations
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for data visualization
from sklearn.linear_model import LinearRegression # for regression analysis
# from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer # for sentiment analysis
from newspaper import Article # for web scraping
import datetime 
import requests # for web requests
import pybaseball as pyb # for baseball data
# custom functions and utils (create mss_helpers.py later)


In [2]:
# Load data

# Load the statcast data for 2015-2024
data_15_24 = pd.read_csv('data/statcast_2015-2024.csv')
# load player ID spreadsheet
mlbIDs = pd.read_csv('data/mlbIDs.csv')

# Only need name and ID associated with the player
mlbIDs = mlbIDs[['PLAYERNAME', 'MLBID']]
mlbIDs['MLBID'] = mlbIDs['MLBID'].fillna(0).astype(int)



In [9]:
# function to clean statcast data
# creates batter name column: merges the ID with the player name
# renames the columns
# drops unnecessary ID column
def clean_statcast_data(df):
    df = df.merge(mlbIDs, left_on='batter', right_on='MLBID', how='left')
    df.rename(columns={'PLAYERNAME': 'batter_name'}, inplace=True)
    df.drop(columns=['MLBID'], inplace=True)
    df.rename(columns={'player_name': 'pitcher_name'}, inplace=True)
    return df

In [11]:
# clean the statcast data
data_15_24 = clean_statcast_data(data_15_24)
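
Because `clean_statcast_data` uses a left merge, any batter whose ID is missing from the lookup table ends up with an empty `batter_name`. The following is a minimal sketch of one way to list those IDs, using pandas' `indicator=True` merge option. The two small frames are toy stand-ins for `data_15_24` and `mlbIDs`, and the ID `999999` is hypothetical.

```python
import pandas as pd

# Toy stand-ins for the Statcast frame and the ID lookup table.
# 999999 is a made-up ID that is deliberately absent from the lookup table.
pitches = pd.DataFrame({"batter": [624413, 999999], "events": ["home_run", "single"]})
id_table = pd.DataFrame({"PLAYERNAME": ["Pete Alonso"], "MLBID": [624413]})

# indicator=True adds a "_merge" column that records where each row matched.
merged = pitches.merge(
    id_table, left_on="batter", right_on="MLBID", how="left", indicator=True
)

# "left_only" rows are batters with no entry in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "batter"].unique().tolist()
print(unmatched)
```

Running a check like this before ranking moments would show how many rows are going to surface without a batter name.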

In [12]:
# find moments from 2015-2024 where the home team's win probability swung by at least 0.5
highest_wpa = data_15_24.loc[data_15_24['delta_home_win_exp'].abs() >= 0.5]
# .copy() so the new column is added to an independent frame rather than a slice
highest_wpa = highest_wpa[['game_date', 'pitcher_name', 'batter_name', 'events', 'delta_home_win_exp']].copy()
highest_wpa['delta_home_win_exp_abs'] = highest_wpa['delta_home_win_exp'].abs()
highest_wpa.head()

Unnamed: 0,game_date,pitcher_name,batter_name,events,delta_home_win_exp,delta_home_win_exp_abs
1334,2024-10-25,"Cortes, Nestor",Freddie Freeman,home_run,0.729,0.729
10626,2024-10-03,"Williams, Devin",Pete Alonso,home_run,-0.661,0.661
13124,2024-09-30,"Johnson, Pierce",Francisco Lindor,home_run,-0.657,0.657
13137,2024-09-30,"Díaz, Edwin",Ozzie Albies,double,0.643,0.643
19722,2024-09-28,"Miller, Ryan",,single,-0.584,0.584
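
The signs in `delta_home_win_exp` are mixed because the column is expressed from the home team's point of view: a big hit by the visiting team shows up as a negative swing. The sketch below shows one way to re-express the swing from the batting team's point of view. It assumes the Statcast `inning_topbot` column takes the values `'Top'` and `'Bot'`; the column name `delta_bat_win_exp` and the toy rows are illustrative, not part of the original data.

```python
import numpy as np
import pandas as pd

# Toy rows; the inning halves are filled in by hand for illustration.
plays = pd.DataFrame({
    "inning_topbot": ["Bot", "Top"],
    "delta_home_win_exp": [0.729, -0.661],
})

# The visiting team bats in the top half, so flip the sign there.
sign = np.where(plays["inning_topbot"] == "Top", -1.0, 1.0)
plays["delta_bat_win_exp"] = plays["delta_home_win_exp"] * sign
print(plays)
```

With this convention a positive value always means the swing favored the hitter's team, which may be more convenient than the absolute value when hitters and pitchers are scored separately.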


# Analysis 1
## Pete Alonso HR off of Devin Williams
### 2024 NL Wild Card - October 3

Pull data from games played on October 3, 2024 by the Milwaukee Brewers.

In [6]:
import pybaseball as pb

mets_brewers_wc = pb.statcast('2024-10-03', team='MIL')
mets_brewers_wc.head()

This is a large query, it may take a moment to complete


100%|██████████| 1/1 [00:00<00:00,  2.02it/s]


Unnamed: 0,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,description,...,n_thruorder_pitcher,n_priorpa_thisgame_player_at_bat,pitcher_days_since_prev_game,batter_days_since_prev_game,pitcher_days_until_next_game,batter_days_until_next_game,api_break_z_with_gravity,api_break_x_arm,api_break_x_batter_in,arm_angle
130,SL,2024-10-03,87.3,-2.71,5.53,"Ross, Joe",621438,605452,field_out,hit_into_play,...,1,3,1,1,,2,2.69,-0.55,-0.55,33.1
33,FF,2024-10-03,94.3,-2.23,5.26,"Williams, Devin",516782,642207,single,hit_into_play,...,1,3,1,1,,2,1.2,0.86,0.86,28.5
34,CH,2024-10-03,86.3,-2.58,5.01,"Williams, Devin",516782,642207,,swinging_strike,...,1,3,1,1,,2,3.2,1.67,1.67,18.9
36,CH,2024-10-03,86.4,-2.61,5.05,"Williams, Devin",516782,642207,,swinging_strike,...,1,3,1,1,,2,3.22,1.74,1.74,15.9
38,CH,2024-10-03,87.5,-2.49,5.06,"Williams, Devin",516782,642207,,ball,...,1,3,1,1,,2,2.82,1.57,1.57,19.4


### Find Pete Alonso's ID

A visual scan of the day's pitches was enough to find the infamous homer.
It was the only home run hit off Devin Williams in this game, so the batter ID on that row is Alonso's.

In [7]:
pete_alonso_id = 624413
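
As an alternative to reading the ID off the output, it could be looked up by name in a table shaped like `mlbIDs` (columns `PLAYERNAME` and `MLBID`). The sketch below is a hedged illustration: `lookup_mlb_id` is a hypothetical helper, and the two-row table is a toy stand-in built from the IDs visible in the output above.

```python
import pandas as pd

# Toy stand-in for the mlbIDs table loaded earlier.
id_table = pd.DataFrame({
    "PLAYERNAME": ["Pete Alonso", "Devin Williams"],
    "MLBID": [624413, 642207],
})

def lookup_mlb_id(ids, name):
    """Return the single MLBID whose PLAYERNAME matches name, ignoring case."""
    matches = ids.loc[ids["PLAYERNAME"].str.lower() == name.lower(), "MLBID"]
    if len(matches) != 1:
        # Refuse to guess when the name is missing or shared by several players.
        raise LookupError(f"expected one match for {name!r}, found {len(matches)}")
    return int(matches.iloc[0])

pete_alonso_id = lookup_mlb_id(id_table, "Pete Alonso")
print(pete_alonso_id)
```

Raising on zero or multiple matches is a deliberate choice here, since players can share a name and a silent wrong ID would filter to the wrong at bat.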

### Filter Alonso's at bat vs. Devin Williams

Filter the game data by pitcher name and batter ID to narrow the data frame to the at bat that ended in the home run.

In [8]:
alonso_williams_ab = mets_brewers_wc[
    (mets_brewers_wc['player_name'] == 'Williams, Devin') & 
    (mets_brewers_wc['batter'] == pete_alonso_id)
]
alonso_williams_ab

Unnamed: 0,pitch_type,game_date,release_speed,release_pos_x,release_pos_z,player_name,batter,pitcher,events,description,...,n_thruorder_pitcher,n_priorpa_thisgame_player_at_bat,pitcher_days_since_prev_game,batter_days_since_prev_game,pitcher_days_until_next_game,batter_days_until_next_game,api_break_z_with_gravity,api_break_x_arm,api_break_x_batter_in,arm_angle
60,CH,2024-10-03,86.1,-2.56,5.15,"Williams, Devin",624413,642207,home_run,hit_into_play,...,1,3,1,1,,2,3.25,1.68,1.68,19.9
62,CH,2024-10-03,86.5,-2.61,5.05,"Williams, Devin",624413,642207,,ball,...,1,3,1,1,,2,3.13,1.74,1.74,17.8
64,FF,2024-10-03,94.0,-2.3,5.35,"Williams, Devin",624413,642207,,ball,...,1,3,1,1,,2,1.03,0.74,0.74,31.2
67,FF,2024-10-03,93.3,-2.33,5.29,"Williams, Devin",624413,642207,,ball,...,1,3,1,1,,2,1.26,0.95,0.95,29.2
68,CH,2024-10-03,87.1,-2.59,5.15,"Williams, Devin",624413,642207,,called_strike,...,1,3,1,1,,2,3.13,1.7,1.7,18.9
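
The rows above come back with the final pitch first. To read the at bat in the order it happened, the frame can be sorted on Statcast's `pitch_number` column. This is a small sketch on a toy frame: the pitch types and descriptions mirror the output above, while the `pitch_number` values are assumed for illustration.

```python
import pandas as pd

# Toy frame mirroring the at bat above, listed last pitch first.
at_bat = pd.DataFrame({
    "pitch_number": [5, 4, 3, 2, 1],  # assumed numbering
    "pitch_type": ["CH", "CH", "FF", "FF", "CH"],
    "description": ["hit_into_play", "ball", "ball", "ball", "called_strike"],
})

# Sort ascending so the first pitch of the at bat comes first.
chronological = at_bat.sort_values("pitch_number").reset_index(drop=True)
print(chronological)
```

On the real frame the same call would be `alonso_williams_ab.sort_values('pitch_number')`, assuming that column is present in the query result.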


# Sentiment Analysis for Alonso HR

In [9]:
import os

from newspaper import Article
from serpapi import GoogleSearch

# Use SerpAPI to search for articles
def search_articles(query, num_results=5):
    params = {
        "q": query,
        "num": num_results,
        # Read the key from the environment instead of hardcoding it in the notebook;
        # the variable name SERPAPI_API_KEY is an assumption, use whatever you export
        "api_key": os.environ["SERPAPI_API_KEY"],
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    links = [result['link'] for result in results.get('organic_results', [])]
    return links

# Use Newspaper3k to extract article content
def extract_article(url):
    article = Article(url)
    article.download()
    article.parse()
    return article.title, article.text

# Demo
query = "Jose Altuve walk-off home run against Aroldis Chapman interview OR article"
urls = search_articles(query)
if not urls:
    print("No articles found.")
else:
    for url in urls:
        try:
            title, content = extract_article(url)
            print(f"Title: {title}\n{content[:500]}...\n")
        except Exception as e:
            print(f"Failed to extract article from {url}: {e}")


Title: Aroldis Chapman: Jose Altuve's Actions After Walk-off vs. Yankees 'Suspicious'
As he rounded third base and approached home plate following his walk-off home run in the American League Championship Series, Astros star Jose Altuve gestured to his teammates not to rip off his jersey in the celebration.

Chapman commented on the matter.

"I've seen that video—a lot of people have seen that video; it's a popular video right now," he said, per ESPN's Buster Olney. "And yeah, if you look at his actions, they look a little suspicious. At the end of the day, I just don't know."

Y...

Title: Aroldis Chapman: José Altuve's ALCS Home Run Celebration 'A Little Suspicious'
Aroldis Chapman responds to Jim Crane's comments that the Astros stealing signs didn't have an impact on the game:



"I disagree with that...when you have an advantage like that, it's definitely going to make you a stronger team" pic.twitter.com/6YW3Q6jVFr...

Title: 'I couldn't believe it': Chapman on walk-off HR
Since 

In [10]:
import os

from serpapi import GoogleSearch

params = {
  "q": "Cristiano Ronaldo",
  "location": "Austin,Texas,United States",
  "hl": "en",
  "gl": "us",
  "api_key": os.environ["SERPAPI_API_KEY"]  # assumed environment variable name
}

search = GoogleSearch(params)
results = search.get_dict()
# "twitter_results" is not present in every response; indexing it directly
# raised KeyError: 'twitter_results' for this query, so fall back to an empty list
twitter_results = results.get("twitter_results", [])