# Momentum Shift Score: Quantifying Career-Altering Moments in Major League Baseball

### Objectives:
This analysis focuses on identifying and evaluating pivotal moments in baseball games—specifically, events involving a pitched ball that result in the greatest changes in Win Probability Added (WPA). The goal is to assess how these moments influence the career trajectories of both the pitcher and the hitter. By quantifying this phenomenon into a single metric, we aim to provide a comprehensive narrative of how critical moments can shape the careers of the players involved.

### Table of Contents
| **Topic** |
| --------- |
| **Analysis 1**: Pete Alonso HR off Devin Williams |
| **Sentiment Analysis 1** |
| **Finding Pivotal Moments** |

### Definitions:
* A "moment" for which a Momentum Shift Score (MSS) can be assigned is a single pitch with an outcome. A moment therefore comprises all of that pitch's attributes as returned by a Baseball Savant query.

# Introduction

Baseball, often referred to as America’s pastime, is a sport deeply rooted in tradition and statistical analysis. Over the years, the advent of advanced metrics has revolutionized the way the game is understood, evaluated, and appreciated. Among these metrics, Win Probability Added (WPA) has emerged as a powerful tool to quantify the impact of individual plays on the outcome of a game. However, while WPA provides a game-level perspective, its implications on the career trajectories of players remain largely unexplored.

This project seeks to bridge that gap by introducing the concept of the Momentum Shift Score (MSS). The MSS is designed to quantify career-altering moments in Major League Baseball (MLB), focusing on pivotal events that result in significant changes in WPA. By analyzing these moments, we aim to uncover their influence on the long-term narratives of the players involved—both pitchers and hitters.

Through a combination of data-driven analysis and contextual storytelling, this project will highlight how singular moments, such as a critical home run or a game-changing strikeout, can define or redefine a player’s career. By leveraging tools like Statcast data, sentiment analysis, and historical context, we aim to provide a comprehensive framework for understanding the intersection of performance, pressure, and legacy in professional baseball. 

Ultimately, this research aspires to contribute to the broader discourse on sports analytics, offering a novel perspective on how individual moments resonate beyond the confines of a single game, shaping the legacies of players and the narratives of the sport itself.

In [4]:
import pandas as pd # for data manipulation
import numpy as np # for numerical operations
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for data visualization
from sklearn.linear_model import LinearRegression # for regression analysis
# from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer # for sentiment analysis
# from newspaper import Article # for web scraping (installed as the newspaper3k package)
import datetime 
import requests # for web requests
import pybaseball as pyb # for baseball data
import plotly.graph_objects as go # for interactive plots
# custom functions and utils (create mss_helpers.py later)


In [5]:
# load player ID spreadsheet
mlbIDs = pd.read_csv('data/mlbIDs.csv')

# Only need name and ID associated with the player
mlbIDs = mlbIDs[['PLAYERNAME', 'MLBID']]
mlbIDs['MLBID'] = mlbIDs['MLBID'].fillna(0).astype(int)

In [6]:
# Use any time data is pulled with pybaseball.statcast().
#
# Cleans Statcast data:
#   - adds a batter_name column by merging the batter ID with the player name
#   - renames player_name (the pitcher) to pitcher_name
#   - drops the now-redundant MLBID column
def clean_statcast_data(df):
    df = df.merge(mlbIDs, left_on='batter', right_on='MLBID', how='left')
    df = df.rename(columns={'PLAYERNAME': 'batter_name', 'player_name': 'pitcher_name'})
    df = df.drop(columns=['MLBID'])
    return df
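
Because the merge above is a left join on the batter ID, a batter missing from the ID spreadsheet ends up with an empty `batter_name`, and a duplicated ID in the spreadsheet would duplicate pitches. The following is a minimal sketch of a check for both cases on toy data. The helper name `check_id_coverage` and the ID `999999` are hypothetical and not part of the notebook.

```python
import pandas as pd

def check_id_coverage(pitches, ids):
    """Return (unmatched batter IDs, duplicated MLBIDs) for a planned left merge.

    Hypothetical helper for illustration only.
    """
    # IDs that appear more than once would multiply rows in the merge
    dup_mask = ids['MLBID'].duplicated(keep=False)
    duplicated = ids.loc[dup_mask, 'MLBID'].unique().tolist()

    # indicator=True adds a _merge column marking rows with no match on the right
    merged = pitches.merge(ids.drop_duplicates('MLBID'), left_on='batter',
                           right_on='MLBID', how='left', indicator=True)
    unmatched = merged.loc[merged['_merge'] == 'left_only', 'batter'].unique().tolist()
    return unmatched, duplicated

# Toy data: one known batter and one batter absent from the spreadsheet,
# with the known batter listed twice in the ID table
toy_pitches = pd.DataFrame({'batter': [624413, 999999]})
toy_ids = pd.DataFrame({'PLAYERNAME': ['Pete Alonso', 'Pete Alonso'],
                        'MLBID': [624413, 624413]})

unmatched, duplicated = check_id_coverage(toy_pitches, toy_ids)
```

Running a check like this on `mlbIDs` before calling `clean_statcast_data` would show whether any names will be missing or any pitches double-counted.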

```python
# Load data

# Scrape the Statcast data for 2015-2024
data_15_24 = pyb.statcast(start_dt='2015-01-01', end_dt='2024-12-31')

# find moments where the win expectancy for either team changed by 0.5 or more
pivotal_moment_data = data_15_24.loc[data_15_24['delta_home_win_exp'].abs() >= 0.5].copy()

# clean the resulting data
pivotal_moment_data = clean_statcast_data(pivotal_moment_data)

# Create delta_home_win_exp_abs column before reordering
pivotal_moment_data['delta_home_win_exp_abs'] = pivotal_moment_data['delta_home_win_exp'].abs()

# Reorder columns to place delta_win_exp columns first and batter_name next to pitcher_name
cols = ['delta_home_win_exp', 'delta_home_win_exp_abs', 'pitcher_name', 'batter_name'] + \
    [col for col in pivotal_moment_data.columns if col not in ['delta_home_win_exp', 'delta_home_win_exp_abs', 'pitcher_name', 'batter_name']]

# Reorder the DataFrame
pivotal_moment_data = pivotal_moment_data[cols]

# Save the cleaned data to a new CSV file
pivotal_moment_data.to_csv('data/pivotal_moment_data.csv', index=False)
```
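
Ten seasons of pitch-level data is a large download to hold in memory at once. One possible alternative, sketched here under assumptions, is to fetch a season at a time and keep only the rows that meet the threshold. The function `collect_pivotal_moments` and the stand-in `fake_fetch` are hypothetical; `fake_fetch` only exists so the sketch runs offline.

```python
import pandas as pd

def collect_pivotal_moments(fetch, years, threshold=0.5):
    """Fetch one season at a time and keep only rows at or above the threshold.

    `fetch(start_dt, end_dt)` is any callable that returns a DataFrame with a
    `delta_home_win_exp` column (hypothetical interface for this sketch).
    """
    kept = []
    for year in years:
        season = fetch(f'{year}-01-01', f'{year}-12-31')
        mask = season['delta_home_win_exp'].abs() >= threshold
        kept.append(season.loc[mask])
    return pd.concat(kept, ignore_index=True)

# Stand-in for a real download: two rows per "season", one above the threshold
def fake_fetch(start_dt, end_dt):
    return pd.DataFrame({'game_date': [start_dt, end_dt],
                         'delta_home_win_exp': [-0.661, 0.120]})

pivotal = collect_pivotal_moments(fake_fetch, range(2015, 2017))
```

With real data, the same function could be called as `collect_pivotal_moments(pyb.statcast, range(2015, 2025))`, since `pybaseball.statcast` takes a start and end date as its first two arguments.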

In [7]:
# Load the cleaned pivotal moment data from saved CSV

# Contains every pitch from 2015-2024 that changed home win expectancy by at least 0.5 (50 percentage points)
# i.e. a "pivotal moment"
pivotal_moment_data = pd.read_csv('data/pivotal_moment_data.csv')
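
One detail worth noting when reloading from CSV: `game_date` comes back as plain strings unless told otherwise, so a filter that compares it against a `datetime` object may not behave as intended. A small sketch using an in-memory CSV (the two rows are copied from the output shown later in this notebook) and the `parse_dates` argument of `pd.read_csv`:

```python
import io
import datetime
import pandas as pd

csv_text = (
    "delta_home_win_exp,pitcher_name,batter_name,game_date\n"
    '-0.661,"Williams, Devin",Pete Alonso,2024-10-03\n'
    '0.643,"Díaz, Edwin",Ozzie Albies,2024-09-30\n'
)

# Default read: game_date values are Python strings
as_strings = pd.read_csv(io.StringIO(csv_text))
first_value_is_str = isinstance(as_strings['game_date'].iloc[0], str)

# With parse_dates, game_date becomes a datetime column
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=['game_date'])

# A datetime comparison now selects the October 3 row
target = datetime.datetime(2024, 10, 3)
date_hits = int((as_dates['game_date'] == target).sum())
```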


In [8]:
# gets the pivotal moment data, optionally filtered by date, team, pitcher, and batter
def get_pivotal_moment(date=None, team=None, pitcher=None, batter=None):
    moment_data = pivotal_moment_data.copy()

    if date:
        # game_date is read from the CSV as a string, so normalize both sides before comparing
        game_dates = pd.to_datetime(moment_data['game_date'])
        moment_data = moment_data[game_dates == pd.to_datetime(date)]
        
    if team:
        moment_data = moment_data[
            ((moment_data['home_team'] == team) | 
             (moment_data['away_team'] == team))
            ]

    if pitcher:
        moment_data = moment_data[
            (moment_data['pitcher_name'] == pitcher)
            ]
    if batter:
        moment_data = moment_data[
            (moment_data['batter_name'] == batter)
            ]
    
    return moment_data

# Analysis 1
## Pete Alonso HR off of Devin Williams
### 2024 NL Wild Card - October 3

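Pull every pivotal moment from games involving the New York Mets (`NYM`). No date filter is applied in the call, so the result is not limited to October 3, 2024; the first row shown is Alonso's home run off Williams against the Milwaukee Brewers.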

In [9]:
al_wil_date = datetime.datetime(2024, 10, 3)
alonso_williams_hr = get_pivotal_moment(team='NYM')
alonso_williams_hr.head()


| Unnamed: 0 | delta_home_win_exp | delta_home_win_exp_abs | pitcher_name | batter_name | pitch_type | game_date | release_speed | release_pos_x | release_pos_z | batter | ... | n_thruorder_pitcher | n_priorpa_thisgame_player_at_bat | pitcher_days_since_prev_game | batter_days_since_prev_game | pitcher_days_until_next_game | batter_days_until_next_game | api_break_z_with_gravity | api_break_x_arm | api_break_x_batter_in | arm_angle |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | -0.661 | 0.661 | Williams, Devin | Pete Alonso | CH | 2024-10-03 | 86.1 | -2.56 | 5.15 | 624413 | ... | 1 | 3 | 1.0 | 1.0 |  | 2.0 | 3.25 | 1.68 | 1.68 | 19.9 |
| 2 | -0.657 | 0.657 | Johnson, Pierce | Francisco Lindor | CU | 2024-09-30 | 86.4 | -2.31 | 6.04 | 596019 | ... | 1 | 4 | 2.0 | 1.0 | 2.0 | 1.0 | 3.21 | -0.84 | 0.84 | 39.4 |
| 3 | 0.643 | 0.643 | Díaz, Edwin | Ozzie Albies | FF | 2024-09-30 | 96.5 | -2.15 | 4.95 | 645277 | ... | 1 | 4 | 1.0 | 1.0 | 3.0 | 0.0 | 1.18 | 0.88 | 0.88 | 20.1 |
| 15 | 0.614 | 0.614 | Díaz, Edwin | Corbin Carroll | SL | 2024-08-28 | 90.1 | -2.1 | 5.0 | 682998 | ... | 1 | 4 | 3.0 | 1.0 | 1.0 | 1.0 | 2.6 | -0.43 | 0.43 | 23.0 |
| 29 | 0.558 | 0.558 | Brazobán, Huascar | Zach Neto | FC | 2024-08-03 | 91.1 | -1.45 | 5.82 | 687263 | ... | 1 | 3 | 3.0 | 1.0 | 3.0 | 1.0 | 2.03 | -0.04 | -0.04 | 42.4 |
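
The sign of `delta_home_win_exp` is always from the home team's point of view, which is why the Alonso home run appears as a negative value. A minimal sketch of converting it to the batting team's perspective follows. It assumes the Statcast `inning_topbot` column takes the values `'Top'` (away team batting) and `'Bot'` (home team batting); that column is hidden behind the `...` in the output above, so the inning halves used in the example calls are assumptions rather than values read from the table. The function name `batting_team_wpa` is hypothetical.

```python
def batting_team_wpa(delta_home_win_exp, inning_topbot):
    """Convert a home-team win-expectancy change to the batting team's view.

    Assumes 'Top' means the away team is batting and 'Bot' means the home
    team is batting (hypothetical helper for illustration).
    """
    if inning_topbot == 'Bot':
        return delta_home_win_exp
    if inning_topbot == 'Top':
        return -delta_home_win_exp
    raise ValueError(f"unexpected inning_topbot value: {inning_topbot!r}")

# Alonso batted for the visiting Mets, so the home team's loss is his gain
alonso_swing = batting_team_wpa(-0.661, 'Top')

# Albies batted for the home team, so the sign is unchanged
albies_swing = batting_team_wpa(0.643, 'Bot')
```

Expressing every moment from the batter's side makes rows comparable regardless of which team was at home, while `delta_home_win_exp_abs` remains useful for ranking by magnitude alone.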


# Sentiment Analysis for Alonso HR

In [None]:
import os

from newspaper import Article
from serpapi import GoogleSearch

# Read the SerpAPI key from the environment instead of hard-coding it in the notebook.
# The variable name SERPAPI_API_KEY is an assumed convention; set it before running.
SERPAPI_API_KEY = os.environ["SERPAPI_API_KEY"]

# Use SerpAPI to search for articles
def search_articles(query, num_results=5):
    params = {
        "q": query,
        "num": num_results,
        "api_key": SERPAPI_API_KEY,
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    links = [result['link'] for result in results.get('organic_results', [])]
    return links

# Use Newspaper3k to extract article content
def extract_article(url):
    article = Article(url)
    article.download()
    article.parse()
    return article.title, article.text

# Demo
query = "Jose Altuve walk-off home run against Aroldis Chapman interview OR article"
urls = search_articles(query)
if not urls:
    print("No articles found.")
else:
    for url in urls:
        try:
            title, content = extract_article(url)
            print(f"Title: {title}\n{content[:500]}...\n")
        except Exception as e:
            print(f"Failed to extract article from {url}: {e}")
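
The demo query above is typed by hand. Since pitcher names in the Statcast data are stored as "Last, First" while batter names are "First Last", a query built from a row of pivotal moment data would need to normalize them first. A rough sketch under that assumption; both helper names are hypothetical, and the query wording simply mirrors the demo string:

```python
def display_name(statcast_name):
    """Turn 'Last, First' into 'First Last'; leave other formats unchanged."""
    if ',' in statcast_name:
        last, first = [part.strip() for part in statcast_name.split(',', 1)]
        return f"{first} {last}"
    return statcast_name.strip()

def build_query(batter_name, pitcher_name, event="home run"):
    """Hypothetical helper: compose a search string for one moment."""
    return (f"{display_name(batter_name)} {event} against "
            f"{display_name(pitcher_name)} interview OR article")

# Names as they appear in the pivotal moment data
query = build_query("Pete Alonso", "Williams, Devin")
```

The resulting string can be passed to `search_articles` in place of the hand-written query.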


In [None]:
import os

from serpapi import GoogleSearch

params = {
  "q": "Cristiano Ronaldo",
  "location": "Austin,Texas,United States",
  "hl": "en",
  "gl": "us",
  # Read the key from the environment (assumed variable name); never commit it
  "api_key": os.environ["SERPAPI_API_KEY"]
}

search = GoogleSearch(params)
results = search.get_dict()
# twitter_results is not present for every query, so avoid a KeyError
twitter_results = results.get("twitter_results")