<a href="https://colab.research.google.com/github/thowley1207/capstone_project/blob/05/05_generate_event_study_abnormal_returns_and_car.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install wrds without its dependencies: this notebook does not query WRDS,
#   but the initializer still requires the wrds library to be installed.
# Installing wrds's dependencies would pull in a numpy version that is
#   incompatible with the version required to run the regressions below.
!pip install --no-dependencies wrds

!wget https://raw.githubusercontent.com/thowley0824/capstone/main/colab_initialization/initializer.py

import json
import pandas as pd
import pathlib
import matplotlib
import numpy as np
import requests
import statsmodels.api as sm
import zipfile

from matplotlib import pyplot as plt

import initializer
initializer.initialize_colab()

In [None]:
'''
SET DATA SUBDIRECTORIES AND FORM TYPE PREFIX
WHEN APPLICABLE, THIS FORM TYPE PREFIX WILL BE USED MOVING FORWARD
'''

returns_data_subdir = 'data/event_study/returns/'
results_data_subdir = 'data/event_study/results/'
file_prefix = '8k_'

'''
ADDITIONAL FILE NAMES CARRIED DOWN FROM PRIOR WORK
'''

event_study_ret_data_file_names = [
    'event_study_ret_data_pt_1.pkl',
    'event_study_ret_data_pt_2.pkl',
    'event_study_ret_data_pt_3.pkl',
    'event_study_ret_data_pt_4.pkl']

est_win_data_file_names = [
    'est_win_data_pt_1.pkl',
    'est_win_data_pt_2.pkl',
    'est_win_data_pt_3.pkl',
    'est_win_data_pt_4.pkl']

'''
NEW FILE NAMES FOR USE BELOW
'''

regression_params_file_name = 'regression_params.pkl'
event_window_data_file_name = 'event_window_data.pkl'
event_car_data_file_name = 'event_car_data.pkl'

**Define a helper function for use in fitting each event's estimation period OLS regression model:**

        generate_event_regression_data(estimation_window_df, event)

* **Function input parameters are:**
        # The dataframe containing combined return data across all events, restricted to each event's estimation window:
            estimation_window_df

        # The event_id of the individual event you are generating regression data for:
            event

* **Function steps through the following sequence:**
    * Creates a dictionary **event_regression_dict** containing a single entry:
            event_regression_dict['event_id'] = event
    * Creates a dataframe **event_est_win_data** from **estimation_window_df** containing only the rows where event_id equals event
            estimation_window_df[estimation_window_df['event_id'] == event].copy()
    * Creates the design matrix **X** from the dataframe's market return column and adds a constant (intercept) column
            X = event_est_win_data[['mkt_return']]
            X = sm.add_constant(X)
    * Creates the response vector **y** from the dataframe's security return column
            y = event_est_win_data[['sec_return']]

    * Uses the statsmodels OLS class to create **mod**, the OLS regression model object built from X and y
    * Creates **est**, an object containing the results of fitting the regression model
            mod = sm.OLS(y,X)
            est = mod.fit()
    * Using the attributes of the fitted results object **est**, adds the following values to **event_regression_dict**:

            # The intercept (alpha) of the fitted OLS model
            event_regression_dict['alpha'] = est.params['const']

            # The slope coefficient (beta) of the fitted OLS model
            event_regression_dict['beta'] = est.params['mkt_return']

            # The residual standard error: square root of the residual mean squared error, SSR / (n - 2)
            event_regression_dict['resid_std_error'] = np.sqrt(est.mse_resid)

* **Function returns:**
        # The previously created dict containing the event id and relevant regression outputs
        event_regression_dict

In [None]:
def generate_event_regression_data(estimation_window_df,
                                   event):

    event_regression_dict = {'event_id':event}

    event_est_win_data = estimation_window_df[estimation_window_df[
            'event_id'] == event].copy()

    X = event_est_win_data[['mkt_return']]
    y = event_est_win_data[['sec_return']]
    X = sm.add_constant(X)

    mod = sm.OLS(y,X)
    est = mod.fit()

    event_regression_dict['alpha'] = est.params['const']
    event_regression_dict['beta'] = est.params['mkt_return']
    event_regression_dict['resid_std_error'] = np.sqrt(est.mse_resid)

    return event_regression_dict
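As a cross-check on what the helper extracts from statsmodels, the single-regressor market model can also be fit in closed form. This is an illustrative sketch on synthetic data, not the notebook's method. `market_model_ols` and the simulated returns are hypothetical. It assumes the same specification as above (a constant plus `mkt_return`), for which `est.params` and `np.sqrt(est.mse_resid)` should agree with these formulas.

```python
import numpy as np

def market_model_ols(sec_return, mkt_return):
    """Hypothetical closed-form OLS of sec_return on a constant and mkt_return.

    Returns (alpha, beta, resid_std_error), where resid_std_error uses
    n - 2 degrees of freedom, like np.sqrt(est.mse_resid) in statsmodels.
    """
    r_i = np.asarray(sec_return, dtype=float)
    r_m = np.asarray(mkt_return, dtype=float)
    n = r_i.size
    # Slope: sample covariance of (market, security) over market variance
    beta = np.cov(r_m, r_i, ddof=1)[0, 1] / np.var(r_m, ddof=1)
    # Intercept: the OLS line passes through the point of means
    alpha = r_i.mean() - beta * r_m.mean()
    resid = r_i - (alpha + beta * r_m)
    # Two estimated parameters (alpha, beta) -> n - 2 residual degrees of freedom
    resid_std_error = np.sqrt(resid @ resid / (n - 2))
    return alpha, beta, resid_std_error

# Synthetic estimation window (hypothetical): ~3 quarters of trading days
rng = np.random.default_rng(0)
mkt = rng.normal(0.0005, 0.01, size=189)
sec = 0.0002 + 1.3 * mkt + rng.normal(0.0, 0.015, size=189)

alpha, beta, s = market_model_ols(sec, mkt)
print(alpha, beta, s)
```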

 **Step 1: Generate Regression Parameters For Each Event**

**Load the estimation-window return data for each event**

* Data is read, one part at a time, from the four files **8k_est_win_data_pt_1.pkl** through **8k_est_win_data_pt_4.pkl** created earlier
* Each part holds the dataframe **est_win_data**, which was filtered from the full event study return data using the **est_per_flag** constructed earlier. It retains only rows within each event's estimation period (the trailing one year of data before the event date, excluding the most recent quarter)
* All columns not required for fitting the regression model were dropped, leaving only:
    * **event_id**, for future use as a unique identifier of individual events / a key for joining to other data
    * **mkt_return**, the daily index return data
    * **sec_return**, the daily individual security return data

**Create a dataframe containing regression data for all event study events**

* Each row in the dataframe **regression_params** contains the output from calling the **generate_event_regression_data** function defined above on an individual event from the full set of events
* The dataframe is created using the Pandas **from_dict** constructor
    * The constructor is passed a list comprehension that calls **generate_event_regression_data** once for each unique event_id
    * Each call returns a single-row dict of regression data, so the resulting list yields one row per event in the event study
* This is done once per data part, and the four resulting dataframes are concatenated and saved as **8k_regression_params.pkl**
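A minimal sketch of the filter-then-fit pattern on toy data. It assumes `est_per_flag` is a 0/1 column, as described above. `fit_market_model` is a hypothetical stand-in for `generate_event_regression_data` that swaps statsmodels for `np.polyfit`, which returns the same OLS slope and intercept, so the example runs without statsmodels. Passing a list of dicts to the plain `pd.DataFrame` constructor gives the same one-row-per-event frame as the `from_dict` call used in the notebook.

```python
import numpy as np
import pandas as pd

def fit_market_model(df, event):
    """Hypothetical stand-in for generate_event_regression_data (uses np.polyfit)."""
    sub = df[df['event_id'] == event]
    # polyfit with deg=1 returns [slope, intercept]
    beta, alpha = np.polyfit(sub['mkt_return'], sub['sec_return'], deg=1)
    resid = sub['sec_return'] - (alpha + beta * sub['mkt_return'])
    resid_std_error = np.sqrt((resid ** 2).sum() / (len(sub) - 2))
    return {'event_id': event, 'alpha': alpha, 'beta': beta,
            'resid_std_error': resid_std_error}

# Toy return data for two events (hypothetical); est_per_flag marks the
# first 50 of each event's 60 rows as its estimation period
rng = np.random.default_rng(1)
n = 60
toy = pd.DataFrame({
    'event_id': np.repeat(['E1', 'E2'], n),
    'mkt_return': rng.normal(0, 0.01, 2 * n),
    'est_per_flag': np.tile(np.r_[np.ones(n - 10, int), np.zeros(10, int)], 2),
})
toy['sec_return'] = 0.9 * toy['mkt_return'] + rng.normal(0, 0.005, 2 * n)

# Keep estimation-period rows and only the columns the regression needs
est_win_data = toy.loc[toy['est_per_flag'] == 1,
                       ['event_id', 'mkt_return', 'sec_return']]

# One dict per event -> one row per event
regression_params = pd.DataFrame(
    [fit_market_model(est_win_data, e) for e in est_win_data['event_id'].unique()])
print(regression_params)
```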

In [None]:
regression_params_lst = []
for i, file_name in enumerate(est_win_data_file_names, start=1):

    # Load one part of the estimation-window return data
    est_win_data = pd.read_pickle(
        returns_data_subdir +
        file_prefix +
        file_name
        )

    # Fit one market-model regression per event; each call returns a
    # single-row dict, so the list becomes one row per event
    regression_params = pd.DataFrame.from_dict([
        generate_event_regression_data(estimation_window_df = est_win_data,
                                       event = event_id
                                      ) for event_id in est_win_data['event_id'
                                      ].unique()])

    regression_params_lst.append(regression_params)

    print(f'Regression parameters part {i} generated. '
          'Output added to regression parameters list.')

In [None]:
regression_params = pd.concat(regression_params_lst, ignore_index=True)

regression_params.to_pickle((
        results_data_subdir +
        file_prefix +
        regression_params_file_name
        ))

print(f'''Combined regression parameters generated.
Output written as {regression_params_file_name}''')

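 **Step 2:**

* **Create Event Study Event Window Return Data With Regression Results**
    * Read in each event study return data file
    * Filter this event study return data to include only the event window (rows where **event_wind_flag** equals 1)
    * Merge this with the regression parameters generated above on **event_id**
    * Use the actual event study returns and the regression parameters to compute the abnormal return for each day of each event window
    * Number each event's window days relative to the event date (**relative_date**, from -5 to +5), assuming each event's rows are in date order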

In [None]:
event_window_data_lst = []

for i, file_name in enumerate(event_study_ret_data_file_names, start=1):

    event_study_ret_data = pd.read_pickle(
        returns_data_subdir +
        file_prefix +
        file_name
        )

    # Keep only rows inside each event's event window
    event_window_data = event_study_ret_data[
        event_study_ret_data['event_wind_flag'] == 1]

    # Attach each event's alpha, beta and residual standard error
    event_window_data = event_window_data.merge(
        regression_params,
        how = 'left',
        on='event_id')

    # Abnormal return = actual return - market-model expected return
    event_window_data['abnormal_return'] = event_window_data[
        'sec_return'] - (
            event_window_data['alpha'] + (
            event_window_data['beta'] * event_window_data['mkt_return'])
            )

    # Day relative to the event date (-5 ... +5); assumes each event's
    # window rows are in date order and centered on the event date
    event_window_data['relative_date'] = event_window_data.groupby(
        'event_id').cumcount()-5

    event_window_data_lst.append(event_window_data)

    print(f'Event window data part {i} generated. '
          'Output added to event window data list.')



In [None]:
event_window_data = pd.concat(event_window_data_lst, ignore_index=True)

event_window_data.to_pickle((
        results_data_subdir +
        file_prefix +
        event_window_data_file_name
        ))

print(f'''Combined event window data generated.
Output written as {event_window_data_file_name}.''')

 **Step 3:**

* **Calculate Event Study CAR Data**
    * Define all the CAR window ranges to be calculated
    * Using each day's date relative to the event and the corresponding abnormal return from the event window data, calculate the CAR for every CAR window
    * Additionally, calculate standardized CAR (SCAR) results by dividing each CAR value by the square root of the number of days in the CAR window times the event's residual standard error from its estimation-window regression
    * Because the events span a long time range, return variance may differ between periods within that range; standardizing puts CARs from high- and low-volatility periods on a comparable scale
    * We will run the LLM fine-tuning using both raw and standardized CAR results, to see whether standardizing CAR leads to more or less accurate classification
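In notation, let $\hat\alpha_i$, $\hat\beta_i$ and $\hat\sigma_i$ (`resid_std_error`) come from event $i$'s estimation-window regression. The quantities computed in Steps 2 and 3 are then as follows. The standardization treats the abnormal returns in a window as independent with constant variance $\hat\sigma_i^2$. Under that assumption, the variance of a sum of $L$ of them is approximately $L\hat\sigma_i^2$. This ignores the extra variance from estimating $\hat\alpha_i$ and $\hat\beta_i$.

$$AR_{i,t} = R_{i,t} - \left(\hat\alpha_i + \hat\beta_i R_{m,t}\right)$$

$$CAR_i(\tau_1,\tau_2) = \sum_{t=\tau_1}^{\tau_2} AR_{i,t}, \qquad L = \tau_2 - \tau_1 + 1$$

$$SCAR_i(\tau_1,\tau_2) = \frac{CAR_i(\tau_1,\tau_2)}{\sqrt{L}\,\hat\sigma_i}$$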


In [None]:
event_abnormal_returns = event_window_data[['event_id',
                                            'relative_date',
                                            'abnormal_return']]

# One row per event, carrying the residual standard error used for SCAR
event_car_data = event_window_data[['event_id',
                                    'resid_std_error']
                                   ].drop_duplicates()

# (start, end) relative-day pairs; both ends are inclusive
car_window_ranges = [(-5,5), (-4,4), (-3,3),
                     (-2,2), (-1,1), (-1,2),
                     (-1,3), (-1,4), (-1,5)]

for window in car_window_ranges:

    car_col = f'car_{window[0]}_{window[1]}'   # e.g. 'car_-1_2'
    scar_col = f's{car_col}'                   # e.g. 'scar_-1_2'
    num_days = window[1] - window[0] + 1       # trading days in the window

    car_window = range(window[0], window[1]+1)

    # Abnormal returns falling inside this CAR window
    ret_window = event_abnormal_returns.loc[
        event_abnormal_returns['relative_date'].isin(car_window),
        ['event_id', 'abnormal_return']]

    # CAR = sum of abnormal returns over the window, per event
    car = ret_window.groupby(
        'event_id').sum().reset_index().rename(
            columns = {'abnormal_return':car_col})

    event_car_data = event_car_data.merge(car,
                                          how = 'left',
                                          on='event_id')

    # SCAR = CAR / (sqrt(window length) * residual standard error)
    event_car_data[scar_col] = event_car_data[car_col].div(
        (np.sqrt(num_days)*event_car_data['resid_std_error'])
        )
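A quick numeric check of the window arithmetic, using hypothetical values: one event with a constant abnormal return of 0.001 on every day and a residual standard error of 0.02. For the (-1, 2) window, the inclusive range covers four days. The CAR is therefore 0.004, and the SCAR is 0.004 / (√4 × 0.02) = 0.1. `Series.between` is used here as an equivalent of the `isin(range(...))` filter for integer relative dates.

```python
import numpy as np
import pandas as pd

# Hypothetical abnormal returns for one event, relative days -5..5
ar = pd.DataFrame({'event_id': 'E1',
                   'relative_date': range(-5, 6),
                   'abnormal_return': [0.001] * 11})
resid_std_error = 0.02  # hypothetical estimation-window residual std error

window = (-1, 2)
num_days = window[1] - window[0] + 1               # 4 days, both ends inclusive
in_window = ar['relative_date'].between(*window)   # inclusive on both ends
car = ar.loc[in_window, 'abnormal_return'].sum()
scar = car / (np.sqrt(num_days) * resid_std_error)
print(car, scar)
```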

In [None]:
event_car_data.to_pickle((
    results_data_subdir +
    file_prefix +
    event_car_data_file_name
    ))

print(f'''Event CAR data generated.
Output written as {event_car_data_file_name}.''')