# Analysis of quantitative market variables of US Equities around earnings seasons
### Authored by Sarang Balan under guidance of Jean-Phillipe Matois

This Python notebook conducts a quantitative and qualitative analysis of the market variables of public US companies to discover, analyse and understand potential patterns in the markets during earnings seasons.

### Importing Libraries

In [196]:
### importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn as skl
import yfinance as yf
import html5lib

# import the names directly; a bare `import datetime` would be shadowed by the `datetime` class
from datetime import date, datetime, timedelta



### Collecting earnings dates

The yfinance library accesses earnings reports of public companies from the Yahoo Finance website. Note that this library and its API access are not officially affiliated with Yahoo; the data is nonetheless of acceptable quality for the purposes of this project.

The first step is to create a Python function that takes a ticker and a start/end date and returns the precise dates of the earnings releases in that period.

In [82]:
def get_earnings_dates(ticker: str, start_date: str, end_date: str):
    """
    This function takes three arguments: stock ticker, start date, and end date - to return a list of
    earnings release timestamps within the given date period.
    
    Arguments:
        ticker (str): Ticker of the Equity
        start_date (str): Start date ('YYYY-MM-DD') after which earnings release dates are kept
        end_date (str): End date ('YYYY-MM-DD') up to which earnings release dates are kept
    
    returns:
        list_dates: list of timezone-naive pandas Timestamps of earnings releases
    """
    
    stock = yf.Ticker(ticker)
    
    # copy so that stripping the timezone does not modify the table held by the Ticker object
    df = stock.earnings_dates.copy()
    start_date = datetime.strptime(start_date, '%Y-%m-%d')
    end_date = datetime.strptime(end_date, '%Y-%m-%d')
    # drop the timezone so the index can be compared with the naive datetimes above
    df.index = df.index.tz_localize(None)
    
    # both bounds are midnight: a release on start_date is kept, one on end_date is not
    mask = (df.index > start_date) & (df.index <= end_date)
    mydf = df.loc[mask]
    
    mylist = list(mydf.index)
    
    return mylist
    

In [83]:
mylist = get_earnings_dates(ticker='MSFT', start_date='2018-04-26', end_date='2022-04-26')


In [155]:
mylist

[Timestamp('2022-01-25 11:00:00'),
 Timestamp('2021-10-26 12:00:00'),
 Timestamp('2021-07-27 12:00:00'),
 Timestamp('2021-04-27 12:00:00'),
 Timestamp('2021-01-26 11:00:00'),
 Timestamp('2020-10-27 12:00:00'),
 Timestamp('2020-07-22 12:00:00'),
 Timestamp('2020-04-29 12:00:00'),
 Timestamp('2020-01-29 11:00:00'),
 Timestamp('2019-10-23 12:00:00'),
 Timestamp('2019-07-18 12:00:00'),
 Timestamp('2019-04-24 12:00:00'),
 Timestamp('2019-01-30 11:00:00'),
 Timestamp('2018-10-24 12:00:00'),
 Timestamp('2018-07-19 12:00:00'),
 Timestamp('2018-04-26 12:00:00')]
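
The timezone handling and date mask in `get_earnings_dates` can be tried without calling Yahoo. The following is a minimal sketch on a hand-built index; the helper name `filter_dates` and the sample timestamps are illustrative assumptions, not part of the notebook.

```python
import pandas as pd
from datetime import datetime

def filter_dates(index: pd.DatetimeIndex, start_date: str, end_date: str) -> list:
    """Hypothetical helper: keep timestamps t with start_date < t <= end_date."""
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    # tz_localize(None) drops the timezone but keeps the local wall-clock time
    naive = index.tz_localize(None)
    mask = (naive > start) & (naive <= end)
    return list(naive[mask])

# made-up sample resembling the index of an earnings table
sample = pd.DatetimeIndex(
    ['2022-04-26 12:00', '2022-01-25 11:00', '2018-04-26 12:00', '2018-01-31 11:00'],
    tz='America/New_York',
)
kept = filter_dates(sample, '2018-04-26', '2022-04-26')
print(kept)
```

Because both bounds are parsed as midnight, the release on the start date is kept while the one on the end date is dropped.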

### Collecting stock prices

Now, given a list of earnings release dates, we need a function that pulls in hourly stock prices from T - 2 trading days to T + 2 trading days, where T is the market close on the day of the earnings release.
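
One possible way to list the trading days in such a window is with pandas business-day offsets. This is only a sketch under assumptions: the helper name `trading_window` is hypothetical, and `BDay` skips weekends but knows nothing about exchange holidays. It is not the approach taken by `get_prices`, which shifts calendar days.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

def trading_window(t_date, days_each_side: int = 2) -> list:
    """Hypothetical helper: business days from T - n to T + n (weekends skipped, holidays not)."""
    # normalize() sets the time to midnight so only the date matters
    T = pd.Timestamp(t_date).normalize()
    return [T + BDay(k) for k in range(-days_each_side, days_each_side + 1)]

# the MSFT release of 2022-01-25 fell on a Tuesday
window = trading_window('2022-01-25 11:00:00')
print([d.strftime('%Y-%m-%d') for d in window])
```

For a Tuesday release, T - 2 lands on the previous Friday, which matches the first date in the hourly output shown for MSFT.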

In [194]:
def get_prices(ticker: str, t_date: datetime, find_percentage_change: bool=True):
    """
    Captures hourly closing prices for the release day, the 14 trading hours (two trading days) before it
    and the 14 trading hours after the next open
    
    Arguments:
        ticker (str): Ticker of the stock 
        t_date (datetime): Date of earnings release
        find_percentage_change (bool): Return prices as a ratio to the first hourly price recorded (T-2 days)
        
    return:
        timelist: list of times corresponding to pricelist items
        pricelist: list of prices over 5 or 4 trading days
        
    """
    
    T = pd.Timestamp(t_date).normalize()
    
    # assumption: fetch the hourly bars here rather than rely on a notebook-level `mydat` variable;
    # Yahoo only serves hourly data for a limited recent history
    start = (T - timedelta(days=7)).strftime('%Y-%m-%d')
    end = (T + timedelta(days=7)).strftime('%Y-%m-%d')
    data = yf.Ticker(ticker).history(start=start, end=end, interval='1h')
    
    mylist = []
    timelist = []
    for delta in range(-2,3,1):
    
        Tnow = (T + timedelta(days=delta)).strftime('%Y-%m-%d')
        mydf = data.loc[Tnow]
    
        # no bars on that day (e.g. a weekend): look two more calendar days back
        if mydf.empty and delta < 0:
            Tnow = (T + timedelta(days=delta-2)).strftime('%Y-%m-%d')
            mydf = data.loc[Tnow]
        
        # no bars on that day (e.g. a weekend): look two more calendar days ahead
        if mydf.empty and delta > 0:
            Tnow = (T + timedelta(days=delta+2)).strftime('%Y-%m-%d')
            mydf = data.loc[Tnow]
            
        mylist.extend(list(mydf.Close))
        timelist.extend(list(mydf.index))
    
    if find_percentage_change:
        pricelist = [x / mylist[0] for x in mylist]
    else:
        # return the raw prices unchanged
        pricelist = mylist
    
    return timelist, pricelist
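
The normalisation step divides every price by the first one, which gives a ratio (1.05 means 5% above the first price) rather than a percentage change. The sketch below contrasts the two; `to_relative` and the sample prices are hypothetical and only serve as illustration.

```python
def to_relative(prices: list, as_percent: bool = False) -> list:
    """Hypothetical helper: express each price relative to the first one."""
    base = prices[0]
    if as_percent:
        # percentage change from the first price
        return [(p / base - 1) * 100 for p in prices]
    # plain ratio to the first price, as computed in get_prices
    return [p / base for p in prices]

# made-up hourly closes
sample_prices = [200.0, 210.0, 190.0]
ratios = to_relative(sample_prices)
pct_changes = to_relative(sample_prices, as_percent=True)
print(ratios, pct_changes)
```

Either form makes releases of differently priced stocks comparable on one chart; the ratio form starts at 1 and the percentage form starts at 0.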

In [195]:
get_prices('MSFT', mylist[0])

[Timestamp('2022-01-21 09:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 10:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 11:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 12:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 13:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 14:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-21 15:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 09:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 10:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 11:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 12:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 13:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 14:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-24 15:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-25 09:30:00-0500', tz='America/New_York'),
 Timestamp('2022-01-25 10:30:00-0500', tz='America/New_

In [1]:
data.loc['2021-01-04']

NameError: name 'data' is not defined