# Reactions in stock prices

Hypothesis Nov 16, 2022. 

## Hypothesis

Stock valuations form an intricate network, because companies offer complementary and competing goods. Thus, positive or negative news for one company (raising or lowering its stock price) will lead to knock-on effects for other companies' stock prices.

This code tests whether a price change (percentage change) in one stock is followed, after a time delay, by price changes in other stocks.

Disclaimer: This test can only detect _association_, not causality. While strong patterns between stocks may emerge, I propose that rigorous testing will be necessary to confirm:

1. _why_ one stock may closely correlate with another,
2. whether confirming evidence is spread uniformly over time (and over what time frame the association is expected to hold),
3. that the patterns are not largely explained by general macroeconomic trends.
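
The delayed association described in this section can be illustrated on synthetic data. The following is a minimal sketch, not the notebook's implementation: `lagged_correlation` is a hypothetical helper, the series are randomly generated rather than real prices, and both inputs are assumed to have equal length.

```python
import numpy as np

def lagged_correlation(primary, related, lag):
    """Pearson r between primary[t] and related[t + lag], for lag >= 0."""
    primary = np.asarray(primary, dtype=float)
    related = np.asarray(related, dtype=float)
    if lag < 0 or lag >= len(primary) - 1:
        raise ValueError("lag must be >= 0 and leave at least two samples")
    a = primary[:len(primary) - lag]   # drop the last `lag` points
    b = related[lag:]                  # drop the first `lag` points
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic percent changes (assumption): `related` echoes `primary`
# two steps later, plus independent noise.
rng = np.random.default_rng(0)
primary = rng.normal(0, 0.01, 500)
related = np.empty(500)
related[:2] = rng.normal(0, 0.01, 2)
related[2:] = 0.8 * primary[:-2] + rng.normal(0, 0.005, 498)

r_by_lag = {lag: lagged_correlation(primary, related, lag) for lag in range(5)}
best_lag = max(r_by_lag, key=r_by_lag.get)
print(best_lag, r_by_lag)
```

A high r at one lag only shows association at that delay; it says nothing about why the two series move together.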

## Goal

Identifying frequently occurring percent changes in the "primary stock" after which the "related stock" can be bought and sold at a profit with minimal risk.

## Test setup

1. Cross-matching of all S&P 500 stocks with one another (as a first pass).
2. Calculation of the correlation coefficient between percent changes over intervals: 5m, 10m, 20m, 30m, 1h, 2h, 3h, 1d, 2d (open to revision).
3. Prioritization of the top n performers - expansion of the correlation test to longer / shorter time frames.
4. Association bootstrap test.
__
5. Agreement on success parameters (provided adequate hypotheses are found).
6. Multi-day test run to confirm validity.
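
The notes do not specify how the association test in step 4 is carried out. One possible approach, shown here purely as an assumption, is a shuffle (permutation) test: the second series is reshuffled many times to estimate how often a correlation at least as strong as the observed one arises by chance. The name `shuffle_test_p_value` and the default of 2000 shuffles are illustrative choices.

```python
import numpy as np

def shuffle_test_p_value(a, b, n_shuffles=2000, seed=0):
    """Return (observed r, two-sided p-value) from a shuffle test."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.corrcoef(a, b)[0, 1]
    null_r = np.empty(n_shuffles)
    for k in range(n_shuffles):
        # Shuffling b breaks any pairing with a while keeping b's distribution
        null_r[k] = np.corrcoef(a, rng.permutation(b))[0, 1]
    # The +1 terms keep the estimated p-value strictly above zero
    p = (np.sum(np.abs(null_r) >= abs(observed)) + 1) / (n_shuffles + 1)
    return float(observed), float(p)

# Synthetic example (assumption): y depends on x plus noise
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.8 * x + rng.normal(size=300)
r_obs, p_val = shuffle_test_p_value(x, y)
print(r_obs, p_val)
```

Shuffling treats the observations as exchangeable, which may not hold for intraday returns, so a small p-value here should be read as a screening signal rather than confirmation.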

## Meeting Notes

- Take big jumps in one company's price and look for big jumps in other stocks at a phase shift (time lag).
- Limit to well-correlated companies? e.g. Pepsi and Coke.
- Maybe look into smaller companies.
- Compare sectors.
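
The first note above could be explored along the following lines. This is only a sketch under stated assumptions: a "big jump" is taken to be a percent change at least `threshold` standard deviations from the mean (2.0 is an arbitrary default), and `response_after_jumps` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def response_after_jumps(primary, related, shift, threshold=2.0):
    """Find big moves in `primary` and the `related` change `shift` steps later.

    Returns (jump_idx, aligned), where `aligned` is positive when the
    related stock later moved in the same direction as the primary jump.
    """
    primary = np.asarray(primary, dtype=float)
    related = np.asarray(related, dtype=float)
    z = (primary - primary.mean()) / primary.std()
    jump_idx = np.flatnonzero(np.abs(z) >= threshold)
    # Keep only jumps that still have a related observation `shift` steps ahead
    jump_idx = jump_idx[jump_idx + shift < len(related)]
    aligned = np.sign(primary[jump_idx]) * related[jump_idx + shift]
    return jump_idx, aligned

# Synthetic percent changes (assumption): related follows primary one step later
rng = np.random.default_rng(2)
primary = rng.normal(0, 0.01, 1000)
related = np.empty(1000)
related[0] = 0.0
related[1:] = 0.5 * primary[:-1] + rng.normal(0, 0.005, 999)

jump_idx, aligned = response_after_jumps(primary, related, shift=1)
print(len(jump_idx), aligned.mean())
```

A clearly positive mean of `aligned` would suggest that the related stock tends to follow the primary stock's large moves at that shift.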

## Libraries

In [2]:
#Get Data
import yfinance as yf
import bs4 as bs
import requests

#Visualize Data
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
import matplotlib.dates as mdates

#Manipulate Data
from numpy import *
from numpy.random import *
from datascience import *
import pandas as pd

#Other
from datetime import datetime
import time

#Style setup
plt.style.use('bmh')
%matplotlib inline

# Force display of all values INACTIVE
from IPython.core.interactiveshell import InteractiveShell
#InteractiveShell.ast_node_interactivity = "all"

import warnings
warnings.filterwarnings("ignore")

from tqdm import tqdm

## Essential functions

1. `historical(stock, period, interval, tm_val)` - Fetch price history for a ticker from yfinance and preprocess it
2. `get_snp500()` - Fetch the S&P 500 tickers from Wikipedia
3. `percent_change(val_1, val_2)` - Get the relative change between two values (as a fraction)
4. `standardize(arr_n)` - Standardize a numerical array
5. `offset(stock_a, stock_b, period, interval, n)` - Get r for stock_a predicting stock_b n intervals later (caveat: a try / except clause returns a NaN placeholder on any failure)
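
The r returned by `offset` is the mean of the product of two standardized series. Assuming `std` resolves to NumPy's version (population standard deviation, `ddof=0`), this equals Pearson's correlation coefficient. The check below is a small sketch with made-up numbers; `standardized_product_r` is a hypothetical name.

```python
import numpy as np

def standardized_product_r(x, y):
    """Mean of the product of standardized values (population std)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()   # np.std defaults to ddof=0
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))

# Made-up percent changes, for illustration only
x = np.array([0.010, -0.004, 0.007, 0.002, -0.009, 0.005])
y = np.array([0.006, -0.001, 0.004, 0.003, -0.007, 0.001])

r_manual = standardized_product_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(r_manual, r_numpy)  # the two values should agree up to rounding
```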

In [3]:
def parse_prefix(line, fmt):
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        if len(v.args) > 0 and v.args[0].startswith('unconverted data remains: '):
            line = line[:-(len(v.args[0]) - 26)]
            t = time.strptime(line, fmt)
        else:
            raise
    return t

def historical(stock, period, interval, tm_val):
    #valid TM_VAL: date, time
    # valid PERIOD: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
    # valid INTERVAL: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
    
    #Fetch data
    price_history = yf.download(tickers=stock, period=period, interval=interval, progress=False) # period: total time frame, interval: smallest subdivision

    time_series = list(price_history['Open'])
    time_series_h = list(price_history['High'])
    time_series_l = list(price_history['Low'])
    
    #Format time
    dt_list = make_array()
    
    first_day = ""
    first = 0
    for dt in list(price_history.index):
        if first == 0:
            first_day = str(dt)
        first += 1
        if tm_val == "date":
            date = parse_prefix(str(dt), '%Y-%m-%d %H:%M:%S')
            day = "{}-{}-{}".format(date[0], date[1], date[2])
            dt_list = append(dt_list, day)
        elif tm_val == "time":
            date = parse_prefix(str(dt), '%Y-%m-%d %H:%M:%S')
            day = "{}:{}".format(date[3], date[4])
            dt_list = append(dt_list, day)
            
    history = Table().with_columns({"Date": dt_list, "Valuation": time_series, "High": time_series_h, "Low": time_series_l})
    
    return first_day, history

def get_snp500():
    # Scrape the constituents table from Wikipedia; the ticker is in the first column
    resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})

    tickers = []

    for row in table.find_all('tr')[1:]:
        ticker = row.find_all('td')[0].text
        tickers.append(ticker.strip())  # drop the trailing newline

    return tickers

def percent_change(val_1, val_2):
    return (val_2 - val_1) / val_1

def standardize(arr_n):
    avr_stock = average(arr_n)
    std_stock = std(arr_n)
    stdized_stock = (arr_n - avr_stock) / std_stock
    return stdized_stock

def stock_price_to_percent(dict_n):
    # dict_n is the (first_day, history) tuple returned by historical(), not a dict
    avr_stock_a = dict_n[1].column("Valuation")
    percentage_stock_a = make_array()
    for i in range(0, len(avr_stock_a) - 1): 
        percentage_stock_a = append(percentage_stock_a, percent_change(avr_stock_a[i], avr_stock_a[i + 1]))
    return percentage_stock_a
    
def offset(stock_a, stock_b, period, interval, n):
    
    # Get data
    try:
        stock_a_hist = historical(stock_a, period, interval, "date")
        stock_b_hist = historical(stock_b, period, interval, "date")

        # Convert to percentage & standardize
        stock_a_perc = stock_price_to_percent(stock_a_hist)
        stock_b_perc = stock_price_to_percent(stock_b_hist)
    
        # Offset by n units
        stock_a_offset = stock_a_perc[:len(stock_a_perc)-n]
        stock_b_offset = stock_b_perc[n:]

        stock_a_std = standardize(stock_a_offset)
        stock_b_std = standardize(stock_b_offset)
    
        stock_compared = Table().with_columns(stock_a, stock_a_std, stock_b, stock_b_std)
    
        # stock_compared.scatter(stock_a)
    
        r = mean(stock_a_std * stock_b_std)
        return r
    
    except Exception:
        # Any failure (download error, empty or mismatched data) yields NaN
        return float("nan")

### 1 Day correlation

In [5]:
short_tick = get_snp500()[0:2]

def cross_correlate(stocks, interval, test_length, offset_interval):

    all_correlations = Table(make_array("Stock", "Predicting", "r"))

    # Iterate over the `stocks` argument rather than the global `short_tick`
    for i in tqdm(stocks):
        for j in stocks:
            all_correlations = all_correlations.with_row([i, j, offset(i, j, test_length, interval, offset_interval)])
    return all_correlations

In [None]:
cross_correlate(short_tick, "1d", "1y", 1).show()