## On which weekday should we purchase VTI (Vanguard Total Stock Market ETF)?

For each weekday (weekday 0 = Monday, ..., weekday 4 = Friday), compare the average price of VTI to find out on which day to purchase index funds, for anyone who wants to avoid checking the price frequently or wants to automate purchases.

### Sampling:
1. Collect 30 weeks of data (30 each of Mondays, Tuesdays, Wednesdays, Thursdays, and Fridays), assuming there is no bias, and then apply a statistical test to see what we get.

2. On the assumption that election results strongly affect the stock market, use data from before the Democratic primaries (up to Jan 31st, 2020) and then apply a statistical test to see what I get.

3. News of the coronavirus might also have a large effect on stock prices. If that seems to be the case, I will look for a way to mitigate it, probably by using data from last year (up to Nov 29th, 2019).

**To keep the sample unbiased, I will remove weeks that contain a holiday: if Monday is a holiday, for example, Tuesday might effectively function as Monday, which would bias the sample.**
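The holiday filter can be sketched with the standard library alone. This is not the notebook's own `is_sequence` approach but an alternative under one assumption: the input contains trading days only, so a week with a holiday shows up as a week with fewer than five dates. The function name `full_trading_weeks` is hypothetical.

```python
from collections import defaultdict
from datetime import date

def full_trading_weeks(trading_days):
    """Group trading days by ISO week and keep only Monday-to-Friday weeks."""
    weeks = defaultdict(list)
    for d in trading_days:
        iso_year, iso_week, _ = d.isocalendar()
        weeks[(iso_year, iso_week)].append(d)
    # A week qualifies only if all five weekdays (0 = Monday ... 4 = Friday) traded.
    return [
        sorted(days)
        for _, days in sorted(weeks.items())
        if {d.weekday() for d in days} == {0, 1, 2, 3, 4}
    ]

# Jan 20th, 2020 (a Monday) was Martin Luther King Jr. Day, so that week is dropped.
trading_days = [date(2020, 1, day) for day in (13, 14, 15, 16, 17, 21, 22, 23, 24)]
kept = full_trading_weeks(trading_days)
print(kept)
```

Grouping by ISO week avoids the month and year rollover arithmetic, at the cost of relying on `datetime` objects rather than the `[year, month, day]` lists used in the cells below.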

Since a true daily average is hard to compute from the available data, define the average in two ways:
1. (daily max + daily min) / 2
2. Closing value
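As a minimal sketch of the two definitions, the snippet below averages each proxy per weekday. The rows are made-up numbers, not real VTI quotes, and the helper `mean_by_weekday` is a hypothetical name used only for illustration.

```python
# Hypothetical rows: (weekday, high, low, close). Not real VTI data.
rows = [
    (0, 101.0, 99.0, 100.5),
    (0, 103.0, 101.0, 102.5),
    (1, 102.0, 100.0, 100.0),
    (1, 104.0, 100.0, 103.0),
]

def mean_by_weekday(rows, price):
    """Average price(row) separately for each weekday found in rows."""
    totals, counts = {}, {}
    for row in rows:
        weekday = row[0]
        totals[weekday] = totals.get(weekday, 0.0) + price(row)
        counts[weekday] = counts.get(weekday, 0) + 1
    return {weekday: totals[weekday] / counts[weekday] for weekday in totals}

# Definition 1: midpoint of the daily high and low.
midrange = mean_by_weekday(rows, lambda r: (r[1] + r[2]) / 2)
# Definition 2: closing value.
closing = mean_by_weekday(rows, lambda r: r[3])

print(midrange)  # {0: 101.0, 1: 101.5}
print(closing)   # {0: 101.5, 1: 101.5}
```

The two proxies can rank weekdays differently, as the toy numbers show, which is why both are reported for every sample.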

In [15]:
import pandas as pd

In [135]:
# utility functions

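def days(year: int, month: int):
    # number of days in the given month (1-12) of the given year
    days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

    # Gregorian rule: divisible by 4, except centuries not divisible by 400
    is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if is_leap and month == 2:
        return 29
    return days_in_month[month - 1]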

def eval_date(date: str):
    return list(map(int, date.split("-")))

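def is_sequence(dates):
    # dates: five [year, month, day] lists, newest first (Friday ... Monday).
    # Returns True when each date is one calendar day before the previous one,
    # i.e. the five rows form a holiday-free Monday-to-Friday week.
    # Only the day of the month is compared.
    n = dates[0][2]
    for i in range(4):
        c_date = dates[i]
        if c_date[2] != n:
            return False
        if c_date[2] == 1:
            # the previous day is the last day of the previous month
            if c_date[1] != 1:
                month = c_date[1] - 1
                year = c_date[0]
            else:
                month = 12
                year = c_date[0] - 1
            n = days(year, month)
        else:
            n = c_date[2] - 1

    return dates[4][2] == n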

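def collect_data(mem, vals):
    # vals: one week of rows, newest first (Friday, Thursday, ..., Monday)
    # mem: five lists indexed by weekday (0 = Monday ... 4 = Friday)
    [f, h, w, t, m] = vals
    mem[0].append(m)
    mem[1].append(t)
    mem[2].append(w)
    mem[3].append(h)
    mem[4].append(f)
    return mem

# each row is [Date, Open, High, Low, Close]

def avg_1(data):
    # per-weekday mean of (High + Low) / 2
    return list(map(lambda weekday : sum(map(lambda x : (x[2] + x[3]) / 2, weekday)) / len(weekday), data))

def avg_2(data):
    # per-weekday mean of Close
    return list(map(lambda weekday : sum(map(lambda x : x[4], weekday)) / len(weekday), data))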

### Hypothesis 1:
Let u0 = average price of VTI on Mondays, u1 = average price of VTI on Tuesdays, ..., u4 = average price of VTI on Fridays.
1. H0: u0 = u1 = u2 = u3 = u4
2. H1: at least one of the u's differs from the others
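The notebook does not name a specific test for H0. One common choice for comparing several group means is a one-way ANOVA, so the sketch below computes its F statistic in plain Python. Both the choice of test and the toy numbers are assumptions. ANOVA also treats observations as independent, which prices from the same week are not, so a paired or repeated-measures design may be more appropriate for this data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Toy data with three groups; not VTI prices.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(toy_groups)
print(f_stat)  # 3.0
```

A large F relative to the F distribution with (k - 1, N - k) degrees of freedom would be evidence against H0. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and a p-value.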

## Sample 1:
30 weeks (30 each of Mondays, Tuesdays, Wednesdays, Thursdays, and Fridays) up to 3/6/2020

In [136]:
vtis = pd.read_csv('VTI.csv', usecols=['Date','Open','High','Low','Close'])
vtis['Date'] = vtis['Date'].apply(eval_date)
vtis = vtis.iloc[::-1]

data = [[], [], [], [], []]

# remove weeks with holidays
while len(data[0]) < 30:
    if is_sequence(list(vtis.head(5)['Date'])):
        data = collect_data(data, vtis.head(5).values)
        vtis = vtis[5:]
    else:
        vtis = vtis[1:]

### 1. average = (max + min) / 2

In [137]:
avg_1(data)

[155.44950028333332,
 155.5116669166667,
 155.40433393333336,
 155.51316656666665,
 155.42449923333334]

### 2. average = closing values

In [138]:
avg_2(data)

[155.58700050000002,
 155.30366670000004,
 155.5786672,
 155.58900086666668,
 155.53233336666668]

## Sample 2:
30 weeks prior to the Democratic primaries (up to Jan 31st, 2020).

In [139]:
vtis = pd.read_csv('VTI.csv', usecols=['Date','Open','High','Low','Close'])
vtis['Date'] = vtis['Date'].apply(eval_date)
vtis = vtis.iloc[::-1]

while list(vtis.head(1)['Date']) != [[2020, 1, 31]]:
    vtis = vtis[1:]

data = [[], [], [], [], []]

# remove weeks with holidays
while len(data[0]) < 30:
    if is_sequence(list(vtis.head(5)['Date'])):
        data = collect_data(data, vtis.head(5).values)
        vtis = vtis[5:]
    else:
        vtis = vtis[1:]

### 1. average = (max + min) / 2

In [141]:
avg_1(data)

[153.08966698333333,
 153.18916681666667,
 153.07583403333334,
 153.43566689999997,
 153.82699938333332]

### 2. average = closing values

In [142]:
avg_2(data)

[153.0713338,
 153.18899993333332,
 153.2286672,
 153.66666709999996,
 153.80766653333333]

## Sample 3:
30 weeks before the coronavirus outbreak (up to Nov 29th, 2019).

In [145]:
vtis = pd.read_csv('VTI.csv', usecols=['Date','Open','High','Low','Close'])
vtis['Date'] = vtis['Date'].apply(eval_date)
vtis = vtis.iloc[::-1]

while list(vtis.head(1)['Date']) != [[2019, 11, 29]]:
    vtis = vtis[1:]

data = [[], [], [], [], []]

# remove weeks with holidays
while len(data[0]) < 30:
    if is_sequence(list(vtis.head(5)['Date'])):
        data = collect_data(data, vtis.head(5).values)
        vtis = vtis[5:]
    else:
        vtis = vtis[1:]

### 1. average = (max + min) / 2

In [146]:
avg_1(data)

[150.0583333166667,
 150.16183323333334,
 149.99883434999998,
 150.2390004166667,
 150.63216655000002]

### 2. average = closing values

In [147]:
avg_2(data)

[150.08533376666665,
 150.17699993333335,
 150.14166766666665,
 150.4310001666667,
 150.72333326666666]

### Hypothesis 2
However, there may always be some dominant narrative in society that affects the stock market significantly, so it may be better not to remove those factors.
Indeed, this is another hypothesis to test, by comparing the results obtained when trying to eliminate one particular potential factor at a time.
**If there is no significant difference between the data collected by removing factor A and the data collected by removing factor B, then we can conclude that the best day to buy VTI is not affected by a dominant narrative created by breaking news.**
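A first, informal look at this comparison can be sketched by finding the cheapest weekday in each sample. The averages below are copied from the `avg_1` and `avg_2` outputs above, rounded to 4 decimals. The helper `cheapest_weekday` is a hypothetical name, and this is only a descriptive summary, not a significance test.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Weekday averages from the notebook outputs, rounded to 4 decimals.
results = {
    "Sample 1": {
        "midrange": [155.4495, 155.5117, 155.4043, 155.5132, 155.4245],
        "close": [155.5870, 155.3037, 155.5787, 155.5890, 155.5323],
    },
    "Sample 2": {
        "midrange": [153.0897, 153.1892, 153.0758, 153.4357, 153.8270],
        "close": [153.0713, 153.1890, 153.2287, 153.6667, 153.8077],
    },
    "Sample 3": {
        "midrange": [150.0583, 150.1618, 149.9988, 150.2390, 150.6322],
        "close": [150.0853, 150.1770, 150.1417, 150.4310, 150.7233],
    },
}

def cheapest_weekday(averages):
    """Return (weekday name, max - min spread) for five weekday averages."""
    low = min(range(len(averages)), key=lambda i: averages[i])
    return WEEKDAYS[low], max(averages) - min(averages)

summary = {
    (sample, method): cheapest_weekday(averages)
    for sample, methods in results.items()
    for method, averages in methods.items()
}

for (sample, method), (day, spread) in summary.items():
    print(f"{sample:9} {method:9} cheapest: {day:10} spread: {spread:.2f}")
```

With these figures the midrange average is lowest on Wednesday in all three samples, while the closing average is lowest on Tuesday in Sample 1 and on Monday in Samples 2 and 3. Every spread is under one dollar, so without a formal test these differences may well be noise.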

### Motivation:
I read Sapiens: A Brief History of Humankind, Homo Deus: A Brief History of Tomorrow, and 21 Lessons for the 21st Century, all by Yuval Noah Harari. One argument the author makes in these books is that the stock market is the largest and most complex algorithm humans have invented. I found this idea fascinating and wanted to play with stock market data. Also, for my personal finances, I would like to know when I should purchase index funds.