# Part 05 - Pandas applied for stock market analysis

One of the most interesting uses of Python is as a data analysis system. Its plotting tools are top notch, and along with libraries such as pandas, there is a wealth of statistical analysis tools available.

For a more complete look at data analysis using pandas, see [this example working through earth science data](https://www.scipy-lectures.org/intro/scipy.html#statistics-and-random-numbers-scipy-stats). I have a personal interest in stocks and shares, and I'm an avid fan of SpaceX and Tesla, so let's look at some data on Tesla.

In [None]:
# Pandas is like a programmable spreadsheet, you have named columns of data and an index
import pandas as pd
# This next line is needed to fix a current bug in pandas_datareader
pd.core.common.is_list_like = pd.api.types.is_list_like
# Pandas datareader is an interface to lots of kinds of sources of data, including stocks
import pandas_datareader.data as web

# It should be noted that Pandas datareader for stock information breaks all the time,
# all I can promise is that this works today, 6th July 2018
 
# Let's get Tesla stock data; Tesla's ticker symbol is TSLA on the US NASDAQ exchange
# First argument is the series we want, second is the source ("stooq" works at the moment).
# Optional start= and end= arguments restrict the date range; we leave them out here
tesla = web.DataReader("TSLA.US", "stooq")

# We'll also extract the Standard & Poor's 500 index (tracks 500 large US-listed companies)
spx = web.DataReader("^SPX", "stooq")

# What kind of data did we get? Let's print the header of the table and the first few rows
print(tesla.head())

Great, opening and closing prices as well as daily highs/lows and the volume of stocks traded, all indexed by the date of the information. Notice that some dates are missing (weekends/holidays).

Note that the last cell took a long time to run. That's because websites with stock data limit the rate at which you can download information. This is a case where the data should be downloaded once, then pickled for later processing (and reprocessing).
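
As a minimal sketch of that download-once idea: the helper `load_cached` and the stand-in `fake_fetch` below are hypothetical names invented for illustration, and the prices are made up. In the notebook, the fetch function would wrap the `web.DataReader` call instead.

```python
import os
import tempfile

import pandas as pd


def load_cached(path, fetch):
    """Return the DataFrame pickled at `path`, calling `fetch()` only if it is missing."""
    if os.path.exists(path):
        return pd.read_pickle(path)
    frame = fetch()
    frame.to_pickle(path)
    return frame


# Stand-in for the slow download (made-up values, newest row first).
# In the notebook this could be: lambda: web.DataReader("TSLA.US", "stooq")
calls = []

def fake_fetch():
    calls.append(1)
    return pd.DataFrame(
        {"Close": [310.0, 305.0]},
        index=pd.to_datetime(["2018-07-06", "2018-07-05"]),
    )


cache_path = os.path.join(tempfile.mkdtemp(), "tesla.pkl")
first = load_cached(cache_path, fake_fetch)   # runs the fetch and writes the pickle
second = load_cached(cache_path, fake_fetch)  # reads the pickle, no second fetch
```

Deleting the pickle file forces a fresh download the next time the cell runs.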

We need a basis for comparison for the performance of an individual stock, so we also downloaded the S&P 500 index (a market-capitalisation-weighted index of 500 large US-listed companies).

In [None]:
print(spx.head())

OK, so the most recent working day has a NaN (not a number) value, probably because the aggregate volume data is not available yet. This is typical of stock market data, which is full of interesting features.
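
A quick way to see where the gaps are is to count the NaN values per column. The sketch below uses a small made-up table standing in for `spx`, so the numbers are purely illustrative.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for spx: newest row first, with the latest volume missing
demo = pd.DataFrame(
    {"Close": [2700.0, 2690.0, 2680.0], "Volume": [np.nan, 2.0e9, 1.5e9]},
    index=pd.to_datetime(["2018-07-06", "2018-07-05", "2018-07-03"]),
)

missing_per_column = demo.isna().sum()  # number of NaNs in each column
complete_rows = demo.dropna()           # only the rows with no NaN at all

print(missing_per_column)
print(complete_rows)
```

Since only the closing price is plotted here, a missing volume does no harm, so dropping rows is optional.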

Let's try plotting the data and comparing Tesla with the index:

In [None]:
import matplotlib.pyplot as plt   # Import matplotlib
# Plot the full closing-price history of both series on the same axes

spx["Close"].plot(grid = True) # Plot the closing price of the S&P 500
tesla["Close"].plot(grid = True) # Plot the closing price of Tesla

Oops, the S&P 500 has data going back to the 1800s. Let's filter that data, going back only as far as the start of 2016:

In [None]:
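import datetime

# The stooq data is ordered newest-first, so the slice runs from today back to 2016
start = datetime.datetime.today()
end = datetime.datetime(2016,1,1)
spx.loc[start:end]["Close"].plot(grid = True)
tesla.loc[start:end]["Close"].plot(grid = True)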
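
The slice above runs from the later date to the earlier one because the rows come back newest first. The sketch below, on made-up data, shows the same slice both ways: directly on a newest-first index, and on an oldest-first index after `sort_index()`.

```python
import pandas as pd

# Ten made-up daily rows, newest first, mimicking the order of the downloaded data
dates = pd.date_range("2015-12-28", periods=10, freq="D")[::-1]
demo = pd.DataFrame({"Close": range(10, 0, -1)}, index=dates)

later = pd.Timestamp("2016-01-06")
earlier = pd.Timestamp("2016-01-01")

# On a newest-first index the slice goes from the later date to the earlier one
newest_first_slice = demo.loc[later:earlier]

# After sorting oldest-first, the same rows are selected the other way round
oldest_first_slice = demo.sort_index().loc[earlier:later]

print(len(newest_first_slice), len(oldest_first_slice))
```

Both slices include their end points, so each one holds the six rows from 1 to 6 January 2016.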

Let's normalise by the price today

In [None]:
start = datetime.datetime(2018,7,6) #Today
end = datetime.datetime(2016,1,1)
(spx.loc[start:end]["Close"]/spx.iloc[0]["Close"]).plot(grid = True)
(tesla.loc[start:end]["Close"]/tesla.iloc[0]["Close"]).plot(grid = True)

Nice! We can see that over this time period, Tesla performed roughly in line with the market average, but with significantly more volatility.
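
One simple way to put a number on volatility is the standard deviation of the daily percentage changes. The sketch below uses two made-up price series with the same overall return, and the helper name `daily_volatility` is hypothetical. It assumes prices ordered oldest first, which `sort_index()` would give for the downloaded data.

```python
import pandas as pd

# Made-up closing prices, oldest first; both go from 100 to 104
steady = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0])
jumpy = pd.Series([100.0, 110.0, 95.0, 112.0, 104.0])


def daily_volatility(close):
    """Standard deviation of the day-to-day fractional price changes."""
    return close.pct_change().dropna().std()


print(daily_volatility(steady))
print(daily_volatility(jumpy))
```

The two series end in the same place, but the second takes a much rougher path to get there.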

Let's plot Tesla's performance relative to the S&P 500 over that time.

In [None]:
start = datetime.datetime(2018,7,6) #Today
end = datetime.datetime(2016,1,1)
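# Rescale Tesla's price by how far the index has moved relative to its latest level,
# then subtract Tesla's latest price: the result is in dollars, zero on the newest row
data = tesla.loc[start:end]["Close"]/spx.loc[start:end]["Close"]*spx.iloc[0]["Close"] - tesla.iloc[0]["Close"]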
data.plot(grid = True)
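
An alternative, unit-free way to express relative performance is to normalise both series and take their ratio. This is a sketch on made-up numbers, not the calculation used above: values above 1 mean the stock was worth more, relative to the index, than it is on the newest row.

```python
import pandas as pd

# Made-up closing values, newest first, on matching dates
dates = pd.to_datetime(["2018-07-06", "2018-07-05", "2018-07-03"])
stock = pd.Series([300.0, 330.0, 270.0], index=dates)
index_level = pd.Series([2000.0, 2000.0, 1800.0], index=dates)

# Normalise each series by its newest value, then divide one by the other
stock_norm = stock / stock.iloc[0]
index_norm = index_level / index_level.iloc[0]
relative = stock_norm / index_norm

print(relative)
```

pandas aligns the two series by date before dividing, so a date present in only one of them would produce NaN rather than a misaligned ratio.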

Warren Buffett (the famous American investor, and the 3rd wealthiest person in the world at the time of writing) made a \$1M bet that a low-cost S&P 500 index tracker (a simple fund that invests in all the companies in the index) would outperform actively managed hedge funds over 10 years. He won this bet when the index tracker returned about 7\% a year and the hedge funds about 2.2\%.

This is anecdotal evidence that even stocks that have performed amazingly at times (like Tesla) still average out to the index.

## Extra credit

For more information on this, see this example:
http://earthpy.org/pandas-basics.html


- What day of the week is the best day to buy Tesla? Just take all data for the week, normalise it by Monday's value, and then plot the average value over the rest of the week.
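
One possible starting point for this exercise, shown on two made-up weeks of prices rather than the real Tesla data. It assumes rows ordered oldest first and that every week starts on a Monday, so weeks shortened by holidays would need extra care.

```python
import pandas as pd

# Two made-up Monday-to-Friday weeks of closing prices, oldest first
dates = pd.date_range("2018-06-04", periods=10, freq="B")
close = pd.Series(
    [100.0, 102.0, 101.0, 103.0, 104.0, 200.0, 198.0, 202.0, 206.0, 208.0],
    index=dates,
)

# Label every row with the Monday that starts its week
week_start = close.index - pd.to_timedelta(close.index.dayofweek, unit="D")

# Divide each day's close by the first close of its week (Monday here)
monday_close = close.groupby(week_start).transform("first")
relative_to_monday = close / monday_close

# Average over weeks for each weekday (0 = Monday, ..., 4 = Friday)
by_weekday = relative_to_monday.groupby(close.index.dayofweek).mean()
print(by_weekday)
```

Plotting `by_weekday` then gives the average shape of a week.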