# Financial Analysis with the `stock_analysis` package
In this chapter we will build the [`stock_analysis` package](https://github.com/stefmolin/stock-analysis) using everything we learned in the book so far, and use it for some basic technical analysis of financial assets.

## Data Extraction
The `StockReader` class in the `stock_analysis` package makes it easy to collect financial data, like stock market or bitcoin prices. We simply have to create an instance by specifying the date range. Then, we can grab the data we wish:

In [6]:
!pip3 install git+https://github.com/stefmolin/stock-analysis.git@master

Collecting git+https://github.com/stefmolin/stock-analysis.git@master
  Cloning https://github.com/stefmolin/stock-analysis.git (to revision master) to /private/var/folders/sb/c6y3g1vx66scbk6ld_kzcxsh0000gn/T/pip-req-build-9tmge05r
Building wheels for collected packages: stock-analysis
  Running setup.py bdist_wheel for stock-analysis ... done
  Stored in directory: /private/var/folders/sb/c6y3g1vx66scbk6ld_kzcxsh0000gn/T/pip-ephem-wheel-cache-46ii44th/wheels/1c/fa/6a/def46c020b082a675eab6de594dcbe1c14e1a4ce51138e8763
Successfully built stock-analysis
You are using pip version 18.1, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [7]:
import stock_analysis

ModuleNotFoundError: No module named 'stock_analysis'

In [8]:
reader = stock_analysis.StockReader('2017-01-01', '2018-12-31')

# get bitcoin data
bitcoin = reader.get_bitcoin_data()

# get faang data
fb, aapl, amzn, nflx, goog = (
    reader.get_ticker_data(ticker) for ticker in ['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
)

# get S&P 500 data
sp = reader.get_index_data()

NameError: name 'stock_analysis' is not defined

## EDA
We can group our assets together in dataframes in order to perform EDA:

In [None]:
from stock_analysis.utils import group_stocks, describe_group

faang = group_stocks(
    {
        'Facebook' : fb, 
        'Apple' : aapl, 
        'Amazon' : amzn, 
        'Netflix' : nflx, 
        'Google' : goog
    }
)

faang_sp = group_stocks(
    {
        'Facebook' : fb, 
        'Apple' : aapl, 
        'Amazon' : amzn, 
        'Netflix' : nflx, 
        'Google' : goog,
        'S&P 500' : sp
    }
)

bit_sp = group_stocks(
    {
        'Bitcoin' : bitcoin,
        'S&P 500' : sp
    }
)

all_assets = group_stocks(
    {
        'Bitcoin' : bitcoin,
        'S&P 500' : sp,
        'Facebook' : fb, 
        'Apple' : aapl, 
        'Amazon' : amzn, 
        'Netflix' : nflx, 
        'Google' : goog
    }
)

This makes it easy to describe them all at once:

In [None]:
describe_group(all_assets).loc['close',]

It's a dataframe, so we can transpose it to make it easier to view. Here we see that we have more entries for bitcoin than for the S&P 500 because bitcoin trades every day, while the stock market is closed on weekends and holidays.

In [None]:
describe_group(bit_sp).loc['close',].T

We can compare the FAANG stocks to each other:

In [None]:
describe_group(faang).loc[['close', 'volume'],]

### Visualizations
The `StockVisualizer` class makes it easy to create a variety of visualizations for financial analysis. We simply pass in the dataframe of the asset we want to visualize:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

netflix_viz = stock_analysis.StockVisualizer(nflx)

We can use the `pairplot()` method to see the `seaborn.pairplot()` of the OHLC data:

In [None]:
netflix_viz.pairplot()

*Note: to look at quarterly performance in the pairplot as we did in [chapter 6](https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas/blob/master/ch_06/1-introduction_to_seaborn.ipynb), add a `quarter` column to the Netflix dataframe before creating the `StockVisualizer` object and pass `hue='quarter'` to the `pairplot()` method.*

We can plot the evolution of a column over time to see how the price changed and add reference lines as we please:

In [None]:
ax = netflix_viz.evolution_over_time(
    'close', 
    figsize=(10, 4), 
    legend=False, 
    title='Netflix closing price over time'
)
netflix_viz.add_reference_line(
    ax, 
    x=nflx.high.idxmax(), 
    color='k', 
    linestyle=':', 
    label=f'highest value ({nflx.high.idxmax():%b %d})',
    alpha=0.5
)
ax.set_ylabel('price ($)')

July was a rough month for Netflix with after hours trading:

In [None]:
netflix_viz.after_hours_trades()

Facebook had many more rough months:

In [None]:
# compare to FB
stock_analysis.StockVisualizer(fb).after_hours_trades()

We can look at the volume traded and the stock price over time to see that, while the Netflix stock price has trended up, so has the volume traded:

In [None]:
netflix_viz.trade_volume(True)

We can use the moving average to smooth out the trend for the stock. Notice the fall Netflix took in July pushed the moving averages below the stock price while they had been above the price in the past.

In [None]:
ax = netflix_viz.moving_average('close', ['60D', '200D'])
netflix_viz.shade_region(
    ax, x=('2018-07-01', '2018-07-31'), color='blue', alpha=0.2
)
plt.suptitle('Netflix Closing Price')
plt.ylabel('price ($)')

The moving average may be used to determine support and resistance levels. Support exerts upward pressure on the stock price (lower prices get people to buy) and resistance exerts downward force on the stock price (high prices cause people to sell):

In [None]:
ax = stock_analysis.StockVisualizer(nflx.loc['2018']).evolution_over_time(
    'close', figsize=(15, 3), legend=False, title='Understanding support and resistance'
)

ref_lines = [
    (315, 0.55, 0.77, 'g', 'support (buy now)'),
    (250, 0.12, 0.17, 'g', ''),
    (280, 0.25, 0.3, 'g', ''),
    (280, 0.78, 0.83, 'g', ''),
    (260, 0.85, 0.92, 'g', ''),
    (230, 0.93, 0.96, 'g', ''),
    (385, 0.46, 0.53, 'g', ''),
    (420, 0.43, 0.55, 'r', 'resistance (sell now)'),
    (285, 0.05, 0.16, 'r', ''),
    (335, 0.17, 0.38, 'r', ''),
    (385, 0.6, 0.77, 'r', ''),
    (330, 0.8, 0.83, 'r', ''),
    (290, 0.86, 0.96, 'r', '')
]

for y, xmin, xmax, color, label in ref_lines:
    ax.axhline(y, xmin, xmax, color=color, label=label)

arrows = [
    ('2018-08-26', 315, 0, 20, 'g'),
    ('2018-02-17', 250, 0, 10, 'g'),
    ('2018-04-02', 290, 0, 20, 'g'),
    ('2018-11-03', 280, 0, 10, 'g'),
    ('2018-12-01', 260, 0, 10, 'g'),
    ('2018-12-29', 230, 0, 10, 'g'),
    ('2018-06-10', 385, 0, 20, 'g'),
    ('2018-07-20', 420, 0, -20, 'r'),
    ('2018-02-04', 285, 0, -10, 'r'),
    ('2018-03-02', 335, 0, -10, 'r'),
    ('2018-04-27', 335, 0, -10, 'r'),
    ('2018-10-10', 385, 0, -10, 'r'),
    ('2018-11-12', 330, 0, -10, 'r'),
    ('2018-12-20', 290, 0, -10, 'r')
]

for date, y, growx, growy, color in arrows:
    ax.arrow(date, y, growx, growy, width=2, alpha=0.5, color=color)

plt.ylabel('price ($)')
plt.legend()

Sometimes exponential smoothing makes more sense because we can weight more recent values higher. Notice how the exponentially smoothed curves more closely track the stock price compared to the moving average with equally weighted observations (especially in the shaded region where the drop occurred):

In [None]:
ax = netflix_viz.exp_smoothing('close', [60, 200])
q_3, q_4 = nflx.close.quantile([0.75, 1])
netflix_viz.shade_region(
    ax, y=(q_3, q_4), color='grey', alpha=0.2
)
plt.suptitle('Netflix Closing Price')
plt.ylabel('price ($)')
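
To make the difference between the two smoothing methods concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the `span`-based weighting (`alpha = 2 / (span + 1)`) is an assumption about how the period is interpreted, and the prices are made up.

```python
def simple_moving_average(values, window):
    """Equally weighted mean of the last `window` values (fewer at the start)."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def exponential_moving_average(values, span):
    """Recursive smoothing: each new value gets weight alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)  # assumed span-style weighting
    result = [values[0]]
    for value in values[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result


# hypothetical prices: steady at 100, then a sudden drop to 80
prices = [100.0] * 10 + [80.0] * 5
sma = simple_moving_average(prices, window=10)
ewma = exponential_moving_average(prices, span=10)

# compare how far each smoothed curve has moved toward the new price level
print(round(sma[-1], 2), round(ewma[-1], 2))
```

Five periods after the drop, the exponentially weighted value is closer to the new price than the equally weighted one, which is the behavior described for the shaded region above.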

If you follow the installation instructions [here](https://github.com/matplotlib/jupyter-matplotlib), you can run the following interactive plot to see how different window sizes affect the moving average and the exponentially smoothed moving average. This requires installing `ipympl` and `node.js` and running a few commands from the command line, as indicated in the aforementioned link. Note that you will need to restart the kernel afterward.

*More information on the interact() function can be found [here](https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html).*

In [None]:
%matplotlib widget
from ipywidgets import interact

def moving_averages(viz):
    """Return a plotting function for `interact()` bound to the given visualizer."""
    def plot_viz(period=20, exponentially_smoothed=False):
        if exponentially_smoothed:
            ax = viz.exp_smoothing('close', period)
        else:
            ax = viz.moving_average('close', str(period) + 'D')
        viz.shade_region(
            ax, x=('2018-07-01', '2018-07-31'), color='blue', alpha=0.2
        )

        plt.suptitle('Netflix Closing Price')
        plt.ylabel('price ($)')

    return plot_viz

interact(moving_averages(netflix_viz), period=(5, 200, 5), exponentially_smoothed=False);

Switch back to `inline` plots:

In [None]:
%matplotlib inline

Netflix hit its max and has been trending downward:

In [None]:
import numpy as np
ax = netflix_viz.exp_smoothing('close', 60)
netflix_viz.add_reference_line(
    ax, y=nflx.high.max(), color='red', linestyle='-', label='max'
)
nflx_decline = nflx[nflx.high.idxmax():]
netflix_viz.add_reference_line(
    ax, 
    x=nflx_decline.index, 
    y=np.linspace(
        nflx_decline.high.max(), 
        nflx_decline.low.min(), 
        num=nflx_decline.shape[0]
    ), 
    color='r', 
    linestyle=':', 
    label='decline'
)
plt.suptitle('Netflix Closing Price')
plt.ylabel('price ($)')

We can use a joint plot to see how Netflix compared to the S&P 500:

In [None]:
netflix_viz.jointplot(sp, 'close')

Netflix and Facebook both struggled in July, but their correlation is weak:

In [None]:
netflix_viz.correlation_heatmap(fb)

We can see that Facebook and Netflix had similar stock prices in 2017, but Netflix really pulled ahead in 2018, despite taking a tumble (because Facebook did as well):

In [None]:
netflix_viz.fill_between_other(fb)

If we want to visualize groups of assets we use the `AssetGroupVisualizer`:

In [None]:
faang_viz = stock_analysis.AssetGroupVisualizer(faang)

We can view the effect that after hours trading had on each of the FAANG stocks:

In [None]:
faang_viz.after_hours_trades()

We can look at what the exponentially smoothed moving averages look like for all FAANG stocks as well:

In [None]:
faang_viz.exp_smoothing('close', 20)

With short windows, both of these smoothing methods perform similarly:

In [None]:
faang_viz.moving_average('close', '20D')

The distributions of closing price across the FAANG stocks are quite different:

In [None]:
faang_viz.histogram('close')

Viewing boxplots, we can easily see that Google and Amazon are on a much larger scale:

In [None]:
faang_viz.boxplot('close')
plt.yscale('log')

Plotting all the FAANG stocks' closing prices over time shows that early 2018 had a lot of growth for the lot of them:

In [None]:
faang_viz.evolution_over_time('close')

Correlations between the stocks are weak, with the strongest being between Google and Amazon:

In [None]:
faang_viz.heatmap(True)

We can view this information as scatter plots as well:

In [None]:
faang_viz.pairplot()

Bitcoin is on a much larger scale so we have to use subplots to compare it to the stock market. Notice that bitcoin took a huge tumble (check out the scale on the y-axis):

In [None]:
faang_sp_viz = stock_analysis.AssetGroupVisualizer(faang_sp)
bitcoin_viz = stock_analysis.StockVisualizer(bitcoin)

fig, axes = plt.subplots(1, 2, figsize=(15, 5))
faang_sp_viz.evolution_over_time('close', ax=axes[0])
bitcoin_viz.evolution_over_time('close', ax=axes[1], label='Bitcoin')

## Technical Analysis
Technical analysis involves looking at metrics to understand the performance of an asset.

### Single Asset
We can use the `StockAnalyzer` to calculate metrics for us:

In [None]:
nflx_analyzer = stock_analysis.StockAnalyzer(nflx)

One such metric is annualized volatility:

In [None]:
nflx_analyzer.annualized_volatility()
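
As a rough illustration of what such a metric measures, the sketch below uses a common definition: the standard deviation of daily percent changes scaled by the square root of 252 trading days. Whether `StockAnalyzer` computes it exactly this way is an assumption, and the prices are hypothetical.

```python
import math
import statistics


def daily_returns(prices):
    """Percent change between consecutive closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def annualized_volatility(prices, trading_days=252):
    """Sample std. dev. of daily returns, scaled by sqrt(trading days per year)."""
    return statistics.stdev(daily_returns(prices)) * math.sqrt(trading_days)


# hypothetical closing prices, not real Netflix data
closes = [100.0, 102.0, 101.0, 105.0, 103.0, 106.0]
print(f'{annualized_volatility(closes):.2%}')
```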

### Asset Groups
When working with groups, we use the `AssetGroupAnalyzer`:

In [None]:
all_assets_analyzer = stock_analysis.AssetGroupAnalyzer(
    all_assets
)

#### Coefficient of Variation
We can check how large the standard deviation is in terms of the mean of the stock price as another way to look at volatility. This shows that bitcoin is the most volatile:

In [None]:
all_assets_analyzer.analyze('cv')
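
The calculation itself is small enough to sketch in plain Python. This assumes the sample standard deviation is used and works on made-up prices; the function name is illustrative rather than the package's API.

```python
import statistics


def coefficient_of_variation(prices):
    """Standard deviation expressed as a fraction of the mean (unitless)."""
    return statistics.stdev(prices) / statistics.mean(prices)


# two hypothetical assets with similar average prices but different swings
steady = [100.0, 101.0, 99.0, 100.0, 102.0]
swingy = [100.0, 150.0, 60.0, 130.0, 80.0]

print(coefficient_of_variation(steady), coefficient_of_variation(swingy))
```

Because the ratio is unitless, multiplying every price by a constant leaves it unchanged, which is what makes it usable for comparing assets on very different price scales.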

#### Annualized Volatility
Facebook looks more volatile by this metric than by the coefficient of variation, but it still falls in the middle of the pack:

In [None]:
all_assets_analyzer.analyze('annualized_volatility')

#### Bear Market
We can check if the assets are in a bear market, meaning their value dropped 20% or more in the last 2 months:

In [None]:
all_assets_analyzer.analyze('is_bear_market')
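
A minimal sketch of that rule, assuming "the last 2 months" is approximated by a 60-day lookback and the return is measured from the first to the last close in that window. The function name mirrors the metric but the body is illustrative, and the data is hypothetical.

```python
from datetime import date, timedelta


def is_bear_market(closes_by_date, lookback_days=60, threshold=-0.20):
    """closes_by_date: list of (date, close) tuples sorted by date."""
    last_day, last_close = closes_by_date[-1]
    cutoff = last_day - timedelta(days=lookback_days)
    # keep only the closes that fall inside the lookback window
    window = [close for day, close in closes_by_date if day >= cutoff]
    period_return = (last_close - window[0]) / window[0]
    return period_return <= threshold


# hypothetical daily closes: one asset falls steadily, the other stays flat
start = date(2018, 10, 1)
falling = [(start + timedelta(days=i), 100.0 - 0.4 * i) for i in range(90)]
flat = [(start + timedelta(days=i), 100.0) for i in range(90)]

print(is_bear_market(falling), is_bear_market(flat))
```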

#### Beta
Beta tells us how an asset's performance correlates to a market index. A beta of 1 with the S&P 500 means the asset moves with the S&P 500. Notice bitcoin is below zero because it is a different type of asset.

In [None]:
all_assets_analyzer.analyze('beta', index=sp)
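
For reference, the textbook definition of beta is the covariance of the asset's returns with the index's returns, divided by the variance of the index's returns. The sketch below applies that definition to made-up prices; it is not taken from the package source.

```python
import statistics


def pct_change(prices):
    """Percent change between consecutive prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
    ) / (len(xs) - 1)


def beta(asset_prices, index_prices):
    """Covariance of asset and index returns over the index return variance."""
    asset_returns = pct_change(asset_prices)
    index_returns = pct_change(index_prices)
    return covariance(asset_returns, index_returns) / statistics.variance(index_returns)


# hypothetical index, plus an asset whose daily moves are twice the index's
index_prices = [100.0, 101.0, 99.5, 102.0, 103.5, 102.2]
amplified = [50.0]
for r in pct_change(index_prices):
    amplified.append(amplified[-1] * (1 + 2 * r))

print(beta(index_prices, index_prices), beta(amplified, index_prices))
```

The index measured against itself has a beta of 1, and the asset that doubles every move has a beta of about 2.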

#### Alpha
Alpha allows us to compare the returns of an asset to an index. We can compare to the S&P 500 using a risk-free rate of return of 2.46% (current government treasury bond rate). This shows that Facebook has worse returns than the S&P 500, while bitcoin's are much higher (due to scale):

In [None]:
all_assets_analyzer.analyze('alpha', index=sp, r_f=2.46)
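
One standard way to express this comparison is Jensen's alpha: the asset's return minus the risk-free rate, minus beta times the index's excess return. The sketch below assumes that formula and, like the call above, takes `r_f` in percent. The helper names and input numbers are hypothetical.

```python
def portfolio_return(prices):
    """Return over the whole period: (end - start) / start."""
    return (prices[-1] - prices[0]) / prices[0]


def alpha(asset_return, index_return, beta, r_f):
    """Jensen's alpha; `r_f` is given in percent, e.g. 2.46 for 2.46%."""
    r_f = r_f / 100
    return asset_return - r_f - beta * (index_return - r_f)


# hypothetical inputs: the asset returned 30%, the index 10%, with a beta of 1.2
asset_alpha = alpha(asset_return=0.30, index_return=0.10, beta=1.2, r_f=2.46)
print(f'{asset_alpha:.4f}')
```

A positive value means the asset returned more than its exposure to the index would explain; the index compared against itself with a beta of 1 gives zero.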

#### Cumulative Returns
Cumulative returns show us what we would have made for a dollar investment at time period 0. This can be multiplied by the actual investment to get the return:

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
cumulative_returns = all_assets_analyzer.analyze(
    'cumulative_returns'
)
for name, data in cumulative_returns.items():
    data.plot(
        ax=axes[0] if name == 'Bitcoin' else axes[1], 
        label=name,
        legend=True
    )
plt.suptitle('Cumulative Returns 2017-2018')
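
The idea can be sketched as the running product of one plus each period's percent change. This is a plain-Python illustration on hypothetical prices, not necessarily how the analyzer computes it.

```python
def cumulative_returns(prices):
    """Growth of one unit of currency invested at the first price."""
    growth = 1.0
    result = []
    for prev, curr in zip(prices, prices[1:]):
        growth *= 1 + (curr - prev) / prev  # compound this period's return
        result.append(growth)
    return result


# hypothetical closing prices
closes = [50.0, 55.0, 52.0, 60.0, 66.0]
growth = cumulative_returns(closes)

# value of a hypothetical 1,000 dollar investment made at the first price
print(round(1000 * growth[-1], 2))
```

The final value always equals the last price divided by the first, so the compounding is just a step-by-step view of that ratio.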

## Modeling
For modeling tasks, we can use the `StockModeler`:

In [None]:
from stock_analysis import StockModeler

### Time Series Decomposition
We can decompose the Netflix time series into its trend, seasonality component, and residuals:

In [None]:
decomposition = StockModeler.decompose(nflx, 20) # 20 period frequency
fig = decomposition.plot()
fig.suptitle('Netflix Stock Price Time Series Decomposition', y=1)

### ARIMA
We can use ARIMA to model the performance with autoregressive (AR), integrated (I, meaning the data is differenced), and moving average (MA) terms. The autocorrelation plot can help find a good starting point for this:

In [None]:
# we can use autocorrelation plots to find a good starting point for ARIMA
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(nflx.close)

ARIMA takes a long time to train, so we will start with an AR of 10:

In [None]:
# this takes a long time to run, so we will start with a smaller AR of 10
arima_model = StockModeler.arima(nflx, ar=10, i=1, ma=5)

The `summary()` method shows statistics on the model's performance:

In [None]:
# AR = 10, I = 1, MA = 5
print(arima_model.summary())
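
The `i=1` in the model above refers to first-order differencing: the model works with the change between consecutive values rather than with the values themselves. A small sketch of differencing and its inverse, on a hypothetical trending series:

```python
def difference(values):
    """First-order differences: the change from one value to the next."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    result = [first_value]
    for diff in diffs:
        result.append(result[-1] + diff)
    return result


# hypothetical series with an upward trend
series = [10, 12, 15, 19, 24]
diffs = difference(series)

print(diffs)
print(undifference(series[0], diffs) == series)
```

Differencing removes the level of the series, and cumulatively summing the differences restores it, which is how predictions on differenced data are mapped back to prices.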

The residuals appear to be normally distributed; however, their size appears to depend on time, so this is not a good model:

In [None]:
StockModeler.plot_residuals(arima_model)

### Linear Regression
We can build a linear regression using the last value to determine the next value:

In [None]:
X, Y, lm = StockModeler.regression(nflx)
print(lm.summary())

Once again, the residuals appear to depend on time:

In [None]:
StockModeler.plot_residuals(lm)
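
To show the idea of regressing each value on the previous one, here is a minimal least-squares sketch in plain Python. It fits both an intercept and a slope; whether `StockModeler.regression()` includes an intercept is not shown in this notebook, so treat that as an assumption. The function names and data are hypothetical.

```python
def fit_lag_one(values):
    """Least-squares fit of values[t] = intercept + slope * values[t - 1]."""
    x, y = values[:-1], values[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = sum(
        (xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)
    ) / sum((xi - mean_x) ** 2 for xi in x)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(last_value, intercept, slope, steps):
    """Feed each prediction back in as the input for the next step."""
    predictions = []
    for _ in range(steps):
        last_value = intercept + slope * last_value
        predictions.append(last_value)
    return predictions


# hypothetical series generated by the rule next = 5 + 0.9 * previous
series = [10.0]
for _ in range(20):
    series.append(5 + 0.9 * series[-1])

intercept, slope = fit_lag_one(series)
print(round(intercept, 4), round(slope, 4))
print(forecast(series[-1], intercept, slope, steps=3))
```

Because every forecast beyond the first step is built on an earlier forecast, errors compound as the horizon grows.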

### Compare Models Against Actuals
We can look at the predictions made. ARIMA looks like this:

In [None]:
import datetime

start = datetime.date(2019, 1, 1)
end = datetime.date(2019, 1, 31)

StockModeler.arima_predictions(
    nflx, arima_model, start=start, end=end, title='ARIMA'
)
plt.ylabel('price')

Let's see how these models compared to what actually happened:

In [None]:
jan_2019 = stock_analysis.StockReader(
    start, end
).get_ticker_data('NFLX')

fig, axes = plt.subplots(1, 2, figsize=(15, 5))

arima_ax = StockModeler.arima_predictions(
    nflx, arima_model, start=start, end=end,
    ax=axes[0], title='ARIMA'
)
jan_2019.close.plot(ax=arima_ax, style='b:', label='actual close')
arima_ax.legend()
arima_ax.set_ylabel('price ($)')

linear_reg = StockModeler.regression_predictions(
    nflx, lm, start=start, end=end,
    ax=axes[1], title='Linear Regression'
)
jan_2019.close.plot(ax=linear_reg, style='b:', label='actual close')
linear_reg.legend()
linear_reg.set_ylabel('price ($)')

### Random Walks
Prediction is difficult due to randomness. Stock prices can be modeled as a [random walk](https://en.wikipedia.org/wiki/Random_walk). Can you find the real stock data?

*Note: To get different results, pass in a number to the `seed` argument.*

In [None]:
from random_walk import random_walk_stock_comparison

real_stock, axes = random_walk_stock_comparison(aapl.loc['2017'], probs=[0.45, 0.55])
axes

Was your guess correct?

In [None]:
real_stock
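
For a sense of how such look-alike series can be produced, here is a minimal random walk sketch. It is not the `random_walk` module used above: the fixed step size and the reading of `probs` as the probabilities of a down move and an up move are assumptions made for illustration.

```python
import random


def random_walk(start_price, n_steps, probs=(0.45, 0.55), step=1.0, seed=None):
    """Simulate prices that move down or up by `step` each period.

    `probs` is interpreted as (probability of down, probability of up).
    """
    rng = random.Random(seed)  # a fixed seed makes the walk reproducible
    prices = [start_price]
    for _ in range(n_steps):
        direction = rng.choices([-1, 1], weights=probs)[0]
        prices.append(prices[-1] + direction * step)
    return prices


# hypothetical starting price; change the seed to get a different walk
walk = random_walk(start_price=150.0, n_steps=250, seed=42)
print(len(walk), walk[-1])
```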