# The Capital Asset Pricing Model

In this section we will derive CAPM betas using ordinary least squares (OLS) regression.

We can choose any interval (daily, weekly, monthly, etc.) to calculate $\beta$, but in order to enable comparisons with published betas we'll use a monthly interval, and look at the last five years.


## Setting up our Regression
Let's revisit the CAPM equation:

## $$E(R_i)=R_f+\beta_i(E(R_m)-R_f)$$

With a bit of rearrangement this becomes:
## $$E(R_i)=R_f + \beta_iE(R_m) - \beta_iR_f$$

## $$E(R_i)=(1-\beta_i)R_f + \beta_iE(R_m)$$

Redefining the first term as $a = (1-\beta_i)R_f$ gives us a one-variable form with an intercept (assuming for the moment that $R_f$ is close to constant):

## $$E(R_i)=a + \beta_iE(R_m)$$

We can use this to do a simple regression.
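
As a quick sanity check on the rearrangement, the sketch below simulates returns from the one-variable form and recovers $a$ and $\beta$ with a least-squares line. Every number in it (the risk-free rate, the "true" beta, the noise level) is a made-up assumption for illustration, not market data.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical inputs, chosen only for illustration
r_f = 0.002        # monthly risk-free rate, treated as constant
true_beta = 1.3
n_months = 60

# simulated market returns, and stock returns that follow CAPM plus noise
r_m = rng.normal(0.008, 0.04, n_months)
noise = rng.normal(0.0, 0.01, n_months)
r_i = r_f + true_beta * (r_m - r_f) + noise

# least-squares line r_i = a + beta * r_m (polyfit returns slope first)
beta_hat, a_hat = np.polyfit(r_m, r_i, 1)

# the intercept implied by the rearranged equation: a = (1 - beta) * R_f
a_implied = (1 - true_beta) * r_f

print("estimated beta: {:.3f}".format(beta_hat))
print("estimated a: {:.5f}, implied a: {:.5f}".format(a_hat, a_implied))
```

The estimated slope should land near the simulated beta, and the estimated intercept near $(1-\beta_i)R_f$, up to sampling noise.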

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import getstock as gs
import pandas as pd

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

apikey = "B0LGKLAGCAQEQ2T1"

### Let's get our data

We want to calculate returns so we'll need:
1. Index monthly prices
1. Stock monthly prices and dividends
1. We will calculate returns from these
 1. The index levels are inclusive of dividends so we can just use the pandas pct_change() function 
 1. But for the stock we want to include dividends in the return. We calculate return from our prices $P$ and dividends $D$ for each period as:
 
 
 $$ \frac{P_{end} - P_{begin} + D}{P_{begin}} $$
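
To make the difference between the two return calculations concrete, here is a small sketch on toy numbers. The prices and dividends are hypothetical, and the prices are assumed to be unadjusted month-end closes.

```python
import pandas as pd

# hypothetical month-end prices and dividends paid during each month
prices = pd.Series([100.0, 104.0, 101.0, 106.0])
dividends = pd.Series([0.0, 0.0, 1.0, 0.0])

# price-only return, which is what pct_change() computes
price_returns = prices.pct_change()

# total return: (P_end - P_begin + D) / P_begin
total_returns = (prices - prices.shift(1) + dividends) / prices.shift(1)

print(pd.DataFrame({"price_r": price_returns, "total_r": total_returns}))
```

The two columns differ only in the month with a dividend, and the first row is `NaN` in both because there is no prior price.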

In [None]:
# our stock
stock_symbol = "AAPL"

# stock monthly
stock_data = gs.getMonthlyStockPrices(stock_symbol, apikey)

# market proxy: SPY (an S&P 500 ETF)
index_data = gs.getMonthlyStockPrices("SPY", apikey)

# trim the data
stock_trimmed = stock_data['2015':]

index_trimmed = index_data['2015':]

In [None]:
# check our data, do we have dividends, etc?
stock_trimmed[stock_trimmed.dividend_amt > 0]

In [None]:
stock_trimmed.head()

In [None]:
# calculate returns
index_returns = index_trimmed.adjusted_close.pct_change()

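# calc returns with dividends: (P_end - P_begin + D) / P_begin
# note: if adjusted_close is already dividend-adjusted, adding dividend_amt counts the dividend twice
prev_close = stock_trimmed.adjusted_close.shift(1)
stock_returns = (stock_trimmed.adjusted_close - prev_close + stock_trimmed.dividend_amt) / prev_close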

In [None]:
# check our data
index_returns.head()

In [None]:
# clean up our N/As in first period
index_returns = index_returns.dropna()
stock_returns = stock_returns.dropna()

Before we do the regression, let's compare the returns of the stock and the market visually...

In [None]:
# for fun let's plot the stock returns and index (market) returns for the period.
fig, ax1 = plt.subplots(1, 1, figsize=(10, 6))

plt.plot(stock_returns, 'r-', label=stock_symbol)
plt.plot(index_returns, 'g-', label="S&P500")
fig.suptitle("{} vs the Market".format(stock_symbol))
plt.legend()
plt.show()

### Calculating Beta by OLS Regression

In [None]:
import statsmodels.formula.api as smf

model_data = pd.concat([stock_returns.dropna(), index_returns.dropna()], axis=1)
model_data.columns = ['stock_r', 'index_r']

In [None]:
model_data

In [None]:
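# note: the "- 1" in the formula drops the intercept, so this fits stock_r = beta * index_r
# (a is forced to 0); use 'stock_r ~ index_r' to estimate the intercept a as well
results = smf.ols('stock_r ~ index_r - 1', data=model_data).fit()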

In [None]:
print(results.summary())

### Interpreting Regression Results in the Context of Stock Betas

What does all this mean?

First, we can look at our coefficients, either in the table above or through `results.params`:

In [None]:
print(results.params)

This is our beta $\beta$:

In [None]:
results.params.index_r
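
For intuition about where this number comes from, the sketch below computes a regression slope by hand on synthetic returns (the data and parameter values are assumptions for illustration). With an intercept, the OLS slope equals $Cov(R_i, R_m)/Var(R_m)$; when the line is forced through the origin, as the `- 1` in the formula above does, the slope is $\sum R_m R_i / \sum R_m^2$ instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic monthly returns, for illustration only
index_r = rng.normal(0.01, 0.04, 60)
stock_r = 0.003 + 1.2 * index_r + rng.normal(0.0, 0.03, 60)

# slope with an intercept: Cov(stock, index) / Var(index)
beta_cov = np.cov(stock_r, index_r, ddof=1)[0, 1] / np.var(index_r, ddof=1)

# the same slope from a least-squares line with intercept
beta_fit, a_fit = np.polyfit(index_r, stock_r, 1)

# slope when the line is forced through the origin: sum(x*y) / sum(x*x)
beta_origin = np.sum(index_r * stock_r) / np.sum(index_r ** 2)

print("beta (cov/var):        {:.4f}".format(beta_cov))
print("beta (with intercept): {:.4f}".format(beta_fit))
print("beta (through origin): {:.4f}".format(beta_origin))
```

The first two values agree to floating-point precision. The third is usually close for monthly returns, because the average returns are small relative to their spread, but it is not the same estimator.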

Next, let's look at R-squared, which describes the "explanatory" power of the regression model. In terms of CAPM and beta, we can think of R-squared as the share of the variation (variance) in the stock's returns that is explained by the "market". The rest (1 - R-squared) is considered "idiosyncratic" to the stock.
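
As a rough illustration of that definition, the sketch below computes R-squared by hand as $1 - SS_{res}/SS_{tot}$ on synthetic returns (assumed data, not our stock) for a line fitted with an intercept. In that case it also equals the squared correlation between the two return series. Note that for a model fitted without an intercept, statsmodels reports an uncentered R-squared, which uses the raw sum of squares of the dependent variable as the total and is therefore not directly comparable.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic monthly returns, for illustration only
index_r = rng.normal(0.01, 0.04, 60)
stock_r = 0.003 + 1.2 * index_r + rng.normal(0.0, 0.03, 60)

# least-squares line with intercept
beta, a = np.polyfit(index_r, stock_r, 1)
fitted = a + beta * index_r
resid = stock_r - fitted

# R-squared = 1 - (unexplained variation) / (total variation around the mean)
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((stock_r - stock_r.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# for a one-variable regression with intercept this is the squared correlation
corr = np.corrcoef(index_r, stock_r)[0, 1]

print("R-squared: {:.4f}".format(r_squared))
print("corr^2:    {:.4f}".format(corr ** 2))
```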

In [None]:
# for our stock
print("R-squared (market contribution to return): {:.2f}%".format(results.rsquared * 100))
print("Idiosyncratic return: {:.2f}%".format((1-results.rsquared) * 100))

### But how good is our fit?

The OLS model is an *estimate* of the true model, and is therefore fit with some degree of error. How can we think about this? Let's take a look at some plots.

In [None]:
from statsmodels.graphics.gofplots import qqplot
import scipy.stats as stats

# Q-Q plot: compares the residuals against a fitted normal distribution
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
qqplot(results.resid, fit=True, line="45", ax=ax)
plt.show()

In [None]:
# scatter of actual returns with the fitted regression line on top
fig, ax = plt.subplots(1, 1, figsize=(8, 6))
ax.scatter(index_returns, stock_returns, label="actual index vs stock returns")
# axis labels have to be set through the setter methods
ax.set_xlabel("S&P 500 return")
ax.set_ylabel("stock return")
ax.plot(index_returns, results.fittedvalues, "r-", label="fitted")
plt.legend()
plt.show()

### Actual and predicted values - distribution

In [None]:
from scipy.stats import norm

mu = stock_returns.mean()
std = stock_returns.std()

fig, ax = plt.subplots(1, 1, figsize=(10, 5))

# plot our data
ax.hist(stock_returns, bins=20, density=True, alpha=0.6, color='blue', label="actual returns")

ax.hist(results.fittedvalues, bins=20, density=True, alpha=0.6, color='red', label="fitted returns")

# plot the PDF
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
# raw strings keep the LaTeX backslashes from being read as escape sequences
ax.plot(x, p, 'k', linewidth=2, label=r"normal dist given real $\mu$ and $\sigma$")

plt.title(r"{} Fit Results: $\mu$ = {:.2f},  $\sigma$ = {:.2f}".format(stock_symbol, mu, std))
plt.legend()
plt.show()