# CAPM Analysis: Calculating the Beta of a stock as a Regression with Python

This Jupyter notebook follows a [Medium post by Bernard Brenyah](https://medium.com/python-data/capm-analysis-calculating-stock-beta-as-a-regression-in-python-c82d189db536).
I added detailed explanations of the relevant ideas from Investopedia, found under the keywords "beta" and "CAPM".
<br>

The Capital Asset Pricing Model (CAPM) is an extension of Markowitz's Modern Portfolio Theory. The model was developed independently by William Sharpe, Jack Treynor, Jan Mossin, and John Lintner, who built on the idea of diversification introduced by Harry Markowitz.

CAPM describes the relationship between systematic risk and expected return for assets (usually stocks). It is widely used to price risky securities and to generate estimates of expected asset returns, taking into account both the risk of those assets and the cost of capital.<br>
CAPM attempts to price securities by examining the relationship between expected returns and risk. The model implies that investors always combine two types of assets: a risk-free asset and a risky asset in the form of a market portfolio of various assets. CAPM further posits that investors expect to be rewarded for holding these risky assets in proportion to the risk they take on by holding them. This kind of risk (*market-related* risk, usually called *systematic risk*) cannot be diversified away, so investors need to be compensated for bearing such “*undiversifiable*” risk. This is intuitive when you think about it. Let’s look at an example:

An investor can buy a risk-free asset such as the treasury bills of a stable government. If the investor instead opts to buy an investment package from company ABC, the investor ought to be compensated for that decision. According to CAPM, company ABC does this by offering the return of the treasury bill plus an incentive, usually called the market **premium** or **excess** market return (market return minus risk-free rate), scaled by the level of risk (**Beta**) the investor takes.

This is why it is common for most funds to advertise “Treasury Bill rate + XYZ%” to customers.

The *beta of an asset* (a security or portfolio) measures the sensitivity of its returns relative to a market benchmark, usually a market index such as the S&P 500. In other words, beta measures the volatility (or systematic risk) of a security or portfolio compared to the market as a whole. It answers the question: how sensitive are the returns of an asset to the overall market returns? When the market jumps, do the returns of the asset jump by a similar amount, by more, or by less?

Beta is used in the **CAPM formula**:
$$
    r_i = r_f + \beta_i (r_m - r_f)
$$
where $r_i$ is the expected return of security (or asset) $i$, $r_f$ is the *risk-free rate*, $\beta_i$ is the beta of security $i$ relative to the market, $r_m$ is the market return, and $r_m - r_f$ is the so-called *risk premium*.
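
To make the formula concrete, here is a minimal sketch that evaluates it for a few betas. The function name `capm_expected_return` and the input rates (3% risk-free, 8% market) are made-up illustrations, not values from the original post.

```python
def capm_expected_return(risk_free_rate, beta, market_return):
    """Expected return r_i = r_f + beta_i * (r_m - r_f)."""
    risk_premium = market_return - risk_free_rate
    return risk_free_rate + beta * risk_premium


# Illustrative (made-up) inputs: 3% risk-free rate, 8% expected market return
for beta in (0.0, 0.5, 1.0, 1.5):
    expected = capm_expected_return(0.03, beta, 0.08)
    print(f"beta={beta:.1f} -> expected return {expected:.2%}")
```

With a beta of 0 the expected return collapses to the risk-free rate, and with a beta of 1 it equals the market return, which matches the reading of the formula above.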

The formula for calculating the beta of a stock is:

$$
    \beta = \frac{Cov(r_s, r_m)}{Var(r_m)}
$$

where:
- $r_s$ = the return on an individual stock
- $r_m$ = the return on the overall market<br>
and returns are computed over a specified period. **Warning**: for beta to provide any useful insight, the market used as a benchmark should be related to the stock. For example, calculating the beta of a bond ETF using the S&P 500 as the benchmark would not provide much insight, because bonds and stocks are too dissimilar. How do we check such *suitability*? One way is a statistical measure called `R-squared`, which shows the percentage of a security's historical price movements that can be explained by movements in the benchmark index. A stock that is being compared to the right benchmark should have a high R-squared value in relation to that benchmark. For example, a gold ETF would have a low beta and a low R-squared with the S&P 500 as the benchmark.
<br><br>
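
The beta formula and the R-squared check can be sketched with NumPy as below. The return series are synthetic (randomly generated with a built-in beta of 1.3), so the numbers only illustrate the mechanics and say nothing about any real stock.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic monthly returns (assumption: not real market data)
market = rng.normal(loc=0.01, scale=0.04, size=120)
noise = rng.normal(loc=0.0, scale=0.03, size=120)
stock = 0.002 + 1.3 * market + noise  # built with a "true" beta of 1.3

# Beta = Cov(r_s, r_m) / Var(r_m), using the same ddof in both terms
cov_sm = np.cov(stock, market, ddof=1)[0, 1]
var_m = np.var(market, ddof=1)
beta = cov_sm / var_m

# For a one-variable regression, R-squared is the squared correlation
r_squared = np.corrcoef(stock, market)[0, 1] ** 2

print(f"beta      = {beta:.3f}")
print(f"R-squared = {r_squared:.3f}")
```

Increasing the scale of `noise` leaves the estimated beta roughly unchanged but lowers R-squared, which is the situation the warning above describes: the benchmark explains less of the stock's movement.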

In statistical terms, beta represents the slope of the line fitted through a regression of data points. In finance, each of these data points represents an individual stock's returns against those of the market as a whole.
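
A short sketch of why the regression slope coincides with the covariance formula, using the standard ordinary-least-squares argument. The intercept $\alpha$, the error term $\varepsilon_t$, and the time index $t$ are notation introduced here for illustration. The regression model is

$$
    r_{s,t} = \alpha + \beta \, r_{m,t} + \varepsilon_t
$$

and least squares chooses $\alpha$ and $\beta$ to minimise $\sum_t (r_{s,t} - \alpha - \beta \, r_{m,t})^2$. Setting the derivative with respect to $\alpha$ to zero gives $\alpha = \bar{r}_s - \beta \, \bar{r}_m$, where the bars denote sample means. Substituting this into the derivative with respect to $\beta$ and solving gives

$$
    \beta = \frac{\sum_t (r_{m,t} - \bar{r}_m)(r_{s,t} - \bar{r}_s)}{\sum_t (r_{m,t} - \bar{r}_m)^2} = \frac{Cov(r_s, r_m)}{Var(r_m)}
$$

since the normalising factor of the sample covariance and the sample variance cancels.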

# Interpretation of a Beta with special values
A stock with a beta of:

**0** indicates no correlation with the chosen benchmark (e.g. the NASDAQ index)

**1.0** indicates a stock that has the same volatility as the market. It carries systematic risk; the beta calculation, however, cannot detect any unsystematic risk. Adding such a stock to a portfolio doesn't add any risk to the portfolio, but it also doesn't increase the likelihood that the portfolio will provide an excess return.

**greater than one** indicates a stock that’s more volatile than its benchmark. Such stocks will tend to move with more momentum than the S&P 500. Technology stocks and small cap stocks tend to have higher betas than the benchmark. Adding such stocks to a portfolio will increase the risk of the portfolio, but may also increase its expected return.

**less than one** indicates a stock that is (theoretically) less volatile than the benchmark or the market. Such stocks have less momentum. Including such stocks in a portfolio makes it less risky. For example, utility stocks often have low betas because they tend to move more slowly than market averages.

**1.5** is 50% more volatile than the benchmark

**-1.0** indicates inverse (mirror-image) price movement, i.e. such a stock is inversely correlated with the market benchmark, e.g. the `gold price` (relative to the S&P 500). More precisely, GLD (SPDR Gold Shares, a gold ETF) would have a low beta and a low R-squared with the S&P 500. Stocks with a negative beta can be thought of as a mirror image of the benchmark's trend. `Put options` and `inverse ETFs` are designed to have negative betas. There are also a few industry groups, like gold miners, where a negative beta is common.

# Beta in theory vs. in practice
In theory, the beta coefficient assumes that stock returns are normally distributed, which is not always the case in reality. Financial markets are prone to large surprises. Therefore, what a stock's beta predicts about its future movement isn't always true.

# Drawbacks of beta
Beta is useful for determining a security's short-term risk, and for analyzing volatility to arrive at equity costs when using the CAPM. However, since beta is calculated from historical data points, it becomes less meaningful for predicting a stock's future movements. It is also less useful for long-term investments, since a stock's volatility can change significantly from year to year depending on the company's growth stage and other factors. Furthermore, the beta of a particular stock tends to jump around over time, which makes it unreliable as a stable measure.
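
One way to see how beta moves over time is to estimate it on a rolling window. The sketch below uses synthetic returns whose underlying beta is deliberately made to drift from 0.8 to 1.6; the 36-month window and all numbers are illustrative assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
n_months = 120

# Synthetic monthly returns (assumption: not real data); the "true" beta
# drifts from 0.8 to 1.6 over the sample to mimic a changing company
market = pd.Series(rng.normal(0.01, 0.04, n_months))
true_beta = np.linspace(0.8, 1.6, n_months)
stock = true_beta * market + pd.Series(rng.normal(0.0, 0.02, n_months))

# Rolling beta over a 36-month window: Cov(r_s, r_m) / Var(r_m)
window = 36
rolling_beta = stock.rolling(window).cov(market) / market.rolling(window).var()

print(rolling_beta.dropna().describe())
```

A single beta estimated over the whole sample would hide this drift, which is one reason the estimate depends on the period chosen.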

# So, is Beta a good measure of risk?
While beta provides some information about risk, it is not an effective measure of risk on its own: beta is based on historical (past) data and does not provide any forward guidance. It also does not consider the fundamentals of a company or its earnings and growth potential.

If you find the theoretical overview of CAPM confusing, I highly recommend that you watch this video:

------
The implementation is given below:

In [2]:
# pip install statsmodels

# Installed because the cell below initially failed with: ModuleNotFoundError: No module named 'statsmodel'

# The output is shown below

Collecting statsmodels
  Downloading statsmodels-0.13.5-cp311-cp311-win_amd64.whl (9.0 MB)
     ---------------------------------------- 9.0/9.0 MB 10.8 MB/s eta 0:00:00
Collecting patsy>=0.5.2
  Downloading patsy-0.5.3-py2.py3-none-any.whl (233 kB)
     -------------------------------------- 233.8/233.8 kB 7.2 MB/s eta 0:00:00
Collecting scipy>=1.3
  Downloading scipy-1.10.1-cp311-cp311-win_amd64.whl (42.2 MB)
     ---------------------------------------- 42.2/42.2 MB 9.8 MB/s eta 0:00:00
Installing collected packages: scipy, patsy, statsmodels
Successfully installed patsy-0.5.3 scipy-1.10.1 statsmodels-0.13.5
Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: C:\Users\unbes\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip


In [1]:
import pandas as pd
# statistics library: regression, time-series analysis, and other statistical tools
# (note the plural: `import statsmodel.api` raises ModuleNotFoundError)
import statsmodels.api as sm

'''
Download monthly prices of Facebook and S&P 500 index from 2014 to 2017
CSV file downloaded from Yahoo Finance
start period: 02/11/2014 
end period: 30/11/2017
period format: DD/MM/YEAR
'''

fb = pd.read_csv('FB.csv', parse_dates=True, index_col='Date')
sp_500 = pd.read_csv('^GSPC.csv', parse_dates=True, index_col='Date')

# join the closing prices of the two datasets
# (pd.concat takes a list of objects; 'Close' is assumed to be the column name in the Yahoo Finance CSV)
monthly_prices = pd.concat([fb['Close'], sp_500['Close']], axis=1)    # axis=1 concatenates column-wise
monthly_prices.columns = ['FB', '^GSPC']

# check the head of the dataframe
monthly_prices.head()

In [None]:
# calculate monthly returns
monthly_returns = monthly_prices.pct_change(1)  # percentage change of each price relative to the previous month's price
clean_monthly_returns = monthly_returns.dropna(axis=0)  # drop the first row, which has no previous month and is therefore NaN
clean_monthly_returns.head()
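
To see what `pct_change` and `dropna` do in the cell above, here is a toy version on three made-up month-end prices. The column names `stock` and `market` and the price values are hypothetical.

```python
import pandas as pd

# Hypothetical month-end closing prices (made-up numbers)
prices = pd.DataFrame(
    {"stock": [100.0, 110.0, 99.0], "market": [2000.0, 2100.0, 2100.0]},
    index=pd.to_datetime(["2017-01-31", "2017-02-28", "2017-03-31"]),
)

returns = prices.pct_change(1)          # (price_t / price_{t-1}) - 1
clean_returns = returns.dropna(axis=0)  # the first row has no previous month

print(returns)
print(clean_returns)
```

The first row of `returns` is all NaN because there is no earlier price to compare against, so three prices yield only two usable monthly returns.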

Good! We now have a clean set of monthly returns on Facebook and the S&P 500. Let’s go ahead and run an Ordinary Least Squares (OLS) regression with Statsmodels.

In [None]:
# split dependent and independent variables
X = clean_monthly_returns['^GSPC']
y = clean_monthly_returns['FB']

# add a constant (intercept column) to the independent variable
X1 = sm.add_constant(X)

# make a regression model
model = sm.OLS(y, X1)

# fit model and print results
results = model.fit()
results.summary()
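
As a rough illustration of what the constant column contributes, the sketch below solves the same kind of least-squares problem with plain NumPy instead of Statsmodels. The data are synthetic (not the FB and S&P 500 returns), and the variable names are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic returns (assumption: not the FB / S&P 500 data)
x = rng.normal(0.01, 0.04, 48)                    # market returns
y = 0.004 + 0.6 * x + rng.normal(0.0, 0.02, 48)   # stock returns

# Design matrix with a leading column of ones, like sm.add_constant(X)
design = np.column_stack([np.ones_like(x), x])

# Least-squares solution: coefficients are [intercept (alpha), slope (beta)]
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha, beta = coef

# Forcing the line through the origin (no constant) gives a different slope
beta_no_const = (x @ y) / (x @ x)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
print(f"beta without constant = {beta_no_const:.4f}")
```

Without the column of ones the fitted line is forced through the origin, so the slope no longer equals Cov/Var in general; that is why the cell above calls `sm.add_constant` before fitting.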

The moment of truth! Does our regression model work?

As you can see from the summary, the coefficient for `^GSPC` is 0.5751. If the beta value provided by Yahoo! Finance is anywhere close to this figure, then our regression model, and our attempt to replicate how Yahoo! Finance calculates beta values, is correct.

BOOM! Yahoo Finance gives Facebook a Beta value of 0.58. Our regression model gives it a value of 0.5751 which when rounded off is 0.58.

As a bonus, I am also going to show how SciPy’s `linregress` function can be used to easily run a linear regression as well.

In [None]:
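# linear regression with SciPy; linregress returns slope, intercept, r-value, p-value and standard error
from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(X, y)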

print(slope)

The slope value is 0.575090640347 which when rounded off is the same as the values from both our previous OLS model and Yahoo! Finance.

Until the next post, happy coding!

As always the source code and associated files for this post along with previous posts can be checked on the [GitHub page](https://github.com/PyDataBlog/Python-for-Data-Science) for the Publication.