
# #0. Regression analysis
Regression analysis is a powerful tool for uncovering associations between variables observed in data (that is, for statistical inference), but it cannot by itself establish causation. (This is why we also need broad knowledge of, and insight into, the economy and the market as a whole.)

Two basic types of regression are:

- *Simple linear regression*: uses a single independent variable. Its regression model is given as:

$$
    y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \text{ for } i = 1, \ldots, n
$$

Let's define the problem: we want to find $\beta_0$ and $\beta_1$ such that the regression line (a linear equation) minimizes the sum of squared errors $\sum \epsilon_i^2$.

The least squares estimates of the ***regression parameters*** are given as:
\begin{align*}
    \hat{\beta_1} &= \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \\
    \hat{\beta_0} &= \bar{y} - \hat{\beta_1} \bar{x}
\end{align*}
where $\bar{x}=\frac{\sum x_i}{n}$ is the mean of the $x$ values (and likewise $\bar{y}$ is the mean of the $y$ values).<br><br>

- *Multiple linear regression*: this model uses two or more independent variables (here $p$ of them, observed $n$ times):

$$
    y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \epsilon_i, \text{ for } i = 1, \ldots, n
$$
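As a minimal sketch of the least-squares formulas above, the snippet below computes $\hat{\beta}_1$ and $\hat{\beta}_0$ in closed form with NumPy and fits a small multiple regression with `np.linalg.lstsq`. The helper names (`simple_ols`, `multiple_ols`) and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form least-squares estimates (beta0_hat, beta1_hat) for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def multiple_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + X @ [b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Toy data lying exactly on y = 1 + 2x (made up for illustration).
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 1.0 + 2.0 * x_demo
b0_demo, b1_demo = simple_ols(x_demo, y_demo)
print(b0_demo, b1_demo)
```

For the simple case, `np.polyfit(x_demo, y_demo, 1)` should return the same slope and intercept (in that order), which makes it a convenient cross-check.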

Other known estimation techniques include Bayesian linear regression, quantile regression, mixed models, PCR (Principal Component Regression), least-angle regression, the Theil-Sen estimator, the $\alpha$-trimmed mean approach, and L-, M-, S-, and R-estimators. We will deal with these later.

A famous application is the CAPM, an often-used regression model in finance for pricing assets and estimating the cost of capital.

# #1. Linear Regression in Finance: computing beta and CAPM

In this example, we apply (elementary) linear regression to compute the beta $\beta$ of a single stock, and then compute its expected return under the CAPM.

Secondly, we extend the same idea to a portfolio of stocks to obtain its beta and the associated CAPM expected return.

Recall that the Capital Asset Pricing Model (CAPM) describes the relationship between the expected return of an asset and its systematic risk, i.e. its exposure to the market (for example, the **S&P 500 index**), measured against the return of a risk-free asset (such as *U.S. Treasury Bills*). In other words, CAPM states that the expected return of an asset equals the risk-free return plus a risk premium. More explicitly, this is given by the equation below.

In short, the ```CAPM formula``` is given as follows:
$$
    r_i = r_f + \beta_i (r_m - r_f)
$$
where $r_i$ is the expected return of a security (or an asset) $i$, $r_f$ is the *risk-free rate*, $\beta_i$ is the beta of security $i$ relative to the market, $r_m$ is the expected market return, and $r_m - r_f$ is the so-called *risk premium* of the market.
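To make the formula concrete, here is a small sketch that evaluates it for one security. The function name `capm_expected_return` and the input values (3% risk-free rate, 8% expected market return, beta of 1.2) are hypothetical and chosen only for illustration.

```python
def capm_expected_return(rf, beta, rm):
    """Expected return r_i = r_f + beta_i * (r_m - r_f)."""
    return rf + beta * (rm - rf)

# Hypothetical inputs: 3% risk-free rate, 8% expected market return, beta of 1.2.
er_demo = capm_expected_return(rf=0.03, beta=1.2, rm=0.08)
print(f"{er_demo:.4f}")  # prints 0.0900, i.e. 3% + 1.2 * 5% = 9%
```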

For an individual stock, the expected return given by the above ```CAPM formula``` is not a particularly good predictor of the actual return. But when the asset is a well-diversified portfolio of stocks, it is a much better predictor. As a result, the equation
$$
    \text{Return on diversified portfolio } = r_f + \beta (r_m - r_f)
$$
can be used as a basis for hedging a diversified portfolio. The $\beta$ in the equation is the beta of the whole portfolio and can be calculated as the weighted average of the betas of the stocks in the portfolio. [Hull (11th ed.), p. 97]
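The weighted-average rule can be sketched as follows. The helper `portfolio_beta`, the weights, and the individual betas are made-up values for illustration; in practice the betas would come from a regression on return data.

```python
import numpy as np

def portfolio_beta(weights, betas):
    """Portfolio beta as the weighted average of the individual betas."""
    weights = np.asarray(weights, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return float(np.dot(weights, betas))

# Hypothetical three-stock portfolio.
weights_demo = [0.5, 0.3, 0.2]
betas_demo = [1.2, 0.8, 1.5]
beta_p_demo = portfolio_beta(weights_demo, betas_demo)

# Return on the diversified portfolio = r_f + beta_p * (r_m - r_f),
# with a hypothetical 3% risk-free rate and 8% expected market return.
rf_demo, rm_demo = 0.03, 0.08
er_p_demo = rf_demo + beta_p_demo * (rm_demo - rf_demo)
print(beta_p_demo, er_p_demo)
```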

- small digression<br>
$r_f$ accounts for the time value of money, and the remaining terms account for the additional risk. The goal of the CAPM formula is to evaluate whether a stock is fairly valued when its risk and the time value of money are compared with its expected return. That is, by knowing the individual parts of the CAPM, it is possible to gauge whether the current price of a stock is consistent with its likely return.


More on beta: Beta is a measure of a stock's volatility in relation to the overall market, e.g. the *S&P 500 index*. In other words, beta is the slope of the regression line obtained by regressing the individual stock's return on the market return. Beta is used in the CAPM to describe the relationship between systematic risk, or market risk, and the expected return of an asset. By definition, the overall market has a beta of 1.0, and individual stocks are ranked by how volatile they are relative to the market. For example,
- If $\beta_i = 1.0$, the price of asset *i* tends to move one-for-one with the market, i.e. it carries the same systematic risk as the market (this alone does not imply perfect correlation).
- If $\beta_i < 1.0$, then the security is theoretically less volatile than the market. This case is referred to as '*defensive*'.
- If $\beta_i > 1.0$, then the asset price is theoretically more volatile than the market. This case is referred to as '*aggressive*'.<br><br>
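Because beta is the slope of a simple linear regression of the asset return on the market return, it can equivalently be written as $\beta_i = \mathrm{Cov}(r_i, r_m) / \mathrm{Var}(r_m)$. The sketch below checks this on synthetic returns; the helper name `beta_from_returns`, the random data, and the "true" beta of 1.3 are assumptions made purely for illustration.

```python
import numpy as np

def beta_from_returns(asset_returns, market_returns):
    """Beta as Cov(asset, market) / Var(market), using sample (ddof=1) estimates."""
    asset = np.asarray(asset_returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    cov = np.cov(asset, market, ddof=1)[0, 1]
    return cov / np.var(market, ddof=1)

# Synthetic daily returns (in %): asset = 0.1 + 1.3 * market + noise.
rng = np.random.default_rng(0)
market_demo = rng.normal(0.0, 1.0, size=500)
asset_demo = 0.1 + 1.3 * market_demo + rng.normal(0.0, 0.5, size=500)

beta_cov = beta_from_returns(asset_demo, market_demo)
beta_fit, alpha_fit = np.polyfit(market_demo, asset_demo, 1)  # slope, intercept
print(beta_cov, beta_fit)  # the two estimates agree up to floating-point error
```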

- Caution<br>
When considering CAPM, one assumes there exists a risk-free asset with zero standard deviation. Also investors are assumed to be rational and want to maximize return and reduce risk as much as possible.

# Ways of implementing *linear regression* in Python

1. ```polyfit()``` & ```polyval()```

2. 
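As a quick illustration of the first approach, the sketch below fits a straight line with `np.polyfit()` and evaluates it with `np.polyval()`. The data points are made up for this example.

```python
import numpy as np

# Made-up data that is roughly linear in x.
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Degree-1 fit: coefficients come back highest power first, i.e. [slope, intercept].
coeffs = np.polyfit(x_demo, y_demo, deg=1)
slope, intercept = coeffs

# Evaluate the fitted line at the original x values.
y_fit = np.polyval(coeffs, x_demo)
print(slope, intercept)
```

Note the ordering of the returned coefficients: for a degree-1 fit the slope comes first and the intercept second, which is why the regression cells further down unpack the result as `beta, alpha`.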

In [1]:
import pandas as pd
import numpy as np
from pylab import plt, mpl

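# NOTE: the 'seaborn' style was renamed to 'seaborn-v0_8' in Matplotlib 3.6;
# on older Matplotlib versions use plt.style.use('seaborn') instead.
plt.style.use('seaborn-v0_8')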
mpl.rcParams['font.family'] = 'serif'
%matplotlib inline
df = pd.read_csv('C:/Users/unbes/OneDrive/invest/Codes/ipynb/source/tr_eikon_eod_data.csv', 
                  index_col=0, parse_dates=True)

import plotly.express as px  # Plotly Express: interactive, high-quality visualizations with minimal code



In [3]:
# pandas-datareader was not installed, so it was installed with the command below.
# !pip install 


# The output is shown below.
# Collecting pandas-datareader
#   Downloading pandas_datareader-0.10.0-py3-none-any.whl (109 kB)
#      -------------------------------------- 109.5/109.5 kB 6.2 MB/s eta 0:00:00
# Collecting lxml
#   Downloading lxml-4.9.2-cp311-cp311-win_amd64.whl (3.8 MB)
#      ---------------------------------------- 3.8/3.8 MB 10.9 MB/s eta 0:00:00
# Requirement already satisfied: pandas>=0.23 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from pandas-datareader) (2.0.0)
# Collecting requests>=2.19.0
#   Downloading requests-2.29.0-py3-none-any.whl (62 kB)
#      ---------------------------------------- 62.5/62.5 kB 3.3 MB/s eta 0:00:00
# Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from pandas>=0.23->pandas-datareader) (2.8.2)
# Requirement already satisfied: pytz>=2020.1 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from pandas>=0.23->pandas-datareader) (2023.3)
# Requirement already satisfied: tzdata>=2022.1 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from pandas>=0.23->pandas-datareader) (2023.3)
# Requirement already satisfied: numpy>=1.21.0 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from pandas>=0.23->pandas-datareader) (1.24.2)
# Collecting charset-normalizer<4,>=2
#   Downloading charset_normalizer-3.1.0-cp311-cp311-win_amd64.whl (96 kB)
#      ---------------------------------------- 96.7/96.7 kB 5.8 MB/s eta 0:00:00
# Collecting idna<4,>=2.5
#   Downloading idna-3.4-py3-none-any.whl (61 kB)
#      ---------------------------------------- 61.5/61.5 kB 3.2 MB/s eta 0:00:00
# Collecting urllib3<1.27,>=1.21.1
#   Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
#      -------------------------------------- 140.9/140.9 kB 8.2 MB/s eta 0:00:00
# Collecting certifi>=2017.4.17
#   Downloading certifi-2022.12.7-py3-none-any.whl (155 kB)
#      -------------------------------------- 155.3/155.3 kB 9.1 MB/s eta 0:00:00
# Requirement already satisfied: six>=1.5 in c:\users\unbes\appdata\local\packages\pythonsoftwarefoundation.python.3.11_qbz5n2kfra8p0\localcache\local-packages\python311\site-packages (from python-dateutil>=2.8.2->pandas>=0.23->pandas-datareader) (1.16.0)
# Installing collected packages: urllib3, lxml, idna, charset-normalizer, certifi, requests, pandas-datareader
# Successfully installed certifi-2022.12.7 charset-normalizer-3.1.0 idna-3.4 lxml-4.9.2 pandas-datareader-0.10.0 requests-2.29.0 urllib3-1.26.15

# [notice] A new release of pip available: 22.3.1 -> 23.1.2
# [notice] To update, run: C:\Users\unbes\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip



In [None]:
# import pandas_datareader as pdr

# prices = pdr.get_data_yahoo(['GOOGL', 'AAPL', 'MSFT', 'FB'], start='2013-01-01', end='2013-12-31')['Adj Close']
# prices.head()

In [6]:
df

Date,AAPL.O,MSFT.O,INTC.O,AMZN.O,GS.N,SPY,.SPX,.VIX,EUR=,XAU=,GDX,GLD
2010-01-01,,,,,,,,,1.4323,1096.35,,
2010-01-04,30.572827,30.950,20.88,133.90,173.08,113.33,1132.99,20.04,1.4411,1120.00,47.71,109.80
2010-01-05,30.625684,30.960,20.87,134.69,176.14,113.63,1136.52,19.35,1.4368,1118.65,48.17,109.70
2010-01-06,30.138541,30.770,20.80,132.25,174.26,113.71,1137.14,19.16,1.4412,1138.50,49.34,111.51
2010-01-07,30.082827,30.452,20.60,130.00,177.67,114.19,1141.69,19.06,1.4318,1131.90,49.10,110.82
...,...,...,...,...,...,...,...,...,...,...,...,...
2018-06-25,182.170000,98.390,50.71,1663.15,221.54,271.00,2717.07,17.33,1.1702,1265.00,22.01,119.89
2018-06-26,184.430000,99.080,49.67,1691.09,221.58,271.60,2723.06,15.92,1.1645,1258.64,21.95,119.26
2018-06-27,184.160000,97.540,48.76,1660.51,220.18,269.35,2699.63,17.91,1.1552,1251.62,21.81,118.58
2018-06-28,185.500000,98.630,49.25,1701.45,223.42,270.89,2716.31,16.85,1.1567,1247.88,21.93,118.22



In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2216 entries, 2010-01-01 to 2018-06-29
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AAPL.O  2138 non-null   float64
 1   MSFT.O  2138 non-null   float64
 2   INTC.O  2138 non-null   float64
 3   AMZN.O  2138 non-null   float64
 4   GS.N    2138 non-null   float64
 5   SPY     2138 non-null   float64
 6   .SPX    2138 non-null   float64
 7   .VIX    2138 non-null   float64
 8   EUR=    2216 non-null   float64
 9   XAU=    2211 non-null   float64
 10  GDX     2138 non-null   float64
 11  GLD     2138 non-null   float64
dtypes: float64(12)
memory usage: 225.1 KB



In [9]:
df.head()

Date,AAPL.O,MSFT.O,INTC.O,AMZN.O,GS.N,SPY,.SPX,.VIX,EUR=,XAU=,GDX,GLD
2010-01-01,,,,,,,,,1.4323,1096.35,,
2010-01-04,30.572827,30.950,20.88,133.90,173.08,113.33,1132.99,20.04,1.4411,1120.00,47.71,109.80
2010-01-05,30.625684,30.960,20.87,134.69,176.14,113.63,1136.52,19.35,1.4368,1118.65,48.17,109.70
2010-01-06,30.138541,30.770,20.80,132.25,174.26,113.71,1137.14,19.16,1.4412,1138.50,49.34,111.51
2010-01-07,30.082827,30.452,20.60,130.00,177.67,114.19,1141.69,19.06,1.4318,1131.90,49.10,110.82

In [10]:
df = df.dropna()

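In [None]:
# Move the DatetimeIndex into an ordinary 'Date' column (it becomes the first column),
# since the helper functions below use df['Date'] and skip the first column.
df = df.reset_index()
df

In [None]:
# Take AAPL as our 1st example; keep the other equities for the portfolio section
# and use the S&P 500 index (.SPX) as the market, renamed to 'sp500'.
stocks_df = df[['Date', 'AAPL.O', 'MSFT.O', 'INTC.O', 'AMZN.O', 'GS.N', '.SPX']]
stocks_df = stocks_df.rename(columns={'.SPX': 'sp500'})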

In [None]:
# normalize stock data based on initial price
def normalize(df):
    x = df.copy()
    for i in x.columns[1:]:  # skip the first column ('Date')
        x[i] = x[i] / x[i].iloc[0]
    return x

# function to plot interactive plot
def interactive_plot(df, title):
    fig = px.line(title=title)  # first create an empty figure object
    for i in df.columns[1:]:
        fig.add_scatter(x=df['Date'], y=df[i], name=i)  # loop over the columns, adding a line trace for each
    fig.show()

interactive_plot(normalize(df), 'Normalized Prices')

# Calculating daily returns

In [None]:
def daily_return(df):
    # Percentage change from the previous row for every column except the first ('Date').
    # The first row has no previous value, so its return is set to 0.
    df_daily_return = df.copy()
    for i in df.columns[1:]:
        df_daily_return[i] = df[i].pct_change().fillna(0) * 100
    return df_daily_return

stocks_daily_return = daily_return(stocks_df)
stocks_daily_return

# Calculating beta (and alpha) for a single stock
...relative to the S&P 500 index.

Recall that beta is the slope of the regression line of the stock ('AAPL') return on the market return. Here we use ```np.polyfit()```:

In [None]:
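# Regress AAPL daily returns (y) on S&P 500 daily returns (x): slope = beta, intercept = alpha
beta, alpha = np.polyfit(stocks_daily_return['sp500'], stocks_daily_return['AAPL.O'], 1)
print('Beta for {} stock is = {} and alpha is = {}'.format('AAPL', beta, alpha))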

# Calculating CAPM for a single stock

In [None]:
stocks_daily_return['sp500'].mean()

In [None]:
# annualized average market return (in %, assuming 252 trading days per year)
rm = stocks_daily_return['sp500'].mean() * 252

In [None]:
# setting Risk-free rate = 0
rf = 0

# CAPM for AAPL: expected return = rf + beta * (rm - rf)
ER_AAPL = rf + beta * (rm - rf)
ER_AAPL

# Calculating beta for a portfolio of stocks

In [None]:
beta = {}
alpha = {}

for i in stocks_daily_return.columns:
    if i != 'Date' and i != 'sp500':
        stocks_daily_return.plot(kind='scatter', x='sp500', y=i)
        b, a = np.polyfit(stocks_daily_return['sp500'], stocks_daily_return[i], 1)
        plt.plot(stocks_daily_return['sp500'], b * stocks_daily_return['sp500'] + a, '-', color='r')
        beta[i] = b
        alpha[i] = a
        plt.show()

In [None]:
beta

In [None]:
alpha

# Calculating CAPM for a portfolio of stocks

In [None]:
keys = list(beta.keys())

ER = {}

rf = 0
rm = stocks_daily_return['sp500'].mean() * 252

In [None]:
for i in keys:
    ER[i] = rf + beta[i] * (rm - rf)

for i in keys:
    print('Expected Return Based on CAPM for {} is {}%'.format(i, ER[i]))

In [None]:
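# Equal weights across the stocks in the portfolio
portfolio_weights = np.ones(len(keys)) / len(keys)
# Portfolio expected return = weighted sum of the individual CAPM expected returns
ER_portfolio = np.dot(list(ER.values()), portfolio_weights)

print('Expected Return Based on CAPM for the portfolio is {}%\n'.format(ER_portfolio))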