### Title: Modern Portfolio Theory - Applied on Berkshire Hathaway's Portfolio
Author: Tan Zhi Lun   
Contact: zhilun296@gmail.com

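Modern Portfolio Theory (MPT), or mean-variance analysis, is one of the most widely respected frameworks for portfolio construction. It shows how investors can build a portfolio that maximizes expected return for a given level of risk or, equivalently, minimizes volatility for a given expected return.  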

This involves identifying the efficient frontier in µ-σ space (expected return against standard deviation). Any portfolio that does not lie on the frontier is considered inefficient, because another portfolio offers a higher expected return for the same standard deviation, or a lower standard deviation for the same expected return.  
Below is an illustration of the efficient frontier:

<img src = "Images/efficientfrontier.jpg" width = 400>

In this exercise, we will construct two portfolios, which respectively:
1. Minimize variance, and therefore standard deviation (implying a quadratic utility curve); and
2. Maximize the Sharpe ratio.

Points to note:
1. This exercise doubles as an attempt to reconstruct Berkshire Hathaway's portfolio. Rather than deal with the innumerable securities available, we define our investment universe as the securities held in Berkshire Hathaway's portfolio as of Aug 2020. This reduces our computational requirements and gives us insight into how MPT would weight those same holdings.
2. While the optimization can be solved analytically with Lagrangian functions/Lagrange multipliers, we will use the PyPortfolioOpt library to determine the efficient frontier and identify the portfolios that meet our criteria.
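
For reference, the Lagrangian approach mentioned in point 2 has a simple closed form in one special case. Minimizing w'Σw subject to the weights summing to 1 gives w = Σ⁻¹1 / (1'Σ⁻¹1). The sketch below applies that formula to a hypothetical three-asset covariance matrix; the function name `min_variance_weights` and the numbers in `toy_cov` are made up for illustration and are not part of this notebook's data. Note that this closed form allows short positions, so it will not generally match a long-only optimizer.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form global minimum-variance weights (short selling allowed)."""
    ones = np.ones(cov.shape[0])
    # Solve cov @ x = 1 instead of inverting the matrix explicitly
    x = np.linalg.solve(cov, ones)
    # Rescale so the weights sum to 1 (the budget constraint)
    return x / x.sum()

# Hypothetical annualized covariance matrix for three assets (illustrative only)
toy_cov = np.array([
    [0.10, 0.02, 0.01],
    [0.02, 0.08, 0.03],
    [0.01, 0.03, 0.12],
])

w_min = min_variance_weights(toy_cov)
equal = np.full(3, 1 / 3)

var_min = w_min @ toy_cov @ w_min
var_equal = equal @ toy_cov @ equal
print("Minimum-variance weights:", w_min.round(4))
print("Variance:", round(var_min, 4), "vs equal-weight:", round(var_equal, 4))
```

The minimum-variance weights should never produce a higher variance than the equal-weight portfolio, which is a quick sanity check on the formula.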

The discussion will be split into two parts.  
**1. Handling the Data**  
**2. Portfolio Optimization**

## 1. Handling the Data

In [1]:
from pandas_datareader import data as web
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta

As mentioned, we will be defining our investment universe as securities that are currently in Berkshire Hathaway's portfolio (as of Aug 2020).

In [2]:
universe=['AMZN','AXP','AAPL','AXTA','BAC','BK','GOLD','BIIB','CHTR','KO','COST','DVA','GM','GL','JNJ','JPM',
         'KHC','KR','LBTYA','LBTYK','LILA','LILAK','LSXMA','LSXMK','MTB','MA','MDLZ','MCO','PNC','PG','RH',
         'SIRI','SPY','STNE','STOR','SU','SYF','TEVA','USB','UPS','VOO','VRSN','V','WFC']
today_date=datetime.today()
start_date=today_date-timedelta(days=3*365)
start_date

datetime.datetime(2017, 9, 5, 17, 30, 32, 302715)

As a temporary measure for part 1 (before optimizing our portfolio), let us set the weight of each asset to 1/N, where N is the number of securities in our universe.

In [None]:
weights=np.full(len(universe),1/len(universe))

We will be using Yahoo as our data source. We take the adjusted close because it accounts for dividends and stock splits, which makes it the usual convention for return calculations.

In [5]:
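# Collect each ticker's adjusted close into one DataFrame, one column per ticker
df = pd.DataFrame()
for i in universe:
    df[i]=web.DataReader(i,data_source='yahoo',start=start_date,end=today_date)['Adj Close']
df.head()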

Date,AMZN,AXP,AAPL,AXTA,BAC,BK,GOLD,BIIB,CHTR,KO,...,STOR,SU,SYF,TEVA,USB,UPS,VOO,VRSN,V,WFC
2017-09-05,965.27002,85.309998,40.52,29.129999,23.309999,51.200001,18.219999,315.829987,393.320007,45.919998,...,25.290001,31.83,29.870001,15.41,50.869999,113.919998,226.009995,102.540001,103.010002,50.099998
2017-09-06,967.799988,85.230003,40.477501,29.02,23.41,51.27,17.85,317.359985,402.5,45.959999,...,25.290001,32.540001,29.959999,15.52,50.91,113.809998,226.800003,102.790001,103.18,49.880001
2017-09-07,979.469971,84.410004,40.314999,28.719999,22.969999,50.279999,18.0,323.899994,395.640015,46.279999,...,25.5,32.889999,29.059999,15.47,49.860001,114.669998,226.75,103.480003,104.559998,49.68
2017-09-08,965.900024,84.25,39.657501,28.450001,22.889999,50.43,17.809999,326.329987,392.899994,46.299999,...,25.719999,32.709999,28.549999,15.5,49.98,115.050003,226.399994,103.309998,104.43,49.580002
2017-09-11,977.960022,85.690002,40.375,28.860001,23.360001,51.529999,17.41,326.540009,381.01001,46.52,...,25.889999,33.080002,29.15,18.5,50.880001,116.199997,228.869995,104.650002,106.150002,50.66


Daily returns are compounded over the whole sample and then annualized, so the figures below are the annualized (geometric) return of each asset.  
The sample holds 756 rows covering three years, so dividing its size by 3 gives 252, the typical number of trading days in a year (rather than 365 calendar days).

In [6]:
returns = df.pct_change()
compound_returns = (1+returns).prod()**(returns['AMZN'].size/3/returns.count()) -1
compound_returns

AMZN    0.517565
AXP     0.069930
AAPL    0.440250
AXTA   -0.073425
BAC     0.032579
BK     -0.103963
dtype: float64
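
To see why the exponent is the number of trading days per year divided by the number of observed returns, here is a small sketch on a synthetic price series. The constant `TRADING_DAYS` and the made-up series that grows 0.1% per day for two years are assumptions for illustration only. Because the series spans two years, the exponent is 0.5, and the total two-year growth is scaled back to a one-year figure.

```python
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year

# Hypothetical price series: grows 0.1% per day over two years (504 returns)
prices = pd.Series([100 * 1.001 ** t for t in range(2 * TRADING_DAYS + 1)])

# Daily simple returns; the first row has no prior price, so drop it
daily = (prices / prices.shift(1) - 1).dropna()

# Total growth over the sample, rescaled to a one-year horizon
total_growth = (1 + daily).prod()
annualized = total_growth ** (TRADING_DAYS / daily.count()) - 1

# Compounding 0.1% per day for one year gives the same figure
one_year_direct = 1.001 ** TRADING_DAYS - 1
print(round(annualized, 6), round(one_year_direct, 6))
```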

In [7]:
# Check the AMZN column for NaN values
# Only the first row is expected to be NaN, as a percentage change needs a prior price
filt = returns['AMZN'].isna()
returns['AMZN'].loc[filt]

Date
2017-09-05   NaN
Name: AMZN, dtype: float64
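
The check above inspects a single column. A sketch of how the same check could be extended to every ticker at once is shown below, using a small hypothetical frame (`toy_prices`, with made-up tickers `AAA` and `BBB`) rather than the downloaded data. Any column with more than one NaN return has a gap beyond the expected first row.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the gap in "BBB" stands in for a missing quote
toy_prices = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2, 10.8], "BBB": [20.0, np.nan, 21.0, 21.5]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# Simple returns computed without forward-filling missing prices
toy_returns = toy_prices / toy_prices.shift(1) - 1

# Count NaN values per column; exactly one (the first row) is expected
nan_counts = toy_returns.isna().sum()
unexpected = nan_counts[nan_counts > 1]
print(nan_counts)
print("Columns with unexpected gaps:", list(unexpected.index))
```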

In [8]:
returns['AMZN'].size

756

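MPT relies heavily on the covariance matrix of asset returns, which we calculate below. With a uniform stake of 1/N in each asset, we get an expected annual portfolio return of 6.52% and a standard deviation of about 25%. Note that the variance and standard deviation printed below are rounded to two decimal places before being converted to percentages, which is why 6.0% is not exactly the square of 25.0%.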

In [9]:
annual_cov_mat = returns.cov()*returns['AMZN'].size/3
annual_port_var = np.dot(weights.T,np.dot(annual_cov_mat,weights))
print("Portfolio variance:",round(annual_port_var,2)*100,"%")
print("Portfolio S.D.:",round(np.sqrt(annual_port_var),2)*100,"%")

Portfolio variance: 6.0 %
Portfolio S.D.: 25.0 %
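
The portfolio variance above is the quadratic form w'Σw. A minimal two-asset sketch, with hypothetical weights and covariances chosen only for illustration, shows that the matrix expression equals the familiar expansion w₁²σ₁² + w₂²σ₂² + 2w₁w₂σ₁₂.

```python
import numpy as np

# Hypothetical two-asset example (illustrative numbers only)
w = np.array([0.6, 0.4])
cov = np.array([
    [0.04, 0.01],
    [0.01, 0.09],
])

# Matrix form of the portfolio variance
matrix_form = w @ cov @ w

# The same quantity written out term by term
expanded = (
    w[0] ** 2 * cov[0, 0]
    + w[1] ** 2 * cov[1, 1]
    + 2 * w[0] * w[1] * cov[0, 1]
)

print("Variance:", round(matrix_form, 4))
print("Standard deviation:", round(np.sqrt(matrix_form), 4))
```

The cross term is where diversification enters: the lower the covariance between the two assets, the lower the portfolio variance for the same weights.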


In [10]:
annual_cov_mat = returns.cov()*returns['AMZN'].size/3
annual_cov_mat.head()

Ticker,AMZN,AXP,AAPL,AXTA,BAC,BK,GOLD,BIIB,CHTR,KO,...,STOR,SU,SYF,TEVA,USB,UPS,VOO,VRSN,V,WFC
AMZN,0.104716,0.043624,0.068118,0.040073,0.045874,0.036227,0.011249,0.038604,0.037797,0.018849,...,0.023576,0.035688,0.041326,0.05409,0.032464,0.039992,0.045372,0.054051,0.053279,0.036615
AXP,0.043624,0.143075,0.070088,0.088918,0.116526,0.08853,0.009307,0.051487,0.049774,0.05118,...,0.094483,0.098508,0.146435,0.080612,0.105427,0.052967,0.069654,0.062643,0.086555,0.111141
AAPL,0.068118,0.070088,0.111562,0.054685,0.070407,0.051253,0.013412,0.047246,0.046242,0.033523,...,0.048041,0.058598,0.074152,0.067395,0.055406,0.050314,0.059742,0.060569,0.069305,0.059611
AXTA,0.040073,0.088918,0.054685,0.152095,0.086363,0.064936,0.002213,0.041071,0.048084,0.03805,...,0.080725,0.076586,0.109466,0.06766,0.081837,0.0399,0.053376,0.045309,0.062765,0.080413
BAC,0.045874,0.116526,0.070407,0.086363,0.139998,0.095808,-0.002074,0.060689,0.046922,0.047204,...,0.079223,0.087564,0.134459,0.090336,0.115477,0.061203,0.069306,0.059048,0.076744,0.122854


In [11]:
annual_port_return = (compound_returns*weights).sum()
print("Expected annual portfolio return:",round(annual_port_return*100,2),"%")

Expected annual portfolio return: 6.52 %


## 2. Portfolio Optimization
While the efficient frontier is traditionally derived with Lagrangian functions, we will use the PyPortfolioOpt library for this exercise instead.

In [12]:
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns

Comparing the expected returns and covariance matrix below with the ones we calculated earlier, we see that the values are equal, so the library is functioning as expected.

In [13]:
miu = expected_returns.mean_historical_return(df)
print(miu[0:6])
sample_cov_matrix = risk_models.sample_cov(df) #Already annualized
sample_cov_matrix.head()

AMZN    0.517565
AXP     0.069930
AAPL    0.440250
AXTA   -0.073425
BAC     0.032579
BK     -0.103963
dtype: float64


Ticker,AMZN,AXP,AAPL,AXTA,BAC,BK,GOLD,BIIB,CHTR,KO,...,STOR,SU,SYF,TEVA,USB,UPS,VOO,VRSN,V,WFC
AMZN,0.104716,0.043624,0.068118,0.040073,0.045874,0.036227,0.011249,0.038604,0.037797,0.018849,...,0.023576,0.035688,0.041326,0.05409,0.032464,0.039992,0.045372,0.054051,0.053279,0.036615
AXP,0.043624,0.143075,0.070088,0.088918,0.116526,0.08853,0.009307,0.051487,0.049774,0.05118,...,0.094483,0.098508,0.146435,0.080612,0.105427,0.052967,0.069654,0.062643,0.086555,0.111141
AAPL,0.068118,0.070088,0.111562,0.054685,0.070407,0.051253,0.013412,0.047246,0.046242,0.033523,...,0.048041,0.058598,0.074152,0.067395,0.055406,0.050314,0.059742,0.060569,0.069305,0.059611
AXTA,0.040073,0.088918,0.054685,0.152095,0.086363,0.064936,0.002213,0.041071,0.048084,0.03805,...,0.080725,0.076586,0.109466,0.06766,0.081837,0.0399,0.053376,0.045309,0.062765,0.080413
BAC,0.045874,0.116526,0.070407,0.086363,0.139998,0.095808,-0.002074,0.060689,0.046922,0.047204,...,0.079223,0.087564,0.134459,0.090336,0.115477,0.061203,0.069306,0.059048,0.076744,0.122854


In [14]:
# EfficientFrontier takes the expected returns and the covariance matrix as inputs
efficientfrontier = EfficientFrontier(miu,sample_cov_matrix)

### 2.1 Portfolio with minimum volatility

In [15]:
weights = efficientfrontier.min_volatility()
weights_df = pd.Series(weights).to_frame()
weights_df.reset_index(inplace=True)
weights_df.columns=['Ticker','Weight']
print(weights_df.sort_values(by='Weight',ascending=False).reset_index().drop(columns='index'))
efficientfrontier.portfolio_performance(verbose=True)

   Ticker    Weight
0      KO  0.192601
1    GOLD  0.150493
2      KR  0.147530
3     JNJ  0.135973
4    COST  0.094734
5      PG  0.075792
6    AMZN  0.066445
7   LBTYK  0.036928
8    CHTR  0.031258
9     UPS  0.022358
10   SIRI  0.016251
11   AXTA  0.012518
12    DVA  0.011265
13   STOR  0.005853
14    SPY  0.000000
15   STNE  0.000000
16     RH  0.000000
17   TEVA  0.000000
18     SU  0.000000
19    SYF  0.000000
20    MCO  0.000000
21    USB  0.000000
22    VOO  0.000000
23   VRSN  0.000000
24      V  0.000000
25    PNC  0.000000
26  LSXMA  0.000000
27   MDLZ  0.000000
28    JPM  0.000000
29   AAPL  0.000000
30    BAC  0.000000
31     BK  0.000000
32   BIIB  0.000000
33     GM  0.000000
34     GL  0.000000
35    KHC  0.000000
36     MA  0.000000
37  LBTYA  0.000000
38   LILA  0.000000
39  LILAK  0.000000
40    AXP  0.000000
41  LSXMK  0.000000
42    MTB  0.000000
43    WFC  0.000000
Expected annual return: 14.2%
Annual volatility: 16.9%
Sharpe Ratio: 0.72


(0.14179158003982317, 0.168558968854053, 0.7225458299123588)

### 2.2 Portfolio with Maximum Sharpe Ratio

In [16]:
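# Use a fresh instance: newer PyPortfolioOpt versions may reject reusing an already-solved problem
efficientfrontier = EfficientFrontier(miu,sample_cov_matrix)
weights = efficientfrontier.max_sharpe()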
weights_df = pd.Series(weights).to_frame()
weights_df.reset_index(inplace=True)
weights_df.columns=['Ticker','Weight']
print(weights_df.sort_values(by='Weight',ascending=False).reset_index().drop(columns='index'))
efficientfrontier.portfolio_performance(verbose=True)

   Ticker    Weight
0    AMZN  0.493832
1    COST  0.261157
2      RH  0.147613
3    GOLD  0.067942
4    AAPL  0.019611
5      KR  0.009844
6    STNE  0.000000
7    MDLZ  0.000000
8     MCO  0.000000
9     PNC  0.000000
10     PG  0.000000
11   SIRI  0.000000
12    SPY  0.000000
13     SU  0.000000
14   STOR  0.000000
15    MTB  0.000000
16    SYF  0.000000
17   TEVA  0.000000
18    USB  0.000000
19    UPS  0.000000
20    VOO  0.000000
21   VRSN  0.000000
22      V  0.000000
23     MA  0.000000
24  LSXMA  0.000000
25  LSXMK  0.000000
26     GM  0.000000
27   AXTA  0.000000
28    BAC  0.000000
29     BK  0.000000
30   BIIB  0.000000
31   CHTR  0.000000
32     KO  0.000000
33    DVA  0.000000
34     GL  0.000000
35    AXP  0.000000
36    JNJ  0.000000
37    JPM  0.000000
38    KHC  0.000000
39  LBTYA  0.000000
40  LBTYK  0.000000
41   LILA  0.000000
42  LILAK  0.000000
43    WFC  0.000000
Expected annual return: 48.3%
Annual volatility: 26.0%
Sharpe Ratio: 1.78


(0.482755847508769, 0.260140749394534, 1.7788672039494486)
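
The Sharpe ratios reported in sections 2.1 and 2.2 can be reproduced by hand from the returned tuples, assuming the library's default risk-free rate of 2% (no rate was passed explicitly in this notebook). The helper name `sharpe_ratio` below is introduced for illustration; the return and volatility figures are copied from the outputs above.

```python
RISK_FREE_RATE = 0.02  # assumed: PyPortfolioOpt's default risk-free rate

def sharpe_ratio(expected_return, volatility, risk_free_rate=RISK_FREE_RATE):
    """Excess return per unit of volatility."""
    return (expected_return - risk_free_rate) / volatility

# (expected return, volatility) pairs copied from the outputs above
min_vol = (0.14179158003982317, 0.168558968854053)
max_sharpe = (0.482755847508769, 0.260140749394534)

sr_min_vol = sharpe_ratio(*min_vol)
sr_max_sharpe = sharpe_ratio(*max_sharpe)
print("Minimum volatility portfolio:", round(sr_min_vol, 2))
print("Maximum Sharpe portfolio:", round(sr_max_sharpe, 2))
```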

### Closing Comments
We see that the portfolios constructed under MPT are very different from Berkshire Hathaway's actual portfolio, which is shown in the final illustration at the end of this discussion.  

Possible reasons include:
1. Evidently, Berkshire Hathaway does not construct its portfolio solely based on MPT!
2. MPT derives the "optimal portfolio" purely from past data, leaving out information such as fundamental news and earnings that a human analyst would consider.
3. Certain assumptions of MPT may be too strong. As an economic model it assumes that markets are efficient, and its implicit assumption that covariances remain constant may not always hold.

Thank you for reading!

<img src = "Images/Portfolio.png">