<a href="https://colab.research.google.com/github/zhong338/MFM-FM5222/blob/main/Week11_lecture_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FM 5222
# Week 11

## Factor Models 

We will continue our discussion of factor models, which take the general form:


$$R_{j,t} = \beta_{0,j} + \beta_{1,j}F_{1,t} + \beta_{2,j}F_{2,t} + \cdots + \beta_{p,j}F_{p,t} + \epsilon_{j,t}$$


where there are $n$ assets indexed by $j$, $p$ factors, and $n\times (p+1)$ factor loadings $\beta_{k,j}$.

We assume that the noises are uncorrelated (hopefully independent) both temporally 

$$\mathrm{Cov}(\epsilon_{j,t},\epsilon_{j,s}) = 0, t \neq s$$

and cross-sectionally

$$\mathrm{Cov}(\epsilon_{j,t},\epsilon_{i,t}) = 0, j \neq i$$


Furthermore, it is assumed that the factors are uncorrelated with the noise terms:


$$\mathrm{Cov}(F_{k,t}, \epsilon_{j,s}) = 0, \forall (k,j,t,s)$$


We can also use matrix notation.

Let $N$ be the number of observations (time periods) and let $F$ be the $p\times N$ matrix of observations of the factors.  We can then construct the data matrix

$$X = \begin{bmatrix}\mathbf{1}_c & F^T \\ \end{bmatrix}$$

where $\mathbf{1}_c$ is an $N$-dimensional column vector of ones.

Note that $X$ is $N \times (p+1)$.

We let

$$B_0 = [\beta_{0,1}, \beta_{0,2}, ..., \beta_{0,n}]$$

be the row vector representing the constant term of each of the $n$ assets.

And 




$$B = \begin{bmatrix} \mathbf{\beta}_1^T & \mathbf{\beta}_2^T & \cdots & \mathbf{\beta}_n^T \\ \end{bmatrix}   $$

where $\mathbf{\beta}_j^T$ is the column vector with elements $\beta_{k,j}, k>0$

Note that $B$ is $p \times n$


$R^T$ is the $N \times n$ matrix whose $j^{th}$ column corresponds to the observed returns of asset $j$.


$E$ is the $N \times n$ matrix-valued white noise random variable whose $j^{th}$ column is associated with asset $j$. The entries within each column are IID, and each column is uncorrelated with the other columns.


The model is then:


$$ R^T = X \begin{bmatrix}B_0 \\ B\\ \end{bmatrix}+E $$



Fitting this is no different from the case where $p = 1$; we use the pseudo-inverse:

$$\begin{bmatrix}\hat{B}_0 \\ \hat{B}\\ \end{bmatrix} = (X^TX)^{-1}X^T R^T$$
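The fitting formula can be sketched in NumPy as follows. This is only an illustration on simulated data: the sizes `N`, `p`, `n`, the "true" parameters, and the noise scale are all made-up assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p, n = 500, 3, 4   # observations, factors, assets (illustrative sizes)

# Simulated factor observations; F is p x N as in the notes.
F = rng.normal(size=(p, N))

# Made-up "true" parameters, used only to generate fake returns.
B0_true = rng.normal(scale=0.01, size=(1, n))   # 1 x n row of intercepts
B_true = rng.normal(size=(p, n))                # p x n loadings
E = rng.normal(scale=0.1, size=(N, n))          # N x n noise

# Data matrix X = [1_c  F^T], which is N x (p+1).
X = np.column_stack([np.ones(N), F.T])

# Returns R^T = X [B0; B] + E, which is N x n.
Rt = X @ np.vstack([B0_true, B_true]) + E

# Normal-equations estimate (X^T X)^{-1} X^T R^T, computed with solve()
# instead of forming the inverse explicitly.
coef_hat = np.linalg.solve(X.T @ X, X.T @ Rt)

# lstsq solves the same least-squares problem and is usually more stable.
coef_lstsq, *_ = np.linalg.lstsq(X, Rt, rcond=None)

B0_hat, B_hat = coef_hat[:1], coef_hat[1:]
print(B0_hat.shape, B_hat.shape)
print(np.allclose(coef_hat, coef_lstsq))
```

All $n$ assets are fitted in one call because each column of $R^T$ is regressed on the same data matrix $X$.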




### Agenda



* Fama-French 3 Factor model
* PCA discussion
* PCA with real data
* Short Comment
    * Cross-Sectional Factor Models
    * Statistical Factor Models




## Fama-French 3 factor model

The FF3 model is an extension of CAPM.  Instead of the broad market as the only factor, it also posits two other factors:

1. The excess returns of small cap stocks over large cap stocks
2. The excess returns of high book to market (value) stocks over low book to market (growth) stocks.


We will make our own "version" of this model by using quoted indices and 10 years of history to train over. 


#### Indices

* Risk-free rate: Federal Funds rate
* Broad Market: Wilshire5000 (^W5000)
* Large Cap: S&P500 (^GSPC)
* Small Cap: Russell2000 (^RUT)
* Value: Vanguard Value Index Fund (VTV)
* Growth: Vanguard Growth Index Fund (VUG)




The model will then be


$$R_{j,t} - rf_t = \beta_{1,j} (M_t - rf_t) + \beta_{2,j} (SC_t - LC_t) + \beta_{3,j} (V_t - G_t) + \epsilon_{j,t}$$


where in general the variables represent log-Returns.

We will fit (as before) to U.S. Bank, Pepsi, and Otter Tail.  But we will also add in Alcoa (AA) and Intuit (INTU).







In [None]:
! pip install yfinance


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import yfinance as yf
import pandas as pd
import pandas_datareader.data as dr



In [None]:
start = '2012-01-01'
end = '2022-01-01'


stickers = ['USB', 'PEP','OTTR',"AA", "INTU"  ]

itickers =['^W5000', '^GSPC', '^RUT', 'VTV',"VUG"] 


stocks = yf.download(stickers,start = start, end = end )

indices = yf.download(itickers,start = start, end = end )


rf = dr.DataReader(['DFF'], 'fred', start = start, end= end)

[*********************100%***********************]  5 of 5 completed
[*********************100%***********************]  5 of 5 completed


Now we make log-returns

In [None]:
Srets = np.log(stocks.Close).diff()

Irets = np.log(indices.Close).diff()


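# Note: FRED's DFF series is quoted in annualized percent (0.07 means 0.07%),
# so rf/252 is a daily rate in percent while the log-returns are decimals.
# rf/100/252 would put both on the same scale; the outputs shown in these
# notes were produced with rf/252 as written.
Srets['rfd'] = rf/252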

Irets['rfd'] = rf/252

data = pd.DataFrame()




data['OTTR_er'] =Srets['OTTR'] - Srets.rfd

data['PEP_er'] = Srets['PEP'] - Srets.rfd

data['USB_er'] = Srets['USB'] - Srets.rfd

data['AA_er'] = Srets['AA'] - Srets.rfd

data['INTU_er'] = Srets['INTU'] - Srets.rfd





data['M_er'] = Irets['^W5000'] - Irets.rfd

data['SMB'] = Irets['^RUT'] - Irets['^GSPC']

data['HML'] = Irets['VTV'] - Irets['VUG']

data = data.dropna()





In [None]:
data.head()

| Date | OTTR_er | PEP_er | USB_er | AA_er | INTU_er | M_er | SMB | HML |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2012-01-04 | -0.001628 | 0.00483 | -0.00064 | 0.023278 | -0.003332 | -0.000961 | -0.006857 | -0.001782 |
| 2012-01-05 | 0.00242 | -0.0081 | 0.014484 | -0.009847 | 0.004301 | 0.003602 | 0.003742 | 6e-05 |
| 2012-01-06 | -0.007488 | -0.012891 | -0.008532 | -0.021877 | 0.010701 | -0.002558 | -0.000895 | -0.002996 |
| 2012-01-09 | -0.002128 | 0.004869 | 0.017186 | 0.028732 | 0.000811 | 0.002136 | 0.002797 | 0.002866 |
| 2012-01-10 | 0.000588 | -0.001383 | 0.000391 | 0.000742 | 0.028778 | 0.009311 | 0.00596 | 0.000853 |


Now we can do our regressions and see what we get.

In [None]:
OTTRfit =  sm.OLS(data.OTTR_er,data.loc[:,['M_er',"SMB", 'HML'] ] ).fit()

OTTRfit.summary()

| | | | |
| --- | --- | --- | --- |
| Dep. Variable: | OTTR_er | R-squared (uncentered): | 0.445 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.444 |
| Method: | Least Squares | F-statistic: | 667.4 |
| Date: | Thu, 07 Apr 2022 | Prob (F-statistic): | 1.12e-318 |
| Time: | 22:40:22 | Log-Likelihood: | 7348.6 |
| No. Observations: | 2504 | AIC: | -14690.0 |
| Df Residuals: | 2501 | BIC: | -14670.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
| --- | --- | --- | --- | --- | --- | --- |
| M_er | 0.9569 | 0.024 | 39.271 | 0.000 | 0.909 | 1.005 |
| SMB | 0.3060 | 0.042 | 7.350 | 0.000 | 0.224 | 0.388 |
| HML | 0.5601 | 0.044 | 12.682 | 0.000 | 0.473 | 0.647 |

| | | | |
| --- | --- | --- | --- |
| Omnibus: | 631.765 | Durbin-Watson: | 2.195 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 43641.9 |
| Skew: | -0.031 | Prob(JB): | 0.0 |
| Kurtosis: | 23.452 | Cond. No. | 2.12 |


Let's compare with CAPM version (over the same time-period)

In [None]:
OTTRfitCAPM =  sm.OLS(data.OTTR_er,data.loc[:,['M_er'] ] ).fit()

OTTRfitCAPM.summary()

| | | | |
| --- | --- | --- | --- |
| Dep. Variable: | OTTR_er | R-squared (uncentered): | 0.387 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.387 |
| Method: | Least Squares | F-statistic: | 1581.0 |
| Date: | Thu, 07 Apr 2022 | Prob (F-statistic): | 2.19e-268 |
| Time: | 22:40:22 | Log-Likelihood: | 7225.2 |
| No. Observations: | 2504 | AIC: | -14450.0 |
| Df Residuals: | 2503 | BIC: | -14440.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
| --- | --- | --- | --- | --- | --- | --- |
| M_er | 0.9604 | 0.024 | 39.759 | 0.000 | 0.913 | 1.008 |

| | | | |
| --- | --- | --- | --- |
| Omnibus: | 611.589 | Durbin-Watson: | 2.198 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 37326.823 |
| Skew: | -0.03 | Prob(JB): | 0.0 |
| Kurtosis: | 21.915 | Cond. No. | 1.0 |
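Because these regressions are run without a constant, the summaries report the *uncentered* $R^2$, that is $1 - \sum_t e_t^2 / \sum_t y_t^2$. The sketch below computes it by hand for a three-factor fit and a market-only fit. The data are simulated: the factor scales, loadings, and the helper name `ols_no_intercept` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def ols_no_intercept(y, X):
    """Least-squares fit of y on X with no constant.

    Returns the coefficients and the uncentered R^2 = 1 - SSR / sum(y^2).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2_uncentered = 1.0 - (resid @ resid) / (y @ y)
    return beta, r2_uncentered

rng = np.random.default_rng(1)
N = 2500

# Fake daily factor series standing in for M_er, SMB and HML.
factors = rng.normal(scale=[0.01, 0.005, 0.005], size=(N, 3))
true_beta = np.array([1.0, 0.3, 0.5])   # made-up loadings
y = factors @ true_beta + rng.normal(scale=0.01, size=N)

beta_ff3, r2_ff3 = ols_no_intercept(y, factors)           # three factors
beta_capm, r2_capm = ols_no_intercept(y, factors[:, :1])  # market only

print(beta_ff3, r2_ff3)
print(beta_capm, r2_capm)
```

Since the market-only model is nested in the three-factor model, the in-sample $R^2$ cannot fall when the two extra factors are added; the adjusted $R^2$ and the t-statistics on the added loadings are the more informative comparison.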


#### Pepsi

In [None]:
Pepsifit =  sm.OLS(data.PEP_er,data.loc[:,['M_er',"SMB", 'HML'] ] ).fit()

PepsifitCAPM =  sm.OLS(data.PEP_er,data.loc[:,['M_er'] ] ).fit()

print(Pepsifit.summary())

print(PepsifitCAPM.summary())


                                 OLS Regression Results                                
Dep. Variable:                 PEP_er   R-squared (uncentered):                   0.534
Model:                            OLS   Adj. R-squared (uncentered):              0.533
Method:                 Least Squares   F-statistic:                              954.4
Date:                Thu, 07 Apr 2022   Prob (F-statistic):                        0.00
Time:                        22:40:22   Log-Likelihood:                          8534.5
No. Observations:                2504   AIC:                                 -1.706e+04
Df Residuals:                    2501   BIC:                                 -1.705e+04
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

#### US Bank

In [None]:
USBfit =  sm.OLS(data.USB_er,data.loc[:,['M_er',"SMB", 'HML'] ] ).fit()

USBfitCAPM =  sm.OLS(data.USB_er,data.loc[:,['M_er'] ] ).fit()

print(USBfit.summary())

print(USBfitCAPM.summary())

                                 OLS Regression Results                                
Dep. Variable:                 USB_er   R-squared (uncentered):                   0.717
Model:                            OLS   Adj. R-squared (uncentered):              0.717
Method:                 Least Squares   F-statistic:                              2117.
Date:                Thu, 07 Apr 2022   Prob (F-statistic):                        0.00
Time:                        22:40:22   Log-Likelihood:                          8297.8
No. Observations:                2504   AIC:                                 -1.659e+04
Df Residuals:                    2501   BIC:                                 -1.657e+04
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

#### Alcoa

In [None]:
AAfit =  sm.OLS(data.AA_er,data.loc[:,['M_er',"SMB", 'HML'] ] ).fit()

AAfitCAPM =  sm.OLS(data.AA_er,data.loc[:,['M_er'] ] ).fit()

print(AAfit.summary())

print(AAfitCAPM.summary())

                                 OLS Regression Results                                
Dep. Variable:                  AA_er   R-squared (uncentered):                   0.393
Model:                            OLS   Adj. R-squared (uncentered):              0.392
Method:                 Least Squares   F-statistic:                              538.9
Date:                Thu, 07 Apr 2022   Prob (F-statistic):                   3.94e-270
Time:                        22:40:22   Log-Likelihood:                          5881.0
No. Observations:                2504   AIC:                                 -1.176e+04
Df Residuals:                    2501   BIC:                                 -1.174e+04
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

#### Intuit

In [None]:
INTUfit =  sm.OLS(data.INTU_er,data.loc[:,['M_er',"SMB", 'HML'] ] ).fit()

INTUfitCAPM =  sm.OLS(data.INTU_er,data.loc[:,['M_er'] ] ).fit()

print(INTUfit.summary())

print(INTUfitCAPM.summary())

                                 OLS Regression Results                                
Dep. Variable:                INTU_er   R-squared (uncentered):                   0.592
Model:                            OLS   Adj. R-squared (uncentered):              0.592
Method:                 Least Squares   F-statistic:                              1211.
Date:                Thu, 07 Apr 2022   Prob (F-statistic):                        0.00
Time:                        22:40:22   Log-Likelihood:                          7762.2
No. Observations:                2504   AIC:                                 -1.552e+04
Df Residuals:                    2501   BIC:                                 -1.550e+04
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

## PCA discussion


Previously, when discussing PCA, I indicated that you should standardize data before performing PCA.   That is a safe thing to do, but it isn't always necessary; it really depends on whether the scales are disproportionate.  If they are not, then standardizing can (potentially) make interpretation more difficult.

But let's construct a (fake data) example to illustrate when standardizing might make sense.


### Height, Weight, and Salary


We imagine a data set where we have tracked the heights, weights and salaries of $1000$ recent college graduates.  We can imagine the following marginal distributions for each:

Height (measured in inches):  $\mu_H = 67, \sigma_H = 3$

Weight (measured in LBS):  $\mu_W = 170, \sigma_W = 30$

Salary (measured in Dollars):  $\mu_S = 60000, \sigma_S = 10000$


But such data will be correlated.  So we imagine the following correlation matrix:


$$Corr = \begin{pmatrix}1& 0.7 & 0.4\\ 0.7& 1 & -0.3 \\ 0.4& -0.3& 1 \\ \end{pmatrix}$$ 


Now, let's generate this data.

In [None]:
mean = np.array([67,170, 60000])

vols = np.array([3,30,10000])

corr = np.array([[1,.7,.4],[.7,1,-.3], [.4,-.3,1]])

cov = np.diag(vols)@corr@np.diag(vols)

X = np.random.multivariate_normal(mean = mean, cov =cov, size = 1000)

X.shape

(1000, 3)

PCA is performed on the Covariance matrix of the data, let's call it $C$

In [None]:
C = np.cov(X.T)

C

array([[ 9.20560269e+00,  6.56796896e+01,  1.19385898e+04],
       [ 6.56796896e+01,  9.14648910e+02, -8.35582470e+04],
       [ 1.19385898e+04, -8.35582470e+04,  9.67443050e+07]])

In [None]:
# Get eigenvalue and vectors

evals, evects = np.linalg.eig(C)



We sort the eigenvalues by size to see where the "variance" comes from

In [None]:
evals_sorted = evals[np.argsort(-evals)]

evect_sorted = evects[:,np.argsort(-evals)]


evals_sorted.cumsum()/evals_sorted.sum()

array([0.99999121, 0.99999999, 1.        ])

Now take a look at the first eigenvector.   The conclusion is that only salary matters.  

In [None]:
evect_sorted[:,0]


array([ 1.23402826e-04, -8.63709068e-04,  9.99999619e-01])

But suppose we have first standardized the data

In [None]:
Xstand = (X - X.mean( axis = 0))/X.std(axis=0)

Cstand = np.cov(Xstand.T)
evals, evects = np.linalg.eig(Cstand)


evals_sorted = evals[np.argsort(-evals)]

evect_sorted = evects[:,np.argsort(-evals)]


evals_sorted.cumsum()/evals_sorted.sum()



array([0.57609348, 0.98622915, 1.        ])

Now we see perhaps 2 important vectors.  What do they look like?

In [None]:
evect_sorted[:,0]

array([-0.73317971, -0.66404074, -0.14661998])

This is very different, and all three seem to matter.

Let's look at the second vector.

In [None]:
evect_sorted[:,1]

array([ 0.20882458, -0.42503819,  0.8807581 ])

comments?
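One way to read the standardized result: PCA on standardized data is PCA on the sample correlation matrix. The sketch below checks this on freshly simulated data with the same means, volatilities, and correlations. It assumes `ddof=1` in the standardization so that it matches `np.cov` exactly; with the default `ddof=0` used above, the two matrices differ only by the factor $N/(N-1)$, which leaves the eigenvectors and the explained-variance ratios unchanged. It also uses `np.linalg.eigh`, which is intended for symmetric matrices, in place of `np.linalg.eig`.

```python
import numpy as np

rng = np.random.default_rng(2)

vols = np.array([3.0, 30.0, 10000.0])
corr = np.array([[1.0, 0.7, 0.4],
                 [0.7, 1.0, -0.3],
                 [0.4, -0.3, 1.0]])
cov = np.diag(vols) @ corr @ np.diag(vols)
X = rng.multivariate_normal(mean=[67, 170, 60000], cov=cov, size=1000)

# Standardize with ddof=1 so the result lines up with np.cov (also ddof=1).
Xstand = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Cstand = np.cov(Xstand.T)
sample_corr = np.corrcoef(X.T)

print(np.allclose(Cstand, sample_corr))

# eigh returns eigenvalues in ascending order, so reverse for largest-first.
evals, evects = np.linalg.eigh(sample_corr)
evals, evects = evals[::-1], evects[:, ::-1]
explained = evals.cumsum() / evals.sum()
print(explained)
```

The eigenvalues of a correlation matrix always sum to the number of variables (here 3), so no single variable can dominate purely because of its units.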

#### Similar orders of magnitude

Standardizing forces things to be measured in similar orders of magnitude.  But sometimes you don't have to standardize exactly if the orders of magnitude are comparable.  Let's take our example but change the units of measurement:

* inches to inches
* lbs to kilograms
* dollars to kilodollars


In [None]:
Y = X* np.array([1,1/2.2,1/1000])

Cy = np.cov(Y.T)

evals, evects = np.linalg.eig(Cy)


evals_sorted = evals[np.argsort(-evals)]

evect_sorted = evects[:,np.argsort(-evals)]


evals_sorted.cumsum()/evals_sorted.sum()



array([0.69715718, 0.99724508, 1.        ])

In [None]:
evect_sorted

array([[-0.12391435,  0.24464588,  0.96166191],
       [-0.94105931,  0.27840677, -0.19208603],
       [ 0.31472624,  0.92878311, -0.19572769]])

This is a different result. Neither is right or wrong.  In particular, we do not know in advance whether the response variable is tied to any or all of them.
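The three variants (raw units, rescaled units, standardized) can be compared side by side. Changing units multiplies each column by a constant, so a covariance matrix $C$ becomes $SCS$ with $S$ the diagonal matrix of scale factors. The sketch below works with the *population* covariance of the example rather than a random sample, so its numbers will differ slightly from the sample outputs above; the helper name `explained_variance` is an illustrative choice.

```python
import numpy as np

def explained_variance(C):
    """Cumulative share of total variance by principal component, largest first."""
    evals = np.linalg.eigvalsh(C)[::-1]   # eigvalsh returns ascending order
    return evals.cumsum() / evals.sum()

vols = np.array([3.0, 30.0, 10000.0])            # inches, lbs, dollars
corr = np.array([[1.0, 0.7, 0.4],
                 [0.7, 1.0, -0.3],
                 [0.4, -0.3, 1.0]])
cov = np.diag(vols) @ corr @ np.diag(vols)

# Unit change: inches stay inches, lbs -> kg, dollars -> kilodollars.
scale = np.array([1.0, 1 / 2.2, 1 / 1000])
S = np.diag(scale)
cov_rescaled = S @ cov @ S

ev_raw = explained_variance(cov)
ev_rescaled = explained_variance(cov_rescaled)
ev_corr = explained_variance(corr)

print("raw units:    ", ev_raw)
print("rescaled:     ", ev_rescaled)
print("standardized: ", ev_corr)
```

The correlation matrix is the same in all three cases; only the weighting of the variables changes, which is why the choice of scaling is a modeling decision rather than a correctness issue.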




### Fama-French PCA angle

Suppose that instead of choosing in advance the differences we were going to look at, we just wanted to evaluate which of the market indices to use together and in what way.    We would intuitively wish to do so because we know in advance that the market index returns will be highly correlated.

In [None]:
# recall

Irets.head()

| Date | VTV | VUG | ^GSPC | ^RUT | ^W5000 | rfd |
| --- | --- | --- | --- | --- | --- | --- |
| 2012-01-03 | NaN | NaN | NaN | NaN | NaN | 0.000278 |
| 2012-01-04 | -0.000188 | 0.001595 | 0.000188 | -0.006669 | -0.000683 | 0.000278 |
| 2012-01-05 | 0.003559 | 0.003499 | 0.002939 | 0.006682 | 0.00388 | 0.000278 |
| 2012-01-06 | -0.002996 | 0.0 | -0.00254 | -0.003435 | -0.00228 | 0.000278 |
| 2012-01-09 | 0.003183 | 0.000318 | 0.002259 | 0.005056 | 0.002453 | 0.000317 |


So we drop the first row and perform PCA

In [None]:
X = np.array(Irets.dropna())

C = np.cov(X.T)

C.shape

(6, 6)

In [None]:
evals, evects = np.linalg.eig(C)

evals_sorted = evals[np.argsort(-evals)]

evect_sorted = evects[:,np.argsort(-evals)]


evals_sorted.cumsum()/evals_sorted.sum()


array([0.91153659, 0.95860811, 0.98363612, 0.99846613, 0.99966202,
       1.        ])

Consistent with Fama-French, the PCA suggests we only need three eigenvectors.  Let's see what they look like.

In [None]:
# First one and apparently most important
evect_sorted[:,0]

array([ 0.40688018,  0.44090729,  0.42469571,  0.52102523,  0.43382727,
       -0.00307417])

Roughly an equal weighting of all of the indices.

In [None]:
# Second one

evect_sorted[:,1]

array([-0.01302425, -0.51643598, -0.27167669,  0.7970397 , -0.15431243,
       -0.01521245])

This is roughly the difference between Russell2000 and (mostly) a combination of growth and large cap.

In [None]:
# Third one

evect_sorted[:,2]


array([ 0.78715901, -0.53051387,  0.13611315, -0.2832153 ,  0.00771621,
       -0.0117681 ])

Roughly the difference between Value and (mostly) a combination of Growth and Russell2000
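To use such eigenvectors as factors, one would project the (centered) returns onto them. The sketch below does this on simulated data: the fake "index returns" share one common driver, and the loadings and noise scales are made-up assumptions. It also checks a standard property of PCA: the projected series are uncorrelated, with variances equal to the corresponding eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2500

# Fake index returns: one common market driver plus smaller idiosyncratic parts.
market = rng.normal(scale=0.01, size=(N, 1))
loadings = np.array([[1.0, 1.1, 1.0, 1.2, 1.0]])   # made-up exposures
X = market @ loadings + rng.normal(scale=0.003, size=(N, 5))

Xc = X - X.mean(axis=0)
C = np.cov(Xc.T)

evals, evects = np.linalg.eigh(C)       # symmetric eigen-solver
order = np.argsort(-evals)              # largest eigenvalue first
evals, evects = evals[order], evects[:, order]

k = 3
scores = Xc @ evects[:, :k]             # N x k series of principal-component "factors"
score_cov = np.cov(scores.T)

print(evals.cumsum() / evals.sum())
print(np.allclose(score_cov, np.diag(evals[:k]), atol=1e-12))
```

Note that the sign of each eigenvector is arbitrary, so an interpretation such as "small cap minus large cap" holds only up to an overall sign flip of the component.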

## Cross-Sectional Factor Models

So far, we have focused on so-called *time-series* factor models.


$$R_{j,t} = \beta_{0,j} + \beta_{1,j}F_{1,t} + \beta_{2,j}F_{2,t} + \cdots + \beta_{p,j}F_{p,t} + \epsilon_{j,t}$$


We are assuming that 

* The loadings are unique to each stock and time-invariant
* The factors are observable and not linked to particular stocks



**We regress for each stock. To get more data for fitting, you need more history.**


There is another class of factor models that, instead of looking at time series, look at a slice of time across many stocks.   These models can be written:



$$R_{j,t} = \beta_{0,t} + \beta_{1,t}F_{1,j,t} + \beta_{2,t}F_{2,j,t} + \cdots + \beta_{p,t}F_{p,j,t} + \epsilon_{j,t}$$


What we see here is that

* The loadings are unique to the time slice, not any particular stock
* The factors are observable and linked to a particular stock


**We regress for each time-slice.  To get more data for fitting, you need more stocks.**


As an example, one might model the returns of a stock in a given time period as a linear combination of the stock's price-to-earnings ratio (P/E), book-to-market ratio (B/M), size (large, mid, small), and sector (financial, consumer, utilities, etc.).

So for 2019, the fitted model might be:

$R_{j} = .01 + .02\, \mathrm{PE} - .01\, \mathrm{BM} + .001\, \mathrm{L} - .002\, \mathrm{M} - .004\, \mathrm{small} + .02\, \mathrm{finance} + .011\, \mathrm{consumer} + \cdots$


The fitting process is still typically least squares.  And of course, one might use PCA to reduce the number of features (explanatory variables).
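A minimal sketch of one cross-sectional fit, for a single time slice, might look as follows. Everything here is simulated: the characteristics, the coefficient values, and the number of stocks are illustrative assumptions. One implementation detail that the schematic formula glosses over: when the model includes an intercept, one category of each categorical variable (here "large") is dropped so the dummy columns are not collinear with the constant.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stocks = 400

# Hypothetical stock characteristics observed for one time slice.
pe = rng.normal(size=n_stocks)                 # standardized P/E
bm = rng.normal(size=n_stocks)                 # standardized B/M
size = rng.integers(0, 3, size=n_stocks)       # 0 = large, 1 = mid, 2 = small

# Dummy coding with "large" as the baseline category.
mid = (size == 1).astype(float)
small = (size == 2).astype(float)

X = np.column_stack([np.ones(n_stocks), pe, bm, mid, small])

# Made-up loadings for this slice, used only to generate fake returns.
beta_true = np.array([0.01, 0.02, -0.01, -0.002, -0.004])
R = X @ beta_true + rng.normal(scale=0.05, size=n_stocks)

# One least-squares fit across stocks gives the loadings for this time slice.
beta_hat, *_ = np.linalg.lstsq(X, R, rcond=None)
resid = R - X @ beta_hat

print(beta_hat)
```

Repeating this fit for each time slice produces a time series of estimated loadings $\hat\beta_{k,t}$.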



## Statistical Factor Models


We observed earlier that, for a factor model, we have

$$\mathrm{Cov}(R) = B^T \Sigma_{F} B + \Sigma_{\epsilon}$$


We also noted that a factor model can be useful for estimating $\mathrm{Cov}(R)$ since otherwise we are left attempting to estimate a large number of parameters.
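To make the parameter saving concrete, the sketch below counts free parameters for an unrestricted covariance matrix versus the factor form, and assembles $\mathrm{Cov}(R)$ from its pieces. It assumes $\Sigma_{\epsilon}$ is diagonal (from the cross-sectional uncorrelatedness of the noise); the sizes $n = 500$, $p = 3$ and the random inputs are illustrative.

```python
import numpy as np

def n_params_full(n):
    """Free parameters in an unrestricted symmetric n x n covariance matrix."""
    return n * (n + 1) // 2

def n_params_factor(n, p):
    """Loadings (n*p) + factor covariance (p(p+1)/2) + diagonal noise variances (n)."""
    return n * p + p * (p + 1) // 2 + n

n, p = 500, 3
print(n_params_full(n), n_params_factor(n, p))

rng = np.random.default_rng(5)
B = rng.normal(size=(p, n))                        # p x n loadings
M = rng.normal(size=(p, p))
Sigma_F = M @ M.T + np.eye(p)                      # some positive-definite factor covariance
Sigma_eps = np.diag(rng.uniform(0.5, 1.5, size=n)) # diagonal noise covariance

cov_R = B.T @ Sigma_F @ B + Sigma_eps              # n x n
print(cov_R.shape)
```

For these sizes the unrestricted matrix has 125250 free parameters while the factor form has 2006.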

This is great if

1) We can observe the variables $F$

and

2) The model corresponds relatively well with reality.


But getting both of those things to be true at once can be tricky.  Often, it is easy enough to specify a model, but we might find via diagnostics that the residuals are correlated with the factors, a sign that we have missed something.


But what if the model is true in some sense, but we just don't know how to specify the factors?  For example, maybe we only have the return series for $n$ stocks.  We might perform PCA and conclude that there are really only 3-5 meaningful drivers (factors) of stock returns, but we might become frustrated in identifying them with something easily observed.

Of course, we could *define* the factors to be the eigenvector combinations we learned from the PCA.  But the process for doing so required first estimating the correlation matrix from the data.  And we already noted that this might be problematic.


What can we do?

### Assume there is a factor model with $p$ factors

We can't observe the factors, so we can treat them as latent variables.  This is similar to mixture models when we cannot observe the mixing variable (e.g. the coin flip for a mixture of two Gaussians). 

Starting with the model, 

$$ R^T = \begin{bmatrix}\mathbf{1}_c & F^T \\ \end{bmatrix} \begin{bmatrix}B_0 \\ B\\ \end{bmatrix}+E $$

we have 

$$\mathrm{Cov}(R) = B^T \Sigma_{F} B + \Sigma_{\epsilon}$$

Let's make some important observations that can simplify things.


#### (Invertible) linear transformations of $F$ don't change the model

Replace $F$ with $AF$ and $B$ with $A^{-T}B$ and we have the exact same model.

This means in particular that $\Sigma_{F}$ only needs to be specified up to a congruence transformation ($\Sigma_{F} \to A\Sigma_{F}A^T$).


But covariance matrices (assuming they are full-rank) can always be transformed to the identity matrix via a congruence transformation.  Hence we can equally well specify that


$$\mathrm{Cov}(R) = B^T B + \Sigma_{\epsilon}$$

We are still assuming our model, but the transformed factors have a covariance matrix equal to the identity matrix.
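This normalization can be checked numerically. The sketch below picks $A$ from a Cholesky factorization of $\Sigma_F$ (one convenient choice among many), so that the transformed factors $AF$ have identity covariance, and confirms that the implied factor part of $\mathrm{Cov}(R)$ is unchanged. The dimensions and random matrices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n = 3, 6

B = rng.normal(size=(p, n))              # p x n loadings
M = rng.normal(size=(p, p))
Sigma_F = M @ M.T + np.eye(p)            # some full-rank factor covariance

L = np.linalg.cholesky(Sigma_F)          # Sigma_F = L L^T
A = np.linalg.inv(L)                     # transform factors: F -> A F

Sigma_F_new = A @ Sigma_F @ A.T          # covariance of A F
B_new = np.linalg.inv(A).T @ B           # loadings: B -> A^{-T} B  (equals L^T B)

print(np.allclose(Sigma_F_new, np.eye(p)))
print(np.allclose(B.T @ Sigma_F @ B, B_new.T @ B_new))

# Even after normalizing to identity covariance, B is not unique:
# any orthogonal Q gives (Q B)^T (Q B) = B^T B.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
B_rot = Q @ B_new
print(np.allclose(B_rot.T @ B_rot, B_new.T @ B_new))
```

The last check is a reminder that statistical factors are identified only up to a rotation, which is one reason they can be hard to interpret.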

And we still cannot observe the values of $F$ and we still don't have estimates for $B$.

Having said that, $B$ is $p\times n$ and we are assuming that $p \ll n$, so if we *assume* the covariance matrix satisfies the form above, it should be easier to fit than to directly estimate $\mathrm{Cov}(R)$.

How can we do this?


There are a few ways in the literature, but I will mention one that we have seen before:  The EM algorithm.

The EM algorithm starts with an estimate of the parameters $B$ ($np$ of them) and $\Sigma_{\epsilon}$ ($n$ of them, since it is diagonal).

It then calculates the distribution of the *unobserved* values $F$ given the return data and the assumed parameter values.

Using this distribution, it then calculates the expected log-likelihood of the observed data together with the unobserved factors, viewed as a function of a set of parameters $(B, \Sigma_{\epsilon})$.

That is the $E$ step.

The $M$ step is to find the parameters $(B, \Sigma_{\epsilon})$ that maximize this value.  These are the next estimates for the parameters.

Iterate until convergence.
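A compact sketch of these iterations is given below. It uses the standard closed-form EM updates for a Gaussian factor model, which additionally assumes that the factors and the noise are jointly Gaussian (the notes do not state this explicitly). The function name `factor_em`, the starting values, the fixed iteration count, and the simulated data are all illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def factor_em(R, p, n_iter=300, seed=0):
    """EM for R_t = mean + B^T f_t + eps_t with Cov(f) = I and diagonal noise.

    R is (N, n): rows are dates, columns are assets.
    Returns B (p, n), the noise variances psi (n,), and the log-likelihood path.
    """
    rng = np.random.default_rng(seed)
    N, n = R.shape
    Rc = R - R.mean(axis=0)
    S = Rc.T @ Rc / N                                   # sample covariance (1/N scaling)

    # Starting guess: random loadings, noise variances set to sample variances.
    Lam = 0.5 * np.sqrt(np.diag(S))[:, None] * rng.normal(size=(n, p))  # Lam = B^T
    psi = np.diag(S).copy()
    loglik = []

    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(psi)              # model-implied Cov(R)
        _, logdet = np.linalg.slogdet(Sigma)
        trace_term = np.trace(np.linalg.solve(Sigma, S))
        loglik.append(-0.5 * N * (n * np.log(2 * np.pi) + logdet + trace_term))

        # E step: moments of the latent factors given data and current parameters.
        beta = np.linalg.solve(Sigma, Lam).T            # (p, n), equals Lam^T Sigma^{-1}
        Eff = np.eye(p) - beta @ Lam + beta @ S @ beta.T  # average of E[f f^T | R_t]

        # M step: closed-form maximizers of the expected complete-data log-likelihood.
        Lam = S @ beta.T @ np.linalg.inv(Eff)
        psi = np.diag(S - Lam @ beta @ S).copy()

    return Lam.T, psi, np.array(loglik)

# Simulated example with made-up dimensions and parameters.
rng = np.random.default_rng(7)
N, n, p = 4000, 8, 2
B_true = rng.normal(size=(p, n))
psi_true = rng.uniform(0.5, 1.5, size=n)
F = rng.normal(size=(N, p))
R = F @ B_true + rng.normal(size=(N, n)) * np.sqrt(psi_true)

B_hat, psi_hat, loglik = factor_em(R, p)
cov_model = B_hat.T @ B_hat + np.diag(psi_hat)
cov_sample = np.cov(R.T)
print(np.abs(cov_model - cov_sample).max())
```

The fitted $\hat{B}$ should be compared with the truth only through $\hat{B}^T\hat{B}$, since the loadings are identified only up to an orthogonal rotation. In practice one would also stop on a convergence tolerance for the log-likelihood instead of running a fixed number of iterations.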


**Comment**:  Of course, one could attempt to directly estimate via MLE.  But in practice, this is much harder.










