## MACHINE LEARNING: LINEAR MIXED EFFECTS

Linear Mixed Effects (LME) models are used for regression analyses involving dependent data. Such data arise in longitudinal and other study designs in which multiple observations are made on each subject. Some specific linear mixed effects models are:

- Random intercepts models, where all responses in a group are additively shifted by a value that is specific to the group.

- Random slopes models, where the responses in a group follow a (conditional) mean trajectory that is linear in the observed covariates, with the slopes (and possibly intercepts) varying by group.

- Variance components models, where the levels of one or more categorical covariates are associated with draws from distributions. These random terms additively determine the conditional mean of each observation based on its covariate values.

The statsmodels implementation of LME is primarily group-based, meaning that random effects must be independently realized for responses in different groups. There are two types of random effects in the statsmodels implementation of mixed models: (i) random coefficients (possibly vectors) that have an unknown covariance matrix, and (ii) random coefficients that are independent draws from a common univariate distribution. For both (i) and (ii), the random effects influence the conditional mean of a group through their matrix/vector product with a group-specific design matrix.

A simple example of random effects, as in (i), is
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \gamma_{0i} + \gamma_{1i} X_{ij} + \epsilon_{ij}$$

Here $Y_{ij}$ is the $j$-th measured response for subject $i$, and $X_{ij}$ is a covariate for this response. The "fixed effects" $\beta_0$ and $\beta_1$ are shared by all subjects. The "random effects" $\gamma_{0i}$ and $\gamma_{1i}$ follow a bivariate distribution with mean zero, described by three parameters: $\text{var}(\gamma_{0i})$, $\text{var}(\gamma_{1i})$, and $\text{cov}(\gamma_{0i}, \gamma_{1i})$. The errors $\epsilon_{ij}$ are independent of the random effects.
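To make the random intercept and slope model concrete, the sketch below simulates data from the equation above and then fits it with `smf.mixedlm`, passing `re_formula="~x"` to request a random intercept and a random slope per group. All numeric values (group count, fixed effects, covariance matrix, noise level) and the names `sim`, `true_cov`, and `result` are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

n_groups, n_obs = 50, 12          # illustrative sizes (assumed)
beta0, beta1 = 2.0, 1.5           # fixed effects shared by all groups (assumed)
# Covariance of (gamma_0i, gamma_1i): two variances and one covariance (assumed)
true_cov = np.array([[1.0, 0.2],
                     [0.2, 0.25]])
sigma = 0.5                       # residual standard deviation (assumed)

rows = []
for i in range(n_groups):
    # One draw of the random intercept and slope for group i
    gamma0, gamma1 = rng.multivariate_normal([0.0, 0.0], true_cov)
    x = np.linspace(0.0, 5.0, n_obs)
    eps = rng.normal(0.0, sigma, size=n_obs)
    y = beta0 + beta1 * x + gamma0 + gamma1 * x + eps
    rows.append(pd.DataFrame({"y": y, "x": x, "group": i}))
sim = pd.concat(rows, ignore_index=True)

# re_formula="~x" adds a random slope on x alongside the random intercept
model = smf.mixedlm("y ~ x", sim, groups=sim["group"], re_formula="~x")
result = model.fit()

print(result.fe_params)   # estimates of beta_0 and beta_1
print(result.cov_re)      # estimated 2x2 covariance of the random effects
```

With enough groups, the estimated fixed effects and the entries of `result.cov_re` should land near the values used in the simulation, although the variance estimates are typically much noisier than the fixed-effect estimates.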

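A simple example of variance components, as in (ii), is
$$Y_{ijk} = \beta_0 + \eta_{1i} + \eta_{2j} + \epsilon_{ijk}$$

Here $Y_{ijk}$ is the $k$-th measured response under conditions $i, j$. The only mean structure parameter is $\beta_0$. The $\eta_{1i}$ are independent zero-mean draws sharing one variance, and the $\eta_{2j}$ are independent zero-mean draws sharing another.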
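As a rough illustration of the variance components equation, the following sketch simulates a fully crossed design using only the Python standard library and compares the empirical variance of the responses with the sum of the component variances. The standard deviations `tau1`, `tau2`, `sigma`, the value of `beta0`, and the design sizes are assumptions chosen for illustration, and normal distributions are assumed for every random term.

```python
import random
import statistics

random.seed(0)

n_i, n_j = 200, 200                 # number of levels of each factor (assumed)
beta0 = 10.0                        # overall mean (assumed)
tau1, tau2, sigma = 2.0, 1.0, 0.5   # standard deviations of eta_1, eta_2, epsilon (assumed)

# One random draw per level of each categorical covariate
eta1 = [random.gauss(0.0, tau1) for _ in range(n_i)]
eta2 = [random.gauss(0.0, tau2) for _ in range(n_j)]

# One observation (k = 1) for every combination of levels i, j
y = [beta0 + eta1[i] + eta2[j] + random.gauss(0.0, sigma)
     for i in range(n_i) for j in range(n_j)]

# Under independence, the marginal variance is the sum of the components
expected_var = tau1**2 + tau2**2 + sigma**2
print(statistics.pvariance(y), expected_var)
```

The two printed values agree only approximately, because each component variance is estimated from a finite number of levels. Observations that share a level of $i$ or $j$ also share the corresponding draw, which is what makes them dependent.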

For more information, see the [statsmodels documentation](https://www.statsmodels.org/dev/mixed_linear.html).

In [2]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [6]:
# Download the dietox pig growth data from the R package geepack (needs network access)
data = sm.datasets.get_rdataset("dietox", "geepack").data
data.head(2)

| | Weight | Feed | Time | Pig | Evit | Cu | Litter |
|---|---|---|---|---|---|---|---|
| 0 | 26.5 | NaN | 1 | 4601 | 1 | 1 | 1 |
| 1 | 27.59999 | 5.200005 | 2 | 4601 | 1 | 1 | 1 |


In [4]:
# Random intercepts model: Weight grows linearly in Time, with a separate additive shift per pig
md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
mdf = md.fit()

In [5]:
print(mdf.summary())

         Mixed Linear Model Regression Results
Model:            MixedLM Dependent Variable: Weight    
No. Observations: 861     Method:             REML      
No. Groups:       72      Scale:              11.3669   
Min. group size:  11      Likelihood:         -2404.7753
Max. group size:  12      Converged:          Yes       
Mean group size:  12.0                                  
--------------------------------------------------------
             Coef.  Std.Err.    z    P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept    15.724    0.788  19.952 0.000 14.179 17.268
Time          6.943    0.033 207.939 0.000  6.877  7.008
Group Var    40.394    2.149                            
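
One way to read the variance estimates in this summary is to compute the share of the variance around the fitted trend that is attributable to differences between pigs (the intraclass correlation). The sketch below copies the two numbers from the table above; the variable names are illustrative, and on a fitted result the same quantities should be available as `mdf.cov_re` and `mdf.scale`.

```python
# Values copied from the summary table above
group_var = 40.394    # "Group Var": variance of the random intercepts
scale = 11.3669       # "Scale": residual variance

# Share of the variance around the fitted trend that comes from
# pig-to-pig differences (intraclass correlation)
icc = group_var / (group_var + scale)
print(round(icc, 3))
```

The result is about 0.78, which suggests that most of the variation left after accounting for `Time` comes from stable differences between pigs rather than from observation-level noise.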

