Statsmodels linear mixed models: crossed effects
------------------------------------------------------

The Statsmodels linear mixed model class ``MixedLM`` is designed for grouped data, in which observations in different groups are independent.  It can also fit models with crossed effects: put all observations into a single group and specify each crossed factor as a variance component.

In [0]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

We will illustrate fitting a linear mixed model with crossed effects using simulated data.  Imagine a study of student test scores in which the goal is to separate teacher effects from student effects.  Each student takes classes from several teachers, and each teacher teaches several students.  Teachers are not nested within students and students are not nested within teachers, so there is no way to split the data into groups such that observations in different groups are independent.
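One way to make the last point concrete is to view students and teachers as the two sides of a bipartite graph, with an edge for every observation linking a student to a teacher.  Observations can only be split into independent groups along the connected components of this graph, because two observations in different components share no student and no teacher.  The sketch below counts those components with a small union-find structure.  The helper ``count_components`` and the toy assignments are hypothetical illustrations, not part of statsmodels or of the simulation that follows.

```python
def count_components(pairs):
    """Count connected components of the bipartite student-teacher graph.

    ``pairs`` holds one (student, teacher) label pair per observation.
    Observations in different components share no student or teacher,
    so only whole components could serve as independent groups.
    """
    parent = {}

    def find(node):
        # Walk up to the root, halving the path as we go
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for student, teacher in pairs:
        # Tag the labels so that student 0 and teacher 0 are distinct vertices
        union(("student", student), ("teacher", teacher))

    return len({find(node) for node in list(parent)})


# Hypothetical crossed design: students 0-3 share teachers 0-2 in overlapping ways
crossed = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2), (3, 1), (3, 2)]
# Hypothetical nested design: each student only ever has one teacher
nested = [(0, 0), (0, 0), (1, 0), (2, 1), (3, 1), (3, 1)]

print(count_components(crossed))  # 1: no split into independent groups
print(count_components(nested))   # 2: students nested within teachers
```

In the nested toy design the two components are independent of each other, while in the crossed one everything is linked, which is why the crossed model puts all rows into a single group.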

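The code in the following cell simulates the assignments of students to teachers.  Each student takes the same number of courses, each teacher fills the same number of student-course slots, and teachers are assigned to those slots at random.  Because the assignment is random, a student may occasionally take more than one course from the same teacher.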

In [0]:
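n_students = 200
n_courses = 5   # courses taken by each student
n_teachers = 50
n_obs = n_students * n_courses   # one observation per student-course pair

# Each teacher fills n_obs // n_teachers slots (integer division: np.ones needs an integer size)
teachers = np.kron(np.arange(n_teachers), np.ones(n_obs // n_teachers)).astype(np.int32)
# Shuffle the slots so that teachers are assigned to students at random
ii = np.random.permutation(len(teachers))
teachers = teachers[ii]
# Student i occupies n_courses consecutive rows
students = np.kron(np.arange(n_students), np.ones(n_courses)).astype(np.int32)
df = pd.DataFrame(index=range(n_obs))
df["teachers"] = teachers
df["students"] = students
# Every observation belongs to the same single group
df["groups"] = 1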
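A few pandas checks can confirm that the simulated design has the intended balance and is genuinely crossed, and can count how many students drew the same teacher twice.  To run on its own, the sketch rebuilds the same design with ``np.repeat`` and a seeded ``numpy.random.default_rng``; both are choices made for this example, not part of the original cell, so the exact counts will differ from the notebook's unseeded run.

```python
import numpy as np
import pandas as pd

n_students, n_courses, n_teachers = 200, 5, 50
n_obs = n_students * n_courses

rng = np.random.default_rng(12345)  # seeded only to make this check reproducible
# Same construction as above: equal teacher slots, shuffled; n_courses rows per student
teachers = rng.permutation(np.repeat(np.arange(n_teachers), n_obs // n_teachers))
students = np.repeat(np.arange(n_students), n_courses)
df = pd.DataFrame({"teachers": teachers, "students": students, "groups": 1})

# Balance: every student has n_courses rows, every teacher has n_obs // n_teachers rows
rows_per_student = df.groupby("students").size()
rows_per_teacher = df.groupby("teachers").size()
print(rows_per_student.unique(), rows_per_teacher.unique())

# Crossing: distinct teachers per student and distinct students per teacher
teachers_per_student = df.groupby("students")["teachers"].nunique()
students_per_teacher = df.groupby("teachers")["students"].nunique()
print(teachers_per_student.describe())
print(students_per_teacher.describe())

# Students who drew the same teacher for more than one course
repeat_students = int((teachers_per_student < n_courses).sum())
print("students with a repeated teacher:", repeat_students)
```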

Next we simulate the teacher and student effects.

In [0]:
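# One N(0, 1) effect per teacher, then look up each observation's teacher effect
teacher_effects = np.random.normal(size=n_teachers)
teacher_effects = teacher_effects[teachers]
# One N(0, 1) effect per student, expanded to the observation level in the same way
student_effects = np.random.normal(size=n_students)
student_effects = student_effects[students]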
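The lookups ``teacher_effects[teachers]`` and ``student_effects[students]`` use NumPy integer-array indexing: indexing an array with one effect per level by an array with one label per observation returns one effect per observation, so all observations that share a teacher (or a student) receive exactly the same effect.  A toy illustration with made-up numbers:

```python
import numpy as np

level_effects = np.array([0.5, -1.0, 2.0])  # one effect per level (3 levels)
labels = np.array([2, 0, 0, 1, 2])          # level label for each of 5 observations

# Integer-array ("fancy") indexing: result has the shape of labels
obs_effects = level_effects[labels]
print(obs_effects.tolist())  # [2.0, 0.5, 0.5, -1.0, 2.0]
```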

Finally we simulate the outcomes.  The key characteristics of this type of model are that the teachers are exchangeable, the students are exchangeable, and the teacher and student effects are additive.  We set things up so that the independent errors contribute the most variance (standard deviation 3, so 9 units), followed by the student effects (standard deviation 2, so 4 units), followed by the teacher effects (1 unit).

In [0]:
# Teacher effects (variance 1) + student effects (variance 4) + independent errors (variance 9)
y = teacher_effects + 2*student_effects + 3*np.random.normal(size=n_obs)
df["y"] = y
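Because the three terms are independent, their variances add, so each simulated outcome has marginal variance 1 + 2² + 3² = 14.  A quick Monte Carlo check of that arithmetic, drawing each term independently for a large hypothetical sample (sharing effects across observations changes the covariance between outcomes, but not the variance of any single outcome):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # large hypothetical sample so sample variances sit close to the truth

teacher_part = rng.normal(size=n)       # teacher effect, variance 1
student_part = 2 * rng.normal(size=n)   # scaled student effect, variance 2**2 = 4
error_part = 3 * rng.normal(size=n)     # independent error, variance 3**2 = 9
y = teacher_part + student_part + error_part

for name, part in [("teacher", teacher_part), ("student", student_part),
                   ("error", error_part), ("total", y)]:
    print(f"{name:>8s}: {part.var():.2f}")
```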

Now we can fit the model and check that the estimated variance parameters are close to the simulated values of 1 (teachers), 4 (students), and 9 (error).  Note that the error variance is reported near the top of the summary table as the ``Scale`` parameter.

In [0]:
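# One variance component per crossed factor; "0 +" gives every level its own indicator column
vcf = {"teachers" : "0 + C(teachers)", "students" : "0 + C(students)"}
# All rows share the single "groups" value, so the two factors are crossed within that group
model = sm.MixedLM.from_formula("y ~ 1", groups="groups", vc_formula=vcf, data=df)
result = model.fit()
result.summary()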