### [Sufficient Sample Sizes for Multilevel Modeling](http://www.joophox.net/publist/methodology05.pdf)

An important problem in multilevel modeling is what constitutes a sufficient sample size for accurate estimation. In
multilevel analysis, the major restriction is often the higher-level sample size. In this paper, a simulation study is used to determine
the influence of different sample sizes at the group level on the accuracy of the estimates (regression coefficients and variances)
and their standard errors. In addition, the influence of other factors, such as the lowest-level sample size and different variance
distributions between the levels (different intraclass correlations), is examined. The results show that only a small sample size
at level two (meaning a sample of 50 or less) leads to biased estimates of the second-level standard errors. In all of the other
simulated conditions the estimates of the regression coefficients, the variance components, and the standard errors are unbiased
and accurate.
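The abstract judges the estimates by their bias and the standard errors by their accuracy. One common way simulation studies quantify this is the percentage relative bias of the estimates and the coverage of nominal 95% intervals built from the standard errors. These notes do not reproduce the paper's exact evaluation criteria, so treat this as an assumed convention. The sketch below applies both measures to made-up replication results:

```python
from statistics import mean


def relative_bias(estimates, true_value):
    """Percentage relative bias: 100 * (mean estimate - true value) / true value."""
    return 100.0 * (mean(estimates) - true_value) / true_value


def coverage(estimates, std_errors, true_value, z=1.96):
    """Share of replications whose Wald interval est +/- z*SE contains the true value."""
    hits = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return hits / len(estimates)


# Hypothetical replication results for a parameter whose true value is 0.5
est = [0.45, 0.55, 0.50, 0.60]
se = [0.04, 0.04, 0.04, 0.04]
print(relative_bias(est, 0.5))  # mean estimate is 0.525, i.e. about 5% too high
print(coverage(est, se, 0.5))   # the interval around 0.60 misses 0.5
```

A relative bias near zero indicates unbiased estimates. Coverage well below 0.95 signals standard errors that are too small, which is how biased second-level standard errors would show up.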

One important class of analysis methods is the hierarchical linear regression model, or multilevel regression model.


Multilevel models are mostly estimated by maximum likelihood (ML). The ML estimation methods
commonly used in multilevel analysis are asymptotic,
which translates to the assumption that the sample size
must be sufficiently large. This raises questions about
the acceptable lower limit to the sample size, and about the
accuracy of the estimates and their standard
errors when sample sizes are relatively small. In multilevel
studies, the main problem is usually the sample size at
the group level, because the number of groups is
always smaller than the number of individuals.

Two ML functions are
common in multilevel modeling: full ML (FML) and
restricted ML (RML). __The difference between FML and RML is
that RML maximizes a likelihood function that is invariant for the fixed effects (Goldstein, 1995). Because
RML takes the uncertainty in the fixed parameters into
account when estimating the random parameters, it
should in theory lead to better estimates of the variance components, especially when the number of groups is
small (Raudenbush & Bryk, 2002).__ FML, by contrast, estimates the variance components as if the fixed effects were known.
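In a full multilevel model neither estimator has a closed form, but the simplest case shows the idea. For a normal sample whose only fixed effect is the mean, ML divides the sum of squared deviations by $n$, while REML divides by $n-1$ to account for the degree of freedom spent on estimating the mean. The sketch below uses Python's `statistics.pvariance` (divides by $n$) and `statistics.variance` (divides by $n-1$) as the two estimators. It illustrates the principle only and is not how MLwiN or HLM fit multilevel models.

```python
import random
from statistics import mean, pvariance, variance


def ml_variance(sample):
    """ML estimate: sum of squared deviations divided by n (ignores that the mean was estimated)."""
    return float(pvariance(sample))


def reml_variance(sample):
    """REML estimate for this one-fixed-effect model: divide by n - 1 instead of n."""
    return float(variance(sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # sum of squared deviations from the mean (5) is 32
print(ml_variance(data), reml_variance(data))  # 32/8 and 32/7

# Many small samples (n = 5) from N(0, 1), i.e. true variance 1.0.
# ML is biased downward by the factor (n - 1)/n = 0.8; REML is unbiased here.
random.seed(1)
reps = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(5000)]
ml_avg = mean(ml_variance(r) for r in reps)
reml_avg = mean(reml_variance(r) for r in reps)
print(ml_avg, reml_avg)  # expected to be near 0.8 and 1.0
```

The downward bias of ML shrinks as $n$ grows, which mirrors why the distinction matters most when the number of groups is small.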

The software MLwiN (or HLM) was used for both simulation and estimation.

##### The Multilevel Regression Model

Assume that we have data from $J$ groups, with a different number of respondents $n_j$ in each group. On the respondent level, we have the outcome of respondent $i$ in
group $j$, variable $Y_{ij}$. We have one explanatory variable
$X_{ij}$ on the respondent level, and one group-level explanatory variable $Z_j$. To model these data, we have a separate regression model in each group as follows:
$$Y_{ij}=\beta_{0j} +  \beta_{1j}X_{ij}+ e_{ij} $$
The variation of the regression coefficients $\beta_{0j}$ and $\beta_{1j}$ across groups is modeled
by a group-level regression model:
$$\begin{aligned}\beta_{0j} &= \gamma_{00} + \gamma_{01}Z_{j} + u_{0j} \\ \beta_{1j} &= \gamma_{10} + \gamma_{11}Z_{j} + u_{1j}\end{aligned}$$
The individual-level residuals $e_{ij}$ are assumed to have a
normal distribution with mean zero and variance $\sigma^2_e$.
The group-level residuals $u_{0j}$ and $u_{1j}$ are assumed to have
a multivariate normal distribution with expectation zero,
and to be independent of the individual-level residuals $e_{ij}$. The
variances of the group-level residuals $u_{0j}$ and $u_{1j}$ are denoted $\sigma^2_{u_0}$ and $\sigma^2_{u_1}$, respectively.
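A simulation study like the one summarized here starts by drawing data from the model. The sketch below generates one data set from the two-level equations above. All population values in `Params` are hypothetical. Groups are balanced ($n_j = n$ for every group) for simplicity. $X_{ij}$ and $Z_j$ are assumed standard normal. $u_{0j}$ and $u_{1j}$ are drawn independently (zero covariance), which is a special case of the multivariate normal assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Params:
    # Hypothetical population values, chosen only for illustration
    gamma00: float = 1.0
    gamma01: float = 0.3
    gamma10: float = 0.3
    gamma11: float = 0.3
    sigma2_e: float = 0.5
    sigma2_u0: float = 0.5
    sigma2_u1: float = 0.5


def simulate(J, n_per_group, p, seed=None):
    """Return rows (j, X_ij, Z_j, Y_ij) drawn from the two-level model.

    Assumes standard-normal X and Z and independent u0j, u1j (zero covariance).
    """
    rng = random.Random(seed)
    rows = []
    for j in range(J):
        z = rng.gauss(0.0, 1.0)
        # random.gauss takes a standard deviation, so take square roots of variances
        u0 = rng.gauss(0.0, p.sigma2_u0 ** 0.5)
        u1 = rng.gauss(0.0, p.sigma2_u1 ** 0.5)
        beta0 = p.gamma00 + p.gamma01 * z + u0  # group-level model for the intercept
        beta1 = p.gamma10 + p.gamma11 * z + u1  # group-level model for the slope
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, p.sigma2_e ** 0.5)
            rows.append((j, x, z, beta0 + beta1 * x + e))  # individual-level model
    return rows


data = simulate(J=30, n_per_group=5, p=Params(), seed=42)
print(len(data))  # 30 groups x 5 respondents = 150 rows
```

Repeating `simulate` for many seeds, fitting the model to each data set, and summarizing the estimates is the basic loop of such a simulation study. Varying `J` changes the group-level sample size.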

Multilevel models are needed because grouped data violate the assumption of independence of all observations. The amount of dependence
can be expressed as the intraclass correlation (ICC):
$$\rho=\frac{\sigma^2_{u_0}}{\sigma^2_{u_0}+\sigma^2_{e}}$$
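The ICC formula can also be inverted to choose a group-level variance that produces a target ICC. From $\rho(\sigma^2_{u_0}+\sigma^2_e)=\sigma^2_{u_0}$ it follows that $\sigma^2_{u_0} = \rho\,\sigma^2_e/(1-\rho)$. This is one way to set up conditions with different intraclass correlations. The helper names and the ICC values in the loop are illustrative assumptions, not taken from the paper.

```python
def icc(sigma2_u0, sigma2_e):
    """Intraclass correlation: share of the total variance that lies at the group level."""
    return sigma2_u0 / (sigma2_u0 + sigma2_e)


def sigma2_u0_for_icc(rho, sigma2_e):
    """Invert the ICC formula: intercept variance giving ICC rho for a fixed sigma2_e."""
    return rho * sigma2_e / (1.0 - rho)


# Illustrative ICC values with the individual-level variance fixed at 1
for rho in (0.1, 0.2, 0.3):
    s2 = sigma2_u0_for_icc(rho, sigma2_e=1.0)
    print(rho, round(s2, 4), round(icc(s2, 1.0), 4))
```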



### [The Effect of Small Sample Size on Two-Level Model Estimates](https://link.springer.com/article/10.1007%2Fs10648-014-9287-x)