Multilevel model


Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models are also known as hierarchical linear models, linear mixed-effect models, mixed models, nested data models, random coefficient models, random-effects models, random parameter models, or split-plot designs. They can be seen as generalizations of linear models, although they can also extend to non-linear models. They became much more popular after sufficient computing power and software became available.
Multilevel models are particularly appropriate for research designs where data for participants are organized at more than one level. The units of analysis are usually individuals who are nested within contextual/aggregate units. While the lowest level of data in multilevel models is usually an individual, repeated measurements of individuals may also be examined. As such, multilevel models provide an alternative type of analysis for univariate or multivariate analysis of repeated measures. Individual differences in growth curves may be examined. Furthermore, multilevel models can be used as an alternative to ANCOVA, where scores on the dependent variable are adjusted for covariates before testing treatment differences. Multilevel models are able to analyze these experiments without the assumption of homogeneity of regression slopes that is required by ANCOVA.
Multilevel models can be used on data with many levels, although 2-level models are the most common and the rest of this article deals only with these. The dependent variable must be examined at the lowest level of analysis.

Level 1 regression equation

When there is a single level 1 independent variable, the level 1 model is
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$$
  • $Y_{ij}$ refers to the score on the dependent variable for an individual observation at Level 1 (subscript i refers to the individual case, subscript j refers to the group).
  • $X_{ij}$ refers to the Level 1 predictor.
  • $\beta_{0j}$ refers to the intercept of the dependent variable for group j.
  • $\beta_{1j}$ refers to the slope for the relationship in group j between the Level 1 predictor and the dependent variable.
  • $e_{ij}$ refers to the random errors of prediction for the Level 1 equation.
At Level 1, both the intercepts and slopes in the groups can be either fixed, non-randomly varying, or randomly varying.
When there are multiple level 1 independent variables, the model can be expanded by substituting vectors and matrices in the equation.
When the relationship between the response and the predictor cannot be described by a linear relationship, one can look for a nonlinear functional relationship between them and extend the model to a nonlinear mixed-effects model. For example, when the response is the cumulative infection trajectory of a country observed at successive time points, the ordered pairs of time point and response for each country may show a shape similar to a logistic function.
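As an illustration only, one way to write such a nonlinear level 1 model is with a three-parameter logistic curve whose parameters vary by group. The notation below is an assumption made for this sketch, not taken from the text above: $Y_{ij}$ is the response at observation i in group j (here, a country), $t_{ij}$ is the corresponding time point, $a_j$ is the group's upper asymptote, $b_j$ its growth rate, $c_j$ the time of the inflection point, and $e_{ij}$ the level 1 error.

```latex
Y_{ij} = \frac{a_j}{1 + \exp\!\bigl(-b_j\,(t_{ij} - c_j)\bigr)} + e_{ij}
```

The group-specific parameters $a_j$, $b_j$, and $c_j$ would then play the role that the intercept and slope play in the linear case.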

Level 2 regression equation

The dependent variables are the intercepts ($\beta_{0j}$) and the slopes ($\beta_{1j}$) for the independent variables at Level 1 in the groups of Level 2.
$$\beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j}$$
  • $\gamma_{00}$ refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the groups when all the predictors are equal to 0.
  • $\gamma_{10}$ refers to the average slope between the dependent variable and the Level 1 predictor.
  • $w_j$ refers to the Level 2 predictor.
  • $\gamma_{01}$ and $\gamma_{11}$ refer to the effect of the Level 2 predictor on the Level 1 intercept and slope respectively.
  • $u_{0j}$ refers to the deviation in group j from the overall intercept.
  • $u_{1j}$ refers to the deviation in group j from the average slope between the dependent variable and the Level 1 predictor.
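The two levels can be combined into a single equation by substitution. The derivation below is a sketch under assumed notation: the level 1 model is written as $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$, and the level 2 models as $\beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j}$ and $\beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j}$, where $w_j$ is the Level 2 predictor and $u_{0j}$, $u_{1j}$ are the group deviations.

```latex
\begin{aligned}
Y_{ij} &= (\gamma_{00} + \gamma_{01} w_j + u_{0j})
        + (\gamma_{10} + \gamma_{11} w_j + u_{1j})\, X_{ij} + e_{ij} \\
       &= \underbrace{\gamma_{00} + \gamma_{01} w_j + \gamma_{10} X_{ij}
        + \gamma_{11} w_j X_{ij}}_{\text{fixed part}}
        + \underbrace{u_{0j} + u_{1j} X_{ij} + e_{ij}}_{\text{random part}}
\end{aligned}
```

In this form the term $\gamma_{11} w_j X_{ij}$ is a cross-level interaction, and the random part shows why observations in the same group are correlated: they share $u_{0j}$ and $u_{1j}$.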

Types of models

Before conducting a multilevel model analysis, a researcher must decide on several aspects. First, the researcher must decide which predictors, if any, are to be included in the analysis. Second, the researcher must decide whether parameter values will be fixed or random. A fixed parameter is constant over all the groups, whereas a random parameter has a different value for each of the groups. Third, the researcher must decide whether to use maximum likelihood estimation or restricted maximum likelihood estimation.

Random intercepts model

A random intercepts model is a model in which intercepts are allowed to vary, and therefore, the scores on the dependent variable for each individual observation are predicted by the intercept that varies across groups. This model assumes that slopes are fixed. In addition, this model provides information about intraclass correlations, which are helpful in determining whether multilevel models are required in the first place.
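To make the intraclass correlation concrete, the following is a minimal sketch that simulates balanced data with a random intercept and no predictors, then estimates the intraclass correlation with the classical one-way ANOVA estimator. The function names, the simulated variances, and the use of the ANOVA estimator (rather than a fitted multilevel model) are assumptions made for illustration.

```python
import random


def simulate_random_intercepts(n_groups, n_per_group, sd_between, sd_within, seed=0):
    """Simulate balanced two-level data: a group intercept plus individual noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u_j = rng.gauss(0.0, sd_between)  # deviation of group j from the grand mean
        data.append([u_j + rng.gauss(0.0, sd_within) for _ in range(n_per_group)])
    return data


def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation (equal group sizes)."""
    n_groups = len(groups)
    n = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (n_groups * n)
    group_means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_groups * (n - 1))
    # Between-group variance component, truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical design: between-group variance 1, within-group variance 4,
# so the population intraclass correlation is 1 / (1 + 4) = 0.2.
groups = simulate_random_intercepts(500, 20, sd_between=1.0, sd_within=2.0)
icc = anova_icc(groups)
print(round(icc, 3))
```

An estimate close to zero would suggest that group membership explains little of the variance, in which case a single-level model may be adequate.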

Random slopes model

A random slopes model is a model in which slopes are allowed to vary according to a correlation matrix, and therefore the slopes differ across the levels of a grouping variable such as time or individuals. This model assumes that intercepts are fixed.

Random intercepts and slopes model

A model that includes both random intercepts and random slopes is likely the most realistic type of model, although it is also the most complex. In this model, both intercepts and slopes are allowed to vary across groups, meaning that they are different in different contexts.
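The sketch below simulates data in which both intercepts and slopes vary across groups and then fits a separate ordinary least squares line to each group to make the variation visible. All names and parameter values are hypothetical, the two random effects are drawn independently (in general they may be correlated), and the per-group fits are only a descriptive device, not the way a multilevel model is estimated.

```python
import random


def simulate_two_level(n_groups, n_per_group, gamma_00, gamma_10,
                       sd_u0, sd_u1, sd_e, seed=1):
    """Simulate groups whose intercepts and slopes deviate from overall averages."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        beta_0j = gamma_00 + rng.gauss(0.0, sd_u0)  # group-specific intercept
        beta_1j = gamma_10 + rng.gauss(0.0, sd_u1)  # group-specific slope
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_per_group)]
        ys = [beta_0j + beta_1j * x + rng.gauss(0.0, sd_e) for x in xs]
        groups.append((xs, ys))
    return groups


def ols_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


groups = simulate_two_level(200, 30, gamma_00=10.0, gamma_10=2.0,
                            sd_u0=1.5, sd_u1=0.5, sd_e=1.0)
fits = [ols_line(xs, ys) for xs, ys in groups]
mean_intercept = sum(b0 for b0, _ in fits) / len(fits)
mean_slope = sum(b1 for _, b1 in fits) / len(fits)
print(round(mean_intercept, 2), round(mean_slope, 2))
print(round(min(b1 for _, b1 in fits), 2), round(max(b1 for _, b1 in fits), 2))
```

The averages of the per-group estimates should lie near the overall intercept and slope used in the simulation, while the range of the slopes shows how much individual groups differ.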

Developing a multilevel model

In order to conduct a multilevel model analysis, one would start with fixed coefficients. One aspect would then be allowed to vary at a time, and each new model would be compared with the previous one to assess whether model fit improves. There are three different questions that a researcher would ask in assessing a model. First, is it a good model? Second, is a more complex model better? Third, what contribution do individual predictors make to the model?
In order to assess models, different model fit statistics would be examined. One such statistic is the chi-square likelihood-ratio test, which assesses the difference between models. The likelihood-ratio test can be employed for model building in general, for examining what happens when effects in a model are allowed to vary, and when testing a dummy-coded categorical variable as a single effect. However, the test can only be used when models are nested. When testing non-nested models, comparisons between models can be made using the Akaike information criterion or the Bayesian information criterion, among others. See further Model selection.
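The comparisons described above can be sketched with a few helper functions. This is an illustration under assumptions: the log-likelihoods and parameter counts are made-up values rather than output from a real fit, the function names are hypothetical, and the sample size used for the Bayesian information criterion is taken to be the number of level 1 observations.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    while a < df / 2.0:
        # Recurrence for the regularized upper incomplete gamma function.
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Chi-square likelihood-ratio test for two nested models."""
    stat = max(2.0 * (loglik_complex - loglik_simple), 0.0)
    return stat, chi2_sf(stat, extra_params)


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fits: the complex model adds 2 parameters to the simple one.
stat, p_value = likelihood_ratio_test(-1050.0, -1046.0, extra_params=2)
print(stat, round(p_value, 4))
print(aic(-1050.0, 4), aic(-1046.0, 6))
print(round(bic(-1050.0, 4, 1000), 2), round(bic(-1046.0, 6, 1000), 2))
```

With these made-up numbers the likelihood-ratio test and the Akaike information criterion favor the more complex model, while the Bayesian information criterion, which penalizes parameters more heavily at this sample size, favors the simpler one.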

Assumptions

Multilevel models have the same assumptions as other major general linear models, but some of the assumptions are modified for the hierarchical nature of the design.
;Linearity
The assumption of linearity states that there is a rectilinear relationship between variables. However, the model can be extended to nonlinear relationships. Particularly, when the mean part of the level 1 regression equation is replaced with a non-linear parametric function, then such a model framework is widely called the nonlinear mixed-effects model.
;Normality
The assumption of normality states that the error terms at every level of the model are normally distributed. However, most statistical software allows one to specify different distributions for the variance terms, such as Poisson, binomial, or logistic. The multilevel modelling approach can be used for all forms of generalized linear models.
;Homoscedasticity
The assumption of homoscedasticity, also known as homogeneity of variance, assumes equality of population variances. However, a different variance-correlation matrix can be specified to account for unequal variances, and the heterogeneity of variance can itself be modeled.
;Independence of observations
Independence is an assumption of general linear models, which states that cases are random samples from the population and that scores on the dependent variable are independent of each other. One of the main purposes of multilevel models is to deal with cases where the assumption of independence is violated; multilevel models do, however, assume that 1) the level 1 and level 2 residuals are uncorrelated and 2) the errors at the highest level are uncorrelated.
;Orthogonality of regressors to random effects
The regressors must not correlate with the random effects. This assumption is testable but often ignored; when it is violated, the estimator is inconsistent. In that case, the random effect must be modeled explicitly in the fixed part of the model, either by using dummy variables or by including cluster means of all regressors. This assumption is probably the most important assumption the estimator makes, but one that is misunderstood by most applied researchers using these types of models.
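The cluster-mean approach mentioned above can be sketched as a simple data transformation: each regressor is split into its cluster mean and the deviation from that mean, and both parts can then enter the fixed part of the model. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def cluster_mean_decomposition(x, cluster_ids):
    """Split a regressor into a cluster-mean part and a within-cluster deviation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, cid in zip(x, cluster_ids):
        totals[cid] += value
        counts[cid] += 1
    means = {cid: totals[cid] / counts[cid] for cid in totals}
    between = [means[cid] for cid in cluster_ids]                    # cluster mean per row
    within = [value - means[cid] for value, cid in zip(x, cluster_ids)]  # deviation per row
    return between, within


# Toy data: five observations in two clusters.
x = [1.0, 2.0, 3.0, 10.0, 12.0]
cluster_ids = ["a", "a", "a", "b", "b"]
between, within = cluster_mean_decomposition(x, cluster_ids)
print(between)
print(within)
```

The within-cluster deviations sum to zero in every cluster, so they carry no information about differences between clusters; that information is carried by the cluster means alone.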

Statistical tests

The type of statistical test that is employed in multilevel models depends on whether one is examining fixed effects or variance components. When examining fixed effects, the estimate is compared with the standard error of the fixed effect, which results in a Z-test. A t-test can also be computed. When computing a t-test, it is important to keep in mind the degrees of freedom, which will depend on the level of the predictor. For a level 1 predictor, the degrees of freedom are based on the number of level 1 predictors, the number of groups and the number of individual observations. For a level 2 predictor, the degrees of freedom are based on the number of level 2 predictors and the number of groups.
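As a minimal sketch of the Z-test for a fixed effect, the statistic is the estimate divided by its standard error, referred to the standard normal distribution. The function name and the numbers are hypothetical, and the sketch does not cover the t-test, whose degrees of freedom depend on the design as described above.

```python
import math


def wald_z_test(estimate, std_error):
    """Return the Z statistic and its two-sided p-value under a standard normal."""
    z = estimate / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical fixed effect: estimate 0.5 with standard error 0.2.
z, p = wald_z_test(0.5, 0.2)
print(round(z, 2), round(p, 4))
```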

Statistical power

Statistical power for multilevel models differs depending on whether it is level 1 or level 2 effects that are being examined. Power for level 1 effects is dependent upon the number of individual observations, whereas the power for level 2 effects is dependent upon the number of groups. To conduct research with sufficient power, large sample sizes are required in multilevel models. However, the number of individual observations in groups is not as important as the number of groups in a study. In order to detect cross-level interactions, given that the group sizes are not too small, recommendations have been made that at least 20 groups are needed, although many fewer can be used if one is only interested in inference on the fixed effects and the random effects are control, or "nuisance", variables. The issue of statistical power in multilevel models is complicated by the fact that power varies as a function of effect size and intraclass correlations, it differs for fixed effects versus random effects, and it changes depending on the number of groups and the number of individual observations per group.
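One rough way to see why the number of groups matters more than the number of observations per group is the design effect, a standard approximation from survey sampling for equally sized groups: 1 + (n - 1) × ICC, where n is the group size and ICC is the intraclass correlation. The sketch below is only an illustration of that approximation, with hypothetical function names and designs; it is not a power calculation for any particular multilevel model.

```python
def design_effect(group_size, icc):
    """Approximate variance inflation caused by clustering (equal group sizes)."""
    return 1.0 + (group_size - 1) * icc


def effective_sample_size(n_groups, group_size, icc):
    """Total sample size deflated by the design effect."""
    return n_groups * group_size / design_effect(group_size, icc)


# Two hypothetical designs with the same 1000 observations and an ICC of 0.1.
few_large = effective_sample_size(n_groups=50, group_size=20, icc=0.1)
many_small = effective_sample_size(n_groups=100, group_size=10, icc=0.1)
print(round(few_large, 1), round(many_small, 1))
```

Under this approximation the design with more, smaller groups retains a larger effective sample size, and the gap widens as the intraclass correlation increases.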