Confirmatory factor analysis
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social science research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct. As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity, such as the multitrait-multimethod (MTMM) matrix described in Campbell & Fiske.
In confirmatory factor analysis, the researcher first develops hypotheses about which factors they believe underlie the measures used, and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with their theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to each other, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captures the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others.
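As a minimal sketch of how such constraints translate into a model-implied covariance matrix, the following Python snippet builds a hypothetical two-factor model for six items in which each item loads on only one factor and the factor correlation is fixed at zero. The loading and error-variance values are purely illustrative, not estimates from any real data set.

```python
import numpy as np

# Hypothetical two-factor model for six items: items 1-3 load only on
# factor A, items 4-6 only on factor B (cross-loadings fixed at zero).
# All numeric values are illustrative.
lam = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.7],
])

# Factor covariance matrix with the A-B correlation constrained to zero.
phi = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

# Unique (error) variances for each item.
theta = np.diag([0.36, 0.51, 0.64, 0.19, 0.36, 0.51])

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta.
sigma = lam @ phi @ lam.T + theta
print(sigma.round(3))
```

The entries of the loading matrix fixed at 0.0 are the "zero loadings" imposed by the hypothesis, and the off-diagonal 0.0 in the factor covariance matrix is the constraint that the two factors are uncorrelated.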
For some applications, the requirement of "zero loadings" has been regarded as too strict. A newly developed analysis method, "exploratory structural equation modeling", specifies hypotheses about the relation between observed indicators and their supposed primary latent factors while allowing for estimation of loadings with other latent factors as well.
Statistical model
In confirmatory factor analysis, researchers are typically interested in studying the degree to which responses on a p × 1 vector of observable random variables y can be used to assign a value to one or more unobserved latent variables ξ. The investigation is largely accomplished by estimating and evaluating the loading of each item used to tap aspects of the unobserved latent variable. That is, y is the vector of observed responses predicted by the unobserved latent variable ξ, and the measurement model is defined as:

y = Λξ + ε

where y is the p × 1 vector of observed random variables, ξ is the vector of unobserved latent variables, and Λ is a p × k matrix of factor loadings with k equal to the number of latent variables. Since the observed variables are imperfect measures of the latent variables, the model also consists of an error term ε. Estimates in the maximum likelihood case are generated by iteratively minimizing the fit function

F_ML = ln|Σ(θ)| + tr(S Σ(θ)⁻¹) − ln|S| − p

where Σ(θ) is the variance-covariance matrix implied by the proposed factor analysis model and S is the observed variance-covariance matrix. That is, values are found for the free model parameters that minimize the difference between the model-implied variance-covariance matrix and the observed variance-covariance matrix.
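The following Python sketch, using numpy and scipy, illustrates this estimation by minimizing F_ML for a one-factor model. The observed covariance matrix S is synthetic, and fixing the factor variance to one is one common identification choice assumed here, not something specified above.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative observed covariance matrix S for four indicators
# (synthetic values, not from any real study).
S = np.array([
    [1.00, 0.52, 0.47, 0.44],
    [0.52, 1.00, 0.49, 0.43],
    [0.47, 0.49, 1.00, 0.40],
    [0.44, 0.43, 0.40, 1.00],
])
p = S.shape[0]

def implied_cov(params):
    """Model-implied covariance for a one-factor model:
    Sigma(theta) = lambda lambda' + diag(psi), factor variance fixed at 1."""
    lam = params[:p]          # loadings
    psi = params[p:]          # unique (error) variances
    return np.outer(lam, lam) + np.diag(psi)

def f_ml(params):
    """Normal-theory ML fit function:
    F = ln|Sigma| + tr(S Sigma^{-1}) - ln|S| - p."""
    sigma = implied_cov(params)
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return np.inf         # keep the search inside the admissible region
    return logdet + np.trace(S @ np.linalg.inv(sigma)) - np.linalg.slogdet(S)[1] - p

start = np.concatenate([np.full(p, 0.5), np.full(p, 0.5)])
bounds = [(None, None)] * p + [(1e-6, None)] * p   # unique variances kept positive
result = minimize(f_ml, start, method="L-BFGS-B", bounds=bounds)
loadings, uniquenesses = result.x[:p], result.x[p:]
print("loadings:", loadings.round(3))
print("unique variances:", uniquenesses.round(3))
```

The minimized value of the fit function is what subsequently feeds the model χ2 and other fit measures.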
Alternative estimation strategies
Although numerous algorithms have been used to estimate CFA models, maximum likelihood (ML) remains the primary estimation procedure. That being said, CFA models are often applied to data conditions that deviate from the normal theory requirements for valid ML estimation. For example, social scientists often estimate CFA models with non-normal data and indicators scaled using discrete ordered categories. Accordingly, alternative algorithms have been developed that attend to the diverse data conditions applied researchers encounter. The alternative estimators have been characterized into two general types: robust and limited information estimators.

When ML is implemented with data that deviate from the assumptions of normal theory, CFA models may produce biased parameter estimates and misleading conclusions. Robust estimation typically attempts to correct the problem by adjusting the normal theory model χ2 and standard errors. For example, Satorra and Bentler recommended using ML estimation in the usual way and subsequently dividing the model χ2 by a measure of the degree of multivariate kurtosis. An added advantage of robust ML estimators is their availability in common SEM software.
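A minimal sketch of that scaling step, assuming the normal-theory minimum of the fit function, the sample size, the model degrees of freedom, and a scaling correction factor are already available; in practice the correction factor is estimated from the multivariate kurtosis of the data, and all numbers here are purely illustrative.

```python
from scipy.stats import chi2

# Hypothetical quantities: minimized ML fit function value, sample size,
# model degrees of freedom, and a Satorra-Bentler-style scaling correction
# factor c (illustrative values only).
f_min, n, df, c = 0.085, 400, 8, 1.35

chi2_ml = (n - 1) * f_min          # normal-theory model chi-square
chi2_sb = chi2_ml / c              # mean-scaled (robust) chi-square

print("ML chi2:", round(chi2_ml, 2), "p =", round(chi2.sf(chi2_ml, df), 4))
print("scaled chi2:", round(chi2_sb, 2), "p =", round(chi2.sf(chi2_sb, df), 4))
```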
Unfortunately, robust ML estimators can become untenable under common data conditions. In particular, when indicators are scaled using few response categories robust ML estimators tend to perform poorly. Limited information estimators, such as weighted least squares, are likely a better choice when manifest indicators take on an ordinal form. Broadly, limited information estimators attend to the ordinal indicators by using polychoric correlations to fit CFA models. Polychoric correlations capture the covariance between two latent variables when only their categorized form is observed, which is achieved largely through the estimation of threshold parameters.
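The sketch below illustrates the two-step idea behind a polychoric correlation: thresholds are first obtained from the marginal proportions of each ordinal indicator, and the correlation of the underlying bivariate normal is then chosen to maximize the likelihood of the observed contingency table. The counts are synthetic, and the code is a simplified illustration rather than the exact algorithm used by any particular SEM package.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import minimize_scalar

# Illustrative 3x3 contingency table of two ordinal indicators
# (counts are synthetic, not from any real data set).
counts = np.array([
    [80, 30, 10],
    [25, 60, 35],
    [ 5, 30, 75],
])

def thresholds(margins):
    """Thresholds from cumulative marginal proportions (standard normal
    quantiles), padded with large finite stand-ins for -inf / +inf."""
    cum = np.cumsum(margins) / margins.sum()
    return np.concatenate([[-10.0], norm.ppf(cum[:-1]), [10.0]])

a = thresholds(counts.sum(axis=1))   # row-variable thresholds
b = thresholds(counts.sum(axis=0))   # column-variable thresholds

def neg_loglik(rho):
    """Negative log-likelihood of the table under a latent bivariate normal
    with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    cdf = lambda x, y: multivariate_normal.cdf([x, y], mean=[0.0, 0.0], cov=cov)
    ll = 0.0
    for i in range(counts.shape[0]):
        for j in range(counts.shape[1]):
            p = (cdf(a[i + 1], b[j + 1]) - cdf(a[i], b[j + 1])
                 - cdf(a[i + 1], b[j]) + cdf(a[i], b[j]))
            ll += counts[i, j] * np.log(max(p, 1e-12))
    return -ll

rho_hat = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x
print("estimated polychoric correlation:", round(rho_hat, 3))
```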
Exploratory factor analysis
Both exploratory factor analysis (EFA) and confirmatory factor analysis are employed to understand the shared variance of measured variables that is believed to be attributable to a factor or latent construct. Despite this similarity, however, EFA and CFA are conceptually and statistically distinct analyses.

The goal of EFA is to identify factors based on the data and to maximize the amount of variance explained. The researcher is not required to have any specific hypotheses about how many factors will emerge or which items or variables these factors will comprise. If such hypotheses exist, they are not incorporated into and do not affect the results of the statistical analyses. By contrast, CFA evaluates a priori hypotheses and is largely driven by theory. CFA analyses require the researcher to hypothesize, in advance, the number of factors, whether or not these factors are correlated, and which items/measures load onto and reflect which factors. As such, in contrast to exploratory factor analysis, where all loadings are free to vary, CFA allows for the explicit constraint of certain loadings to be zero.
EFA is often considered to be more appropriate than CFA in the early stages of scale development because CFA does not show how well the items load on the non-hypothesized factors. Another strong argument for the initial use of EFA is that the misspecification of the number of factors at an early stage of scale development will typically not be detected by confirmatory factor analysis. At later stages of scale development, confirmatory techniques may provide more information by the explicit contrast of competing factor structures.
EFA is sometimes reported in research when CFA would be a better statistical approach. It has been argued that CFA can be restrictive and inappropriate when used in an exploratory fashion. However, the idea that CFA is solely a "confirmatory" analysis may sometimes be misleading, as modification indices used in CFA are somewhat exploratory in nature. Modification indices show the improvement in model fit if a particular coefficient were to become unconstrained. Likewise, EFA and CFA do not have to be mutually exclusive analyses; EFA has been argued to be a reasonable follow-up to a poor-fitting CFA model.