Cointegration
In econometrics, cointegration is a statistical property that describes a long-run equilibrium relationship among two or more time series variables, even if the individual series are non-stationary. In such cases, the variables may drift in the short run, but their linear combination is stationary, implying that they move together over time and remain bound by a stable equilibrium.
More formally, if several time series are individually integrated of order $d$ but a linear combination of them is integrated of a lower order, then those time series are said to be cointegrated. That is, if $X$, $Y$, and $Z$ are each integrated of order $d$, and there exist coefficients $a$, $b$, $c$ such that $aX + bY + cZ$ is integrated of order less than $d$, then $X$, $Y$, and $Z$ are cointegrated.
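The definition can be illustrated numerically. The following is a minimal sketch, assuming three simulated series that share one random-walk trend; the loadings, the coefficient values and the variance comparison are illustrative choices, not part of the formal definition, and comparing variances is only a rough diagnostic rather than a test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
T = 2000

# Common stochastic trend: a random walk, which is integrated of order 1.
trend = np.cumsum(rng.normal(size=T))

# Three series that load on the same trend plus stationary noise
# (the loadings 1, 2 and -1 are assumptions made for this illustration).
X = 1.0 * trend + rng.normal(size=T)
Y = 2.0 * trend + rng.normal(size=T)
Z = -1.0 * trend + rng.normal(size=T)

# Coefficients chosen so that the trend cancels: 1*1 + 1*2 + 3*(-1) = 0.
a, b, c = 1.0, 1.0, 3.0
combo = a * X + b * Y + c * Z

# The level series wanders, so its sample variance is large; the
# combination contains only the stationary noise terms.
print("sample variance of X          :", X.var())
print("sample variance of aX+bY+cZ   :", combo.var())
```

In this construction the combination equals the sum of the three noise terms (with weights 1, 1 and 3), so it is stationary by design even though each individual series is not.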
Cointegration is a crucial concept in time series analysis, particularly when dealing with variables that exhibit trends, such as macroeconomic data. In an influential paper, Charles Nelson and Charles Plosser provided statistical evidence that many US macroeconomic time series have stochastic trends.
Introduction
If two or more series are individually integrated but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated but some vector of coefficients exists to form a stationary linear combination of them.
History
The first to introduce and analyse the concept of spurious (or nonsense) regression was Udny Yule in 1926. Before the 1980s, many economists used linear regressions on non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation, since standard detrending techniques can result in data that are still non-stationary. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach and coined the term.
For integrated processes, Granger and Newbold showed that detrending does not eliminate the problem of spurious correlation, and that the superior alternative is to check for cointegration. Two series with trends can be cointegrated only if there is a genuine relationship between the two. Thus the standard current methodology for time series regressions is to check all time series involved for integration. If there are integrated series on both sides of the regression relationship, then it is possible for regressions to give misleading results.
The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots. The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.
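One way to see why differencing alone is problematic is to compare the two specifications side by side. The following is a sketch for a simple two-variable system; the symbols $\beta$ (cointegrating coefficient), $\alpha$ (adjustment coefficient), $b$ and $\varepsilon_t$ are notation assumed here for illustration.

```latex
\begin{align}
\text{error correction form:} \quad
  \Delta y_t &= b\,\Delta x_t + \alpha\,(y_{t-1} - \beta x_{t-1}) + \varepsilon_t, \\
\text{differences-only regression:} \quad
  \Delta y_t &= b\,\Delta x_t
    + \underbrace{\alpha\,(y_{t-1} - \beta x_{t-1}) + \varepsilon_t}_{\text{composite error}}.
\end{align}
```

When $\alpha \neq 0$, the differences-only regression leaves the lagged equilibrium error $y_{t-1} - \beta x_{t-1}$ in the disturbance, so the model is misspecified and discards the long-run information. If that omitted term is correlated with $\Delta x_t$, the OLS estimate of $b$ suffers from omitted-variable bias.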
For example, regressing the consumption series for any country against the GNP for a randomly selected dissimilar country might give a high R-squared relationship. This is called spurious regression: two integrated series which are not directly causally related may nonetheless show a significant correlation.
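The spurious regression effect can be reproduced in a small simulation. The sketch below assumes two independent Gaussian random walks and a conventional two-sided 5% t-test on the slope; the sample size, number of replications and helper names (`ols_slope_tstat`, `rejection_rate`) are hypothetical choices made for illustration.

```python
import numpy as np

def ols_slope_tstat(y, x):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def rejection_rate(n_obs, n_reps, integrated, rng):
    """Share of replications with |t| > 1.96 for two independent series."""
    rejections = 0
    for _ in range(n_reps):
        x = rng.normal(size=n_obs)
        y = rng.normal(size=n_obs)
        if integrated:
            # Cumulating white noise turns each series into a random walk.
            x, y = np.cumsum(x), np.cumsum(y)
        rejections += int(abs(ols_slope_tstat(y, x)) > 1.96)
    return rejections / n_reps

rng = np.random.default_rng(42)
rate_walks = rejection_rate(100, 500, integrated=True, rng=rng)
rate_noise = rejection_rate(100, 500, integrated=False, rng=rng)
print(f"independent random walks: {rate_walks:.2f}")
print(f"independent white noise : {rate_noise:.2f}")
```

For independent white noise the rejection rate should stay near the nominal 5%, while for independent random walks the slope appears "significant" far more often, even though the series are unrelated by construction.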
Tests
The six main methods for testing for cointegration are:
Engle–Granger two-step method
If $x_t$ and $y_t$ both have order of integration $d = 1$ and are cointegrated, then a linear combination of them must be stationary for some value of $\beta$ and $u_t$. In other words:
$$y_t - \beta x_t = u_t$$
where $u_t$ is stationary.
If $\beta$ is known, we can test $u_t$ for stationarity with an Augmented Dickey–Fuller (ADF) test or Phillips–Perron test. If $\beta$ is unknown, we must first estimate it. This is typically done by using ordinary least squares. Then, we can run an ADF test on the estimated residuals $\hat{u}_t$. However, when $\beta$ is estimated, the critical values of this ADF test are non-standard, and increase in absolute value as more regressors are included.
If the variables are found to be cointegrated, a second-stage regression is conducted. This is a regression of $\Delta y_t$ on the lagged regressors, $\Delta x_t$, and the lagged residuals from the first stage, $\hat{u}_{t-1}$. The second stage regression is given as:
$$\Delta y_t = \Delta x_t b + \alpha \hat{u}_{t-1} + \varepsilon_t$$
If the variables are not cointegrated, then $\alpha = 0$ and we estimate a differences model:
$$\Delta y_t = \Delta x_t b + \varepsilon_t$$
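The two steps can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the data are simulated, the first-stage regression includes a constant, the residual test is a Dickey–Fuller regression without lagged differences, and the 5% critical value of about -3.34 is an approximate asymptotic value for one regressor plus a constant taken as given here. The helper `ols` and all variable names are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients, residuals and coefficient standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, resid, se

# Simulated data (an assumption for illustration): x is a random walk and
# y = 1 + 2 x + u with stationary AR(1) errors u, so the pair is cointegrated.
rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.normal(size=T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Step 1: estimate the long-run relation y_t = c + beta * x_t + u_t by OLS.
(c_hat, beta_hat), u_hat, _ = ols(y, np.column_stack([np.ones(T), x]))

# Residual-based test: regress the change in u_hat on its own lag and
# compare the t-statistic with a non-standard critical value.
du = np.diff(u_hat)
rho, _, rho_se = ols(du, u_hat[:-1].reshape(-1, 1))
df_stat = rho[0] / rho_se[0]
CRIT_5PCT = -3.34  # approximate asymptotic value, assumed for this sketch
cointegrated = df_stat < CRIT_5PCT

# Step 2: error correction model if cointegrated, differences model otherwise.
dy, dx = np.diff(y), np.diff(x)
if cointegrated:
    (b_hat, alpha_hat), _, _ = ols(dy, np.column_stack([dx, u_hat[:-1]]))
else:
    (b_hat,), _, _ = ols(dy, dx.reshape(-1, 1))
    alpha_hat = 0.0

print(f"beta_hat = {beta_hat:.3f}, DF statistic = {df_stat:.2f}")
print(f"cointegrated = {cointegrated}, b_hat = {b_hat:.3f}, alpha_hat = {alpha_hat:.3f}")
```

In practice the residual test usually includes lagged differences (the augmented form) and uses critical values tabulated for the number of regressors; statistical packages provide both, and the fixed constant above is only a stand-in for those tables.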