Chi-square automatic interaction detection


Chi-square automatic interaction detection (CHAID) is a decision tree technique based on adjusted significance testing.
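As a rough illustration of what "adjusted significance testing" means when choosing a split, the following minimal sketch cross-tabulates each candidate predictor against the dependent variable, computes a Pearson chi-square p-value, and applies a multiplicity adjustment. It makes several assumptions that are not part of the published algorithm: the records and the names `age`, `region` and `response` are hypothetical, category merging is skipped entirely, and the adjustment is a plain Bonferroni correction over the number of candidate predictors rather than the multiplier used in CHAID itself.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (finite closed-form series)."""
    if x <= 0 or df <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x)
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total


def pearson_chi2(xs, ys):
    """Return (statistic, degrees of freedom) for the table of xs against ys."""
    n = len(xs)
    cell, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (cell[(a, b)] - expected) ** 2 / expected
    return stat, (len(row) - 1) * (len(col) - 1)


def best_predictor(records, target, predictors):
    """Pick the predictor with the smallest Bonferroni-adjusted p-value."""
    ys = [rec[target] for rec in records]
    adjusted = {}
    for name in predictors:
        stat, df = pearson_chi2([rec[name] for rec in records], ys)
        # Simplifying assumption: multiply by the number of predictors tested.
        adjusted[name] = min(1.0, chi2_sf(stat, df) * len(predictors))
    return min(adjusted, key=adjusted.get), adjusted


# Hypothetical toy data: response depends strongly on age, hardly on region.
records = (
    [{"age": "young", "region": "north", "response": "yes"}] * 18
    + [{"age": "young", "region": "south", "response": "no"}] * 2
    + [{"age": "old", "region": "north", "response": "no"}] * 17
    + [{"age": "old", "region": "south", "response": "yes"}] * 3
)
best, adjusted = best_predictor(records, "response", ["age", "region"])
print(best, adjusted)
```

On this toy data the sketch selects `age`, whose adjusted p-value is far below any conventional threshold, while the adjusted p-value for `region` is capped at 1.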

History

CHAID is based on a formal extension of the AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures of the 1960s and 1970s, which in turn were extensions of earlier research, including that performed by Belson in the UK in the 1950s.
In 1975, the CHAID technique itself was developed in South Africa. It was published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic.
A history of earlier supervised tree methods can be found in Ritschard, including a detailed description of the original CHAID algorithm and the exhaustive CHAID extension by Biggs, De Ville, and Suen.
As a data mining technique, CHAID uses multiway splitting to create discrete groups and to show their effect on the dependent variable. Five major criteria have been given for preferring CHAID in applied analysis:
1. A good proportion of the input data is categorical;
2. It is efficient on large datasets;
3. Its output is highly visual and easy to interpret;
4. The business rules generated from CHAID are easy to implement and integrate in a business; and
5. Issues with input data quality can be handled efficiently.
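The idea of a multiway split that creates discrete groups can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, the predictor name `channel` and the helper `multiway_split` are hypothetical, and the sketch simply creates one child group per category without the significance testing or category merging that CHAID performs.

```python
from collections import defaultdict


def multiway_split(records, predictor, target, positive):
    """Create one child group per category of `predictor`.

    Returns {category: (group size, share of records with target == positive)}.
    """
    children = defaultdict(list)
    for rec in records:
        children[rec[predictor]].append(rec)
    summary = {}
    for category, group in children.items():
        hits = sum(1 for rec in group if rec[target] == positive)
        summary[category] = (len(group), hits / len(group))
    return summary


# Hypothetical records: three contact channels with different response rates.
records = (
    [{"channel": "email", "response": "yes"}] * 6
    + [{"channel": "email", "response": "no"}] * 4
    + [{"channel": "phone", "response": "yes"}] * 2
    + [{"channel": "phone", "response": "no"}] * 8
    + [{"channel": "mail", "response": "yes"}] * 5
    + [{"channel": "mail", "response": "no"}] * 5
)
summary = multiway_split(records, "channel", "response", "yes")
for category, (size, rate) in summary.items():
    print(f"{category}: n={size}, response rate={rate:.0%}")
```

Unlike a binary split, the three categories produce three groups in a single step, and each group's response rate can be read off directly.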

Properties

CHAID can be used for prediction as well as classification, and for detection of interaction between variables.
In practice, CHAID is often used in direct marketing to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.
As with other decision trees, CHAID's output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.
One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.
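The sample-size requirement mentioned above can be made concrete with some simple arithmetic. The sketch below assumes, purely for illustration, that every split divides its cases evenly among the child groups; the sample size of 1000 and the function name `average_node_size` are hypothetical.

```python
def average_node_size(n_cases, categories_per_split, depth):
    """Average cases per group after `depth` levels of evenly sized multiway splits."""
    return n_cases / categories_per_split ** depth


# Compare binary splits with 5-way splits on a hypothetical sample of 1000 cases.
for k in (2, 5):
    sizes = [round(average_node_size(1000, k, d)) for d in range(4)]
    print(f"{k}-way splits: {sizes}")
# 2-way splits: [1000, 500, 250, 125]
# 5-way splits: [1000, 200, 40, 8]
```

After three levels, binary splits still leave 125 cases per group on average, whereas 5-way splits leave only 8, which is too few for a reliable chi-square test.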

External links

  • Luchman, J.N.; CHAID: Stata module to conduct chi-square automated interaction detection. Available for free, or type within Stata: ssc install chaid.
  • Luchman, J.N.; CHAIDFOREST: Stata module to conduct random forest ensemble classification based on chi-square automated interaction detection as base learner. Available for free, or type within Stata: ssc install chaidforest.
  • grows exhaustive CHAID trees as well as a few other types of trees such as CART.
  • An R package is available on R-Forge.