Foundations of statistics
The Foundations of Statistics are the mathematical and philosophical bases for statistical methods. These bases are the theoretical frameworks that ground and justify methods of statistical inference, estimation, hypothesis testing, uncertainty quantification, and the interpretation of statistical conclusions. Further, a foundation can be used to explain statistical paradoxes, provide descriptions of statistical laws, and guide the application of statistics to real-world problems.
Different statistical foundations may provide different, contrasting perspectives on the analysis and interpretation of data, and some of these contrasts have been subject to centuries of debate. Examples include Bayesian inference versus frequentist inference; the distinction between Fisher's significance testing and Neyman–Pearson hypothesis testing; and whether the likelihood principle holds.
Certain frameworks may be preferred for specific applications, such as the use of Bayesian methods in fitting complex ecological models.
Bandyopadhyay & Forster identify four statistical paradigms: classical statistics, Bayesian statistics, likelihood-based statistics, and information-based statistics using the Akaike Information Criterion. More recently, Judea Pearl reintroduced formal mathematics by attributing causality in statistical systems that addressed the fundamental limitations of both Bayesian and Neyman-Pearson methods, as discussed in his book Causality.
Fisher's "significance testing" vs. Neyman–Pearson "hypothesis testing"
During the 20th century, the development of classical statistics led to the emergence of two competing foundations for inductive statistical testing. The merits of these models were extensively debated. Although a hybrid approach combining elements of both methods is commonly taught and utilized, the philosophical questions raised during the debate remain unresolved.
Significance testing
Publications by Fisher, such as "Statistical Methods for Research Workers" (1925) and "The Design of Experiments" (1935), contributed to the popularity of significance testing, which is a probabilistic approach to deductive inference. In practice, a statistic is computed from the experimental data, and the probability of obtaining a value at least as extreme as the observed one under a default or "null" model is compared to a predetermined threshold. This threshold represents the level of discord required to cast doubt on the null model. One common application of this method is to determine whether a treatment has a noticeable effect based on a comparative experiment. In this case, the null hypothesis corresponds to the absence of a treatment effect, implying that the treated group and the control group are drawn from the same population. Statistical significance measures probability and does not address practical significance; it can be viewed as a criterion for the statistical signal-to-noise ratio. It is important to note that the test cannot prove the hypothesis, but it can provide evidence against it. The Fisher significance test involves a single hypothesis, but the choice of the test statistic requires an understanding of relevant directions of deviation from the hypothesized model.
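As a concrete illustration of this procedure, the sketch below runs a simple permutation test on a hypothetical treatment-versus-control comparison. The data, group sizes, one-sided test statistic, and 0.05 threshold are assumptions chosen for illustration, not details taken from the historical sources.

```python
# A minimal sketch of Fisher-style significance testing: a permutation test
# comparing a hypothetical treated group against a control group.
import numpy as np

rng = np.random.default_rng(0)
treated = np.array([5.1, 4.8, 6.2, 5.9, 5.4])   # hypothetical measurements
control = np.array([4.2, 4.9, 4.4, 5.0, 4.6])

observed = treated.mean() - control.mean()       # test statistic
pooled = np.concatenate([treated, control])

# Under the null model the group labels are exchangeable, so we re-draw them.
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[:treated.size].mean() - perm[treated.size:].mean()
    if diff >= observed:                         # values at least as extreme
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
# Compare p_value to a predetermined threshold, e.g. 0.05; a small p-value is
# evidence against the null model, not proof of a treatment effect.
```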
Hypothesis testing
Neyman and Pearson collaborated on the problem of selecting the most appropriate hypothesis based solely on experimental evidence, which differed from significance testing. Their most renowned joint paper, published in 1933, introduced the Neyman–Pearson lemma, which states that a ratio of probabilities serves as an effective criterion for hypothesis selection. The paper demonstrated the optimality of the Student's t-test, one of the significance tests. Neyman believed that hypothesis testing represented a generalization and improvement of significance testing. The rationale for their methods can be found in their collaborative papers. Hypothesis testing involves considering multiple hypotheses and selecting one among them, akin to making a multiple-choice decision. The absence of evidence is not an immediate consideration. The method is grounded in the assumption of repeated sampling from the same population, although Fisher criticized this assumption.
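The sketch below illustrates the likelihood-ratio idea behind the Neyman–Pearson lemma for two simple hypotheses about a normal mean with known variance. The means, variance, sample, and alpha = 0.05 are illustrative assumptions, not values from the 1933 paper.

```python
# A minimal sketch of the Neyman-Pearson criterion: choose between two simple
# hypotheses about a normal mean using the likelihood ratio.
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 0.0, 1.0, 1.0            # H0: mean mu0; H1: mean mu1 (mu1 > mu0)
x = np.array([0.9, 1.4, 0.3, 1.1, 0.7])    # hypothetical observations

# Log-likelihood ratio log[ L(mu1) / L(mu0) ]; large values favour H1.
llr = (norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)).sum()

# For this model the likelihood-ratio test is equivalent to rejecting H0 when
# the sample mean exceeds a cutoff chosen so that the Type I error rate is alpha.
alpha = 0.05
cutoff = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(x.size)

print(f"log-likelihood ratio = {llr:.2f}")
print(f"sample mean = {x.mean():.2f}, reject H0: {x.mean() > cutoff}")
```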
Grounds of disagreement
The duration of the dispute allowed for a comprehensive discussion of various fundamental issues in the field of statistics.
An example exchange from 1955–1956
Fisher's attack
Repeated sampling of the same population
- Such sampling is the basis of frequentist probability
- Fisher preferred fiducial inference
Type II errors
- Which result from an alternative hypothesis
Neyman's rebuttal
Fisher's theory of fiducial inference is flawed
- Paradoxes are common
Discussion
Fisher's attack based on frequentist probability failed but was not without result. He identified a specific case where the two schools of testing reached different results. This case is one of several that are still troubling. Commentators believe that the "right" answer is context-dependent. Fiducial probability has not fared well, being virtually without advocates, while frequentist probability remains a mainstream interpretation.
During this exchange, Fisher also discussed the requirements for inductive inference, specifically criticizing cost functions that penalize erroneous judgments. Neyman countered by mentioning the use of such functions by Gauss and Laplace. These arguments occurred 15 years after textbooks began teaching a hybrid theory of statistical testing.
Fisher and Neyman held different perspectives on the foundations of statistics:
- The interpretation of probability
- * The disagreement between Fisher's inductive reasoning and Neyman's inductive behavior reflected the Bayesian-Frequentist divide. Fisher was willing to revise his opinion based on calculated probability, while Neyman was more inclined to adjust his observable behavior based on computed costs.
- The appropriate formulation of scientific questions, with a particular focus on modelling
- Whether it is justifiable to reject a hypothesis based on a low probability without knowing the probability of an alternative
- Whether a hypothesis could ever be accepted based solely on data
- * In mathematics, deduction proves, while counter-examples disprove.
- * In the Popperian philosophy of science, progress is made when theories are disproven.
- Subjectivity: Although Fisher and Neyman endeavored to minimize subjectivity, they both acknowledged the significance of "good judgment." Each accused the other of subjectivity.
- * Fisher subjectively selected the null hypothesis.
- * Neyman-Pearson subjectively determined the criterion for selection.
- * Both subjectively established numeric thresholds.
Related history
In 1938, Neyman relocated to the West Coast of the United States, effectively ending his collaboration with Pearson and their work on hypothesis testing. Subsequent developments in the field were carried out by other researchers. By 1940, textbooks began presenting a hybrid approach that combined elements of significance testing and hypothesis testing. However, none of the main contributors were directly involved in the further development of the hybrid approach currently taught in introductory statistics.
Statistics subsequently branched out into various directions, including decision theory, Bayesian statistics, exploratory data analysis, robust statistics, and non-parametric statistics. Neyman-Pearson hypothesis testing made significant contributions to decision theory, which is widely employed, particularly in statistical quality control. Hypothesis testing also extended its applicability to incorporate prior probabilities, giving it a Bayesian character. While Neyman-Pearson hypothesis testing has evolved into an abstract mathematical subject taught at the post-graduate level, much of what is taught and used in undergraduate education under the umbrella of hypothesis testing can be attributed to Fisher.
Contemporary opinion
There have been no major conflicts between the two classical schools of testing in recent decades, although occasional criticism and disputes persist. However, it is highly unlikely that one theory of statistical testing will completely supplant the other in the foreseeable future. The hybrid approach, which combines elements from both competing schools of testing, can be interpreted in different ways. Some view it as an amalgamation of two mathematically complementary ideas, while others see it as a flawed union of philosophically incompatible concepts. Fisher's approach had certain philosophical advantages, while Neyman and Pearson emphasized rigorous mathematics. Hypothesis testing remains a subject of controversy for some users, but the most widely accepted alternative method, confidence intervals, is based on the same mathematical principles.
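To make that last connection concrete, the sketch below checks the standard duality between a two-sided test and a confidence interval for a normal mean with known variance: the test rejects the hypothesized mean exactly when the interval excludes it. The sample, the assumed standard deviation, the hypothesized mean, and alpha = 0.05 are illustrative assumptions.

```python
# A minimal sketch of the test / confidence-interval duality for a normal mean.
import numpy as np
from scipy.stats import norm

x = np.array([2.3, 1.9, 2.8, 2.1, 2.6, 2.4])   # hypothetical sample
sigma = 0.5                                     # assumed known std. deviation
mu0 = 2.0                                       # hypothesized mean
alpha = 0.05

se = sigma / np.sqrt(x.size)
z = (x.mean() - mu0) / se
reject = abs(z) > norm.ppf(1 - alpha / 2)       # two-sided z-test decision

half_width = norm.ppf(1 - alpha / 2) * se
ci = (x.mean() - half_width, x.mean() + half_width)
excluded = not (ci[0] <= mu0 <= ci[1])          # does the CI exclude mu0?

print(f"reject H0: {reject}, CI excludes mu0: {excluded}")  # always agree
```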
Due to the historical development of testing, there is no single authoritative source that fully encompasses the hybrid theory as it is commonly practiced in statistics. Additionally, the terminology used in this context may lack consistency. Empirical evidence indicates that individuals, including students and instructors in introductory statistics courses, often have a limited understanding of the meaning of hypothesis testing.