Hierarchy of evidence
A hierarchy of evidence, comprising levels of evidence (or evidence levels), is a heuristic used to rank the relative strength of results obtained from experimental research, especially medical research. There is broad agreement on the relative strength of large-scale, epidemiological studies. More than 80 different hierarchies have been proposed for assessing medical evidence. The design of the study and the endpoints measured affect the strength of the evidence. In clinical research, the best evidence for treatment efficacy comes mainly from meta-analyses of randomized controlled trials, and the least relevant evidence is expert opinion, including expert consensus. With regard to the study of side effects, systematic reviews of completed, high-quality randomized controlled trials – such as those published by the Cochrane Collaboration – rank the same as systematic reviews of completed, high-quality observational studies. Evidence hierarchies are often applied in evidence-based practices and are integral to evidence-based medicine.
Definition
In 2014, Jacob Stegenga defined a hierarchy of evidence as "rank-ordering of kinds of methods according to the potential for that method to suffer from systematic bias". At the top of the hierarchy is a method with the most freedom from systemic bias or best internal validity relative to the tested medical intervention's hypothesized efficacy. In 1997, Greenhalgh suggested it was "the relative weight carried by the different types of primary study when making decisions about clinical interventions".
The National Cancer Institute defines levels of evidence as "a ranking system used to describe the strength of the results measured in a clinical trial or research study. The design of the study... and the endpoints measured... affect the strength of the evidence."
Examples
A large number of hierarchies of evidence have been proposed. Similar protocols for evaluation of research quality are still in development. So far, the available protocols pay relatively little attention to whether outcome research is relevant to efficacy or to effectiveness. In 2025, Francis PT suggested that the hierarchy-of-evidence pyramids for therapeutic studies and etiological studies be shown separately, as they follow separate paths.
GRADE
The GRADE approach is a method of assessing the certainty in evidence and the strength of recommendations. GRADE began in 2000 as a collaboration of methodologists, guideline developers, biostatisticians, clinicians, public health scientists and other interested members. Over 100 organizations have endorsed or are using GRADE to evaluate the quality of evidence and the strength of health care recommendations.
GRADE rates quality of evidence as follows:
| High | There is a lot of confidence that the true effect lies close to that of the estimated effect. |
| Moderate | There is moderate confidence in the estimated effect: The true effect is likely to be close to the estimated effect, but there is a possibility that it is substantially different. |
| Low | There is limited confidence in the estimated effect: The true effect might be substantially different from the estimated effect. |
| Very low | There is very little confidence in the estimated effect: The true effect is likely to be substantially different from the estimated effect. |
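Because the four GRADE ratings form an ordered scale, they can be represented as an ordered enumeration. The following is a minimal sketch in Python, not part of GRADE itself: the names `GradeQuality` and `weakest` are hypothetical, and the outcome names and ratings in the example are invented for illustration.

```python
from enum import IntEnum


class GradeQuality(IntEnum):
    """The four GRADE quality ratings, ordered from weakest to strongest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def weakest(ratings):
    """Return the lowest rating in a collection (hypothetical helper)."""
    return min(ratings)


# Invented example data: one rating per outcome.
ratings = {
    "mortality": GradeQuality.MODERATE,
    "hospital readmission": GradeQuality.LOW,
    "quality of life": GradeQuality.HIGH,
}

# List the outcomes from the strongest rating to the weakest.
for outcome, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{outcome}: {rating.name}")
```

Using `IntEnum` makes comparisons such as `GradeQuality.HIGH > GradeQuality.LOW` work directly, which is all the sketch relies on.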
Guyatt and Sackett
In 1995, Guyatt and Sackett published the first such hierarchy. Greenhalgh put the different types of primary study in the following order:
- Systematic reviews and meta-analyses of "RCTs with definitive results".
- RCTs with definitive results
- RCTs with non-definitive results
- Cohort studies
- Case–control studies
- Cross-sectional surveys
- Case reports
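The ordering above can be used to sort a set of studies by the relative weight of their design. The following is a minimal sketch in Python under stated assumptions: the design labels are shortened forms of the list entries, `RANK` and `sort_by_weight` are hypothetical names, and the study labels are invented. It ranks by design type only and says nothing about the quality of an individual study.

```python
# Study designs in the order given above, strongest first.
GREENHALGH_ORDER = [
    "systematic review or meta-analysis of RCTs with definitive results",
    "RCT with definitive results",
    "RCT with non-definitive results",
    "cohort study",
    "case-control study",
    "cross-sectional survey",
    "case report",
]

# Map each design to its position in the hierarchy (1 = strongest).
RANK = {design: position for position, design in enumerate(GREENHALGH_ORDER, start=1)}


def sort_by_weight(studies):
    """Sort (label, design) pairs so the strongest design comes first.

    Raises KeyError for a design that is not in the hierarchy.
    """
    return sorted(studies, key=lambda study: RANK[study[1]])


# Invented example data.
studies = [
    ("Study A", "case report"),
    ("Study B", "cohort study"),
    ("Study C", "RCT with definitive results"),
]

for label, design in sort_by_weight(studies):
    print(f"{RANK[design]}. {label} ({design})")
```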
Saunders et al.
Khan et al.
A protocol for evaluation of research quality was suggested by a report from the Centre for Reviews and Dissemination, prepared by Khan et al. and intended as a general method for assessing both medical and psychosocial interventions. While strongly encouraging the use of randomized designs, this protocol noted that such designs were useful only if they met demanding criteria, such as true randomization and concealment of the assigned treatment group from the client and from others, including the individuals assessing the outcome. The Khan et al. protocol emphasized the need to make comparisons on the basis of "intention to treat" in order to avoid problems related to greater attrition in one group. The Khan et al. protocol also presented demanding criteria for nonrandomized studies, including matching of groups on potential confounding variables, adequate descriptions of groups and treatments at every stage, and concealment of treatment choice from persons assessing the outcomes. This protocol did not provide a classification of levels of evidence, but included or excluded treatments from classification as evidence-based depending on whether the research met the stated standards.
U.S. National Registry of Evidence-Based Practices and Programs
An assessment protocol has been developed by the U.S. National Registry of Evidence-Based Practices and Programs (NREPP). Evaluation under this protocol occurs only if an intervention has already had one or more positive outcomes reported with a probability of less than .05, if these have been published in a peer-reviewed journal or an evaluation report, and if documentation such as training materials has been made available. The NREPP evaluation, which assigns quality ratings from 0 to 4 to certain criteria, examines the reliability and validity of outcome measures used in the research, evidence for intervention fidelity, levels of missing data and attrition, potential confounding variables, and the appropriateness of statistical handling, including sample size.
History
Canada
The term was first used in a 1979 report by the "Canadian Task Force on the Periodic Health Examination" to "grade the effectiveness of an intervention according to the quality of evidence obtained". The task force used three levels, subdividing level II:
- Level I: Evidence from at least one randomized controlled trial.
- Level II1: Evidence from at least one well designed cohort study or case control study, preferably from more than one center or research group.
- Level II2: Comparisons between times and places with or without the intervention.
- Level III: Opinions of respected authorities, based on clinical experience, descriptive studies or reports of expert committees.
The CTF updated its report in 1984, 1986 and 1987.
United States
In 1988, the United States Preventive Services Task Force came out with its guidelines based on the CTF using the same three levels, further subdividing level II.
- Level I: Evidence obtained from at least one properly designed randomized controlled trial.
- Level II-1: Evidence obtained from well-designed controlled trials without randomization.
- Level II-2: Evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group.
- Level II-3: Evidence obtained from multiple time series designs with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.
- Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.
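The USPSTF levels listed above amount to a lookup from study design to level. The following is a minimal sketch in Python, assuming simplified design labels: `USPSTF_LEVELS`, `LEVEL_ORDER` and `strongest_level` are hypothetical names. The sketch does not judge whether a study is "properly designed" or "well-designed", which the levels also require.

```python
# Simplified design labels mapped to the 1988 USPSTF levels listed above.
USPSTF_LEVELS = {
    "randomized controlled trial": "I",
    "controlled trial without randomization": "II-1",
    "cohort study": "II-2",
    "case-control study": "II-2",
    "multiple time series": "II-3",
    "expert opinion": "III",
}

# Levels from strongest to weakest, used for comparison.
LEVEL_ORDER = ["I", "II-1", "II-2", "II-3", "III"]


def strongest_level(designs):
    """Return the strongest level supported by a non-empty list of designs."""
    levels = [USPSTF_LEVELS[design] for design in designs]
    return min(levels, key=LEVEL_ORDER.index)


# Example: evidence consisting of expert opinion plus a cohort study.
print(strongest_level(["expert opinion", "cohort study"]))
```

Keeping the level order in a separate list avoids relying on string comparison, which would not sort labels such as "II-1" and "III" reliably.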