Reference range


In medicine and health-related fields, a reference range or reference interval is the range or the interval of values that is deemed normal for a physiological measurement in healthy persons. It is a basis for comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests.
The standard definition of a reference range originates in what is most prevalent in a reference group taken from the general population. This is the general reference range. However, there are also optimal health ranges and ranges for particular conditions or statuses.
Values within the reference range are those within normal limits. The limits are called the upper reference limit or upper limit of normal and the lower reference limit or lower limit of normal. In health care–related publishing, style sheets sometimes prefer the word reference over the word normal to prevent the nontechnical senses of normal from being conflated with the statistical sense. Values outside a reference range are not necessarily pathologic, and they are not necessarily abnormal in any sense other than statistically. Nonetheless, they are indicators of probable pathosis. Sometimes the underlying cause is obvious; in other cases, challenging differential diagnosis is required to determine what is wrong and thus how to treat it.
A cutoff or threshold is a limit used for binary classification, mainly between normal and pathological. Methods for establishing cutoffs include using the upper or lower limit of a reference range.
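As a minimal sketch of how reference limits can serve as cutoffs, the following Python function labels a single result relative to a given lower and upper limit. The function name and the labels are hypothetical, and treating a value exactly equal to a limit as within range is an assumption of this sketch, not a rule stated here. The limits in the usage line are those of the fasting plasma glucose example later in this article.

```python
def classify(value, lower_limit, upper_limit):
    """Label a result relative to a reference range (limits count as inside)."""
    if value < lower_limit:
        return "below reference range"
    if value > upper_limit:
        return "above reference range"
    return "within reference range"


# Illustrative use with the 4.4 to 6.3 mmol/L range from the worked example.
print(classify(5.1, 4.4, 6.3))  # within reference range
```

A label of this kind is only a statistical flag; as noted above, a value outside the range is not necessarily pathologic.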

Standard definition

The standard definition of a reference range for a particular measurement is the interval within which 95% of the values of a reference population fall, such that 2.5% of values are less than the lower limit of this interval and 2.5% are larger than the upper limit, whatever the distribution of these values.
Reference ranges that are given by this definition are sometimes referred to as standard ranges.
Since a range is a defined statistical value that describes the interval between the smallest and largest values, many, including the International Federation of Clinical Chemistry, prefer to use the expression reference interval rather than reference range.
Regarding the target population, if not otherwise specified, a standard reference range generally denotes the one in healthy individuals, or in those without any known condition that directly affects the ranges being established. These are likewise established using reference groups from the healthy population, and are sometimes termed normal ranges or normal values. However, using the term normal may not be appropriate, as not everyone outside the interval is abnormal, and people who have a particular condition may still fall within this interval.
However, reference ranges may also be established by taking samples from the whole population, with or without diseases and conditions. In some cases, diseased individuals are taken as the population, establishing reference ranges among those having a disease or condition. Preferably, there should be specific reference ranges for each subgroup of the population that has any factor that affects the measurement, for example specific ranges for each sex, age group, race or any other general determinant.

Establishment methods

Methods for establishing reference ranges can be based on assuming a normal distribution or a log-normal distribution, or can work directly from the percentages of interest, as detailed respectively in the following sections. When establishing reference ranges from bilateral organs, both results from the same individual can be used, although intra-subject correlation must be taken into account.
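The direct approach, working from the percentages of interest, can be sketched in Python as taking the 2.5th and 97.5th percentiles of the reference values. This is only an illustration: the function name is hypothetical, the interpolation rule (the "inclusive" method of the standard library) is one choice among several, and the evenly spaced input is synthetic, used only so that the result is easy to check. A reference group needs to be large for the outer percentiles to be estimated reliably.

```python
import statistics


def percentile_reference_interval(values):
    """Return the 2.5th and 97.5th percentiles of the reference values."""
    # n=40 splits the data into 40 equal-probability groups, so the first
    # and last cut points sit at 2.5% and 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Synthetic, evenly spaced values 0..1000 (not real measurements).
print(percentile_reference_interval(range(1001)))  # (25.0, 975.0)
```

No distributional assumption is made here, which is the main appeal of the percentile method.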

Normal distribution

The 95% interval is often estimated by assuming a normal distribution of the measured parameter, in which case it can be defined as the interval limited by 1.96 population standard deviations on either side of the population mean.
However, in the real world, neither the population mean nor the population standard deviation is known. Both need to be estimated from a sample, whose size can be designated n. The population standard deviation is estimated by the sample standard deviation (SD) and the population mean is estimated by the sample mean (m). To account for these estimations, the 95% prediction interval (95% PI) is calculated as:
$$\text{95\% PI} = m \pm t_{0.975,\,n-1} \cdot \sqrt{\frac{n+1}{n}} \cdot \text{SD}$$
where $t_{0.975,\,n-1}$ is the 97.5% quantile of a Student's t-distribution with n−1 degrees of freedom.
When the sample size is large (n ≥ 30), $t_{0.975,\,n-1} \approx 2$.
This method is often acceptably accurate if the standard deviation, as compared to the mean, is not very large. A more accurate method is to perform the calculations on logarithmized values, as described in a separate section later.
The following example of this method is based on values of fasting plasma glucose taken from a reference group of 12 subjects:

Subject      Fasting plasma glucose (mmol/L)   Deviation from mean m   Squared deviation from mean m
Subject 1    5.5                                0.17                   0.029
Subject 2    5.2                               -0.13                   0.017
Subject 3    5.2                               -0.13                   0.017
Subject 4    5.8                                0.47                   0.221
Subject 5    5.6                                0.27                   0.073
Subject 6    4.6                               -0.73                   0.533
Subject 7    5.6                                0.27                   0.073
Subject 8    5.9                                0.57                   0.325
Subject 9    4.7                               -0.63                   0.397
Subject 10   5.0                               -0.33                   0.109
Subject 11   5.7                                0.37                   0.137
Subject 12   5.2                               -0.13                   0.017

Mean m = 5.33 (n = 12)
Mean of deviations = 0.00
Sum of squared deviations/(n−1) = 1.95/11 = 0.18
√0.18 = 0.42 = standard deviation (SD)

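As can be read from, for example, a table of selected values of Student's t-distribution, the 97.5% percentile with (12 − 1) = 11 degrees of freedom corresponds to $t_{0.975,\,11} = 2.20$.
Subsequently, the lower and upper limits of the standard reference range are calculated as:
$$\text{Lower limit} = m - t_{0.975,\,11} \cdot \sqrt{\frac{n+1}{n}} \cdot \text{SD} = 5.33 - 2.20 \cdot \sqrt{\frac{13}{12}} \cdot 0.42 = 4.4$$
$$\text{Upper limit} = m + t_{0.975,\,11} \cdot \sqrt{\frac{n+1}{n}} \cdot \text{SD} = 5.33 + 2.20 \cdot \sqrt{\frac{13}{12}} \cdot 0.42 = 6.3$$
Thus, the standard reference range for this example is estimated to be 4.4 to 6.3 mmol/L.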
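The worked example can be reproduced with a short Python sketch. The glucose values are the twelve listed for the reference group; the t quantile (2.201 for 11 degrees of freedom) is hard-coded from a t-table, since the standard library offers no Student's t quantile function. The function and variable names are hypothetical.

```python
import math
import statistics

# Fasting plasma glucose (mmol/L) of the 12 reference subjects listed above.
glucose = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]

# 97.5% quantile of Student's t with 11 degrees of freedom, from a t-table.
T_975_DF11 = 2.201


def prediction_interval(values, t_quantile):
    """95% prediction interval: m +/- t * sqrt((n + 1) / n) * SD."""
    n = len(values)
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    half_width = t_quantile * math.sqrt((n + 1) / n) * sd
    return m - half_width, m + half_width


lower, upper = prediction_interval(glucose, T_975_DF11)
print(f"{lower:.1f} to {upper:.1f} mmol/L")  # 4.4 to 6.3 mmol/L
```

For other sample sizes the t quantile has to be looked up again, because it depends on n − 1.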
Confidence interval of limit
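The 90% confidence interval of a standard reference range limit, as estimated assuming a normal distribution, can be calculated by:
$$\text{Lower limit of the confidence interval} = \text{percentile limit} - 2.81 \cdot \frac{\text{SD}}{\sqrt{n}}$$
$$\text{Upper limit of the confidence interval} = \text{percentile limit} + 2.81 \cdot \frac{\text{SD}}{\sqrt{n}}$$
where SD is the standard deviation, and n is the number of samples.
Taking the example from the previous section, the number of samples is 12 and the standard deviation is 0.42 mmol/L, resulting in the following confidence interval for the lower limit of the standard reference range:
$$4.4 - 2.81 \cdot \frac{0.42}{\sqrt{12}} \approx 4.1$$
$$4.4 + 2.81 \cdot \frac{0.42}{\sqrt{12}} \approx 4.7$$
Thus, the lower limit of the reference range can be written as 4.4 (90% CI 4.1 to 4.7) mmol/L.
Likewise, with similar calculations, the upper limit of the reference range can be written as 6.3 (90% CI 6.0 to 6.6) mmol/L.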
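The confidence interval of a limit can be sketched in Python as follows. The multiplier 2.81 is the factor commonly quoted for this formula under the normal-distribution assumption; it is treated here as an assumption and exposed as a parameter, and the function name is hypothetical. The inputs are the limits, standard deviation and sample size of the fasting plasma glucose example.

```python
import math


def limit_confidence_interval(limit, sd, n, factor=2.81):
    """90% CI of a reference limit: limit +/- factor * SD / sqrt(n).

    The default factor of 2.81 is an assumption of this sketch.
    """
    half_width = factor * sd / math.sqrt(n)
    return limit - half_width, limit + half_width


for limit in (4.4, 6.3):
    low, high = limit_confidence_interval(limit, sd=0.42, n=12)
    print(f"{limit} (90% CI {low:.1f} to {high:.1f}) mmol/L")
```

Because the half-width shrinks with the square root of n, quadrupling the reference group only halves the uncertainty of each limit.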
These confidence intervals reflect random error, but do not compensate for systematic error, which in this case can arise from, for example, the reference group not having fasted long enough before blood sampling.
As a comparison, actual reference ranges used clinically for fasting plasma glucose are estimated to have a lower limit of approximately 3.8 to 4.0 mmol/L, and an upper limit of approximately 6.0 to 6.1 mmol/L.

Log-normal distribution

In reality, biological parameters tend to have a log-normal distribution, rather than the normal distribution or Gaussian distribution.
One explanation for this log-normal distribution of biological parameters is that a sample with half the value of the mean or median tends to be almost as probable as a sample with twice the value of the mean or median. Also, only a log-normal distribution can account for the fact that almost all biological parameters cannot be negative: there is no definite limit to the size of outliers on the high side, but values can never be less than zero, which results in positive skewness.
As shown in the diagram at right, this phenomenon has a relatively small effect if the standard deviation is relatively small compared with the mean, because the log-normal distribution then appears similar to a normal distribution. Thus, for convenience, the normal distribution may be used with small standard deviations, and the log-normal distribution with large standard deviations.
In a log-normal distribution, the geometric standard deviation and geometric mean more accurately estimate the 95% prediction interval than their arithmetic counterparts.
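One way to apply this, sketched in Python, is to run the same prediction-interval calculation on logarithmized values and transform the limits back. Exponentiating the mean and standard deviation of the logarithms gives the geometric mean and geometric standard deviation. The function name is hypothetical, and the t quantile must be supplied by the caller (for example from a t-table), since the standard library has no Student's t quantile function.

```python
import math
import statistics


def lognormal_prediction_interval(values, t_quantile):
    """95% prediction interval computed on the log scale, then back-transformed."""
    logs = [math.log(v) for v in values]  # requires strictly positive values
    n = len(logs)
    log_mean = statistics.mean(logs)   # exp(log_mean) is the geometric mean
    log_sd = statistics.stdev(logs)    # exp(log_sd) is the geometric SD
    half_width = t_quantile * math.sqrt((n + 1) / n) * log_sd
    return math.exp(log_mean - half_width), math.exp(log_mean + half_width)


# Same 12 fasting plasma glucose values (mmol/L) as in the earlier example.
glucose = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]
lower, upper = lognormal_prediction_interval(glucose, t_quantile=2.201)
print(f"{lower:.2f} to {upper:.2f} mmol/L")
```

The resulting limits are symmetric around the geometric mean on a ratio scale rather than on an additive scale, and the lower limit can never be negative.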
Necessity
Reference ranges for substances that usually stay within relatively narrow limits, such as electrolytes, can be estimated by assuming a normal distribution, whereas reference ranges for those that vary significantly, such as most hormones, are more accurately established by assuming a log-normal distribution.
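The necessity to establish a reference range by log-normal distribution rather than normal distribution can be regarded as depending on how much difference it would make not to do so, which can be described as the ratio:
$$\text{Difference ratio} = \frac{\left|\text{Limit}_{\text{log-normal}} - \text{Limit}_{\text{normal}}\right|}{\text{Limit}_{\text{log-normal}}}$$
where:
  • $\text{Limit}_{\text{log-normal}}$ is the limit as estimated by assuming log-normal distribution
  • $\text{Limit}_{\text{normal}}$ is the limit as estimated by assuming normal distribution.
This difference can be put solely in relation to the coefficient of variation, as in the diagram at right, where:
$$\text{Coefficient of variation} = \frac{\text{SD}}{m}$$
where:
  • SD is the standard deviation
  • m is the arithmetic mean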
In practice, it can be regarded as necessary to use the establishment methods of a log-normal distribution if the difference ratio becomes more than 0.1, meaning that a limit estimated from an assumed normal distribution would be more than 10% different from the corresponding limit as estimated from a log-normal distribution. As seen in the diagram, a difference ratio of 0.1 is reached for the lower limit at a coefficient of variation of 0.213, and for the upper limit at a coefficient of variation of 0.413. The lower limit is more affected by increasing coefficient of variation, and its "critical" coefficient of variation of 0.213 corresponds to a ratio of upper limit to lower limit of 2.43. As a rule of thumb, if the upper limit is more than 2.4 times the lower limit when estimated by assuming normal distribution, redoing the calculations by log-normal distribution should be considered.
Taking the example from the previous section, the standard deviation is estimated at 0.42 and the arithmetic mean is estimated at 5.33. Thus the coefficient of variation is 0.079. This is less than both 0.213 and 0.413, and thus both the lower and upper limit of fasting blood glucose can most likely be estimated by assuming normal distribution. More specifically, the coefficient of variation of 0.079 corresponds to a difference ratio of 0.01 for the lower limit and 0.007 for the upper limit.
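The relation between the coefficient of variation and the difference ratio can be explored with a small Python sketch. How the curve in the diagram was derived is not stated here, so the sketch makes its own assumptions: the log-normal distribution is chosen to have the same arithmetic mean and coefficient of variation as the normal one, and both limits lie 1.96 standard deviations from the centre on their respective scales. Under these assumptions the results come out close to the figures quoted above. The function names are hypothetical.

```python
import math

Z = 1.96  # normal quantile used for the 95% interval


def difference_ratio(cv, upper):
    """|Limit_log-normal - Limit_normal| / Limit_log-normal for a given CV.

    Assumes a log-normal distribution matched to arithmetic mean 1 and the
    given coefficient of variation (the mean cancels out of the ratio).
    """
    sigma = math.sqrt(math.log(1 + cv ** 2))  # SD on the log scale
    mu = -sigma ** 2 / 2                      # log-scale mean for arithmetic mean 1
    sign = 1 if upper else -1
    limit_lognormal = math.exp(mu + sign * Z * sigma)
    limit_normal = 1 + sign * Z * cv
    return abs(limit_lognormal - limit_normal) / limit_lognormal


def normal_limit_ratio(cv):
    """Upper limit divided by lower limit under the normal assumption."""
    return (1 + Z * cv) / (1 - Z * cv)


cv = 0.42 / 5.33  # fasting plasma glucose example
print(round(cv, 3))                                  # 0.079
print(round(difference_ratio(cv, upper=False), 3))   # lower limit
print(round(difference_ratio(cv, upper=True), 3))    # upper limit
print(round(normal_limit_ratio(0.213), 2))           # 2.43
```

Evaluating `difference_ratio` at 0.213 for the lower limit and at 0.413 for the upper limit gives values close to 0.1, in line with the thresholds described above.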