Choropleth map


A choropleth map is a type of statistical thematic map that uses pseudocolor, that is, color that corresponds to an aggregate summary of a geographic characteristic (such as population density or per-capita income) within spatial enumeration units.
Choropleth maps provide an easy way to visualize how a variable varies across a geographic area or to show the level of variability within a region. A heat map or isarithmic map is similar but uses regions drawn according to the pattern of the variable, rather than the a priori geographic areas of choropleth maps. The choropleth is likely the most common type of thematic map because published statistical data is generally aggregated into well-known geographic units, such as countries, states, provinces, and counties, which makes such maps relatively easy to create using GIS, spreadsheets, or other software tools.

History

The earliest known choropleth map was created in 1826 by Baron Pierre Charles Dupin, depicting the availability of basic education in France by department. More "cartes teintées" were soon produced in France to visualize other "moral statistics" on education, disease, crime, and living conditions. Choropleth maps quickly gained popularity in several countries due to the increasing availability of demographic data compiled from national censuses, starting with a series of choropleth maps published in the official reports of the 1841 Census of Ireland. When chromolithography became widely available after 1850, color was increasingly added to choropleth maps.
The term "choropleth map" was introduced in 1938 by the geographer John Kirtland Wright, and was in common usage among cartographers by the 1940s. Also in 1938, Glenn Trewartha reintroduced them as "ratio maps", but this term did not survive.

Structure

A choropleth map brings together two datasets: spatial data representing a partition of geographic space into distinct districts, and statistical data representing a variable aggregated within each district. There are two common conceptual models of how these interact in a choropleth map. In one view, which may be called "district dominant", the districts are the focus, and each carries a variety of collected attributes, including the variable being mapped. In the other view, which may be called "variable dominant", the focus is on the variable as a geographic phenomenon with a real-world distribution, and partitioning it into districts is merely a convenient measurement technique.
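As a minimal sketch of how the two datasets come together, the following Python example joins a set of district geometries to a statistical table by a shared district identifier. The district IDs, coordinates, income figures, and the helper name `join_districts` are all invented for illustration; real workflows typically read boundary files and tables with GIS software.

```python
# Hypothetical district geometries: each district ID maps to a polygon
# given as a list of (x, y) vertices. All values here are invented.
district_shapes = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (5, 0), (5, 2), (2, 2)],
    "C": [(0, 2), (5, 2), (5, 4), (0, 4)],
}

# Hypothetical statistical table: one aggregated value per district ID.
median_income = {"A": 41000, "B": 58000, "D": 36000}


def join_districts(shapes, values):
    """Attach each district's value to its geometry, matching on district ID.

    Districts with no matching value are kept with value None so that a
    map could draw them in a neutral "no data" color.
    """
    return {
        district_id: {"geometry": geometry, "value": values.get(district_id)}
        for district_id, geometry in shapes.items()
    }


mapped = join_districts(district_shapes, median_income)

# Statistical rows whose ID matches no district cannot be drawn at all.
unmatched_values = sorted(set(median_income) - set(district_shapes))
```

In this sketch, district "C" has a shape but no value, and the value for "D" has no shape; both kinds of mismatch are worth checking before a map is drawn.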

Geometry: aggregation districts

In a choropleth map, the districts are usually previously defined entities such as governmental or administrative units, or districts created specifically for statistical aggregation, and thus have no expectation of correlation with the geography of the variable. That is, boundaries of the colored districts may or may not coincide with the location of changes in the geographic distribution being studied. This is in direct contrast to chorochromatic and isarithmic maps, in which region boundaries are defined by patterns in the geographic distribution of the subject phenomenon.
Using pre-defined aggregation regions has a number of advantages, including: easier compilation and mapping of the variable, recognizability of the districts, and the applicability of the information to further inquiry and policy tied to the individual districts. A prime example of this would be elections, in which the vote total for each district determines its elected representative.
However, using pre-defined regions can also cause a number of problems, generally because the constant color applied to each aggregation district makes it look homogeneous, masking an unknown degree of variation of the variable within the district. For example, a city may include neighborhoods of low, moderate, and high family income, but be colored with one constant "moderate" color. Thus, real-world spatial patterns may not conform to the regional unit symbolized. Because of this, issues such as the ecological fallacy and the modifiable areal unit problem can lead to major misinterpretations of the data depicted, and other techniques are preferable if one can obtain the necessary data.
These issues can be somewhat mitigated by using smaller districts, because they show finer variations in the mapped variable, and their smaller visual size and increased number reduce the likelihood that the map user makes judgments about the variation within a single district. However, smaller districts can make the map overly complex, especially if there is not a meaningful geographic pattern in the variable. Although representing specific data in large regions can be misleading, the familiar district shapes can make the map clearer and easier to interpret and remember. The choice of regions will ultimately depend on the map's intended audience and purpose. Alternatively, the dasymetric technique can sometimes be employed to refine the region boundaries to more closely match actual changes in the subject phenomenon.
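The masking effect of a single district color, and the way smaller districts reduce it, can be illustrated with a small Python sketch. The neighborhood names and income figures below are invented, and the population standard deviation is used only as one convenient stand-in for "variation hidden within a district".

```python
from statistics import mean, pstdev

# Hypothetical family incomes for neighborhoods in one city, grouped
# into three smaller districts. All figures are invented.
neighborhood_income = {
    "north": [28000, 31000, 30000],
    "center": [52000, 55000, 49000],
    "south": [91000, 88000, 95000],
}

# Coarse view: the whole city is one district shown with one value.
all_values = [v for values in neighborhood_income.values() for v in values]
city_value = mean(all_values)
city_spread = pstdev(all_values)  # variation hidden behind the single color

# Finer view: each smaller district gets its own value and color.
district_values = {
    name: mean(values) for name, values in neighborhood_income.items()
}
district_spread = {
    name: pstdev(values) for name, values in neighborhood_income.items()
}
```

With these invented figures, the single city-wide value falls in the "moderate" range even though no northern or southern neighborhood is close to it, while each smaller district hides far less variation than the city as a whole.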
Because of these issues, for many variables, one may prefer an isarithmic or chorochromatic map, in which the region boundaries are based on the data itself. However, in many cases such detailed information is simply not available, and the choropleth map is the only feasible option.

Property: aggregate statistical summaries

The variable to be mapped may come from a wide variety of disciplines in the human or natural world, although human topics are generally more common because of the role of governmental units in human activity, which often leads to the original collection of the statistical data. The variable can also be in any of Stevens' levels of measurement: nominal, ordinal, interval, or ratio, although quantitative variables are more commonly used in choropleth maps than qualitative variables. The level of measurement of the individual datum may differ from that of the aggregate summary statistic. For example, a census may ask each individual for his or her "primary spoken language", but this may be summarized over all of the individuals in a county as "percent primarily speaking Spanish" or as "predominant primary language".
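The language example can be sketched in Python as follows. The list of responses is invented, and the two summaries shown are only the two named above: a ratio-level percentage and a nominal-level predominant category, both derived from the same nominal individual data.

```python
from collections import Counter

# Hypothetical individual census responses for one county (nominal data).
responses = [
    "Spanish", "English", "Spanish", "Vietnamese", "Spanish",
    "English", "Spanish", "English", "Spanish", "English",
]

counts = Counter(responses)

# Ratio-level summary: percent primarily speaking Spanish.
percent_spanish = 100 * counts["Spanish"] / len(responses)

# Nominal-level summary: predominant primary language.
predominant_language = counts.most_common(1)[0][0]
```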
Broadly speaking, a choropleth map may represent two types of variables, a distinction common to physics and chemistry as well as geostatistics and spatial analysis:
  • A spatially extensive variable is one that can apply only to the entire district, commonly in the form of total counts or amounts of a phenomenon. Extensive variables are said to be accumulative over space; for example, if the population of the United Kingdom is 65 million, it is not possible that the populations of England, Wales, Scotland, and Northern Ireland could also be 65 million. Instead, their populations must sum to the total population of the collective entity. However, while it is possible to map an extensive variable in a choropleth map, this is almost universally discouraged because patterns can be easily misinterpreted. For example, if a choropleth map assigned a particular shade of red to total populations between 60 and 70 million, a situation in which the United Kingdom has 65 million inhabitants would be indistinguishable from a situation in which the four constituent countries each had 65 million inhabitants, even though these are vastly different geographic realities. Another source of interpretation error is that if a large district and a small district have the same value, the larger one will naturally appear to represent more. Other types of thematic maps, especially proportional symbols and cartograms, are designed to represent extensive variables and are generally preferred.
  • A spatially intensive variable, also known as a field, statistical surface, or localized variable, represents a property that could be measured at any location in space, independent of any boundaries, although its variation over a district can be summarized as a single value. Common intensive variables include densities, proportions, rates of change, mean allotments, and descriptive statistics. Intensive variables are said to be distributive over space; for example, if the population density of the United Kingdom is 250 people per square kilometer, then it would be reasonable to estimate that the most likely density of each of the four constituent countries is also 250/km². Traditionally in cartography, the predominant conceptual model for this kind of phenomenon has been the statistical surface, in which the variable is imagined as a third-dimension "height" above the two-dimensional space that varies continuously. In geographic information science, the more common conceptualization is the field, adopted from physics and usually modeled as a scalar function of location. Choropleth maps are better suited to intensive variables than extensive; if a map user sees the United Kingdom filled with a color for "100-200 people per square km", estimating that Wales and England may each have 100-200 people per square km may not be accurate, but it is possible and a reasonable estimate.
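The difference between accumulative and distributive behavior can be sketched with a short Python example. The four sub-districts and their figures are invented and do not represent any real country; the point is only that populations add up to the parent total, whereas the parent density is an area-weighted average of the sub-district densities rather than their sum.

```python
# Hypothetical sub-districts of one parent district (invented figures).
subdistricts = {
    "W": {"population": 3_000_000, "area_km2": 20_000},
    "X": {"population": 5_000_000, "area_km2": 80_000},
    "Y": {"population": 2_000_000, "area_km2": 10_000},
    "Z": {"population": 10_000_000, "area_km2": 90_000},
}

# Extensive variables accumulate: the parts sum to the whole.
total_population = sum(d["population"] for d in subdistricts.values())
total_area = sum(d["area_km2"] for d in subdistricts.values())

# Intensive variable: density of each part and of the whole.
densities = {
    name: d["population"] / d["area_km2"] for name, d in subdistricts.items()
}
parent_density = total_population / total_area

# The parent density equals the area-weighted mean of the part densities.
area_weighted_mean = sum(
    densities[name] * subdistricts[name]["area_km2"] for name in subdistricts
) / total_area
```

Because the parent density always lies between the lowest and highest sub-district densities, reading the parent's value as an estimate for each part is imprecise but plausible, which is not true of a total count.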

Normalization

Normalization is the technique of deriving a spatially intensive variable from one or more spatially extensive variables, so that it can be appropriately used in a choropleth map. It is similar, but not identical, to the technique of normalization or standardization in statistics. Typically, it is accomplished by computing the ratio between two spatially extensive variables. Although any such ratio will result in an intensive variable, only a few are especially meaningful and commonly used in choropleth maps:
  • Density = total / area. Example: population density
  • Proportion = subgroup total / grand total. Example: Wealthy households as a percentage of all households.
  • Mean allocation = total amount / total individuals. Example: gross domestic product per capita
  • Rate of change = total at later time / total at earlier time, often expressed as a percentage change by subtracting 1 from the ratio. Example: annual population growth rate.
These are not equivalent, nor is one better than another. Rather, they convey different aspects of a geographic narrative. For example, a choropleth map of the population density of the Latino population in Texas visualizes a narrative about the spatial clustering and distribution of that group, while a map of the percent Latino visualizes a narrative of composition and predominance. Failure to employ proper normalization will lead to an inappropriate and potentially misleading map in almost all cases. This is one of the most common mistakes in cartography, with one study finding that at one point, more than half of United States COVID-19 dashboards hosted by state governments were not applying normalization to their choropleth maps. This is one of many issues that contributed to the infodemic surrounding the COVID-19 pandemic, and "might also be a subtle facilitator of the extreme political polarization surrounding measures to combat COVID that has occurred in the United States".
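The four normalization ratios listed above can be sketched in Python as follows. The function names, the `county` record, and all of its figures are hypothetical; each function simply divides one spatially extensive total by another, as the definitions describe.

```python
def density(total, area):
    """Density = total / area."""
    return total / area


def proportion(subgroup_total, grand_total):
    """Proportion = subgroup total / grand total."""
    return subgroup_total / grand_total


def mean_allocation(total_amount, total_individuals):
    """Mean allocation = total amount / total individuals."""
    return total_amount / total_individuals


def rate_of_change(later_total, earlier_total):
    """Rate of change as the ratio of a later total to an earlier total.

    Subtracting 1 from the result gives the fractional change.
    """
    return later_total / earlier_total


# Hypothetical extensive totals for one county (invented figures).
county = {
    "population": 200_000,
    "population_previous_year": 160_000,
    "area_km2": 500,
    "households": 80_000,
    "wealthy_households": 12_000,
    "total_income": 8_000_000_000,
}

people_per_km2 = density(county["population"], county["area_km2"])
share_wealthy = proportion(county["wealthy_households"], county["households"])
income_per_capita = mean_allocation(county["total_income"], county["population"])
growth_ratio = rate_of_change(
    county["population"], county["population_previous_year"]
)
```

Each result is a spatially intensive value suitable for a choropleth map, whereas the raw totals in `county` are not.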