Cartographic generalization


Cartographic generalization, or map generalization, includes all changes in a map that are made when one derives a smaller-scale map from a larger-scale map or map data. It is a core part of cartographic design. Whether done manually by a cartographer or by a computer or set of algorithms, generalization seeks to abstract spatial information at a high level of detail to information that can be rendered on a map at a lower level of detail.
The cartographer has license to adjust the content within their maps to create a suitable and useful map that conveys spatial information, while striking the right balance between the map's purpose and the precise detail of the subject being mapped. Well-generalized maps are those that emphasize the most important map elements while still representing the world in the most faithful and recognizable way.

History

During the first half of the 20th century, cartographers began to think seriously about how the features they drew depended on scale. Eduard Imhof, one of the most accomplished academic and professional cartographers at the time, published a study of city plans on maps at a variety of scales in 1937, itemizing several forms of generalization that occurred, including those later termed symbolization, merging, simplification, enhancement, and displacement. As analytical approaches to geography arose in the 1950s and 1960s, generalization, especially line simplification and raster smoothing, was a target of study.
Generalization was probably the most thoroughly studied aspect of cartography from the 1970s to the 1990s, likely because it fit within both of the two major research trends of the era: cartographic communication and the opportunities afforded by technological advances. Early research focused primarily on algorithms for automating individual generalization operations. By the late 1980s, academic cartographers were thinking bigger, developing a general theory of generalization and exploring the use of expert systems and other nascent artificial intelligence technologies to automate the entire process, including decisions on which tools to use when. These tracks foundered somewhat in the late 1990s, coinciding with a general loss of faith in the promise of AI and the rise of post-modern criticisms of the impacts of automating design.
In recent years, the generalization community has seen a resurgence, fueled in part by the renewed opportunities of AI. Another recent trend has been a focus on multi-scale mapping, integrating GIS databases developed for several target scales, narrowing the scope of need for generalization to the scale "gaps" between them, a more manageable level for automation.

Theories of map detail

Generalization is often defined simply as removing detail, but it is based on the notion, originally adopted from Information theory, of the volume of information or detail found on the map, and how that volume is controlled by map scale, map purpose, and intended audience. If there is an optimal amount of information for a given map project, then generalization is the process of taking existing available data, often called the digital landscape model, which usually but not always has a larger amount of information than needed, and processing it to create a new data set, often called the digital cartographic model, with the desired amount.
Many general conceptual models have been proposed for understanding this process, often attempting to capture the decision process of the human master cartographer. One of the most popular models, developed by McMaster and Shea in 1988, divides these decisions into three phases: Philosophical objectives, the general reasons why generalization is desirable or necessary, and the criteria for evaluating its success; Cartometric evaluation, the characteristics of a given map that demand generalization; and Spatial and attribute transformations, the set of generalization operators available to use on a given feature, layer, or map. In the first, most conceptual phase, McMaster and Shea show how generalization plays a central role in resolving the often conflicting goals of cartographic design as a whole: functionality vs. aesthetics, information richness vs. clarity, and the desire to do more vs. the limitations of technology and medium. These conflicts can be reduced to a basic conflict between the need for more data on the map and the need for less, with generalization as the tool for balancing them.
One challenge with the information theory approach to generalization is its basis in measuring the amount of information on the map before and after generalization procedures. One could conceive of a map being quantified by its map information density, the average number of "bits" of information per unit area on the map, and by its ground information density or resolution, the same measure per unit area on the Earth. Scale would thus be a function of the ratio between the two, and a change in scale would require the adjustment of one or both of them by means of generalization.
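As a rough formalization of this idea (the notation here is illustrative, not drawn from the generalization literature): suppose a body of cartographic information I is drawn over a map area A_m that represents a ground area A_g at representative fraction s, so that A_m = s^2 A_g. Then

```latex
d_{\text{map}} = \frac{I}{A_m}, \qquad
d_{\text{ground}} = \frac{I}{A_g}, \qquad
\frac{d_{\text{ground}}}{d_{\text{map}}} = \frac{A_m}{A_g} = s^2 .
```

So if the map information density is to stay at a readable level when the scale is reduced (smaller s), the information shown per unit of ground area must shrink in proportion to s^2, and generalization is what removes the difference.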
But what counts as a "bit" of map information? In specific cases this is not difficult, such as counting the total number of features on the map, or the number of vertices in a single line; such straightforwardness explains why these were early targets for generalization research. However, it is a challenge for the map as a whole, in which questions arise such as "how much graphical information is there in a map label: one bit, a bit for each character, or bits for each vertex or curve in every character, as if they were each area features?" Each option can be relevant at different times.
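As a toy illustration in Python of how the count depends on the unit chosen (the road geometry and label below are invented for the example):

```python
# Three ways of counting the "information" carried by one labeled road.
road = [(0, 0), (1, 2), (3, 3), (6, 3)]   # a single line feature with 4 vertices
label = "Main Street"

as_features = 1                  # the road as one unit of information
as_vertices = len(road)          # one unit per vertex: 4
as_characters = len(label)       # one unit per label character: 11

print(as_features, as_vertices, as_characters)   # 1 4 11
```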
This measurement is further complicated by the role of map symbology, which can affect the apparent information density. A map with a strong visual hierarchy carries an aesthetic of being "clear" because it appears at first glance to contain less data than it really does; conversely, a map with no visual hierarchy, in which all layers seem equally important, might be summarized as "cluttered" because one's first impression is that it contains more data than it really does. Designing a map to achieve the desired gestalt aesthetic is therefore about managing the apparent information density more than the actual information density. In the words of Edward Tufte, "Clutter and confusion are failures of design, not attributes of information."
There is recent work that recognizes the role of map symbols, including the Roth-Brewer typology of generalization operators, although its authors clarify that symbology is not a form of generalization, just a partner of generalization in achieving a desired apparent information density.

Operators

There are many cartographic techniques that are used to adjust the amount of geographic data on the map. Over the decades of generalization research, more than a dozen distinct lists of such generalization operators have been published, with significant differences among them. Multiple reviews have compared these lists, and even they miss a few salient ones, such as the one found in John Keates' first textbook, which was apparently ahead of its time. Some of these operations have been automated by multiple algorithms, with tools available in geographic information systems and other software; others have proven much more difficult, with most cartographers still performing them manually.
[Figure: This OpenStreetMap map of Oklahoma shows the challenges of automated selection from raw GIS data. The gaps in the highways are not due to missing data, but to shortcomings in the selection process. Note also that the point and label for Oklahoma City are missing, although its suburbs Norman and Edmond are included.]

Select

Also called filter or omission.
One of the first operators to be recognized and analyzed, first appearing in the 1973 Keates list, selection is the process of simply removing entire geographic features from the map. There are two types of selection, combined in some models and separated in others:
  • Layer selection: the choice of which data layers or themes to include or exclude.
  • Feature selection: the choice of which specific features to include or remove within an included layer.
In feature selection, the choice of which features to keep or exclude is more challenging than it might seem. Filtering by a simple attribute such as real-world size, while easy to do with existing GIS data, often produces a selection that is excessively concentrated in some areas and sparse in others. Cartographers therefore often filter features by their degree of regional importance, their prominence in their local area rather than on the map as a whole, which produces a more balanced map but is more difficult to automate. Many formulas have been developed for automatically ranking the regional importance of features, for example by balancing the raw size with the distance to the nearest feature of significantly greater size, similar to measures of topographic prominence; this is much more difficult for line features than for points, and sometimes produces undesirable results.
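A minimal sketch, in Python, of this kind of regional-importance ranking for point features; the city names, coordinates, populations, and the particular weighting (raw size times distance to the nearest much larger feature) are illustrative assumptions, not a published formula:

```python
from math import hypot

# Hypothetical point features: (name, x, y, population)
cities = [
    ("Metropolis", 0.0,  0.0, 1_200_000),
    ("Suburbia",   0.3,  0.2,   150_000),
    ("Lakeside",   0.5, -0.1,    90_000),
    ("Farville",   5.0,  4.0,    20_000),
]

def regional_importance(city, all_cities, size_factor=2.0):
    """Balance raw size against isolation: the distance to the nearest
    feature that is at least size_factor times larger."""
    name, x, y, pop = city
    larger = [(ox, oy) for oname, ox, oy, opop in all_cities
              if oname != name and opop >= size_factor * pop]
    if not larger:                      # nothing bigger anywhere: always keep
        return float("inf")
    isolation = min(hypot(x - ox, y - oy) for ox, oy in larger)
    return pop * isolation              # large AND far from anything larger

ranked = sorted(cities, key=lambda c: regional_importance(c, cities),
                reverse=True)
print([name for name, *_ in ranked[:2]])   # ['Metropolis', 'Farville']
```

In this toy example the small but isolated Farville outranks the larger towns crowded around Metropolis, which is the balancing effect that a purely size-based filter misses.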
Another approach is to manually encode a subjective judgment of regional importance into the GIS data, which can subsequently be used to filter features; this was the approach taken for the Natural Earth dataset created by cartographers.

Simplify

Another early focus of generalization research, simplification is the removal of vertices in lines and area boundaries. A variety of algorithms have been developed, but most involve searching through the vertices of the line and removing those that contribute the least to its overall shape. The Ramer–Douglas–Peucker algorithm is one of the earliest and still most common techniques for line simplification. Most of these algorithms, especially the early ones, placed a higher priority on reducing the size of datasets in the days of limited digital storage than on quality appearance on maps, and they often produce lines that look excessively angular, especially on curves such as rivers. Other algorithms include the Wang–Müller algorithm, which looks for critical bends and is typically more accurate at the cost of processing time, and the Zhou–Jones and Visvalingam–Whyatt algorithms, which use properties of the triangles formed by neighboring vertices to determine which vertices to remove.
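As an illustration, here is a minimal Python sketch of the Ramer–Douglas–Peucker approach; the function names and tolerance value are chosen for the example and are not taken from any particular GIS package:

```python
from math import hypot

def perpendicular_distance(pt, start, end):
    """Distance from pt to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:                 # degenerate segment
        return hypot(x - x1, y - y1)
    # Triangle area (cross product) divided by the base length.
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / hypot(dx, dy)

def rdp(points, tolerance):
    """Keep only vertices farther than tolerance from the chord
    between the first and last points, recursing on each half."""
    if len(points) < 3:
        return list(points)
    distances = [perpendicular_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    index, dmax = max(enumerate(distances, start=1), key=lambda t: t[1])
    if dmax <= tolerance:                   # nothing sticks out enough
        return [points[0], points[-1]]
    left = rdp(points[:index + 1], tolerance)   # split at the farthest vertex
    right = rdp(points[index:], tolerance)
    return left[:-1] + right                # avoid duplicating the split point

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(rdp(line, tolerance=1.0))   # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

Most GIS libraries expose the same idea through a built-in simplify operation, so in practice the tolerance is usually the only parameter a cartographer tunes.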
[Figure: 1:24,000 and 1:100,000 geological maps of the same area in Zion National Park, Utah. Deriving the smaller-scale map from the larger would require several generalization operations, including selection to eliminate less important features, smoothing of area boundaries, classification of similar formations into broader categories, merging of small areas into dissimilar but larger ones, exaggeration of very narrow areas, and displacement of areas adjacent to exaggerated areas. In fact, both maps were compiled independently.]