Geographic Names Information System


The Geographic Names Information System is a database of name and location information about more than two million physical and cultural features, encompassing the United States and its territories; the associated states of the Marshall Islands, Micronesia, and Palau; and Antarctica. It is a type of gazetteer. It was developed by the United States Geological Survey in cooperation with the United States Board on Geographic Names to promote the standardization of feature names.
Data were collected in two phases.
Although a third phase was considered, which would have handled name changes where local usages differed from maps, it was never begun.
The database is part of a system that includes topographic map names and bibliographic references. The names of books and historic maps that confirm the feature or place name are cited. Variant names, alternatives to official federal names for a feature, are also recorded. Each feature receives a permanent, unique feature record identifier, sometimes called the GNIS identifier. The database never removes an entry, "except in cases of obvious duplication."

Original purposes

The GNIS was originally designed for four major purposes: to eliminate duplication of effort at other levels of government that were already compiling geographic data, to provide standardized datasets of geographic data for the government and others, to index all of the names found on official federal and state maps, and to ensure uniform geographic names for the federal government.

Phase 1

Phase 1 lasted from 1978 to 1981, with a precursor pilot project covering the states of Kansas and Colorado in 1976, and produced five databases.
It excluded several classes of feature because they were better documented in non-USGS maps, including airports, the broadcasting masts for radio and television stations, civil divisions, regional and historic names, individual buildings, roads, and triangulation station names.
The databases were initially available on paper, on microfiche, and on magnetic tape encoded in EBCDIC with 248-byte fixed-length records in 4960-byte blocks.
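The tape layout described above (248-byte records packed into 4960-byte blocks) can be sketched as follows. This is a minimal illustration, not the original software: the function names are hypothetical, the `cp037` code page is an assumption (the source does not say which EBCDIC variant was used), and no fields are parsed because the record layout is not given here.

```python
RECORD_LEN = 248   # bytes per fixed-length record (from the text)
BLOCK_LEN = 4960   # bytes per tape block (from the text)
RECORDS_PER_BLOCK = BLOCK_LEN // RECORD_LEN  # 20 records fit exactly in a block


def split_block(block: bytes) -> list[bytes]:
    """Split one block into its fixed-length records."""
    if len(block) % RECORD_LEN != 0:
        raise ValueError("block length is not a whole number of records")
    return [block[i:i + RECORD_LEN] for i in range(0, len(block), RECORD_LEN)]


def decode_record(record: bytes, codec: str = "cp037") -> str:
    """Decode one EBCDIC record to text and drop trailing padding spaces.

    cp037 is an assumed code page; substitute the correct one if known.
    """
    return record.decode(codec).rstrip()


# Synthetic block for illustration: one padded name repeated to fill a block.
sample = "Saint Helens, Mount".ljust(RECORD_LEN).encode("cp037")
block = sample * RECORDS_PER_BLOCK
records = split_block(block)
print(len(records), decode_record(records[0]))
```

Because the record length divides the block length exactly, a reader can process a tape one block at a time without carrying partial records between blocks.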
The feature classes for association with each name included "locale", "populated place", "spring", "lava", and "well".
Mountain features would fall into "ridge", "range", or "summit" classes.
A feature class "tank" was sometimes used for lakes, which was problematic in several ways.
This feature class was undocumented, and it was, in the words of a 1986 report from the United States Army Corps of Engineers (USACE), "an unreasonable determination", with the likes of Cayuga Lake being labelled a "tank".
The USACE report assumed that "tank" meant "reservoir", and observed that often the coordinates of "tanks" were outside of their boundaries and were "possibly at the point where a dam is thought to be".

National Geographic Names database

The National Geographic Names database (NGNDB) was originally 57 computer files, one for each state and territory of the United States (except Alaska, which had two) plus one for the District of Columbia.
The second Alaska file was an earlier database, the Dictionary of Alaska Place Names that had been compiled by the USGS in 1967.
Two further files were later added, each covering the entire United States and each an abridged version of the data in the other 57: one for the 50,000 best-known populated places and features, and one for most of the populated places.
The files were compiled from all of the names to be found on USGS topographic maps, plus data from various state map sources.
In phase 1, elevations were recorded in feet only, with no conversion to metric, and only if there was an actual elevation recorded for the map feature.
They were of either the lowest or highest point of the feature, as appropriate.
Interpolated elevations, calculated by interpolation between contour lines, were added in phase 2.
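As an illustration of interpolating between contour lines, the sketch below estimates the elevation of a point from its horizontal distances to the two neighboring contours. Linear interpolation is an assumption made here for simplicity; the source does not state which method was used, and the function name is hypothetical.

```python
def interpolate_elevation(lower: float, upper: float,
                          dist_to_lower: float, dist_to_upper: float) -> float:
    """Linearly interpolate the elevation of a point between two contours.

    lower, upper: elevations of the neighboring contour lines (same unit).
    dist_to_lower, dist_to_upper: horizontal distances from the point to
    each contour, measured on the map (any consistent unit).
    """
    total = dist_to_lower + dist_to_upper
    if total <= 0:
        raise ValueError("point must lie between two distinct contour lines")
    fraction = dist_to_lower / total  # 0 at the lower contour, 1 at the upper
    return lower + fraction * (upper - lower)


# A point one quarter of the way from the 1000 contour to the 1040 contour.
estimate = interpolate_elevation(1000, 1040, dist_to_lower=1, dist_to_upper=3)
```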
Names were the official name, except where the name contained diacritic characters that the computer file encodings of the time could not handle.
Generic designations were given after specific names, so Mount Saint Helens was recorded as "Saint Helens, Mount", although cities named Mount Olive, not actually being mountains, would not take "Mount" to be a generic part and would retain their order "Mount Olive".
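The ordering rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the leading generic is assumed to be a single word, and whether that word really is a generic designation is passed in as a flag, since that judgment depends on the feature itself rather than on the spelling of its name.

```python
def index_form(name: str, leading_word_is_generic: bool) -> str:
    """Return a name with its leading generic moved after the specific part.

    leading_word_is_generic must be decided by the caller: True for a
    mountain called "Mount Saint Helens", False for a city called
    "Mount Olive", where "Mount" is simply part of the name.
    """
    if not leading_word_is_generic:
        return name
    generic, _, specific = name.partition(" ")
    if not specific:  # single-word name: nothing to reorder
        return name
    return f"{specific}, {generic}"


mountain = index_form("Mount Saint Helens", leading_word_is_generic=True)
city = index_form("Mount Olive", leading_word_is_generic=False)
```

Keeping the decision outside the function mirrors the text: the same word "Mount" is reordered for a mountain but left in place for a city.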
The primary geographic coordinates of features which occupy an area, rather than being a single point feature, were the location of the feature's mouth, or of the approximate center of the area of the feature.
Such approximate centers were "eye-balled" estimates by the people performing the digitization, subject to the constraint that centers of areal features were not placed within other features that are inside them.
Alluvial fans and river deltas counted as mouths for this purpose. For cities and other large populated places, the coordinates were taken to be those of a primary civic feature such as the city hall or town hall, main public library, main highway intersection, main post office, or central business district, regardless of changes over time; these coordinates are called the "primary point".
Secondary coordinates were only an aid to locating which topographic map the feature extended across, and were "simply anywhere on the feature and on the topographic map with which it is associated".
River sources were determined by the shortest drain, subject to the proximities of other features that were clearly related to the river by their names.

USGS Topographic Map Names database

The USGS Topographic Map Names database (TMNDB) was also 57 computer files containing the names of maps: 56 for 1:24000 scale USGS maps as with the NGNDB, the 57th being data from the 1:100000 and 1:250000 scale USGS maps.
Map names were recorded exactly as on the maps themselves, with the exceptions for diacritics as with the NGNDB.
Unlike the NGNDB, locations were the geographic coordinates of the south-east corner of the given map, except for American Samoa and Guam maps where they were of the north-east corner.
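The corner convention above can be expressed as a small lookup. The sketch below is illustrative only: the `MapCell` record, its field names, and the sample cells are hypothetical and not taken from the database, and longitudes are assumed to be signed degrees with east positive.

```python
from dataclasses import dataclass

# Regions whose map location is the north-east corner (from the text).
NORTH_EAST_REGIONS = frozenset({"American Samoa", "Guam"})


@dataclass(frozen=True)
class MapCell:
    """Hypothetical record for one map sheet; bounds in decimal degrees."""
    name: str
    region: str
    south: float
    north: float
    west: float
    east: float


def reference_corner(cell: MapCell) -> tuple[float, float]:
    """Return (latitude, longitude) of the corner used as the map's location."""
    latitude = cell.north if cell.region in NORTH_EAST_REGIONS else cell.south
    return (latitude, cell.east)


# Made-up cells for illustration only; these are not real map sheets.
mainland = MapCell("Example A", "Colorado", 39.0, 39.125, -105.125, -105.0)
island = MapCell("Example B", "Guam", 13.375, 13.5, 144.75, 144.875)
```

In both cases the longitude is that of the eastern edge; only the choice between the southern and northern edge depends on the region.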
The TMNDB was renamed the Geographic Cell Names database (GCNDB) in the 1990s.

Generic database

The Generic database was in essence a machine-readable glossary of terms and abbreviations taken from the map sources, with their definitions, grouped into collections of related terms.

National Atlas database

The National Atlas database was an abridged version of the NGNDB that contained only those entries that were in the index to the USGS National Atlas of the United States, with the coordinates published in the latter substituted for the coordinates from the former.

Board on Geographic Names database

The Board on Geographic Names database was a record of the investigative work of the Domestic Names Committee of the United States Board on Geographic Names (BGN), and of decisions that it had made from 1890 onwards, as well as names that were enshrined by Acts of Congress.
Elevation and location data followed the same rules as for the NGNDB.
So too did names with diacritic characters.

Phase 2

Phase 2 was broader in scope than phase 1, drawing on a much larger set of data sources.
It ran from the end of phase 1 and had fully processed data from 42 states by 2003, with 4 still underway and the remaining 4 awaiting the initial systematic compilation of the sources to use.
Many more feature classes were included, such as abandoned Native American settlements, ghost towns, railway stations on railway lines that no longer existed, housing developments, shopping centers, and highway rest areas.
The actual compilation was outsourced by the U.S. government, state by state, to private entities such as university researchers.

Antarctica Geographic Names database

The Antarctica Geographic Names database (AGNDB) was added in the 1990s and comprised records for BGN-approved names in Antarctica and various off-lying islands such as the South Orkney Islands, the South Shetland Islands, the Balleny Islands, Heard Island, South Georgia, and the South Sandwich Islands.
It only contained records for natural features, not for scientific outposts.

Additional media

In the 1990s, the databases also became available on floppy disk, over FTP, and on CD-ROM.
The CD-ROM edition included only the NGNDB, the AGNDB, the GCNDB, and a bibliographic reference database, but came with database search software that ran on PC DOS version 3.0 or later.
The FTP site included extra topical databases: a subset of the NGNDB that only included the records with feature classes for populated places, a "Concise" subset of the NGNDB that listed "major features", and a "Historical" subset that included the features that no longer exist.

Populated places

There is no differentiation amongst different types of populated places.
In the words of the aforementioned 1986 USACE report, "[a] subdivision having one inhabitant is as significant as a major metropolitan center such as New York City".
In comparing GNIS populated place records with data from the Thematic Mapper of the Landsat program, researchers from the University of Connecticut in 2001 discovered that "a significant number" of populated places in Connecticut had no identifiable human settlement in the land use data and were at road intersections.
They found that such populated places with no actual settlement often had "Corner" in their names, and hypothesized that these were either historical records or "cartographic locators".
In surveying in the United States, a "Corner" is a corner of the surveyed polygon enclosing an area of land, whose location is, or was, marked in various ways, including with trees known as "bearing trees" or with "corner monuments".
From analysing Native American names in the database in order to compile a dictionary, professor William Bright of UCLA observed in 2004 that some GNIS entries are "erroneous; or refer to long-vanished railroad sidings where no one ever lived". Such false classifications have propagated to other geographical information sources, such as incorrectly classified train stations appearing as towns or neighborhoods on Google Maps.