Data model (GIS)
A geographic data model, geographical data model, or simply data model in the context of geographic information systems (GIS), is a mathematical and digital structure for representing phenomena over the Earth. Generally, such data models represent various aspects of these phenomena by means of geographic data, including their locations, attributes, and change over time. For example, the vector data model represents geography as collections of points, lines, and polygons, and the raster data model represents geography as cell matrices that store numeric values. Data models are implemented throughout the GIS ecosystem, including the software tools for data management and spatial analysis, data stored in a variety of GIS file formats, specifications and standards, and specific designs for GIS installations.
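As a minimal sketch of the vector/raster contrast, assuming plain Python dataclasses rather than any particular GIS library, the vector side can be modeled as geometries defined by their vertices and the raster side as a grid of cells with an origin and a cell size. The class names (`Point`, `LineString`, `Polygon`, `Raster`) and the upper-left-origin convention are illustrative choices for this example, not a specific file format or API.

```python
from dataclasses import dataclass

# A location as an (x, y) coordinate pair.
Coordinate = tuple[float, float]

# --- Vector model: discrete geometries defined by their vertices ---
@dataclass
class Point:
    xy: Coordinate

@dataclass
class LineString:
    coords: list[Coordinate]  # ordered vertices joined by straight segments

@dataclass
class Polygon:
    ring: list[Coordinate]  # closed outline: first vertex repeated at the end

# --- Raster model: a matrix of square cells, one numeric value per cell ---
@dataclass
class Raster:
    origin: Coordinate          # (x, y) of the grid's upper-left corner
    cell_size: float            # width and height of each cell
    values: list[list[float]]   # values[row][col], row 0 at the top

    def value_at(self, x: float, y: float) -> float:
        """Return the value stored in the cell that contains (x, y)."""
        col = int((x - self.origin[0]) // self.cell_size)
        row = int((self.origin[1] - y) // self.cell_size)  # rows increase downward
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("location is outside the raster extent")
        return self.values[row][col]


well = Point((2.0, 3.0))
road = LineString([(0.0, 0.0), (5.0, 5.0)])
parcel = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)])

elevation = Raster(origin=(0.0, 2.0), cell_size=1.0,
                   values=[[10.0, 12.0],
                           [11.0, 15.0]])
print(elevation.value_at(1.5, 0.5))  # 15.0
```

The vector classes store only the vertices that define each shape, while the raster stores a value for every cell whether or not anything of interest is there; finding the value at a location is simple arithmetic on the origin and cell size.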
While the unique nature of spatial information has led to its own set of model structures, much of the process of data modeling is similar to the rest of information technology, including the progression from conceptual models to logical models to physical models, and the distinction between generic models and application-specific designs.
History
The earliest computer systems that represented geographic phenomena were quantitative analysis models developed during the quantitative revolution in geography in the 1950s and 1960s. These could not be called geographic information systems because they did not attempt to store geographic data in a consistent, permanent structure; they were usually statistical or mathematical models. The first true GIS software modeled spatial information using data models that would come to be known as raster or vector:
- SYMAP produced raster maps, although data was usually entered as vector-like region outlines or sample points then interpolated into a raster structure for output. The GRID package, developed in 1969 by David Sinton at the Harvard Laboratory for Computer Graphics and Spatial Analysis, was based on SYMAP but was more focused on the permanent storage and analysis of gridded data, thus becoming perhaps the first general-purpose raster GIS software.
- The Canada Geographic Information System (CGIS) stored natural resource data as "faces" (vector polygons), although these were typically derived from raster scans of paper maps.
- Dual Independent Map Encoding was perhaps the first robust vector data model incorporating network and polygon topology and attributes sufficient to allow address geocoding.
- Like CGIS, early GIS installations in the United States were often focused on inventories of land use and natural resources, including the Minnesota Land Management Information System (MLMIS), the Land Use and Natural Resources Inventory of New York, and the Oak Ridge Regional Modelling Information System. Unlike CGIS, these were all raster systems inspired by SYMAP, although the MLMIS was based on subsections of the Public Land Survey System, which is not a perfectly regular grid.
As commercial off-the-shelf GIS software, GIS installations, and GIS data proliferated in the 1980s, scholars began to look for conceptual models of geographic phenomena that seemed to underlie the common data models, trying to discover why the raster and vector data models seemed to make common sense, and how they measured and represented the real world. This was one of the primary threads that formed the subdiscipline of geographic information science in the early 1990s.
Further developments in GIS data modeling in the 1990s were driven by rapid increases in both the GIS user base and computing capability. Major trends included 1) the development of extensions to the traditional data models to handle more complex needs such as time, three-dimensional structures, uncertainty, and multimedia; and 2) the need to efficiently manage exponentially increasing volumes of spatial data while meeting enterprise requirements for multiuser access and security. These trends eventually culminated in the emergence of spatial databases incorporated into relational databases and object-relational databases.
Types of data models
Because the world is much more complex than can be represented in a computer, all geospatial data are incomplete approximations of the world. Thus, most geospatial data models encode some form of strategy for collecting a finite sample of an often infinite domain, and a structure to organize the sample in such a way as to enable interpolation of the nature of the unsampled portion. For example, a building consists of an infinite number of points in space; a vector polygon represents it with a few ordered points, which are connected into a closed outline by straight lines, with all interior points assumed to be part of the building; furthermore, a "height" attribute may be the only representation of its three-dimensional volume.
The process of designing geospatial data models is similar to data modeling in general, at least in its overall pattern. For example, it can be segmented into three distinct levels of model abstraction:
- Conceptual data model, a high-level specification of how information is organized in the mind and in enterprise processes, without regard to the restrictions of GIS and other computer systems. It is common to develop and represent a conceptual model visually using tools such as an entity-relationship model.
- Logical data model, a broad strategy for how to represent the conceptual model in the computer, sometimes novel but often within the framework of existing software, hardware, and standards. The Unified Modeling Language (UML), specifically the class diagram, is commonly used for visually developing logical and physical models.
- Physical data model, the detailed specification of how data will be structured in memory or in files.
Data models can also be distinguished by their scope:
- A generic data model is intended to be employed in a wide variety of applications, by discovering consistent patterns in the ways that society in general conceptualizes information and/or structures that work most efficiently in computers. For example, the field is a generic conceptual model of geographic phenomena, the relational database model and the raster and vector data models are generic logical models, while the shapefile format is a generic physical model. These models are typically implemented directly into software and GIS file formats. Historically, these models have been designed by academic researchers, by standards bodies such as the Open Geospatial Consortium, and by software vendors such as Esri. While academic and standard models are public, companies may choose to keep the details of their models secret or to publish them openly.
- A specific data model or GIS design is a specification of the data needed for a particular enterprise or project GIS.
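To make the building example from this section concrete, here is a sketch of one possible logical model (a footprint ring plus a single height attribute) and one possible physical encoding (OGC Well-Known Text). The `Building` class and its methods are hypothetical; the interior test uses the standard even-odd ray-casting rule, the area uses the shoelace formula, and the volume assumes a flat-roofed prism, which is the kind of simplification a lone height attribute implies.

```python
from dataclasses import dataclass

Coordinate = tuple[float, float]

@dataclass
class Building:
    """Logical model: a footprint polygon plus a single height attribute."""
    footprint: list[Coordinate]  # closed ring, first vertex repeated at the end
    height_m: float              # the only stand-in for the 3-D volume

    def contains(self, x: float, y: float) -> bool:
        """Even-odd ray casting: is (x, y) inside the footprint?

        Only the vertices are stored; every interior point is implied
        by this rule rather than recorded.
        """
        inside = False
        ring = self.footprint
        for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
            # Does a horizontal ray from (x, y) toward +x cross this edge?
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside

    def area(self) -> float:
        """Footprint area via the shoelace formula."""
        ring = self.footprint
        twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
        return abs(twice) / 2.0

    def volume(self) -> float:
        """Approximate volume, assuming a flat-roofed prism."""
        return self.area() * self.height_m

    def to_wkt(self) -> str:
        """Physical encoding of the footprint as OGC Well-Known Text."""
        coords = ", ".join(f"{x} {y}" for x, y in self.footprint)
        return f"POLYGON (({coords}))"


office = Building(
    footprint=[(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0), (0.0, 0.0)],
    height_m=12.0,
)
print(office.contains(5.0, 5.0))       # True: interior implied by the outline
print(office.contains(25.0, 5.0))      # False
print(office.area(), office.volume())  # 200.0 2400.0
print(office.to_wkt())  # POLYGON ((0.0 0.0, 20.0 0.0, 20.0 10.0, 0.0 10.0, 0.0 0.0))
```

Only five vertices and one number are stored; the interior and the volume are inferred by rules, which is the sense in which the model interpolates the unsampled portion of the building.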
Conceptual spatial models
Generic geospatial conceptual models attempt to capture both the physical nature of geographic phenomena and how people think about them and work with them. Contrary to the standard modeling process described above, the data models upon which GIS is built were not originally designed based on a general conceptual model of geographic phenomena, but were largely designed according to technical expediency, likely influenced by common-sense conceptualizations that had not yet been documented.
That said, an early conceptual framework that was very influential in early GIS development was the recognition by Brian Berry and others that geographic information can be decomposed into the description of three very different aspects of each phenomenon: space, time, and attribute/property/theme. As a further development in 1978, David Sinton presented a framework that characterized different strategies for measurement, data, and mapping as holding one of the three aspects constant, controlling a second, and measuring the third.
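Sinton's framework can be read as an assignment of three roles (fixed, controlled, measured) to the three aspects, so there are 3! = 6 possible strategies. The sketch below enumerates them; the `Aspect` and `Strategy` names are invented for illustration, and the two labeled examples (a grid of cells and a contour map) are commonly cited interpretations, not a reproduction of Sinton's original table.

```python
from enum import Enum
from itertools import permutations
from typing import NamedTuple

class Aspect(Enum):
    SPACE = "space"
    TIME = "time"
    ATTRIBUTE = "attribute"

class Strategy(NamedTuple):
    fixed: Aspect       # held constant across the whole dataset
    controlled: Aspect  # sampled at chosen, regular values
    measured: Aspect    # recorded for each controlled value

# Every way of assigning the three roles to the three aspects: 3! = 6.
ALL_STRATEGIES = [Strategy(*p) for p in permutations(Aspect)]

# Two commonly cited illustrations (labels chosen for this sketch):
GRID_CELLS = Strategy(fixed=Aspect.TIME, controlled=Aspect.SPACE,
                      measured=Aspect.ATTRIBUTE)   # e.g. a raster of land cover
CONTOUR_LINES = Strategy(fixed=Aspect.TIME, controlled=Aspect.ATTRIBUTE,
                         measured=Aspect.SPACE)    # e.g. elevation contours

print(len(ALL_STRATEGIES))  # 6
print(GRID_CELLS in ALL_STRATEGIES, CONTOUR_LINES in ALL_STRATEGIES)  # True True
```

In the grid, the cells fix where to look and the data record what is there; in the contour map, the contour interval fixes which values to trace and the data record where those values occur.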
During the 1980s and 1990s, a body of spatial information theories gradually emerged as a major subfield of geographic information science, incorporating elements of philosophy, linguistics, and sciences of spatial cognition. By the early 1990s, a basic dichotomy had emerged between two alternative ways of making sense of the world and its contents:
- An object is a distinct "thing," comprehended as a whole. It may be a visible, material object, such as a building or road, or an abstract entity such as a county or the market area of a retail store.
- A field is a property that varies over space, so that it potentially has a distinct measurable value at any location within its extent. It may be a physical, directly measurable characteristic of matter akin to the intensive properties of chemistry, such as temperature or density; or it may be an abstract concept defined via a mathematical model, such as the likelihood that a person living at each location will use a local park.
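The two views can be contrasted in a short sketch, which is illustrative only: a field is any function from a location to a value, and an object is an identifiable record. The `temperature` function is a made-up smooth surface, the `Feature` record uses a bounding box as a stand-in for a real geometry, and `sample` shows how a continuous field is reduced to a finite grid of samples for storage, much as a raster does.

```python
import math
from dataclasses import dataclass
from typing import Callable

# Field view: a value exists at every location (x, y) in the extent.
Field = Callable[[float, float], float]

def temperature(x: float, y: float) -> float:
    """Hypothetical temperature field in degrees C, peaking at (50, 50)."""
    return 15.0 + 5.0 * math.exp(-((x - 50.0) ** 2 + (y - 50.0) ** 2) / 500.0)

# Object view: a distinct thing with its own identity, comprehended as a whole.
@dataclass(frozen=True)
class Feature:
    feature_id: str
    kind: str
    bbox: tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

    def center(self) -> tuple[float, float]:
        min_x, min_y, max_x, max_y = self.bbox
        return ((min_x + max_x) / 2.0, (min_y + max_y) / 2.0)

def sample(field: Field, xs: list[float], ys: list[float]) -> list[list[float]]:
    """Reduce a continuous field to a finite grid: one row per y, one column per x."""
    return [[field(x, y) for x in xs] for y in ys]


park = Feature("park-01", "park", (40.0, 40.0, 60.0, 60.0))
print(temperature(*park.center()))  # 20.0: the field queried at the object's center

grid = sample(temperature, xs=[0.0, 50.0, 100.0], ys=[50.0])
print(grid[0][1])  # 20.0
```

The object answers "what is this thing and where is it?", while the field answers "what is the value here?"; the `sample` step is where an infinite field becomes the finite approximation that a stored data model actually holds.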