Metadata


Metadata is data that defines and describes the characteristics of other data. It often helps to describe, explain, locate, or otherwise make data easier to retrieve, [|use], or manage. For example, the title, author, and publication date of a book are metadata about the book. But, while a data asset is finite, its metadata is infinite.
As such, efforts to [|define], classify [|types], or [|structure] metadata are expressed as examples in the context of its use. The term "metadata" has a [|history] dating to the 1960s where it occurred in computer science and in popular culture. Different types of metadata serve different functions. For example, descriptive metadata for a document might include the author, creation date, file size and keywords.

Types

There are many distinct types of metadata, including:
  • Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
  • Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials.
  • Administrative metadata – the information to help manage a resource, like resource type, and permissions, and when and how it was created.
  • Reference metadata – the information about the contents and quality of statistical data.
  • Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data.
  • Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided.
Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways.
While the metadata application is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into three categories: technical metadata, business metadata, and process metadata.
NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information to preserve and save a resource.
Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production.
An additional type of metadata beginning to be more developed is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. The Wiki page lists several properties and their values. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core 's "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done.

History

Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance.
Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards.
The first description of "meta data" for computer systems is purportedly noted by MIT's Center for International Studies experts David Griffel and Stuart McIntosh in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."
Unique metadata standards exist for different disciplines. Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in, what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc.
In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations.

Definition

Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include:
For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as key words linked to the content. These links are often called "Metatags", which were used as the primary factor in determining order for a web search until the late 1990s. The reliance on metatags in web searches was decreased in the late 1990s because of "keyword stuffing", whereby metatags were being largely misused to trick search engines into thinking some websites had more relevance in the search than they really did.
Metadata can be stored and managed in a database, often called a metadata registry or metadata repository. However, without context and a point of reference, it might be impossible to identify metadata just by looking at it. For example: by itself, a database containing several numbers, all 13 digits long could be the results of calculations or a list of numbers to plug into an without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as information that refers to the book, but is not itself the information within the book. The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts" where it is clear that he uses the term in the ISO 11179 "traditional" sense, which is "structural metadata" i.e. "data about the containers of data"; rather than the alternative sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogs. Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted the term. In these fields, the word metadata is defined as "data about data". While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.
Slate reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails.