EIDR
The Entertainment Identifier Registry, or EIDR, is a global unique identifier system for a broad array of audiovisual objects, including motion pictures, television, and radio programs. The identification system resolves an identifier to a metadata record that is associated with top-level titles, edits, DVDs, encodings, clips, and mashups. EIDR also provides identifiers for video service providers, such as broadcast and cable networks.
As of June 2020, EIDR contains over two million records, including almost 400 thousand movies and almost one million episodes from over 40,000 TV series.
EIDR is an implementation of a digital object identifier.
History
Media asset identification systems have existed for decades. The common motivation for their creation is to enable the management of media assets through the assignment of a unique ID to a set of metadata representing salient characteristics of each asset. Over time such systems tend to proliferate, with each arising to deal with a specific set of issues. As a result, there is considerable variation between systems in terms of which assets are categorized, which metadata is associated with each asset, and the very definition of an asset. To name a few examples, should a "director's cut" of a film be distinct from the original theatrical release? How should regional variations be accounted for? Further complications include the procedures for adding new assets, editing existing assets, and creating derivative assets.
EIDR was created to address these issues, as well as others encountered in video asset workflows, both in a business-to-business context and in the intramural post-production activities of content producers. EIDR has the following characteristics:
- A central registry available to all participants
- Ability to easily register new assets
- An asset ID that is immutable
- Detection/prevention of duplicates of the same asset being created
- Ability to create a set of video assets derived from an abstract work
- Ability to group video assets by more general relationships
- A core set of metadata to differentiate assets, even when closely related
- Scalable, immutable, persistent
Content model
EIDR is built on a collection of records that are stored in a central registry. These records are referenced externally by DOIs, which are assigned when a record is created; each identifier is immutable thereafter. The identifier resolution system underlying DOIs is the Handle System, so each native EIDR Content ID is a handle formatted, in increasing specificity, to the handle, DOI, and EIDR standards.
Content ID format
The canonical form of an EIDR Content ID is an instance of a handle and has the format:
10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C
where
- 10.5240 is the DOI prefix for an EIDR asset. The "10" indicates the handle is a DOI; other prefixes are assigned to other asset types. The digits between the "." and "/" form the sub-prefix, which indicates which registration agency within the International DOI Foundation has rights to manage these handles. "5240" is assigned to the EIDR Association.
- XXXX-XXXX-XXXX-XXXX-XXXX-C is the DOI suffix. Each "X" denotes a hexadecimal digit, and "C" is an ISO 7064 Mod 37,36 check digit.
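The check digit mentioned above can be illustrated with a short Python sketch of the ISO 7064 Mod 37,36 hybrid recurrence. The function names are hypothetical, and the article does not say which characters of the ID feed the checksum, so the payload is left to the caller; treating the 20 hexadecimal suffix digits as the payload is an assumption made only for the usage lines, and the digits themselves are made up.

```python
# Sketch of the ISO 7064 Mod 37,36 check character (names are hypothetical).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # character values 0..35


def _fold(payload: str) -> int:
    """Run the Mod 37,36 recurrence over the payload characters."""
    p = 36
    for ch in payload.upper():
        s = (p + ALPHABET.index(ch)) % 36
        if s == 0:
            s = 36
        p = (s * 2) % 37
    return p


def check_char(payload: str) -> str:
    """Return the character that makes the final remainder equal to 1."""
    return ALPHABET[(37 - _fold(payload)) % 36]


def is_valid(payload: str, check: str) -> bool:
    """Verify a payload against a supplied check character."""
    return (_fold(payload) + ALPHABET.index(check.upper())) % 36 == 1


if __name__ == "__main__":
    # Assumption: the payload is the suffix with hyphens removed (made-up digits).
    payload = "0123-4567-89AB-CDEF-0123".replace("-", "")
    c = check_char(payload)
    print(c, is_valid(payload, c))
```

Keeping the payload as a parameter means the same routine can be reused if the checksum turns out to cover more than the suffix digits.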
There is also a 96-bit compact binary form that is intended for embedding in small payloads such as watermarks. This form is generated from the canonical format as follows:
- 16-bit sub-prefix: generated by interpreting the sub-prefix as a binary value, e.g. B'0001010001111000'
- 80-bit suffix: the non-checksum part of the suffix, represented as 10 bytes
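The two steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name is hypothetical, big-endian byte order for the sub-prefix is an assumption, the check character is dropped without being verified, and the suffix digits in the usage lines are made up.

```python
def to_compact_binary(eidr_id: str) -> bytes:
    """Pack a canonical EIDR Content ID into the 96-bit (12-byte) compact form."""
    prefix, sep, suffix = eidr_id.partition("/")
    directory, dot, sub_prefix = prefix.partition(".")
    if not sep or not dot or directory != "10" or not sub_prefix.isdigit():
        raise ValueError("expected an ID of the form 10.NNNN/...")
    groups = suffix.split("-")
    if len(groups) != 6 or any(len(g) != 4 for g in groups[:5]) or len(groups[5]) != 1:
        raise ValueError("expected suffix XXXX-XXXX-XXXX-XXXX-XXXX-C")
    # 16-bit sub-prefix (byte order is an assumption: most significant byte first).
    head = int(sub_prefix).to_bytes(2, "big")
    # 80-bit suffix: the 20 hexadecimal digits without the check character.
    body = bytes.fromhex("".join(groups[:5]))
    return head + body


if __name__ == "__main__":
    # Made-up suffix; "X" stands in for the check character, which is not checked here.
    packed = to_compact_binary("10.5240/0123-4567-89AB-CDEF-0123-X")
    print(len(packed) * 8, packed.hex())
```

For the sub-prefix 5240 the first two bytes are 0x14 0x78, which matches the bit pattern B'0001010001111000' given above.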
For use on the web an EIDR content ID can be represented as a URI in one of these forms:
Record types
There are four types of records, each associated with a reserved prefix:
- Content ID: associated with an entertainment asset such as a movie or TV series. Content records are hierarchical, allowing relationships to be expressed such as a Series whose children are Seasons, whose children in turn are individual episodes. Many other relationships are supported, as described below. Content records form the bulk of the data in the EIDR registry.
- Party ID: identifies entities such as registrants, content producers, and distributors.
- Video Service ID: identifies a video service, colloquially known as a "channel" or "network": a linear sequence of content scheduled to be broadcast at specified times.
- User ID: identifies a user using a string of 2–32 alphanumeric and selected special characters. A User is primarily an administrative concept that is subordinate to Parties. Unlike the other EIDR DOIs, the User ID can only be used within EIDR.
The sub-prefixes 5237, 5238, 5239, and 5240 are all assigned to the EIDR Association.
Content Records
Content records are objects categorized by their types and relationships. Each has three different kinds of type:
- Object Type: there are 10 of these. The first is the Basic Type, which has the minimal fields necessary to describe a content record. The other nine are derived from the Basic Type and contain extra fields for describing more complex objects.
- Structural Type: these distinguish representations of a work and are listed in increasing order of specificity:
- * Abstraction: Used for objects having no reality, such as a series container or the most basic concept of the original work. This corresponds to the International Standard Musical Work Code for musical works, the International Standard Text Code for textual works, or the International Standard Audiovisual Number for audiovisual works.
- * Performance: Used for items that are particular versions of a work, such as the original theatrical release or director's cut of a film or a locally censored version of a TV show. This roughly corresponds to the International Standard Recording Code for musical works and to some uses of the Version ISAN (V-ISAN) for audiovisual works.
- * Digital: A particular digital representation of a work, such as an MPEG-2 encoding of a movie. This corresponds to some uses of the V-ISAN.
- Referent Type: the type of the content asset, independent of a particular manifestation:
- * Series: An Abstraction that contains ordered or unordered individual items.
- * Season: A second level of grouping below a Series, usually covering a time interval
- * TV: Content that first appeared via broadcast.
- * Movie: Long-form content that first appeared in a cinema or theater.
- * Short: Loosely defined to cover a work that is 40 minutes or less, such as music videos, theatrical newsreels, or theatrical or DTV cartoon shorts.
- * Web: Content that first appeared on the Web. This is different from content from elsewhere that has been made available on the Web.
- * Interactive Material: Content that is not strictly audio-visual. It covers DVD menus, interactive TV overlays, customized players, etc.
- * Compilation: Content composed of multiple other assets that cannot be more precisely described, such as a box set of a film franchise.
- * Supplemental: Secondary content whose primary purpose is to support, augment, or promote other content. Examples include trailers, outtakes, and promotional documentaries.
Basic metadata
The following fields comprise the base object data of a content record:
- Structural Type: e.g. Abstraction
- Mode: e.g. AudioVisual; "Audio" for a radio program; "Visual" for a silent work.
- Referent Type: e.g. Movie
- Title: the primary title. Titles and Alternate Titles are further distinguished by:
- * Lang: the language of the title expressed as an ISO 639-1 code
- * Class: release or regional
- Alternate Title 1..N: one or more alternate titles
- Original Language: the language of the original release expressed as an ISO 639-1 code
- Associated Org 1..N: Party ID of producer, studio, etc.
- Release Date: date the title was originally released
- Country of Origin: ISO 3166-1 alpha-2 code, with extensions for defunct countries
- Approximate Length: expressed as the XML Schema xs:duration datatype
- Alternate ID 1..N: one or more equivalent IDs expressed in a different asset ID system
- Credits: only skeletal credits are provided, typically restricted to the director and up to four of the main actors. It is a non-goal for EIDR to compete with proprietary systems that offer rich metadata; the main goal is to assist with disambiguating the title and to help with validation and de-duplication efforts.
- Registrant: the party that created this content record
- Creation Date: date this content record was created
- Status: normally "valid"
- Last Modification Date: last time this content record was changed
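To make the shape of the base object data concrete, the following Python sketch models a subset of the fields listed above. The class and attribute names are hypothetical and do not reflect the registry's actual schema; the sample values, including the Party ID placeholder, are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Values taken from the lists in this article.
STRUCTURAL_TYPES = {"Abstraction", "Performance", "Digital"}
MODES = {"AudioVisual", "Audio", "Visual"}


@dataclass
class Title:
    text: str
    lang: str          # ISO 639-1 code, e.g. "en"
    title_class: str   # "release" or "regional"


@dataclass
class BaseRecord:
    structural_type: str
    mode: str
    referent_type: str
    title: Title
    original_language: str    # ISO 639-1 code
    release_date: str
    country_of_origin: str    # ISO 3166-1 alpha-2 code
    approximate_length: str   # xs:duration, e.g. "PT1H36M"
    registrant: str           # Party ID of the registering party
    status: str = "valid"
    alternate_titles: List[Title] = field(default_factory=list)
    associated_orgs: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the enumerations named in the article.
        if self.structural_type not in STRUCTURAL_TYPES:
            raise ValueError(f"unknown structural type: {self.structural_type}")
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode}")


if __name__ == "__main__":
    record = BaseRecord(
        structural_type="Abstraction",
        mode="AudioVisual",
        referent_type="Movie",
        title=Title("Example Feature", "en", "release"),
        original_language="en",
        release_date="2001-01-01",
        country_of_origin="US",
        approximate_length="PT1H36M",
        registrant="party-id-placeholder",
    )
    print(record.title.text, record.status)
```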
Deleted content records
An EIDR ID must always be resolvable, so under normal circumstances the corresponding Content Record is permanent. There are two mechanisms available to deal with errors or other unusual circumstances. The preferred one is aliasing, whereby an EIDR ID is transparently redirected to another content record. Aliasing is commonly employed to deal with an asset being registered twice.
The other mechanism is the use of tombstone records. This is employed when the Content Record is corrupted, or when an otherwise invalid asset was accidentally registered. In this case the ID is aliased to a special tombstone record. The tombstone can be recognized by applications because its EIDR ID field will be set to the distinguished value "". Note that "X" means the 24th letter of the Latin alphabet.
Alternate ID
Having a rich set of alternate IDs for content is one of the primary goals of EIDR. This allows EIDR IDs to be used everywhere in content workflows; if an alternate ID is needed, it can be found in the metadata for the EIDR ID. EIDR supports the inclusion of both proprietary and other standard ID references. Additional Alternate IDs can be added when needed. If an alternate ID is resolvable algorithmically, for example by placing it appropriately in a template URL, EIDR makes that link available. Below is an example of the alternate IDs for one EIDR asset.
| Alternate ID | Value | Type |
| #1 | | ISAN |
| #2 | | IVA |
| #3 | | Proprietary, Domain: amazon.com |
| #4 | | Proprietary, Domain: flixster.com |
| #5 | 15042 | Proprietary, Domain: thecinemasource.com |
| #6 | | IMDB, Relation: IsSameAs |
| #7 | E0087486000 | Proprietary, Domain: spe.sony.com/MPM |
| #8 | 3929 | Proprietary, Domain: spe.sony.com/ProductID |
| #9 | 2002029 | Proprietary, Domain: warnerbros.com/MPM |
| #10 | 389785 | Proprietary, Domain: veronicamagazine.nl |
| #11 | | Proprietary, Domain: amazon.com |
| #12 | | Proprietary, Domain: bfi.org.uk |
Alternate IDs are partitioned into non-proprietary and proprietary. The former have distinguished, predefined types, whereas proprietary IDs are all of type "Proprietary", and are further distinguished by an associated DNS domain. As of July 2017, there are over 2 million alternate IDs directly available through EIDR.
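The partition described here, together with the idea of resolving an alternate ID through a template URL, can be sketched in Python. Everything below is illustrative: the class, the template table, and the example.com URL are hypothetical, and the registry's real templates are not given in this article.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class AlternateId:
    value: str
    type: str                     # e.g. "ISAN", "IMDB", or "Proprietary"
    domain: Optional[str] = None  # DNS domain, used by proprietary IDs

    def __post_init__(self) -> None:
        # Proprietary IDs are told apart only by their associated domain.
        if self.type == "Proprietary" and not self.domain:
            raise ValueError("a proprietary alternate ID needs a domain")


# Hypothetical template table keyed by (type, domain).
URL_TEMPLATES = {
    ("Proprietary", "example.com"): "https://example.com/title/{id}",
}


def resolve_link(alt: AlternateId) -> Optional[str]:
    """Return a link if a template is known for this ID, otherwise None."""
    template = URL_TEMPLATES.get((alt.type, alt.domain))
    if template is None:
        return None
    return template.format(id=quote(alt.value, safe=""))


if __name__ == "__main__":
    print(resolve_link(AlternateId("15042", "Proprietary", "thecinemasource.com")))
    print(resolve_link(AlternateId("abc 1", "Proprietary", "example.com")))
```

Returning None when no template is known keeps IDs that cannot be resolved algorithmically usable as plain cross-references.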
Relationships between objects
Content objects can be related to each other according to the following table. These relations are expressed as additional fields in the content record and are thus relative to that object. Note that the subject object is the child and the target is the parent. Additional constraints are noted in the table.
Use in standards and applications
EIDR has been incorporated into many standards. A few of the more significant ones are listed here:
- SMPTE/AMWA: SMPTE Recommended Practice RP 2079 standardizes use of EIDR in MXF media containers, at the heart of professional content workflows, including the AMWA AS-03 and AS-11 specifications. SMPTE Recommended Practice 2021-5 allows an EIDR Identifier to be carried wherever BXF is used for exchange of data among broadcast systems.
- European Broadcasting Union: EBUCore is a common core set of descriptive and technical metadata that describe media resources. EBU and EIDR staff have produced a mapping of EBUCore for base records to EIDR root objects. EIDR and EBU are working together in the SMPTE Core working group to define descriptive metadata for SMPTE-based specifications and workflows. EIDR is one of the standards supported by the EBU Core.
- DVB: EIDR is referenced in draft DVB specifications for companion screens.
- MPEG: EIDR has been proposed as a content identifier in the Multimedia Preservation Application Format that is being defined for archival use.
- CableLabs: EIDR is part of the CableLabs Metadata standard for the distribution of video on demand assets. EIDR is one program identifier that can be present in an SCTE-35 2013 segmentation descriptor, a standard used in IP distribution over cable. EIDR is also used in Dynamic Ad Insertion products using the SCTE 130 standard architecture.
- EIDR and Alternate IDs: To promote interoperability of EIDR with a wide variety of systems, EIDR includes an "AlternateID" field to cross-reference existing ID systems. Alternate IDs may include, for example, CRID, ISAN, ISRC, UPC, or URI, as well as commercial ID systems such as Ad-ID, Baseline, IMDb, etc. Currently about half of EIDR records carry an ID from at least one other system.
- Mapping from other Standard Metadata and Identifiers to EIDR: Other metadata and identifier systems can be directly mapped into EIDR:
- * EN 15907 and EN 15744: These standards are under the auspices of the European Committee for Standardization CEN/TC 372 and filmstandards.org. Best practices and mappings are available for EN 15907 and EN 15744 root objects. EIDR is also working with film archives to extend interoperability with these standards to a more granular level of detail, including a project with the British Film Institute to register their EN 15907-based records with EIDR.
- * International Standard Audiovisual Number: ISAN is widely used in rights management and collection systems. A complete mapping of an ISAN registration to an EIDR registration is available. The UK Audio-Visual Registration Agency, a joint venture between EIDR and ISAN-UK, provides joint registration services for both identifiers. Precursors to this service have been used to obtain EIDR IDs and ISANs for broadcast content from ITV.
Operations & Administrative
EIDR is administered by the non-profit EIDR Association, which was founded in October 2010 by MovieLabs, CableLabs, Comcast and Rovi. Membership has grown steadily since then: as of late 2014 it has 79 members divided between the Industry Promoters and Industry Contributor levels. The fastest growing category is non-US companies, which now accounts for about 20% of membership.
The EIDR Association operates two EIDR registries: Production and Sandbox. The former is the official site, and the latter is reserved for test and development. Both systems are available publicly online, but the contents of the sandbox are not guaranteed to be correct, complete, or even to refer to assets that exist. Only members of the EIDR Association may modify the registry.
Registration
Registration of new assets can be done individually or in bulk. In either case, the workflow comprises a combination of automated and manual processes. It is also iterative, as the initial matching process may identify a variety of gaps and errors that need to be dealt with.
Registering new assets is a complex process that requires some preparation, particularly in the case of bulk submission. The automated processes check syntax and make sure that the basic metadata is supplied and that any dependencies are honored. Manual steps include making sure the correct Parties are associated with the asset. One of the most important steps is ensuring that a new asset does not already exist in the registry: this is covered in the next section.
In order to register a new asset a user must be associated with a party that has been granted the "Registrant" role by the EIDR operator. A registrant may be a principal agent, such as a studio or an encoding house, but it may also be a Party doing bulk registration of back-catalogue items, or a Party acting on behalf of someone else. It is also a requirement that a registrant be an EIDR member. In general, content ownership, metadata authority, and registration capability are separate and unrelated concepts.
Deduplication
This refers to flagging assets being submitted to the registry as falling into one of the following three categories:
- Candidate asset is unique.
- Candidate asset is a duplicate of an existing record.
- Candidate asset has a high probability of being a duplicate.
Assets whose similarity to an existing record falls between the low and high thresholds are deemed to have a high probability of being a duplicate: the proposed record addition or modification does not proceed until it has been manually reviewed by EIDR operations staff.
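The three-way decision can be sketched in Python as a comparison of a match score against two thresholds. The score scale, the threshold values, and all names below are assumptions made for illustration; the article does not say how the registry scores candidates or where its thresholds sit.

```python
from enum import Enum


class Verdict(Enum):
    UNIQUE = "unique"
    NEEDS_REVIEW = "high probability of duplicate; manual review"
    DUPLICATE = "duplicate"


def classify(score: float, low: float = 0.4, high: float = 0.9) -> Verdict:
    """Map a similarity score in [0, 1] to one of the three categories.

    The default thresholds are placeholders, not the registry's values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return Verdict.UNIQUE
    if score >= high:
        return Verdict.DUPLICATE
    return Verdict.NEEDS_REVIEW


if __name__ == "__main__":
    for s in (0.1, 0.5, 0.95):
        print(s, classify(s).name)
```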
Architecture
The components of the EIDR system are shown below. The principal functional blocks are as follows:
- Core Registry: This module is a customization and configuration of the CNRI Digital Object Repository. It performs various functions including registration, generation of unique identifiers, indexing, object storage management, and access control.
- Repository: This stores and provides access to registered objects; for EIDR, these objects are collections of metadata, not the media assets themselves. The metadata includes standard object information, relationships, and access control settings.
- REST API: A REST interface that provides access to the full set of non-administrative registry features. Services can make individual or batched calls, which can be dispatched synchronously or asynchronously. A general query syntax enables the retrieval of registry records satisfying a set of criteria specified by the caller.
- * EIDR SDK: this is provided to developers to facilitate the creation of third-party applications. It comprises a Java SDK, a .NET SDK, and sample programs built upon the two SDKs. Using the SDK is recommended over direct calls to the REST API.
- * Command Line Tools: these are simple Java and .NET applications, built on the SDK, each of which provides a single function, such as resolve, query, match, and register.
- * Web UI: a Web-based user interface primarily for search, lookup, and browsing the object hierarchy. It also supports simple registrations.
- DOI Proxy: Using the handle prefix, this forwards EIDR DOI resolution requests to the EIDR registry.
- Handle System: Provides distributed lookup and resolution services.
Relation to DOI and Handle System
An EIDR ID is a specialized example of a Digital Object Identifier, which in turn is built on top of the Handle System developed by the Corporation for National Research Initiatives. The EIDR-specific aspects of the lower layers are described in more detail below.
Digital Object Identifier (EIDR Aspects)
A Digital Object Identifier, standardized as ISO 26324, seeks to uniquely identify a wide range of digital artifacts including books, recordings, research data, and other digital content. The goal is for the IDs to be not just unique, but also persistent and immutable. As opposed to URLs, DOI identifiers stay the same even if the objects move to another location or become owned by another organization. Here are some of the characteristics of DOI:
- The International DOI Foundation enforces previously agreed rules on the constituent Registration Agencies to ensure continuity. In particular, if an RA ceases operation, the names it hosts will be taken over by another RA.
- The IDF defines rules to which all DOI names must adhere
- The DOI system provides a data model, based on a data dictionary, to enable a structured means of expressing metadata.
- The DOI system has its own highly redundant and distributed set of handle and proxy servers.
- All DOI prefixes are of the form "10.NNNN" where 10 is a directory indicator and "NNNN" is a registrant code in the range 1-65535
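The prefix rule in the last item can be checked with a few lines of Python. This sketch follows the range stated above and nothing more; the function name is hypothetical, and prefixes outside that stated form are simply rejected.

```python
def parse_doi_prefix(prefix: str) -> int:
    """Return the registrant code of a prefix of the form 10.NNNN.

    Follows the rule stated in the article: directory indicator "10"
    and a registrant code between 1 and 65535 inclusive.
    """
    directory, dot, code = prefix.partition(".")
    if directory != "10" or not dot:
        raise ValueError("a DOI prefix starts with the directory indicator 10.")
    if not (code.isascii() and code.isdigit()):
        raise ValueError("the registrant code must be a decimal number")
    value = int(code)
    if not 1 <= value <= 65535:
        raise ValueError("the registrant code must be in the range 1-65535")
    return value


if __name__ == "__main__":
    print(parse_doi_prefix("10.5240"))
```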
To foster interoperability between RAs, DOI has the concept of a metadata Kernel. This is a core set of metadata that all objects stored within the DOI framework should have. The full set may be found in the DOI handbook. Interoperability is a large topic extending beyond the scope of EIDR, but the following subset is particularly relevant to EIDR assets:
- referent: an object maintained in the DOI system.
- referentName: the name of the referent.
- primaryReferentType: for EIDR, this includes creation and party.
- structuralType: these are mutually exclusive categories that identify the form of an asset. Two particularly relevant to EIDR assets are abstraction and performance.
- principalAgent: for creations, the entity principally responsible for its existence.
- registrationAuthorityCode: denotes the agency that issued the DOI. This would be the EIDR RA for EIDR assets.
EIDR metadata is available in standard DOI kernel metadata format as well as EIDR-specific formats. The DOI for the DOI metadata schema is.
Handle System (EIDR Aspects)
DOI is in turn implemented on top of the Handle System, a distributed, highly scalable name resolution service. A handle is defined as:
<Handle> ::= <Handle Naming Authority> "/" <Handle Local Name>
The Naming Authority is globally unique and defines both an administrative space and the syntax of the Handle Local Name. For EIDR, "10.5240" is the Naming Authority in the definition above and is responsible for resolving the suffix. The range of allowable Naming Authorities is more general than is employed by DOI.
The distributed nature of the Handle System allows each local namespace to be hosted on multiple geographically distributed service sites. This is a federated model where each local name space has complete control over the placement and operation of its service sites. Furthermore, each service site may contain multiple resolution servers: requests directed to a particular service site will be dispatched evenly across its constituent servers.
The data model of the Handle System is simple but flexible. An arbitrary number of values may be associated with each handle. Over time, these values may be created, modified, and destroyed. Each such datum has the following attributes:
- index: an unsigned integer that distinguishes a data value from the others that may exist for this handle.
- type: a UTF-8 string identifying the type. The type system is extensible and common types are maintained as handles in the "0.TYPE" naming authority. There are no restrictions on the creation of new types, although using resolvable handles as type names is recommended best practice. Common types include URL for a single level of indirection, "10320/loc" for a set of context-based resolution alternatives, and various administrative types for Handle System management, all of which are based on handle resolution.
- data: the value itself, represented as a sequence of octets that are interpreted in the context of the associated type.
- permission: access rights to this particular value. Note that different data values of a handle may have different permissions.
- TTL: an integer that specifies how long a value may be cached.
- timestamp: an integer that records the last time the value was updated.
- reference: a list of references to other handle values. These are usually used to add credentials.
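The attributes above map naturally onto a small record type. The Python sketch below is only a model of the description in this article: the class, the field representations (for example, permissions as a plain string), the helper function, and the sample URL are assumptions, not the Handle System's wire format or any client library's API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HandleValue:
    index: int        # unsigned integer, unique within one handle
    type: str         # e.g. "URL" or "10320/loc"
    data: bytes       # octets interpreted according to `type`
    permission: str   # hypothetical textual form of the access rights
    ttl: int          # how long the value may be cached
    timestamp: int    # last time the value was updated
    # References to other handle values, modelled here as (handle, index) pairs.
    references: List[Tuple[str, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.index < 0:
            raise ValueError("index is an unsigned integer")


def values_of_type(values: List[HandleValue], type_name: str) -> List[HandleValue]:
    """Select the values of one type, ordered by index."""
    return sorted((v for v in values if v.type == type_name), key=lambda v: v.index)


if __name__ == "__main__":
    values = [
        HandleValue(2, "10320/loc", b"<locations/>", "public-read", 86400, 0),
        HandleValue(1, "URL", b"https://example.org/record", "public-read", 86400, 0),
    ]
    for v in values_of_type(values, "URL"):
        print(v.index, v.data.decode("utf-8"))
```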
Accessing the Handle System is done via a wire protocol defined in RFC 3652; EIDR applications don't have to be concerned with this because of the layering of protocols.