Document type definition


A document type definition is a specification file that contains a set of markup declarations that define a document type for an SGML-family markup language. The DTD specification file can be used to validate documents.
A DTD defines the valid building blocks of an XML document. It defines the document structure with a list of validated elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.
A namespace-aware version of DTDs is being developed as Part 9 of ISO DSDL. DTDs persist in applications that need special publishing characters, such as the XML and HTML Character Entity References, which derive from larger sets defined as part of the ISO SGML standard effort. XML uses a subset of SGML DTD.
Newer XML namespace-aware schema languages have largely superseded DTDs as a better way to validate XML structure.

Associating DTDs with documents

A DTD is associated with an XML or SGML document by means of a document type declaration (DOCTYPE). The DOCTYPE appears in the syntactic fragment doctypedecl near the start of an XML document. The declaration establishes that the document is an instance of the type defined by the referenced DTD.
DOCTYPEs make two sorts of declarations:
  • an optional external subset
  • an optional internal subset.
The declarations in the internal subset form part of the DOCTYPE in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier and/or a system identifier. Programs for reading documents may not be required to read the external subset.
Any valid SGML or XML document that references an external subset in its DTD, or whose body contains references to parsed external entities declared in its DTD, can be only partially parsed, and cannot be fully validated, by validating SGML or XML parsers in their standalone mode.
However, such documents are still fully parsable in the non-standalone mode of validating parsers, which signal an error if they cannot locate these external entities by their specified public identifier or system identifier, or if the entities are inaccessible. Non-validating parsers may optionally attempt to locate these external entities in non-standalone mode, but they do not validate the content model of these documents.

Examples

The following example of a DOCTYPE contains both public and system identifiers:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

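All HTML 4.01 documents conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows:
  • -//W3C//DTD HTML 4.01//EN
  • -//W3C//DTD HTML 4.01 Transitional//EN
  • -//W3C//DTD HTML 4.01 Frameset//EN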
The system identifiers of these DTDs, if present in the DOCTYPE, are URI references. A system identifier usually points to a specific set of declarations in a resolvable location. SGML allows mapping public identifiers to system identifiers in catalogs that are optionally available to the URI resolvers used by document parsing software.
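This DOCTYPE can only appear after the optional XML declaration, and before the document body, if the document syntax conforms to XML. This includes XHTML documents:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>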


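An additional internal subset can also be provided after the external subset:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!-- an internal subset can be embedded here -->
]>
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>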


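Alternatively, only the internal subset may be provided:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!-- an internal subset can be embedded here -->
]>
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>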


Finally, the document type declaration may include no subset at all; in that case, it only indicates the type name of the root element, that is, the single top-level element of the document:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>
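The forms of DOCTYPE described above can be told apart programmatically. The following is a minimal sketch using the expat bindings in the Python standard library; the helper name `describe_doctype` and the three sample documents are assumptions made for illustration. Expat is a non-validating parser, and as configured here it does not fetch the external subset; it only reports the identifiers it finds.

```python
from xml.parsers import expat

def describe_doctype(document: str) -> dict:
    """Report what the DOCTYPE of `document` declares (hypothetical helper)."""
    info = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat passes None for identifiers that are absent.
        info.update(
            root=name,
            public_id=public_id,
            system_id=system_id,
            internal_subset=bool(has_internal_subset),
        )

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return info

# External subset (public + system identifier) followed by an internal subset.
EXTERNAL_AND_INTERNAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" '
    "[ <!-- internal subset --> ]><html/>"
)
# Internal subset only.
INTERNAL_ONLY = "<!DOCTYPE html [ <!-- internal subset --> ]><html/>"
# No subset at all: only the root element type name is given.
NO_SUBSET = "<!DOCTYPE html><html/>"

if __name__ == "__main__":
    for sample in (EXTERNAL_AND_INTERNAL, INTERNAL_ONLY, NO_SUBSET):
        print(describe_doctype(sample))
```

In each case the reported root name is `html`; only the identifiers and the internal-subset flag differ.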

Markup declarations

DTDs describe the structure of a class of documents via element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element. Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, if not an explicit set of valid values.
DTD markup declarations declare which element types, attribute lists, entities, and notations are allowed in the structure of the corresponding class of XML documents.

Element type declarations

An element type declaration defines an element and its possible content. A valid XML document contains only elements that are defined in the DTD.
Various keywords and characters specify an element's content:
  • EMPTY specifies that the defined element allows no content, i.e., it cannot have any child elements, not even text elements;
  • ANY specifies that the defined element allows any content, without restriction, i.e., it may have any number and type of child elements;
  • an expression specifies the only elements allowed as direct children in the content of the defined element; this content can be either:
    • mixed content, which means that the content may include at least one text element and zero or more named elements, but their order and number of occurrences cannot be restricted; this can be:
      • ( #PCDATA ): historically meaning parsed character data, this means that only one text element is allowed in the content;
      • ( #PCDATA | element name | ... )*: a limited choice of two or more child elements may be used in any order and number of occurrences in the content.
    • element content, which means that there must be no text elements among the child elements of the content. Such element content is specified as a content particle in a variant of Backus–Naur form that has no terminal symbols and uses element names as non-terminal symbols. Element content consists of:
      • a content particle, which can be either the name of an element declared in the DTD, or a sequence list or choice list. It may be followed by an optional quantifier.
        • a sequence list is an ordered list of one or more content particles: all the content particles must appear successively as direct children in the content of the defined element, at the specified position and in the specified relative order;
        • a choice list is a mutually exclusive list of two or more content particles: only one of these content particles may appear in the content of the defined element at the same position.
      • a quantifier, which is a single character that immediately follows the item it applies to and restricts the number of successive occurrences of that item at the specified position in the content of the element; it may be either:
        • + for specifying that there must be one or more occurrences of the item (the effective content of each occurrence may be different);
        • * for specifying that any number of occurrences is allowed (the item is optional and the effective content of each occurrence may be different);
        • ? for specifying that there must not be more than one occurrence (the item is optional);
        • if there is no quantifier, the specified item must occur exactly once at the specified position in the content of the element.
For example:

<!ELEMENT html (head, body)>
<!ELEMENT p (#PCDATA | p | ul | dl | table | h1|h2|h3)*>

Element type declarations are ignored by non-validating SGML and XML parsers, but these declarations are still checked for form and validity.
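The sequence lists, choice lists and quantifiers described above map closely onto regular expressions. The sketch below is an illustration only, not how any particular parser is implemented: it translates an element-content model into a Python regular expression over the names of an element's direct children. The function names `content_model_to_regex` and `children_match` are hypothetical, names are assumed to be simple (no colons), and mixed content with #PCDATA is not handled.

```python
import re

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model such as "(head, body?)" into a regex.

    Each child element is represented as its name followed by one space.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[()|,+*?]", model):
        if token == "(":
            parts.append("(?:")           # group for a sequence or choice list
        elif token == ",":
            continue                      # sequence list: plain concatenation
        elif token in ")|+*?":
            parts.append(token)           # same meaning in DTDs and in regexes
        else:
            parts.append("(?:" + re.escape(token) + " )")  # an element name
    return re.compile("".join(parts))

def children_match(model: str, children: list) -> bool:
    """Return True if the ordered child names satisfy the content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

if __name__ == "__main__":
    model = "(title, (para | list)+, footer?)"
    print(children_match(model, ["title", "para", "list"]))  # True
    print(children_match(model, ["title"]))                  # False
```

The second call fails because the `+` quantifier requires at least one `para` or `list` child after `title`.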

Attribute list declarations

An attribute list specifies, for a given element type, the list of all possible attributes associated with that type. For each possible attribute, it contains:
  • the declared name of the attribute,
  • its data type,
  • and its default value.
For example:

<!ATTLIST img
src CDATA #REQUIRED
id ID #IMPLIED
sort CDATA #FIXED "true"
print (yes | no) "yes"
>

Here are some attribute types supported by both SGML and XML:
  • CDATA: this type means character data and indicates that the effective value of the attribute can be any textual value, unless the attribute is specified as fixed;
  • ID: the effective value of the attribute must be a valid identifier, and it is used to define, and anchor to the current element, the target of references that use this identifier; it is an error if distinct elements in the same document define the same identifier; the uniqueness constraint also implies that the identifier itself carries no other semantics and that identifiers must be treated as opaque in applications; XML also predefines the standard pseudo-attribute "xml:id" with this type, without needing any declaration in the DTD, so the uniqueness constraint also applies to these identifiers when they are specified anywhere in an XML document;
  • IDREF or IDREFS: the effective value of the attribute can only be a valid identifier (or, for IDREFS, a list of such identifiers) and must reference the unique element defined in the document with an attribute declared with the type ID in the DTD and whose effective value is the same identifier;
  • NMTOKEN or NMTOKENS: the effective value of the attribute can only be a valid name token (or, for NMTOKENS, a list of such name tokens), but it is not restricted to a unique identifier within the document; this name may carry supplementary and application-dependent semantics and may require additional naming constraints, but this is out of the scope of the DTD;
  • ENTITY or ENTITIES: the effective value of the attribute can only be the name of an unparsed external entity (or, for ENTITIES, a list of such names), which must also be declared in the document type declaration; this type is not supported in HTML parsers, but is valid in SGML and XML 1.0 or 1.1;
  • (...): the effective value of the attribute can only be one of the enumerated list of textual values, where each value in the enumeration may be specified between 'single' or "double" quotation marks if it is not a simple name token;
  • NOTATION (...): the effective value of the attribute can only be one of the enumerated list of notation names, where each notation name in the enumeration must also be declared in the document type declaration; this type is not supported in HTML parsers, but is valid in SGML and XML 1.0 or 1.1.
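The uniqueness constraint on ID values and the requirement that every IDREF points to an existing ID can be sketched as a small check. This is an illustration under stated assumptions, not a validator: the function name `check_ids` is hypothetical, and it is simply assumed that a DTD declares an attribute named `id` with type ID and an attribute named `ref` with type IDREF for all elements.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attr="id", idref_attr="ref"):
    """Return a list of ID/IDREF constraint violations (hypothetical helper)."""
    seen = set()
    errors = []
    # First pass: every ID value must be unique within the document.
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                errors.append(f"duplicate ID {value!r}")
            seen.add(value)
    # Second pass: every IDREF must match some ID defined in the document.
    for element in root.iter():
        target = element.get(idref_attr)
        if target is not None and target not in seen:
            errors.append(f"IDREF {target!r} has no matching ID")
    return errors

SAMPLE = ET.fromstring(
    '<doc><item id="a"/><item id="b"/>'
    '<link ref="a"/><link ref="c"/><item id="a"/></doc>'
)

if __name__ == "__main__":
    for problem in check_ids(SAMPLE):
        print(problem)
```

Two passes are used so that a reference may legitimately appear before the element that defines its target.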
A default value can define whether an attribute must occur or not, or whether it has a fixed value, or which value should be used as a default value in case the given attribute is left out in an XML tag.
Attribute list declarations are ignored by non-validating SGML and XML parsers, but these declarations are still checked for well-formedness and validity.
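The effect of default and fixed values can be observed with a non-validating parser that reads the internal subset. The sketch below uses the expat bindings in the Python standard library; the `img` element, the sample document, and the helper name `parse_attributes` are assumptions made for illustration. Because expat does not validate, it would not report an error if the required `src` attribute were missing.

```python
from xml.parsers import expat

DOCUMENT = """<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img
     src    CDATA       #REQUIRED
     id     ID          #IMPLIED
     sort   CDATA       #FIXED "true"
     print  (yes | no)  "yes"
  >
]>
<img src="photo.png"/>
"""

def parse_attributes(document: str):
    """Return (declared, reported): declarations seen and attributes delivered."""
    declared = []   # (attribute name, default value or None), in declaration order
    reported = {}   # attributes passed to the application for the root element

    def on_attlist(element, attribute, attr_type, default, required):
        declared.append((attribute, default))

    def on_start(name, attrs):
        reported.update(attrs)

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return declared, reported

if __name__ == "__main__":
    declared, reported = parse_attributes(DOCUMENT)
    print(declared)
    print(reported)
```

Although the tag only specifies `src`, the reported attributes also contain `sort` and `print`, filled in from the declarations; `id` is absent because #IMPLIED supplies no default.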