Standard Generalized Markup Language
The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":
- Declarative: Markup should describe a document's structure and other attributes rather than specify the processing that needs to be performed, because it is less likely to conflict with future developments.
- Rigorous: In order to allow markup to take advantage of the techniques available for processing, markup should rigorously define objects like programs and databases.
Standard versions
SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and office systems – Standard Generalized Markup Language (SGML)", of which there are three versions:
- Original SGML, which was accepted in October 1986, followed by a minor Technical Corrigendum.
- SGML (ENR), in 1996, resulted from a Technical Corrigendum to add extended naming rules allowing arbitrary-language and -script markup.
- SGML (ENR+WWW or WebSGML), in 1998, resulted from a Technical Corrigendum to better support XML and WWW requirements.
SGML is one of a group of related ISO standards for electronic documents:
- SGML – Generalized markup language
  - SGML was reworked in 1998 into XML, a successful profile of SGML. Full SGML is rarely found or used in new projects.
- DSSSL – Document processing and styling language based on Scheme.
  - DSSSL was reworked into W3C XSLT and XSL-FO, which use an XML syntax. Nowadays, DSSSL is rarely used in new projects apart from Linux documentation.
- HyTime – Generalized hypertext and scheduling.
  - HyTime was partially reworked into W3C XLink. HyTime is rarely used in new projects.
- ISO/IEC TR 9573 – Information processing – SGML support facilities – Techniques for using SGML
  - Part 13: Public entity sets for mathematics and science
    - In 2007, the W3C MathML working group agreed to assume the maintenance of these entity sets.
Document validity
SGML defines two kinds of validity. According to the revised Terms and Definitions of ISO 8879:
A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references.
A type-valid SGML document is defined by the standard as:
An SGML document in which, for each document instance, there is an associated document type declaration to whose DTD that instance conforms.
A tag-valid SGML document is defined by the standard as:
An SGML document, all of whose document instances are fully tagged. There need not be a document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it.
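The relationship between the two kinds of validity can be sketched in a few lines of Python. This is only an illustration of the definitions quoted above, not a validator: the `Instance` record and the three function names are hypothetical, and whether an instance is fully tagged or conforms to its DTD is assumed to have been determined elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical summary of one document instance."""
    fully_tagged: bool               # no tags omitted or otherwise minimized
    conforms_to_dtd: Optional[bool]  # None: no document type declaration


def is_type_valid(instances: List[Instance]) -> bool:
    # Every instance has an associated DTD and conforms to it.
    return all(i.conforms_to_dtd is True for i in instances)


def is_tag_valid(instances: List[Instance]) -> bool:
    # Every instance is fully tagged; a DTD is not required.
    return all(i.fully_tagged for i in instances)


def meets_validity_requirement(instances: List[Instance]) -> bool:
    # A conforming document must be type-valid, tag-valid, or both.
    return is_type_valid(instances) or is_tag_valid(instances)


# An XML-like document: fully tagged, no DOCTYPE declaration.
xml_like = [Instance(fully_tagged=True, conforms_to_dtd=None)]
# A minimized document that relies on its DTD to be parsed.
minimized = [Instance(fully_tagged=False, conforms_to_dtd=True)]

print(is_tag_valid(xml_like), is_type_valid(xml_like))    # True False
print(is_tag_valid(minimized), is_type_valid(minimized))  # False True
```

Both sample documents satisfy the requirement, each by a different route.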
Terminology
Tag-validity was introduced in the 1998 revision of SGML to support XML, which allows documents that have no DOCTYPE declaration but can be parsed without a grammar, as well as documents whose DOCTYPE declaration makes no XML Infoset contributions to the document. The standard calls this fully tagged. Integrally stored reflects the XML requirement that elements end in the same entity in which they started. Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup. Commentary on SGML validity, especially commentary written before 1997 or without awareness of the 1998 revision, covers type-validity only.
The SGML emphasis on validity supports the requirement for generalized markup that "markup should be rigorous."
Syntax
An SGML document may have three parts:
- the SGML Declaration,
- the Prologue, containing a DOCTYPE declaration with the various markup declarations that together make a Document Type Definition, and
- the instance itself, containing one top-most element and its contents.
Although full SGML allows implicit markup and some other kinds of tags, the XML specification states:
For introductory information on a basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax.
Optional features
SGML generalizes and supports a wide range of markup languages as found in the mid-1980s. These ranged from terse Wiki-like syntaxes to RTF-like bracketed languages to HTML-like matching-tag languages. SGML did this with a relatively simple default reference concrete syntax, augmented by a large number of optional features that could be enabled in the SGML Declaration. Not every SGML parser can necessarily process every SGML document. Because each processor's System Declaration can be compared to the document's SGML Declaration, it is always possible to know whether a document is supported by a particular processor.
Many SGML features relate to markup minimization. Other features relate to concurrent markup, to linking processing attributes, and to embedding SGML documents within SGML documents.
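The comparison between a processor's System Declaration and a document's SGML Declaration described above can be pictured as a set comparison. The following Python sketch makes a strong simplification: each declaration is reduced to a plain set of feature names, whereas real declarations also describe character sets, capacities, and the concrete syntax. The function names and the sample feature sets are hypothetical.

```python
from typing import Iterable, Set


def unsupported_features(document_features: Iterable[str],
                         processor_features: Iterable[str]) -> Set[str]:
    """Features the document enables that the processor does not support."""
    return set(document_features) - set(processor_features)


def can_process(document_features: Iterable[str],
                processor_features: Iterable[str]) -> bool:
    """True when the processor supports every feature the document enables."""
    return not unsupported_features(document_features, processor_features)


# Hypothetical feature sets, named after SGML optional features.
document = {"OMITTAG", "SHORTTAG"}
full_processor = {"OMITTAG", "SHORTTAG", "CONCUR", "SUBDOC"}
minimal_processor = {"OMITTAG"}

print(can_process(document, full_processor))                    # True
print(sorted(unsupported_features(document, minimal_processor)))  # ['SHORTTAG']
```

Because the check only needs the two declarations, it can be made before any of the document instance is parsed.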
The notion of customizable features was not appropriate for Web use, so one goal of XML was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.
Concrete and abstract syntaxes
The usual SGML concrete syntax resembles this example, which is the default HTML concrete syntax:
<QUOTE TYPE="example">
  typically something like <ITALICS>this</ITALICS>
</QUOTE>
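The delimiters in such an example belong to the concrete syntax, not to the document structure, as the following paragraph explains. A minimal Python sketch of that idea is given here. It is a toy tokenizer, not an SGML parser: the `Delimiters` record, the `tokenize` function, and the restricted name pattern are assumptions made for illustration, attributes are not handled, and a tag is treated as an end tag only when it matches the currently open element. Names are folded to upper case to mimic the case-insensitivity of the reference syntax.

```python
import re
from typing import List, NamedTuple, Tuple


class Delimiters(NamedTuple):
    stago: str  # start-tag open
    etago: str  # end-tag open
    tagc: str   # tag close


REFERENCE = Delimiters(stago="<", etago="</", tagc=">")
GML_LIKE = Delimiters(stago=":", etago=":e", tagc=".")

NAME = r"([A-Za-z][A-Za-z0-9]*)"  # simplified name pattern (assumption)


def tokenize(text: str, delims: Delimiters) -> List[Tuple[str, str]]:
    """Split text into ("start", NAME), ("text", data), ("end", NAME) events."""
    start_re = re.compile(re.escape(delims.stago) + NAME + re.escape(delims.tagc))
    end_re = re.compile(re.escape(delims.etago) + NAME + re.escape(delims.tagc))
    events: List[Tuple[str, str]] = []
    open_elements: List[str] = []
    buffer: List[str] = []

    def flush() -> None:
        # Emit any pending character data as a single text event.
        if buffer:
            events.append(("text", "".join(buffer)))
            buffer.clear()

    pos = 0
    while pos < len(text):
        end = end_re.match(text, pos)
        # Accept an end tag only if it closes the innermost open element.
        if end and open_elements and open_elements[-1] == end.group(1).upper():
            flush()
            events.append(("end", open_elements.pop()))
            pos = end.end()
            continue
        start = start_re.match(text, pos)
        if start:
            flush()
            name = start.group(1).upper()
            open_elements.append(name)
            events.append(("start", name))
            pos = start.end()
            continue
        buffer.append(text[pos])
        pos += 1
    flush()
    return events


gml_events = tokenize(":xmp.Hello, world:exmp.", GML_LIKE)
ref_events = tokenize("<xmp>Hello, world</XMP>", REFERENCE)
print(gml_events == ref_events)  # True
```

Both inputs yield the same sequence of events, which is the sense in which one abstract syntax can sit behind several concrete syntaxes.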
SGML provides an abstract syntax that can be implemented in many different types of concrete syntax. Although the markup norm is using angle brackets as start- and end-tag delimiters in an SGML document, it is possible to use other characters, provided a suitable concrete syntax is defined in the document's SGML Declaration. For example, an SGML interpreter might be programmed to parse GML, in which :xmp.Hello, world:exmp. is a complete element: tags are delimited with a left colon and a right full stop, and an :e prefix denotes an end tag. According to the reference syntax, letter case is not distinguished in tag names, so the three tags <quote>, <QUOTE>, and <quOtE> are equivalent.
Markup minimization
SGML has features for reducing the number of characters required to mark up a document; these must be enabled in the SGML Declaration. SGML processors need not support every available feature, thus allowing applications to tolerate many types of inadvertent markup omissions; however, SGML systems usually are intolerant of invalid structures. XML is intolerant of syntax omissions, and does not require a DTD for checking well-formedness.
OMITTAG
Both start tags and end tags may be omitted from a document instance, provided:
- the OMITTAG feature is enabled in the SGML Declaration,
- the DTD indicates that the tags are permitted to be omitted,
- the element has no associated required attributes, and
- the tag can be unambiguously inferred by context.
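For example, if OMITTAG YES is specified in the SGML Declaration (enabling the OMITTAG feature), and the DTD includes declarations such as:
<!ELEMENT chapter - - (title, section+)>
<!ELEMENT title o o (#PCDATA)>
<!ELEMENT section - - (title, subsection+)>
then this excerpt:
<chapter>Introduction to SGML
<section>The SGML Declaration
<subsection>
...
which omits two <title> start tags and two </title> end tags, would represent valid markup.
Omitting tags is optional; the same excerpt could be tagged like this:
<chapter><title>Introduction to SGML</title>
<section><title>The SGML Declaration</title>
<subsection>
...
and would still represent valid markup.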
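How a parser can infer omitted tags from context is sketched below in Python. The sketch does not implement SGML content models. It assumes a hypothetical DTD in which every chapter and section must begin with a title, the title contains only character data, and both title tags are declared omissible; the three lookup tables and the function name `infer_omitted_tags` are invented for this illustration, and the input is assumed to be an already tokenized, well-nested event list.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # ("start" | "end" | "text", element name or character data)

# Assumptions standing in for a DTD (hypothetical element types).
REQUIRED_FIRST_CHILD = {"chapter": "title", "section": "title"}
OMISSIBLE = {"title"}    # both start and end tag may be omitted
PCDATA_ONLY = {"title"}  # may contain character data only


def infer_omitted_tags(events: List[Event]) -> List[Event]:
    """Return the event list with implied title tags made explicit."""
    out: List[Event] = []
    stack: List[str] = []
    for kind, value in events:
        if kind == "text":
            parent = stack[-1] if stack else None
            implied = REQUIRED_FIRST_CHILD.get(parent)
            # Character data directly inside a chapter or section can only
            # belong to the required title, so its start tag is implied.
            if implied in OMISSIBLE:
                out.append(("start", implied))
                stack.append(implied)
            out.append((kind, value))
        elif kind == "start":
            # A start tag cannot occur inside a character-data-only element,
            # so that element's end tag is implied.
            if stack and stack[-1] in PCDATA_ONLY and stack[-1] in OMISSIBLE:
                out.append(("end", stack.pop()))
            stack.append(value)
            out.append((kind, value))
        else:  # "end"
            if stack and stack[-1] != value and stack[-1] in OMISSIBLE:
                out.append(("end", stack.pop()))
            if stack:
                stack.pop()
            out.append((kind, value))
    return out


minimized = [
    ("start", "chapter"), ("text", "Introduction to SGML"),
    ("start", "section"), ("text", "The SGML Declaration"),
    ("start", "subsection"),
]
fully_tagged = infer_omitted_tags(minimized)
inferred = [e for e in fully_tagged if e not in minimized]
print(inferred.count(("start", "title")), inferred.count(("end", "title")))  # 2 2
```

Running the function on input that is already fully tagged leaves it unchanged, which mirrors the point that omitting tags is optional.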
Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD, for example:
<!ELEMENT image - o EMPTY>
Elements defined like this have no end tag, and specifying one in the document instance would result in invalid markup. In this regard they differ syntactically from XML empty elements.