Analyzed Layout and Text Object
Analyzed Layout and Text Object (ALTO) is an open XML schema originally developed by the EU-funded METAe project. ALTO files describe the placement, size, and style of text in an image of a digitized document, as well as other elements of the document's layout, such as margins, headings, columns, and illustrations.
The text and placement information in ALTO files is usually generated by specialized optical character recognition software. ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS), which describes the larger digitized object and creates references across ALTO files, as might be necessary to describe a reading sequence.
From version 1.0 in June 2004 to 1.4 in 2007, ALTO was developed and maintained by . In August 2009, maintenance of the schema was transferred to the Library of Congress, and it has since been overseen by a separate editorial board created for that purpose.
Structure
An ALTO file consists of three major sections as children of the root element:
- <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.
- <Styles> section contains the text and paragraph styles with their individual descriptions:
- * <TextStyle> has font descriptions
- * <ParagraphStyle> has paragraph descriptions, e.g. alignment information
- <Layout> section contains the content information. It is subdivided into <Page> elements.
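The three-section structure can be illustrated with a small, hand-written ALTO document and a few lines of Python that read it. This is a minimal sketch rather than a schema-validated file: the namespace URI is assumed to be the one used by ALTO version 4, the sample values (IDs, coordinates, font) are invented for illustration, and the helper names `section_names` and `words_with_boxes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of ALTO version 4; other versions use a different URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"

# Illustrative sample only: values are invented and the file is not schema-validated.
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Styles>
    <TextStyle ID="TXT_0" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR_0" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" STYLEREFS="TXT_0 PAR_0">
          <TextLine>
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="world" HPOS="300" VPOS="200" WIDTH="190" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def section_names(root):
    """Return the local names of the root element's direct children."""
    # ElementTree stores tags as "{namespace}local"; keep only the local part.
    return [child.tag.split("}", 1)[-1] for child in root]


def words_with_boxes(root):
    """Collect (page ID, word, x, y, width, height) for every String element."""
    ns = {"a": ALTO_NS}
    words = []
    for page in root.findall("a:Layout/a:Page", ns):
        for s in page.iterfind(".//a:String", ns):
            words.append((
                page.get("ID"),
                s.get("CONTENT"),
                float(s.get("HPOS")),
                float(s.get("VPOS")),
                float(s.get("WIDTH")),
                float(s.get("HEIGHT")),
            ))
    return words


root = ET.fromstring(SAMPLE)
print(section_names(root))
for word in words_with_boxes(root):
    print(word)
```

Reading the coordinates from each word element is what lets a viewer highlight search hits on the page image; the same loop could be extended to follow style references back to the font and paragraph definitions.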