Page Analysis and Ground Truth Elements
Page Analysis and Ground Truth Elements (PAGE) is an XML-based standard for encoding digitised documents. Like ALTO, it describes the layout and structure of a page together with its contents.
PAGE XML can be used to describe:
- page content
- the evaluation of the layout analysis
- the cutting of the document image
It was designed to be used in conjunction with automatic segmentation and transcription techniques: PAGE aims to support each step of the processing chain for document image analysis.
The PAGE XML schema is notably used as an import and export format by automatic transcription software such as eScriptorium and Transkribus. It is also an export format of Kraken, a turnkey OCR system optimised for documents in historical and non-Latin scripts, and of the OCR software Tesseract.
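To make the page-content description more concrete, here is a minimal sketch in Python that reads text lines from a small PAGE XML document using only the standard library. The sample document is hand-written for illustration: the image file name, ids, coordinates and text are invented, and the helper names `parse_points` and `read_lines` are hypothetical. The sketch assumes the 2019-07-15 version of the page-content schema; files produced by a given tool may declare a different schema version, in which case the namespace URI must be adjusted.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2019-07-15 PAGE page-content schema (assumed here;
# other schema versions use a different date in the URI).
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hand-written sample document: all values below are invented for illustration.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Metadata>
    <Creator>example</Creator>
    <Created>2024-01-01T00:00:00</Created>
    <LastChange>2024-01-01T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="page_0001.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,300 100,300"/>
      <TextLine id="r1l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <Baseline points="100,190 900,190"/>
        <TextEquiv><Unicode>First line of text</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="100,200 900,200 900,300 100,300"/>
        <Baseline points="100,290 900,290"/>
        <TextEquiv><Unicode>Second line of text</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a PAGE 'points' string such as '1,2 3,4' into [(1, 2), (3, 4)]."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return one dict per TextLine: region id, line id, polygon and text."""
    root = ET.fromstring(xml_text)
    lines = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        for line in region.iterfind("pc:TextLine", NS):
            coords = line.find("pc:Coords", NS)
            # The transcription, when present, is stored in TextEquiv/Unicode.
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
            lines.append({
                "region": region.get("id"),
                "line": line.get("id"),
                "polygon": parse_points(coords.get("points")) if coords is not None else [],
                "text": text,
            })
    return lines


for entry in read_lines(SAMPLE):
    print(entry["region"], entry["line"], entry["text"])
```

In this sketch the `Coords` polygons carry the result of segmentation and the `TextEquiv` elements carry the transcription, so one file can hold the output of both steps.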