Data Format Description Language
Data Format Description Language (DFDL) is a modeling language for describing general text and binary data in a standard way. It was published as an Open Grid Forum Recommendation in February 2021 and as an ISO standard in April 2024.
A DFDL model or schema allows any text or binary data to be read from its native format and presented as an instance of an information set. The same DFDL schema also allows data to be taken from an instance of an information set and written out in its native format.
DFDL is descriptive and not prescriptive. DFDL is not a data format, nor does it impose the use of any particular data format. Instead it provides a standard way of describing many different kinds of data formats. This approach has several advantages. It allows an application author to design an appropriate data representation according to their requirements while describing it in a standard way which can be shared, enabling multiple programs to directly interchange the data.
DFDL achieves this by building upon the facilities of W3C XML Schema 1.0. A subset of XML Schema is used, enough to enable the modeling of non-XML data. The motivations for this approach are to avoid inventing a completely new schema language, and to make it easy to convert general text and binary data, via a DFDL information set, into a corresponding XML document.
Educational material is available in the form of DFDL Tutorials, videos and several hands-on DFDL labs.
History
DFDL was created in response to a need for grid APIs to be able to understand data regardless of source. A language was needed that was capable of modeling a wide variety of existing text and binary data formats. A working group was established at the Global Grid Forum in 2003 to create a specification for such a language. A decision was made early on to base the language on a subset of W3C XML Schema, using annotations to carry the additional information needed to describe non-XML physical representations.
Work continued on the language, resulting in the publication of a DFDL 1.0 specification as OGF Proposed Recommendation GFD.174 in January 2011.
The official OGF Recommendation was published in February 2021. It obsoletes all prior versions and incorporates all issues noted to date. A summary of DFDL and its features is available from the OGF. Issues with the specification are tracked on GitHub.
In April 2024, DFDL was published as an ISO standard by way of the Publicly Available Specification (PAS) process. The standard is available from ISO and remains publicly available from the Open Grid Forum as well.
Implementations
Implementations of DFDL processors that can parse and serialize data using DFDL schemas are available.
- IBM has multiple DFDL implementations.
  - A production-ready DFDL 1.0 streaming parser, modeler and visual tester, available in several IBM products including IBM App Connect Enterprise.
  - A DFDL implementation that is part of the IBM Mainframe z/Transaction Processing Facility.
- Apache Daffodil is an open-source DFDL processor with both a parser and an unparser, an IDE that is an extension of VSCode, and integrations into Apache NiFi and other tools. It continues to be under active development.
- A European Space Agency project includes a parser, DFDL4S, that implements a subset of the DFDL 1.0 specification.
Example
Take as an example a text data stream that gives the name, age and location of a person.
The logical model for this data can be described by a fragment of an XML Schema document. The order, names, types and cardinality of the fields are expressed by the XML schema model.
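As a minimal sketch of such a logical model, assume a record such as `Alice,30,Paris` (sample values invented for illustration). The type name `person_type`, the element names and the choice of `xs:short` for the age are likewise assumptions made here, not taken from the specification:

```xml
<!-- Hypothetical logical model for a person record:
     three fields, in this order, with no statement yet about physical layout -->
<xs:complexType name="person_type"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="age" type="xs:short"/>
    <xs:element name="location" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```

Each element occurs exactly once because neither `minOccurs` nor `maxOccurs` is given and both default to 1.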
To additionally model the physical representation of the data stream, DFDL augments the XML schema fragment with annotations on the xs:element and xs:sequence objects.
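A minimal sketch of such an annotated fragment, assuming a hypothetical `person_type` record with `name`, `age` and `location` fields (names and types chosen for illustration). The property set shown is deliberately incomplete: a working DFDL schema needs many more properties, which are usually supplied once through a shared format definition rather than repeated on every element.

```xml
<xs:complexType name="person_type"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Children of the sequence are separated by commas placed between them -->
        <dfdl:sequence encoding="ASCII" sequenceKind="ordered"
                       separator="," separatorPosition="infix"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:element name="name" type="xs:string">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Variable-length text; its end is found by scanning for the separator -->
          <dfdl:element representation="text" encoding="ASCII"
                        lengthKind="delimited"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
    <xs:element name="age" type="xs:short">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- A number written as base-10 text digits -->
          <dfdl:element representation="text" encoding="ASCII"
                        lengthKind="delimited"
                        textNumberRep="standard"
                        textNumberPattern="##0"
                        textNumberBase="10"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
    <xs:element name="location" type="xs:string">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:element representation="text" encoding="ASCII"
                        lengthKind="delimited"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:complexType>
```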
The property attributes on these DFDL annotations express that the data are represented in an ASCII text format, with fields of variable length that are delimited by commas.
An alternative, more compact syntax is also provided, where DFDL properties are carried as non-native attributes on the XML Schema objects themselves.
dfdl:textNumberRep="standard" dfdl:textNumberPattern="##0" dfdl:textNumberBase="10"/>
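For illustration, a complete element in this compact form might look like the sketch below. The element name `age`, its type and the first three properties are assumptions made here; only the three number properties are taken from the fragment above.

```xml
<!-- Hypothetical element: dfdl:-prefixed attributes take the place of
     the xs:annotation / xs:appinfo wrapper used in the long form -->
<xs:element name="age" type="xs:short"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            dfdl:representation="text"
            dfdl:encoding="ASCII"
            dfdl:lengthKind="delimited"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="##0"
            dfdl:textNumberBase="10"/>
```

The two syntaxes carry the same properties; the attribute form keeps short schemas readable.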
Features
The goal of DFDL is to provide a rich modeling language capable of representing any text or binary data format. The 1.0 release is a major step towards this goal. The capability includes support for:
- Text data types such as strings, numbers, zoned decimals, calendars and Booleans
- Binary data types such as two's complement integers, BCD, packed decimals, floats, calendars and Booleans
- Fixed length data and data delimited by text or binary markup
- Language data structures found in languages like COBOL, C and PL/1
- Industry standards such as CSV, SWIFT, FIX, HL7, X12, HIPAA, EDIFACT, ISO 8583
- Any encoding and endian-ness
- Bit data of arbitrary length
- Pattern languages for text numbers and calendars
- Ordered, unordered and floating content
- Default values on parsing and serializing
- Nil values capability for handling out-of-band data
- Fixed and variable arrays
- XPath 2.0 expression language including variables to model dynamic data
- Speculative parsing and other mechanisms to resolve choices and optionality
- Validation to XML Schema 1.0 rules
- A scoping mechanism that allows common property values to be applied at multiple annotation points
- Hiding elements in the data from the information set
- Calculating element values for the information set
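The scoping mechanism listed above can be sketched as follows: a `dfdl:format` annotation at the top level of a schema supplies default property values for the components in that schema, and an individual component can override a value locally. The element names and the particular properties chosen are assumptions for illustration, and the property set is not complete enough for a working schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Schema-wide defaults, stated once instead of on every element -->
      <dfdl:format representation="text" encoding="ASCII"
                   lengthKind="delimited"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
        <!-- Picks up representation, encoding and lengthKind from the defaults -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:short"/>
        <!-- Local override: this one field uses a different encoding -->
        <xs:element name="location" type="xs:string"
                    dfdl:encoding="UTF-8"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Only the overriding property is restated on `location`; its other properties still come from the schema-level defaults.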