Augmented Backus–Naur form
In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF) but with its own syntax and derivation rules. The motivation for ABNF is to provide an easily usable tool for defining the format of communications protocol payload objects and protocol units. It is defined by an Internet Standard and often serves as the definition language for IETF communication protocols.
The current specification supersedes earlier versions of the standard. RFC 7405 updates it, adding a syntax for specifying case-sensitive string literals.
Overview
An ABNF specification is a set of derivation rules, written as
rule = definition ; comment CR LF
where rule is a case-insensitive nonterminal, the definition consists of sequences of symbols that define the rule, the comment provides documentation, and the line ends with a carriage return and line feed.
Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.
Angle brackets (< and >) are not required around rule names. However, they may be used to delimit a rule name in prose so that it can be told apart from the surrounding text.
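The naming rules above can be sketched in code. The following Python snippet is a minimal illustration rather than part of any ABNF library: the names RULENAME and normalize_rulename are hypothetical, and lower-casing is only one possible way to make lookups case-insensitive.

```python
import re

# Shape of a rule name as described above: a letter followed by
# letters, digits, and hyphens (ASCII only).
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def normalize_rulename(name):
    """Return a canonical lookup key for a rule name.

    Optional angle brackets are stripped and the name is lower-cased,
    because rule names are case-insensitive.
    """
    if name.startswith("<") and name.endswith(">"):
        name = name[1:-1]
    if not RULENAME.fullmatch(name):
        raise ValueError(f"not a valid rule name: {name!r}")
    return name.lower()


# Differently cased spellings map to the same key.
print(normalize_rulename("RuleName") == normalize_rulename("<RULENAME>"))
```

A parser built on this sketch would store every rule under its normalized key, so that references written in any letter case resolve to the same definition.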
Terminal values
Terminal symbols are the basic building blocks of ABNF; the nonterminal symbols are built on top of them. They specify a sequence of characters that is matched literally.
Terminal values may be specified numerically, in which case the value corresponds to a character code or a sequence of codes. Such a value is written as a percent sign %, followed by the base (b for binary, d for decimal, x for hexadecimal), followed by the value or a dot-separated concatenation of values. For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.
Literal text is specified as a string enclosed in quotation marks. These strings are case-insensitive, and the character set used is ASCII. Therefore, the string "abc" will match "abc", "Abc", "aBc", "abC", "ABc", "AbC", "aBC", and "ABC". RFC 7405 added a syntax for case-sensitive strings: %s"aBc" will only match "aBc". Prior to that, a case-sensitive string could only be specified by listing the individual characters: to match "aBc", the definition would be %d97.66.99. A string can also be explicitly specified as case-insensitive with a %i prefix.
A further type of building-block value is the "prose-val", a string enclosed in angle brackets that describes in natural-language prose what a rule is meant to match. It is to be used only as a last resort, because an ABNF grammar that uses such a construct cannot be implemented automatically by a computer program.
Operators
White space
White space is used to separate elements of a definition; for space to be recognized as a delimiter, it must be explicitly included. The explicit reference for a single whitespace character is WSP, and LWSP is for zero or more whitespace characters with newlines permitted.
Definitions are left-aligned. When multiple lines are required, continuation lines are indented by whitespace.
Comment
; comment
A semicolon starts a comment that continues to the end of the line.
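The layout conventions for white space and comments can be sketched as a small preprocessing step. The Python code below is an illustration under simplifying assumptions: rules are assumed to start in the first column, a semicolon inside a quoted string is not treated as a comment, and prose values in angle brackets are not handled. The names strip_comment and logical_rules are hypothetical.

```python
def strip_comment(line):
    """Remove a trailing ABNF comment, ignoring semicolons inside quotes."""
    in_quotes = False
    for index, char in enumerate(line):
        if char == '"':
            in_quotes = not in_quotes
        elif char == ";" and not in_quotes:
            return line[:index].rstrip()
    return line.rstrip()


def logical_rules(text):
    """Join indented continuation lines onto the rule they belong to."""
    rules = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line.strip():
            continue  # blank or comment-only line
        if raw[0] in " \t" and rules:
            rules[-1] += " " + line.strip()  # continuation line
        else:
            rules.append(line.strip())
    return rules


grammar = 'mumble = fu bar ; first part\n         fu ; continued\nfu = %x61 ; a\n'
print(logical_rules(grammar))
```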
Concatenation
Rule1 Rule2
A rule may be defined by listing a sequence of rule names.
To match the string "aba", the following rules could be used:
fu = %x61 ; a
bar = %x62 ; b
mumble = fu bar fu
Alternative
Rule1 / Rule2
A rule may be defined by a list of alternative rules separated by a solidus (/).
To accept the rule fu or the rule bar, the following rule could be constructed:
fubar = fu / bar
Incremental alternatives
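Rule1 =/ Rule2
Additional alternatives may be added to a rule through the use of =/ between the rule name and the definition.
The rule
ruleset = alt1 / alt2
ruleset =/ alt3
ruleset =/ alt4 / alt5
is therefore equivalent to
ruleset = alt1 / alt2 / alt3 / alt4 / alt5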
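One way to picture incremental alternatives is as appending to a list of alternatives that already exists under the rule's name. The following Python sketch assumes a grammar stored as a dictionary from lower-cased rule names to lists of alternatives; the function add_rule and the example rule names are hypothetical.

```python
def add_rule(rules, name, operator, alternatives):
    """Record a definition, honouring "=" and "=/".

    rules maps lower-cased rule names to lists of alternatives.
    """
    key = name.lower()  # rule names are case-insensitive
    if operator == "=":
        rules[key] = list(alternatives)
    elif operator == "=/":
        if key not in rules:
            raise KeyError(f"cannot extend undefined rule: {name}")
        rules[key].extend(alternatives)
    else:
        raise ValueError(f"unknown operator: {operator!r}")


rules = {}
add_rule(rules, "ruleset", "=", ["alt1", "alt2"])
add_rule(rules, "ruleset", "=/", ["alt3"])
add_rule(rules, "RuleSet", "=/", ["alt4", "alt5"])
print(rules["ruleset"])
```

After the three calls, the stored list holds all five alternatives, which is the same result as writing them in a single definition.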
Value range
%c##-##
A range of numeric values may be specified through the use of a hyphen (-).
The rule
OCTAL = %x30-37
is equivalent to
OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
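The equivalence between a value range and a list of alternatives can be checked mechanically. The Python sketch below expands a range into the individual characters it covers; expand_range is a hypothetical helper, and it accepts only the single-range form with one hyphen.

```python
def expand_range(text):
    """Expand a value range such as %x30-37 into its characters."""
    bases = {"b": 2, "d": 10, "x": 16}
    if len(text) < 3 or text[0] != "%" or text[1].lower() not in bases:
        raise ValueError(f"not a numeric terminal: {text!r}")
    low_text, hyphen, high_text = text[2:].partition("-")
    if not hyphen:
        raise ValueError(f"not a value range: {text!r}")
    base = bases[text[1].lower()]
    low, high = int(low_text, base), int(high_text, base)
    # Both end points of the range are included.
    return [chr(code) for code in range(low, high + 1)]


print(expand_range("%x30-37"))
```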
Sequence group
(Rule1 Rule2)
Elements may be placed in parentheses to group rules in a definition.
To match "a b d" or "a c d", the following rule could be constructed:
group = a (b / c) d
To match "a b" or "c d", the following rules could be constructed:
group = a b / c d
group = (a b) / (c d)
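Concatenation, alternatives, and grouping can be modelled with a few small functions. The following Python sketch is one possible encoding, not a standard implementation: each parser returns the set of positions at which a match may end, the names literal, concat, alt, and matches are hypothetical, and the rules a, b, c, and d are assumed to be single-letter literals, so the matched strings contain no spaces.

```python
def literal(value):
    """Match a case-insensitive string, as a quoted ABNF literal does."""
    def parse(text, pos):
        end = pos + len(value)
        return {end} if text[pos:end].lower() == value.lower() else set()
    return parse


def concat(*parsers):
    """Match each parser in order (concatenation: Rule1 Rule2)."""
    def parse(text, pos):
        positions = {pos}
        for parser in parsers:
            positions = {end for start in positions
                         for end in parser(text, start)}
        return positions
    return parse


def alt(*parsers):
    """Match any one of the parsers (alternative: Rule1 / Rule2)."""
    def parse(text, pos):
        return set().union(*(parser(text, pos) for parser in parsers))
    return parse


def matches(parser, text):
    """Return True if the parser can consume the whole text."""
    return len(text) in parser(text, 0)


a, b, c, d = (literal(char) for char in "abcd")
grouped = concat(a, alt(b, c), d)        # a (b / c) d
pairs = alt(concat(a, b), concat(c, d))  # a b / c d, i.e. (a b) / (c d)
print(matches(grouped, "acd"), matches(pairs, "cd"), matches(pairs, "ad"))
```

The explicit nesting of concat inside alt in the second rule mirrors the fact that concatenation binds more tightly than the alternative operator.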
Variable repetition
n*nRule
To indicate repetition of an element, the form <a>*<b>element is used. The optional <a> gives the minimal number of elements to be included (the default is 0). The optional <b> gives the maximal number of elements to be included (the default is infinity).
Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements, cf. regular expressions e*, e?, e+ and e{2,3}.
Specific repetition
nRule
To indicate an explicit number of elements, the form <a>element is used and is equivalent to <a>*<a>element.
Use 2DIGIT to get two numeric digits, and 3DIGIT to get three numeric digits.
Optional sequence
[Rule]
To indicate an optional element, the following constructions are equivalent:
[fubar snafu]
*1(fubar snafu)
0*1(fubar snafu)
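The correspondence between ABNF repetition prefixes and regular-expression quantifiers can be made explicit. The Python sketch below translates a prefix such as 2*3 into the matching quantifier; repeat_to_quantifier is a hypothetical helper, and an optional sequence in square brackets corresponds to the prefix *1.

```python
import re


def repeat_to_quantifier(repeat):
    """Translate an ABNF repeat prefix into a regex quantifier."""
    if repeat == "":
        return ""  # no prefix: exactly one occurrence
    variable = re.fullmatch(r"([0-9]*)\*([0-9]*)", repeat)
    if variable:
        # A missing minimum means 0; a missing maximum means unbounded.
        return "{%s,%s}" % (variable.group(1) or "0", variable.group(2))
    if re.fullmatch(r"[0-9]+", repeat):
        return "{%s}" % repeat  # specific repetition, e.g. 2DIGIT
    raise ValueError(f"not a repeat prefix: {repeat!r}")


two_or_three_digits = re.compile("[0-9]" + repeat_to_quantifier("2*3"))
print(bool(two_or_three_digits.fullmatch("123")))
print(bool(two_or_three_digits.fullmatch("1")))
```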
Operator precedence
The following operators have the given precedence from tightest binding to loosest binding:
- Strings, names formation
- Comment
- Value range
- Repetition
- Grouping, optional
- Concatenation
- Alternative
Core rules
The core rules are defined in the ABNF standard. They provide definitions for commonly used constructs. The exact content of the core rules depends on the code page in use, but for the Internet's common baseline of 7-bit ASCII, they are defined as:
| Rule | Formal definition | Meaning |
| ALPHA | %x41-5A / %x61-7A | Upper- and lower-case ASCII letters |
| DIGIT | %x30-39 | Decimal digits |
| HEXDIG | DIGIT / "A" / "B" / "C" / "D" / "E" / "F" | Hexadecimal digits |
| DQUOTE | %x22 | Double quote |
| SP | %x20 | Space |
| HTAB | %x09 | Horizontal tab |
| WSP | SP / HTAB | Space and horizontal tab |
| LWSP | *(WSP / CRLF WSP) | Linear white space |
| VCHAR | %x21-7E | Visible characters |
| CHAR | %x01-7F | Any ASCII character, excluding NUL |
| OCTET | %x00-FF | 8 bits of data |
| CTL | %x00-1F / %x7F | Controls |
| CR | %x0D | Carriage return |
| LF | %x0A | Linefeed |
| CRLF | CR LF | Internet-standard newline |
| BIT | "0" / "1" | Binary digit |
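For experimentation, the core rules can be approximated with regular expressions. The Python table below is a sketch that assumes the usual 7-bit ASCII definitions of the core rules; CORE_RULES and matches_core_rule are hypothetical names. HEXDIG accepts lower-case letters here because quoted strings in ABNF are case-insensitive.

```python
import re

# Regular-expression approximations of the core rules (ASCII baseline).
CORE_RULES = {
    "ALPHA": r"[A-Za-z]",
    "DIGIT": r"[0-9]",
    "HEXDIG": r"[0-9A-Fa-f]",
    "DQUOTE": r'"',
    "SP": r" ",
    "HTAB": r"\t",
    "WSP": r"[ \t]",
    "LWSP": r"(?:[ \t]|\r\n[ \t])*",
    "VCHAR": r"[\x21-\x7E]",
    "CHAR": r"[\x01-\x7F]",
    "OCTET": r"[\x00-\xFF]",
    "CTL": r"[\x00-\x1F\x7F]",
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "BIT": r"[01]",
}


def matches_core_rule(name, text):
    """Return True if the whole text matches the named core rule."""
    pattern = CORE_RULES[name.upper()]  # rule names are case-insensitive
    return re.fullmatch(pattern, text) is not None


print(matches_core_rule("HEXDIG", "f"))
print(matches_core_rule("VCHAR", " "))
print(matches_core_rule("CRLF", "\r\n"))
```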
LWSP
The ABNF language originated in the 1977 standard for ARPA Network Text Messages, an early form of email. It defined "linear-white-space" as a delimiter in mail headers:
linear-white-space = 1*([CRLF] LWSP-char) ; similar to modern LWSP
LWSP-char = SPACE / HTAB ; equivalent to modern WSP
The ABNF language was described independently of email in a 1997 specification. It included the "LWSP" rule in its modern form, which does not have the 1 part specifying the minimum repetition. This is unusual because, logically speaking, at least one whitespace character is needed to form a delimiter between two fields. The difference was noted in RFC Erratum 3096 of 2012, but by then it was too late to change the definition, as other standards had already used the LWSP rule for their own purposes.
The 2008 revision of the ABNF standard adds a warning alongside the definition of LWSP, referring to its departure from email standards.
The contemporary email standard, of 2008, does not use the term "linear white space", nor does it use the predefined LWSP rule. In its place it uses folding white space (FWS):
FWS = ([*WSP CRLF] 1*WSP) / obs-FWS
    ; Folding white space
obs-FWS = 1*WSP *(CRLF 1*WSP) ; Obsolete folding white space
; equivalent to: LWSP-char
; equivalent to: WSP LWSP
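The practical effect of the missing minimum repetition can be seen in a short experiment. The Python sketch below assumes the usual core-rule definition LWSP = *(WSP / CRLF WSP) and compares it with a hypothetical stricter delimiter, 1*WSP, which is used here only for contrast and is not a core rule.

```python
import re

# LWSP = *(WSP / CRLF WSP): zero or more repetitions are allowed.
LWSP = re.compile(r"(?:[ \t]|\r\n[ \t])*")

# Hypothetical stricter delimiter for comparison: 1*WSP.
ONE_OR_MORE_WSP = re.compile(r"[ \t]+")


def is_lwsp(text):
    """Return True if the whole text matches LWSP."""
    return LWSP.fullmatch(text) is not None


def is_strict_delimiter(text):
    """Return True if the whole text is at least one space or tab."""
    return ONE_OR_MORE_WSP.fullmatch(text) is not None


print(is_lwsp(""))              # LWSP accepts no white space at all
print(is_strict_delimiter(""))  # the stricter rule does not
print(is_lwsp(" \r\n "))        # LWSP also accepts a white-space-only line
```

Because LWSP matches the empty string, two fields separated only by LWSP may legally touch, and a folded header may contain a line that consists of white space alone.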
Examples
US postal address
The postal address example given in the augmented Backus–Naur form page may be specified as follows:
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT
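To show how such a rule reads in practice, the zip-part rule can be translated by hand into a regular expression. The Python sketch below is an illustration only: the name ZIP_PART is hypothetical, town-name is assumed to be defined as 1*(ALPHA / SP), and the sample address is made up.

```python
import re

# zip-part = town-name "," SP state 1*2SP zip-code CRLF
ZIP_PART = re.compile(
    r"[A-Za-z ]+"   # town-name = 1*(ALPHA / SP)  (assumed definition)
    r","            # ","
    r" "            # SP
    r"[A-Za-z]{2}"  # state = 2ALPHA
    r" {1,2}"       # 1*2SP
    r"[0-9]{5}"     # zip-code = 5DIGIT
    r"\r\n"         # CRLF
)

print(bool(ZIP_PART.fullmatch("Springfield, IL 62704\r\n")))
print(bool(ZIP_PART.fullmatch("Springfield, IL 62704")))  # CRLF is required
```

Each fragment of the pattern corresponds to one element of the rule, in the same order, which is why concatenation in ABNF maps directly onto juxtaposition in a regular expression.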
ABNF representation of itself
ABNF's syntax itself may be represented with an ABNF like the following:
rulelist = 1*( rule / (*c-wsp c-nl) )
rule = rulename defined-as elements c-nl
    ; continues if next line starts
    ; with white space
rulename = ALPHA *(ALPHA / DIGIT / "-")
defined-as = *c-wsp ("=" / "=/") *c-wsp
    ; basic rules definition and
    ; incremental alternatives
elements = alternation *WSP
c-wsp = WSP / (c-nl WSP)
c-nl = comment / CRLF
    ; comment or newline
comment = ";" *(WSP / VCHAR) CRLF
alternation = concatenation
    *(*c-wsp "/" *c-wsp concatenation)
concatenation = repetition *(1*c-wsp repetition)
repetition = [repeat] element
repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
element = rulename / group / option /
    char-val / num-val / prose-val
group = "(" *c-wsp alternation *c-wsp ")"
option = "[" *c-wsp alternation *c-wsp "]"
char-val = DQUOTE *(%x20-21 / %x23-7E) DQUOTE
    ; quoted string of SP and VCHAR
    ; without DQUOTE
num-val = "%" (bin-val / dec-val / hex-val)
bin-val = "b" 1*BIT
    [ 1*("." 1*BIT) / ("-" 1*BIT) ]
    ; series of concatenated bit values
    ; or single ONEOF range
dec-val = "d" 1*DIGIT
    [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
hex-val = "x" 1*HEXDIG
    [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
prose-val = "<" *(%x20-3D / %x3F-7E) ">"
    ; bracketed string of SP and VCHAR
    ; without angles
    ; prose description, to be used as
    ; last resort