XPath
XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use, an XPath expression is often referred to simply as "an XPath".
Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema, XForms and the Internationalization Tag Set.
XPath has been adopted by a number of XML processing libraries and tools, many of which also offer CSS Selectors, another W3C standard, as a simpler alternative to XPath.
Versions
There are several versions of XPath in use. XPath 1.0 was published in 1999, XPath 2.0 in 2007, XPath 3.0 in 2014, and XPath 3.1 in 2017. However, XPath 1.0 is still the version that is most widely available.
- XPath 1.0 became a Recommendation on 16 November 1999 and is widely implemented and used, either on its own, or embedded in languages such as XSLT, XProc, XML Schema or XForms.
- XPath 2.0 became a Recommendation on 23 January 2007, with a second edition published on 14 December 2010. A number of implementations exist but are not as widely used as XPath 1.0. The XPath 2.0 language specification is much larger than XPath 1.0 and changes some of the fundamental concepts of the language such as the type system.
  - The most notable change is that XPath 2.0 is built around the XQuery and XPath Data Model that has a much richer type system. Every value is now a sequence. XPath 1.0 node-sets are replaced by node sequences, which may be in any order.
  - To support richer type sets, XPath 2.0 offers a greatly expanded set of functions and operators.
  - XPath 2.0 is in fact a subset of XQuery 1.0. They share the same data model. It offers a for expression that is a cut-down version of the "FLWOR" expressions in XQuery. It is possible to describe the language by listing the parts of XQuery that it leaves out: the main examples are the query prolog, element and attribute constructors, the remainder of the "FLWOR" syntax, and the typeswitch expression.
- XPath 3.0 became a Recommendation on 8 April 2014. The most significant new feature is support for functions as first-class values. XPath 3.0 is a subset of XQuery 3.0, and most current implementations exist as part of an XQuery 3.0 engine.
- XPath 3.1 became a Recommendation on 21 March 2017. This version adds new data types: maps and arrays, largely to underpin support for JSON.
Syntax and semantics (XPath 1.0)
The most important kind of expression in XPath is a location path. A location path consists of a sequence of location steps. Each location step has three components:
- an axis
- a node test
- zero or more predicates.
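The three components of a location step can be modelled as a small data structure. The following is a minimal sketch, not part of any XPath library: the names Step and render_path are hypothetical, and the output uses the unabbreviated axis::node-test[predicate] form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One location step: an axis, a node test, and zero or more predicates."""
    axis: str
    node_test: str
    predicates: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Unabbreviated form: axis::node-test followed by each [predicate].
        preds = "".join(f"[{p}]" for p in self.predicates)
        return f"{self.axis}::{self.node_test}{preds}"


def render_path(steps: List[Step], absolute: bool = False) -> str:
    """Join steps with '/', adding a leading '/' for an absolute path."""
    body = "/".join(step.render() for step in steps)
    return "/" + body if absolute else body


path = render_path(
    [Step("child", "A"), Step("child", "B"), Step("child", "C", ["position()=1"])],
    absolute=True,
)
print(path)  # /child::A/child::B/child::C[position()=1]
```

The model only formats strings; it does not validate axis names or evaluate anything against a document.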
The XPath syntax comes in two flavors. The abbreviated syntax is more compact and allows XPaths to be written and read easily using intuitive and, in many cases, familiar characters and constructs. The full syntax is more verbose, but allows for more options to be specified, and is more descriptive if read carefully.
Abbreviated syntax
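The compact notation allows many defaults and abbreviations for common cases. Given source XML in which an outermost A element contains a B element that in turn contains a C element, the simplest XPath takes a form such as
/A/B/C
which selects the C elements that are children of B elements that are children of the outermost A element.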
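As a rough illustration, the path above can be tried with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element rather than the document. The sample document and its id attributes are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document matching the path /A/B/C.
doc = ET.fromstring("<A><B><C id='c1'/><C id='c2'/></B><B><C id='c3'/></B></A>")

# ElementTree starts from an element, so with the root element A as the
# starting point the absolute path /A/B/C is written as ./B/C.
matches = doc.findall("./B/C")
ids = [c.get("id") for c in matches]
print(ids)  # ['c1', 'c2', 'c3']
```

All three C elements are returned, in document order, because each is a child of a B element that is a child of A.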
More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
A//B/*[1]
selects the first child element, whatever its name, of every B element that is a child or deeper descendant of an A element that is itself a child of the context node. The predicate [1] binds more tightly than the / operator. To select the first node selected by the expression A//B/*, write (A//B/*)[1]. Note also that index values in XPath predicates start from 1, not 0 as is common in languages like C and Java.
Expanded syntax
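In the full, unabbreviated syntax, the two examples above, /A/B/C and A//B/*[1], would be written:
/child::A/child::B/child::C
child::A/descendant-or-self::node()/child::B/child::*[position()=1]
In each step, the axis (for example child or descendant-or-self) is given explicitly, followed by :: and then the node test, such as A or node() in the examples above.
The second path can also be written more briefly while keeping the explicit predicate: A//B/*[position()=1]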
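The difference between a positional predicate on the last step and one applied to a whole path, together with 1-based indexing, can be sketched with Python's xml.etree.ElementTree. Its XPath subset has no parenthesised expressions and requires a tag name before a positional predicate, so this sketch uses C in place of * and emulates (A//B/C)[1] by indexing the result list. The document and its n attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><A>"
    "<B><C n='1'/><C n='2'/></B>"
    "<X><B><C n='3'/><C n='4'/></B></X>"
    "</A></root>"
)

# A//B/C[1]: the predicate applies to the last step only, so the first C
# child of every B is selected.
per_parent = [c.get("n") for c in root.findall("A//B/C[1]")]

# (A//B/C)[1]: the predicate applies to the whole result. Emulated here by
# taking element 0 of the Python list (XPath counts from 1, Python from 0).
all_c = root.findall("A//B/C")
first_overall = all_c[0].get("n")

print(per_parent)     # ['1', '3']
print(first_overall)  # 1
```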
Axis specifiers
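Axis specifiers indicate navigation direction within the tree representation of the XML document. The axes available are:
| Full syntax | Abbreviated syntax | Notes |
| ancestor | | |
| ancestor-or-self | | |
| attribute | @ | @abc is short for attribute::abc |
| child | | xyz is short for child::xyz |
| descendant | | |
| descendant-or-self | // | // is short for /descendant-or-self::node()/ |
| following | | |
| following-sibling | | |
| namespace | | |
| parent | .. | .. is short for parent::node() |
| preceding | | |
| preceding-sibling | | |
| self | . | . is short for self::node() |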
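Several of the abbreviated axes can be tried with Python's xml.etree.ElementTree, which supports the child axis, //, . and .., and allows @name only inside a predicate. This is a minimal sketch over a hypothetical document, not a full XPath evaluation.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<doc>"
    "<section><a href='one.html'>one</a></section>"
    "<section><p>no link</p></section>"
    "</doc>"
)

# child axis (the default): section children of the context node.
sections = root.findall("section")

# // (descendant-or-self::node()/): every a element below the context node.
links = root.findall(".//a")

# .. (parent::node()): the parents of those a elements.
parents = root.findall(".//a/..")

# attribute axis: ElementTree cannot return attribute nodes, so @href is
# used in a predicate and the value is then read with get().
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

print(len(sections), len(links), [p.tag for p in parents], hrefs)
```

A full XPath 1.0 processor would return the href attribute nodes directly for //a/@href; reading the value with get() is a workaround specific to this module.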
As an example of using the attribute axis in abbreviated syntax, //a/@href selects the attribute called href in a elements anywhere in the document tree.
The expression . (an abbreviation for self::node()) is most commonly used within a predicate to refer to the currently selected node. For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
Node tests
Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry elements in that namespace, and //gs:* will find all elements, regardless of local name, in that namespace.
Other node test formats are:
- comment(): finds an XML comment node, e.g. <!-- Comment -->
- text(): finds a node of type text excluding any children, e.g. the hello in <k>hello<m> world</m></k>
- processing-instruction(): finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
- node(): finds any node at all.
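The namespace-qualified name tests can be sketched with Python's xml.etree.ElementTree. The namespace URI and the sample document are assumptions made for illustration, and the sketch covers only the name tests. The {uri}* wildcard form requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the prefix gs.
GS = "http://example.com/gs"
root = ET.fromstring(
    f"<root xmlns:gs='{GS}'>"
    "<gs:enquiry>first</gs:enquiry>"
    "<group><gs:enquiry>second</gs:enquiry><gs:reply/><note/></group>"
    "</root>"
)

# //gs:enquiry: the prefix is resolved through the mapping passed to findall().
ns = {"gs": GS}
enquiries = [e.text for e in root.findall(".//gs:enquiry", ns)]

# //gs:*: ElementTree spells the namespace wildcard as {uri}*.
in_gs = [e.tag for e in root.findall(f".//{{{GS}}}*")]
local_names = [tag.split("}")[1] for tag in in_gs]

print(enquiries)    # ['first', 'second']
print(local_names)  # ['enquiry', 'enquiry', 'reply']
```

The note element is not matched by the wildcard because it is in no namespace.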
Predicates
Predicates, written as expressions in square brackets, can be used to filter a node-set according to some condition. For example, a returns a node-set (all the a elements that are children of the context node), and a[@href='help.php'] keeps only those elements having an href attribute with the value help.php.
There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step and do not alter that context. All predicates must be satisfied for a match to occur.
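The filtering effect of predicates can be sketched with Python's xml.etree.ElementTree, whose XPath subset supports attribute predicates and lets several of them follow one step. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<div>"
    "<a href='help.php' target='_blank'>Help</a>"
    "<a href='index.php'>Home</a>"
    "<a href='help.php'>Help again</a>"
    "<a>No link</a>"
    "</div>"
)

# a: every a child of the context node.
all_a = root.findall("a")

# a[@href='help.php']: only those whose href attribute equals help.php.
help_links = root.findall("a[@href='help.php']")

# Two predicates on one step: both must hold for an element to match.
help_with_target = root.findall("a[@href='help.php'][@target]")

print(len(all_a))                           # 4
print([a.text for a in help_links])         # ['Help', 'Help again']
print([a.text for a in help_with_target])   # ['Help']
```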
When the value of the predicate is numeric, it is syntactic sugar for comparing against the node's position in the node-set. So p[1] is shorthand for p[position()=1] and selects the first p element child, while p[last()] is shorthand for p[position()=last()] and selects the last p child of the context node.
In other cases, the value of the predicate is automatically converted to a Boolean. When the predicate evaluates to a node-set, the result is true when the node-set is non-empty. Thus p[@x] selects those p elements that have an attribute named x.
A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that has its href attribute set to help.php, provided the document's html top-level element also has a lang attribute set to en. The reference to an attribute of the top-level element in the first predicate affects neither the context of other predicates nor that of the location step itself.
Predicate order is significant if predicates test the position of a node. Each predicate takes a node-set and returns a (potentially) smaller node-set. So a[1][@href='help.php'] will find a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] will find the first a child that satisfies this condition.
Functions and operators
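XPath 1.0 defines four data types: node-sets, strings, numbers and Booleans.
The available operators are:
- The /, // and [...] operators, used in path expressions, as described above.
- A union operator, |, which forms the union of two node-sets.
- Boolean operators and and or, and a function not()
- Arithmetic operators +, -, *, div (divide), and mod
- Comparison operators =, !=, <, >, <=, >=
The function library includes:
- Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
- Functions to manipulate numbers: sum(), round(), floor(), ceiling()
- Functions to get properties of nodes: name(), local-name(), namespace-uri()
- Functions to get information about the processing context: position(), last()
- Type conversion functions: string(), number(), boolean()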
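Python's standard library does not evaluate XPath functions, so the following sketch mimics the behaviour of a few XPath 1.0 functions in plain Python to illustrate their semantics. The function names are hypothetical analogues written for this example, node-sets are modelled as Python lists, and the separator passed to the substring helpers is assumed to be non-empty.

```python
import math
import re


def xpath_boolean(value) -> bool:
    """Mimic boolean(): convert each XPath 1.0 data type to a Boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # A string is true when its length is non-zero.
        return len(value) > 0
    # Anything else is treated as a node-set (a list): true when non-empty.
    return len(value) > 0


def substring_before(s: str, sep: str) -> str:
    """Mimic substring-before(): text before the first sep, '' if absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """Mimic substring-after(): text after the first sep, '' if absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def normalize_space(s: str) -> str:
    """Mimic normalize-space(): trim, then collapse whitespace runs."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


print(xpath_boolean("false"))                 # True
print(substring_before("1999/04/01", "/"))    # 1999
print(substring_after("1999/04/01", "/"))     # 04/01
print(normalize_space("  a \n b "))           # a b
```

The first result shows a common surprise: the string "false" is non-empty, so it converts to true.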