Perl Compatible Regular Expressions
Perl Compatible Regular Expressions (PCRE) is a library written in C that implements a regular expression engine inspired by the capabilities of the Perl programming language. Philip Hazel started writing PCRE in summer 1997. PCRE's syntax is much more powerful and flexible than either of the POSIX regular expression flavors and than that of many other regular-expression libraries.
While PCRE originally aimed at feature-equivalence with Perl, the two implementations are not fully equivalent. During the PCRE 7.x and Perl 5.9.x phase, the two projects coordinated development, with features being ported between them in both directions.
In 2015, a fork of PCRE was released with a revised programming interface. The original software, now called PCRE1, has had bugs fixed but has seen no further development. It is considered obsolete, and the current 8.45 release is likely to be the last. The new PCRE2 code has had a number of extensions and coding improvements and is where development takes place.
A number of prominent open-source programs, such as the Apache and Nginx HTTP servers, and the PHP and R scripting languages, incorporate the PCRE library; proprietary software can do likewise, as the library is BSD-licensed. As of Perl 5.10, PCRE is also available as a replacement for Perl's default regular-expression engine through the
The library can be built on Unix, Windows, and several other environments. PCRE2 is distributed with a POSIX C wrapper, several test programs, and the utility program
Features
Just-in-time compiler support
The just-in-time compiler can be enabled when the PCRE2 library is built. Large performance benefits are possible when the calling program utilizes the feature with compatible patterns that are executed repeatedly. The just-in-time compiler support was written by Zoltan Herczeg and is not addressed in the POSIX wrapper.
Flexible memory management
The use of the system stack for backtracking can be problematic in PCRE1, which is why this feature of the implementation was changed in PCRE2. The heap is now used for this purpose, and the total amount can be limited. The problem of stack overflow, which came up regularly with PCRE1, is no longer an issue with PCRE2 from release 10.30.
Consistent escaping rules
Like Perl, PCRE2 has consistent escaping rules: any non-alphanumeric character may be escaped to mean its literal value by prefixing a backslash (\) to the character.
Extended character classes
Single-letter character classes are supported in addition to the longer POSIX names. For example,
Minimal matching (a.k.a. "ungreedy")
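A ? placed after a repetition quantifier makes that quantifier ungreedy, so that the shortest match is tried first. If the U flag is set, then quantifiers are ungreedy by default, while ? makes them greedy.
Unicode character properties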
Unicode defines several properties for each character. Patterns in PCRE2 can match these properties: e.g.
Multiline matching
Newline/linebreak options
When PCRE is compiled, a newline default is selected. Which newline/linebreak is in effect affects where PCRE detects line beginnings and ends. The newline option can be altered with external options when PCRE is compiled and when it is run. Some applications using PCRE provide users with the means to apply this setting through an external option. The newline option can also be stated at the start of the pattern using one of the following:
- Newline is a linefeed character. Corresponding linebreaks can be matched with \n.
- Newline is a carriage return. Corresponding linebreaks can be matched with \r.
- Newline/linebreak is a carriage return followed by a linefeed. Corresponding linebreaks can be matched with \r\n.
- Any of the above encountered in the data will trigger newline processing. Corresponding linebreaks can be matched with \R. See below for configuration and options concerning what matches backslash-R.
- Any of the above plus special Unicode linebreaks.
In UTF-8 mode, two additional characters are recognized as line breaks:
- LS (line separator, U+2028),
- PS (paragraph separator, U+2029).
For example,
See below for configuration and options concerning what matches backslash-R.
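The effect of the newline setting on line-end detection can be illustrated with a minimal sketch. It does not use PCRE: it relies on Python's built-in `re` module, which shares the Perl-style syntax but treats only the linefeed as a line terminator in multiline mode, so it stands in for an engine whose newline setting is fixed to linefeed. The function name `complete_lines` and the sample strings are made up for the illustration.

```python
import re


def complete_lines(text):
    """Return the lines of `text` that consist only of word characters.

    Python's re recognizes only "\n" as a line terminator in MULTILINE
    mode, so this behaves like an engine whose newline is set to linefeed.
    """
    return re.findall(r"^\w+$", text, flags=re.MULTILINE)


lf_data = "alpha\nbeta"      # lines separated by a linefeed
crlf_data = "alpha\r\nbeta"  # lines separated by carriage return + linefeed

print(complete_lines(lf_data))    # ['alpha', 'beta']
print(complete_lines(crlf_data))  # ['beta']
```

With CRLF data, the first line is not found because `$` does not match before the carriage return. This is the kind of mismatch that selecting the appropriate newline convention is meant to avoid.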
Backslash-R options
When PCRE is compiled, a default is selected for what matches \R. A backslash-R option can be stated at the start of the pattern, and a newline option can be provided in addition to it, ahead of the rest of the pattern. The backslash-R options can also be changed with external options by the application calling PCRE2 when a pattern is compiled.
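The difference between the two backslash-R settings can be sketched by emulation. The example below again uses Python's `re` module rather than PCRE itself, because `re` has no \R. The names `BSR_ANYCRLF_EMULATED`, `BSR_UNICODE_EMULATED`, and `split_lines` are hypothetical. The character sets are assumptions based on how the PCRE2 documentation describes the two behaviours: CR, LF, and CRLF only, versus any Unicode newline sequence.

```python
import re

# Hypothetical emulations of the two backslash-R behaviours.
# "\r\n" comes first in each alternation so CRLF counts as one linebreak.
BSR_ANYCRLF_EMULATED = re.compile(r"\r\n|\r|\n")
BSR_UNICODE_EMULATED = re.compile(r"\r\n|[\n\x0b\f\r\x85\u2028\u2029]")


def split_lines(text, linebreak):
    """Split `text` at every match of the compiled `linebreak` pattern."""
    return linebreak.split(text)


sample = "one\r\ntwo\u2028three\nfour"

# Only CR, LF and CRLF count: the U+2028 line separator stays in the text.
print(split_lines(sample, BSR_ANYCRLF_EMULATED))

# Every Unicode newline sequence counts, so four pieces are returned.
print(split_lines(sample, BSR_UNICODE_EMULATED))
```

The first call leaves `two` and `three` joined by the line separator character, while the second separates all four words.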