single |
Introduction
------------
This regex implementation is backwards-compatible with the standard 're'
module, but offers additional functionality.
Note
----
The re module's behaviour with zero-width matches changed in Python 3.7,
and this module will follow that behaviour when compiled for Python 3.7.
PyPy
----
This module is targeted at CPython. It expects that all codepoints are the
same width, so it won't behave properly with PyPy outside U+0000..U+007F
because PyPy stores strings as UTF-8.
Old vs new behaviour
--------------------
In order to be compatible with the re module, this module has 2 behaviours:
* **Version 0** behaviour (old behaviour, compatible with the re module):
Please note that the re module's behaviour may change over time, and I'll
endeavour to match that behaviour in version 0.
* Indicated by the VERSION0 or V0 flag, or ``(?V0)`` in the pattern.
* Zero-width matches are not handled correctly in the re module before
Python 3.7. The behaviour in those earlier versions is:
* ``.split`` won't split a string at a zero-width match.
* ``.sub`` will advance by one character after a zero-width match.
* Inline flags apply to the entire pattern, and they can't be turned off.
* Only simple sets are supported.
* Case-insensitive matches in Unicode use simple case-folding by default.
* **Version 1** behaviour (new behaviour, possibly different from the re
module):
* Indicated by the VERSION1 or V1 flag, or ``(?V1)`` in the pattern.
* Zero-width matches are handled correctly.
* Inline flags apply to the end of the group or pattern, and they can be
turned off.
* Nested sets and set operations are supported.
* Case-insensitive matches in Unicode use full case-folding by default.
If no version is specified, the regex module will default to
``regex.DEFAULT_VERSION``.
Case-insensitive matches in Unicode
-----------------------------------
The regex module supports both simple and full case-folding for
case-insensitive matches in Unicode. Use of full case-folding can be turned
on using the FULLCASE or F flag, or ``(?f)`` in the pattern. Please note
that this flag affects how the IGNORECASE flag works; the FULLCASE flag
itself does not turn on case-insensitive matching.
In the version 0 behaviour, the flag is off by default.
In the version 1 behaviour, the flag is on by default.
Nested sets and set operations
------------------------------
It's not possible to support both simple sets, as used in the re module,
and nested sets at the same time because of a difference in the meaning of
an unescaped ``"["`` in a set.
For example, the pattern ``[[a-z]--[aeiou]]`` is treated in the version 0
behaviour (simple sets, compatible with the re module) as:
* Set containing "[" and the letters "a" to "z"
* Literal "--"
* Set containing letters "a", "e", "i", "o", "u"
* Literal "]"
but in the version 1 behaviour (nested sets, enhanced behaviour) as:
* Set which is:
* Set containing the letters "a" to "z"
* but excluding:
|