single |
Introduction
------------
This regex implementation is backwards-compatible with the standard 're'
module, but offers additional functionality.
Note
----
The re module's behaviour with zero-width matches changed in Python 3.7,
and this module follows that behaviour when compiled for Python 3.7.
Python 2
--------
Python 2 is no longer supported. The last release that supported Python 2
was 2021.11.10.
PyPy
----
This module is targeted at CPython. It expects that all codepoints are the
same width, so it won't behave properly with PyPy outside U+0000..U+007F
because PyPy stores strings as UTF-8.
Multithreading
--------------
The regex module releases the GIL during matching on instances of the
built-in (immutable) string classes, enabling other Python threads to run
concurrently. It is also possible to force the regex module to release the
GIL during matching by calling the matching methods with the keyword
argument ``concurrent=True``. The behaviour is undefined if the string
changes during matching, so use it *only* when it is guaranteed that that
won't happen.
Unicode
-------
This module supports Unicode 15.0.0. Full Unicode case-folding is
supported.
Flags
-----
There are 2 kinds of flag: scoped and global. Scoped flags can apply to
only part of a pattern and can be turned on or off; global flags apply to
the entire pattern and can only be turned on.
The scoped flags are: ``ASCII (?a)``, ``FULLCASE (?f)``, ``IGNORECASE
(?i)``, ``LOCALE (?L)``, ``MULTILINE (?m)``, ``DOTALL (?s)``, ``UNICODE
(?u)``, ``VERBOSE (?x)``, ``WORD (?w)``.
The global flags are: ``BESTMATCH (?b)``, ``ENHANCEMATCH (?e)``, ``POSIX
(?p)``, ``REVERSE (?r)``, ``VERSION0 (?V0)``, ``VERSION1 (?V1)``.
If neither the ASCII, LOCALE nor UNICODE flag is specified, it will default
to UNICODE if the regex pattern is a Unicode string and ASCII if it's a
bytestring.
The ENHANCEMATCH flag makes fuzzy matching attempt to improve the fit of
the next match that it finds.
The BESTMATCH flag makes fuzzy matching search for the best match instead
of the next match.
Old vs new behaviour
--------------------
In order to be compatible with the re module, this module has 2 behaviours:
* **Version 0** behaviour (old behaviour, compatible with the re module):
Please note that the re module's behaviour may change over time, and I'll
endeavour to match that behaviour in version 0.
* Indicated by the VERSION0 flag.
* Zero-width matches are not handled correctly in the re module before
Python 3.7. The behaviour in those earlier versions is:
* ``.split`` won't split a string at a zero-width match.
* ``.sub`` will advance by one character after a zero-width match.
* Inline flags apply to the entire pattern, and they can't be turned off.
* Only simple sets are supported.
* Case-insensitive matches in Unicode use simple case-folding by default.
* **Version 1** behaviour (new behaviour, possibly different from the re
module):
* Indicated by the VERSION1 flag.
* Zero-width matches are handled correctly.
* Inline flags apply to the end of the group or pattern, and they can be
turned off.
|