Regular expressions
Common needs
Regular expressions (or regexps) are lightweight parsers. For example, instead of many1 digit (with Parsec) you might write [0-9]+ as a regular expression, and then use it to find all numbers in a string. Regular expressions can also contain capturing groups: if you want to extract fractional numbers that look like “123.45” from a text, you could write [0-9]+\.[0-9]+ and get “123.45” as a string, or you could write ([0-9]+)\.([0-9]+) and then extract the first and second parts from each match.
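A minimal sketch of both approaches, assuming regex-tdfa (one of the POSIX libraries described below) and its Text.Regex.TDFA module; the helper names numbers and firstParts are hypothetical. getAllTextMatches collects every match as a string, and the four-tuple result type exposes the capture groups of the first match:

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- All non-overlapping matches of [0-9]+\.[0-9]+, as plain strings.
numbers :: String -> [String]
numbers s = getAllTextMatches (s =~ "[0-9]+\\.[0-9]+")

-- Integer and fractional parts of the first match, taken from the two
-- capture groups. The 4-tuple is (text before the match, the match,
-- text after the match, [capture groups]); with no match the list is empty.
firstParts :: String -> Maybe (String, String)
firstParts s =
  case s =~ "([0-9]+)\\.([0-9]+)" :: (String, String, String, [String]) of
    (_, _, _, [intPart, fracPart]) -> Just (intPart, fracPart)
    _                              -> Nothing
```

Loaded into GHCi, numbers "pi is 3.14, e is 2.71" should give ["3.14","2.71"], and firstParts "price: 123.45 EUR" should give Just ("123","45").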
Some libraries also provide functions for regexp-based replacement and for splitting a string on a separator given as a regexp.
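As one illustration, regex-compat-tdfa (a simple wrapper over regex-tdfa) exposes the classic Text.Regex interface, whose subRegex and splitRegex do replacement and splitting. This is a minimal sketch; splitFields and bracketNumbers are hypothetical names:

```haskell
import Text.Regex (mkRegex, splitRegex, subRegex)

-- Split on commas; any spaces or tabs around a comma are part of the separator.
splitFields :: String -> [String]
splitFields = splitRegex (mkRegex "[ \t]*,[ \t]*")

-- Wrap every run of digits in square brackets.
-- In the replacement string, \0 stands for the whole match (\1 for group 1, etc.).
bracketNumbers :: String -> String
bracketNumbers s = subRegex (mkRegex "[0-9]+") s "[\\0]"
```

For example, splitFields "a, b ,c" should give ["a","b","c"], and bracketNumbers "x1 y22" should give "x[1] y[22]".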
Generally, regexps aren't a good replacement for parsers (the terse syntax makes them awkward to use, and they're error-prone), but sometimes they're faster and more convenient.
PCRE vs POSIX
There are two main flavors of regexps: PCRE and POSIX. PCRE-style expressions are used in PHP, Perl and JavaScript; POSIX-style expressions are used in grep (by default), PostgreSQL and other places. Here's a description of the difference between them. Despite the differences, simple regular expressions look the same in both flavors.
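One concrete difference is which match wins when several are possible. A POSIX engine must return the longest match starting at the leftmost position, while a Perl/PCRE-style engine tries alternatives in order and stops at the first one that succeeds. A small sketch using regex-tdfa, which follows the POSIX rule (posixMatch is a hypothetical name):

```haskell
import Text.Regex.TDFA ((=~))

-- Leftmost-longest (POSIX): of the alternatives "a" and "ab",
-- both match at position 0, so the longer one, "ab", is returned.
-- A PCRE/Perl engine would stop at the first alternative and return "a".
posixMatch :: String
posixMatch = "abc" =~ "a|ab"
```

Here posixMatch should evaluate to "ab"; the same pattern in Perl or PCRE matches just "a".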
Recommendations
So many libraries to choose from, ugh.
I recommend text-icu, merely because it's a binding to a well-known library (ICU), written by a well-known developer (Bryan O'Sullivan, the author of aeson and attoparsec and a co-author of Real World Haskell). If you want POSIX expressions, then you might try regex-tdfa, or regex-compat-tdfa for a simpler interface. regex-applicative is a parsing library that sits somewhere between regexes and parsers: it's not cumbersome to use, but it's less terse than regexes.
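To show what "between regexes and parsers" means, here is a minimal sketch of the ([0-9]+)\.([0-9]+) pattern written with regex-applicative, assuming its Text.Regex.Applicative module (digits and fractional are hypothetical names). Capture groups become ordinary values combined with <$> and <*>:

```haskell
import Control.Applicative (some)
import Data.Char (isDigit)
import Text.Regex.Applicative (RE, match, psym, sym)

-- One or more decimal digits: the equivalent of [0-9]+ (or Parsec's many1 digit).
digits :: RE Char String
digits = some (psym isDigit)

-- The equivalent of ([0-9]+)\.([0-9]+): the two "groups" are just the
-- results of the two digits parsers; the dot is matched and discarded by <*.
fractional :: RE Char (String, String)
fractional = (,) <$> digits <* sym '.' <*> digits
```

match fractional "123.45" should give Just ("123","45"). Note that match only succeeds when the whole string matches, so match fractional "12a.4" gives Nothing.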
Benchmarks
TODO: add benchmarks
regex-tdfa
Uses regex-base. Seems to be the most popular regex library.
Regex flavor: POSIX.
- Handles corner cases better than other POSIX implementations, including glibc's.
- Doesn't require any installed C libraries, since it's written in pure Haskell.
- regex-genex can generate all strings matching a given regex, and quickcheck-regex can use that to generate test cases for QuickCheck.
- regex-compat-tdfa is a wrapper over regex-tdfa with a simpler interface.
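Because regex-tdfa uses the regex-base interface, a single (=~) operator returns different things depending on the result type you ask for. A short sketch (the function names are hypothetical):

```haskell
import Text.Regex.TDFA ((=~))

-- Bool: is there any match at all?
hasDigit :: String -> Bool
hasDigit s = s =~ "[0-9]"

-- Int: the number of non-overlapping matches.
countWords :: String -> Int
countWords s = s =~ "[a-z]+"

-- String: the first match, or "" if there is none.
firstWord :: String -> String
firstWord s = s =~ "[a-z]+"
```

For example, countWords "one two three" should be 3 and firstWord "42 apples" should be "apples".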
text-icu
Bindings to International Components for Unicode (ICU), which, among other things, provides regexes. See the regular expressions section of the Data.Text.ICU module.
Regex flavor: PCRE.
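A minimal sketch of the fractional-number extraction with text-icu, assuming the pure regex functions exported by Data.Text.ICU (regex, findAll, group); fractionalRe and fractionalParts are hypothetical names. Patterns and input are Text rather than String:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.ICU as ICU

-- Compile the pattern once. The first argument is a list of match options
-- (none here); regex throws an exception if the pattern is invalid.
fractionalRe :: ICU.Regex
fractionalRe = ICU.regex [] "([0-9]+)\\.([0-9]+)"

-- For every match, return the integer and fractional parts
-- (capture groups 1 and 2; group 0 would be the whole match).
fractionalParts :: T.Text -> [(T.Text, T.Text)]
fractionalParts txt =
  [ (i, f)
  | m <- ICU.findAll fractionalRe txt
  , Just i <- [ICU.group 1 m]
  , Just f <- [ICU.group 2 m]
  ]
```

For example, fractionalParts "a 123.45 b 6.7" should return [("123","45"),("6","7")].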
- regex-easy – a convenience wrapper (TODO: probably useless?)
- pcre-heavy – another convenience wrapper
- rex – quasiquoter
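As an example of these wrappers, a sketch with pcre-heavy, assuming its Text.Regex.PCRE.Heavy module (the helper names are hypothetical). Like other PCRE bindings it needs the C PCRE library installed; its re quasiquoter checks the pattern when the program is compiled:

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.Regex.PCRE.Heavy (re, scan, (=~))

-- Inside [re|...|] no Haskell string escaping is needed, so \. is a literal dot.
-- scan returns every match paired with the list of its capture groups.
fractionalParts :: String -> [(String, [String])]
fractionalParts = scan [re|([0-9]+)\.([0-9]+)|]

-- (=~) here simply tests whether the pattern matches anywhere.
containsNumber :: String -> Bool
containsNumber s = s =~ [re|[0-9]+|]
```

For example, fractionalParts "x 1.5 y 22.75" should give [("1.5",["1","5"]),("22.75",["22","75"])].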
pcre-utils – provides split/replace