Regular expressions

Regular expressions (or regexps) are lightweight parsers. For example, instead of `many1 digit` (with Parsec) you might write `[0-9]+` as a regular expression and then use it to find all numbers in a string. Regular expressions can also contain capturing groups: to extract fractional numbers that look like “123.45” from a text, you could write `[0-9]+\.[0-9]+` and get “123.45” as a string, or you could write `([0-9]+)\.([0-9]+)` and extract the integer and fractional parts of each match separately.
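As a rough illustration of both ideas, here is a minimal sketch that assumes the regex-tdfa package (one of the libraries covered on this page) is installed. The function names `allNumbers` and `fractions` are made up for this example. Note that the backslash has to be doubled inside a Haskell string literal.

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- All runs of digits in a string, using the pattern [0-9]+.
-- (Hypothetical helper name, not part of any library.)
allNumbers :: String -> [String]
allNumbers s = getAllTextMatches (s =~ "[0-9]+")

-- Every fractional number together with its two captured groups.
-- Each inner list is [whole match, first group, second group].
fractions :: String -> [[String]]
fractions s = s =~ "([0-9]+)\\.([0-9]+)"
```

With these definitions, `allNumbers "buy 12 eggs for 3.50"` gives `["12","3","50"]`, and `fractions "pi is 3.14, e is 2.71"` gives `[["3.14","3","14"],["2.71","2","71"]]`.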

Some libraries also provide functions to do replacement based on regexps, and to split a string with the separator specified as a regexp.
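For instance, the `Text.Regex` module (assumed here to come from regex-compat-tdfa; regex-compat exposes a module with the same name) offers `subRegex` for replacement and `splitRegex` for splitting. The sketch below uses made-up helper names and is only meant to show the shape of the calls.

```haskell
import Text.Regex (mkRegex, splitRegex, subRegex)

-- Replace every run of digits with the letter N.
-- Argument order of subRegex: regex, input, replacement.
redactDigits :: String -> String
redactDigits s = subRegex (mkRegex "[0-9]+") s "N"

-- Split on a comma followed by any number of spaces.
splitOnCommas :: String -> [String]
splitOnCommas = splitRegex (mkRegex ", *")
```

Here `redactDigits "room 101, floor 3"` yields `"room N, floor N"`, and `splitOnCommas "a, b,c"` yields `["a","b","c"]`.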

Generally, regexps aren't a good replacement for parsers (the terse syntax makes them awkward to use, and they're error-prone), but sometimes they're faster and more convenient.

PCRE vs POSIX

There are two main flavors of regexps: PCRE (Perl-compatible) and POSIX. PCRE-style expressions are used in PHP, Perl, and JavaScript; POSIX-style expressions are used in grep (by default), PostgreSQL, and other places. The Haskell wiki has a description of [the difference between them](https://wiki.haskell.org/Regular_expressions#.28apple.7Corange.29). Despite the differences, simple regular expressions look the same in both flavors.
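One visible difference is how alternation picks a match: POSIX asks for the longest match starting at the leftmost position, while Perl-style engines take the first alternative that succeeds. A small sketch, assuming regex-tdfa (a POSIX implementation) is installed; the name `posixMatch` is just for this example.

```haskell
import Text.Regex.TDFA ((=~))

-- POSIX semantics: of all matches starting at the leftmost position,
-- the longest one wins, so this evaluates to "foobar".
-- A Perl-style engine tries alternatives in order and would stop at "foo".
posixMatch :: String
posixMatch = "foobar" =~ "foo|foobar"
```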

Recommendations

So many libraries to choose from, ugh.

I recommend text-icu, simply because it provides bindings to a well-known library (ICU) and is written by a well-known developer (Bryan O'Sullivan, the author of aeson and attoparsec and a co-author of Real World Haskell). If you want POSIX expressions, try regex-tdfa, or regex-compat-tdfa for a simpler interface. regex-applicative is a parsing library that sits somewhere between regexes and parsers: it's not cumbersome to use, but it's less terse than regexes.
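To give a feel for the "between regexes and parsers" style, here is a sketch of the fractional-number pattern written with regex-applicative (assumed to be installed). The names `fraction` and `parseFraction` are invented for the example.

```haskell
import Control.Applicative (some)
import Data.Char (isDigit)
import Text.Regex.Applicative (RE, match, psym, sym)

-- Roughly the same language as ([0-9]+)\.([0-9]+), but built from
-- combinators; the two "groups" come back as a typed pair.
fraction :: RE Char (String, String)
fraction = (,) <$> some (psym isDigit) <* sym '.' <*> some (psym isDigit)

-- match succeeds only if the whole input matches the expression.
parseFraction :: String -> Maybe (String, String)
parseFraction = match fraction
```

`parseFraction "123.45"` returns `Just ("123","45")`, while `parseFraction "123"` returns `Nothing`. The result type is checked by the compiler, which is the main thing you gain over a string pattern.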

Benchmarks

TODO: add benchmarks

regex-tdfa (Hackage)

POSIX

Summary

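Built on the regex-base interface, so it shares its matching operators (such as `=~`) with the other regex-* backends. Seems to be the most popular regex library.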

Regex flavor: POSIX.

Pros

Handles corner cases of POSIX matching (such as leftmost-longest submatches) better than other POSIX implementations, including glibc's.
Doesn't require any C libraries to be installed, since it's written in pure Haskell.

Aelve Guide | Haskell