Regular expressions – Haskell – Aelve Guide

regex-type

2016-08-21T22:59:44Z

regex-type (Hackage)

Pros

Cons

relit

2016-04-09T14:08:28Z

relit (Hackage)

Not a library, but a quasiquoter for various regex* libraries.

Pros

Cons

regex-pderiv

2016-04-09T14:08:20Z

regex-pderiv (Hackage)

Pros

Cons

regex-posix

2016-04-09T14:07:23Z

regex-posix (Hackage)

The library is bundled.

Regex flavor: POSIX.

Pros

Cons

Ecosystem

lens-regex

regex-pcre

2016-04-09T14:06:22Z

regex-pcre (Hackage)

Uses system PCRE library.

Pros

Cons

regex-pcre-builtin

2016-04-09T14:06:15Z

regex-pcre-builtin (Hackage)

Regex flavor: PCRE.

Pros

Cons

Ecosystem

pcre-utils – provides split/replace

regex-tdfa

2016-04-09T14:03:31Z

regex-tdfa (Hackage)

Uses regex-base. Seems to be the most popular library.

Regex flavor: POSIX.

Pros

Handles corner cases more correctly than other POSIX implementations, including glibc's.

Doesn't require any installed libraries (since it's written in pure Haskell).

Cons

Slightly complicated to use, and documentation isn't particularly good.

Ecosystem

regex-tdfa-pipes, regex-tdfa-quasiquoter, regex-tdfa-text
regex-genex – generates all strings matching a regex; quickcheck-regex uses it to generate test cases for QuickCheck
regex-compat-tdfa – a wrapper over regex-tdfa with a simpler interface
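Since the interface is described as slightly complicated, here is a minimal sketch of the most common pattern: the result type you ask for decides what =~ returns. The helper names (hasNumber, firstNumber, allNumbers) are made up for illustration; only =~ and getAllTextMatches come from the library.

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- Asking for a Bool answers "is there a match anywhere?".
hasNumber :: String -> Bool
hasNumber s = s =~ "[0-9]+"

-- Asking for a String returns the first match, or "" if there is none.
firstNumber :: String -> String
firstNumber s = s =~ "[0-9]+"

-- Wrapping the result in getAllTextMatches returns every match,
-- left to right.
allNumbers :: String -> [String]
allNumbers s = getAllTextMatches (s =~ "[0-9]+")
```

> allNumbers "12 + 34 = 55"
["12","34","55"]

Note that this is POSIX syntax, so the pattern uses [0-9] rather than \d.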

regexpr

2016-04-09T14:01:52Z

regexpr (Hackage)

Pros

Cons

regex-applicative

2016-04-09T14:00:50Z

regex-applicative (Hackage)

Regex-like parsing combinators.

Pros

Cons

Ecosystem

lexer-applicative, regex-applicative-text

weighted-regexp

2016-04-09T13:59:17Z

weighted-regexp (Hackage)

See http://sebfisch.github.io/haskell-regexp.

Pros

Cons

Ecosystem

regexp-tries

hxt-regex-xmlschema

2016-04-09T13:59:05Z

hxt-regex-xmlschema (Hackage)

Pros

Cons

re2

2016-04-09T13:56:56Z

re2 (Hackage)

Bindings to Google's RE2 library.

Pros

Cons

regexdot

2016-04-09T13:55:09Z

regexdot (Hackage)

Regex flavor: POSIX.

Pros

Works on lists of arbitrary objects.

Cons

Ecosystem

regexchar

pcre-light

2016-04-09T13:46:23Z

pcre-light (Hackage)

Binds to the C PCRE library.

Pros

Cons

Only works with bytestrings unless you use a wrapper.

Ecosystem

regex-easy – a convenience wrapper (TODO: probably useless?)
pcre-heavy – another convenience wrapper
rex – quasiquoter
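To show what the bytestring-only interface looks like in practice, here is a small sketch, assuming the system PCRE library is installed. The names numberRegex and firstNumber are hypothetical; compile and match are the library's own functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import Text.Regex.PCRE.Light (Regex, compile, match)

-- Compiled once; the empty list means "no compile-time options".
numberRegex :: Regex
numberRegex = compile "[0-9]+" []

-- match returns the whole match followed by the capturing groups,
-- or Nothing when the pattern does not match.
firstNumber :: ByteString -> Maybe ByteString
firstNumber s =
  case match numberRegex s [] of
    Just (whole : _) -> Just whole
    _                -> Nothing
```

Both the pattern and the subject are strict ByteStrings, so Text or String input has to be encoded first (or passed through one of the wrappers listed above).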

text-icu

2016-04-03T09:50:05Z

text-icu (Hackage)

Bindings to International Components for Unicode (ICU), which among other things provides regexes. See the regular expressions section of the Data.Text.ICU module documentation.

Regex flavor: ICU (Perl-like; close to PCRE but not identical).

Pros

Supports Unicode classes – e.g. [:lower:] matches all lowercase letters, not just Latin ones.

Allows limiting time/memory used by the matcher.

Cons

Requires ICU installed.

Ecosystem

text-regex-replace

Notes

Imports and pragmas

{-# LANGUAGE OverloadedStrings #-}

It's better to import the module qualified, because some of its functions clash with common ones: span with the one from Prelude, and find with the one from Data.List. Many of its functions also clash with those from Data.Text, so don't import it as T either.

import qualified Data.Text.ICU as ICU

If you want replacement as well:

-- from text-regex-replace
import qualified Data.Text.ICU.Replace as ICU

Search

To search, use findAll (or find if you only need the first match):

> ICU.findAll "[0-9]+" "12 + 34 = 55"
[Match ["12"],Match ["34"],Match ["55"]]

findAll returns a list of Matches. A Match holds information about the matched piece of text, the groups inside the match, and the text occurring between matches.

For example, let's construct a regex that would match a name and a surname: (\p{Lu}\w*) (\p{Lu}\w*) – here \w means “character that can occur inside a word”, and \p{Lu} means “character from Unicode category Lu”, which is “Letter, uppercase”:

> let regex = "(\\p{Lu}\\w*) (\\p{Lu}\\w*)"
> let [zaphod, ford] = ICU.findAll regex "Zaphod Beeblebrox and Ford Prefect"

To get the match itself, use group 0. It always returns Just, but unfortunately the library doesn't provide an easier way to get the match without unwrapping the Just:

> ICU.group 0 ford
Just "Ford Prefect"

You can also use group to get a particular capturing group:

> ICU.group 1 ford
Just "Ford"

> ICU.group 2 ford
Just "Prefect"

span returns the text between the previous match and this match:

> ICU.span ford
" and "

Finally, you can use prefix and suffix to get the whole string before/after the match:

> ICU.prefix 0 ford
Just "Zaphod Beeblebrox and "

> ICU.suffix 0 ford
Just ""
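Because group always returns a Maybe, it can be convenient to wrap the unwrapping in small helpers. The following is only a sketch: matchText and fullNames are hypothetical names, not part of text-icu, and the regex is the same name/surname pattern used above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (fromMaybe)
import Data.Text (Text)
import qualified Data.Text.ICU as ICU

-- The whole matched text (group 0), falling back to "" instead of
-- exposing the Maybe.
matchText :: ICU.Match -> Text
matchText = fromMaybe "" . ICU.group 0

-- Every (name, surname) pair found in the input. A match where either
-- group is missing is skipped by the pattern match on Just.
fullNames :: Text -> [(Text, Text)]
fullNames str =
  [ (name, surname)
  | m <- ICU.findAll nameRegex str
  , Just name    <- [ICU.group 1 m]
  , Just surname <- [ICU.group 2 m]
  ]
  where
    nameRegex :: ICU.Regex
    nameRegex = "(\\p{Lu}\\w*) (\\p{Lu}\\w*)"
```

> fullNames "Zaphod Beeblebrox and Ford Prefect"
[("Zaphod","Beeblebrox"),("Ford","Prefect")]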

Replacement

Simple replacement is done with replaceAll (to replace only the first match, use replace):

> ICU.replaceAll "[0-9]+" "<num>" "12 + 34 = 55"
"<num> + <num> = <num>"

Replacement with groups:

> ICU.replaceAll "(.*), (.*)" "$2 $1" "Beeblebrox, Zaphod"
"Zaphod Beeblebrox"

(To have a literal $ in the output, write $$ instead of $.)
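The difference between replace and replaceAll can be sketched as follows. The function names are made up for illustration, and the sketch assumes that replace takes the same arguments as replaceAll (regex, replacement, input).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text.ICU as ICU
-- from text-regex-replace
import qualified Data.Text.ICU.Replace as ICU

-- Replaces only the first number in the input.
maskFirstNumber :: Text -> Text
maskFirstNumber = ICU.replace "[0-9]+" "<num>"

-- Replaces every number in the input.
maskAllNumbers :: Text -> Text
maskAllNumbers = ICU.replaceAll "[0-9]+" "<num>"

-- Turns "Surname, Name" into "Name Surname" using group references.
swapName :: Text -> Text
swapName = ICU.replaceAll "(.*), (.*)" "$2 $1"
```

> maskFirstNumber "12 + 34 = 55"
"<num> + 34 = 55"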

Splitting

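text-icu doesn't export a splitting function, so you have to write your own. Here's one that you could use:

import Data.Maybe (fromJust)
import Data.Text (Text)

-- Split the input on every match of the regex.
split :: ICU.Regex -> Text -> [Text]
split r s = go (ICU.findAll r s)
  where go [] = [s]          -- no matches at all: return the input whole
        -- last match: the text before it, then everything after it
        go [m] = [ICU.span m, fromJust (ICU.suffix 0 m)]
        go (m:ms) = ICU.span m : go ms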
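As a variation on the function above, here is a self-contained sketch that avoids the partial fromJust by falling back to an empty string. The name splitRegex is hypothetical; the logic is otherwise the same.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (fromMaybe)
import Data.Text (Text)
import qualified Data.Text.ICU as ICU

-- Split the input on every match of the regex. Each piece is the text
-- between two consecutive matches; the last piece is whatever follows
-- the final match.
splitRegex :: ICU.Regex -> Text -> [Text]
splitRegex r s = go (ICU.findAll r s)
  where
    go []       = [s]
    go [m]      = [ICU.span m, fromMaybe "" (ICU.suffix 0 m)]
    go (m : ms) = ICU.span m : go ms
```

> splitRegex ",\\s*" "a, b,c"
["a","b","c"]

A separator at the very end of the input produces a trailing empty piece, so splitting "a," gives ["a",""].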

Regex settings

You can customise the way regexes are applied by using regex and MatchOption. For instance, if you want the matching to be case-insensitive, use CaseInsensitive:

> let regex = ICU.regex [ICU.CaseInsensitive] "xxx_(\\w+)_xxx"
> let str = "xxx_Overlord_xxx XXX_dp_ak_XXX"

> mapMaybe (ICU.group 1) (ICU.findAll regex str)
["Overlord","dp_ak"]

There are other settings available – look at the docs for MatchOption to see the full list.
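When the pattern comes from user input, an invalid regex is a real possibility. text-icu also provides regex', which takes the same options but returns an Either instead of failing. A minimal sketch, where findIgnoringCase is a hypothetical helper name:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text.ICU as ICU

-- Compile the pattern case-insensitively and return all matches.
-- Nothing means the pattern itself could not be compiled.
findIgnoringCase :: Text -> Text -> Maybe [Text]
findIgnoringCase pat str =
  case ICU.regex' [ICU.CaseInsensitive] pat of
    Left _   -> Nothing
    Right re -> Just (mapMaybe (ICU.group 0) (ICU.findAll re str))
```

> findIgnoringCase "ford" "Ford, FORD and ford"
Just ["Ford","FORD","ford"]

An unbalanced pattern such as "(" yields Nothing here rather than an exception.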
