Regular expressions

This category is a work in progress






Regular expressions (or regexps) are lightweight parsers – e.g. instead of many1 digit (with Parsec) you might write [0-9]+ as a regular expression, and then use it to find all numbers in a string. Regular expressions can also contain capturing groups – e.g. if you want to extract fractional numbers that look like “123.45” from the text, you could write [0-9]+\.[0-9]+ and get “123.45” as a string, or you could write ([0-9]+)\.([0-9]+) and then you'd be able to extract the first and second parts from each match.

Some libraries also provide functions to do replacement based on regexps, and to split a string with the separator specified as a regexp.

Generally, regexps aren't a good replacement for parsers (the terse syntax makes them awkward to use, and they're error-prone), but sometimes they're faster and more convenient.
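To make the comparison concrete, here is a rough sketch of the parser-combinator counterpart of ([0-9]+)\.([0-9]+). It uses ReadP from base rather than Parsec, only so that it runs without extra packages; the names fractional and parseFractional are made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- Parser counterpart of the regex ([0-9]+)\.([0-9]+):
-- one or more digits, a literal dot, one or more digits.
fractional :: ReadP (String, String)
fractional = do
  whole <- munch1 isDigit
  _     <- char '.'
  frac  <- munch1 isDigit
  pure (whole, frac)

-- Made-up helper: succeeds only if the whole input is a fractional number.
parseFractional :: String -> Maybe (String, String)
parseFractional s =
  case readP_to_S (fractional <* eof) s of
    [(result, "")] -> Just result
    _              -> Nothing
```

In GHCi, parseFractional "123.45" gives Just ("123","45"), while parseFractional "123." gives Nothing. Unlike a regex search, this parser has to consume the whole input, which is part of why the regex version is terser for "find it somewhere in the text" tasks.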

PCRE vs POSIX

There are two main flavors of regexps – PCRE and POSIX. PCRE-style expressions are used in PHP, Perl, and JavaScript; POSIX-style expressions are used in grep (by default), PostgreSQL, and other places. Here's a description of the difference between them. Despite the differences, simple regular expressions look the same in both flavors.
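One difference that tends to matter in practice is how alternation is resolved: POSIX asks for the longest of the leftmost matches, while Perl-style engines take the first alternative that works. A small sketch with regex-tdfa (a POSIX engine); posixMatch is a made-up helper name, not part of the library:

```haskell
import Text.Regex.TDFA ((=~))

-- Made-up helper: text of the first match of a POSIX regex ("" if none).
-- The String result type is what tells (=~) to return the matched text.
posixMatch :: String -> String -> String
posixMatch regex input = input =~ regex
```

Here posixMatch "a|ab" "ab" returns "ab", because the longer alternative wins. A Perl-style engine given the same pattern would be expected to stop at "a". POSIX syntax also generally lacks Perl extras such as \d and lazy quantifiers like +?.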

Recommendations

So many libraries to choose from, ugh.

I recommend text-icu, merely because it provides bindings to a well-known library and is written by a well-known developer (Bryan O'Sullivan, the author of aeson and attoparsec and a co-author of Real World Haskell). If you want POSIX expressions, then you might try regex-tdfa, or regex-compat-tdfa for a simpler interface. regex-applicative is a parsing library that is somewhere between regexes and parsers – it's not cumbersome to use, but it's less terse than regexes.

Benchmarks

TODO: add benchmarks

text-icu (Hackage)
PCRE
Summary

Bindings to International Components for Unicode, which among other things provides regexes. See this section of the Data.Text.ICU module.

Pros
  • Supports Unicode classes – e.g. [:lower:] matches all lowercase letters, not just Latin ones.
  • Allows limiting time/memory used by the matcher.

Cons
  • Requires ICU installed.

Notes

Links

Imports and pragmas

{-# LANGUAGE OverloadedStrings #-}

It's better to import the module qualified, because some of its functions clash with standard ones (span with Prelude, find with Data.List). Many functions also clash with ones from Data.Text, so don't import it as T either.

import qualified Data.Text.ICU as ICU

If you want replacement as well:

-- from text-regex-replace
import qualified Data.Text.ICU.Replace as ICU

Search

To search, use findAll (or find if you only need the first match):

> ICU.findAll "[0-9]+" "12 + 34 = 55"
[Match ["12"],Match ["34"],Match ["55"]]

findAll returns a list of Matches. A Match holds information about the matched piece of text, the groups inside the match, and the text occurring between matches.

For example, let's construct a regex that would match a name and a surname: (\p{Lu}\w*) (\p{Lu}\w*) – here \w means “character that can occur inside a word”, and \p{Lu} means “character from Unicode category Lu”, which is “Letter, uppercase”:

> let regex = "(\\p{Lu}\\w*) (\\p{Lu}\\w*)"
> let [zaphod, ford] = ICU.findAll regex "Zaphod Beeblebrox and Ford Prefect"

To get the match itself, use group 0. It always returns Just for a successful match, but unfortunately the library doesn't provide an easier way to get the matched text without unwrapping the Just:

> ICU.group 0 ford
Just "Ford Prefect"

You can also use group to get a particular capturing group:

> ICU.group 1 ford
Just "Ford"

> ICU.group 2 ford
Just "Prefect"

span returns the text between the previous match and this match:

> ICU.span ford
" and "

Finally, you can use prefix and suffix to get the whole string before/after the match:

> ICU.prefix 0 ford
Just "Zaphod Beeblebrox and "

> ICU.suffix 0 ford
Just ""

Replacement

Simple replacement is done with replaceAll (to replace only the first match, use replace):

> ICU.replaceAll "[0-9]+" "<num>" "12 + 34 = 55"
"<num> + <num> = <num>"

Replacement with groups:

> ICU.replaceAll "(.*), (.*)" "$2 $1" "Beeblebrox, Zaphod"
"Zaphod Beeblebrox"

(To have a literal $ in the output, write $$ instead of $.)

Splitting

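text-icu doesn't export a splitting function, so you have to write your own. Here's one that you could use:

import Data.Maybe (fromJust)
import Data.Text (Text)

-- Split a string on every match of the regex.
split :: ICU.Regex -> Text -> [Text]
split r s = go (ICU.findAll r s)
  where go [] = [s]                                       -- no separators: keep the string whole
        go [m] = [ICU.span m, fromJust (ICU.suffix 0 m)]  -- last separator: also keep the tail
        go (m:ms) = ICU.span m : go ms                    -- text before this separator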

Regex settings

You can customise the way regexes are applied by using regex and MatchOption. For instance, if you want the matching to be case-insensitive, use CaseInsensitive:

> let regex = ICU.regex [ICU.CaseInsensitive] "xxx_(\\w+)_xxx"
> let str = "xxx_Overlord_xxx XXX_dp_ak_XXX"

> import Data.Maybe (mapMaybe)
> mapMaybe (ICU.group 1) (ICU.findAll regex str)
["Overlord","dp_ak"]

There are other settings available – look at the docs for MatchOption to see the full list.
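As one more illustration, the Multiline option makes ^ and $ match at line boundaries instead of only at the start and end of the input. A minimal sketch, assuming text-icu is installed; firstWords is a made-up name, while Multiline is one of the MatchOption constructors:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text.ICU as ICU

-- Made-up helper: the first word of every line.
-- With Multiline, ^ matches after each line break, not just at the start.
firstWords :: Text -> [Text]
firstWords = mapMaybe (ICU.group 0) . ICU.findAll re
  where re = ICU.regex [ICU.Multiline] "^\\w+"
```

With this definition, firstWords "alpha beta\ngamma delta" should give ["alpha","gamma"]; without Multiline only "alpha" would be found.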

regex-tdfa (Hackage)
POSIX

Summary

Uses regex-base. Seems to be the most popular library.

Pros
  • Handles corner cases better than other POSIX implementations (including glibc and the rest).
  • Doesn't require any installed libraries (since it's written in pure Haskell).

Cons
  • Slightly complicated to use, and documentation isn't particularly good.
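Much of the complexity of regex-tdfa comes from =~ being polymorphic in its result: the type you ask for decides what you get back. A short sketch of the result types that tend to be most useful, assuming regex-tdfa is installed (the function names are made up for this example, and the regex is passed as a plain String):

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- Bool result: does the string contain a match at all?
hasNumber :: String -> Bool
hasNumber s = s =~ "[0-9]+"

-- String result: the first match, or "" when nothing matches.
firstNumber :: String -> String
firstNumber s = s =~ "[0-9]+"

-- AllTextMatches result: every match, without groups.
allNumbers :: String -> [String]
allNumbers s = getAllTextMatches (s =~ "[0-9]+")

-- [[String]] result: every match, each as (whole match : capturing groups).
fractions :: String -> [[String]]
fractions s = s =~ "([0-9]+)\\.([0-9]+)"
```

For example, allNumbers "12 + 34 = 55" gives ["12","34","55"], and fractions "1.5 and 22.75" gives [["1.5","1","5"],["22.75","22","75"]].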

regex-applicative (Hackage)
other

Summary

Regex-like parsing combinators.
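To give a feel for the regex-applicative style, here is a rough sketch of the ([0-9]+)\.([0-9]+) example written with it. It assumes the package's RE type and its match, psym and sym functions; fractional and parseFractional are made-up names.

```haskell
import Control.Applicative (some)
import Data.Char (isDigit)
import Text.Regex.Applicative (RE, match, psym, sym)

-- The two digit runs play the role of capturing groups;
-- the dot is matched with sym and then discarded by (<*).
fractional :: RE Char (String, String)
fractional = (,) <$> some (psym isDigit) <* sym '.' <*> some (psym isDigit)

-- match succeeds only if the whole string matches.
parseFractional :: String -> Maybe (String, String)
parseFractional = match fractional
```

Here parseFractional "123.45" gives Just ("123","45"). The result is already a typed pair, so there are no numbered groups to look up afterwards, at the cost of more typing than the regex version.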

pcre-light (Hackage)
PCRE

Summary

Binds to the C PCRE library.

Cons
  • Only works with bytestrings unless you use a wrapper.

Ecosystem
  • regex-easy – a convenience wrapper (TODO: probably useless?)
  • pcre-heavy – another convenience wrapper
  • rex – quasiquoter
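For reference, a minimal sketch of what using pcre-light directly looks like, which also shows the bytestring-only interface. It assumes the pcre-light package and the system PCRE library are installed; fractionRegex and fractionParts are made-up names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import Text.Regex.PCRE.Light (Regex, compile, match)

-- Compile once; the empty list means "no compile-time options".
fractionRegex :: Regex
fractionRegex = compile "([0-9]+)\\.([0-9]+)" []

-- First match only: the whole match followed by its capturing groups.
fractionParts :: ByteString -> Maybe [ByteString]
fractionParts input = match fractionRegex input []
```

With this, fractionParts "pi = 3.14" gives Just ["3.14","3","14"]. To match against String or Text you would have to convert to a ByteString first, which is what the wrappers in the ecosystem list take care of.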

regexdot (Hackage)
POSIX

Pros
  • Works on lists of arbitrary objects.

re2 (Hackage)
other

Summary

Bindings to Google's RE2 library.

hxt-regex-xmlschema (Hackage)
other

weighted-regexp (Hackage)
other

regexpr (Hackage)
other

regex-pcre-builtin (Hackage)
PCRE

Ecosystem
  • pcre-utils – provides split/replace

regex-pcre (Hackage)
PCRE

Summary

Uses system PCRE library.

regex-posix (Hackage)
POSIX

Summary

The library is bundled.

regex-pderiv (Hackage)
other

relit (Hackage)
other

Summary

Not a library, but a quasiquoter for various regex* libraries.

regex-type (Hackage)
other