Recommendations

The standard type for strings – String – is widely used. However, it's rather slow, as it's implemented simply as a linked list of characters:

type String = [Char]

If you're only going to work with tiny strings here and there, you can use String. However, if you're doing lots of string manipulation, you're better off with Text from the text package:

  • It handles Unicode better than String (for example, uppercasing "Straße" character by character as a String gives "STRAßE", while Text's toUpper gives "STRASSE").
  • It's faster and takes less memory.
  • It has more utility functions (replace, splitOn, etc.).

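To make the Unicode and utility-function points concrete, here is a small sketch (assuming the text package is installed) that uppercases the same word as a String and as a Text. Note that print escapes non-ASCII characters, so ß shows up as \223:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (toUpper)
import qualified Data.Text as T

-- String: Data.Char.toUpper maps one Char to one Char, and 'ß' has no
-- single-character uppercase form, so it is left unchanged.
shoutString :: String -> String
shoutString = map toUpper

-- Text: T.toUpper applies full Unicode case mappings, so "ß" becomes "SS".
shoutText :: T.Text -> T.Text
shoutText = T.toUpper

main :: IO ()
main = do
  print (shoutString "Straße")          -- "STRA\223E" (i.e. STRAßE)
  print (shoutText "Straße")            -- "STRASSE"
  print (T.splitOn "," "a,b,c")         -- ["a","b","c"]
  print (T.replace "ß" "ss" "Straße")   -- "Strasse"
```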
Some people advocate for using Text instead of String in all cases, but if you're a beginner you might be better off with String because it's clearer how to manipulate it (all list functions work on String) and you won't run into any type problems. However, even if you're not a beginner, you're still likely to use String in some places (when defining Show, for instance). Don't try to avoid String everywhere; sometimes it's not worth it.

Finally, if you need speed and you're willing to do decoding by yourself, or if you're working with network protocols that use text, you might consider bytestring.

text (Hackage)
Summary

The type for strings that everyone uses “in production”. Implemented as UTF-16 arrays under the hood (UTF-8 arrays since text-2.0).

Pros
  • Fast and uses less memory than String.
  • Has more utility functions like splitOn, etc. available out of the box.
  • Conforms to various Unicode rules about string casing and so on (e.g. if you try to uppercase "Straße" you'll get "STRASSE" and not nonsensical "STRAßE").

Cons
  • Can be harder to manipulate if you're used to processing strings as lists (i.e. String).
  • Before text-2.0, uses UTF-16 internally and thus takes additional time to encode/decode from UTF-8 (text-2.0 and later use UTF-8 internally).
  • Doesn't have O(1) indexing because its internal encoding (UTF-16, or UTF-8 since text-2.0) is variable-length.

Notes

Imports and pragmas

OverloadedStrings lets string literals have type Text:

{-# LANGUAGE OverloadedStrings #-}

Imports are most commonly qualified:

import qualified Data.Text as T
import Data.Text (Text)

import qualified Data.Text.IO as T             -- for putStrLn, etc.
import qualified Data.Text.Encoding as T       -- for UTF-8 encoding/decoding

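Putting the strict imports together, a minimal program might look like the sketch below. It assumes the text and bytestring packages are available and a GHC recent enough (8.4+) for Prelude to export (<>); the names greeting and roundTrip are just for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import qualified Data.Text.IO as T

-- With OverloadedStrings, this literal has type Text.
greeting :: Text
greeting = "héllo"

-- encodeUtf8 turns Text into UTF-8 bytes and decodeUtf8 goes back.
-- decodeUtf8 throws on invalid UTF-8; decodeUtf8' returns an Either instead.
roundTrip :: Text -> Text
roundTrip = T.decodeUtf8 . T.encodeUtf8

main :: IO ()
main = do
  -- (<>) comes from Prelude here (GHC 8.4 and later).
  T.putStrLn ("characters: " <> T.pack (show (T.length greeting)))   -- characters: 5
  -- 'é' takes two bytes in UTF-8, so the byte count is larger.
  T.putStrLn ("UTF-8 bytes: " <> T.pack (show (B.length (T.encodeUtf8 greeting))))  -- UTF-8 bytes: 6
  print (roundTrip greeting == greeting)                             -- True
```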
Lazy variant:

import qualified Data.Text.Lazy as T
import Data.Text.Lazy (Text)

import qualified Data.Text.Lazy.IO as T        -- for putStrLn, etc.
import qualified Data.Text.Lazy.Encoding as T  -- for UTF-8 encoding/decoding

Since GHC 8.4 (base-4.11), Prelude exports (<>), so concatenation works without extra imports. On older compilers, if you're not using an alternative prelude like base-prelude, import Data.Monoid to get it:

import Data.Monoid

Strict and lazy Text

There are 2 text types in the library – both are called Text but one comes from the Data.Text module and the other from the Data.Text.Lazy module. They're not compatible (but you can convert between them), and are intended for use in different situations.

Strict Text is an array of characters. Lazy Text is a list (possibly infinite) of arrays of characters, or chunks. It's recommended to use lazy Text for cases where it makes sense to process text in a streaming fashion – for instance, if you have a huge file that you want to read and output as a web page, you could do it like “read a chunk, output a chunk, read a chunk, output a chunk...” – which is what might happen automatically if you use lazy Text correctly.

A rule of thumb is “if you don't ever intend for the string to be in memory only partially, use strict Text”.
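As a sketch of the streaming style described above (assuming the text package; numberLines is a hypothetical name), this program numbers the lines of its standard input. TL.interact reads input lazily and TL.lines/TL.unlines work chunk by chunk, so output should be able to start before all the input has been read:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TL

-- Prefixes every line with its number. Both TL.lines and TL.unlines are
-- lazy, so the whole input never has to be in memory at once.
numberLines :: TL.Text -> TL.Text
numberLines = TL.unlines . zipWith addNumber [1 :: Int ..] . TL.lines
  where
    addNumber n line = TL.pack (show n) <> ": " <> line

-- TL.interact reads stdin lazily and writes the result to stdout.
main :: IO ()
main = TL.interact numberLines
```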

To convert lazy Text to strict Text, use toStrict from Data.Text.Lazy; fromStrict goes in the opposite direction. To break a lazy Text into a list of chunks, use toChunks, and for the reverse – fromChunks.
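The conversions look like this in practice (a small sketch, assuming the text package; the binding names are illustrative):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

strict :: T.Text
strict = "hello world"

-- fromStrict/toStrict convert between the two Text types.
lazy :: TL.Text
lazy = TL.fromStrict strict

-- fromChunks builds a lazy Text from strict chunks; toChunks splits it back.
chunked :: TL.Text
chunked = TL.fromChunks ["hello", " ", "world"]

main :: IO ()
main = do
  print (TL.toStrict lazy == strict)    -- True
  print (chunked == lazy)               -- True: equality ignores chunk boundaries
  print (length (TL.toChunks chunked))  -- 3
```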

Usage

Most list functions from Prelude have counterparts in Data.Text. Some of the functions that have no Prelude equivalent are listed below.

Common functions

  • pack and unpack for converting between String and Text
  • cons and snoc to prepend/append a character
  • (<>) appends two strings (it comes from Data.Monoid, and Prelude exports it too since GHC 8.4)
  • toLower and toUpper convert to lower/uppercase (there's also toTitle)
  • toCaseFold is used for case-insensitive comparisons: toCaseFold x == toCaseFold y
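A short sketch using the functions above (assuming the text package; sameIgnoringCase is a hypothetical helper name):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

name :: Text
name = T.pack "world"                 -- String -> Text

greeting :: Text
greeting = "Hello, " <> name <> "!"   -- (<>) concatenates

-- Case-insensitive comparison: compare the case-folded forms.
sameIgnoringCase :: Text -> Text -> Bool
sameIgnoringCase x y = T.toCaseFold x == T.toCaseFold y

main :: IO ()
main = do
  putStrLn (T.unpack greeting)                 -- Hello, world!
  print (T.cons '[' (T.snoc name ']'))         -- "[world]"
  print (T.toTitle "hello world")              -- "Hello World"
  print (sameIgnoringCase "STRASSE" "straße")  -- True: both fold to "strasse"
```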

Searching

replace x y s replaces every non-overlapping occurrence of x in s with y:

> replace " " "_" "hello world"
"hello_world"

> replace "ofo" "bar" "ofofo"
"barfo"

breakOn splits the string at the first occurrence of a separator (which can be longer than one character) into “before separator” and “after separator” parts; breakOnEnd does the same but searches from the end:

> breakOn "::" "a::b::c"
("a", "::b::c")

> breakOnEnd "::" "a::b::c"
("a::b::", "c")

breakOnAll gives you every possible split at the separator:

> breakOnAll "::" "a::b::c"
[("a", "::b::c"), ("a::b", "::c")]
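Because breakOn keeps the separator at the start of the second part, it combines well with stripPrefix. A sketch of parsing "key=value" lines (parseKV is a hypothetical helper; assumes the text package):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Splits at the first "=". If there is no "=", the second part of
-- breakOn's result is empty, so stripPrefix returns Nothing.
parseKV :: Text -> Maybe (Text, Text)
parseKV s =
  let (key, rest) = T.breakOn "=" s
  in fmap (\value -> (T.strip key, T.strip value)) (T.stripPrefix "=" rest)

main :: IO ()
main = do
  print (parseKV "name = foo")    -- Just ("name","foo")
  print (parseKV "url=a=b")       -- Just ("url","a=b")
  print (parseKV "no separator")  -- Nothing
```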

splitOn splits the string into a list of strings; split breaks on every character matching a predicate of type Char -> Bool (isAlphaNum comes from Data.Char):

> splitOn "::" "a::b::c"
["a","b","c"]

> split (not . isAlphaNum) "a::b::c"
["a","","b","","c"]

count counts how many times a string occurs in another string (without overlaps).
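A quick check of the non-overlapping behaviour (assuming the text package):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

main :: IO ()
main = do
  print (T.count "::" "a::b::c")   -- 2
  -- "ofo" starts at positions 0, 2 and 4 of "ofofofo", but after the match
  -- at 0 the search resumes at 3, so only the matches at 0 and 4 count.
  print (T.count "ofo" "ofofofo")  -- 2
```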

Cutting strings

take and takeEnd take N characters from the beginning/end, drop and dropEnd remove them.

takeWhile, takeWhileEnd, dropWhile and dropWhileEnd exist as well. dropAround strips characters matching a predicate from both sides of the string.

strip, stripStart and stripEnd strip whitespace specifically.

stripPrefix and stripSuffix remove some particular prefix/suffix (or return Nothing if it isn't there). commonPrefixes takes two strings and returns their longest common prefix together with what remains of each string (or Nothing if they have no common prefix).

chunksOf splits a string into chunks of length N.
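Here are the cutting functions applied to small inputs, with the results shown as comments (a sketch assuming the text package):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

main :: IO ()
main = do
  print (T.takeEnd 3 "foobar")                 -- "bar"
  print (T.dropEnd 3 "foobar")                 -- "foo"
  print (T.dropAround (== '_') "__foo__")      -- "foo"
  print (T.strip "  foo \n")                   -- "foo"
  print (T.stripPrefix "foo" "foobar")         -- Just "bar"
  print (T.stripSuffix "baz" "foobar")         -- Nothing
  print (T.commonPrefixes "foobar" "fooquux")  -- Just ("foo","bar","quux")
  print (T.chunksOf 3 "abcdefgh")              -- ["abc","def","gh"]
```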

Transformations

justifyRight and justifyLeft add characters to the beginning/end of the string until it reaches a certain length:

> justifyRight 7 '_' "foo"
"____foo"

> justifyLeft 7 '_' "foo"
"foo____"

center pads both sides equally; when the padding can't be split evenly, the left side gets the extra character:

> center 7 '_' "foo"
"__foo__"

> center 8 '_' "foo"
"___foo__"
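The padding functions are handy for lining up plain-text output. In this sketch, row is a hypothetical helper and the widths 8 and 4 are arbitrary. Note that the justify functions only pad: a string that is already long enough is returned unchanged, not truncated.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T

-- Pads the name to 8 characters with dots and right-aligns the number
-- in a 4-character column.
row :: Text -> Int -> Text
row name n = T.justifyLeft 8 '.' name <> T.justifyRight 4 ' ' (T.pack (show n))

main :: IO ()
main = mapM_ (T.putStrLn . uncurry row) [("apple", 3), ("banana", 12)]
-- apple...   3
-- banana..  12
```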

Optimisation

TODO: mention copy, Builder, explain how fusion works, etc.

FAQ

  • Where is elem?

    It's been removed from text because you can use isInfixOf to do the same thing.
    Thanks to rewrite rules, T.isInfixOf "c" or T.isInfixOf (T.singleton c) will be as fast as elem.
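For reference, two ways to test whether a character occurs in a Text (a sketch assuming the text package; the helper names are hypothetical, and T.any is shown as an alternative that avoids building a one-character Text):

```haskell
import qualified Data.Text as T

-- Substring search with a one-character needle.
hasCharInfix :: Char -> T.Text -> Bool
hasCharInfix c = T.isInfixOf (T.singleton c)

-- Predicate-based search over the characters.
hasCharAny :: Char -> T.Text -> Bool
hasCharAny c = T.any (== c)

main :: IO ()
main = do
  print (hasCharInfix ':' (T.pack "a::b"))  -- True
  print (hasCharAny 'z' (T.pack "a::b"))    -- False
```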

String
Summary

The default Haskell string type.

type String = [Char]

Supports Unicode. Isn't very fast, but isn't horribly slow either, and lots of libraries work with String instead of Text, so if you're not doing web dev and not writing anything with lots of string processing, you might as well use it.

Pros
  • The most widely used string type.
  • Bundled with base.
  • Easy to process manually (because it's just a list of characters).

Cons
  • Slow, uses lots of memory (being a linked list).
  • Doesn't support Unicode perfectly: map toUpper works one character at a time, so "Straße" becomes "STRAßE" rather than "STRASSE".

Ecosystem
  • split can do pretty much anything when it comes to string splitting.

  • utf8-string for converting String to/from UTF-8.

  • case-insensitive for case-insensitive comparisons.

Notes

Usage

Splitting

You can split strings into words and lines by using words/lines. However, for more options use the split package:

import Data.List.Split

Its documentation is actually pretty good, so it won't be replicated here.
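Just to give a flavour, here are the Prelude functions next to a few from Data.List.Split (a sketch assuming the split package is installed):

```haskell
import Data.List.Split (chunksOf, splitOn, splitWhen)

main :: IO ()
main = do
  print (words "hello   world")        -- ["hello","world"]
  print (lines "one\ntwo\n")           -- ["one","two"]
  print (splitOn "::" "a::b::c")       -- ["a","b","c"]
  -- Consecutive separators produce empty fields.
  print (splitWhen (== ',') "1,2,,3")  -- ["1","2","","3"]
  print (chunksOf 3 "abcdefgh")        -- ["abc","def","gh"]
```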

bytestring (Hackage)
Summary

Provides byte arrays.

Only use it if you're working with text with known encoding and you need it to be fast, or when you're working with network protocols. For instance:

  • aeson doesn't translate JSON to Text before parsing it, but works on raw ByteStrings (and assumes UTF-8)
  • cassava stores CSV fields as ByteStrings
  • lucid outputs HTML as a ByteString
  • http-types uses ByteStrings for headers, URLs, and so on

Pros
  • The fastest option available. Unlike text, it doesn't do any encoding/decoding under the hood and provides you direct access to the bytes.

Cons
  • Should never be used for strings unless you know what you're doing.

Ecosystem

There are more packages in the entry for bytestring in the “Arrays” category.

Notes

Imports and pragmas

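Data.ByteString.Char8 offers the same functions as Data.ByteString, but works with Char instead of Word8, so you don't have to convert bytes to characters by hand. Each Char is truncated to its lowest 8 bits, so this is only safe for ASCII (or Latin-1) text: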

import qualified Data.ByteString.Char8 as BC

And to be able to use string literals to construct ByteStrings, enable OverloadedStrings (literals are truncated to 8 bits per character in the same way):

{-# LANGUAGE OverloadedStrings #-}
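The truncation is easy to see with a non-Latin-1 character. This sketch (assuming the bytestring and text packages) packs the Greek letter λ (U+03BB) both ways; going through Text and encodeUtf8 is one way to get real UTF-8 bytes:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

-- BC.pack keeps only the lowest 8 bits of each Char:
-- U+03BB becomes the single byte 0xBB.
truncated :: B.ByteString
truncated = BC.pack "\x3BB"

-- Encoding via Text produces proper UTF-8: the two bytes 0xCE 0xBB.
utf8 :: B.ByteString
utf8 = T.encodeUtf8 (T.pack "\x3BB")

main :: IO ()
main = do
  print (B.unpack truncated)  -- [187]
  print (B.unpack utf8)       -- [206,187]
```

For plain ASCII both approaches give the same bytes, which is why Char8 works fine for things like HTTP header names.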