<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">Strings – Haskell – Aelve Guide</title><id>https://guide.aelve.com/haskell/feed/category/o62hqc69</id><updated>2019-01-07T08:12:56Z</updated><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/feed/category/o62hqc69"/><entry><id>dhzdv4ae</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">text-utf8</title><updated>2019-01-07T08:12:56Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;span class=&#34;item-name&#34;&gt;text-utf8&lt;/span&gt;

  
  (&lt;a href=&#34;https://hackage.haskell.org/package/text-utf8&#34;&gt;Hackage&lt;/a&gt;)
&lt;/h1&gt;&lt;p&gt;This is a fork of the &lt;code&gt;text&lt;/code&gt; package that uses UTF-8 instead of UTF-16 as its internal representation.&lt;/p&gt;
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-dhzdv4ae"/></entry><entry><id>pkjaz0m0</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">intern</title><updated>2019-01-06T12:13:53Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;span class=&#34;item-name&#34;&gt;intern&lt;/span&gt;

  
  (&lt;a href=&#34;https://hackage.haskell.org/package/intern&#34;&gt;Hackage&lt;/a&gt;)
&lt;/h1&gt;&lt;p&gt;An implementation of interned strings (also known as &amp;quot;hash-consing&amp;quot;, &amp;quot;symbols&amp;quot; or &amp;quot;atoms&amp;quot;). Every distinct string will only be kept in memory once, which is very useful when many of your strings are duplicates. Also provides O(1) string comparison, since it can be done simply by looking at the references.&lt;/p&gt;
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-pkjaz0m0"/></entry><entry><id>wn4f31st</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">text-short</title><updated>2019-01-06T12:03:40Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;span class=&#34;item-name&#34;&gt;text-short&lt;/span&gt;

  
  (&lt;a href=&#34;https://hackage.haskell.org/package/text-short&#34;&gt;Hackage&lt;/a&gt;)
&lt;/h1&gt;&lt;p&gt;A version of &lt;code&gt;Text&lt;/code&gt; with less memory overhead, suitable for keeping a lot of short strings in memory. Implemented as a wrapper over &lt;a href=&#34;http://hackage.haskell.org/package/bytestring/docs/Data-ByteString-Short.html#t:ShortByteString&#34;&gt;&lt;code&gt;ShortByteString&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The main difference between &lt;code&gt;Text&lt;/code&gt; and &lt;code&gt;ShortText&lt;/code&gt; is that &lt;code&gt;ShortText&lt;/code&gt; uses UTF-8 instead of UTF-16 internally and also doesn&#39;t support zero-copy slicing (thereby saving 2 words). Consequently, the memory footprint of a (boxed) &lt;code&gt;ShortText&lt;/code&gt; value is 4 words (2 words when unboxed) plus the length of the UTF-8 encoded payload.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note that unlike &lt;code&gt;ByteString&lt;/code&gt;, &lt;code&gt;Text&lt;/code&gt; doesn&#39;t use &lt;a href=&#34;https://www.reddit.com/r/haskelltil/comments/693xue/pinned_memory_can_lead_to_unexpected_memory_leaks/&#34;&gt;pinned memory&lt;/a&gt;, so there&#39;s no point in switching from &lt;code&gt;Text&lt;/code&gt; to &lt;code&gt;ShortText&lt;/code&gt; if you want to avoid heap fragmentation – &lt;code&gt;Text&lt;/code&gt; already avoids it.&lt;/p&gt;
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;/ul&gt;&lt;h2&gt;Ecosystem&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://hackage.haskell.org/package/text-containers&#34;&gt;text-containers&lt;/a&gt; provides memory-dense sets, arrays and associative maps over &lt;code&gt;ShortText&lt;/code&gt; values.&lt;/li&gt;
&lt;/ul&gt;
</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-wn4f31st"/></entry><entry><id>ioyvne1y</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">bytestring</title><updated>2016-04-14T10:37:25Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;span class=&#34;item-name&#34;&gt;bytestring&lt;/span&gt;

  
  (&lt;a href=&#34;https://hackage.haskell.org/package/bytestring&#34;&gt;Hackage&lt;/a&gt;)
&lt;/h1&gt;&lt;p&gt;Provides byte arrays, with a fake-string interface in &lt;a href=&#34;http://hackage.haskell.org/package/bytestring/docs/Data-ByteString-Char8.html&#34;&gt;&lt;code&gt;Data.ByteString.Char8&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Only use it if you&#39;re working with text with known encoding and you need it to be fast, or when you&#39;re working with network protocols. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://hackage.haskell.org/package/aeson&#34;&gt;aeson&lt;/a&gt; doesn&#39;t translate JSON to &lt;code&gt;Text&lt;/code&gt; before parsing it, but works on raw &lt;code&gt;ByteString&lt;/code&gt;s (and assumes UTF-8)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hackage.haskell.org/package/cassava&#34;&gt;cassava&lt;/a&gt; stores CSV fields as &lt;code&gt;ByteString&lt;/code&gt;s&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hackage.haskell.org/package/lucid&#34;&gt;lucid&lt;/a&gt; outputs HTML as a &lt;code&gt;ByteString&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hackage.haskell.org/package/http-types&#34;&gt;http-types&lt;/a&gt; uses &lt;code&gt;ByteString&lt;/code&gt;s for headers, URLs, and so on&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;The fastest option available. Unlike &lt;a href=&#34;https://hackage.haskell.org/package/text&#34;&gt;text&lt;/a&gt;, it doesn&#39;t do any encoding/decoding under the hood and provides you direct access to the bytes.&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;Only suitable for working with &lt;a href=&#34;https://en.wikipedia.org/wiki/ASCII&#34;&gt;ASCII&lt;/a&gt; text, unless you take care to handle the encoding (like e.g. &lt;a href=&#34;https://hackage.haskell.org/package/aeson&#34;&gt;aeson&lt;/a&gt; does). It won&#39;t &lt;em&gt;necessarily&lt;/em&gt; break – e.g. you can still search for a UTF-8 substring in a UTF-8 string even if both are broken from the &lt;code&gt;ByteString&lt;/code&gt; point of view, because they are broken the same way. However, it&#39;s still very fragile. A better alternative for dealing with UTF-8 (or ASCII) encoded memory is to use &lt;a href=&#34;https://hackage.haskell.org/package/text-utf8&#34;&gt;text-utf8&lt;/a&gt; or &lt;a href=&#34;https://hackage.haskell.org/package/text-short&#34;&gt;text-short&lt;/a&gt;.&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Ecosystem&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/case-insensitive&#34;&gt;case-insensitive&lt;/a&gt; for case-insensitive comparisons&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/bytestring-show&#34;&gt;bytestring-show&lt;/a&gt; as replacement for &lt;code&gt;Show&lt;/code&gt;, &lt;a href=&#34;https://hackage.haskell.org/package/readable&#34;&gt;readable&lt;/a&gt; as replacement for &lt;code&gt;Read&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/attoparsec&#34;&gt;attoparsec&lt;/a&gt; is particularly well-suited for parsing &lt;code&gt;ByteString&lt;/code&gt;s&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/stringsearch&#34;&gt;stringsearch&lt;/a&gt; for fast searching, replacement, and splitting&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/utf8-string&#34;&gt;utf8-string&lt;/a&gt; for basic UTF-8 operations on &lt;code&gt;ByteString&lt;/code&gt;s, e.g. taking first N characters&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are more packages in the &lt;a href=&#34;https://guide.aelve.com/haskell/arrays-bpid18sd#item-zf02sw64&#34;&gt;entry for bytestring&lt;/a&gt; in the “Arrays” category.&lt;/p&gt;
&lt;h2&gt;Notes&lt;/h2&gt;&lt;h1&gt;&lt;span id=&#34;item-notes-ioyvne1y-imports-and-pragmas&#34;&gt;&lt;/span&gt;Imports and pragmas&lt;/h1&gt;&lt;p&gt;&lt;code&gt;Data.ByteString.Char8&lt;/code&gt; exposes the same interface as &lt;code&gt;Data.ByteString&lt;/code&gt;, but presents every byte as a &lt;code&gt;Char&lt;/code&gt; so that you don&#39;t have to convert manually:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.ByteString.Char8&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;BC&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And to be able to use string literals to construct &lt;code&gt;ByteString&lt;/code&gt;s, enable &lt;code&gt;OverloadedStrings&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;ot&#34;&gt;{-# LANGUAGE OverloadedStrings #-}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
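&lt;p&gt;A minimal sketch of how the import and the pragma fit together. The name &lt;code&gt;greeting&lt;/code&gt; is made up for illustration. Keep in mind that a &lt;code&gt;ByteString&lt;/code&gt; literal keeps only the low 8 bits of each character, so this is only safe for ASCII/Latin-1 literals:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as BC

-- Hypothetical value; the literal is a ByteString thanks to OverloadedStrings.
greeting :: BC.ByteString
greeting = &amp;quot;hello world&amp;quot;

main :: IO ()
main = do
  BC.putStrLn greeting                   -- writes the bytes as-is
  print (BC.length greeting)             -- 11 (a count of bytes, not characters)
  print (BC.words greeting)              -- [&amp;quot;hello&amp;quot;,&amp;quot;world&amp;quot;]
  print (BC.unpack (BC.map succ &amp;quot;abc&amp;quot;))  -- &amp;quot;bcd&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;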
</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-ioyvne1y"/></entry><entry><id>zpbns7qe</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">String</title><updated>2016-04-14T10:25:12Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;a href=&#34;http://hackage.haskell.org/package/base/docs/Prelude.html#t:String&#34; class=&#34;item-name&#34;&gt;String&lt;/a&gt;

&lt;/h1&gt;&lt;p&gt;The default Haskell type for strings. Unicode-aware but not particularly clever (slightly less clever than &lt;code&gt;Text&lt;/code&gt;). Defined as an ordinary list of characters:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;type&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;String&lt;/span&gt; &lt;span class=&#34;fu&#34;&gt;=&lt;/span&gt; [&lt;span class=&#34;dt&#34;&gt;Char&lt;/span&gt;]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It isn&#39;t very fast, but it isn&#39;t horribly slow either, and lots of libraries work with &lt;code&gt;String&lt;/code&gt; instead of &lt;code&gt;Text&lt;/code&gt;, so if you&#39;re not doing web development and not writing anything that does a lot of string processing, you might as well use it.&lt;/p&gt;
&lt;p&gt;Even in codebases that use &lt;code&gt;Text&lt;/code&gt; all the way, &lt;code&gt;String&lt;/code&gt; is still sometimes used for error messages (e.g. a function that returns &lt;code&gt;Either String a&lt;/code&gt;).&lt;/p&gt;
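&lt;p&gt;For illustration, here is a small sketch of that pattern. The function &lt;code&gt;parseAge&lt;/code&gt; and its messages are hypothetical, not taken from any library:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;import Text.Read (readMaybe)

-- Hypothetical parser: the error side is a plain String message.
parseAge :: String -&amp;gt; Either String Int
parseAge s = case readMaybe s of
  Nothing -&amp;gt; Left (&amp;quot;not a number: &amp;quot; ++ s)
  Just n
    | n &amp;lt; 0     -&amp;gt; Left &amp;quot;age must not be negative&amp;quot;
    | otherwise -&amp;gt; Right n

main :: IO ()
main = do
  print (parseAge &amp;quot;42&amp;quot;)   -- Right 42
  print (parseAge &amp;quot;abc&amp;quot;)  -- Left &amp;quot;not a number: abc&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;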
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;The most widely used string type.&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Bundled with &lt;code&gt;base&lt;/code&gt;.&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Easy to process manually (because it&#39;s just a list of characters).&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;Slow, uses lots of memory (being a linked list).&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Doesn&#39;t support Unicode perfectly (if you do something like &lt;code&gt;map toUpper&lt;/code&gt;, for instance).&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Ecosystem&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/split&#34;&gt;split&lt;/a&gt; can do pretty much anything when it comes to string splitting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/utf8-string&#34;&gt;utf8-string&lt;/a&gt; for converting to/from UTF-8.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/case-insensitive&#34;&gt;case-insensitive&lt;/a&gt; for case-insensitive comparisons.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Notes&lt;/h2&gt;&lt;h1&gt;&lt;span id=&#34;item-notes-zpbns7qe-usage&#34;&gt;&lt;/span&gt;Usage&lt;/h1&gt;&lt;h2&gt;&lt;span id=&#34;item-notes-zpbns7qe-splitting&#34;&gt;&lt;/span&gt;Splitting&lt;/h2&gt;&lt;p&gt;You can split strings into words and lines by using &lt;code&gt;words&lt;/code&gt;/&lt;code&gt;lines&lt;/code&gt;. However, for more options use the &lt;a href=&#34;https://hackage.haskell.org/package/split&#34;&gt;split&lt;/a&gt; package:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;import &lt;/span&gt;&lt;span class=&#34;dt&#34;&gt;Data.List.Split&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Its documentation is actually pretty good, so it won&#39;t be replicated here.&lt;/p&gt;
</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-zpbns7qe"/></entry><entry><id>fqgxfjfq</id><title xmlns:ns="http://www.w3.org/2005/Atom" ns:type="text">text</title><updated>2016-04-14T10:25:09Z</updated><content xmlns:ns="http://www.w3.org/2005/Atom" ns:type="html">&lt;h1&gt;  &lt;span class=&#34;item-name&#34;&gt;text&lt;/span&gt;

  
  (&lt;a href=&#34;https://hackage.haskell.org/package/text&#34;&gt;Hackage&lt;/a&gt;)
&lt;/h1&gt;&lt;p&gt;The type for strings that is most commonly recommended “for production”. Implemented as UTF-16 arrays under the hood.&lt;/p&gt;
&lt;p&gt;Comes in strict and lazy variants (&lt;a href=&#34;https://hackage.haskell.org/package/text/docs/Data-Text.html&#34;&gt;&lt;code&gt;Data.Text&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://hackage.haskell.org/package/text/docs/Data-Text-Lazy.html&#34;&gt;&lt;code&gt;Data.Text.Lazy&lt;/code&gt;&lt;/a&gt;); the latter can be used for processing huge strings in a streaming fashion, as an alternative to more explicit approaches like &lt;a href=&#34;https://hackage.haskell.org/package/pipes&#34;&gt;pipes&lt;/a&gt; or &lt;a href=&#34;https://hackage.haskell.org/package/conduit&#34;&gt;conduit&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Pros&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;Fast and uses less memory than &lt;code&gt;String&lt;/code&gt;.&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Has more utility functions (&lt;code&gt;splitOn&lt;/code&gt; and so on) available out of the box.&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Better conforms to various Unicode rules about string casing and so on. It still processes text by code point rather than by grapheme, but there are libraries that process &lt;code&gt;Text&lt;/code&gt; the right way.&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Cons&lt;/h2&gt;&lt;ul&gt;&lt;p&gt;&lt;li&gt;Can be harder to manipulate if you&#39;re used to processing strings as lists (i.e. &lt;code&gt;String&lt;/code&gt;).&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Uses UTF-16 and thus takes additional time to convert to and from UTF-8. See also &lt;a href=&#34;https://hackage.haskell.org/package/text-utf8&#34;&gt;text-utf8&lt;/a&gt; or &lt;a href=&#34;https://hackage.haskell.org/package/text-short&#34;&gt;text-short&lt;/a&gt;.&lt;/li&gt;&lt;/p&gt;&lt;p&gt;&lt;li&gt;Doesn&#39;t have O(1) indexing because UTF-16 is a variable-length encoding. Can be annoying if you only process ASCII (or close to ASCII) text, for which O(1) indexing is possible.&lt;/li&gt;&lt;/p&gt;&lt;/ul&gt;&lt;h2&gt;Ecosystem&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Most parsing packages nowadays support &lt;code&gt;Text&lt;/code&gt;, including &lt;a href=&#34;https://hackage.haskell.org/package/megaparsec&#34;&gt;megaparsec&lt;/a&gt; and &lt;a href=&#34;https://hackage.haskell.org/package/attoparsec&#34;&gt;attoparsec&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To encode/decode &lt;code&gt;Text&lt;/code&gt; to UTF-8, UTF-16, or UTF-32, use &lt;a href=&#34;http://hackage.haskell.org/package/text/docs/Data-Text-Encoding.html&#34;&gt;&lt;code&gt;Data.Text.Encoding&lt;/code&gt;&lt;/a&gt;. For more encodings, see &lt;a href=&#34;http://hackage.haskell.org/package/text-icu/docs/Data-Text-ICU-Convert.html&#34;&gt;&lt;code&gt;Data.Text.ICU.Convert&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For a fast alternative to the &lt;code&gt;Show&lt;/code&gt; class, see &lt;a href=&#34;https://hackage.haskell.org/package/text-show&#34;&gt;text-show&lt;/a&gt; (and additional instances in &lt;a href=&#34;https://hackage.haskell.org/package/text-show-instances&#34;&gt;text-show-instances&lt;/a&gt;). For an alternative to the &lt;code&gt;Read&lt;/code&gt; class, see &lt;a href=&#34;https://hackage.haskell.org/package/readable&#34;&gt;readable&lt;/a&gt;. Fast &lt;code&gt;show&lt;/code&gt; specifically for &lt;code&gt;Double&lt;/code&gt; is in &lt;a href=&#34;https://hackage.haskell.org/package/double-conversion&#34;&gt;double-conversion&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For advanced Unicode handling, see &lt;a href=&#34;https://hackage.haskell.org/package/text-icu&#34;&gt;text-icu&lt;/a&gt; (which provides &lt;a href=&#34;http://site.icu-project.org/&#34;&gt;ICU&lt;/a&gt; bindings). &lt;a href=&#34;https://hackage.haskell.org/package/unicode-transforms&#34;&gt;unicode-transforms&lt;/a&gt; is a pure Haskell alternative that does only normalization (NFC, NFKC, NFD, NFKD), but with performance comparable to text-icu. &lt;a href=&#34;https://hackage.haskell.org/package/text-manipulate&#34;&gt;text-manipulate&lt;/a&gt; has additional functions for working with word boundaries, &lt;code&gt;PascalCasing&lt;/code&gt; and &lt;code&gt;snake_casing&lt;/code&gt;, acronyms, truncating text intelligently, and so on. &lt;a href=&#34;https://hackage.haskell.org/package/text-icu-translit&#34;&gt;text-icu-translit&lt;/a&gt; provides transliteration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hackage.haskell.org/package/case-insensitive&#34;&gt;case-insensitive&lt;/a&gt; provides newtypes for strings that should be compared case-insensitively, and &lt;a href=&#34;https://hackage.haskell.org/package/text-normal&#34;&gt;text-normal&lt;/a&gt; provides newtypes for normalized text.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For using big text literals (like templates) in Haskell sources, see &lt;a href=&#34;https://hackage.haskell.org/package/neat-interpolation&#34;&gt;neat-interpolation&lt;/a&gt;. For printf-like functionality, see &lt;a href=&#34;https://hackage.haskell.org/package/formatting&#34;&gt;formatting&lt;/a&gt;, &lt;a href=&#34;https://hackage.haskell.org/package/fmt&#34;&gt;fmt&lt;/a&gt;, or &lt;a href=&#34;https://hackage.haskell.org/package/PyF&#34;&gt;PyF&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Orphan instances: &lt;a href=&#34;https://hackage.haskell.org/package/cereal-text&#34;&gt;cereal-text&lt;/a&gt;, &lt;a href=&#34;https://hackage.haskell.org/package/quickcheck-text&#34;&gt;quickcheck-text&lt;/a&gt;. Instances for &lt;a href=&#34;https://hackage.haskell.org/package/binary&#34;&gt;binary&lt;/a&gt; are provided since &lt;code&gt;text-1.2.1&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
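&lt;p&gt;As a rough sketch of the UTF-8 round trip through &lt;code&gt;Data.Text.Encoding&lt;/code&gt; mentioned in the list above: the sample strings and the &lt;code&gt;TE&lt;/code&gt; alias are arbitrary choices, and &lt;code&gt;decodeUtf8&#39;&lt;/code&gt; is used because it reports malformed input as a &lt;code&gt;Left&lt;/code&gt; value instead of throwing:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import Data.Either (isRight)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  let bytes = TE.encodeUtf8 (&amp;quot;naïve&amp;quot; :: T.Text)
  print (B.length bytes)                           -- 6: the &amp;quot;ï&amp;quot; takes two bytes in UTF-8
  print (TE.decodeUtf8&#39; bytes)                     -- Right &amp;quot;na\239ve&amp;quot;
  -- 0xff can never appear in valid UTF-8, so decoding fails with a Left
  print (isRight (TE.decodeUtf8&#39; (B.pack [0xff]))) -- False&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;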
&lt;h2&gt;Notes&lt;/h2&gt;&lt;h1&gt;&lt;span id=&#34;item-notes-fqgxfjfq-imports-and-pragmas&#34;&gt;&lt;/span&gt;Imports and pragmas&lt;/h1&gt;&lt;p&gt;&lt;code&gt;OverloadedStrings&lt;/code&gt; lets string literals have type &lt;code&gt;Text&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;ot&#34;&gt;{-# LANGUAGE OverloadedStrings #-}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Imports are most commonly qualified:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;
&lt;span class=&#34;kw&#34;&gt;import &lt;/span&gt;&lt;span class=&#34;dt&#34;&gt;Data.Text&lt;/span&gt; (&lt;span class=&#34;dt&#34;&gt;Text&lt;/span&gt;)

&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text.IO&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;             &lt;span class=&#34;co&#34;&gt;-- for putStrLn, etc&lt;/span&gt;
&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text.Encoding&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;       &lt;span class=&#34;co&#34;&gt;-- for UTF8 encoding/decoding&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Lazy variant:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text.Lazy&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;
&lt;span class=&#34;kw&#34;&gt;import &lt;/span&gt;&lt;span class=&#34;dt&#34;&gt;Data.Text.Lazy&lt;/span&gt; (&lt;span class=&#34;dt&#34;&gt;Text&lt;/span&gt;)

&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text.Lazy.IO&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;        &lt;span class=&#34;co&#34;&gt;-- for putStrLn, etc&lt;/span&gt;
&lt;span class=&#34;kw&#34;&gt;import qualified&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;Data.Text.Lazy.Encoding&lt;/span&gt; &lt;span class=&#34;kw&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;dt&#34;&gt;T&lt;/span&gt;  &lt;span class=&#34;co&#34;&gt;-- for UTF8 encoding/decoding&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you&#39;re not using anything like &lt;a href=&#34;https://hackage.haskell.org/package/base-prelude&#34;&gt;base-prelude&lt;/a&gt;, you might want to import &lt;code&gt;Data.Monoid&lt;/code&gt; to have concatenation:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;kw&#34;&gt;import &lt;/span&gt;&lt;span class=&#34;dt&#34;&gt;Data.Monoid&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1&gt;&lt;span id=&#34;item-notes-fqgxfjfq-strict-and-lazy-text&#34;&gt;&lt;/span&gt;Strict and lazy &lt;code&gt;Text&lt;/code&gt;&lt;/h1&gt;&lt;p&gt;There are two text types in the library – both are called &lt;code&gt;Text&lt;/code&gt;, but one comes from the &lt;code&gt;Data.Text&lt;/code&gt; module and the other from the &lt;code&gt;Data.Text.Lazy&lt;/code&gt; module. They&#39;re not interchangeable (but you can convert between them) and are intended for use in different situations.&lt;/p&gt;
&lt;p&gt;Strict &lt;code&gt;Text&lt;/code&gt; is an array of characters. Lazy &lt;code&gt;Text&lt;/code&gt; is a list (possibly infinite) of arrays of characters, or &lt;em&gt;chunks&lt;/em&gt;. It&#39;s recommended to use lazy &lt;code&gt;Text&lt;/code&gt; for cases where it makes sense to process text in a streaming fashion – for instance, if you have a huge file that you want to read and output as a web page, you could do it like “read a chunk, output a chunk, read a chunk, output a chunk...” – which is what might happen automatically if you use lazy &lt;code&gt;Text&lt;/code&gt; correctly.&lt;/p&gt;
&lt;p&gt;A rule of thumb is “if you don&#39;t ever intend for the string to be in memory only partially, use strict &lt;code&gt;Text&lt;/code&gt;”.&lt;/p&gt;
&lt;p&gt;To convert lazy &lt;code&gt;Text&lt;/code&gt; to strict &lt;code&gt;Text&lt;/code&gt;, use &lt;code&gt;toStrict&lt;/code&gt; from &lt;code&gt;Data.Text.Lazy&lt;/code&gt;; &lt;code&gt;fromStrict&lt;/code&gt; goes in the opposite direction. To break a lazy &lt;code&gt;Text&lt;/code&gt; into a list of chunks, use &lt;code&gt;toChunks&lt;/code&gt;, and for the reverse – &lt;code&gt;fromChunks&lt;/code&gt;.&lt;/p&gt;
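&lt;p&gt;The conversions above can be sketched as follows. The &lt;code&gt;TL&lt;/code&gt; alias for the lazy module is just a convention chosen here to keep the two types apart:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

main :: IO ()
main = do
  let strict = &amp;quot;hello&amp;quot; :: T.Text
      lazy   = TL.fromStrict strict           -- strict to lazy
  print (TL.toStrict lazy == strict)          -- True
  -- A lazy Text assembled from two strict chunks
  let chunked = TL.fromChunks [&amp;quot;foo&amp;quot;, &amp;quot;bar&amp;quot;]
  print (TL.toChunks chunked)                 -- [&amp;quot;foo&amp;quot;,&amp;quot;bar&amp;quot;]
  print (TL.toStrict chunked)                 -- &amp;quot;foobar&amp;quot;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;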
&lt;h1&gt;&lt;span id=&#34;item-notes-fqgxfjfq-usage&#34;&gt;&lt;/span&gt;Usage&lt;/h1&gt;&lt;p&gt;Most list functions from &lt;code&gt;Prelude&lt;/code&gt; have counterparts in &lt;code&gt;Data.Text&lt;/code&gt;. The ones that are new are described below.&lt;/p&gt;
&lt;h2&gt;&lt;span id=&#34;item-notes-fqgxfjfq-common-functions&#34;&gt;&lt;/span&gt;Common functions&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pack&lt;/code&gt; and &lt;code&gt;unpack&lt;/code&gt; for converting between &lt;code&gt;String&lt;/code&gt; and &lt;code&gt;Text&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cons&lt;/code&gt; and &lt;code&gt;snoc&lt;/code&gt; to prepend/append a character&lt;/li&gt;
&lt;li&gt;&lt;code&gt;(&amp;lt;&amp;gt;)&lt;/code&gt; from &lt;code&gt;Data.Monoid&lt;/code&gt; appends two strings&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toLower&lt;/code&gt; and &lt;code&gt;toUpper&lt;/code&gt; convert to lowercase/uppercase (there&#39;s also &lt;code&gt;toTitle&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toCaseFold&lt;/code&gt; is used for case-insensitive comparisons: &lt;code&gt;toCaseFold x == toCaseFold y&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
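&lt;p&gt;A short sketch that exercises the functions listed above, with arbitrary sample strings. The explicit &lt;code&gt;Data.Monoid&lt;/code&gt; import is only needed on older versions of &lt;code&gt;base&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import Data.Monoid ((&amp;lt;&amp;gt;))
import qualified Data.Text as T

main :: IO ()
main = do
  let t = T.pack &amp;quot;Hello&amp;quot;                           -- String to Text
  putStrLn (T.unpack t)                            -- Text back to String
  print (T.cons &#39;&amp;gt;&#39; t)                             -- &amp;quot;&amp;gt;Hello&amp;quot;
  print (T.snoc t &#39;!&#39;)                             -- &amp;quot;Hello!&amp;quot;
  print (t &amp;lt;&amp;gt; &amp;quot;, world&amp;quot;)                           -- &amp;quot;Hello, world&amp;quot;
  print (T.toUpper t)                              -- &amp;quot;HELLO&amp;quot;
  print (T.toCaseFold &amp;quot;HELLO&amp;quot; == T.toCaseFold t)   -- True&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;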
&lt;h2&gt;&lt;span id=&#34;item-notes-fqgxfjfq-searching&#34;&gt;&lt;/span&gt;Searching&lt;/h2&gt;&lt;p&gt;&lt;code&gt;replace x y&lt;/code&gt; replaces every non-overlapping occurrence of &lt;code&gt;x&lt;/code&gt; with &lt;code&gt;y&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; replace &lt;span class=&#34;st&#34;&gt;&amp;quot; &amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;_&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;hello_world&amp;quot;&lt;/span&gt;

&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; replace &lt;span class=&#34;st&#34;&gt;&amp;quot;ofo&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;bar&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;ofofo&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;barfo&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;breakOn&lt;/code&gt; splits the string into “before separator” and “after separator” parts, where separator can be a string; &lt;code&gt;breakOnEnd&lt;/code&gt; does the same but starts from the end:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; breakOn &lt;span class=&#34;st&#34;&gt;&amp;quot;::&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::c&amp;quot;&lt;/span&gt;
(&lt;span class=&#34;st&#34;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;, &lt;span class=&#34;st&#34;&gt;&amp;quot;::b::c&amp;quot;&lt;/span&gt;)

&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; breakOnEnd &lt;span class=&#34;st&#34;&gt;&amp;quot;::&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::c&amp;quot;&lt;/span&gt;
(&lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::&amp;quot;&lt;/span&gt;, &lt;span class=&#34;st&#34;&gt;&amp;quot;c&amp;quot;&lt;/span&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;breakOnAll&lt;/code&gt; returns every possible split, one for each occurrence of the separator:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; breakOnAll &lt;span class=&#34;st&#34;&gt;&amp;quot;::&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::c&amp;quot;&lt;/span&gt;
[(&lt;span class=&#34;st&#34;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;, &lt;span class=&#34;st&#34;&gt;&amp;quot;::b::c&amp;quot;&lt;/span&gt;), (&lt;span class=&#34;st&#34;&gt;&amp;quot;a::b&amp;quot;&lt;/span&gt;, &lt;span class=&#34;st&#34;&gt;&amp;quot;::c&amp;quot;&lt;/span&gt;)]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;splitOn&lt;/code&gt; splits the string into a list of strings; &lt;code&gt;split&lt;/code&gt; breaks on predicate &lt;code&gt;Char -&amp;gt; Bool&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; splitOn &lt;span class=&#34;st&#34;&gt;&amp;quot;::&amp;quot;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::c&amp;quot;&lt;/span&gt;
[&lt;span class=&#34;st&#34;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;b&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;c&amp;quot;&lt;/span&gt;]

&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; split (not &lt;span class=&#34;fu&#34;&gt;.&lt;/span&gt; isAlphaNum) &lt;span class=&#34;st&#34;&gt;&amp;quot;a::b::c&amp;quot;&lt;/span&gt;
[&lt;span class=&#34;st&#34;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;b&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;&amp;quot;&lt;/span&gt;,&lt;span class=&#34;st&#34;&gt;&amp;quot;c&amp;quot;&lt;/span&gt;]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;count&lt;/code&gt; counts how many times a string occurs in another string (without overlaps).&lt;/p&gt;
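&lt;p&gt;A small sketch of &lt;code&gt;count&lt;/code&gt; with arbitrary inputs; the second call shows what “without overlaps” means in practice:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

main :: IO ()
main = do
  print (T.count &amp;quot;::&amp;quot; &amp;quot;a::b::c&amp;quot;)  -- 2
  -- &amp;quot;aa&amp;quot; fits into &amp;quot;aaaa&amp;quot; at three positions, but only two matches don&#39;t overlap
  print (T.count &amp;quot;aa&amp;quot; &amp;quot;aaaa&amp;quot;)     -- 2&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;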
&lt;h2&gt;&lt;span id=&#34;item-notes-fqgxfjfq-cutting-strings&#34;&gt;&lt;/span&gt;Cutting strings&lt;/h2&gt;&lt;p&gt;&lt;code&gt;take&lt;/code&gt; and &lt;code&gt;takeEnd&lt;/code&gt; take N characters from the beginning/end, &lt;code&gt;drop&lt;/code&gt; and &lt;code&gt;dropEnd&lt;/code&gt; remove them.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;takeWhile&lt;/code&gt;, &lt;code&gt;takeWhileEnd&lt;/code&gt;, &lt;code&gt;dropWhile&lt;/code&gt; and &lt;code&gt;dropWhileEnd&lt;/code&gt; exist as well. &lt;code&gt;dropAround&lt;/code&gt; strips characters from both sides of the string.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;strip&lt;/code&gt;, &lt;code&gt;stripStart&lt;/code&gt; and &lt;code&gt;stripEnd&lt;/code&gt; strip spaces specifically.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;stripPrefix&lt;/code&gt; and &lt;code&gt;stripSuffix&lt;/code&gt; remove some particular prefix/suffix (or return &lt;code&gt;Nothing&lt;/code&gt; if it isn&#39;t there). &lt;code&gt;commonPrefixes&lt;/code&gt; takes two strings and, if they share a non-empty prefix, returns that prefix together with the remainder of each string.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;chunksOf&lt;/code&gt; splits a string into chunks of length N.&lt;/p&gt;
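&lt;p&gt;The cutting functions described in this section can be tried out with a sketch like the following; all inputs are arbitrary sample strings:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

main :: IO ()
main = do
  print (T.take 3 &amp;quot;abcdef&amp;quot;)                      -- &amp;quot;abc&amp;quot;
  print (T.takeEnd 3 &amp;quot;abcdef&amp;quot;)                   -- &amp;quot;def&amp;quot;
  print (T.dropEnd 2 &amp;quot;abcdef&amp;quot;)                   -- &amp;quot;abcd&amp;quot;
  print (T.dropAround (== &#39;-&#39;) &amp;quot;--abc--&amp;quot;)        -- &amp;quot;abc&amp;quot;
  print (T.strip &amp;quot;  abc  &amp;quot;)                      -- &amp;quot;abc&amp;quot;
  print (T.stripPrefix &amp;quot;foo&amp;quot; &amp;quot;foobar&amp;quot;)           -- Just &amp;quot;bar&amp;quot;
  print (T.stripSuffix &amp;quot;foo&amp;quot; &amp;quot;foobar&amp;quot;)           -- Nothing
  print (T.commonPrefixes &amp;quot;foobar&amp;quot; &amp;quot;fooquux&amp;quot;)    -- Just (&amp;quot;foo&amp;quot;,&amp;quot;bar&amp;quot;,&amp;quot;quux&amp;quot;)
  print (T.chunksOf 2 &amp;quot;abcde&amp;quot;)                   -- [&amp;quot;ab&amp;quot;,&amp;quot;cd&amp;quot;,&amp;quot;e&amp;quot;]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;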
&lt;h2&gt;&lt;span id=&#34;item-notes-fqgxfjfq-transformations&#34;&gt;&lt;/span&gt;Transformations&lt;/h2&gt;&lt;p&gt;&lt;code&gt;justifyRight&lt;/code&gt; and &lt;code&gt;justifyLeft&lt;/code&gt; pad the string with a character at the beginning/end until it reaches a certain length:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; justifyRight &lt;span class=&#34;dv&#34;&gt;7&lt;/span&gt; &lt;span class=&#34;ch&#34;&gt;&#39;_&#39;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;____foo&amp;quot;&lt;/span&gt;

&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; justifyLeft &lt;span class=&#34;dv&#34;&gt;7&lt;/span&gt; &lt;span class=&#34;ch&#34;&gt;&#39;_&#39;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;foo____&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;center&lt;/code&gt; adds the character to both sides equally, breaking ties in favor of the left side:&lt;/p&gt;
&lt;div class=&#34;sourceCode&#34;&gt;&lt;pre class=&#34;sourceCode repl&#34;&gt;&lt;code class=&#34;sourceCode&#34;&gt;&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; center &lt;span class=&#34;dv&#34;&gt;7&lt;/span&gt; &lt;span class=&#34;ch&#34;&gt;&#39;_&#39;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;__foo__&amp;quot;&lt;/span&gt;

&lt;span class=&#34;fu&#34;&gt;&amp;gt;&lt;/span&gt; center &lt;span class=&#34;dv&#34;&gt;8&lt;/span&gt; &lt;span class=&#34;ch&#34;&gt;&#39;_&#39;&lt;/span&gt; &lt;span class=&#34;st&#34;&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;
&lt;span class=&#34;st&#34;&gt;&amp;quot;___foo__&amp;quot;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1&gt;&lt;span id=&#34;item-notes-fqgxfjfq-optimisation&#34;&gt;&lt;/span&gt;Optimisation&lt;/h1&gt;&lt;p&gt;TODO: mention &lt;code&gt;copy&lt;/code&gt;, &lt;code&gt;Builder&lt;/code&gt;, explain how fusion works, etc.&lt;/p&gt;
&lt;h1&gt;&lt;span id=&#34;item-notes-fqgxfjfq-faq&#34;&gt;&lt;/span&gt;FAQ&lt;/h1&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Where is &lt;code&gt;elem&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&#39;s been removed from &lt;code&gt;text&lt;/code&gt; because you can use &lt;code&gt;isInfixOf&lt;/code&gt; to do the same thing.&lt;br /&gt;
Thanks to rewrite rules, &lt;code&gt;T.isInfixOf &amp;quot;c&amp;quot;&lt;/code&gt; or &lt;code&gt;T.isInfixOf (T.singleton c)&lt;/code&gt; will be
as fast as &lt;code&gt;elem&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</content><link xmlns:ns="http://www.w3.org/2005/Atom" ns:href="https://guide.aelve.com/haskell/strings-o62hqc69#item-fqgxfjfq"/></entry></feed>