Parsing
Common needs
Parsers turn one representation of data into another representation, with the latter usually being more convenient to work with. Needless to say, the representations are rarely isomorphic, so the process can fail.
Recommendations
Most people use parsec, or attoparsec if they need speed. megaparsec is better than parsec (while still very similar to it), but not yet as widespread. attoparsec's API is simpler than parsec's/megaparsec's, but its error messages are poor.
trifecta is for advanced users – it has highlighting and nice error messages, but it's hard to figure out. If you're not writing a compiler, you probably don't need it. On the other hand, if you are writing a compiler, then you might also look at alex/happy – e.g. GHC's parser is implemented using those.
Some people favor Earley, since its parsing algorithm handles ambiguity and left recursion, which makes complex parsers easier to express. However, Earley is still a pretty rarely used library.
megaparsec
An unofficial successor of parsec (which hasn't seen any updates in quite some time). Nothing particularly fancy – just a good, modern parsing library.
- Very easy to use.
- Error messages are good.
- Lets you use custom error messages tailored to your domain of interest (i.e. you can signal errors using your own data constructors).
- The API is largely similar to parsec's, so existing tutorials/code samples can be reused and migration is easy.
- Works well with Text and with custom token streams, such as the output of Alex/Happy.
- Has special combinators for parsing indentation (good if you're writing a parser for a small programming language or a data format like YAML).
- Has rudimentary error recovery – if part of a parser fails, you can log a parse error and skip a part of the input. Sometimes it's useful.
- Has a special combinator (as of 5.1.0) for debugging that shows what is going on at a lower level.
- Well-tested and robust.
- Like all parsec-like libraries, it doesn't handle left recursion – i.e. if you're parsing `1+2+3`, you can't just write something like (in pseudocode) `expr = number | (expr '+' number)` and expect it to work. See this post for a more detailed explanation.
- Doesn't have automatic backtracking. This means that if you write `expr = add | multiply` and the parser for `add` fails in the middle (e.g. after parsing a single number), it won't try `multiply` unless you explicitly tell it to. This can be a good thing (saying explicitly when you want to backtrack can lead to better performance and better error messages), but it can still be somewhat annoying.
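To make the last two bullets concrete, here's a small sketch (assuming megaparsec ≥ 7; the `Parser` alias over `Void`/`String` is our own choice, not part of the library): left recursion is replaced by iteration, and `try` opts into backtracking explicitly.

```haskell
-- A sketch assuming megaparsec >= 7; the Parser alias is ours.
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, digitChar)

type Parser = Parsec Void String

number :: Parser Int
number = read <$> some digitChar

-- Left recursion avoided: "1+2+3" is parsed as "number (+ number)*".
expr :: Parser Int
expr = do
  n  <- number
  ns <- many (char '+' *> number)
  pure (sum (n : ns))

-- No automatic backtracking: add consumes a number before failing on
-- '+', so mul is only attempted because of the explicit try.
addOrMul :: Parser (Int, Int)
addOrMul = choice [try add, mul]
  where
    add = (,) <$> number <* char '+' <*> number
    mul = (,) <$> number <* char '*' <*> number
```

Without the `try`, feeding `addOrMul` the input `2*3` would fail outright: `add` consumes the `2` before failing, and megaparsec then refuses to fall back to `mul`.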
- hspec-megaparsec - utility functions for testing Megaparsec parsers with Hspec.
- cassava-megaparsec - Megaparsec parser of CSV files that plays nicely with Cassava.
- tagsoup-megaparsec - a Tag token parser and Tag specific parsing combinators.
- parser-combinators - lightweight package providing commonly useful parser combinators.
attoparsec
A very fast parsing library for Text and ByteString. Best suited for parsing things that aren't going to be seen by humans (like JSON, binary protocols, and so on). Not that good for parsing e.g. programming languages – for instance, it doesn't even tell you the positions of errors when they happen.
- Performance (see this for a comparison of sorts). Can be 10× faster than parsec.
- Has automatic backtracking, which means you don't have to figure out where to put `try` – everything just works.
- Has a simpler API than parsec/megaparsec.
- Can't report positions of parsing errors. (And the error messages are generally poor.)
- Doesn't provide a monad transformer. This means that if you want to do something while parsing (e.g. keep state, or print warnings, or whatever), you can't.
- Backtracking can't be turned off or limited in scope (i.e. you can't say "if this parser didn't fail, then commit to it"). This makes error messages worse and likely hurts performance (but I'm not sure, given that attoparsec is still the fastest library around).
- Additional parsers: attoparsec-expr (Parsec-like expression parser), attoparsec-binary, http-attoparsec, aeson (JSON), timeparsers, html-entities, taggy (HTML/XML), css-text, hweblib (HTTP, MIME, URI, ABNF), http-date
- Iteratees: attoparsec-iteratee, pipes-attoparsec, conduit-extra, conduit-tokenize-attoparsec, streaming-utils, io-streams
- Other: foldl-transduce-attoparsec, hspec-attoparsec, list-t-attoparsec, network-attoparsec, attosplit
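A sketch of the machine-oriented style attoparsec is good at, using `Data.Attoparsec.ByteString.Char8` (the `key=value` format here is invented for illustration). Note how the alternative in `version` needs no `try`:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch using attoparsec; the key=value format is made up.
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
import Data.ByteString (ByteString)

-- A machine-oriented "key=value" pair, e.g. "port=8080".
pair :: Parser (ByteString, Int)
pair = do
  key <- takeWhile1 (/= '=')
  _   <- char '='
  val <- decimal
  pure (key, val)

-- Automatic backtracking: on input "ver7" the first branch consumes
-- "v" and then fails on "er7", yet the second branch still gets a
-- clean shot at the whole input.
version :: Parser Int
version = (string "v" *> decimal) <|> (string "ver" *> decimal)
```

Running `parseOnly pair "port=8080"` gives `Right ("port", 8080)`; in a parsec-like library the `version` parser above would need an explicit `try` on the first branch.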
trifecta
- Lets you report errors in a manner similar to Clang, with colors and `^~~~~~~~~` and so on, which is very useful when writing e.g. a compiler. (For an example of what Clang does, see here.)
- Has a module for highlighting parsed text (i.e. you assign labels like `Number`, `Operator`, `Identifier`, etc., and you can generate colored text from them).
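A minimal sketch of trifecta's entry point, assuming its `Text.Trifecta` API (`parseString`, `Result`): printing the `Failure` payload is what produces the Clang-style caret diagnostics.

```haskell
-- Sketch assuming trifecta's Text.Trifecta API.
import Control.Applicative (some)
import Text.Trifecta

digits :: Parser String
digits = some digit

-- parseString returns a Result; on Failure, the carried error document
-- pretty-prints with carets and colors when shown.
parseDigits :: String -> Maybe String
parseDigits s = case parseString digits mempty s of
  Success r -> Just r
  Failure _ -> Nothing
```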
ReadP
- It's in base, so you can use it even when you can't (or don't want to) depend on any parsing library.
- Non-deterministic – all parse results are returned. Hence it doesn't need `try` or backtracking, and doesn't leak space. (Left-biased parsec-like choice is still possible with `<++`.)
- Can be used for writing complicated `Read` instances that are fully compliant with Haskell's precedence-parsing requirements (see the `ReadPrec` module).
- Can be faster than parsec (see this benchmark where parsing a simple config file is twice as fast with ReadP).
- Has a function for using the `Read` instance as a parser (i.e. `readS_to_P reads`).
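Since ReadP ships with base, the following sketch needs no extra dependencies: `chainl1` sidesteps left recursion, and `readP_to_S` exposes the non-determinism by returning every parse together with its leftover input.

```haskell
-- ReadP needs nothing outside base.
import Text.ParserCombinators.ReadP

number :: ReadP Int
number = read <$> munch1 (`elem` "0123456789")

-- chainl1 parses "number (+ number)*" without left recursion.
expr :: ReadP Int
expr = chainl1 number (char '+' *> pure (+))

-- readP_to_S returns all parses; keep only those that consumed
-- the whole input.
runExpr :: String -> Maybe Int
runExpr s = case [x | (x, "") <- readP_to_S expr s] of
  (x : _) -> Just x
  []      -> Nothing
```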
fastparser
A very simple, backtracking, fast parser combinator library.
Do not use fastparser when:
- performance is not the most pressing concern.
- you need to parse anything other than strict ByteString.
- you need a battle-tested library (fastparser is still experimental).
- you need to parse large inputs that can't easily be cut into many smaller pieces that can be parsed independently.