# 24 Days of Hackage: parsec

There comes a time in every programmer’s life when a plain old string just won’t cut it anymore - you need more structure. However, the more you think about this, the more you are terrified of the prospect of having to write some sort of parser, or maybe it’s time to put on some protective clothing on and fight through a forest of regular expression operators and arcane escaping rules. Why why why, you ask yourself, does it have to be this painful?

Ok, maybe that was a little melodramatic, but I can say that parsing with Haskell is an activity that I no longer dread, in fact - I find it fun and exciting! `parsec` is a parser combinator library that provides a set of primitives for parsing. As I’ve been stressing throughout this series (and will continue to!), combinators let us build complex logic out of very small, easy to reason about, pieces.

For the example today, we’ll look at parsing International Standard Recording Codes (ISRCs), a problem that I had to solve for some work on MusicBrainz. The linked Wikipedia does a good job at explaining the format of ISRCs, so lets dive right in and try and build up our own ISRC parser. First of all, we need to parse the country code. That’s just two uppercase characters, so lets write that parser:

``countryCode = count 2 upper``

Simple! This parser requires two uppercase characters, so we’ve combined the `upper` parser with the `count` combinator to run `upper` twice. If this parser succeeds, it will return the two uppercase characters that it parsed. Moving on, the next section of data we need to parse is “a three character alphanumeric registrant code”:

``````regCode = count 3 upperNum
where upperNum = upper <|> digit``````

Much like our `countryCode` parser, we’ve combined the `upperNum` parser with the `count` combinator to run it 3 times. `upperNum` can be defined by parsing either an `upper` character (A-Z) or a `digit` (0-9). The `<|>` combinator, part of the `Alternative` type class, allows us to try one parser and if that fails, try another.

Next, we take the last two digits of the year of registration:

``regYear = count 2 digit``

And finally, we take the five digit number identifying the recording:

``recordingId = count 5 digit``

All that we need to do now is thread all these parsers together, and we’re done! One option could be to do the following:

``````data ISRC = ISRC { isrcCountryCode :: String
, isrcRegCode :: String
, isrcRegYear :: Int
, isrcRecording :: Int
} deriving (Show)

isrcParser = ISRC <\$> countryCode
<*> regCode
<*  eof

λ> parse isrcParser "" "USPR37300012"
Right (ISRC { isrcCountryCode = "US"
, isrcRegCode = "PR3"
, isrcRegYear = 73
, isrcRecording = 12})``````

Here I’ve used the `Applicative` interface to build a parser that gradually builds up an `ISRC` value, and also the `Functor` instance to convert `regCode`’s into an `Int`, as `ISRC` requires. I’ve finished off by using `eof` to indicate that the parser should fail if there is any left over input.

However, this isn’t the only way to use `parsec`, though it’s certainly a convenient way. In practice, I actually only wanted to perform validation and normalization, so I can get by just sequencing these parsers and then stitching things back together:

``````isrcParser₂ = mconcat <\$>
sequence [ countryCode, regCode, regYear, recordingId ]
<* eof

λ> parse isrcParser₂ "" "USPR37300012"
Right "USPR37300012"``````

This parser uses the `sequence` combinator for `Monad`s to run a bunch of parsers, and collect all their results - then I just `mconcat` the results into a single `String`.

Oh, did I just mention parsers are `Monad`s? This gives us even more power - for example, we could branch out and run different parsers depending on what we’ve parsed so far. Not only that, but they are monad transformers, which gives us even more power. If we used a base monad such as `IO`, it’s entirely possible to write a parser that looks up things in a database while it does the parsing. Maybe a little bit on the crazy side, but it’s all possible!

You can contact me via email at ollie@ocharles.org.uk or tweet to me @acid2. I share almost all of my work at GitHub. This post is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

I accept Bitcoin donations: `14SsYeM3dmcUxj3cLz7JBQnhNdhg7dUiJn`