Flexible Format Strings in Haskell

One of my continual frustrations with most programming languages is the lack of facilities for creating quick'n'easy parsers. C has scanf, but it's unbelievably fragile and not particularly useful for more complex formats. You can use yacc, or Haskell's Parsec, but such options are overkill for just parsing, say, a log file.

Take a specific example: the nginx log_format directive takes a format string like so (this is not the full default):

$remote_addr - $remote_user [$time_local] [took $request_time ms] "$request"

This is a nice, simple structure, corresponding neatly to how one would generate the string in many scripting languages. Unfortunately, actually parsing the resulting log is much more challenging. It's perhaps easiest in Perl, but even then you have to go to great lengths to figure out precisely which characters are permitted in each variable. (Alternatively, you can probably hack out an entirely unreadable solution with lookaheads and whatnot. Not an attractive prospect.)

Wouldn't it be nice if you could parse a log file using that same format string?

So I've hacked up a Haskell library to do that. It's available on Hackage and on GitHub. Hopefully, it will prove useful.

Some technical details. It works by treating contiguous chunks of raw text (with no intervening variables) as delimiters, each marking the end of the text to be assigned to the preceding variable. The shortest match, therefore, is always used. This provides maximum flexibility for the content of variables, at the cost of rigid requirements for the rest of the text. As this library is intended for parsing auto-generated text, I consider that an acceptable tradeoff. Real-world tools might want to normalize whitespace first (collapsing every run of whitespace to a single space, for example) to allow for greater flexibility in input.
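To make the delimiter idea concrete, here is a minimal sketch of the technique using only base. It is not the library's actual code or API: the names Piece, parseFormat, breakOn and matchPieces are hypothetical, and it assumes that every '$' starts a variable whose name consists of letters, digits and underscores, that two variables are never directly adjacent, and that no backtracking is needed once a delimiter has been found.

```haskell
import Data.Char (isAlphaNum)
import Data.List (isPrefixOf, stripPrefix)

-- | A format string is a sequence of literal chunks and named variables.
-- (Hypothetical type for this sketch, not taken from the library.)
data Piece = Literal String | Variable String
  deriving (Show, Eq)

-- | Split a format string such as "$a - $b" into pieces.
-- Assumption: every '$' starts a variable name made of letters, digits or '_'
-- (a lone '$' would yield a variable with an empty name).
parseFormat :: String -> [Piece]
parseFormat [] = []
parseFormat ('$' : rest) = Variable name : parseFormat rest'
  where (name, rest') = span isNameChar rest
        isNameChar c = isAlphaNum c || c == '_'
parseFormat s = Literal lit : parseFormat rest
  where (lit, rest) = break (== '$') s

-- | Find the first occurrence of the delimiter and return the text before it
-- together with the remainder, which still starts with the delimiter.
breakOn :: String -> String -> Maybe (String, String)
breakOn delim = go []
  where
    go acc rest
      | delim `isPrefixOf` rest = Just (reverse acc, rest)
      | otherwise = case rest of
          []       -> Nothing
          (c : cs) -> go (c : acc) cs

-- | Match an input line against the pieces, returning (name, value) pairs.
-- Each variable ends at the first occurrence of the literal that follows it,
-- so the shortest match is always taken.
matchPieces :: [Piece] -> String -> Maybe [(String, String)]
matchPieces [] input
  | null input = Just []
  | otherwise  = Nothing
matchPieces (Literal lit : ps) input =
  stripPrefix lit input >>= matchPieces ps
matchPieces (Variable name : ps) input =
  case ps of
    -- A trailing variable takes whatever input is left.
    [] -> Just [(name, input)]
    -- The following literal acts as the delimiter for this variable.
    Literal lit : _ -> do
      (value, rest) <- breakOn lit input
      ((name, value) :) <$> matchPieces ps rest
    -- Two adjacent variables have no delimiter between them: ambiguous.
    Variable _ : _ -> Nothing
```

With the nginx format string above, matchPieces (parseFormat fmt) line would pair each variable name with the text found between its neighbouring literals, and return Nothing for a line that does not fit the format. Because the literals must match exactly, applying something like unwords . words to each input line (and writing the format with single spaces) is one simple way to get the whitespace tolerance mentioned above.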

