Wednesday, February 3, 2010

Iteratee

Iteratee really looks promising on paper, and people using it seem to think it's really great. I've been put off a bit by what looks like a rather complex interface, but decided last night to take a crack at it.

What I've written below is an iteratee wrapper around "cat /etc/passwd" output using runInteractiveCommand.

module Main where

import Control.Monad.Trans
import Data.Iteratee.Base
import Data.Iteratee.Base.StreamChunk (ReadableChunk (..))
import Data.Iteratee.IO.Handle
import System.Process
import System.IO

-- For some reason this signature is wrong, but I'm not sure why...
--handleDriver :: (MonadIO m, ReadableChunk s el) => IterateeG s el m a -> Handle -> m a
handleDriver iter h = do
  result <- enumHandle h iter >>= run
  liftIO $ hClose h
  return result

main :: IO ()
main = do
  (_, outp, _, _) <- runInteractiveCommand "/bin/cat /etc/passwd"
  handleDriver (stream2list :: IterateeG [] Char IO String) outp >>= putStrLn


handleDriver just runs enumHandle with an iteratee (in my case stream2list) over the Handle, closes the Handle, and returns the result. The input is read in blocks whose size is chosen by the implementation, not character by character. The result is then printed to stdout by putStrLn.

This is a little bit like interact for Handle in that I could have used something more advanced than stream2list to process the result of "/bin/cat /etc/passwd".

I'm not too excited about the fact that I got the type signature on handleDriver wrong, and I'm also a little bit put off by the type signature on stream2list.

So what's the difference between this approach and an interact-styled approach? For one, iteratees work like the function that's supplied to a fold over a collection of data. In this case the collection is produced dynamically by the enumerator. The iteratees themselves work like little parsers that can be composed monadically. IO errors and end-of-stream get propagated up through the system automatically.
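To make the fold analogy concrete, here is a toy model written from scratch. The names Stream, Iter, collect, countLines, enumChunks and runIter are my own inventions for illustration and are not the iteratee package's API; the real library also threads a monad and error state through these types, which this sketch leaves out.

```haskell
-- Toy model only: these types are NOT the iteratee package's API.
data Stream = Chunk String | EOF

-- An iteratee is either finished with a result or waiting for more input.
data Iter a
  = Done a
  | Cont (Stream -> Iter a)

-- Collect everything seen so far, in the spirit of stream2list.
collect :: Iter String
collect = go id
  where
    go acc = Cont step
      where
        step (Chunk s) = go (acc . (s ++))
        step EOF       = Done (acc "")

-- Count newlines one chunk at a time, like the step function of a fold.
countLines :: Iter Int
countLines = go 0
  where
    go n = Cont step
      where
        step (Chunk s) = go (n + length (filter (== '\n') s))
        step EOF       = Done n

-- An enumerator feeds chunks to an iteratee until the data runs out
-- or the iteratee reports that it is done.
enumChunks :: [String] -> Iter a -> Iter a
enumChunks (c:cs) (Cont k) = enumChunks cs (k (Chunk c))
enumChunks _      iter     = iter

-- Signal end of stream and extract the result, if there is one.
runIter :: Iter a -> Maybe a
runIter (Done a) = Just a
runIter (Cont k) = case k EOF of
  Done a -> Just a
  Cont _ -> Nothing
```

With these definitions, runIter (enumChunks ["root:x\n", "daemon", ":x\n"] countLines) counts lines even though one line is split across two chunks, which is the kind of bookkeeping a pure String -> String function never has to think about but also never gets to control.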

Why, then, is lazy IO not so great? Look at the signature for interact:

interact :: (String -> String) -> IO ()

interact takes a function from String to String and produces an IO action. Ostensibly, it consumes all the input on stdin, applies the provided function to convert the whole String to a String, then prints that String to stdout. It doesn't have to read the whole input in one shot, though, because of lazy evaluation. A String is a [Char], and the list structure in Haskell is non-strict in its construction. It's like pausing the construction of that list to do some processing on it and then going back to it, as with coroutines, except that the runtime is doing it behind the scenes.
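A small pure sketch of that non-strictness, using an endless list as a stand-in for stdin. The names shout, endlessInput and firstLine are made up for this example.

```haskell
import Data.Char (toUpper)

-- A pure String -> String transformer, the kind of function interact expects.
shout :: String -> String
shout = map toUpper

-- Stand-in for an unbounded stdin: a lazy list that never ends.
endlessInput :: String
endlessInput = cycle "root:x:0:0\n"

-- Only the demanded prefix of the input is ever constructed, so this
-- terminates even though endlessInput does not.
firstLine :: String
firstLine = takeWhile (/= '\n') (shout endlessInput)
```

Here firstLine evaluates to "ROOT:X:0:0". Passing the same function to interact, as in interact shout, should behave the same way against real input: output appears as input is demanded, not after stdin has been read to the end.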

What happens in interact when an error occurs? How does the pure function of type (String -> String) even know about exceptions during the processing? This is where iteratee is an improvement on traditional lazy IO.

Let's assume we wanted to write a lazy version of interact for a handle called hInteract.

hInteract :: Handle -> (String -> String) -> IO ()

I believe this function could be used safely as follows:

withFile "/etc/passwd" ReadMode ((flip hInteract) id)
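I only gave the type of hInteract above, so here is one way it might be implemented on top of hGetContents. This is a guess at an implementation, not a library function, and catPasswd is just a name I've given to the withFile call.

```haskell
import System.IO

-- Hypothetical implementation; only the type signature is given in the post.
hInteract :: Handle -> (String -> String) -> IO ()
hInteract h f = do
  contents <- hGetContents h   -- lazy: nothing has been read yet
  putStr (f contents)          -- writing the output is what forces the reads

-- The usage from the post, wrapped in a named action.
catPasswd :: IO ()
catPasswd = withFile "/etc/passwd" ReadMode (flip hInteract id)
```

In this sketch the call is safe only because putStr forces the whole string before withFile gets to close the handle. If the action returned the lazy string instead of printing it, the caller would be forcing reads on a handle that had already been closed.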

withFile uses bracket internally to ensure that hClose is called on the handle, and all seems well. Still, I don't think we necessarily understand how resources get used. Oleg, the father of Iteratee, posted this message a few years back explaining more of the benefits of Iteratee.

However, it seems that there is now a new lazy IO mechanism available that is safe. I've not had any time to look into this, but I plan to in the coming days.

Having written an Expect-like Monad, I'm interested in error handling and precise resource control, because the code I'm writing really needs to be able to run as close to forever as I can get.

