
Wednesday, February 3, 2010

Iteratee

Iteratee really looks promising on paper, and people using it seem to think it's really great. I've been put off a bit by what looks like a rather complex interface, but decided last night to take a crack at it.

What I've written below is an iteratee wrapper around "cat /etc/passwd" output using runInteractiveCommand.

module Main where

import Control.Monad.Trans
import Data.Iteratee.Base
import Data.Iteratee.Base.StreamChunk (ReadableChunk (..))
import Data.Iteratee.IO.Handle
import System.Process
import System.IO

-- For some reason this signature is wrong, but I'm not sure why...
--handleDriver :: (MonadIO m, ReadableChunk s el) => IterateeG s el m a -> Handle -> m a
handleDriver iter h = do
    result <- enumHandle h iter >>= run
    liftIO $ hClose h
    return result

main :: IO ()
main = do
    (_, outp, _, _) <- runInteractiveCommand "/bin/cat /etc/passwd"
    handleDriver (stream2list :: IterateeG [] Char IO String) outp >>= putStrLn


handleDriver just runs enumHandle with an iteratee (in my case stream2list) over the Handle, reading in implementation-specified blocks rather than character by character, and returns the result. That result is then printed to stdout by putStrLn.

This is a little bit like interact for Handle, in that I could have used something more advanced than stream2list to process the output of "/bin/cat /etc/passwd".

I'm not too excited about the fact that I got the type signature on handleDriver wrong, and I'm also a little bit put off by the type signature on stream2list.

So what's the difference between this approach and an interact-like styled approach? For one, iteratees work like the function supplied to a fold over a collection of data. In this case the collection is produced dynamically by the enumerator. The iteratees themselves work like little parsers that can be composed monadically, and errors in IO and termination of a stream are propagated up through the system automatically.
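The fold analogy can be sketched in plain Haskell, without the iteratee library (the names countWords and runOver are mine, purely for illustration): the step function plays the role of the iteratee, and the code driving the fold over the source plays the role of the enumerator.

```haskell
import Data.Char (isSpace)

-- The "iteratee": a step function that consumes one element at a time
-- and carries its state along, just like the function handed to a fold.
countWords :: (Bool, Int) -> Char -> (Bool, Int)
countWords (inWord, n) c
    | isSpace c = (False, n)
    | inWord    = (True, n)
    | otherwise = (True, n + 1)   -- entering a new word

-- The "enumerator": drives the step function over a stream of input.
runOver :: String -> Int
runOver = snd . foldl countWords (False, 0)

main :: IO ()
main = print (runOver "root x 0 0")  -- 4
```

The real library generalizes exactly this shape: the step function becomes a resumable value that can also signal "done" or "error" partway through the stream.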

Why is lazy IO then not so great? Look at the signature for interact:

interact :: (String -> String) -> IO ()

interact is a function that takes a function from String to String and produces an IO action. Ostensibly, it consumes all the input on stdin, applies the provided function to convert the whole String to another String, then prints that string to stdout. In practice it doesn't have to read all of the input in one shot, because of lazy evaluation: a String is [Char], and the list structure in Haskell is non-strict in its construction. It's as if the construction of the list is paused to do some processing on it and then resumed, as with coroutines, except that the runtime is doing it behind the scenes.
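That non-strict behaviour is easy to see in a small sketch (shout is just an illustrative name): the String -> String function can be applied to an infinite input, and only the demanded prefix is ever evaluated.

```haskell
import Data.Char (toUpper)

-- A String is just [Char], and map is non-strict: only as much of the
-- list as the consumer demands is ever constructed.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = putStrLn (take 10 (shout (cycle "ha")))  -- HAHAHAHAHA
```

interact works the same way, with stdin standing in for the infinite list.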

What happens in interact when an error occurs? How does the pure function of type (String -> String) even learn about exceptions during processing? This is where iteratee is an improvement on traditional lazy IO.

Let's assume we wanted to write a lazy version of interact for a handle called hInteract.

hInteract :: Handle -> (String -> String) -> IO ()

I believe this function could be used safely as follows:

withFile "/etc/passwd" ReadMode ((flip hInteract) id)
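A minimal implementation of such an hInteract, assuming it simply reads the handle lazily, might look like this sketch:

```haskell
import System.IO

-- A lazy, Handle-based analogue of interact: hGetContents reads the
-- handle lazily, so input is only consumed as f demands it. putStr
-- forces the whole result before withFile's bracket closes the handle,
-- which is what makes this particular usage safe.
hInteract :: Handle -> (String -> String) -> IO ()
hInteract h f = hGetContents h >>= putStr . f

main :: IO ()
main = withFile "/etc/passwd" ReadMode (flip hInteract id)
```

The safety here is delicate: if the result were returned instead of printed inside the bracket, the handle could be closed before the lazy string was forced.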

withFile uses bracket internally to ensure that hClose is called on the handle, and all seems well. Still, I don't think we necessarily understand how resources get used. Oleg, the father of Iteratee, posted this message a few years back explaining more of the benefits of Iteratee.

However, it seems that there is now a new lazy IO mechanism available that is claimed to be safe. I've not had time to check into it yet, but I plan to in the coming days.

Having written an Expect-like Monad, I'm interested in the aspects of error handling and precise resource control, because the code I'm writing really needs to be able to run to as close to forever as I can get.


Friday, January 15, 2010

When types and definitions aren't enough.

Was in #haskell on freenode a bit this morning, and someone mentioned something about how they were not exactly excited about the new rules for code formatting on if-then-else expressions.

I mentioned that I try to avoid if-then-else and case as much as possible by using types like Maybe that have two constructors, namely Nothing and "Just a" (for Maybe a).

I said that I can use the MonadPlus instance for Maybe to get a lot of what is available in if-then-else clauses.

let x = someExpression
in if x == Nothing
     then 9
     else fromJust x

could be written as

let x = someExpression
in fromJust $ x `mplus` Just 9

mplus for Maybe is defined to evaluate the first parameter: if it is not Nothing, it is returned; otherwise the second parameter is returned. It's essentially an "or" operator.
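A quick sketch of that behaviour:

```haskell
import Control.Monad (mplus)

main :: IO ()
main = do
    -- the first non-Nothing value wins, like a short-circuiting "or"
    print (Just 1  `mplus` Just 9  :: Maybe Int)  -- Just 1
    print (Nothing `mplus` Just 9  :: Maybe Int)  -- Just 9
    print (Nothing `mplus` Nothing :: Maybe Int)  -- Nothing
```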

However, someone pointed out that there's absolutely no requirement for mplus to be written this way. It can satisfy all the rules and restrictions of MonadPlus while examining the second argument before the first. Sure, evaluating the first argument and then the second is the de facto convention, but it is not guaranteed the way "if-then-else" is.

I wonder now about the Applicative module as well, and specifically the Alternative instance for Maybe.

I could just as easily write

let x = someExpression
in fromJust $ x <|> Just 9
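In GHC's implementation, at least, <|> for Maybe observably behaves the same left-biased way (this sketch demonstrates the behaviour, not a guarantee from the class):

```haskell
import Control.Applicative ((<|>))

main :: IO ()
main = do
    print (Just 1  <|> Just 9 :: Maybe Int)  -- Just 1
    print (Nothing <|> Just 9 :: Maybe Int)  -- Just 9
```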

But do we fall into the same trap of no guarantees? Is there a law for Alternative enforcing that the first argument is evaluated, and short-circuits, before the second?

Much code is written in the Applicative style for Parsec, so I really hope this is well defined.

Monday, January 11, 2010

What's Missing in the Haskell Community?

Documentation is often cited as possibly the #1 item that needs improvement with respect to Haskell. It depends on which modules we use, but I have to agree. It's quite difficult to uphold the claim that you don't need to understand Category Theory in order to employ a Monoid or Monad when you run into Monoid instances like Endo and don't know what to make of them, because the documentation doesn't really describe how to use them. That instance of Monoid is likely useful for some kind of programming you want to do, and without the documentation you'll end up struggling with a solved problem. Some people have been stepping up to improve the documentation, and that's really wonderful, but I think there's still some work to be done there.

The best way to learn Haskell, for me, has been to get the great books that are available out there. Real World Haskell is freely available online (but please support the authors and get a copy if you're finding it useful). Search for Haskell on Amazon.com and you'll find that the reviews are a really good guide to picking the ones that might be right for you. If you're really new to the language, Dr. Graham Hutton's book is outstanding. There's even been a series on MSDN's Channel 9 walking through its chapters, explaining how to solve some problems and think like a functional programmer.

To keep up to date with Haskell developments, reddit has been invaluable. You'll find blog posts, updates about new Haskell packages, and general community news and related discussion topics there.

So what's still missing?

I can tell you that over the years I've been messing around with Haskell, trying to understand how it works, why it's appropriate for certain kinds of problem solving, and why people seem to like it so much, the community has been pretty amazing with respect to fueling the flames of curiosity.

Where I think we might be needing a little more help is in the following areas of Haskell.

Explaining where laziness, or non-strict evaluation, is an advantage over strict evaluation. Perhaps this requires learning to think differently about the code we write, in much the same way it can be a leap to get to recursive programming; I feel this might be a slightly wider gap to cross mentally. (But then again, maybe I'm just getting old...)
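As one small sketch of the kind of thing that's natural under non-strict evaluation: define an infinite structure and let the consumer decide how much of it gets computed.

```haskell
-- primes by naive trial division over an infinite list; under lazy
-- evaluation only the demanded prefix is ever computed
primes :: [Int]
primes = filter isPrime [2 ..]
  where
    isPrime n = all (\d -> n `mod` d /= 0) [2 .. n - 1]

main :: IO ()
main = print (take 5 primes)  -- [2,3,5,7,11]
```

Under strict evaluation, separating the "generate everything" step from the "take what you need" step like this simply isn't possible.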

Show more examples of unintended data growth, or space leakage, due to the lack of strict evaluation. In languages like C, you're in direct control of when memory is allocated or deallocated. This is generally considered a "bad thing" for a lot of tasks, including systems programming if you're signed up with the Go camp. A side effect of non-strict-by-default seems to be that you have to understand how the code you're writing will be evaluated from a wider view than you might need with malloc and free, or new and delete. It seems that unless you've somehow been taught to recognize the patterns that can cause a space leak, you're basically doomed to run into sharp corners that others already seem to understand how to avoid.
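The classic sketch of such a leak is a lazy left fold, which accumulates a chain of unevaluated thunks; the strict foldl' avoids it.

```haskell
import Data.List (foldl')

-- foldl builds the thunk (((0 + 1) + 2) + ...) in memory before any
-- addition happens; foldl' forces the accumulator at each step and so
-- runs in constant space
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 1000000])  -- 500000500000
```

Both produce the same answer; on a large enough list, only the lazy version blows the stack or balloons the heap, which is exactly the kind of surprise that's hard to anticipate without having seen the pattern before.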

Real World Haskell has a great chapter on optimization, but perhaps it's time for an "Optimizing Haskell" book too? There's lots of good advice scattered all over the web, and the experts are not shy about offering help should you ask. Sometimes it's difficult to even ask the right questions when you're confused, though, and I suspect this may turn some folks away from Haskell.


Thursday, August 20, 2009

Interesting Discussion

I've been talking about Monads with folks on haskell-cafe, as well as lazy IO, and about understanding strictness and where it sometimes needs to be sprinkled in to get the desired behavior from a lazy IO system.

This is sort of the price paid for wanting to completely separate one's pure data-processing code from IO. The advantage of this separation is, of course, testability: pure code has referential transparency, meaning you can call it as many times as you like, and as long as you provide the same X as input, you will always receive the same Y as a result.

By putting Input and Output operations into their own context of evaluation, we totally avoid certain classes of problems in software writing, so the effort seems worth it.
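A minimal sketch of that separation (the function name summarize is mine): the pure core can be tested exhaustively with no IO in sight, and the IO shell stays a one-liner.

```haskell
-- pure core: referentially transparent, trivially testable
summarize :: String -> String
summarize s = show (length (lines s)) ++ " lines, "
           ++ show (length (words s)) ++ " words"

-- IO shell: the only place effects happen
main :: IO ()
main = interact summarize
```

Any property of summarize can be checked by plain equality on strings; no handles, processes, or mocking are involved.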

Needless to say it's been an interesting discussion, linked here.