
Monad tutorial

From http://spbhug.folding-maps.org/wiki/MonadsEn

Introduction
The word "monad" is familiar to virtually everyone who has dealt with functional programming. Many people are scared away by their seeming abstractness and mathematicity and by the need to use them for the most simple tasks, like console I/O. The Internet is filled with monad tutorials and there is even a popular belief that one has the right to be considered an FP newbie no sooner than he writes such a tutorial. These articles present monads from different points of view: from the one of their underlying mathematical concepts, or like a 'hack' into the imperative world, or from a purely practical point of view, or extremely broadly and comprehensively. The text you are reading is meant to illuminate them from another one: monads will be presented as a generalization of certain common idioms and as a method for abstracting them.

Goal and target audience


It is assumed that the reader is familiar with the basics of Haskell and with the methodology of FP in general. If not, the required knowledge can be gained from the articles listed at the end of this text. My goal is to explain the concept and give an intuitive understanding of monads through several examples, some of which are common and familiar and some of which are not. This text is not a practical manual on the usage of the standard monads.

Three familiar examples


Here we consider three common idioms: sequential computations, computations with handling of absent values and computations that return a set of possible results. Then we shall point out their commonality.

Sequential computations (the IO monad)


Consider the ';' operation, which means 'and then':
(print "Hello") ; (print "Goodbye")

This line means: Display "Hello" and then display "Goodbye". The execution of this program proceeds as follows:

- At moment t0 the screen is blank.
- At moment t1 the screen is modified by print "Hello"; the screen contains 'Hello'.
- At moment t2 the screen is modified by print "Goodbye"; the screen contains 'Hello Goodbye'.

Let us write this sequence as a system of simultaneous equations:


world(t=0) = (empty screen)
world(t=1) = world(t=0) + (print "Hello")
world(t=2) = world(t=1) + (print "Goodbye")

This way of programming (expressing the solution of a problem as a sequence of actions) is the best-known and most used one in mainstream languages: C, Java etc. One can easily see that in the world of imperative computations, I/O etc., all such systems of equations look like a 'chain': world(n+1) = world(n) + action. One cannot take an old state of the world out of one's pocket and perform an action on it, because two different states of the world cannot coexist. Thus, an equation like world(10) = world(3) + (print "Hello") is impossible.

So, a sequence of actions A1; A2; A3; ... can be viewed as A1 `then` (A2 `then` A3 `then` ...). The then operation suffices for describing a sequence of actions (it is actually the same as a semicolon, ;, but we are not used to seeing the semicolon as a binary operation). Note that the operation is associative (i.e., ((A1 `then` A2) `then` A3) == (A1 `then` (A2 `then` A3))), so the parentheses can be omitted; all languages do so. However, for a better intuitive understanding ('do the first action, then do the rest') one should interpret the `then` operation as right-associative.

Now, a question arises: what should be considered the value of (A `then` B): the value of A, the value of B or something else? The traditional and logical way is to take the value of B. Indeed, if we had to first evaluate A and then evaluate B, that means that the correctness of evaluating B depended on the evaluation of A: for example, in the sequence
(send query to database) `then` (await answer from database)

the correctness of 'await answer' depends on whether 'send query' was evaluated. Thus, if we are interested in the correctness of B, we are interested in its value, at least in the cases where we are evaluating (A `then` B) to get some value and not just for the side effects.

From a purely functional point of view (where the only important thing about a computation is considered to be its value) one could say that the `then` operation is unneeded; then in the example above we could just write 'await answer from database'. But this is obviously false: we should first completely evaluate 'send query' and only afterwards start evaluating 'await answer'. This necessity is dictated purely by the fact that both computations have side effects, i.e. interact with the outer world.

Moral: In the world of computations with side effects (I/O, modification of variables) computations are bound with an 'and then' operation, which means 'do the first, and, only after it completes, do the rest'.
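In Haskell this `then` operation for effectful computations exists and is written (>>); as a minimal concrete rendering of the introductory example:

main :: IO ()
main = putStrLn "Hello" >> putStrLn "Goodbye"   -- display "Hello", and then display "Goodbye"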

Computations with absent values (Maybe monad)

Consider a synthetic but typical and familiar example: A program opens a file, reads a line from it and searches a database for the corresponding value. If an error occurs at any stage, null is returned.
File f = open("keys.txt");
if (f == null) return null;
String key = readLine(f);
if (key == null) return null;
String value = ourDatabase.get(key);
if (value == null) return null;
return "The value is: " + value;

Now let us substitute the ; ('and then') operation for ;? ('and then, if successful'):
File f = open("keys.txt") ;?
String key = readLine(f) ;?
String value = ourDatabase.get(key) ;?
return "The value is: " + value;

This example can be easily generalized to a more sensible way of handling errors, where instead of null one uses values of an Exception type. However, the essence remains: functions return values of two kinds, 'results' and 'error signals', and an error in an intermediate computation causes an error in the whole computation.

Moral: In the world where computations sometimes fail, computations are bound with the 'and then, if successful' operation, which means 'do the first and, if successful, do the rest; if error E occurred, return error E'. The example shows a particular case where there is only one kind of error: null.
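Here is a sketch of the same idiom in Haskell, where ;? is exactly the (>>=) of the Maybe monad defined later in this article. The file is replaced by its contents and the database by a Map; readFirstLine is a hypothetical helper, not part of any library:

import qualified Data.Map as Map

readFirstLine :: String -> Maybe String   -- hypothetical stand-in for readLine
readFirstLine contents = case lines contents of
                           (l:_) -> Just l
                           []    -> Nothing

lookupValue :: String -> Map.Map String String -> Maybe String
lookupValue contents db =
    readFirstLine contents >>= \key ->
    Map.lookup key db      >>= \value ->
    Just ("The value is: " ++ value)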

Computations with many results (List monad)


Consider computations that return a list of values. They can return them either directly, like 'list files in a directory' or 'split a string into a list of characters', or indirectly, via a combination of such multiple-valued computations:

- the procedure 'get all orders of all departments of all shops', based on the procedures 'list shops', 'list departments of a shop' and 'list orders in a department'
- the procedure 'get all words in a file', based on the procedures 'list lines of a file' and 'list words of a line'
- the procedure 'list files in a directory and its subdirectories', assembling its result from the results of recursive calls

Here is a typical implementation of the first procedure, which computes the total cost of all orders.

foreach (Shop s : getShops()) {
    foreach (Department d : getDepartments(s)) {
        foreach (Order ord : getOrders(d)) {
            sum += ord.getCost();
        }
    }
}

Now let's consider the case where the number of levels of iteration is unknown: for example, recursively listing a directory.
listFilesRec(File file) {
    List<File> res = new ArrayList<File>();
    res.add(file);
    foreach (File sub : getContents(file)) {
        res.addAll(listFilesRec(sub));
    }
    return res;
}

Let us now rewrite these examples using a ;* operation, which threads the next operation through all results of the previous one.
Shop s = getShops() ;*
Department d = getDepartments(s) ;*
Order ord = getOrders(d) ;*
sum += ord.getCost();

listFilesRec(File file) {
    contents = getContents(file) ;*
    rec = listFilesRec(contents) ;*
    return [file] ++ [rec];
}

Moral: In the world of iterations and non-deterministic computations, computations are bound by the 'and, for each result' operation, which means, for example, 'for each result ord of getOrders(d), evaluate sum += ord.getCost()'.
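In Haskell the ;* operation is the (>>=) of the List monad (essentially concatMap). A sketch of the shop example, with hypothetical stub data in place of the real procedures:

type Shop = String
type Department = String

getShops :: [Shop]                   -- stub data, for illustration only
getShops = ["shop1", "shop2"]

getDepartments :: Shop -> [Department]
getDepartments s = [s ++ "/clothes", s ++ "/toys"]

getOrders :: Department -> [Int]     -- an order is represented here by its cost
getOrders _ = [10, 20]

totalCost :: Int
totalCost = sum (getShops >>= \s ->
                 getDepartments s >>= \d ->
                 getOrders d)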

Warning
At this point, it is important to know that, although these three monads are very often used to illustrate the concept, they are the easiest and perhaps even degenerate ones; understanding just them does not give an understanding of monads in general. In my opinion, that is the precise reason why many people don't understand monads or consider them too simple, too complex or too useless. You will get a good understanding after two other examples of monads, namely Parser and especially Dist, which are described somewhat below. However, don't proceed to that section before reading the text in between. Now, let us continue.

What's in common?

An attentive reader might have already noticed that in all cases we chose a specific strategy of binding two computations, the 'first' one and the 'rest' one; we overloaded the ; operator. One can say that this is the precise essence of monads. Now, to use this as a means of abstraction, we'll have to formalize this concept. Let us formalize the concept of 'binding' two computations, i.e., generalize the ;, ;? and ;* operations to one operation, namely (>>=).

1. This is a polymorphic operation: its logic does not depend on the values and types of the values it binds. For example, ;* works in absolutely the same way whether it binds a list of shops with a computation depending on a shop or a list of files with a computation depending on a file.

2. The type of (>>=) obviously depends on the types of the first and second (rest) computations: it will have at least 2 type variables: a (type of the first computation) and b (type of the second computation). Note again the polymorphism: the type will be of the form
forall a, b . (a type expression depending on a and b)

3. The result of this operation is the result of the second computation.


forall a, b . ... -> b

4. The second computation depends on the first; otherwise we wouldn't need any shiny monads and could just perform the second computation.

So far, we get something like a -> (a -> b) -> b, where a is the first computation and (a -> b) is the second, depending on a 'parameter' computed by the first. This is close to true (although there is surely something wrong with it: there exists only one function with such a type, namely flip ($), the reversed application function, and it doesn't deserve such a loud name as 'a monad'). However, there is an important caveat: a value is radically different from the same value computed inside a monad! For example, computing "Hello" as "He"++"llo" is obviously radically different from reading "Hello" from the keyboard: the first way doesn't have side effects, whereas the second does (mainstream languages and languages without lazy evaluation tend to ignore this distinction, but we won't). For example, readLine has type IO String instead of String, and a hypothetical lookupUserByName function has type String -> Maybe User instead of String -> User.

So, the type of a computation in a monad m is not 'a' but 'm a'. This can be interpreted as 'a value computed in a particular way', for example 'a value computed with side effects' (IO monad), 'a value that maybe hasn't been computed because of an error' (Maybe monad) or 'a value with multiple alternatives' (List monad). A monad attaches an adjective to all values. Now, we get something like m a -> (m a -> m b) -> m b. This is even closer, but the second computation actually doesn't need to know that its argument is 'wrapped'. The print function doesn't care about how the displayed string was computed, and getContents doesn't care that it is actually called on several files and that its results are concatenated. These dependencies should be managed not by the computations but by the implementation of (>>=) in the monad; this is the precise mission of (>>=).

So, we get the type (>>=) :: m a -> (a -> m b) -> m b. This is the final and correct variant. This type can be read as follows: (>>=) in a monad m binds a monadic computation of a

parameter (m a) with a monadic computation depending on this parameter (a -> m b), yielding a monadic result (m b).

Now we are able to bind two monadic computations together, but there is no way to create a monadic computation 'from nothing' for an arbitrary monad. It may seem that there is no need for such an operation, because we will always use a concrete monad (further we'll see that this isn't true and that there exist useful functions that work for all monads), for instance the List monad, and in a concrete case a way to create a monadic value usually exists: there's nothing difficult about creating a list of values (a value of type List a). However, since such an operation must, directly or indirectly, exist in any useful monad, it makes sense to include it in the definition of a monad. This operation is called 'return' and has type a -> m a. It takes an arbitrary fixed value and 'lifts' it into the monad, attaching the monad's adjective to it. return converts a value to what we would get if the value were computed inside the monad. Things will become clearer soon.

For example, in the IO monad return "Hello" :: IO String, and this computation represents 'the string "Hello", as if it were computed with side effects'. In the Maybe monad, return "Hello" = Just "Hello", and it means 'the string "Hello", as if it were the successful result of a computation that could potentially fail'; in the List monad return "Hello" = ["Hello"], 'the string "Hello", as if it were the sole result of a multiple-valued computation'.

The reasoning behind this choice of implementations of return is intuitively clear, but now we'll see its theoretical grounding. Remember that (>>=) in the monad m binds a monadic computation of a parameter with a monadic computation depending on this parameter. If we perform the 'monadic computation of a parameter' using return, it would make sense to require the effect to be the same as if the parameter were simply passed to the second computation: (return x >>= f) == (f x). This is the first of the three monad laws. It's easy to check that for the monads considered above such implementations (return "Hello" = ["Hello"] etc.) satisfy this law. There are another two laws, which are less obvious; their meaning will be clear from their definitions. So, here are the three monad laws:

Monad laws

Agreement of return and (>>=): (return x >>= f) == (f x).

Associativity of (>>=): ((x >>= f) >>= g) == (x >>= (\y -> (f y >>= g))). This law allows one to interpret the sequence a ; b ; c ; ... as monolithic and not to care about the placement of parentheses. In terms of operations like ;?, ;*, ; (let us denote them all by ;;) one can re-formulate this law as follows:

z = ( y = x ;; (f y) ) ;; return (g z)

is identical to
y = x ;; ( z = (f y) ;; return (g z) )

Right unit: (x >>= return) = x. One can rewrite this law as (x >>= (\a -> return a)) = x; that is, binding a monadic computation x with a parametrized computation that simply returns the parameter is the same as just x.
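As a worked check, all three laws are easy to verify by case analysis for the Maybe monad (its full implementation is given later: return a = Just a; Nothing >>= f = Nothing; Just a >>= f = f a):

-- Left unit:     return x >>= f  =  Just x >>= f  =  f x
-- Right unit:    Just a  >>= return  =  return a  =  Just a
--                Nothing >>= return  =  Nothing
-- Associativity: both sides reduce to Nothing as soon as any step yields Nothing,
--                and otherwise simply apply f and then g, so they always agree.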

Now we can give a complete definition of a monad:

Definition of a monad
A monad is a triple (m, return, >>=), where:
m is a type with one argument
return :: forall a . a -> m a
(>>=) :: forall a, b . m a -> (a -> m b) -> m b

These requirements can be formalized at the type system level, and Prelude contains a class definition:
class Monad m where
    return :: a -> m a
    (>>=)  :: m a -> (a -> m b) -> m b

A monad should satisfy the laws (they can't be enforced or checked by the compiler but their violation will most probably lead to exotic consequences):
(return x) >>= f = f x
(x >>= f) >>= g = x >>= (\y -> f y >>= g)
(x >>= return) = x

Actually, the Monad class in Prelude contains one more member, fail :: String -> m a, but we won't talk about it because it has no relation to the concept of monads, and in the next version of the Haskell standard it is planned to move this function to a separate class.

Standard monads
Identity
This is the simplest monad; the corresponding adjective is 'ordinary': Identity String is 'just an ordinary String'. This monad changes neither the types of values nor the binding strategy.
data Identity a = Identity a

return a = Identity a
(Identity a) >>= f = f a

It is not clear why one would need such a monad, but it has applications with monad transformers; however, they are beyond the scope of this article. So, let us consider this monad a purely illustrative degenerate example.

Maybe (the monad of computations with handling of absent values)


In this monad there are two kinds of values: 'ordinary' ones and a special value 'the value is missing'.
data Maybe a = Nothing | Just a

The implementation is trivial: binding a 'just' value with a parametrized computation is simply passing the parameter to it, whereas binding a missing parameter with a parametrized computation yields a missing result.
return a = Just a

Nothing  >>= f = Nothing
(Just a) >>= f = f a
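A small usage example (the phone directory data is hypothetical): two chained lookups fail as a whole if either one fails:

import qualified Data.Map as Map

offices :: Map.Map String String               -- who sits in which room
offices = Map.fromList [("alice", "room 42")]

phones :: Map.Map String String                -- room phone numbers
phones = Map.fromList [("room 42", "555-1234")]

phoneOf :: String -> Maybe String
phoneOf person = Map.lookup person offices >>= \room ->
                 Map.lookup room phones

-- phoneOf "alice" == Just "555-1234"
-- phoneOf "bob"   == Nothing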

List (the monad of computations with multiple results)


Here each value has several 'alternatives'. If one value depends on another, alternatives of both are enumerated.
return a = [a]
params >>= f = concat [f x | x <- params]

Let us remember the types participating:


params :: [a]  is a list of values, or 'an indeterminate value'
f :: a -> [b]  is an 'indeterminate' function that returns a bunch of possible results, depending on a parameter

Thus, as expected, params >>= f :: [b] is a list of possible results obtained by applying the function to possible arguments. For example, one might have
["c:/music", "c:/work"] >>= getDirectoryContents = ["c:/music/Bach", "c:/musi c/Beethoven", "c:/music/Rammstein", "c:/work/projects", "c:/work/documents"]

State (the monad of computations with a mutable state)


This is a more complex monad not mentioned above. It corresponds to a computation that has an internal state modified while it proceeds, but the state is not as global as in the IO monad (where

a computation potentially changes the state of the whole world and where the state can't be obtained directly or stored). Here each computation does two things: it returns a value and modifies the state (i.e., returns a new state). So, a computation of type a with a state of type s has type s -> (a, s). That's exactly the type of the State monad: newtype State s a = State { runState :: s -> (a, s) } (it would probably be better called StatefulComputation, but that's somewhat too long). Consider, for instance, a program implementing the Monte Carlo method. It will need a stateful random number generator. This program will use a 'generate a random number' function that returns a new random number and changes the generator state:
rand :: State RndGen Int
rand = ... -- Let us omit the implementation for now; it will be shown below.

monteCarlo :: (Int -> Bool) -> Int -> State RndGen Int
monteCarlo experiment 0 = return 0
monteCarlo experiment n =
    rand >>= \r ->
    if experiment r
      then monteCarlo experiment (n-1) >>= \s -> return (s+1)
      else monteCarlo experiment (n-1)

Note that the code of monteCarlo shows no signs of assignment or state changes: it is written in a purely functional style, and its statefulness is hidden by the State monad and the rand function. How would we implement rand? Well, we could do it in a boilerplate fashion:
rand :: State RndGen Int rand = State $ \(RndGen x) -> (x, RndGen ((x*1367823 + 918237) `mod` 32768))

However, it seems more natural to equip the State monad with functions for reading and changing state; they are called get and put and the State monad in Haskell is actually equipped with them:
rand = get >>= \(RndGen x) ->
       put (RndGen ((x*1367823 + 918237) `mod` 32768)) >>= \_ ->
       return x

This code is a bit longer, but instead of a mystical lambda abstraction it manipulates the state in an obvious way. It is not beautiful, but somewhat later we'll see that Haskell has syntactic sugar for this kind of code. The get and put functions are not hacks; they are implemented rather trivially, but it is instructive to have a look at them:
get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s' = State $ \s -> ((), s')

It turns out that get is 'a stateful computation that just returns the state' and put is 'a stateful computation that returns nothing in particular and just modifies the state'. Now look at the implementation of the State monad itself:
instance Monad (State s) where
    return a = State dontChangeStateAndReturnA
      where dontChangeStateAndReturnA s = (a, s)

    -- r1 :: State s a       is a stateful computation of 'a'
    -- p  :: a -> State s b  is a stateful computation of 'b', parameterized by the result of r1
    (State r1) >>= p = State passState
      where passState s = (res2, finalState)
              where (res1, intermediateState) = r1 s          -- Perform the first computation, compute the parameter
                    (State r2) = p res1                       -- *Compute* the second computation, using the parameter
                    (res2, finalState) = r2 intermediateState -- Perform the second computation

This code can be made clearer if we remove the State constructor (however, it will become incorrect):
r1 >>= p = passState
  where passState s = (res2, finalState)
          where (res1, intermediateState) = r1 s                -- Perform the first computation, compute the parameter
                (res2, finalState) = (p res1) intermediateState -- Compute and perform the second computation
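To see the whole machinery at work, here is a tiny usage sketch (the example is mine, not from the original text): a counter that returns the current state and increments it:

tick :: State Int Int
tick = get >>= \n ->
       put (n + 1) >>= \_ ->
       return n

-- runState tick 5                  == (5, 6)
-- runState (tick >>= \_ -> tick) 5 == (6, 7)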

IO (the monad of computations with side-effects)

This section presents the IO monad in a way that is perhaps intuitively understandable but incorrect; the correct way is more complex than the intended level of this article. However, it is also much more interesting; you can look at it in [C.5].

IO's adjective is 'computed with side effects', and its strategy of binding two computations is 'first perform the side effects of the first one, then the side effects of the second one' (remember the example with sending a query to a database and waiting for the response), whereas the pure computations involved proceed lazily as usual, because their order doesn't matter. Let us take a naive approach to describing IO:

data IO a = IO a

return a = IO a
(IO a) >>= f = f a  -- (*) but with side effects of 'a' magically performed before applying 'f', see later

Remember that at the beginning of this text we said that there's no way to restore a previous state of the world and that the essence of sequential computation is to build a strictly linear chain of world states. The only way to construct a sound model of sequential computation is to prohibit saving the world's state for later use and breaking the linearity. To do this, it suffices to hide the IO constructor and thus prohibit pattern matching: let (IO s) = readLine in ....

Having done that, we obtain a stunning property: there's no way out of the IO monad! There is no way to convert IO a to just a and no way to hide the fact that a computation has side effects. If a function uses a value of type ... -> IO ..., then the type of this function will also be ... -> IO .... Thus, a function that uses side effects gets itself annotated with 'warning, side effects present', and vice versa: if a function does not return an IO value, we can be sure that it has no side effects; this fact is enforced by the typechecker! This is one of the reasons for the low bug count in Haskell programs: it eliminates one of the most frequent causes of bugs, indirect interactions through side effects.
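For illustration, a function that uses getLine is forced to carry IO in its own type, while a pure function of the same shape is visibly effect-free:

askName :: IO String                 -- the IO in the type is unavoidable
askName = getLine >>= \name -> return ("Hello, " ++ name)

greeting :: String -> String         -- no IO: guaranteed free of side effects
greeting name = "Hello, " ++ name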

To be honest, there exists a function System.IO.Unsafe.unsafePerformIO with type IO a -> a, but one should think twice before even considering its usage, because with it we also obtain all the imperative problems of hidden effects, which get much worse because of Haskell's lazy evaluation: now it is very hard to say when exactly an effectful computation will be performed and whether it will be performed at all. Usage of unsafePerformIO can be reasonably safe only in cases where its side effect does not affect anything (for example, debug output) or where it is always the same, i.e. the computation is idempotent (for example, reading a global configuration file). But even in these cases, it's better to avoid it.

It is difficult to explain how exactly (*) is achieved, and with this approach (data IO a = IO a) it is completely impossible. However, this approach, together with (*) and the hiding of the IO data constructor, shows the most important properties of the IO monad: that pure functional computations remain unchanged while effectful computations get sequenced, and that one can't get out of the IO monad and hide the effectfulness of a computation. Some ways of implementing the IO monad include presenting it as a State monad that has the whole world as its state (or some internal runtime state, such as processor registers, I/O ports and process memory, transformations of which lead to the computer performing observable effects); presenting an IO value as a chunk of code that, when run, performs an effect; and presenting it as a function from a bunch of (already performed) effects to a bunch of further effects. The last forces sequencing of effects because the function can't proceed before its 'input' effects are performed.

Others

Haskell's standard library defines some more monads: Reader, Writer and Cont. I recommend looking at them yourself: Reader and Writer are conceptually simple and often used; Cont is a complicated monad used for programming in 'continuation-passing style'. Its discussion is beyond the scope of this article.

do syntax
Consider again the example with multiple-valued functions (the List monad):
Shop s = getShops() ;*
Department d = getDepartments(s) ;*
Order ord = getOrders(d) ;*
sum += ord.getCost();

Let us rewrite it in Haskell:


let s   = getShops
    d   = getDepartments s
    ord = getOrders d
in sum (getCost ord)

This code is of course incorrect: it doesn't typecheck. s has type [Shop], not Shop, so getDepartments can't be applied to s, similarly for d/getOrders and ord/getCost. Code with the ;* operator shouldn't be interpreted literally. Actually, when writing
Shop s = getShops ;*

we meant that s is a parameter used by the rest of the computation. So, this code means not 'Let s equal getShops' but instead 'Let parameter s be computed via getShops'. Results of this computation will be used by ;* which, in the case of the List monad, has type [a] -> (a -> [b]) -> [b] and, in the particular case where a = Shop, [Shop] -> (Shop -> [b]) -> [b]. So, s has type [Shop] but the rest of the computation depends on a Shop. Taking this into account, we can rewrite the first fragment:
getShops() >>= \s ->
getDepartments(s) >>= \d ->
getOrders(d) >>= \ord ->
sum += ord.getCost();

In Haskell:
sum (getShops >>= \s ->
     getDepartments s >>= \d ->
     getOrders d >>= \ord ->
     return (getCost ord))

Obviously, this kind of 'assignment' corresponds to the syntactic idiom value >>= \variable -> .... Haskell has syntactic sugar for that, called 'do syntax':

sum (do s <- getShops
        d <- getDepartments s
        ord <- getOrders d
        return (getCost ord))

This gets translated to precisely the code above. Such syntax makes the code look more imperative, especially in case of the IO monad:
main = do putStrLn "Input a number"
          a <- readNumber
          putStrLn "Input another number"
          b <- readNumber
          putStrLn $ "Their sum is: " ++ show (a + b)

Inside a do block, a sequence of two lines where the first one is not a binding of the form a <- ... gets translated as if it were written not putStrLn "Input a number" but _ <- putStrLn "Input a number": the computation is performed but its result is not used (it is passed to a constant function \_ -> ...). Of the monads mentioned above, only IO and State have computations whose return value is not important. Somewhat later we'll return to such actions and monads. This is how the code above gets translated:
main = putStrLn "Input a number" >>= \_ ->
       readNumber >>= \a ->
       putStrLn "Input another number" >>= \_ ->
       readNumber >>= \b ->
       putStrLn $ "Their sum is: " ++ show (a + b)

Designing monads
The three monads above are, of course, of some practical value and they do illustrate the idea of monads, but if they were all that monads are about, monads couldn't be called that useful and interesting, and these examples could be implemented in some different way. The real power of monads is revealed when one finds out that a specific problem domain is well described in terms of monads. Now we'll look at two such examples.

Syntactic analysis
Let's design a parsing library. Let a parser be parametrized by the type of the values it returns: for example, a number parser will return numbers, whereas a code parser will return the code's syntax tree. The type of a parser could look like type Parser a = String -> Maybe a, where the Maybe indicates that parsing may sometimes fail: a number parser will return Nothing when instructed to parse "Hello". Having such a type, one can easily write some 'primitive' parsers, for example (a sketch of a few of these follows the list):

- a parser for numbers
- a parser that always returns the same constant value
- a parser that always returns the input string
- a parser that always fails
- a parser that expects a particular string, returns () if it is found and fails otherwise
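A minimal sketch of a few of these primitives under the type above (the names are illustrative, not taken from any library):

type Parser a = String -> Maybe a

number :: Parser Int            -- a parser for numbers
number s = case reads s of
             [(n, "")] -> Just n
             _         -> Nothing

constant :: a -> Parser a       -- always returns the same constant value
constant v = \_ -> Just v

failing :: Parser a             -- always fails
failing = \_ -> Nothing

expect :: String -> Parser ()   -- expects a particular string
expect pat = \s -> if s == pat then Just () else Nothing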

One can also implement the <|> combinator, the parallel composition of parsers: it tries to apply the first parser and, if it fails, applies the second one instead.
(<|>) :: Parser a -> Parser a -> Parser a
a <|> b = \s -> case (a s) of
                  Just v  -> Just v
                  Nothing -> b s

However, we can't implement sequential composition (<*>): for example, a parser for two numbers ("123 456" -> (123, 456)) can't be implemented efficiently in this way. That's because the combinator knows nothing about the implementation of its arguments, so it would have to check all possible ways of splitting the input string in two, calling the first parser on the left part and the second parser on the right part. This is extremely slow and, in the case of more than two parsers, the complexity becomes exponential in both the length of the input string and the parser count. Actually, we would like to implement sequential composition as follows: let the first parser 'eat' as much of the input string as it can and pass the rest to the second parser. We have to reflect this in the Parser type: type Parser a = String -> Maybe (a, String). Now sequential composition is easy:
(<*>) :: Parser a -> Parser b -> Parser (a, b)
a <*> b = \s -> do (va, s')  <- a s
                   (vb, s'') <- b s'
                   return ((va, vb), s'')

This code fragment uses the Maybe monad and do-syntax; for a better understanding, look at two more variants with the same semantics. The first one is how the compiler expands the code above:
a <*> b = \s -> (a s) >>= \(va, s') ->
                (b s') >>= \(vb, s'') ->
                return ((va, vb), s'')

The second variant, which does not use the Maybe monad at all: the (>>=) and return functions are expanded:
a <*> b = \s -> case (a s) of
                  Nothing -> Nothing
                  Just (va, s') ->
                      case (b s') of
                        Nothing -> Nothing
                        Just (vb, s'') -> Just ((va, vb), s'')

So, with the type type Parser a = String -> Maybe (a, String) it is easy to implement sequential parser composition, and also (left as an exercise for the reader) parallel composition. Let us now consider a couple more important combinators, starting with oneOrMore :: Parser a -> Parser [a]. It will be useful for parsing strings like these:
Napoleon 1769|Emperor of France
Henry the 8th 1491|King of England, had 6 wives

A recognizing parser (one that just checks correctness but doesn't return anything) can be implemented like this:
kingInfo = (oneOrMore anyChar) <*> space <*> (oneOrMore digit) <*> char '|' <*> (oneOrMore anyChar)

Let us invent an implementation for oneOrMore:


- A greedy variant: oneOrMore p applies p while it is possible. Obviously this won't work with our example, because the very first oneOrMore anyChar will eat the whole string.
- A non-deterministic variant: (oneOrMore p1) <*> p2 should find a way to split the input string into two parts, where the first can be parsed by several applications of p1 and the second can be parsed by p2.

Only the second variant suits our needs. The easiest (or the only) way to implement it is to move from the Maybe type to lists; Philip Wadler formulated this as 'replace failure by a list of successes' in his work [C.1]. Now the parser will have the type type Parser a = String -> [(a, String)]. Implementations of the various operations become even simpler, and the implementation of <*> with do-syntax won't change at all!
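Under the new type the earlier combinators indeed become simpler; for instance, here is a sketch (illustrative, not the final library of this article) of parallel composition and a character primitive:

type Parser a = String -> [(a, String)]

(<|>) :: Parser a -> Parser a -> Parser a   -- collect the successes of both parsers
a <|> b = \s -> a s ++ b s

satisfy :: (Char -> Bool) -> Parser Char    -- a single character matching a predicate
satisfy p (c:cs) | p c = [(c, cs)]
satisfy _ _            = []

Now let us implement a better kingInfo that will not only check syntax but also return a King: data King = King {name::String, birth::Int, info::String}: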
kingInfo = \s -> do (name, s1)     <- oneOrMore anyChar s
                    (_, s2)        <- space s1
                    (birthStr, s3) <- oneOrMore digit s2
                    (_, s4)        <- char '|' s3
                    (info, s5)     <- oneOrMore anyChar s4
                    return (King name (read birthStr) info, s5)

We see a dependency between computations that hasn't been taken into account in the List monad: it seems we need our own! It will be a lot like the List monad, but it will store the rest of the parsed string together with each parsed value: somewhat like a mixture of List and State:

newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Monad Parser where
    return a = Parser $ \s -> [(a, s)]
    pa >>= pb = Parser $ \s -> [(b, s'') | (a, s')  <- runParser pa s,
                                           (b, s'') <- runParser (pb a) s']

During the transition from Maybe to this monad, all primitive parsers and their combinators undergo trivial changes; for example, <*> now looks like this:
(Parser p) <*> (Parser q) = Parser r
  where r s = [((vp, vq), s'') | (vp, s') <- p s, (vq, s'') <- q s']

and the 'royal' example can now be rewritten as


kingInfo = do name <- oneOrMore anyChar
              space
              birthStr <- oneOrMore digit
              char '|'
              info <- oneOrMore anyChar
              return (King name (read birthStr) info)

Now the parser declaration looks precisely like a Backus-Naur grammar; to achieve this, we exploited the 'monadic' essence of BNF and of the parsing process. Noticing monadic structure in a problem is very important and useful: usually, when it is possible, the code becomes as clear and beautiful as in this example. The Parsec parser library uses a similar approach; its idea is taken from [C.3].

Statistical distributions
Now we'll do some statistical modeling. Let us model the behavior of three types of drivers at a crossroad and estimate their collision probability (example taken from [D.1]):

- There are three types of drivers: cautious, normal and aggressive.
- The traffic light has 3 positions: green, yellow and red.
- If two 'perpendicular' drivers at a crossroad start driving simultaneously, they collide with a probability of 0.3.
- A driver is cautious with a probability of 0.2, normal with a probability of 0.6 and aggressive with a probability of 0.2.
- The following table describes the reactions of the various drivers to the various traffic lights; each cell means 'this type of driver starts driving when he sees this traffic light with this probability':

         Cautious  Normal  Aggressive
Green    1         1       1
Yellow   0.1       0.2     0.9
Red      0         0.1     0.3

The goal is to estimate the collision probability between two given types of drivers and average it over all types. Let us forget for a moment about the statistical nature of the problem. Then the solution becomes a trivial deterministic procedure:
bool collide(Driver first, Driver second, Light light) {
    Action firstAction = first.actionOn(light);
    Action secondAction = second.actionOn(light);
    bool result = doActionsLeadToCollision(firstAction, secondAction);
    return result;
}

The only thing we need to do to transform this simple procedure into a computation of collision probability is to replace every type T with 'a distribution over T': for example, 'bool' will become 'distribution over bool', 'Driver' will become 'distribution over Driver', etc. Then, if the language provides us with arithmetic over such types, the problem is solved. Two questions arise: 1) what is a 'distribution over T' and how do we represent it in our program, and 2) how do we implement arithmetic over such values? The answer to the first question depends on how we shall use these values. It makes sense to assume that we'll generate values and compute statistics over them. For that we'll need:

- The domain of a value (its carrier, or support)
- The probabilities of each value appearing (in the discrete case) or the distribution function (in the continuous case)
- A procedure for generating a value with such a distribution
- A way to compute the dispersion or mean of the value, or, more generally, the dispersion or expectation of some function of the value

But the dispersion can be expressed via the expectation: Dx = M[(x - Mx)^2]; likewise, probabilities of individual values in the finite case can be expressed as P{x = x0} = M[if (x == x0) then 1 else 0]. In the infinite case we don't actually need the distribution function. Really, of what use can it be? One usually integrates it over an interval to obtain the probability of the value belonging to the interval, but this integral can also be expressed via the mean: P{x0 < x < x1} = M[if (x0 < x < x1) then 1 else 0]. Of course, it would be silly to say that the distribution function is useless; the point is that for some practical computational purposes it cannot be used if specified directly as a computational procedure. To put it together, it looks like we'll need three 'basis' properties, through which all the others can be expressed:

- Carrier
- Generator
- Procedure for computing means

Let us consider only discrete distributions with a finite carrier, because for the general case of infinite distributions we'll be unable to implement the expectation function precisely.
module Dist where

import System.Random

data Dist a = Dist { support :: [a]
                   , gen     :: StdGen -> (a, StdGen)
                   , expect  :: (a -> Float) -> Float }

Now, what about the second question? Let us compute the sum of two fuzzy values: an integer a and an integer b. The distribution of their sum can be computed by randomization of a+b over the parameter a (randomization of a value over a parameter is averaging, or obtaining its distribution with a random value of the parameter). Addition can be generalized to an arbitrary function; the number of function arguments can likewise be generalized to an arbitrarily large value. So, the only operation needed for composing distributions from parts, i.e., for computing characteristics of expressions of the form f(a1, a2, a3, ...), is randomization.

The randomization operator has type (distribution of parameter p) -> (distribution depending on p) -> (randomized distribution), or, if we replace 'distribution over a' with 'Dist a', Dist a -> (a -> Dist b) -> Dist b. Whoa! This is precisely (>>=) :: m a -> (a -> m b) -> m b applied to the Dist monad! That's true: statistical distributions form a monad that has randomization as its binding operator. Obviously, return corresponds to the delta distribution, i.e., the distribution where the random value can take only one particular value, with a probability of 1:
instance Monad Dist where
    return a = Dist { support = [a]
                    , gen     = \g -> (a, g)
                    , expect  = \f -> f a }
    da >>= fdb = Dist { support = concat [support (fdb a) | a <- support da]
                      , gen     = \g -> let (a, g') = gen da g in gen (fdb a) g'
                      , expect  = \f -> expect da (\a -> expect (fdb a) f) }

This is how these operations work: return a builds a distribution whose carrier is the singleton set {a}, whose generator always generates a, and where the expectation of a function f equals f(a). As for (>>=) :: Dist a -> (a -> Dist b) -> Dist b: here da is the distribution of the parameter, and fdb is a function depending on the parameter and returning a distribution db. During randomization, the following occurs:

- The carrier is assembled as the union of the carriers of the 'db' distributions across all possible values of the parameter, i.e. across the values of 'support da'.
- The generator generates a parameter, computes the corresponding 'db' distribution and uses it to generate the result.
- Expectation uses the randomization formula: M_{a,b}[f(a,b)] = M_a[M_b[f(a,b)]].

Now we can define a couple of easy distributions and proceed to solving the original problem. For illustrative purposes a single distribution constructor, freqs, will suffice; it is determined by pairs (probability, value), which, in turn, can be expressed with the choose combinator: 'a mixture of two distributions in proportion p : 1-p'.
choose p d1 d2 = Dist { support = s, gen = g, expect = e }
  where
    s = support d1 ++ support d2
    g sg = let (x, sg') = randomR (0.0, 1.0) sg
           in if x < p then gen d1 sg' else gen d2 sg'
    e f = p * expect d1 f + (1-p) * expect d2 f

prob p = choose p (return True) (return False)

freqs [] = error "Empty cases list"
freqs [(_, a)] = return a
freqs ((w1, a1):as) = choose w1 (return a1) (freqs $ map (\(w, a) -> (w/(1-w1), a)) as)

mean d = expect d id
disp d = expect d (\x -> (x - m)^2) where m = mean d
probability f d = expect d (\x -> if f x then 1 else 0)
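A quick usage check of these combinators, with the expected values worked out by hand (the coin example is mine, not from the original problem):

coin :: Dist Float                -- a fair coin over {0, 1}
coin = freqs [(0.5, 0), (0.5, 1)]

-- mean coin                == 0.5
-- disp coin                == 0.25
-- probability (> 0.5) coin == 0.5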

This suffices for the original problem.


data Light = Red | Green | Yellow
data Driver = Cautious | Normal | Aggressive
data Action = Drive | DontDrive

drive p = choose p (return Drive) (return DontDrive)

_          `actOn` Green  = drive 1.0
Cautious   `actOn` Yellow = drive 0.1
Normal     `actOn` Yellow = drive 0.2
Aggressive `actOn` Yellow = drive 0.9
Cautious   `actOn` Red    = drive 0.0
Normal     `actOn` Red    = drive 0.1
Aggressive `actOn` Red    = drive 0.3

Drive `collision` Drive = prob 0.3
_     `collision` _     = prob 0.0

driver = freqs [(0.2, Cautious), (0.6, Normal), (0.2, Aggressive)]

simulate d1 d2 light = do a1 <- d1 `actOn` light
                          a2 <- d2 `actOn` light
                          a1 `collision` a2

simulateOverDrivers light = do d1 <- driver
                               d2 <- driver
                               simulate d1 d2 light

probCollisionOnRed = probability (==True) (simulateOverDrivers Red)

probCollisionOfTwoAggressiveOnYellow =
    probability (==True) (simulate Aggressive Aggressive Yellow)

Now, once again, the fact that we noticed a monad in the problem allowed us to express its solution very closely to the problem domain.

Generic monadic computations and monad combinators


We've seen several examples of problems that are well expressed in terms of monads: so, one of the advantages of monads is a terminology that allows one to compactly describe certain problems and computation patterns. But terminology alone is not a big enough reason to introduce a new abstraction into the language. For an abstraction to be useful we need operations that actually use it as an abstraction, i.e. as a black box. I'm talking about operations that serve the same purpose regardless of which monad they are applied to, i.e., operations polymorphic with respect to the monad type: op :: (Monad m) => ... m .... It is not obvious that such operations exist at all or that they can be useful: all the monads we spoke about were completely different and served different purposes. However, they do exist.

Simplest monadic combinators


The first situation where we used different monads in the same way is the do-syntax. It's hard to call it an 'operation', but it nonetheless works the same way for all monads and is equally useful for all of them. Also, our programs often contained code fragments like this one:
do a <- foo        -- foo :: m a
   b <- bar        -- bar :: m b
   return (f a b)  -- f :: a -> b -> c

Why not write simply return (f foo bar)? Alas, this won't typecheck. But this pattern occurs extremely frequently and it makes sense to abstract it.
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM f ma = do a <- ma ; return (f a)

liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
liftM2 f ma mb = do a <- ma ; b <- mb ; return (f a b)

etc.

These functions liftM, liftM2 etc. are implemented in the standard library, in Control.Monad. Their somewhat obscure names mean 'lift a function into the monad, so that it operates not on plain values but on monadic values, while doing the same thing as before'. Now we can write things like this (this code fragment reads a string from the keyboard and returns it converted to upper case):
readAndUpper :: IO String readAndUpper = liftM (map toUpper) getLine

Or like this (in the Dist monad):


twoDrivers :: Dist (Driver,Driver) twoDrivers = liftM2 (,) driver driver

Now let us imagine that we are going to print a list of values: perform putStrLn for each value.
map putStrLn values

Alas, this is not a solution. Look at the types:


values :: [String]
putStrLn :: String -> IO ()
map putStrLn values :: [IO ()]

So, we get a list of values of type IO (), that is, a list of actions. But Haskell is a lazy language, so the elements of the list are not forced and remain unevaluated calls of putStrLn. To perform this list of actions we have to perform each of them in turn.
doIOs [] = return []
doIOs (a:as) = do r <- a
                  rs <- doIOs as
                  return (r:rs)

doIOs (map putStrLn values)

If we start the interpreter and look at the type of doIOs, we see that it is doIOs :: (Monad m) => [m a] -> m [a]. It has no mention of the IO monad and is monad-polymorphic! Analogously, the IO word should be removed from the name of the function; we obtain the standard function
sequence :: Monad m => [m a] -> m [a]

and a similar function


mapM :: Monad m => (a -> m b) -> [a] -> m [b]
mapM f as = sequence (map f as)

The standard library defines some more counterparts of list functions and control structures for monads:

when :: Monad m => Bool -> m () -> m ()
when b m = if b then m else return ()

replicateM :: Monad m => Int -> m a -> m [a]

foldM :: Monad m => (a -> b -> m a) -> a -> [b] -> m a

and some more. For the cases where a computation is important solely for its side effects and not for its return value, these operations have counterparts with an underscore:
mapM_ :: Monad m => (a -> m b) -> [a] -> m ()
sequence_ :: Monad m => [m a] -> m ()

etc.

Why do we need this genericity?


It is not yet obvious what applications there are for the combinators described above (except liftM), apart from structuring actions in the IO monad. Let's look at some examples:

Apart from IO, the monad of 'computations with a mutable world', they may be used in the similar State monad, the monad of 'computations with a mutable state', for the same purposes. For example, to collect the distinct elements of a list we can traverse it with mapM_ insert, where insert has type (Ord a) => a -> State (BinaryTree a) () and BinaryTree is a type of balanced trees:
collectDistinct :: (Ord a) => [a] -> BinaryTree a
collectDistinct as = s'
  where (_, s') = runState (mapM_ insert as) emptyBinaryTree
        insert a = do t <- get
                      put (insertBinTree a t)
                      return ()

Let us generate the distribution of the sum of 4 numbers uniformly distributed over the intervals from 1 to 1, 1 to 2, 1 to 3 and 1 to 4 respectively:

uniform a = freqs [(1.0 / fromInteger a, i) | i <- [1..a]]

sumDist = foldM (\s a -> do da <- uniform a
                            return (s + da))
                0 [1..4]

Let us create a parser that parses 5 numbers separated by space:

fiveNumbers = replicateM 5 ((zeroOrMore (char ' ')) >> readNumber)

The functions that end in an underscore and return m () deserve special attention. They seem not to perform any computations (if they did, the computations would have a more meaningful result than (), right?); these functions should be used only with monads that have effects. These effects include not only the side effects of the IO monad; the presence of effects should be interpreted as the sensibility of the >> operator and of do without <- (the sensibility of sequences like do foo; bar; baz). It seems this property is impossible to express formally, but one can say that the List and Maybe monads don't possess it, whereas IO, State and Parser do. For example, the sequence do [1, 2, 3]; [5, 6] does not make sense: its result is [5, 6, 5, 6, 5, 6], but the value [1, 2, 3] was just thrown away and only its length was used. It's hard to imagine a situation where such use of the List monad would be intended; thus, in this monad the functions mapM_, sequence_, replicateM_ and their kin are useless. For the Maybe monad, analogously, these functions will only use the presence or absence of a value but not the value itself; it's hard to imagine a case where using Maybe in such a situation would be preferable to a simple Bool.

Now let us consider a less trivial example where we'll create our own monad-polymorphic combinator and touch on the topic of monad transformers. Actually, in most cases monad-polymorphic combinators either implement an 'imperative' traversal of some structure or use the monad's implementation of fail for custom error handling. For instance, the interpreter monad of some language may implement fail by printing the error stack trace. In this case it makes sense to implement potentially failing operations as monad-polymorphic ones and use fail to report errors.
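For instance, here is a sketch of such a potentially failing, monad-polymorphic operation (the Monad class of this article's vintage still carries fail, so this works in any monad):

safeDiv :: Monad m => Int -> Int -> m Int   -- reports errors via the ambient monad's fail
safeDiv _ 0 = fail "division by zero"
safeDiv a b = return (a `div` b)

-- safeDiv 10 2 :: Maybe Int == Just 5
-- safeDiv 10 0 :: Maybe Int == Nothing (Maybe's fail discards the message)
-- safeDiv 10 0 :: IO Int    -- raises an IOError carrying the message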

An example: SAX
Assume that we are writing a program for processing some tree structures: for example, XML documents. Such programs often use tree-traversing operations: printing the tree to the screen, serialization to a stream, counting the nodes satisfying a predicate, getting a list of all nodes or computing an aggregate function over them. Most probably we'll need a function that traverses the tree and performs some action on each node. However, if the action is given only the node content, it won't get any information about the tree structure, so it's better to use the SAX model (Simple API for XML, supported by most contemporary XML parsers) and feed the handler with subtree entry and exit events.
-- Tree datatype
data Node = Node { tag :: String, attributes :: [(String, String)] }
data Tree = Tree Node [Tree]
data NodeEvent = Enter Node | Leave Node

Then the traversal function will look like walk :: (NodeEvent -> action) -> Tree -> result. But in a typical SAX implementation the handler doesn't return anything, and all the handling and computation work falls on the shoulders of the handler. We'll follow this way: walk :: (NodeEvent -> actionWithoutResult) -> Tree -> noResult. What can be used as an action? Of course, a pure function won't fit, because its result wouldn't be included in the answer anyway and would be dismissed. It makes sense to use a monadic action in an effectful monad, like IO, State etc.

Thus, we get walk :: Monad m => (NodeEvent -> m ()) -> Tree -> m (), which traverses the tree inside an effectful monad m, performing a monadic action for each entry or exit event. Let us proceed to the implementation:
walk :: (Monad m) => (NodeEvent -> m a) -> Tree -> m ()
walk f (Tree n ts) = do f (Enter n)
                        mapM_ (walk f) ts
                        f (Leave n)
                        return ()

Now let us use this function to implement indented pretty-printing of the tree as XML. Printing indented text has two aspects:

- There are side effects, i.e. we'll have to use the IO monad
- There is mutable state being passed around, namely the current indent level

The first aspect forces us to use the IO monad, while the second one makes us think of State, but neither monad fits on its own. Let us implement our own simple monad, the monad of 'indented output':
newtype IndentIO a = IndentIO { runWithIndent :: Int -> IO (Int, a) }

instance Monad IndentIO where
    return a = IndentIO $ \i -> return (i, a)
    (IndentIO r) >>= f = IndentIO r'
      where r' i = do (i', a) <- r i
                      runWithIndent (f a) i'

In this monad a value is a function that receives the current indent level, performs an action with side effects and returns the new indent level. The implementation is very similar to the implementation of State, but the passing of state happens inside the IO monad. Here are some functions that we'll definitely need:
-- Performs an action without changing the indent
justIO action = IndentIO $ \i -> do action ; return (i, ())

-- Increases the indent without performing any action
indentMore = IndentIO $ \i -> return (i+4, ())

-- Decreases the indent without performing any action
indentLess = IndentIO $ \i -> return (i-4, ())

-- Returns the current indent without performing any action
getIndent = IndentIO $ \i -> return (i, i)

-- Prints a line with the current indent
printIndent s = do i <- getIndent
                   justIO $ putStrLn (replicate i ' ' ++ s)

Now we can use these functions to solve the original problem:


-- Generates a string of the form <tag atr1="v1" atr2="v2">
showEnter (Node tag ats) =
    "<" ++ tag ++ concat [" " ++ n ++ "=\"" ++ v ++ "\"" | (n, v) <- ats] ++ ">"

-- Generates a string of the form </tag>
showLeave (Node tag _) = "</" ++ tag ++ ">"

printTree = walk p
  where p (Enter node) = do printIndent (showEnter node)
                            indentMore
        p (Leave node) = do indentLess
                            printIndent (showLeave node)

This trick, namely combining several monads into one, is needed rather often, and it's inconvenient to write a new monad for each such case. This problem is addressed by monad transformers, but that topic is beyond the scope of this text. However, the IndentIO example is a good illustration of how monad transformers work and is actually a special case of them, specialized to two particular monads. You can get more familiar with monad transformers in [B.7] and [B.11]. The walk function and the State monad can also be used to count the number of nodes satisfying a predicate:
countNodes p tree = execState (walk (\e -> when (counts e) (modify (1+))) tree) 0
  where counts (Enter n) = p n
        counts _         = False

In this code the handler is a function that increases the state variable by 1 when it enters a node satisfying the predicate. The result of the traversal is a value of type State Int (), i.e., it is not yet the answer but just a function ready to compute it when passed an initial value. This is done by calling execState with a second argument of 0 (start counting from zero).
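The execState and modify helpers used here come from the standard State library; in terms of the State type defined earlier, they are small sketches:

execState :: State s a -> s -> s      -- run a stateful computation only for its final state
execState m s0 = snd (runState m s0)

modify :: (s -> s) -> State s ()      -- apply a function to the state
modify f = get >>= \s -> put (f s)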

Conclusion
We've looked at several 'facets' of monads: we derived the concept itself as a generalization of some imperative idioms, we saw how some non-trivial practical problems are formulated and solved in terms of monads, and we also saw the usefulness of the monad abstraction and which operations may exploit it. As we progressed, we saw that monads are most commonly used for two different purposes: structuring control flow and describing imperative effectful computations (IO, State, IndentIO), and structuring data flow (Maybe, List, Dist). Some monads belong to both classes (Parser). Monad-polymorphic operations are almost always used with monads of the first class. We've also touched the topic of monad transformers, composite monads that combine the properties of simpler ones, but we didn't study them thoroughly. We've also skipped another interesting topic: the theoretical grounding of the concept of monads and their formulation in terms of category theory. This topic is especially interesting because monads were initially discovered in category theory, and only later did Philip Wadler apply them to programming.

I hope that the text has succeeded in its intent to form an intuitive understanding of how monads work and a 'feeling' for them, and that you got interested in studying topics omitted in this article.

Acknowledgements
I would like to thank Denis Moskvin, Ivan Veselov and Oleg Tsarev for valuable comments and suggestions on early drafts of this text, and Julia Astakhova for those on later versions, and for her support and patience. People from the #haskell IRC channel also gave valuable comments on monad-polymorphic functions, namely dons, bd_ and Saizan. Ivan Tarasov, Maxim Taldykin and Artem Shalkhakov helped find some errors in the published version. quicksilver, ski, ddarius and roconnor pointed out that my section about the IO monad was incorrect, completely changed my understanding of how it works and helped develop a more correct version of that section.

References
A. Basics of Haskell
1. http://darcs.haskell.org/yaht/yaht.pdf - Yet Another Haskell Tutorial, one of the easiest yet largest tutorials
2. http://www.haskell.org/tutorial/ - A Gentle Introduction to Haskell
3. http://www.haskell.org/haskellwiki/Tutorials - A list of Haskell tutorials

B. Other monad tutorials
1. http://citeseer.ist.psu.edu/wadler95monads.html - Philip Wadler's classical article on monads that fired up interest in them
2. http://darcs.haskell.org/yaht/yaht.pdf - Contains a chapter on monads, mostly ones similar to State
3. http://www.haskell.org/haskellwiki/Monad - An article on the Haskell wiki
4. http://www.haskell.org/haskellwiki/Tutorials#Using_monads - A list of monad tutorials
5. http://www.haskell.org/haskellwiki/Monad_tutorials_timeline - A timeline of monad tutorials with good annotations
6. http://www.haskell.org/all_about_monads/html/index.html - A big and thorough tutorial describing all the standard monads and monad transformers
7. http://book.realworldhaskell.org/beta/monads.html , http://book.realworldhaskell.org/beta/monadcase.html , http://book.realworldhaskell.org/beta/monadtrans.html - Chapters of the 'Real World Haskell' book about monads. Very detailed and with practical examples. Mostly about Maybe, State and IO.
8. http://www.haskell.org/haskellwiki/Monads_as_containers
9. http://research.microsoft.com/~simonpj/papers/marktoberdorf/ - Simon Peyton Jones' article 'Tackling the Awkward Squad', which brilliantly explains the IO monad
10. http://members.chello.nl/hjgtuyl/tourdemonad.html - A thorough overview of everything monad-related in Haskell's standard library
11. http://spbhug.folding-maps.org/wiki/MonadTransformers - Mikhail Mitrofanov's presentation about monad transformers (in Russian)

C. Scientific articles on monads
1. http://books.google.com/books?hl=en&lr=&id=AiMwYZsTGkC&oi=fnd&pg=PA113&ots=prga1Pri15&sig=3Z5qGXfwK1AWI6RKZ5Jc2ubyeTg - Philip Wadler, 'How to replace failure by a list of successes'
2. http://okmij.org/ftp/Computation/monads.html - Oleg Kiselyov's articles on monads, including the monad of statistical experiments (Monte Carlo method) and the monad of logical inference
3. http://www.cs.nott.ac.uk/~gmh//monparsing.ps - Graham Hutton, Erik Meijer - Monadic Parser Combinators
4. http://www.randomhacks.net/darcs/probability-monads/probability-monads.pdf - Eric Kidd, 'Build your own probability monads' - an article with several interesting variations on the topic of probability monads
5. http://luqui.org/blog/archives/2008/03/29/io-monad-the-continuation-presentation/ - A blog post by Luke Palmer that explains how the IO monad might be implemented with continuations

D. Other
1. http://www.amazon.com/Expert-F-Experts-Voice-Net/dp/1590598504 - The book 'Expert F#', also a brilliant introduction to functional programming in general