Write Yourself a Scheme in 48 Hours

A Haskell Tutorial

By Jonathan Tang

Download in .tar.gz format
Download in .zip format

Contents

  1. Overview
  2. First Steps: Compiling and running
  3. Parsing
    1. A Simple Parser
    2. Whitespace
    3. Literal Numbers and Strings: Return Values
    4. Lists, Dotted Lists, and Quoted Data: Recursive Parsers
  4. Evaluation, Part 1
    1. Displaying Values: Show and Typeclasses
    2. Evaluating Primitive Values: Pattern Matching
    3. Evaluating Primitive Functions: First-class Functions
  5. Intermezzo: Error Checking & Exceptions
  6. Evaluation, Part 2
    1. Conditionals and Additional Primitives: Pattern Matching 2
    2. Equal? and Weak Typing: Heterogenous Lists
  7. Building a REPL: Basic I/O
  8. Adding Variables and Assignment: Mutable State in Haskell
  9. Defining Scheme Functions: Closures and Environments
  10. Creating IO Primitives: File I/O
  11. Towards a Standard Library: Fold and Unfold
  12. Conclusion & Further Resources

4.1 Beginning the Evaluator

Currently, we've just been printing out whether or not we recognize the given program fragment. We're about to take the first steps towards a working Scheme interpreter: assigning values to program fragments. We'll be starting with baby steps, but fairly soon you'll be progressing to doing working computations.

Let's start by telling Haskell how to print out a string representation of the various possible LispVals:

showVal :: LispVal -> String
showVal (String contents) = "\"" ++ contents ++ "\""
showVal (Atom name) = name
showVal (Number contents) = show contents
showVal (Bool True) = "#t"
showVal (Bool False) = "#f"
This is our first real introduction to pattern matching. Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables. Any constructor can appear in a pattern; that pattern matches a value if the tag is the same as the value's tag and all subpatterns match their corresponding components. Patterns can be nested arbitrarily deep, with matching proceeding in an inside -> outside, left -> right order. The clauses of a function definition are tried in textual order, until one of the patterns matches. If this is confusing, you'll see some examples of deeply-nested patterns when we get further into the evaluator.

For now, you only need to know that each clause of the above definition matches one of the constructors of LispVal, and the right-hand side tells what to do for a value of that constructor.

The List and DottedList clauses work similarly, but we need to define a helper function unwordsList to convert the contained list into a string:

showVal (List contents) = "(" ++ unwordsList contents ++ ")"
showVal (DottedList head tail) = "(" ++ unwordsList head ++ " . " ++ showVal tail ++ ")"
The unwordsList function works like the Haskell Prelude's
unwords function, which glues together a list of words with spaces. Since we're dealing with a list of LispVals instead of words, we define a function that first converts the LispVals into their string representations and then applies unwords to it:
unwordsList :: [LispVal] -> String
unwordsList = unwords . map showVal
Our definition of unwordsList doesn't include any arguments. This is an example of point-free style: writing definitions purely in terms of function composition and partial application, without regard to individual values or "points". Instead, we define it as the composition of a couple built-in functions. First, we partially-apply map to showVal, which creates a function that takes a list of LispVals and returns a list of their string representations. Haskell functions are curried: this means that a function of two arguments, like map, is really a function that returns a function of one argument. As a result, if you supply only a single argument, you get back a function one argument that you can pass around, compose, and apply later. In this case, we compose it with unwords: map showVal converts a list of LispVals to a list of their string representations, and then unwords joins the result together with spaces.

We used the function show above. This standard Haskell function lets you convert any type that's an instance of the class Show into a string. We'd like to be able to do the same with LispVal, so we make it into a member of the class Show, defining its "show" method as showVal:

instance Show LispVal where show = showVal
A full treatment of typeclasses is beyond the scope of this tutorial; you can find more information in other tutorials and the Haskell 98 report.

Let's try things out by changing our readExpr function so it returns the string representation of the value actually parsed, instead of just "Found value":

readExpr input = case parse parseExpr "lisp" input of
    Left err -> "No match: " ++ show err
    Right val -> "Found " ++ show val
And compile and run...
jdtang@debian:~/haskell_tutorial/code$ ghc -package parsec -o parser listing4.1.hs
jdtang@debian:~/haskell_tutorial/code$ ./parser "(1 2 2)"
Found (1 2 2)
jdtang@debian:~/haskell_tutorial/code$ ./parser "'(1 3 (\"this\" \"one\"))"
Found (quote (1 3 ("this" "one")))

4.2 Beginnings of an evaluator: Primitives

Now, we start with the beginnings of an evaluator. The purpose of an evaluator is to map some "code" data type into some "data" data type, the result of the evaluation. In Lisp, the data types for both code and data are the same, so our evaluator will return a LispVal. Other languages often have more complicated code structures, with a variety of syntactic forms.

Evaluating numbers, strings, booleans, and quoted lists is fairly simple: return the datum itself.

eval :: LispVal -> LispVal
eval val@(String _) = val
eval val@(Number _) = val
eval val@(Bool _) = val
eval (List [Atom "quote", val]) = val
This introduces a new type of pattern. The notation val@(String _) matches against any LispVal that's a string and then binds val to the whole LispVal, and not just the contents of the String constructor. The result has type LispVal instead of type String. The underbar is the "don't care" variable, matching any value yet not binding it to a variable. It can be used in any pattern, but is most useful with @-patterns (where you bind the variable to the whole pattern) and with simple constructor-tests where you're just interested in the tag of the constructor.

The last clause is our first introduction to nested patterns. The type of data contained by List is [LispVal], a list of LispVals. We match that against the specific two-element list [Atom "quote", val], a list where the first element is the symbol "quote" and the second element can be anything. Then we return that second element.

Let's integrate eval into our existing code. Start by changing readExpr back so it returns the expression instead of a string representation of the expression:

readExpr :: String -> LispVal
readExpr input = case parse parseExpr "lisp" input of
    Left err -> String $ "No match: " ++ show err
    Right val -> val
And then change our main function to read an expression, evaluate it, convert it to a string, and print it out. Now that we know about the >>= monad sequencing operator and the function composition operator, let's use them to make this a bit more concise:

main :: IO ()
main = getArgs >>= putStrLn . show . eval . readExpr . (!! 0)

Here, we take the result of the getArgs action (a list of strings) and pass it into the composition of:
  1. Take the first value ((!! 0)). This notation is known as an operator section: it's telling the compiler to partially-apply the list indexing operator to 0, giving you back a function that takes the first element of whatever list it's passed.
  2. Parse it (readExpr)
  3. Evaluate it (eval)
  4. Convert it to a string (show)
  5. Print it (putStrLn)

Compile and run the code the normal way:

debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o eval listing4.2.hs
debian:/home/jdtang/haskell_tutorial/code# ./eval "'atom" 
atom
debian:/home/jdtang/haskell_tutorial/code# ./eval 2
2
debian:/home/jdtang/haskell_tutorial/code# ./eval "\"a string\""
"a string"
debian:/home/jdtang/haskell_tutorial/code# ./eval "(+ 2 2)"

Fail: listing6.hs:83: Non-exhaustive patterns in function eval
We still can't do all that much useful with the program (witness the failed (+ 2 2) call), but the basic skeleton is in place. Soon, we'll be extending it with some functions to make it useful.

4.3 Adding basic primitives

Next, we'll improve our Scheme so we can use it as a simple calculator. It's still not yet a "programming language", but it's getting close.

Begin by adding a clause to eval to handle function application. Remember that all clauses of a function definition must be placed together and are evaluated in textual order, so this should go after the other eval clauses:

eval (List (Atom func : args)) = apply func $ map eval args
This is another nested pattern, but this time we match against the cons operator ":" instead of a literal list. Lists in Haskell are really syntactic sugar for a change of cons applications and the empty list: [1, 2, 3, 4] = 1:(2:(3:(4:[]))). By pattern-matching against cons itself instead of a literal list, we're saying "give me the rest of the list" instead of "give me the second element of the list". For example, if we passed (+ 2 2) to eval, func would be bound to "+" and args would be bound to [Number 2, Number 2].

The rest of the clause consists of a couple functions we've seen before and one we haven't defined yet. We have to recursively evaluate each argument, so we map eval over the args. This is what lets us write compound expressions like (+ 2 (- 3 1) (* 5 4)). Then we take the resulting list of evaluated arguments, and pass it and the original function to apply:

apply :: String -> [LispVal] -> LispVal
apply func args = maybe (Bool False) ($ args) $ lookup func primitives
The built-in function
lookup looks up a key (its first argument) in a list of pairs. However, lookup will fail if no pair in the list contains the matching key. To express this, it returns an instance of the built-in type Maybe. We use the function maybe to specify what to do in case of either success or failure. If the function isn't found, we return a Bool False value, equivalent to #f (we'll add more robust error-checking later). If it is found, we apply it to the arguments using ($ args), an operator section of the function application operator.

Next, we define the list of primitives that we support:

primitives :: [(String, [LispVal] -> LispVal)]
primitives = [("+", numericBinop (+)),
              ("-", numericBinop (-)),
              ("*", numericBinop (*)),
              ("/", numericBinop div),
              ("mod", numericBinop mod),
              ("quotient", numericBinop quot),
              ("remainder", numericBinop rem)]
Look at the type of primitives. It is a list of pairs, just like lookup expects, but the values of the pairs are functions from [LispVal] to LispVal. In Haskell, you can easily store functions in other data structures, though the functions must all have the same type.

Also, the functions that we store are themselves the result of a function, numericBinop, which we haven't defined yet. This takes a primitive Haskell function (often an operator section) and wraps it with code to unpack an argument list, apply the function to it, and wrap the result up in our Number constructor.

numericBinop :: (Integer -> Integer -> Integer) -> [LispVal] -> LispVal
numericBinop op params = Number $ foldl1 op $ map unpackNum params

unpackNum :: LispVal -> Integer
unpackNum (Number n) = n
unpackNum (String n) = let parsed = reads n in 
                          if null parsed 
                            then 0
                            else fst $ parsed !! 0
unpackNum (List [n]) = unpackNum n
unpackNum _ = 0
As with R5RS Scheme, we don't limit ourselves to only two arguments. Our numeric operations can work on a list of any length, so (+ 2 3 4) = 2 + 3 + 4, and (- 15 5 3 2) = 15 - 5 - 3 - 2. We use the built-in function foldl1 to do this. It essentially changes every cons operator in the list to the binary function we supply, op.

Unlike R5RS Scheme, we're implementing a form of weak typing. That means that if a value can be interpreted as a number (like the string "2"), we'll use it as one, even if it's tagged as a string. We do this by adding a couple extra clauses to unpackNum. If we're unpacking a string, attempt to parse it with Haskell's built-in reads function, which returns a list of pairs of (parsed value, original value).

For lists, we pattern-match against the one-element list and try to unpack that. Anything else falls through to the next case.

If we can't parse the number, for any reason, we'll return 0 for now. We'll fix this shortly so that it signals an error.

Compile and run this the normal way. Note how we get nested expressions "for free" because we call eval on each of the arguments of a function:

debian:/home/jdtang/haskell_tutorial/code# ghc -package parsec -o eval listing7.hs
debian:/home/jdtang/haskell_tutorial/code# ./eval "(+ 2 2)" 4
debian:/home/jdtang/haskell_tutorial/code# ./eval "(+ 2 (-4 1))"
2
debian:/home/jdtang/haskell_tutorial/code# ./eval "(+ 2 (- 4 1))"
5
debian:/home/jdtang/haskell_tutorial/code# ./eval "(- (+ 4 6 3) 3 5 2)"
3

Exercises

  1. Add primitives to perform the various type-testing functions of R5RS: symbol?, string?, number?, etc.
  2. Change unpackNum so that it always returns 0 if the value is not a number, even if it's a string or list that could be parsed as a number.
  3. Add the symbol-handling functions from R5RS. A symbol is what we've been calling an Atom in our data constructors
Next: Error Checking & Exceptions