Write Yourself a Scheme in 48 Hours

A Haskell Tutorial

By Jonathan Tang

Download in .tar.gz format
Download in .zip format

Contents

  1. Overview
  2. First Steps: Compiling and running
  3. Parsing
    1. A Simple Parser
    2. Whitespace
    3. Literal Numbers and Strings: Return Values
    4. Lists, Dotted Lists, and Quoted Data: Recursive Parsers
  4. Evaluation, Part 1
    1. Displaying Values: Show and Typeclasses
    2. Evaluating Primitive Values: Pattern Matching
    3. Evaluating Primitive Functions: First-class Functions
  5. Intermezzo: Error Checking & Exceptions
  6. Evaluation, Part 2
    1. Additional Primitives: Partial Application
    2. Conditionals: Pattern Matching 2
    3. List Primitives: car, cdr, and cons
    4. Equal? and Weak Typing: Heterogenous Lists
  7. Building a REPL: Basic I/O
  8. Adding Variables and Assignment: Mutable State in Haskell
  9. Defining Scheme Functions: Closures and Environments
  10. Creating IO Primitives: File I/O
  11. Towards a Standard Library: Fold and Unfold
  12. Conclusion & Further Resources

2. First steps

First, you'll need to install GHC. On Linux, it's often pre-installed or available via apt-get or yum. It's also downloadable from http://www.haskell.org/ghc/. A binary package is probably easiest, unless you really know what you're doing. It should download and install like any other software package. This tutorial was developed on Linux, but everything should also work on Windows as long as you know how to use the DOS command line.

For UNIX (or Windows Emacs) users, there is a pretty good Emacs mode, including syntax highlighting and automatic indentation. Windows users can use Notepad or any other text editor: Haskell syntax is fairly Notepad-friendly, though you have to be careful with the indentation.

Now, it's time for your first Haskell program. This program will read a name off the command line and then print a greeting. Create a file ending in ".hs" and type the following text:

module Main where
import System.Environment

main :: IO ()
main = do args <- getArgs
          putStrLn ("Hello, " ++ args !! 0)

Let's go through this code. The first two lines specify that we'll be creating a module named Main that imports the System module. Every Haskell program begins with an action called "main" in a module named "Main". That module may import others, but it must be present for the compiler to generate an executable file. Haskell is case-sensitive: module names are always capitalized, declarations always uncapitalized.

The line main :: IO () is a type declaration: it says that the action "main" has type "IO ()". Type declarations in Haskell are optional: the compiler figures them out automatically, and only complains if they differ from what you've specified. In this tutorial, I specify the types of all declarations explicitly, for clarity. If you're following along at home, you may want to omit them, because it's less to change as we build our program.

The IO type is an instance of something called a monad, which is a scary name for a not-so-scary concept. Basically, a monad is a way of saying "there's some extra information attached to this value, which most functions don't need to worry about". In this example, the "extra information" is the fact that this action performs IO, and the basic value is nothing, represented as "()". Monadic values are often called "actions", because the easiest way to think about the IO monad is a sequencing of actions that each might affect the outside world.

Haskell is a declarative language: instead of giving the computer a sequence of instructions to carry out, you give it a collection of definitions that tell it how to perform every function it might need. These definitions use various compositions of actions and functions The compiler figures out an execution path that puts everything together.

To write one of these definitions, you set it up as an equation. The left hand side defines a name, and optionally one or more patterns (explained later) that will bind variables. The right hand side defines some composition of other definitions that tells the computer what to do when it encounters the name. These equations behave just like ordinary equations in algebra: you can always substitute the right hand side for the left within the text of the program, and it'll evaluate to the same value. Called "referential transparency", this property makes it significantly easier to reason about Haskell programs than other languages.

How will we define our main action? We know that it must be an IO () action, and that we want it to read the command line args and print some output. There are two ways to create an IO action:

  1. Lift an ordinary value into the IO monad, using the return function.
  2. Combine two existing IO actions.
Since we want to do two things, we'll take the second approach. The built-in action getArgs reads the command-line arguments and stores them in a list of strings. The built-in function putStrLn takes a string and writes it to the console.

To combine them, we use a do-block. A do-block consists of a series of lines, all lined up with the the first non-whitespace character after the do. Each line can have one of two forms:

  1. name <- action
  2. action
The first form takes the result of the action and binds it to name. For example, if the type of the action is IO [String] (an IO action returning a list of strings, as with getArgs), then "name" will be bound to the list of strings returned. The second form just executes the action, sequencing it with the previous line through the >> (pronounced "bind") operator. This operator has different semantics for each monad: in the case of the IO monad, it executes the actions sequentially, performing whatever external side-effects that result. Because the semantics of this composition depend upon the particular monad used, you cannot mix actions of different monad types in the same do-block.

Of course, these actions may themselves be the result of functions or complicated expressions. In this example, we first take index 0 of the argument list (args !! 0), concatenate it onto the end of the string "Hello, " ("Hello, " ++), and finally pass that to putStrLn for IO sequencing. Strings are lists of characters in Haskell, so you can use any of the list functions and operators on them. A full table of the standard operators and their precedences follows:
Operator(s) Precedence Associativity Description
. 9 Right Function composition
!! 9 Left List indexing
^, ^^, ** 8 Right Exponentiation (integer, fractional, and floating-point)
*, / 7 Left Multiplication, Division
+, - 6 Left Addition, Subtraction
: 5 Right Cons (list construction)
++ 5 Right List Concatenation
`elem`, `notElem` 4 Left List Membership
==, /=, <, <=, >=,> 4 Left Equals, Not-equals, and other relation operators
&& 3 Right Logical And
|| 2 Right Logical Or
>>, >>= 1 Left Monadic Bind, Monadic Bind (piping value to next function)
=<< 1 Right Reverse Monadic Bind (same as above, but arguments reversed)
$ 0 Right Infix Function Application (same as "f x",
but right-associative instead of left)

To compile and run the program, try something like this:

debian:/home/jdtang/haskell_tutorial/code# ghc -o hello_you listing2.hs
debian:/home/jdtang/haskell_tutorial/code# ./hello_you Jonathan
Hello, Jonathan
The -o option specifies the name of the executable you want to create, and then you just specify the name of the Haskell source file.

Exercises

  1. Change the program so it reads two arguments from the command line, and prints out a message using both of them
  2. Change the program so it performs a simple arithmetic operation on the two arguments and prints out the result. You can use read to convert a string to a number, and show to convert a number back into a string. Play around with different operations.
  3. getLine is an IO action that reads a line from the console and returns it as a string. Change the program so it prompts for a name, reads the name, and then prints that instead of the command line value

Next - Parsing