A Crash Course In Algebraic Types

io7m | single-page | multi-page | archive (zip, signature)

1. Overview
2. Why?
3. What?
4. Shapes
- 4.1. Declarations
- 4.2. Matching
5. The Type Zoo
6. Arithmetic expression interpreter
- 6.1. Specification
- 6.2. Implementation

Overview

This document attempts to describe so-called algebraic data types, and the language construct - pattern matching - used to manipulate them in functional languages.

The document assumes very little in the way of prior knowledge of type systems and attempts to be light on mathematics and theory. The document is aimed at people with a passing interest in static type systems, and also at authors of new programming languages. It is also likely of interest to users of existing statically typed languages who are curious as to what newer languages have to offer.

The example code is written in both Java and Haskell, but the types described are present in many other systems and usually only differ in terms of syntax as opposed to theory.

Why?

2.1

Overview

In this example, a small arithmetic expression interpreter is developed. This example highlights the problems caused by the lack of algebraic data types and pattern matching, and also highlights the various complicated language features and methodologies that arise to deal with a lack of them.

2.2

Specification

An arithmetic expression is one of the following:

2.2.1. Arithmetic expression specification

An integer constant.
The sum of two arithmetic expressions.
The product of two arithmetic expressions.
The difference of two arithmetic expressions.

Evaluating an arithmetic expression requires the following steps:

2.2.2. Evaluation specification

If the expression is a constant, return the constant.
If the expression is an addition, evaluate the two subexpressions and then return the sum of their results.
If the expression is a multiplication, evaluate the two subexpressions and then return the product of their results.
If the expression is a subtraction, evaluate the two subexpressions and then return the difference of their results.

2.3

Implementation

The simplest (and possibly only) way to represent the type of an arithmetic expression in Java is as a set of classes extending an abstract base class. The ArithmeticExpression type demonstrates this:

2.3.1. ArithmeticExpression

package com.io7m.example.ccatpm;

abstract class ArithmeticExpression
{
  /**
   * An integer constant.
   */

  public final static class ConstantExpression extends ArithmeticExpression
  {
    private final int value;

    @SuppressWarnings("synthetic-access") ConstantExpression(
      final int value)
    {
      super(ExpressionType.EXP_CONSTANT);
      this.value = value;
    }

    public final int getValue()
    {
      return this.value;
    }
  }

  static enum ExpressionType
  {
    EXP_CONSTANT,
    EXP_PLUS,
    EXP_MULTIPLY,
    EXP_SUBTRACT
  }

  /**
   * The product of two arithmetic expressions.
   */

  public final static class MultiplyExpression extends ArithmeticExpression
  {
    private final ArithmeticExpression e_left;
    private final ArithmeticExpression e_right;

    @SuppressWarnings("synthetic-access") MultiplyExpression(
      final ArithmeticExpression e_left,
      final ArithmeticExpression e_right)
    {
      super(ExpressionType.EXP_MULTIPLY);
      this.e_left = e_left;
      this.e_right = e_right;
    }

    public final ArithmeticExpression getLeft()
    {
      return this.e_left;
    }

    public final ArithmeticExpression getRight()
    {
      return this.e_right;
    }
  }

  /**
   * The sum of two arithmetic expressions.
   */

  public final static class PlusExpression extends ArithmeticExpression
  {
    private final ArithmeticExpression e_left;
    private final ArithmeticExpression e_right;

    @SuppressWarnings("synthetic-access") PlusExpression(
      final ArithmeticExpression e_left,
      final ArithmeticExpression e_right)
    {
      super(ExpressionType.EXP_PLUS);
      this.e_left = e_left;
      this.e_right = e_right;
    }

    public final ArithmeticExpression getLeft()
    {
      return this.e_left;
    }

    public final ArithmeticExpression getRight()
    {
      return this.e_right;
    }
  }

  /**
   * The difference of two arithmetic expressions.
   */

  public final static class SubtractExpression extends ArithmeticExpression
  {
    private final ArithmeticExpression e_left;
    private final ArithmeticExpression e_right;

    @SuppressWarnings("synthetic-access") SubtractExpression(
      final ArithmeticExpression e_left,
      final ArithmeticExpression e_right)
    {
      super(ExpressionType.EXP_SUBTRACT);
      this.e_left = e_left;
      this.e_right = e_right;
    }

    public final ArithmeticExpression getLeft()
    {
      return this.e_left;
    }

    public final ArithmeticExpression getRight()
    {
      return this.e_right;
    }
  }

  private final ExpressionType type;

  private ArithmeticExpression(
    final ExpressionType type)
  {
    this.type = type;
  }

  public final ExpressionType getType()
  {
    return this.type;
  }
}

This example shows a number of immediate problems:

2.3.2. Implementation issues

There is a fixed set of types of expressions that will be evaluated, but the language cannot know in advance how many classes will really extend the ArithmeticExpression class. Essentially, the type wants to be "closed", but the language requires that it be "open". The best that can be done is to make the single constructor of the class private, and the class itself have default visibility (in other words, only visible to classes in the same package).
Each subclass of ArithmeticExpression is required to carry around a value of type ExpressionType in order to distinguish the values of the classes later. It's possible to use the instanceof keyword to distinguish classes, but this obviously cannot be used in switch statement, meaning that the compiler cannot help the programmer to ensure that all cases are covered. It's also possible for the programmer that specifies the value of ExpressionType for each class to accidentally re-use values, leading to hard-to-locate bugs later.
Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
The definitions are annoyingly verbose: The code is already over 100 lines from a four-line specification. It's not obvious, without a lot of tedious scanning, that the type even represents the original specification at all.
It's possible to create circular structures (by accidentally assigning the wrong values in the constructors of arithmetic expressions). Functions written to work with expressions that don't assume that expressions can be circular (as most won't) will run forever.

The evaluation function is as follows:

2.3.3. Interpreter

package com.io7m.example.ccatpm;

import com.io7m.example.ccatpm.ArithmeticExpression.ConstantExpression;
import com.io7m.example.ccatpm.ArithmeticExpression.MultiplyExpression;
import com.io7m.example.ccatpm.ArithmeticExpression.PlusExpression;
import com.io7m.example.ccatpm.ArithmeticExpression.SubtractExpression;

public final class Interpreter
{
  public static int run(
    final ArithmeticExpression expr)
  {
    switch (expr.getType()) {
      case EXP_CONSTANT:
      {
        final ConstantExpression actual = (ConstantExpression) expr;
        return actual.getValue();
      }
      case EXP_MULTIPLY:
      {
        final MultiplyExpression actual = (MultiplyExpression) expr;
        final int left = Interpreter.run(actual.getLeft());
        final int right = Interpreter.run(actual.getRight());
        return left * right;
      }
      case EXP_PLUS:
      {
        final PlusExpression actual = (PlusExpression) expr;
        final int left = Interpreter.run(actual.getLeft());
        final int right = Interpreter.run(actual.getRight());
        return left + right;
      }
      case EXP_SUBTRACT:
      {
        final SubtractExpression actual = (SubtractExpression) expr;
        final int left = Interpreter.run(actual.getLeft());
        final int right = Interpreter.run(actual.getRight());
        return left - right;
      }
      default:
        throw new AssertionError("unreachable!");
    }
  }

  private Interpreter()
  {

  }
}

Again, there are obvious problems:

2.3.4. Implementation issues

The code contains one dangerous cast per expression type. The language cannot be informed of the formal link between the enumeration value and the associated class.
The code assumes expressions, or the contents of expressions, won't be null.
The Java compiler can't determine that all cases of the enumeration type are present. It requires an unreachable default clause.

2.4

Visitor Implementation

A second attempt at the implementation, using the visitor pattern, is even larger than the original, and borders on obfuscated with regards to the flow of execution. First, the interface that expressions must implement:

2.4.1. Expression

package com.io7m.example.ccatpm.visitor;

interface Expression
{
  int accept(ExpressionVisitor visitor);
}

Then, the type of binary operations [0]:

2.4.2. Binary

package com.io7m.example.ccatpm.visitor;

abstract class Binary
{
  private final Expression left;
  private final Expression right;

  public Binary(
    final Expression left,
    final Expression right)
  {
    this.left = left;
    this.right = right;
  }

  public final Expression getLeft()
  {
    return this.left;
  }

  public final Expression getRight()
  {
    return this.right;
  }
}

Then, the types of each arithmetic expression case:

2.4.3. Constant

package com.io7m.example.ccatpm.visitor;

final class Constant implements Expression
{
  private final int value;

  public Constant(
    final int value)
  {
    this.value = value;
  }

  public int getValue()
  {
    return this.value;
  }

  @Override public int accept(
    final ExpressionVisitor visitor)
  {
    return visitor.visit(this);
  }
}

2.4.4. Add

package com.io7m.example.ccatpm.visitor;

final class Add extends Binary implements Expression
{
  public Add(
    final Expression left,
    final Expression right)
  {
    super(left, right);
  }

  @Override public int accept(
    final ExpressionVisitor visitor)
  {
    return visitor.visit(this);
  }
}

2.4.5. Multiply

package com.io7m.example.ccatpm.visitor;

final class Multiply extends Binary implements Expression
{
  public Multiply(
    final Expression left,
    final Expression right)
  {
    super(left, right);
  }

  @Override public int accept(
    final ExpressionVisitor visitor)
  {
    return visitor.visit(this);
  }
}

2.4.6. Subtract

package com.io7m.example.ccatpm.visitor;

final class Subtract extends Binary implements Expression
{
  public Subtract(
    final Expression left,
    final Expression right)
  {
    super(left, right);
  }

  @Override public int accept(
    final ExpressionVisitor visitor)
  {
    return visitor.visit(this);
  }
}

Then, the type of arithmetic expression visitors:

2.4.7. ExpressionVisitor

package com.io7m.example.ccatpm.visitor;

interface ExpressionVisitor
{
  int visit(Add add);
  int visit(Constant constant);
  int visit(Multiply multiply);
  int visit(Subtract subtract);
}

Then, finally, the evaluation function:

2.4.8. Interpreter

package com.io7m.example.ccatpm.visitor;

public final class Interpreter
{
  public static int evaluate(
    final Expression expression)
  {
    return expression.accept(new ExpressionVisitor() {
      @Override public int visit(
        final Add add)
      {
        return Interpreter.evaluate(add.getLeft())
          + Interpreter.evaluate(add.getRight());
      }

      @Override public int visit(
        final Constant constant)
      {
        return constant.getValue();
      }

      @Override public int visit(
        final Multiply multiply)
      {
        return Interpreter.evaluate(multiply.getLeft())
          * Interpreter.evaluate(multiply.getRight());
      }

      @Override public int visit(
        final Subtract subtract)
      {
        return Interpreter.evaluate(subtract.getLeft())
          - Interpreter.evaluate(subtract.getRight());
      }
    });
  }
}

The implementation still has problems:

2.4.9. Implementation issues

The flow of execution is obfuscated. The maintainer of this code has to be familiar with the visitor pattern in order to understand it, as opposed to just being familiar with ordinary recursion.
Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
The definitions are even more verbose: The code is almost 150 lines. There is a lot of boilerplate code; each expression type has to implement an identical visit function. In a more complex example (such as the abstract syntax tree of a programming language), this quickly becomes excruciating.
This implementation of the visitor pattern only allows the programmer to write functions that return int. Writing something more general requires even more complicated classes, and possibly even generics.

At the end of this document, an alternative implementation will be given using algebraic types and pattern matching that avoids all of the problems of both implementations, is less than a tenth of the size, and gives strong assurances of safety and correctness.

What?

Algebraic data types are currently not well-known outside of statically typed functional languages such as OCaml, Haskell, Scala, amongst others. The simplest way to think of them appears to be as a sort of enumerated type, where each case of the type can have associated data. Values of the types are inspected via a language construct called pattern matching, which can essentially be thought of as a much more powerful version of the switch or case construct found in languages such as C, Java, Ada, and others. As underwhelming as this sounds, the combination of these types and pattern matching allow for reducing what would be extremely complicated tasks in other languages down to trivial symbol manipulation, and subsume vast parts of the type systems in other languages.

Shapes

4.1. Declarations
4.2. Matching

4.1

Declarations

In keeping with traditional introductory texts on programming languages, the following definitions declare a collection of "shape" types:

4.1.1. Shapes

module Shapes where

data Circle =
  MakeCircle Integer
  deriving Show

data Rectangle =
  MakeRectangle Integer Integer
  deriving Show

data Shape
  = ShapeCircle Circle
  | ShapeRectangle Rectangle
  deriving Show

Each data declaration declares an algebraic data type. The Circle declaration, for example, declares a type Circle with a single data constructor named MakeCircle, which takes a single value of type int as an argument. That is, the programmer calls MakeCircle to create a value of type Circle. It's important to note that the MakeCircle function is the only way to create a value of type Circle - this simple fact alone has far reaching consequences and can be used to express and enforce various program invariants (a point that this document will address later). The definitions also ask Haskell, via the deriving statement, to provide default implementations of the string conversion functions, for the sake of being able to display values in the interpreter for examples in this document - these can be ignored.

The Rectangle declaration is similar, except that its single data constructor MakeRectangle takes two integers as input (representing the width and height of the rectangle).

The last declaration, Shape, declares two data constructors: ShapeCircle and ShapeRectangle. That is, there are two ways to obtain a value of type Shape: either call ShapeCircle with a value of type Circle, or call ShapeRectangle with a value of type Rectangle.

It's possible to see how these declarations work by constructing values in the interpreter and checking their types (the :: notation can be read as "is of type" - "23 :: Integer" means "23 is of type Integer"). The :type command, unsurprisingly, displays the type of the given expression.

4.1.2. Shapes (ghci)

*Shapes> :type MakeCircle 23
MakeCircle 23 :: Circle

*Shapes> :type MakeRectangle 23 11
MakeRectangle 23 11 :: Rectangle

*Shapes> :type ShapeCircle (MakeCircle 23)
ShapeCircle (MakeCircle 23) :: Shape

*Shapes> :type ShapeRectangle (MakeRectangle 23 11)
ShapeRectangle (MakeRectangle 23 11) :: Shape

Note that it's a type error to attempt to pass a Rectangle to the ShapeCircle constructor:

4.1.3. Shapes error (ghci)

*Shapes> :type ShapeCircle (MakeRectangle 23 11)
<interactive>:1:14:
    Couldn't match expected type `Circle' with actual type `Rectangle'
    In the return type of a call of `MakeRectangle'
    In the first argument of `ShapeCircle', namely
      `(MakeRectangle 23 11)'
    In the expression: ShapeCircle (MakeRectangle 23 11)

*Shapes> :type ShapeRectangle (MakeCircle 23)
<interactive>:1:17:
    Couldn't match expected type `Rectangle' with actual type `Circle'
    In the return type of a call of `MakeCircle'
    In the first argument of `ShapeRectangle', namely `(MakeCircle 23)'
    In the expression: ShapeRectangle (MakeCircle 23)

4.2

Matching

Much as values are created via constructors, it's necessary to deconstruct data structures in order to get at the values contained within. This deconstruction is performed via pattern matching. Most languages allow one to, for example, pass an enumerated type into some form of case statement and then perform some action for each one of the individual cases. Pattern matching, however, allows the programmer to perform actions based on the actual structure of the value passed in. As an example, a simple function that takes any value of type Shape and prints the name of the shape [1]:

4.2.1. Shape show

module ShapeShow where

import Shapes

shape_show :: Shape -> IO ()
shape_show s =
  case s of
    ShapeRectangle _ -> print "rectangle"
    ShapeCircle _    -> print "circle"

Unsurprisingly, the case expression attempts to match the input value against each pattern on the left hand side in turn, from top to bottom, stopping at the first pattern that matches (there is no "fall-through" as with switch statements in other languages). This naturally leads to patterns being arranged in decreasing order of specificity (most specific patterns first).

Notice that by pattern matching on a value of type Shape, the programmer learns which constructor was used to construct the value and can perform an action for each. The programmer is also statically prevented from failing to handle a case, to ensure correctness - the compiler warns against patterns that are redundant (unreachable) or that aren't exhaustive [2]:

4.2.2. Shape show (not exhaustive)

module ShapeShowNE where

import Shapes

shape_show_ne :: Shape -> IO ()
shape_show_ne s =
  case s of
    ShapeRectangle _ -> print "rectangle"

4.2.3. Shape show error (not exhaustive)

ShapeShowNE.hs:7:3:
  Warning: Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: ShapeCircle _

4.2.4. Shape show (overlapping)

module ShapeShowOL where

import Shapes

shape_show_ol :: Shape -> IO ()
shape_show_ol s =
  case s of
    ShapeRectangle _ -> print "rectangle"
    ShapeRectangle _ -> print "rectangle"

4.2.5. Shape show error (overlapping)

ShapeShowOL.hs:7:3:
  Warning: Pattern match(es) are overlapped
    In a case alternative: ShapeRectangle _ -> ...

The warning about non-exhaustive cases is extremely important with regards to software maintenance: Add another constructor to a data type and the compiler will immediately inform the programmer of every point in the code that now needs to be modified to work correctly.

A similar function that prints the width of any shape:

4.2.6. Shape width

module ShapeWidth where

import Shapes

shape_width :: Shape -> IO ()
shape_width s =
  case s of
    ShapeRectangle (MakeRectangle width _) -> print width
    ShapeCircle (MakeCircle radius)        -> print (2 * radius)

The names used on the left hand side of the case expression are given by the programmer for use in the expressions on the right hand side. That is, the programmer chooses the names that he/she wishes to give to each field he/she is interested in, at the point of use. The _ or wildcard symbol is used to indicate that the programmer doesn't care about a particular value and therefore doesn't want to provide a name. Wildcards match everything. A silly function that prints "Boo!" regardless of whatever's passed in:

4.2.7. Shape boo!

module ShapeBoo where

import Shapes

shape_boo :: Shape -> IO ()
shape_boo s =
  case s of
    _ -> print "Boo!"

Patterns consist of constants [3], variables, and wildcards. An example of a function that compares shapes for equality with slightly more complex patterns:

4.2.8. Shape equals

module ShapeEquals where

import Shapes

shape_equals :: Shape -> Shape -> Bool
shape_equals s t =
  case (s, t) of
    (ShapeCircle    (MakeCircle r0),       ShapeCircle    (MakeCircle r1))       -> r0 == r1
    (ShapeRectangle (MakeRectangle w0 h0), ShapeRectangle (MakeRectangle w1 h1)) -> (w0 == w1) && (h0 == h1)
    (_, _)                                                                       -> False

The Haskell definition of the function written with pattern matching is almost exactly what one would expect given the typical informal mathematical specification of the function:

4.2.9. Shape equals specification

If both shapes are circles, then both shapes are equal iff their radii are equal.
If both shapes are rectangles, then both shapes are equal iff both their widths and heights are equal.
Otherwise, the shapes are not equal.

The Type Zoo

5.1. Overview
5.2. Unit
5.3. Boolean
5.4. Enumeration
5.5. Option
5.6. Choice
5.7. Pairs
5.8. Naturals (inductive)
5.9. Lists
5.10. Trees
5.11. Hiding
5.12. Arrays

5.1

Overview

As stated previously, the combination of algebraic data types and pattern matching is extremely powerful. Whereas languages that lack them have to keep various type system features completely distinct (in both concept and implementation), the language of algebraic types is expressive enough to support a wide range of structures with little or no "magic". Many structures that require explicit language support in other systems can simply be expressed in terms of algebraic types (possibly with extra syntax support).

5.2

Unit

The Unit type is the simplest possible algebraic type. It has one data constructor that takes no values. In impure languages like OCaml, the Unit type is often given as the return value of functions that are only of interest for their side effects, such as print_string (which is of type string -> unit, and simply prints its argument to standard out and returns unit). In Haskell, the Unit type is called () and has a single data constructor, also called () [4].

5.3

Boolean

The Boolean type is an algebraic type with two data constructors:

5.3.1. Boolean

module Boolean where

data Boolean
  = True
  | False
  deriving Show

This definition of Boolean type allows languages to implement the usual if .. then .. else construct in terms of pattern matching, simplifying the language definition and semantics.

5.4

Enumeration

As may or may not be obvious, it's possible to achieve the normal enumeration types present in other languages. The data constructors of the type are simply declared as not taking any values:

5.4.1. Enumeration

module Enumeration where

data Color
  = Red
  | Blue
  | Green
  | Yellow
  deriving Show

5.5

Option

The Option type is the first polymorphic type that has been described so far. The definition of Option is as follows:

5.5.1. Option

module Option where

data Option a
  = Some a
  | None
  deriving Show

The definition introduces a type Option, which is parameterized by a type variable arbitrarily named a. It defines two data constructors, Some and None, the first of which takes a value of type a. The type encodes the concept of an optional value: The programmer is required to pattern match on values of type Option to determine whether or not a value is present.

5.5.2. Option present

module OptionPresent where

import Option

present :: Option a -> IO ()
present o =
  case o of
    Some _ -> print "present"
    None   -> print "not present"

Typically, most programs represent the above as a nullable reference type. This leads to endless bugs involving accidentally dereferencing null references. Programs written using Option types are immune to this class of bugs, as it's statically impossible to get access to a nonexistent value.

Note that as Option is polymorphic, it may be instantiated with any other type, including other Option types:

5.5.3. Option construction

*Option> :type None
None :: Option a

*Option> :type Some
Some :: a -> Option a

*Option> :type Some True
Some True :: Option Bool

*Option> :type Some (23 :: Integer)
Some (23 :: Integer) :: Option Integer

*Option> :type Some (Some (23 :: Integer))
Some (Some (23 :: Integer)) :: Option (Option Integer)

*Option> :type Some None
Some None :: Option (Option a)

5.6

Choice

The Choice type is a logical continuation of the Option type. It represents a choice between values of two types:

5.6.1. Choice

module Choice where

data Choice a b
  = ChoiceLeft a
  | ChoiceRight b
  deriving Show

As with Option, it may be instantiated with any other types:

5.6.2. Choices

*Choice> :type ChoiceLeft
ChoiceLeft :: a -> Choice a b

*Choice> :type ChoiceRight
ChoiceRight :: b -> Choice a b

*Choice> :type ChoiceLeft True
ChoiceLeft True :: Choice Bool b

*Choice> :type ChoiceRight True
ChoiceRight True :: Choice a Bool

Types isomorphic to the Choice type are often used to represent computations that may succeed or fail: On success, a value of type b is returned, and on failure, a value of of type a is returned (usually some sort of error code value) [5].

5.7

Pairs

Where the Choice type represents a choice between two possible values, the Pair type represents a structure where values of two (possibly different) types will be present simultaneously. The Pair type is equivalent to the tuple type in other languages:

5.7.1. Pair

module Pair where

data Pair a b =
  MakePair a b
  deriving Show

Naturally, Pair types can be nested, forming arbitrarily complex structures:

5.7.2. Pair

*Pair> :type MakePair 
MakePair :: a -> b -> Pair a b

*Pair> :type MakePair True
MakePair True :: b -> Pair Bool b

*Pair> :type MakePair True (23 :: Integer)
MakePair True (23 :: Integer) :: Pair Bool Integer

*Pair> :type MakePair (MakePair (23 :: Integer) True) (23 :: Integer)
MakePair (MakePair (23 :: Integer) True) (23 :: Integer)
  :: Pair (Pair Integer Bool) Integer

*Pair> :type MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False))
MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False))
  :: Pair Integer (Pair (Pair Bool Bool) (Pair Bool Bool))

Most languages have shorthand syntax for constructing values of some standard library pair type: "(x, y)".

5.8

Naturals (inductive)

Many functional languages (particularly those that act as proof assistants) use the following definition for natural numbers:

5.8.1. NaturalInd

module NaturalInd where

data Natural
  = Z
  | S Natural
  deriving Show

This definition is taken directly from axioms 1 and 6 of the Peano axioms, and is probably the first type encountered by programmers upon being introduced to most proof assistants. The type is the first recursive type described so far; that is, a type that refers to itself in its own definition.

5.8.2. Natural constructions

*NaturalInd> :type Z
Z :: Natural

*NaturalInd> :type S
S :: Natural -> Natural

-- One
*NaturalInd> :type S Z
S Z :: Natural

-- Two
*NaturalInd> :type S (S Z)
S (S Z) :: Natural

-- Three
*NaturalInd> :type S (S (S Z))
S (S (S Z)) :: Natural

It's also possible to copy the definition of, for example, plus, almost directly:

5.8.3. Natural (inductive) plus

module NaturalIndPlus where

import NaturalInd

plus :: Natural -> Natural -> Natural
plus x y =
  case (x, y) of
    (n, Z)   -> n
    (n, S m) -> S (plus n m)

5.8.4. Natural addition

-- 0 + 0 = 0
ghci> plus Z Z
Z

-- 0 + 1 = 1
ghci> plus Z (S Z)
S Z

-- 1 + 1 = 2
ghci> plus (S Z) (S Z)
S (S Z)

-- 2 + 2 = 4
ghci> plus (S (S Z)) (S (S Z))
S (S (S (S Z)))

5.9

Lists

The List type is a recursive type found in just about every functional language in existence. A List is either empty or an element of some type prepended to another list of elements of that type. By this definition, lists may contain elements of any type, but each element of the list has the same type.

5.9.1. List

module List where

data List a
  = Empty
  | Cell a (List a)
  deriving Show

Constructing values of type List works as expected:

5.9.2. List

*List> :type Empty
Empty :: List a

*List> :type Cell
Cell :: a -> List a -> List a

*List> :type Cell True
Cell True :: List Bool -> List Bool

*List> :type Cell True Empty
Cell True Empty :: List Bool

*List> :type Cell True (Cell False Empty)
Cell True (Cell False Empty) :: List Bool

-- A list of lists!
*List> :type Cell (Cell True Empty) Empty
Cell (Cell True Empty) Empty :: List (List Bool)

Typically, the programmer writes recursive functions to manipulate values of recursive types. Using pattern matching, it's possible to write a function such as length in a couple of lines:

5.9.3. ListLength

module ListLength where

list_length :: List a -> Integer
list_length list =
  case list of
    Null        -> 0
    Cell _ rest -> 1 + (list_length rest)

As with the Pair type, Most languages have shorthand syntax for constructing and matching on lists: "[x, y, ...]".

5.10

Trees

The definition of a binary tree type is similarly simple (and is both polymorphic and recursive):

5.10.1. BinaryTree

module BinaryTree where

data BTree a
  = Leaf
  | Tree (BTree a) a (BTree a)
  deriving Show

Manipulating values of these types should be unsurprising by this point:

5.10.2. Binary tree construction

*BinaryTree> :type Leaf
Leaf :: BTree a

*BinaryTree> :type Tree Leaf
Tree Leaf :: a -> BTree a -> BTree a

*BinaryTree> :type Tree Leaf True
Tree Leaf True :: BTree Bool -> BTree Bool

*BinaryTree> :type Tree Leaf True Leaf
Tree Leaf True Leaf :: BTree Bool

*BinaryTree> :type Tree Leaf True (Tree Leaf True Leaf)
Tree Leaf True (Tree Leaf True Leaf) :: BTree Bool

Note that, as demonstrated, it's obviously possible to build binary trees that are unbalanced. In order to preserve invariants such as "the tree must be balanced", it's often necessary to hide the data constructors of the given type and then use the module system of the language in question to expose functions that manipulate values of the tree types internally [6]. As an example, the Haskell standard library offers high performance maps and sets based on red-black trees. The programmer is not exposed to the internal construction of the trees that back the Map and Set types, but internally they are composed of nothing more than the algebraic data types described so far.

The BinaryTree type, as with all the types discussed so far, is an example of a persistent data structure. Persistent data structures offer rather large benefits with regards to both simplification of and confidence in the correctness of the code that deals with them compared to their mutable counterparts (particularly in heavily concurrent/parallel programs). See Purely functional data structures by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4.

5.11

Hiding

As mentioned in the previous section, there may sometimes be additional restrictions on the creation of values. For example, if the programmer wants to define a type representing the set of integers greater than or equal to 0, it can't be achieved directly. Most languages have multiple ways of dealing with this problem. In Haskell, the simplest way to achieve this is to hide the data constructors of the declared type, and provide one or more functions that enforce the desired invariants:

5.11.1. Natural

module Natural (
  Natural,
  make_natural,
  from_natural
) where

import Option

data Natural =
  MakeNatural Integer
  deriving Show

make_natural :: Integer -> Option Natural
make_natural x =
  if x >= 0
  then Some (MakeNatural x)
  else None

from_natural :: Natural -> Integer
from_natural n =
  case n of
    MakeNatural m -> m

The module declaration exports the Natural type, but does not export its constructor, MakeNatural. This effectively means that it's not possible for the programmer to construct values of type Natural directly! However, the module also exports a function make_natural which takes an integer and returns an optional value based on whether the integer falls within the desired range. It also exports a convenient function, from_natural, which "unwraps" a Natural value and returns the Integer contained within [7].

Essentially, this is a technique to make a given type abstract. That is, the structure of the type essentially becomes private. Languages such as OCaml allow types to be made abstract using module signatures. Java and Ada have the private keyword for controlling visibility of (various parts of) types.

5.12

Arrays

Arrays do not have an inductive structure, unlike all of the data structures described so far, and are therefore usually provided by the programming language's standard library as abstract types with associated get and set functions for accessing elements. They are mentioned here for completeness.

Arithmetic expression interpreter

6.1

Specification

An arithmetic expression is one of the following:

6.1.1. Arithmetic expression specification

An integer constant.
The sum of two arithmetic expressions.
The product of two arithmetic expressions.
The difference of two arithmetic expressions.

Evaluating an arithmetic expression requires the following steps:

6.1.2. Evaluation specification

If the expression is a constant, return the constant.
If the expression is an addition, evaluate the two subexpressions and then return the sum of their results.
If the expression is a multiplication, evaluate the two subexpressions and then return the product of their results.
If the expression is a subtraction, evaluate the two subexpressions and then return the difference of their results.

6.2

Implementation

First, it's possible to produce an almost line-for-line copy of the specification as a type:

6.2.1. ArithmeticExpression

module Expression where

data Expression
  = Constant       Integer
  | Addition       Expression Expression
  | Multiplication Expression Expression
  | Subtraction    Expression Expression
  deriving Show

The definition of the evaluation function is similarly trivial, and matches the textual specification:

6.2.2. Interpreter

module Interpreter where

import Expression

run :: Expression -> Integer
run (Constant       x)     = x
run (Addition       e0 e1) = (run e0) + (run e1)
run (Multiplication e0 e1) = (run e0) * (run e1)
run (Subtraction    e0 e1) = (run e0) - (run e1)

The run function is an example of an equational definition. The definition is the same as using a case expression in the body of the function, but more syntactically compact.

It's possible to evaluate expressions from the ghci interpreter for some assurance that the function really is correct:

6.2.3. Example expressions

ghci> import Interpreter

ghci> run (Constant 23)
23
ghci> run (Addition (Constant 23) (Constant 17))
40
ghci> run (Multiplication (Constant 23) (Constant 17))
391

The code has a large number of advantages over the original Java code:

6.2.4. Advantages

The compiler can statically guarantee that all expression types are handled: It knows exactly what forms an expression can take and will not let the programmer fail to handle them.
Each type of expression can be uniquely distinguished simply by pattern matching: The type of the expression is represented by the constructor used to create it. There is no manual bookkeeping required on the part of the programmer.
There are no nullable references. The code is immune to that class of bugs.
The declarations are compact and follow the definitions given in the textual specification directly. There is a high degree of confidence that the program implements the specification as written, and implements it without surprises.
The evaluation function is written using only structural recursion. This means that it operates on structures of decreasing size as it progresses: This gives a usable proof that the function terminates (which is essential when the code is subject to formal verification).

[0]

Used to cut down on even more boilerplate.

[1]

The IO type in the signature of the function indicates that the function performs I/O. See Input/Output in Haskell 98.

[2]

Unfortunately, the ghc compiler has all warnings disabled by default, so it's necessary to use -fwarn-incomplete-patterns at the command line to see warnings. The author recommends running with -W -Wall -Werror at all times!

[3]

Constructors are considered to be constants when matching.

[4]

Apparently, to purposefully confuse new users of the language.

[5]

See Lightweight static exceptions for an exploration of this.

[6]

Languages with dependent types allow the expression of types that statically guarantee that the resulting trees will be balanced. See Certified programming with dependent types for some impressive examples.

[7]

This function is optimized away by the compiler. In fact, the MakeNatural constructor would be optimized away too, leaving a plain Integer at run-time, if not for the unfortunate presence of lazy evaluation in Haskell. Other languages do not share this deficiency.