A Crash Course In Algebraic Types

Overview

This document attempts to describe so-called algebraic
data types, and the language construct - pattern matching - used to
manipulate them in functional languages.

The document assumes very little in the way of prior knowledge of type systems
and attempts to be light on mathematics and theory. The document is aimed at
people with a passing interest in static type systems, and also at authors of
new programming languages. It is also likely of interest to users of existing
statically typed languages who are curious as to what newer languages have to
offer.

Why?

Overview

In this example, a small arithmetic expression interpreter is developed. This
example highlights the problems caused by the lack of algebraic data types
and pattern matching, and also highlights the various complicated language
features and methodologies that arise to deal with a lack of them.

Specification

An arithmetic expression is one of the following:

- An integer constant.
- The sum of two arithmetic expressions.
- The product of two arithmetic expressions.
- The difference of two arithmetic expressions.

Evaluating an arithmetic expression requires the following steps:

- If the expression is a constant, return the constant.
- If the expression is an addition, evaluate the two subexpressions and then return the sum of their results.
- If the expression is a multiplication, evaluate the two subexpressions and then return the product of their results.
- If the expression is a subtraction, evaluate the two subexpressions and then return the difference of their results.

Implementation

The simplest (and possibly only) way to represent the type of an arithmetic
expression in Java is as a set of classes extending an abstract base class. The
ArithmeticExpression type demonstrates this:

package com.io7m.example.ccatpm; abstract class ArithmeticExpression { /** * An integer constant. */ public final static class ConstantExpression extends ArithmeticExpression { private final int value; @SuppressWarnings("synthetic-access") ConstantExpression( final int value) { super(ExpressionType.EXP_CONSTANT); this.value = value; } public final int getValue() { return this.value; } } static enum ExpressionType { EXP_CONSTANT, EXP_PLUS, EXP_MULTIPLY, EXP_SUBTRACT } /** * The product of two arithmetic expressions. */ public final static class MultiplyExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") MultiplyExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_MULTIPLY); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } /** * The sum of two arithmetic expressions. */ public final static class PlusExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") PlusExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_PLUS); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } /** * The difference of two arithmetic expressions. */ public final static class SubtractExpression extends ArithmeticExpression { private final ArithmeticExpression e_left; private final ArithmeticExpression e_right; @SuppressWarnings("synthetic-access") SubtractExpression( final ArithmeticExpression e_left, final ArithmeticExpression e_right) { super(ExpressionType.EXP_SUBTRACT); this.e_left = e_left; this.e_right = e_right; } public final ArithmeticExpression getLeft() { return this.e_left; } public final ArithmeticExpression getRight() { return this.e_right; } } private final ExpressionType type; private ArithmeticExpression( final ExpressionType type) { this.type = type; } public final ExpressionType getType() { return this.type; } }

This example shows a number of immediate problems:

- There is a fixed set of types of expressions that will be evaluated, but the language cannot know in advance how many classes will really extend the ArithmeticExpression class. Essentially, the type wants to be "closed", but the language requires that it be "open". The best that can be done is to make the single constructor of the class private, and the class itself have default visibility (in other words, only visible to classes in the same package).
- Each subclass of ArithmeticExpression is required to carry around a value of type ExpressionType in order to distinguish the values of the classes later. It's possible to use the instanceof keyword to distinguish classes, but this obviously cannot be used in switch statement, meaning that the compiler cannot help the programmer to ensure that all cases are covered. It's also possible for the programmer that specifies the value of ExpressionType for each class to accidentally re-use values, leading to hard-to-locate bugs later.
- Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
- The definitions are annoyingly verbose: The code is already over 100 lines from a four-line specification. It's not obvious, without a lot of tedious scanning, that the type even represents the original specification at all.
- It's possible to create circular structures (by accidentally assigning the wrong values in the constructors of arithmetic expressions). Functions written to work with expressions that don't assume that expressions can be circular (as most won't) will run forever.

The evaluation function is as follows:

package com.io7m.example.ccatpm; import com.io7m.example.ccatpm.ArithmeticExpression.ConstantExpression; import com.io7m.example.ccatpm.ArithmeticExpression.MultiplyExpression; import com.io7m.example.ccatpm.ArithmeticExpression.PlusExpression; import com.io7m.example.ccatpm.ArithmeticExpression.SubtractExpression; public final class Interpreter { public static int run( final ArithmeticExpression expr) { switch (expr.getType()) { case EXP_CONSTANT: { final ConstantExpression actual = (ConstantExpression) expr; return actual.getValue(); } case EXP_MULTIPLY: { final MultiplyExpression actual = (MultiplyExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left * right; } case EXP_PLUS: { final PlusExpression actual = (PlusExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left + right; } case EXP_SUBTRACT: { final SubtractExpression actual = (SubtractExpression) expr; final int left = Interpreter.run(actual.getLeft()); final int right = Interpreter.run(actual.getRight()); return left - right; } default: throw new AssertionError("unreachable!"); } } private Interpreter() { } }

Again, there are obvious problems:

- The code contains one dangerous cast per expression type. The language cannot be informed of the formal link between the enumeration value and the associated class.
- The code assumes expressions, or the contents of expressions, won't be null.
- The Java compiler can't determine that all cases of the enumeration type are present. It requires an unreachable default clause.

Visitor Implementation

A second attempt at the implementation, using the
visitor pattern,
is even larger than the original, and borders on obfuscated with regards to the
flow of execution. First, the interface that expressions must implement:

package com.io7m.example.ccatpm.visitor; interface Expression { int accept(ExpressionVisitor visitor); }

package com.io7m.example.ccatpm.visitor; abstract class Binary { private final Expression left; private final Expression right; public Binary( final Expression left, final Expression right) { this.left = left; this.right = right; } public final Expression getLeft() { return this.left; } public final Expression getRight() { return this.right; } }

Then, the types of each arithmetic expression case:

package com.io7m.example.ccatpm.visitor; final class Constant implements Expression { private final int value; public Constant( final int value) { this.value = value; } public int getValue() { return this.value; } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Add extends Binary implements Expression { public Add( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Multiply extends Binary implements Expression { public Multiply( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

package com.io7m.example.ccatpm.visitor; final class Subtract extends Binary implements Expression { public Subtract( final Expression left, final Expression right) { super(left, right); } @Override public int accept( final ExpressionVisitor visitor) { return visitor.visit(this); } }

Then, the type of arithmetic expression visitors:

package com.io7m.example.ccatpm.visitor; interface ExpressionVisitor { int visit(Add add); int visit(Constant constant); int visit(Multiply multiply); int visit(Subtract subtract); }

Then, finally, the evaluation function:

package com.io7m.example.ccatpm.visitor; public final class Interpreter { public static int evaluate( final Expression expression) { return expression.accept(new ExpressionVisitor() { @Override public int visit( final Add add) { return Interpreter.evaluate(add.getLeft()) + Interpreter.evaluate(add.getRight()); } @Override public int visit( final Constant constant) { return constant.getValue(); } @Override public int visit( final Multiply multiply) { return Interpreter.evaluate(multiply.getLeft()) * Interpreter.evaluate(multiply.getRight()); } @Override public int visit( final Subtract subtract) { return Interpreter.evaluate(subtract.getLeft()) - Interpreter.evaluate(subtract.getRight()); } }); } }

The implementation still has problems:

- The flow of execution is obfuscated. The maintainer of this code has to be familiar with the visitor pattern in order to understand it, as opposed to just being familiar with ordinary recursion.
- Java has nullable references. Everywhere a non-primitive type appears is a possible source of runtime null pointer exceptions.
- The definitions are even more verbose: The code is almost 150 lines. There is a lot of boilerplate code; each expression type has to implement an identical visit function. In a more complex example (such as the abstract syntax tree of a programming language), this quickly becomes excruciating.
- This implementation of the visitor pattern only allows the programmer to write functions that return int. Writing something more general requires even more complicated classes, and possibly even generics.

At the end of this document, an alternative implementation will be given
using algebraic types and pattern matching that avoids all of the problems of
both implementations, is less than a tenth of the size, and gives strong
assurances of safety and correctness.

What?

Algebraic data types are currently not well-known
outside of statically typed functional languages such as
OCaml,
Haskell,
Scala, amongst
others. The simplest way to think of them appears to be as a sort of
enumerated type,
where each case of the type can have associated data. Values of the types are inspected
via a language construct called pattern matching,
which can essentially be thought of
as a much more powerful version of the switch or
case construct found in languages such as C, Java,
Ada, and others. As underwhelming as this sounds, the combination of these types and
pattern matching allow for reducing what would be extremely complicated tasks in
other languages down to trivial symbol manipulation, and subsume vast parts
of the type systems in other languages.

Shapes

Declarations

In keeping with traditional introductory texts on programming languages, the
following definitions declare a collection of "shape" types:

module Shapes where data Circle = MakeCircle Integer deriving Show data Rectangle = MakeRectangle Integer Integer deriving Show data Shape = ShapeCircle Circle | ShapeRectangle Rectangle deriving Show

Each data declaration declares an
algebraic data type.
The Circle declaration, for example, declares a type
Circle with a single data constructor
named MakeCircle, which takes a single value of
type int as an argument. That is, the programmer calls
MakeCircle to create a value of type
Circle. It's important to note that the
MakeCircle function is the only way to create a
value of type Circle - this simple fact alone has far
reaching consequences and can be used to express and enforce various program
invariants (a point that this document will address later). The definitions also
ask Haskell, via the deriving statement, to provide
default implementations of the string conversion functions,
for the sake of being able to display values in the interpreter for examples in
this document - these can be ignored.

The Rectangle declaration is similar, except that its
single data constructor MakeRectangle takes two
integers as input (representing the width and height of the rectangle).

The last declaration, Shape, declares two data
constructors: ShapeCircle and
ShapeRectangle. That is, there are two ways to
obtain a value of type Shape: either call
ShapeCircle with a value of type
Circle, or call ShapeRectangle
with a value of type Rectangle.

It's possible to see how these declarations work by constructing values in
the interpreter and checking their types (the ::
notation can be read as "is of type" -
"23 :: Integer" means "23 is of type Integer").
The :type command, unsurprisingly, displays the type
of the given expression.

*Shapes> :type MakeCircle 23 MakeCircle 23 :: Circle *Shapes> :type MakeRectangle 23 11 MakeRectangle 23 11 :: Rectangle *Shapes> :type ShapeCircle (MakeCircle 23) ShapeCircle (MakeCircle 23) :: Shape *Shapes> :type ShapeRectangle (MakeRectangle 23 11) ShapeRectangle (MakeRectangle 23 11) :: Shape

Note that it's a type error to attempt to pass a Rectangle
to the ShapeCircle constructor:

*Shapes> :type ShapeCircle (MakeRectangle 23 11) <interactive>:1:14: Couldn't match expected type `Circle' with actual type `Rectangle' In the return type of a call of `MakeRectangle' In the first argument of `ShapeCircle', namely `(MakeRectangle 23 11)' In the expression: ShapeCircle (MakeRectangle 23 11) *Shapes> :type ShapeRectangle (MakeCircle 23) <interactive>:1:17: Couldn't match expected type `Rectangle' with actual type `Circle' In the return type of a call of `MakeCircle' In the first argument of `ShapeRectangle', namely `(MakeCircle 23)' In the expression: ShapeRectangle (MakeCircle 23)

Matching

Much as values are created via constructors,
it's necessary to deconstruct data structures
in order to get at the values contained within. This deconstruction is performed
via pattern matching. Most languages allow one
to, for example, pass an enumerated type into some form of
case statement and then perform some action for
each one of the individual cases. Pattern matching, however, allows the programmer
to perform actions based on the actual structure of the value passed in. As an
example, a simple function that takes any value of type Shape
and prints the name of the shape [1]:

module ShapeShow where import Shapes shape_show :: Shape -> IO () shape_show s = case s of ShapeRectangle _ -> print "rectangle" ShapeCircle _ -> print "circle"

Unsurprisingly, the case expression attempts to
match the input value against each pattern on the left hand side in turn, from
top to bottom, stopping at the first pattern that matches (there is no "fall-through"
as with switch statements in other languages). This
naturally leads to patterns being arranged in decreasing order of specificity (most
specific patterns first).

Notice that by pattern matching on a value of type Shape,
the programmer learns which constructor was used to construct the value and can
perform an action for each. The programmer is also statically prevented from failing
to handle a case, to ensure correctness - the compiler warns against patterns that
are redundant (unreachable) or that aren't exhaustive [2]:

module ShapeShowNE where import Shapes shape_show_ne :: Shape -> IO () shape_show_ne s = case s of ShapeRectangle _ -> print "rectangle"

ShapeShowNE.hs:7:3: Warning: Pattern match(es) are non-exhaustive In a case alternative: Patterns not matched: ShapeCircle _

module ShapeShowOL where import Shapes shape_show_ol :: Shape -> IO () shape_show_ol s = case s of ShapeRectangle _ -> print "rectangle" ShapeRectangle _ -> print "rectangle"

ShapeShowOL.hs:7:3: Warning: Pattern match(es) are overlapped In a case alternative: ShapeRectangle _ -> ...

The warning about non-exhaustive cases is extremely important with regards
to software maintenance: Add another constructor to a data type and the
compiler will immediately inform the programmer of every point in the code
that now needs to be modified to work correctly.

A similar function that prints the width of any shape:

module ShapeWidth where import Shapes shape_width :: Shape -> IO () shape_width s = case s of ShapeRectangle (MakeRectangle width _) -> print width ShapeCircle (MakeCircle radius) -> print (2 * radius)

The names used on the left hand side of the case
expression are given by the programmer for use in the expressions on the
right hand side. That is, the programmer chooses the names that he/she wishes to
give to each field he/she is interested in, at the point of use. The
_ or
wildcard symbol is used to indicate
that the programmer doesn't care about a particular value and therefore doesn't
want to provide a name. Wildcards match everything.
A silly function that prints "Boo!" regardless of whatever's passed in:

module ShapeBoo where import Shapes shape_boo :: Shape -> IO () shape_boo s = case s of _ -> print "Boo!"

Patterns consist of constants [3], variables,
and wildcards. An example of a function that compares shapes for equality with
slightly more complex patterns:

module ShapeEquals where import Shapes shape_equals :: Shape -> Shape -> Bool shape_equals s t = case (s, t) of (ShapeCircle (MakeCircle r0), ShapeCircle (MakeCircle r1)) -> r0 == r1 (ShapeRectangle (MakeRectangle w0 h0), ShapeRectangle (MakeRectangle w1 h1)) -> (w0 == w1) && (h0 == h1) (_, _) -> False

The Haskell definition of the function written with pattern matching is almost
exactly what one would expect given the typical informal mathematical specification
of the function:

- If both shapes are circles, then both shapes are equal iff their radii are equal.
- If both shapes are rectangles, then both shapes are equal iff both their widths and heights are equal.
- Otherwise, the shapes are not equal.

The Type Zoo

- 5.1. Overview
- 5.2. Unit
- 5.3. Boolean
- 5.4. Enumeration
- 5.5. Option
- 5.6. Choice
- 5.7. Pairs
- 5.8. Naturals (inductive)
- 5.9. Lists
- 5.10. Trees
- 5.11. Hiding
- 5.12. Arrays

Overview

As stated previously, the combination of algebraic data types and pattern
matching is extremely powerful. Whereas languages that lack them have to
keep various type system features completely distinct (in both concept
and implementation), the language of algebraic types is expressive enough
to support a wide range of structures with little or no "magic". Many
structures that require explicit language support in other systems can
simply be expressed in terms of algebraic types (possibly with extra
syntax support).

Unit

The Unit type is the simplest possible algebraic
type. It has one data constructor that takes no values. In impure languages
like OCaml,
the Unit type is often given as the return value
of functions that are only of interest for their side effects, such as
print_string (which is of type
string -> unit, and simply prints its argument
to standard out and returns unit). In Haskell,
the Unit type is called ()
and has a single data constructor, also called ()
[4].

Boolean

The Boolean type is an algebraic type with
two data constructors:

module Boolean where data Boolean = True | False deriving Show

This definition of Boolean type allows languages
to implement the usual if ..
then .. else
construct in terms of pattern matching, simplifying the language definition
and semantics.

Enumeration

As may or may not be obvious, it's possible to achieve the normal
enumeration types present in other languages. The data constructors of the
type are simply declared as not taking any values:

module Enumeration where data Color = Red | Blue | Green | Yellow deriving Show

Option

The Option type is
the first polymorphic type that has
been described so far. The definition of Option
is as follows:

module Option where data Option a = Some a | None deriving Show

The definition introduces a type Option, which
is parameterized by a type variable arbitrarily named
a. It defines two data constructors,
Some and None,
the first of which takes a value of type a. The
type encodes the concept of an optional value:
The programmer is required to pattern match on values of type
Option to determine whether or not a value is
present.

module OptionPresent where import Option present :: Option a -> IO () present o = case o of Some _ -> print "present" None -> print "not present"

Typically, most programs represent the above as a nullable reference type. This
leads to endless bugs involving accidentally dereferencing null references.
Programs written using Option types are
immune to this class of bugs, as it's statically impossible to get access
to a nonexistent value.

Note that as Option is polymorphic, it may
be instantiated with any other type, including other Option
types:

*Option> :type None None :: Option a *Option> :type Some Some :: a -> Option a *Option> :type Some True Some True :: Option Bool *Option> :type Some (23 :: Integer) Some (23 :: Integer) :: Option Integer *Option> :type Some (Some (23 :: Integer)) Some (Some (23 :: Integer)) :: Option (Option Integer) *Option> :type Some None Some None :: Option (Option a)

Choice

The Choice type is a logical continuation
of the Option type. It represents a choice
between values of two types:

module Choice where data Choice a b = ChoiceLeft a | ChoiceRight b deriving Show

As with Option, it may be instantiated with any
other types:

*Choice> :type ChoiceLeft ChoiceLeft :: a -> Choice a b *Choice> :type ChoiceRight ChoiceRight :: b -> Choice a b *Choice> :type ChoiceLeft True ChoiceLeft True :: Choice Bool b *Choice> :type ChoiceRight True ChoiceRight True :: Choice a Bool

Pairs

Where the Choice type represents a choice between
two possible values, the Pair type represents
a structure where values of two (possibly different) types will be present
simultaneously. The Pair type is equivalent
to the tuple type in other languages:

module Pair where data Pair a b = MakePair a b deriving Show

Naturally, Pair types can be nested, forming
arbitrarily complex structures:

*Pair> :type MakePair MakePair :: a -> b -> Pair a b *Pair> :type MakePair True MakePair True :: b -> Pair Bool b *Pair> :type MakePair True (23 :: Integer) MakePair True (23 :: Integer) :: Pair Bool Integer *Pair> :type MakePair (MakePair (23 :: Integer) True) (23 :: Integer) MakePair (MakePair (23 :: Integer) True) (23 :: Integer) :: Pair (Pair Integer Bool) Integer *Pair> :type MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False)) MakePair (23 :: Integer) (MakePair (MakePair False True) (MakePair True False)) :: Pair Integer (Pair (Pair Bool Bool) (Pair Bool Bool))

Most languages have shorthand syntax for constructing values of some standard library
pair type: "(x, y)".

Naturals (inductive)

Many functional languages (particularly those that act as proof assistants)
use the following definition for natural numbers:

module NaturalInd where data Natural = Z | S Natural deriving Show

This definition is taken directly from axioms 1 and 6 of the
Peano axioms,
and is probably the first type encountered by programmers upon being
introduced to most proof assistants. The type is the first recursive type
described so far; that is, a type that refers to itself in its own
definition.

*NaturalInd> :type Z Z :: Natural *NaturalInd> :type S S :: Natural -> Natural -- One *NaturalInd> :type S Z S Z :: Natural -- Two *NaturalInd> :type S (S Z) S (S Z) :: Natural -- Three *NaturalInd> :type S (S (S Z)) S (S (S Z)) :: Natural

It's also possible to copy the definition of, for example,
plus, almost directly:

module NaturalIndPlus where import NaturalInd plus :: Natural -> Natural -> Natural plus x y = case (x, y) of (n, Z) -> n (n, S m) -> S (plus n m)

-- 0 + 0 = 0 ghci> plus Z Z Z -- 0 + 1 = 1 ghci> plus Z (S Z) S Z -- 1 + 1 = 2 ghci> plus (S Z) (S Z) S (S Z) -- 2 + 2 = 4 ghci> plus (S (S Z)) (S (S Z)) S (S (S (S Z)))

Lists

The List type is a recursive type found in
just about every functional language in existence.
A List is either empty or an
element of some type prepended to another list of elements of that type.
By this definition, lists may contain elements of any type, but each
element of the list has the same type.

module List where data List a = Empty | Cell a (List a) deriving Show

Constructing values of type List works
as expected:

*List> :type Empty Empty :: List a *List> :type Cell Cell :: a -> List a -> List a *List> :type Cell True Cell True :: List Bool -> List Bool *List> :type Cell True Empty Cell True Empty :: List Bool *List> :type Cell True (Cell False Empty) Cell True (Cell False Empty) :: List Bool -- A list of lists! *List> :type Cell (Cell True Empty) Empty Cell (Cell True Empty) Empty :: List (List Bool)

Typically, the programmer writes recursive functions to manipulate
values of recursive types. Using pattern matching, it's
possible to write a function such as length
in a couple of lines:

module ListLength where list_length :: List a -> Integer list_length list = case list of Null -> 0 Cell _ rest -> 1 + (list_length rest)

As with the Pair type, Most languages have shorthand
syntax for constructing and matching on lists:
"[x, y, ...]".

Trees

The definition of a binary tree type is similarly simple (and is both polymorphic
and recursive):

module BinaryTree where data BTree a = Leaf | Tree (BTree a) a (BTree a) deriving Show

Manipulating values of these types should be unsurprising by this point:

*BinaryTree> :type Leaf Leaf :: BTree a *BinaryTree> :type Tree Leaf Tree Leaf :: a -> BTree a -> BTree a *BinaryTree> :type Tree Leaf True Tree Leaf True :: BTree Bool -> BTree Bool *BinaryTree> :type Tree Leaf True Leaf Tree Leaf True Leaf :: BTree Bool *BinaryTree> :type Tree Leaf True (Tree Leaf True Leaf) Tree Leaf True (Tree Leaf True Leaf) :: BTree Bool

Note that, as demonstrated, it's obviously possible to build binary trees
that are unbalanced. In order to preserve invariants such as "the tree must
be balanced", it's often necessary to hide the data constructors of the
given type and then use the module system of the language in question to
expose functions that manipulate values of the tree types internally
[6]. As an example, the Haskell standard library offers high performance
maps and sets based on red-black trees. The programmer is not exposed to the
internal construction of the trees that back the Map and
Set types, but internally they are composed of
nothing more than the algebraic data types described so far.

The BinaryTree type, as with all the types
discussed so far, is an example of a
persistent
data structure. Persistent data structures offer rather
large benefits with regards to both simplification of and confidence in the
correctness of the code that deals with them compared to their mutable counterparts
(particularly in heavily concurrent/parallel programs).
See Purely functional data structures by Chris Okasaki,
Cambridge University Press, 1998, ISBN 0-521-66350-4.

Hiding

As mentioned in the previous section, there may sometimes be additional
restrictions on the creation of values. For example, if the programmer wants
to define a type representing the set of integers greater than or equal to
0, it can't be achieved directly. Most
languages have multiple ways of dealing with this problem. In Haskell, the
simplest way to achieve this is to hide the data constructors of the declared
type, and provide one or more functions that enforce the desired invariants:

module Natural ( Natural, make_natural, from_natural ) where import Option data Natural = MakeNatural Integer deriving Show make_natural :: Integer -> Option Natural make_natural x = if x >= 0 then Some (MakeNatural x) else None from_natural :: Natural -> Integer from_natural n = case n of MakeNatural m -> m

The module declaration exports the Natural type,
but does not export its constructor, MakeNatural.
This effectively means that it's not possible for the programmer to construct
values of type Natural directly! However, the module also
exports a function make_natural which takes
an integer and returns an optional value based on whether the integer falls
within the desired range. It also exports a convenient function,
from_natural, which "unwraps" a
Natural value and returns the Integer
contained within [7].

Essentially, this is a technique to make a given type
abstract. That is, the structure of the type
essentially becomes private. Languages such as
OCaml allow types
to be made abstract using module signatures.
Java and Ada have the private keyword for controlling
visibility of (various parts of) types.

Arrays

Arrays do not have an inductive
structure, unlike all of the data structures described so far,
and are therefore usually provided by the programming language's
standard library as abstract types with associated
get and set
functions for accessing elements. They are mentioned here for completeness.

Arithmetic expression interpreter

Specification

An arithmetic expression is one of the following:

- An integer constant.
- The sum of two arithmetic expressions.
- The product of two arithmetic expressions.
- The difference of two arithmetic expressions.

Evaluating an arithmetic expression requires the following steps:

- If the expression is a constant, return the constant.
- If the expression is an addition, evaluate the two subexpressions and then return the sum of their results.
- If the expression is a multiplication, evaluate the two subexpressions and then return the product of their results.
- If the expression is a subtraction, evaluate the two subexpressions and then return the difference of their results.

Implementation

First, it's possible to produce an almost line-for-line copy of the
specification as a type:

module Expression where data Expression = Constant Integer | Addition Expression Expression | Multiplication Expression Expression | Subtraction Expression Expression deriving Show

The definition of the evaluation function is similarly trivial, and matches
the textual specification:

module Interpreter where import Expression run :: Expression -> Integer run (Constant x) = x run (Addition e0 e1) = (run e0) + (run e1) run (Multiplication e0 e1) = (run e0) * (run e1) run (Subtraction e0 e1) = (run e0) - (run e1)

The run function is an example of an
equational definition. The definition
is the same as using a case expression
in the body of the function, but more syntactically compact.

It's possible to evaluate expressions from the ghci
interpreter for some assurance that the function really is correct:

ghci> import Interpreter ghci> run (Constant 23) 23 ghci> run (Addition (Constant 23) (Constant 17)) 40 ghci> run (Multiplication (Constant 23) (Constant 17)) 391

The code has a large number of advantages over the original Java code:

- The compiler can statically guarantee that all expression types are handled: It knows exactly what forms an expression can take and will not let the programmer fail to handle them.
- Each type of expression can be uniquely distinguished simply by pattern matching: The type of the expression is represented by the constructor used to create it. There is no manual bookkeeping required on the part of the programmer.
- There are no nullable references. The code is immune to that class of bugs.
- The declarations are compact and follow the definitions given in the textual specification directly. There is a high degree of confidence that the program implements the specification as written, and implements it without surprises.
- The evaluation function is written using only structural recursion. This means that it operates on structures of decreasing size as it progresses: This gives a usable proof that the function terminates (which is essential when the code is subject to formal verification).

[0]

Used to cut down on even more boilerplate.

[1]

The IO type
in the signature of the function indicates that the function performs I/O.
See Input/Output in
Haskell 98.

[2]

Unfortunately, the
ghc compiler has all warnings disabled by default,
so it's necessary to use -fwarn-incomplete-patterns
at the command line to see warnings. The author recommends running with
-W -Wall -Werror at all times!

[3]

Constructors are considered
to be constants when matching.

[4]

Apparently, to purposefully confuse new users of the language.

[5]

See
Lightweight static exceptions
for an exploration of this.

[6]

Languages with dependent types allow
the expression of types that statically guarantee that the resulting trees will
be balanced. See Certified
programming with dependent types for some impressive examples.

[7]

This function is optimized away by the compiler. In fact,
the MakeNatural constructor would be optimized away too, leaving
a plain Integer at run-time, if not for the unfortunate
presence of lazy evaluation in Haskell. Other languages do not share this deficiency.