## Functional Scala: a little expression language with algebraic datatypes and pattern matching

Welcome to another episode of Functional Scala!

In this episode we’re going to recap and apply everything we’ve learned so far about algebraic datatypes and pattern matching in a more extensive example. We’ll put together all the little pieces we’ve discovered so far and see how they cooperate in building a little language that lets us write, pretty print, reduce step by step and evaluate arbitrary arithmetic expressions (covering addition, subtraction and multiplication of integer values). And as the cherry on top, we’ll discover a new flavour of pattern matching.

It may be a good start to first think about how we want to represent arithmetic expressions. Just consider a language of expressions built up from basic values (integer literals) and some operations on them (e.g. addition or multiplication of integer values). For example, the following are all arithmetic expressions:

17

17 + 23

109 * ( 8 + 15 )

An expression in its simplest form might be an integer literal. That’s a legal expression for sure! And how could we represent such an entity within our little language? What about an algebraic datatype Expression? Right, mate! Good idea! Let’s write it down:

```scala
sealed abstract class Expression
case class Literal( x :Int ) extends Expression
```

Ok, so far we have a simple and neat product type: with that single value constructor we can already express a whole bunch of different integer literals:

```scala
val expr1 = Literal( 1 )
val expr2 = Literal( 17 )
// ...
val exprN = Literal( 532 )
```

Well, if you think that’s boring, you’re right again! So far, our Expressions are a bit too simple, right? How do you feel about adding some operations which take integer literals and act on them? The addition of two integer literals surely is an Expression in itself. So let’s add it as another value constructor to our datatype, extending our expression language:

```scala
sealed abstract class Expression
case class Literal( x :Int ) extends Expression
case class Add( a :Literal, b :Literal ) extends Expression
```

Aaahh, now things start to become more interesting. Our new value constructor also features some components, that is two integer literals which act as the operands for our operation Add.

```scala
val sum1 = Add( Literal( 17 ), Literal( 4 ) )
val sum2 = Add( Literal( 123 ), Literal( 321 ) )
```

In a sense, we’ve just created a recursive datatype (ok, indirectly), since those two components are also Expressions! Um, wait a minute! What was that? Those two components are also expressions? Yep, we just declared literals to be legal Expressions in their simplest form! But is there actually any reason to restrict addition to act on integer literals directly? If we allowed arbitrary Expressions as operands, we could express nested Expressions within our language like this:

```scala
val sum3 = Add( Literal( 21 ), Add( Literal( 17 ), Literal( 4 ) ) )
val sum4 = Add( Add( Literal( 19 ), Literal( 2 ) ), Literal( 12 ) )
```

In fact, there’s nothing to be said against nested operations (unless you want to restrict yourself to ordinary, basic expressions). From this point of view, we could build arbitrarily nested expressions in a recursive way. This recursive nature of building expressions is directly reflected within our value constructors! Observe:

```scala
sealed abstract class Expression
case class Literal( x :Int ) extends Expression
case class Add( x :Expression, y :Expression ) extends Expression
case class Mult( x :Expression, y :Expression ) extends Expression
case class Sub( x :Expression, y :Expression ) extends Expression
```

See the recursive structure? We’ve now defined an algebraic datatype with a recursive structure, allowing us to build arbitrarily nested arithmetic expressions. Since there are several operations, all acting on arbitrary expressions, we can mix them in every which way.

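```scala
val expr1 = Mult( Literal( 6 ), Add( Literal( 3 ), Literal( 4 ) ) )
// expr2 is reconstructed from its printed form further down
val expr2 = Sub(Mult(Literal(21), Add(Literal(103), Literal(42))), Add(Literal(17), Mult(Literal(29), Literal(7))))
val expr3 = Mult(Sub(Literal(6), Mult(Sub(Literal(5), Literal(3)), Literal(3))), Add(Literal(3), Mult(Literal(5), Literal(8))))
```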

Ah, ok! We’re now able to construct arbitrary expressions featuring any level of nesting, so we could come up with some rather complex expressions. Hm, is there anything annoying?

### A pretty printer

At the very least, we might get into trouble when trying to read a more complex expression. It’s rather difficult to detect the correct nesting structure, because operations are written in prefix notation within our little language (the operation comes first, followed by its operands). To change that, let’s write a little function which takes an arbitrary expression and returns a string which represents the given expression in infix notation (which our brain is trained to scan and recognize).

Given the recursive nature of our expressions, we can mirror that fact directly within our function: we’ll pattern match against every possible value constructor of our Expression type, format each one into its infix form, and then fill the gaps for every sub-expression (that is, the operands of any given operation) by just formatting them recursively. Just watch:

```scala
val format : Expression => String =
  _ match {
    case Literal( x ) => x.toString
    case Add( leftExpr, rightExpr ) => "( " + format( leftExpr ) + " + " + format( rightExpr ) + " )"
    case Mult( leftExpr, rightExpr ) => "( " + format( leftExpr ) + " * " + format( rightExpr ) + " )"
    case Sub( leftExpr, rightExpr ) => "( " + format( leftExpr ) + " - " + format( rightExpr ) + " )"
  }
```

That’s all? Aye, mate! That’s all! We’ve just looked at every basic expression type and given it an appropriate format. By binding the components of the value constructor for any given operation, we can refer to them on the right side of a case expression and format them recursively before inserting them into their place within the string representation of the current operation. This way we’re now able to pretty print any given expression, like the ones above:

```scala
println( format( expr1 ) )  // ( 6 * ( 3 + 4 ) )
println( format( expr2 ) )  // ( ( 21 * ( 103 + 42 ) ) - ( 17 + ( 29 * 7 ) ) )
println( format( expr3 ) )  // ( ( 6 - ( ( 5 - 3 ) * 3 ) ) * ( 3 + ( 5 * 8 ) ) )
```

Ok, now we’ve got a weapon for printing any given expression in infix notation, no matter how complex (read: deeply nested) it may be.
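To try the pretty printer in isolation, here’s a minimal self-contained sketch. It assumes the final Expression datatype from above and swaps the recursive function value for a plain recursive `def`, which is a common way to spell recursion in Scala. The nested helper `infix` and the value `sample` are made up for this illustration and aren’t part of the code above.

```scala
// Sketch only: the datatype as defined in this episode.
sealed abstract class Expression
case class Literal(x: Int) extends Expression
case class Add(x: Expression, y: Expression) extends Expression
case class Mult(x: Expression, y: Expression) extends Expression
case class Sub(x: Expression, y: Expression) extends Expression

def format(expr: Expression): String = {
  // Hypothetical helper: renders one binary operation in infix form,
  // formatting both operands recursively.
  def infix(op: String, l: Expression, r: Expression): String =
    "( " + format(l) + " " + op + " " + format(r) + " )"

  expr match {
    case Literal(x) => x.toString
    case Add(l, r)  => infix("+", l, r)
    case Mult(l, r) => infix("*", l, r)
    case Sub(l, r)  => infix("-", l, r)
  }
}

val sample = Mult(Literal(6), Add(Literal(3), Literal(4)))
println(format(sample)) // ( 6 * ( 3 + 4 ) )
```

The three operation cases only differ in their operator symbol, so pulling the string building into one helper keeps the match itself down to the shape of the datatype.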

### Evaluation

What about evaluating a given expression? In the end, we might want to resolve a complex expression down to a single integer value by evaluating all operations within that expression. Is this any harder than formatting a nested expression? Surely not! It turns out that we can rely on the very same recursive nature of our datatype! So while evaluating a single operation, we just evaluate every operand recursively, again.

```scala
val eval : Expression => Int =
  _ match {
    case Literal( x ) => x
    case Add( leftExpr, rightExpr ) => eval( leftExpr ) + eval( rightExpr )
    case Mult( leftExpr, rightExpr ) => eval( leftExpr ) * eval( rightExpr )
    case Sub( leftExpr, rightExpr ) => eval( leftExpr ) - eval( rightExpr )
  }
```

See the structural similarity to our function format, which in turn mirrors the structure of our recursive datatype? Anyhow, now we can just pass any expression to our eval function and receive the final integer value to which the expression evaluates:

```scala
eval( expr1 )  // 42
eval( expr2 )  // 2825
eval( expr3 )  // 0
```

Wow, our last expression evaluated to zero! Did you see that coming? Well, it would be nice if we had a function that could reduce a given expression step by step, so that we could follow the individual reduction steps until we finally arrive at a single integer literal.

### Step by step, oh baby …

To get there, we need a plan for which expression patterns we can reduce in which way. Surely, a given integer literal can’t be reduced any further. What about the addition of two literals? Well, we can only reduce it to a single integer literal holding the result of that basic addition. As soon as there’s at least one more complex (nested) operand, we first need to reduce that operand alone, before reducing the whole operation in a later step. I think you get the idea: we just need to treat every operation in exactly the same way:

```scala
val reduce : Expression => Expression =
  ( expr :Expression ) => expr match {

    case Literal(_) => expr

    case Add( Literal(_), Literal(_) ) => Literal( eval( expr ) )
    case Add( left @ Literal(_), rightExpr ) => Add( left, reduce( rightExpr ) )
    case Add( leftExpr, right @ Literal(_) ) => Add( reduce( leftExpr ), right )
    case Add( leftExpr, rightExpr ) => Add( reduce( leftExpr ), rightExpr )

    case Sub( Literal(_), Literal(_) ) => Literal( eval( expr ) )
    case Sub( left @ Literal(_), rightExpr ) => Sub( left, reduce( rightExpr ) )
    case Sub( leftExpr, right @ Literal(_) ) => Sub( reduce( leftExpr ), right )
    case Sub( leftExpr, rightExpr ) => Sub( reduce( leftExpr ), rightExpr )

    case Mult( Literal(_), Literal(_) ) => Literal( eval( expr ) )
    case Mult( left @ Literal(_), rightExpr ) => Mult( left, reduce( rightExpr ) )
    case Mult( leftExpr, right @ Literal(_) ) => Mult( reduce( leftExpr ), right )
    case Mult( leftExpr, rightExpr ) => Mult( reduce( leftExpr ), rightExpr )
  }
```

Wowowow, hold on! What’s going on here? Everything looks familiar in the first case expression: there, we simply match the given expression against a single literal. We’re not interested in the integer value of that literal (hence the underscore), since we only hand the given expression back in case of a match. The same goes for the first case expression matching against Add: since both operands are simple integer literals, we return a new literal which represents the sum of those two basic operands.

No need to get scared of the next case expression. There we want to match against an Add whose first operand is a simple literal and whose second operand is a nested expression (we know that the second operand can’t be a literal: in that case the first case for Add would already have matched, right?). If we get a match, hence a literal for the first and a nested expression for the second operand, we return a new (!) Add expression, where the second operand gets reduced and the first operand just remains the same. Now we’ve got a dilemma: we need to match the first operand against a literal pattern while also referring to it as a whole on the right side for building a new Add expression! But specifying a literal pattern prevents us from declaring a variable binding for the first operand. And the other way round, if we simply declare a variable binding for the first operand, how can we be sure to match against a literal pattern at the same time?

That’s the moment our new construct, the @-annotation (a pattern binder, in the words of the Scala language specification), has been waiting for! It’s just a way to bind a variable to a pattern! So in our case, we just match the first operand against a literal pattern and bind whatever matched that pattern (that is the first operand, in case you’ve forgotten) to a variable named left. So if there’s a match, we have a convenient way to refer to the first operand on the right side while it’s guaranteed to be a simple literal.
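Here’s a small sketch of the binder on its own, using a cut-down version of our datatype (just Literal and Add). The function `describe` is invented purely for this illustration. It also shows a typed pattern, which in this particular situation is an alternative way to get both the check and the binding.

```scala
// Cut-down datatype, for illustration only.
sealed abstract class Expression
case class Literal(x: Int) extends Expression
case class Add(x: Expression, y: Expression) extends Expression

// Hypothetical helper, not part of the episode's code.
def describe(expr: Expression): String = expr match {
  // `left @ Literal(_)` checks the shape AND binds the whole operand.
  // The bound variable has the type Literal, so `left.x` is available.
  case Add(left @ Literal(_), _) => "left operand is the literal " + left.x
  // A typed pattern performs the same check and binding in this case.
  case Add(_, right: Literal)    => "right operand is the literal " + right.x
  case _                         => "no literal operand at the top level"
}

println(describe(Add(Literal(1), Add(Literal(2), Literal(3)))))
// left operand is the literal 1
```

The binder is the more general tool of the two: it works with any pattern on its right side, including nested ones, whereas a typed pattern only checks the runtime class.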

Now it’s time to see our new function in action. Let’s take a rather complex expression and reduce it step by step:

```scala
val expr1 = Mult(Sub(Literal(6), Mult(Sub(Literal(5), Literal(3)), Literal(3))), Add(Literal(3), Mult(Literal(5), Literal(8))))

val expr2 = reduce( expr1 )
val expr3 = reduce( expr2 )
val expr4 = reduce( expr3 )
val expr5 = reduce( expr4 )
val expr6 = reduce( expr5 )
val expr7 = reduce( expr6 )

println( format( expr1 ) )  // ( ( 6 - ( ( 5 - 3 ) * 3 ) ) * ( 3 + ( 5 * 8 ) ) )
println( format( expr2 ) )  // ( ( 6 - ( 2 * 3 ) ) * ( 3 + ( 5 * 8 ) ) )
println( format( expr3 ) )  // ( ( 6 - 6 ) * ( 3 + ( 5 * 8 ) ) )
println( format( expr4 ) )  // ( 0 * ( 3 + ( 5 * 8 ) ) )
println( format( expr5 ) )  // ( 0 * ( 3 + 40 ) )
println( format( expr6 ) )  // ( 0 * 43 )
println( format( expr7 ) )  // 0
```
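A handy sanity check for a trace like this one: every reduction step should leave the value of the expression unchanged. The following sketch tests exactly that for the expression above. It restates eval and reduce as plain `def`s, and the reduce here is a condensed spelling that is meant to behave like the one above (the three "both operands are literals" cases are merged with a pattern alternative). The names `start`, `trace` and `allEqual` are made up for this example.

```scala
sealed abstract class Expression
case class Literal(x: Int) extends Expression
case class Add(x: Expression, y: Expression) extends Expression
case class Mult(x: Expression, y: Expression) extends Expression
case class Sub(x: Expression, y: Expression) extends Expression

def eval(expr: Expression): Int = expr match {
  case Literal(x) => x
  case Add(l, r)  => eval(l) + eval(r)
  case Mult(l, r) => eval(l) * eval(r)
  case Sub(l, r)  => eval(l) - eval(r)
}

// Condensed reduce: fold an operation on two literals, otherwise reduce
// the right operand if the left is a literal, else reduce the left one.
def reduce(expr: Expression): Expression = expr match {
  case Literal(_) => expr
  case Add(Literal(_), Literal(_)) | Sub(Literal(_), Literal(_)) |
       Mult(Literal(_), Literal(_)) => Literal(eval(expr))
  case Add(l @ Literal(_), r)  => Add(l, reduce(r))
  case Add(l, r)               => Add(reduce(l), r)
  case Sub(l @ Literal(_), r)  => Sub(l, reduce(r))
  case Sub(l, r)               => Sub(reduce(l), r)
  case Mult(l @ Literal(_), r) => Mult(l, reduce(r))
  case Mult(l, r)              => Mult(reduce(l), r)
}

val start: Expression =
  Mult(Sub(Literal(6), Mult(Sub(Literal(5), Literal(3)), Literal(3))),
       Add(Literal(3), Mult(Literal(5), Literal(8))))

// The start expression plus six reduction steps, as in the trace above.
val trace = Iterator.iterate(start)(reduce).take(7).toList

// Every intermediate expression must evaluate to the same value.
val allEqual = trace.forall(e => eval(e) == eval(start))
println(allEqual) // true
```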

There you go. Now we have a better understanding of how the whole expression evaluates to zero. We’ve kind of debugged the reduction process, watching every intermediate step. Too bad we need to know the number of steps in advance! So why not write a function which takes an expression and returns a list of the intermediate expressions after each reduction step, until we reach a single literal:

```scala
val stepResolve : ( Expression, List[Expression] ) => List[Expression] =
  ( expr :Expression, steps :List[Expression] ) => expr match {

    case Literal(_) => expr :: steps
    case _ => stepResolve( reduce( expr ), expr :: steps )
  }
```

Experts that we are, we can easily see what’s going on: the function takes an expression and a list of expressions and then just acts on the given expression. If it’s a simple literal (the base case), we can’t reduce the expression any further. We just prepend the expression to the list of expressions and are done. Otherwise we also prepend the expression, but need to reduce it at least one more time. Because every expression is prepended, the list ends up in reverse order, with the final literal at its head. For our convenience, we can come up with another function which takes just a single expression, starts the recursive reduction with an empty list and reverses the result:

```scala
val resolve = ( expr :Expression ) => stepResolve( expr, Nil ).reverse
```

So if you pass the above expression expr1 to resolve, you’re gonna receive all intermediate expressions, starting with the given expression and going down to the final integer literal:

```scala
for( expr <- resolve( expr1 ) ) println( format( expr ) )

// will print ...
// ( ( 6 - ( ( 5 - 3 ) * 3 ) ) * ( 3 + ( 5 * 8 ) ) )
// ( ( 6 - ( 2 * 3 ) ) * ( 3 + ( 5 * 8 ) ) )
// ( ( 6 - 6 ) * ( 3 + ( 5 * 8 ) ) )
// ( 0 * ( 3 + ( 5 * 8 ) ) )
// ( 0 * ( 3 + 40 ) )
// ( 0 * 43 )
// 0
```
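The accumulator-passing idea behind stepResolve can also be written as a local, tail-recursive `def`, which lets the compiler verify (via `@tailrec`) that the recursion runs in constant stack space. This is just a sketch under simplifying assumptions: the datatype is cut down to Literal and Add, reduce is a shortened stand-in for the full version above, and the name `loop` is made up.

```scala
import scala.annotation.tailrec

// Cut-down datatype and reduce, for illustration only.
sealed abstract class Expression
case class Literal(x: Int) extends Expression
case class Add(x: Expression, y: Expression) extends Expression

def reduce(expr: Expression): Expression = expr match {
  case Literal(_)                  => expr
  case Add(Literal(a), Literal(b)) => Literal(a + b)
  case Add(l @ Literal(_), r)      => Add(l, reduce(r))
  case Add(l, r)                   => Add(reduce(l), r)
}

def resolve(expr: Expression): List[Expression] = {
  // `steps` collects the expressions seen so far, newest first.
  @tailrec
  def loop(current: Expression, steps: List[Expression]): List[Expression] =
    current match {
      case Literal(_) => (current :: steps).reverse
      case _          => loop(reduce(current), current :: steps)
    }
  loop(expr, Nil)
}

val steps = resolve(Add(Add(Literal(1), Literal(2)), Literal(4)))
steps.foreach(println)
// Add(Add(Literal(1),Literal(2)),Literal(4))
// Add(Literal(3),Literal(4))
// Literal(7)
```

Hiding the accumulator inside resolve means callers can’t accidentally start with a non-empty list, and the final reverse happens in exactly one place.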

### Summary

In this episode we saw algebraic datatypes and pattern matching in practice. We used both to implement a little language for building representations of arbitrarily nested arithmetic expressions and operating on them. We recognized Expression as a recursive datatype, since operations like Add or Mult may take other Expressions as their operands, which is directly reflected by their corresponding value constructors.

In addition to that, most of our functions also reflected the recursive nature of such an expression in a similar way: they matched against the different value constructors to handle each expression type individually, while operating recursively on the operands of any given operation. Along the way we discovered @-annotations, which give us a way to match against a pattern and bind the matched value to a variable for further reference.

We’re not at the end of our road with arithmetic expressions. As you’ve seen while debugging through the intermediate expressions, we did some needless reduction steps: although the first operand of the multiplication was already zero, we still went on reducing the right operand. We’ll see in the next episode how we could get rid of those unnecessary steps by applying some trivial simplifications to our expressions. In addition to that, we’ll see how to reduce the number of case expressions within our function reduce. In both cases, we’re going to discover some yet unknown forms of pattern matching and will encounter a way of kind of creating your own patterns, leveraging so-called extractors. Hope to see you then …

### 10 Responses to “Functional Scala: a little expression language with algebraic datatypes and pattern matching”

2. Lutz Hankewitz Says:

I only felt a little uneasy when thinking about adding an expression type (e.g. a new value constructor “Modulo”) after implementing some number of “match” functions, which would then all have to be extended (the old visitor problem when extending the class hierarchy).

Why is implementing the methods like “format”, “reduce” or “eval” directly in the case classes not an option?
Or is it?

Thanks for sharing.

Lutz

• Mario Gleichmann Says:

Lutz,

In fact, if the number of value constructors for your datatype is very volatile you get into trouble! As you said, like applying the visitor pattern in an OO environment, it’s best used with a ‘fixed’ type and an unforeseen number of functions, operating on them.

Of course, Scala allows for a more object oriented solution as well as a more functional solution (in this series, I focus on the functional side). If the problem space asks for an object oriented solution, I would go for that. In that sense, you could of course design ‘real’ objects (in contrast to pure algebraic datatypes) and give them some methods, e.g. format or eval. But then you run the risk of working with mutable state and losing some of the characteristics of pure functional programs (like referential transparency and an easy way to reason about programs in a mathematical / algebraic way).

Greetings

Mario

3. Alex Says:

Great! But where I’ve run into trouble is with a typed AST. It makes writing eval methods easy and type-safe, but turns any more abstract treatment of expressions into existential hell.

I guess that’s why no one ever uses a typed AST. 🙂

• Mario Gleichmann Says:

Alex,

I’m not quite sure if I understand your statements the right way.

I think you’re talking about matching every single operational value constructor like Add or Mult individually instead of subsuming them within a single match and treating them equally?

If so, then I hope to address your concerns about the ‘existential hell for a more abstract treatment’ within the next episode, when taking a deeper look at extractors.

Greetings

Mario

• Alex Says:

Hi Mario,

Yes, that’s right, I’m talking about a more OO-ish approach, in which we expect additional implementations of Expression to be brought online at a later date, so Expression has an eval method. Something like this:

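```scala
// Ctx stands for some evaluation context type; it is not defined in this comment.
abstract class Expression[T] {
  def eval(context: Ctx): T
}

// case class rather than abstract class, so that literals can be instantiated
case class Literal[T](value: T) extends Expression[T] {
  def eval(context: Ctx) = value
}

case class Add[T : Numeric](left: Expression[T], right: Expression[T]) extends Expression[T] {
  private val num = implicitly[Numeric[T]]
  def eval(context: Ctx) = {
    import num.mkNumericOps  // brings the + operator for T into scope
    val l = left.eval(context)
    val r = right.eval(context)
    l + r
  }
}
```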
