How does my tree look in Dotty ?
I wanted to learn more about Dotty, the new compiler that will be used for Scala 3 so I thought of using it to solve a small problem.
I decided to use an exercise I use with candidates for a Scala position as a starting point for discussion. It consists in modelling a binary tree and writing a function that acts on it.
The problem
First I ask the candidate to model a binary tree, similar to the one the figure below:
And then we work together on how to write a function that adds a unique identifier to all nodes of the tree, like in the following example. I have used arbitrary integers to uniquely identify the nodes:
A few things we can explore with this exercise are:
- How to model the tree ?
The main two different approaches are either to use a terminal Leaf node and a node that has to have a left and right branch, or use a node with one optional left branch and one optional right branch). It is a good way to discuss when to use optional (and when not to use them !).
- Shall we use a case class or a normal class ? Why choose one other the other ?
A good opportunity to discuss immutability and its impact on data structures.
- How to store any type in the tree ? There is possibility here to discuss the use of the ‘Any’ type versus generics.
- When thinking of the function that adds the id, which signature does it have ?
If we store a value in the tree using generics, then we can return a tree which contains a tuple made of the id and the value.
Here it is possible to discuss more advanced use of generics and how comfortable is the person with them.
This is probably the most ‘functional’ solution. We could also use a trait as well which contains 2 fields, one for the id and one for the value and return a tree which value has to implement the trait. That could be an opportunity to discuss ‘anonymous’ objects.
- Then when implementing the actual function, it is interesting to see how the person thinks recursively.
- Usually the easiest way to add a unique id is to choose an integer.
The main difficulty is how to make sure each the same integer does not get reused, which means first ids need to be allocated to one branch, usually the left one and then the right one which means the algorithm cannot be parallelized.
We could imagine using other solutions for the id, like a string describing the position of the node in the tree (like ‘llr’ for Left-Left-Right), which main advantage is that it can be parallelized.
We can then discuss disadvantages, like with this solution if the position of the nodes cannot be changed, i.e. it is possible to add or remove nodes but if the tree is rebalanced then the id of the nodes will change (which may or may not be a problem depending on how the ids are used).
Finally, one implementation
The main difference between the Scala solution and the Dotty one is how the tree is modelled. In both cases I use algebraic data types.
The product is represented as a case class in both Scala and Dotty.
Dotty uses the ‘enum’ keyword to represent the sum
enum Tree[+T] {
case Branch(v: T, l: Tree[T], r: Tree[T])
case Leaf
}
As explained by Martin Odersky, using enums allows a lighter syntax: in Scala 2.x, the trait used for the product has to be ‘sealed’ and the case classes ‘final’.
sealed trait Tree[T]
final case class Branch[T](v: T, l: Tree[T], r: Tree[T]) extends Tree[T]
final case class Leaf[T]() extends Tree[T]
The actual implementation of the function is similar in Scala and Dotty:
def addId[T](t:Tree[T]) : Tree[(T,Int)] = addId(t, 0)._1 private def addId[T](t: Tree[T], i:Int) : (Tree[(T,Int)], Int) = t match {
case Leaf => (Leaf, i)
case Branch(v, l, r) => {
val newLeft = addId(l, i)
val newRight = addId(r, newLeft._2)
(Branch((v, newRight._2), newLeft._1, newRight._1), newRight._2 + 1)
}
}
The only difference is that in Scala, to match and construct ‘Leaf’ we need to add parenthesis:
case Leaf() => (Leaf(), i)
Instead of returning the index of the id, it is possible to deduce it from the Tree that is returned but it makes the code a more complex (Scala 2.13 implementation):
def addId[T](t: Tree[T]) : Tree[(T,Int)] = addId(t, 0) def getId[T](t: Tree[(T,Int)], i:Int) = t match {
case Leaf() => i
case Branch(v, _, _) => v._2
} def addId[T](t: Tree[T], i: Int) : Tree[(T,Int)] = t match {
case Leaf() => Leaf()
case Branch(v, l, r) => {
val newLeft = addId(l, i)
val leftIndex = getId(newLeft, i)
val newRight = addId(r, leftIndex)
val rightIndex = getId(newRight, leftIndex)
Branch((v,rightIndex + 1), newLeft, newRight)
}
}
Enums are one of the rare language feature which was present in Java but not in Scala. Java has parameterized enum since Java 5 while in Scala 2.x they are implemented in the standard library.
Enums in Dotty are a useful addition, they will make building ADT based data structures clearer and easier to read.
Final thoughts
What I learnt from implementing this algorithm is that the Dotty compiler is noticibly faster than the Scala 2.x one and that the team behind Scala 3 is trying to hard to make the language easier to use and more consistent but in a evolutionary manner.
Links:
- Scala example available here: https://github.com/benoitpas/dotty-tree
- Dotty example available here: https://github.com/benoitpas/dotty-tree
- The story on my github blog: https://benoitpas.github.io/Dotty/