Taking the Slow Train Through the Harz Mountains


Not far away from Braunschweig in the Eastern part of the Harz Mountains lies a large narrow-gauge railway network where steam engines still operate on most services. In order to tour almost the entire network, my colleague and I started our tour in Quedlinburg, which is easily reached by a regional train from Magdeburg. From there, a steam train took us on the Selke Valley Railway to Alexisbad.


After switching to a diesel car in Alexisbad, we arrived at Eisfelder Talmühle where another steam train took us on the Trans-Harz Railway all the way to Drei Annen Hohne, where we changed to the famous Brocken Railway.

Eisfelder Talmühle


Unfortunately, soon after starting the ascent to the Brocken mountain, we entered the clouds so that I could not enjoy the superb view from the mountain over Northern Germany.



After a water stop in Drei Annen Hohne, our steam train took us on the remaining part of the Trans-Harz Railway to its terminal at Wernigerode, another interchange point with Deutsche Bahn’s regional network.

Drei Annen Hohne

Dijkstra in Scala


Dijkstra’s algorithm is a fundamental graph algorithm, which allows to compute the shortest path from a source node to a target node in a directed, weighted graph, i.e. the path with the smallest accumulated weight. In fact, the algorithm does not only compute the shortest path from one node to another, but a shortest-path tree, which describes the shortest path from the root of the tree (the source node) to any node that is reachable from the source.

In an earlier post, I presented a concise implementation of Dijkstra’s algorithm in Clojure, but the algorithm presented there only computes the distance (the price of the shortest path) from the source to any other node, not the shortest-path tree. So let us now look how to implement the algorithm in Scala, this time time computing a shortest-path tree.

We first have to fix a type for our graphs. Since we consider directed, weighted graphs, where every edge carries a price, the simplest way to represent a graph is a function mapping nodes to a set of pairs of the form (node, price). Instead of a set of pairs, we can also use a map from nodes to prices, which allows us to easily compute the price of an edge given a successor node. Hence, let us start with the following type definition, which is parameterized by the node type N.

type Graph[N] = N => Map[N, Int]

Hence, given an instance g of type Graph[N] and a node n, the map returned by g(n) maps successor nodes of n to the price of the corresponding edge.

The output of Dijkstra’s algorithm is a tree, which can be described as a map from nodes to their predecessor in the tree. It would be nice to also include the distances from the source in the output, so let us also return a map from nodes to their distance. Hence, a possible signature for our implementation of Dijkstra’s algorithm is:

def dijkstra[N](g: Graph[N])(source: N):
(Map[N, Int], Map[N, N])

From this we can easily define a function that optionally returns the shortest path from a source node to a target node as follows:

def shortestPath[N](g: Graph[N])(source: N, target: N):
Option[List[N]] = {
  val pred = dijkstra(g)(source)._2
  if (pred.contains(target) || source == target)
  else None

Here, iterateRight is a function that builds up a list from the end by repeatedly applying a function returning an option until it returns None:

def iterateRight[N](x: N)(f: N => Option[N]): List[N] = {
  def go(x: N, acc: List[N]): List[N] = f(x) match {
    case None => x :: acc
    case Some(y) => go(y, x :: acc)

  go(x, List.empty)

Since Scala is a hybrid language, which allows to write imperative code as well as functional code, it is perfectly fine to implement Dijkstra’s algorithm in an imperative fashion using mutable data structures. Such an implementation looks as follows:

def dijkstra1[N](g: Graph[N])(source: N):
(Map[N, Int], Map[N, N]) = {
  val active = mutable.Set(source)
  val res = mutable.Map(source -> 0)
  val pred = mutable.Map.empty[N, N]
  while (active.nonEmpty) {
    val node = active.minBy(res)
    active -= node
    val cost = res(node)
    for ((n, c) <- g(node)) {
      val cost1 = cost + c
      if (cost1 < res.getOrElse(n, Int.MaxValue)) {
        active += n
        res += (n -> cost1)
        pred += (n -> node)
  (res.toMap, pred.toMap)

This implementation is similar to the pseudocode given on Wikipedia, but omits the unnecessary initialization phase. Instead, nodes are added to the set of active nodes only when they are discovered.

What is the problem with this solution? First of all, Scala.Predef defines Map as an alias for scala.collection.immutable.Map, so in the end we need to convert our mutable maps res and pred to immutable maps. Second, in functional programming we consider it good style to avoid loops and mutable data structures. So, let’s try to come up with a more functional solution. Instead of using mutable data structures and modifying them inside a loop, we can convert the loop to a tail-recursive function which we call with the new values of our (now immutable) data structures.

def dijkstra2[N](g: Graph[N])(source: N):
(Map[N, Int], Map[N, N]) = {
  def go(active: Set[N], res: Map[N, Int], pred: Map[N, N]):
  (Map[N, Int], Map[N, N]) =
    if (active.isEmpty) (res, pred)
    else {
      val node = active.minBy(res)
      val cost = res(node)
      val neighbours = for {
        (n, c) <- g(node) if
          cost + c < res.getOrElse(n, Int.MaxValue)
      } yield n -> (cost + c)
      val active1 = active - node ++ neighbours.keys
      val preds = neighbours mapValues (_ => node)
      go(active1, res ++ neighbours, pred ++ preds)

  go(Set(source), Map(source -> 0), Map.empty)

The code is a straightforward adaption of the previous imperative solution. Inside the go function, we replaced the side-effecting for loop by a for expression whose result we use in the recursive call.

The code is looking good, but we used Set’s minBy method to identfy the active node with the least distance from the source. Since regular sets are unorderd, this method has to look at every node in the set in order to identify the one with the least distance; this becomes a problem if in a large proportion of calls to go many nodes are active. In order to demonstrate the problem, let us test our code on some generic graphs. The following function generates a binary tree of the given depth where every leaf of the tree has an edge leading back to the root, so that the graph becomes strongly connected:

def tree(depth: Int): Graph[List[Boolean]] = {
  case x if x.length < depth =>
    Map((true :: x) -> 1, (false :: x) -> 2)
  case x if x.length == depth => Map(Nil -> 1)
  case _ => Map.empty

Since going to the left successors costs 1 but going to the right successor costs 2, the shortest path from List(true), the left successor of the root Nil, going back to the root should go via the left branch of the tree, so that the distance from List(true) to Nil should be equal to the depth of the tree. We can confirm this in the REPL.

scala> val t = tree(10)
t: Graph[List[Boolean]] = <function1>
scala> dijkstra1(t)(List(true))._1(Nil)
res1: Int = 10
scala> dijkstra2(t)(List(true))._1(Nil)
res1: Int = 10

Now, if you choose greater numbers for the depth, you will notice that both implementations will slow down very quickly. For instance, on my machine dijkstra1 and dijkstra2 return on tree(14) only after around 30 seconds and on tree(15) only after more than two minutes!

How can we make our code run faster? In the previous post, I presented priority maps and their implementation in Scala. Priority maps can be used like regular maps but offer fast access to entries with a small value. In particular, we can call head on a priority map to access the entry with the smallest value in (almost) constant time. In order to profit from priority maps in our implementation of Dijkstra’s algorithm, we only need to adapt the type of active and replace the call to minBy by a call to head. Since nodes in the priority map also carry their distance from the source, we can also delay putting a node into res until it is removed from the priority map:

def dijkstra3[N](g: Graph[N])(source: N):
(Map[N, Int], Map[N, N]) = {
  def go(active: PriorityMap[N, Int], res: Map[N, Int], pred: Map[N, N]):
  (Map[N, Int], Map[N, N]) =
    if (active.isEmpty) (res, pred)
    else {
      val (node, cost) = active.head
      val neighbours = for {
        (n, c) <- g(node) if !res.contains(n) &&
          cost + c < active.getOrElse(n, Int.MaxValue)
      } yield n -> (cost + c)
      val preds = neighbours mapValues (_ => node)
      go(active.tail ++ neighbours, res + (node -> cost), pred ++ preds)

  go(PriorityMap(source -> 0), Map.empty, Map.empty)

Now, running dijkstra3 on tree(15) takes less than a second, and we can even run the code on tree(20)—a graph which contains more than a million nodes—in less than 30 seconds.

If you want to play with this code, it is available on GitHub.

Priority Maps in Scala


In many algorithms—including Dijkstra’s algorithm for finding the shortest path between two vertices in a weighted graph—we need access to a data structure that maps elements to numbers or priorities where we can efficiently obtain and remove the element(s) with the least priority and add new elements as the algorithm progresses. Such a data structure is usually called a priority queue and often implemented using a heap. While a heap implementation is very efficient, it has the drawback that some operations—such as determining the priority of an arbitrary element or finding the element with the greatest priority—still require linear time. Moreover, heaps are inherently mutable and therefore not thread-safe.

In functional programming, we usually prefer immutable (persistent) data structures where every update results in a new version of the data structure with the old version still accessible. For Clojure, Mark Engelberg has created an implementation of what he calls a priority map, which is an immutable data structure that combines the functionality of priority queues with the fast lookup functionality of maps. Alternatively, priority maps can be viewed as regular maps but with entries sorted by value, so iterating over a priority map produces key-value pairs with ascending values. In Scala, another functional programming language for the JVM, we have immutable maps and immutable sorted maps, where entries are sorted by keys, but no immutable map-like data structure where entries are sorted by value.

Without further ado, I present to you scala-prioritymap, which is an implementation of immutable priority maps in Scala and fully compatible with the Scala collections API. As such, priority maps extend the Map trait, and any operation that is defined on maps also works on priority maps, but they respect the (implicit) ordering on values.

scala> import de.ummels.prioritymap._
import de.ummels.prioritymap._

scala> val m = PriorityMap('a' -> 1, 'b' -> 2, 'c' -> 0)
m: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(c -> 0, a -> 1, b -> 2)

scala> m.toSeq
res1: Seq[(Char, Int)] = ArrayBuffer((c,0), (a,1), (b,2))

If there is no implicit ordering defined for your value type or you want to use a different ordering, you can supply another ordering as an extra argument to the factory method. For instance, we can use the reverse ordering on integers instead of the natural ordering.

scala> val ord = Ordering.by[Int,Int](x => -x)
ord: scala.math.Ordering[Int] =

val m2 = PriorityMap('a' -> 1, 'b' -> 2, 'c' -> 0)(ord)
m2: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(b -> 2, a -> 1, c -> 0)

If you remove a key/value pair or add one with the right type, you get back a new priority map.

scala> m - 'a'
res2: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(c -> 0, b -> 2)

scala> m + ('b' -> -1)
res3: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(b -> -1, c -> 0, a -> 1)

Care has to be taken if a key/value pair with a value of a different type is added. Since—unlike regular maps—priority maps are not covariant in their value type, it is for example not possible to add a value of type String to a priority map with values of type Int and get back another priority map. In fact, covariance is ruled out by the possibility that there may be no implicit ordering defined for the common supertype (or one that is not compatible with the ordering defined for the original value type). Since priority maps are compatible with regular maps, adding an incompatible binding therefore results in a regular map instead of a priority map.

scala> m + scala> m + ('a' -> "Hello")
res4: scala.collection.immutable.Map[Char,Any] =
Map(a -> Hello, b -> 2, c -> 0)

Obtaining or removing an entry with the minimal or maximal value can be done via the standard methods head, tail, last and init.

scala> m.head
res5: (Char, Int) = (c,0)

scala> m.tail
res6: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(a -> 1, b -> 2)

scala> m.last
res7: (Char, Int) = (b,2)

scala> m.init
res8: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(c -> 0, a -> 1)

Higher-order methods like map and filter can be used as well and return priority maps if possible.

scala> m map { case (k,v) => (k,-v) }
res9: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(b -> -2, a -> -1, c -> 0)

scala> m filter { case (k,v) => v % 2 == 0 }
res10: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(c -> 0, b -> 2)

Priority maps also offer methods from, until and range which return a submap where all entries have values inside the given range.

res11: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(a -> 1, b -> 2)

scala> m.until(2)
res12: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(c -> 0, a -> 1)

scala> m.range(1,2)
res13: de.ummels.prioritymap.PriorityMap[Char,Int] =
PriorityMap(a -> 1)

If you want to give scala-prioritymap a spin, you can include it in your SBT project by adding the following line to your build file:

libraryDependencies += "de.ummels" %% "scala-prioritymap" % "0.1.0"

Or if you want to check out the source, take a look on GitHub.

In the next post, we’ll see how we can use priority maps to implement Dijkstra’s algorithm in Scala.