Mold at OOPSLA'14

In a couple of weeks, Cosmin Rădoi will be presenting our work Translating Imperative Code to MapReduce at OOPSLA 2014. (Cosmin led this project, and the key ideas and system implementation are due to him.) The paper describes Mold, a tool that can automatically translate small, sequential, imperative Java programs manipulating array-like data structures into efficient, parallel MapReduce programs. For example, consider the following sequential implementation of the “word count” program (the “Hello world!” of MapReduce):

Map<String,Integer> wordCount(List<String> docs) {
  Map<String,Integer> m = new HashMap<>();
  for (int i = 0; i < docs.size(); i++) {
    // Split each document into words.
    String[] split = docs.get(i).split(" ");
    for (int j = 0; j < split.length; j++) {
      // Increment each word's count, starting from 0 if unseen.
      String w = split[j];
      Integer prev = m.get(w);
      if (prev == null) prev = 0;
      m.put(w, prev + 1);
    }
  }
  return m;
}

Mold can automatically translate the above to the following Scala code:

docs
  .flatMap({ case (i, v9) => v9.split(" ") })
  .map({ case (j, v8) => (v8, 1) })
  .reduceByKey({ case (v2, v3) => v2 + v3 })

In the generated code, docs is treated as an indexed collection, so each closure receives an (index, value) pair; the loop indices i and j are simply ignored. Mold provides multiple backends, so the above code can be run using either Apache Spark or the Scala parallel collections library. Note how closely the output resembles the canonical word count program for Spark.
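For comparison, a hand-written Spark version might look roughly like this (a sketch, assuming a SparkContext sc and a hypothetical input path):

val counts = sc.textFile("docs.txt")  // load the input as a collection of lines
  .flatMap(line => line.split(" "))   // split each line into words
  .map(word => (word, 1))             // pair each word with an initial count of 1
  .reduceByKey(_ + _)                 // sum the counts for each word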

A big challenge here is the same difficulty faced by parallelizing compilers: the need to execute loop iterations in parallel without breaking the program. For the Java program above, simply running the inner loop iterations in parallel may not be safe, since different iterations can update the count of the same word, causing a data race. MapReduce gets around this limitation via its “shuffle” operation, which groups inputs by some key and then applies the reducer to different groups in parallel. For word count, the keys are the words themselves, so, e.g., counts can be computed in parallel for words starting with letters ‘a’–‘c’, ‘d’–‘f’, etc. Much of Mold’s power lies in its ability to automatically find fruitful points at which to introduce these “shuffle” operations to extract more parallelism.
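To make the grouping idea concrete, here is a minimal sketch using plain Scala collections (the words list is a made-up input):

val words = List("to", "be", "or", "not", "to", "be")
val counts = words
  .groupBy(identity)                          // the “shuffle”: group occurrences by word
  .map { case (w, occs) => (w, occs.size) }   // each group can then be reduced independently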

At a high level, Mold works as follows:

1. It translates the input Java program into a functional intermediate representation, turning loops into folds over indexed collections.
2. It applies a system of rewrite rules to this representation, guided by a cost function, searching for rewrites that expose map and reduce operations and fruitful points at which to introduce “shuffle” (group-by-key) operations (see the sketch below).
3. It emits Scala code for the chosen backend (Spark or the Scala parallel collections).
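To illustrate the rewriting step (a sketch only, expressed over ordinary Scala collections rather than Mold’s actual intermediate representation), the inner counting loop can be viewed as a fold, which is then rewritten into a map followed by a group-by-key and a per-group reduction:

val words = List("to", "be", "or", "not", "to", "be")

// Sequential view: a fold that threads the counts map through every word.
val seqCounts = words.foldLeft(Map.empty[String, Int]) { (m, w) =>
  m + (w -> (m.getOrElse(w, 0) + 1))
}

// Rewritten view: map each word to (word, 1), "shuffle" by key, then
// reduce each group independently (the same shape as Mold's output above).
val parCounts = words.map(w => (w, 1))
  .groupBy(_._1)
  .map { case (w, ps) => (w, ps.map(_._2).sum) }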

Thus far, we’ve only run Mold on small input programs (and some challenges must be overcome to scale it up), but the results are promising: in most cases, the generated code performs comparably to hand-written MapReduce programs. Mold’s initial success shows how functional representations can be useful for parallelization even when the original program is imperative.

For all the technical details, check out the paper. Or, even better, if you’ll be at OOPSLA, attend Cosmin’s talk and ask him about it in person!