Scala immutable/mutable collections microbenchmarking using JMH

Few days back, I was writing a program for Dijkstra's algorithm. I used mutable HashMap to store all nodes and edges in that program. I have to store around 400k edges and their weight. There are two types of HashMaps in Scala collection. one is mutable and the other one is immutable. I have to choose one of these. To find which one gives more performance among mutable HashMap and immutable HashMap when you have many insertions.

You can check Dijkstra's algorithm implementation on my Github project Scala-Experiment. -Diskstra's solution

To find this, I did microbenchmarking of mutable and immutable HashMaps.

I used JMH(Java Microbenchmarking Harness) through sbt-jmh for benchmarking.

You can check out the code on Github repository: Scala-Experiment.
https://github.com/pgkaila/Scala-Experiments

I used following tools for benchmarking

  • Oracle JDK version 1.8.0_66
  • Scala version 2.11.7
  • sbt version 0.13.9
  • sbt-jmh plugin version 0.2.6 shipped with JMH version 1.11.3

If you are new with sbt, check my previous article on getting started with sbt.
Build your first scala project with sbt

JMH

To use JMH via sbt add sbt-jmh plugin by creating file project/plugin.sbt with the following content.

addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.6")

Enable plugin by adding enablePlugins(JmhPlugin) in build.sbt file.

The following code is to benchmark two methods. In both methods, I'm inserting 1000000 entries in HashMap. In one method I used mutable HashMap and in the other one, I used immutable HashMap.
I'm setting output time unit as millisecond and benchmark modes as AverageTime, Throughput and SampleTime using JMH annotations.

More details about diffrent modes and scope is as following.

Modes

Name Description
Mode.Throughput Calculate number of operations in a time unit.
Mode.AverageTime Calculate an average running time.
Mode.SampleTime Calculate how long does it take for a method to run (including percentiles).
Mode.SingleShotTime Just run a method once (useful for cold-testing mode). Or more than once if you have specified a batch size for your iterations (see @Measurement annotation below) – in this case JMH will calculate the batch running time (total time for all invocations in a batch).
Any set of these modes You can specify any set of these modes – the test will be run several times (depending on number of requested modes).
Mode.All All these modes one after another.

Scope

Name Description
Scope.Thread This is a default state. An instance will be allocated for each thread running the given test.
Scope.Benchmark An instance will be shared across all threads running the same test. Could be used to test multithreaded performance of a state object (or just mark your benchmark with this scope).
Scope.Group An instance will be allocated per thread group (see Groups section down below).
package be.piyush.microbenchmarking

import java.util.concurrent.TimeUnit  
import org.openjdk.jmh.annotations._  
import scala.collection.{immutable, _}  
import scala.util.Random

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime, Mode.Throughput, Mode.SampleTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class CollectionBenchmarking {

  @Benchmark
  @OperationsPerInvocation(2)
  def mutableHashMap: Unit ={
    val map = new mutable.HashMap[String, Long]()
    val r = new Random()
    r.setSeed(1000L)
    for (i <- 1 to 1000000) {
      val number = r.nextLong()
      map(number.toString) = number
    }
  }

  @Benchmark
  @OperationsPerInvocation(2)
  def immutableHashMap: Unit ={
    var map = new immutable.HashMap[String, Long]()
    val r = new Random()
    r.setSeed(1000L)
    for (i <- 1 to 1000000) {
      val number = r.nextLong()
      map = map.+(number.toString -> number)
    }
  }
}

Run Benchmark

To run benchmarking execute jmh:run -i 10 -wi 10 -f1 -t1 in sbt.

In this command, -i 10 is for 10 iterations, -wi 10 is for 10 warmup iterations, -f1 is for 1 fork and -t1 is to run the benchmark in one thread.

Output

After running this benchmarking, it would generate output like following,

[info] # Run complete. Total time: 00:02:58
[info] 
[info] Benchmark                                  Mode  Cnt    Score     Error   Units
[info] CollectionBenchmarking.immutableHashMap   thrpt   10    0.002 ±   0.001  ops/ms
[info] CollectionBenchmarking.mutableHashMap     thrpt   10    0.004 ±   0.001  ops/ms
[info] CollectionBenchmarking.immutableHashMap    avgt   10  564.104 ± 214.220   ms/op
[info] CollectionBenchmarking.mutableHashMap      avgt   10  216.340 ±  27.696   ms/op
[info] CollectionBenchmarking.immutableHashMap  sample   13  545.784 ± 154.552   ms/op
[info] CollectionBenchmarking.mutableHashMap    sample   21  292.191 ±  48.599   ms/op
[success] Total time: 182 s, completed 21 Mar, 2016 6:13:21 PM


We can see mutableHashMap has more throughput(ops/ms) compare to immutableHashMap. Also, It is taking less average time and sample time compare to immutable HashMap.
It says clearly that mutable collection is faster than immutable when you want to do more insertions.

Comments

comments powered by Disqus