I have spent years programming in Java, and one thing (among others) I found frustrating is the lack of mathematical libraries (not to mention machine-learning frameworks) on the JVM.
In fact, if you're at all interested in machine learning you'll notice that all the cool stuff is written in C++ (for performance reasons) and most often comes with a Python wrapper (because who wants to program in C++ anyway?).
Yes, I had to learn Python (even though it felt weird at the beginning), and yet I couldn't find anything equivalent in the Java world.
Probably the main reason is the lack of efficient parallelism on the JVM. The lowest parallelism abstraction on the JVM is the thread, which doesn't let you fully exploit the available hardware (e.g. GPUs).
Fortunately, we now have nd4j (N-Dimensions For Java) or, as its creators present it, the "numpy" for the JVM.
So the question is: how can these guys leverage all the power of the CPU/GPU and still run on the JVM? Well, they use a clever trick: JNI. JNI, the Java Native Interface, has been around for years. It allows the JVM to hook into a "native" library. And that's exactly what they did: they developed a native C++ engine, libnd4j, that exploits the available hardware, and used JNI bindings to integrate it with the JVM.
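To make the trick concrete, here is a minimal sketch of what a JNI binding looks like on the JVM side. Note that `nativeDot` and the `nativemath` library name are hypothetical examples, not nd4j's actual bindings:

```scala
object JniSketch {
  // A @native method has no body on the JVM side: the implementation lives in
  // a compiled C++ library that must be loaded first, e.g. via
  // System.loadLibrary("nativemath"), which resolves libnativemath.so.
  @native def nativeDot(a: Array[Double], b: Array[Double]): Double

  // Pure-JVM fallback showing what the native side would compute.
  def fallbackDot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit =
    println(fallbackDot(Array(1.0, 2.0), Array(3.0, 4.0))) // 1*3 + 2*4 = 11.0
}
```

At build time, `javac -h` generates the matching C header for each `@native` method; nd4j automates this kind of glue at a much larger scale.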
Cool, so let's see it in action. But first things first: we need to install it.
That's pretty easy: you just need to declare a Maven dependency in your project. With sbt it looks like this:
libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "0.5.0"
If you now look at what's actually pulled down with this dependency, you'll notice that the jar includes libnd4j.so (the C++ library) and the backends (lib4jopenblas.so).
There is also a CUDA backend, available via a different dependency:
libraryDependencies += "org.nd4j" % "nd4j-cuda-7.5" % "0.5.0"
You probably noticed that nd4j doesn't come alone: it pulls in another dependency, JavaCPP. JavaCPP parses C/C++ header files and auto-generates the corresponding JNI bindings. It also manages the off-heap memory where the data (tensors, matrices, …) is stored.
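"Off-heap" means the data lives outside the garbage-collected heap, so the native C++ engine can read it directly without copying. The JVM itself exposes the same idea through direct buffers; here is a quick illustration with a plain direct buffer (not JavaCPP's actual API):

```scala
import java.nio.ByteBuffer

// Allocate off-heap (direct) memory: room for three 8-byte doubles.
// The GC never moves this data, so a native library can safely address it.
val buf = ByteBuffer.allocateDirect(3 * 8)
buf.asDoubleBuffer().put(Array(1.0, 2.0, 3.0))
println(buf.asDoubleBuffer().get(1)) // 2.0
```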
Now that we're all set up, let's see what the code looks like with a very basic example.
Let’s create some simple matrices:
val ones = Nd4j.create(3, 2) // 3 x 2 matrix filled with 0s
ones.addi(1.0)               // addi stands for add in-place
println(s"ones: \n$ones")

val matrix = Nd4j.create(
  Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), // the values
  Array(3, 2),                         // dimensions: 3 x 2
  'c'                                  // 'c' for C-style (row-major) ordering
)
println(s"matrix: \n$matrix")

val result = ones.transpose mmul matrix // onesT x matrix
println(s"result: \n$result")
The output is “numpy” formatted:
ones:
[[1.00, 1.00],
 [1.00, 1.00],
 [1.00, 1.00]]
matrix:
[[1.00, 2.00],
 [3.00, 4.00],
 [5.00, 6.00]]
result:
[[ 9.00, 12.00],
 [ 9.00, 12.00]]
Nd4j fulfils a need that has long gone unmet on the JVM. It seems well designed and deserves its title of the "Numpy for the JVM". Performance seems pretty good too (I just did a basic comparison of matrix multiplication using nd4j-native against a pure Scala implementation).
Nd4j brings more computational power to the JVM world, allowing developers to leverage the full CPU/GPU potential. It seems to me that nd4j was the missing bit needed to really kick off machine-learning projects on the JVM. I'll certainly keep an eye on dl4j and see how it benefits from nd4j.
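For reference, the pure-Scala side of such a comparison would be roughly a naive triple loop like the one below (a sketch, not the exact benchmark code). Running it on the matrices from the example above reproduces nd4j's result:

```scala
// Naive row-major matrix multiply over Array[Array[Double]], the kind of
// pure-Scala baseline a basic comparison would use.
def matmul(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] = {
  val n = a.length     // rows of a
  val k = b.length     // shared dimension
  val m = b(0).length  // columns of b
  val out = Array.ofDim[Double](n, m)
  for (i <- 0 until n; j <- 0 until m) {
    var s = 0.0
    var p = 0
    while (p < k) { s += a(i)(p) * b(p)(j); p += 1 }
    out(i)(j) = s
  }
  out
}

// onesT (2 x 3 of ones) times the 3 x 2 matrix from the example:
val onesT  = Array.fill(2, 3)(1.0)
val matrix = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
matmul(onesT, matrix).foreach(row => println(row.mkString("[", ", ", "]")))
// [9.0, 12.0]
// [9.0, 12.0]
```

For the small shapes shown here the difference is negligible; the native BLAS backend only pulls ahead as the matrices grow.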