Nd4j – Numpy for the JVM

I have spent years programming in Java, and one thing (among others) that I have found frustrating is the lack of mathematical libraries (not to mention machine learning frameworks) on the JVM.

In fact, if you’re even a little interested in machine learning, you’ll notice that all the cool stuff is written in C++ (for performance reasons) and most often comes with a Python wrapper (because who wants to program in C++ anyway).

TF-IDF

The idea for this blog post came after I finished the TF-IDF lab in the edX Spark specialisation courses.

EDX - CS110x - Big data analysis with Spark

In this course, the labs follow a step-by-step approach where you need to write a few lines of code at every step. The lab is very detailed and easy to follow. However, I found that by focusing on a single step at a time, I was missing the big picture of what’s happening overall.