PCA stands for Principal Component Analysis. It is a mathematical concept which I am not going to explain in great detail here, as there are already plenty of books on the subject. Rather, I would like to give a practical feel for what it does and when to use it.
The idea behind PCA is that we represent the data using a different set of axes. For example, let’s imagine that we are dealing with accelerometer data from a smart watch sensor. This data comes in the form of (x, y, z) coordinates computed every 20ms.
Depending on how you move your arm, the (x, y, z) values will change over time. In a 10s interval, 500 (x, y, z) coordinates are computed, and each axis holds some of the variation in the data.
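To make this a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the simulated arm swing, the direction (0.6, 0.8, 0), the noise level and the helper names are all hypothetical, and power iteration is just one simple way to find the first principal axis. In practice you would more likely reach for a library such as scikit-learn.

```python
import math
import random


def simulate_accelerometer(n_samples=500, seed=0):
    """Hypothetical data: an arm swinging along one direction, plus sensor noise."""
    rng = random.Random(seed)
    direction = (0.6, 0.8, 0.0)  # assumed dominant movement (unit vector)
    samples = []
    for i in range(n_samples):
        t = i * 0.02  # one sample every 20 ms
        swing = math.sin(2 * math.pi * t)  # assumed 1 Hz swing
        samples.append(tuple(swing * d + rng.gauss(0, 0.05) for d in direction))
    return samples


def covariance_matrix(samples):
    """Sample covariance of the centered (x, y, z) values."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(dims)]
    centered = [[s[k] - means[k] for k in range(dims)] for s in samples]
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_principal_component(cov, iterations=200):
    """Power iteration: converges to the axis with the largest variance."""
    dims = len(cov)
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Variance of the data along that axis (Rayleigh quotient of a unit vector).
    variance = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)
    )
    return vec, variance


samples = simulate_accelerometer()  # 10 s of data at 20 ms intervals
cov = covariance_matrix(samples)
component, variance = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(3))
explained = variance / total_variance

print("first principal axis:", [round(c, 2) for c in component])
print("share of variance explained:", round(explained, 3))
```

In this made-up example nearly all of the variation lies along a single new axis, so the three original (x, y, z) columns could be summarised by one value per sample with little loss.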
If you ever want to get serious about data science, sooner or later you’re going to get your hands on some Python code.
If you are like me, coming from the JVM world, you probably think “yeah, Python … should be cool!!”. Everybody is using it, the syntax looks concise, and the machine learning ecosystem is pretty dense in Python: theano, neon, scikit-learn, …
So yeah, let’s get started… and if you’ve never written any Python code before, I’ll tell you it’s not going to be that fun.
“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
As you can see, there are two worlds out there: the world of math and statistics and the world of software engineering.
Each of these worlds thinks it is better than the other (which is true in a sense), but the truth is that they also need each other to achieve good results.
The statistician cannot harvest huge amounts of data without a proper system, and the engineer who knows how to build such systems doesn’t know how to extract valuable information from so much data.
Coming from a software engineering background, I intend to publish some articles as I go along on this journey.