Today, to conclude my series on neural networks, I am going to write down some guidelines and a methodology for developing, testing and debugging a neural network.
As we will see (or as you may already have experienced), implementing a neural network is tricky and there is often a thin line between failure and success – between something that works great and something making absurd predictions.
The number of parameters we need to adjust is huge: from choosing the right algorithm, to tuning the model hyper-parameters, to improving the data, ….
We therefore need a good methodology and a solid understanding of how our model works and of the impact of each of its parameters.
Continue reading “Neural network implementation guidelines”
After introducing convolutional neural networks I continue my series on neural networks with another kind of specialised network: the recurrent neural network.
The recurrent neural network is a kind of neural network that specialises in sequential input data.
With a traditional neural network, sequential data (e.g. a time series) is split into fixed-size windows, and only the data points inside the window can influence the outcome at time t.
With a recurrent neural network, the network can remember data points much further in the past than a typical window size.
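The difference between the two approaches can be sketched in a few lines of plain Python. This is only an illustration: `rnn_steps` is a single recurrent step with toy, untrained weights, not a full RNN.

```python
import math

def make_windows(series, size):
    """Traditional approach: fixed-size windows. Only the `size`
    most recent points can influence the prediction at time t."""
    return [series[i:i + size] for i in range(len(series) - size + 1)]

def rnn_steps(series, w_x=0.5, w_h=0.9, b=0.0):
    """Recurrent approach: a hidden state h is carried forward,
    so every past point can (in principle) influence the current
    output. The weights here are arbitrary toy values."""
    h = 0.0
    states = []
    for x in series:
        h = math.tanh(w_x * x + w_h * h + b)  # h mixes the new input with the past
        states.append(h)
    return states

series = [1, 2, 3, 4, 5]
windows = make_windows(series, 3)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
states = rnn_steps(series)          # one hidden state per time step
```

Note how each window forgets everything before its first element, while the hidden state at step t still depends (through the chain of `tanh` updates) on the very first input.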
Continue reading “Recurrent Neural Network”
Convolutional Neural Networks are a kind of network inspired by the cat's visual cortex.
A cat's visual cortex is made of two distinct types of cells:
- simple cells, which specialise in edge detection.
- complex cells, which have larger receptive fields: they respond to a small region of the visual field and are less sensitive to the exact position of edges.
Convolutional neural networks are inspired by the latter type of cell. Each neuron is sensitive to a small region of the input data rather than to the exact position of a pattern.
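The core operation behind this idea is the convolution: a small kernel slid over the input, responding wherever its pattern appears. Here is a minimal sketch in plain Python, with a hand-made vertical-edge kernel (real networks learn their kernels and add many refinements such as padding, strides and pooling).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products
    at each position (a 'valid' convolution, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# An image with a dark/bright vertical boundary between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A toy vertical-edge kernel: responds where brightness jumps left-to-right.
kernel = [[-1, 1],
          [-1, 1]]
edges = convolve2d(image, kernel)
# edges == [[0, 2, 0], [0, 2, 0]] – the response peaks exactly on the edge
```

The same kernel fires wherever the edge appears, which is precisely the position insensitivity described above.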
Continue reading “Convolutional Neural Network”
As we’ve seen in the TensorFlow introduction, having access to the computation graph is a powerful feature. We can define any operation we’d like and TensorFlow (or Theano) will compute the gradients and perform the optimisation for us. That’s great!
However, if you keep defining the same kind of operations you’ll eventually find this approach a bit tedious. This is where we need a higher level of abstraction, one that allows us to define our neural net in terms of layers and not in terms of operations.
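To make the idea concrete, here is a tiny sketch of such a layer abstraction in plain Python – not Keras itself, just the shape of the idea: a `Dense` layer bundles the weights, bias and activation, and a `Sequential` container chains layers so the network reads as a stack rather than as raw operations.

```python
import math
import random

class Dense:
    """A fully connected layer: output = activation(W @ x + b)."""
    def __init__(self, n_in, n_out, activation=math.tanh):
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
        self.w = [[rng.uniform(-1, 1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        return [self.activation(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(self.w, self.b)]

class Sequential:
    """Chain layers: the whole net is described as a list of layers."""
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# The network is declared layer by layer, not operation by operation.
net = Sequential([Dense(3, 4), Dense(4, 2)])
out = net([0.5, -0.1, 0.2])
```

Keras follows the same pattern (its `Sequential` model and `Dense` layer), while also handling weight initialisation, training and much more for us.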
Continue reading “Keras – Tensorflow and Theano abstraction”
Today I continue my neural network post series with some considerations on neural network implementation.
So far we have covered what a neural network is and how it works, but we are still left with numerous choices regarding its design.
How many layers should we use, how many units (neurons) in each layer, which activation functions, which cost function, …? There are so many questions and choices to make that it has bothered me for quite some time.
If you search the web you may find some advice on these questions. But that is it – you can only get advice, as there are no clear answers. It’s just trial and error, so you’d better try for yourself and see how different designs perform on your problem.
Continue reading “Neural network design”
Following my previous post on neural networks I thought it would be nice to see how to implement these concepts with TensorFlow.
TensorFlow is a new library developed by Google. It is aimed at building fast and efficient machine-learning pipelines.
It is based on the computation graph that we discussed earlier.
It provides C++ and Python interfaces and can run on CPU or GPU (Linux only).
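To see what a computation graph buys us, here is a minimal sketch of the idea in plain Python – not TensorFlow's actual API, just the principle: each node records how it was computed, so gradients can be derived automatically by walking the graph backwards.

```python
class Node:
    """A node in a computation graph. It stores its value and, for each
    parent, the local gradient needed for backpropagation."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Node(self.value * other.value,
                    [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        """Accumulate gradients by the chain rule, walking the graph."""
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

x = Node(3.0)
y = Node(4.0)
z = x * y + x   # builds the graph; z.value == 15.0
z.backward()    # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

TensorFlow generalises exactly this mechanism to tensors and a large catalogue of operations, which is why it can optimise any model we express as a graph.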
Continue reading “Tensorflow introduction”
Machine learning applications spread into more domains every day. One of today’s most powerful techniques is the neural network. This technique is employed in many applications such as image recognition, speech analysis and translation, self-driving cars, etc.
In fact such learning algorithms have been known for decades, but they have only recently become mainstream, supported by increases in computation power (GPU) and fast data access (SSD) which allow us to run these algorithms over billions of samples.
Neural networks can represent a wide range of complex functions, making them an algorithm of choice in many domains. However training such networks is complex, and it is only the recent increase in computation power and fast data access that has allowed us to exploit the full potential of this technique.
Continue reading “Neural Network”
k-means is a clustering algorithm which divides space into k different clusters.
Each cluster is represented by its centre of mass (i.e. barycentre) and data points are assigned to the cluster with the nearest barycentre.
The learning algorithm starts by choosing k random points; each of these becomes the centre of mass of a cluster. Then we iterate over a sequence of assignment and update phases until we reach stability (i.e. the cluster barycentres stop moving).
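The assignment/update loop above fits in a few lines of plain Python. A small caveat in this sketch: if a cluster ever ends up empty, we simply keep its previous centre (real implementations typically re-seed it).

```python
import random

def kmeans(points, k, seed=0):
    """k-means on 2D points: pick k random centres, then alternate
    assignment and update phases until the centres stop moving."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    while True:
        # Assignment phase: each point joins the nearest centre's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                                + (p[1] - centres[c][1]) ** 2)
            clusters[nearest].append(p)
        # Update phase: each centre moves to its cluster's centre of mass.
        new_centres = [(sum(p[0] for p in cl) / len(cl),
                        sum(p[1] for p in cl) / len(cl)) if cl else centres[i]
                       for i, cl in enumerate(clusters)]
        if new_centres == centres:  # stability: barycentres stopped moving
            return centres, clusters
        centres = new_centres

# Two obvious groups of points, one near the origin and one near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, clusters = kmeans(points, k=2)
```

On this toy data the loop converges in a couple of iterations to one centre per group.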
Continue reading “k-means clustering”
When you train several models over a dataset you need a way to compare their performance and choose the one that best suits your needs.
As we will see there are different ways to compare the results and then pick the best one.
Let’s start with what scores we can get out of the training process. Assuming we are running a classification model with two possible outcomes, the model’s performance can be summarised with four figures known as the confusion matrix.
These four figures are:
- TP – True positives: the number of samples correctly marked as positive
- TN – True negatives: the number of samples correctly marked as negative
- FP – False positives: the number of samples incorrectly marked as positive (aka type 1 errors)
- FN – False negatives: the number of samples incorrectly marked as negative (aka type 2 errors)
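Counting these four figures is straightforward; here is a small sketch (with made-up labels, using 1 for positive and 0 for negative):

```python
def confusion_matrix(actual, predicted):
    """Count TP, TN, FP, FN for a binary classifier (1 = positive)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # type 1 errors
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # type 2 errors
    return tp, tn, fp, fn

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_matrix(actual, predicted)
# tp=2, tn=2, fp=1, fn=1
```

Metrics such as accuracy, precision and recall are then simple ratios of these four counts.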
Continue reading “Confusion matrix”
k-Nearest Neighbours is based on a simple idea: similar points tend to have similar outcomes.
The idea is therefore to memorise all the points in the dataset. The prediction for a new entry is made by finding the closest point in the dataset; the new entry is then simply given the same outcome as the value associated with its closest point.
If two points are close enough, so should be their outcomes.
The name k-NN comes from the fact that you can look for the k closest points and compute (e.g. average) the outcome of the new point from the outcomes of those k nearest points.
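For a classification outcome the combination is usually a majority vote rather than an average; here is a minimal sketch on made-up 2D data:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` as the majority label among its
    k nearest training points (squared Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: sum((a - b) ** 2
                                                    for a, b in zip(item[0], query)))
    labels = [label for _, label in neighbours[:k]]
    return Counter(labels).most_common(1)[0][0]

# Two well-separated groups of labelled points.
train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b')]
label = knn_predict(train, (1, 1), k=3)  # the 3 nearest points are all 'a'
```

For a numeric outcome (regression) you would instead average the k neighbours' values; the sorting step is the same.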
Continue reading “k-Nearest Neighbours”