Following my previous post on neural networks, I thought it would be nice to see how to implement these concepts with TensorFlow.
TensorFlow is a new library developed by Google, aimed at building fast and efficient machine learning pipelines.
It is based on the computation graph concept that we discussed earlier.
It provides C++ and Python interfaces and can run on CPU or GPU (Linux only).
Enough talking, let’s get started. If you have Docker installed on your machine (if not, I highly recommend doing so) you can just spin up a new container with TensorFlow already installed.
Start the docker container with:
docker run -i -t gcr.io/tensorflow/tensorflow /usr/bin/python
This will connect you to the docker container and open a Python shell.
Next we need to import TensorFlow and NumPy:
import tensorflow as tf
import numpy as np
In TensorFlow all variables are represented as tensors. A tensor is just an n-dimensional array, so it can be used to represent anything from a scalar (0 dimensions) to a vector (1 dimension), a matrix (2 dimensions) and more.
Moreover each variable is also assigned a type (e.g. float, double, int32, …).
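For example, here is a quick sketch (the names are purely illustrative) of constants with different shapes and types:
scalar = tf.constant(3.0)                       # 0 dimensions, float32
vector = tf.constant([1, 2, 3])                 # 1 dimension, int32
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # 2 dimensions, float32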
Let’s start with something easy: declaring 2 scalar variables:
a = tf.constant(2)
b = tf.constant(3)
and then defining a multiplication operation that multiplies these 2 variables:
c = a * b
At this point c is not equal to 6, because TensorFlow hasn’t actually computed the multiplication yet. c represents the operation of multiplying variable a by variable b.
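You can convince yourself of this by printing c directly:
print(c)
# prints a description of the Tensor (operation name, shape, dtype), not the value 6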
To get the result we need to evaluate this operation. In TensorFlow this is done inside a session:
with tf.Session() as sess:
    result = sess.run(c)
    print(result)  # prints 6
Here we used constants to declare a and b because their values don’t change.
We could have done the same with a variable. The difference is that the value of a variable might change over time, and before being used in a computation it must be initialised.
with tf.Session() as sess:
    a = tf.constant(4)
    x = tf.Variable(5)
    x.initializer.run()
    c = a * x
    print(sess.run(c))  # prints 20
OK, but using a variable is more complicated than using a constant, and it doesn’t do much in this example.
So let’s see how we can update the variable (change its state) over time:
# create our variable x with initial value 1
x = tf.Variable(1)
two = tf.constant(2)
# multiply x by 2
doubled_val = tf.mul(two, x)
# assign the result to x
double_op = tf.assign(x, doubled_val)
# we are now ready to run the double operation
# but we need to initialise our variable first
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(x))  # prints 1
    # double x, 5 times
    for i in range(5):
        sess.run(double_op)
    print(sess.run(x))  # prints 32
Much better. Things start to look a bit more interesting. I think we are now ready to implement our first neuron. But before we get started there is one more thing we need to know about: placeholders. A placeholder is like a constant, except that its value will be provided at runtime when we call sess.run().
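For example, here is a minimal sketch (the names p and q are purely illustrative) of feeding a value at runtime:
p = tf.placeholder(tf.int32)
q = p * 10
with tf.Session() as sess:
    print(sess.run(q, feed_dict={p: 7}))  # prints 70
Now let’s put a placeholder to work and feed some inputs into our first neuron: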
# input x
x = tf.placeholder(tf.int32, shape=[4, 2])
# bias b
b = tf.constant(-1)
# weight w
w = tf.constant([[1], [1]])
u = tf.matmul(x, w) + b
y = tf.nn.relu(u)
# defines our set of inputs
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init_op)
    output = sess.run(y, feed_dict={x: inputs})
    print(output)
Congrats! You’ve just implemented a neuron that performs the AND function: the output should be [[0], [0], [0], [1]], one row per input.
So now let’s see how we can turn this neuron into a whole network.
Luckily TensorFlow provides us with all the machinery. We just need to define the computation graph for our cost function and the network topology, and then TensorFlow will help us train our model and update its weights with backpropagation.
So now let’s try to implement a real neural network for the XOR function.
Our network contains one hidden layer with 2 neurons.
We can model it in TensorFlow:
import tensorflow as tf
import numpy as np

# our input data
# None means we don't know the number of rows yet
x = tf.placeholder(tf.float32, shape=[None, 2])

# the hidden layer
wh = tf.Variable(tf.random_normal([2, 2]))
bh = tf.Variable(tf.random_normal([2]))
h = tf.nn.relu(tf.matmul(x, wh) + bh)

# the output layer
wo = tf.Variable(tf.random_normal([2, 1]))
bo = tf.Variable(tf.random_normal([1]))
# No activation function for the output layer
y = tf.matmul(h, wo) + bo

# The expected output values
y_ = tf.placeholder(tf.float32, shape=[None, 1])

# We need a cost function to measure the performance of our network
# Here we use a simple mean square
cost = tf.reduce_mean(tf.square(y_ - y))

# Now we're ready to train our network
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Initialise everything
init = tf.initialize_all_variables()

# And start the session
sess = tf.Session()
sess.run(init)

# Our input data
input = [[0, 0], [0, 1], [1, 0], [1, 1]]
output = [[0], [1], [1], [0]]

# Train our model
for i in range(1000):
    sess.run(train, feed_dict={x: input, y_: output})

# Check that it works as expected
print(sess.run(tf.round(y), feed_dict={x: input}))

# Print the network parameters (weights and biases)
print "hidden layer", sess.run(wh), sess.run(bh)
print "output layer", sess.run(wo), sess.run(bo)
Congratulations on your first neural network!
As a wrap-up here are some tricky things to pay attention to:
- Carefully check the dimensions of the variables in TensorFlow. I’ve run into shape mismatch issues a number of times.
- Initialise the network parameters randomly (it didn’t work with zeroes).
- Choose the gradient optimisation step carefully (values that are too big made the optimisation diverge). A short sketch of these last two tweaks follows this list.
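For example, if training diverges you could try a smaller spread for the random initialisation and a smaller learning rate (the values below are just illustrative, not the ones used above):
# hypothetical tweak: initialise the hidden weights with a smaller standard deviation
wh = tf.Variable(tf.random_normal([2, 2], stddev=0.1))
# hypothetical tweak: use a smaller gradient descent step
train = tf.train.GradientDescentOptimizer(0.001).minimize(cost)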
Training a network is pretty cool; however, it is also way trickier than I expected. In particular, getting the initialisation and the optimisation step right can be challenging: at first the results were far from my expectations even though the implementation was correct.
This is a very basic network, so I highly encourage you to head over to the TensorFlow website and follow the tutorials (the MNIST tutorial is very detailed and is a nice introduction to neural networks).