Keras – a TensorFlow and Theano abstraction

As we’ve seen in the TensorFlow introduction, having access to the computation graph is a powerful feature. We can define any operation we’d like and TensorFlow (or Theano) will compute the gradient and perform the optimisation for us. That’s great!
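As a quick reminder, here is a minimal sketch of that idea, using the same era’s TensorFlow API as the rest of this post. The function and values are made up purely for illustration: we build a toy graph and let TensorFlow derive the gradient symbolically.

import tensorflow as tf

# Toy graph: y = x^2 + 3x; TensorFlow derives dy/dx for us
x = tf.Variable(2.0)
y = x * x + 3 * x
grad = tf.gradients(y, [x])[0] # Symbolic gradient, no manual calculus

sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(sess.run(grad)) # 7.0, since dy/dx = 2x + 3 = 7 at x = 2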

However, if you always define the same kinds of operations, you’ll eventually find this approach a bit tedious. This is where we need a higher level of abstraction, one that allows us to define our neural net in terms of layers and not in terms of operations.

This is where Keras comes in. Keras runs on top of Theano or TensorFlow and allows us to quickly and easily define a neural net. To illustrate this, I am going to compare the basic MNIST tutorial provided by TensorFlow with an implementation of the same simple network in Keras.

The MNIST dataset is a collection of images of handwritten digits. The dataset is available on Yann LeCun’s website. In the code below it is downloaded automatically using TensorFlow utility code.
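For reference, here is what the loader gives you. The shapes assume the tutorial’s default split, which keeps 55,000 images for training: each image is flattened to a 784-value vector and each label is one-hot encoded.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape) # (55000, 784) -- flattened 28x28 images
print(mnist.train.labels.shape) # (55000, 10) -- one-hot digit labels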

Following the TensorFlow tutorial, we are going to define a neural network with a single layer in order to classify the digit images. First, let’s see what the code looks like using TensorFlow (this code is taken directly from the TensorFlow tutorial).

import tensorflow as tf
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Neural network definition
x = tf.placeholder(tf.float32, [None, 784]) # Input layer: flattened 28x28 images
W = tf.Variable(tf.zeros([784, 10])) # Weights turning 784 pixels into 10 class scores
b = tf.Variable(tf.zeros([10])) # Bias
u = tf.matmul(x, W) + b
y = tf.nn.softmax(u)

# Labels
y_ = tf.placeholder(tf.float32, [None, 10]) # known outcomes

# Loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# Training operation
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluation
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
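To make that evaluation step concrete, here is the same argmax comparison on a made-up batch of three two-class predictions, in plain NumPy:

import numpy as np

# Made-up softmax outputs and one-hot labels for three examples
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
y_true = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])

correct = np.argmax(y_pred, 1) == np.argmax(y_true, 1) # [True, True, False]
print(correct.mean()) # 0.666... -- the fraction of correct predictions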

So how many lines do we have here? Seventeen, excluding imports and dataset loading.

Now let’s see how to implement the same network with Keras. But first things first: let’s install Keras. It can be installed with the regular Python package manager:

sudo pip install keras

Good! Keras needs either Theano or TensorFlow installed as a backend; since we already have TensorFlow from the first example, we can implement our network right away:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(input_dim=784, output_dim=10)) # Fully connected layer: 784 pixels in, 10 class scores out
model.add(Activation("softmax")) # Turn the scores into class probabilities

sgd = SGD(lr=0.5, momentum=0.0, decay=0.0, nesterov=False) # Plain SGD, same learning rate as before
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(mnist.train.images, mnist.train.labels, nb_epoch=1, batch_size=100) # One epoch, batches of 100

loss_and_metrics = model.evaluate(mnist.test.images, mnist.test.labels)
print(loss_and_metrics[1]) # Accuracy
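As a side note, if you want to sanity-check the model, Keras can print the layer stack and parameter counts. For our single layer that should come out to 784 × 10 weights plus 10 biases, i.e. 7,850 parameters:

model.summary()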

Here everything fits in fewer than 10 lines. Our simple network definition is only 3 lines of code, and you can even choose to run it on Theano or TensorFlow just by changing the backend setting in the Keras configuration.
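For example, assuming the script above is saved as mnist_keras.py (a name I’m using just for illustration), you can switch backends for a single run with the KERAS_BACKEND environment variable, or permanently by editing the "backend" field in ~/.keras/keras.json:

KERAS_BACKEND=theano python mnist_keras.py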