
Implement softmax regression for digit recognition

This post demonstrates how to implement softmax regression for digit recognition on MNIST.

The detailed derivations of the algorithm can be found in this script.
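As a quick refresher before the notebook: softmax regression computes a linear score for each class and turns the scores into probabilities with the softmax function. Here is a minimal sketch in NumPy (the function names are illustrative, not the dnn_play API):

import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def predict_proba(X, W, b):
    # Linear scores followed by softmax: one probability per class, rows sum to 1.
    return softmax(X @ W + b)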

Main workflow

  • Prepare the training/validation/test datasets.
  • Set the weight decay and numerical parameters.
  • Check that the gradients of the loss function are correct.
  • Train the model.
  • Estimate the accuracy of the predictions.

IPython notebook

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

from dnn_play.classifiers.softmax import Softmax, softmax_loss, rel_err_gradients
from dnn_play.utils.data_utils import load_mnist
from dnn_play.utils.visualize_utils import display_network


# Plot settings
plt.rcParams['figure.figsize'] = (10.0, 10.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
In [2]:
# Load MNIST data
(X_train, y_train), (X_val, y_val), (X_test, y_test) = load_mnist()
#(X_train, y_train), (X_val, y_val), (X_test, y_test) = load_mnist(n_train=9000, n_val=1000, n_test=1000)

print("X_train shape = {} y_train shape = {}".format(X_train.shape, y_train.shape))
print("X_val   shape = {} y_val  shape = {}".format(X_val.shape, y_val.shape))
print("X_test  shape = {} y_test shape = {}".format(X_test.shape, y_test.shape))
X_train shape = (55000, 784) y_train shape = (55000,)
X_val   shape = (5000, 784) y_val  shape = (5000,)
X_test  shape = (10000, 784) y_test shape = (10000,)
In [3]:
# Sanity check of softmax loss function

# Number of layer units
input_size  = X_train.shape[1] # Dimension of features
n_classes = np.max(y_train) + 1

layer_units = (input_size, n_classes)

# Hyperparameters
reg = 1e-4

# Numerical parameters
max_iters = 400

# Define classifier
clf = Softmax(layer_units)
weights = clf.init_weights()

loss, grad = softmax_loss(weights, X_train, y_train, 0.0)

# Note there are 10 classes.
# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
loss: 2.302549
sanity check: 2.302585
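Why -log(0.1)? With near-zero initial weights the class scores are roughly equal, so each of the 10 classes gets probability about 0.1, and the cross-entropy loss per example is -log(0.1) ≈ 2.3026. A quick illustration (assuming uniform predicted probabilities):

p = np.full(10, 0.1)      # near-uniform predicted probabilities over 10 classes
print(-np.log(p[0]))      # 2.302585..., matching the sanity check above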
In [4]:
# Gradient checking
if rel_err_gradients() < 1e-8:
    print("Gradient check passed!")
else:
    print("Gradient check failed!") 
Gradient check passed!
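rel_err_gradients compares the analytic gradient from softmax_loss against a numerical estimate. A common way to do this (a sketch of the standard technique; the dnn_play internals may differ) is a centered finite difference plus a relative-error measure:

def numerical_gradient(f, w, eps=1e-5):
    # Centered finite differences: (f(w + eps) - f(w - eps)) / (2 * eps), entry by entry.
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = w[idx]
        w[idx] = old + eps
        f_plus = f(w)
        w[idx] = old - eps
        f_minus = f(w)
        w[idx] = old  # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

def rel_error(a, b):
    # Maximum relative error; values around 1e-8 or smaller indicate a correct gradient.
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))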
In [5]:
"""
Training
"""

model, loss_history, train_acc_history, val_acc_history = clf.fit(X_train, y_train, X_val, y_val, 
                                                                  reg=reg, max_iters=max_iters, verbose=True)   
iter:   20, loss: 0.325872, train_acc: 0.908945, val_acc: 0.928800
iter:   40, loss: 0.285137, train_acc: 0.922873, val_acc: 0.937600
iter:   60, loss: 0.272479, train_acc: 0.927382, val_acc: 0.939600
iter:   80, loss: 0.267087, train_acc: 0.930382, val_acc: 0.942600
iter:  100, loss: 0.264715, train_acc: 0.931636, val_acc: 0.940800
iter:  120, loss: 0.263642, train_acc: 0.932600, val_acc: 0.941200
iter:  140, loss: 0.263141, train_acc: 0.932745, val_acc: 0.941200
iter:  160, loss: 0.262935, train_acc: 0.933073, val_acc: 0.942000
iter:  180, loss: 0.262832, train_acc: 0.932982, val_acc: 0.942400
iter:  200, loss: 0.262783, train_acc: 0.933109, val_acc: 0.942400
iter:  220, loss: 0.262759, train_acc: 0.933109, val_acc: 0.942600
iter:  240, loss: 0.262749, train_acc: 0.932927, val_acc: 0.942600
iter:  260, loss: 0.262744, train_acc: 0.932891, val_acc: 0.942400
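Note that the log stops at iteration 260 even though max_iters is 400, which is consistent with fit stopping once the loss plateaus. In essence, fit runs gradient descent on the regularized softmax loss; a minimal sketch of such a loop, treating the parameters as one flat array (the learning rate, tolerance, and structure are illustrative assumptions, not clf.fit's actual internals):

def fit_sketch(weights, X, y, reg, max_iters, lr=0.5, tol=1e-7):
    prev_loss = np.inf
    for i in range(max_iters):
        loss, grad = softmax_loss(weights, X, y, reg)  # loss and gradient in one pass
        weights -= lr * grad                           # gradient descent step
        if abs(prev_loss - loss) < tol:                # stop once the loss plateaus
            break
        prev_loss = loss
    return weights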
In [6]:
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.title('Loss history')
plt.xlabel('Epoch')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['Training accuracy', 'Validation accuracy'], loc='lower right')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
Out[6]:
<matplotlib.text.Text at 0x10e838518>
In [7]:
# Visualize the weights 

W0 = model[0]['W']
image = display_network(W0)
plt.imshow(image, cmap = plt.cm.gray)
Out[7]:
<matplotlib.image.AxesImage at 0x10e88e390>
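Each column of W0 has one weight per input pixel, so it can be reshaped into a 28x28 image: the learned template for that digit. display_network tiles these templates into a single image; a hand-rolled equivalent (assuming W0 has shape (784, 10), one column per class) would be:

for c in range(10):
    plt.subplot(1, 10, c + 1)
    plt.imshow(W0[:, c].reshape(28, 28), cmap=plt.cm.gray)  # template for digit c
    plt.title(str(c))
    plt.axis('off')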
In [8]:
# Make predictions

pred = clf.predict(X_test)

acc = np.mean(y_test == pred)
print("Accuracy: {:5.2f}% \n".format(acc*100))
Accuracy: 92.46% 
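Prediction itself is just the argmax of the linear class scores; since softmax is monotonic, applying it before the argmax would give the same labels. A sketch of what predict computes (parameter names are illustrative):

def predict_sketch(X, W, b):
    # Most likely class per row: index of the highest score.
    return np.argmax(X @ W + b, axis=1)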

In [9]:
# View some images and predictions
n_images = 3
images = X_test[:n_images].reshape((n_images, 28, 28))
pred = clf.predict(X_test[:n_images])

for i in range(n_images):
    plt.subplot(1, n_images, i+1)
    plt.imshow(images[i], cmap = plt.cm.gray)
    plt.title('Predicted digit: {}'.format(pred[i]))
    plt.axis('off')

Softmax classifier

If you are interested in all the code used in this demonstration, please check the repository.


Published: Dec 23, 2015
Category: Machine learning
