This post will demonstrate how to implement multilayer perceptron for digit recognition.
The detailed derivations of algorithm can be found from this script.
Main workflow
- Preparing training/validation/testing datasets.
- Set the weight decay / numerical parameters.
- Check if the gradients of the loss function are correct.
- Training model.
- Estimate the accuracy of predictions.
Ipython notebook
In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from dnn_play.classifiers.mlp import MLP, mlp_loss, rel_err_gradients
from dnn_play.utils.data_utils import load_mnist
from dnn_play.utils.visualize_utils import display_network
# Plot settings
plt.rcParams['figure.figsize'] = (10.0, 10.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
In [2]:
# Load MNIST data
(X_train, y_train), (X_val, y_val), (X_test, y_test) = load_mnist()
#(X_train, y_train), (X_val, y_val), (X_test, y_test) = load_mnist(n_train=900, n_val=100, n_test=100)
print("X_train shape = {} y_train shape = {}".format(X_train.shape, y_train.shape))
print("X_val   shape = {} y_val  shape = {}".format(X_val.shape, y_val.shape))
print("X_test  shape = {} y_test shape = {}".format(X_test.shape, y_test.shape))
In [3]:
# Number of layer units
input_size  = X_train.shape[1] # Dimension of features
hidden_size_L1 = 200
output_size = np.max(y_train) + 1 # Number of classes
# Network configuration
layer_units = ((input_size, hidden_size_L1, output_size))
# Hyperparameters
reg = 1e-4 # Regulation, weight decay
# Numerical parameters
max_iters = 400
# Define the classifier
clf = MLP(layer_units)
# Initialize weights
weights = clf.init_weights()
loss, grad = mlp_loss(weights, X_train, y_train, 0.0)
# Note there are 10 classes.
# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
In [4]:
# Gradient checking
if rel_err_gradients() < 1e-8:
    print("Gradient check passed!")
else:
    print("Gradient check failed!")  
In [5]:
"""
Training
"""
reg = 1e-4 # Regulation, weight decay
#clf = MLP(layer_units, weights = weights)
weights, loss_history, train_acc_history, val_acc_history = clf.fit(X_train, y_train, X_val, y_val, 
                                                                    reg=reg, max_iters=max_iters, verbose=True)
    
In [6]:
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(loss_history)
plt.title('Loss history')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(train_acc_history)
plt.plot(val_acc_history)
plt.legend(['Training accuracy', 'Validation accuracy'], loc='lower right')
plt.xlabel('Epoch')
plt.ylabel('Clasification accuracy')
Out[6]:
In [7]:
# Visualize the weights 
W0 = weights[0]['W']
image = display_network(W0)
plt.imshow(image, cmap = plt.cm.gray)
Out[7]:
In [8]:
# Make predictions
pred = clf.predict(X_test)
acc = np.mean(y_test == pred)
print("Accuracy: {:5.2f}% \n".format(acc*100))
In [9]:
# View some images and predictions
n_images = 4
images = X_test[:n_images].reshape((n_images, 28, 28))
pred = clf.predict(X_test[:n_images])
for i in range(n_images):
    plt.subplot(1, n_images, i+1)
    plt.imshow(images[i], cmap = plt.cm.gray)
    plt.title('Predicted digit: {}'.format(pred[i]))
    plt.axis('off')
    
Multilayer perceptron
In case you are interested in all codes related in this demonstration, please check the repository.
comments powered by Disqus