TH's Notes

Derive and implement softmax layer

Loss

Assuming there are \(K\) classes and \(N\) training samples, the softmax loss can be written as

$$ J = \sum_{i=1}^{N} J^{(i)} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} t_j^{(i)} \ln P_j^{(i)} $$

where

$$ t_j = 1 \{ y = j \} = \left \{ \begin{aligned} 1 \quad \text{for} \quad y = j \\ 0 \quad \text{for} \quad y \neq j \end{aligned} \right. $$
$$ P_j = \frac{ e^{x_j} }{ \sum_{l=1}^{K} e^{x_l}} \quad \text{the probability for the } j \text {th class.} $$

\(x_j\) is the score for the \(j\)th class; \(y\) stands for the correct class label.
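
As a small sanity check of the definitions above, the sketch below evaluates \(P_j\) and \(J\) directly from one-hot targets for a made-up batch. The names `scores`, `labels`, and `one_hot` and the toy values are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Hypothetical toy batch: N = 2 samples, K = 3 classes.
scores = np.array([[1.0, 2.0, 3.0],
                   [1.0, 1.0, 1.0]])
labels = np.array([2, 0])  # correct class y for each sample

N, K = scores.shape

# t[i, j] = 1 if labels[i] == j, else 0
one_hot = np.eye(K)[labels]

# P[i, j] = exp(x_j) / sum_l exp(x_l), computed row by row
exp_scores = np.exp(scores)
P = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# J = -(1/N) * sum_i sum_j t_j^(i) * ln P_j^(i)
J = -np.sum(one_hot * np.log(P)) / N
print(J)
```

Because \(t_j\) is one-hot, only the log-probability of the correct class contributes for each sample, which is why an implementation can index the correct class directly instead of building the full one-hot matrix.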

Derivative

$$ \frac{ \partial J^{(i)} }{ \partial x_k } = -\frac{1}{N} \sum_{j=1}^{K} t_j^{(i)} \frac{ \partial }{ \partial x_k } \ln P_j^{(i)} = \frac{1}{N} ( P_k^{(i)} - t_k^{(i)} ) $$
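
The intermediate step can be spelled out as follows, dropping the sample index \((i)\) for brevity. The Kronecker delta \(\delta_{jk}\) (1 if \(j = k\), 0 otherwise) is notation introduced for this sketch only.

$$ \ln P_j = x_j - \ln \sum_{l=1}^{K} e^{x_l} \quad \Rightarrow \quad \frac{ \partial }{ \partial x_k } \ln P_j = \delta_{jk} - \frac{ e^{x_k} }{ \sum_{l=1}^{K} e^{x_l} } = \delta_{jk} - P_k $$
$$ \sum_{j=1}^{K} t_j \frac{ \partial }{ \partial x_k } \ln P_j = \sum_{j=1}^{K} t_j ( \delta_{jk} - P_k ) = t_k - P_k \sum_{j=1}^{K} t_j = t_k - P_k $$

The last equality uses \(\sum_{j=1}^{K} t_j = 1\), since exactly one class is correct. Multiplying by \(-\frac{1}{N}\) gives \(\frac{1}{N} ( P_k - t_k )\).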

Numerical trick

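To avoid overflow when the exponentials in the numerator and denominator of \(P_j\) become too large, we subtract the maximum score before exponentiating. The common factor \(e^{-x_{\text{max}}}\) cancels, so \(P_j\) is unchanged.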

$$ P_j = \frac{ e^{x_j} }{ \sum_{l=1}^{K} e^{x_l}} = \frac{ e^{x_j - x_{\text{max}}} }{ \sum_{l=1}^{K} e^{x_l - x_{\text{max}}}} $$
$$ x_{\text{max}} = \max_{l} x_l $$
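
To illustrate why the shift matters, the following sketch compares a direct evaluation of \(P_j\) with the shifted version on deliberately large scores. The helper names `softmax_naive` and `softmax_stable` and the test vector are assumptions made for this example.

```python
import numpy as np

def softmax_naive(x):
    # Direct translation of the definition; overflows for large scores.
    e = np.exp(x)
    return e / np.sum(e)

def softmax_stable(x):
    # Subtract the maximum score first; mathematically the same result.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

x = np.array([1000.0, 1001.0, 1002.0])

# exp(1000) overflows to inf in float64, so inf / inf yields nan.
# The warnings are silenced here only to keep the output readable.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)

stable = softmax_stable(x)
print(naive)
print(stable)
```

After the shift, the largest exponent is 0, so every exponential lies in \((0, 1]\) and the denominator is at least 1.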

Code example

import numpy as np


def softmax_loss(x, y):
  """
  Computes the loss and gradient for softmax classification.
  x: Input data, of shape (N, K)
     where x[i, j] is the score for the jth class
     for the ith input.
  y: Vector of labels, of shape (N,)
     where y[i] is the label for x[i] and 0 <= y[i] < K

  Returns
  ----
  loss: Scalar giving the loss
  dx: Gradient of the loss with respect to x
  """
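  N = x.shape[0]
  # Shift scores by the row-wise maximum for numerical stability
  x_max = np.max(x, axis=1, keepdims=True)
  proba = np.exp(x - x_max)
  # Normalize each row so that it sums to 1
  proba /= np.sum(proba, axis=1, keepdims=True)
  # Average negative log-probability of the correct classes
  loss = -np.sum(np.log(proba[range(N), y])) / N
  # Gradient: (P - t) / N, where t is the one-hot encoding of y
  dx = proba.copy()
  dx[range(N), y] -= 1
  dx /= N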

  return loss, dx
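
One way to gain confidence in the analytic gradient is to compare it against centered finite differences. The sketch below is a minimal check under assumed settings: the helper `numerical_gradient`, the step size `h`, and the random test data are all hypothetical, and `softmax_loss` is repeated in compact form only so that the snippet runs on its own.

```python
import numpy as np

def softmax_loss(x, y):
    # Same computation as the function above, repeated for self-containment.
    N = x.shape[0]
    proba = np.exp(x - np.max(x, axis=1, keepdims=True))
    proba /= np.sum(proba, axis=1, keepdims=True)
    loss = -np.sum(np.log(proba[np.arange(N), y])) / N
    dx = proba.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx

def numerical_gradient(f, x, h=1e-5):
    # Centered finite differences, perturbing one entry of x at a time.
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))   # N = 4 samples, K = 3 classes
y = rng.integers(0, 3, size=4)    # random labels in [0, K)

loss, dx = softmax_loss(x, y)
dx_num = numerical_gradient(lambda z: softmax_loss(z, y)[0], x)
max_abs_diff = np.max(np.abs(dx - dx_num))
print(max_abs_diff)
```

If the derivation and the code agree, the printed difference should be tiny compared with the gradient entries themselves. Each row of `dx` should also sum to zero, because both \(P\) and \(t\) sum to 1 over the classes.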

Published

Feb 14, 2016

Category

Machine learning

Tags

  • cv 16