Derivative of Cross-Entropy Loss w.r.t. Output Spike Count
The cross-entropy loss function is commonly used in classification tasks,
especially in the context of neural networks.
The loss function itself is used along with its gradient to update the
weights of the network during training.
Here we derive the derivative of the cross-entropy loss with respect to the
output spike count for a spiking neural network.
First we do it manually, and then we use Sympy to verify our results.
Manual Derivation
To derive the derivative of the cross-entropy loss with respect to the output spike count, we
assume the following:
- We have a spike count output $s_i$ for each output neuron $i$, computed over some time window.
- We apply a softmax activation function to the output spike counts $s_i$ to get the predicted class probabilities $\hat{y}_i$.
- The true label $y_i$ is one-hot encoded.
The cross-entropy loss is defined as:
$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$
The softmax function is defined as:
$$\hat{y}_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}} = \frac{e^{s_i}}{Z}$$
where $Z = \sum_{j=1}^{C} e^{s_j}$ is the partition function.
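As a quick illustration of the definition, here is a small NumPy sketch (the spike counts are made-up example values, not from any real network) that turns spike counts into softmax probabilities:

```python
import numpy as np

# Made-up spike counts for C = 3 output neurons, accumulated over a time window.
s = np.array([12.0, 3.0, 7.0])

# Numerically stable softmax: subtracting max(s) cancels in the ratio e^{s_i} / Z,
# so it leaves y_hat unchanged while avoiding overflow for large counts.
z = np.exp(s - s.max())
y_hat = z / z.sum()

print(y_hat)        # predicted class probabilities
print(y_hat.sum())  # 1.0
```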
Now we substitute the softmax expression into the cross-entropy loss:
$$
\begin{aligned}
L &= -\sum_{i=1}^{C} y_i \log(\hat{y}_i) \\
  &= -\sum_{i=1}^{C} y_i \log\!\left(\frac{e^{s_i}}{Z}\right) \\
  &= -\sum_{i=1}^{C} y_i \left(s_i - \log(Z)\right) \\
  &= -\sum_{i=1}^{C} y_i s_i + \log(Z) \cdot \sum_{i=1}^{C} y_i \\
  &= -\sum_{i=1}^{C} y_i s_i + \log(Z)
\end{aligned}
$$
since $\sum_{i=1}^{C} y_i = 1$.
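To check this simplification numerically, a small sketch (reusing the made-up spike counts from above and an arbitrary one-hot label) can compare both forms of the loss:

```python
import numpy as np

s = np.array([12.0, 3.0, 7.0])   # made-up spike counts
y = np.array([0.0, 1.0, 0.0])    # one-hot true label

Z = np.exp(s).sum()              # partition function
y_hat = np.exp(s) / Z            # softmax probabilities

loss_direct     = -(y * np.log(y_hat)).sum()   # -sum_i y_i * log(y_hat_i)
loss_simplified = -(y * s).sum() + np.log(Z)   # -sum_i y_i * s_i + log(Z)

print(np.isclose(loss_direct, loss_simplified))  # True
```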
Now we can compute the derivative of the loss with respect to an output spike count $s_k$:
$$\frac{\partial L}{\partial s_k} = -\frac{\partial}{\partial s_k}\left(\sum_i y_i s_i\right) + \frac{\partial}{\partial s_k}\log(Z)$$
First term:
$$\frac{\partial}{\partial s_k}\left(\sum_i y_i s_i\right) = y_k$$
Second term:
$$\frac{\partial}{\partial s_k}\log(Z) = \frac{1}{Z}\cdot\frac{\partial Z}{\partial s_k} = \frac{1}{Z}\cdot\frac{\partial}{\partial s_k}\left(\sum_{j=1}^{C} e^{s_j}\right) = \frac{1}{Z}\cdot e^{s_k} = \frac{e^{s_k}}{Z} = \hat{y}_k$$
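Before combining the two terms, this second term can also be checked in isolation with Sympy; the sketch below (the symbol names are my own choice) confirms that the derivative of $\log(Z)$ with respect to each $s_k$ is the corresponding softmax output:

```python
import sympy as sp

s = sp.symbols('s0:3', real=True)      # logits s0, s1, s2 for C = 3 classes

Z = sum(sp.exp(s_j) for s_j in s)      # partition function
log_Z = sp.log(Z)

# d/ds_k log(Z) should equal exp(s_k) / Z, i.e. the softmax output y_hat_k.
for k in range(3):
    print(sp.simplify(sp.diff(log_Z, s[k]) - sp.exp(s[k]) / Z))  # 0 for each k
```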
Combining both terms:
$$\frac{\partial L}{\partial s_k} = -y_k + \hat{y}_k$$
This means that the derivative of the cross-entropy loss with respect to the output spike count is given by:
$$\frac{\partial L}{\partial s_k} = \hat{y}_k - y_k$$
This is our final result.
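As a quick numerical cross-check of this result (again with made-up values), the analytic gradient $\hat{y}_k - y_k$ can be compared against central finite differences of the loss:

```python
import numpy as np

def cross_entropy(s, y):
    """Cross-entropy of softmax(s) against a one-hot label y."""
    y_hat = np.exp(s - s.max())
    y_hat /= y_hat.sum()
    return -(y * np.log(y_hat)).sum()

s = np.array([12.0, 3.0, 7.0])   # made-up spike counts
y = np.array([0.0, 1.0, 0.0])    # one-hot true label

# Analytic gradient from the derivation: y_hat - y.
y_hat = np.exp(s - s.max())
y_hat /= y_hat.sum()
grad_analytic = y_hat - y

# Central finite differences as an independent check.
eps = 1e-5
grad_numeric = np.zeros_like(s)
for k in range(len(s)):
    e = np.zeros_like(s)
    e[k] = eps
    grad_numeric[k] = (cross_entropy(s + e, y) - cross_entropy(s - e, y)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```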
Derivation with Sympy
Here's what the code does (a minimal sketch of such a script follows the list):
- Defines symbols for the logits `s = [s0, s1, s2]` and the true-class one-hot vector `y = [y0, y1, y2]`.
- Computes the softmax predictions `y_hat[i] = exp(si) / Z`.
- Defines the cross-entropy loss `L = -sum(y[i] * log(y_hat[i]))`.
- Substitutes `Z = sum(exp(si))` into `L` to get a purely `s`-dependent expression.
- Differentiates the loss w.r.t. each logit `s_i`, getting `dL/ds_i`.
- Simplifies the result and compares it against the known identity $\frac{\partial L}{\partial s_k} = \hat{y}_k - y_k$.
- Substitutes the one-hot vector constraint `sum(y) = 1` so Sympy can simplify.
- Checks that `grad_k - (y_hat_k - y_k)` simplifies to zero (i.e. confirms the identity).
- `checks_with_constraint` returns `[0, 0, 0]`, showing the identity holds.
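The original script is not reproduced here, but a minimal Sympy sketch following the steps above could look like this (apart from `checks_with_constraint`, which the list mentions, the variable names are my own choice):

```python
import sympy as sp

# Symbols for the logits (spike counts) and the one-hot label components.
s = sp.symbols('s0:3', real=True)
y = sp.symbols('y0:3', real=True, nonnegative=True)

# Softmax predictions y_hat[i] = exp(s_i) / Z with Z = sum_j exp(s_j) substituted in,
# so the loss below depends only on s and y.
Z = sum(sp.exp(s_j) for s_j in s)
y_hat = [sp.exp(s_i) / Z for s_i in s]

# Cross-entropy loss L = -sum_i y_i * log(y_hat_i).
L = -sum(y_i * sp.log(yh_i) for y_i, yh_i in zip(y, y_hat))

# Differentiate w.r.t. each logit and compare against the identity y_hat_k - y_k.
grads = [sp.diff(L, s_k) for s_k in s]
residuals = [sp.simplify(g - (yh - y_k)) for g, yh, y_k in zip(grads, y_hat, y)]

# The residuals still depend on y0 + y1 + y2; substitute the one-hot constraint
# sum(y) = 1 (here as y2 = 1 - y0 - y1) so Sympy can finish simplifying.
constraint = {y[2]: 1 - y[0] - y[1]}
checks_with_constraint = [sp.simplify(r.subs(constraint)) for r in residuals]

print(checks_with_constraint)  # [0, 0, 0]
```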