# How to compute the derivative of softmax and cross-entropy

The softmax function is ubiquitous in machine learning, especially in logistic regression models and neural networks. In this post I would like to derive the derivatives of the softmax function as well as of the cross-entropy loss applied to it.

The definition of softmax function is:

$$\sigma(z_j) = \frac{e^{z_j}}{e^{z_1} + e^{z_2} + \cdots + e^{z_n}}, j \in \{1, 2, \cdots, n\},$$

or, in summation notation,

$$\sigma(z_j) = \frac{e^{z_j}}{\sum_{i=1}^n e^{z_i}}, j \in \{1, 2, \cdots, n \}.$$
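As a sanity check, the definition above can be written as a minimal NumPy sketch. The max-subtraction step is an implementation detail that is not in the formula: it rescales numerator and denominator by the same factor, so the result is unchanged, but it avoids overflow for large logits.

```python
import numpy as np

def softmax(z):
    """Compute softmax over a 1-D array of logits z."""
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / np.sum(e)        # components are positive and sum to 1

z = np.array([1.0, 2.0, 3.0])
print(softmax(z))
print(softmax(z).sum())         # should be 1 up to floating-point error
```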

Computing the derivative of the softmax function is one of the most common tasks in machine learning, but there is a pitfall. Softmax is computed on one component $$z_j$$ of the vector $$\mathbf{z}$$, and when taking a partial derivative, the component we differentiate with respect to is NOT necessarily the component on which softmax was computed. In math language, what we need to compute is $$\frac{\partial \sigma(z_j)}{\partial z_k}$$, where $$j$$ and $$k$$ are not necessarily equal. Many beginners end up computing only $$\frac{\partial \sigma(z_j)}{\partial z_j}$$, which is just a small part of the problem.

Now let’s compute $$\frac{\partial \sigma(z_j)}{\partial z_k}$$. We will use the quotient rule: $$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$$. For convenience, we denote $$\sum_{i=1}^n e^{z_i}$$ by $$\Sigma$$.

When $$j = k$$,

\begin{align*} \frac{\partial \sigma(z_j)}{\partial z_k} &= \frac{\partial}{\partial z_j} \frac{e^{z_j}}{\Sigma} && \cdots \text{since } j = k \\ &= \frac{\frac{\partial e^{z_j}}{\partial z_j} \cdot \Sigma - e^{z_j} \cdot \frac{\partial \Sigma}{\partial z_j}}{\Sigma^2} && \cdots \text{quotient rule} \\ &= \frac{e^{z_j} \cdot \Sigma - e^{z_j} \cdot e^{z_j}}{\Sigma^2} \\ &= \frac{e^{z_j}}{\Sigma} - \Big( \frac{e^{z_j}}{\Sigma}\Big)^2 \\ &= \sigma(z_j) - (\sigma(z_j))^2 \\ &= \sigma(z_j) (1 - \sigma(z_j)). \end{align*}

When $$j \neq k$$, the numerator $$e^{z_j}$$ is a constant with respect to $$z_k$$, so

\begin{align*} \frac{\partial \sigma(z_j)}{\partial z_k} &= \frac{\partial}{\partial z_k} \frac{e^{z_j}}{\Sigma} \\ &= \frac{\frac{\partial e^{z_j}}{\partial z_k} \cdot \Sigma - e^{z_j} \cdot \frac{\partial \Sigma}{\partial z_k}}{\Sigma^2} && \cdots \text{quotient rule} \\ &= \frac{0 \cdot \Sigma - e^{z_j} \cdot e^{z_k}}{\Sigma^2} && \cdots e^{z_j} \text{ is constant with respect to } z_k \\ &= -\frac{e^{z_j}}{\Sigma} \cdot \frac{e^{z_k}}{\Sigma} \\ &= -\sigma(z_j) \sigma(z_k). \end{align*}

Thus, the derivative of softmax is:

$$\frac{\partial \sigma(z_j)}{\partial z_k} = \begin{cases} \sigma(z_j)(1-\sigma(z_j)), & \text{when } j = k, \\ -\sigma(z_j)\sigma(z_k), &\text{when }j \neq k. \end{cases}$$
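The two cases combine into the Jacobian matrix $$J_{jk} = \frac{\partial \sigma(z_j)}{\partial z_k}$$, which equals $$\mathrm{diag}(\sigma) - \sigma\sigma^\top$$. A minimal NumPy sketch below builds this matrix and checks it against central finite differences (the test vector `z` is an arbitrary example, not anything from the derivation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(z):
    """Jacobian J[j, k] = d sigma(z_j) / d z_k.

    diag(s) - outer(s, s) gives s_j * (1 - s_j) on the diagonal
    and -s_j * s_k off the diagonal, matching the two-case formula.
    """
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

# Verify against central finite differences
z = np.array([0.5, -1.0, 2.0])
eps = 1e-6
numeric = np.zeros((3, 3))
for k in range(3):
    dz = np.zeros(3)
    dz[k] = eps
    numeric[:, k] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

max_err = np.max(np.abs(softmax_jacobian(z) - numeric))
print(max_err)   # should be tiny
```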

## Cross Entropy with Softmax

Another common task in machine learning is to compute the derivative of cross entropy with softmax. This can be written as:

$$\text{CE} = \sum_{j=1}^n \big(- y_j \log \sigma(z_j) \big).$$

In a classification problem, $$n$$ is the number of classes and $$\mathbf{y}$$ is the one-hot representation of the actual class: a vector in which exactly one component is 1 and all the others are 0. In other words, exactly one of $$y_1, y_2, \cdots, y_n$$ equals 1 and the rest are all zero.

For example, for a 5-class classification problem, if a specific data point has class #4, then its class label $$\mathbf{y} = (0, 0, 0, 1, 0)$$.
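In NumPy, one way to build such a one-hot vector is to take a row of the identity matrix (the variable names here are illustrative):

```python
import numpy as np

n_classes = 5
label = 3                      # class #4, zero-indexed
y = np.eye(n_classes)[label]   # row 3 of the 5x5 identity matrix
print(y)                       # [0. 0. 0. 1. 0.]
```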

Now let us compute the derivative of cross entropy with softmax. We will use the chain rule: $$(f(g(x)))' = f'(g(x)) \cdot g'(x)$$.

\begin{align*} \frac{\partial}{\partial z_k}\text{CE} &= \frac{\partial}{\partial z_k}\sum_{j=1}^n \big(- y_j \log \sigma(z_j) \big) \\ &= -\sum_{j=1}^n y_j \frac{\partial}{\partial z_k} \log \sigma(z_j) && \cdots \text{addition rule; } -y_j \text{ is constant}\\ &= -\sum_{j=1}^n y_j \frac{1}{\sigma(z_j)} \cdot \frac{\partial}{\partial z_k} \sigma(z_j) && \cdots \text{chain rule}\\ &= -y_k \cdot \frac{\sigma(z_k)(1-\sigma(z_k))}{\sigma(z_k)} - \sum_{j \neq k} y_j \cdot \frac{-\sigma(z_j)\sigma(z_k)}{\sigma(z_j)} && \cdots \text{consider both } j = k \text{ and } j \neq k \\ &= -y_k \cdot (1-\sigma(z_k)) + \sum_{j \neq k} y_j \sigma(z_k) \\ &= -y_k + y_k \sigma(z_k) + \sum_{j \neq k} y_j \sigma(z_k) \\ &= -y_k + \sigma(z_k) \sum_j y_j. \end{align*}

Since $$\mathbf{y}$$ is a one-hot vector, $$\sum_j y_j = 1$$. Thus the derivative of cross entropy with softmax is simply

$$\frac{\partial}{\partial z_k}\text{CE} = \sigma(z_k) – y_k.$$
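We can verify this closed form numerically. The sketch below (assuming NumPy; the logits `z` are an arbitrary example) compares $$\sigma(\mathbf{z}) - \mathbf{y}$$ against a central-finite-difference gradient of the cross-entropy loss:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(z, y):
    # CE = -sum_j y_j * log(sigma(z_j))
    return -np.sum(y * np.log(softmax(z)))

y = np.array([0.0, 0.0, 0.0, 1.0, 0.0])     # one-hot label, class #4
z = np.array([0.2, -1.3, 0.7, 1.5, -0.4])   # example logits

analytic = softmax(z) - y                   # the closed form derived above

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(5)[k], y)
     - cross_entropy(z - eps * np.eye(5)[k], y)) / (2 * eps)
    for k in range(5)
])

print(np.max(np.abs(analytic - numeric)))   # should be tiny
```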

This expression is remarkably simple and cheap to compute, which is a big part of why the softmax and cross-entropy combination is so widely used in classifiers.

### Comments

1. Where you write “Or in math language, what we need to compute is ∂σ(zj)/∂zk, where i and j are not necessarily equal!”, do you mean “k and j” rather than “i and j”? My rationale is that you use subscript k for the z in the denominator, and it would read a bit better if this was explicit.

• charlee

Yes, it should be “k and j”. Fixed. Thanks for pointing that out!
