Below, we will derive the backprop rules for a convolutional layer. As this derivation is complex compared to the fully connected one, we will do it in stages.
1. One-dimensional, single-channel input and a single filter.
Consider the following situation:
- Layer $l$ has dimension $L$, indexed as $0, \ldots, L-1$. It has a single channel.
- The filter has width $W$. As layer $l$ has a single channel, the filter also has a single channel.
- Layer $l+1$ has dimension $L-W+1$, indexed as $0, \ldots, L-W$. It also has a single channel.
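For example, with $L = 7$ and $W = 3$, layer $l+1$ has $7 - 3 + 1 = 5$ neurons, indexed $0, \ldots, 4$.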
The forward equation is
\begin{eqnarray}
z^{l+1}_i = \sum_{m=0}^{W-1} w^{l+1}_{m}a^{l}_{i+m} + b^{l+1}, \qquad i = 0, \ldots, L-W
\end{eqnarray}
Note that:
- Even though the operation above is called a “convolution”, we do not flip the filter as we would in a signals-and-systems convolution. The operation we are doing is therefore closer to correlation than to convolution, despite the name.
- A single bias is associated with the filter.
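As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The function name, argument names, and array layout are illustrative choices, not part of the derivation:

```python
import numpy as np

def conv1d_forward(a, w, b):
    """Forward pass of a 1-D, single-channel convolutional layer.

    a : activations a^l of layer l, shape (L,)
    w : filter weights w^{l+1}, shape (W,)
    b : scalar bias b^{l+1}
    Returns z^{l+1}, shape (L - W + 1,).
    """
    L, W = a.shape[0], w.shape[0]
    z = np.empty(L - W + 1)
    for i in range(L - W + 1):
        # z^{l+1}_i = sum_m w^{l+1}_m * a^l_{i+m} + b^{l+1}  (no filter flip)
        z[i] = np.dot(w, a[i:i + W]) + b
    return z
```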
Let us derive the backward equations. Define $\Gamma$ as
\begin{eqnarray}
\Gamma^l_i = \frac{\partial C}{\partial z^l_i}
\end{eqnarray}
Note that the activation of neuron $i$ in layer $l$ appears in $z^{l+1}_j$ exactly when $j \le i \le j+W-1$, i.e., when $i-W+1 \le j \le i$. Clipping to the valid output indices $0, \ldots, L-W$, neuron $i$ affects neurons
\begin{eqnarray}
\max (i-W+1,0), \ldots , \min(i, L-W)
\end{eqnarray}
in layer $l+1$. Therefore we can write the backward equation for $\Gamma$ as
\begin{eqnarray}
\Gamma^{l}_i &=& \frac{\partial C}{\partial z^{l}_i}\\
&=& \sum_{m=\max (i-W+1,0)}^{\min(i, L-W)} \frac{\partial C}{\partial z^{l+1}_m} \frac{\partial z^{l+1}_m}{\partial z^l_i}\\
&=& \sum_{m=\max (i-W+1,0)}^{\min(i, L-W)} \frac{\partial C}{\partial z^{l+1}_m} \frac{\partial z^{l+1}_m}{\partial a^l_i} \frac{\partial a^l_i}{\partial z^l_i}\\
&=& \sum_{m=\max (i-W+1,0)}^{\min(i, L-W)} \Gamma^{l+1}_m \, w_{i-m}^{l+1} \frac{\partial a^l_i}{\partial z^l_i}
\end{eqnarray}
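A minimal sketch of this backward equation, continuing the NumPy convention above; `sigma_prime` stands for the derivative of whatever activation function is in use and is assumed to be supplied by the caller:

```python
def conv1d_backward_gamma(gamma_next, w, z, sigma_prime):
    """Compute Gamma^l from Gamma^{l+1} for the 1-D single-channel case.

    gamma_next  : Gamma^{l+1}, shape (L - W + 1,)
    w           : filter weights w^{l+1}, shape (W,)
    z           : pre-activations z^l of layer l, shape (L,)
    sigma_prime : derivative of the activation, applied elementwise
    Returns Gamma^l, shape (L,).
    """
    L, W = z.shape[0], w.shape[0]
    gamma = np.zeros(L)
    for i in range(L):
        lo = max(i - W + 1, 0)
        hi = min(i, L - W)
        for m in range(lo, hi + 1):
            # Gamma^l_i accumulates Gamma^{l+1}_m * w^{l+1}_{i-m}
            gamma[i] += gamma_next[m] * w[i - m]
        # multiply by da^l_i / dz^l_i
        gamma[i] *= sigma_prime(z[i])
    return gamma
```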