An online interactive CNN visualization example:
But not all data reside in a regular, array-like space. In particular, network (graph) data do not.
Examples of networks, or graph-based data structures
Generalized convolution
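A minimal sketch of one common way to generalize convolution to graphs (a GCN-style layer in the spirit of Kipf & Welling): aggregate each node's neighborhood with a normalized adjacency matrix, then apply a shared weight matrix. The layer form, names, and toy graph below are illustrative assumptions, not necessarily the exact construction these notes use.

```python
import numpy as np

def graph_conv(X, A, W):
    """One graph-convolution layer: aggregate each node's neighborhood,
    then apply a shared linear map (the graph analogue of a conv filter).
    X: (n_nodes, n_features) node features
    A: (n_nodes, n_nodes) adjacency matrix
    W: (n_features, n_out) shared weights
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

# Toy graph: 3 nodes on a path 0-1-2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)
W = np.random.randn(4, 2)
print(graph_conv(X, A, W).shape)  # (3, 2)
```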
(Figure from Alex Graves)
Mathematical details: Let $\norm{\ppf{\vh_{\tau+1}}{\vh_\tau}}\approx \alpha$ for all $\tau$
$$ \begin{align} \norm{\ppf{\cL}{\vh_t}} &\propto \norm{\ppf{\vh_{t+1}}{\vh_t}\ppf{\cL}{\vh_{t+1}}} \propto \norm{ \left(\prod_{\tau=t}^{T-1}\ppf{\vh_{\tau+1}}{\vh_\tau}\right) \ppf{\cL}{\vh_T}} \leq \prod_{\tau=t}^{T-1}\norm{\ppf{\vh_{\tau+1}}{\vh_\tau}} \norm{\ppf{\cL}{\vh_T}} \\ \Rightarrow \norm{\ppf{\cL}{\vh_t}} &\approx \alpha^{T-t} \norm{\ppf{\cL}{\vh_T}} \end{align} $$
At very old steps, i.e. when $T\gg t$, the gradient $\norm{\ppf{\cL}{\vh_t}}$ vanishes if $\alpha<1$ (and explodes if $\alpha>1$).
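A quick numerical check of the estimate above: the gradient norm scales like $\alpha^{T-t}$, so it vanishes for $\alpha<1$ and explodes for $\alpha>1$. The particular values of $\alpha$ and $T-t$ below are illustrative, not from the slides.

```python
# Scaling factor alpha**(T - t) for a few illustrative choices
for alpha in (0.9, 1.0, 1.1):
    for lag in (10, 50, 100):   # lag = T - t
        print(f"alpha={alpha:.1f}, T-t={lag:3d}: alpha**(T-t) = {alpha**lag:.3e}")
```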
The vanishing gradient problem turns out to be universal in deep learning, e.g. in CNN architectures.
This leads to the skip connection technique (which is now standard) and the Residual Network (ResNet).
Ref: Deep Residual Learning for Image Recognition, arXiv 1512.03385 (100k+ citations now...)
The ResNet essentially makes the following change: $$ \vx^{(j+1)} = \vF(\vx^{(j)}) \quad\Rightarrow\quad \vx^{(j+1)} = \vx^{(j)} + \vF(\vx^{(j)}) $$ to provide a "bypass" for the back-propagation.
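A minimal sketch of this change in PyTorch: the block returns $\vx + \vF(\vx)$ instead of $\vF(\vx)$, so gradients can flow through the identity path. The specific branch $\vF$ (two 3x3 convolutions) is an illustrative choice, not the exact block from the ResNet paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.F = nn.Sequential(                       # the residual branch F(x)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.F(x)   # the skip connection: x^(j+1) = x^(j) + F(x^(j))

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)   # torch.Size([1, 16, 8, 8])
```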
Recall for a first-order ordinary differential equation (ODE) $$ \dot\vx = \vf(\vx),\quad \vx(0)=\vx_0,\quad t\in[0,T] $$ the forward Euler method with step size $\Delta t$ and $\vx^{(j)}\approx\vx(j\Delta t)$ reads $$ \vx^{(j+1)} = \vx^{(j)} + \Delta t\,\vf(\vx^{(j)}) $$ Then $\Delta t\,\vf(\vx^{(j)})$ plays exactly the role of the ResNet block $\vF(\vx^{(j)})$ on the previous slide!
Ref: Neural Ordinary Differential Equations, arXiv 1806.07366
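A minimal sketch of the Euler/ResNet analogy: integrating $\dot\vx=\vf(\vx)$ with forward Euler applies exactly the residual update $\vx \leftarrow \vx + \vF(\vx)$ with $\vF(\vx)=\Delta t\,\vf(\vx)$. The dynamics $\vf$ and step size below are illustrative assumptions.

```python
import numpy as np

def f(x):
    return -x            # simple linear dynamics, exact solution x(t) = x0 * exp(-t)

def forward_euler(x0, dt=0.1, T=1.0):
    x = x0
    for _ in range(int(T / dt)):
        x = x + dt * f(x)   # exactly the residual update x <- x + F(x)
    return x

x0 = np.array([1.0, 2.0])
print(forward_euler(x0))      # close to the exact solution below
print(x0 * np.exp(-1.0))
```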
But ...
There are of course many ... For example
And in the bigger picture: