# 4
> GDL

- toc: true 
- badges: true
- comments: false
- categories: [jupyter]


## __1.4 $\dots$ GDL blueprint (with fixed input domain)__

### $\quad$ *__Setup__*

Our formal treatment of a classification problem requires the following objects: 

* $\Omega$ : A graph with $n$ vertices. This is the domain of the signals considered.

* $\mathcal{X}(\Omega,\mathbb{R}^s)$ :  The space of $s$-channel signals $x : \Omega \to \mathbb{R}^s$. 

* $(e_j)_{j\,=\,1}^{\,n}$ : A 'spatial' coordinate system (basis) for the space of scalar-valued signals $\mathcal{X}(\Omega,\mathbb{R})$. 

* $(z_c)_{c \,=\,1}^{\,s}$ : A channel coordinate system, i.e. a basis of $\mathbb{R}^s$. 

* $G$ : A group acting on $\Omega$, via $\Omega \ni u \mapsto g.u \in \Omega$. This induces the 'pointwise' action on the space of signals $\mathcal{X}(\Omega,\mathbb{R}^s)$, denoted $x \mapsto \mathbf{g}.x$. 

* $\rho$ : The representation of $G$ on $\mathcal{X}(\Omega,\mathbb{R}^s)$ given by the induced action $x \mapsto \mathbf{g}.x$. 
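
To make the induced action concrete, here is a minimal NumPy sketch for the case where $G$ is a group of vertex permutations. The convention $(\mathbf{g}.x)(u) = x(g^{-1}.u)$, the array layout `(n, s)`, and the helper names `act` and `rho` are assumptions made for illustration; the setup above does not fix them.

```python
import numpy as np

n, s = 4, 2                         # hypothetical sizes: 4 vertices, 2 channels
rng = np.random.default_rng(0)
x = rng.normal(size=(n, s))         # x[u] is the value of the signal at vertex u

def act(g, x):
    """Induced action, assuming the convention (g.x)(u) = x(g^{-1}.u)."""
    g_inv = np.argsort(g)           # inverse of the permutation u -> g[u]
    return x[g_inv]

def rho(g):
    """Permutation matrix of g, so that rho(g) @ x equals act(g, x)."""
    P = np.zeros((len(g), len(g)))
    P[g, np.arange(len(g))] = 1.0   # column u has its single 1 in row g[u]
    return P

g = np.array([2, 0, 3, 1])          # g.u = g[u]
h = np.array([1, 2, 0, 3])

gx = act(g, x)
assert np.allclose(gx[g], x)                         # the value at u moves to g.u
assert np.allclose(rho(g) @ x, gx)                   # matrix form of the same action
assert np.allclose(act(g, act(h, x)), act(g[h], x))  # g.(h.x) = (gh).x
assert np.allclose(rho(g) @ rho(h), rho(g[h]))       # rho respects composition
```

In this sketch `rho(g)` only mixes the vertex axis and leaves the channel axis untouched, which is what 'pointwise' suggests.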

The basis 
$$
\{ e_j \otimes z_c : 1 \leq j \leq n,\, 1 \leq c \leq s  \}
$$ 

is the one in which signals are expressed to a computer. Given $x \in \mathcal{X}(\Omega,\mathbb{R}^s)$, one can write $x$ in terms of this basis as
$$
x = \mathbf{X}^{cj} e_j \otimes z_c ,
$$
where summation over the repeated indices $c$ and $j$ is implied. Note that $\mathbf{X}$ is the so-called design matrix previously introduced. 
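
The expansion can be checked numerically. The sketch below assumes standard (one-hot) bases for both the spatial and the channel coordinates, stores the coefficients $\mathbf{X}^{cj}$ as an array `X` of shape `(s, n)`, and stores the resulting signal as an array of shape `(n, s)`; all of these layout choices are illustrative assumptions.

```python
import numpy as np

n, s = 3, 2                        # hypothetical sizes: 3 vertices, 2 channels
E = np.eye(n)                      # E[j] plays the role of e_j
Z = np.eye(s)                      # Z[c] plays the role of z_c
X = np.arange(s * n, dtype=float).reshape(s, n)   # coefficients X^{cj}

# x = sum over c, j of X^{cj} e_j (tensor) z_c, stored with axes (vertex, channel)
x = np.einsum("cj,ja,cb->ab", X, E, Z)

# A single basis element e_1 (tensor) z_0 is the outer product of the two vectors.
basis_10 = np.outer(E[1], Z[0])

# With one-hot bases the signal is just the transposed coefficient array,
# and the coefficient of e_1 (tensor) z_0 can be read off by contraction.
assert np.allclose(x, X.T)
assert np.isclose((x * basis_10).sum(), X[0, 1])
```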





$\vdots$

Let $g \in G$. In the basis $e_j \otimes z_c$, we can then identify $\rho(g)$ with a tensor... _(say more)_

$\vdots$

### $\quad$ *__Coarsening operator__*

Let $\Omega$ and $\Omega'$ be domains and $G$ a symmetry group acting on $\Omega$. We write $\Omega' \prec \Omega$ if $\Omega'$ arises from $\Omega$ through some coarsening operator (presumably this coarsening operator needs to commute with the group action).
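
One way to probe the parenthetical remark is a toy example. The sketch below assumes $\Omega$ is a cyclic 1-D grid with 8 nodes, $G$ is the group of cyclic shifts, and the coarsening is max-pooling over adjacent pairs of nodes; none of these choices come from the text, and the helpers `shift` and `coarsen` are hypothetical.

```python
import numpy as np

def shift(x, k):
    """Cyclic shift by k nodes: the action of the shift group on a signal."""
    return np.roll(x, k, axis=0)

def coarsen(x):
    """Max-pool adjacent pairs of nodes: a signal on n nodes becomes one on n/2."""
    return x.reshape(-1, 2, *x.shape[1:]).max(axis=1)

x = np.arange(8.0)[:, None]        # 8 nodes, 1 channel, values 0..7

# A shift by 2 on the fine grid corresponds to a shift by 1 on the coarse grid.
commutes_even = np.allclose(coarsen(shift(x, 2)), shift(coarsen(x), 1))

# A shift by 1 on the fine grid matches no shift of the coarse signal.
commutes_odd = any(
    np.allclose(coarsen(shift(x, 1)), shift(coarsen(x), k)) for k in range(4)
)

assert commutes_even and not commutes_odd
```

In this toy setting the coarsening only commutes with the subgroup of even shifts, which suggests that the compatibility condition is a genuine constraint rather than a formality.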

### $\quad$ *__GDL block__*

1. _linear $G$-equivariant layer_

$$
B : \mathcal{X}(\Omega, \mathcal{C}) \to \mathcal{X}(\Omega' , \mathcal{C}')
$$ 
such that $B(g.x) = g.B(x)$ for all $g \in G$ and $x \in \mathcal{X}(\Omega, \mathcal{C})$.

2. _nonlinearity_, or _activation function_ 
$$
a : \mathcal{C} \to \mathcal{C}'
$$ 
applied domain-pointwise as $(\mathbf{a}(x))(u) = a(x(u))$.

3. _local pooling operator_ or _coarsening operator_ 
$$
P : \mathcal{X}(\Omega, \mathcal{C}) \to \mathcal{X}(\Omega', \mathcal{C})
$$ 
which gives us our notion $\Omega' \prec \Omega$.

The final ingredient is a global pooling layer, which is applied last in the composition. 

4. _$G$-invariant layer_, or _global pooling layer_
$$
A : \mathcal{X} (\Omega, \mathcal{C}) \to \mathcal{Y}
$$ 
satisfying $A(g.x) = A(x)$ for all $g \in G$ and $x \in \mathcal{X}(\Omega, \mathcal{C})$. 
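
The defining properties of these layers can be verified numerically. The following is a minimal sketch, assuming $G$ is the group of vertex permutations, $\Omega' = \Omega$ (no coarsening), and particular choices for each layer: a linear layer built from a per-vertex term plus a mean term, a ReLU nonlinearity, and sum pooling. The weight names `W_self` and `W_mean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, s_out = 5, 3, 4              # hypothetical sizes

W_self = rng.normal(size=(s, s_out))
W_mean = rng.normal(size=(s, s_out))

def B(x):
    """Linear permutation-equivariant layer: per-vertex term plus a shared mean term."""
    return x @ W_self + x.mean(axis=0, keepdims=True) @ W_mean

def a(x):
    """ReLU, applied to the value at each vertex independently."""
    return np.maximum(x, 0.0)

def A(x):
    """Global sum pooling over vertices: a permutation-invariant layer."""
    return x.sum(axis=0)

x = rng.normal(size=(n, s))
g = rng.permutation(n)             # a group element, acting by relabelling vertices

assert np.allclose(B(x[g]), B(x)[g])          # B(g.x) = g.B(x)
assert np.allclose(a(x[g]), a(x)[g])          # pointwise maps commute with the action
assert np.allclose(A(x[g]), A(x))             # A(g.x) = A(x)
assert np.allclose(A(a(B(x[g]))), A(a(B(x)))) # the composition is invariant
```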

### $\quad$ *__Hypothesis space__*

These objects can be used to define a class of $G$-invariant functions $f: \mathcal{X}(\Omega, \mathcal{C}) \to \mathcal{Y}$ of the form
$$
f = A \circ \mathbf{a}_J \circ B_J \circ P_{J-1} \circ \dots \circ P_1 \circ \mathbf{a}_1 \circ B_1 ,
$$
where the blocks are selected so that the output space of each block matches the input space of the next one. Different blocks may exploit different choices of symmetry groups $G$.

### $\quad$ *__Discussion__*

Shift-invariance arises naturally in vision and pattern recognition. In this case, the desired function $f \in \textsf{H}$, typically implemented as a CNN, inputs an image and outputs the probability that the image contains an object from a certain class. It is often reasonably assumed that the classification result should not be affected by the position of the object in the image, i.e., the function $f$ must be shift-invariant.

Multi-layer perceptrons lack this property, which is one reason why early (1970s) attempts to apply these architectures to pattern recognition problems failed. The development of NN architectures with local weight sharing, as epitomized by CNNs, was motivated, among other reasons, by the need for shift-invariant object classification. 



A prototypical application requiring shift-equivariance is image segmentation, where the output of $f$ is a pixel-wise image mask. This segmentation mask must follow shifts in the input image. In this example, the domains of the input and output are the same, but since the input has three color channels while the output has *one channel per class*, the representations $(\rho, \mathcal{X}(\Omega, \mathcal{C}) )$ and $(\rho', \mathcal{Y} \equiv \mathcal{X}(\Omega, \mathcal{C}'))$ are somewhat different. 

When $f$ is implemented as a CNN, it may be written as a composition of $L$ functions, where $L$ is determined by the depth and other hyperparameters:
$$
f = f_L \circ f_{L-1} \circ \dots \circ f_2 \circ f_1 .
$$

Examining the individual layer functions making up a CNN, one finds they are not shift-invariant in general but rather shift-equivariant. The last function applied, namely $f_L$, is typically a "global-pooling" function that is genuinely shift-invariant, causing $f$ to be shift-invariant, but focusing on this alone ignores the structure we will leverage for purposes of expressivity and regularity. 
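
The distinction between equivariant layers and an invariant final layer can be seen in a toy 1-D example. This sketch assumes circular (wrap-around) boundary conditions so that shifts form a group acting exactly; practical CNNs with zero padding satisfy these identities only approximately near the borders. The helper names and the choice $L = 3$ are illustrative assumptions.

```python
import numpy as np

def shift(x, k):
    """Cyclic shift of a 1-D signal by k positions."""
    return np.roll(x, k)

def circ_conv(x, w):
    """Circular cross-correlation of x with a short filter w (shared weights)."""
    return sum(w[i] * np.roll(x, -i) for i in range(len(w)))

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
w1, w2 = rng.normal(size=3), rng.normal(size=3)

f1 = lambda x: relu(circ_conv(x, w1))   # shift-equivariant layer
f2 = lambda x: relu(circ_conv(x, w2))   # shift-equivariant layer
f3 = lambda x: x.mean()                 # global pooling: shift-invariant
f = lambda x: f3(f2(f1(x)))             # f = f_3 after f_2 after f_1

k = 5
assert np.allclose(f1(shift(x, k)), shift(f1(x), k))          # equivariance of f_1
assert np.allclose(f2(f1(shift(x, k))), shift(f2(f1(x)), k))  # preserved by composition
assert np.isclose(f(shift(x, k)), f(x))                       # invariance of f

# For generic inputs an intermediate layer is not itself invariant.
layer_is_invariant = np.allclose(f1(shift(x, k)), f1(x))
```

The intermediate outputs `f1(x)` and `f2(f1(x))` still carry positional information, which is the structure the final pooling step discards.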