# Custom Layers

One factor behind deep learning's success
is the availability of a wide range of layers
that can be composed in creative ways
to design architectures suitable
for a wide variety of tasks.
For instance, researchers have invented layers
specifically for handling images, text,
looping over sequential data,
performing dynamic programming, etc.
Sooner or later you will encounter (or invent)
a layer that does not yet exist in the framework.
In these cases, you must build a custom layer.
In this section, we show you how.

## Layers without Parameters

To start, we construct a custom layer (a block)
that does not have any parameters of its own.
This should look familiar if you recall our
introduction to blocks in :numref:`sec_model_construction`.
The following `CenteredLayer` class simply
subtracts the mean from its input.
To build it, we only need to inherit
from the `nn.Module` class and implement the `forward` method.

In [1]:
import torch
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x - x.mean()

Let us verify that our layer works as intended by feeding some data through it.

In [2]:
layer = CenteredLayer()
layer(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]))

tensor([-2., -1.,  0.,  1.,  2.])

We can now incorporate our layer as a component
in constructing more complex models.

In [3]:
net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

As an extra sanity check, we can send random data
through the network and check that the mean is in fact 0.
Because we are dealing with floating-point numbers,
we may still see a *very* small nonzero number
due to rounding error.

In [4]:
y = net(torch.rand(4, 8))
y.mean()

tensor(6.2864e-09, grad_fn=<MeanBackward0>)
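Rather than judging the printed value by eye, the same check can be written as a comparison against a tolerance. The following is a minimal sketch: the layer is redefined so that the snippet runs on its own, and the threshold `1e-6` is an assumption chosen for single-precision inputs of this size, not a value taken from the text.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Parameter-free layer that subtracts the global mean of its input."""
    def forward(self, x):
        return x - x.mean()

torch.manual_seed(0)  # fixed seed so the run is repeatable
net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
y = net(torch.rand(4, 8))

# Compare against zero with an absolute tolerance instead of exact equality.
# 1e-6 is an assumed threshold for float32, not a universal constant.
is_centered = abs(y.mean().item()) < 1e-6

# The centering layer registers no parameters of its own.
num_params = len(list(CenteredLayer().parameters()))
print(is_centered, num_params)
```

Checking `parameters()` also confirms that the layer contributes nothing for an optimizer to update.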

## Layers with Parameters

Now that we know how to define simple layers,
let us move on to defining layers with parameters
that can be adjusted through training.
To automate some of the routine work,
the `nn.Parameter` class
provides some basic housekeeping functionality.
In particular, it governs access, initialization,
sharing, saving, and loading of model parameters.
This way, among other benefits, we will not need to write
custom serialization routines for every custom layer.
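To make this housekeeping concrete, the sketch below shows that a tensor wrapped in `nn.Parameter` and assigned as a module attribute is registered automatically, so it shows up in `named_parameters()` and `state_dict()`, whereas a plain tensor attribute does not. The class `ScaleLayer` and its attribute names are hypothetical, introduced only for illustration.

```python
import io
import torch
from torch import nn

class ScaleLayer(nn.Module):  # hypothetical example layer
    def __init__(self, size):
        super().__init__()
        # Wrapped in nn.Parameter: registered and tracked by the module.
        self.scale = nn.Parameter(torch.ones(size))
        # Plain tensor attribute: NOT registered as a parameter.
        self.offset = torch.zeros(size)

    def forward(self, x):
        return x * self.scale + self.offset

layer = ScaleLayer(3)
names = [name for name, _ in layer.named_parameters()]
state_keys = list(layer.state_dict().keys())
print(names, state_keys)  # only the registered parameter is listed

# Saving and loading needs no custom serialization code.
buffer = io.BytesIO()  # in-memory file, avoids touching the disk
torch.save(layer.state_dict(), buffer)
buffer.seek(0)

clone = ScaleLayer(3)
with torch.no_grad():
    clone.scale.fill_(5.0)  # make the clone differ before loading
clone.load_state_dict(torch.load(buffer))
restored = torch.equal(clone.scale, layer.scale)
```

Because `offset` is a plain tensor, it is neither saved nor visible to an optimizer, which is usually the deciding factor when choosing whether to wrap a tensor in `nn.Parameter`.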


Now let us implement our own version of PyTorch's `Linear` layer.
Recall that this layer requires two parameters,
one to represent the weight and another for the bias.
In the `__init__` method, `in_units` and `units`
denote the number of inputs and outputs, respectively.

In [5]:
class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the module
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))
    def forward(self, x):
        # Use the parameters directly (not .data) so autograd can track them
        return torch.matmul(x, self.weight) + self.bias

Next, we instantiate the `MyLinear` class
and access its model parameters.

In [6]:
dense = MyLinear(5, 3)
dense.weight

Parameter containing:
tensor([[-2.4106, -0.5526,  0.5341],
        [-1.3731,  0.8102,  0.6256],
        [-1.6634, -0.1392, -1.4495],
        [-1.3229,  0.4290,  1.5283],
        [ 0.7040,  2.0361,  0.6863]], requires_grad=True)

We can directly carry out forward calculations using custom layers.

In [7]:
dense(torch.randn(2, 5))

tensor([[-1.8643,  4.0211, -0.6618],
        [-0.8265,  6.6839,  4.6992]], grad_fn=<AddBackward0>)

We can also construct models using custom layers.
Once defined, a custom layer can be used just like the built-in fully connected layer.

In [8]:
net = nn.Sequential(MyLinear(64, 8), nn.ReLU(), MyLinear(8, 1))
net(torch.randn(2, 64))

tensor([[17.1461],
        [ 7.6568]], grad_fn=<AddBackward0>)
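Since the point of a parameterized layer is that training can adjust it, it is worth checking that gradients actually reach the parameters. The following is a minimal sketch under stated assumptions: `TinyLinear` is a hypothetical stand-in that uses its parameters directly in `forward` (defined here so that the snippet is self-contained), and the squared-sum loss is an arbitrary choice for illustration.

```python
import torch
from torch import nn

class TinyLinear(nn.Module):  # hypothetical stand-in for illustration
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.zeros(units))

    def forward(self, x):
        # Using the parameters themselves keeps them in the autograd graph.
        return torch.matmul(x, self.weight) + self.bias

torch.manual_seed(0)
layer = TinyLinear(5, 3)
x = torch.randn(2, 5)

loss = layer(x).pow(2).sum()  # arbitrary scalar loss
loss.backward()

# After backward(), every registered parameter holds a gradient
# with the same shape as the parameter itself.
grad_shapes = {name: tuple(p.grad.shape)
               for name, p in layer.named_parameters()}
print(grad_shapes)
```

If `forward` read the values through `.data` instead, the output would be detached from the parameters, `backward()` could not reach them, and their `grad` attributes would stay `None`.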

## Summary

* We can design custom layers by subclassing `nn.Module`. This allows us to define flexible new layers that behave differently from any existing layers in the library.
* Once defined, custom layers can be invoked in arbitrary contexts and architectures.
* Layers can have local parameters, which are created with `nn.Parameter`, registered on the module when assigned as attributes, and accessible through methods such as `parameters()` and `state_dict()`.


## Exercises

1. Design a layer that learns an affine transform of the data.
1. Design a layer that takes an input and computes a tensor reduction, 
   i.e., it returns $y_k = \sum_{i, j} W_{ijk} x_i x_j$.
1. Design a layer that returns the leading half of the Fourier coefficients of the data. 



[Discussions](https://discuss.d2l.ai/t/59)