# The Perceptron

The perceptron is the simplest unit of a neural network.

![A simple perceptron with an input (x) and an output (y); the weights (w) and bias (b) are its parameters](https://drive.google.com/file/d/1T3_Cars4GCfKPLhMVRe0LTPU09wzPsLb/view?usp=sharing)



$y = f(wx + b)$

Here f is an activation function, and wx + b is a linear function, more precisely an affine transform. In practice a perceptron has more than one input, so x and w are vectors and wx is their dot product, $\sum_i w_i x_i$.
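As a minimal plain-Python sketch of the equation above with several inputs, assuming a sigmoid activation, the forward pass is a weighted sum of the inputs plus the bias, passed through f. The helper names and the weight, bias, and input values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_forward(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Compute y = f(w . x + b) for one example, with f = sigmoid (an assumption)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Affine transform: dot product of weights and inputs, plus the bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Activation function applied to the affine output.
    return sigmoid(z)


# Hypothetical example values (three inputs).
x = [1.0, 2.0, -1.0]
w = [0.5, -0.25, 1.0]
b = 0.5
y = perceptron_forward(x, w, b)
print(y)
```

Here the affine part evaluates to 0.5 - 0.5 - 1.0 + 0.5 = -0.5, so y is sigmoid(-0.5), a value a little below 0.5.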

```python
import torch
import torch.nn as nn

# A single-layer perceptron: one affine transform followed by a sigmoid
class Percep(nn.Module):
    def __init__(self, input_dim):
        """
        Args:
            input_dim (int): the size of the input features
        """
        super().__init__()
        # One output unit: computes w . x + b for each example
        self.fc1 = nn.Linear(input_dim, 1)

    # Forward pass of the perceptron
    def forward(self, x_in):
        """
        Args:
            x_in (torch.Tensor): an input data tensor;
                x_in.shape should be (batch, num_features)
        Returns:
            the resulting tensor; tensor.shape should be (batch,)
        """
        # squeeze(dim=1) drops only the size-1 output dimension, so a batch
        # of one example still yields shape (1,) rather than a 0-d tensor
        return torch.sigmoid(self.fc1(x_in)).squeeze(dim=1)
```

PyTorch provides the `Linear` class in the `torch.nn` module, which computes the affine transform and keeps track of the weights and bias as learnable parameters.
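To make explicit what `nn.Linear(input_dim, 1)` followed by the sigmoid and squeeze computes on a batch, here is a plain-Python sketch that does not use PyTorch. It follows the `torch.nn.Linear` convention of computing x W^T + b, with a weight of shape (out_features, in_features) and a bias of shape (out_features,). The helper names `linear` and `percep_forward` and the parameter values are hypothetical; in PyTorch the parameters would be initialized randomly and updated by an optimizer.

```python
import math
from typing import List


def linear(x_batch: List[List[float]],
           weight: List[List[float]],
           bias: List[float]) -> List[List[float]]:
    """Plain-Python sketch of the affine map y = x W^T + b for a batch of rows."""
    out = []
    for x in x_batch:
        # One output feature per row of the weight matrix.
        row = [sum(wi * xi for wi, xi in zip(w_row, x)) + b
               for w_row, b in zip(weight, bias)]
        out.append(row)
    return out


def percep_forward(x_batch: List[List[float]],
                   weight: List[List[float]],
                   bias: List[float]) -> List[float]:
    """Mimic Percep.forward: sigmoid of the affine output, size-1 dim dropped."""
    logits = linear(x_batch, weight, bias)  # shape (batch, 1)
    return [1.0 / (1.0 + math.exp(-row[0])) for row in logits]  # shape (batch,)


# Hypothetical parameters for input_dim = 2 and a single output unit.
weight = [[2.0, -1.0]]
bias = [0.0]
batch = [[1.0, 2.0], [3.0, 1.0]]
print(linear(batch, weight, bias))
print(percep_forward(batch, weight, bias))
```

The first example gives an affine output of 2 - 2 = 0 and therefore a probability of 0.5; the second gives 6 - 1 = 5, which the sigmoid maps close to 1.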

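Sigmoid, $f(x) = \frac{1}{1 + e^{-x}}$, was historically one of the most popular activation functions. Its drawback is that it saturates: for large |x| the output flattens out near 0 or 1 and the derivative $f'(x) = f(x)(1 - f(x))$ approaches zero (its maximum is 0.25, at x = 0). When the sigmoid is used in the hidden layers of a deep network, these small derivatives are multiplied together during backpropagation, so the gradients reaching the early layers can shrink toward zero. This is the vanishing gradient problem.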
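The saturation argument can be checked numerically with a small plain-Python sketch. It evaluates the sigmoid derivative at a few points and then computes the best-case factor contributed by the activation alone after several layers; the weights, which also enter the backpropagated gradient, are deliberately ignored here as a simplifying assumption.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0 and decays quickly as |x| grows (saturation).
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  f'(x) = {sigmoid_grad(x):.6f}")

# Best case per layer is 0.25, so the activation's contribution alone after
# n sigmoid layers is at most 0.25 ** n (weights ignored in this sketch).
n_layers = 10
bound = 0.25 ** n_layers
print(f"upper bound after {n_layers} layers: {bound:.2e}")
```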

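ReLU, $f(x) = \max(0, x)$, mitigates the vanishing gradient problem because its gradient is 1 for every positive input, so it does not saturate there. It clips negative values to zero, however, and a unit whose input is negative for every example outputs zero and receives zero gradient, so it may never recover during training. This is referred to as the dying ReLU problem.
Variants such as Leaky ReLU and Parametric ReLU (PReLU) deal with this by keeping a small slope a for negative inputs, $f(x) = \max(x, ax)$ with $0 < a < 1$. In Leaky ReLU, a is a fixed constant such as 0.01; in Parametric ReLU, the leak coefficient a is a learned parameter.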
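A minimal plain-Python sketch of ReLU and Leaky ReLU with their gradients shows why a "dead" unit stops learning while a leaky one does not. The default slope a = 0.01 and the sample pre-activation values are assumptions for illustration; for Parametric ReLU the same formula applies, but a would be updated by gradient descent instead of passed in as a constant.

```python
from typing import List


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # 1 for positive inputs, 0 otherwise (0 at x = 0 by convention here).
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, a: float = 0.01) -> float:
    # f(x) = max(x, a * x), which is correct as written for 0 < a < 1.
    return max(x, a * x)


def leaky_relu_grad(x: float, a: float = 0.01) -> float:
    return 1.0 if x > 0 else a


# Hypothetical pre-activations of a unit that is negative for every example.
pre_activations: List[float] = [-2.0, -0.5, -1.3]
relu_signal = sum(relu_grad(z) for z in pre_activations)
leaky_signal = sum(leaky_relu_grad(z) for z in pre_activations)
print("ReLU gradient signal:", relu_signal)         # no learning signal at all
print("Leaky ReLU gradient signal:", leaky_signal)  # small but nonzero
```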

Other activation functions include softmax, $\operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$, which converts a vector of k scores into probabilities that sum to 1 and is mainly used in the output layer for multi-class classification. Tanh, $f(x) = \tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, is a rescaled and shifted sigmoid, $\tanh x = 2\sigma(2x) - 1$ where σ denotes the sigmoid, and its outputs lie in the range (-1, 1).
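Both formulas can be sketched in plain Python. The softmax below subtracts the maximum score before exponentiating; this is a common numerical-stability trick (not something stated in the text above) that leaves the result mathematically unchanged, because the factor $e^{-m}$ cancels between numerator and denominator, while avoiding overflow for large inputs. The tanh helper checks the sigmoid identity against `math.tanh`. The input scores are arbitrary illustrative values.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Subtracting the max cancels in the ratio but keeps exp() from overflowing.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], sum(probs))
print(math.tanh(0.8), tanh_via_sigmoid(0.8))
```

Without the max subtraction, inputs such as [1000.0, 1000.0] would make `math.exp` raise an `OverflowError`; with it, the result is simply [0.5, 0.5].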

