[![Open in Colab](https://img.shields.io/static/v1?label=&message=Open%20in%20Colab&labelColor=grey&color=blue&logo=google-colab)](https://colab.research.google.com/github/theaveas/DeepLearning/blob/main/NNFS/04_nnfs_activation_functions.ipynb#scrollTo=x96Asv5243q4)

# Activation Functions
An activation function is applied to the output of a neuron (or a layer of neurons) and modifies that output. We use activation functions because a nonlinear activation lets a neural network, usually one with two or more hidden layers, map nonlinear functions.\
\
There are two types of activation functions:
   - activation functions used in hidden layers
   - activation functions used in output layers

In [1]:
import platform
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

print(platform.python_version())
print(np.__version__)
print(matplotlib.__version__)

#python version 3.9.7
#numpy version 1.21.2
#matplotlib version 3.5.0

3.7.12
1.19.5
3.2.2


In [12]:
# install library
!pip install nnfs

Collecting nnfs
  Downloading nnfs-0.5.1-py3-none-any.whl (9.1 kB)
Installing collected packages: nnfs
Successfully installed nnfs-0.5.1


## The Step Activation Function
A simple activation function that tries to mimic a neuron either `firing` or `not firing` based on its input.

In [2]:
# step activation function
def activation_step(x):
    if x > 0:
        y = 1
    else:
        y = 0
    
    return y
    
y1 = activation_step(3)
y2 = activation_step(-4)
print(y1, y2)

1 0
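
The function above handles one scalar at a time. As a minimal sketch, assuming the same rule (1 for inputs greater than 0, otherwise 0), the step can be applied to a whole array with `np.where`. The name `activation_step_vectorized` is hypothetical and not part of the original notebook.

```python
import numpy as np

def activation_step_vectorized(x):
    """Return 1 where x > 0 and 0 elsewhere, element-wise."""
    x = np.asarray(x)
    return np.where(x > 0, 1, 0)

step_out = activation_step_vectorized([3, -4, 0, 0.5])
print(step_out)  # prints [1 0 0 1]
```

Note that an input of exactly 0 maps to 0 here, matching the `x > 0` test in the scalar version.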


## The Linear Activation Function
This is simply the equation of a line, `y = wx + b`.\
This activation function is usually applied to the last layer's output in a regression model.

In [3]:
# linear activation function
def activation_linear(x, w, b):
    return w * x + b

y1 = activation_linear(3, 2, 1)
print(y1)

7


## The Sigmoid Activation Function
The sigmoid is the original, more granular alternative to the step activation function: `y = 1 / (1 + e^-x)`\
This function returns a value that approaches 0 as the input goes to negative infinity, equals 0.5 for an input of 0, and approaches 1 as the input goes to positive infinity.

In [4]:
# sigmoid activation function
def activation_sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    
    return s

y1 = activation_sigmoid(4)
print(y1)

0.9820137900379085
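
To make the described range concrete, here is a small sketch that evaluates the same sigmoid on a few inputs, from strongly negative to strongly positive. The input values are chosen only for illustration.

```python
import numpy as np

def activation_sigmoid(x):
    # same formula as above; np.exp works element-wise on arrays
    return 1 / (1 + np.exp(-x))

xs = np.array([-20.0, -1.0, 0.0, 1.0, 20.0])
sig_out = activation_sigmoid(xs)
print(sig_out)

# the sigmoid is symmetric around 0.5: sigmoid(-x) = 1 - sigmoid(x)
symmetric = np.allclose(activation_sigmoid(-xs), 1 - sig_out)
print(symmetric)
```

The outputs stay strictly between 0 and 1, pass through exactly 0.5 at an input of 0, and increase with the input.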


## The Rectified Linear Activation Function
The `ReLU` activation function is simpler than the sigmoid. It is quite literally `y = x`, clipped at `0` from the negative side: `y = x if x > 0 else 0`.

In [5]:
# relu activation function
def activation_relu(x):
    if x > 0:
        y = x
    else:  # also covers x == 0, so y is always defined
        y = 0
    
    return y

y1 = activation_relu(3.4)
y2 = activation_relu(-3)
print(y1, y2)

3.4 0


In [6]:
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

output = []
for i in inputs:
    output.append(max(0, i))

print(output)

[0, 2, 0, 3.3, 0, 1.1, 2.2, 0]
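
The loop above can also be written without explicit iteration. As a sketch, `np.maximum` compares every element with 0 in one call. This is the same operation the `Activation_ReLU` class later in this notebook relies on.

```python
import numpy as np

inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

# element-wise maximum of 0 and each input; returns a NumPy array
relu_out = np.maximum(0, inputs)
print(relu_out)
```

Because the list contains floats, the result is a float array, so the zeros appear as `0.` rather than `0`.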


## The Softmax Activation Function
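The softmax activation function returns a confidence score for each class, and the scores for each sample add up to 1.

$$S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^{L} e^{z_{i,l}}}$$

Here $z_{i,j}$ is the $j$-th output for sample $i$, and the sum runs over all $L$ outputs of that sample.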

In [7]:
# exponentiate the outputs using Euler's number "e"
E = 2.71828182846     # same value as math.e; shown for reference, np.exp is used below

# values from the previous output
layer_outputs = [4.8, 1.21, 2.385]

# for each value in a vector, calc the exponential value
exp_values = np.exp(layer_outputs)
print(exp_values)

[121.51041752   3.35348465  10.85906266]


The purpose of this exponentiation is to make every value positive while preserving their order, so that the outputs can then be normalized into probabilities.

In [8]:
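# calculate the normalized probabilities
# first compute the normalization base: the sum of the exponentiated values
norm_base = np.sum(exp_values)

# then divide each exponentiated value by that base
norm_values = exp_values / norm_base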
print('Normalized exponentiated values: ', norm_values)

Normalized exponentiated values:  [0.89528266 0.02470831 0.08000903]


In [9]:
layer_outputs = np.array([[4.8, 1.21, 2.385], 
                          [8.9, -1.81, 0.2], 
                          [1.41, 1.051, 0.026]])
print('Sum of all values (no axis given):')
print(np.sum(layer_outputs))

print('Sum of each row (axis=1, keepdims=True):')
print(np.sum(layer_outputs, axis=1, keepdims=True))

Sum of all values (no axis given):
18.172
Sum of each row (axis=1, keepdims=True):
[[8.395]
 [7.29 ]
 [2.487]]


In [10]:
# get normalized probabilities
probabilities = np.exp(layer_outputs)/np.sum(np.exp(layer_outputs), axis=1, keepdims=True)

print(probabilities)

[[8.95282664e-01 2.47083068e-02 8.00090293e-02]
 [9.99811129e-01 2.23163963e-05 1.66554348e-04]
 [5.13097164e-01 3.58333899e-01 1.28568936e-01]]
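
As a quick sanity check on the batch result, the sketch below recomputes the same probabilities and confirms that each row sums to 1. It also picks the most confident class per sample with `np.argmax`. The names `row_sums` and `predicted_classes` are introduced here only for illustration.

```python
import numpy as np

layer_outputs = np.array([[4.8, 1.21, 2.385],
                          [8.9, -1.81, 0.2],
                          [1.41, 1.051, 0.026]])

exp_values = np.exp(layer_outputs)
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# each row is one sample, so its confidence scores should add up to 1
row_sums = np.sum(probabilities, axis=1)
print(row_sums)

# index of the highest confidence score in each row
predicted_classes = np.argmax(probabilities, axis=1)
print(predicted_classes)  # prints [0 0 0]
```

`keepdims=True` keeps the row sums as a column of shape `(3, 1)`, which lets the division broadcast across each row.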


## Why Use Activation Functions?
Real-world problems involve many interacting factors, which makes the relationships our model has to learn nonlinear.\
A network that uses only linear activations can represent only linear functions, so it is not going to work for such problems.
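
One way to see why linear activations are not enough: two dense layers with no nonlinearity between them collapse into a single dense layer. The sketch below uses small hand-picked weights (all values are assumptions chosen for illustration) to check this, and then shows that inserting a ReLU breaks the equivalence.

```python
import numpy as np

# hypothetical inputs and weights, chosen only for illustration
X = np.array([[1.0, -2.0], [-1.0, 0.5], [3.0, 1.0]])
W1 = np.array([[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]])
b1 = np.array([[0.1, -0.2, 0.3]])
W2 = np.array([[1.0, -0.5], [0.2, 0.8], [-1.2, 0.4]])
b2 = np.array([[0.05, -0.1]])

# two stacked layers with a linear (identity) activation in between
two_layers = (X @ W1 + b1) @ W2 + b2

# the same mapping expressed as a single layer
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer = X @ W_combined + b_combined
collapses = np.allclose(two_layers, one_layer)
print(collapses)  # prints True

# with a ReLU between the layers, the single layer no longer matches
with_relu = np.maximum(0, X @ W1 + b1) @ W2 + b2
relu_differs = not np.allclose(with_relu, one_layer)
print(relu_differs)  # prints True
```

However many linear layers are stacked, the result is still one linear function of the input. The nonlinearity between layers is what lets depth add expressive power.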

---
## Our code so far

In [13]:
# import dataset 
import nnfs
from nnfs.datasets import spiral_data

# sets the random seed to 0, uses float32 as the default dtype, and overrides NumPy's original dot product
nnfs.init()

In [14]:
class Dense:
    def __init__(self, n_inputs, n_neurons):
        """ Initialize the weights and biases of each neuron
        n_inputs = number of input features
        n_neurons = number of desired neurons
        """
        # scaling np.random.randn by 0.01 gives small random weights that break the symmetry between neurons
        self.weights = np.random.randn(n_inputs, n_neurons) * 0.01
        # biases can be initialized as zeros
        self.biases = np.zeros((1, n_neurons))
    
    def forward(self, inputs):
        """ Calculate the layer output as the dot product of the inputs and weights, plus the biases
        Input:
        inputs = Training examples
        
        Output:
        output = Output of the training example
        """
        # calculate the output layer
        output = np.dot(inputs, self.weights) + self.biases
        
        return output

In [15]:
# ReLU activation
class Activation_ReLU:
    def forward(self, inputs):
        output = np.maximum(0, inputs)
        
        return output

In [16]:
# Softmax activation
class Activation_Softmax:
    def forward(self, inputs):
        # subtract the row-wise max from the inputs to prevent the exponential function from overflowing
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        
        softmax = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        return softmax
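
The max subtraction in `Activation_Softmax` does not change the result, because softmax is unaffected by adding the same constant to every input in a row. The sketch below checks that, and compares a naive softmax with the shifted one on large inputs. The standalone `softmax` function and the sample arrays are assumptions made for this illustration.

```python
import numpy as np

def softmax(inputs):
    # shift each row so its largest value is 0 before exponentiating
    shifted = inputs - np.max(inputs, axis=1, keepdims=True)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values, axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])

# adding a constant to every input leaves the probabilities unchanged
same = np.allclose(softmax(small), softmax(small + 100.0))
print(same)  # prints True

large = np.array([[1000.0, 1001.0, 1002.0]])

# naive version: np.exp(1000) overflows to inf, and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(large) / np.sum(np.exp(large), axis=1, keepdims=True)
print(naive)

# shifted version: the largest exponent is exp(0) = 1, so nothing overflows
stable = softmax(large)
print(stable)
```

After the shift, every exponent is at most 0, so every exponentiated value lies between 0 and 1.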

In [17]:
# create dataset
X, y = spiral_data(samples=100, classes=3)

# create dense layer with 2 input features and 3 output values
l1 = Dense(2, 3)
a1 = Activation_ReLU()

# create dense layer with 3 input features and 3 output values
l2 = Dense(3, 3)
a2 = Activation_Softmax()

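# forward pass through the first dense layer and the ReLU activation
yhat1 = a1.forward(l1.forward(X))
# note: l2 is not applied here, so the softmax receives the ReLU output directly;
# the full chain would be a2.forward(l2.forward(yhat1)), which yields different values
yhat2 = a2.forward(yhat1)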

print(yhat2[:10])

[[0.33333334 0.33333334 0.33333334]
 [0.33332068 0.33335868 0.33332068]
 [0.3332981  0.33340386 0.3332981 ]
 [0.3332748  0.3334504  0.3332748 ]
 [0.33325398 0.33349204 0.33325398]
 [0.33329442 0.3334112  0.33329442]
 [0.33321366 0.33357266 0.33321366]
 [0.33321416 0.33357167 0.33321416]
 [0.33318758 0.33362478 0.33318758]
 [0.33315328 0.33369344 0.33315328]]
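
The cell above creates `l2` but feeds the ReLU output straight into the softmax, so the second dense layer never takes part in the forward pass. Below is a sketch of the full chain (dense, ReLU, dense, softmax). To stay self-contained it makes two assumptions: random points (`X_demo`) stand in for `spiral_data`, and a local NumPy generator stands in for `nnfs.init()`, so the numbers will not match the output above.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: replaces nnfs.init() for this sketch

class Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = rng.standard_normal((n_inputs, n_neurons)) * 0.01
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        return np.dot(inputs, self.weights) + self.biases

class Activation_ReLU:
    def forward(self, inputs):
        return np.maximum(0, inputs)

class Activation_Softmax:
    def forward(self, inputs):
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        return exp_values / np.sum(exp_values, axis=1, keepdims=True)

# hypothetical stand-in for spiral_data: 300 samples with 2 features each
X_demo = rng.standard_normal((300, 2))

l1 = Dense(2, 3)
a1 = Activation_ReLU()
l2 = Dense(3, 3)
a2 = Activation_Softmax()

# dense 1 -> ReLU -> dense 2 -> softmax
hidden = a1.forward(l1.forward(X_demo))
probs = a2.forward(l2.forward(hidden))

print(probs.shape)  # prints (300, 3)
print(probs[:5])
```

With weights this small and no training yet, every row should sit close to one third per class. That is the expected behavior of an untrained model.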
