# Neural Networks and Deep Learning

![image.png](attachment:image.png)

The image above gives an idea of what a deep learning network looks like.

![image.png](attachment:image.png)

# Commonalities with the Real Brain

Each neuron is connected to a small subset of other neurons.

Based on the inputs it receives, each neuron decides what output to pass on.

The neurons learn to cooperate with each other to accomplish the task.



![image.png](attachment:image.png)

An artificial neuron contains a non-linear activation function and has several input and output connections.

# Activation Function

An activation function defines the output of a node given an input or a set of inputs. In effect, it decides whether a neuron should be activated or deactivated to produce the desired output. Many activation functions also squash their output into a fixed range, such as 0 to 1 or -1 to 1. An activation function should also be cheap to compute, because neural networks are sometimes trained on millions of data points.

Let's take an example:

A neuron computes a weighted sum of its inputs and adds a bias; this sum is then passed through an activation function to produce the output.

 
# Y = ∑ (weights * input) + bias

Here Y can take any value from -infinity to +infinity. So we have to bound the output to get the desired prediction or generalized results.

 
# Y = Activation function(∑ (weights * input) + bias)
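
As a minimal sketch of the two formulas above, the following NumPy snippet computes the weighted sum for a single neuron and then passes it through an activation function. The input values, weights, and bias are made-up numbers for illustration only, and tanh is just one possible choice of activation.

```python
import numpy as np

# Hypothetical values chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum of the inputs, with the bias added once after the sum.
# This value is unbounded: it can be any real number.
y_linear = np.dot(weights, inputs) + bias

# Passing the sum through an activation (tanh here) bounds the output
y_activated = np.tanh(y_linear)

print(y_linear)
print(y_activated)
```

Swapping `np.tanh` for another activation changes the output range but not the structure of the computation.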

# But why do we need it?

Without an activation function, the weights and bias would only apply a linear transformation, and the neural network would be just a linear regression model. A linear equation is a polynomial of degree one: simple to solve, but limited in its ability to model complex problems or higher-degree polynomials.

 

In contrast, adding an activation function applies a non-linear transformation to the input, which makes the network capable of solving complex problems such as language translation and image classification.
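
The point about linearity can be checked numerically. The sketch below, which uses randomly generated weights and inputs as stand-ins (the names `W1`, `W2`, and `x` are hypothetical), suggests that two stacked layers without an activation behave exactly like one linear layer, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical layers with random weights (biases omitted for brevity)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))  # 5 samples, 3 features each

# Without an activation, two layers equal one layer with weights W1 @ W2
two_linear_layers = (x @ W1) @ W2
one_linear_layer = x @ (W1 @ W2)
collapses = np.allclose(two_linear_layers, one_linear_layer)

# With a ReLU in between, the stack is no longer that single linear map
with_relu = np.maximum(0, x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear_layer)

print(collapses, differs)
```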

 

In addition, activation functions are differentiable, which is what makes backpropagation possible: during backpropagation, the gradient of the loss function with respect to each weight is computed and used to update the weights.

 

# Important Activation Functions

1. ReLU
2. Sigmoid
3. Tanh

# ReLU
ReLU is currently one of the most widely used activation functions; it is used in almost all convolutional neural networks and deep learning models.

![image.png](attachment:image.png)

As you can see, the ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.

The issue is that all negative values become zero immediately, which decreases the ability of the model to fit or train on the data properly. Any negative input to the ReLU is mapped to zero, so negative values are not represented in the output, and the gradient for those inputs is zero as well. A neuron that only ever receives negative inputs therefore stops updating, which is known as the dying ReLU problem.
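
A small NumPy sketch of ReLU and its gradient makes this behaviour concrete. The helper names `relu` and `relu_grad` are chosen here for illustration, and treating the gradient at exactly zero as 0 is a convention assumed in this sketch.

```python
import numpy as np

def relu(z):
    """Return z where z >= 0 and 0 elsewhere."""
    return np.maximum(0, z)

def relu_grad(z):
    """Gradient of ReLU: 1 for positive z, 0 otherwise (0 at z = 0 by convention here)."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z).tolist())       # [0.0, 0.0, 0.0, 0.5, 2.0]
print(relu_grad(z).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

The zero gradient for negative inputs is why those inputs contribute nothing to the weight update.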

# Leaky ReLU

It is an attempt to solve the dying ReLU problem.


![image.png](attachment:image.png)

The leak helps to increase the range of the ReLU function. Usually, the value of a is 0.01.

When a is sampled randomly during training instead of being fixed, the function is called Randomized ReLU.

Therefore, the range of the Leaky ReLU is (-infinity to infinity).

Both Leaky and Randomized ReLU functions are monotonic in nature. Their derivatives are also monotonic.
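
As a rough sketch, Leaky ReLU can be written in one line with NumPy. The function name `leaky_relu` and the test values are illustrative; the default slope `a=0.01` follows the usual value mentioned above.

```python
import numpy as np

def leaky_relu(z, a=0.01):
    """Return z for z >= 0 and a * z for z < 0."""
    return np.where(z >= 0, z, a * z)

z = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
out = leaky_relu(z)
print(out)

# Negative inputs keep a small, non-zero output, so the function is
# unbounded in both directions and never decreases as z increases.
is_monotonic = bool(np.all(np.diff(out) >= 0))
print(is_monotonic)
```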

# Tanh

tanh is similar to the logistic sigmoid, but its output is centered on zero. The range of the tanh function is (-1 to 1). tanh is also sigmoidal (S-shaped).

![image.png](attachment:image.png)

The advantage is that negative inputs are mapped to strongly negative outputs and inputs near zero are mapped to outputs near zero.

The function is differentiable.
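
The relationship between tanh and the logistic sigmoid can be illustrated with a short check. This sketch relies on the identity tanh(x) = 2 * sigmoid(2x) - 1 and on the derivative 1 - tanh(x)^2; the sample points are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)  # arbitrary sample points, including 0

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
tanh_values = np.tanh(x)
rescaled_sigmoid = 2 * sigmoid(2 * x) - 1
same_curve = np.allclose(tanh_values, rescaled_sigmoid)

# Derivative of tanh, largest at x = 0 and shrinking towards the tails
tanh_grad = 1 - np.tanh(x) ** 2

print(same_curve)
print(tanh_values.min(), tanh_values.max())
```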

# Sigmoid

![image.png](attachment:image.png)

The main reason we use the sigmoid function is that its output lies between 0 and 1. It is therefore especially useful for models that have to predict a probability as the output: since a probability only exists in the range 0 to 1, sigmoid is a natural choice. The function is differentiable.

# Why is differentiation used?

When updating the curve (that is, the model's weights), the slope tells us in which direction to change it and by how much. That is why differentiation is used in almost every part of Machine Learning and Deep Learning.
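
To make the role of the slope concrete, here is a minimal gradient descent sketch on a toy loss. The loss `(w - 3)^2`, the starting point, and the learning rate are all assumptions picked for illustration; they are not taken from the network trained later in this notebook.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0   # hypothetical starting weight
lr = 0.1  # hypothetical learning rate

for step in range(100):
    # The sign of the slope gives the direction, its size scales the step
    w -= lr * loss_grad(w)

print(w)
print(loss(w))
```

Each step moves `w` against the slope, so the loss shrinks until the slope is close to zero.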

# An example of how the weights and bias are updated

In [1]:
# Import required libraries:
import numpy as np

# Define input features:
X = np.array([[0,0],[0,1],[1,0],[1,1]])
print(X.shape)
print(X)

(4, 2)
[[0 0]
 [0 1]
 [1 0]
 [1 1]]


In [2]:
# Define y:
Y = np.array([[0,1,1,1]])

# Reshaping our output into a column vector:
y = Y.reshape(4,1)
print(y.shape)
print (y)

(4, 1)
[[0]
 [1]
 [1]
 [1]]


In [3]:
# Define weights:
weights = np.array([[0.1],[0.2]])
print(weights.shape)
print (weights)
bias = 0.3 # Bias weight:
lr = 0.05 #learning rate

(2, 1)
[[0.1]
 [0.2]]


In [6]:
#Sigmoid function:
def sigmoid(x):
    return 1/(1+np.exp(-x))
# Derivative of sigmoid function:
def sigmoid_der(x):
    return sigmoid(x)*(1-sigmoid(x))

In [8]:
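for epoch in range(1000):
    inputs = X
    # Feedforward input: weighted sum of the inputs plus the bias
    in_feed = np.dot(inputs, weights) + bias

    # Feedforward output: pass the weighted sum through the sigmoid
    out = sigmoid(in_feed)

    # Calculating error
    # The loss is 1/2 * square(out - y); its derivative w.r.t. out is (out - y)
    error = out - y
    x = error.sum()
    print(x)  # summed signed error for this epoch

    # Chain rule: d(error)/d(in_feed) = d(error)/d(out) * d(out)/d(in_feed)
    derror_dout = error
    # Note: sigmoid_der is applied to `out` here, which matches the recorded run.
    # The exact derivative would be sigmoid_der(in_feed), i.e. out * (1 - out).
    dout_din_feed = sigmoid_der(out)
    deriv = derror_dout * dout_din_feed
    inputs = X.T
    deriv_final = np.dot(inputs, deriv)

    # Updating the weights values:
    weights -= lr * deriv_final

    # Updating the bias with each sample's gradient in turn
    for i in deriv:
        bias -= lr * i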
    

    
    

-0.0001441677177577566
-0.0001442118430388592
-0.0001442559474823385
-0.0001443000310991302
-0.0001443440938982063
-0.00014438813588955185
-0.0001444321570829507
-0.0001444761574890957
-0.00014452013711668127
-0.00014456409597597697
-0.00014460803407687767
-0.0001446519514292366
-0.00014469584804283064
-0.00014473972392706203
-0.00014478357909237377
-0.0001448274135482791
-0.000144871227304659
-0.0001449150203709504
-0.00014495879275737422
-0.0001450025444734021
-0.00014504627552924115
-0.00014508998593404376
-0.00014513367569780194
-0.00014517734483039668
-0.00014522099334165345
-0.00014526462124090506
-0.00014530822853860148
-0.00014535181524401308
-0.00014539538136659758
-0.00014543892691622207
-0.00014548245190303816
-0.00014552595633644805
-0.0001455694402262911
-0.00014561290358178214
-0.000145656346413163
-0.00014569976872997464
-0.00014574317054204255
-0.0001457865518587828
-0.00014582991269005557
-0.00014587325304505488
-0.00014591657293414745
-0.00014595987236717956
-0.000146

In [9]:
print(weights)
print(bias)

[[7.20642565]
 [7.20667957]]
[-3.24934367]


We now have our trained weights and bias.

# Checking sample points with our updated weights and bias

In [10]:
sample_point = np.array([1,0]) # test point
result1 = np.dot(sample_point, weights) + bias
result2 = sigmoid(result1)
print(result2)

[0.98123985]


The predicted output is almost 1.

In [11]:
sample_point = np.array([1,1])
result1 = np.dot(sample_point, weights) + bias 
result2 = sigmoid(result1)
print(result2)

[0.99998582]


In [12]:
sample_point = np.array([0,0])
result1 = np.dot(sample_point, weights) + bias
result2 = sigmoid(result1)
print(result2)

[0.03735048]
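
The individual checks above can also be done for all four input points in one vectorized pass. The sketch below copies the trained weights and bias from the printed output of In [9]; the 0.5 threshold used to turn probabilities into class labels is an assumption added here, not something the training loop defines.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Trained values copied from the printed output of In [9]
weights = np.array([[7.20642565], [7.20667957]])
bias = -3.24934367

# All four input points used for training
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# One forward pass for every row of X
probabilities = sigmoid(np.dot(X, weights) + bias)

# Assumed decision rule: predict 1 when the probability exceeds 0.5
predictions = (probabilities > 0.5).astype(int)

print(probabilities.ravel())
print(predictions.ravel().tolist())  # [0, 1, 1, 1]
```

The predicted labels match the target vector `y` defined earlier.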
