# Artificial Neural Networks (ANNs)
Artificial neural networks are one of the main tools used in machine learning. As the “neural” part of their name suggests, they are brain-inspired systems intended to mimic the way humans learn. A neural network consists of an input layer and an output layer and, in most cases, one or more hidden layers whose units transform the input into something the output layer can use. They are excellent tools for finding patterns that are too complex or too numerous for a human programmer to extract and teach the machine to recognize.

**How exactly do neural networks “learn”?**

In the same way that we learn from experience in our lives, neural networks require data to learn. In most cases, the more data that can be thrown at a neural network, the more accurate it will become. Think of it like any task you do over and over. Over time, you gradually get more efficient and make fewer mistakes.

When researchers or computer scientists set out to train a neural network, they typically divide their data into three sets. First is a training set, which helps the network establish the various weights between its nodes. After this, they fine-tune it using a validation data set. Finally, they’ll use a test set to see if it can successfully turn the input into the desired output.
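The three-way split described above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `split_dataset`, the 70/15/15 proportions, and the toy arrays `X_demo` and `y_demo` are assumptions made for the example, not something prescribed by this notebook.

```python
import numpy as np

def split_dataset(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the rows, then split them into train / validation / test subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)              # random ordering of the row indices
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]      # everything left over is training data
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Toy data (hypothetical): 100 examples with 2 features each.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(X_demo, y_demo)
print(train_X.shape, val_X.shape, test_X.shape)
```

Shuffling before splitting matters when the rows are stored in some meaningful order; the fixed seed only makes the split reproducible.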

**Do neural networks have any limitations?**

One major challenge with neural networks is the long training time and the large amount of computing power that training requires. The biggest issue, however, is that neural networks are “black boxes”: the user feeds in data and receives answers. Users can fine-tune those answers, but they have no direct view of the exact decision-making process.

![title](4NN.jpg)

In [1]:
import numpy as np

In [2]:
def sig(z):
    return 1/(1 + np.exp(-z))

In [3]:
def derivativeSig(sig_out):
    return sig_out*(1 - sig_out)
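Note that `derivativeSig` expects the sigmoid's output, not the raw input `z`. A quick way to convince yourself of this is to compare it against a central finite difference. The sketch below repeats the two functions so that it runs on its own; the step size `eps` and the sample points are arbitrary choices for illustration.

```python
import numpy as np

def sig(z):
    return 1 / (1 + np.exp(-z))

def derivativeSig(sig_out):
    # Takes the sigmoid OUTPUT s and returns s * (1 - s).
    return sig_out * (1 - sig_out)

z = np.linspace(-5, 5, 11)
eps = 1e-6

# Central finite-difference estimate of d sig / dz.
numeric = (sig(z + eps) - sig(z - eps)) / (2 * eps)

# Analytic derivative, computed from the sigmoid output.
analytic = derivativeSig(sig(z))

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)
```

If `derivativeSig(z)` were called with the raw input instead, the two arrays would no longer agree.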

In [4]:
X = np.array([[0,0,1], 
              [0,1,1],
              [1,0,1],
              [1,1,1]])
Y = np.array([[0],[1],[1],[0]])

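**XOR decision boundary**

The targets in `Y` are the XOR of the first two input columns (the third column is a constant bias input), so the decision boundary is non-linear: no single straight line separates the two classes.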

![title](5NN.jpg)

In [5]:
weights = 2 * np.random.random((3,1)) - 1       # generating random weights between -1 and 1 
learning_rate = 0.1

weights

array([[ 0.60059059],
       [ 0.05477104],
       [ 0.93807049]])

In [6]:
X.shape, weights.shape

((4, 3), (3, 1))

![title](7NN.jpg)
Let's use the error function shown above for the following unit:
![title](6NN.jpg)

Here each training example has n input features, so n weights and 1 bias are needed to compute:
![title](8NN.JPG)

Suppose O1 applies the sigmoid to this input and outputs y_predicted.

The function applied here, the sigmoid, is called the activation of this perceptron. It can be replaced by another function such as tanh, ReLU, leaky ReLU, or even the identity function.
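For reference, the alternative activations named above could be written as follows. This is a minimal sketch: the function names and the leaky ReLU slope `alpha=0.01` are illustrative assumptions, and none of these functions is used elsewhere in this notebook.

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, unchanged for positive inputs.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope alpha (assumed value).
    return np.where(z > 0, z, alpha * z)

def identity(z):
    return z

z = np.array([-2.0, 0.0, 3.0])
for fn in (tanh, relu, leaky_relu, identity):
    print(fn.__name__, fn(z))
```

Swapping the activation also means swapping its derivative in the weight update; `derivativeSig` is only valid for the sigmoid.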

Using gradient descent to minimize the error E with respect to the weights, we proceed as follows:

![title](9NN.jpg)
where O1 uses the sigmoid activation.
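As a written counterpart to the figures, here is a sketch of the same chain-rule computation. It assumes the error is half the summed squared error, which is an assumption made here because it matches the `y_pred - y` term used in the code; the symbols $\hat{y}$ (for y_predicted), $z$, and $\eta$ (learning rate) are notation introduced for this sketch.

$$
E = \frac{1}{2}\sum_{m}\left(\hat{y}^{(m)} - y^{(m)}\right)^2,
\qquad
z = \sum_{i=1}^{n} w_i x_i + b,
\qquad
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\frac{\partial E}{\partial w_i}
= \sum_{m} \frac{\partial E}{\partial \hat{y}^{(m)}}
  \cdot \frac{\partial \hat{y}^{(m)}}{\partial z^{(m)}}
  \cdot \frac{\partial z^{(m)}}{\partial w_i}
= \sum_{m} \left(\hat{y}^{(m)} - y^{(m)}\right)\,
  \hat{y}^{(m)}\left(1 - \hat{y}^{(m)}\right)\, x_i^{(m)}
$$

$$
w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i}
$$

The middle factor $\hat{y}(1 - \hat{y})$ is the sigmoid derivative expressed through the sigmoid's own output, which is what `derivativeSig` computes.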


![title](1NN.jpg)

In [7]:
for _ in range(1000):
    output0 = X                                  # output of layer 0 (the input layer) is just X
    output1 = sig(np.dot(output0, weights))      # output of O1: sigmoid of z = X . weights
    first_term = output1 - Y                     # error term: y_pred - y_actual
    second_term = derivativeSig(output1)         # sigmoid derivative, computed from the output of O1
    first_two = first_term * second_term         # element-wise product, shape (4, 1)
    changes = np.dot(output0.T, first_two)       # gradient with respect to weights, shape (3, 1)
    weights = weights - learning_rate * changes  # gradient descent update
    
output1 = sig(np.dot(X, weights))
weights,output1

(array([[-0.00172202],
        [-0.00267029],
        [ 0.00260616]]), array([[ 0.50065154],
        [ 0.49998397],
        [ 0.50022103],
        [ 0.49955346]]))

Because the network above has only one layer, it cannot form a non-linear decision boundary, so the results are poor: every output stays close to 0.5.
Now we will add one more layer and see whether the output changes.
(You can check the single-layer network by changing the targets to Y = [0, 0, 0, 1], the AND function, which has a linear decision boundary.)
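The suggested check can be sketched as a standalone cell. The single-layer update is the same as above, but a few things are assumptions made for this example rather than settings from the notebook: the seeded generator, the learning rate of 0.5, the 10000 iterations, and the names `Y_and`, `w`, and `pred`.

```python
import numpy as np

def sig(z):
    return 1 / (1 + np.exp(-z))

def derivativeSig(sig_out):
    return sig_out * (1 - sig_out)

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
Y_and = np.array([[0], [0], [0], [1]])   # AND of the first two columns: linearly separable

rng = np.random.default_rng(0)           # seeded so the run is reproducible
w = 2 * rng.random((3, 1)) - 1           # random weights between -1 and 1
lr = 0.5                                 # assumed learning rate for this sketch

for _ in range(10000):
    out = sig(X @ w)
    # Same single-layer gradient step as in the XOR experiment.
    w -= lr * X.T @ ((out - Y_and) * derivativeSig(out))

pred = sig(X @ w)
print(np.round(pred).astype(int).ravel())
```

With a linearly separable target like this one, the rounded predictions should line up with `Y_and`, in contrast to the XOR case where every output stayed near 0.5.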

In [8]:
weights0 = 2* np.random.random((3,4)) -  1
weights1 = 2* np.random.random((4, 1)) - 1
learning_rate = 0.5

In [9]:
for _ in range(5000):
    # Forward pass
    layer0 = X                                     # input layer
    layer1 = sig(np.dot(layer0, weights0))         # hidden layer output: sigmoid of z1 = layer0 . weights0
    layer2 = sig(np.dot(layer1, weights1))         # output layer: sigmoid of z2 = layer1 . weights1

    # Backward pass: output layer
    l2_error = layer2 - Y                          # y_pred - y_actual
    l2_delta = l2_error * derivativeSig(layer2)    # delta k
    net_change2 = np.dot(layer1.T, l2_delta)       # gradient for weights1, shape (4, 1)

    # Backward pass: hidden layer
    l1_error = np.dot(l2_delta, weights1.T)        # error j, propagated back through weights1
    l1_delta = l1_error * derivativeSig(layer1)    # delta j
    net_change1 = np.dot(layer0.T, l1_delta)       # gradient for weights0, shape (3, 4)

    # Gradient descent update
    weights0 = weights0 - learning_rate * net_change1
    weights1 = weights1 - learning_rate * net_change2
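One way to check that the backpropagation formulas in the loop are right is to compare them with finite-difference gradients. The sketch below assumes the error is half the summed squared error (an assumption consistent with the `layer2 - Y` term); the helpers `loss`, `backprop_grads`, and `numeric_grad` are hypothetical names introduced only for this check.

```python
import numpy as np

def sig(z):
    return 1 / (1 + np.exp(-z))

def derivativeSig(sig_out):
    return sig_out * (1 - sig_out)

def loss(X, Y, w0, w1):
    # Assumed error function: half the summed squared error.
    layer2 = sig(sig(X @ w0) @ w1)
    return 0.5 * np.sum((layer2 - Y) ** 2)

def backprop_grads(X, Y, w0, w1):
    # Same formulas as the training loop, without the weight update.
    layer1 = sig(X @ w0)
    layer2 = sig(layer1 @ w1)
    l2_delta = (layer2 - Y) * derivativeSig(layer2)
    l1_delta = (l2_delta @ w1.T) * derivativeSig(layer1)
    return X.T @ l1_delta, layer1.T @ l2_delta

def numeric_grad(f, w, eps=1e-6):
    # Central differences, perturbing one entry of w at a time (in place).
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old                      # restore the original value
        grad[idx] = (up - down) / (2 * eps)
    return grad

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
w0 = 2 * rng.random((3, 4)) - 1
w1 = 2 * rng.random((4, 1)) - 1

g0, g1 = backprop_grads(X, Y, w0, w1)
n0 = numeric_grad(lambda: loss(X, Y, w0, w1), w0)
n1 = numeric_grad(lambda: loss(X, Y, w0, w1), w1)

gap = max(np.max(np.abs(g0 - n0)), np.max(np.abs(g1 - n1)))
print(gap)
```

A very small `gap` indicates that `net_change1` and `net_change2` are the gradients of this error with respect to `weights0` and `weights1`.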

In [10]:
layer0 = X
layer1 = sig(np.dot(layer0, weights0))
layer2 = sig(np.dot(layer1, weights1))
layer2 

array([[ 0.01449743],
       [ 0.97871508],
       [ 0.97670817],
       [ 0.02635731]])