# Neural Networks 

#### What is a neural network?
A neural network is a machine learning method that is loosely inspired by neurons in the brain. At its core, it multiplies inputs by weights. Given a training set of inputs and outputs, it learns the mapping by continuously updating the weights so that its predictions get closer to the expected outputs.
### Neural Networks look like this
![](http://neuralnetworksanddeeplearning.com/images/tikz35.png)

### Let's start with a simple example

In [2]:
import numpy as np
weight = np.random.randn()
def Neural_Network(input, weight):
    return input * weight

In [3]:
Neural_Network(0.2354, weight)

0.015184426479923345

This is what neural networks do! They multiply weights and inputs to get a prediction. 

### Working  with multiple inputs and weights
For this example we'll work with 4 inputs and 4 weights.

In [4]:
weight = np.random.randn(4)
input = [2.3, 4.6, 7.8, 0.9]
print(Neural_Network(input, weight))  # arguments in the order the function defines them

[ 1.09703325  5.33139259  1.79958776 -0.52102516]


Oops! We get more than one result: one product per input. To get a single prediction we need to add all of these products together, so we need a weighted sum function. NumPy's `dot` function does exactly this.

In [5]:
np.dot(input, weight)  # We're gonna be using this instead of the Neural_Network function that we defined above

7.7069884400010515
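
As a quick check on the weighted sum idea, here is a minimal sketch showing that `np.dot` gives the same number as multiplying element by element and then adding up the products. The weights below are fixed, made-up values (an assumption, chosen only so the result is reproducible; the cells above draw them at random).

```python
import numpy as np

# Same four inputs as above; the weights are fixed, made-up values
# (an assumption for reproducibility, the notebook draws them at random).
inputs = np.array([2.3, 4.6, 7.8, 0.9])
weights = np.array([0.5, -0.25, 0.1, 1.0])

elementwise = inputs * weights        # one product per input
manual_sum = elementwise.sum()        # add the products together
dot_result = np.dot(inputs, weights)  # the same weighted sum in one call

print(elementwise)
print(manual_sum, dot_result)
```

Both printed sums should agree, which is why we can swap our own function for `np.dot`.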

#### This is what the neural network that we coded looks like
![This is what our neural network looks like](http://www.theprojectspot.com/images/post-assets/an.jpg)

## Let's do this with an example
#### Will it rain or not?

#### What is the activation function (f) in the picture above?
What if we want to predict whether it'll rain or not? We can have some inputs (features) 'X', weights 'W' and an output 'Y'.
The formula would be 'X * W = Y', where 'Y' is the probability of rain. In the program above that's not the case: the weighted sum we got (7.7069...) is not a probability, because it isn't limited to the range 0 to 1. That's where activation functions come into play!

There are a lot of activation functions. Here we're gonna be using the sigmoid function, which converts the result into a value between 0 and 1 that we can read as a probability.
![Activation Function](https://qph.ec.quoracdn.net/main-qimg-05edc1873d0103e36064862a45566dba)

In [6]:
# We need to define the sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Now take the sigmoid of the dot product
print(sigmoid(np.dot(input, weight)))

0.999550528456


So, our model predicts that there is a more than 99.9% chance of rain. But this may be wrong, since multiplying inputs by random weights will give us random results. We need the correct weights for the model to predict accurately.
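
To get a feel for how the sigmoid turns a raw weighted sum into something we can read as a probability, here is a small sketch that applies it to a few hand-picked scores. The scores are hypothetical values chosen for illustration; they don't come from the model above.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

# Hand-picked scores (hypothetical, for illustration only)
scores = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(scores)

for s, p in zip(scores, probs):
    print(f"score {s:6.1f} -> probability {p:.4f}")
```

Large negative scores land close to 0, a score of 0 gives exactly 0.5, and large positive scores land close to 1. That is why a weighted sum as big as ours ends up so close to 1.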

#### Let's do this with an example where sets of 1s and 0s are mapped to 1s and 0s

In [58]:
inputs = np.array([[0, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 0]])
true = np.array([[1], [1], [0], [0]]).T
weights = 2 * np.random.random((3, 1)) - 1 # Random weights stored in 3 x 1 matrix  
output = sigmoid(np.dot(inputs, weights))
mm = np.dot(inputs, weights)

In [44]:
print(output)

[[ 0.59024853]
 [ 0.53472105]
 [ 0.4437677 ]
 [ 0.5       ]]


In [53]:
inputs.shape

(4, 3)

#### Matrix Multiplication 

In [56]:
weights

array([[-0.22588478],
       [ 0.46561043],
       [-0.10061757]])

![Matrix multiplication](http://4.bp.blogspot.com/-chb2aBGjp9k/U11Py1FaydI/AAAAAAAAAjQ/IQCHVD8eSO4/s1600/Matrix+mutl.jpg)
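
To make the matrix multiplication concrete, the sketch below redoes it with the weights printed above typed in by hand, so the result is reproducible. The `by_hand` name is just an illustrative helper: it computes the same weighted sums one row at a time.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs = np.array([[0, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 0]])  # shape (4, 3)
# The weights printed above, typed in by hand so the result is reproducible
weights = np.array([[-0.22588478], [0.46561043], [-0.10061757]])  # shape (3, 1)

# (4, 3) dot (3, 1) -> (4, 1): one weighted sum per row of inputs
weighted_sums = np.dot(inputs, weights)

# The same numbers, computed one row at a time
by_hand = np.array([[sum(x * w for x, w in zip(row, weights[:, 0]))]
                    for row in inputs])

print(weighted_sums.shape)
print(sigmoid(weighted_sums))
```

The inner dimensions (3 and 3) have to match, and the result keeps the outer ones (4 and 1), so we get one prediction per training example.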

## Measuring the error
Error is the difference between the true value and the model's prediction. It's a measure of our model's performance: if it's high, our model is making bad predictions, and if it's low, the model is making good predictions. So we need to make this error as small as possible. We're gonna talk about gradient descent later, which is one of the most popular ways to optimize our weights.

In [48]:
error = true.T - output  # true has shape (1, 4); transpose it to (4, 1) so it lines up with output

In [52]:
output.shape
true.shape

(1, 4)
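
The shapes matter here. Since `true` has shape (1, 4) and `output` has shape (4, 1), subtracting them directly makes NumPy broadcast both to a 4 x 4 grid instead of giving one error per example. The sketch below shows the difference; the `output` values are stand-ins rounded from the printed output above, and the names `broadcast_error` and `aligned_error` are only for illustration.

```python
import numpy as np

# Stand-in values with the same shapes as in the cells above
true = np.array([[1], [1], [0], [0]]).T               # shape (1, 4)
output = np.array([[0.59], [0.53], [0.44], [0.5]])    # shape (4, 1), rounded

broadcast_error = true - output   # (1, 4) and (4, 1) broadcast to (4, 4)
aligned_error = true.T - output   # both (4, 1): one error per example

print(broadcast_error.shape)
print(aligned_error.shape)
print(aligned_error.ravel())
```

Transposing `true` back to a column (or never transposing it in the first place) keeps each true value next to its own prediction.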

So far, our model doesn't really do anything useful, because its weights are random. The only way we can get our model to predict accurately is to optimize the weights, but how do we find these weights?
## Gradient Descent
Gradient descent is a popular optimization technique for updating our weights, so let's talk about it in detail. If we plotted the error against the weight values, the graph would look like this!
![](http://blog.hackerearth.com/wp-content/uploads/2016/12/graph.png)

From the graph above we see that the curve is a parabola. Our task is to find its minimum, that is, the weights that give the smallest possible error. So, let's do this in code.
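
As a first, minimal sketch of the idea, here is gradient descent on a single weight. Everything in it is an assumption made for illustration: one input `x`, one `target`, a squared error (which is what gives the parabola shape), and a hand-picked `learning_rate`. It is not the full network from the cells above.

```python
# Hypothetical one-weight setup: prediction = x * weight,
# error = (prediction - target) ** 2, which is a parabola in weight.
x, target = 2.0, 1.0
weight = 0.0
learning_rate = 0.1  # assumed step size


def error(w):
    return (x * w - target) ** 2


def gradient(w):
    # Derivative of the squared error with respect to the weight
    return 2 * x * (x * w - target)


history = []
for step in range(20):
    history.append(error(weight))
    # Step downhill: move the weight against the slope of the error curve
    weight -= learning_rate * gradient(weight)

print(weight)
print(history[0], history[-1])
```

Each step moves the weight a little in the direction that lowers the error, so it slides down the parabola towards the bottom, which for these numbers is at `weight = 0.5`.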