
## Recap: Maths in machine learning

### 1. Algebra of vectors and matrices

#### 1.1 Vectors

A two-component column vector is written as:
$$ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} $$

The transpose of the two-component column vector is:
$$ x^\mathsf{T}= \begin{pmatrix} x_1 , x_2 \end{pmatrix} $$

The sum of two column vectors is given by:
$$ x+y = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1+y_1 \\ x_2+y_2 \end{pmatrix} $$

And the inner product by:
$$ x^\mathsf{T}y = \begin{pmatrix} x_1 , x_2 \end{pmatrix}  \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1y_1 + x_2y_2 $$

The length, or Euclidean norm, of the vector $x$ is:
$$ \Vert x \Vert = \sqrt{x_1^2 + x_2^2} = \sqrt{x^\mathsf{T}x}$$

The inner product of $x$ and $y$ can also be expressed in terms of the vector lengths and the angle $\theta$ between the two vectors:
$$ x^\mathsf{T}y = \Vert x \Vert \Vert y \Vert \cos\theta$$

If $\theta$ is 90 degrees, then the vectors are said to be *orthogonal*, in which case: 
$$ x^\mathsf{T}y = 0$$
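
The formulas above can be checked numerically. The following is a minimal sketch with NumPy; the two example vectors are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np

# Two illustrative vectors (hypothetical values, picked to be orthogonal)
x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

inner = x @ y                      # x1*y1 + x2*y2 = 3*(-4) + 4*3 = 0
norm_x = np.sqrt(x @ x)            # Euclidean norm from the inner product
norm_y = np.linalg.norm(y)         # same quantity via NumPy's helper

# Recover the angle from x^T y = ||x|| ||y|| cos(theta)
cos_theta = inner / (norm_x * norm_y)
theta_deg = np.degrees(np.arccos(cos_theta))

print(inner, norm_x, norm_y, theta_deg)
```

Because the inner product is zero, the recovered angle is 90 degrees, which is the orthogonality condition.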

Any vector can be expressed in terms of the orthogonal *unit vectors* $i = (1, 0)^\mathsf{T}$ and $j = (0, 1)^\mathsf{T}$:
$$ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$

$$ x = x_1i + x_2j$$
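
As a small sketch of this decomposition, taking $i$ and $j$ to be the unit vectors $(1, 0)$ and $(0, 1)$, the components of a vector are its inner products with the unit vectors. The example vector is an arbitrary choice.

```python
import numpy as np

# Orthogonal unit vectors
i = np.array([1.0, 0.0])
j = np.array([0.0, 1.0])

# An arbitrary example vector
x = np.array([2.0, -3.0])

# Each component is the inner product with the matching unit vector
x1 = x @ i
x2 = x @ j

# Rebuild x as x1*i + x2*j
rebuilt = x1 * i + x2 * j
print(x1, x2, rebuilt)
```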


#### 1.2 Matrices

A 2x2 matrix is written in the form

$$ \mathbf{A} = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} $$

In the element notation $a_{ij}$, the first index is the row and the second index is the column. When a matrix is multiplied by a vector, the result is another vector:

$$ \mathbf{A}x = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{pmatrix} $$

In general, for $\mathbf{A} = (a_1, a_2, \dots, a_N)$, where the vectors $a_i$ are the columns of $\mathbf{A}$:
$$ \mathbf{A}x = x_1a_1 + x_2a_2 + \dots + x_Na_N $$
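
The column view of the matrix-vector product can be verified directly. This is an illustrative sketch; the values of `A` and `x` are chosen here and the variable names are not from the notebook.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([2, -1])

# Standard matrix-vector product
product = A @ x

# Same result as a weighted sum of the columns of A: x_1*a_1 + x_2*a_2
by_columns = sum(x[k] * A[:, k] for k in range(A.shape[1]))

print(product, by_columns)
```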

The product of two 2x2 matrices is given by:
$$ \mathbf{A}\mathbf{B} = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}$$

The matrix product is defined whenever $\mathbf{A}$ has the same number of columns as $\mathbf{B}$ has rows. If $\mathbf{A}$ has dimension $l \times m$ and $\mathbf{B}$ has dimension $m \times n$, then $\mathbf{A}\mathbf{B}$ is $l \times n$ with elements:
$$ (\mathbf{A}\mathbf{B})_{ij} = \sum_{k=1}^m a_{ik}b_{kj}, \quad i = 1 \dots l, \; j = 1 \dots n $$

Note that matrix multiplication is not commutative: in general, $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$. It is, however, associative:
$$ (\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$$
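
A short sketch of these three properties (shape rule, non-commutativity, associativity) with NumPy. The matrices are arbitrary examples chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 3]])

# Not commutative: the two orders give different matrices here
AB = A @ B
BA = B @ A
print(np.array_equal(AB, BA))

# Associative: the grouping does not matter
left = (A @ B) @ C
right = A @ (B @ C)
print(np.array_equal(left, right))

# Shape rule: (l x m) times (m x n) gives (l x n)
P = np.ones((2, 3))
Q = np.ones((3, 4))
print((P @ Q).shape)
```

Swapping the order of `P` and `Q` would raise a `ValueError`, because a 3x4 matrix cannot be multiplied by a 2x3 matrix.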



In [11]:
# Example: matrix-vector product
import numpy as np

A = np.array([[1,2],[3,4]])
B = np.array([1,1])
np.dot(A,B)

array([3, 7])

In [12]:
# Element-wise product: B is broadcast across the rows of A, so this is NOT a matrix product
A * B

array([[1, 2],
       [3, 4]])

In [13]:
# The @ operator also computes the matrix product
A @ B

array([3, 7])

![Neural network representation](https://www.datasciencecentral.com/wp-content/uploads/2021/10/2808330901.jpeg)

## 2. Creating a Neural Network

Suppose we have the following table:

| X1 | X2 | X3 | Y1 |
|----|----|----|----|
| 0  | 0  | 1  | 0  |
| 1  | 1  | 1  | 1  |
| 1  | 0  | 1  | 1  |
| 0  | 1  | 1  | 0  |

We have three independent variables (IVs: X1, X2, X3) and one dependent variable (DV: Y1). A simple summary statistic such as the correlation shows that X1 is perfectly correlated with Y1. Our neural network will use two processes: forward propagation, which computes the layer outputs by multiplying the inputs (IVs) by the weights and applying a non-linear function, and backpropagation, which updates the weights based on the prediction error.
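
The correlation claim can be checked with the Pearson correlation coefficient. This is a minimal sketch using `np.corrcoef`; the `correlations` dictionary is a name introduced here. X3 is constant, so its correlation is undefined and the column is skipped.

```python
import numpy as np

# The table above: columns X1, X2, X3 and the target Y1
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])
y = np.array([0, 1, 1, 0])

correlations = {}
for col in range(X.shape[1]):
    feature = X[:, col]
    name = f"X{col + 1}"
    if feature.std() == 0:
        # A constant column has zero variance, so Pearson correlation is undefined
        print(f"{name}: constant column, correlation undefined")
        continue
    correlations[name] = np.corrcoef(feature, y)[0, 1]
    print(f"{name}: r = {correlations[name]:.2f}")
```

X1 has a correlation of 1 with Y1, while X2 has a correlation of 0, so a well-trained network should place most of its weight on X1.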





In [14]:
X = np.array([[0,0,1],
              [1,1,1],
              [1,0,1],
              [0,1,1]])
X

array([[0, 0, 1],
       [1, 1, 1],
       [1, 0, 1],
       [0, 1, 1]])

In [15]:
y = np.array([[0,1,1,0]])
y

array([[0, 1, 1, 0]])

In [16]:
y.T

array([[0],
       [1],
       [1],
       [0]])

In [17]:
# Random weights drawn uniformly from [-1, 1), one per input column: shape (3, 1)
2*np.random.random((3,1)) - 1


array([[-0.71922612],
       [-0.60379702],
       [ 0.60148914]])

In [18]:
import numpy as np

# Sigmoid activation (our non-linear function).
# With deriv=True, x must already be a sigmoid output s;
# the function then returns the slope s*(1-s) at that point.
def nonlin(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset: 4 samples, 3 features
X = np.array([[0,0,1],
              [1,1,1],
              [1,0,1],
              [0,1,1]])

# output dataset as a column vector, shape (4, 1)
y = np.array([[0,1,1,0]]).T

# seed the random number generator so the run is reproducible
np.random.seed(1)

# initialize weights uniformly in [-1, 1), i.e. with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for _ in range(10000):
    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    # how much did we miss?
    l1_error = y - l1
    # multiply how much we missed by the
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)
    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print("Output After Training:")
print(l1)

Output After Training:
[[0.00966449]
 [0.99211957]
 [0.99358898]
 [0.00786506]]
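
One way to read the update `syn0 += np.dot(l0.T, l1_delta)` is as a gradient-descent step of size 1 on a squared-error loss. The notebook does not name a loss, so that interpretation is an assumption. The sketch below compares the update with a finite-difference gradient of $\frac{1}{2}\sum (y - l_1)^2$; the names `sigmoid`, `loss`, `update`, and `num_grad` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # Assumed loss: half the sum of squared errors
    return 0.5 * np.sum((y - sigmoid(X @ w)) ** 2)

X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T

# Arbitrary starting weights in [-1, 1)
rng = np.random.default_rng(0)
w = 2 * rng.random((3, 1)) - 1

# The update computed the same way as in the training loop
out = sigmoid(X @ w)
delta = (y - out) * out * (1 - out)
update = X.T @ delta

# Central finite-difference estimate of the gradient of the loss
eps = 1e-6
num_grad = np.zeros_like(w)
for k in range(w.shape[0]):
    step = np.zeros_like(w)
    step[k, 0] = eps
    num_grad[k, 0] = (loss(w + step, X, y) - loss(w - step, X, y)) / (2 * eps)

# The update should equal the negative gradient
print(np.allclose(update, -num_grad, atol=1e-6))
```

Under this reading, `nonlin(l1, True)` supplies the sigmoid slope $l_1(1 - l_1)$ that the chain rule requires, which is why it is passed the layer output rather than the raw weighted sum.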


In [19]:
# Output layer activations (this network has no hidden layer)
l1

array([[0.00966449],
       [0.99211957],
       [0.99358898],
       [0.00786506]])

In [20]:
# Weights
syn0

array([[ 9.67299303],
       [-0.2078435 ],
       [-4.62963669]])
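
The learned weights can be reused for prediction with a single forward pass. The sketch below copies the weight values printed above; the extra input row `[1, 1, 0]` is a hypothetical example that does not appear in the training table.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights copied from the training output above
syn0 = np.array([[ 9.67299303],
                 [-0.2078435 ],
                 [-4.62963669]])

# Training inputs
X = np.array([[0, 0, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 1]])

# Forward pass on the training inputs
train_pred = sigmoid(X @ syn0)
print(train_pred.round(4))

# Forward pass on a hypothetical unseen row with X1 = 1
new_input = np.array([[1, 1, 0]])
new_pred = sigmoid(new_input @ syn0)
print(new_pred.round(4))
```

The weight on X1 is by far the largest in magnitude, which matches the earlier observation that X1 alone determines Y1.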