# Optimization in Data Science Part 1

Referenced Textbook: https://cobweb.cs.uga.edu/~jam/scalation_guide/comp_data_science.pdf

Specifically Appendix A (Starting Page 619)

## Gradient Descent

*** Reminder: Brush up on your calculus, especially derivatives and the chain rule ***

### What is it?

Gradient descent is an iterative, first-order optimization algorithm: it uses only the gradient (the first derivatives) of the function being minimized. Because it is designed to find a local minimum of a differentiable function, gradient descent is widely used in machine learning to find the parameters that minimize a model’s cost function.

![](../pics/grad_dec/gd.png)

### Why do we need it? 

Data science uses optimization to fit the parameters of a model, for example by minimizing a quality-of-fit measure (e.g., the sum of squared errors). Typically, gradients are involved. In some cases, the gradient of the measure can be set to zero, allowing the optimal parameters to be determined by matrix factorization. For complex models this may not work, so an optimization algorithm that moves in the direction opposite to the gradient can be applied.

In addition, as the number of variables or features in a problem increases, matrix factorization and other pure linear algebra techniques become more computationally expensive, so the gradient descent algorithm can save computation time. Gradient descent also allows for parallelization and distributed calculation across processors.
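
To make the two routes concrete, here is a minimal sketch that fits a line both ways. The toy data, the learning rate `eta`, and the iteration count are assumptions chosen for illustration; they do not come from the textbook.

```python
import numpy as np

# Hypothetical toy data: y = 1 + 2*x exactly, so the best parameters are [1, 2]
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept
A = np.column_stack([np.ones_like(x), x])

# Route 1: setting the gradient of the sum of squared errors to zero gives a
# linear system; lstsq solves it with a matrix factorization
b_closed, *_ = np.linalg.lstsq(A, y, rcond=None)

# Route 2: repeatedly step in the direction opposite to the gradient
b_gd = np.zeros(2)
eta = 0.01  # learning rate (assumed value, small enough for this data)
for _ in range(2000):
    grad = 2 * A.T @ (A @ b_gd - y)  # gradient of ||A b - y||^2
    b_gd = b_gd - eta * grad

print(b_closed, b_gd)
```

On a problem this small the closed-form route is the obvious choice; the iterative route becomes attractive as the number of features grows.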


### How does it work?

![](../pics/grad_dec/gd_work.png)

Steps:

1. Initialize the inputs (weights) randomly and select a learning rate
1. Calculate the gradient at the current inputs (weights)
1. Adjust the inputs (weights) by subtracting the learning rate times the gradient
1. Use the new inputs (weights) to repeat steps 2 and 3 until some stopping condition is met (e.g., the updates no longer significantly reduce the error, or a maximum number of iterations is reached)
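
The steps above can be sketched as a single loop with an explicit stopping condition. This is only an illustration: the function names `grad_f` and `descend`, the tolerance `tol`, and the fixed starting point are assumptions, and the objective is the same (x - 4)^2 + (y - 2)^2 used in the coding example later on.

```python
import numpy as np

def grad_f(w):
    # Gradient of f(w) = (w[0] - 4)^2 + (w[1] - 2)^2
    return np.array([2 * (w[0] - 4), 2 * (w[1] - 2)])

def descend(w0, eta=0.1, tol=1e-8, max_iter=1000):
    w = np.asarray(w0, dtype=float)      # step 1: starting weights
    for i in range(max_iter):
        step = eta * grad_f(w)           # step 2: gradient times learning rate
        w = w - step                     # step 3: move against the gradient
        if np.linalg.norm(step) < tol:   # step 4: stop when updates are tiny
            return w, i + 1
    return w, max_iter                   # step 4: or stop at max iterations

# A fixed start is used instead of a random one so the run is repeatable
w_final, n_iters = descend([9.0, 7.0])
print(w_final, n_iters)
```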

### Note

There are 2 flavors of gradient descent:

1. Pure Gradient Descent - Update your inputs (weights) after each data instance (this is commonly called stochastic gradient descent)
2. Batch Gradient Descent - Update your inputs (weights) once after going through the full training set, using the average gradient

In our case, we will just be showing pure gradient descent and updating on every iteration.
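
As a rough sketch of the difference, the code below runs both flavors on a one-weight model `y = w * x`. The data, the learning rate, and the helper names `per_instance_epoch` and `batch_epoch` are all hypothetical.

```python
import numpy as np

# Hypothetical data following y = 3 * x, so the best weight is 3
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 3.0 * xs

def per_instance_epoch(w, eta):
    # Pure flavor: update the weight after every single data instance
    for xi, yi in zip(xs, ys):
        grad = 2 * (w * xi - yi) * xi    # gradient of one squared error
        w = w - eta * grad
    return w

def batch_epoch(w, eta):
    # Batch flavor: one update per pass, using the average gradient
    grads = 2 * (w * xs - ys) * xs
    return w - eta * grads.mean()

w_pure, w_batch = 0.0, 0.0
for _ in range(200):
    w_pure = per_instance_epoch(w_pure, eta=0.01)
    w_batch = batch_epoch(w_batch, eta=0.01)

print(w_pure, w_batch)
```

Both runs head toward the same weight here; the per-instance version simply makes four small updates per pass where the batch version makes one.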

### Coding Example

![](../pics/grad_dec/cost_fnt.png)

Visual representation of the cost or objective function:

![](../pics/grad_dec/cfg.jpg)

#### 1. INITIALIZE INPUTS (WEIGHTS)

In [None]:
# Start by initializing our inputs (weights)

import numpy as np

X = [np.random.random() * 10, np.random.random() * 10]
X

#### 2. CALCULATE GRADIENT 

Formula for Gradient in 2 Dimensions:

![](../pics/grad_dec/grad.jpg)
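
Written out for the cost function used in this example, $f(x_1, x_2) = (x_1 - 4)^2 + (x_2 - 2)^2$ (defined as `fox` in the code below), the gradient works out as follows. The subscript notation $x_1, x_2$ is an assumption made to match the plot labels; the code uses `x` and `y`.

$$
\nabla f(x_1, x_2) =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \\[6pt]
\dfrac{\partial f}{\partial x_2}
\end{bmatrix}
=
\begin{bmatrix}
2(x_1 - 4) \\
2(x_2 - 2)
\end{bmatrix}
$$

Both components are zero only at $(x_1, x_2) = (4, 2)$, which is the minimum the algorithm should approach.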

In [None]:
# Calculating our gradient!
# We can do this manually or with the Python package SymPy

import sympy as sp

# Define our initial cost function, or more generally, our function to minimize
# In this case we will let x1 = x and x2 = y

x = sp.Symbol('x')
y = sp.Symbol('y')

fox = (x - 4) ** 2 + (y - 2) ** 2

dx_fox = sp.diff(fox, x)

dy_fox = sp.diff(fox, y)

grad_vec = [dx_fox, dy_fox]

grad_vec


In [None]:
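# Make a function that returns the value of one gradient component for a given input

def getGrad(grad_vec, var, value):
    # var selects the component: 0 for d/dx, anything else for d/dy.
    # Substituting a single symbol is enough here because each partial
    # derivative of this cost function depends on only one variable.
    symbol = x if var == 0 else y
    # float() converts the SymPy Float into a plain Python number
    return float(grad_vec[var].evalf(subs={symbol: value}))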
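
As an alternative sketch, SymPy's `lambdify` can turn the whole gradient into one ordinary Python function, which avoids substituting symbols one at a time. The names `f`, `grad_exprs`, and `grad_fn` are hypothetical and are kept separate from the notebook's own variables.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = (x - 4) ** 2 + (y - 2) ** 2
grad_exprs = [sp.diff(f, x), sp.diff(f, y)]

# lambdify compiles the symbolic expressions into a callable taking (x, y)
grad_fn = sp.lambdify([x, y], grad_exprs)

# Evaluate both gradient components at the point (9, 7)
g = grad_fn(9.0, 7.0)
print(g)
```

This form also handles partial derivatives that depend on both variables, which the single-symbol substitution above would not.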

#### 3. UPDATE INPUTS (WEIGHTS) 

Equation showing how we should update our inputs (weights)

![](../pics/grad_dec/gd_uf.jpg)
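
In symbols, with learning rate $\eta$ and iteration index $t$ (notation assumed here, not taken from the figure), the update is:

$$
\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} - \eta \, \nabla f\!\left(\mathbf{x}^{(t)}\right)
$$

As a worked step for $f(x_1, x_2) = (x_1 - 4)^2 + (x_2 - 2)^2$, take a hypothetical starting point $\mathbf{x}^{(0)} = (9, 7)$ and $\eta = 0.1$:

$$
\nabla f(9, 7) = \begin{bmatrix} 2(9 - 4) \\ 2(7 - 2) \end{bmatrix} = \begin{bmatrix} 10 \\ 10 \end{bmatrix},
\qquad
\mathbf{x}^{(1)} = \begin{bmatrix} 9 \\ 7 \end{bmatrix} - 0.1 \begin{bmatrix} 10 \\ 10 \end{bmatrix} = \begin{bmatrix} 8 \\ 6 \end{bmatrix}
$$

Each coordinate moves one fifth of the way toward the minimum at $(4, 2)$.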

In [None]:
# Make a gradient descent function from the above equation!
# eta is our learning rate; the loop below passes in 0.1

def gradientDecent (X, eta):
    X[0] = X[0] - eta * getGrad(grad_vec, 0, X[0])
    X[1] = X[1] - eta * getGrad(grad_vec, 1, X[1])
    return X
    

In [None]:
# Now let's put it in a loop to see how it behaves over many iterations

max_iter = 100

X1_list = []
X2_list = []

for i in range(max_iter):

    X = gradientDecent(X, 0.1)
    print(X)
    X1_list.append(X[0])
    X2_list.append(X[1])
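
For this particular cost function, each update shrinks the distance to the minimum by a constant factor of 1 - 2 * eta, so with eta = 0.1 the remaining error after 100 iterations is 0.8^100 (about 2e-10) times the starting error. The sketch below checks that prediction with plain Python; the starting point and variable names are hypothetical.

```python
eta = 0.1
shrink = 1 - 2 * eta       # per-step error factor for this quadratic
start = [9.0, 7.0]         # hypothetical starting point
target = [4.0, 2.0]        # minimum of (x - 4)^2 + (y - 2)^2

# Run 100 gradient descent updates; the gradient is 2 * (p - t) per coordinate
point = list(start)
for _ in range(100):
    point = [p - eta * 2 * (p - t) for p, t in zip(point, target)]

# Closed-form prediction: error after n steps = shrink^n * starting error
predicted = [t + shrink ** 100 * (s - t) for s, t in zip(start, target)]

print(point, predicted)
```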

In [None]:
# plot our findings

import matplotlib.pyplot as plt

# Make a range from 0 to max_iter - 1 so we can plot our iterations

iters = range(0, max_iter)

plt.figure(figsize=(20, 10))

plt.subplot(1, 2, 1)
plt.plot(iters, X1_list)
plt.title('X1')
plt.xlabel('Iterations')
plt.ylabel('X1 Inputs')

plt.subplot(1, 2, 2)
plt.plot(iters, X2_list)
plt.title("X2")
plt.xlabel('Iterations')
plt.ylabel('X2 Inputs')


plt.show()