# Gradients and Optimization

## Gradients

A gradient is a vector of partial derivatives that points in the direction in which a function increases fastest. Its direction tells you which way the function rises most steeply at a given point, and its length tells you how fast it rises in that direction.

To find the gradient of a function, take its partial derivative with respect to each input variable. For a function `f(x, y)`, you take the partial derivative of `f` with respect to `x` and the partial derivative with respect to `y`. The gradient of `f` is the vector whose components are these two partial derivatives.

Here's an example of how to find the gradient of a function.

Let's say we have the function:

\begin{equation}f(x, y) = x^2 + 2y\end{equation}

Next, we take the partial derivative of the function with respect to each input variable:

\begin{equation}\frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2\end{equation}

Finally, combine the partial derivatives into a vector to get the gradient of the function.

\begin{equation}\nabla f(x, y) = [2x, 2]\end{equation}
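A quick way to sanity-check a hand-derived gradient is to approximate each partial derivative numerically with a central difference, `(f(p + h·e_i) - f(p - h·e_i)) / (2h)`, where `e_i` is the unit vector for the i-th variable. The sketch below is illustrative only: the names `numerical_gradient` and `f_vec` and the step size `h = 1e-6` are assumptions, not part of the original example.

```python
import numpy as np

def numerical_gradient(func, point, h=1e-6):
    """Approximate the gradient of func at point using central differences.

    func takes a 1-D NumPy array of inputs and returns a scalar.
    """
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h  # perturb only the i-th input variable
        # Partial derivative w.r.t. variable i: (f(p + h*e_i) - f(p - h*e_i)) / (2h)
        grad[i] = (func(point + step) - func(point - step)) / (2 * h)
    return grad

# Hypothetical vector form of f(x, y) = x**2 + 2*y, with p = [x, y]
def f_vec(p):
    return p[0] ** 2 + 2 * p[1]

print(numerical_gradient(f_vec, [2.0, 3.0]))  # approximately [4, 2]
```

The numerical result should agree with the analytic gradient `[2x, 2]` evaluated at `(2, 3)`, up to small floating-point error.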

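The gradient gives the direction of greatest increase of the function at any given point. Here, at a point `(x, y)` the function increases fastest in the direction `[2x, 2]`. The `y` component is always `2`, so a small increase in `y` raises `f` at a rate of `2` everywhere. The `x` component `2x` depends on the location: it is zero at `x = 0` and grows in magnitude as `|x|` increases, so far from `x = 0` the steepest direction is dominated by `x`.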
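To make "direction of greatest increase" concrete: for a unit vector `u`, the rate at which `f` changes along `u` is the directional derivative `∇f · u`, which is largest when `u` points along `∇f`, where it equals the gradient's length `|∇f|`. The sketch below checks this numerically at `(2, 3)` by scanning 3600 evenly spaced unit directions; the grid size and the name `grad_f` are arbitrary illustrative choices.

```python
import numpy as np

def grad_f(x, y):
    # Analytic gradient of f(x, y) = x**2 + 2*y
    return np.array([2.0 * x, 2.0])

g = grad_f(2.0, 3.0)  # gradient at (2, 3)

# Unit vectors u = (cos t, sin t) for evenly spaced angles t
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Directional derivative of f along each unit vector: grad f . u
rates = directions @ g

best = directions[np.argmax(rates)]
print("steepest direction found:", best)
print("normalized gradient:     ", g / np.linalg.norm(g))
print("largest rate:", rates.max(), " |grad f|:", np.linalg.norm(g))
```

At `(2, 3)` the gradient is `[4, 2]`, so the steepest direction should come out close to `[4, 2] / √20 ≈ [0.894, 0.447]`, with a largest rate close to `√20 ≈ 4.472`.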

Gradients are an essential tool in machine learning and data science because they let us optimize functions. For example, repeatedly stepping against the gradient (or along it) is a way to search for a minimum (or maximum) of a function, when one exists. The same idea is used to update the weights of a neural network during training.

```python
import numpy as np

# Define the function f(x, y) = x^2 + 2y
def f(x, y):
    return x**2 + 2*y

# Analytic gradient of f: [df/dx, df/dy] = [2x, 2]
def gradient(x, y):
    df_dx = 2*x
    df_dy = 2
    return np.array([df_dx, df_dy])

# Evaluate the gradient at (2, 3)
x = 2
y = 3
grads = gradient(x, y)
print(grads)
```

```
[4 2]
```


In this example, we define the function `f(x, y) = x**2 + 2*y` and a function `gradient(x, y)` that returns its gradient, `[2*x, 2]`. Evaluating the gradient at the point `(2, 3)` gives `[4, 2]`.

## Gradient Descent

Gradient descent is an optimization algorithm that uses the gradient of a function to iteratively update the parameters of a model in order to minimize the value of the function. It is commonly used in machine learning and deep learning to optimize the weights of a neural network.

The basic idea behind gradient descent is to start with an initial set of parameters for the model, and then repeatedly adjust the parameters in the direction of the negative gradient of the loss function, with the goal of finding the set of parameters that minimizes the loss function.

Here are the steps involved in gradient descent:

1. **Define the loss function:** This is the function that we want to minimize. In machine learning, common choices are the mean squared error (for regression) and the cross-entropy loss (for classification).

2. **Initialize the parameters:** We start with an initial set of parameters for the model. These can be initialized randomly or set to predetermined values.

3. **Compute the gradient:** We compute the gradient of the loss function with respect to the parameters. Each component tells us how quickly the loss changes as the corresponding parameter changes.

4. **Update the parameters:** We subtract the gradient, scaled by a small positive constant called the learning rate, from the parameters. The learning rate determines how far the parameters move at each iteration.

5. **Repeat steps 3-4:** We repeat steps 3 and 4 until a stopping criterion is met, for example when the loss stops decreasing, the gradient is close to zero, or a maximum number of iterations is reached.
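Written as a formula, the update step is the following, where `θ` is the parameter vector, `η` the learning rate, and `L` the loss function:

\begin{equation}\theta \leftarrow \theta - \eta \, \nabla L(\theta)\end{equation}

As a worked step, applying this update once to the earlier function `f(x, y) = x^2 + 2y`, whose gradient is `[2x, 2]`, from the point `(2, 3)` with `η = 0.1` gives:

\begin{equation}\begin{bmatrix} x \\ y \end{bmatrix} \leftarrow \begin{bmatrix} 2 \\ 3 \end{bmatrix} - 0.1 \begin{bmatrix} 2 \cdot 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1.6 \\ 2.8 \end{bmatrix}\end{equation}

The value of `f` drops from `2^2 + 2 \cdot 3 = 10` to `1.6^2 + 2 \cdot 2.8 = 8.16`.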

There are different variations of gradient descent, such as batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. These variations differ in how many samples are used to compute the gradient at each iteration, and they have different trade-offs in terms of convergence speed and computational efficiency.
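The three variants can share one loop that differs only in how many samples feed each gradient estimate. The sketch below fits a linear regression `y ≈ X @ w + b` by minimizing the mean squared error: `batch_size=len(X)` gives batch gradient descent, `batch_size=1` gives stochastic gradient descent, and anything in between is mini-batch. The synthetic data, the function name `minibatch_gd`, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, learning_rate=0.05, num_epochs=100, seed=0):
    """Fit y ~ X @ w + b by minimizing mean squared error.

    batch_size == len(X): batch gradient descent
    batch_size == 1:      stochastic gradient descent
    otherwise:            mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(num_epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            error = Xb @ w + b - yb  # residuals on this batch
            # Gradients of mean((Xb @ w + b - yb)**2) w.r.t. w and b
            grad_w = 2.0 * Xb.T @ error / len(idx)
            grad_b = 2.0 * error.mean()
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Synthetic data from y = 3*x1 - 2*x2 + 1 plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.01 * rng.normal(size=200)

for bs in (len(X), 32, 1):
    w, b = minibatch_gd(X, y, batch_size=bs)
    print(f"batch_size={bs:>3}: w={w}, b={b:.3f}")
```

All three runs should recover weights close to `[3, -2]` and an intercept close to `1`. They differ in cost per epoch: batch gradient descent makes one update per pass over the data, while stochastic gradient descent makes 200 cheaper but noisier updates.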

Gradient descent is a powerful and widely used optimization algorithm in machine learning and data science. By iteratively adjusting the parameters of a model in the direction of the negative gradient of the loss function, it can find parameters that minimize the loss, at least locally. For convex losses, such as the mean squared error of a linear model, a suitably small learning rate leads to the global minimum. Whether the resulting model makes accurate predictions on new data is a separate question of generalization.
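How quickly gradient descent reaches a minimum, and whether it reaches one at all, depends heavily on the learning rate. A minimal sketch on the hypothetical one-variable function `g(x) = x^2`, chosen because its gradient `2x` makes every step easy to follow: each update multiplies `x` by `1 - 2η`, so `η = 0.01` converges slowly, `η = 0.1` converges steadily, `η = 0.9` overshoots and oscillates in sign while still converging, and `η = 1.1` diverges. The function name `descend_quadratic` is illustrative.

```python
def descend_quadratic(learning_rate, x0=1.0, num_iterations=20):
    """Run gradient descent on g(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(num_iterations):
        # Equivalent to x *= (1 - 2 * learning_rate)
        x = x - learning_rate * 2 * x
    return x

for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"learning_rate={lr}: x after 20 steps = {descend_quadratic(lr):.4g}")
```

Since `|1 - 2η| < 1` exactly when `0 < η < 1`, any learning rate in that range converges for this function, but how fast it does so varies widely.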

```python
def gradient_descent(starting_point, learning_rate, num_iterations):
    # Initialize the parameters (a NumPy array [x, y])
    point = starting_point

    # Take a fixed number of steps
    for _ in range(num_iterations):
        # Compute the gradient at the current point (uses gradient() defined above)
        grad = gradient(point[0], point[1])

        # Step in the direction of the negative gradient
        point = point - learning_rate * grad

    return point
```

```python
# Test the gradient descent function
starting_point = np.array([2, 3])
learning_rate = 0.1
num_iterations = 100
optimum = gradient_descent(starting_point, learning_rate, num_iterations)
```

```python
print("Optimal point: ", optimum)
print("Optimal value: ", f(optimum[0], optimum[1]))
```

```
Optimal point:  [ 4.07407195e-10 -1.70000000e+01]
Optimal value:  -33.99999999999995
```


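In this example, we used the math function `f(x, y) = x**2 + 2*y` and the Python function `gradient(x, y)`, which returns `[2*x, 2]`, both defined earlier, to perform gradient descent. We started from the point `[2, 3]` with a learning rate of `0.1` and performed `100` iterations. The two coordinates behave very differently. Each update replaces `x` with `x - 0.1 * 2x = 0.8x`, so `x` shrinks geometrically to `2 * 0.8**100 ≈ 4.07e-10`, effectively `0`. Each update also replaces `y` with `y - 0.1 * 2 = y - 0.2`, so after 100 steps `y = 3 - 20 = -17`. Because `∂f/∂y = 2` is never zero, `f` has no minimum: it decreases without bound as `y` goes to negative infinity. The printed "optimal" point and value of about `-34` are therefore just where the loop stopped after 100 iterations, not a true minimum; running more iterations would push `y`, and the value of `f`, lower still.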
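Because `∂f/∂y = 2` is never zero, `f(x, y) = x**2 + 2*y` has no minimum, so the run above can only stop when its iteration budget is used up. For contrast, here is a sketch that applies the same update rule to a function that does have a minimum, the hypothetical `h(x, y) = x**2 + (y - 1)**2` with its minimum at `(0, 1)`, and stops once the gradient's length falls below a tolerance. The function `h`, the name `gradient_descent_tol`, and the tolerance and iteration limits are illustrative assumptions, not part of the original example.

```python
import numpy as np

def h(x, y):
    # Hypothetical function with a unique minimum at (0, 1)
    return x**2 + (y - 1)**2

def grad_h(x, y):
    # Analytic gradient of h
    return np.array([2 * x, 2 * (y - 1)])

def gradient_descent_tol(grad, starting_point, learning_rate=0.1,
                         tol=1e-8, max_iterations=10_000):
    """Gradient descent that stops when the gradient's norm drops below tol.

    Returns the final point and the number of update steps taken.
    """
    point = np.asarray(starting_point, dtype=float)
    for i in range(max_iterations):
        g = grad(point[0], point[1])
        if np.linalg.norm(g) < tol:
            return point, i  # converged: the gradient is (nearly) zero
        point = point - learning_rate * g
    return point, max_iterations  # budget exhausted without converging

point, steps = gradient_descent_tol(grad_h, [2.0, 3.0])
print("final point:", point, "after", steps, "iterations")
print("h at final point:", h(point[0], point[1]))
```

Run on `h`, the loop stops on its own after well under the iteration limit, near `(0, 1)`. Run on the gradient of `f`, whose length is always at least `2`, the same loop would always exhaust `max_iterations`, which matches the behavior seen above.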