# Optimization Methods

Optimization methods find values that maximize or minimize an objective function, making them useful across disciplines such as engineering, economics, and data science.
Even fundamental physics can be phrased this way: under the action principle, a system follows the path that makes the action integral stationary, which is often a minimum.

## Gradient Descent Methods

Gradient descent is one of the most widely used optimization techniques and is particularly effective for high-dimensional problems such as those in machine learning.
The method iteratively seeks a minimum of a function by taking steps proportional to the negative of its gradient, guiding the search toward lower function values.
For differentiable objective functions, it provides a general way to reduce an error measure, which is why it is widely used for training machine learning models and for refining physical models in computational astrophysics.

For a function $f(x)$, the gradient $\nabla f(x)$ points in the direction of steepest ascent.
Moving in the opposite direction—along the negative gradient—reduces the function's value. The algorithm updates the parameters iteratively according to:
\begin{align}
x_{n+1} = x_n - \alpha \nabla f(x_n)
\end{align}
where $\alpha$ is the learning rate, controlling the step size.
The choice of $\alpha$ is critical for convergence: a large $\alpha$ can make the updates overshoot the minimum and diverge, while a very small $\alpha$ converges slowly and needs many iterations to make meaningful progress.
A well-tuned $\alpha$ lets the algorithm approach a minimum efficiently, without persistent oscillation or divergence.
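
The effect of the learning rate can be seen in a minimal sketch on a deliberately simple objective. The function $g(x) = x^2/2$, the helper names `gradient_descent` and `grad_g`, the 50-step budget, and the three values of $\alpha$ are all assumptions chosen for illustration. For this $g$ the update reduces to $x_{n+1} = (1 - \alpha)\,x_n$, so the distance to the minimum at $x = 0$ shrinks only when $|1 - \alpha| < 1$.

```python
def gradient_descent(grad, x, alpha, steps=50):
    """Apply x <- x - alpha * grad(x) a fixed number of times."""
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

# Illustrative objective g(x) = x**2 / 2: its gradient is x itself
# and its minimum sits at x = 0.
def grad_g(x):
    return x

results = {}
for alpha in (0.01, 0.5, 2.1):
    results[alpha] = gradient_descent(grad_g, 1.0, alpha)
    print(f"alpha = {alpha:4}: x after 50 steps = {results[alpha]:.3e}")
```

With these assumed values, the smallest $\alpha$ is still far from the minimum after 50 steps, the intermediate one is essentially at it, and the largest one grows in magnitude with every step.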

In [None]:
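def gd(df, x, alpha, imax=1000):
    """Run imax gradient descent steps from x, using the gradient function df."""
    for _ in range(imax):
        # Rebind x; `x -= ...` would modify a NumPy array argument in place
        x = x - alpha * df(x)
    return x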

In [None]:
# Define the function and its gradient
def f(x):
    return (x - 3)**2 + 4

def df(x):
    return 2 * (x - 3)

# Parameters for gradient descent
x0    = 0.0  # Starting point for optimization
alpha = 0.1  # Learning rate

# Run gradient descent
xmin = gd(df, x0, alpha)
print("Approximate minimum:")
print("  xmin    =", xmin)
print("  f(xmin) =", f(xmin))
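
The same update rule applies unchanged when $x$ is a vector, which is the situation in the high-dimensional problems mentioned earlier. The following is a minimal sketch using NumPy. The helper `gd_vec`, the two-variable objective $h(x, y) = (x - 1)^2 + 2\,(y + 2)^2$, and its gradient `grad_h` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def gd_vec(grad, x0, alpha, imax=1000):
    """Gradient descent for array-valued parameters."""
    x = np.array(x0, dtype=float)  # copy, so the caller's data is not modified
    for _ in range(imax):
        x = x - alpha * grad(x)
    return x

# Hypothetical objective h(x, y) = (x - 1)**2 + 2 * (y + 2)**2,
# whose minimum is at (1, -2).
def grad_h(p):
    return np.array([2 * (p[0] - 1), 4 * (p[1] + 2)])

pmin = gd_vec(grad_h, [0.0, 0.0], alpha=0.1)
print("Approximate minimum:", pmin)
```

Each component here contracts toward the minimum at its own rate, so a single $\alpha$ has to be small enough for the most steeply curved direction.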

In [None]:
def gd_hist(df, x, alpha, imax=1000):
    """Like gd, but return the list of all iterates, starting with x."""
    X = [x]
    for _ in range(imax):
        X.append(X[-1] - alpha * df(X[-1]))
    return X

In [None]:
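import numpy as np
from matplotlib import pyplot as plt

# Plot the objective function on a fine grid
X = np.linspace(0, 6, 6001)
plt.plot(X, f(X))

alpha = 0.1

# Overlay the gradient descent iterates on the curve
Xh = np.array(gd_hist(df, x0, alpha))
print(Xh[-1])  # Final iterate

plt.plot(Xh, f(Xh), '-o')
plt.xlim(2.5, 3.5)
plt.ylim(3.95,4.3)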

```{exercise}
What will happen if we change the learning rate $\alpha$?

Comment out the plot limits `plt.xlim(2.5, 3.5)` and `plt.ylim(3.95,4.3)` and then try $\alpha = 0.1$, $0.5$, $0.9$, $1.0$, and $1.1$.
```