
# **3.3 - Unconstrained Optimization**

### **3.3.0 - Python Libraries for Unconstrained Optimization**

import numpy as np
from scipy.optimize import minimize

### **3.3.1 - Necessary and Sufficient Conditions of Local Minimizers**

##### Definition 3.3.1.1 - Global Minimizer

Let $f: \mathbb{R}^d \to \mathbb{R}$. The point $\mathbf{x}^* \in \mathbb{R}^d$ is a global minimizer of $f$ over $\mathbb{R}^d$ if
\begin{equation*}
  f(\mathbf{x}) \geq f(\mathbf{x}^*)
\end{equation*}
for all $\mathbf{x} \in \mathbb{R}^d$.

##### Definition 3.3.1.2 - (Strict) Local Minimizer

Let $f: \mathbb{R}^d \to \mathbb{R}$. The point $\mathbf{x}^* \in \mathbb{R}^d$ is a local minimizer of $f$ over $\mathbb{R}^d$ if there is a $\delta > 0$ such that
\begin{equation*}
  f(\mathbf{x}) \geq f(\mathbf{x}^*)
\end{equation*}
for all $\mathbf{x} \in B_{\delta}(\mathbf{x}^*) \setminus \{\mathbf{x}^*\}$. If the inequality is strict, we say that $\mathbf{x}^*$ is a strict local minimizer. Alternatively, $\mathbf{x}^*$ is a local minimizer if there is an open ball around $\mathbf{x}^*$ where it attains the minimum value. 

##### Definition 3.3.1.3 - Descent Direction

Let $f: \mathbb{R}^d \to \mathbb{R}$. A vector $\mathbf{v}$ is a descent direction for $f$ at $\mathbf{x}_0$ if there is an $\alpha^* > 0$ such that
\begin{equation*}
  f(\mathbf{x}_0 + \alpha\mathbf{v}) < f(\mathbf{x}_0)
\end{equation*}
for all $\alpha \in (0, \alpha^*)$.

##### Lemma 3.3.1.4 - Descent Direction and Directional Derivatives

Let $f: \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable at $\mathbf{x}_0$. A vector $\mathbf{v}$ is a descent direction for $f$ at $\mathbf{x}_0$ if
\begin{equation*}
  \frac{\partial f(\mathbf{x}_0)}{\partial \mathbf{v}} = \nabla f(\mathbf{x}_0)^T\mathbf{v} < 0
\end{equation*}
that is, the directional derivative of $f$ at $\mathbf{x}_0$ in the direction of $\mathbf{v}$ is negative.
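
As a small numerical illustration of the lemma, the sketch below checks a candidate direction on a made-up function. The function `f`, its hand-computed gradient `grad_f`, the point `x0`, and the direction `v` are all assumptions chosen for illustration, not part of the notes.

```python
import numpy as np

def f(x):
    # Hypothetical test function: f(x) = x_1^2 + 2 x_2^2
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    # Gradient of f, computed by hand
    return np.array([2.0 * x[0], 4.0 * x[1]])

x0 = np.array([1.0, 1.0])
v = np.array([-1.0, 0.0])  # candidate direction (an arbitrary choice)

# Directional derivative of f at x0 along v: grad f(x0)^T v
directional_derivative = grad_f(x0) @ v

# Sample step sizes in (0, 1] and confirm that f decreases along v
alphas = np.linspace(0.01, 1.0, 50)
decreases = all(f(x0 + a * v) < f(x0) for a in alphas)

print(directional_derivative, decreases)
```

A negative directional derivative together with a decrease at every sampled step size is consistent with $\mathbf{v}$ being a descent direction at $\mathbf{x}_0$. Sampling only checks finitely many values of $\alpha$, so it illustrates the lemma rather than proving it.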

##### Lemma 3.3.1.5 - Existence of a Descent Direction

Let $f: \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable at $\mathbf{x}_0$ and assume that $\nabla f(\mathbf{x}_0) \neq 0$. Then, $f$ has a descent direction at $\mathbf{x}_0$.
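
One way to see why such a direction exists, sketched under the assumptions of the lemma: take $\mathbf{v} = -\nabla f(\mathbf{x}_0)$. Then
\begin{equation*}
  \nabla f(\mathbf{x}_0)^T\mathbf{v} = -\nabla f(\mathbf{x}_0)^T\nabla f(\mathbf{x}_0) = -\|\nabla f(\mathbf{x}_0)\|^2 < 0
\end{equation*}
since $\nabla f(\mathbf{x}_0) \neq 0$, so $\mathbf{v}$ is a descent direction by Lemma 3.3.1.4.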

##### Theorem 3.3.1.6 - First-Order Necessary Condition for Local Minimizers

Let $f: \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable on $\mathbb{R}^d$. If $\mathbf{x}_0$ is a local minimizer, then $\nabla f(\mathbf{x}_0) = 0$.
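
The first-order condition can be observed numerically. The sketch below runs `scipy.optimize.minimize` on a made-up smooth function and inspects the gradient at the returned point. The function `f`, its gradient `grad_f`, the starting point, and the choice of the BFGS method are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Hypothetical smooth function with a single minimizer
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 0.5 * x[0] * x[1]

def grad_f(x):
    # Gradient of f, computed by hand
    return np.array([2.0 * (x[0] - 1.0) + 0.5 * x[1],
                     2.0 * (x[1] + 2.0) + 0.5 * x[0]])

# Minimize numerically, supplying the analytic gradient
result = minimize(f, x0=np.zeros(2), jac=grad_f, method="BFGS")

# At a local minimizer the gradient should vanish (up to solver tolerance)
grad_norm = np.linalg.norm(grad_f(result.x))
print(result.x, grad_norm)
```

The condition is only necessary: a point with zero gradient may also be a maximizer or a saddle point, which is why the second-order conditions that follow are needed.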

##### Definition 3.3.1.7 - Positive Semi-Definite Matrix (PSD)

A square symmetric $d \times d$ matrix $H$ is positive semi-definite (PSD) if $\mathbf{x}^TH\mathbf{x} \geq 0$ for any $\mathbf{x} \in \mathbb{R}^d$.
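
A symmetric matrix is PSD exactly when all of its eigenvalues are nonnegative, which gives a simple numerical test. The helper `is_psd` and its tolerance `tol` below are hypothetical names introduced for this sketch.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """Return True if the symmetric matrix H is PSD up to tolerance tol."""
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("H must be symmetric")
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3
print(is_psd(A), is_psd(B))
```

The small tolerance guards against eigenvalues that are zero in exact arithmetic but come out slightly negative in floating point.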

##### Theorem 3.3.1.8 - Second-Order Necessary Condition for Local Minimizers

Let $f: \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable on $\mathbb{R}^d$. If $\mathbf{x}_0$ is a local minimizer, then the Hessian $\mathbf{H}_f(\mathbf{x}_0)$ is PSD.

##### Theorem 3.3.1.9 - Second-Order Sufficient Condition for Local Minimizers

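Let $f: \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable on $\mathbb{R}^d$. If $\nabla f(\mathbf{x}_0) = 0$ and the Hessian $\mathbf{H}_f(\mathbf{x}_0)$ is positive definite (that is, $\mathbf{x}^T\mathbf{H}_f(\mathbf{x}_0)\mathbf{x} > 0$ for all $\mathbf{x} \neq 0$), then $\mathbf{x}_0$ is a strict local minimizer.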
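
The second-order conditions can be applied mechanically once the gradient and Hessian are known. The sketch below classifies the critical points of the made-up function $f(x, y) = x^3 - 3x + y^2$. The function, the helper names `grad_f`, `hess_f`, and `classify`, and the labels returned are assumptions for illustration.

```python
import numpy as np

def grad_f(x):
    # Gradient of the hypothetical f(x, y) = x^3 - 3x + y^2
    return np.array([3.0 * x[0] ** 2 - 3.0, 2.0 * x[1]])

def hess_f(x):
    # Hessian of f, computed by hand
    return np.array([[6.0 * x[0], 0.0], [0.0, 2.0]])

def classify(x):
    """Label a point using the first- and second-order conditions."""
    x = np.asarray(x, dtype=float)
    if not np.allclose(grad_f(x), 0.0):
        return "not a critical point"
    eigenvalues = np.linalg.eigvalsh(hess_f(x))
    if np.all(eigenvalues > 0):
        return "strict local minimizer"   # sufficient condition holds
    if np.any(eigenvalues < 0):
        return "not a local minimizer"    # necessary condition fails
    return "inconclusive"                 # PSD but singular Hessian

print(classify([1.0, 0.0]))
print(classify([-1.0, 0.0]))
```

The "inconclusive" branch reflects the gap between the two theorems: a Hessian that is PSD but not positive definite satisfies the necessary condition without meeting the sufficient one.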

### **3.3.2 - Convexity and Global Minimizers**

##### Definition 3.3.2.1 - Convex Sets


A set $D \subseteq \mathbb{R}^d$ is convex if for all $\mathbf{x}, \mathbf{y} \in D$ and for all $\alpha \in [0, 1]$, $(1 - \alpha)\mathbf{x} + \alpha\mathbf{y} \in D$.

##### Definition 3.3.2.2 - Convex Functions

A function $f: \mathbb{R}^d \to \mathbb{R}$ is convex if, for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and all $\alpha \in [0, 1]$,
\begin{equation*}
  f((1 - \alpha)\mathbf{x} + \alpha\mathbf{y}) \leq (1 - \alpha)f(\mathbf{x}) + \alpha f(\mathbf{y})
\end{equation*}
More generally, a function $f: D \to \mathbb{R}$ over a convex domain $D \subseteq \mathbb{R}^d$ is convex if the definition above holds over all $\mathbf{x}, \mathbf{y} \in D$.
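
The defining inequality can be spot-checked on random points. The sketch below does this for the made-up function $f(\mathbf{x}) = \|\mathbf{x}\|^2$. The function, the sample count, the seed, and the tolerance are all assumptions for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical convex function: squared Euclidean norm
    return float(x @ x)

rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    alpha = rng.uniform(0.0, 1.0)
    # Left and right sides of the convexity inequality
    lhs = f((1.0 - alpha) * x + alpha * y)
    rhs = (1.0 - alpha) * f(x) + alpha * f(y)
    if lhs > rhs + 1e-12:
        violations += 1

print(violations)
```

Finding no violations on a random sample does not prove convexity, whereas a single violation would disprove it.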

##### Lemma 3.3.2.3 - Affine Functions are Convex

Let $\mathbf{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The function $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$ is convex.

##### Lemma 3.3.2.4 - First-Order Convexity Condition

Let $f: \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable. Then, $f$ is convex if and only if for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, $f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T(\mathbf{y} - \mathbf{x})$.
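
Geometrically, the condition says that every tangent plane of a convex function lies on or below its graph. The sketch below spot-checks this for the log-sum-exp function, a standard convex function whose gradient is the softmax vector. The choice of function, the seed, and the sample count are assumptions for illustration.

```python
import numpy as np

def f(x):
    # Log-sum-exp, chosen here as an illustrative convex function
    return float(np.log(np.sum(np.exp(x))))

def grad_f(x):
    # Gradient of log-sum-exp is the softmax of x
    e = np.exp(x)
    return e / np.sum(e)

rng = np.random.default_rng(1)
min_gap = np.inf
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    # Gap between f(y) and the tangent-plane approximation built at x
    gap = f(y) - (f(x) + grad_f(x) @ (y - x))
    min_gap = min(min_gap, gap)

print(min_gap)
```

A nonnegative smallest gap, up to rounding, is what the lemma predicts for a convex function.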

##### Lemma 3.3.2.5 - Second-Order Convexity Condition

Let $f: \mathbb{R}^d \to \mathbb{R}$ be twice continuously differentiable. Then, $f$ is convex if and only if for all $\mathbf{x} \in \mathbb{R}^d$, $\mathbf{H}_f(\mathbf{x})$ is PSD.

##### Theorem 3.3.2.6 - Global Minimizers of Convex Functions

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a continuously differentiable, convex function. If $\nabla f(\mathbf{x}_0) = 0$, then $\mathbf{x}_0$ is a global minimizer.
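
For a convex quadratic, the theorem reduces minimization to solving a linear system. The sketch below uses a made-up quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TP\mathbf{x} + \mathbf{q}^T\mathbf{x}$ with symmetric $P$, whose Hessian is $P$ and whose gradient is $P\mathbf{x} + \mathbf{q}$. The matrix `P`, the vector `q`, and the comparison against `scipy.optimize.minimize` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem data: P is symmetric with positive eigenvalues
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])

def f(x):
    return 0.5 * x @ P @ x + q @ x

def grad_f(x):
    return P @ x + q

# The Hessian of f is P everywhere; PSD Hessian means f is convex
is_convex = bool(np.all(np.linalg.eigvalsh(P) >= 0))

# Solve grad f(x) = 0, i.e. P x = -q, to get the stationary point
x_star = np.linalg.solve(P, -q)

# Cross-check against a general-purpose numerical minimizer
result = minimize(f, x0=np.zeros(2), jac=grad_f)

# No random point should achieve a lower value than x_star
rng = np.random.default_rng(2)
samples = rng.normal(scale=5.0, size=(1000, 2))
never_lower = all(f(x) >= f(x_star) - 1e-12 for x in samples)

print(is_convex, x_star, result.x, never_lower)
```

Because the function is convex, the stationary point found by the linear solve is a global minimizer, and the numerical solver is expected to land on the same point.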

##### Theorem 3.3.2.7 - Any Local Minimizer of a Convex Function is a Global Minimizer

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a convex function. Then, any local minimizer of $f$ is also a global minimizer.

### **3.3.3 - Gradient Descent**

##### Lemma 3.3.3.1 - Steepest Descent

Let $f: \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable at $\mathbf{x}_0$ and assume that $\nabla f(\mathbf{x}_0) \neq 0$. For any unit vector $\mathbf{v} \in \mathbb{R}^d$,
\begin{equation*}
  \frac{\partial f(\mathbf{x}_0)}{\partial \mathbf{v}} \geq \frac{\partial f(\mathbf{x}_0)}{\partial \mathbf{v}^*}
\end{equation*}
where
\begin{equation*}
   \mathbf{v}^* = -\frac{\nabla f(\mathbf{x}_0)}{||\nabla f(\mathbf{x}_0)||}
\end{equation*}
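
The lemma can be checked numerically by comparing the directional derivative along the normalized negative gradient with that along many random unit vectors. The function behind `grad_f`, the point `x0`, the seed, and the sample size are assumptions for illustration.

```python
import numpy as np

def grad_f(x):
    # Gradient of the hypothetical f(x) = x_1^2 + 2 x_2^2
    return np.array([2.0 * x[0], 4.0 * x[1]])

x0 = np.array([1.0, 1.0])
g = grad_f(x0)

# Steepest descent direction: normalized negative gradient
v_star = -g / np.linalg.norm(g)
best = g @ v_star  # equals -||g||

# Random unit vectors and their directional derivatives g^T v
rng = np.random.default_rng(3)
V = rng.normal(size=(1000, 2))
V /= np.linalg.norm(V, axis=1, keepdims=True)
directional_derivatives = V @ g

none_steeper = bool(np.all(directional_derivatives >= best - 1e-12))
print(best, none_steeper)
```

The bound follows from the Cauchy-Schwarz inequality, which gives $\nabla f(\mathbf{x}_0)^T\mathbf{v} \geq -\|\nabla f(\mathbf{x}_0)\|$ for every unit vector $\mathbf{v}$.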

##### Theorem 3.3.3.2 - Step Size to Minimize Steepest Descent

Suppose that $f: \mathbb{R}^d \to \mathbb{R}$ is twice continuously differentiable and that the step size at each iteration is chosen to minimize $f$ along the negative gradient direction, that is,
\begin{equation*}
  \alpha_k = \arg\min_{\alpha > 0} f(\mathbf{x}^k - \alpha\nabla f(\mathbf{x}^k))
\end{equation*}
Then, steepest descent started from any $\mathbf{x}^0$ produces a sequence $\mathbf{x}^k$, $k = 1, 2, \ldots$ such that if $\nabla f(\mathbf{x}^k) \neq 0$, then
\begin{equation*}
  f(\mathbf{x}^{k + 1}) \leq f(\mathbf{x}^k)
\end{equation*}
for all $k \geq 1$.
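
A minimal sketch of steepest descent with this step-size rule is given below for a made-up convex quadratic $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^TP\mathbf{x} + \mathbf{q}^T\mathbf{x}$. For this special case the minimizing step has the closed form $\alpha_k = \mathbf{g}^T\mathbf{g} / (\mathbf{g}^TP\mathbf{g})$ with $\mathbf{g} = \nabla f(\mathbf{x}^k)$. For a general $f$ the step would have to be found by a numerical line search instead. The data `P` and `q`, the starting point, the iteration cap, and the stopping tolerance are assumptions for illustration.

```python
import numpy as np

# Hypothetical problem data: P is symmetric positive definite
P = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])

def f(x):
    return 0.5 * x @ P @ x + q @ x

def grad_f(x):
    return P @ x + q

x = np.array([5.0, 5.0])  # arbitrary starting point x^0
values = [f(x)]           # objective value at each iterate

for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:
        break  # gradient is numerically zero, so stop
    # Step size minimizing f(x - alpha * g) for a quadratic f
    alpha = (g @ g) / (g @ P @ g)
    x = x - alpha * g
    values.append(f(x))

print(x, len(values) - 1)
```

The recorded values in `values` should be non-increasing, matching the theorem, and since this $f$ is convex the iterates approach the solution of $P\mathbf{x} = -\mathbf{q}$.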

### **3.3.4 - References**

1. MAT 494 Chapter 3 Notes