# Lecture12

## Nonlinear Systems of Equations

### Outline:
1. Review Jacobian
2. Motivating Example
3. Nonlinear Systems of Equations
4. Picard's Method
5. Newton's Method
6. Inexact Newton Method
7. Line Search
8. Semi-smooth Newton's Method

### 1. Jacobians

#### A. Gradients

<font color=brown>**Def:**</font> Let $f:R^m \rightarrow R$. The gradient of $f$ at $x \in R^m$ is a vector $g$ (if it exists) that satisfies the following:

$$\lim_{h\rightarrow 0}\frac{|f(x+h) - f(x) - \langle g, h \rangle|}{||h||} = 0$$

When it exists, the gradient is the vector of partial derivatives:
$$\nabla f(x)' = \left(
    \frac{\partial f}{\partial x_{1}},
    \frac{\partial f}{\partial x_{2}},
    \dots,
    \frac{\partial f}{\partial x_{m}}
  \right)$$

<font color=blue>**HW Problem:**
Find a function whose partial derivatives exist but whose gradient does not
</font>

**reference:**
https://calculus.subwiki.org/wiki/Existence_of_partial_derivatives_not_implies_differentiable

<font color=darkgreen>**My solution:**

Consider the following function:

$$f(x,y) = \begin{cases}
  \frac{xy}{x^2 + y^2} & \text{if } (x,y) \neq (0,0)\\
  0    & \text{otherwise}
\end{cases}$$

The partial derivatives exist everywhere, even at the origin, where $f_x(0,0) = 0$ and $f_y(0,0) = 0$ because $f$ vanishes on both axes. On the other hand, $f$ is not differentiable at $(0,0)$. For example, let $v = (\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}})$. Then $$\frac{f(0 + tv) - f(0)}{t} = \frac{1}{2t}$$
which does not have a limit as $t \rightarrow 0$.
</font>
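
The counterexample can also be checked numerically. The following is a minimal sketch (the helper names `difference_quotient`, `axis_x` and `diagonal` are illustrative, not from the lecture): it evaluates the difference quotient at the origin along the $x$-axis, where it stays at $0$, and along the diagonal direction $v$, where it grows like $\frac{1}{2t}$.

```python
def f(x, y):
    """The counterexample: xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


def difference_quotient(direction, t):
    """(f(t * v) - f(0, 0)) / t for a direction v = (v1, v2)."""
    v1, v2 = direction
    return (f(t * v1, t * v2) - f(0.0, 0.0)) / t


axis_x = (1.0, 0.0)                 # direction used for the partial derivative f_x
diagonal = (2 ** -0.5, 2 ** -0.5)   # the direction v from the solution above

for t in (1e-1, 1e-2, 1e-3):
    # The first quotient stays at 0, the second behaves like 1 / (2 t).
    print(t, difference_quotient(axis_x, t), difference_quotient(diagonal, t))
```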

#### B. Jacobian

<font color=brown>**Definition:**</font> Let $f:R^m \rightarrow R^n$. The Jacobian of $f$ at $x \in R^m$ is a matrix $J \in R^{n \times m}$ (if it exists) that satisfies the following:

$$\lim_{h\rightarrow 0}\frac{||f(x+h) - f(x) - Jh||_{(n)}}{||h||_{(m)}} = 0$$

<font color=red>**Question:** What do these subscripts mean?
</font>

The formula for the Jacobian looks something like,
$$J(f;x) = \begin{bmatrix}
  \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} & \cdots & \frac{\partial f_{1}}{\partial x_{m}} \\
  \vdots & \vdots & \ddots & \vdots \\
  \frac{\partial f_{n}}{\partial x_{1}} & \frac{\partial f_{n}}{\partial x_{2}} & \cdots & \frac{\partial f_{n}}{\partial x_{m}}
\end{bmatrix}$$

<font color=blue>**HW Problem:**
Characterize the relationship between the Jacobian and the gradient
</font>

<font color=darkgreen>**My solution:**
$$J(f;x) = \begin{bmatrix}
  \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} & \cdots & \frac{\partial f_{1}}{\partial x_{m}} \\
  \vdots & \vdots & \ddots & \vdots \\
  \frac{\partial f_{n}}{\partial x_{1}} & \frac{\partial f_{n}}{\partial x_{2}} & \cdots & \frac{\partial f_{n}}{\partial x_{m}}
\end{bmatrix} = \begin{bmatrix} \nabla f_1(x)' \\
       \vdots \\
       \nabla f_n(x)'
       \end{bmatrix}$$

That is, the rows of the Jacobian are the transposed gradients of the component functions $f_1, \dots, f_n$.
</font>

### 2. Motivating Problem

Discuss with the team

### 3. Nonlinear Equations

**(1). Problem Formulation:** 
$$0 = F(x), \,\,\, F:R^m \rightarrow R^n$$
$$x = G(x), \,\,\, G:R^m \rightarrow R^m$$

**(2). Uniqueness of solution:** 

Linear case of the Implicit Function Theorem (IFT).

**reference:**   http://www.math.ucsd.edu/~jverstra/20e-lecture13.pdf

Let $M \in R^{n\times m}$, $m > n$, $rank(M) = n$. WLOG let $M = [A\,\, B]$ with $A \in R^{n\times n}$ invertible and $B \in R^{n \times (m-n)}$. Then for any $x \in R^{m-n}$, the vector $z = \begin{bmatrix} -A^{-1}Bx \\ x \end{bmatrix}$ satisfies $Mz = 0$, since $Mz = -AA^{-1}Bx + Bx = 0$.
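
As a quick sanity check of the linear case, here is a minimal sketch with $n = 2$ and $m = 3$, so the free variable $x$ is a scalar. The matrices `A` and `B` and the helper names are made up for illustration; the code only verifies that $z = [-A^{-1}Bx;\, x]$ lands in the null space of $M = [A\,\,B]$.

```python
def solve_2x2(A, b):
    """Solve A y = b for a 2x2 matrix A by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("A must be invertible")
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def null_vector(A, B, x):
    """Return z = [-A^{-1} B x, x] for a scalar free variable x (m - n = 1)."""
    y = solve_2x2(A, [-B[0] * x, -B[1] * x])
    return y + [x]


def apply_M(A, B, z):
    """Compute M z for M = [A B], where B is a single column."""
    return [A[i][0] * z[0] + A[i][1] * z[1] + B[i] * z[2] for i in range(2)]


A = [[2.0, 1.0], [1.0, 3.0]]   # invertible 2x2 block (illustrative values)
B = [1.0, 2.0]                 # remaining column of M
for x in (-1.0, 0.5, 4.0):
    z = null_vector(A, B, x)
    print(x, z, apply_M(A, B, z))   # the residual M z should be (numerically) zero
```

Every choice of $x$ gives a different solution, which is why the solution set of $Mz = 0$ is not a single point when $m > n$.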

IFT: Let $F: R^m \rightarrow R^n$ be continuously differentiable with Jacobian $J: R^m \rightarrow R^{n \times m}$, $m > n$. Suppose $z^*$ satisfies $F(z^*) = 0$. WLOG $J(z^*) = [J_1(z^*)\,\,\,J_2(z^*)]$ with $J_1(z^*) \in R^{n \times n}$ invertible. Then there is a neighborhood $U \subseteq R^{m-n}$ (of the last $m-n$ components of $z^*$) and a continuous $g: U \rightarrow R^n$ s.t. for every $x \in U$, $z = \begin{bmatrix} g(x) \\ x\end{bmatrix}$ satisfies $F(z) = 0$.

<font color=blue>**HW Problem:**
<br>
1. Using IFT, when will $F(x) = 0$ have an isolated solution?
<br>
2. What happens if $F(z^*) = 0$ and $rank(J(z^*)) < \min(m,n)$?

</font>

<font color=darkgreen>**My solution:**
<br>
1. When $m=n$, the neighborhood shrinks to a single point
<br>
2. If $rank(J(z^*)) < \min(m,n)$, then $J_1(z^*)$ is not full rank and is thus a singular matrix, so the IFT does not apply (Questions....)
</font>

# Lecture13

### 4. Picard's Method

<font color=brown>**Definition:**</font> When $x = G(x)$, Picard's method is $x^+ = G(x^c)$, i.e. $x_1 = G(x_0)$, $x_2 = G(x_1)$, etc.

<font color=brown>**Definition:**</font> A map $G: R^m \rightarrow R^m$ is a contraction on a closed set $D \subseteq R^m$ if 
1. $x\in D$ implies $G(x) \in D$ 
2. $\exists \alpha \in (0,1)$ s.t. $\forall x, y \in D$, $||G(x) - G(y)|| \leq \alpha ||x-y||$

<font color=brown>**Theorem:**</font> If $G$ is a contraction on a closed set $D \subseteq R^m$, then
1. There exists a unique $x^* \in D$ s.t. $x^* = G(x^*)$
2. If $x_0 \in D$ and $x_{k+1} = G(x_k)$, then $x_k \rightarrow x^*$ 

<font color=blue>**HW Problem:**
    <br>
1. Prove this result. Why does $D$ need to be closed? (Consider limit points.)
<br>
2. Does this result hold in an arbitrary metric space? (No)
</font>

<font color=black>**My solution:**

<br>
<font color=brown>**Theorem:**
Let $(X, d)$ be a complete metric space, if $Y \subseteq X$ and $Y$ is closed then $(Y, d)$ is a complete metric space. (proof skipped)
</font> 

**Existence:**

Define a sequence $\{x_k\}$ in $D$ s.t. $x_{k+1} = G(x_k)$.

First, let us show that $\{x_k\}$ is a Cauchy sequence.

$\forall n > m \geq 1$ we have: 

$$\begin{align}
||x_n - x_m|| &= ||G^n(x_0) - G^m(x_0)|| \\
&\leq \alpha||G^{n-1}(x_0) - G^{m-1}(x_0)|| \\
& \vdots \\
& \leq \alpha^m ||G^{n-m}(x_0) - x_0|| \\
& \leq \alpha^m (||G^{n-m}(x_0) - G^{n-m-1}(x_0)|| + ||G^{n-m-1}(x_0) - G^{n-m-2}(x_0)|| + \cdots + ||G(x_0) - x_0||)\\
& \leq \alpha^m(\alpha^{n-m-1}||G(x_0) - x_0|| + \alpha^{n-m-2}||G(x_0) - x_0||+ \cdots + ||G(x_0) - x_0||)\\
& = \alpha^m [\sum_{k=0}^{n-m-1}\alpha^k]||G(x_0)-x_0|| \\
& \leq \alpha^m [\sum_{k=0}^{\infty}\alpha^k]||G(x_0)-x_0|| \\
& = \frac{\alpha^m }{1-\alpha}||G(x_0)-x_0|| 
\end{align}$$

Since $\alpha^m \rightarrow 0$ as $m \rightarrow \infty$, $\{x_k\}$ is a Cauchy sequence. Also, $(R^m, ||.||)$ is a complete metric space, which implies that $(D, ||.||)$ is a complete metric space by the theorem above. Thus, $\{x_k\}$ has a limit in $D$ (definition of a complete metric space):
$$\exists x \in D,\,\, s.t.\,\, \lim_{k\rightarrow \infty} x_k = x$$

Also, $G$ is continuous on $D$ (it is Lipschitz with constant $\alpha$). Thus,

$$G(x) = G(\lim_{k\rightarrow \infty} x_k) = \lim_{k\rightarrow \infty}G(x_k) = \lim_{k\rightarrow \infty} x_{k+1} = x$$

**Uniqueness:**
Suppose $\exists x \neq y$ in $D$ s.t. $x = G(x)$ and $y = G(y)$. Then,
$$\alpha ||x - y|| \geq ||G(x) - G(y)|| = ||x - y|| > 0$$
which is a contradiction, given $\alpha < 1$.
</font>

The argument above proves the whole theorem and explains why $D$ has to be closed: the limit of the Cauchy sequence $\{x_k\}$ must lie in $D$.

This result does not hold for an arbitrary metric space. But it holds for any **complete** metric space. 

<font color=brown>**Example:**</font> 
$G(x) = 0.5(x + \frac{4}{x})$, $x_0 = 10$, $x_1 = 5.2$, ..., $x_4 = 2.006$, $x_5 = 2.0$
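
The iterates in this example can be reproduced with a few lines of code. This is a minimal sketch of Picard's method for a scalar map; the function name `picard` and the choice of five steps are illustrative.

```python
def picard(G, x0, num_steps):
    """Return [x_0, x_1, ..., x_num_steps] with x_{k+1} = G(x_k)."""
    iterates = [x0]
    for _ in range(num_steps):
        iterates.append(G(iterates[-1]))
    return iterates


def G(x):
    """The map from the example above."""
    return 0.5 * (x + 4.0 / x)


iterates = picard(G, 10.0, 5)
for k, x in enumerate(iterates):
    print(k, round(x, 6))
```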

<font color=blue>**HW Problem:**
    <br>
1. Solve $x=G(x)$ by hand. What is $G(x)$ doing?
<br>
2. Show that $G(x)$ is a contraction
</font>

<font color=darkgreen>**My solution:**
<br>
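1. $x = 2$ or $x = -2$, so the iteration computes a square root of $4$ (it is Newton's method applied to $x^2 - 4 = 0$)
<br>
2. Do we need to specify $x > 0$? Yes, $G$ is undefined at $0$. On $D = [2, \infty)$ we have $G(x) \geq 2$ (AM-GM) and $|G'(x)| = \frac{1}{2}|1 - \frac{4}{x^2}| \leq \frac{1}{2}$, so $G$ is a contraction on $D$ with $\alpha = \frac{1}{2}$.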
</font>

### 5. Newton's Method

**reference:**

https://www.lakeheadu.ca/sites/default/files/uploads/77/docs/RemaniFinal.pdf

https://www.math.ntnu.no/emner/TMA4123/2012v/notater/nr-systems-a4.pdf

<font color=brown>**Definition:**</font> Given $F: R^m \rightarrow R^m$ with a continuous Jacobian $J:R^m \rightarrow R^{m \times m}$, Newton's method is the following iteration:
$$x_+ = x_c - J(x_c)^{-1}F(x_c)$$

Intuition: if $x^*$ satisfies $F(x^*) = 0$, then $0 = F(x^*) \simeq F(x_c) + J(x_c)(x^* - x_c)$, and solving this linear model for $x^*$ gives $x_+$.
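
To make the iteration concrete, here is a minimal sketch of Newton's method on a $2 \times 2$ system. The test problem $F(x) = (x_1^2 + x_2^2 - 4,\; x_1 - x_2)$, the starting point and the fixed number of steps are all assumptions chosen for illustration; each step solves $J(x_c)s = -F(x_c)$ rather than forming the inverse.

```python
def solve_2x2(A, b):
    """Solve A y = b for a 2x2 matrix A by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("Jacobian is singular")
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def F(x):
    """Illustrative system with root (sqrt(2), sqrt(2))."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


def J(x):
    """Jacobian of F."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


def newton(F, J, x0, num_steps):
    """Run num_steps Newton iterations and return all iterates."""
    x = list(x0)
    history = [x]
    for _ in range(num_steps):
        rhs = [-value for value in F(x)]
        s = solve_2x2(J(x), rhs)          # J(x_c) s = -F(x_c)
        x = [x[0] + s[0], x[1] + s[1]]    # x_+ = x_c + s
        history.append(x)
    return history


history = newton(F, J, [1.0, 2.0], 6)
for k, x in enumerate(history):
    print(k, x)
```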

**A. Local Convergence Theorem**

Assumption: $F: R^m \rightarrow R^m$ has a continuous Jacobian $J:R^m \rightarrow R^{m \times m}$, and there are an $x^*$ and a $p^* > 0$ such that:

1. $F(x^*) = 0$ and $J(x^*)$ is nonsingular.
2. (Lipschitz Continuous) $\exists \gamma >0$ s.t. $\forall x, y \in B(x^*, p^*)$, we have $||J(x)-J(y)|| \leq \gamma ||x-y||$.

<font color=brown>**Theorem:**</font> If the assumption holds and $||x_c - x^*|| \leq \min(p^*,\frac{1}{2\gamma||J(x^*)^{-1}||})$, then

1. $J(x_c)$ is non-singular and $||J(x_c)^{-1}|| \leq 2||J(x^*)^{-1}||$
2. With $e_+ = x_+ - x^*$ and $e_c = x_c - x^*$, $||e_+|| \leq ||J(x^*)^{-1}||\,||e_c||^2 \gamma \leq \frac{||e_c||}{2}$

<font color=blue>**HW Problem:**
    <br>
1. What is the impact of the singular values of $J(x^*)$ on the "localness" of the result
<br>
2. Why is the first statement important?
<br>
3. Prove + interpret
</font>

<font color=darkgreen>**My solution:**
<br>
1. 
<br>
2. 
<br>
3.
</font>

<font color=brown>**Lemma:**</font> Suppose $A, B \in R^{m \times m}$ and $A$ is non-singular. If for some $\epsilon \in (0,1)$, $||AB -I|| \leq 1 - \epsilon$, then $B$ is invertible and 

1. $||A-B^{-1}|| \leq (1-\epsilon)||B^{-1}||$
2. $||B^{-1}|| \leq \frac{||A||}{\epsilon}$

(Item 2 follows from item 1 and the triangle inequality: $||B^{-1}|| \leq ||A|| + (1-\epsilon)||B^{-1}||$.)

**Proof of theorem:**

(1). 
$$\begin{align} ||I - J(x^*)^{-1}J(x_c)|| &= ||J(x^*)^{-1}(J(x^*) - J(x_c))|| \\
& \leq ||J(x^*)^{-1}||\,||J(x^*) - J(x_c)|| \\
& \leq ||J(x^*)^{-1}||\,\gamma ||x^* - x_c|| \\
& \leq \frac{||J(x^*)^{-1}||\gamma}{2\gamma||J(x^*)^{-1}||} \\
& = \frac{1}{2}
\end{align}$$

Applying the lemma above with $A = J(x^*)^{-1}$, $B = J(x_c)$ and $\epsilon = \frac{1}{2}$, we get that $J(x_c)$ is non-singular and
$$||J(x_c)^{-1}|| \leq 2||J(x^*)^{-1}||$$

(2). We have $x_+ = x_c - J(x_c)^{-1}F(x_c)$ and $F(x^*) = 0$, so by the fundamental theorem of calculus,
$$\begin{align} e_+ &= e_c - J(x_c)^{-1}F(x_c) \\ 
&= J(x_c)^{-1}(J(x_c)e_c - F(x_c)) \\
&= J(x_c)^{-1}(J(x_c)e_c - (F(x_c) - F(x^*))) \\
&= J(x_c)^{-1}(J(x_c)e_c - (\int_{0}^{1} J(x^*+te_c)e_c dt))
\end{align}$$

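Further,
$$\begin{align}
||e_+|| &= ||J(x_c)^{-1}(J(x_c)e_c - (\int_{0}^{1} J(x^*+te_c)e_c dt))|| \\
& \leq ||J(x_c)^{-1}||\int_{0}^{1}||J(x_c) - J(x^*+te_c)||\,||e_c|| dt \\
& \leq ||J(x_c)^{-1}||\,||e_c||^2 \int_{0}^{1} \gamma(1-t)dt \\
& = \frac{\gamma}{2}||J(x_c)^{-1}||\,||e_c||^2 \\
& \leq \gamma||J(x^*)^{-1}||\,||e_c||^2 \\
& \leq \frac{||e_c||}{2}
\end{align}$$
where the last two steps use $||J(x_c)^{-1}|| \leq 2||J(x^*)^{-1}||$ from part (1) and $||e_c|| \leq \frac{1}{2\gamma||J(x^*)^{-1}||}$.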

<font color=blue>**HW Problem:**
    <br>
What happens if I use the mean value theorem?
</font>

<font color=darkgreen>**My solution:**
<br>
First, here is a very good summary of Taylor's theorem for single-variable functions, multi-variable scalar-valued functions and multi-variable vector-valued functions:
http://fourier.eng.hmc.edu/e176/lectures/NM/node45.html

<br>
<br>
We know $x_c = x^* + e_c$. For multi-variable scalar-valued functions, Taylor's theorem gives the following two identities (check out the notes in CS 726):

$$f(x+p) = f(x) + \int_{0}^{1}\nabla f(x+t p)^Tp\, dt$$
$$f(x+p) = f(x) + \nabla f(x+t p)^Tp, \,\,\, for\,some\, t \in (0,1)$$

Applying the first one to each component of $F$, we get
$$F(x_c) = F(x^*) + \int_{0}^{1} J(x^*+te_c)e_c dt$$
The second one (the mean value theorem) only holds component by component: for each $i$ there is an $\omega_i \in (0,1)$ with $F_i(x_c) = F_i(x^*) + \nabla F_i(x^* + \omega_i e_c)^Te_c$, and the $\omega_i$ generally differ. So there need not be a single $\omega \in (0,1)$ with
$$F(x_c) = F(x^*) + J(x^* + \omega e_c)e_c$$

If such an $\omega$ did exist, we would get

$$\begin{align}
||e_+|| &= ||J(x_c)^{-1}[J(x_c)e_c - J(x^* + \omega e_c)e_c]|| \\
& \leq ||J(x_c)^{-1}||\,|| J(x_c) - J(x^* + \omega e_c) ||\,||e_c|| \\
& \leq ||J(x_c)^{-1}||\,\gamma(1-\omega)\,||e_c||^2 \\
& \leq \gamma||J(x_c)^{-1}||\,||e_c||^2
\end{align}$$

which is the same quadratic bound, but without the factor $\frac{1}{2}$ that the integral form provides.
</font>

<font color=brown>**Definition: Rates of Convergence**</font>
Let $\{x_k\} \subseteq R_{>0}$ s.t. $x_k \rightarrow 0$.
1. If $\limsup \frac{x_{k+1}}{x_k} = p \in (0,1)$, then $x_k$ converges Q-linearly.
2. If $\limsup \frac{x_{k+1}}{x_k} = 0$, then $x_k$ converges Q-superlinearly.
3. If $\limsup \frac{x_{k+1}}{x_k^2} = L > 0$, then $x_k$ converges Q-quadratically.
4. Let $\{y_k\} \subseteq R_{>0}$ s.t. $x_k \leq y_k$. If $y_k$ converges Q-linearly, Q-superlinearly, or Q-quadratically, then $x_k$ converges R-linearly, R-superlinearly, or R-quadratically, respectively.
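
The difference between the rates is easy to see on two synthetic sequences. This is a minimal sketch (the sequences and the helper `ratios` are made up for illustration): a sequence with $x_{k+1} = 0.5\,x_k$ has a constant linear ratio $x_{k+1}/x_k$, while a sequence with $x_{k+1} = x_k^2$ has a constant quadratic ratio $x_{k+1}/x_k^2$ and a linear ratio that tends to $0$.

```python
def ratios(seq, order):
    """Return x_{k+1} / x_k**order for consecutive terms of seq."""
    return [seq[k + 1] / seq[k] ** order for k in range(len(seq) - 1)]


# x_{k+1} = 0.5 * x_k: Q-linear with p = 0.5
linear = [0.5 ** k for k in range(1, 8)]
# x_{k+1} = x_k ** 2 starting from 0.5: Q-quadratic with L = 1
quadratic = [0.5 ** (2 ** k) for k in range(0, 6)]

print("linear sequence, x_{k+1}/x_k:      ", ratios(linear, 1))
print("quadratic sequence, x_{k+1}/x_k:   ", ratios(quadratic, 1))
print("quadratic sequence, x_{k+1}/x_k^2: ", ratios(quadratic, 2))
```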

**B. Stopping Criterion:**

Option 1. Stop when $||x_{k+1} - x_k||$ falls below a threshold.

Option 2. $||F(x_k)|| \leq \tau_a + \tau_c ||F(x_0)||$
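
Option 2 can be sketched for a scalar equation as follows. The function name `newton_with_stopping`, the default tolerances and the iteration cap `max_steps` are assumptions made for this example, not recommendations from the lecture.

```python
def newton_with_stopping(f, df, x0, tau_a=1e-12, tau_c=1e-10, max_steps=50):
    """Scalar Newton iteration that stops once |f(x_k)| <= tau_a + tau_c * |f(x_0)|.

    Returns the final iterate and the number of Newton steps taken.
    """
    initial_residual = abs(f(x0))
    x = x0
    for k in range(max_steps):
        if abs(f(x)) <= tau_a + tau_c * initial_residual:
            return x, k
        x = x - f(x) / df(x)
    return x, max_steps


# Illustrative problem: f(x) = x^2 - 2 with root sqrt(2).
root, steps = newton_with_stopping(lambda t: t * t - 2.0, lambda t: 2.0 * t, 10.0)
print(root, steps)
```

The absolute term $\tau_a$ keeps the test meaningful when $||F(x_0)||$ is already tiny, while the relative term scales the tolerance with the size of the initial residual.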

<font color=blue>**HW Problem:**
    <br>
Using FTC with $e = x-x^*$, prove that if $e_0 = x_0-x^*$ and $e$ are sufficiently small, then
    $$\frac{3}{5}\kappa(J(x^*))^{-1}\frac{||e||}{||e_0||}\leq \frac{||F(x)||}{||F(x_0)||}\leq \frac{5}{3}\kappa(J(x^*))\frac{||e||}{||e_0||}$$
    where $\kappa(J(x^*))$ denotes the condition number of $J(x^*)$.
    
   <br>
   1. How should we choose $\tau_a$ and $\tau_c$?
       
   <br>
   2. Guiding Principles?
</font>

# Lecture14

### 6. Inexact Newton's Method

We want to solve:
$$x_+ = x_c + s \quad (= x_c - J(x_c)^{-1}F(x_c))$$
$$J(x_c)s + F(x_c) = 0$$

**1. $J(x_c)$ cannot be evaluated precisely:**
$$||J_c - J(x_c)|| \leq \Delta_c$$
$$J_cs + F(x_c) = 0$$

**2. $F(x_c)$ cannot be evaluated precisely:**
$$||F_c - F(x_c)|| \leq \epsilon_c$$
$$J(x_c)s + F_c = 0$$
and, when the Jacobian is also inexact,
$$J_cs + F_c = 0$$

**3. Might not be able to solve for $s$ exactly:**
$$||J_cs+F_c|| \leq \eta_c ||F_c||$$

**Theorem:** Suppose the general assumption holds and $||x_c - x^*|| \leq \min(\frac{1}{2\gamma||J(x^*)^{-1}||}, p^*)$. Then $\exists k > 0$ s.t. $$||e_+|| \leq k(||e_c||^2 + (\eta_c + \Delta_c)||e_c|| + \epsilon_c)$$

**A. Chord Method:**

Description: Only compute the Jacobian every $v$ iterations.
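
The idea of reusing a stale Jacobian can be sketched in one dimension, where the Jacobian is just the derivative. In this minimal example (the function name `newton_lazy_derivative` and the test problem are illustrative assumptions), the derivative is refreshed only when the iteration counter is a multiple of $v$; with $v = 1$ it reduces to ordinary Newton.

```python
def newton_lazy_derivative(f, df, x0, v, num_steps):
    """Newton-like iteration that re-evaluates the derivative every v steps."""
    x = x0
    slope = None
    for k in range(num_steps):
        if k % v == 0:
            slope = df(x)        # refresh the (scalar) Jacobian
        x = x - f(x) / slope     # otherwise reuse the stale value
    return x


def f(x):
    return x * x - 2.0


def df(x):
    return 2.0 * x


x_newton = newton_lazy_derivative(f, df, 2.0, v=1, num_steps=30)
x_lazy = newton_lazy_derivative(f, df, 2.0, v=3, num_steps=30)
print(x_newton, x_lazy)
```

Each step is cheaper when the derivative is reused, at the price of slower convergence between refreshes.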

**B. Forward Difference:**

Description: replace $J(x_c)$ with its numerical difference approximation.

Let $f: R^m \rightarrow R$ be differentiable with Lipschitz continuous gradient ($\exists L > 0$ s.t. $||\nabla f(x) - \nabla f(y)|| \leq L||x-y||$).

<font color=blue>**HW Problem:**
    <br>
Prove $|f(y+h) - f(y) - \nabla f(y)'h| \leq \frac{L}{2}||h||^2$
</font>

<font color=darkgreen>**My solution:**
    
</font>

Choose $h = \epsilon e_i$, where $e_i$ is the $i$-th standard basis vector. Then
$$\nabla f(y)^Te_i = \frac{f(y+\epsilon e_i)-f(y)}{\epsilon} + \gamma_i, \,\, |\gamma_i| \leq \frac{L\epsilon}{2}$$

<font color=blue>**HW Problem:**
    <br>
1. If $f$ can be evaluated at any input with accuracy $l_fU$, $l_f > 0$, show that the right choice of $\epsilon$ is $\epsilon \propto \sqrt{U}$
        <br>
    2. Choose a simple differentiable function and test out a bunch of $\epsilon$, and in particular compare to $\epsilon = \sqrt{U}$
        <br>
    3. How do I do a forward difference approximation for $F:R^m \rightarrow R^n$?
</font>

**Algorithm:** $f$: function, $x$: evaluation point, $\epsilon$: approximation size

1. $g \leftarrow$ zeros(length(x)), $I \leftarrow$ eye(length(x))

2. for $i$ = $1$:length(x)

2a. $g(i) = \frac{f(x+\epsilon I[.,i]) - f(x)}{\epsilon}$

2b. end

3. return $g$
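
A runnable version of this algorithm might look like the sketch below. The test function, the evaluation point and the step $\epsilon = \sqrt{U}$ with $U$ taken to be the double-precision machine epsilon are assumptions for illustration. Instead of building the identity matrix, the code perturbs one coordinate at a time, which is equivalent to adding $\epsilon I[.,i]$.

```python
import math
import sys


def forward_difference_gradient(f, x, eps):
    """Approximate the gradient of f at x with one forward difference per coordinate."""
    fx = f(x)                      # f(x) is reused for every coordinate
    g = [0.0] * len(x)
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += eps          # x + eps * e_i
        g[i] = (f(shifted) - fx) / eps
    return g


def f(x):
    """Illustrative smooth function R^2 -> R."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[1])


x = [1.0, 2.0]
eps = math.sqrt(sys.float_info.epsilon)   # assumed choice eps = sqrt(U)
approx = forward_difference_gradient(f, x, eps)
exact = [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + math.cos(x[1])]
print(approx)
print(exact)
```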

**C. Jacobian Free Newton Krylov Method**
- Solve $J_cs + F(x_c) = 0$
- iteratively (with a Krylov method), without forming $J_c$
- Use a forward difference for $J_c$
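
Krylov solvers only need products of the Jacobian with vectors, so one common way to stay Jacobian-free is the directional forward difference $J(x)v \approx \frac{F(x+\epsilon v) - F(x)}{\epsilon}$. The sketch below shows only this product, not a full Krylov solver; reading the notes this way is an assumption, and the test system, the point `x`, the direction `v` and the step size are illustrative.

```python
import math
import sys


def F(x):
    """Illustrative system R^2 -> R^2."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


def jacobian_vector_product(F, x, v, eps):
    """Approximate J(x) v by (F(x + eps * v) - F(x)) / eps without forming J."""
    Fx = F(x)
    shifted = [xi + eps * vi for xi, vi in zip(x, v)]
    F_shifted = F(shifted)
    return [(a - b) / eps for a, b in zip(F_shifted, Fx)]


x = [1.0, 2.0]
v = [0.5, -1.0]
eps = math.sqrt(sys.float_info.epsilon)   # assumed step size
approx = jacobian_vector_product(F, x, v, eps)
# Exact product with J(x) = [[2 x_1, 2 x_2], [1, -1]] for comparison.
exact = [2.0 * x[0] * v[0] + 2.0 * x[1] * v[1], v[0] - v[1]]
print(approx, exact)
```

Each product costs one extra evaluation of $F$, independent of the dimension $m$.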

**D. Quasi-Newton Methods** (Broyden's Method)
- iterates: $x_-$,$x_c$,$x_+$
- Jacobian Approximation: $J_-$,$J_c$

Recall: for $f: R \rightarrow R$, the secant approximation of the derivative is
$$b_c = \frac{f(x_c) - f(x_-)}{x_c - x_-} = \frac{y}{s}$$

<font color=brown>**Define:**</font>

$y_c = F(x_c) - F(x_-)$, $s_c = x_c - x_-$ (recall $x_c = x_- + s_c$), $J_c = J_- + \frac{(y_c - J_-s_c)s_c^T}{s_c^Ts_c}$. Then we get the step $s$ by solving $J_cs = -F(x_c)$.
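
The rank-one update above is built so that the new approximation satisfies the secant equation $J_cs_c = y_c$ while acting like $J_-$ on vectors orthogonal to $s_c$. Here is a minimal sketch that checks both properties on made-up data (the matrix `J_prev` and the vectors `s`, `y`, `w` are illustrative).

```python
def mat_vec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def broyden_update(J_prev, s, y):
    """Return J_c = J_- + (y - J_- s) s^T / (s^T s)."""
    Js = mat_vec(J_prev, s)
    residual = [y[i] - Js[i] for i in range(len(y))]   # y - J_- s
    ss = sum(sj * sj for sj in s)                      # s^T s
    n = len(s)
    return [[J_prev[i][j] + residual[i] * s[j] / ss for j in range(n)]
            for i in range(len(y))]


J_prev = [[1.0, 0.0], [0.0, 1.0]]   # previous approximation J_-
s = [1.0, 2.0]                      # step s_c
y = [3.0, -1.0]                     # change in F, y_c
J_c = broyden_update(J_prev, s, y)

w = [2.0, -1.0]                     # orthogonal to s
print(mat_vec(J_c, s), y)                   # secant equation: J_c s = y
print(mat_vec(J_c, w), mat_vec(J_prev, w))  # unchanged on the complement of s
```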

<font color=brown>**Sherman-Morrison**</font>

$$(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}$$
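
A numerical spot check of the formula on a $2 \times 2$ example is sketched below. The matrix `A` and the vectors `u`, `v` are arbitrary illustrative values chosen so that both $A$ and $A + uv^T$ are invertible; the check compares the formula against a direct inverse.

```python
def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, for 2-vectors u and v."""
    Au = [A_inv[i][0] * u[0] + A_inv[i][1] * u[1] for i in range(2)]   # A^{-1} u
    vA = [v[0] * A_inv[0][j] + v[1] * A_inv[1][j] for j in range(2)]   # v^T A^{-1}
    denom = 1.0 + v[0] * Au[0] + v[1] * Au[1]                          # 1 + v^T A^{-1} u
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(2)] for i in range(2)]


A = [[4.0, 1.0], [2.0, 3.0]]
u = [1.0, 2.0]
v = [3.0, 1.0]

updated = [[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]   # A + u v^T
direct = inverse_2x2(updated)
via_formula = sherman_morrison(inverse_2x2(A), u, v)
print(direct)
print(via_formula)
```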

<font color=blue>**HW Problem:**
    <br>
1. Verify that the S-M formula is true
        <br>
2. Use it to explicitly express $J_c^{-1}$ in terms of $J_-^{-1}$
</font>

<font color=darkgreen>**My solution:**
1. We only need to show $(A + uv^T)(A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}) = I$. Since $v^TA^{-1}u$ is a scalar,
   $$\begin{align}
   (A + uv^T)(A^{-1} - \frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}) &= AA^{-1} - \frac{AA^{-1}uv^TA^{-1}}{1+v^TA^{-1}u} + uv^TA^{-1} - \frac{uv^TA^{-1}uv^TA^{-1}}{1+v^TA^{-1}u} \\
   &= I + uv^TA^{-1} - \frac{uv^TA^{-1} + uv^TA^{-1}uv^TA^{-1}}{1+v^TA^{-1}u} \\
   & = I + uv^TA^{-1} - \frac{u(1+v^TA^{-1}u)v^TA^{-1}}{1+v^TA^{-1}u} \\
   & = I
   \end{align}$$
</font>

<font color=darkgreen>**My solution:**
2. 
$$\begin{align}
(J_c)^{-1} &= (J_- + \frac{(y_c - J_-s_c)s_c^T}{s_c^Ts_c})^{-1} \\
&= J_-^{-1} - \frac{J_-^{-1}(y_c - J_-s_c)\frac{s_c^T}{s_c^Ts_c}J_-^{-1}}{1+\frac{s_c^T}{s_c^Ts_c}J_-^{-1}(y_c - J_-s_c)} \\
&= J_-^{-1} - \frac{J_-^{-1}y_cs_c^TJ_-^{-1} - s_cs_c^TJ_-^{-1}}{s_c^TJ_-^{-1}y_c}
\end{align}$$
</font>