## Quadratic Forms and Steepest Descent

A quadratic form is a scalar, quadratic function of a vector of the form:
$$f(x) = \frac{1}{2} x^T A x - b^T x + c,\text{ where }A = A^T$$

![quad](quad.png)

The gradient of a quadratic form is defined as $$ f^\prime(x) = \left( \begin{array}{c} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{array} \right)$$

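To compute the gradient, we insert our definition
$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$
and differentiate with respect to each component of $x$:
$$f^\prime(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b = Ax - b,$$
where the last step uses $A = A^T$.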

We can now see that $f^\prime(x) = 0 \quad\Leftrightarrow\quad Ax - b = 0 
         \quad\Leftrightarrow\quad Ax = b$


This means that solving $Ax = b$ is equivalent to finding a stationary point of $f$. If $A$ is positive definite, this stationary point is the unique minimum of $f$, so the linear system becomes a well-posed minimisation problem. How do we solve that minimisation problem?
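As a quick numerical check of this equivalence, the following sketch compares the solution of $Ax = b$ with function values at nearby points. The $2 \times 2$ matrix, the right-hand side, and the names `A_demo`, `b_demo`, `quad_form` and `x_star` are illustrative assumptions and are not used elsewhere in this notebook.

```python
import numpy as np

# Assumed example data: a small symmetric positive definite matrix.
A_demo = np.array([[2.0, -1.0],
                   [-1.0, 2.0]])
b_demo = np.array([1.0, 0.5])

def quad_form(x, A=A_demo, b=b_demo, c=0.0):
    """Evaluate f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

# A symmetric matrix is positive definite if all its eigenvalues are > 0.
eigenvalues = np.linalg.eigvalsh(A_demo)
print("eigenvalues:", eigenvalues)

# Solve the linear system directly.
x_star = np.linalg.solve(A_demo, b_demo)

# Evaluate f at randomly perturbed points around x_star.
rng = np.random.default_rng(0)
perturbed = x_star + rng.normal(scale=0.5, size=(1000, 2))
values = np.array([quad_form(p) for p in perturbed])

print("f(x_star)                  =", quad_form(x_star))
print("smallest f at other points =", values.min())
```

None of the perturbed points should give a smaller value than `x_star`, since $f(x^\ast + d) - f(x^\ast) = \frac{1}{2} d^T A d > 0$ for every $d \neq 0$ when $A$ is positive definite.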

### Direction of Steepest Descent

The gradient $f^\prime(x)$ gives us the direction of steepest ascent. Since we know that 
$$f^\prime(x) = Ax - b = -r,$$ 
with $r$ being the residual $r = b-Ax$, we get that the residual $r$ is the direction of steepest descent.
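The relation between gradient and residual can be checked numerically. The sketch below approximates the gradient by central finite differences and compares it with $-r$; the matrix, the point `x0`, and all names are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed example data (not the notebook's A and b).
A_demo = np.array([[2.0, -1.0],
                   [-1.0, 2.0]])
b_demo = np.array([1.0, 0.5])

def quad_form(x):
    """f(x) = 1/2 x^T A x - b^T x (with c = 0)."""
    return 0.5 * x @ A_demo @ x - b_demo @ x

x0 = np.array([3.0, -2.0])
r0 = b_demo - A_demo @ x0          # residual at x0

# Central finite differences along each coordinate direction.
eps = 1e-6
fd_grad = np.array([
    (quad_form(x0 + eps * e) - quad_form(x0 - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print("finite-difference gradient:", fd_grad)
print("negative residual         :", -r0)

# A small step along the residual lowers the function value.
step = 1e-3
f_before = quad_form(x0)
f_after = quad_form(x0 + step * r0)
print("f decreases along r:", f_after < f_before)
```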

Let us set up a simple finite difference problem with a Poisson matrix and a random right-hand side.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.sparse import diags
from ipywidgets import interact, FloatSlider, IntSlider
from IPython.display import display, clear_output

fig, ax = plt.subplots()
plt.close(fig)

N = 2  # dimension of the matrix
h = 0.25

# grid of sample points for the plots (one point per row of xv)
dat = np.linspace(-20, 20, 100)
x, y = np.meshgrid(dat, dat)
xv = np.array([x.reshape(-1), y.reshape(-1)]).T

diagonals = [-np.ones(N-1), 2*np.ones(N), -np.ones(N-1)]
A = diags(diagonals, [-1, 0, 1]).toarray()  # Poisson matrix
b = np.random.rand(N, 1)                    # random right-hand side (plain ndarray)


In [None]:
def f(x):
    # quadratic form f(x) = 1/2 x^T A x - b^T x (with c = 0)
    return 0.5*np.dot(np.dot(x.T,A),x) - np.dot(b.T,x)

h = np.array([f(xval) for xval in xv ]).reshape(100,100)
plt.contour(x, y, h,15);

In [None]:
def derivative_f(x):  # gradient f'(x) = Ax - b
    return np.dot(A,x.T) - b.T

dat = np.linspace(-5,5,8)
x, y = np.meshgrid(dat,dat)
xv = np.array([x.reshape(-1), y.reshape(-1)]).T

# evaluate the gradient once per grid point; result has shape (64, 1, 2)
grad = np.array([derivative_f(xval) for xval in xv])
u = grad[:,0,0].reshape(x.shape)
v = grad[:,0,1].reshape(x.shape)
plt.quiver(x, y, u, v);

### Solving linear systems via Minimum Search

The basic idea is to find the minimum by moving in the direction of steepest descent.
The simplest scheme is: $$x^{(i+1)} = x^{(i)} + \alpha r^{(i)}$$
Keeping $\alpha$ constant gives us a Richardson iteration (usually considered a relaxation method). Choosing $\alpha$ such that we move to the lowest point along that direction gives us the steepest descent algorithm.
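A minimal sketch of the constant-$\alpha$ variant is given here. The function name `richardson`, the example matrix, and the choice $\alpha = 0.5$ are assumptions for illustration; for a symmetric positive definite matrix a constant step only converges if $0 < \alpha < 2/\lambda_{\text{max}}$, so the value has to be adapted to the matrix at hand.

```python
import numpy as np

def richardson(A, b, x0, alpha, num_it):
    """Richardson iteration x <- x + alpha * (b - A x) with constant alpha."""
    x = np.array(x0, dtype=float)
    for _ in range(num_it):
        r = b - A @ x      # residual = direction of steepest descent
        x = x + alpha * r  # fixed step length
    return x

# Assumed example data: eigenvalues of A_demo are 1 and 3.
A_demo = np.array([[2.0, -1.0],
                   [-1.0, 2.0]])
b_demo = np.array([1.0, 0.5])

x_fixed = richardson(A_demo, b_demo, np.zeros(2), alpha=0.5, num_it=50)
residual_norm = np.linalg.norm(b_demo - A_demo @ x_fixed)
print("approximate solution:", x_fixed)
print("residual norm       :", residual_norm)
```

For this matrix, $\alpha = 0.5$ equals $2/(\lambda_{\text{min}} + \lambda_{\text{max}})$, so each step reduces the error by at least a factor of $0.5$.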

These types of algorithms are used in many different areas; work through minimisation.ipynb for some examples.

### Steepest Descent – find an optimal $\alpha$

We are performing a line search along the line: 
$$x^{(1)} = x^{(0)} + \alpha r^{(0)}.$$
We want to choose $\alpha$ such that $f( x^{(1)} )$ is minimal, i.e. such that $$\frac{\partial}{\partial\alpha} f (x^{(1)}) = 0.$$
Using the chain rule, we get: $$ \frac{\partial}{\partial\alpha} f (x^{(1)}) = f^\prime (x^{(1)})^{\!T} \frac{\partial}{\partial\alpha} x^{(1)} = f^\prime (x^{(1)})^{\!T} r^{(0)}.$$
Remember that $f^\prime (x^{(1)}) = -r^{(1)}$, thus: $$ - \left( r^{(1)} \right)^{\!T} r^{(0)} \stackrel{!}{=} 0. $$
This means that $f^\prime (x^{(1)}) = -r^{(1)}$ must be orthogonal to $r^{(0)}$. Inserting the definitions of $r^{(1)}$ and $x^{(1)}$ gives:

$$ \left( r^{(1)} \right)^{\!T} r^{(0)}        = \left( b - A x^{(1)} \right)^{\!T} r^{(0)} = 0 $$
$$ \left( b - A ( x^{(0)} + \alpha r^{(0)} ) \right)^{\!T} r^{(0)} = 0 $$
$$ \left( b - A x^{(0)} \right)^{\!T} r^{(0)} - \alpha \left( A r^{(0)} \right)^{\!T} r^{(0)} = 0 $$
$$ \left( r^{(0)} \right)^{\!T} r^{(0)} - \alpha \left( r^{(0)} \right)^{\!T} A r^{(0)} = 0 $$
  
Solving for $\alpha$ gives:
$$ \alpha = \frac{ \left( r^{(0)} \right)^{\!T} r^{(0)} }{ \left( r^{(0)} \right)^{\!T} A r^{(0)} }$$

Using this value of $\alpha$ in every iteration gives us the classical steepest descent algorithm.
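The formula for $\alpha$ can be sanity-checked numerically. The sketch below computes the optimal step for one assumed starting point, verifies that the new residual is orthogonal to the old one, and compares against a brute-force scan along the search line. The example data and all names are illustrative assumptions.

```python
import numpy as np

# Assumed example data (not the notebook's A and b).
A_demo = np.array([[2.0, -1.0],
                   [-1.0, 2.0]])
b_demo = np.array([1.0, 0.5])

def quad_form(x):
    """f(x) = 1/2 x^T A x - b^T x (with c = 0)."""
    return 0.5 * x @ A_demo @ x - b_demo @ x

x0 = np.array([3.0, -2.0])
r0 = b_demo - A_demo @ x0

# Optimal step length from the line search derivation.
alpha_opt = (r0 @ r0) / (r0 @ A_demo @ r0)
x1 = x0 + alpha_opt * r0
r1 = b_demo - A_demo @ x1
print("r1 . r0 =", r1 @ r0)  # zero up to rounding

# Brute-force scan of f along the line x0 + alpha * r0.
alphas = np.linspace(0.0, 2.0 * alpha_opt, 201)
line_values = np.array([quad_form(x0 + a * r0) for a in alphas])
alpha_scan = alphas[line_values.argmin()]
print("alpha from formula:", alpha_opt)
print("alpha from scan   :", alpha_scan)
```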

#### Steepest Descent – Algorithm

1. $ r^{(i)} = b - A x^{(i)} $
2. $ \displaystyle \alpha_i = \frac{ \left( r^{(i)} \right)^{\!T} r^{(i)} }  { \left( r^{(i)} \right)^{\!T} A r^{(i)} } $
3. $ x^{(i+1)} = x^{(i)} + \alpha_i r^{(i)}$

In [None]:
# switch to a diagonal example: A = diag(0.5, 2.5), b = 0
A[0,0] = 0.5; A[1,0] = 0.0; A[0,1] = 0.0; A[1,1] = 2.5
b[0] = 0.0; b[1] = 0.0

ax.clear()
dat = np.linspace(-2.1,2.1,100)
x, y = np.meshgrid(dat,dat)
xv = np.array([x.reshape(-1), y.reshape(-1)]).T
h = np.array([f(xval) for xval in xv]).reshape(100,100)

In [None]:
def steepest_descent(num_it):
    x = np.array([2.0, 2./5.]).reshape(-1,1) # initial guess
    steps = x
    for i in range(0,num_it):
        r = b - np.dot(A,x).reshape(-1,1)             # compute residual
        alpha = np.dot(r.T,r)/np.dot(r.T,np.dot(A,r)) # compute step size
        x = x + np.multiply(alpha,r) # update
        steps = np.vstack((steps, x))
    return np.array(steps).reshape(num_it+1,2)

In [None]:
plt.contour(x, y, h, 30);
steps = steepest_descent(6)
plt.plot(steps.T[0], steps.T[1], "x-");

### Observations:

This method converges very slowly: we need many iterations to reach the solution. Each iteration, however, is cheap because it only needs matrix-vector products, which cost $O(N)$ for sparse matrices such as the Poisson matrix (for a dense matrix the cost would be $O(N^2)$).
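A detailed analysis reveals: $$ \left\| e^{(i)} \right\|_A \le \left( \frac{ \kappa -1 }{ \kappa + 1 } \right)^i \left\| e^{(0)} \right\|_A$$
Here $e^{(i)} = x^{(i)} - x$ is the error of the $i$-th iterate, $\| e \|_A = \sqrt{e^T A e}$ is the energy norm, and $$\kappa = \lambda_{\text{max}} / \lambda_{\text{min}}$$ is the condition number, where $\lambda_{\text{max}}$ and $\lambda_{\text{min}}$ are the largest and smallest eigenvalues of $A$, respectively. For positive definite $A$ these are always larger than zero. For large $\kappa$ the factor $(\kappa - 1)/(\kappa + 1)$ is close to 1, which explains the slow convergence.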
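The bound can be compared with the actual error for the diagonal example used above, $A = \mathrm{diag}(0.5, 2.5)$, $b = 0$, and the starting point $(2, 2/5)$. The sketch below is self-contained and uses its own variable names (`A_demo`, `errors`, `bounds`), which are assumptions for illustration.

```python
import numpy as np

# Same data as the diagonal example above, under separate names.
A_demo = np.diag([0.5, 2.5])
b_demo = np.zeros(2)
x_exact = np.linalg.solve(A_demo, b_demo)

def energy_norm(e, A):
    """Energy norm ||e||_A = sqrt(e^T A e)."""
    return np.sqrt(e @ A @ e)

eigs = np.linalg.eigvalsh(A_demo)
kappa = eigs.max() / eigs.min()      # condition number
rate = (kappa - 1.0) / (kappa + 1.0)

x = np.array([2.0, 2.0 / 5.0])       # initial guess
errors = [energy_norm(x - x_exact, A_demo)]
for i in range(10):
    r = b_demo - A_demo @ x              # residual
    alpha = (r @ r) / (r @ A_demo @ r)   # optimal step length
    x = x + alpha * r                    # update
    errors.append(energy_norm(x - x_exact, A_demo))

errors = np.array(errors)
bounds = errors[0] * rate ** np.arange(len(errors))

print("kappa =", kappa, " rate =", rate)
for i, (err, bound) in enumerate(zip(errors, bounds)):
    print(f"i = {i:2d}   error = {err:.6e}   bound = {bound:.6e}")
```

For this particular starting point, the error shrinks by the factor $(\kappa - 1)/(\kappa + 1) = 2/3$ in every step, so the bound is attained up to rounding. Other starting points generally converge faster than the bound suggests.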
