<a href="https://colab.research.google.com/github/wdconinc/practical-computing-for-scientists/blob/master/Lectures/lecture14.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture #14

## Standard Preamble

In [0]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import math
import numpy.matlib as ml # ml.matrix gives the column vectors used below; NumPy now discourages np.matrix

## In our last episode

* Newton's method in 2D: an example
* The partial derivative of a function of more than one variable
* The gradient vector
* The Hessian matrix

## The Newton method for scalar functions of one variable

As you know from calculus, we can expand any well-behaved function $f(x)$ in a Taylor series around a point $\bar{x}$. Keeping terms up to second order,

$$f(x) \approx f(\bar{x}) + \frac{\partial f}{\partial x}\big|_{\bar{x}}(x-\bar{x}) + \frac{\partial^2 f}{\partial x^2}\big|_{\bar{x}} \frac{1}{2}(x-\bar{x})^2 $$

or, with $\Delta x = x - \bar{x}$,

$$f(x) = f(\bar{x} + \Delta x) \approx f(\bar{x}) + \frac{\partial f}{\partial x}\big|_{\bar{x}} \Delta x + \frac{\partial^2 f}{\partial x^2}\big|_{\bar{x}} \frac{1}{2} \Delta x^2 $$

We will successively improve our estimate of the minimum by choosing points $\bar{x}_n$ that are closer and closer to it. Starting from $\bar{x}_n$, we choose $\Delta x$ so that $\bar{x}_n + \Delta x$ is a stationary point of this quadratic approximation:
$$ \frac{df(\bar{x}_n + \Delta x)}{d\Delta x} = 0$$

This happens when 
$$ \frac{\partial f}{\partial x}\big|_{\bar{x}_n} + \frac{\partial^2 f}{\partial x^2}\big|_{\bar{x}_n} \Delta x = 0 $$

So we update successively with this $\Delta x$:
$$ \bar{x}_{n+1} = \bar{x}_n + \Delta x =\bar{x}_n - \frac{\partial f}{\partial x}\big|_{\bar{x}_n} / \frac{\partial^2 f}{\partial x^2}\big|_{\bar{x}_n} $$
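A minimal sketch of this one-dimensional iteration, assuming the first and second derivatives are known analytically. The example function $f(x) = x^4/4 - x$ and the helper name `newton_minimize_1d` are illustrative, not part of the lecture code:

```python
def newton_minimize_1d(df, d2f, x, accuracy=1e-10, nmax=50):
    # Repeats x_{n+1} = x_n - f'(x_n) / f''(x_n) using analytic derivatives
    for _ in range(nmax):
        dx = -df(x) / d2f(x)
        x = x + dx
        if abs(dx) < accuracy:
            return x
    raise ArithmeticError("Failed to converge")

# f(x) = x**4/4 - x has f'(x) = x**3 - 1 and f''(x) = 3 x**2, so its minimum is at x = 1
xmin = newton_minimize_1d(lambda x: x**3 - 1, lambda x: 3 * x**2, 2.0)
print(xmin)
```

Newton's method looks for any stationary point: where $f''(\bar{x}_n) < 0$ the quadratic model has a maximum, and the step heads toward it.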


## The Newton method for scalar functions of more than one variable

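In the multi-dimensional case the second-order expansion reads $f(\vec{x} + \Delta\vec{x}) \approx f(\vec{x}) + \nabla f(\vec{x}) \cdot \Delta\vec{x} + \frac{1}{2} \Delta\vec{x}^T [H f(\vec{x})] \Delta\vec{x}$, and setting its gradient with respect to $\Delta\vec{x}$ to zero gives $\nabla f(\vec{x}) + [H f(\vec{x})] \Delta\vec{x} = 0$. So, in analogy with the one-dimensional increment $\Delta x = -f'(x) / f''(x)$, we now want

$$ \Delta\vec{x} = - [H f(\vec{x})]^{-1} \nabla f(\vec{x}) $$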
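Numerically, it is usually cheaper and more accurate to solve the linear system $[H f]\,\Delta\vec{x} = -\nabla f$ than to form the inverse. A sketch with plain NumPy arrays on an illustrative quadratic $f(\vec{x}) = \frac{1}{2}\vec{x}^T A \vec{x} - \vec{b}^T\vec{x}$ (the matrix `A` and vector `b` are made up for this example) shows that for a quadratic function a single full Newton step lands exactly on the minimum $A^{-1}\vec{b}$:

```python
import numpy as np

# Illustrative quadratic f(x) = 1/2 x^T A x - b^T x, with gradient A x - b and constant Hessian A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # symmetric positive definite
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b

x = np.array([5.0, -7.0])                # arbitrary starting point
dx = np.linalg.solve(A, -grad(x))        # solve [H f] dx = -grad f instead of forming the inverse
x_new = x + dx
print(x_new, np.linalg.solve(A, b))      # one full Newton step reaches the minimum A^{-1} b
```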

### The partial derivative

$$ \frac{\partial f}{\partial x_i} $$

In [0]:
def partial(f, i, h = 1e-6):
    ''' 
        Returns a function object to compute the partial derivative of f with respect to x[i].
        
        f(x) is assumed to be a scalar function of a vector argument x (a list or 1-D array); x itself is copied, never modified.
    '''
    def df(x, f = f, i = i, h = h):
        x = np.array(x, dtype = np.float64) # make a copy and assure the use of 64-bit floats
        x[i] += h
        f_plus = f(x)
        x[i] -= 2*h
        f_minus = f(x)
        return (f_plus - f_minus) / (2.0 * h)
    # note, partial() returns a function object, not the result of the function
    return df
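`partial()` uses the symmetric (central) difference $[f(x+h) - f(x-h)]/(2h)$, whose truncation error shrinks like $h^2$, rather than the one-sided difference $[f(x+h) - f(x)]/h$, whose error shrinks only like $h$. A quick check on $\sin$ at $x = 1$, where the exact derivative is $\cos 1$ (the helper names are illustrative):

```python
import math

def forward_diff(g, x, h):
    # one-sided difference: truncation error O(h)
    return (g(x + h) - g(x)) / h

def central_diff(g, x, h):
    # symmetric difference, as in partial(): truncation error O(h**2)
    return (g(x + h) - g(x - h)) / (2.0 * h)

h = 1e-3
exact = math.cos(1.0)
err_forward = abs(forward_diff(math.sin, 1.0, h) - exact)
err_central = abs(central_diff(math.sin, 1.0, h) - exact)
print(err_forward, err_central)
```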

### The gradient

$$ \mathbf{\nabla} f = \begin{bmatrix}
\dfrac{\partial f}{\partial x_{1}} \\[2.2ex]
\dfrac{\partial f}{\partial x_{2}} \\[2.2ex]
\vdots \\[2.2ex]
\dfrac{\partial f}{\partial x_{n}}
\end{bmatrix}$$

In [0]:
def gradient(f, x, h = 1e-6):
    ''' return the gradient of f(x) as a column vector with length = len(x) '''
    v = [ partial(f, i, h = h)(x)
         for i in range(len(x)) ]

    return ml.matrix(v, dtype = np.float64).T
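NumPy's documentation no longer recommends the `np.matrix` class (which `ml.matrix` returns), even for linear algebra. A sketch of the same central-difference gradient that returns a plain 1-D array instead, checked against the analytic gradient of the second test function used later in this lecture, $1 - x_0 e^{-x_0} + (x_1 - 2)^2$. The name `gradient_1d` is hypothetical:

```python
import numpy as np

def gradient_1d(f, x, h=1e-6):
    # Hypothetical helper: central-difference gradient returned as a plain 1-D ndarray
    x = np.asarray(x, dtype=np.float64)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h                                  # displacement along x[i] only
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

fharder = lambda x: 1 - x[0] * np.exp(-x[0]) + (x[1] - 2)**2
point = np.array([1.8, 2.8])
# analytic gradient: ((x0 - 1) e^{-x0}, 2 (x1 - 2))
analytic = np.array([(point[0] - 1) * np.exp(-point[0]), 2 * (point[1] - 2)])
numeric = gradient_1d(fharder, point)
print(numeric, analytic)
```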

### The Hessian matrix

$$\mathbf{H} f = \begin{bmatrix}
\dfrac{\partial^{2}f}{\partial x_{1}^{2}} & \dfrac{\partial^{2}f}{\partial x_{1}\,\partial x_{2}} & \cdots & \dfrac {\partial^{2}f}{\partial x_{1}\,\partial x_{n}} \\[2.2ex]
\dfrac{\partial^{2}f}{\partial x_{2}\,\partial x_{1}} & \dfrac {\partial^{2}f}{\partial x_{2}^{2}} & \cdots & \dfrac {\partial^{2}f}{\partial x_{2}\,\partial x_{n}} \\[2.2ex]
\vdots & \vdots &\ddots &\vdots \\[2.2ex]
\dfrac{\partial^{2}f}{\partial x_{n}\,\partial x_{1}} & \dfrac {\partial^{2}f}{\partial x_{n}\,\partial x_{2}} & \cdots & \dfrac {\partial^{2}f}{\partial x_{n}^{2}}
\end{bmatrix}$$

In [0]:
def hessian(f, x, h = 1e-6):
    ''' returns the two dimensional matrix of second partial derivatives of f(x). a.k.a. the Hessian matrix '''
    v = [ 
          [ 
              partial(partial(f, column, h = h), row, h = h)(x) # d/dx_row (d/dx_column f), both with step h
           for column in range(len(x)) ]
        for row in range(len(x)) ]
    
    return ml.matrix(v, dtype = np.float64)
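With the default `h = 1e-6`, each second derivative divides differences of $f$ by $4h^2 = 4\times10^{-12}$, so round-off errors in $f$ of order $10^{-16}$ are amplified to errors of order $10^{-5}$ to $10^{-4}$; a larger step around $10^{-4}$ is a common compromise for second derivatives. A sketch with plain arrays (hypothetical helper `hessian_fd`, which writes out the same nested central differences as `hessian()`) on the first test function below, whose exact Hessian is $\mathrm{diag}(2, 2)$:

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    # Hypothetical helper: nested central differences written out explicitly,
    # H[r, c] = [f(x+h e_r+h e_c) - f(x+h e_r-h e_c) - f(x-h e_r+h e_c) + f(x-h e_r-h e_c)] / (4 h^2)
    x = np.asarray(x, dtype=np.float64)
    E = h * np.eye(x.size)                        # row k is h times the unit vector e_k
    H = np.empty((x.size, x.size))
    for r in range(x.size):
        for c in range(x.size):
            H[r, c] = (f(x + E[r] + E[c]) - f(x + E[r] - E[c])
                       - f(x - E[r] + E[c]) + f(x - E[r] - E[c])) / (4.0 * h * h)
    return H

f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2
H_exact = np.diag([2.0, 2.0])
for h in (1e-6, 1e-4):
    # largest deviation from the exact Hessian for each step size
    print(h, np.max(np.abs(hessian_fd(f, [1.0, 1.0], h) - H_exact)))
```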

#### A test function: $f(\vec{x})=(x_0-1)^2 + (x_1-2)^2$

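We build the plotting grid with `numpy.mgrid` instead of `numpy.meshgrid`. In `np.mgrid[-2.5:3:101j, 0:4:101j]`, the imaginary step `101j` means "101 points, endpoints included", as in `np.linspace`. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.mgrid.html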
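To see what the complex step does, here is a sketch comparing `np.mgrid` with the equivalent `np.meshgrid` call. `indexing="ij"` is needed because `mgrid` puts the first coordinate along the first array axis:

```python
import numpy as np

# complex "step" 101j: 101 evenly spaced points, endpoint included
x0, x1 = np.mgrid[-2.5:3:101j, 0:4:101j]

# the same grid built with meshgrid and linspace
m0, m1 = np.meshgrid(np.linspace(-2.5, 3, 101), np.linspace(0, 4, 101), indexing="ij")

print(x0.shape, np.allclose(x0, m0), np.allclose(x1, m1))
```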

In [0]:
f = lambda x : (x[0] - 1)**2 + (x[1] - 2)**2
x0, x1 = np.mgrid[-2.5:3:101j, 0:4:101j]

In [0]:
print(x0, x1)

In [0]:
plt.contourf(x0, x1, f([x0,x1]), 20)
plt.colorbar()

#### The `multi_newton` function

Remember what we are trying to obtain! In analogy with the one-dimensional function increment $\Delta x = -f'(x) / f''(x)$ we now want

$$ \Delta\vec{x} = - [H f(\vec{x})]^{-1} \nabla f(\vec{x}) $$

In [0]:
def multi_newton(f, xguess, h = 1e-6, accuracy = 1e-6, nmax = 100, want_points = False, debug = False, gamma = 1.0):
    '''
        Minimize f(x) with the (damped) Newton iteration x <- x - gamma * [H f(x)]^{-1} grad f(x).

        Stops when the step length falls below accuracy; raises ArithmeticError after nmax iterations.
        With want_points = True it also returns the x[0] and x[1] coordinates of every iterate.
    '''
    xbar = ml.matrix(xguess, dtype = np.float64).T # to get a column vector
    xpoints = [xbar.A1[0]]
    ypoints = [xbar.A1[1]]
    for i in range(nmax):
        H = hessian(f, xbar.A1, h = h)
        grad = gradient(f, xbar.A1, h = h) # column vector
        step = H.I.dot(grad) # full Newton step [H f]^{-1} grad f
        if debug:
            print("=========")
            print("iteration: ", i)
            print("xbar =", xbar)
            print("grad =", grad)
            print("H =", H)
            print("H.I =", H.I)
            print("H.I.dot(grad) =", step)
        x = xbar - gamma * step
        if debug: print("x =", x)
        xpoints.append(x.A1[0])
        ypoints.append(x.A1[1])
        if np.sum((x.A1 - xbar.A1)**2) < accuracy**2:
            if not want_points:
                return x.A1
            else:
                return x.A1, xpoints, ypoints
        else:
            xbar = x
    raise ArithmeticError("Failed to converge")

In [0]:
multi_newton(f, [1,1])

In [0]:
multi_newton(f, [1,1], debug = True)

In [0]:
minx, x0points, x1points = multi_newton(f, [1,1], want_points = True, gamma = 1.0)

plt.contourf(x0, x1, f([x0,x1]), 20)
plt.colorbar()

plt.plot(x0points, x1points, '-or')

#### A somewhat more difficult test function: $ f(\vec{x}) = 1 - x_0 e^{-x_0} + (x_1 - 2)^2 $



In [0]:
fharder = lambda x : 1 - x[0] * np.exp(-x[0]) + (x[1] - 2)**2
x0, x1 = np.mgrid[-2.5:4:101j, 0:4:101j]

In [0]:
plt.contourf(x0, x1, fharder([x0,x1]), 20)
plt.colorbar()

In [0]:
plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

#### Testing `multi_newton` on the new test function

In [0]:
minx, x0points, x1points = multi_newton(fharder, [1.8, 2.8], want_points = True)

plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

print(x0points, x1points)
plt.plot(x0points, x1points, "-or")

In [0]:
minx = multi_newton(fharder, [1.8, 2.8], debug = True)

In [0]:
minx, x0points, x1points = multi_newton(fharder, [1.8, 2.8], want_points = True, gamma = 0.5)

plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

print(x0points, x1points)
plt.plot(x0points, x1points, "-or")
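Because the Hessian of `fharder` is diagonal, its exact Newton step can be worked out by hand, which helps explain the runs above. Along $x_0$, $\partial^2 f/\partial x_0^2 = (2 - x_0)e^{-x_0}$ is small near $x_0 = 1.8$ and changes sign at $x_0 = 2$, so the step $-(x_0 - 1)/(2 - x_0)$ is large near the start (it overshoots the minimum at $x_0 = 1$) and, for $x_0 > 2$, points away from the minimum. Along $x_1$ the function is exactly quadratic, so a full step reaches $x_1 = 2$ at once. A sketch of these exact steps; the helper `newton_step` is illustrative, and `multi_newton` uses numerical derivatives, so its steps agree only approximately:

```python
def newton_step(x0, x1, gamma=1.0):
    # Exact (damped) Newton step for fharder; the Hessian is diagonal, so the coordinates decouple:
    #   x_0: first derivative (x0 - 1) e^{-x0}, second derivative (2 - x0) e^{-x0}
    #   x_1: first derivative 2 (x1 - 2),       second derivative 2
    new_x0 = x0 - gamma * (x0 - 1.0) / (2.0 - x0)
    new_x1 = x1 - gamma * (x1 - 2.0)
    return new_x0, new_x1

print(newton_step(1.8, 2.8))              # full step from the starting point used above
print(newton_step(1.8, 2.8, gamma=0.5))   # damped step, as in the gamma = 0.5 run
```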

## Other multi-dimensional minimization algorithms you will encounter

### Gradient-Descent Algorithm

We have seen how the Hessian can both improve and degrade the performance of the multi-dimensional Newton method. The gradient descent algorithm uses only the gradient: in effect it replaces the inverse Hessian by the identity matrix, scaled by a step size $\gamma$ (the `gamma` argument below),

$$ \Delta\vec{x} = - \gamma \nabla f(\vec{x}) $$

In [0]:
def multi_gradient(f, xguess, h = 1e-6, accuracy = 1e-6, nmax = 100, want_points = False, debug = False, gamma = 1.0):
    ''' Minimize f(x) by fixed-step gradient descent, x <- x - gamma * grad f(x); same interface as multi_newton. '''
    xbar = ml.matrix(xguess, dtype = np.float64).T # to get a column vector
    xpoints = [xbar.A1[0]]
    ypoints = [xbar.A1[1]]
    for i in range(nmax):
        grad = gradient(f, xbar.A1, h = h) # column vector
        if debug:
            print("=========")
            print("iteration: ", i)
            print("xbar =", xbar)
            print("grad =", grad)
        x = xbar - gamma * grad
        if debug: print("x =", x)
        xpoints.append(x.A1[0])
        ypoints.append(x.A1[1])
        if np.sum((x.A1 - xbar.A1)**2) < accuracy**2:
            if not want_points:
                return x.A1
            else:
                return x.A1, xpoints, ypoints
        else:
            xbar = x
    raise ArithmeticError("Failed to converge")

#### A test function: $f(\vec{x})=(x_0-1)^2 + (x_1-2)^2$

In [0]:
f = lambda x : (x[0] - 1)**2 + (x[1] - 2)**2
x0, x1 = np.mgrid[-2.5:3:101j, 0:4:101j]

In [0]:
plt.contourf(x0, x1, f([x0,x1]), 20)
plt.colorbar()

In [0]:
multi_gradient(f, [1,1])

In [0]:
multi_gradient(f, [1,1], gamma = 0.5)

In [0]:
multi_gradient(f, [1,1], debug = True, gamma = 0.25)

In [0]:
multi_gradient(f, [1,1], debug = True, gamma = 0.5)
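For this test function the behaviour of fixed-step gradient descent can be predicted exactly. With $\nabla f = 2(\vec{x} - \vec{c})$ and $\vec{c} = (1, 2)$, one step gives $\vec{x}_{n+1} - \vec{c} = (1 - 2\gamma)(\vec{x}_n - \vec{c})$. So $\gamma = 1$ only flips the sign of the error, bouncing between $(1, 1)$ and $(1, 3)$; in exact arithmetic, `multi_gradient(f, [1,1])` with its default `gamma = 1.0` never meets its convergence test. $\gamma = 0.5$ reaches the minimum in one step, and $\gamma = 0.25$ halves the error at every step. A sketch using the analytic gradient (helper names are illustrative):

```python
import numpy as np

c = np.array([1.0, 2.0])                 # location of the minimum of f
grad = lambda x: 2.0 * (x - c)           # analytic gradient of (x0-1)^2 + (x1-2)^2

def descend(x, gamma, steps):
    # fixed-step gradient descent: x <- x - gamma * grad f(x), repeated `steps` times
    for _ in range(steps):
        x = x - gamma * grad(x)
    return x

x_start = np.array([1.0, 1.0])
for gamma in (1.0, 0.5, 0.25):
    print(gamma, descend(x_start, gamma, 1), descend(x_start, gamma, 2))
```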

In [0]:
minx, x0points, x1points = multi_gradient(f, [1,1], want_points = True, gamma = 0.25)

plt.contourf(x0, x1, f([x0,x1]), 20)
plt.colorbar()

plt.plot(x0points, x1points, '-or')

#### A somewhat more difficult test function: $ f(\vec{x}) = 1 - x_0 e^{-x_0} + (x_1 - 2)^2 $



In [0]:
fharder = lambda x : 1 - x[0] * np.exp(-x[0]) + (x[1] - 2)**2
x0, x1 = np.mgrid[-2.5:4:101j, 0:4:101j]

In [0]:
plt.contourf(x0, x1, fharder([x0,x1]), 20)
plt.colorbar()

In [0]:
plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

#### Testing `multi_gradient` on the new test function

In [0]:
minx, x0points, x1points = multi_gradient(fharder, [1.8, 2.8], want_points = True, gamma = 0.35)

plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

print(x0points, x1points)
plt.plot(x0points, x1points, "-or")

In [0]:
minx = multi_gradient(fharder, [1.8, 2.8], gamma = 0.25)

In [0]:
minx = multi_gradient(fharder, [1.8, 2.8], debug = True, gamma = 0.25)

In [0]:
minx, x0points, x1points = multi_gradient(fharder, [1.8, 2.8], want_points = True, gamma = 0.35)

plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.colorbar()

print(x0points, x1points)
plt.plot(x0points, x1points, "-or")

#### An even more difficult test function: $ f(\vec{x}) = (1 - x_0)^2 + 100 (x_1 - x_0^2)^2 $



In [0]:
fharder = lambda x : (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
x0, x1 = np.mgrid[-1.0:1.5:101j, -0.5:1.5:101j]

This is the Rosenbrock function. It has a minimum at $(1,1)$ where $f(\vec{x}) = 0$. Everywhere else, $f(\vec{x}) > 0$.

In [0]:
print(fharder([1, 1]))
print(fharder([1.01, 1.01]))

In [0]:
plt.contourf(x0, x1, fharder([x0,x1]), 20)
plt.plot(1, 1, 'ob')
plt.colorbar()

In [0]:
plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.plot(1, 1, 'ob')
plt.colorbar()

#### Testing `multi_gradient` on the new test function

In [0]:
minx, x0points, x1points = multi_gradient(fharder, [0.0, 0.0], want_points = True, debug = True, gamma = 0.05)

plt.contour(x0, x1, fharder([x0,x1]), 100)
plt.plot(0, 0, 'or')
plt.plot(1, 1, 'ob')
plt.colorbar()

print(x0points, x1points)
plt.plot(x0points, x1points, "-or")
plt.xlim(-1, 1)
plt.ylim(-0.5, 1.5)
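Why a fixed $\gamma$ struggles here: in the quadratic approximation, fixed-step gradient descent multiplies the error along each Hessian eigenvector by $1 - \gamma\lambda$, so it is stable only for $\gamma < 2/\lambda_{\max}$. A sketch with the analytic gradient and Hessian of the Rosenbrock function (hypothetical helpers `rosen_grad` and `rosen_hess`) shows that at the starting point $(0, 0)$ the eigenvalues are 2 and 200, so $\gamma = 0.05$ gives a factor $1 - 200 \times 0.05 = -9$ along $x_1$. Near the minimum $(1, 1)$ the eigenvalues are about 0.4 and 1000, so any stable $\gamma$ (below about 0.002) makes very slow progress along the curved valley:

```python
import numpy as np

def rosen_grad(x):
    # analytic gradient of f = (1 - x0)^2 + 100 (x1 - x0^2)^2
    return np.array([-2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
                     200.0 * (x[1] - x[0]**2)])

def rosen_hess(x):
    # analytic Hessian of the same function
    return np.array([[2.0 - 400.0 * x[1] + 1200.0 * x[0]**2, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

for point in ([0.0, 0.0], [1.0, 1.0]):
    lam = np.linalg.eigvalsh(rosen_hess(np.array(point)))   # eigenvalues, ascending
    print(point, "grad:", rosen_grad(np.array(point)),
          "eigenvalues:", lam, "stable gamma below:", 2.0 / lam[-1])
```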

#### Gradient descent with automatic determination of $\gamma_n$ in each step
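One common way to choose $\gamma_n$ automatically is a backtracking line search: start each step with a trial $\gamma$ and shrink it until the Armijo sufficient-decrease condition $f(\vec{x} - \gamma\nabla f) \le f(\vec{x}) - c\,\gamma\,|\nabla f|^2$ holds. This is a sketch of that particular technique, not necessarily the scheme intended here (other choices, such as the Barzilai-Borwein step, also exist); the function name and the defaults `gamma0 = 1`, `shrink = 0.5`, `c = 1e-4` are illustrative assumptions:

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x, gamma0=1.0, shrink=0.5, c=1e-4,
                                  accuracy=1e-8, nmax=5000):
    # Gradient descent in which gamma_n is chosen anew at every step by backtracking:
    # start from gamma0 and shrink it until
    #     f(x - gamma g) <= f(x) - c * gamma * |g|^2      (Armijo condition)
    # holds, so f never increases from one iterate to the next.
    x = np.asarray(x, dtype=np.float64)
    fx = f(x)
    history = [fx]
    for _ in range(nmax):
        g = grad(x)
        gg = g @ g
        if np.sqrt(gg) < accuracy:        # gradient (nearly) zero: stop
            break
        gamma = gamma0
        while f(x - gamma * g) > fx - c * gamma * gg:
            gamma *= shrink
            if gamma < 1e-16:             # no acceptable step found: give up
                return x, history
        x = x - gamma * g
        fx = f(x)
        history.append(fx)
    return x, history

rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                                 200 * (x[1] - x[0]**2)])

xmin, history = gradient_descent_backtracking(rosen, rosen_grad, [0.0, 0.0])
print(xmin, history[-1], len(history) - 1)   # final point, final f, number of steps
```

The line search guarantees a decrease at every step, but it cannot fix the poor conditioning of the Rosenbrock valley, so progress toward $(1, 1)$ remains slow.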

#### Minimization methods for multidimensional curve fitting: a special case of a general problem

We want to determine the minimum of the function $\chi^2(\{\vec{x}_i\}, \{y_i\}, \{\delta y_i\}, \vec{\theta})$ for a set of data points $\{y_i\}$, with uncertainties $\{\delta y_i\}$, measured at a set of independent variables $\{\vec{x}_i\}$. We find this minimum by varying the parameter vector $\vec{\theta}$.

$$ \chi^2(\{\vec{x}_i\}, \{y_i\}, \{\delta y_i\}, \vec{\theta}) = \sum_i \frac{\big(y_i - f(\vec{x}_i, \vec{\theta})\big)^2}{\delta y_i^2} $$

We want to find $\vec{\theta}$ such that $\chi^2$ is minimized. Think of $\chi^2$ as a function of $\vec{\theta}$ only, with $\{\vec{x}_i\}$, $\{y_i\}$, and $\{\delta y_i\}$ given.

Let's create a set of independent variables, $\vec{x}_i$.

In [0]:
N = 100 # number of observations
D = 1 # dimensionality of independent variables

x = np.random.rand(N, D)
print(x[:10])

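Let's now create a set of observations, $y_i$, from an exact linear dependence plus Gaussian noise of width $\delta y$. The exact values are

$$ y_i = f(\vec{x}_i,\vec{\theta}) = \sum_{k=0}^{D-1} \theta_k (\vec{x}_i)_k + \theta_D $$

that is, the first $D$ components of $\vec{\theta}$ are slopes and the last component, $\theta_D$, is an offset.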

In [0]:
theta_exact = [2.0] * D + [3.0]
print(theta_exact)

In [0]:
f = lambda x, theta: theta[D] + np.matmul(x, theta[:D])

y_exact = f(x, theta_exact)

delta_y = 0.25 # measurement uncertainty on y
y_observed = np.random.normal(y_exact, delta_y)

print(y_exact[:10])
print(y_observed[:10])

In [0]:
plt.errorbar(x[:,0], y_observed, yerr = delta_y, fmt = 'ok', ms = 6)

In [0]:
def chi2(theta):
    ''' chi^2 of the observations for one parameter vector theta (uses the globals x, y_observed, delta_y) '''
    res = y_observed - f(x, theta)
    return np.sum(res**2 / delta_y**2)

In [0]:
y_observed

In [0]:
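theta = [1.0, 2.0] # a trial parameter vector: slope theta[0], offset theta[D]
print(theta[:D], theta[D])
print(chi2(theta), chi2(theta_exact)) # at theta_exact, chi^2 should be of order N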

In [0]:
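theta0, theta1 = np.mgrid[-4.0:4.0:201j, -4.0:4.0:201j]
# chi2() takes one parameter vector at a time, so evaluate it point by point on the grid (D = 1 here)
chi2_grid = np.vectorize(lambda t0, t1: chi2([t0, t1]))(theta0, theta1)
plt.contour(theta0, theta1, chi2_grid, 100)
plt.colorbar()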
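Because this model is linear in $\vec{\theta}$, $\chi^2$ is a quadratic function of $\vec{\theta}$: its contours are ellipses, and its minimum can be computed directly by linear least squares (a single Newton step would also land on it). A sketch with `np.linalg.lstsq` on freshly simulated data; the seeded generator is an assumption added for reproducibility, and with a common uncertainty $\delta y$ the weights $1/\delta y^2$ do not move the location of the minimum:

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded generator (assumption, for reproducibility)
N, delta_y = 100, 0.25
x = rng.random(N)                        # D = 1 independent variable in [0, 1)
y = rng.normal(2.0 * x + 3.0, delta_y)   # same model as above with theta = (2, 3)

# With a common uncertainty, chi^2 = |y - A theta|^2 / delta_y^2 with design matrix A = [x, 1],
# so the minimizing theta is the ordinary linear least-squares solution.
A = np.column_stack([x, np.ones(N)])
theta_best, *_ = np.linalg.lstsq(A, y, rcond=None)
chi2_min = np.sum((y - A @ theta_best)**2) / delta_y**2
print(theta_best, chi2_min)
```

For a reasonable fit, $\chi^2_{\min}$ is typically close to the number of degrees of freedom, $N - (D + 1)$.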

### Levenberg-Marquardt Algorithm

This is the algorithm used by `scipy.optimize.leastsq`, which wraps the MINPACK implementation.

We want to minimize a sum of squares:

$$ S(\vec{\theta}) = \sum_{i=1}^N \big(y_i - f(\vec{x}_i, \vec{\theta})\big)^2 $$
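Levenberg-Marquardt blends Gauss-Newton steps (which approximate the Hessian of $S$ by $2J^TJ$, with $J$ the Jacobian of the residuals $r_i = y_i - f(\vec{x}_i, \vec{\theta})$) with short, gradient-descent-like steps, adjusting a damping parameter $\lambda$ as it goes. Below is a minimal sketch of that idea in plain NumPy, not SciPy's MINPACK code; the damping schedule (divide or multiply $\lambda$ by 10), the helper names, and the exponential-decay test model are illustrative assumptions:

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta, lam=1e-3, nmax=100, accuracy=1e-10):
    # Minimize S(theta) = sum(residual(theta)**2).
    # Each step solves (J^T J + lam * diag(J^T J)) dtheta = -J^T r (Marquardt's scaling);
    # lam shrinks after a successful step (toward Gauss-Newton) and grows after a failed
    # one (toward short, gradient-descent-like steps).
    theta = np.asarray(theta, dtype=np.float64)
    r = residual(theta)
    S = r @ r
    for _ in range(nmax):
        J = jacobian(theta)
        JTJ = J.T @ J
        step = np.linalg.solve(JTJ + lam * np.diag(np.diag(JTJ)), -J.T @ r)
        r_new = residual(theta + step)
        S_new = r_new @ r_new
        if S_new < S:                     # accept the step
            theta, r, S = theta + step, r_new, S_new
            lam /= 10.0
            if np.sqrt(step @ step) < accuracy:
                break
        else:                             # reject it and damp more strongly
            lam *= 10.0
    return theta

# Hypothetical example: fit y = theta0 * exp(-theta1 * t) to noise-free data
t = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * t)
residual = lambda th: y - th[0] * np.exp(-th[1] * t)
jacobian = lambda th: np.column_stack([-np.exp(-th[1] * t), th[0] * t * np.exp(-th[1] * t)])
theta_fit = levenberg_marquardt(residual, jacobian, [1.0, 0.5])
print(theta_fit)
```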