# Nonlinear optimization: preconditioning

## Introduction to optimization and operations research

Michel Bierlaire


In [None]:

import numpy as np



Consider the function $f:\mathbb{R}^2 \to \mathbb{R}$ defined as
$$
f(x)= \frac{1}{2}x_1^2 + \frac{101}{2} x_2^2 +  x_1 x_2.
$$

# Question 1
Implement a function in Python that calculates the function and its first and second derivatives.

In [None]:
def the_function(x: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]:
    """Calculates the function and its derivatives

    :param x: a vector of dimension 2
    :return: a tuple with the value of the function, the gradient and the second derivatives matrix
    """
    f = ????
    g = ????
    h = ????
    return f, g, h



Test it at the point $(1, 1)$

In [None]:
x = np.array([1, 1])
function, gradient, hessian = the_function(x)
print(f'f(x)={function}')
print(f'gradient(x)={gradient}')
print(f'hessian(x)=\n{hessian}')
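
Because $f$ is quadratic, it can be written as $f(x) = \frac{1}{2} x^T Q x$ with a constant symmetric matrix $Q$, so that the gradient is $Qx$ and the second derivatives matrix is $Q$. The block below is a minimal sketch of one possible implementation under that observation. The names `Q_sketch` and `quadratic_with_derivatives` are hypothetical and are not part of the notebook.

```python
import numpy as np

# Assumption: f(x) = 0.5 * x^T Q x, with Q read off the coefficients of f.
Q_sketch = np.array([[1.0, 1.0], [1.0, 101.0]])


def quadratic_with_derivatives(x: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]:
    """Return f(x), its gradient Q x and its (constant) Hessian Q."""
    g = Q_sketch @ x
    f = 0.5 * float(x @ g)
    return f, g, Q_sketch.copy()


f_sketch, g_sketch, h_sketch = quadratic_with_derivatives(np.array([1.0, 1.0]))
print(f'f(x)={f_sketch}')
print(f'gradient(x)={g_sketch}')
print(f'hessian(x)=\n{h_sketch}')
```

At $(1, 1)$, a hand calculation gives $f(x) = \frac{1}{2} + \frac{101}{2} + 1 = 52$ and $\nabla f(x) = (2, 102)$, which can be used to check any implementation.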


# Question 2
Consider the change of variables
$$
x' = L_k^T x,
$$
where $L_k$ is a nonsingular matrix, and the function expressed in the new variables:
$$
\tilde{f}(x') = f(L_k^{-T} x').
$$

The gradient of the new function is
$$ \nabla \tilde{f}(x') = L_k^{-1} \nabla f(L_k^{-T} x'),$$ that is, the solution of the system:
$$L_k \nabla \tilde{f}(x') = \nabla f(L_k^{-T} x').$$

The Hessian of the new function is
$$ \nabla^2 \tilde{f}(x') = L_k^{-1} \nabla^2 f(L_k^{-T} x') L_k^{-T},$$
that is, the solution of the system
$$L_k \nabla^2 \tilde{f}(x') =  D^T_k,$$
where $D_k$ is the solution of the system
$$\nabla^2 f(L_k^{-T} x') = L_k D_k.$$

Implement a Python function that calculates the preconditioned function $\tilde{f}$ and its first and second derivatives.

In [None]:


def preconditioned_function(
    x: np.ndarray, l_k: np.ndarray
) -> tuple[float, np.ndarray, np.ndarray]:
    """Calculates the preconditioned function and its first and second derivatives.

    :param x: a vector of dimension 2, expressed in the new variables x'.
    :param l_k: matrix defining the change of variables.
    :return: a tuple with the value of the function, the gradient and the Hessian.
    """
    x_original = ????
    f, g, h = ????
    the_gradient = ????
    d_k = ????
    the_hessian = ????
    return f, the_gradient, the_hessian
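
The formulas above never require an explicit inverse of $L_k$: each quantity is the solution of a linear system. The block below is a minimal sketch of that idea using `np.linalg.solve`. The names `quadratic_with_derivatives`, `preconditioned_sketch` and `l_demo` are hypothetical, and `l_demo` is an arbitrary nonsingular lower triangular matrix chosen only for illustration.

```python
import numpy as np

Q_sketch = np.array([[1.0, 1.0], [1.0, 101.0]])


def quadratic_with_derivatives(x: np.ndarray) -> tuple[float, np.ndarray, np.ndarray]:
    """Return f(x) = 0.5 x^T Q x, its gradient and its Hessian."""
    g = Q_sketch @ x
    return 0.5 * float(x @ g), g, Q_sketch.copy()


def preconditioned_sketch(
    x_prime: np.ndarray, l_k: np.ndarray
) -> tuple[float, np.ndarray, np.ndarray]:
    """Evaluate f~(x') = f(L_k^{-T} x') and its derivatives by solving systems."""
    # Back to the original variables: L_k^T x = x'.
    x_original = np.linalg.solve(l_k.T, x_prime)
    f, g, h = quadratic_with_derivatives(x_original)
    # Gradient: L_k grad = g.
    grad = np.linalg.solve(l_k, g)
    # Hessian in two steps: L_k D_k = h, then L_k hess = D_k^T.
    d_k = np.linalg.solve(l_k, h)
    hess = np.linalg.solve(l_k, d_k.T)
    return f, grad, hess


# Arbitrary illustrative matrix (an assumption, not the Cholesky factor).
l_demo = np.array([[2.0, 0.0], [1.0, 3.0]])
x_demo = np.array([1.0, 1.0])
f_demo, grad_demo, hess_demo = preconditioned_sketch(l_demo.T @ x_demo, l_demo)
print(f_demo)
print(grad_demo)
print(hess_demo)
```

Since $L_k$ is triangular in the application that follows, a dedicated triangular solver would be cheaper than a general solve. The general routine is used here only to keep the sketch short.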



Consider $L_k$ to be the Cholesky factor of the second derivative matrix
$$
L_k L_k^T=  \nabla^2 f(x_k).
$$

In [None]:
l_k = ????
print(l_k)


Check that it is indeed the Cholesky factor

In [None]:
print(l_k @ l_k.T)


The product must be equal to the Hessian.

In [None]:
print(hessian)
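
One way to obtain the Cholesky factor is `np.linalg.cholesky`, which returns the lower triangular factor $L$ such that $L L^T$ equals its argument. The block below is a small sketch of that call and of the verification; the names `hessian_sketch` and `l_sketch` are hypothetical.

```python
import numpy as np

# Hessian of f, which is constant because f is quadratic.
hessian_sketch = np.array([[1.0, 1.0], [1.0, 101.0]])

# np.linalg.cholesky returns the lower triangular factor.
l_sketch = np.linalg.cholesky(hessian_sketch)
print(l_sketch)

# Verification: the product L L^T should reproduce the Hessian.
print(np.allclose(l_sketch @ l_sketch.T, hessian_sketch))
```

By hand, the factor is $L = \begin{pmatrix} 1 & 0 \\ 1 & 10 \end{pmatrix}$, since $1 \cdot 1 + 10 \cdot 10 = 101$.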


Evaluate the preconditioned function at the point $x'=L_k^T x$, where $x=(1,1)$

In [None]:
prec_x = ????
print(f'{prec_x=}')


In [None]:
prec_function, prec_gradient, prec_hessian = preconditioned_function(prec_x, l_k)
print(f'Preconditioned f(x)={prec_function}')
print(f'Preconditioned gradient(x)={prec_gradient}')
print(f'Preconditioned hessian(x)=\n{prec_hessian}')


# Question 3
Apply one iteration of the steepest descent
algorithm on $\tilde{f}$ from that point, that is
$$
x'_{k+1} = x'_k - \alpha \nabla \tilde{f}(x'_k),
$$
where the step size is
$$
\alpha = \frac{\nabla \tilde{f}(x'_k)^T \nabla \tilde{f}(x'_k)}{\nabla
\tilde{f}(x'_k)^T \nabla^2 \tilde{f}(x'_k) \nabla \tilde{f}(x'_k)}.
$$
The resulting iterate $x'_{k+1}$ is the Cauchy point.

In [None]:
alpha = ????


print(f'{alpha=:.3g}')







In [None]:
new_prec_x = ????
print(f'{new_prec_x=}')


Identify the corresponding point in the original variables.

In [None]:
new_x = ????
print(f'{new_x=}')
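
The whole iteration can be sketched in a few lines. The block below is one possible way to carry out the step, assuming the quadratic form $f(x) = \frac{1}{2} x^T Q x$ and the Cholesky factor of $Q$ as $L_k$; every name ending in `_sketch` is hypothetical and chosen to avoid clashing with the notebook's variables.

```python
import numpy as np

Q_sketch = np.array([[1.0, 1.0], [1.0, 101.0]])
l_k_sketch = np.linalg.cholesky(Q_sketch)

# Starting point in the original and in the new variables.
x_sketch = np.array([1.0, 1.0])
prec_x_sketch = l_k_sketch.T @ x_sketch

# Derivatives of the preconditioned function, obtained by solving systems.
grad_sketch = np.linalg.solve(l_k_sketch, Q_sketch @ x_sketch)
d_k_sketch = np.linalg.solve(l_k_sketch, Q_sketch)
hess_sketch = np.linalg.solve(l_k_sketch, d_k_sketch.T)

# Step size of the Cauchy point.
alpha_sketch = (grad_sketch @ grad_sketch) / (grad_sketch @ hess_sketch @ grad_sketch)

# Steepest descent step in the new variables, then back to the original ones.
new_prec_x_sketch = prec_x_sketch - alpha_sketch * grad_sketch
new_x_sketch = np.linalg.solve(l_k_sketch.T, new_prec_x_sketch)

print(f'{alpha_sketch=:.3g}')
print(f'{new_prec_x_sketch=}')
print(f'{new_x_sketch=}')
```

With this choice of $L_k$, the preconditioned Hessian is the identity, so the step size is 1 and a single steepest descent step lands on the minimizer $x = (0, 0)$, up to rounding.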