# Example: Coefficient field inversion in an elliptic partial differential equation

We consider the estimation of a coefficient in an elliptic partial
differential equation as a first model problem. Depending on the
interpretation of the unknowns and the type of measurements, this
model problem arises, for instance, in inversion for groundwater flow
or heat conductivity.  It can also be interpreted as finding a
membrane with a certain spatially varying stiffness. Let
$\Omega\subset\mathbb{R}^n$, $n\in\{1,2,3\}$ be an open, bounded
domain and consider the following problem:

$$
\min_{a} J(a):=\frac{1}{2}\int_\Omega (u-u_d)^2\, dx + \frac{\gamma}{2}\int_\Omega|\nabla a|^2\,dx,
$$

where $u$ is the solution of

$$
\begin{split}
\quad -\nabla\cdot(a\nabla u) &= f \text{ in }\Omega,\\
u &= 0 \text{ on }\partial\Omega.
\end{split}
$$

Here $a\in U_{ad}:=\{a\in L^{\infty}(\Omega)\}$ is the unknown coefficient field, $u_d$ denotes (possibly noisy) data, $f\in H^{-1}(\Omega)$ is a given force, and $\gamma\ge 0$ is the regularization parameter.

### The variational (or weak) form of the state equation:

Find $u\in H_0^1(\Omega)$ such that $(a\nabla u,\nabla v) - (f,v) = 0, \text{ for all } v\in H_0^1(\Omega),$
where $H_0^1(\Omega)$ is the space of functions vanishing on $\partial\Omega$ with square integrable derivatives. Here, $(\cdot\,,\cdot)$ denotes the $L^2$-inner product, i.e., for scalar functions $u,v$ defined on $\Omega$ we write $(u,v) := \int_\Omega u(x) v(x) \,dx$.
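
To make the weak form concrete, the following is a minimal sketch of its one-dimensional analogue, $-(a u')' = f$ on $(0,1)$ with $u(0)=u(1)=0$, discretized with linear finite elements on a uniform mesh. It assumes a coefficient that is constant on each cell and a constant force. The helpers `solve_tridiagonal` and `solve_state_1d` are hypothetical, pure-Python names introduced only for illustration; they are not part of FEniCS.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting)."""
    d, b = list(diag), list(rhs)
    for i in range(1, len(d)):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * len(d)
    x[-1] = b[-1] / d[-1]
    for i in range(len(d) - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def solve_state_1d(a_cells, f=1.0):
    """Solve (a u', v') = (f, v) with P1 elements; a_cells[k] is a on cell k."""
    n = len(a_cells)          # number of cells
    h = 1.0 / n               # uniform mesh width
    # Rows correspond to the interior nodes 1, ..., n-1.
    diag = [(a_cells[i - 1] + a_cells[i]) / h for i in range(1, n)]
    off = [-a_cells[i] / h for i in range(1, n - 1)]
    rhs = [f * h] * (n - 1)   # load vector for a constant force
    # Homogeneous Dirichlet values are appended at both ends.
    return [0.0] + solve_tridiagonal(off, diag, off, rhs) + [0.0]


# Constant coefficient a = 2 on 8 cells.
u = solve_state_1d([2.0] * 8)
```

For a constant coefficient the exact solution is $u(x) = x(1-x)/(2a)$, and the nodal values of this discretization agree with it up to rounding, which makes the sketch easy to check by hand.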

### Optimality System:

The Lagrangian functional $\mathscr{L}:L^\infty(\Omega)\times H_0^1(\Omega)\times H_0^1(\Omega)\rightarrow \mathbb{R}$, which we use as a tool to derive the optimality system, is given by

$$
\mathscr{L}(a,u,p):= \frac{1}{2}(u-u_d,u-u_d) +
\frac{\gamma}{2}(\nabla a, \nabla a) +  (a\nabla u,\nabla p) - (f,p).
$$

Lagrange multiplier theory shows that, at a solution, all variations of the Lagrangian functional with respect to all variables must vanish. The variations of $\mathscr{L}$ with respect to $(p,u,a)$ in directions $(\tilde{p}, \tilde{u}, \tilde{a})$ are given by

$$
  \begin{alignat}{2}
    \mathscr{L}_p(a,u,p)(\tilde{p})  &= (a\nabla u, \nabla \tilde{p}) -
    (f,\tilde{p}) &&= 0,\\
     \mathscr{L}_u(a,u,p)(\tilde{u}) &= (a\nabla p, \nabla \tilde{u}) +
     (u-u_d,\tilde{u}) && = 0,\\
     \mathscr{L}_a(a,u,p)(\tilde{a})  &= \gamma(\nabla a, \nabla \tilde{a}) +
     (\tilde{a}\nabla u, \nabla p) &&= 0,
  \end{alignat}
$$

where the variations $(\tilde{u}, \tilde{p}, \tilde{a})$ are taken from the same spaces as $(u,p,a)$. 

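The gradient of the cost functional $J(a)$ in a direction $\tilde a$ is therefore

$$
    \mathcal{G}(a)(\tilde a) = \gamma(\nabla a, \nabla \tilde{a}) +
     (\tilde{a}\nabla u, \nabla p),
$$

where $u$ solves the state equation $\mathscr{L}_p = 0$ and $p$ solves the adjoint equation $\mathscr{L}_u = 0$ for the given $a$.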
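
A common sanity check for an adjoint-based gradient is to compare it against finite differences. The sketch below does this for a one-dimensional analogue of the problem, under several assumptions that differ from the notebook: the coefficient is constant on each cell, the regularization is omitted ($\gamma = 0$), and the misfit integral is approximated by a simple nodal sum. All function names are hypothetical helpers written for this illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting)."""
    d, b = list(diag), list(rhs)
    for i in range(1, len(d)):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * len(d)
    x[-1] = b[-1] / d[-1]
    for i in range(len(d) - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def stiffness_solve(a_cells, rhs):
    """Solve K(a) x = rhs at the interior nodes; pad with zero boundary values."""
    n = len(a_cells)
    h = 1.0 / n
    diag = [(a_cells[i - 1] + a_cells[i]) / h for i in range(1, n)]
    off = [-a_cells[i] / h for i in range(1, n - 1)]
    return [0.0] + solve_tridiagonal(off, diag, off, rhs) + [0.0]


def misfit(a_cells, ud):
    """Return J(a) = h/2 * sum_i (u_i - ud_i)^2 and the state u (with f = 1)."""
    n = len(a_cells)
    h = 1.0 / n
    u = stiffness_solve(a_cells, [h] * (n - 1))
    return 0.5 * h * sum((ui - di) ** 2 for ui, di in zip(u, ud)), u


def adjoint_gradient(a_cells, ud):
    """Gradient of the misfit with respect to each cell value of a."""
    n = len(a_cells)
    h = 1.0 / n
    _, u = misfit(a_cells, ud)
    # Adjoint equation: (a p', v') = -(u - ud, v), discretized like the misfit.
    p = stiffness_solve(a_cells, [-h * (u[i] - ud[i]) for i in range(1, n)])
    # Entry k is the integral of u' p' over cell k, i.e. (a_tilde u', p')
    # with a_tilde the indicator function of that cell.
    return [(u[k + 1] - u[k]) * (p[k + 1] - p[k]) / h for k in range(n)]


# Synthetic data from a "true" coefficient, then a different trial coefficient.
a_true = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
_, ud = misfit(a_true, [0.0] * 9)
a = [1.5] * 8
g = adjoint_gradient(a, ud)

# Central finite differences, one cell at a time.
eps = 1e-6
fd = []
for k in range(8):
    a_plus, a_minus = list(a), list(a)
    a_plus[k] += eps
    a_minus[k] -= eps
    fd.append((misfit(a_plus, ud)[0] - misfit(a_minus, ud)[0]) / (2.0 * eps))
```

The two lists `g` and `fd` should agree entry by entry to several digits. The adjoint approach needs one extra linear solve regardless of the number of cells, whereas finite differences need two state solves per cell.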

### Goals:

By the end of this notebook, you should be able to:

- solve the forward and adjoint Poisson equations
- understand the inverse method framework
- visualise and understand the results
- modify the problem and code

### Mathematical tools used:

- Finite element method
- Derivation of the gradient via the adjoint method
- Armijo line search

### List of software used:

- <a href="http://fenicsproject.org/">FEniCS</a>, a parallel finite element library for the discretization of partial differential equations
- <a href="http://www.mcs.anl.gov/petsc/">PETSc</a>, for scalable and efficient linear algebra operations and solvers
- <a href="http://matplotlib.org/">Matplotlib</a>, a Python package used for plotting the results
- <a href="http://www.numpy.org/">NumPy</a>, a Python package for linear algebra

## Set up

### Import dependencies

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

from dolfin import *

import numpy as np
import logging


logging.getLogger('FFC').setLevel(logging.WARNING)
logging.getLogger('UFL').setLevel(logging.WARNING)
set_log_active(False)

np.random.seed(seed=1)

### Model set up:

As in the introduction, the first thing we need to do is to set up the numerical model.

In this cell, we set the mesh ``mesh``, the finite element spaces ``Va`` and ``Vu`` corresponding to the parameter space and state/adjoint space, respectively. In particular, we use linear finite elements for the parameter space, and quadratic elements for the state/adjoint space.

The true parameter ``atrue`` is the finite element interpolant of the function

$$ a_{\rm true} = \left\{ \begin{array}{l} 4 \; \forall \,(x,y) \, {\rm s.t.}\, \sqrt{ (x-.5)^2 + (y-.5)^2} \leq 0.2 \\ 8 \; {\rm otherwise}. \end{array}\right. $$
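
The `Expression` string used in the code cell below encodes this piecewise definition with a C++ comparison that evaluates to 0 or 1. As a plain-Python illustration of the same function, one might write the following; the function name `a_true_value` and its keyword arguments are hypothetical and not used elsewhere in the notebook.

```python
import math


def a_true_value(x, y, center=(0.5, 0.5), radius=0.2, inside=4.0, outside=8.0):
    """Piecewise constant coefficient: `inside` within the disk, `outside` elsewhere."""
    r = math.hypot(x - center[0], y - center[1])
    return inside if r <= radius else outside


# Sample the field at the domain center and at a corner.
center_value = a_true_value(0.5, 0.5)
corner_value = a_true_value(0.0, 0.0)
```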

The forcing term ``f`` and the boundary conditions ``u0`` for the forward problem are

$$ f = 1 \; \forall {\bf x} \in \Omega, \quad u = 0 \; \forall {\bf x} \in \partial \Omega. $$

In [None]:
# create mesh and define function spaces
nx = 32
ny = 32
mesh = UnitSquareMesh(nx, ny)
Va = FunctionSpace(mesh, 'Lagrange', 1)
Vu = FunctionSpace(mesh, 'Lagrange', 2)

# The true and inverted parameter
atrue = interpolate(Expression('8. - 4.*(pow(x[0] - 0.5,2) + pow(x[1] - 0.5,2) < pow(0.2,2))', degree=5), Va)
a = interpolate(Constant(4.),Va)

# define function for state and adjoint
u = Function(Vu)
p = Function(Vu)

# define Trial and Test Functions
u_trial, p_trial, a_trial = TrialFunction(Vu), TrialFunction(Vu), TrialFunction(Va)
u_test, p_test, a_test = TestFunction(Vu), TestFunction(Vu), TestFunction(Va)

# initialize input functions
f = Constant(1.0)
u0 = Constant(0.0)

# plot
plt.figure()
plot(mesh,title="Mesh")
plt.show()
plt.figure()
plot(atrue,title="True parameter field", mode="warp")
plt.show()

In [None]:
# set up dirichlet boundary conditions
def boundary(x,on_boundary):
    return on_boundary

bc_state = DirichletBC(Vu, u0, boundary)
bc_adj = DirichletBC(Vu, Constant(0.), boundary)

### The cost functional evaluation:

$$
J(a):=\underbrace{\frac{1}{2}\int_\Omega (u-u_d)^2\, dx}_{\text{misfit}} + \underbrace{\frac{\gamma}{2}\int_\Omega|\nabla a|^2\,dx}_{\text{reg}}
$$

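In the code below, $W$ and $R$ are symmetric matrices that stem from the finite element discretization of the misfit and regularization components of the cost functional, respectively. $W$ is positive definite; $R$ is only positive semidefinite, since constant parameter fields have zero gradient.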
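
To illustrate how such matrices turn integrals into matrix-vector products, here is a small sketch that assembles the linear finite element mass and stiffness matrices on a uniform mesh of $[0,1]$ and evaluates both terms for the nodal interpolant of $x$. The one-dimensional setting, the dense list-of-lists storage, and the names `assemble_1d` and `quadratic_form` are assumptions made for illustration; the notebook itself relies on FEniCS to assemble the two-dimensional matrices.

```python
def assemble_1d(n):
    """Dense P1 mass matrix W and stiffness matrix K on a uniform mesh of [0, 1]."""
    h = 1.0 / n
    W = [[0.0] * (n + 1) for _ in range(n + 1)]
    K = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n):                      # loop over cells [x_k, x_{k+1}]
        for i in (0, 1):
            for j in (0, 1):
                W[k + i][k + j] += h / 3.0 if i == j else h / 6.0
                K[k + i][k + j] += 1.0 / h if i == j else -1.0 / h
    return W, K


def quadratic_form(A, v):
    """Evaluate v^T A v for a dense matrix A."""
    m = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(m) for j in range(m))


n = 4
gamma = 1e-10
W, K = assemble_1d(n)
x = [i / float(n) for i in range(n + 1)]    # nodal values of the function x

misfit = 0.5 * quadratic_form(W, x)         # 1/2 * integral of x^2 = 1/6
reg = 0.5 * gamma * quadratic_form(K, x)    # gamma/2 * integral of 1 = gamma/2
constant_energy = quadratic_form(K, [1.0] * (n + 1))  # zero for constants
```

Both quadratic forms are exact here because the integrands are products of piecewise linear functions. The last line shows the null space of the stiffness matrix: a constant field contributes nothing to the regularization term.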

In [None]:
# Regularization parameter
gamma = 1e-10

# weak form for setting up the misfit and regularization components of the cost
W_equ   = inner(u_trial, u_test) * dx
R_equ   = gamma * inner(nabla_grad(a_trial), nabla_grad(a_test)) * dx

W = assemble(W_equ)
R = assemble(R_equ)

# Define cost function
def cost(u, ud, a, W, R):
    diff = u.vector() - ud.vector()
    reg = 0.5 * a.vector().inner(R*a.vector() ) 
    misfit = 0.5 * diff.inner(W * diff)
    return [reg + misfit, misfit, reg]

### Set up synthetic observations:

To generate the synthetic observation we first solve the PDE for the state variable ``utrue`` corresponding to the true parameter ``atrue``.
More specifically, we solve the variational problem

Find $u\in H_0^1(\Omega)$ such that 

$$\underbrace{(a_{\rm true} \nabla u,\nabla v)}_{\; := \; a_{\rm goal}} - \underbrace{(f,v)}_{\; := \;L_{\rm goal}} = 0, \text{ for all } v\in H_0^1(\Omega).$$

Then we perturb the true state variable and write the observation ``ud`` as

$$ u_{d} = u_{\rm true} + \eta, \quad {\rm where} \; \eta \sim \mathcal{N}(0, \sigma^2).$$

Here the standard deviation $\sigma$ equals ``noise_level`` times the maximum absolute nodal value of $u_{\rm true}$.
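
The same perturbation can be sketched without FEniCS on a plain list of nodal values. This is only an illustration under stated assumptions: the helper `add_noise`, its `boundary_idx` argument, and the use of Python's `random` module in place of NumPy are all hypothetical choices. As in the notebook, the boundary values are left unperturbed.

```python
import random


def add_noise(u_true, noise_level, boundary_idx, seed=1):
    """Return noisy data and the standard deviation used for the noise."""
    rng = random.Random(seed)
    # sigma scales with the largest absolute nodal value of the true state
    sigma = noise_level * max(abs(v) for v in u_true)
    noisy = [v + sigma * rng.gauss(0.0, 1.0) for v in u_true]
    for i in boundary_idx:
        noisy[i] = u_true[i]    # no noise on Dirichlet boundary nodes
    return noisy, sigma


# Nodal values of x(1 - x) on a 5-node mesh of [0, 1]; nodes 0 and 4 are boundary.
xs = [i / 4.0 for i in range(5)]
u_true = [x * (1.0 - x) for x in xs]
ud, sigma = add_noise(u_true, noise_level=0.01, boundary_idx=[0, 4])
```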

In [None]:
# noise level
noise_level = 0.01

# weak form for setting up the synthetic observations
a_goal = inner( atrue * nabla_grad(u_trial), nabla_grad(u_test)) * dx
L_goal = f * u_test * dx

# solve the forward/state problem to generate synthetic observations
goal_A, goal_b = assemble_system(a_goal, L_goal, bc_state)

utrue = Function(Vu)
solve(goal_A, utrue.vector(), goal_b)

ud = Function(Vu)
ud.assign(utrue)

# perturb state solution and create synthetic measurements ud
# ud = utrue + noise_level * ||utrue||_inf * N(0, 1)
MAX = ud.vector().norm("linf")
noise = Vector()
goal_A.init_vector(noise,1)
noise.set_local( noise_level * MAX * np.random.normal(0, 1, noise.local_size()) )
bc_adj.apply(noise)

ud.vector().axpy(1., noise)

# plot
plt.figure()
plot(utrue, title="State solution with atrue",mode="warp")
plt.show()
plt.figure()
plot(ud, title="Synthetic observations",mode="warp")
plt.show()

## Setting up the state equations, right hand side for the adjoint and the necessary matrices:

$$
  \begin{alignat}{2}
    \mathscr{L}_p(a,u,p)(\tilde{p})  &= (a\nabla u, \nabla \tilde{p}) -
    (f,\tilde{p}) &&= 0,\\
     \mathscr{L}_u(a,u,p)(\tilde{u}) &= (a\nabla p, \nabla \tilde{u}) +
     (u-u_d,\tilde{u}) && = 0,\\
     \mathscr{L}_a(a,u,p)(\tilde{a})  &= \gamma(\nabla a, \nabla \tilde{a}) +
     (\tilde{a}\nabla u, \nabla p) &&= 0,
  \end{alignat}
$$

In [None]:
# weak form for setting up the state equation
a_state = inner( a * nabla_grad(u_trial), nabla_grad(u_test)) * dx
L_state = f * u_test * dx

# weak form for setting up the adjoint equations
a_adj = inner( a * nabla_grad(p_trial), nabla_grad(p_test) ) * dx
L_adj = -inner(u - ud, p_test) * dx


# weak form for setting up matrices
CT_equ   = inner(a_test * nabla_grad(u), nabla_grad(p_trial)) * dx
M_equ   = inner(a_trial, a_test) * dx


# assemble matrix M
M = assemble(M_equ)

## Initial guess
We solve the state equation and compute the cost functional for the initial guess of the parameter, which is stored in ``a`` and labeled ``a_ini`` in the plots.

In [None]:
# solve state equation
A, state_b = assemble_system(a_state, L_state, bc_state)
solve(A, u.vector(), state_b)

# evaluate cost
[cost_old, misfit_old, reg_old] = cost(u, ud, a, W, R)

# plot
plt.figure()
plot(a, title="a_ini",mode="warp")
plt.show()
plt.figure()
plot(u, title="u(a_ini)",mode="warp")
plt.show()

## The steepest descent with Armijo line search:

We solve the constrained optimization problem using the steepest descent method with Armijo line search.

The stopping criterion is based on a relative reduction of the norm of the gradient (i.e. $\frac{\|g_{n}\|}{\|g_{0}\|} \leq \tau$).

The gradient is computed by solving the state and adjoint equations for the current parameter $a$, and then substituting the current state $u$, parameter $a$ and adjoint $p$ variables into the weak form expression of the gradient:

$$ (g, \tilde{a}) = \gamma(\nabla a, \nabla \tilde{a}) +(\tilde{a}\nabla u, \nabla p).$$

The Armijo line search uses backtracking to find $\alpha$ such that a sufficient reduction in the cost functional is achieved.
More specifically, we use backtracking to find $\alpha$ such that:

$$J( a - \alpha g ) \leq J(a) - \alpha c_{\rm armijo} (g,g). $$
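
The structure of this iteration can be tried out on a small problem before running it on the PDE. The sketch below applies steepest descent with Armijo backtracking to a two-variable quadratic, using the same relative gradient-norm stopping test. The quadratic, the starting point, the initial step length `alpha0=1.0`, and the helper name `armijo_step` are assumptions chosen for illustration and do not come from the notebook.

```python
import math


def armijo_step(J, a, g, alpha0=1.0, c_armijo=1e-5, max_backtrack=20):
    """Halve alpha until J(a - alpha g) <= J(a) - alpha * c_armijo * (g, g)."""
    J_old = J(a)
    gg = sum(gi * gi for gi in g)
    alpha = alpha0
    for n_backtrack in range(max_backtrack):
        a_new = [ai - alpha * gi for ai, gi in zip(a, g)]
        if J(a_new) <= J_old - alpha * c_armijo * gg:
            return a_new, alpha, n_backtrack
        alpha *= 0.5
    raise RuntimeError("Backtracking failed: no step gave sufficient decrease")


def J(a):
    """Toy cost: an ill-scaled quadratic with minimizer (0, 0)."""
    return 0.5 * (a[0] ** 2 + 10.0 * a[1] ** 2)


def grad(a):
    return [a[0], 10.0 * a[1]]


# The first step from (1, 1) needs three halvings: alpha = 1 -> 0.125.
first_a, first_alpha, first_backtracks = armijo_step(J, [1.0, 1.0], grad([1.0, 1.0]))

# Full steepest descent loop with a relative gradient-norm stopping test.
a = [1.0, 1.0]
tol, maxiter = 1e-4, 500
gradnorm0 = math.sqrt(sum(gi * gi for gi in grad(a)))
converged = False
for it in range(maxiter):
    g = grad(a)
    gradnorm = math.sqrt(sum(gi * gi for gi in g))
    if gradnorm <= tol * gradnorm0:
        converged = True
        break
    a, alpha, n_backtrack = armijo_step(J, a, g)
```

A small `c_armijo` accepts almost any step that reduces the cost, so the backtracking mostly guards against steps that are far too long, which is the role it plays in the PDE-constrained loop as well.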


In [None]:
# define parameters for the optimization
tol = 1e-4
maxiter = 1000
plot_any = 30
c_armijo = 1e-5

# initialize iter counters
iter = 1
converged = False

# initializations
g = Vector()
R.init_vector(g,0)

a_prev = Function(Va)

print("Nit  cost          misfit        reg         ||grad||       alpha  N backtrack")

while iter <  maxiter and not converged:

    # assemble matrix CT (depends on the current state u)
    CT =  assemble(CT_equ)

    # solve the adjoint problem
    adj_A, adjoint_RHS = assemble_system(a_adj, L_adj, bc_adj)
    solve(adj_A, p.vector(), adjoint_RHS)

    # evaluate the  gradient
    MG = CT*p.vector() + R * a.vector()
    solve(M, g, MG)

    # calculate the norm of the gradient
    grad_norm2 = g.inner(MG)
    gradnorm = sqrt(grad_norm2)
    
    if iter == 1:
        gradnorm0 = gradnorm

    # linesearch
    it_backtrack = 0
    a_prev.assign(a)
    alpha = 8.e5
    backtrack_converged = False
    for it_backtrack in range(20):
        
        a.vector().axpy(-alpha, g )

        # solve the state/forward problem
        state_A, state_b = assemble_system(a_state, L_state, bc_state)
        solve(state_A, u.vector(), state_b)

        # evaluate cost
        [cost_new, misfit_new, reg_new] = cost(u, ud, a, W, R)

        # check if Armijo conditions are satisfied
        if cost_new < cost_old - alpha * c_armijo * grad_norm2:
            cost_old = cost_new
            backtrack_converged = True
            break
        else:
            alpha *= 0.5
            a.assign(a_prev)  # reset a
            
    if not backtrack_converged:
        print("Backtracking failed. A step length giving sufficient decrease was not found")
        converged = False
        break

    sp = ""
    print("%3d %1s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %3d" % \
        (iter, sp, cost_new, sp, misfit_new, sp, reg_new, sp, \
        gradnorm, sp, alpha, sp, it_backtrack))

    if (iter % plot_any)==0 :
        plt.figure()
        plot(a, title="a",mode="warp")
        plt.show()
    
    # check for convergence
    if gradnorm < tol*gradnorm0 and iter > 1:
        converged = True
        print("Steepest descent converged in %d iterations" % iter)
        
    iter += 1
    
if not converged:
    print("Steepest descent did not converge in %d iterations" % maxiter)

In [None]:
plt.figure()
plot(atrue, title="atrue",mode="warp")
plt.show()
plt.figure()
plot(a, title="a",mode="warp")
plt.show()

In [None]:
plt.figure()
plot(u, title="u",mode="warp")
plt.show()
plt.figure()
plot(p, title="p",mode="warp")
plt.show()