# Nonlinear equations

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import numpy.polynomial.polynomial as poly
from matplotlib.animation import FuncAnimation
from scipy import optimize

## Introduction

Although many phenomena in nature can be described (or approximated) by a linear response, many others are inherently **nonlinear** in that effects are not directly proportional to their causes. 
For instance, the air resistance of a moving object is proportional to the square of the velocity.

In analogy to linear equations, where a system of equations is written as $\mathbf{Ax}=\mathbf{b}$, we could write down a system of nonlinear equations as $\mathbf{f}(\mathbf{x})=\mathbf{y}$.

However, it is more customary to subtract $\mathbf{y}$ from $\mathbf{f}(\mathbf{x})$ so the equation that needs to be solved is expressed as $\mathbf{f}(\mathbf{x})=\mathbf{0}$.

In one dimension, this means that we are looking for the intersection of a curve with the x-axis. 
In general, we seek a vector $\mathbf{x}$ such that all component functions of $\mathbf{f}(\mathbf{x})$ are zero simultaneously.

A solution value $\mathbf{x}$ such that $\mathbf{f}(\mathbf{x})=\mathbf{0}$ is called a **root** of the equation, and a **zero** of the function $\mathbf{f}$. 
This problem is thus referred to as **root finding** or **zero finding**.

## Number of solutions

For linear systems, each equation describes a *flat* hyperplane in $\mathbb{R}^n$, and the solution corresponds to the points where all of them intersect.
For nonlinear equations, this is also true, but here each equation can describe a *curved* hypersurface.
Because curved surfaces can intersect in many more ways than flat ones, it is not possible to make general statements about the number of solutions to a nonlinear problem.

> **Examples**
> 
> Even in 1 dimension, many different cases are possible:
>
> - $e^x+1=0$ has no solution.
> - $e^{-x}-x=0$ has one solution.
> - $x^2 - 4 \sin(x)=0$ has two solutions.
> - $x^3-6x^2+10x-4=0$ has three solutions.
> - $\sin(x)=0$ has infinitely many solutions.
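
The counts in the first four examples can be checked roughly by counting sign changes on a grid. The sketch below is only illustrative: the helper `count_sign_changes`, the interval $[-1, 4]$ and the grid size are assumptions made here, and roots of even multiplicity (where the function touches the axis without crossing it) would be missed by this approach.

```python
import math


def count_sign_changes(f, a, b, n=997):
    """Count sign changes of f between neighbouring points of a uniform grid on [a, b]."""
    count = 0
    f_prev = f(a)
    for i in range(1, n + 1):
        f_x = f(a + (b - a) * i / n)
        if f_prev * f_x < 0:  # strict sign change, so a root lies in between
            count += 1
        f_prev = f_x
    return count


# The first four example functions from the list above.
examples = {
    "exp(x) + 1": lambda x: math.exp(x) + 1,
    "exp(-x) - x": lambda x: math.exp(-x) - x,
    "x^2 - 4 sin(x)": lambda x: x**2 - 4 * math.sin(x),
    "x^3 - 6x^2 + 10x - 4": lambda x: x**3 - 6 * x**2 + 10 * x - 4,
}

counts = {name: count_sign_changes(f, -1.0, 4.0) for name, f in examples.items()}
for name, count in counts.items():
    print(f"{name:22s}: {count} sign change(s) on [-1, 4]")
```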

In [None]:
def fig_nr_solutions():
    plt.close("nr_solutions")
    fig, axs = plt.subplots(2, 3, figsize=(8, 5), num="nr_solutions")
    x = np.arange(-1, 4, 0.01)

    plots = [
        np.exp(x) + 1,
        np.exp(-x) - x,
        x**2 - 4 * np.sin(x),
        x**3 - 6 * x**2 + 10 * x - 4,
        np.sin(np.arange(-1, 4 * np.pi, 0.01)),
    ]
    labels = [
        "$e^x+1$",
        "$e^{-x}-x$",
        r"$x^2 - 4 \sin(x)$",
        "$x^3-6x^2+10x-4$",
        r"$\sin(x)$",
    ]

    for i in range(5):
        row, col = divmod(i, 3)
        ax = axs[row, col]

        if i == 4:
            x = np.arange(-1, 4 * np.pi, 0.01)

        ax.axhline(0, color="black")
        ax.plot(x, plots[i], label=labels[i])
        ax.legend()


fig_nr_solutions()

For a nonlinear equation it is possible to have degenerate solutions, which are called **multiple roots**. Generally, for a smooth function $f$, if $f(x^*)=f'(x^*)=f''(x^*)=\cdots=f^{(m-1)}(x^*)=0$ and $f^{(m)}(x^*)\neq0$, then $x^*$ is a root of **multiplicity** $m$. 

If $m=1$, then the solution is not degenerate and is called a **simple root**. 

Geometrically, a multiple root means that the curve defined by $f$ has a horizontal tangent where it meets the x-axis.

> **Examples**
> 
> $x^2-4x+4=(x-2)^2=0$ has a root $x=2$ of multiplicity 2
>
> $x^3-6x^2+12x-8=(x-2)^3=0$ has a root $x=2$ of multiplicity 3
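
The definition of multiplicity can be applied directly to polynomials by differentiating until the value at the root is no longer zero. This is a minimal sketch in plain Python; the helper names `poly_eval`, `poly_deriv` and `multiplicity`, and the tolerance, are hypothetical choices made for this illustration.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients from high to low degree) with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def poly_deriv(coeffs):
    """Return the coefficients of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def multiplicity(coeffs, root, tol=1e-9):
    """Count how many successive derivatives (starting with p itself) vanish at root."""
    m = 0
    while coeffs and abs(poly_eval(coeffs, root)) < tol:
        m += 1
        coeffs = poly_deriv(coeffs)
    return m


print(multiplicity([1, -4, 4], 2.0))       # x^2 - 4x + 4
print(multiplicity([1, -6, 12, -8], 2.0))  # x^3 - 6x^2 + 12x - 8
print(multiplicity([1, 0, -1], 1.0))       # x^2 - 1, a simple root
```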

In [None]:
def fig_multiplicity():
    plt.close("multiplicity")
    fig, axs = plt.subplots(1, 2, figsize=(8, 4), num="multiplicity")

    x = np.arange(1, 3, 0.01)

    axs[0].plot(x, x**2 - 4 * x + 4, label="$x^2-4x+4=(x-2)^2$")
    axs[0].axhline(0, color="black")
    axs[0].legend()

    axs[1].plot(
        x,
        x**3 - 6 * x**2 + 12 * x - 8,
        label="$x^3-6x^2+12x-8=(x-2)^3$",
    )
    axs[1].axhline(0, color="black")
    axs[1].legend()


fig_multiplicity()

## Sensitivity

Let's investigate the sensitivity of the root finding problem $f(x)=0$, i.e. if $x^*$ is a root of $f$, how much does $x^*$ change for small changes to the parameters of $f$?

In one dimension, the condition number for the root-finding problem of $f$ near $x^*$ is $\frac{1}{\|f'(x^*)\|}$.

In other words, for functions for which $f'(x)$ is small near the root, the error in the root finding problem can be substantial.

At a multiple root $x^*$, $f'(x^*)=0$, so the condition number of a multiple root is infinite. 
Intuitively this is clear because a small change in the parameters of $f$ can cause the multiple root to disappear or to split into more than one root.

> **Example**
>
> As an example, consider the root-finding problem $f(x)=x^2=0$, which has a twofold degenerate solution $x^*=0$.
>
> For a small change $\epsilon >0$ in $f$, we can find
>
> $x^2-\epsilon=0$, which has two roots at $\pm \sqrt{\epsilon}$
>
> or
> 
> $x^2+\epsilon=0$, which has no real roots
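
To put numbers on this example: the root of $x^2-\epsilon=0$ moves by $\sqrt{\epsilon}$, which is much larger than the perturbation $\epsilon$ itself when $\epsilon$ is small. The short sketch below (the function name `positive_root` is an assumption made here) prints the shift and the amplification factor.

```python
import math


def positive_root(eps):
    """Positive root of the perturbed equation x^2 - eps = 0."""
    return math.sqrt(eps)


for eps in (1e-4, 1e-8, 1e-12):
    shift = positive_root(eps)  # the unperturbed root is x* = 0
    print(f"eps = {eps:.0e}: root shift = {shift:.0e}, amplification = {shift / eps:.0e}")
```

The amplification factor grows without bound as $\epsilon\rightarrow0$, in line with the infinite condition number of a multiple root.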

In [None]:
def plot_parabola(epsilon, ax):
    # Generate x data
    x = np.linspace(-3, 3, 100)

    # Calculate the roots of the equation y = x^2 + epsilon = 0
    if epsilon < 0:
        roots = [np.sqrt(-epsilon), -np.sqrt(-epsilon)]
    elif epsilon == 0:
        roots = [0]
    else:
        roots = []

    # Plot the parabola and its roots
    ax.axhline(y=0, color="black")
    ax.plot(x, x**2 + epsilon, "b", linewidth=2.0, label=r"$y = x^2 + \epsilon$")
    ax.plot(roots, [0 for _ in roots], "o", color="black", label="Roots")

    # Labeling
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_xlim(-3, 3)
    ax.set_ylim(-1.5, 9)
    ax.set_title(
        rf"Sensitivity of $x^2 + \epsilon = 0$ with $\epsilon = {epsilon:.1f}$"
    )
    ax.legend()


# Set up the figure and subplots
plt.close("parabolas1")
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), num="parabolas1")

# Plot for epsilon = -0.2
plot_parabola(-0.2, ax1)

# Plot for epsilon = 0.2
plot_parabola(0.2, ax2)

In multiple dimensions, the condition number is generalized using the Jacobian $\mathbf{J}$ of $\mathbf{f}$, evaluated at the root $\mathbf{x}^*$, to $\|\mathbf{J}^{-1}_{\mathbf{f}(\mathbf{x}^*)}\|$.

> **Example** 
>
> Consider the two-dimensional system
>
> $$\mathbf{f}(\mathbf{x})=\begin{bmatrix}x_1^2-x_2+\gamma\\-x_1+x^2_2+\gamma\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}$$
>
> Each of these equations defines a parabola, and any point where they intersect is a solution to the system. 
> Depending on the value of $\gamma$, this system can have either zero, one, two or four solutions.
>
> For the specific case of $\gamma=0.25$, the two parabolas touch each other, i.e. the system has one degenerate solution.
> 
> For this value, the Jacobian matrix reads
>
> $$\mathbf{J}_{\mathbf{f}(\mathbf{x})}=\begin{bmatrix}2x_1&-1\\-1&2x_2\end{bmatrix}$$
>
> which is singular at the unique solution $\mathbf{x}^*=[0.5,0.5]^\intercal$. 
>
> For a larger value of $\gamma$ the parabolas no longer intersect, and for a smaller value of $\gamma$, they intersect at two or four points. 
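
The singularity of the Jacobian at the degenerate solution can be verified with a few lines of plain Python. The function names below (`f`, `jacobian`, `det2`) are chosen for this sketch only; for comparison, the determinant is also evaluated at the simple root $[1,1]^\intercal$ of the system with $\gamma=0$.

```python
def f(x1, x2, gamma):
    """The two-dimensional example system."""
    return (x1**2 - x2 + gamma, -x1 + x2**2 + gamma)


def jacobian(x1, x2):
    """Jacobian matrix of the system (it does not depend on gamma)."""
    return ((2 * x1, -1.0), (-1.0, 2 * x2))


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


residual = f(0.5, 0.5, 0.25)
det_degenerate = det2(jacobian(0.5, 0.5))
det_simple = det2(jacobian(1.0, 1.0))

print("f(x*) for gamma = 0.25:", residual)
print("det J at [0.5, 0.5]:", det_degenerate)
print("det J at [1, 1] (a root for gamma = 0):", det_simple)
```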

In [None]:
def plot_parabolas(gamma, ax):
    # Define the x ranges for each curve
    x1, x2 = np.linspace(-2, 2.75, 100), np.linspace(-2, 2.75, 100)

    # Define the system of equations
    def f(x):
        return [x[0] ** 2 - x[1] + gamma, -x[0] + x[1] ** 2 + gamma]

    # Find intersection points
    roots = []
    start_conditions = np.array([[0, 0], [1, 1], [-1, 0], [0, -1]])
    for start_condition in start_conditions:
        result = optimize.root(f, start_condition)
        if result.success:
            roots.append(result.x)

    # Plot the curves
    ax.plot(x1, x1**2 + gamma, "b", linewidth=2.0, label=r"$x_{1}^{2}-x_{2}+\gamma$")
    ax.plot(
        x2**2 + gamma, x2, "r", linewidth=2.0, label=r"$-x_{1}+x_{2}^{2}+\gamma$"
    )

    # Plot intersection points
    ax.plot(
        [root[0] for root in roots], [root[1] for root in roots], "o", color="black"
    )

    # Labeling and limits
    ax.set_xlabel("$x_{1}$")
    ax.set_ylabel("$x_{2}$")
    ax.set_title(rf"$\gamma = {gamma:.2f}$")
    ax.set_xlim(-2, 4)
    ax.set_ylim(-3.5, 5.5)
    ax.legend()


# Set up figure with subplots for each case
plt.close("parabolas2")
fig, axs = plt.subplots(2, 2, figsize=(8, 6), num="parabolas2")
gammas = [-1, 0.25, -0.5, 1]  # Example values for gamma with 4, 1, 2, and 0 roots

# Plot each case on a separate subplot
for ax, gamma in zip(axs.flat, gammas, strict=True):
    plot_parabolas(gamma, ax)

## Convergence Rates and Stopping Criteria

The **convergence rate** measures how quickly the iterates of a certain algorithm approach the solution.

To solve a nonlinear equation, one often has the choice between several iterative methods, with different convergence rates. 
The total cost of solving the system depends not only on the number of iterations necessary to reach the solution with the desired accuracy, but also on the computational complexity of a single iteration.

The convergence rate can be defined as follows: 

Let $\mathbf{e}_k=\mathbf{x}_k-\mathbf{x}^*$ be the error at iteration $k$, where $\mathbf{x}_k$ is the approximate solution at iteration $k$ and $\mathbf{x}^*$ the (usually unknown) true solution.

An iterative method is said to converge with rate $r$ if

$$\lim_{k\rightarrow\infty}\frac{\|\mathbf{e}_{k+1}\|}{\|\mathbf{e}_k\|^r}=C$$
for some finite constant $C>0$.

Interesting cases are:

- $r=1$ and $C<1$: *linear* convergence
- $r>1$ : *superlinear* convergence
- $r=2$ : *quadratic* convergence
- $r=3$ : *cubic* convergence

In an iterative method with linear convergence, the solution gains about $-\log_{10}(C)$ correct digits per iteration.
For superlinearly convergent methods, the solution has about $r$ times as many correct digits as compared to the previous iteration.

> **Example** 
>
> To make this more concrete we will look at a couple of examples.
>
> Consider the following sequence.
> 
> $$\lbrace 1;0.5;0.25;0.125;...\rbrace$$
>
> This sequence converges to 0, so the errors are $e_k=x_k-x^*=\frac{1}{2^k}-0$. 
> It has a linear convergence rate with $C=0.5$, which is easily verified from the ratio of successive errors:
> 
> $$\lim_{k\rightarrow\infty}\frac{\frac{1}{2^{k+1}}}{\frac{1}{2^{k}}}=0.5$$
>
> Now consider the following sequence of errors, taken from the demonstration of Newton's method below (the errors are those of iterations 12-16, because only then do the iterates get close enough to the true solution to be meaningful).
>
> $$\lbrace 0.122;0.0128;0.00016;2.66\cdot10^{-8};6.66\cdot10^{-16}\rbrace$$
>
> The number of correct digits roughly doubles at each iteration, which means that we have a convergence rate of 2, i.e. quadratic convergence.
>
> With this convergence rate of $r=2$ we can estimate the constant $C$ from the ratios $\|e_{k+1}\|/\|e_k\|^2$. For the values above, these ratios are approximately $0.86$, $0.98$ and $1.04$, so $C$ appears to be close to $1$.
> The last iteration hits machine precision, so we do not count it as a valid estimate for the value of $C$.
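
The rate $r$ can also be estimated numerically from three consecutive errors: if $\|e_{k+1}\|\approx C\|e_k\|^r$, then $r\approx\log(\|e_{k+1}\|/\|e_k\|)/\log(\|e_k\|/\|e_{k-1}\|)$. The sketch below applies this to the two sequences from the example; the helper `estimate_order` is a hypothetical name, and the last Newton error is left out because it is at machine precision.

```python
import math


def estimate_order(errors):
    """Estimate the convergence rate r from each triple of consecutive errors."""
    return [
        math.log(e2 / e1) / math.log(e1 / e0)
        for e0, e1, e2 in zip(errors, errors[1:], errors[2:])
    ]


halving = [1.0, 0.5, 0.25, 0.125]
newton = [0.122, 0.0128, 0.00016, 2.66e-8]

orders_halving = estimate_order(halving)
orders_newton = estimate_order(newton)

print("halving sequence:", orders_halving)
print("Newton errors:   ", orders_newton)
```

The estimates for the halving sequence are 1 (linear convergence), whereas those for the Newton errors are close to 2.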


The convergence of a certain algorithm tells us that we zoom in on the correct solution at a certain rate, but it doesn't tell us the current accuracy of our solution at any given iteration.

Therefore, we don't know whether we reached a solution that is sufficiently close to the real solution to decide that we can stop the algorithm. 

More often than not, it's not trivial to define a suitable **stopping criterion**.
A reasonable approach is to look at the relative change in the solution between successive iterations, $\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\|/\|\mathbf{x}_{k}\|$, and to stop once this quantity becomes smaller than a predefined **error tolerance** $\varepsilon$. 

A sensible value for $\varepsilon$ *might* be (but this really depends on your specific problem) the double precision accuracy of $10^{-16}$.
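
As an illustration of such a stopping criterion, the sketch below wraps a generic update rule in a loop that stops on a small relative change. The function `iterate_until_converged`, the tolerance and the iteration cap are assumptions of this sketch; the update rule used here, Heron's iteration for $\sqrt{2}$, is only a convenient test case.

```python
import math


def iterate_until_converged(step, x0, tol=1e-12, max_iter=100):
    """Apply x <- step(x) until the relative change drops below tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = step(x)
        # Written as |dx| <= tol * |x| to avoid dividing by zero when x == 0.
        if abs(x_new - x) <= tol * abs(x):
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


# Heron's iteration for the square root of 2.
root, n_iter = iterate_until_converged(lambda x: (x + 2 / x) / 2, 1.0)
print(f"{root:.15f} after {n_iter} iterations (exact: {math.sqrt(2):.15f})")
```

A cap on the number of iterations is a useful complement to the tolerance, because a small change between iterates does not by itself guarantee a small error.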

## Solving nonlinear equations in one dimension

Let's focus on how to find the solution to nonlinear equations in one dimension:
For a continuous function $f:\mathbb{R}\rightarrow\mathbb{R}$, we seek a point $x^* \in \mathbb{R}$ such that $f(x^*)=0$.

### Bisection method

In finite-precision arithmetic, there might not exist a machine number $x^*$ for which $f(x^*)$ is exactly zero. An alternative is to look for a (short) interval $[a,b]$ in which $f$ changes sign. 
Such a **bracket** ensures that the function must take a zero value somewhere within this interval.

The **bisection method** begins with an initial bracket and then iteratively reduces its length until the desired accuracy is reached. 

At every iteration, the function is evaluated at the **midpoint** of the interval, such that half of the interval can be discarded, based on the sign of the function value at the midpoint.

#### Convergence

The bisection method makes no use of the magnitudes of the function values, and as a result it is certain to converge, but very slowly. At each iteration, the bound on the possible error is reduced by half, meaning that it converges linearly with $r=1$ and $C=0.5$. 

Given a starting interval $[a,b]$, the length of the interval after $k$ iterations is $(b-a)/2^k$, so that achieving an error tolerance of $\varepsilon$ requires $n$ iterations (rounded up to the next integer), where

$$
n = \log_2\left(\frac{b-a}{\varepsilon}\right)
\qquad
\iff
\qquad
\varepsilon = \left(\frac{b-a}{2^{n}}\right)
$$

regardless of the particular function $f$ involved.
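
As a quick check of this formula, the sketch below computes the number of iterations for the bracket $[1,6]$ and tolerance $10^{-8}$, the values used in the bisection demonstration in this section. The helper name `bisection_iterations` is an assumption of this sketch.

```python
import math


def bisection_iterations(a, b, tol):
    """Smallest number of halvings that brings the bracket length below tol."""
    return math.ceil(math.log2((b - a) / tol))


n = bisection_iterations(1, 6, 1e-8)
print(f"{n} iterations, final bracket length {(6 - 1) / 2**n:.3e}")
```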


In [None]:
def func_bm(x):
    """Example function to demonstrate the bisection method."""
    return x * x - 4 * np.sin(x)


def bisection_method(f, a, b, tol):
    """Bisection method implementation to find the root
    of a function within an interval."""
    brackets = []
    while (b - a) >= tol:
        m = a + (b - a) / 2
        if np.sign(f(a)) == np.sign(f(m)):
            a = m
        else:
            b = m
        brackets.append([a, b, m])
        print(f"{len(brackets):3d}  {a:17.15f}  {b:17.15f}")
    return np.array(brackets)


def plot_bisection_with_function(f, a=1, b=6, tol=1e-8):
    # Perform bisection method and capture intervals
    results = bisection_method(f, a, b, tol)
    num_iterations = len(results)

    # Set up figure and subplots with shared x-axis
    plt.close("bisection")
    fig, (ax1, ax2) = plt.subplots(
        2,
        1,
        figsize=(8, 6),
        sharex=True,
        gridspec_kw={"height_ratios": [2, 1]},
        num="bisection",
    )

    # Top Plot: Function plot with zero crossing
    x = np.linspace(a, b, 400)
    y = f(x)
    ax1.plot(x, y, label=r"$f(x) = x^2 - 4\sin(x)$", color="blue")
    ax1.axhline(0, color="black", linewidth=0.5)
    ax1.set_ylabel("f(x)")
    ax1.set_title("Bisection Method Convergence")
    ax1.legend()

    # Highlight zero crossing (root) on the function plot, if known
    root_approx = results[-1, 2]  # Final midpoint as the root approximation
    ax1.plot(root_approx, f(root_approx), "ro", label="Approximate root")
    ax1.legend()

    # Bottom Plot: Bisection funnel
    for i, (left, right, mid) in enumerate(results):
        offset = -i  # Vertical offset for each interval
        ax2.hlines(
            offset, left, right, color="blue", linestyle="--", linewidth=1
        )  # Interval line
        ax2.plot([left, right], [offset, offset], "bo")  # Interval endpoints
        ax2.plot(mid, offset, "ro")  # Midpoint at each step

    # Bottom plot settings
    ax2.set_xlabel("x")
    ax2.set_ylabel("Iteration (Offset)")
    ax2.set_ylim(-num_iterations, 1)
    ax2.invert_yaxis()  # Invert y-axis for funnel effect


# Call the function to display the plot
plot_bisection_with_function(func_bm)

The bisection method is also implemented in SciPy:

In [None]:
optimize.root_scalar(
    func_bm, method="bisect", bracket=[1, 6], xtol=1e-16, maxiter=200
)

### Fixed-point iteration

Let's now consider an alternative problem.
Given a function $g:\mathbb{R}\rightarrow\mathbb{R}$, a value $x$ such that $x=g(x)$ is called a **fixed point** of the function $g$, since $x$ remains unchanged when $g$ is applied to it.

Geometrically, finding such a fixed point corresponds to finding an intersection between $g$ and the diagonal line $y=x$.

This problem is important because many iterative algorithms for solving nonlinear equations (see below) are based on iterations of the form

$$
x_{k+1}=g(x_k)
$$

where $g$ is a function chosen so that its fixed points are solutions of $f(x)=0$. 
Such a scheme is called **fixed-point iteration** or **functional iteration**, since the function $g$ is applied repeatedly to an initial starting value $x_0$.

For a given equation $f(x)=0$, there are many equivalent fixed-point problems $x=g(x)$ with different choices for $g$. However, they are not all equally useful, as they may differ in their convergence rate and even in whether or not they converge at all.

In [None]:
def func1(x):
    """Divergent function for fixed-point iteration."""
    return x**2 - 2


def func2(x):
    """Convergent function for fixed-point iteration."""
    return np.sqrt(x + 2)


def func3(x):
    """Convergent function for fixed-point iteration."""
    return 1 + 2 / x


def func4(x):
    """Function with potential divergence due to singularity."""
    with np.errstate(divide="ignore", invalid="ignore"):
        result = (x**2 + 2) / (2 * x - 1)
        if np.isscalar(result):
            return np.nan if np.isinf(result) else result
        result[np.isinf(result)] = np.nan
    return result


def fixed_point_iteration(f, x0, max_iter=10):
    """Perform fixed-point iteration, returning intermediate points."""
    results = [x0]
    for _ in range(max_iter):
        x_new = f(x0)
        if np.isnan(x_new):  # Stop if we encounter a singularity
            break
        results.extend([x0, x_new])  # Alternate: projection and function evaluation
        x0 = x_new
    return results


def plot_fixed_point_iterations():
    """Plot fixed-point iterations with four functions in subplots."""
    functions = [func1, func2, func3, func4]
    titles = [
        r"$f(x) = x^2 - 2$ (may diverge)",
        r"$f(x) = \sqrt{x + 2}$",
        r"$f(x) = 1 + \frac{2}{x}$",
        r"$f(x) = \frac{x^2 + 2}{2x - 1}$",
    ]
    initial_guesses = [2.1, 1, 1, 1]  # Starting points selected for demonstration

    # Generate subplots
    x_vals = np.linspace(0.5, 5, 400)
    plt.close("fixed_point")
    fig, axs = plt.subplots(2, 2, figsize=(10, 8), num="fixed_point")
    axs = axs.flatten()

    for i, (f, title, x0) in enumerate(
        zip(functions, titles, initial_guesses, strict=True)
    ):
        ax = axs[i]
        y_vals = f(x_vals)

        # Run fixed-point iteration
        results = fixed_point_iteration(f, x0, max_iter=10)

        # Plot function and diagonal y=x line
        ax.plot(x_vals, y_vals, label="f(x)", color="blue")
        ax.plot(x_vals, x_vals, label="y=x", color="gray", linestyle="--")

        # Plot iteration trajectory with arrows for each step
        for j in range(0, len(results) - 2, 1):
            x_start, y_start = results[j], f(results[j])
            x_proj = results[j + 1]
            # Horizontal arrow for projection step
            ax.annotate(
                "",
                xy=(x_proj, y_start),
                xytext=(x_start, y_start),
                arrowprops=dict(arrowstyle="->", color="black", lw=1.5),
            )

            # Vertical arrow for function step
            ax.annotate(
                "",
                xy=(x_proj, f(x_proj)),
                xytext=(x_proj, y_start),
                arrowprops=dict(arrowstyle="->", color="black", lw=1.5),
            )

            # Mark the function step point
            ax.plot(x_proj, f(x_proj), "go")  # Mark the next function step point

        # Set labels and titles
        ax.set_title(title)
        ax.set_xlim(0.5, 5)
        ax.set_ylim(0.5, 5)
        ax.set_xlabel("x")
        ax.set_ylabel("f(x)")
        ax.legend()


# Execute the plot function
plot_fixed_point_iterations()

One can obtain the same result using SciPy.
Note that 20 iterations are not sufficient to reach the default relative error tolerance of $10^{-8}$, which is why `maxiter=30` is used below.

In [None]:
optimize.fixed_point(func3, x0=1, method="iteration", maxiter=30)

The simplest way to characterize the behavior of an iterative scheme $x_{k+1}=g(x_k)$ for a fixed-point problem $x=g(x)$ is to look at the derivative of $g$ at the solution $x^*$. 
If $x^*=g(x^*)$ and $\|g'(x^*)\|<1$, then the iterative scheme is **locally convergent**, i.e. it converges when started sufficiently close to $x^*$. 
If however $\|g'(x^*)\|>1$, then the scheme diverges for every initial value different from $x^*$.
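
This criterion can be checked for the four functions plotted above. Each of them has $x^*=2$ as a fixed point (all four are rearrangements of $x^2-x-2=0$). The sketch below approximates $g'(x^*)$ with a central difference; the helper `derivative` and the step size `h` are choices made for this illustration.

```python
import math


def derivative(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


candidates = {
    "x^2 - 2": lambda x: x**2 - 2,
    "sqrt(x + 2)": lambda x: math.sqrt(x + 2),
    "1 + 2/x": lambda x: 1 + 2 / x,
    "(x^2 + 2)/(2x - 1)": lambda x: (x**2 + 2) / (2 * x - 1),
}

x_star = 2.0
slopes = {name: derivative(g, x_star) for name, g in candidates.items()}
for name, slope in slopes.items():
    verdict = "locally convergent" if abs(slope) < 1 else "divergent"
    print(f"g(x) = {name:20s} g'(2) = {slope:7.4f}  -> {verdict}")
```

The first choice has $\|g'(2)\|=4$ and diverges, the second and third converge linearly with constants $0.25$ and $0.5$, and for the last one the derivative vanishes at the fixed point.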

> **Proof**
>
> If $x^*$ is a fixed point, then the error at iteration $k+1$ is
> 
> $$e_{k+1}=x_{k+1}-x^*=g(x_k)-g(x^*)$$
>
> By the mean value theorem, there exists a point $\theta_k$ between $x_k$ and $x^*$ for which
> 
> $$g(x_k)-g(x^*)=g'(\theta_k)(x_k-x^*)$$
> 
> so
> 
> $$e_{k+1}=g'(\theta_k)e_k$$
> 
> We do not know the value of $\theta_k$, but if $\|g'(x^*)\|<1$, then by starting the iteration sufficiently close to $x^*$, there exists a constant $C$ for which $\|g'(\theta_k)\|\leq C<1$, for $k=0,1,\ldots$.
>
> Thus we have
> 
> $$\|e_{k+1}\|\leq C \|e_{k}\| \leq \ldots\leq C^{k+1}\|e_0\|$$
>
> As $C<1$ implies $C^k \rightarrow 0$, also $\|e_{k}\|\rightarrow 0$ and the sequence converges.

The convergence rate of the iterative scheme is linear with $C=\|g'(x^*)\|$. The smaller this constant, the faster the convergence. Ideally, we have $\|g'(x^*)\|=0$, in which case the Taylor expansion gives

$$g(x_k)-g(x^*)=g''(\xi_k)(x_k-x^*)^{2}/2$$

with $\xi_k$ between $x_k$ and $x^*$. This yields

$$\lim_{k\rightarrow \infty}\frac{\|e_{k+1}\|}{\|e_k\|^{2}}=\frac{\|g''(x^*)\|}{2}$$

In this case, the *rate of convergence becomes quadratic*.
In the next sections we'll see methods to systematically choose $g$ to reach this quadratic convergence.

### Newton's method

The bisection method does not make use of the function values (except for their sign), so it is reasonable to assume that better convergence can be achieved by also making use of their magnitude.

We start from the truncated Taylor series

$$
f(x+h)\approx f(x)+h f'(x),
$$

which is a linear function of $h$ that approximates $f$ near a given $x$. 
Its zero is easily determined to be $h=-f(x)/f'(x)$, assuming that $f'(x)\neq 0$.
Because the zeros of $f$ and of its linear approximation are not identical, this procedure is repeated in an iterative scheme, $x_{k+1}=x_k-f(x_k)/f'(x_k)$, called **Newton's method**.

This method can be seen as a systematic way of transforming a nonlinear equation $f(x)=0$ into a fixed-point problem $x=g(x)$, where

$$
g(x)=x-f(x)/f'(x)
$$

#### Convergence

To study the convergence of this scheme we determine the derivative

$$
g'(x)=f(x)f''(x)/(f'(x))^2
$$

- For simple roots ($f(x^*)=0$ and $f'(x^*)\neq0$), $g'(x^*)=0$. Thus the asymptotic convergence rate of Newton's method is quadratic.
- For a multiple root with multiplicity $m$, it is only linearly convergent, with constant $C=1-(1/m)$.

> **Proof**
>
> Consider the simplest function with a root of multiplicity $m$ at $x=x^*$, namely $f(x)=(x-x^*)^m$.
>
> As shown earlier, the constant $C$ of linear convergence is given by 
>
> $$\|g'(x^*)\|=\lim_{x\rightarrow x^*}\|f(x)f''(x)/(f'(x))^2\|$$
>
> Filling in
>
> - $f(x)=(x-x^*)^m$
> - $f'(x)=m(x-x^*)^{m-1}$
> - $f''(x)=m(m-1)(x-x^*)^{m-2}$
>
> yields $C=(m-1)/m=1-1/m$.
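
The linear convergence at a multiple root is easy to observe numerically. The sketch below runs Newton's method on $(x-2)^m$ for $m=2$ and $m=3$ and prints the ratio of successive errors; the helper `newton_errors` and the starting point $x_0=3$ are assumptions of this illustration.

```python
def newton_errors(f, fp, x0, x_star, n_iter):
    """Run Newton's method and return the absolute error after each iteration."""
    errors = [abs(x0 - x_star)]
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fp(x)
        errors.append(abs(x - x_star))
    return errors


ratios_by_m = {}
for m in (2, 3):
    errors = newton_errors(
        lambda x: (x - 2) ** m,
        lambda x: m * (x - 2) ** (m - 1),
        x0=3.0,
        x_star=2.0,
        n_iter=10,
    )
    ratios_by_m[m] = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
    print(f"m = {m}: error ratios ~ {ratios_by_m[m][-1]:.4f}, expected {1 - 1 / m:.4f}")
```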


Note that these convergence results are only local: Newton's method may not converge at all unless it is started sufficiently close to the solution.
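
A standard illustration of this local behavior, not taken from the text above, is $f(x)=\arctan(x)$ with its single root at $x=0$. In the sketch below, Newton's method converges from $x_0=1$ but moves further away at every step from $x_0=1.5$. The function names are assumptions of this sketch.

```python
import math


def newton(f, fp, x, n_iter):
    """Plain Newton iteration without any safeguards."""
    for _ in range(n_iter):
        x = x - f(x) / fp(x)
    return x


def atan_prime(x):
    """Derivative of arctan(x)."""
    return 1 / (1 + x * x)


near = newton(math.atan, atan_prime, 1.0, 6)
far = newton(math.atan, atan_prime, 1.5, 5)

print(f"start at 1.0: x = {near:.3e} after 6 iterations")
print(f"start at 1.5: x = {far:.3e} after 5 iterations")
```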

In [None]:
def func_cube(x):
    """Example function for newton and secant methods."""
    return x**3 - 1  # a function with only one real root at x = 1


def func_cube_prime(x):
    """Derivative of func_cube."""
    return 3 * x**2


def newton_method(f, fp, x, niter):
    """Illustrative implementation of the Newton method.

    Parameters
    ----------
    f
        Function to be rooted.
    fp
        The derivative of the function to be rooted.
    x
        The initial guess of the solution.
    niter
        The number of iterations.

    Returns
    -------
    root
        The approximation of the root.
    """
    for _ in range(niter):
        x = x - f(x) / fp(x)
    return x


# run the implementation for an increasing number of iterations
def newton_method_various_iterations(iterations=20):
    for niter in range(iterations):
        x = newton_method(func_cube, func_cube_prime, 25, niter)
        print(f"{niter:2d} iterations: x = {x:18.15f}")


newton_method_various_iterations()


def plot_and_animate_nm():
    x0 = 25
    x = np.arange(-10, 26, 0.05)
    y = func_cube(x)

    fig, ax = plt.subplots(num="animate_nm", clear=True)

    ax.axhline(0, color="black")
    ax.plot(x, y, color="blue")

    ax.set_xlim(-10, 26)

    (dots,) = ax.plot([], [], "o", markersize=10, color="r")

    def animate(i):
        x = newton_method(func_cube, func_cube_prime, x0, i)
        y = func_cube(x)
        dots.set_data([x], [y])
        return (dots,)

    return FuncAnimation(fig, animate, frames=20, interval=1000, repeat=False)


plt.close("animate_nm")
nm_anim = plot_and_animate_nm()

The Newton method is also implemented in SciPy:

In [None]:
optimize.root_scalar(
    func_cube,
    method="newton",
    x0=25,
    fprime=func_cube_prime,
    xtol=1e-16,
    maxiter=500,
)

### Secant method

One drawback of Newton's method is that both the function and its derivative need to be evaluated explicitly at every iteration. In the **secant method**, the derivative is replaced by a finite difference approximation based on successive iterates:

$$
f'(x_k)\approx\frac{f(x_k)-f(x_{k-1})}{x_k-x_{k-1}}
$$

The secant method can be interpreted geometrically as approximating the function $f$ by the secant line through the previous two estimates, and taking the zero of this line as the next approximate solution.

#### Convergence

Compared with Newton's method, the secant method has the advantage of requiring only one new function evaluation per iteration, but has the disadvantage of requiring two starting guesses and converging somewhat more slowly (subquadratically but still faster than linear with $r\approx 1.618$).

The lower cost per iteration often more than offsets the larger number of iterations required, such that the total cost of finding a root is often less for the secant method than for Newton's method.

In [None]:
def secant_method(f, x0, x1, niter):
    """Illustrative implementation of the secant method.

    Parameters
    ----------
    f
        The function to be rooted.
    x0, x1
        Two different initial guesses.
    niter
        The number of iterations

    Returns
    -------
    root
        The approximate root
    """
    fx0 = f(x0)
    for _ in range(niter):
        fx1 = f(x1)
        if fx1 == fx0:
            # The secant line is horizontal (e.g. after convergence): stop
            # here to avoid a division by zero.
            break
        x_new = x1 - fx1 * (x1 - x0) / (fx1 - fx0)
        x0, fx0 = x1, fx1
        x1 = x_new
    # The most recent iterate is the best available estimate.
    return x1


def secant_method_various_iterations(iterations=20):
    for niter in range(iterations):
        x = secant_method(func_cube, 25, 24, niter)
        print(f"{niter:2d} iterations: x = {x:18.15f}")


secant_method_various_iterations()


def plot_and_animate_sm(f):
    x0 = 25
    x1 = 24
    x = np.arange(-10, 26, 0.05)
    y = f(x)

    fig, ax = plt.subplots(num="animate_sm", clear=True)
    ax.axhline(0, color="black")
    ax.plot(x, y, color="blue")
    ax.set_xlim(-10, 26)

    (dots,) = ax.plot([], [], "o", markersize=12, color="r")

    def animate(i):
        # Show the approximation obtained after i secant iterations
        x = secant_method(f, x0, x1, i)
        y = f(x)
        dots.set_data([x], [y])
        return (dots,)

    return FuncAnimation(fig, animate, frames=20, interval=1000, repeat=False)


plt.close("animate_sm")
sm_anim = plot_and_animate_sm(func_cube)

The secant method is also implemented in SciPy:

In [None]:
optimize.root_scalar(
    func_cube, method="secant", x0=25, x1=24, xtol=1e-16, maxiter=50
)

### Inverse Interpolation

The secant method fits a straight line to two values of the function for each iteration.
Its convergence rate can be improved (but not made to exceed $r=2$) by fitting a higher order polynomial instead of a straight line. 

This has the drawback, however, that the zeros of the fitted polynomial might be difficult to compute, or might not exist at all.

Instead, we can use **inverse interpolation** where, instead of fitting a polynomial to values $f(x_k)$ as function of the values $x_k$, we do the opposite:
we fit a polynomial $p$ to the values $x_k$ as a function of the values $f(x_k)$. 
The next approximate solution is then simply $p(0)$.

The most used implementation of this idea is **inverse quadratic interpolation** where a parabola is fitted through the values obtained at the last 3 iterations. 
Similar to the secant method this only requires one additional function evaluation per iteration, but requires a little more memory and overhead in fitting the parabola. 
This algorithm has a convergence rate of $r\approx 1.839$.
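
A single step of inverse quadratic interpolation can be written with the Lagrange form of the parabola through the three points $(f(x_k), x_k)$, evaluated at $0$. The following is a minimal sketch without the safeguards a production routine would need; the function names, starting points and tolerance are assumptions of this illustration.

```python
def iqi_step(x0, x1, x2, f0, f1, f2):
    """Evaluate at y = 0 the parabola x = p(y) through (f0, x0), (f1, x1), (f2, x2)."""
    return (
        x0 * f1 * f2 / ((f0 - f1) * (f0 - f2))
        + x1 * f0 * f2 / ((f1 - f0) * (f1 - f2))
        + x2 * f0 * f1 / ((f2 - f0) * (f2 - f1))
    )


def inverse_quadratic_interpolation(f, x0, x1, x2, tol=1e-12, max_iter=50):
    """Iterate IQI steps, keeping the three most recent points."""
    f0, f1, f2 = f(x0), f(x1), f(x2)
    for _ in range(max_iter):
        if abs(f2) < tol:
            break
        if f0 == f1 or f0 == f2 or f1 == f2:
            break  # the interpolating parabola is not defined
        x_new = iqi_step(x0, x1, x2, f0, f1, f2)
        x0, x1, x2 = x1, x2, x_new
        f0, f1, f2 = f1, f2, f(x_new)
    return x2


root = inverse_quadratic_interpolation(lambda x: x**3 - 1, 0.8, 1.2, 1.5)
print(f"root = {root:.15f}")
```

Only one new function evaluation is needed per iteration, because the function values of the two retained points are reused.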

### Root finding in SciPy

A safe and efficient default method for finding the roots of a one-dimensional function is **Brent's method**, available in SciPy as `optimize.brentq`.
This is a so-called **safeguarded method**, which combines the safety of (slow) bracketing methods like the bisection method with the high convergence rates of inverse quadratic interpolation.

This method works by defining a suitable starting bracket (at whose end points the function has opposite signs), and trying a fast-converging method. 
If the fast method does not make sufficient progress or the next approximate solution falls outside the bracket, the method falls back on the bisection method for one iteration to reduce the size of the bracket. It then tries the fast method again (with a higher chance of success), and repeats this until the solution is found.

A more detailed explanation can be found on:
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brentq.html#scipy.optimize.brentq>
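
The safeguarding idea can be sketched in a few lines. Note that the code below is not Brent's algorithm: it is a simplified hybrid, written for illustration, that uses a secant step as the fast method and falls back on bisection whenever the proposed point leaves the current bracket. The function name `safeguarded_secant` and its parameters are hypothetical.

```python
def safeguarded_secant(f, a, b, tol=1e-12, max_iter=100):
    """Secant iteration that falls back on bisection to stay inside the bracket [a, b]."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x_prev, f_prev = a, fa
    x_curr, f_curr = b, fb
    for _ in range(max_iter):
        if f_curr == 0 or abs(b - a) < tol:
            break
        # Fast step: zero of the secant line through the last two iterates.
        if f_curr != f_prev:
            x_new = x_curr - f_curr * (x_curr - x_prev) / (f_curr - f_prev)
        else:
            x_new = a  # no usable secant line, force the fallback below
        # Safeguard: reject points outside the bracket and bisect instead.
        if not (a < x_new < b):
            x_new = a + (b - a) / 2
        f_new = f(x_new)
        # Shrink the bracket so that it still contains a sign change.
        if fa * f_new <= 0:
            b, fb = x_new, f_new
        else:
            a, fa = x_new, f_new
        x_prev, f_prev = x_curr, f_curr
        x_curr, f_curr = x_new, f_new
        if abs(x_curr - x_prev) < tol:
            break
    return x_curr


root_right = safeguarded_secant(lambda x: x**2 - 1, 0.0, 2.0)
root_left = safeguarded_secant(lambda x: x**2 - 1, -2.0, 0.0)
print(root_left, root_right)
```

Because the bracket always contains a sign change, the iterates cannot run away as they can in the unsafeguarded secant or Newton methods.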

As we'll see below, in multiple dimensions we'll need the `optimize.root` function, which also works in 1 dimension.

In [None]:
def func_par(x):
    """A simple quadratic test function with two obvious roots."""
    return x**2 - 1


# find its root in the bracket -2 and 0
print(f"Root in bracket [-2, 0]: {optimize.brentq(func_par, -2, 0)} \n")

# find its root in the bracket 0 and 2
print(f"Root in bracket [0, 2]: {optimize.brentq(func_par, 0, 2)} \n")

# now search for its root using the optimize.root function, with 2 as initial guess
optimize.root(func_par, x0=2)

#### Example

The force $F$ acting on an object in free fall is given by the sum of the gravitational force $F_g=-mg$ (with $m$ the mass and $g$ the gravitational acceleration), and the air resistance $F_r=C\rho A v^2/2$ (with $C$ the drag coefficient of the object, $\rho$ the density of the fluid the object falls through, $A$ the projected surface and $v$ the velocity).

For an object starting at position 0 at time 0, this leads to the following equation of motion:

$$
F=m\frac{d^2x}{dt^2}=-mg+\frac{C\rho A}{2}\left(\frac{dx}{dt}\right)^2
$$

The analytical solution to this differential equation is given by the nonlinear equation

$$
x(t) = -\frac{\log(\cosh(\sqrt{(AgmC\rho/2)}t))}{(AC\rho/2)}
$$

Let's now consider 2 skydivers. The first is a quite big person of 110 kg who jumps in a horizontal position.
The second skydiver is a 50 kg adolescent who enthusiastically dives down head-first to minimize her air resistance, but who starts 1 second later.

The graph below shows that the first skydiver almost immediately reaches his terminal velocity, whereas the second skydiver needs a bit more time to accelerate, but due to her more streamlined position reaches a higher velocity and eventually overtakes the first.

**The question we want to answer is when the second skydiver overtakes the first.**

In [None]:
def skydivers():
    # define the freefall equation
    def freefall(x, A, m, C):
        g = 9.8  # m/s²
        r = 1.21  # kg/m³
        return (
            -1.0
            * np.log(np.cosh(np.sqrt(A * g * m * C * r / 2) * x))
            / (A * C * r / 2)
        )

    # Parameters for both skydivers (numbered in the opposite order to the text):
    #      skydiver1: A = 0.2 m^2,  m =  50 kg, C = 0.7  (starts 1 s later)
    #      skydiver2: A = 0.8 m^2,  m = 110 kg, C = 1.0

    # skydiver 1
    def skydiver1(x):
        A1, m1, C1 = 0.2, 50.0, 0.70
        return freefall((x - 1), A1, m1, C1)

    # skydiver 2
    def skydiver2(x):
        A2, m2, C2 = 0.8, 110.0, 1.0
        return freefall(x, A2, m2, C2)

    # define x values
    x = np.arange(0, 5.5, 0.01)

    # plot trajectories for both skydivers
    plt.close("skydivers")
    fig, ax = plt.subplots(num="skydivers")

    ax.plot(x[x > 1], skydiver1(x[x > 1]), color="blue", label="50kg skydiver")
    ax.plot(x, skydiver2(x), color="red", label="110kg skydiver")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("height (m)")
    ax.legend()

    # The question we want to answer for this problem is:
    # "At what time does the 50 kg skydiver (skydiver1) overtake the 110 kg one (skydiver2)?"

    def fall(x):
        return skydiver1(x) - skydiver2(x)

    x_root = optimize.brentq(fall, 1.01, 5.0)
    y_root = skydiver1(x_root)

    # Plot the intersection point
    ax.plot(x_root, y_root, "o", color="black")


skydivers()
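
The function above only marks the crossing point in the plot. A minimal standalone sketch that also reports the overtaking time numerically could look as follows. It reuses the expression and parameters from the cell above; the names `position`, `small_diver`, `big_diver` and `gap` are introduced here purely for illustration.

```python
import numpy as np
from scipy import optimize


def position(t, A, m, C, g=9.8, rho=1.21):
    """Position after a fall time t, using the same expression as `freefall` above."""
    k = A * C * rho / 2
    return -np.log(np.cosh(np.sqrt(k * g * m) * t)) / k


def small_diver(t):
    # 50 kg skydiver, head-first, starts 1 s later
    return position(t - 1.0, A=0.2, m=50.0, C=0.7)


def big_diver(t):
    # 110 kg skydiver, horizontal position
    return position(t, A=0.8, m=110.0, C=1.0)


def gap(t):
    # positive while the 50 kg skydiver is still above the 110 kg skydiver
    return small_diver(t) - big_diver(t)


# gap changes sign between t = 1.01 s and t = 5 s, so brentq can bracket the root
t_cross = optimize.brentq(gap, 1.01, 5.0)
print(f"Overtaking time: {t_cross:.2f} s at position {big_diver(t_cross):.1f} m")
```

The bracket starts just after 1 s because the 50 kg skydiver has not jumped before that time.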

### Roots of polynomial functions

All methods we saw until now zoom in on a single root of the function under study.
Sometimes we're interested in all the roots of e.g. a polynomial function. 

For a polynomial $p(x)$ of degree $n$, we want to find all $n$ zeros (which might be complex).

To this end we can resort to several methods:

- Use one of the methods shown above to find one root $x_1$, then deflate the polynomial $p(x)$ to $p(x)/(x-x_1)$, which has a degree that is one lower, and repeat the process. 
Note that it's a good idea to refine each root on the original polynomial, using the approximate value obtained this way as the starting point, to remove any numerical errors introduced by the deflation process.
- Use a dedicated (complex) routine specifically designed for this purpose. 
These work by isolating the roots of a polynomial in the complex plane, and then refining in a way similar to the bisection method to zoom in on each of the roots. 
Their complexity is beyond the scope of this course.
- Form the **companion matrix** of the given polynomial and use an eigenvalue routine to find its eigenvalues, which are also the roots of the polynomial.

The last method is the one that is used by NumPy in the [`numpy.polynomial.polynomial.polyroots`](https://numpy.org/doc/stable/reference/generated/numpy.polynomial.polynomial.polyroots.html) function.
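
To make the companion-matrix idea concrete, the sketch below builds the matrix by hand for the cubic $x^3-6x^2+11x-6$ and passes it to a general eigenvalue routine. It assumes coefficients ordered from lowest to highest degree, and it illustrates the principle only; it is not meant to reproduce NumPy's actual implementation.

```python
import numpy as np

# coefficients of x^3 - 6x^2 + 11x - 6, ordered from lowest to highest degree
coeffs = np.array([-6.0, 11.0, -6.0, 1.0])

# normalize so that the leading coefficient is 1 (monic polynomial)
monic = coeffs / coeffs[-1]
n = len(monic) - 1

# companion matrix: ones on the subdiagonal,
# minus the remaining coefficients in the last column
companion = np.zeros((n, n))
companion[1:, :-1] = np.eye(n - 1)
companion[:, -1] = -monic[:-1]

# the eigenvalues of the companion matrix are the roots of the polynomial
roots = np.sort(np.linalg.eigvals(companion))
print(roots)
```

The characteristic polynomial of this matrix is exactly the monic polynomial, which is why its eigenvalues coincide with the roots.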

> **Example:**
> 
> Find the roots of the polynomial $x^3 - 6 x^2 + 11 x - 6$, which equals $(x-1)(x-2)(x-3)$.
> Note that the `polyroots` function expects an array of polynomial coefficients as input, ordered from the lowest to the highest degree:

In [None]:
poly.polyroots([-6, 11, -6, 1])
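
For comparison, the deflation approach from the list above can be sketched as follows. This is a simplified illustration that assumes all roots are real and that Newton's method converges from the starting guess 0, which happens to hold for this polynomial. The helper `newton_root` is a hypothetical name introduced here.

```python
import numpy.polynomial.polynomial as poly
from scipy import optimize

# coefficients of x^3 - 6x^2 + 11x - 6, ordered from lowest to highest degree
original = [-6.0, 11.0, -6.0, 1.0]


def newton_root(coeffs, x0):
    """Find one root of the polynomial with the given coefficients via Newton's method."""
    deriv = poly.polyder(coeffs)
    return optimize.newton(
        lambda x: poly.polyval(x, coeffs),
        x0,
        fprime=lambda x: poly.polyval(x, deriv),
    )


roots = []
current = original
while len(current) > 1:
    # find a root of the (deflated) polynomial ...
    rough = newton_root(current, x0=0.0)
    # ... polish it on the original polynomial to remove deflation errors ...
    roots.append(newton_root(original, x0=rough))
    # ... and divide out the factor (x - rough)
    current, _remainder = poly.polydiv(current, [-rough, 1.0])

print(roots)
```

Each pass lowers the degree by one, so the loop stops once only a constant is left.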

## Systems of nonlinear equations

Systems of nonlinear equations are more difficult to solve than single nonlinear equations for a number of reasons:

- A much wider range of behavior is possible, so we don't get as far with theoretical analysis of the existence and number of solutions.
- There is no simple way to bracket a desired solution.
- Computational overhead increases rapidly with the dimension of the problem.

Most of the methods we saw to solve a 1-dimensional nonlinear problem do not generalize to more than 1 dimension. One method that does is Newton's method:

For a differentiable vector function $\mathbf{f}$, the truncated Taylor series reads:

$$\mathbf{f}(\mathbf{x}+\mathbf{s})\approx\mathbf{f}(\mathbf{x})+\mathbf{J}_{\mathbf{f}(\mathbf{x})}\,\mathbf{s}$$

where ${\mathbf{J}_{\mathbf{f}(\mathbf{x})}}$ is the Jacobian matrix of $\mathbf{f}$ with elements

$$\bigl[\mathbf{J}_{\mathbf{f}(\mathbf{x})}\bigr]_{ij}=\frac{\partial f_i(\mathbf{x})}{\partial x_j}$$

If $\mathbf{s}$ satisfies the linear system $\mathbf{J}_{\mathbf{f}(\mathbf{x})}\,\mathbf{s}=-\mathbf{f}(\mathbf{x})$, then $\mathbf{x}+\mathbf{s}$ is taken as an approximate zero of $\mathbf{f}$.

Essentially, Newton's method replaces a system of nonlinear equations with a system of linear equations, but as the solutions of both systems are not identical, the process must be repeated until the desired accuracy is reached.
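
The iteration described above can be sketched in a few lines. This is a bare-bones illustration with an explicitly supplied Jacobian and no safeguards such as step-size control. The function name `newton_system`, the tolerance, the iteration limit and the small two-equation test system are all assumptions made for this sketch.

```python
import numpy as np


def newton_system(f, jac, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0 with an explicitly supplied Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # solve the linear system J s = -f for the Newton step s
        s = np.linalg.solve(jac(x), -f(x))
        x = x + s
        # stop once the update no longer changes x noticeably
        if np.linalg.norm(s) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# illustrative system: x1 + 2 x2 - 2 = 0 and x1^2 + 4 x2^2 - 4 = 0
def f(x):
    return np.array([x[0] + 2 * x[1] - 2, x[0] ** 2 + 4 * x[1] ** 2 - 4])


def jac(x):
    # row i holds the partial derivatives of equation i
    return np.array([[1.0, 2.0], [2 * x[0], 8 * x[1]]])


solution = newton_system(f, jac, x0=[1.0, 2.0])
print(solution, f(solution))
```

Every pass through the loop solves one linear system, which is where most of the computational effort goes.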

If the Jacobian of the function is not available, there exist more advanced methods which estimate the Jacobian based on function evaluations, similar to how the secant method works in 1 dimension. 

The computational cost of Newton's method in $n$ dimensions is substantial:

- Evaluating the Jacobian matrix (or approximating it) requires $n^2$ scalar function evaluations, one for each matrix element.
- Solving the system $\mathbf{J}_{\mathbf{f}(\mathbf{x})}\,\mathbf{s}=-\mathbf{f}(\mathbf{x})$, for instance using LU-factorization, costs $\mathcal{O}(n^3)$ operations.
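
To see where the $n^2$ count comes from when the Jacobian is approximated, consider the forward-difference sketch below. Each column needs one extra evaluation of the full vector function, so $n$ columns of $n$ components each give $n^2$ scalar evaluations. The name `fd_jacobian` and the step size `h` are assumptions chosen for illustration; production routines choose the step more carefully.

```python
import numpy as np


def fd_jacobian(f, x, h=1e-7):
    """Forward-difference approximation of the Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x), dtype=float)
    jac = np.empty((f0.size, x.size))
    for j in range(x.size):
        # perturb one coordinate at a time: one extra evaluation of f per column
        x_step = x.copy()
        x_step[j] += h
        jac[:, j] = (np.asarray(f(x_step), dtype=float) - f0) / h
    return jac


def f(x):
    return np.array([x[0] + 2 * x[1] - 2, x[0] ** 2 + 4 * x[1] ** 2 - 4])


x = np.array([1.0, 2.0])
approx = fd_jacobian(f, x)
exact = np.array([[1.0, 2.0], [2 * x[0], 8 * x[1]]])
print(approx)
print(exact)
```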

Without going into too much detail on how these methods work, we'll have a look at how such problems can be solved with the `optimize.root` function from `scipy`.
Its documentation can be found at
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.root.html#scipy.optimize.root>

Its use is illustrated with an example:

> **Example**
>
> Solve the nonlinear system 
>
> $$\mathbf{f}(\mathbf{x})=\begin{bmatrix}x_1+2x_2-2\\x_1^2+4x_2^2-4\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}$$

In [None]:
def func_2d(x):
    return [x[0] + 2 * x[1] - 2, x[0] ** 2 + 4 * x[1] ** 2 - 4]


# find the roots of this equation with the point [2, 2] as initial guess
optimize.root(func_2d, [2, 2])

> We find the solution $\mathbf{x^*}=[0,1]^\intercal$.
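
The object returned by `optimize.root` carries more than the root itself. The sketch below, offered as one possible way to inspect it, passes an analytical Jacobian through the `jac` argument and tries two initial guesses, since which root is found depends on the starting point. The helper `jac_2d` and the second guess are assumptions introduced for illustration.

```python
import numpy as np
from scipy import optimize


def func_2d(x):
    return np.array([x[0] + 2 * x[1] - 2, x[0] ** 2 + 4 * x[1] ** 2 - 4])


def jac_2d(x):
    # analytical Jacobian: row i holds the partial derivatives of equation i
    return np.array([[1.0, 2.0], [2 * x[0], 8 * x[1]]])


solutions = []
for guess in ([2.0, 2.0], [2.0, -1.0]):
    sol = optimize.root(func_2d, guess, jac=jac_2d)
    solutions.append(sol)
    # sol.success reports convergence, sol.x holds the root that was found
    print(guess, sol.success, sol.x, np.abs(func_2d(sol.x)).max())
```

Checking the residual `func_2d(sol.x)` is a cheap way to confirm that the returned point really solves the system.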