# Numerical and Automatic Derivatives

Derivatives are fundamental in mathematical modeling because they
quantify rates of change.
They provide insight into how physical systems evolve, how signals
vary, and how models respond to inputs.

In computational physics, engineering, and machine learning, the
efficient and accurate computation of derivatives is essential.
We rely on derivatives for simulations, optimization, and sensitivity
analysis.

For simple functions, derivatives can often be computed analytically.
But in real applications, functions are frequently nonlinear,
high-dimensional, or too complex for manual differentiation.
In these cases, alternative computational techniques become
indispensable.

### Definition of the Derivative

The derivative of a real-valued function $f(x)$ at a point $x=a$ is
defined as the limit
\begin{align}
  f'(a) \equiv \lim_{h\rightarrow 0}
  \frac{f(a+h) - f(a)}{h}
\end{align}

If this limit exists, it represents the slope of the tangent line to
the curve $y=f(x)$ at $x=a$.
More generally, the derivative function $f'(x)$ describes the local
rate of change of $f(x)$.
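As a quick numerical illustration of this limit, the sketch below evaluates the difference quotient for a shrinking sequence of $h$.
The test function $f(x) = x^3$, the point $a = 2$, and the helper name `difference_quotient` are illustrative choices, not part of the definition.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a+h, f(a+h))."""
    return (f(a + h) - f(a)) / h

f = lambda x: x**3   # illustrative test function; exact derivative is 3*x**2
a = 2.0              # exact f'(a) = 12

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    q = difference_quotient(f, a, h)
    print(f"h = {h:.0e}   quotient = {q:.8f}   error = {abs(q - 12.0):.2e}")
```

For this cubic the quotient equals $12 + 6h + h^2$ in exact arithmetic, so the error shrinks roughly in proportion to $h$.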

### Chain Rule

One of the most important rules in calculus is the chain rule.
For a composite function $f(x) = g(h(x))$,
\begin{align}
  f'(x) = g'(h(x)) h'(x).
\end{align}
The chain rule is not just a basic calculus identity.
It is the central principle behind modern computational approaches to
derivatives.
As we will see, both symbolic differentiation and automatic
differentiation rely heavily on repeated applications of this rule.


### Approaches to Computing Derivatives

There are three main approaches to computing derivatives in practice:
1. Symbolic differentiation:
   Applies algebraic rules directly to mathematical expressions,
   producing exact formulas.
   (This is what you do in calculus class.)
2. Numerical differentiation:
   Uses finite differences to approximate derivatives from discrete
   function values.
   These methods are easy to implement but introduce truncation and
   round-off errors.
3. Automatic differentiation (AD):
   Systematically applies the chain rule at the level of elementary
   operations.
   AD computes derivatives to machine precision without symbolic
   algebra, making it efficient for complex functions and large-scale
   systems.
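To make "applying the chain rule at the level of elementary operations" concrete, here is a minimal sketch of forward-mode AD using dual numbers.
The class `Dual` and the helpers `sin` and `exp` are hypothetical names written for this illustration; they are not the API of any AD library, and only addition, multiplication, `sin`, and `exp` are supported.

```python
import math

class Dual:
    """A value paired with its derivative with respect to the input."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def sin(u):
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def exp(u):
    # Chain rule: d/dx exp(u) = exp(u) * u'
    e = math.exp(u.val)
    return Dual(e, e * u.der)

def f(x):
    return x * x * sin(x) + exp(2 * x)

x0 = 1.0
y = f(Dual(x0, 1.0))   # seed dx/dx = 1
exact = x0**2 * math.cos(x0) + 2 * x0 * math.sin(x0) + 2 * math.exp(2 * x0)
print(y.der, exact)
```

Each elementary operation propagates a value and a derivative together, so the result agrees with the closed-form derivative up to floating-point rounding, with no step size to tune.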


## Symbolic Differentiation

Symbolic differentiation computes derivatives by applying calculus
rules directly to symbolic expressions.
Unlike numerical methods that we will see later, which only
approximate derivatives at specific points, symbolic methods yield
exact analytical expressions.
This makes them valuable for theoretical analysis, closed-form
solutions, and precise computation.

The basic algorithm for symbolic differentiation can be described in
three steps:
1. Parse the expression:
   Represent the function as a tree (nodes are operations like `+`,
   `*`, `sin`, etc.).
2. Apply differentiation rules:
   Recursively apply rules (e.g., product rule, chain rule) to each
   node.
3. Simplify:
   Reduce the resulting expression into a cleaner, more efficient
   form.
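The three steps above can be sketched in a few lines of Python.
This toy version assumes expressions are already parsed into nested tuples such as `('*', 'x', ('sin', 'x'))`; the function names `differentiate` and `simplify_tree` are hypothetical, and only `+`, `*`, and `sin` are handled.
Real systems such as SymPy are far more general.

```python
def differentiate(expr, var):
    """Recursively apply differentiation rules to an expression tree."""
    if isinstance(expr, (int, float)):
        return 0                              # constant
    if isinstance(expr, str):
        return 1 if expr == var else 0        # variable
    op, *args = expr
    if op == '+':                             # sum rule
        u, v = args
        return ('+', differentiate(u, var), differentiate(v, var))
    if op == '*':                             # product rule
        u, v = args
        return ('+', ('*', differentiate(u, var), v),
                     ('*', u, differentiate(v, var)))
    if op == 'sin':                           # chain rule
        (u,) = args
        return ('*', ('cos', u), differentiate(u, var))
    raise ValueError(f"unsupported operation: {op}")

def simplify_tree(expr):
    """Remove trivial terms such as 0 + u, 1 * u, and 0 * u."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify_tree(a) for a in args]
    if op == '+':
        a, b = args
        if a == 0:
            return b
        if b == 0:
            return a
    if op == '*':
        a, b = args
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    return (op, *args)

tree = ('*', 'x', ('sin', 'x'))               # x * sin(x)
raw = differentiate(tree, 'x')
print(raw)
print(simplify_tree(raw))
```

The unsimplified result already contains redundant factors of `1`, which hints at why the simplification step matters in practice.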


Consider the function $f(x) = x^2 \sin(x) + e^{2x}$.
To compute $f'(x)$, a symbolic differentiation system would:
1. Differentiate $x^2 \sin(x)$ using the product rule:
   \begin{align}
   \frac{d}{dx}[x^2 \sin(x)] = x^2 \cos(x) + 2 x \sin(x)
   \end{align}
2. Differentiate $e^{2x}$ using the chain rule:
   \begin{align}
   \frac{d}{dx}[e^{2x}] = 2 e^{2x}
   \end{align}
3. Combine the results:
   \begin{align}
   f'(x) = x^2 \cos(x) + 2 x \sin(x) + 2 e^{2x}
   \end{align}

### Symbolic Differentiation with SymPy

We can use [SymPy](https://www.sympy.org), a Python library for
symbolic mathematics, to automate this process.

In [None]:
!pip install sympy

In [None]:
import sympy as sp

# Define symbolic variable and function
x = sp.symbols('x')
f = x**2 * sp.sin(x) + sp.exp(2*x)

# Differentiate
fp = sp.diff(f, x)

# Simplify result
fp_simplified = sp.simplify(fp)

# Display the result with equation support
display(f)
display(fp_simplified)

SymPy can compute higher-order derivatives just as easily:

In [None]:
fpp  = sp.diff(f, x, 2)  # second derivative
fppp = sp.diff(f, x, 3)  # third derivative

fpp_simplified  = sp.simplify(fpp)
fppp_simplified = sp.simplify(fppp)

display(fpp_simplified)
display(fppp_simplified)

We can visualize the function and its derivatives:

In [None]:
import numpy as np

# Convert sympy expression into numpy-callable functions
f_num    = sp.lambdify(x, f,               "numpy")
fp_num   = sp.lambdify(x, fp_simplified,   "numpy")
fpp_num  = sp.lambdify(x, fpp_simplified,  "numpy")
fppp_num = sp.lambdify(x, fppp_simplified, "numpy")

In [None]:
import matplotlib.pyplot as plt

X = np.linspace(-1, 1, 101)
plt.plot(X, f_num(X),    label=r'$f(x)$')
plt.plot(X, fp_num(X),   label=r"$f'(x)$")
plt.plot(X, fpp_num(X),  label=r"$f''(x)$")
plt.plot(X, fppp_num(X), label=r"$f'''(x)$")

plt.legend()

### Pros and Cons

Symbolic differentiation is useful because it provides:
* Exact results:
  no approximation or rounding errors (until the expression is
  evaluated with floating-point numbers).
* Validity across the domain:
  derivative formulas apply everywhere the function is defined.
* Analytical insight:
  exact expressions make it easier to solve ODEs, optimize functions,
  and manipulate formulas algebraically.

Symbolic differentiation also has important drawbacks:
* Expression growth:
  formulas can quickly become large and messy for complex functions.
* Computational cost:
  evaluating or simplifying derivatives can be expensive for
  high-dimensional systems.
* Limited applicability:
  not suitable when functions are given only by data, simulations, or
  black-box algorithms.

In such cases, numerical or automatic differentiation is usually the
better choice.
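One rough way to see expression growth is to count the operations in successive derivatives.
The sketch below uses `sp.count_ops` on an arbitrarily chosen function `g`; the exact counts depend on the SymPy version and on whether the expressions are simplified, so treat them as indicative only.

```python
import sympy as sp

x = sp.symbols('x')
g = x * sp.sin(x) * sp.exp(x**2)   # illustrative function, chosen arbitrarily

ops = []
dn = g
for n in range(5):
    ops.append(sp.count_ops(dn))   # operation count of the n-th derivative
    print(f"order {n}: {ops[-1]} operations")
    dn = sp.diff(dn, x)            # next derivative, left unsimplified
```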

### Software Tools

Symbolic differentiation is supported in many systems:
* [`SymPy`](https://www.sympy.org/):
  An open-source Python library that provides capabilities for
  symbolic differentiation, integration, and equation solving within
  the Python ecosystem.
* [`Mathematica`](https://www.wolfram.com/mathematica/):
  A computational software system developed by Wolfram Research,
  offering extensive symbolic computation features used widely in
  academia and industry.
* [`Maple`](https://www.maplesoft.com/):
  A software package designed for symbolic and numeric computing,
  providing powerful tools for mathematical analysis.
* [`Maxima`](https://maxima.sourceforge.io/):
  An open-source computer algebra system specializing in symbolic
  manipulation, accessible for users seeking free alternatives.

In [None]:
# HANDSON: Differentiate and Simplify
#
# Define f(x) = ln(x^2 + 1) exp(x).
# Use SymPy to compute f'(x), simplify it, and plot both
# f(x) and f'(x).


In [None]:
# HANDSON: Product of Many Functions
#
# Define f(x) = sin(x) cos(x) tan(x).
# Compute derivatives up to order 3.
# What happens to expression complexity?


## Finite Difference Methods

Numerical differentiation estimates the derivative of a function using
discrete data points.
Instead of exact formulas, it provides approximate values that are
especially useful when analytical derivatives are difficult or
impossible to obtain.
This flexibility makes numerical methods essential for handling
complex, empirical, or high-dimensional functions that appear in
scientific and engineering applications.

The most common numerical approach is the finite difference method,
which estimates derivatives by evaluating the function at nearby
points and forming ratios of differences.
These methods are simple to implement and widely used in practice.
The key idea is to approximate the derivative $f'(x)$ by sampling the
function at points around $x$.
The three elementary finite difference formulas are forward
difference, backward difference, and central difference.

### Forward Difference

The forward difference uses the function values at $x$ and $x+h$:
\begin{align}
  f'(x) \approx \frac{f(x+h) - f(x)}{h}.
\end{align}
This method is easy to implement.
Assuming $f(x)$ is already available, it requires only one extra
function evaluation.
However, it introduces a **truncation error** of order
$\mathcal{O}(h)$.
While decreasing $h$ improves accuracy, making $h$ too small causes
floating-point **round-off errors** to dominate.
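
To see where the $\mathcal{O}(h)$ comes from, expand $f(x+h)$ in a Taylor series around $x$ (assuming $f$ is smooth enough near $x$):
\begin{align}
  f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \mathcal{O}(h^3)
  \quad\Longrightarrow\quad
  \frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + \mathcal{O}(h^2).
\end{align}
The leading error term $\tfrac{h}{2} f''(x)$ is linear in $h$, so halving $h$ roughly halves the truncation error.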

### Backward Difference

The backward difference uses values at $x$ and $x-h$:
\begin{align}
  f'(x) \approx \frac{f(x) - f(x-h)}{h}.
\end{align}
This has the same truncation error of order $\mathcal{O}(h)$ as the
forward method.
It is particularly useful when values of $f(x+h)$ are unavailable or
expensive to compute.

### Central Difference

The central difference combines forward and backward differences to
achieve higher accuracy:
\begin{align}
  f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}.
\end{align}
This method has a truncation error of order $\mathcal{O}(h^2)$, making
it significantly more accurate for smooth functions.
The trade-off is that it requires two extra function evaluations (not
at $x$) instead of one, but the improved accuracy often makes it the
preferred method.
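
The second-order accuracy follows from the same kind of Taylor argument, again assuming $f$ is smooth enough near $x$:
\begin{align}
  f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + \mathcal{O}(h^4)
  \quad\Longrightarrow\quad
  \frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + \mathcal{O}(h^4).
\end{align}
The even-order terms cancel in the subtraction, which is why the symmetric stencil gains one order of accuracy.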


### Truncation Error vs. Round-Off Error

Finite difference methods must balance two sources of error:
* **Truncation error** comes from approximating the derivative using a
  discrete difference.
* **Round-off error** comes from the finite precision of
  floating-point arithmetic.

For forward and backward differences, truncation error decreases
linearly with $h$.
For central differences, it decreases quadratically, giving better
accuracy for small $h$.

However, if $h$ becomes too small, round-off error dominates because
$f(x+h)$ and $f(x)$ become nearly indistinguishable in floating-point
representation, so their difference loses most of its significant
digits.
This is the same catastrophic cancellation that we encountered before.

The optimal choice of $h$ balances these two errors.
A common rule of thumb for the forward and backward differences is to
set
\begin{align}
  h \sim \sqrt{\epsilon},
\end{align}
where $\epsilon$ is the machine epsilon that we learned before, i.e.,
the smallest number such that $1 + \epsilon > 1$ in floating-point
arithmetic.
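
As a small check of this rule of thumb, the sketch below compares the forward-difference error for $\sin x$ at three step sizes: one too large, one near $\sqrt{\epsilon}$, and one too small.
The test point `x0 = 1.0` and the names `forward_error` and `h_opt` are illustrative assumptions.

```python
import numpy as np

eps = np.finfo(float).eps      # about 2.2e-16 in double precision
h_opt = np.sqrt(eps)           # about 1.5e-8

def forward_error(h, x0=1.0):
    """Absolute error of the forward difference for sin at x0."""
    approx = (np.sin(x0 + h) - np.sin(x0)) / h
    return abs(approx - np.cos(x0))

for h in [1e-4, h_opt, 1e-12]:
    print(f"h = {h:.2e}   error = {forward_error(h):.2e}")
```

The middle step size should give the smallest error of the three: at `h = 1e-4` truncation dominates, while at `h = 1e-12` round-off dominates.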

### Sample Codes

Below we implement the three basic finite-difference formulas
(forward, backward, and central) and demonstrate their behavior on a
smooth test function.
We will also run a convergence study to see how truncation error
(which shrinks as $h$ decreases) and round-off error (which grows as
$h$ decreases) trade off in practice.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Test function and its exact derivative
f  = lambda x: np.sin(x)
fp = lambda x: np.cos(x)

# Basic finite-difference formulas
def fp_forward(f, x, h):
    return (f(x+h) - f(x)) / h

def fp_backward(f, x, h):
    return (f(x) - f(x-h)) / h

def fp_central(f, x, h):
    return (f(x+h) - f(x-h)) / (2*h)

We pick a moderate step size $h$ and compare forward/backward/central
differences against the analytic derivative.
Central differences are usually much more accurate for smooth
functions.

In [None]:
X = np.linspace(0, 2*np.pi, 201)
h = 0.1  # "reasonable" step for visual comparison

plt.plot(X, fp(X),                      label=r"Exact $f'(x)=\cos x$")
plt.plot(X, fp_forward (f, X, h), '-.', label=f'Forward (h={h:g})')
plt.plot(X, fp_backward(f, X, h), '--', label=f'Backward (h={h:g})')
plt.plot(X, fp_central (f, X, h), ':',  label=f'Central (h={h:g})')

plt.xlabel('x')
plt.ylabel('derivative')
plt.legend()

In [None]:
# HANDSON: Adjust `h` in the above cell and observe how the finite
#          difference methods behave


Next, we perform a convergence study, i.e., we examine how the error
changes with step size.
We fix a point $x_0$ and sweep $h$ over many orders of magnitude.

In [None]:
def errors(f, fp_exact, x0):
    # Step sizes spanning many orders of magnitude
    H  = np.logspace(0, -16, 17)  # start at 1 down to 1e-16
    fp = fp_exact(x0)

    Ef = np.abs(fp_forward (f, x0, H) - fp)
    Eb = np.abs(fp_backward(f, x0, H) - fp)
    Ec = np.abs(fp_central (f, x0, H) - fp)

    return H, Ef, Eb, Ec

In [None]:
# Compare at a few points to see behavior across the domain
X0  = [0.3, np.pi/4, 1.7]
eps = np.finfo(float).eps

fig, axes = plt.subplots(1, len(X0), figsize=(12,4), sharey=True)
for ax, x0 in zip(axes, X0):
    H, Ef, Eb, Ec = errors(f, fp, x0)

    # Reference slopes: h and h^2 (scaled to match error at largest h for readability)
    s1 = Ef[0]/H[0]  # scale so line ~ same level at left
    s2 = Ec[0]/(H[0]**2)

    ax.loglog(H, s1*H,    'k--', lw=1, label=r'$\propto h$')
    ax.loglog(H, s2*H**2, 'k:',  lw=1, label=r'$\propto h^2$')

    ax.loglog(H, Ef, 'o-',  ms=3, label='Forward ($O(h)$)')
    ax.loglog(H, Eb, 's--', ms=3, label='Backward ($O(h)$)')
    ax.loglog(H, Ec, '^:' , ms=3, label='Central ($O(h^2)$)')

    ax.set_title(f'$x_0 = {x0:.3f}$')
    ax.set_ylim(1e-19, 1e+1)
    ax.set_xlim(max(H)*2, min(H)/2)
    ax.set_xlabel(r'Step size $h$')

    # A helpful visual: the error minimum typically sits near eps^(1/2)
    # (forward/backward) and eps^(1/3) (central)
    ax.axvline(eps**(1/2), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/2)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/2}$',
            va='bottom', ha='right', color='k')

    ax.axvline(eps**(1/3), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/3)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/3}$',
            va='bottom', ha='right', color='k')

axes[0].set_ylabel('Error')
axes[2].legend(loc='lower right', ncol=1)
plt.tight_layout()

From the plot above, we observe that
* For **large** $h$, **truncation error** dominates:
  error scales like $\mathcal{O}(h)$ (forward/backward) or
  $\mathcal{O}(h^2)$ (central).
* For **small** $h$, **round-off error** dominates:
  subtractive cancellation makes the difference $f(x+h)-f(x)$ noisy,
  so error grows.

The "best" $h$ is somewhere in the middle (often near
$\epsilon^{1/(n+1)}$ for $n$th-order methods), where total error is
minimized.
However, this depends on the local scales of $f, f'', f'''$, etc.
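
A rough error model makes the exponent $1/(n+1)$ plausible.
Assume, as a simplification, that the truncation error of an $n$th-order first-derivative formula behaves like $C h^n$, where $C$ depends on a higher derivative of $f$, and that the round-off error behaves like $\epsilon |f(x)| / h$:
\begin{align}
  E(h) \approx C h^n + \frac{\epsilon |f(x)|}{h},
  \qquad
  \frac{dE}{dh} = 0
  \quad\Longrightarrow\quad
  h_\mathrm{opt} = \left(\frac{\epsilon |f(x)|}{n C}\right)^{1/(n+1)} \sim \epsilon^{1/(n+1)}.
\end{align}
In double precision, $n = 1, 2, 4$ give $\epsilon^{1/2} \approx 1.5\times10^{-8}$, $\epsilon^{1/3} \approx 6\times10^{-6}$, and $\epsilon^{1/5} \approx 7\times10^{-4}$, respectively.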

In [None]:
# HANDSON: Change `x0` and observe how the convergence plots change.
#          Specifically, what if `x0 = 0`?


In [None]:
# HANDSON: Find your optimal `h` at different points.
#          In the convergence plot, read off the $h$ where each curve
#          reaches its minimum.
#          How does it change with the local behavior of $f(x)$?


In [None]:
# HANDSON: Find your optimal `h` for different functions.
#          Specifically, replace $f(x)$ by $e^{-x^2}$ or $e^{3x}$.
#          How do the error curves change?
#          Which methods are more robust?


In [None]:
# HANDSON: Probe subtraction error.
#          For a fixed tiny `h=1e-12`, evaluate `(f(x0+h)-f(x0))/h`
#          as `x0` varies.
#          Where does the error spike?
#          Why?


### "High-Order Finite Differences" as in High-Order Schemes

High-order finite differences improve the accuracy order of a
derivative approximation by combining more sample points and
cancelling lower-order error terms via Taylor expansions.

This should not be confused with computing higher-order derivatives
(e.g., $f''$, $f^{(3)}$) that we will see later.
Here, we still approximate a first derivative $f'(x)$, but with
higher-order accuracy.

To derive high-order finite difference approximations, the standard
method is to use Taylor series expansion of the function around the
point of interest.
By considering multiple points symmetrically distributed around the
target point, it is possible to eliminate lower-order error terms,
thereby increasing the accuracy of the derivative approximation.

Specifically, consider approximating the first derivative $f'(x)$ with
fourth-order accuracy.
This requires that the truncation error be of order
$\mathcal{O}(h^4)$.
Expand the function $f$ at points $x - 2h$, $x - h$, $x + h$, and $x +
2h$ using the Taylor series around $x$:
\begin{align}
  f(x - 2h) &= f(x) - 2h f'(x) + \frac{(2h)^2}{2} f''(x) - \frac{(2h)^3}{6} f'''(x) + \frac{(2h)^4}{24} f''''(x) + \mathcal{O}(h^5), \\
  f(x -  h) &= f(x) -  h f'(x) + \frac{  h ^2}{2} f''(x) - \frac{  h ^3}{6} f'''(x) + \frac{  h ^4}{24} f''''(x) + \mathcal{O}(h^5), \\
  f(x +  h) &= f(x) +  h f'(x) + \frac{  h ^2}{2} f''(x) + \frac{  h ^3}{6} f'''(x) + \frac{  h ^4}{24} f''''(x) + \mathcal{O}(h^5), \\
  f(x + 2h) &= f(x) + 2h f'(x) + \frac{(2h)^2}{2} f''(x) + \frac{(2h)^3}{6} f'''(x) + \frac{(2h)^4}{24} f''''(x) + \mathcal{O}(h^5).
\end{align}

We will construct linear combinations of these expansions to eliminate
the lower-order terms up to $h^3$.
For example, subtract the expansion at $x - 2h$ from that at $x + 2h$
and adjust coefficients to isolate $f'(x)$:
\begin{align}
  f(x + 2h) - f(x - 2h) &= 4h f'(x) + \frac{8h^3}{3} f'''(x) + \mathcal{O}(h^5), \\
  f(x +  h) - f(x -  h) &= 2h f'(x) + \frac{h^3}{3} f'''(x) + \mathcal{O}(h^5).
\end{align}

It is now straightforward to eliminate the $f'''(x)$ term:
\begin{align}
  -f(x + 2h) + f(x - 2h) + 8f(x + h) - 8f(x - h) = 12h f'(x)  + \mathcal{O}(h^5).
\end{align}

Solving for $f'(x)$:
\begin{align}
  f'(x) = \frac{-f(x + 2h) + 8f(x + h) - 8f(x - h) + f(x - 2h)}{12h} + \mathcal{O}(h^4).
\end{align}
This is the fourth-order central difference formula for the first derivative.
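
The cancellation can be double-checked symbolically.
The sketch below applies the stencil to $f(x) = e^x$ at $x = 0$, where every derivative equals 1, and expands the result in powers of $h$; the choice of test function is an assumption made purely for convenience.

```python
import sympy as sp

h = sp.symbols('h', positive=True)

# Stencil numerator for f(x) = exp(x) at x = 0
numerator = -sp.exp(2*h) + 8*sp.exp(h) - 8*sp.exp(-h) + sp.exp(-2*h)

# Expand the numerator in powers of h, then divide by 12 h
num_series = sp.series(numerator, h, 0, 8).removeO()
stencil = sp.expand(num_series / (12*h))
print(stencil)
```

The constant term is $1 = f'(0)$, there is no $h^2$ term, and the first error term is $-h^4/30$, consistent with fourth-order accuracy.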

In [None]:
def fp_central4(f, x, h):
    return (-f(x+2*h) + 8*f(x+h) - 8*f(x-h) + f(x-2*h))/(12*h)

In [None]:
def errors(f, fp_exact, x0):
    # Step sizes spanning many orders of magnitude
    H  = np.logspace(0, -16, 17)  # start at 1 down to 1e-16
    fp = fp_exact(x0)

    Ef = np.abs(fp_forward (f, x0, H) - fp)
    Eb = np.abs(fp_backward(f, x0, H) - fp)
    Ec = np.abs(fp_central (f, x0, H) - fp)
    Ec4= np.abs(fp_central4(f, x0, H) - fp)

    return H, Ef, Eb, Ec, Ec4

In [None]:
X0 = [0.3, np.pi/4, 1.7]

fig, axes = plt.subplots(1, len(X0), figsize=(12,4), sharey=True)
for ax, x0 in zip(axes, X0):
    H, Ef, Eb, Ec, Ec4 = errors(f, fp, x0)

    # Reference slopes: h, h^2, and h^4 (scaled to match error at largest h for readability)
    s1 = Ef [0]/H[0]  # scale so line ~ same level at left
    s2 = Ec [0]/(H[0]**2)
    s4 = Ec4[0]/(H[0]**4)

    ax.loglog(H, s1*H,    'k--', lw=1, label=r'$\propto h$')
    ax.loglog(H, s2*H**2, 'k-.', lw=1, label=r'$\propto h^2$')
    ax.loglog(H, s4*H**4, 'k:',  lw=1, label=r'$\propto h^4$')

    ax.loglog(H, Ef, 'o-',  ms=3, label='Forward  ($O(h)$)')
    ax.loglog(H, Eb, 's--', ms=3, label='Backward ($O(h)$)')
    ax.loglog(H, Ec, '^-.', ms=3, label='Central  ($O(h^2)$)')
    ax.loglog(H, Ec4,'^:' , ms=3, label='Central4 ($O(h^4)$)')

    ax.set_title(f'$x_0 = {x0:.3f}$')
    ax.set_ylim(1e-19, 1e+1)
    ax.set_xlim(max(H)*2, min(H)/2)
    ax.set_xlabel(r'Step size $h$')

    # A helpful visual: the error minimum typically sits near
    # eps^(1/(n+1)) for an nth-order scheme
    ax.axvline(eps**(1/2), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/2)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/2}$',
            va='bottom', ha='right', color='k')

    ax.axvline(eps**(1/3), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/3)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/3}$',
            va='bottom', ha='right', color='k')

    ax.axvline(eps**(1/5), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/5)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/5}$',
            va='bottom', ha='right', color='k')

axes[0].set_ylabel('Error')
axes[2].legend(loc='lower right', ncol=1)
plt.tight_layout()

Does the 4th-order scheme converge as expected?
* Yes, in the truncation regime (larger $h$) the error curve is
  parallel to $h^4$.
* No, for very small $h$, the curve turns upward due to round-off and
  subtractive cancellation in differences like $f(x+h)-f(x-h)$.
* The "sweet spot" for the 4th-order stencil often occurs near $h \sim
  \epsilon^{1/5}$, not as small as you might guess.

In [None]:
# HANDSON: Change `x0` and observe how the convergence plots change.
#          Specifically, what if `x0 = 0` or `x0 = np.pi/2`?


In [None]:
# HANDSON: Find your optimal `h` at different points.
#          In the convergence plot, read off the $h$ where each curve
#          reaches its minimum.
#          How does it change with the local behavior of $f(x)$?


In [None]:
# HANDSON: Find your optimal `h` for different functions.
#          Specifically, replace $f(x)$ by $e^{-x^2}$ or $e^{3x}$.
#          How do the error curves change?
#          Which methods are more robust?


In [None]:
# HANDSON: Probe subtraction error.
#          For a fixed tiny `h=1e-12`, evaluate the numerators
#          as `x0` varies.
#          Where does the error spike?
#          Why?


Note that
* Tabulated coefficients for many high-order stencils (central and
  one-sided) are widely available and easy to implement.
* Orders above **6th** are rarely useful in floating-point arithmetic
  because round-off and problem noise usually dominate before the
  asymptotic order helps.
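
Stencil coefficients need not be looked up; they can also be computed by solving the small linear system that the Taylor-expansion argument produces.
The sketch below is one way to do this with NumPy.
The function name `fd_weights` and its interface are hypothetical, and the Vandermonde solve is adequate only for small stencils because the system becomes ill-conditioned as the number of points grows.

```python
import numpy as np
from math import factorial

def fd_weights(offsets, order=1):
    """Weights w_j with sum_j w_j f(x + s_j h) / h**order ~ f^(order)(x).

    offsets: stencil points s_j in units of h, e.g. [-2, -1, 0, 1, 2].
    """
    s = np.asarray(offsets, dtype=float)
    n = len(s)
    # Row k enforces sum_j w_j s_j**k = k! if k == order, else 0
    A = np.vander(s, n, increasing=True).T
    b = np.zeros(n)
    b[order] = factorial(order)
    return np.linalg.solve(A, b)

print(12 * fd_weights([-2, -1, 0, 1, 2], order=1))  # 4th-order central, f'
print(12 * fd_weights([-2, -1, 0, 1, 2], order=2))  # 4th-order central, f''
print(fd_weights([0, 1, 2], order=1))               # 2nd-order one-sided, f'
```

The first call reproduces the coefficients of the fourth-order central formula derived above (listed from $x-2h$ to $x+2h$), and the third shows a one-sided stencil that uses only points at and to the right of $x$.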

### "High-Order Finite Differences" for Higher Derivatives

Finite differences extend naturally to higher derivatives.
A standard example is the second derivative using the central 3-point stencil.
Starting from Taylor expansions at $x \pm h$,
\begin{align}
  f(x+h) &= f(x) + h f'(x) + \tfrac{h^2}{2} f''(x) + \tfrac{h^3}{6} f^{(3)}(x) + \mathcal{O}(h^4),\\
  f(x-h) &= f(x) - h f'(x) + \tfrac{h^2}{2} f''(x) - \tfrac{h^3}{6} f^{(3)}(x) + \mathcal{O}(h^4),
\end{align}
adding eliminates the odd derivatives and gives
\begin{align}
  f(x+h) + f(x-h) = 2f(x) + h^2 f''(x) + \mathcal{O}(h^4),
\end{align}
so the 3-point central formula is
\begin{align}
  f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} \quad\text{with error } \mathcal{O}(h^2).
\end{align}

Below we implement this stencil, compare to the exact derivative for
$f(x)=\sin x$ (so $f''(x)=-\sin x$), and run a convergence study.
We also include a 5-point, $\mathcal{O}(h^4)$ stencil for higher
accuracy.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Test function and exact second derivative
f    = lambda x: np.sin(x)
fpp  = lambda x: -np.sin(x)

# 3-point central, O(h^2)
def fpp_central3(f, x, h):
    return (f(x+h) - 2*f(x) + f(x-h)) / (h**2)

# 5-point central, O(h^4)
def fpp_central5(f, x, h):
    return (-f(x+2*h) + 16*f(x+h) - 30*f(x) + 16*f(x-h) - f(x-2*h)) / (12*h**2)

We compare the numerical second derivative with the exact one on a
grid with a moderate step size $h$.
The 5-point stencil typically matches the exact curve noticeably
better for smooth functions.

In [None]:
X = np.linspace(0, 2*np.pi, 201)
h = 1  # "reasonable" step for visual comparison

plt.plot(X, fpp(X),                      label=r"Exact $f''(x)=-\sin x$")
plt.plot(X, fpp_central3(f, X, h), '--', label=f'3-point central (h={h:g})')
plt.plot(X, fpp_central5(f, X, h), ':',  label=f'5-point central (h={h:g})')

plt.xlabel('x')
plt.ylabel('Second derivative')
plt.legend()

In [None]:
# HANDSON: Adjust `h` in the above cell and observe how the finite
#          difference methods behave


As with first derivatives, we perform a convergence study.
We fix a point $x_0$ and sweep $h$ over many orders of magnitude.

In [None]:
def error2s(f, fpp_exact, x0):
    # Step sizes spanning many orders of magnitude
    H   = np.logspace(0, -16, 17)  # start at 1 down to 1e-16
    fpp = fpp_exact(x0)

    E3 = np.abs(fpp_central3(f, x0, H) - fpp)
    E5 = np.abs(fpp_central5(f, x0, H) - fpp)

    return H, E3, E5

In [None]:
# Compare at a few points to see behavior across the domain
X0  = [0.3, np.pi/4, 1.7]
eps = np.finfo(float).eps

fig, axes = plt.subplots(1, len(X0), figsize=(12,4), sharey=True)
for ax, x0 in zip(axes, X0):
    H, E3, E5 = error2s(f, fpp, x0)

    # Reference slopes: h^2 and h^4 (scaled to match error at largest h for readability)
    s2 = E3[0]/(H[0]**2)  # scale so line ~ same level at left
    s4 = E5[0]/(H[0]**4)

    ax.loglog(H, s2*H**2, 'k--', lw=1, label=r'$\propto h^2$')
    ax.loglog(H, s4*H**4, 'k:',  lw=1, label=r'$\propto h^4$')

    ax.loglog(H, E3, 'o-',  ms=3, label='3-point central ($O(h^2)$)')
    ax.loglog(H, E5, 's--', ms=3, label='5-point central ($O(h^4)$)')

    ax.set_title(f'$x_0 = {x0:.3f}$')
    ax.set_ylim(1e-15, 1e+5)
    ax.set_xlim(max(H)*2, min(H)/2)
    ax.set_xlabel(r'Step size $h$')

    # A helpful visual: for second derivatives, round-off scales like eps/h^2,
    # so the error minimum sits near eps^(1/4) (3-point) and eps^(1/6) (5-point)
    ax.axvline(eps**(1/4), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/4)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/4}$',
            va='bottom', ha='right', color='k')

    ax.axvline(eps**(1/6), color='k', alpha=0.5, lw=1)
    ax.text(eps**(1/6)*2, ax.get_ylim()[0]*2, r'$\epsilon^{1/6}$',
            va='bottom', ha='right', color='k')

axes[0].set_ylabel('Error')
axes[2].legend(loc='lower right', ncol=1)
plt.tight_layout()

Just like for the first derivatives, why are the convergence plots not
perfect?
How do the truncation and round-off errors behave here?

In [None]:
# HANDSON: Change `x0` and observe how the convergence plots change.
#          Specifically, what if `x0 = 0` or `x0 = np.pi/2`?


In [None]:
# HANDSON: Find your optimal `h` at different points.
#          In the convergence plot, read off the $h$ where each curve
#          reaches its minimum.
#          How does it change with the local behavior of $f(x)$?


In [None]:
# HANDSON: Find your optimal `h` for different functions.
#          Specifically, replace $f(x)$ by $e^{-x^2}$ or $e^{3x}$.
#          How do the error curves change?
#          Which methods are more robust?


In [None]:
# HANDSON: Probe subtraction error.
#          For a fixed tiny `h=1e-12`, evaluate the numerators
#          as `x0` varies.
#          Where does the error spike?
#          Why?
