# Eigenvalue Problems

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from scipy import integrate, linalg

## Introduction, concept and useful properties

A given direction in a vector space is determined by any nonzero vector pointing in that direction. Given an $n\times n$ matrix $\mathbf{A}$ representing a linear transformation on an $n$-dimensional vector space, we wish to find a nonzero vector $\mathbf{x}$ and a scalar $\lambda$ such that

$$
\mathbf{Ax}=\lambda\mathbf{x}
$$

Such a scalar $\lambda$ is called an **eigenvalue**, and $\mathbf{x}$ is a corresponding **eigenvector**. 

An eigenvector of a matrix determines a direction in which the effect of the matrix is particularly simple: The matrix expands or shrinks any vector lying in that direction by a scalar multiple, and the expansion or contraction factor is given by the corresponding eigenvalue $\lambda$. Thus, eigenvalues and eigenvectors provide a means of understanding the complicated behavior of a general linear transformation by decomposing it into simpler actions.

### Characteristic polynomial

For a square matrix $\mathbf{A}$, the equation $\mathbf{Ax}=\lambda\mathbf{x}$ is equivalent to

$$
(\mathbf{A}-\lambda\mathbf{I})\mathbf{x}=\mathbf{0}
$$

The eigenvalues of $\mathbf{A}$ are the values of $\lambda$ such that

$$
\mathrm{det}(\mathbf{A}-\lambda\mathbf{I})=0
$$

The polynomial $p(\lambda)=\mathrm{det}(\mathbf{A}-\lambda\mathbf{I})$ is called the **characteristic polynomial** of $\mathbf{A}$ and its roots are the eigenvalues of $\mathbf{A}$.

> **Example**
>
> Consider the matrix
> 
> $$\begin{bmatrix}
3 & 1  \\
1 & 3 
\end{bmatrix}$$
>
> The characteristic polynomial is
> 
> $$
\mathrm{det}\left(\begin{bmatrix}
3 & 1  \\
1 & 3 
\end{bmatrix}-\lambda \begin{bmatrix}
1 & 0  \\
0 & 1 
\end{bmatrix}\right)
=\mathrm{det}\left(\begin{bmatrix} 3-\lambda& 1 \\
1 &3-\lambda\end{bmatrix}\right)$$
>
> $$ = (3-\lambda)(3-\lambda)-(1)(1)=\lambda^2-6\lambda+8=0$$
>
> The roots of this polynomial (and hence the eigenvalues of $\mathbf{A}$) are 4 and 2.


In [None]:
# Compute eigenvalues with this technique
A = np.array([[3, 1], [1, 3]])


def char_pol(A):
    """Compute the coefficients of the characteristic polynomial of a matrix.

    Parameters
    ----------
    A: a square matrix

    Returns
    -------
    An array with the polynomial coefficients, ordered from the highest
    power down to the constant term.
    """

    # First we will check the dimensions of the array
    if np.shape(A)[0] != np.shape(A)[1]:
        raise TypeError("Matrix A must be a square matrix.")

    # Now we will use the .poly() function imported from numpy
    # to find the characteristic polynomial.
    return np.poly(A)


# Keep the matrix A intact and store the coefficients under their own name
coeffs = char_pol(A)


def format_poly(coeffs):
    """Format polynomial coefficients (highest power first) as a string in X."""
    string = ""
    n = len(coeffs) - 1
    for el in coeffs:
        string += f"{el}X^{n} + "
        n -= 1
    return string.rstrip("+ ")


poly = format_poly(coeffs)

# np.roots computes the roots of the characteristic polynomial of A
# (i.e. the eigenvalues of A).
eigenvalues = np.roots(coeffs)

print(f"The characteristic polynomial of matrix A is given by {poly}, ")
print(f"and the corresponding eigenvalues are {eigenvalues}.")

Although this is, in theory, a nice way to find the eigenvalues of a matrix $\mathbf{A}$, calculating the roots of its characteristic polynomial is not a good numerical method for a matrix of nontrivial size, for several reasons:
- Computing the coefficients of the characteristic polynomial for a large matrix is in itself already a substantial task.
- The coefficients of the characteristic polynomial can be highly sensitive to small perturbations in $\mathbf{A}$, which can render their computation unstable.
- Rounding errors in finding the characteristic polynomial can destroy the accuracy of the roots.
- Computing the roots of a polynomial of high degree is a nontrivial and substantial task.
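
The sensitivity listed above can be illustrated with a small experiment. The sketch below is not part of the original example: it assumes a diagonal test matrix with eigenvalues 1, 2, ..., 20, chosen only because the exact answer is known, and compares the roots of the characteristic polynomial with a dedicated eigenvalue solver.

```python
import numpy as np
from scipy import linalg

# Assumed test case: a diagonal matrix whose eigenvalues 1, 2, ..., 20 are known exactly.
exact = np.arange(1.0, 21.0)
D = np.diag(exact)

# Route 1: coefficients of the characteristic polynomial, then its roots.
coeffs = np.poly(D)
via_roots = np.sort(np.roots(coeffs).real)

# Route 2: a dedicated eigenvalue solver.
via_eig = np.sort(linalg.eigvals(D).real)

err_roots = np.max(np.abs(via_roots - exact))
err_eig = np.max(np.abs(via_eig - exact))
print(f"max error via polynomial roots: {err_roots:.2e}")
print(f"max error via linalg.eigvals:   {err_eig:.2e}")
```

The coefficients of this polynomial span many orders of magnitude and cannot all be stored exactly in double precision, so the polynomial route should show a much larger error than the direct solver.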

### Properties and transformations

Many numerical methods for computing eigenvalues and eigenvectors are based on reducing the original matrix to a simpler form, whose eigenvalues and eigenvectors are then easily determined. Thus, we need to identify what types of transformations preserve eigenvalues, and for what types of matrices the eigenvalues are easily determined.

- If $\mathbf{A}$ is symmetric/Hermitian, all its eigenvalues are real.
- **Shift**: if $\mathbf{Ax}=\lambda\mathbf{x}$ and $\sigma$ is any scalar, then $(\mathbf{A}-\sigma \mathbf{I})\mathbf{x}=(\lambda-\sigma)\mathbf{x}$; the eigenvalues are shifted by $\sigma$, but the eigenvectors remain unchanged.
- **Inversion**: if $\mathbf{A}$ is nonsingular, $\mathbf{A}^{-1}$ has the same eigenvectors as $\mathbf{A}$, and eigenvalues $1/\lambda$.
- **Powers**: $\mathbf{A}^k$ has the same eigenvectors as $\mathbf{A}$, and eigenvalues $\lambda^k$
- **Polynomials**: for a general polynomial $p(t)$, $p(\mathbf{A})\mathbf{x}=p(\lambda)\mathbf{x}$. Thus the eigenvalues of a polynomial in a matrix $\mathbf{A}$ are given by the same polynomial, evaluated at the eigenvalues of $\mathbf{A}$ and the corresponding eigenvectors remain the same as those of $\mathbf{A}$.
- **Similarity**: A matrix $\mathbf{B}$ is *similar* to a matrix $\mathbf{A}$ if there exists an invertible matrix $\mathbf{T}$ such that

$$
\mathbf{B}=\mathbf{T^{-1}}\mathbf{A}\mathbf{T}
$$

It follows that:

$$
\mathbf{By}=\lambda\mathbf{y}\Rightarrow\mathbf{T^{-1}}\mathbf{ATy}=\lambda\mathbf{y}\Rightarrow \mathbf{ATy}=\lambda\mathbf{Ty}
$$

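In other words, $\mathbf{B}=\mathbf{T^{-1}}\mathbf{A}\mathbf{T}$ has the same eigenvalues as $\mathbf{A}$, but its eigenvectors are systematically transformed: if $\mathbf{y}$ is an eigenvector of $\mathbf{B}$, then $\mathbf{Ty}$ is an eigenvector of $\mathbf{A}$.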
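
These properties are easy to check numerically. The following is a minimal sketch that reuses the $2\times 2$ matrix from the earlier example; the shift `sigma`, the matrix `T` and the helper `sorted_eigs` are arbitrary choices made for illustration only.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 3.0]])


def sorted_eigs(M):
    """Sorted eigenvalues of M (all test matrices here have real spectra)."""
    return np.sort(linalg.eigvals(M).real)


lam = sorted_eigs(A)  # eigenvalues of A
sigma = 1.5  # arbitrary shift
T = np.array([[2.0, 1.0], [1.0, 1.0]])  # arbitrary invertible matrix (det = 1)

shifted = sorted_eigs(A - sigma * np.identity(2))
inverted = sorted_eigs(linalg.inv(A))
squared = sorted_eigs(A @ A)
similar = sorted_eigs(linalg.inv(T) @ A @ T)

print("shift:     ", np.allclose(shifted, lam - sigma))
print("inversion: ", np.allclose(inverted, np.sort(1 / lam)))
print("powers:    ", np.allclose(squared, lam**2))
print("similarity:", np.allclose(similar, lam))
```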


## Calculating eigenvalues and eigenvectors

### Power iteration

This is a simple but limited method that estimates the dominant eigenvalue and its corresponding eigenvector.

It works by multiplying an arbitrary nonzero vector repeatedly by the matrix.

Assuming that $\mathbf{A}$ has a unique eigenvalue $\lambda_1$ of maximum modulus, with corresponding eigenvector $\mathbf{v}_1$, power iteration converges to a multiple of $\mathbf{v}_1$.

> **Proof:**
>
> Assume that we can express the starting vector $\mathbf{x}_0$ as a linear combination $\mathbf{x}_0=\sum_{j=1}^n\alpha_j\mathbf{v}_j$, with $\mathbf{v}_j$ the eigenvectors of $\mathbf{A}$.
>
>$$\begin{aligned}
\mathbf{x}_k &= \mathbf{A}\mathbf{x}_{k-1}=\mathbf{A}^2\mathbf{x}_{k-2}=\ldots=\mathbf{A}^k\mathbf{x}_{0} \\
&=\mathbf{A}^k\sum_{j=1}^n\alpha_j\mathbf{v}_j=\sum_{j=1}^n\alpha_j\mathbf{A}^k\mathbf{v}_j=\sum_{j=1}^n\lambda_j^k\alpha_j\mathbf{v}_j \\
&=\lambda_1^k\left(\alpha_1\mathbf{v}_1+\sum_{j=2}^n(\lambda_j/\lambda_1)^k\alpha_j\mathbf{v}_j\right)
\end{aligned}$$
>
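> Here $|\lambda_j/\lambda_1| < 1$ for $j\geq 2$, since $\lambda_1$ is the unique eigenvalue of maximum modulus. As a result, the factors $(\lambda_j/\lambda_1)^k$ converge to 0 as $k$ becomes large, and only the term in $\mathbf{v}_1$ survives (provided $\alpha_1\neq 0$).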

Power iteration usually works well in practice, but might fail for the following reasons:
- The starting vector $\mathbf{x}_0$ may have *no* component in the dominant eigenvector $\mathbf{v}_1$. In practice this is very unlikely and is mitigated after a few iterations due to rounding errors that introduce such a component.
- There may be more than 1 eigenvalue with the same maximum modulus, in which case the algorithm might converge to a linear combination of the corresponding eigenvectors.
- For a real matrix and real starting vector, the iteration can never converge to a complex vector.

Geometric growth of the components at each iteration risks overflow or underflow, so in practice the approximate eigenvector is rescaled to have norm 1 at every iteration: $\mathbf{y}_k=\mathbf{A}\mathbf{x}_{k-1}$ and $\mathbf{x}_k=\mathbf{y}_k/\|\mathbf{y}_k\|_\infty$, using the infinity norm defined as $\|\mathbf{a}\|_\infty = \max(|a_1|, |a_2|, \ldots, |a_n|)$. Then $\mathbf{x}_k\rightarrow\mathbf{v}_1/\|\mathbf{v}_1\|_\infty$ (up to sign) and $\|\mathbf{y}_k\|_\infty\rightarrow|\lambda_1|$.

The convergence rate of power iteration is linear: the error shrinks by a factor of approximately $|\lambda_2/\lambda_1|$ per iteration, where $\lambda_2$ is the eigenvalue with the second largest modulus.
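
The linear convergence rate can be observed directly. The sketch below assumes the matrix from the earlier example (eigenvalues 4 and 2, so $|\lambda_2/\lambda_1|=0.5$) and tracks how the error in the normalized iterate shrinks from one step to the next; the variable names are chosen for illustration.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 3.0]])  # eigenvalues 4 and 2
v1 = np.array([1.0, 1.0])  # dominant eigenvector, scaled to infinity norm 1

x = np.array([0.0, 1.0])
errors = []
for _ in range(15):
    y = A @ x
    x = y / linalg.norm(y, np.inf)
    errors.append(linalg.norm(x - v1, np.inf))

# Ratio of successive errors; for linear convergence this approaches |lambda_2/lambda_1|.
ratios = [e_next / e for e, e_next in zip(errors, errors[1:])]
print(ratios[-1])
```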

A straightforward implementation is shown below:

In [None]:
def power(A):
    """Normalized power iteration."""
    # x = np.random.random(len(A))
    x = np.array([0, 1])
    vectors = [x]
    for _ in range(15):
        y = A @ x
        x = y / linalg.norm(y, np.inf)
        print(f"[{x[0]:5.3f}, {x[1]:5.3f}]: {linalg.norm(y, np.inf):.3f}")
        vectors.append(x)
    return x, vectors


# Example
A = np.array([[1, 3], [3, 1]])
iteration = power(A)[1]
# the actual eigenvectors v_0 and v_1, solved by hand
eigenvectors = np.array([1, 1]), np.array([1, -1])

As expected, this converges to the eigenvalue 4, and the corresponding eigenvector `[1, 1]`.

### Inverse iteration

For some applications, we're interested in the smallest eigenvalue (in modulus) of a matrix. We can then make use of the fact that the eigenvalues of $\mathbf{A}^{-1}$ are $1/\lambda$. This suggests using power iteration on the inverse of $\mathbf{A}$, but as usual the inverse of $\mathbf{A}$ does not need to be calculated explicitly.

Instead, the equivalent system of linear equations is solved at each iteration using the triangular factors resulting from e.g. LU-factorization of $\mathbf{A}$, which need to be calculated only once. Using $\mathbf{L}$ and $\mathbf{U}$, we can then efficiently solve $\mathbf{Ay}=\mathbf{x}$ using forward and backward substitution.
(These functions are also used for the Rayleigh quotient iteration below.)

In [None]:
# define helper functions for the forward and backward substitution
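def forward_substitution(L, b):
    """Solve L x = b for a lower-triangular matrix L."""
    n = len(L)
    b = np.array(b, dtype=float)  # work on a copy so the caller's b is untouched
    x = np.zeros(n)
    for j in range(n):
        if L[j][j] == 0:  # stop if matrix is singular
            break
        x[j] = b[j] / L[j][j]
        # eliminate x[j] from the remaining equations
        for i in range(j + 1, n):
            b[i] = b[i] - L[i][j] * x[j]
    return x


def backward_substitution(U, b):
    """Solve U x = b for an upper-triangular matrix U."""
    n = len(U)
    b = np.array(b, dtype=float)  # work on a copy so the caller's b is untouched
    x = np.zeros(n)
    # Notice that the last value of range is exclusive
    # (which is very counter-intuitive for countdowns).
    for j in range(n - 1, -1, -1):
        if U[j][j] == 0:  # stop if matrix is singular
            break
        x[j] = b[j] / U[j][j]
        # eliminate x[j] from the equations above row j
        for i in range(0, j):
            b[i] = b[i] - U[i][j] * x[j]
    return x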

Inverse iteration converges to the eigenvector corresponding to the largest eigenvalue of $\mathbf{A}^{-1}$, which is the smallest eigenvalue of $\mathbf{A}$.

In [None]:
def inverse_iter(A):
    num_iters = 50
    tol = 1e-10

    # linalg.lu returns P, L, U with A = P @ L @ U; the permutation P
    # must be taken into account when solving with L and U.
    P, L, U = linalg.lu(A, permute_l=False)

    # Initialize a random starting vector x and normalize it
    x = np.random.random(len(A))
    x /= linalg.norm(x, np.inf)  # Normalize x to avoid scaling issues

    # Lists to store the sequence of approximate eigenvectors and eigenvalues
    eigvecs = [x.copy()]
    eigvals = [0]

    for _ in range(num_iters):
        # Solve the system A * y = x using the LU decomposition A = P*L*U,
        # i.e. L * U * y = P^T * x, by forward and backward substitution.
        z = forward_substitution(L, P.T @ x)  # Solves L * z = P^T * x for z
        y = backward_substitution(U, z)  # Solves U * y = z for y

        # Normalize the resulting vector to avoid numerical overflow or underflow
        x = y / linalg.norm(y, np.inf)

        # Append the current eigenvector and the estimate of the dominant
        # eigenvalue (in modulus) of the inverse of A
        eigvecs.append(x.copy())
        eigvals.append(linalg.norm(y, np.inf))

        # Check for convergence
        if np.abs(eigvals[-1] - eigvals[-2]) < tol:
            print("converged after", len(eigvals) - 1, "iterations")
            break

    return np.array(eigvecs), eigvals

In [None]:
A = np.array([[3, 1], [1, 3]])
inverse_iter(A)

As expected, this converges to a multiple of $[-1, 1]$, the eigenvector corresponding to  
the dominant eigenvalue of $\mathbf{A}^{-1}$, which is 0.5.
This corroborates what we already knew: the smallest eigenvalue of $\mathbf{A}$ is 2.

By shifting the matrix $\mathbf{A}$ to $\mathbf{A}-\sigma\mathbf{I}$, all eigenvalues are shifted by $\sigma$ as well.
For inverse iteration, this gives some flexibility in which eigenvalue will be found. If we apply inverse iteration to the matrix $\mathbf{A}-\sigma\mathbf{I}$, we find the largest eigenvalue (in modulus) of its inverse. The reciprocal of this value is the smallest eigenvalue (in modulus) of $\mathbf{A}-\sigma\mathbf{I}$. If we now add $\sigma$ to it, we obtain the eigenvalue of $\mathbf{A}$ closest to $\sigma$. Moreover, when the shift is already a close approximation of an eigenvalue, convergence is very rapid.

In [None]:
A = np.array([[3, 1], [1, 3]]) - np.array([[3.8, 0], [0, 3.8]])
inverse_iter(A)

When using shifted inverse iteration, the value obtained is the **inverse of the shifted eigenvalue**. To find the corresponding eigenvalue of $\mathbf{A}$, first take the reciprocal of the result, then add the shift $\sigma$ back. This final value is the eigenvalue of $\mathbf{A}$ closest to $\sigma$.

In the case of the example above, the eigenvalue is 1/5 + 3.8=4, as expected.
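
The two recovery steps (take the reciprocal, then add the shift) can be folded into the iteration itself. The sketch below is one possible way to do this and is not the implementation used above: the function `shifted_inverse_iteration` and its parameters are hypothetical names, SciPy's `linalg.lu_factor` and `linalg.lu_solve` replace the hand-written substitution helpers (they handle row pivoting internally), and a Rayleigh quotient is used so that the sign of the eigenvalue is kept.

```python
import numpy as np
from scipy import linalg


def shifted_inverse_iteration(A, sigma, num_iters=50, tol=1e-12, seed=0):
    """Estimate the eigenvalue of A closest to sigma (hypothetical helper)."""
    n = A.shape[0]
    # Factorize the shifted matrix once; lu_factor keeps track of the pivoting.
    lu_piv = linalg.lu_factor(A - sigma * np.identity(n))

    rng = np.random.default_rng(seed)
    x = rng.random(n)
    x /= linalg.norm(x)

    lam = sigma
    for _ in range(num_iters):
        y = linalg.lu_solve(lu_piv, x)  # y = (A - sigma I)^{-1} x
        mu = x @ y  # Rayleigh quotient of the inverse (x has unit 2-norm)
        x = y / linalg.norm(y)
        lam_new = sigma + 1.0 / mu  # undo the inversion, then the shift
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return lam, x


A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, vec = shifted_inverse_iteration(A, sigma=3.8)
print(lam, vec)
```

This sketch assumes that the shift is not exactly an eigenvalue; in that case the shifted matrix is singular and the solve step breaks down.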

### Rayleigh quotient iteration

Given an approximate eigenvector $\mathbf{x}$ for a real matrix $\mathbf{A}$, finding the best estimate for the corresponding eigenvalue $\lambda$ can be considered as a linear least squares approximation problem:

$$
\mathbf{x}\lambda\cong\mathbf{Ax}
$$

Its solution, the **Rayleigh quotient**, is given by

$$
\lambda=\frac{\mathbf{x}^\intercal\mathbf{Ax}}{\mathbf{x}^\intercal\mathbf{x}}
$$

This is a better approximation for the eigenvalue than the one obtained at each stage in the power iteration algorithm.
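
To make this comparison concrete, the sketch below perturbs the exact eigenvector $[1, 1]^\intercal$ of the symmetric example matrix by a small, arbitrarily chosen amount `eps` and compares the two eigenvalue estimates. The symmetric matrix and the size of the perturbation are assumptions made for illustration.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 3.0]])
eps = 1e-2
# Approximate eigenvector: the exact one, [1, 1], plus a small error along [1, -1].
x = np.array([1.0, 1.0]) + eps * np.array([1.0, -1.0])

# Estimate 1: Rayleigh quotient
rayleigh = (x @ A @ x) / (x @ x)
# Estimate 2: ratio of infinity norms, as used in normalized power iteration
norm_ratio = linalg.norm(A @ x, np.inf) / linalg.norm(x, np.inf)

print(f"Rayleigh quotient error: {abs(rayleigh - 4):.2e}")
print(f"norm-ratio error:        {abs(norm_ratio - 4):.2e}")
```

For this symmetric matrix the error of the Rayleigh quotient scales with `eps**2`, whereas the error of the norm ratio scales with `eps`.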

Given an approximate eigenvector, the Rayleigh quotient provides a good estimate for the corresponding eigenvalue. Conversely, inverse iteration converges very rapidly to an eigenvector if an approximate eigenvalue is used as shift. When combining these ideas, we arrive at **Rayleigh quotient iteration**.

An example implementation is shown below.

In [None]:
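def rayleigh_iter(A, max_iters=50):
    # x = np.random.random(len(A))
    x = np.array([2, 0.05])
    eigvecs = [x.copy()]
    eigvals = [0]
    # Cap the number of iterations in case the shifted matrix never
    # becomes exactly singular in floating-point arithmetic.
    for _ in range(max_iters):
        # The Rayleigh quotient of the current vector is used as shift
        shift = (x @ A @ x) / (x @ x)
        B = A - shift * np.identity(len(A))
        # linalg.lu returns P, L, U with B = P @ L @ U
        P, L, U = linalg.lu(B, permute_l=False)
        # Solve B y = x, i.e. L U y = P^T x
        y = forward_substitution(L, P.T @ x)
        y = backward_substitution(U, y)
        if linalg.norm(y, np.inf) != 0:
            x = y / linalg.norm(y, np.inf)
            eigvecs.append(x.copy())
            eigvals.append(shift)
        else:
            # the iteration will halt as the shifted matrix
            # becomes singular (eigenvalue = 0)
            break
    return np.array(eigvecs), eigvals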


A = np.array([[3.0, 1.0], [1.0, 3.0]])
rayleigh_iter(A)

Note that we quickly converge to the eigenvector [1, 1] with eigenvalue 4 (quicker than with power iteration).

### Deflation

The process of **deflation** removes a known eigenvalue from a matrix, so that further eigenvalues and eigenvectors can be determined. This process is similar to removing a known root $\lambda_1$ from a polynomial $p(\lambda)$ by dividing it out to obtain $p(\lambda)/(\lambda-\lambda_1)$.

This can be achieved by letting $\mathbf{u}_1$ be any vector such that $\mathbf{u}_1^\intercal \mathbf{x}_1=\lambda_1$.
Then the matrix $\mathbf{A}-\mathbf{x}_1\mathbf{u}_1^\intercal$ has eigenvalues $0,\lambda_2,\ldots,\lambda_n$.

> An **example** of a deflation procedure is shown below to find both eigenvalues of the matrix
>
> $$\begin{bmatrix}
3 & 1  \\
1 & 3 
\end{bmatrix}$$

Note that, when using this procedure, the eigenvector found depends on the choice of $\mathbf{u}_1$, and **the remaining eigenvectors of the deflated matrix are generally different from those of the original matrix**. This is why deflation is often limited to theoretical applications, and practical computations of multiple eigenvalues are usually performed with other methods, such as shifted inverse iteration.

We're not going to look deeper into this procedure because
- it becomes increasingly cumbersome and numerically less accurate to find eigenvalues using deflation (so that inverse iteration using the estimated eigenvalues as a shift is necessary);
- there are better ways to find many eigenvalues of a matrix.


In [None]:
# We start by finding the largest eigenvalue of A using power iteration
# which gives us "4"
A = np.array([[3, 1], [1, 3]])
print("Original Matrix A:")
print(A)
print("\nApplying Power Iteration to A:")
power(A);

We now perform deflation to remove the largest eigenvalue $\lambda_1 = 4$ using two different choices of $\mathbf{u}_1$.

**First Choice of $\mathbf{u}_1$**

We choose:

$$
\mathbf{u}_1 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}, \quad \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

In [None]:
u = np.array([[0], [4]])
x = np.array([[1], [1]])
print("\nFirst Choice for u:")
print("u =", u.flatten())
print("\nVerifying u^T x = lambda_1:")
print("u^T x =", np.dot(u.T, x))

# Perform deflation
print("\nDeflating A with x * u^T:")
A_deflated = A - np.dot(x, u.T)
print(A_deflated)
print("\nApplying Power Iteration to Deflated A:")
power(A_deflated);

**Second Choice of $\mathbf{u}_1$**

Now we choose a different vector $\mathbf{u}_1$:

$$\mathbf{u}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$$

Repeating the steps with this choice of $\mathbf{u}_1$:

In [None]:
A = np.array([[3, 1], [1, 3]])
print("\nSecond Choice for u:")
u = np.array([[2], [2]])
x = np.array([[1], [1]])
print("u =", u.flatten())
print("\nVerifying u^T x = lambda_1:")
print("u^T x =", np.dot(u.T, x))

# Perform deflation
print("\nDeflating A with x * u^T:")
A_deflated = A - np.dot(x, u.T)
print(A_deflated)
print("\nApplying Power Iteration to Deflated A:")
power(A_deflated);
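
As a cross-check of the two deflation examples above, the eigenvalues and eigenvectors of both deflated matrices can also be computed directly. This is only a verification sketch using `linalg.eig`; it is not how deflation would be used in practice.

```python
import numpy as np
from scipy import linalg

A = np.array([[3.0, 1.0], [1.0, 3.0]])
x1 = np.array([1.0, 1.0])  # eigenvector belonging to lambda_1 = 4

deflated_eigenvalues = []
for u1 in (np.array([0.0, 4.0]), np.array([2.0, 2.0])):
    # The deflation condition u1^T x1 = lambda_1 must hold
    assert np.isclose(u1 @ x1, 4.0)
    deflated = A - np.outer(x1, u1)
    eigenvalues, eigenvectors = linalg.eig(deflated)
    deflated_eigenvalues.append(np.sort(eigenvalues.real))
    print("u1 =", u1)
    print("  eigenvalues: ", np.sort(eigenvalues.real))
    print("  eigenvectors (columns):\n", eigenvectors.real)
```

Both deflated matrices should have the eigenvalues 0 and 2, while their eigenvectors differ between the two choices of $\mathbf{u}_1$.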

### QR iteration

In practice, the fastest and most used method to find the eigenvalues of a matrix is **QR iteration**. 
Starting from a matrix $\mathbf{A}$, we define the following sequence:

$$\begin{aligned}
\mathbf{A}_m & = \mathbf{Q}_m\mathbf{R}_m \\
\mathbf{A}_{m+1} &= \mathbf{R}_m\mathbf{Q}_m
\end{aligned}$$

Here, $\mathbf{Q}_m$ is an orthogonal matrix and $\mathbf{R}_m$ is an upper-triangular matrix. This sequence converges to a triangular matrix with the eigenvalues of $\mathbf{A}$ on its diagonal, or to a near-triangular form from which the eigenvalues are easily calculated.

As an example, we use QR iteration on the matrix

$$\begin{bmatrix}
2.9766&0.3945&0.4198&1.1159\\
0.3945&2.7328&-0.3097&0.1129\\
0.4198&-0.3097&2.5675&0.6079\\
1.1159&0.1129&0.6079&1.7231
\end{bmatrix}$$

which has eigenvalues 1, 2, 3 and 4.


In [None]:
def qr_iter(A):
    """QR iteration"""
    for _ in range(10):
        q, r = linalg.qr(A)
        A = np.dot(r, q)
        print(A)
        print()
    return A


A = np.array(
    [
        [2.9766, 0.3945, 0.4198, 1.1159],
        [0.3945, 2.7328, -0.3097, 0.1129],
        [0.4198, -0.3097, 2.5675, 0.6079],
        [1.1159, 0.1129, 0.6079, 1.7231],
    ]
)
with np.printoptions(precision=2, suppress=True):
    qr_iter(A)

To speed up this procedure we can use shifts, similar to their use in inverse iteration. The most straightforward choice of shift is the lower-right element of the current matrix, but depending on the specifics of the problem better shifts might exist. In the example below, note how the off-diagonal entries converge to zero faster than in the case without shifts.

In [None]:
def qr_iter_shift(A):
    """QR iteration with shift."""
    for _ in range(9):
        shift = A[len(A) - 1][len(A) - 1]
        q, r = linalg.qr(A - shift * np.identity(len(A)))
        A = np.dot(r, q) + shift * np.identity(len(A))
        print(A)
        print()
    return A


A = np.array(
    [
        [2.9766, 0.3945, 0.4198, 1.1159],
        [0.3945, 2.7328, -0.3097, 0.1129],
        [0.4198, -0.3097, 2.5675, 0.6079],
        [1.1159, 0.1129, 0.6079, 1.7231],
    ]
)
with np.printoptions(precision=2, suppress=True):
    qr_iter_shift(A)

## Calculating the Singular Value Decomposition


Recall from the notebook about linear least squares that the **singular value decomposition (SVD)** of an $m \times n$ matrix $\mathbf{A}$ has the form

$$
\mathbf{A}=\mathbf{U\Sigma V^\intercal}
$$

where $\mathbf{U}$ is an $m \times m$ orthogonal matrix,  $\mathbf{V}$ is an $n \times n$ orthogonal matrix, and $\mathbf{\Sigma}$ is an $m \times n$ diagonal matrix, with 

$$
\sigma_{ij}=\begin{cases}
    0, & \text{for $i\neq j$}\\
    \sigma_i\geq 0, & \text{for $i=j$}
  \end{cases}
$$

The diagonal entries $\sigma_i$ are called the **singular values** of $\mathbf{A}$ and are usually ordered so that $\sigma_{i-1}\geq \sigma_{i}, i=2,\ldots,\mathrm{min}\{m,n\}$, i.e. from largest value (upper left) to smallest value (bottom right). The columns $\mathbf{u}_i$ of $\mathbf{U}$ and $\mathbf{v}_i$ of $\mathbf{V}$ are the corresponding left and right **singular vectors**.

We discussed some handy applications of the SVD in that notebook, but postponed the calculation of the decomposition matrices. Here, we revisit the concept because singular values and vectors are intimately related to eigenvalues and eigenvectors. The singular values of $\mathbf{A}$ are the non-negative square roots of the eigenvalues of $\mathbf{A^\intercal A}$, and the columns of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal eigenvectors of $\mathbf{A A^\intercal}$ and $\mathbf{A^\intercal A}$, respectively.

> **Example**
>
> The singular value decomposition of the matrix 
>
> $$\mathbf{A} = \begin{bmatrix}
3 & 1  \\
1 & 3
\end{bmatrix}$$
> is given by
>
> $$\mathbf{U\Sigma V^\intercal} = \begin{bmatrix}
\frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix} \begin{bmatrix}
4 & 0 \\
0 & 2
\end{bmatrix} \begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}.
$$
> 
> This statement can be verified by explicitly calculating $\mathbf{U}$, $\mathbf{\Sigma}$ and $\mathbf{V}$. We begin with
>
> $$
\mathbf{A^\intercal A} = \mathbf{A A^\intercal} = \begin{bmatrix}
10 & 6 \\
6 & 10
\end{bmatrix}
$$
>
> which are equal here because $\mathbf{A}$ is a symmetric matrix. We can employ one of the methods discussed above to calculate its eigenvalues and eigenvectors. These are $\lambda_1 = \sigma_1^2 = 16$ with eigenvector $\mathbf{v}_1 = [1, 1]^\intercal$ and $\lambda_2 = \sigma_2^2 = 4$ with $\mathbf{v}_2 =[-1, 1]^\intercal$.
> The eigenvectors are easily converted to their orthonormal form, which results in
>
> $$
\mathbf{U} = \mathbf{V} = \begin{bmatrix}
\frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}.
$$
>
> Now we construct $\mathbf{\Sigma}$ as diag$\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}\right)$ and transpose $\mathbf{V}$ in order to find the proposed SVD.

In [None]:
# code for the example

A = np.array([(3, 1), (1, 3)])

transA = A.transpose()
product = A @ transA
eigenvalues, U = np.linalg.eig(product)
# The singular values are the square roots of the eigenvalues of A A^T
singular_values = np.sqrt(eigenvalues)
Sigma = np.diag(singular_values)

# A is symmetric here, so V = U and A = U Sigma U^T
SVD = U @ Sigma @ U.transpose()

print("A^T A = A A^T =")
print(product)
print("\nsingular values =")
print(singular_values)

print("\nU = V =")
print(U)

print("\nSVD =")
print(SVD)

In [None]:
# Or with SciPy's SVD
U, s, vh = linalg.svd(A)
SVD = U @ np.diag(s) @ vh
print(U, "\n\n", s, "\n\n", vh, "\n\n", SVD)
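
The relation between singular values and eigenvalues also holds for matrices that are neither symmetric nor square. The sketch below checks this for an arbitrarily chosen $3\times 2$ matrix `B`, which is an assumption for illustration and not part of the example above.

```python
import numpy as np
from scipy import linalg

# Assumed example: an arbitrary 3x2 matrix.
B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# B^T B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
eigvals_BtB, V = linalg.eigh(B.T @ B)
sigma_from_eig = np.sqrt(eigvals_BtB[::-1])  # reorder from largest to smallest

sigma_from_svd = linalg.svdvals(B)
print(sigma_from_eig)
print(sigma_from_svd)
```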

## Software

For a symmetric matrix that has first been reduced to tridiagonal form, QR iteration was until fairly recently the standard method for computing all of the eigenvalues (and optionally eigenvectors) of the resulting tridiagonal matrix. More recently, however, first divide-and-conquer and then relatively robust representation (RRR) methods (not shown in this course) have surpassed QR iteration in speed for computing all the eigenvectors. Implementations of both of these newer methods are available in LAPACK, but they do not yet have the decades-long record of reliability enjoyed by QR iteration. Thus, for now, the choice is between the speed of the newer methods, especially RRR, and the more proven dependability of QR iteration.

In `scipy` the most general method you can use is `linalg.eig`, which uses QR iteration. There exist similar methods to calculate an SVD. However, if your matrix has special properties, there are faster options that specifically make use of this information:

| Method                 | Description                                                                                           |
|:-----------------------|:------------------------------------------------------------------------------------------------------|
| `eig`                  | Solve an ordinary or generalized eigenvalue problem of a square matrix.                               |
| `eigvals`              | Compute eigenvalues from an ordinary or generalized eigenvalue problem.                               |
| `eigh`                 | Solve a standard or generalized eigenvalue problem for a complex Hermitian or real symmetric matrix.  |
| `eigvalsh`             | Same as `eigh`, but computes the eigenvalues only.                                                    |
| `eig_banded`           | Solve real symmetric or complex Hermitian band matrix eigenvalue problem.                             |
| `eigvals_banded`       | Same as `eig_banded`, but computes the eigenvalues only.                                              |
| `eigh_tridiagonal`     | Solve eigenvalue problem for a real symmetric tridiagonal matrix.                                     |
| `eigvalsh_tridiagonal` | Same as `eigh_tridiagonal`, but computes the eigenvalues only.                                        |
| `svd`                  | Compute the singular value decomposition.                                                             |
| `svdvals`              | Compute singular values of a matrix.                                                                  |

Further documentation can be found [here](https://docs.scipy.org/doc/scipy/tutorial/linalg.html#eigenvalues-and-eigenvectors) and [here](https://docs.scipy.org/doc/scipy/tutorial/linalg.html#singular-value-decomposition).

An example of the use of `linalg.eig` is shown below.

In [None]:
A = np.array(
    [
        [2.9766, 0.3945, 0.4198, 1.1159],
        [0.3945, 2.7328, -0.3097, 0.1129],
        [0.4198, -0.3097, 2.5675, 0.6079],
        [1.1159, 0.1129, 0.6079, 1.7231],
    ]
)

la, v = linalg.eig(A)
l1, l2, l3, l4 = la
print(l1, l2, l3, l4)  # eigenvalues

print(v[:, 0])  # first eigenvector
print(v[:, 1])  # second eigenvector
print(v[:, 2])  # third eigenvector
print(v[:, 3])  # fourth eigenvector
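
Because this matrix is symmetric, the specialized routine `linalg.eigh` from the table above can be used as well. The sketch below shows what changes compared to `linalg.eig`: the eigenvalues come back as real numbers in ascending order, and the eigenvectors are orthonormal.

```python
import numpy as np
from scipy import linalg

A = np.array(
    [
        [2.9766, 0.3945, 0.4198, 1.1159],
        [0.3945, 2.7328, -0.3097, 0.1129],
        [0.4198, -0.3097, 2.5675, 0.6079],
        [1.1159, 0.1129, 0.6079, 1.7231],
    ]
)
# eigh assumes a symmetric (or Hermitian) matrix, so check this first.
assert np.allclose(A, A.T)

w, v = linalg.eigh(A)
print(w)  # real eigenvalues in ascending order

# Columns of v are orthonormal eigenvectors: A v_i = w_i v_i
print(np.allclose(A @ v, v * w))
print(np.allclose(v.T @ v, np.identity(4)))
```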

## Physics Example: spring-and-mass system

> This example is partially inspired by the cc-licensed material from Michael Richmond found [here](http://spiff.rit.edu/classes/phys283/lectures/eigen_ii/eigen_ii.html).

Consider the following system consisting of two masses connected by identical springs, with the outer springs fixed to a wall on each side:

In [None]:
def mk_spring(x1, x2, steps=12, height=1.0):
    l = x2 - x1
    nd = np.sqrt(max(0, height**2 - (l**2 / steps**2))) / 2

    def rx(i):
        return x1 + (l * (2 * i - 1)) / (2 * steps)

    def ry(i):
        return nd * (i % 2 * 2 - 1)

    s = [(rx(i), ry(i)) for i in range(1, steps + 1)]
    s = [(x1, 0), *s, (x2, 0)]

    return np.array(s)


def draw_springs(spring_configs):
    # The use of matplotlib to draw the spring system is considered
    # "spielerei" and not part of the course material.
    springs = [
        mk_spring(x1, x2, steps=s, height=h) for (x1, x2, s, h) in spring_configs
    ]

    # Plot settings
    plt.close("springs")
    fig, ax = plt.subplots(num="springs")
    ax.axis("off")
    ax.set_ylim(-1, 1)

    # Walls
    ax.axvline(x=0, color="k", linewidth=3)
    ax.axvline(x=23, color="k", linewidth=3)

    # Springs
    for spring in springs:
        ax.plot(*spring.T, "k")
        ax.plot(*spring[0], "ko", markersize=7)
        ax.plot(*spring[-1], "ko", markersize=7)

    # Labels
    for idx, spring in enumerate(springs[1:]):
        ax.plot(spring[0][0], spring[0][1], "ko", markersize=7)
        ax.text(spring[0][0], spring[0][1] - 0.12, f"M{idx+1}")


draw_springs([(0, 7, 12, 1), (7, 16, 12, 1), (16, 23, 12, 1)])

Let's call $x_1$ and $x_2$ the displacements of $M_1$ and $M_2$, respectively, from their equilibrium positions.

The forces (which define the accelerations) acting on each mass are

$$\begin{aligned}
  F_1=M_1\frac{d^2x_1}{dt^2} & =-kx_1 +k(x_2-x_1) \\
  F_2=M_2\frac{d^2x_2}{dt^2} & =-k(x_2-x_1)-k x_2
\end{aligned}$$

This can be written as the following matrix equation

$$
\begin{bmatrix}
-\frac{2k}{M_1}&\frac{k}{M_1}\\
\frac{k}{M_2}&-\frac{2k}{M_2}
\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}
=\begin{bmatrix}\frac{d^2x_1}{dt^2}\\\frac{d^2x_2}{dt^2}\end{bmatrix}
$$

We're looking for a specific combination of $x_1$ and $x_2$ for which $\mathbf{Ax}=\lambda\mathbf{x}$, with $\lambda$ the eigenvalue and $\mathbf{x}=[a\ \ b]^\intercal$ the corresponding eigenvector.


When comparing this to our original matrix equation, this means that 

$$
\begin{bmatrix}
-\frac{2k}{M_1}&\frac{k}{M_1}\\
\frac{k}{M_2}&-\frac{2k}{M_2}
\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}=\lambda
\begin{bmatrix}a\\b\end{bmatrix}
$$

We can move the right-hand side to the left and end up with

$$
\begin{bmatrix}
-\frac{2k}{M_1}&\frac{k}{M_1}\\
\frac{k}{M_2}&-\frac{2k}{M_2}
\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}-\lambda
\begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}
$$

Writing these equations out explicitly gives

$$\begin{aligned}
 \left(\frac{-2k}{M_1}-\lambda\right) a + \frac{k}{M_1}b= & 0 \\
 \frac{k}{M_2} a - \left(\frac{2k}{M_2}+\lambda\right)b= & 0
\end{aligned}$$

The latter can be written as

$$a=\frac{M_2}{k}\left(\frac{2k}{M_2}+\lambda\right)b=\left(2+\frac{M_2\lambda}{k}\right)b$$

and substituted into the former to give

$$\left(\frac{-2k}{M_1}-\lambda\right) \left(2+\frac{M_2\lambda}{k}\right)b + \frac{k}{M_1}b=0$$

Expanding the product and dividing by $b$ (which is nonzero for an eigenvector) gives

$$-\frac{M_2}{k}\lambda^2-\left(2+\frac{2M_2}{M_1}\right)\lambda-\frac{3k}{M_1}=0$$

Multiplying by $-\frac{k}{M_2}$ yields

$$\lambda^2+2k\left(\frac{M_1+M_2}{M_1M_2}\right)\lambda+\frac{3k^2}{M_1M_2}=0$$

Solving this for $\lambda$ gives

$$\lambda=\frac{-2k\left(\frac{M_1+M_2}{M_1M_2}\right)\pm\sqrt{4k^2\left(\frac{M_1+M_2}{M_1M_2}\right)^2-4\frac{3k^2}{M_1M_2}}}{2}$$

Or,

$$\lambda=\frac{-k}{M_1M_2}\left[(M_1+M_2)\pm\sqrt{(M_1-M_2)^2+M_1M_2}\right]$$

For simplicity, let's assume that $M_1=M_2$ so this reduces to $\lambda=-k/M$ and $\lambda=-3k/M$.
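
Before specializing to equal masses, the closed-form expression for $\lambda$ can be checked against a numerical eigenvalue solver. The values of `k`, `M1` and `M2` below are arbitrary assumptions chosen for this check.

```python
import numpy as np
from scipy import linalg

k, M1, M2 = 3.0, 1.0, 2.0  # assumed example values

# Matrix of the spring-and-mass system
A = np.array([[-2 * k / M1, k / M1], [k / M2, -2 * k / M2]])

# Closed-form eigenvalues from the derivation above
root = np.sqrt((M1 - M2) ** 2 + M1 * M2)
closed_form = np.sort(-k / (M1 * M2) * np.array([M1 + M2 + root, M1 + M2 - root]))

numerical = np.sort(linalg.eigvals(A).real)
print(closed_form)
print(numerical)
```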


Using these $\lambda$ in the following set of equations

$$\begin{aligned}
  \left(\frac{-2k}{M_1}-\lambda\right) a + \frac{k}{M_1}b= & 0 \\
  \frac{k}{M_2} a - \left(\frac{2k}{M_2}+\lambda\right)b= & 0
\end{aligned}$$

gives

$$\begin{aligned}
  \left(\frac{-2k}{M}+\frac{k}{M}\right) a + \frac{k}{M}b= & 0 \\
  \frac{k}{M} a - \left(\frac{2k}{M}-\frac{k}{M}\right)b= & 0
\end{aligned}$$ 

and

$$\begin{aligned}
  \left(\frac{-2k}{M}+3\frac{k}{M}\right) a + \frac{k}{M}b= & 0 \\
  \frac{k}{M} a - \left(\frac{2k}{M}-3\frac{k}{M}\right)b= & 0
\end{aligned}$$ 

which can be solved as $a=b$ and $a=-b$, so the corresponding eigenvectors are $[1\ \ 1]^\intercal$ and $[1\ \ {-1}]^\intercal$.

We could have saved ourselves all this work by simply asking scipy.

For instance, for $k=M_1=M_2=1$, we would find

In [None]:
A = np.array([(-2, 1), (1, -2)])

In [None]:
la, v = linalg.eig(A)
print(la, "\n", v)

These are indeed the expected eigenvalues of -1 and -3, with eigenvectors $[1\ \ 1]^\intercal$ and $[1\ \ {-1}]^\intercal$ (normalized to unit length and determined up to sign).

> You can find a simple animation for the three eigenmodes of the oscillator [here](https://www.acs.psu.edu/drussell/Demos/multi-dof-springs/multi-dof-springs.html). The animations and text on the page are ©2004-2013 by Daniel A. Russell.


If we now combine the original equations of motion according to these eigenvectors, starting from

$$\begin{aligned}
  F_1=M\frac{d^2x_1}{dt^2} & =-kx_1 +k(x_2-x_1)  \\
  F_2=M\frac{d^2x_2}{dt^2} & =-k(x_2-x_1)-k x_2
\end{aligned}$$

we find for the eigenvector $[1 1]^\intercal$:

$$M\frac{d^2x_1}{dt^2}+M\frac{d^2x_1}{dt^2}=-k (x_1+x_2)$$

and for eigenvector $[1 -1]^\intercal$:

$$M\frac{d^2x_2}{dt^2}-M\frac{d^2x_1}{dt^2}=-3k (x_2-x_1)$$


when introducing the variables $s_1=(x_1+x_2)$ and $s_2=(x_2-x_1)$ this results in the following equations

$$\begin{aligned}
  \frac{d^2s_1}{dt^2} & =-\frac{k}{M}s_1 \\
  \frac{d^2s_2}{dt^2} & =-\frac{3k}{M}s_2
\end{aligned}$$

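Each of these describes a simple harmonic oscillator, so they are easily solved as

$$\begin{aligned}
  x_1+x_2=s_1 & =A_1 \cos\left(\sqrt{k/M}\,t+\phi_1\right) \\
  x_2-x_1=s_2 & =A_2 \cos\left(\sqrt{3k/M}\,t+\phi_2\right)
\end{aligned}$$

where the amplitudes $A_1$, $A_2$ and phases $\phi_1$, $\phi_2$ are fixed by the initial conditions. This shows that solving the eigenvalue problem $\mathbf{A}\mathbf{s}=\lambda\mathbf{s}$ very elegantly gives the dynamical equations that describe this system.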
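
The decoupling can also be seen in matrix form. The sketch below assumes $k=M=1$ and introduces a matrix `S` (a name chosen here for illustration) whose rows encode $s_1=x_1+x_2$ and $s_2=x_2-x_1$. If $\ddot{\mathbf{x}}=\mathbf{A}\mathbf{x}$ and $\mathbf{s}=\mathbf{S}\mathbf{x}$, then $\ddot{\mathbf{s}}=\mathbf{S}\mathbf{A}\mathbf{S}^{-1}\mathbf{s}$, so a diagonal $\mathbf{S}\mathbf{A}\mathbf{S}^{-1}$ means the two new coordinates evolve independently.

```python
import numpy as np

k, M = 1.0, 1.0  # illustrative values
A = np.array([[-2 * k / M, k / M], [k / M, -2 * k / M]])

# Rows of S define the new coordinates s1 = x1 + x2 and s2 = x2 - x1.
S = np.array([[1.0, 1.0], [-1.0, 1.0]])

# Coefficient matrix of the equations of motion written in terms of s1 and s2.
A_modes = S @ A @ np.linalg.inv(S)
print(A_modes)
```

The off-diagonal entries vanish and the diagonal holds $-k/M$ and $-3k/M$, matching the two decoupled equations for $s_1$ and $s_2$.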

### Dynamical problem

Suppose we want to know the position of the first block at time $t = 5\,\mathrm{s}$, given the following parameters and initial conditions:
- each mass equals $M = 1\,\mathrm{kg}$
- each spring has spring constant $k = 1\,\mathrm{N/m}$
- at $t=0$, the first mass is at position $x_1 = 2\,\mathrm{m}$
- at $t=0$, the second mass is at position $x_2 = -1\,\mathrm{m}$
- at $t=0$, the first mass has velocity $v_1 = -1\,\mathrm{m/s}$
- at $t=0$, the second mass has velocity $v_2 = 1\,\mathrm{m/s}$

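We can find the constants of integration by plugging these conditions into $s_1$ and $s_2$ and their time derivatives. For $s_1$, with $\omega_1=\sqrt{k/M}$, the initial position and velocity give

$$
\begin{aligned}
s_1(0) &= 2 - 1 = A_1 \cos\left(\phi_1\right) \\
\frac{ds_1}{dt}(0) &= -1 + 1 = -A_1\,\omega_1 \sin\left(\phi_1\right)
\end{aligned}
$$

$$
\implies \left\lbrace \begin{aligned}
    A_1 &= 1\,\mathrm{m} \\
    \phi_1 &= 0
\end{aligned} \right.
$$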

Applying the same method to $s_2$ and its time derivative, with $\omega_2=\sqrt{3k/M}$, gives

$$
\begin{aligned}
s_2(0) &= -1 - 2 = A_2 \cos\left(\phi_2\right) \\
\frac{ds_2}{dt}(0) &= 1 - (-1) = -A_2\,\omega_2 \sin\left(\phi_2\right)
\end{aligned}
$$

$$
\implies \left\lbrace \begin{aligned}
  A_2 &= -3.21\,\mathrm{m} \\
  \phi_2 &= 0.367\,\mathrm{rad}
\end{aligned}\right.
$$

so that

$$\begin{aligned}
  s_1(t) & = (1\,\mathrm{m}) \cos\left(\omega_1 t \right) \\
  s_2(t) & = (-3.21\,\mathrm{m}) \cos\left(\omega_2 t +0.367\,\mathrm{rad} \right)
\end{aligned}$$

with $\omega_1=\sqrt{k/M}$ and $\omega_2=\sqrt{3k/M}$.

We want to know the positions of the actual blocks, so we need expressions for $x_1(t)$ and $x_2(t)$ instead of $s_1(t)$ and $s_2(t)$. Inverting $s_1=x_1+x_2$ and $s_2=x_2-x_1$ gives

$$
\begin{aligned}
x_1(t) &= \frac{1}{2} \left[ s_1(t) - s_2(t) \right] \\
&= (0.5\,\mathrm{m}) \cos(\omega_1 t) + (1.61\,\mathrm{m}) \cos(\omega_2 t + 0.367\,\mathrm{rad})\\
\\
x_2(t) &= \frac{1}{2} \left[ s_1(t) + s_2(t) \right] \\
&= (0.5\,\mathrm{m}) \cos(\omega_1 t) - (1.61\,\mathrm{m}) \cos(\omega_2 t + 0.367\,\mathrm{rad})\\
\end{aligned}
$$
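
The constants of integration can also be obtained numerically. The sketch below uses a hypothetical helper, `amplitude_and_phase`, that solves $s(0)=A\cos\phi$ and $\dot{s}(0)=-A\omega\sin\phi$ with the phase restricted to $-\pi/2<\phi<\pi/2$, so that the amplitude takes the sign of $s(0)$ as in the text. It assumes $s(0)\neq 0$ and $k=M=1$.

```python
import numpy as np


def amplitude_and_phase(s0, v0, omega):
    """Return (A, phi) such that s(t) = A cos(omega t + phi), s(0) = s0, s'(0) = v0.

    Convention: -pi/2 < phi < pi/2, so A has the sign of s0. Assumes s0 != 0.
    """
    phi = np.arctan(-v0 / (omega * s0))
    return s0 / np.cos(phi), phi


x1_0, x2_0 = 2.0, -1.0  # initial positions (m)
v1_0, v2_0 = -1.0, 1.0  # initial velocities (m/s)
omega1, omega2 = 1.0, np.sqrt(3.0)  # sqrt(k/M) and sqrt(3k/M) for k = M = 1

A1, phi1 = amplitude_and_phase(x1_0 + x2_0, v1_0 + v2_0, omega1)
A2, phi2 = amplitude_and_phase(x2_0 - x1_0, v2_0 - v1_0, omega2)
print(f"A1 = {A1:.3f} m, phi1 = {phi1:.3f} rad")
print(f"A2 = {A2:.3f} m, phi2 = {phi2:.3f} rad")

# Reconstruct the block positions at t = 0 from the two modes as a consistency check.
s1_0 = A1 * np.cos(phi1)
s2_0 = A2 * np.cos(phi2)
print("x1(0) =", 0.5 * (s1_0 - s2_0), " x2(0) =", 0.5 * (s1_0 + s2_0))
```

The last line should reproduce the initial positions of the two blocks, which confirms that the amplitudes and phases are mutually consistent.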

To evaluate the positions of the masses at any time, we implement the expressions for $x_1(t)$ and $x_2(t)$ as functions:

In [None]:
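def x1(t):
    """Position of mass 1 (m) at time t (s) for the initial conditions above."""
    k = 1
    M = 1
    # 1.607 = |A2| / 2 and 0.367 = phi2 = arctan(2 / (3 * sqrt(3)))
    pos = 0.5 * np.cos(np.sqrt(k / M) * t) + 1.607 * np.cos(
        np.sqrt(3 * k / M) * t + 0.367
    )
    return pos


def x2(t):
    """Position of mass 2 (m) at time t (s) for the initial conditions above."""
    k = 1
    M = 1
    # Same mode amplitudes as in x1, with the sign of the second mode flipped
    pos = 0.5 * np.cos(np.sqrt(k / M) * t) - 1.607 * np.cos(
        np.sqrt(3 * k / M) * t + 0.367
    )
    return pos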


print("Mass one has position x =", x1(5), "at time t = 5s.")
print("Mass two has position x =", x2(5), "at time t = 5s.")

In [None]:
def plot_solution():
    """The positions of the masses over time for our initial conditions."""
    plt.close("solution")
    fig, ax = plt.subplots(figsize=(7, 4), num="solution")
    t = np.arange(0, 50, 0.1)
    ax.plot(t, x1(t))
    ax.plot(t, x2(t))

    ax.set_xlim([0, 50])
    ax.set_ylim([-3, 3])
    ax.legend(("Block 1", "Block 2"))
    ax.set_xlabel("Time $t$ (s)")
    ax.set_ylabel("displacement from equilibrium $x$ (m)")


plot_solution()

#### Alternative solution

Let's solve the same problem by integrating the differential equations numerically (see the notebook on integrating ODEs).

Adapted from <https://scipy-cookbook.readthedocs.io/items/CoupledSpringMassSystem.html>

In [None]:
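def vectorfield(w, t, p):
    """Define the differential equations for the coupled spring-mass system.

    Parameters
    ----------
    w
        vector of the state variables: w = [x1,y1,x2,y2],
        with positions x1, x2 and velocities y1, y2.
    t
        time.
    p
        vector of the parameters: p = [m1,m2,k1,k2]; k1 multiplies the -x1
        restoring term on mass 1, k2 is used for the coupling spring and for
        the -x2 restoring term on mass 2.
    """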
    x1, y1, x2, y2 = w
    m1, m2, k1, k2 = p

    # Create f = (x1',y1',x2',y2'):
    f = [y1, (-k1 * x1 + k2 * (x2 - x1)) / m1, y2, (-k2 * (x2 - x1) - k2 * x2) / m2]
    return f

The code above explicitly implements the following set of first-order differential equations, which is equivalent to the two second-order ODEs from Newton's second law with which we began this example:

$$
\left\lbrace\begin{aligned}
\frac{dx_1}{dt} &= y_1 \\[4pt]
\frac{dy_1}{dt} &= \frac{-k_1 x_1 + k_2 (x_2 - x_1)}{m_1} \\[4pt]
\frac{dx_2}{dt} &= y_2 \\[4pt]
\frac{dy_2}{dt} &= \frac{-k_2 (x_2 - x_1) - k_2 x_2}{m_2}
\end{aligned}\right.
$$
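
Because this first-order system is linear, it can be written as $d\mathbf{w}/dt=\mathbf{B}\mathbf{w}$ with $\mathbf{w}=[x_1, y_1, x_2, y_2]^\intercal$, and the eigenvalues of $\mathbf{B}$ tie it back to the eigenvalue problem solved earlier. The sketch below assumes $m_1=m_2=k_1=k_2=1$; the matrix name `B` is introduced here for illustration. Each eigenvalue $\mu$ of $\mathbf{B}$ satisfies $\mu^2=\lambda$, so we expect purely imaginary values $\pm i\sqrt{k/M}$ and $\pm i\sqrt{3k/M}$.

```python
import numpy as np
from scipy import linalg

m1 = m2 = 1.0
k1 = k2 = 1.0

# dw/dt = B w for the state vector w = [x1, y1, x2, y2]
B = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [-(k1 + k2) / m1, 0.0, k2 / m1, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [k2 / m2, 0.0, -2 * k2 / m2, 0.0],
    ]
)

mu = linalg.eigvals(B)
print(np.sort(np.abs(mu.imag)))  # angular frequencies, each appearing twice
print(np.max(np.abs(mu.real)))  # real parts vanish up to round-off
```

The imaginary parts are the angular frequencies of the two normal modes, which is why the numerical solution shows the same two oscillation frequencies as the analytical one.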

Below, we solve this system numerically with `integrate.odeint`, using the methods we saw in the ODE notebook.

In [None]:
def solve_ode():
    """Use ODEINT to solve the differential equations defined by the vector field."""

    # Parameter values
    # Masses:
    m1 = 1.0
    m2 = 1.0
    # Spring constants
    k1 = 1
    k2 = 1

    # Initial conditions
    # x1 and x2 are the initial displacements; y1 and y2 are the initial velocities
    x1 = 2.0
    y1 = -1.0
    x2 = -1.0
    y2 = 1.0

    # ODE solver parameters
    abserr = 1.0e-8
    relerr = 1.0e-6
    stoptime = 50.0
    numpoints = 1000

    # Create the time samples for the output of the ODE solver.
    # I use a large number of points, only because I want to make
    # a plot of the solution that looks nice.
    t = [stoptime * float(i) / (numpoints - 1) for i in range(numpoints)]

    # Pack up the parameters and initial conditions:
    p = [m1, m2, k1, k2]
    w0 = [x1, y1, x2, y2]

    # Call the ODE solver.
    return integrate.odeint(vectorfield, w0, t, args=(p,), atol=abserr, rtol=relerr)


wsol = solve_ode()

In [None]:
def plot_alternative_solution():
    plt.close("alternative")
    fig, ax = plt.subplots(figsize=(7, 4), num="alternative")

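    # Use the same time samples as solve_ode (1000 points on [0, 50]),
    # so that every solution value is plotted at the time it was computed for.
    t = np.linspace(0, 50, 1000)
    ax.plot(t, wsol[:, 0])
    ax.plot(t, wsol[:, 2])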

    ax.set_xlim([0, 50])
    ax.set_ylim([-3, 3])
    ax.legend(("Block 1", "Block 2"))
    ax.set_xlabel("Time $t$ (s)")
    ax.set_ylabel("displacement from equilibrium $x$ (m)")


plot_alternative_solution()
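
To compare the two approaches quantitatively rather than by eye, the self-contained sketch below integrates the same system up to $t=5\,\mathrm{s}$ and evaluates the normal-mode solution at the same times. The names `rhs`, `x1_exact`, `t_check` and `w_num` are illustrative, the tolerances are an arbitrary choice, and the amplitude and phase of the second mode are written in closed form for $k=M=1$ and the initial conditions of this example.

```python
import numpy as np
from scipy import integrate


def rhs(w, t, k, m):
    """First-order form of the equal-mass, equal-spring system."""
    x1, y1, x2, y2 = w
    return [y1, (-2 * k * x1 + k * x2) / m, y2, (k * x1 - 2 * k * x2) / m]


def x1_exact(t):
    """Normal-mode solution for x1, valid for k = M = 1 and this example's initial conditions."""
    a2 = -np.sqrt(31.0 / 3.0)  # amplitude of s2 (about -3.21 m)
    phi2 = np.arctan(2.0 / (3.0 * np.sqrt(3.0)))  # phase of s2 (about 0.367 rad)
    s1 = np.cos(t)
    s2 = a2 * np.cos(np.sqrt(3.0) * t + phi2)
    return 0.5 * (s1 - s2)


t_check = np.linspace(0.0, 5.0, 101)
w0 = [2.0, -1.0, -1.0, 1.0]  # x1, v1, x2, v2 at t = 0
w_num = integrate.odeint(rhs, w0, t_check, args=(1.0, 1.0), atol=1e-10, rtol=1e-10)

max_diff = np.max(np.abs(w_num[:, 0] - x1_exact(t_check)))
print(f"x1(5) from odeint: {w_num[-1, 0]:.4f}")
print(f"x1(5) from modes:  {x1_exact(5.0):.4f}")
print(f"largest difference on [0, 5]: {max_diff:.1e}")
```

The two values of $x_1(5)$ should agree to within the solver tolerance, which supports both the eigenvalue-based solution and the numerical integration.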