# Symmetric Matrices and the Spectral Decomposition

If $\mathbf{A}$ is a real symmetric matrix with $n$ distinct real
eigenvalues, then its $n$ corresponding eigenvectors are mutually
orthogonal (orthonormal, if properly normalized) and span
$\mathbb{R}^{n}$. Equivalently, if $\mathbf{A}=\mathbf{A}^{T}$ has $n$
distinct real eigenvalues, then the corresponding eigenvectors form an
orthogonal, or orthonormal, basis of $\mathbb{R}^{n}$.

Let $\lambda_{1}$ and $\lambda_{2}$ be eigenvalues of the real symmetric
matrix $\mathbf{A}$, with $\lambda_{1} \neq \lambda_{2}$, and let
$\mathbf{u}_{1}$ and $\mathbf{u}_{2}$ be the corresponding eigenvectors.
Let $<\cdot, \cdot>$ denote the canonical inner product, so that
$\mathbf{u}_{1} \perp \mathbf{u}_{2} \Leftrightarrow<\mathbf{u}_{1}, \mathbf{u}_{2}>=0$.
From the definitions of the inner product and of the adjoint
transformation,

$$<\mathbf{A} \mathbf{u}_{1}, \mathbf{u}_{2}>=<\mathbf{u}_{1}, \mathbf{A}^{\mathrm{T}} \mathbf{u}_{2}>=<\mathbf{u}_{1}, \mathbf{A} \mathbf{u}_{2}>$$

because $\mathbf{A}=\mathbf{A}^{\mathrm{T}}$, or equivalently,
$\mathbf{A}$ is self-adjoint. Using the definition of eigenvectors and
eigenvalues, the equation above can be written as

$$\begin{aligned}
<\lambda_{1} \mathbf{u}_{1}, \mathbf{u}_{2}> & =<\mathbf{u}_{1}, \lambda_{2} \mathbf{u}_{2}> \\
\lambda_{1}<\mathbf{u}_{1}, \mathbf{u}_{2}> & =\lambda_{2}<\mathbf{u}_{1}, \mathbf{u}_{2}>
\end{aligned}$$

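Subtracting the right-hand side from the left gives
$(\lambda_{1}-\lambda_{2})<\mathbf{u}_{1}, \mathbf{u}_{2}>=0$. Since
$\lambda_{1} \neq \lambda_{2}$ by assumption, this equality holds if,
and only if, $<\mathbf{u}_{1}, \mathbf{u}_{2}>=0$, i.e.,
$\mathbf{u}_{1} \perp \mathbf{u}_{2}$. Applying this to every pair of
eigenvectors shows that the $n$ eigenvectors are mutually orthogonal;
being nonzero and mutually orthogonal, they are linearly independent and
therefore span $\mathbb{R}^{n}$.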
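
As a quick numerical sanity check of this theorem, the sketch below (assuming NumPy is available; the $3 \times 3$ matrix is an arbitrary illustrative choice whose eigenvalues $2-\sqrt{2}$, $2$ and $2+\sqrt{2}$ are distinct) computes eigenvectors with the general-purpose solver `numpy.linalg.eig`, which does not enforce orthogonality, and confirms that they nevertheless come out mutually orthogonal. It also checks the self-adjointness identity used in the proof on two random vectors.

```python
import numpy as np

# Illustrative real symmetric matrix with distinct eigenvalues
# 2 - sqrt(2), 2, 2 + sqrt(2).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# np.linalg.eig is the general (non-symmetric) solver: it only scales each
# eigenvector to unit length and does not force them to be orthogonal.
eigvals, V = np.linalg.eig(A)

# Self-adjointness: <A x, y> == <x, A y> for arbitrary x, y.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(np.dot(A @ x, y), np.dot(x, A @ y)))  # True

# Distinct eigenvalues => pairwise orthogonal unit eigenvectors => V^T V = I.
gram = V.T @ V
print(np.allclose(gram, np.eye(3)))  # True
```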

A consequence of the theorem above is that if
$\mathbf{A}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ is induced by a
symmetric matrix with $n$ distinct eigenvalues, then its eigenvectors
form an orthogonal basis of $\mathbb{R}^{n}$, and therefore $\mathbf{A}$
is diagonalizable. In fact, we shall prove that a set of orthogonal, or
orthonormal, eigenvectors can be found for any symmetric matrix,
regardless of the values and multiplicities of its eigenvalues. This
means that any symmetric matrix is diagonalizable via a similarity
transformation constructed with its orthonormal eigenvectors. The
following theorem states that a square matrix $\mathbf{A}$ with real
eigenvalues can be transformed into a triangular matrix $\mathbf{T}$,
similar to $\mathbf{A}$, by an orthogonal similarity transformation
$\mathbf{P A P}^{\mathrm{T}}$. A corollary of the theorem then states
that if $\mathbf{A}$ is symmetric, the transformed matrix $\mathbf{T}$
is diagonal, which leads to the conclusion that any symmetric matrix can
be diagonalized.

(**Schur’s Theorem**) Let
$\mathbf{A}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ be a linear
operator in $\mathbb{R}^{n}$ induced by the square matrix $\mathbf{A}$,
and assume that all eigenvalues of $\mathbf{A}$ are real (as is always
the case when $\mathbf{A}$ is symmetric). Then $\mathbf{A}$ can be
transformed into an upper triangular matrix by an orthogonal similarity
transformation. (If $\mathbf{A}$ has complex eigenvalues, the same
construction works over $\mathbb{C}^{n}$ with a unitary transformation
in place of an orthogonal one.)

The proof is constructive. $\mathbf{A}$ has at least one eigenvalue
$\lambda_{1}$, a root of its characteristic polynomial; when the
eigenvalues of $\mathbf{A}$ are real, $\lambda_{1}$ has an associated
real eigenvector $\mathbf{v}_{1}$ such that
$\mathbf{A v}_{1}=\lambda_{1} \mathbf{v}_{1}$. We also know that there
exists an orthogonal transformation $\mathbf{P}_{1}$ capable of
reflecting $\mathbf{v}_{1}$ onto the direction of vector
$\mathbf{e}_{1}$, the vector whose first element is equal to one and all
other elements are equal to zero,${ }^{3}$ i.e.,

$$\mathbf{P}_{1} \mathbf{v}_{1}=\left\|\mathbf{v}_{1}\right\|_{2} \mathbf{e}_{1}$$

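Multiplying $\mathbf{A v}_{1}=\lambda_{1} \mathbf{v}_{1}$ on the left by
$\mathbf{P}_{1}$ and inserting $\mathbf{P}_{1}^{\mathrm{T}} \mathbf{P}_{1}=\mathbf{I}$
between $\mathbf{A}$ and $\mathbf{v}_{1}$, we obtain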

$$\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}} \mathbf{P}_{1} \mathbf{v}_{1}=\lambda_{1} \mathbf{P}_{1} \mathbf{v}_{1} \Rightarrow \mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}} \mathbf{e}_{1}=\lambda_{1} \mathbf{e}_{1}$$

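where the implication follows by substituting
$\mathbf{P}_{1} \mathbf{v}_{1}=\left\|\mathbf{v}_{1}\right\|_{2} \mathbf{e}_{1}$
and dividing both sides by $\left\|\mathbf{v}_{1}\right\|_{2} \neq 0$.
The vector $\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}} \mathbf{e}_{1}$
is the first column of $\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}}$,
so that column equals $\lambda_{1} \mathbf{e}_{1}$, and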

$$\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}}=\left[\begin{array}{cc}
\lambda_{1} & \star \\
0 & \\
\vdots & \mathbf{A}_{1} \\
0 &
\end{array}\right]$$

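where $\star$ represents the remaining elements of the first row and
$\mathbf{A}_{1}$ is a square matrix with $n-1$ rows and $n-1$ columns.
Because $\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}}$ is
similar to $\mathbf{A}$ and block upper triangular, the eigenvalues of
$\mathbf{A}_{1}$ are the remaining eigenvalues of $\mathbf{A}$, so they
are real whenever those of $\mathbf{A}$ are. Applying the same argument
to $\mathbf{A}_{1}$, we construct a second transformation
$\mathbf{P}_{2}$,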

$$\mathbf{P}_{2}=\left[\begin{array}{cc}
1 & \mathbf{0}^{\mathrm{T}} \\
\mathbf{0} & \overline{\mathbf{P}}_{2}
\end{array}\right]$$

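where $\overline{\mathbf{P}}_{2}$ is an $(n-1) \times(n-1)$ orthogonal
matrix, built from an eigenvector of $\mathbf{A}_{1}$ exactly as
$\mathbf{P}_{1}$ was built from $\mathbf{v}_{1}$, such that matrix
$\overline{\mathbf{P}}_{2} \mathbf{A}_{1} \overline{\mathbf{P}}_{2}^{\mathrm{T}}$
has the same format as matrix
$\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}}$, i.e., with only
the first element in the first column different from zero. Since
$\mathbf{P}_{2}$ does not alter the first column
$\lambda_{1} \mathbf{e}_{1}$, this gives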


$$\mathbf{P}_{2} \mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}} \mathbf{P}_{2}^{\mathrm{T}}=\left[\begin{array}{ccc}
\lambda_{1} & \star & \star \\
0 & \lambda_{2} & \star \\
0 & 0 & \\
\vdots & \vdots & \mathbf{A}_{2} \\
0 & 0 &
\end{array}\right]$$

in which $\lambda_{2}$ is an eigenvalue of $\mathbf{A}_{1}$, and hence of
$\mathbf{A}$. We can continue the procedure until the matrix

$$\mathbf{P}_{n-1} \cdots \mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}} \cdots \mathbf{P}_{n-1}^{\mathrm{T}}=\mathbf{P A} \mathbf{P}^{\mathrm{T}}, \quad \mathbf{P}=\mathbf{P}_{n-1} \cdots \mathbf{P}_{1},$$

is triangular. Notice that each transformation $\mathbf{P}_{i}$ is
orthogonal; since a product of orthogonal matrices is orthogonal, the
composite transformation $\mathbf{P}$ is also orthogonal.
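
The constructive proof above can be sketched numerically. The following is a minimal teaching sketch in Python with NumPy, not a production Schur solver; it assumes that all eigenvalues of the input are real, and the helper names `householder_to_e1` and `schur_by_deflation` are hypothetical. At each step it takes an eigenvector of the trailing block from `numpy.linalg.eig` (instead of computing it by hand), reflects it onto $\mathbf{e}_{1}$ with the reflection of footnote 3, and embeds that reflection as $\mathrm{diag}(\mathbf{I}_{k}, \overline{\mathbf{P}}_{k+1})$. The first entry of $\mathbf{h}$ is computed in an algebraically identical, cancellation-free form.

```python
import numpy as np


def householder_to_e1(v):
    """Hypothetical helper: orthogonal, symmetric P with P @ v = ||v||_2 e1.

    Uses P = I - 2 h h^T / (h^T h), h = v - ||v||_2 e1 (footnote 3).
    """
    v = np.asarray(v, dtype=float)
    norm_v = np.linalg.norm(v)
    h = v.copy()
    if v[0] > 0:
        # v0 - ||v|| rewritten as -(v1^2 + ... + v_{n-1}^2) / (v0 + ||v||)
        # to avoid cancellation; algebraically the same first entry.
        h[0] = -(v[1:] @ v[1:]) / (v[0] + norm_v)
    else:
        h[0] = v[0] - norm_v
    hh = h @ h
    if hh == 0.0:
        # v is already a positive multiple of e1: no reflection needed.
        return np.eye(v.size)
    return np.eye(v.size) - 2.0 * np.outer(h, h) / hh


def schur_by_deflation(A):
    """Hypothetical helper: P orthogonal, T = P A P^T upper triangular.

    Assumes every eigenvalue of A is real, so each deflation step has a
    real eigenvector to reflect onto e1.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = np.eye(n)
    T = A.copy()
    for k in range(n - 1):
        # Eigenpair of the trailing (n-k) x (n-k) block (A_k in the text).
        w, V = np.linalg.eig(T[k:, k:])
        i = np.argmin(np.abs(np.imag(w)))   # take a real eigenvalue
        v = np.real(V[:, i])
        # P_{k+1} = diag(I_k, Pbar_{k+1}) keeps the first k columns' zeros.
        Pk = np.eye(n)
        Pk[k:, k:] = householder_to_e1(v)
        T = Pk @ T @ Pk.T
        P = Pk @ P                          # P = P_{k+1} ... P_1
    return P, T


# Non-symmetric example whose eigenvalues are all real (illustrative choice).
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
P, T = schur_by_deflation(A)
print(np.allclose(P @ P.T, np.eye(3)))    # True: P is orthogonal
print(np.allclose(np.tril(T, -1), 0.0))   # True: T is upper triangular
print(np.allclose(P.T @ T @ P, A))        # True: A = P^T T P
```

The diagonal of `T` holds the eigenvalues of `A`, in the order in which the deflation steps happened to pick them.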

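From Schur’s Theorem, stated above, we can define Schur’s decomposition
as follows. Let $\mathbf{A}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$
be a linear operator in $\mathbb{R}^{n}$ induced by the square matrix
$\mathbf{A}$ with real eigenvalues. By Schur’s Theorem,
$\mathbf{P A P}^{\mathrm{T}}=\mathbf{T}$ for some orthogonal
$\mathbf{P}$ and triangular $\mathbf{T}$; multiplying on the left by
$\mathbf{P}^{\mathrm{T}}$ and on the right by $\mathbf{P}$ gives

$$\mathbf{A}=\mathbf{P}^{\mathrm{T}} \mathbf{T} \mathbf{P}$$

which is the Schur decomposition of $\mathbf{A}$, where $\mathbf{P}$ is
orthogonal and $\mathbf{T}$ is triangular.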
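
As a small worked example of the decomposition (the matrix is an illustrative choice, not taken from the text), consider the non-symmetric matrix $\mathbf{A}$ below, whose eigenvalues are $5$ and $2$. An eigenvector for $\lambda_{1}=5$ is $\mathbf{v}_{1}=[1\ \ 1]^{\mathrm{T}}$, so $\|\mathbf{v}_{1}\|_{2}=\sqrt{2}$, $\mathbf{h}=[1-\sqrt{2}\ \ 1]^{\mathrm{T}}$, and the reflection of footnote 3 gives

$$\mathbf{A}=\left[\begin{array}{cc}4 & 1 \\ 2 & 3\end{array}\right], \quad \mathbf{P}=\mathbf{P}_{1}=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right], \quad \mathbf{T}=\mathbf{P A P}^{\mathrm{T}}=\left[\begin{array}{cc}5 & 1 \\ 0 & 2\end{array}\right]$$

Since $n=2$, a single reflection suffices, and multiplying out confirms $\mathbf{P}^{\mathrm{T}} \mathbf{T} \mathbf{P}=\mathbf{A}$.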

(**Corollary**) Any symmetric matrix can be diagonalized by an
orthogonal similarity transformation.

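The proof is a direct consequence of Schur’s Theorem, which applies
because the eigenvalues of a real symmetric matrix are real. If
$\mathbf{A}$ is symmetric, i.e., $\mathbf{A}=\mathbf{A}^{\mathrm{T}}$,
then $\mathbf{P A} \mathbf{P}^{\mathrm{T}}$ is also symmetric, because
$(\mathbf{P A P}^{\mathrm{T}})^{\mathrm{T}}=\mathbf{P A}^{\mathrm{T}} \mathbf{P}^{\mathrm{T}}=\mathbf{P A P}^{\mathrm{T}}$.
But the theorem says that we can choose $\mathbf{P}$ such that
$\mathbf{P A} \mathbf{P}^{\mathrm{T}}$ is triangular. In this case,
$\mathbf{P A} \mathbf{P}^{\mathrm{T}}$ is both triangular and symmetric,
and therefore diagonal.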
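
For instance, for the symmetric matrix $\mathbf{A}=\left[\begin{array}{cc}2 & 1 \\ 1 & 2\end{array}\right]$ (an illustrative choice), $\lambda_{1}=3$ has eigenvector $\mathbf{v}_{1}=[1\ \ 1]^{\mathrm{T}}$, and the reflection $\mathbf{P}_{1}$ built from it as in the proof of Schur’s Theorem already produces a triangular, and hence diagonal, matrix:

$$\mathbf{P}_{1} \mathbf{A} \mathbf{P}_{1}^{\mathrm{T}}=\frac{1}{2}\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]\left[\begin{array}{cc}2 & 1 \\ 1 & 2\end{array}\right]\left[\begin{array}{cc}1 & 1 \\ 1 & -1\end{array}\right]=\left[\begin{array}{cc}3 & 0 \\ 0 & 1\end{array}\right]$$

The columns of $\mathbf{P}_{1}^{\mathrm{T}}=\mathbf{P}_{1}$, namely $[1\ \ 1]^{\mathrm{T}}/\sqrt{2}$ and $[1\ \ -1]^{\mathrm{T}}/\sqrt{2}$, are orthonormal eigenvectors of $\mathbf{A}$ for the eigenvalues $3$ and $1$.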

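We can rewrite the statement of the corollary above as follows: if
$\mathbf{A}=\mathbf{A}^{\mathrm{T}}$, then there exists an orthogonal
transformation $\mathbf{P}$, i.e.,
$\mathbf{P}^{\mathrm{T}} \mathbf{P}=\mathbf{I}$, such that
$\mathbf{P A P}^{\mathrm{T}}=\boldsymbol{\Lambda}$, with
$\boldsymbol{\Lambda}$ diagonal. The diagonal matrix
$\boldsymbol{\Lambda}$ is similar to $\mathbf{A}$ and, since similar
matrices have the same eigenvalues, its diagonal entries are the
eigenvalues of $\mathbf{A}$. Multiplying
$\mathbf{P A P}^{\mathrm{T}}=\boldsymbol{\Lambda}$ on the left by
$\mathbf{P}^{\mathrm{T}}$ gives
$\mathbf{A} \mathbf{P}^{\mathrm{T}}=\mathbf{P}^{\mathrm{T}} \boldsymbol{\Lambda}$.
Read column by column, this says
$\mathbf{A} \mathbf{v}_{i}=\lambda_{i} \mathbf{v}_{i}$, where
$\mathbf{v}_{i}$ is the $i$-th column of $\mathbf{V}=\mathbf{P}^{\mathrm{T}}$
and $\lambda_{i}$ is the $i$-th diagonal entry of $\boldsymbol{\Lambda}$;
by the definition of eigenvalues and eigenvectors, $\mathbf{V}$
therefore contains the eigenvectors of $\mathbf{A}$ on its columns. We
can write another corollary summarizing these last conclusions.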

(**Corollary**) Let
$\mathbf{A}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ be a linear
transformation induced by a symmetric matrix. There exists a set of
orthonormal eigenvectors of $\mathbf{A}$ which span $\mathbb{R}^{n}$.

The proof is a consequence of Schur’s Theorem and the corollary above.
From $\mathbf{P A} \mathbf{P}^{\mathrm{T}}=\boldsymbol{\Lambda}$, or,
equivalently,
$\mathbf{A} \mathbf{P}^{\mathrm{T}}=\mathbf{P}^{\mathrm{T}} \boldsymbol{\Lambda}$,
and since similar matrices have the same eigenvalues, matrix
$\boldsymbol{\Lambda}$ contains the eigenvalues of $\mathbf{A}$. We can
define $\mathbf{V}=\mathbf{P}^{\mathrm{T}}$, an orthogonal matrix, which
contains the eigenvectors of $\mathbf{A}$ on its $n$ columns. As
$\mathbf{V}$ is orthogonal, its $n$ columns are orthonormal, hence
linearly independent, and therefore span $\mathbb{R}^{n}$.

Notice that the corollary above is more general than the theorem at the
beginning of this section: it states that any symmetric matrix has a set
of orthonormal eigenvectors which span $\mathbb{R}^{n}$, whether or not
its eigenvalues are distinct.
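
To see the more general statement at work numerically, the sketch below (assuming NumPy; the matrix is an illustrative choice) uses a symmetric matrix with a repeated eigenvalue. Note that it relies on `numpy.linalg.eigh`, NumPy's symmetric/Hermitian eigensolver, rather than on the Schur construction of this section; `eigh` returns the eigenvalues in ascending order and orthonormal eigenvectors as the columns of its second output.

```python
import numpy as np

# Illustrative symmetric matrix with a repeated eigenvalue: J (all ones)
# has eigenvalues 3, 0, 0, so A = I + J has eigenvalues 4, 1, 1.
A = np.eye(3) + np.ones((3, 3))

# eigh: eigenvalues in ascending order, orthonormal eigenvectors as columns.
lam, V = np.linalg.eigh(A)

print(np.round(lam, 12))                       # [1. 1. 4.]
print(np.allclose(V.T @ V, np.eye(3)))         # True: columns orthonormal
print(np.allclose(A @ V, V @ np.diag(lam)))    # True: A V = V Lambda
print(np.allclose(V.T @ A @ V, np.diag(lam)))  # True: P A P^T = Lambda, P = V^T
```

Within the two-dimensional eigenspace of the repeated eigenvalue $1$, any orthonormal basis is valid, so the particular pair of columns returned is one choice among many.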

${ }^{3}$ The reflection
$\mathbf{P}_{1}=\mathbf{I}-2 \mathbf{h} \mathbf{h}^{\mathrm{T}} /\left(\mathbf{h}^{\mathrm{T}} \mathbf{h}\right)$,
with
$\mathbf{h}=\mathbf{v}_{1}-\left\|\mathbf{v}_{1}\right\|_{2} \mathbf{e}_{1}$,
performs such a transformation (a Householder reflection); if
$\mathbf{v}_{1}$ is already a positive multiple of $\mathbf{e}_{1}$,
then $\mathbf{h}=\mathbf{0}$ and one simply takes
$\mathbf{P}_{1}=\mathbf{I}$. $\mathbf{P}_{1}$ is orthogonal, i.e.,
$\mathbf{P}_{1}^{\mathrm{T}} \mathbf{P}_{1}=\mathbf{I}$, but it is also
symmetric, therefore $\mathbf{P}_{1} \mathbf{P}_{1}=\mathbf{I}$.
