## Unit 4 - Geometry & Inner Product Spaces

### Vector Geometry

**Basic Properties of Inner Product Spaces**
Suppose $V, <\cdot,\cdot>$ ($V$ together with an inner product) is an inner product space over $\mathbb{R}$. Then the following results hold:

1) Norm axioms for $||\cdot|| = \sqrt{<\cdot,\cdot>}$ (the norm of a vector is the square root of its inner product with itself; $||\cdot||$ is the norm function):

* $\forall x,y \in V, \forall a \in \mathbb{R}$
  * $||x|| > 0$ for $x \neq 0$
  * $||ax|| = |a| \cdot ||x||$
  * $||x + y|| \leq ||x|| + ||y||$ (triangle inequality)

2) Cauchy-Schwarz Inequality:

* $|<x,y>| \leq ||x|| \cdot ||y||$

3) Parallelogram Law

* $||x + y||^2 + ||x-y||^2 = 2(||x||^2 + ||y||^2)$
* Called the parallelogram law because $x$ and $y$ span a parallelogram whose diagonals are $x+y$ and $x-y$: the sum of the squares of the two diagonals equals the sum of the squares of the four sides
* Important theoretically to distinguish norms that come from inner products from norms that don't: a norm comes from an inner product iff it satisfies the parallelogram law

4) Pythagorean Theorem (linear algebra version)
* If $<x,y> = 0$ then $||x-y||^2 = ||x||^2 + ||y||^2 = ||x+y||^2$
* Two vectors are called orthogonal (at right angles to one another) when their inner product is 0
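A quick numerical check of the parallelogram law in $\mathbb{R}^2$, as a sketch: vectors are assumed to be Python lists, the helper names (`dot`, `norm2`, `norm1`, `parallelogram_gap`) are hypothetical, and the 1-norm $\sum_k |x_k|$ is brought in only as a comparison. The dot-product norm satisfies the law; the 1-norm does not, so by the criterion above it cannot come from any inner product. Since $x \perp y$ in this example, the same numbers also illustrate the Pythagorean theorem.

```python
import math

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """Norm induced by the dot product: sqrt(<x, x>)."""
    return math.sqrt(dot(x, x))

def norm1(x):
    """The 1-norm sum |x_k|: a norm, but not induced by any inner product."""
    return sum(abs(a) for a in x)

def parallelogram_gap(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero when the law holds."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)

x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_gap(norm2, x, y))  # ~0 (up to rounding): the law holds
print(parallelogram_gap(norm1, x, y))  # 4.0: the law fails
```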


### Inner Product Spaces and Examples

An inner product space is a vector space together with an inner product function $< \cdot, \cdot>: V \times V \rightarrow \mathbb{R}$ that satisfies three axioms:

1) Symmetry: $<u,v> = <v,u>$
2) Linearity (in the 1st argument): $<u_1+u_2, v> = <u_1,v> + <u_2,v>$ and $<au,v>=a<u,v>$
3) Positive Definiteness: $<u,u> \geq 0$, and $<u,u> = 0$ iff $u = \vec{0}$

From any inner product, we can define notions of length, distance and directional correlation (similarity):

* length(u) $= \sqrt{<u,u>}$
* distance(u,v) $= \sqrt{<u-v,u-v>}$
* directional_correlation(u,v) $= \frac{<u,v>}{ \sqrt{<u,u><v,v>}}$ (for nonzero $u, v$; by Cauchy-Schwarz this lies in $[-1, 1]$, and it is the cosine of the angle between $u$ and $v$)
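These three definitions translate directly into code. A minimal sketch, assuming vectors of $\mathbb{R}^n$ stored as Python lists so that $u - v$ is computed coordinatewise; the inner product is passed in as a function, so any inner product on $\mathbb{R}^n$ can be plugged in. The function names are hypothetical.

```python
import math

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def length(u, inner):
    """length(u) = sqrt(<u, u>)"""
    return math.sqrt(inner(u, u))

def distance(u, v, inner):
    """distance(u, v) = sqrt(<u - v, u - v>)"""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(inner(diff, diff))

def directional_correlation(u, v, inner):
    """<u, v> / sqrt(<u, u><v, v>) for nonzero u, v; always in [-1, 1]."""
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))

u, v = [3, 4], [4, 3]
print(length(u, dot))                      # 5.0
print(distance(u, v, dot))                 # sqrt(2), about 1.414
print(directional_correlation(u, v, dot))  # 0.96
```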

Examples:

$\mathbb{R}^n$ (the prototypical example):
* $<x,y> = x^Ty = \sum_{k=1}^nx_ky_k$
* This is the definition used in the first unit to define inner product, matrix vector product, matrix multiplication, etc.

We can also define, for any diagonal matrix $D$ with positive diagonal entries, another inner product on $\mathbb{R}^n$:
* $<x,y> = x^TDy = \sum_{k=1}^nx_kd_{kk}y_k$
* In data science this weighted version is used all the time, because it gives a weighted notion of distance. Features in a data set often should not all carry the same weight: in a housing data set, when predicting prices, the location feature may deserve more weight than the other features. Choosing the diagonal entries of $D$ increases or decreases the weight of each feature.
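A sketch of the weighted inner product and the distance it induces, assuming $D$ is stored by its diagonal `d` (a list of positive weights). The function names, the two houses, and their feature values are made up for illustration; they show how raising the weight on one feature changes the distance between two data points.

```python
import math

def weighted_inner(x, y, d):
    """<x, y> = x^T D y with D = diag(d); every entry of d must be positive."""
    return sum(xk * dk * yk for xk, dk, yk in zip(x, d, y))

def weighted_distance(x, y, d):
    """Distance induced by the weighted inner product."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(weighted_inner(diff, diff, d))

# Hypothetical features: (location score, number of rooms)
house_a = [8.0, 3.0]
house_b = [5.0, 4.0]
equal = [1.0, 1.0]           # D = I: the ordinary distance
location_heavy = [4.0, 1.0]  # location weighted 4 times as much as rooms
print(weighted_distance(house_a, house_b, equal))           # sqrt(10), about 3.16
print(weighted_distance(house_a, house_b, location_heavy))  # sqrt(37), about 6.08
```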

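In $\mathbb{R}[x]$ a typical inner product is:

* $<p(x), q(x)> = \int_{-1}^{1}p(x)q(x)dx$
* This inner product satisfies the three axioms. For positive definiteness, note that $\int_{-1}^{1}p(x)^2dx > 0$ whenever $p \neq 0$, because $p(x)^2 \geq 0$ and a nonzero polynomial has only finitely many roots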
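The integral inner product can be computed exactly for polynomials. A sketch, assuming polynomials are stored as coefficient lists `[c0, c1, c2, ...]` (meaning $c_0 + c_1x + c_2x^2 + \cdots$) and using Python's `fractions.Fraction` for exact arithmetic; the helper names are hypothetical. It relies on $\int_{-1}^{1}x^kdx = \frac{2}{k+1}$ for even $k$ and $0$ for odd $k$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists [c0, c1, c2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner_m11(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], computed exactly."""
    # x^k integrates to 2/(k+1) over [-1, 1] for even k, and to 0 for odd k
    return sum(c * Fraction(2, k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

one, x = [1], [0, 1]
print(inner_m11(one, x))  # 0: the polynomials 1 and x are orthogonal
print(inner_m11(x, x))    # 2/3
```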

In $\mathbb{R}^{n \times n}$:

* $<A,B> = \text{Trace}(A^TB)$
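A small check, using plain Python lists of rows (helper names hypothetical), that $\text{Trace}(A^TB)$ equals the sum of the entrywise products $\sum_{i,j} a_{ij}b_{ij}$. In other words, this inner product is the dot product of the two matrices viewed as vectors with $n^2$ entries.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def trace_inner(A, B):
    """<A, B> = Trace(A^T B)"""
    return trace(matmul(transpose(A), B))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(trace_inner(A, B))  # 70
# Same value as the sum of entrywise products 1*5 + 2*6 + 3*7 + 4*8
print(sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b)))  # 70
```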

### Orthonormal Sets of Vectors

This section describes a process for taking any basis of an inner product space and converting it to an orthonormal basis, which has geometric properties similar to those of the standard basis of $\mathbb{R}^n$.

Let $V,<\cdot,\cdot>$ be an inner product space. Vectors $u_1, \ldots, u_m$ are called orthogonal if $<u_i, u_j> = 0$ for $i \neq j$ (they are pairwise perpendicular to each other). We say $u_1, \ldots, u_m$ are orthonormal if they are orthogonal and $<u_i,u_i> = 1$ for each $i=1, \ldots, m$.

Example: In $\mathbb{R}^n$, $e_1, \ldots, e_n$ are orthonormal: the inner product of any two of them with different subscripts is always 0, and the inner product of any one of them with itself is always 1.

Theorem: Given any linearly independent vectors $v_1, \ldots, v_n$ in an inner product space $V$, there exist vectors $u_1, \ldots, u_n$ such that $u_1, \ldots, u_n$ are orthonormal and Span($u_1, \ldots, u_n$) = Span($v_1, \ldots, v_n$).

Proof:

Gram-Schmidt Process (Algorithm)

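```python
import math

def gram_schmidt(vs, inner):
    """Input: linearly independent vectors v_1..v_n, stored as lists of
    coordinates, and an inner product inner(a, b). Output: u_1..u_n."""
    ws = []
    for v in vs:
        # Orthogonalization step (w_1 = v_1 when ws is empty):
        # w_k = v_k - sum over j < k of (<v_k, w_j> / <w_j, w_j>) * w_j
        w = list(v)
        for wj in ws:
            coef = inner(v, wj) / inner(wj, wj)
            w = [a - coef * b for a, b in zip(w, wj)]
        ws.append(w)
    # Orthonormalization step: u_k = w_k / ||w_k||
    return [[a / math.sqrt(inner(w, w)) for a in w] for w in ws]

# The output is orthonormal, and Span(u_1..u_k) = Span(v_1..v_k) for every k
```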

Example: 

$V = \mathbb{R}[x]_{\leq 2}$ (polynomials with degree $\leq 2$)

$<p(x),q(x)> = \int_{0}^{1}p(x)q(x)dx$

Starting basis: $\{1, x, x^2\}$ - this is neither orthogonal nor orthonormal

Orthogonalization step works out to: $\{1, x-\frac{1}{2}, x^2-x+\frac{1}{6}\}$

Orthonormalization step: $\{1, 2\sqrt{3}(x-\frac{1}{2}), 6\sqrt{5}(x^2-x+\frac{1}{6})\}$
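The example can be double-checked with exact rational arithmetic. This sketch (helper names hypothetical) stores polynomials as coefficient lists, uses $\int_0^1 x^k dx = \frac{1}{k+1}$, and runs only the orthogonalization step. It reports the orthogonal polynomials $1$, $x - \frac{1}{2}$, $x^2 - x + \frac{1}{6}$ with squared norms $1, \frac{1}{12}, \frac{1}{180}$, which give the normalizing factors $1$, $\sqrt{12} = 2\sqrt{3}$ and $\sqrt{180} = 6\sqrt{5}$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists [c0, c1, c2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner01(p, q):
    """<p, q> = integral of p(x) q(x) over [0, 1]; x^k integrates to 1/(k+1)."""
    return sum(c * Fraction(1, k + 1) for k, c in enumerate(poly_mul(p, q)))

def orthogonalize(vs):
    """Orthogonalization step of Gram-Schmidt, in exact rational arithmetic."""
    ws = []
    for v in vs:
        w = [Fraction(c) for c in v]
        for u in ws:
            coef = inner01(v, u) / inner01(u, u)
            size = max(len(w), len(u))
            # w <- w - coef * u, padding the shorter coefficient list with zeros
            w = [(w[k] if k < len(w) else 0) - coef * (u[k] if k < len(u) else 0)
                 for k in range(size)]
        ws.append(w)
    return ws

ws = orthogonalize([[1], [0, 1], [0, 0, 1]])  # the basis 1, x, x^2
print(ws)                           # coefficient lists of 1, x - 1/2, x^2 - x + 1/6
print([inner01(w, w) for w in ws])  # squared norms 1, 1/12, 1/180
```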

### Adjoints

This section discusses an inner product space application called the adjoint, a generalization of the notion of the transpose to linear maps between two inner product spaces.

Let $V, W$ be inner product spaces and let $T \in L(V,W)$ ($T$ is a linear map from V to W). 

We want to define a new linear map that plays the role of the transpose of $T$. We don't want to define this map in terms of matrix coefficients (which depend on a choice of basis); we want to define it in terms of its properties.

Notice: For every $w \in W$, the function $<w, \cdot>: W \rightarrow \mathbb{R}$ is a linear map from $W$ to $\mathbb{R}$, i.e. $<w, \cdot> \in L(W,\mathbb{R})$.

Since $T$ is a linear map from $V$ to $W$, composing $<w, \cdot>$ with $T$ gives a linear map from $V$ to $\mathbb{R}$:

$<w, \cdot> \circ T:V \rightarrow \mathbb{R} \in L(V, \mathbb{R})$

We know, from earlier lessons on linear maps, that when $V$ is finite-dimensional every linear map from $V$ to $\mathbb{R}$ can be represented as an inner product with a fixed vector. It has the form:

$<v,\cdot>$ for exactly one $v \in V$

So, we can use these two functions, $<w, \cdot> \circ T$ and $<v,\cdot>$ to define the adjoint of $T$ by 

$T^*: W \rightarrow V$

$T^*(w)$ = the $v \in V$ such that $<w, \cdot> \circ T = <v, \cdot>$

Another way to say this is: $<w, T(u)> = <v,u>$ for every $u \in V$, that is, $<w, T(u)> = <T^*(w), u>$.
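In the special case $V = \mathbb{R}^3$, $W = \mathbb{R}^2$ with the standard inner product (an assumption for this sketch; the standard bases are then orthonormal), a linear map $T$ is given by a matrix $A$ and its adjoint $T^*$ is given by the transpose $A^T$. The check below uses a hypothetical matrix and vectors to verify the defining identity $<w, T(u)> = <T^*(w), u>$ for one pair $u, w$.

```python
def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def apply(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

# T: R^3 -> R^2 given by a 2x3 matrix (hypothetical example)
A = [[1, 2, 0],
     [0, -1, 3]]
u = [2, 1, -1]  # u in V = R^3
w = [4, 5]      # w in W = R^2
lhs = dot(w, apply(A, u))             # <w, T(u)>
rhs = dot(apply(transpose(A), w), u)  # <T*(w), u>, with T* given by A^T
print(lhs, rhs)  # -4 -4
```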

**Properties of Adjoints**

Given any orthonormal bases $v_1, \ldots, v_n$ of $V$ and $w_1, \ldots, w_m$ of $W$:
* $M(T,\vec{v}, \vec{w}) = [M(T^*, \vec{w},\vec{v})]^{Transpose}$

**Special Cases**

Definition: A linear map $T \in L(V)$ is called self-adjoint if $T=T^*$

Note: With respect to an orthonormal basis, the matrix of a self-adjoint linear map is symmetric: the matrix is equal to its transpose.

**Spectral Theorem**

If $T \in L(V)$ is self-adjoint, then $V$ has an orthonormal basis consisting of eigenvectors of $T$.

Note: Relative to this basis, $M(T)$ is diagonal (with the eigenvalues on the diagonal)
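A small worked example of the theorem in $\mathbb{R}^2$ with the standard inner product (the matrix is chosen for illustration). The symmetric matrix below has two eigenvectors at right angles:

$$A = \left[\begin{matrix} 2 & 1 \\ 1 & 2 \end{matrix}\right], \quad A\left[\begin{matrix} 1 \\ 1 \end{matrix}\right] = 3\left[\begin{matrix} 1 \\ 1 \end{matrix}\right], \quad A\left[\begin{matrix} 1 \\ -1 \end{matrix}\right] = 1\left[\begin{matrix} 1 \\ -1 \end{matrix}\right]$$

Normalizing gives the orthonormal eigenvector basis $u_1 = \frac{1}{\sqrt{2}}(1,1)$, $u_2 = \frac{1}{\sqrt{2}}(1,-1)$. With $Q = [u_1 \; u_2]$ (columns $u_1, u_2$), the matrix of $A$ relative to this basis is diagonal:

$$Q^TAQ = \left[\begin{matrix} 3 & 0 \\ 0 & 1 \end{matrix}\right]$$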

### Singular Value Decomposition

The singular value decomposition (SVD) is one of the most important decompositions of linear maps on inner product spaces. It can be applied to any linear map on an inner product space, and its construction relies on adjoints.

Let $T \in L(V)$. $T$ is called positive if $T$ is self-adjoint (its matrix with respect to an orthonormal basis is symmetric) and all of its eigenvalues are nonnegative.

Def: For $T$ positive, define $\sqrt{T}$ to be the unique positive linear map $R \in L(V)$ such that $R \circ R = T$

Example:

$\sqrt{\left[\begin{matrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 9 \end{matrix}\right]} = \left[\begin{matrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{matrix}\right]$ 

This matrix is positive because it is symmetric and its eigenvalues 1, 4, 9 are nonnegative; its square root takes the square root of each diagonal entry.
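For a positive matrix that is not diagonal, the spectral theorem gives a way to compute $\sqrt{T}$: if $T = \sum_i \lambda_i u_iu_i^T$ for an orthonormal eigenvector basis $u_i$, then $R = \sum_i \sqrt{\lambda_i}\, u_iu_i^T$ satisfies $R \circ R = T$, because $u_i^Tu_j$ is 1 when $i = j$ and 0 otherwise. A sketch in plain Python, assuming the eigenvalue/eigenvector pairs are supplied by hand rather than computed (helper names hypothetical). The example uses $T = \left[\begin{matrix} 2 & 1 \\ 1 & 2 \end{matrix}\right]$, which has eigenvalues 3 and 1 with orthonormal eigenvectors $\frac{1}{\sqrt{2}}(1,1)$ and $\frac{1}{\sqrt{2}}(1,-1)$.

```python
import math

def outer(v):
    """Outer product v v^T as a list of rows."""
    return [[a * b for b in v] for a in v]

def sqrt_from_eigen(pairs):
    """sqrt(T) = sum of sqrt(lam) * u u^T over the given (lam, u) pairs.
    Assumes the u's are orthonormal eigenvectors of T and every lam >= 0."""
    n = len(pairs[0][1])
    R = [[0.0] * n for _ in range(n)]
    for lam, u in pairs:
        P = outer(u)
        for i in range(n):
            for j in range(n):
                R[i][j] += math.sqrt(lam) * P[i][j]
    return R

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

s = 1 / math.sqrt(2)
R = sqrt_from_eigen([(3.0, [s, s]), (1.0, [s, -s])])
print(R)             # approximately [[1.366, 0.366], [0.366, 1.366]]
print(matmul(R, R))  # approximately [[2, 1], [1, 2]]
```

Passing the standard basis vectors with eigenvalues 1, 4, 9 reproduces the diagonal example above.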

Fact: For any $T \in L(V)$, the linear operator $T^* \circ T$ is positive, where $T^*$ is the adjoint of $T$. Composing $T$ with its adjoint always gives a positive operator (in matrix terms, $A^TA$ is positive).
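The fact follows from the definition of the adjoint, using $(ST)^* = T^*S^*$ and $(T^*)^* = T$:

$$(T^*T)^* = T^*(T^*)^* = T^*T, \qquad <T^*Tv, v> = <Tv, Tv> = ||Tv||^2 \geq 0$$

So $T^*T$ is self-adjoint, and if $T^*Tv = \lambda v$ with $||v|| = 1$, then $\lambda = <T^*Tv, v> \geq 0$.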

Definition: A linear map $S \in L(V)$ is called an isometry if $||Sv|| = ||v||$ for every $v \in V$. An isometry is a linear operator that never changes the norm of any vector, i.e. it never changes a vector's distance from the origin. Isometries have many other properties:
* They map orthonormal bases to orthonormal bases: the images of the basis vectors are still pairwise perpendicular and of norm 1
* $S^*S = I = SS^*$ (This means the adjoint of an isometry is also its inverse)
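Rotations of $\mathbb{R}^2$ are the standard example of isometries for the dot product. A sketch (helper names and the angle 0.7 are arbitrary choices) checking both listed properties numerically: the norm of a vector is unchanged, and $S^TS = I$, so the adjoint $S^T$ is the inverse.

```python
import math

def rotation(theta):
    """Matrix of the rotation of R^2 by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm(v):
    """Norm induced by the dot product."""
    return math.sqrt(sum(x * x for x in v))

S = rotation(0.7)
v = [3.0, 4.0]
print(norm(v), norm(apply(S, v)))  # both 5.0, up to rounding

# (S^T S)[i][j] = sum_k S[k][i] * S[k][j]; this is the identity up to rounding
StS = [[sum(S[k][i] * S[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(StS)
```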

Definition: The singular values of $T$ are the eigenvalues of $\sqrt{T^*T}$, i.e. the square roots of the eigenvalues of $T^*T$.



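**Singular Value Decomposition Theorem**: 

Every linear map $T \in L(V)$ can be factored as $T = U \Sigma V^*$, where $U$ and $V$ are isometries and $\Sigma$ has a diagonal matrix (with respect to an orthonormal basis) with the singular values of $T$ on the diagonal. For a square real matrix $A$ this reads $A = U \Sigma V^T$.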
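A minimal sketch of the decomposition in matrix form $A = U\Sigma V^T$ for an invertible real $2 \times 2$ matrix with the dot product (assumptions made to keep the code short; all helper names are hypothetical). It follows the definitions above: the singular values are the square roots of the eigenvalues of $A^TA$, the columns of $V$ are orthonormal eigenvectors of $A^TA$, and the columns of $U$ are $u_i = Av_i/\sigma_i$. The eigenvectors of the symmetric $2 \times 2$ matrix $A^TA$ are computed in closed form. For real work, a library SVD routine is the better choice.

```python
import math

def sym2_eigen(a, b, c):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, c]].
    Returns (l1, l2, v1, v2) with l1 >= l2 and orthonormal eigenvectors v1, v2."""
    mean = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    l1, l2 = mean + r, mean - r
    if b != 0:
        v1 = (b, l1 - a)   # solves (a - l1) x + b y = 0
    elif a >= c:
        v1 = (1.0, 0.0)    # already diagonal, larger entry first
    else:
        v1 = (0.0, 1.0)
    n = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / n, v1[1] / n)
    v2 = (-v1[1], v1[0])   # unit vector perpendicular to v1
    return l1, l2, v1, v2

def svd2(A):
    """Return U, sigma, V with A = U diag(sigma) V^T for an invertible 2x2 A."""
    (p, q), (r, s) = A
    # A^T A = [[p^2 + r^2, pq + rs], [pq + rs, q^2 + s^2]] is positive
    l1, l2, v1, v2 = sym2_eigen(p * p + r * r, p * q + r * s, q * q + s * s)
    # Singular values: square roots of the eigenvalues of A^T A
    sigma = (math.sqrt(l1), math.sqrt(max(l2, 0.0)))

    def apply(v):
        return (p * v[0] + q * v[1], r * v[0] + s * v[1])

    # u_i = A v_i / sigma_i (needs sigma_i > 0, hence the invertibility assumption)
    u1 = tuple(x / sigma[0] for x in apply(v1))
    u2 = tuple(x / sigma[1] for x in apply(v2))
    U = [[u1[0], u2[0]], [u1[1], u2[1]]]  # columns u1, u2
    V = [[v1[0], v2[0]], [v1[1], v2[1]]]  # columns v1, v2
    return U, sigma, V

def reconstruct(U, sigma, V):
    """(U diag(sigma) V^T)[i][j] = sum_k U[i][k] * sigma[k] * V[j][k]"""
    return [[sum(U[i][k] * sigma[k] * V[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3.0, 0.0], [4.0, 5.0]]
U, sigma, V = svd2(A)
print(sigma)                     # approximately (6.708, 2.236)
print(reconstruct(U, sigma, V))  # approximately [[3, 0], [4, 5]]
```

For this $A$, $A^TA = \left[\begin{matrix} 25 & 20 \\ 20 & 25 \end{matrix}\right]$ has eigenvalues 45 and 5, so the singular values are $3\sqrt{5}$ and $\sqrt{5}$.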