## Unit 3 - Linear Maps on Vector Spaces


### Vector Spaces

Vector spaces are a generalization of the properties of real vectors. Formally, a vector space is a set $V$ together with:
* vector addition $+: V \times V \rightarrow V$
* scalar multiplication $\cdot : \mathbb{R} \times V \rightarrow V$
* additive inverse $-: V \rightarrow V $
* zero vector $ 0 \in V$

These operations must satisfy the following axioms:

$\forall x,y,z \in V$ and $\forall a,b \in \mathbb{R}$

1) $+$ is associative: $x+(y+z) = (x+y)+z$
2) $+$ is commutative: $x+y = y+x$
3) $0$ is an additive identity: $x+0 = x = 0+x$
4) $-$ is an inverse for $+$: $x+(-x) = 0 = -x + x $

$\langle V, +, -, 0 \rangle$ is an abelian group

Scalar multiplication respects:

5) vector addition: $a \cdot (x+y) = a \cdot x + a \cdot y$
6) real number addition: $(a + b) \cdot x = a \cdot x + b \cdot x$
7) real number multiplication: $(a \cdot_{\mathbb{R}} b) \cdot_{sm} x = a \cdot_{sm} (b \cdot_{sm} x)$
   NOTE: On the left-hand side, $a \cdot_{\mathbb{R}} b$ is multiplication of real numbers (since $a, b \in \mathbb{R}$), and the result is then applied to $x$ by scalar multiplication (since $x \in V$). On the right-hand side, both products are scalar multiplications.
8) $1_{\mathbb{R}}$: $1_{\mathbb{R}} \cdot x = x$ NOTE: without this axiom, scalar multiplication could trivially send every vector to zero and still satisfy axioms 5-7. This axiom guarantees that doesn't happen.

The remainder of the unit relies on these axioms. 

* $\mathbb{R}^2$ is all possible real-valued 2-tuples - a 2-dimensional real coordinate space
* $\mathbb{R}^3$ is all possible real-valued 3-tuples - a 3-dimensional real coordinate space
* $\mathbb{R}^n$ is all possible real-valued $n$-tuples - an $n$-dimensional real coordinate space

Prototypical Example of a Vector Space:

$\langle \mathbb{R}^n, +_{\mathbb{R}^n}, \cdot_{sm}, -_{\mathbb{R}^n}, 0_{\mathbb{R}^n} \rangle$
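
As a rough illustration, the operations on $\mathbb{R}^n$ can be sketched in Python with tuples standing in for vectors. The helper names (`add`, `scale`, `neg`, `zero`) are made up for this note, and checking the axioms on a few sample vectors is only a sanity check, not a proof.

```python
from fractions import Fraction

# Vectors in R^n are modeled as tuples; Fractions keep the arithmetic exact.
def add(x, y):
    """Vector addition, component by component."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(a, x):
    """Scalar multiplication of the vector x by the number a."""
    return tuple(a * xi for xi in x)

def neg(x):
    """Additive inverse of x."""
    return tuple(-xi for xi in x)

def zero(n):
    """Zero vector of R^n."""
    return tuple(Fraction(0) for _ in range(n))

x = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(-4), Fraction(1, 2), Fraction(0))
z = (Fraction(5), Fraction(5), Fraction(-1))
a, b = Fraction(2), Fraction(-3, 4)

# Axioms 1-4: addition forms an abelian group
assert add(x, add(y, z)) == add(add(x, y), z)
assert add(x, y) == add(y, x)
assert add(x, zero(3)) == x
assert add(x, neg(x)) == zero(3)

# Axioms 5-8: scalar multiplication respects the other operations
assert scale(a, add(x, y)) == add(scale(a, x), scale(a, y))
assert scale(a + b, x) == add(scale(a, x), scale(b, x))
assert scale(a * b, x) == scale(a, scale(b, x))
assert scale(Fraction(1), x) == x
```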

Other Examples of Vector Spaces:

* $\mathbb{R}^{m \times n}$ : The set of real $m \times n$ matrices is a vector space because you can add them, multiply them by scalars, take additive inverses, and there is a zero matrix. Since those operations satisfy the 8 axioms, the set of real $m \times n$ matrices is a vector space.
* $L(\mathbb{R}^n \rightarrow \mathbb{R}^m)$ : The linear maps (functions) from $\mathbb{R}^n$ to $\mathbb{R}^m$. This is also a vector space (under pointwise addition and scalar multiplication) because it satisfies the axioms.
* $\mathbb{R}[x]_{\leq n}$ : The set of all polynomial functions with real coefficients whose degree is at most $n$.
* $\mathbb{R}[x]$ : The set of all polynomial functions with real coefficients. It is an example of a vector space that is not finite dimensional, because its elements can have arbitrarily high degree.


### Linear Combinations

**Example from Khan Academy**

$v_1, v_2, \ldots, v_k \in \mathbb{R}^n$

A linear combination of the vectors $v_1, \ldots, v_k$ means we scale these vectors by constants $c_1, \ldots, c_k \in \mathbb{R}$ and add the results:

$c_1v_1 + c_2v_2 + \ldots + c_kv_k$

Example of a Linear Combination

$a = \left(\begin{matrix} 1 \\ 2 \end{matrix}\right)$
$b = \left(\begin{matrix} 0 \\ 3 \end{matrix}\right)$

$3a + (-2)b = \left(\begin{matrix} 3 \\ 0 \end{matrix}\right)$ is one linear combination of $a$ and $b$.

The set of all linear combinations is the **span**. In this case, Span$(a, b) = \mathbb{R}^2$ but there are many cases where this isn't true. 

A case where this is not true is:
$a = \left(\begin{matrix} 2 \\ 2 \end{matrix}\right)$
$b = \left(\begin{matrix} -2 \\ -2 \end{matrix}\right)$

Here, Span($a,b$) is a single line because $a$ and $b$ are collinear ($b = -a$).
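
A small sketch of these two examples in Python. The function names are invented for this note. The determinant test in `spans_plane` is a standard fact about two vectors in $\mathbb{R}^2$ that these notes do not derive, so treat it as an assumption here.

```python
def lin_comb(coeffs, vectors):
    """Return c_1*v_1 + ... + c_k*v_k for vectors given as tuples."""
    n = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n))

def spans_plane(a, b):
    """Two vectors in R^2 span R^2 exactly when their 2x2 determinant is nonzero."""
    return a[0] * b[1] - a[1] * b[0] != 0

print(lin_comb((3, -2), ((1, 2), (0, 3))))   # (3, 0)
print(spans_plane((1, 2), (0, 3)))           # True
print(spans_plane((2, 2), (-2, -2)))         # False
```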

**UCB**

Suppose $V$ is a vector space over $\mathbb{R}$ and $U \subseteq V$

Define 

$Span(U) = \{\sum_{k=1}^n a_ku_k : n \in \mathbb{N}, a_k \in \mathbb{R}, u_k \in U\}$ (the set of all linear combinations of vectors in $U$)

We always have $Span(U) \subseteq V$

Definition - We say "$U$ spans $V$" if $Span(U) = V$

Definition - We say $V$ is finite-dimensional if there exists a finite subset $U \subseteq V$ such that $U$ spans $V$

For $U$ to span $V$, there has to be some way to get every element of the vector space by taking linear combinations of the elements of $U$.

Examples:

$\mathbb{R}^3$ is finite dimensional because the set of vectors:\
$U$ =
$\Bigg\{
\left(\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}\right)
\Bigg\}
$ spans $V = \mathbb{R}^3$.

Every vector $v = \left(\begin{matrix} a_1 \\ a_2 \\ a_3 \end{matrix}\right) \in V$ can be expressed as: \
$a_1 \cdot \left(\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}\right) + a_2 \cdot \left(\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}\right) + a_3 \cdot \left(\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}\right)$ (Every vector in $V$ can be written as a linear combination of elements of $U$, so $U$ spans $V = \mathbb{R}^3$. Since the finite set $U$ spans $V$, $V$ is finite dimensional.)

Also: 

$\Bigg\{
\left(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 1 \\ 0 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 1 \\ 1 \\ 0 \end{matrix}\right),
\left(\begin{matrix} 1 \\ 1 \\ 1 \end{matrix}\right)
\Bigg\}$ also spans $V$

Example of a $V$ that is not finite dimensional:

$\mathbb{R}[x]$ is not finite dimensional ($\mathbb{R}[x]$ is polynomials in one variable with coefficients in $\mathbb{R}$)

It contains the infinite set $\{1, x, x^2, x^3, x^4, x^5, \ldots\} \subseteq \mathbb{R}[x]$. There's no way to get each of these elements as a linear combination of a finite subset of real polynomials. A linear combination of finitely many polynomials can never have degree larger than the maximum degree among them. So if the maximum degree is 1000, you can never get $x^{1001}$ as a linear combination of polynomials whose degree is $\leq 1000$.
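
The degree argument can be checked on a small case. In this sketch, a polynomial is stored as a coefficient list `[a_0, a_1, ...]`. That representation and the helper names are assumptions made for illustration.

```python
def degree(p):
    """Degree of a polynomial given as [a_0, a_1, ...]; -1 for the zero polynomial."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1

def poly_lin_comb(coeffs, polys):
    """Linear combination of coefficient lists, padding shorter lists with zeros."""
    length = max(len(p) for p in polys)
    out = [0] * length
    for c, p in zip(coeffs, polys):
        for k, a in enumerate(p):
            out[k] += c * a
    return out

polys = [[1, 0, 2], [0, 1], [3, 0, 0, 4]]   # 1 + 2x^2, x, 3 + 4x^3
combo = poly_lin_comb([5, -1, 2], polys)
print(combo)                                             # [11, -1, 10, 8]
print(degree(combo) <= max(degree(p) for p in polys))    # True
```

No choice of coefficients can push the degree of `combo` past 3 here, which is the same reason a finite set of polynomials cannot span $\mathbb{R}[x]$.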

### Linear Independence and Bases

**Khan Academy Example:**

The span of $\Bigg\{
\left[\begin{matrix} 2 \\ 3 \end{matrix}\right],
\left[\begin{matrix} 4 \\ 6 \end{matrix}\right]
\Bigg\}$ is all the points of a single line, since $[4,6]$ is a multiple of $[2,3]$.

We say they are collinear, so their span reduces to a single line. You can't represent everything in $\mathbb{R}^2$ with these two vectors. We call this a linearly dependent set. The same idea applies to vectors in $\mathbb{R}^n$ for any $n$: a set is linearly dependent when one of its vectors is a linear combination of the others.

**UCB:**

Suppose $V$ is a vector space over $\mathbb{R}$ and that $U \subseteq V$

Define: We say that $U$ is linearly dependent over $\mathbb{R}$ if there exist distinct $u_1, \ldots, u_n \in U$ and scalars $a_1, \ldots, a_n \in \mathbb{R}$, not all zero, with $\sum_{k=1}^n a_ku_k = 0$

This means there is some nontrivial linear combination of these vectors (one whose coefficients are not all zero) that sums to 0.

Define: We say $U$ is linearly independent over $\mathbb{R}$ if $U$ is not linearly dependent, equivalently:

* The usable definition for linearly independent is: Whenever we have $\sum_{k=1}^n a_ku_k = 0$ for distinct $u_1, \ldots, u_n \in U$, this implies $a_k = 0$ for $k = 1, \ldots, n$

**Khan Academy Example:**

Is the following set of vectors $U$ linearly dependent or linearly independent?

$U$ = $\Bigg\{
\left[\begin{matrix} 2 \\ 1 \end{matrix}\right],
\left[\begin{matrix} 3 \\ 2 \end{matrix}\right]
\Bigg\}$

For $U$ to be linearly dependent, there must be some $c_1, c_2$, not both zero, such that

$c_1 \cdot \left[\begin{matrix} 2 \\ 1 \end{matrix}\right] + c_2 \cdot \left[\begin{matrix} 3 \\ 2 \end{matrix}\right] = 0$

If the only way to satisfy the above equation is to set $c_1, c_2$ to zero, then they are linearly independent. 

$2c_1 + 3c_2 = 0$\
$c_1 + 2c_2 = 0$

$c_1 + \frac{3}{2}c_2 = 0$ (multiply top equation by $1/2$)\
$c_1 + 2c_2 = 0$

$-\frac{1}{2}c_2 = 0$ (subtract the bottom equation from the top)\
$c_2 = 0$

$c_1 + 2(0) = 0 $ (substitute $c_2$ into the original equation)\
$c_1 = 0$

$U$ is linearly independent because the only solution to this equation requires $c_1, c_2$ to be zero. This also means that Span($U$) $= \mathbb{R}^2$
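
The same elimination can be automated. The sketch below uses row reduction to compute a rank, which is a standard technique that these notes do not cover. `rank` and `is_linearly_independent` are names invented here.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """A finite list of vectors is independent exactly when its rank equals its length."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([(2, 1), (3, 2)]))          # True
print(is_linearly_independent([(2, 3), (4, 6)]))          # False
print(is_linearly_independent([(1, 1), (0, 1), (1, 0)]))  # False
```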

#### Bases
Define: $U$ is a basis for $V$ if:
1) $U$ spans $V$
2) $U$ is linearly independent over $\mathbb{R}$

#### What are we saying with all this?
* Linear Combinations are a way to combine $n$ vectors to produce another vector
* Spans define reachability criteria and address the question - Can we get all elements in the vector space?
* Linear independence says that there's only one way to get the 0 vector by taking a linear combination of the set: setting every coefficient to 0.
* If we have a set that both spans the space and is linearly independent, it's a basis.

Examples:

**Example of linearly dependent**

$\Bigg\{
\left(\begin{matrix} 1 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 1 \\ 0 \end{matrix}\right)
\Bigg\} = U \subseteq \mathbb{R}^2$ is linearly dependent. For this to be true, we need to be able to write the 0 vector as a combination of these three vectors using coefficients that are not all zero. One way to do that is:

$1 \cdot \left(\begin{matrix} 1 \\ 1 \end{matrix}\right) + (-1) \cdot \left(\begin{matrix} 0 \\ 1 \end{matrix}\right) + (-1) \cdot \left(\begin{matrix} 1 \\ 0 \end{matrix}\right)  = 0 $

**Example of linearly independent**

$\Bigg\{
\left(\begin{matrix} 1 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \end{matrix}\right)
\Bigg\} = U \subseteq \mathbb{R}^2$ is linearly independent. How can we show that?

Suppose we have a linear combination of these and it's equal to zero

Suppose $a_1 \cdot \left(\begin{matrix} 1 \\ 1 \end{matrix}\right) + a_2 \cdot \left(\begin{matrix} 0 \\ 1 \end{matrix}\right) = \left(\begin{matrix} a_1 \\ a_1 + a_2 \end{matrix}\right) = \left(\begin{matrix} 0 \\ 0 \end{matrix}\right)$

We can solve this small linear system and find that $a_1 = 0$ and $a_2 = 0$. This meets the definition of linearly independent because the definition states that any time we get 0 as a linear combination of vectors, then the only way for that to happen is for the coefficients to be zero. 

$\Bigg\{
\left(\begin{matrix} 1 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \end{matrix}\right)
\Bigg\} = U \subseteq \mathbb{R}^2$ is a basis for $\mathbb{R}^2$

To show that, let $\left(\begin{matrix} a_1 \\ a_2 \end{matrix}\right) \in \mathbb{R}^2$ and now we need to show we can get $\left(\begin{matrix} a_1 \\ a_2 \end{matrix}\right)$ as a linear combination of $\Bigg\{
\left(\begin{matrix} 1 \\ 1 \end{matrix}\right),
\left(\begin{matrix} 0 \\ 1 \end{matrix}\right)
\Bigg\}$ 

$a_1 \cdot \left(\begin{matrix} 1 \\ 1 \end{matrix}\right) + (a_2 - a_1) \cdot \left(\begin{matrix} 0 \\ 1 \end{matrix}\right) = \left(\begin{matrix} a_1 \\ a_2 \end{matrix}\right)$ - This shows that any vector in $\mathbb{R}^2$ can be written as a linear combination of the vectors $[1,1]$ and $[0,1]$. Therefore this set spans the whole vector space and, since it is also linearly independent, it is a basis.
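
The formula above gives the coordinates of any vector with respect to this basis. A minimal sketch, with function names chosen only for this note:

```python
def coords_in_basis(v):
    """Coordinates of v = (a1, a2) with respect to the basis {(1, 1), (0, 1)}."""
    a1, a2 = v
    return (a1, a2 - a1)

def from_coords(c):
    """Rebuild the vector c1*(1, 1) + c2*(0, 1)."""
    c1, c2 = c
    return (c1 * 1 + c2 * 0, c1 * 1 + c2 * 1)

v = (7, -3)
c = coords_in_basis(v)
print(c)               # (7, -10)
print(from_coords(c))  # (7, -3)
```

Going to coordinates and back returns the original vector, which is what spanning requires. Linear independence is what makes those coordinates unique.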



### The Dimension Theorem

A fundamental result that says that the cardinality of any two bases of a vector space is the same. 

Theorem - Any two bases of a vector space have the same cardinality.

Definition (based on the theorem) - The dimension of a vector space is the unique cardinality of any basis. (This says that bases for a given vector space must have the same number of elements - you can't have one basis that has 2 elements and another that has 3)

Example - The dimension of $\mathbb{R}^n$ is $n$ since {$e_1, e_2,... e_n$} is a basis which has $n$ elements. So, every basis of $\mathbb{R}^n$ has $n$ elements. 

**Proof Sketch:**

Lemma: If $T$ is linearly independent over $\mathbb{R}$ and $S$ spans $V$ and $T, S \subseteq V$, then $|T| \leq |S|$ (cardinality of $T$ $\leq$ cardinality of $S$)

Enumerate $T = t_1, t_2, t_3, ...$ (not assuming this is finite)

Since $S$ spans $V$, we can write:

$t_1 = \sum_{k=1}^m a_ks_k$ with each $s_k \in S$ and each $a_k \neq 0$ in $\mathbb{R}$

Solve for $s_1$

$s_1 = \frac{t_1 - \sum_{k=2}^m a_ks_k}{a_1}$

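Bucket analogy (building an injective function from $T$ into $S$): think of $S$ as a bucket of spanning vectors. Solving for $s_1$ shows that $t_1$ can take its place: the set $S' = (S \setminus \{s_1\}) \cup \{t_1\}$ still spans $V$. Now repeat with $t_2$ and write $t_2 = \sum_{k=1}^n a_ku_k$ with $u_k \in S'$. At least one $u_k$ with $a_k \neq 0$ must be in $S$ but not in $T$, because otherwise $t_2$ would be a multiple of $t_1$, contradicting the linear independence of $T$. That element is the $s_2$ we pop out of the bucket and replace with $t_2$. Continuing this way, each $t_i$ pops out a different $s_i \in S$, so $t_i \mapsto s_i$ is injective and $|T| \leq |S|$.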

Now, we need to prove the dimension theorem from the above Lemma. Suppose $B_1, B_2$ are both bases. The Lemma says that the Linearly Independent set can never be bigger than the Spanning set. 

We can think of this in 2 ways:
1) $B_1$ is linearly independent and $B_2$ spans $V$ which implies $|B_1| \leq |B_2|$
2) $B_2$ is linearly independent and $B_1$ spans $V$ which implies $|B_2| \leq |B_1|$

Together, these give $|B_1| = |B_2|$.

**The Main Idea** - For a given vector space, the number of elements in a basis is unique and is an invariant of that vector space. 

#### Representations of Linear Maps

We will take a linear map of a vector space and represent it as a matrix. Representations of linear maps of vector spaces depend on a basis. We also had to derive the dimension theorem so we would know that the size of our matrix is fixed. 

Suppose $U, V$ are vector spaces over $\mathbb{R}$

Definition: A linear map (aka. function) from $U$ to $V$ is a function $T: U \rightarrow V$ satisfying the linear condition for all $u_1, u_2 \in U$ and $a \in \mathbb{R}$:
* $T(u_1 + u_2) = T(u_1) + T(u_2)$
* $T(au_1)  = aT(u_1)$

The set of all linear maps from $U$ into $V$ is denoted by $L(U,V)$

In Unit 1, we saw that if the two vector spaces are $\mathbb{R}^n$ and $\mathbb{R}^m$ then every linear map in $L(\mathbb{R}^n,\mathbb{R}^m)$ can be represented uniquely as a matrix $A \in \mathbb{R}^{m \times n}$

* For $\mathbb{R}^n$, we have a natural notion of a basis ($e_1, e_2, e_3, ... e_n$) which have nice geometric properties but we can't quite use those properties yet. For now, we stick with addition and scalar multiplication.

Theorem:

If $u_1, u_2, \ldots, u_n$ is a basis for $U$ and $w_1, w_2, \ldots, w_m$ is a basis for $W$ and $T \in L(U, W)$ ($T$ is a linear map from $U$ into $W$) then $T$ has a matrix representation in the bases $u_1, \ldots, u_n$ and $w_1, \ldots, w_m$, denoted $M(T, (u_1, u_2, \ldots, u_n), (w_1, w_2, \ldots, w_m)) \in \mathbb{R}^{m \times n}$

* This is analogous to the situation from Unit 1 where the vector spaces were $\mathbb{R}^n$ and $\mathbb{R}^m$ and we got an $\mathbb{R}^{m \times n}$ matrix

The coefficients are $[a_{ij}]_{i=1..m,\, j=1..n}$ where $T(u_j) = \sum_{i=1}^m a_{ij}w_i$
* This is slightly different from the definition for real matrices we covered in Unit 1: $a_{ij} = \langle T(u_j), w_i \rangle$
* The Unit 1 definition is not defined yet because we haven't defined inner product on an arbitrary vector space

**The Main Idea** - If you have any linear map (function) between two vector spaces and the vector spaces are finite dimensional then, with respect to any fixed bases of those vector spaces, there is a matrix representation of that linear map. 
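
To make the construction concrete, here is a hedged sketch: column $j$ of the matrix holds the coordinates of $T(u_j)$ in the basis of $W$. The function `matrix_of`, the example map `T`, and the coordinate function passed in are all hypothetical and exist only for this illustration. The caller has to supply a way to compute coordinates in the chosen basis of $W$.

```python
def matrix_of(T, basis_U, coords_in_W):
    """Matrix whose j-th column holds the W-coordinates of T(u_j)."""
    columns = [coords_in_W(T(u)) for u in basis_U]
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[col[i] for col in columns] for i in range(m)]

# Hypothetical example map from R^2 to R^3, with standard bases on both sides.
def T(u):
    x, y = u
    return (x + y, 2 * y, x)

standard_basis_R2 = [(1, 0), (0, 1)]

def standard_coords(w):
    """In the standard basis, the coordinates of w are just its entries."""
    return w

M = matrix_of(T, standard_basis_R2, standard_coords)
print(M)   # [[1, 1], [0, 2], [1, 0]]
```

The result has 3 rows and 2 columns, matching $\dim W \times \dim U$.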

#### Differentiation as a Linear Map

We will construe the calculus differentiation operator as a linear map and represent it as a matrix to show an application of the representation theorem for linear maps

Let $V = \mathbb{R}[x]_{\leq 3}$ = the set of all polynomials in one variable with coefficients in $\mathbb{R}$ and degree $\leq 3$ (no exponents larger than 3). It can be represented by the set:
* $\{a_0 + a_1x + a_2x^2 + a_3x^3 : a_0, a_1, a_2, a_3 \in \mathbb{R}\}$
* This is a vector space because:
  * It contains the zero vector
  * The result of addition lies within the space (closure under addition)
  * The result of scalar multiplication lies within the space (closure under scalar multiplication)
  
What is an example of a basis for this vector space?

Claim: {$1,x,x^2,x^3$} is a basis for $V$. Why?

Need to show 2 things:
1) That the set spans $V$ (i.e. every polynomial can be written as a linear combination of the elements). This follows directly from the definition: any polynomial of degree $\leq 3$ is $a_0 \cdot 1 + a_1 \cdot x + a_2 \cdot x^2 + a_3 \cdot x^3$.
2) That the elements are linearly independent. For any linear combination of these elements to equal 0, the only way for that to happen is by setting all coefficients equal to 0. That's also true in this case because if any of the coefficients are non-zero, you won't get the zero polynomial. 

Let $W = \mathbb{R}[x]_{\leq 2}$

Define $T:V \rightarrow W$ by $T(P(x)) = \frac{d}{dx}P(x)$
* $T$ sends $P(x)$ to its derivative

I claim $T \in L(V, W)$ ($T$ is a linear function from $V \rightarrow W$) so $T$ must preserve addition and scalar multiplication. We need to check to see if that's true

* Addition
  * $T(P(x) + Q(x)) = \frac{d}{dx}(P(x) + Q(x)) = \frac{d}{dx}P(x) + \frac{d}{dx}Q(x) = T(P(x)) + T(Q(x))$ (since the derivative operator respects sums)
* Scalar Multiplication
  * $T(aP(x)) = \frac{d}{dx}aP(x) = a \frac{d}{dx}P(x) = aT(P(x))$ (since the derivative respects multiplication by a constant)

So, $T$ is a linear map. 
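
A quick numerical spot check of the two conditions, with polynomials stored as coefficient lists `[a_0, a_1, a_2, a_3]`. That representation and the helper names are assumptions of this sketch, and one example does not replace the argument above.

```python
def deriv(p):
    """Derivative of a_0 + a_1 x + a_2 x^2 + a_3 x^3, given as [a_0, a_1, a_2, a_3]."""
    return [k * p[k] for k in range(1, len(p))]

def poly_add(p, q):
    """Add two polynomials with coefficient lists of equal length."""
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient of p by the scalar c."""
    return [c * a for a in p]

P = [7, 5, -2, 1]    # 7 + 5x - 2x^2 + x^3
Q = [0, 1, 4, -3]    # x + 4x^2 - 3x^3

print(deriv(poly_add(P, Q)) == poly_add(deriv(P), deriv(Q)))   # True
print(deriv(poly_scale(6, P)) == poly_scale(6, deriv(P)))      # True
```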

What is the matrix of $T$ in the bases {$1, x, x^2, x^3$} = {$v_1, v_2, v_3, v_4$} of $V$ and {$1, x, x^2$} = {$w_1, w_2, w_3$} of $W$?

Use the definition of matrix construction: $T(v_j) = \sum_{i=1}^m a_{ij}w_i$ (so we will write out what $T$ does to the basis for $V$ in terms of the basis for $W$, and from that set of equations we can read off the coefficients of the matrix)

$T(v_1) = T(1) = \frac{d}{dx}1 = 0$ (since the derivative of 1 is 0)
* There's only one way to write 0 as a linear combination $\sum_i a_{i1}w_i$: set every $a_{i1}$ to 0. So the first column is all zeros.

$T(v_2) = T(x) = \frac{d}{dx}x = 1 = w_1$ (so $a_{12} = 1$ and the rest of column 2 is 0)

$T(v_3) = T(x^2) = \frac{d}{dx}x^2 = 2x = 2w_2$ (so $a_{23} = 2$ and the rest of column 3 is 0)

$T(v_4) = T(x^3) = \frac{d}{dx}x^3 = 3x^2 = 3w_3$ (so $a_{34} = 3$ and the rest of column 4 is 0)

Since $T: V \rightarrow W$ with $\dim V = 4$ and $\dim W = 3$, the matrix of $T$ will be $3 \times 4$. $T(v_1)$ gives the first column of the matrix, $T(v_2)$ gives the second column, and so on.

$M(T) = \left[\begin{matrix} 0&1&0&0 \\ 0&0&2&0 \\ 0&0&0&3 \\ \end{matrix}\right]$

This is the matrix of the differentiation operator. Let's check this actually works.

$P(x) = 7 + 5x - 2x^2 + x^3$

Instead of taking the derivative using calculus, we can use linear algebra and the matrix we just created. We represent the polynomial above as a vector of coefficients. 

$\left[\begin{matrix} 0&1&0&0 \\ 0&0&2&0 \\ 0&0&0&3 \\ \end{matrix}\right] \cdot \left(\begin{matrix} 7 \\ 5 \\ -2 \\ 1 \end{matrix}\right) = \left(\begin{matrix} 5 \\ -4 \\ 3 \end{matrix}\right)$

$\frac{d}{dx}P(x) = 5 - 4x + 3x^2$
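
The same check can be run in code. This is a sketch using plain nested lists. `derivative_matrix` and `mat_vec` are names invented here, and the monomial bases $\{1, x, \ldots, x^n\}$ and $\{1, x, \ldots, x^{n-1}\}$ are assumed.

```python
def derivative_matrix(n):
    """Matrix of d/dx from polynomials of degree <= n to degree <= n-1, monomial bases."""
    # Row i, column j is j when j == i + 1, because d/dx x^j = j * x^(j-1).
    return [[(j if j == i + 1 else 0) for j in range(n + 1)] for i in range(n)]

def mat_vec(M, v):
    """Ordinary matrix-vector product with nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M_T = derivative_matrix(3)
print(M_T)                          # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
print(mat_vec(M_T, [7, 5, -2, 1]))  # [5, -4, 3]
```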

Another Example

Define $S: W \rightarrow V$ by $S(q(x)) = \int q(x)dx$

In the matrix below, $*$ marks an indeterminate entry (it comes from the constant of integration).

$M(S) = \left[\begin{matrix} *&*&* \\ 1&0&0 \\ 0&\frac{1}{2}&0 \\ 0&0&\frac{1}{3} \end{matrix}\right]$

Note: $M(T \circ S) = M(T)M(S) = I_W = \left[\begin{matrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{matrix}\right]$

The composition of $T$ and $S$ is the identity on $W$. This is the Fundamental Theorem of Calculus (if you first integrate, then differentiate, you get back the original function).

$M(S \circ T) = M(S)M(T) = \left[\begin{matrix} 0&*&*&* \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{matrix}\right]$

If you start with a function, take its derivative and then take its anti-derivative, you don't get back the original function because of the indeterminate entries. You lose the constant term.
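
Both products can be verified numerically once the indeterminate entries are pinned down. The sketch below assumes every $*$ is set to 0 (constant of integration 0). That is one possible choice and not part of the notes. `mat_mul` is a helper written for this example.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M_T = [[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3]]

# Assumption: every indeterminate entry * is set to 0.
M_S = [[0, 0, 0],
       [1, 0, 0],
       [0, Fraction(1, 2), 0],
       [0, 0, Fraction(1, 3)]]

TS = mat_mul(M_T, M_S)   # integrate, then differentiate
ST = mat_mul(M_S, M_T)   # differentiate, then integrate

print(TS == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])                          # True
print(ST == [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])   # True
```

With this choice the first row of `ST` is all zeros, so applying it to the coefficients of a polynomial wipes out the constant term and leaves the rest unchanged.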

**The Main Idea** - This example shows how you can use linear algebra to do basic calculus operations and it shows how to represent a linear map (function) as a matrix with respect to a basis.