## Unit 1 - Linear Functions

### What is a Function?

Definition of a function - A function $f$ is a rule that assigns to each input $x$ in its domain an output $f(x)$ in its codomain

**Terms:**

* Domain - Set of all inputs
* Codomain - Superset of all outputs
* Range - Set of all outputs

**Notation:**
    
* $ f: \mathbb{R}^n \rightarrow \mathbb{R}^m $
 * $\mathbb{R}^n = $ Domain of $f$ 
 * $\mathbb{R}^m = $ Codomain of $f$
 
 $\mathbb{R}$: real numbers \
 $\mathbb{R}^n$: n-dimensional vector space. For example $[3,5,8]$ is a 3-element vector. $[3,5,8] \in \mathbb{R}^3$

**Formally:** 

A function is a triple <$f$, D, C> where:

D = domain\
C = codomain\
$ f \subseteq D \times C $ satisfying the function property $ (\forall x \in D) (\exists! y \in C)$ such that $(<x,y> \in f)$

Meaning - $f$ is a subset of $D \times C$ satisfying the function property: for every $x$ in $D$, there exists a unique $y$ in $C$ such that the pair $(x,y)$ is in $f$.
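
The function property can be checked mechanically when the domain and codomain are small finite sets. The sketch below is only an illustration under that assumption; the helper name `is_function` and the sample sets `D`, `C`, `square`, and `not_a_function` are made up for this example.

```python
from itertools import product

def is_function(pairs, domain, codomain):
    """Return True if `pairs` is a function from `domain` to `codomain`."""
    pairs = set(pairs)
    # f must be a subset of D x C
    if not pairs <= set(product(domain, codomain)):
        return False
    # every x in D must be paired with exactly one y
    return all(sum(1 for (a, _) in pairs if a == x) == 1 for x in domain)

D = {-1, 0, 1}
C = {0, 1, 2}

square = {(x, x**2) for x in D}                      # each x has one output
not_a_function = {(0, 0), (0, 1), (1, 1), (-1, 1)}   # 0 has two outputs

print(is_function(square, D, C))
print(is_function(not_a_function, D, C))
```

The first relation passes because every input appears exactly once; the second fails because the input 0 is paired with two different outputs.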

**Representing a function:**

Formulas:\
 $f(x) = x^2$\
 $y = x^2$\
 $x \rightarrow x^2$\
 $x^2$
 
Words:\
Multiply the input by itself and output the result

Sets:\
$ \{ (x,y): y = x^2 \} $

Graphs:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# create 100 equally spaced points between -10 and 10
x = np.linspace(-10, 10, 100)

# calculate the y value for each element of the x vector
y = x**2

fig, ax = plt.subplots()

# graph spines
ax.spines['left'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_position('zero')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ax.plot(x, y); # the semicolon suppresses the text output from matplotlib

### What is a Linear Function?

In single variable calculus, a "linear" function has the form:\
$y=mx +b$\
Where m = slope and b = y-intercept

In [None]:
m = 1
b = 0
y = m*x + b

fig, ax = plt.subplots()

# graph spines
ax.spines['left'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_position('zero')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ax.plot(x, y)
plt.plot(0, 0, marker='o', markersize=10, color="red")
ax.text(.5, .5, 'b', size=15)

plt.plot(3, 0, marker='o', markersize=10, color="red")
ax.text(3.5,0.5, 'x1', size=15)

plt.plot(0, 3, marker='o', markersize=10, color="red")
ax.text(0.5,3.5, 'y1', size=15)

plt.plot(3, 3, marker='o', markersize=10, color="red")
ax.text(3.5,3.5, '(x1, y1)', size=15);

plt.plot(7, 0, marker='o', markersize=10, color="red")
ax.text(7.5,0.5, 'x2', size=15)

plt.plot(0, 7, marker='o', markersize=10, color="red")
ax.text(0.5,7.5, 'y2', size=15)

plt.plot(7, 7, marker='o', markersize=10, color="red")
ax.text(7.5,7.5, '(x2, y2)', size=15);

# plot the rise and run
plt.plot([3,3], [3,7])
plt.plot([3,7], [7,7])

ax.text(10,10, 'y = mx + b', size=15);

m = rise / run\
$m = (y_2 - y_1) / (x_2 - x_1)$

Another property of "linear" functions: line segments map proportionally to line segments. A point 2/3 of the way along the segment from $x_1$ to $x_2$ maps to a point 2/3 of the way along the segment from $y_1$ to $y_2$.

In [None]:
m = 1
b = 0
y = m*x + b

fig, (ax, ax2) = plt.subplots(2,1)

# graph spines
ax.get_yaxis().set_visible(False)
ax.spines['left'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['bottom'].set_position('zero')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('none')

ax.set_xlim([-10, 10])

ax.plot(3, 0, marker='o', markersize=10, color="red")
ax.text(3.5,0, 'x1', size=15)

ax.plot(7, 0, marker='o', markersize=10, color="red")
ax.text(7.5,0, 'x2', size=15)

ax.plot([3,7], [0,0], color="red", linewidth=6)

ax.text(12,0, 'x', size=15);

# --------------

# graph spines
ax2.get_yaxis().set_visible(False)
ax2.spines['left'].set_color('none')
ax2.spines['right'].set_color('none')
ax2.spines['bottom'].set_position('zero')
ax2.spines['top'].set_color('none')
ax2.xaxis.set_ticks_position('bottom')
ax2.yaxis.set_ticks_position('none')

ax2.set_xlim([-10, 10])

ax2.plot(7, 0, marker='o', markersize=10, color="red")
ax2.text(7.5,0, 'y2', size=15)

ax2.plot(3, 0, marker='o', markersize=10, color="red")
ax2.text(3.5,0, 'y1', size=15)

ax2.plot([3,7], [0,0], color="red", linewidth=6)

ax2.text(12,0, 'y', size=15);

Generalize this property to higher dimensions

**Definition of Affine**

A function $ f: \mathbb{R}^n \rightarrow \mathbb{R}^m $ is called affine if for every $u,v \in \mathbb{R}^n$ and for every $ t \in [0,1]$\
$f((1-t)u + tv) = (1-t)f(u) + tf(v)$

![Affine](static/unit1_affine.png)

$ f(\frac{1}{3}u + \frac{2}{3}v) = \frac{1}{3}f(u) + \frac{2}{3}f(v)$

**Definition of Linear**

A function $ f: \mathbb{R}^n \rightarrow \mathbb{R}^m $ is called linear if $f(u + v) = f(u) + f(v)$ and $f(au) = af(u)$ for every $u,v \in \mathbb{R}^n$ and for every $a \in \mathbb{R}$

This is referred to later as **preservation of addition and scalar multiplication** (since a is scalar)

Exercise: Show that $f$ is linear iff $f$ is affine and $f(0) = 0$

**Is $ f(x) = x^2$ affine? No**

$u = 2$\
$v = 4$\
$t = .75$

$f(.25*2 + .75*4) = .25*f(2) + .75*f(4)$\
$f(3.5) = .25*4 + .75*16$\
$12.25 \neq 13$

**Is it linear? No**\
$f(2 + 4) = f(2) + f(4)$\
$36 \neq 20$

**Is $ f(x) = 1/2x + 2$ affine? Yes**

$u = 2$\
$v = 4$\
$t = .75$

$f(.25*2 + .75*4) = .25*f(2) + .75*f(4)$\
$f(3.5) = .25*3 + .75*4 $\
$3.75 = 3.75$

**Is it linear? No (it is affine, but it is not linear by the definition above)** \
$f(2+4) = f(2) + f(4)$\
$5 \neq 3 + 4$

$f(0) = 2$, so $f$ does not map zero to zero
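
The hand checks above can be repeated numerically. This is a minimal sketch, not a proof: the helper names `looks_affine` and `looks_linear` are invented here, and they only test the defining equations at a few sample points. A failed check does disprove the property, but a passed check only suggests it.

```python
import numpy as np

def looks_affine(f, u, v, ts=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Spot-check f((1-t)u + tv) == (1-t)f(u) + tf(v) at a few values of t."""
    return all(
        bool(np.isclose(f((1 - t) * u + t * v), (1 - t) * f(u) + t * f(v)))
        for t in ts
    )

def looks_linear(f, u, v, a=3.0):
    """Spot-check additivity and scalar multiplication at one sample each."""
    additive = np.isclose(f(u + v), f(u) + f(v))
    homogeneous = np.isclose(f(a * u), a * f(u))
    return bool(additive and homogeneous)

square = lambda x: x**2        # f(x) = x^2
line = lambda x: 0.5 * x + 2   # f(x) = 1/2 x + 2

print(looks_affine(square, 2.0, 4.0))  # False
print(looks_linear(square, 2.0, 4.0))  # False
print(looks_affine(line, 2.0, 4.0))    # True
print(looks_linear(line, 2.0, 4.0))    # False
```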

### Affine Functions as Linear Functions

* Linear algebra focuses on linear functions
* We can use linear algebra to study affine functions
* An affine function is like a linear function except that it need not satisfy $f(0) = 0$ (an affine function is a linear function plus a constant: $f(0)$ is some constant instead of 0)

1) If $ g: \mathbb{R}^n \rightarrow \mathbb{R}^m $ is a fixed affine function, then $g(x) - g(0)$ is a linear function. We can apply linear algebra theorems to $g(x)-g(0)$ then add back $g(0)$.
2) For the class of affine functions from $\mathbb{R}^n \rightarrow \mathbb{R}^m$: This class is isomorphic to the class of linear functions from $\mathbb{R}^{n+1} \rightarrow \mathbb{R}^m$

**The translation**\
$g:\mathbb{R}^n \rightarrow \mathbb{R}^m$ is affine\
implies: $g(x) = l(x) + c$, where $l$ is linear and $c = g(0)$ (Note: $x \in \mathbb{R}^n)$\
therefore: $f(x,y) = l(x) + cy$ is linear and $g(x) = f(x,1)$ (Note: $x \in \mathbb{R}^n, y \in \mathbb{R})$

Using the earlier example of an affine function\
$g(x) = 1/2x + 2$ is affine\
$l(x) = 1/2x$\
$c = 2$\
therefore:\
$f(x,y) = 1/2x + 2y$\
$f(0,1) = g(0) = 2$ (the y-intercept of $g$)
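
A short sketch of this translation for the example $g(x) = 1/2x + 2$. It assumes the affine function is recovered by fixing the extra input at 1, that is $g(x) = f(x, 1)$; the sample inputs in `xs` are arbitrary.

```python
def g(x):
    # the affine function from the example
    return 0.5 * x + 2

def f(x, y):
    # the linear function on one extra dimension: l(x) + c*y
    return 0.5 * x + 2 * y

xs = [-4, 0, 1, 3.5]

# g is recovered from f by setting the extra input to 1
agree = all(g(x) == f(x, 1) for x in xs)

# f preserves addition, here checked on one pair of inputs
additive = f(1 + 3, 1 + 0) == f(1, 1) + f(3, 0)

print(agree, additive)
```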


### Operations on Real Matrices

Definition: A real matrix is a rectangular array of real numbers 

m rows\
n columns

\begin{bmatrix}
 a_{11} & a_{12} & \cdots & a_{1n}\\
 a_{21} & a_{22} & \cdots & a_{2n}\\
 \vdots & \vdots & \ddots & \vdots\\
 a_{m1} & a_{m2} & \cdots & a_{mn}\\
\end{bmatrix}

If $A \in \mathbb{R}^{m*n}$, we sometimes write $A = [a_{ij}]_{i=1...m, j=1...n}$\
Each $a_{ij}$ is the entry of $A$ in the $i$th row and $j$th column

### Primary Operations on Matrices

#### Matrix Addition

Addition is done element-wise 

$
\left[\begin{matrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{matrix}\right]
+ 
\left[\begin{matrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33}  \end{matrix}\right]
= 
\left[\begin{matrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13}\\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23}\\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33}  \end{matrix}\right]
$

**Caveat:** You can't add matrices of different dimensions

Example\
$
\left[\begin{matrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{matrix}\right]
+ 
\left[\begin{matrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{matrix}\right]
= 
\left[\begin{matrix} 1 & 3 & 3\\ 5 & 5 & 7\\ 7 & 9 & 9 \end{matrix}\right]
$


In [None]:
# Matrix addition in Python
import numpy as np

a = [[1,2,3],[4,5,6],[7,8,9]]
b = [[0,1,0],[1,0,1],[0,1,0]]

c = np.add(a,b)
c


#### Matrix Scalar Multiplication

Also performed element-wise similar to addition

$2 * \left[\begin{matrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0 \end{matrix}\right] 
= \left[\begin{matrix} 0 & 2 & 0\\ 2 & 0 & 2\\ 0 & 2 & 0 \end{matrix}\right] $

In [None]:
# Scalar multiplication in Python
import numpy as np

a = 2
b = [[0,1,0],[1,0,1],[0,1,0]]

c = np.multiply(a,b)
c

### Summation and Applications

Definition:

$\sum_{k=1}^n a_k = a_1 + a_2 + ... + a_n$

More formally: Given a sequence <$a_k$> and integers $m, n$, we define $\sum_{k=m}^n a_k$ by induction on $n$.

Base Case: if $m > n$, then $\sum_{k=m}^n a_k = 0$\
Inductive Case: Assume $n \geq m - 1$ and that $\sum_{k=m}^n a_k$ is defined\
Define: $\sum_{k=m}^{n+1} a_k$ = $\sum_{k=m}^n a_k + a_{n+1}$
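
The inductive definition translates directly into a recursive function. This is a sketch for illustration only; the name `summation` and the choice to pass the sequence as a function `a(k)` are assumptions made here.

```python
def summation(a, m, n):
    """Sum a(k) for k = m..n, following the inductive definition."""
    if m > n:
        # base case: the empty sum is 0
        return 0
    # inductive case: (sum up to n-1) plus the nth term
    return summation(a, m, n - 1) + a(n)

print(summation(lambda k: k, 1, 4))      # 1 + 2 + 3 + 4 = 10
print(summation(lambda k: k * k, 2, 3))  # 4 + 9 = 13
print(summation(lambda k: k, 5, 4))      # empty sum = 0
```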

**Applications:**

**Inner product** on $\mathbb{R}^n$ (multiply element-wise and sum the results)\
<$u,v$> = $\sum_{k=1}^n u_k * v_k$

Example of inner product:\
$u = [1,2,3]$\
$v = [0,7,0]$\
$<u,v> = (1*0) + (2*7) + (3*0) = 14$

The inner product results in a scalar value.

**Properties of Linear Functions (linear functions preserve additivity)** \
If $u_1, u_2, ... u_p \in \mathbb{R}^n$ and $f$ is a linear function with domain $\mathbb{R}^n$ then:\
$f(\sum_{k=1}^p u_k) = \sum_{k=1}^p f(u_k)$

**Vector Components - Important** \
for $i=1...n$ define $e_i$ as the vector in $\mathbb{R}^n$ with 1 in the ith position and 0 elsewhere.\
$e_1 = [1,0,0,...,0]$\
$e_2 = [0,1,0,...,0]$\
$e_n = [0,0,0,...,1]$

With this definition, we can do 2 things:
1) For any $x \in \mathbb{R}^n$, we can write $\vec{x} = \sum_{k=1}^n x_k*\vec{e_k}$
   * $\vec{e_k}$ is a vector
   * $x_k$ is a scalar, the $k$th component of $\vec{x}$ (obtained by the inner product in item 2)
   * Takeaway: A vector, $\vec{x}$, can be rewritten as the sum of each of its components ($x_k$) multiplied by the corresponding basis vector ($\vec{e_k}$)
2) $x_k = <x, e_k>$
   * This allows you to get each individual vector element
   * Takeaway: If you have a vector, $\vec{x}$, and want to get the $k$th element, you can express that through the inner product with $e_k$


A very simple example:\
$ x = [1,2,3] $\
$ n = 3 $\
$ e_1 = [1,0,0]$\
$ e_2 = [0,1,0]$\
$ e_3 = [0,0,1]$

$ x_1 = <x, e_1> = (1*1) + (2*0) + (3*0) = 1$\
$ x_2 = <x, e_2> = (1*0) + (2*1) + (3*0) = 2$\
$ x_3 = <x, e_3> = (1*0) + (2*0) + (3*1) = 3$

$ \vec{x} = (1 * [1,0,0]) + (2 * [0,1,0]) + (3 * [0,0,1])$\
$ \vec{x} = [1,0,0] + [0,2,0] + [0,0,3]$\
$ \vec{x} = [1,2,3]$
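
The same example can be sketched with NumPy. Here the rows of an identity matrix stand in for the basis vectors $e_1, e_2, e_3$; the variable names `E`, `components`, and `rebuilt` are chosen for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
E = np.eye(3, dtype=int)  # row k of E is the basis vector e_(k+1)

# x_k = <x, e_k>: each component is an inner product with a basis vector
components = [int(np.inner(x, E[k])) for k in range(3)]

# x = sum_k x_k * e_k: rebuild the vector from its components
rebuilt = sum(components[k] * E[k] for k in range(3))

print(components)
print(rebuilt)
```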

In [None]:
# Inner product of two vectors in Python
import numpy as np

a = [1,2,3]
b = [0,7,0]

c = np.inner(a,b)
c


### Vectors as Single-Valued Linear Functions

Theorem: Every linear function $f: \mathbb{R}^n \rightarrow \mathbb{R} $ can be **uniquely** represented as: \
$f(u) = <a,u>$ (inner product) for some $a \in \mathbb{R}^n$

Earlier we saw that a linear function is any function that preserves addition and scalar multiplication. 

This theorem says that every linear function can be succinctly represented by one unique, specific vector in $\mathbb{R}^n $ (i.e. there's a one-to-one correspondence between vectors and linear functions)

Proof Sketch:
* Since $f$ is linear, $f(u+v) = f(u) + f(v)$ and $f(au) = af(u)$ for all $u,v \in \mathbb{R}^n$ and for all $a \in \mathbb{R}$
* Define $a_i = f(e_i)$ for i=1...n
* Claim that the vector $\vec{a} = [a_1, a_2...a_n]$ represents $f$
* Need to show that $f(u) = <a,u>$ for every $u \in \mathbb{R}^n$
* Let $u \in \mathbb{R}^n$
* Express $u$ as $ u = \sum_{k=1}^n u_k*\vec{e_k}$ (by Vector Components)
* $ f(u) = f(\sum_{k=1}^n u_k*\vec{e_k})$ (rewriting $u$ according to vector components)
* $ f(u) = \sum_{k=1}^n f(u_k*\vec{e_k})$ (by preservation of additivity of linear functions)
* $ f(u) = \sum_{k=1}^n u_kf(\vec{e_k})$ (by preservation of scalar multiplication)
* $ f(u) = \sum_{k=1}^n u_k * a_k$ (by our first definition)
* $ f(u) = <a,u> $ (the previous result is the exact definition of inner product) 
* To prove uniqueness, suppose $ f(u) = <b, u>$ for some $b \in \mathbb{R}^n$ and for every $u \in \mathbb{R}^n$
* $f(e_k) = <b, e_k> = b_k$ (check what happens when you apply $f$ to $e_k$)
* But, we defined $a_k = f(e_k)$ so $a_k = b_k$ for $k=1...n$, hence $b = a$
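
The construction in the proof sketch can be tried on a concrete case. The function `f` below is a hypothetical linear function from $\mathbb{R}^3$ to $\mathbb{R}$ chosen for this sketch; the representing vector is recovered by applying `f` to each basis vector.

```python
import numpy as np

def f(u):
    # hypothetical linear function R^3 -> R, used only for illustration
    return 2 * u[0] - u[1] + 4 * u[2]

n = 3
E = np.eye(n)  # row i of E is a basis vector

# a_i = f(e_i), as in the proof sketch
a = np.array([f(E[i]) for i in range(n)])

u = np.array([1.0, 5.0, -2.0])
print(a)                     # the representing vector
print(f(u), np.inner(a, u))  # both give the same value
```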

### Operations on Real Vectors

Definition: A real vector is an ordered list of real numbers. For example, $v = [2, 3, 3.14, 2.5] \in \mathbb{R}^4$ 

Primary Operations on Real Vectors (by example):

**Vector Addition**: \
$
\left[\begin{matrix} 1 \\ 2 \\ 3 \end{matrix}\right]
+ 
\left[\begin{matrix} 4 \\ 5 \\ 6 \end{matrix}\right]
= 
\left[\begin{matrix} 5 \\ 7 \\ 9 \end{matrix}\right]
$

Addition is done element-wise.

**Scalar Multiplication**: \
$
2.5 *
\left[\begin{matrix} 1 \\ 2 \\ 3 \end{matrix}\right]
= 
\left[\begin{matrix} 2.5 \\ 5 \\ 7.5 \end{matrix}\right]
$

**Inner Product**: \
$
\langle
\left[\begin{matrix} 1 \\ 2 \\ 3 \end{matrix}\right]
, 
\left[\begin{matrix} 4 \\ 5 \\ 6 \end{matrix}\right]
\rangle
= 
(1*4) + (2*5) + (3*6) = 32
$

*We don't typically define a vector multiplication operation: we don't multiply vectors component-wise. Why not?* \
**We think of vectors as representations of linear functions.** Adding two linear functions gives you another linear function (vector addition). Multiplying a linear function by a scalar gives another linear function (scalar multiplication). Applying the linear function represented by one vector to another vector gives a value (inner product). **Multiplying two linear functions gives a quadratic function, not a linear function.** 

In [None]:
import numpy as np

# Vector addition
a = [1,2,3]
b = [4,5,6]

c = np.add(a,b)
c

In [None]:
import numpy as np

# Scalar multiplication
a = 2.5
b = [1,2,3]

c = np.multiply(a,b)
c

In [None]:
import numpy as np

# Inner Product
a = [1,2,3]
b = [4,5,6]

c = np.inner(a,b)
c

### The Matrix-Vector Product

If $ A \in \mathbb{R}^{m*n}$ and $v \in \mathbb{R}^n$:\
Define matrix-vector product $Av$ to be the vector in $\mathbb{R}^{m}$:\
&nbsp;&nbsp;&nbsp;&nbsp; $\sum_{i=1}^m (\sum_{j=1}^n a_{ij}v_j)\vec{e_i}$ (this is the definition for proofs)\
where $A = [a_{ij}]_{i=1..m, j=1..n}$

The intuition:

$\sum_{i=1}^m<A_i,v>e_i$

$m$ is the number of rows, $n$ is the number of columns\
$A = \left[\begin{matrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix}\right]$

Define $A_i$ = $i$th row of $A$ for $i=1...m$\
$A_i = [a_{i1}, a_{i2} ... a_{in}]$ ( $A_i$ is a row vector)

$Av$ = $
\left(
\begin{matrix} <A_1, v> \\ <A_2, v> \\ \vdots \\ <A_m, v> \end{matrix}
\right)
$

Example - (2x3) matrix times a (3x1) vector:

$\left[\begin{matrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{matrix}\right]$
$\left(\begin{matrix} 1 \\ 1 \\ 1 \end{matrix}\right)$
= 
$\left(\begin{matrix} <[1, 2, 3], [1, 1, 1]> \\ <[4, 5, 6], [1, 1, 1]> \end{matrix}\right)$
=
$\left(\begin{matrix} 1 + 2 + 3 \\ 4 + 5 + 6 \end{matrix}\right)$
= 
$\left(\begin{matrix} 6 \\ 15 \end{matrix}\right)$
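
The summation definition can also be written out with explicit loops and compared against NumPy. This is a sketch meant to mirror the formula, not an efficient implementation; the helper name `matvec` is made up for this example.

```python
import numpy as np

def matvec(A, v):
    """Matrix-vector product written to mirror sum_i (sum_j a_ij v_j) e_i."""
    A = np.asarray(A)
    v = np.asarray(v)
    m, n = A.shape
    result = np.zeros(m)
    for i in range(m):
        e_i = np.zeros(m)
        e_i[i] = 1  # ith basis vector of R^m
        # the inner sum is the inner product of row i of A with v
        result += sum(A[i, j] * v[j] for j in range(n)) * e_i
    return result

A = [[1, 2, 3], [4, 5, 6]]
v = [1, 1, 1]

print(matvec(A, v))  # [ 6. 15.]
print(np.dot(A, v))  # [ 6 15]
```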

Notice if $A \in \mathbb{R}^{1*n}$ then the matrix-vector product corresponds to the inner product. See below:

$\left[\begin{matrix} 1 & 2 & 3 \end{matrix}\right]$
$\left(\begin{matrix} 1 \\ 1 \\ 1 \end{matrix}\right)$
= 
$\left(\begin{matrix} <[1, 2, 3], [1, 1, 1]> \end{matrix}\right)$
=
$\left[\begin{matrix} 1 + 2 + 3 \end{matrix}\right]$
= 
$\left[\begin{matrix} 6 \end{matrix}\right]$
= 6



In [None]:
import numpy as np

# Matrix vector product (dot product)
a = [[1,2,3],[4,5,6]]
b = [1,1,1]

c = np.dot(a,b)
c

In [None]:
import numpy as np

# 1 x n case: the matrix-vector product reduces to the inner product
a = [1,2,3]
b = [1,1,1]

c = np.dot(a,b)
c

### Matrices as Linear Functions

Theorem: If $f: \mathbb{R}^n \rightarrow \mathbb{R}^m $ is a linear function then $f$ is uniquely represented by a matrix $A \in \mathbb{R}^{m*n}$ such that $f(x) = Ax$ for every $x \in \mathbb{R}^n$

Every linear function has a unique representation as a matrix.

Proof Sketch:

* Since $f$ is linear, for $x,y \in \mathbb{R}^n$
  * $f(x+y) = f(x) + f(y)$
  * $f(ax) = af(x)$
* Define, for $i=1..m$, $j=1..n$:
  * $a_{ij} = <f(\vec{e_j}), e_i>$, where $\vec{e_j} \in \mathbb{R}^n$, $f(\vec{e_j}) \in \mathbb{R}^m$ and $e_i \in \mathbb{R}^m$
    * This means $a_{ij}$ is the $i$th component of $f(\vec{e_j})$
    * This is just a way to get the $a_{ij}$ element of a matrix, similar to what we did for vector components earlier
    * Another way to write the same thing is $f(\vec{e_j}) = \sum_{i=1}^ma_{ij}e_i$
* Claim that the matrix $A = [a_{ij}]_{i=1..m, j=1..n}$ represents $f$ - Need to show that $\forall x \in \mathbb{R}^n$, $f(x) = Ax$
* Let $x \in \mathbb{R}^n$. Write $x = \sum_{j=1}^nx_j\vec{e_j}$ (due to vector components)
* $f(x) = f(\sum_{j=1}^nx_j\vec{e_j})$
* $f(x) = \sum_{j=1}^nf(x_j\vec{e_j})$ (due to preservation of addition)
* $f(x) = \sum_{j=1}^nx_jf(\vec{e_j})$ (due to preservation of scalar multiplication because $x_j$ is a constant)
* $f(x) = \sum_{j=1}^nx_j\sum_{i=1}^ma_{ij}e_i$ (rewrite $f(\vec{e_j})$ from the definition earlier in the proof)
* $f(x) = \sum_{i=1}^m(\sum_{j=1}^na_{ij}x_j)e_i$ (just moving things around)
* $f(x) = Ax$ (from the definition of matrix vector product)

Uniqueness is left as an exercise
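
The proof sketch suggests a recipe: the $j$th column of $A$ is $f(\vec{e_j})$. The example below assumes a hypothetical linear function `f` from $\mathbb{R}^3$ to $\mathbb{R}^2$ and builds its matrix that way.

```python
import numpy as np

def f(x):
    # hypothetical linear function R^3 -> R^2, used only for illustration
    return np.array([x[0] + 2 * x[1] + 3 * x[2],
                     4 * x[0] + 5 * x[1] + 6 * x[2]])

n = 3
E = np.eye(n)  # row j of E is a basis vector of R^n

# column j of A is f(e_j), so a_ij is the ith component of f(e_j)
A = np.column_stack([f(E[j]) for j in range(n)])

x = np.array([1.0, -1.0, 2.0])
print(A)
print(f(x), A @ x)  # the two results agree
```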

### Composition of Linear Functions

Suppose $f: \mathbb{R}^n \rightarrow \mathbb{R}^m $ and $g: \mathbb{R}^p \rightarrow \mathbb{R}^n $. $f$ and $g$ are linear functions.

Then the composition $(f \circ g): \mathbb{R}^p \rightarrow \mathbb{R}^m$ is defined by $(f \circ g)(\vec{u}) = f(g(\vec{u}))$ (first apply $g$, then apply $f$) 

Claim $(f \circ g)$ is linear

Proof Sketch:
* Let $u,v \in \mathbb{R}^p$ and let $a \in \mathbb{R}$

Additivity
* $(f \circ g)(u+v) = f(g(u + v))$
* $(f \circ g)(u+v) = f(g(u) + g(v))$ (because g is linear, it can be separated out)
* $(f \circ g)(u+v) = f(g(u)) + f(g(v))$ (because f is linear)
* $(f \circ g)(u+v) = (f \circ g)(u) + (f \circ g)(v)$ (apply definition of composition - this says that the composition preserves additivity)

Scalar Multiplication
* $(f \circ g)(au) = f(g(au))$ (by definition of composition)
* $(f \circ g)(au) = f(ag(u))$ (because g is linear, the scalar can be pulled out)
* $(f \circ g)(au) = af(g(u))$ (because f is linear)
* $(f \circ g)(au) = a(f \circ g)(u)$ (apply definition of composition - this says that the composition preserves scalar multiplication)

***These two proofs show that $(f \circ g)$ is linear if $f$ and $g$ are both linear. Since $f \circ g$ is a linear function from $\mathbb{R}^p \rightarrow \mathbb{R}^m$, the main result of this unit (every linear function can be uniquely represented by a matrix) implies that $f \circ g$ is uniquely represented by a matrix $C \in \mathbb{R}^{m*p}$***

How to discover $C$?

Let's say you know the matrices of $f$ and $g$. How would you find the matrix $C$ of $f \circ g$?

Suppose
* $f(u) = Au$ for $u \in \mathbb{R}^n$
* $g(v) = Bv$ for $v \in \mathbb{R}^p$

Where 
* $A = [a_{ij}]_{i=1..m,j=1..n}$
* $B = [b_{jk}]_{j=1..n,k=1..p}$

So
* $(f \circ g)(v) = f(g(v))$
* $(f \circ g)(v) = f(Bv)$ (rewriting $g(v)$)
* $(f \circ g)(v) = f(\sum_{j=1}^n(\sum_{k=1}^pb_{jk}v_k)e_j)$ (by definition of matrix vector product) 
* Let $x = \sum_{j=1}^n\sum_{k=1}^p(b_{jk}v_k)e_j$ (by variable substitution since the matrix vector product is a vector in $\mathbb{R}^n$)
  * $x_j = \sum_{k=1}^pb_{jk}v_k$
* $(f \circ g)(v) = f(x)$ (substituting the $x$ variable we created in the prior step)
* $(f \circ g)(v) = Ax$ (because $f(u) = Au$)
* $(f \circ g)(v) = \sum_{i=1}^m(\sum_{j=1}^na_{ij}x_j)e_i$ (rewrite $Ax$ using the definition of matrix vector product)
* $(f \circ g)(v) = \sum_{i=1}^m(\sum_{j=1}^na_{ij} \sum_{k=1}^pb_{jk}v_k)e_i$ (substitute $x_j$ from the variable created earlier)
* $(f \circ g)(v) = \sum_{i=1}^m\sum_{k=1}^p(\sum_{j=1}^na_{ij}b_{jk})v_ke_i$ (reorder the summations since $C$ should be in $\mathbb{R}^{m*p}$)
* Let $c_{ik} = \sum_{j=1}^na_{ij}b_{jk}$ (create a new variable from the prior step)
  * $ C = [c_{ik}]_{i=1..m,k=1..p}$
* $(f \circ g)(v) = \sum_{i=1}^m\sum_{k=1}^pc_{ik}v_ke_i$ (by variable substitution) 
* $(f \circ g)(v) = Cv$ (this is the exact definition of the matrix vector product) 

The result - If you want to discover the matrix $C$, which is the composition of $f$ and $g$, it is the matrix given by:
* $c_{ik} = \sum_{j=1}^na_{ij}b_{jk}$
  * The $ik$th entry is the inner product of the $i$th row of $A$ with the $k$th column of $B$
* $ C = [c_{ik}]_{i=1..m,k=1..p}$ 

Later, we will define matrix multiplication to correspond to the composition of linear functions
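
The formula $c_{ik} = \sum_{j=1}^na_{ij}b_{jk}$ can be checked numerically. The matrices `A` and `B` and the vector `v` below are arbitrary values picked for this sketch; the check is that applying $g$ and then $f$ gives the same result as applying $C$ once.

```python
import numpy as np

A = np.array([[1, 0, 2],
              [0, 1, -1]])   # matrix of f, m x n = 2 x 3
B = np.array([[1, 2],
              [0, 1],
              [3, 0]])       # matrix of g, n x p = 3 x 2

m, n = A.shape
p = B.shape[1]

# c_ik = sum over j of a_ij * b_jk
C = np.zeros((m, p), dtype=int)
for i in range(m):
    for k in range(p):
        C[i, k] = sum(A[i, j] * B[j, k] for j in range(n))

v = np.array([1, 1])
print(C)
print(A @ (B @ v))  # f(g(v))
print(C @ v)        # the same vector, computed with C
```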

### The Matrix Product (Matrix Multiplication)

Definition:

For:\
$A \in \mathbb{R}^{m*n}$ ($A$ represents a linear function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$)\
$B \in \mathbb{R}^{n*p}$ ($B$ represents a linear function $g: \mathbb{R}^{p} \rightarrow \mathbb{R}^{n}$)

Define the matrix product $AB$ to be the matrix $C \in \mathbb{R}^{m*p}$ representing $(f \circ g): \mathbb{R}^p \rightarrow \mathbb{R}^m$

If: 
* $A = [a_{ij}]_{i=1..m,j=1..n}$
* $B = [b_{jk}]_{j=1..n,k=1..p}$

Then :
* $ C = [c_{ik}]_{i=1..m,k=1..p}$ where $c_{ik} = \sum_{j=1}^na_{ij}b_{jk}$

An Alternative Characterization\
If we define for $i = 1...m$ and for $k=1..p$
* $A_i$ = the $i$th row of $A$
* $B_k$ = the $k$th column of $B$

Then $c_{ik} = <A_i,B_k>$ (This is the form typically used when multiplying two matrices by hand.)

**Example (2x3) * (3x4) Matrix:**

The number of columns in the first matrix must match the number of rows in the second matrix. The result will be a 2x4 ($m*p$) matrix.

$
\left[\begin{matrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{matrix}\right]
* 
\left[\begin{matrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{matrix}\right]
= 
\left[\begin{matrix} 1+3 & 2 & 1+3 & 2 \\ 4+6 & 5 & 4+6 & 5 \end{matrix}\right]
= 
\left[\begin{matrix} 4 & 2 & 4 & 2 \\ 10 & 5 & 10 & 5 \end{matrix}\right]
$

In words:\
Take the inner product of the first row of $A$ and each column of $B$. This becomes the top row of $C$. Take the inner product of the next row of $A$ and each column of $B$ - this becomes the next row of $C$... and so on. $C$ will have the same number of rows as $A$ and the same number of columns as $B$
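
The description in words can be sketched as a nested comprehension over rows of $A$ and columns of $B$, using the example matrices above. The variable names are chosen for this illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

# entry (i, k) is the inner product of row i of A with column k of B
C = np.array([[np.inner(A[i], B[:, k]) for k in range(B.shape[1])]
              for i in range(A.shape[0])])

print(C)
```

The result has 2 rows (as $A$ does) and 4 columns (as $B$ does).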



In [None]:
import numpy as np

# Matrix Product
a = [[1,2,3], [4,5,6]]
b = [[1,0,1,0], [0,1,0,1], [1,0,1,0]]

c = np.dot(a,b)
c

### Recap

#### Unit 1: High-Level Objectives
* Motivate and define what is a linear function on real vectors
* Explain how to represent linear functions using real matrices
* Identify matrix multiplication with composition of linear functions

What is a function? 
* A function $f$ is a rule that assigns to each element $x$ in its domain an output $f(x)$ in its codomain
  * Domain: set of inputs
  * Codomain: superset of outputs
  * Range: set of all outputs
* Functions can be represented by formulas, words, sets, and graphs

What Is a Linear Function?
* A linear function preserves addition and scalar multiplication on real vectors ($\mathbb{R}^n \rightarrow \mathbb{R}^m$)
* Affine function: maps line segments proportionally to line segments; motivated by functions with two-dimensional graphs that are lines
  * A function $ f: \mathbb{R}^n \rightarrow \mathbb{R}^m $ is called affine if for every $u,v \in \mathbb{R}^n$ and for every $ t \in [0,1]$\
$f((1-t)u + tv) = (1-t)f(u) + tf(v)$
  * A linear function is an affine function that maps zero to zero

Affine Functions as Linear Functions
* Affine functions are linear functions plus a constant
* Affine functions with an $n$-dimensional domain are equivalent to linear functions with an $(n+1)$-dimensional domain

Operations on Real Vectors
* Real vectors (elements of $\mathbb{R}^n$)
  * Can be added to each other
  * Can be multiplied by a real number
* To get inner product of two vectors:
  * Pairwise multiply the components
  * Add the results
  
Summation and Applications
* Summation: succinctly representing formulas
* Formulas involving vector components and the $k$th basis vector in $\mathbb{R}^n$:
  * $x_k = <x, e_k>$
  * For any $x \in \mathbb{R}^n$, we can write $\vec{x} = \sum_{k=1}^n x_k*\vec{e_k}$
* We can obtain the $k$th component of a vector by taking the inner product of that vector with the $k$th basis vector of $\mathbb{R}^n$
* We can express any vector as the sum over $k$ of its $k$th component times the $k$th elementary basis vector in $\mathbb{R}^n$

Vectors as Single-Valued Linear Functions
* Form for all single-valued linear functions $f : \mathbb{R}^n \rightarrow \mathbb{R}$:
  * $f(x) = <a,x>$ for some real vector $a$
  
Operations on Real Matrices
* Real matrices (elements of $\mathbb{R}^{m*n}$) can be added or multiplied by a real number

Matrix Vector Product
* For $ A \in \mathbb{R}^{m*n}$ and $v \in \mathbb{R}^n$:
  * $\sum_{i=1}^m (\sum_{j=1}^n a_{ij}v_j)\vec{e_i}$
* If $A_i$ denotes the ith row of $A$, the matrix-vector product expression is:
  * $\sum_{i=1}^m<A_i,v>e_i$
  
Matrices as Linear Functions
* Form for every linear function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$:
  * $f(v)=Av$ (for some unique matrix $A \in \mathbb{R}^{m*n}$)
  
Composition of Linear Functions
* Composition of two linear functions is linear.
* The matrix of a composition of linear functions can be expressed in terms of the matrices of the component functions.

Matrix Product
* The product of matrices is defined to match composition of functions.