In [2]:
import numpy as np

### Linear transformations as matrices

> The goal of this section is to show students how to think about linear transformations as matrices and to build up to the more abstract concept of **linear operators**, which we will use in the matrix calculus section below.

Thoughts on what to include:

- Define what a linear map is and how this leads to the definition of linearity.

- Discuss the difference between [linear transformation and linear operator](https://math.stackexchange.com/questions/487933/what-is-the-difference-between-linear-transformation-and-linear-operator) in the context the audience will encounter it in matrix calculus.

- Show some very basic examples of an arbitrary linear map defined by hand from R^2 -> R^2.

- Then give examples of transformations such as rotations.

- Show more advanced examples of linear transformations between different spaces. Since we have transformations R^n -> R^m, first show the more general, abstract way to think about this, then give examples such as R^4 -> R^2 (and **what condition may make this linear**) and R^3 -> R^5. What other general things can we say about linearity between two vector spaces of different dimensions? What can we say about linearity between two vector spaces of the same dimension?

- Why does this motivate representing linear transformations as matrices?

- How does this relate to matrix calculus?
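As a sketch of the rotation and R^4 -> R^2 items in the list above, here is one possible choice of matrices. The angle and the entries of `P` are arbitrary illustrative values, not taken from the original notes. A 2 × 4 matrix maps R^4 -> R^2, and any map of the form $x \mapsto Mx$ is linear regardless of the shapes involved.

```python
import numpy as np

def rotation(theta):
    """2 x 2 matrix that rotates vectors in R^2 counterclockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation(np.pi / 2)          # 90-degree rotation
e1 = np.array([1.0, 0.0])
print(R @ e1)                    # e1 is sent (up to rounding) to e2 = [0, 1]

# A hypothetical 2 x 4 matrix: a linear map from R^4 to R^2.
# It keeps the sum of the first two coordinates and the sum of the last two.
P = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

rng = np.random.default_rng(0)
u, v = rng.normal(size=4), rng.normal(size=4)
alpha = 3.0

print(P.shape)                                          # (2, 4): 4 inputs, 2 outputs
print(np.allclose(P @ (u + v), P @ u + P @ v))          # additivity holds
print(np.allclose(P @ (alpha * u), alpha * (P @ u)))    # homogeneity holds
```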

When we say something is [linear](https://en.wikipedia.org/wiki/Linearity) in linear algebra, we mean it satisfies two properties:

- Additivity: $f(x + y) = f(x) + f(y)$ for all vectors $x, y$
- Homogeneity (compatibility with scalar multiplication): $f(\alpha x) = \alpha f(x)$ for all scalars $\alpha$
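A numerical spot check of both properties can be wrapped in a small helper. `looks_linear` below is a hypothetical helper written for illustration, not a NumPy API, and passing it on a few random inputs is evidence rather than a proof. The affine map $g(x) = x + b$ with $b \neq 0$ fails the check, which shows that "linear" here is stricter than "its graph is a straight line".

```python
import numpy as np

def looks_linear(f, dim, trials=5, seed=0):
    """Spot-check additivity and homogeneity of f: R^dim -> R^k on random inputs.

    Hypothetical helper: random trials can expose non-linearity,
    but passing them does not prove that f is linear.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        alpha = rng.normal()
        additive = np.allclose(f(x + y), f(x) + f(y))
        homogeneous = np.allclose(f(alpha * x), alpha * f(x))
        if not (additive and homogeneous):
            return False
    return True

M = np.array([[1.0, 2.0],
              [7.0, 3.0]])
b = np.array([1.0, -1.0])

print(looks_linear(lambda v: M @ v, dim=2))      # matrix map: True
print(looks_linear(lambda v: v + b, dim=2))      # affine shift: False
print(looks_linear(lambda v: v ** 2, dim=2))     # element-wise square: False
```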

If I define a function $f$ by the following matrix, so that $f(x)$ means the matrix-vector product $f\,x$:

$$f = \begin{bmatrix}
1 & 2 \\
7 & 3
\end{bmatrix}$$

and define two column vectors $x$ and $y$ as

$$x = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 5 \end{pmatrix}$$

Then additivity tells me that $f(x + y) = f(x) + f(y)$. We can check this numerically in NumPy for these particular vectors. (This is not a coincidence of the numbers chosen; there's some property about these objects being in a vector space that is ...


Also note: here I am defining a function as a matrix. I do this because a $2 \times 2$ matrix acts as a linear map $f: \mathbb{R}^2 \to \mathbb{R}^2$, and we will see that linear functions between finite-dimensional spaces can always be represented as matrices.
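One way to see why linear maps can be written as matrices: a linear map is determined by where it sends the standard basis vectors, and those images become the columns of its matrix. The sketch below recovers the matrix of a linear function that is given only as ordinary Python code; `matrix_of` and the example function `g` are hypothetical names chosen for illustration.

```python
import numpy as np

def matrix_of(g, n):
    """Build the matrix of a linear map g: R^n -> R^m column by column.

    Column j is g(e_j), where e_j is the j-th standard basis vector.
    The result is only meaningful if g is actually linear.
    """
    basis = np.eye(n)
    return np.column_stack([g(basis[:, j]) for j in range(n)])

def g(v):
    # The same map as the matrix f above, written without a matrix:
    # (v1, v2) -> (v1 + 2 v2, 7 v1 + 3 v2)
    return np.array([v[0] + 2 * v[1], 7 * v[0] + 3 * v[1]])

G = matrix_of(g, 2)
print(G)                          # [[1. 2.]
                                  #  [7. 3.]]

v = np.array([1.0, 4.0])
print(np.allclose(G @ v, g(v)))   # True: the matrix reproduces the function
```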

**Notation in Python: `@` is equivalent to calling `np.matmul()`, `*` is the element-wise product, and `+` is element-wise addition (NumPy broadcasts operands of different shapes).**
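A quick comparison of the three operators, using the same $f$ and $x$ as above (redefined here so the cell runs on its own). Note that `*` between a 2 × 2 matrix and a 2 × 1 column does not fail: NumPy broadcasts the column across both columns of `f`, which is a common source of silent bugs.

```python
import numpy as np

f = np.array([[1, 2],
              [7, 3]])
x = np.array([[1], [4]])          # 2 x 1 column vector

print(f @ x)                      # matrix product: [[9], [19]]
print(np.matmul(f, x))            # identical to f @ x
print(f * x)                      # broadcasting: row i of f is scaled by x[i] -> [[1, 2], [28, 12]]
```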

In [19]:
# Check additivity

f = np.array([[1, 2],
             [7, 3]])

x = np.array([[1], [4]])

y = np.array([[2], [5]])

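# LHS of additivity: f(x + y)
LHS = f @ (x + y)
print(f"LHS: {LHS}")

# RHS of additivity: f(x) + f(y)
RHS = (f @ x) + (f @ y)
print(f"RHS: {RHS}")

# Compare every entry (LHS.all() == RHS.all() would only compare two booleans)
assert np.array_equal(LHS, RHS)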

LHS: [[21]
 [48]]
RHS: [[21]
 [48]]


In [29]:
# Check homogeneity (homogeneity of degree 1): f(alpha * x) == alpha * f(x)

# alpha = 4 would work, but a random alpha shows there's nothing special about 4
alpha = np.random.randint(0, 10)

LHS = f @ (alpha * x)
RHS = alpha * (f @ x)

print(f"LHS: {LHS}")
print(f"RHS: {RHS}")
assert np.array_equal(LHS, RHS)

LHS: [[18]
 [38]]
RHS: [[18]
 [38]]


### Show that the matrix dot product (Frobenius inner product) equals $\mathrm{tr}(A^T B)$


This confirms our intuition: summing the entries of the element-wise product $A \odot B$ (the Frobenius inner product $\langle A, B \rangle_F$) gives the same number as $\mathrm{tr}(A^T B)$.
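The identity follows directly from the definition of matrix multiplication. For $A, B \in \mathbb{R}^{m \times n}$, the $j$-th diagonal entry of $A^T B$ is the dot product of column $j$ of $A$ with column $j$ of $B$, and the trace adds these up over all columns:

$$(A^T B)_{jj} = \sum_{i=1}^{m} (A^T)_{ji} B_{ij} = \sum_{i=1}^{m} A_{ij} B_{ij}, \qquad \mathrm{tr}(A^T B) = \sum_{j=1}^{n} (A^T B)_{jj} = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ij} = \langle A, B \rangle_F$$

With the matrices below, this is $3 \cdot 3 + 4 \cdot 8 + 4 \cdot 2 + 5 \cdot 3 = 9 + 32 + 8 + 15 = 64$.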

*It would be nice to explain more about the trace operator and some of its properties, like the cyclic property.* What is my hypothesis on why the trace operator appears so often?

Look at this [link](https://math.stackexchange.com/questions/4453933/why-is-the-trace-of-a-matrix-important).
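As a start on the cyclic property mentioned above: $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$ whenever the products are defined, even when the factors are not square, whereas a non-cyclic reordering such as $\mathrm{tr}(SUT)$ generally differs from $\mathrm{tr}(STU)$. The random shapes and seed below are arbitrary choices for the check.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

t1 = np.trace(A @ B @ C)   # ABC is 2 x 2
t2 = np.trace(C @ A @ B)   # CAB is 4 x 4
t3 = np.trace(B @ C @ A)   # BCA is 3 x 3

print(t1, t2, t3)
print(np.isclose(t1, t2) and np.isclose(t2, t3))   # True: cyclic shifts agree

# With square matrices we can also try a non-cyclic reordering, which usually differs.
S, T, U = (rng.normal(size=(3, 3)) for _ in range(3))
print(np.isclose(np.trace(S @ T @ U), np.trace(S @ U @ T)))   # typically False
```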

In [2]:
A = np.array([[3, 4],
             [4, 5]])

B = np.array([[3, 8],
             [2, 3]])


element_wise = A * B
element_wise

array([[ 9, 32],
       [ 8, 15]])

In [3]:
sums = 0
for element in np.nditer(element_wise):
    sums += element
sums

64

In [4]:
AT = np.transpose(A)
ATB = np.matmul(AT, B)


traceAB = np.trace(ATB)
traceAB

64

In [5]:
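# dA plays the role of a small change in an m x n matrix A (here 2 x 2)
dA = np.array([[3, 6],
              [2, 1]])

# x: m x p (here 2 x 2)
x = np.array([[3, 4],
             [4, 2]])

# y: n x p (here 2 x 2), so x @ y.T is m x n, the same shape as dA
y = np.array([[3, 2],
             [2, 5]])

y_t = np.transpose(y)

# Element-wise product of x y^T with dA
(x @ y_t) * dA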

array([[ 51, 156],
       [ 32,  18]])

In [12]:
# Check: how does this compare with x^T dA y?
x_t = np.transpose(x)

df = x_t @ dA @ y
df

array([[ 95, 144],
       [100, 162]])
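The two results above are not equal as matrices, but they are linked by the Frobenius/trace identity from the previous section. By the cyclic property, $\mathrm{tr}(x^T\,dA\,y) = \mathrm{tr}(y\,x^T\,dA) = \langle x y^T, dA \rangle_F$, so the trace of `df` should equal the sum of the entries of `(x @ y_t) * dA`. When $x$ and $y$ are column vectors, $x^T\,dA\,y$ is already a scalar and no trace is needed. This is one possible reading of what the "check" cell is comparing; the vector values `xv`, `yv` are made up for illustration.

```python
import numpy as np

dA = np.array([[3, 6],
               [2, 1]])
x = np.array([[3, 4],
              [4, 2]])
y = np.array([[3, 2],
              [2, 5]])

# Matrix case: a sum of entries on one side, a trace on the other.
frobenius = np.sum((x @ y.T) * dA)     # <x y^T, dA>_F = 51 + 156 + 32 + 18
trace_df = np.trace(x.T @ dA @ y)      # tr(x^T dA y) = 95 + 162
print(frobenius, trace_df)             # 257 257

# Vector case (hypothetical values): x^T dA y is a scalar, no trace needed.
xv = np.array([1, 2])
yv = np.array([3, -1])
print(xv @ dA @ yv, np.sum(np.outer(xv, yv) * dA))   # 13 13
```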

Weighted dot product with the identity matrix $I$: $x^T I y = x^T y$

In [10]:
I = np.identity(2)

x_t @ y

array([[17, 26],
       [16, 18]])

In [11]:
# Weighting with I gives the same answer (as floats, because np.identity returns a float array)

x_t @ I @ y

array([[17., 26.],
       [16., 18.]])
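The identity is the simplest weight. With a diagonal weight matrix $W = \mathrm{diag}(w_1, \dots, w_n)$, $x^T W y = \sum_i w_i x_i y_i$, so each coordinate's contribution is scaled by its weight. The weights below are arbitrary illustrative values; for $x^T W y$ to be a genuine inner product, $W$ should be symmetric positive definite.

```python
import numpy as np

xv = np.array([3.0, 4.0])
yv = np.array([3.0, 2.0])
w = np.array([2.0, 0.5])       # hypothetical positive weights
W = np.diag(w)

print(xv @ np.identity(2) @ yv)    # 17.0: same as the plain dot product xv @ yv
print(xv @ W @ yv)                 # 2*3*3 + 0.5*4*2 = 22.0
print(np.sum(w * xv * yv))         # 22.0: the same weighted sum, written element-wise
```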

### Explain the Jacobian matrix and determinant
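A possible starting point for this section: the Jacobian of $F: \mathbb{R}^n \to \mathbb{R}^m$ is the $m \times n$ matrix of partial derivatives $J_{ij} = \partial F_i / \partial x_j$, the best linear approximation of $F$ near a point. The sketch below estimates it with central finite differences using a hypothetical helper `numerical_jacobian` (not a library function). For a linear map $x \mapsto Mx$ the Jacobian is $M$ itself, and for the polar-to-Cartesian map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$ the Jacobian determinant is $r$.

```python
import numpy as np

def numerical_jacobian(F, x, h=1e-6):
    """Estimate the Jacobian of F at x with central differences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j approximates the partial derivative dF/dx_j.
        columns.append((F(x + step) - F(x - step)) / (2 * h))
    return np.column_stack(columns)

M = np.array([[1.0, 2.0],
              [7.0, 3.0]])
J_lin = numerical_jacobian(lambda v: M @ v, [1.0, 4.0])
print(np.allclose(J_lin, M))                 # True: a linear map is its own Jacobian

def polar_to_cartesian(p):
    r, theta = p
    return np.array([r * np.cos(theta), r * np.sin(theta)])

J_polar = numerical_jacobian(polar_to_cartesian, [2.0, np.pi / 3])
print(np.linalg.det(J_polar))                # approximately 2.0, i.e. r
```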

### Explain the Kronecker product and give an example with Jacobian

- Explain index notation versus the Kronecker product, and why the Kronecker product is an interesting matrix operation
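One place the Kronecker product shows up is the Jacobian of a matrix-valued linear map. If $\mathrm{vec}(X)$ stacks the columns of $X$ into one long vector, then $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$, so the Jacobian of $X \mapsto AXB$ with respect to $\mathrm{vec}(X)$ is $B^T \otimes A$. In index notation the same map is $(AXB)_{ij} = \sum_{k,l} A_{ik} X_{kl} B_{lj}$. The sketch checks both forms with `np.kron` and `np.einsum`; the column-stacking convention (`order="F"`) matters, and the shapes are arbitrary.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))

K = np.kron(B.T, A)                  # (5*2) x (4*3)
print(K.shape)                       # (10, 12)
print(np.allclose(vec(A @ X @ B), K @ vec(X)))   # True

# The same product in index notation: (AXB)_ij = sum_{k,l} A_ik X_kl B_lj
print(np.allclose(np.einsum("ik,kl,lj->ij", A, X, B), A @ X @ B))   # True
```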

### Explain the chain rule for matrix calculus
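For maps between vector spaces, the chain rule says the Jacobian of a composition is the product of the Jacobians: $J_{g \circ f}(x) = J_g(f(x))\,J_f(x)$. For linear maps $f(x) = Fx$ and $g(y) = Gy$ this reduces to the matrix product $GF$, which is one payoff of thinking of linear transformations as matrices. The sketch below checks the general statement numerically; `f_map`, `g_map` and the finite-difference helper `jac` are hypothetical examples.

```python
import numpy as np

def jac(F, x, h=1e-6):
    """Central-difference Jacobian of F at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((F(x + e) - F(x - e)) / (2 * h))
    return np.column_stack(cols)

def f_map(x):                      # R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def g_map(y):                      # R^3 -> R^2
    return np.array([y[0] + y[1] * y[2], np.exp(y[2])])

x0 = np.array([0.5, -1.2])
lhs = jac(lambda x: g_map(f_map(x)), x0)        # Jacobian of the composition, 2 x 2
rhs = jac(g_map, f_map(x0)) @ jac(f_map, x0)    # (2 x 3) @ (3 x 2)
print(np.allclose(lhs, rhs, atol=1e-6))         # True, up to finite-difference error

# Linear special case: the composition's matrix is the product G @ F.
F = np.array([[1.0, 2.0], [7.0, 3.0], [0.0, 1.0]])
G = np.array([[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]])
print(np.allclose(jac(lambda x: G @ (F @ x), x0), G @ F))   # True
```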