Tracking third deliverable:

Goals: Have a completed notebook by **April 27** and a completed presentation "5 things about matrix calculus in 15 minutes" by **May 1.**

TODO:

- Linear transformations as matrices section (Motivate d's as lin ops)
    - Ask Sarah questions during Tuesday class (ask to speak first) to complete the mapping from different m, n. 
    - Include the key fact that thinking about linear transformations as matrices gives us a reason why matmul isn't commutative, **and** that this is also why the derivative of A^2 is not simply 2A. (Maybe show an example where 2A only works when the matrices commute, like with I or something else?)

- Briefly discuss how the intuition for linear transformations as matrices motivates derivatives as linear operators (show a single-variable calculus example)
    - Begin with a table of the functions you will review: scalar-valued functions (take in a vector and return a scalar), vector-valued functions (take in a vector and return a vector), scalar-valued functions with matrices as input. 
    - At this point mention `f'(x)[dx]`, where we will be thinking about an arbitrary infinitesimal change dx as a vector.
    - Thinking about dx as a vector transitions into scalar-valued functions. Show the gradient: we know we want a scalar output and we know dx is a column vector, so what can we multiply dx by to get a scalar? The basic linear algebra answer is "a row vector", but *what* are the components of this row vector? Since f' is a linear operator that takes in a vector, its components are the partial derivatives of f with respect to each component of x. That's how we know this row vector is the transpose of grad f: the linear operator treatment literally gives us the gradient.
    - **Next is showing example 10 from the project notes** and this is a big slide because it changes our perspective of derivatives from component-wise things to *matrices.* Note to the audience that this grad f isn't a rule for *all* matrices, right? We are just thinking about how to get intuition for what grad f is in an example. 
    - In vector-valued functions we apply the same intuition of f' as a linear operator. Now we want vectors out so we can think about our components (n in, m out) so we need a matrix that's m x n. We learned that this is the Jacobian (computes partial derivatives in each direction). 

- Product rule and chain rule for functions on "arbitrary vector spaces"
    - **Question for Sarah:** How do I understand the product rule and the chain rule? I understand them from the linear operator perspective (another chance to show the class linear transformations as matrices), but I do not understand them from the definition of the derivative.
    - (Thursday question: f(A) = A^3. Why derivative)

- Jacobian vs. Kronecker product (I think this is where we use the notebook and Julia)
    - In this section I am most interested in showing that the Jacobian is ugly when we compute component-wise derivatives. It's easy to make Julia do it for both symbolic and numerical examples *but* we would like to "write the Jacobian without explicitly writing it" -- Alan Edelman. 
    - This motivates the Kronecker product A ⊗ B. Interesting, personally, because this is an operation where all four dimensions (rows and columns of A, rows and columns of B) can be different numbers *and* we are still taking a product of two matrices. Most of our typical LinAlg content was dimension specific because we were defining matrix-vector or matrix-matrix operations and needed the inner dimensions to match.
    - Need to show the Kronecker identity (Prop 27) to the class so they know how we get the equivalence (I think this is a key thing to understand)
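
The non-commutativity point in the TODO list above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the planned notebook: the matrix size, the random seed, and the perturbation scale `1e-6` are arbitrary assumptions. It compares the exact change in f(A) = A^2 against the linear-operator prediction `A dA + dA A` and against the scalar-calculus guess `2 A dA`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the run is repeatable

A = rng.standard_normal((3, 3))
dA = 1e-6 * rng.standard_normal((3, 3))  # small, arbitrary perturbation

# Exact change in f(A) = A^2 when A moves to A + dA
exact_change = (A + dA) @ (A + dA) - A @ A

# Linear-operator prediction: f'(A)[dA] = A dA + dA A
product_rule = A @ dA + dA @ A

# Scalar-calculus guess carried over naively: 2 A dA
naive_guess = 2 * A @ dA

err_product_rule = np.linalg.norm(exact_change - product_rule)
err_naive = np.linalg.norm(exact_change - naive_guess)
print("error of A dA + dA A:", err_product_rule)
print("error of 2 A dA:     ", err_naive)

# If dA commutes with A (here a multiple of the identity), the two formulas agree
dA_commuting = 1e-6 * np.eye(3)
gap = np.linalg.norm(
    (A @ dA_commuting + dA_commuting @ A) - 2 * A @ dA_commuting
)
print("gap when dA commutes with A:", gap)
```

The first error is only the second-order term `dA @ dA`, while the second error is dominated by the commutator `dA A - A dA`, which is first order in dA.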

In [1]:
# Define Julia packages
using LinearAlgebra, Symbolics

In [2]:
# Define variables
@variables a, b, c, d

X = [a b; c d]

2×2 Matrix{Num}:
 a  b
 c  d

In [3]:
X^2

2×2 Matrix{Num}:
 a^2 + b*c  a*b + b*d
 a*c + c*d  b*c + d^2

In [4]:
# This defines the function and our "Y" is X^2?

jac(Y, X) = Symbolics.jacobian(vec(Y), vec(X))

jac (generic function with 1 method)

In [5]:
# I think we would get the same answer if we took the Jacobian by hand. Right, the partial derivatives...
# ... of each term in X^2 are in the first row. 1,1 entry of X^2 is the first row of J
# ... and 2,1 entry of X^2 is the second row of J, etc. 

J = jac(X^2, X)

4×4 Matrix{Num}:
 2a      b      c   0
  c  a + d      0   c
  b      0  a + d   b
  0      b      c  2d

In [6]:
begin 
    I2 = [1 0; 0 1]
    kron(I2,X) + kron(X', I2)
end

4×4 Matrix{Num}:
 2a      b      c   0
  c  a + d      0   c
  b      0  a + d   b
  0      b      c  2d
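
The same equality can be sanity-checked without symbolic algebra. The sketch below uses Python with NumPy rather than Julia, and approximates the Jacobian of vec(X^2) by central differences before comparing it with `kron(I, X) + kron(X', I)`. The helper names `vec` and `numerical_jacobian`, the step size `h`, and the random test matrix are assumptions made for illustration only.

```python
import numpy as np

def vec(M):
    # Stack columns (column-major order), matching Julia's vec
    return M.reshape(-1, order="F")

def numerical_jacobian(func, X, h=1e-6):
    # Central differences: column k holds d vec(func(X)) / d vec(X)[k]
    n = X.size
    J = np.zeros((vec(func(X)).size, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        dX = step.reshape(X.shape, order="F")
        J[:, k] = (vec(func(X + dX)) - vec(func(X - dX))) / (2 * h)
    return J

rng = np.random.default_rng(1)  # arbitrary fixed seed
X = rng.standard_normal((2, 2))
I2 = np.eye(2)

J_numeric = numerical_jacobian(lambda M: M @ M, X)
J_kron = np.kron(I2, X) + np.kron(X.T, I2)

print(np.allclose(J_numeric, J_kron, atol=1e-6))
```

The column-major `vec` matters here: with NumPy's default row-major flattening, the roles of the two Kronecker terms would be swapped.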

In [17]:
# Symbolic representation
SymB = [a b; c d]

kron(I2, SymB)

4×4 Matrix{Num}:
 a  b  0  0
 c  d  0  0
 0  0  a  b
 0  0  c  d

In [11]:
B = rand(2, 2)

2×2 Matrix{Float64}:
 0.251488  0.68133
 0.814871  0.655163

In [13]:
kron(I2, B)

4×4 Matrix{Float64}:
 0.251488  0.68133   0.0       0.0
 0.814871  0.655163  0.0       0.0
 0.0       0.0       0.251488  0.68133
 0.0       0.0       0.814871  0.655163

In [25]:
# A kron I
@variables c1 , c2, c3, c4

A = [a b; c d]
C = [c1; c2; c3; c4]
vC = vec(C)
kron(A, I2) * vC

4-element Vector{Num}:
 a*c1 + b*c3
 a*c2 + b*c4
 c*c1 + c3*d
 c*c2 + c4*d
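
The output above is one instance of the vec/Kronecker identity vec(A X B) = (B^T ⊗ A) vec(X). Assuming this is the identity the notes call Prop 27 (the notes do not restate it), a rough numerical check in Python with NumPy could look like the following. The matrix shapes and the seed are arbitrary choices, and `vec` is a helper defined here for column-major stacking.

```python
import numpy as np

def vec(M):
    # Column-major stacking, the convention the identity relies on
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)  # arbitrary fixed seed
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# General identity: vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
identity_holds = np.allclose(lhs, rhs)
print(identity_holds)

# Special case matching the cell above: (A kron I) vec(C) = vec(C A^T)
A2 = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))
special_case_holds = np.allclose(
    np.kron(A2, np.eye(2)) @ vec(C), vec(C @ A2.T)
)
print(special_case_holds)
```

All four dimensions in the general case are different (2, 3, 4, 5), which illustrates why the Kronecker product is not tied to square or matching shapes.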

### Linear transformations as matrices

> The goal of this section is to show students how to think about linear transformations as matrices and build up to a more abstract concept of **linear operators** which we will use in the matrix calculus section below.

Thoughts on what to include:

- Discuss the difference between [linear transformation and linear operator](https://math.stackexchange.com/questions/487933/what-is-the-difference-between-linear-transformation-and-linear-operator) in the context in which the audience will hear these terms in matrix calculus. Define both of these (this will transition into the arbitrary linear map content)

- Show some very basic examples of an arbitrary linear map you defined from R^2 -> R^2
    - **Ask Sarah: If we think about linear transformations as matrices then isn't it also intuitive to think about them as functions?** So it is fine to use function notation at the beginning?

- Then, give examples of transformations like rotations or projections (e.g. my linear approx talk)

- Share the intuition behind advanced examples of linear transformations that move between different spaces. We have transformations for R^n -> R^m, so show the more general/abstract way to think about this, then give some examples of R^4 -> R^2 (and **what condition may make this linear**), R^3 -> R^5, etc. What other general things can we say about linearity between two vector spaces of different dimensions? What can we say about linearity between two vector spaces of the same dimension?
    - I think this may be as simple as verifying that the conditions for our transformation hold between R^n -> R^m for different m and n. **Confirm.**

- Why does this motivate linear transformations as matrices?  

- How does this relate to matrix calculus?
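
For the R^4 -> R^2 case raised in the list above, here is a small sketch in Python with NumPy. Everything in it is hypothetical: the matrix `M`, the maps `T` and `T_shifted`, and the helper `is_linear_on` are made up for illustration. It spot-checks additivity and homogeneity on one pair of sample vectors, which is evidence of linearity rather than a proof, and shows that adding a constant offset breaks both conditions.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary fixed seed

# Hypothetical 2x4 matrix: it maps vectors in R^4 to vectors in R^2
M = rng.integers(-5, 6, size=(2, 4)).astype(float)

def T(v):
    # Multiplication by a fixed matrix: a linear map R^4 -> R^2
    return M @ v

def T_shifted(v):
    # Same map plus a constant offset: not linear, since T_shifted(0) != 0
    return M @ v + np.array([1.0, 0.0])

def is_linear_on(func, x, y, alpha):
    # Spot check of the two linearity conditions on sample inputs only
    additive = np.allclose(func(x + y), func(x) + func(y))
    homogeneous = np.allclose(func(alpha * x), alpha * func(x))
    return bool(additive and homogeneous)

x = rng.standard_normal(4)
y = rng.standard_normal(4)
alpha = 2.5

print(is_linear_on(T, x, y, alpha))
print(is_linear_on(T_shifted, x, y, alpha))
```

Nothing in the check depends on m and n being equal. The only dimension requirement is that the matrix has as many columns as the input vector has entries.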


#### Definition of a linear transformation and brief example in numpy

The word *transformation* in linear transformation comes from the fact that we take a vector (or matrix) as input to our function (which is defined by a matrix) and transform it from one vector space to another vector space. In some cases the input and output vector spaces are the same. [Rotation matrices](https://academicflight.com/articles/kinematics/rotation-formalisms/rotation-matrix/) are an example of this type of linear transformation. Another word for a linear transformation from a vector space to itself is an endomorphism.

Anyway, the word *[linear](https://en.wikipedia.org/wiki/Linearity)* means our transformation satisfies two properties:

- Additivity: f(x + y) = f(x) + f(y)
- Scalar multiplication: f($\alpha$x) = $\alpha$f(x) $\forall$ $\alpha$

It is not a coincidence that these two criteria mirror the two operations that define a vector space: vector addition and scalar multiplication.

Now, if we look at an example of this transformation, I define a function f to be the following matrix:

f = $\begin{bmatrix}
1 & 2 \\
7 & 3
\end{bmatrix}$

And I define two column vectors, x and y, to be:

x = $\begin{pmatrix} 1 \\ 4 \end{pmatrix}$
y = $\begin{pmatrix} 2 \\ 5 \end{pmatrix}$

Then additivity tells me that f(x + y) = f(x) + f(y). We can confirm both properties in numpy. See the first and second code blocks below.

**Notation in Python: `@` is the same thing as typing `np.matmul()`, `*` is the element-wise product, and `+` is element-wise addition.**

#### Step back: What do the entries of a matrix actually tell us?

In our matrix f, the columns tell us where the *basis vectors* of our space end up. If we refer to the [standard basis](https://mathworld.wolfram.com/StandardBasis.html), then this matrix f sends the standard basis vectors to [1, 7] and [2, 3] respectively. The transformation part comes in again because we can imagine the entries of x and y, these 2x1 column vectors, as *any* two real numbers. Multiplying x or y by f then gives us the corresponding linear combination of those two columns.

In [19]:
# Check additivity
import numpy as np

f = np.array([[1, 2],
             [7, 3]])

x = np.array([[1], [4]])

y = np.array([[2], [5]])

# LHS of linearity
LHS = f @ (x + y)
print(f"LHS: {LHS}")

# RHS of linearity
RHS = (f @ x) + (f @ y)
print(f"RHS: {RHS}")

# Compare the arrays entry by entry (LHS.all() == RHS.all() only compares two booleans)
assert np.array_equal(LHS, RHS)

LHS: [[21]
 [48]]
RHS: [[21]
 [48]]


In [29]:
# Check scalar multiplication (also called homogeneity of degree 1)

#alpha = 4

# We can see the same thing for an arbitrary alpha value (just to show you there's nothing special about 4)
alpha = np.random.randint(0, 10)

LHS = f @ (alpha * x)
RHS = alpha * (f @ x)

print(f"LHS: {LHS}")
print(f"RHS: {RHS}")

LHS: [[18]
 [38]]
RHS: [[18]
 [38]]
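
Returning to the question of what the entries of f tell us: the short sketch below (Python with NumPy, reusing the matrix f and the vector x from the text, with `e1`, `e2`, and the other names introduced here for illustration) multiplies f by each standard basis vector and then rebuilds `f @ x` as a combination of the two columns.

```python
import numpy as np

f = np.array([[1, 2],
              [7, 3]])

# Standard basis vectors of R^2
e1 = np.array([1, 0])
e2 = np.array([0, 1])

# Applying f to a basis vector picks out the matching column of f
image_e1 = f @ e1
image_e2 = f @ e2
print(image_e1, image_e2)

# Any x = x1*e1 + x2*e2 is sent to x1*(first column) + x2*(second column)
x = np.array([1, 4])
direct = f @ x
combination = x[0] * image_e1 + x[1] * image_e2
print(direct, combination)
```

This is the sense in which a matrix stores a linear transformation: knowing where the basis vectors go determines where every other vector goes.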


In [3]:
# Define a symbolic array

@variables a b c d
A = [a b 
    c d]

2×2 Matrix{Num}:
 a  b
 c  d

In [1]:
A * tr()

LoadError: UndefVarError: `variables` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

### Show the matrix dot product is equal to tr(A^T B)


The cells below confirm our intuition that summing the entries of the element-wise product of A and B gives tr(A^T B).

*Would be nice to explain more about the trace operator and some of its properties, like the cyclic property.* What is my hypothesis on why the trace operator appears so often?

Look at this [link](https://math.stackexchange.com/questions/4453933/why-is-the-trace-of-a-matrix-important).
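
As a starting point for the cyclic property mentioned above, here is a hedged sketch in Python with NumPy. The shapes and the seed are arbitrary assumptions. It checks that tr(ABC) = tr(BCA) = tr(CAB) even when the three products have different sizes, and computes a non-cyclic reordering for contrast.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary fixed seed

# Rectangular shapes chosen so that all three cyclic products are defined
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

t_abc = np.trace(A @ B @ C)  # trace of a 2x2 matrix
t_bca = np.trace(B @ C @ A)  # trace of a 3x3 matrix
t_cab = np.trace(C @ A @ B)  # trace of a 4x4 matrix
cyclic_holds = bool(np.isclose(t_abc, t_bca) and np.isclose(t_bca, t_cab))
print(cyclic_holds)

# A swap that is not a cyclic shift generally changes the trace
P, Q, R = (rng.standard_normal((3, 3)) for _ in range(3))
print(np.trace(P @ Q @ R), np.trace(Q @ P @ R))
```

The cyclic property is what lets a scalar such as x^T dA y be rewritten as tr(y x^T dA), which is one reason the trace shows up so often in matrix calculus.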

In [2]:
A = np.array([[3, 4],
             [4, 5]])

B = np.array([[3, 8],
             [2, 3]])


element_wise = A * B
element_wise

array([[ 9, 32],
       [ 8, 15]])

In [3]:
# Sum every entry of the element-wise product
sums = element_wise.sum()
sums

64

In [4]:
AT = np.transpose(A)
ATB = np.matmul(AT, B)


traceAB = np.trace(ATB)
traceAB

64

In [5]:
dA = np.array([[3, 6],
              [2, 1]])

# m x n (2 x 2)
x = np.array([[3, 4],
             [4, 2]])

# n x m (2x2)
y = np.array([[3, 2],
             [2, 5]])

y_t = np.transpose(y)

(x @ y_t) * dA

array([[ 51, 156],
       [ 32,  18]])

In [12]:
### Check this


x_t = np.transpose(x)

df = x_t @ dA @ y
df

array([[ 95, 144],
       [100, 162]])
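
One reading of the two cells above, offered as an assumption about their intent, is the scalar function f(A) = x^T A y with x and y as vectors. Its differential is df = x^T dA y, which can be rewritten as the matrix dot product of x y^T with dA. With the 2x2 values used above, the entries of `(x @ y_t) * dA` sum to 257, which matches the trace of `x_t @ dA @ y` (95 + 162). The sketch below (Python with NumPy) checks the vector case; `A` is a hypothetical base point, and the vectors reuse numbers from the cells above.

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([2.0, 5.0])
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # hypothetical base point
dA = np.array([[3.0, 6.0],
               [2.0, 1.0]])  # same perturbation as in the cells above

def f(M):
    # Scalar-valued function of a matrix: f(M) = x^T M y
    return x @ M @ y

# f is linear in M, so the change is exactly x^T dA y
df_direct = f(A + dA) - f(A)
df_operator = x @ dA @ y

# Gradient form: df = sum((x y^T) * dA) = tr((x y^T)^T dA)
grad = np.outer(x, y)
df_gradient = np.sum(grad * dA)
df_trace = np.trace(grad.T @ dA)

print(df_direct, df_operator, df_gradient, df_trace)
```

Under this reading, x y^T plays the role of the gradient of f with respect to A.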

Weighted dot product with I

In [10]:
I = np.identity(2)

x_t @ y

array([[17, 26],
       [16, 18]])

In [11]:
# Yes, this weighting with I gives us the same answer

x_t @ I @ y

array([[17., 26.],
       [16., 18.]])

### Explain the Jacobian matrix and determinant

### Explain the Kronecker product and give an example with Jacobian

- Explain the index notation versus the Kronecker product and what makes this matrix operation interesting

### Explain chain rule for matrix calculus
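
A possible starting example for this section, given as a sketch under stated assumptions rather than the planned content: the functions `f` and `g`, their hand-written Jacobians, and the evaluation point `x0` below are all made up for illustration. The idea is that if derivatives are linear operators, the derivative of a composition is the composition of the operators, so the Jacobian of g(f(x)) should be the matrix product of the two Jacobians. The code (Python with NumPy) compares that product with a central-difference estimate.

```python
import numpy as np

def f(x):
    # Hypothetical map R^2 -> R^3
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jac_f(x):
    # 3x2 Jacobian of f, written out by hand
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 0.0],
                     [0.0, 2 * x[1]]])

def g(u):
    # Hypothetical map R^3 -> R^2
    return np.array([u[0] + u[1] * u[2], u[0] ** 2])

def jac_g(u):
    # 2x3 Jacobian of g, written out by hand
    return np.array([[1.0, u[2], u[1]],
                     [2 * u[0], 0.0, 0.0]])

def numerical_jacobian(func, x, h=1e-6):
    # Central differences, one input coordinate at a time
    cols = []
    for k in range(x.size):
        e = np.zeros(x.size)
        e[k] = h
        cols.append((func(x + e) - func(x - e)) / (2 * h))
    return np.column_stack(cols)

x0 = np.array([0.7, -1.3])

# Chain rule: outer Jacobian evaluated at f(x0), times inner Jacobian at x0
J_chain = jac_g(f(x0)) @ jac_f(x0)
J_numeric = numerical_jacobian(lambda x: g(f(x)), x0)

chain_rule_holds = np.allclose(J_chain, J_numeric, atol=1e-6)
print(chain_rule_holds)
```

The order of the product matters: `jac_f(x0) @ jac_g(f(x0))` is a 3x3 matrix and cannot be the 2x2 Jacobian of the composition, which ties back to matrix multiplication not being commutative.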