In [1]:
# Define Julia packages
using LinearAlgebra, Symbolics

In [2]:
# Define Python packages
import numpy as np
from scipy.differentiate import jacobian  # scipy.differentiate requires SciPy >= 1.15

### Chain rule on arbitrary vector spaces example

First, I want to show an example of the linear transformations fact I discussed in [slides 8-12](https://docs.google.com/presentation/d/1zZBsD1vl5pKEUobh44VGZDVOXnkF7r7LBsyj4JcBk7g/edit?usp=sharing).

I define three matrices, A, B, and C, and a vector, x. First, I multiply A, B, and C. The product ABC is the matrix of the composition, so applying it to x is equivalent to writing A(B(C(x))). I store the product in the variable `composition` and then apply it to the vector x.

Then I change strategy and apply the matrices one at a time, each to the output of the previous step. I compute C(x) first and assign the result a new variable name. Then I multiply B by that vector to get *yet* another vector with a new variable name. Lastly, I multiply A by this new vector.

I've used an assert statement to confirm that the two approaches give the same output. The first approach, multiplying all the matrices and then applying the product to x, is my preferred approach. For a single vector, though, the second approach is cheaper because it only performs matrix-vector products; forming the product first pays off when the same composed map is applied to many vectors.

Because we established that these approaches are the same, I will use the first approach in the Jacobian examples.

In [34]:
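# Define 3 matrices and a vector with compatible shapes:
# A is m×q, B is q×p, C is p×n, x is n×1, so A @ B @ C @ x is m×1
#TODO: Later convert these constants to random ints: np.random.randint(x, y)

m = 3
q = 5
p = 5
n = 4

# Set a seed for reproducibility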
np.random.seed(0)

A = np.random.rand(m, q)
B = np.random.rand(q, p)
C = np.random.rand(p, n)
x = np.random.rand(n, 1)

In [51]:
composition = A @ B @ C

In [48]:
composition @ x

array([[3.49606573],
       [3.8873989 ],
       [3.26430787]])

In [53]:
c_to_x = C @ x
b_to_Cx = B @ c_to_x
A_to_BCx = A @ b_to_Cx
A_to_BCx

array([[3.49606573],
       [3.8873989 ],
       [3.26430787]])

In [52]:
# Compare entries with a tolerance; `.all()` only tests whether every entry is nonzero
assert np.allclose(composition @ x, A_to_BCx)
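The same fact can be read as the chain rule for linear maps: the Jacobian of x ↦ A(B(C(x))) is the product ABC. A minimal sketch with NumPy only, assuming random matrices with the same shapes as above; `central_diff_jacobian` is a hypothetical helper written for this check, not a library function.

```python
import numpy as np

def central_diff_jacobian(func, x, h=1e-6):
    """Hypothetical helper: estimate the Jacobian of func at x column by column."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(func(x))
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        # Column j holds the partial derivatives with respect to x[j]
        J[:, j] = (np.asarray(func(x + e)) - np.asarray(func(x - e))) / (2 * h)
    return J

rng = np.random.default_rng(0)
m, q, p, n = 3, 5, 5, 4
A = rng.random((m, q))
B = rng.random((q, p))
C = rng.random((p, n))

def composed(x):
    # Apply C, then B, then A: the composition A(B(C(x)))
    return A @ (B @ (C @ x))

x = rng.random(n)
J = central_diff_jacobian(composed, x)
print(np.allclose(J, A @ B @ C))  # True
```

Because the map is linear, the central difference is exact up to rounding, so the Jacobian matches ABC at every point x.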

In [57]:
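# Example function f: R^3 -> R^4
# Returning an array (rather than a list) lets f work with NumPy and scipy.differentiate

def f(x):
    x1, x2, x3 = x
    return np.asarray([x1, 5*x3, 4*x2**2 - 2*x3, x3*np.sin(x1)])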

In [58]:
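def df(x):
    # Analytic Jacobian of f: row i holds the partial derivatives of f_i w.r.t. (x1, x2, x3)
    x1, x2, x3 = x
    one = np.ones_like(x1)  # gives constant entries the same shape as x1, so df also works on batches
    return [[one, 0*one, 0*one],
            [0*one, 0*one, 5*one],
            [0*one, 8*x2, -2*one],
            [x3*np.cos(x1), 0*one, np.sin(x1)]]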
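The `np.ones_like(x1)` line appears to be there so that every entry of the returned Jacobian has the same shape as `x1`. A sketch of why that matters, assuming we want to evaluate `df` at several points at once by stacking them as the columns of a `(3, k)` array (the five random points are an arbitrary choice):

```python
import numpy as np

def df(x):
    x1, x2, x3 = x
    one = np.ones_like(x1)  # constant entries get the same shape as x1
    return [[one, 0*one, 0*one],
            [0*one, 0*one, 5*one],
            [0*one, 8*x2, -2*one],
            [x3*np.cos(x1), 0*one, np.sin(x1)]]

# Five evaluation points stored as columns: shape (3, 5)
points = np.random.default_rng(1).random((3, 5))
J_batch = np.array(df(points))
print(J_batch.shape)  # (4, 3, 5): one 4×3 Jacobian per point
```

If the constant entries were plain `0` and `5` instead, the nested list would mix scalars with length-5 arrays, and recent NumPy versions refuse to build a regular array from such ragged input.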

In [64]:
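x0 = np.array([3, 2, 1])  # evaluation point (x1, x2, x3) = (3, 2, 1)
df_val = df(x0)  # use a new name so the function df is not overwritten
df_val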

[[array(1), 0, 0],
 [0, 0, 5],
 [0, 16, -2],
 [-0.9899924966004454, 0, 0.1411200080598672]]
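The printed values are consistent with evaluating at (x1, x2, x3) = (3, 2, 1): cos 3 ≈ -0.98999, sin 3 ≈ 0.14112, and 8·x2 = 16. As a sanity check on the hand-derived Jacobian, here is a sketch that compares it with central differences at that point. It uses NumPy only; `df_analytic` is a hypothetical dense-array rewrite of `df`, and the step size `h = 1e-6` is an arbitrary choice.

```python
import numpy as np

def f(x):
    x1, x2, x3 = x
    return np.asarray([x1, 5*x3, 4*x2**2 - 2*x3, x3*np.sin(x1)])

def df_analytic(x):
    x1, x2, x3 = x
    # Same entries as df, written as a dense float array
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 5.0],
                     [0.0, 8*x2, -2.0],
                     [x3*np.cos(x1), 0.0, np.sin(x1)]])

x0 = np.array([3.0, 2.0, 1.0])
h = 1e-6
# Column j of the finite-difference Jacobian steps along coordinate j (row j of the identity)
J_fd = np.column_stack([(f(x0 + h*e) - f(x0 - h*e)) / (2*h) for e in np.eye(3)])
print(np.allclose(J_fd, df_analytic(x0)))  # True
```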

In [65]:
rng = np.random.default_rng()
x = rng.random(size=3)
res = jacobian(f, x)
# jacobian returns a result object; the 4×3 Jacobian estimate is stored in res.df
res.df

### Jacobian in Kronecker Product notation

In [5]:
# Define variables
@variables a, b, c, d

X = [a b; c d]

2×2 Matrix{Num}:
 a  b
 c  d

In [6]:
X^2

2×2 Matrix{Num}:
 a^2 + b*c  a*b + b*d
 a*c + c*d  b*c + d^2

In [7]:
# jac(Y, X) is the Jacobian of vec(Y) with respect to vec(X); below we use Y = X^2

jac(Y, X) = Symbolics.jacobian(vec(Y), vec(X))

jac (generic function with 1 method)

In [8]:
# Taking the Jacobian by hand gives the same answer. Row k of J holds the partial
# derivatives of the k-th entry of vec(X^2) with respect to vec(X) = [a, c, b, d].
# vec stacks columns, so row 1 comes from the (1,1) entry of X^2, row 2 from the
# (2,1) entry, row 3 from the (1,2) entry, and so on.

J = jac(X^2, X)

4×4 Matrix{Num}:
 2a      b      c   0
  c  a + d      0   c
  b      0  a + d   b
  0      b      c  2d

In [9]:
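begin
    I2 = [1 0; 0 1]  # 2×2 identity
    # d(X^2) = dX*X + X*dX, with vec(dX*X) = (X' ⊗ I) vec(dX) and vec(X*dX) = (I ⊗ X) vec(dX)
    kron(I2, X) + kron(X', I2)
end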

4×4 Matrix{Num}:
 2a      b      c   0
  c  a + d      0   c
  b      0  a + d   b
  0      b      c  2d
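A numeric cross-check of the same identity, J = I ⊗ X + X^T ⊗ I, as a sketch in NumPy with a random 3×3 X (an arbitrary choice, to show the result is not special to 2×2). Julia's `vec` stacks columns, so the NumPy version flattens with `order="F"` to match; `vec`, `unvec`, and `square` are helpers defined here for illustration. Because X^2 is quadratic, the central difference recovers the Jacobian up to rounding.

```python
import numpy as np

def vec(M):
    # Column-major stacking, matching Julia's vec
    return M.flatten(order="F")

def unvec(v, shape):
    return v.reshape(shape, order="F")

def square(M):
    return M @ M

rng = np.random.default_rng(2)
n = 3
X = rng.random((n, n))
h = 1e-3

columns = []
for e in np.eye(n * n):
    E = unvec(e, X.shape)  # perturb a single entry of X
    columns.append((vec(square(X + h*E)) - vec(square(X - h*E))) / (2*h))
J_fd = np.column_stack(columns)

I = np.eye(n)
J_kron = np.kron(I, X) + np.kron(X.T, I)
print(np.allclose(J_fd, J_kron))  # True
```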

In [23]:
# Symbolic example: I ⊗ B is block diagonal, with one copy of B per block
SymB = [a b; c d]

kron(I2, SymB)

4×4 Matrix{Num}:
 a  b  0  0
 c  d  0  0
 0  0  a  b
 0  0  c  d

In [22]:
B = rand(2, 2)

2×2 Matrix{Float64}:
 0.797831  0.737042
 0.170769  0.511479

In [21]:
kron(I2, B)

4×4 Matrix{Float64}:
 0.750043  0.179112  0.0       0.0
 0.80159   0.841292  0.0       0.0
 0.0       0.0       0.750043  0.179112
 0.0       0.0       0.80159   0.841292
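Multiplying by the block-diagonal matrix I ⊗ B applies B to each column of the matrix whose columns are stacked in the vector, that is, (I ⊗ B) vec(C) = vec(BC). A NumPy sketch of that identity, assuming an arbitrary random 2×3 matrix C:

```python
import numpy as np

def vec(M):
    # Column-major stacking, matching Julia's vec
    return M.flatten(order="F")

rng = np.random.default_rng(3)
B = rng.random((2, 2))
C = rng.random((2, 3))
I = np.eye(C.shape[1])  # one copy of B per column of C

lhs = np.kron(I, B) @ vec(C)
rhs = vec(B @ C)
print(np.allclose(lhs, rhs))  # True
```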

In [20]:
# A ⊗ I applied to vec(C), where C is a 2×2 symbolic matrix
@variables c1, c2, c3, c4

A = [a b; c d]
C = [c1 c3; c2 c4]  # vec stacks columns, so vec(C) = [c1, c2, c3, c4]
vC = vec(C)
kron(A, I2) * vC

4-element Vector{Num}:
 a*c1 + b*c3
 a*c2 + b*c4
 c*c1 + c3*d
 c*c2 + c4*d
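The symbolic output matches vec(C A^T): for example, its first entry a*c1 + b*c3 is the (1,1) entry of C A^T. A numeric sketch of the identity (A ⊗ I) vec(C) = vec(C A^T), with random matrices standing in for the symbols (C is 3×2 here, an arbitrary choice):

```python
import numpy as np

def vec(M):
    # Column-major stacking, matching Julia's vec
    return M.flatten(order="F")

rng = np.random.default_rng(4)
A = rng.random((2, 2))
C = rng.random((3, 2))
I = np.eye(C.shape[0])  # identity sized to the rows of C

lhs = np.kron(A, I) @ vec(C)
rhs = vec(C @ A.T)
print(np.allclose(lhs, rhs))  # True
```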

### Quick examples of two properties of trace and *why* these hold

In [30]:
@variables e, f, g, h

B = [e f; g h]

2×2 Matrix{Num}:
 e  f
 g  h

In [32]:
tr(A*B)

a*e + b*g + c*f + d*h

In [33]:
A*B

2×2 Matrix{Num}:
 a*e + b*g  a*f + b*h
 c*e + d*g  c*f + d*h

In [35]:
B*A

2×2 Matrix{Num}:
 a*e + c*f  b*e + d*f
 a*g + c*h  b*g + d*h

In [36]:
tr(B*A)

a*e + b*g + c*f + d*h
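Why tr(AB) = tr(BA): write each trace as a double sum over entries and swap the order of summation. The argument works for any A of size m×n and B of size n×m; for the 2×2 matrices above, both sums are a*e + b*g + c*f + d*h.

```latex
\operatorname{tr}(AB) = \sum_{i=1}^{m} (AB)_{ii}
  = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ji}
  = \sum_{j=1}^{n} \sum_{i=1}^{m} B_{ji} A_{ij}
  = \sum_{j=1}^{n} (BA)_{jj}
  = \operatorname{tr}(BA)
```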

In [38]:
@variables i, j, k, l

C = [i j; k l]

2×2 Matrix{Num}:
 i  j
 k  l

In [39]:
tr(A*B*C)

(a*e + b*g)*i + (a*f + b*h)*k + (c*e + d*g)*j + (c*f + d*h)*l

In [43]:
tr(C*A*B)

(a*i + c*j)*e + (a*k + c*l)*f + (b*i + d*j)*g + (b*k + d*l)*h

In [40]:
tr(B*C*A)

a*(e*i + f*k) + b*(g*i + h*k) + c*(e*j + f*l) + d*(g*j + h*l)
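The cyclic property follows from the two-matrix rule by grouping, e.g. tr(ABC) = tr((AB)C) = tr(C(AB)) = tr(CAB). Only cyclic shifts are allowed, not arbitrary reorderings. A small NumPy sketch with hand-picked integer matrices (chosen for illustration so that the products do not commute):

```python
import numpy as np

# Small integer matrices chosen so that A, B, C do not commute
A = np.array([[1, 2], [0, 1]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[1, 0], [0, 2]])

print(np.trace(A @ B @ C), np.trace(C @ A @ B), np.trace(B @ C @ A))  # 2 2 2
print(np.trace(A @ C @ B))  # 4: ACB is not a cyclic shift of ABC
```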

### Show that the matrix dot product equals tr(A^T B)
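A NumPy sketch of the claim: the matrix dot product (the sum of the element-wise products of A and B) equals tr(A^T B), because the j-th diagonal entry of A^T B is the sum over i of A_ij B_ij, so summing the diagonal adds up every product A_ij B_ij. The random 3×4 matrices are an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.random((3, 4))
B = rng.random((3, 4))  # same shape as A

elementwise = np.sum(A * B)     # sum of element-wise products
via_trace = np.trace(A.T @ B)   # A^T B is 4×4; its diagonal sums the same products
flattened = np.vdot(A, B)       # vdot flattens both arrays before the dot product

print(np.isclose(elementwise, via_trace), np.isclose(elementwise, flattened))  # True True
```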


This confirms our intuition that summing the element-wise products of A and B gives tr(A^T B).

*Would be nice to explain more about the trace operator and some of its properties, like the cyclic property.* What is my hypothesis for why the trace operator appears so often?

For background, see this [Math StackExchange question on why the trace of a matrix is important](https://math.stackexchange.com/questions/4453933/why-is-the-trace-of-a-matrix-important).