In [1]:
# setup environment
import numpy as np
import numpy.linalg as la
import sympy

# 6 Transform

**NOTE**: I will not copy all the precise definitions here (please Google them if needed). The explanations below are in my own words for easy understanding.

## 6.1 Kernel and Range

**Transform** (or transformation) means a function that converts an input $x$ to an output $y$. Note that $x$ and $y$ are not necessarily single values, need not have the same dimension, and need not even be numbers.

**Domain** is the set of all possible inputs $x$.

**Codomain** is the set that the outputs $y$ are declared to belong to (e.g. all integers). The function is ***not necessarily*** able to produce every $y$ in the codomain.

---

Consider the `int()` function:

In [2]:
x = 3.24
y = int(x)
print(f'int({x}) = {y}')

int(3.24) = 3


Treating `int()` as a function of real numbers, its domain is the real numbers and its codomain is the integers.

---

The result of applying a function to an input is called an **image**, and the set of all images is called the **range**. The range is a ***subset*** of the codomain.

Simply put, the range is every $y$ that the function of interest can actually produce.

***3 types of relationships*** between domain, codomain and range:  
1. **Injective**: each $x$ points to one specific (no repetition) $y$.
    - Each $x$ must have one $y$.
    - Different $x$ have different $y$.
    - NOT necessarily each $y$ (in codomain) can find its $x$.
2. **Surjective**: range = codomain.
    - Each $y$ must have its $x$ (maybe more than one $x$)
3. **Bijective**: One-to-one correspondence between $x$ and $y$
    - Each $x$ has its one specific $y$
    - Each $y$ has its one specific $x$
    - **Bijective** = **Injective** AND **Surjective**
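
The three relationships above can be checked mechanically for a small finite function. The sketch below is only an illustration: `classify` and the mappings `f`, `g`, `h` are hypothetical names made up for this example, and a function is represented as a Python dict from $x$ to $y$.

```python
def classify(mapping, codomain):
    """Classify a finite function given as a dict {x: y} (hypothetical helper)."""
    images = list(mapping.values())
    injective = len(set(images)) == len(images)   # no y is reached twice
    surjective = set(images) == set(codomain)     # range equals codomain
    return injective, surjective

codomain = {'a', 'b', 'c'}
f = {1: 'a', 2: 'b'}                    # injective only: 'c' is never reached
g = {1: 'a', 2: 'b', 3: 'c', 4: 'c'}    # surjective only: 3 and 4 share 'c'
h = {1: 'a', 2: 'b', 3: 'c'}            # both, so bijective

for name, m in [('f', f), ('g', g), ('h', h)]:
    inj, sur = classify(m, codomain)
    print(f'{name}: injective={inj}, surjective={sur}, bijective={inj and sur}')
```
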

---

A transformation that preserves linear combinations (vector addition and scalar multiplication) is called a **linear transformation**.

In the context below, transformation means linear transformation.

For $T:V\to W$ ($T$ is the transformation, $V$ and $W$ are vector spaces) to be linear, the criteria below must hold:  
1. $T(v_1+v_2)=T(v_1)+T(v_2)$ for any vectors $v_1$ and $v_2$ in $V$
2. $T(\alpha v)=\alpha T(v)$ for any scalar $\alpha$ and any vector $v$ in $V$

---

The **kernel** of a linear transformation $T:V\to W$ is the set of all vectors in $V$ that $T$ maps to the zero vector of $W$.

How to understand **kernel**?  
- If $T$ is written as a matrix, the kernel is exactly the ***nullspace*** of that matrix.
- The kernel measures how many dimensions are lost in the transformation.
    - Consider $T(x,y,z)=(0,y,z)$. This transformation squeezes the whole $x$ axis into the single point 0.
    - Thus all the vectors lying on the $x$ axis form the kernel.
    - $ker(T)=Span\{<1,0,0>\}$
    - Dimensions: $\dim V=3$, $\dim range(T)=2$, $\dim ker(T)=1$
- ***dim kernel + dim range = dim domain*** (the rank-nullity theorem)
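
As a quick check of the example $T(x,y,z)=(0,y,z)$, the kernel can be computed with `sympy`. This is a minimal sketch: the variable names are my own, and the matrix is simply the standard matrix of that $T$.

```python
import sympy

# Standard matrix of T(x, y, z) = (0, y, z)
T = sympy.Matrix([[0, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

kernel_basis = T.nullspace()     # column vectors spanning ker(T)
range_basis = T.columnspace()    # column vectors spanning range(T)

print('kernel basis:', [list(v) for v in kernel_basis])
print('dim kernel =', len(kernel_basis))
print('dim range  =', len(range_basis))
print('dim domain =', T.cols)
```

The kernel dimension and the range dimension add up to the number of columns, which is the dimension of the domain.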

---

## 6.2 Linear transformation

Not every formula that combines the inputs using additions and multiplications by constants is a linear transformation. A key point is that a linear transformation must **NOT** move the zero vector.

Consider the transformation below. Is it linear?  
$$T(<x_1,x_2,x_3>)=<2x_1+x_3, -4x_2>$$  
We can rewrite the equation as below:  
$$\begin{bmatrix}
2 & 0 & 1 \\
0 & -4 & 0
\end{bmatrix} \begin{bmatrix}
x_1 \\
x_2 \\
x_3
\end{bmatrix} = \begin{bmatrix}
y_1 \\
y_2
\end{bmatrix}$$

In [3]:
A = sympy.Matrix([[2,0,1],
                  [0,-4,0]])
A.rref()

(Matrix([
 [1, 0, 1/2],
 [0, 1,   0]]),
 (0, 1))

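The rref of the standard matrix $A$ has two pivot columns and one free variable ($x_3$), so the columns of $A$ are linearly dependent. Because $T(x)=Ax$ is a matrix-vector product, $T$ is a linear transformation. We can also check the two criteria numerically: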

In [4]:
v1 = np.array([3,6,9])
v2 = np.array([2,5,8])
c = 6

def trans1(x):
    # T(x) = (2*x1 + x3, -4*x2)
    y1 = 2*x[0] + x[2]
    y2 = -4 * x[1]
    return np.array([y1, y2])

print('T(v1) + T(v2) = T(v1 + v2) ?')
print(np.equal(trans1(v1+v2), trans1(v1)+trans1(v2)))

print('\nc*T(v1) = T(c*v1) ?')
print(np.equal(c*trans1(v1), trans1(c*v1)))

T(v1) + T(v2) = T(v1 + v2) ?
[ True  True]

c*T(v1) = T(c*v1) ?
[ True  True]


This transformation passes both checks for these sample vectors, as expected for any transformation of the form $T(x)=Ax$.

In [5]:
trans1([0,0,0])

array([0, 0])

Note that the zero vector is still mapped to the zero vector, although the dimension of the space has changed.

---

Test if the equation below is a linear transformation:  
$$T(<x_1,x_2,x_3>)=<4x_1+2x_2,0,x_1+3x_3-2>$$  
We can rewrite it in the form below:  
$$\begin{bmatrix}
4 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 \\
1 & 0 & 3 & -2
\end{bmatrix} \begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
1
\end{bmatrix} = \begin{bmatrix}
y_1 \\
y_2 \\
y_3
\end{bmatrix}$$

In [6]:
A = sympy.Matrix([[4,2,0,0],
                  [0,0,0,0],
                  [1,0,3,-2]])
A.rref()

(Matrix([
 [1, 0,  3, -2],
 [0, 1, -6,  4],
 [0, 0,  0,  0]]),
 (0, 1))

The two leading variables depend not only on the free variable $x_3$, but on a constant as well:  
$$\begin{align*}
x_1 &= -3x_3+2\\
x_2 &= 6x_3-4
\end{align*}$$

In [7]:
def trans2(x):
    # T(x) = (4*x1 + 2*x2, 0, x1 + 3*x3 - 2)
    y1 = 4*x[0]+2*x[1]
    y2 = 0
    y3 = x[0]+3*x[2]-2
    return np.array([y1,y2,y3])

print('T(v1) + T(v2) = T(v1 + v2) ?')
print(np.equal(trans2(v1+v2), trans2(v1)+trans2(v2)))

print('\nc*T(v1) = T(c*v1) ?')
print(np.equal(c*trans2(v1), trans2(c*v1)))

T(v1) + T(v2) = T(v1 + v2) ?
[ True  True False]

c*T(v1) = T(c*v1) ?
[ True  True False]


Because of the constant term $-2$, this transformation is **NOT** a linear transformation.

In [8]:
trans2([0,0,0])

array([ 0,  0, -2])

Note the zero vector is ***shifted*** by this transformation.
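
One way to see where the failure comes from is to remove the constant shift $T(0)$ and test again. The sketch below redefines `trans2` so that it is self-contained; `offset` and `linear_part` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def trans2(x):
    # same map as above: T(x) = (4*x1 + 2*x2, 0, x1 + 3*x3 - 2)
    return np.array([4*x[0] + 2*x[1], 0, x[0] + 3*x[2] - 2])

offset = trans2(np.zeros(3))      # T(0) = (0, 0, -2), the constant shift

def linear_part(x):
    # hypothetical helper: subtract the shift so the zero vector stays put
    return trans2(x) - offset

v1 = np.array([3, 6, 9])
v2 = np.array([2, 5, 8])
c = 6

print(np.array_equal(linear_part(v1 + v2), linear_part(v1) + linear_part(v2)))
print(np.array_equal(linear_part(c * v1), c * linear_part(v1)))
```

With the shift removed, both checks pass for these sample vectors. In other words, this $T$ is a linear transformation plus a fixed translation.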

### 6.2.1 Special Linear Transform
#### 6.2.1.1 Linear transformation in the same dimension

If a square matrix $A$ is invertible, then the linear transformation it defines is bijective, i.e. a one-to-one mapping between vectors in the same space:  
$$Ax=y\to A^{-1} y=x$$
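
A small round trip illustrates this. The matrix and vector below are arbitrary values chosen for the example (not from the original notes), and `np.linalg.solve` is used to recover $x$ from $y$ without forming $A^{-1}$ explicitly.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])          # illustrative matrix, det = 5, so invertible
x = np.array([1., -2.])

y = A @ x                         # forward transformation
x_back = np.linalg.solve(A, y)    # solve A x = y for x

print('y      =', y)
print('x_back =', x_back)
print(np.allclose(x_back, x))
```
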

#### 6.2.1.2 Rotate a certain angle

$$R_\theta = \begin{bmatrix}
\cos(\theta) & -\sin(\theta) \\
\sin(\theta) & \cos(\theta)
\end{bmatrix}$$

In 2D space, the matrix above rotates vectors counterclockwise by the angle $\theta$.

In [9]:
# rotate vector b counterclockwise by 45 deg
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
b = np.array([[3], [2]])
print(np.dot(R, b))

[[0.70710678]
 [3.53553391]]
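
A few properties of the rotation matrix can be checked numerically. This is a sketch only; `rotation` is a hypothetical helper that wraps the matrix $R_\theta$ defined above.

```python
import numpy as np

def rotation(theta):
    """2D counterclockwise rotation matrix for an angle in radians."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

b = np.array([3.0, 2.0])
R45 = rotation(np.deg2rad(45))

# Rotation keeps the length of the vector
print(np.isclose(np.linalg.norm(R45 @ b), np.linalg.norm(b)))
# Rotating by 45 degrees twice equals rotating by 90 degrees once
print(np.allclose(R45 @ R45, rotation(np.deg2rad(90))))
# The inverse of a rotation matrix is its transpose
print(np.allclose(R45.T @ R45, np.eye(2)))
```
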


## 6.3 Orthogonal set and projection
### 6.3.1 Orthogonal set

In a set of vectors, if every pair of distinct vectors is perpendicular, then it is an **orthogonal set**:  
$$\{u_1,u_2,\dots ,u_n\}\subset R^n\\
u_i\cdot u_j=0, i\neq j$$

It is easy to prove that an orthogonal set of $n$ nonzero vectors in $R^n$ is linearly independent and therefore forms a basis of $R^n$.

This is easy to picture. In 3D space, the unit vectors along the (x, y, z) axes naturally form an orthogonal set. Any other orthogonal basis can be obtained from them by rotating (or reflecting) the space and rescaling each vector.
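
Orthogonality of a set can be tested by looking at all pairwise dot products at once. In the sketch below, `is_orthogonal_set` is a hypothetical helper and the three vectors are illustrative values chosen for this example.

```python
import numpy as np

def is_orthogonal_set(vectors, tol=1e-12):
    """True if every pair of distinct vectors has a dot product of ~0."""
    V = np.column_stack(vectors).astype(float)
    gram = V.T @ V                                # entry (i, j) is u_i . u_j
    off_diag = gram - np.diag(np.diag(gram))      # drop the u_i . u_i terms
    return bool(np.all(np.abs(off_diag) < tol))

u1 = np.array([3, 1, 1])
u2 = np.array([-1, 2, 1])
u3 = np.array([-1, -4, 7])

print(is_orthogonal_set([u1, u2, u3]))
# Rank 3 means the three vectors are linearly independent, so they span R^3
print(np.linalg.matrix_rank(np.column_stack([u1, u2, u3])))
```
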

### 6.3.2 Orthogonal Projection

Consider 2 vectors $u$ and $z$. We can split $z$ into 2 other vectors:  
$$z=\hat u + v$$  
- $\hat u$ is aligned with $u$
- $v$ is perpendicular to $u$

This process is called the **orthogonal projection** of $z$ onto $u$.

The calculation can be done with the equations below:  
$$\begin{align*}
\hat u&=\frac{z\cdot u}{\left\| u \right\|^2}\cdot u = \frac{z\cdot u}{u\cdot u}\cdot u \\
v &= z - \hat u = z-\frac{z\cdot u}{u\cdot u}\cdot u
\end{align*}$$

In [10]:
z = np.array([7,6])
u = np.array([4,2])
zu = np.dot(z.T, u)
uu = np.dot(u.T, u)
hat_u = zu/uu*u
v = z - hat_u

print('    z =', z)
print('    u =', u)
print('hat_u =', hat_u)
print('    v =', v)

    z = [7 6]
    u = [4 2]
hat_u = [8. 4.]
    v = [-1.  2.]
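
The two defining properties of the split can be verified directly: $v$ must be perpendicular to $u$, and the two parts must add back up to $z$. A minimal sketch, where `project` is a hypothetical helper wrapping the formula above:

```python
import numpy as np

def project(z, u):
    """Split z into a part along u and a part perpendicular to u."""
    hat_u = (z @ u) / (u @ u) * u
    return hat_u, z - hat_u

z = np.array([7, 6])
u = np.array([4, 2])
hat_u, v = project(z, u)

print(np.isclose(v @ u, 0))          # v is perpendicular to u
print(np.allclose(hat_u + v, z))     # the two parts rebuild z
```
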


### 6.3.3 Orthonormal

An orthogonal set is **orthonormal** if all the vectors are unit vectors (length of 1).

Simple example: the unit vectors along the axes x, y, z, etc. form an orthonormal set.

A square matrix $U$ whose columns form an orthonormal set (usually called an **orthogonal matrix**) has an interesting feature:  
$$U^\text T \cdot U=I \Longleftrightarrow U^\text T = U^{-1}$$

It is easy to prove (not shown here). We can use this feature to check whether a matrix has orthonormal columns.

A transformation by such a matrix simply rotates (or reflects) the vectors in the space, preserving their lengths and the angles between them:  
- $\left\| Ux \right\|=\left\| x \right\|$ (preserve size)
- $(Ux)\cdot (Uy)=x\cdot y$ (preserve angle between $x$ and $y$)
- $x\cdot y=0 \to (Ux)\cdot (Uy)=0$ (preserve orthogonal relationship)
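
These properties can be checked on a concrete matrix. The sketch below builds a matrix with orthonormal columns from the QR decomposition of a random matrix; the random seed and the vectors `x`, `y` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# The Q factor of a QR decomposition has orthonormal columns
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x = rng.normal(size=3)
y = rng.normal(size=3)

print(np.allclose(U.T @ U, np.eye(3)))                        # U^T U = I
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # length preserved
print(np.isclose((U @ x) @ (U @ y), x @ y))                   # dot product preserved
```
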

### 6.3.4 Gram-Schmidt Process

The **Gram-Schmidt process** is an algorithm that generates an orthogonal or orthonormal basis for a nonzero subspace of $R^n$.  
$$W=Span\{x_1, x_2, \dots , x_p\} \to \{v_1, v_2, \dots , v_p\}$$  
where $\{x_1, \dots , x_p\}$ is a basis of the subspace $W$ and $\{v_1, \dots , v_p\}$ is the orthogonal basis we want to calculate.  
1. $v_1 = x_1$
2. $v_2 = x_2 - \frac{x_2\cdot v_1}{v_1\cdot v_1}v_1$
    - Project $x_2$ orthogonally onto $v_1$, keep only the part perpendicular to $v_1$ and assign it as $v_2$
3. $v_3 = x_3 - \frac{x_3\cdot v_1}{v_1\cdot v_1}v_1 - \frac{x_3\cdot v_2}{v_2\cdot v_2}v_2$
    - Project $x_3$ orthogonally onto $v_1$ and $v_2$, keep only the part perpendicular to both and assign it as $v_3$
4. Continue this way to find all $v_k$ up to $v_p$
5. If you want an orthonormal basis, divide each $v_k$ by its length $\left\| v_k \right\|$

---

Calculate orthogonal basis vectors from the basis set $\{x_1,x_2,x_3\}$ of $W$, which is a subspace of $R^4$.  
$$x_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, 
x_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, 
x_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$$

In [11]:
def orthoCoefS(x, y):
    """Coefficient of x orthogonal projection on y"""
    return np.dot(x.T, y) / np.dot(y.T, y)

x1 = np.array([1,1,1,1]).reshape(-1,1)
x2 = np.array([0,1,1,1]).reshape(-1,1)
x3 = np.array([0,0,1,1]).reshape(-1,1)

v1 = x1
v2 = x2 - orthoCoefS(x2, v1)*v1
v3 = x3 - orthoCoefS(x3, v1)*v1 - orthoCoefS(x3, v2)*v2

print(np.hstack([v1, v2, v3]))

[[ 1.         -0.75        0.        ]
 [ 1.          0.25       -0.66666667]
 [ 1.          0.25        0.33333333]
 [ 1.          0.25        0.33333333]]
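
The same steps can be wrapped in a loop that handles any number of basis vectors. The sketch below is one possible way to write it, not the only one: `gram_schmidt` is a hypothetical helper, it assumes the columns of `X` are linearly independent, and it follows the classical formula from the steps above.

```python
import numpy as np

def gram_schmidt(X, normalize=False):
    """Orthogonalize the columns of X (assumed linearly independent)."""
    X = np.asarray(X, dtype=float)
    V = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k].copy()
        for j in range(k):
            # subtract the projection of x_k onto each earlier v_j
            v -= (X[:, k] @ V[:, j]) / (V[:, j] @ V[:, j]) * V[:, j]
        V[:, k] = v
    if normalize:
        V = V / np.linalg.norm(V, axis=0)   # rescale every column to length 1
    return V

# Columns are x1, x2, x3 from the example above
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]])
V = gram_schmidt(X)
Q = gram_schmidt(X, normalize=True)

print(V)
print(np.allclose(Q.T @ Q, np.eye(3)))   # orthonormal columns give Q^T Q = I
```
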


## 6.4 Similarity transformation

For 2 square matrices $A$ and $B$ of the same dimension, if there exists an invertible matrix $P$ satisfying the criterion below, then $A$ and $B$ are said to be **similar**, and the transformation below is called a **similarity transformation**.  
$$A=PBP^{-1}\Longleftrightarrow P^{-1}AP=B$$

How to understand similarity transformation?  
- The 3Blue1Brown (3b1b) video provides a great explanation; please watch it if you haven't yet.
- Simply put, $A$ and $B$ are the same transformation expressed in different bases. $P$ translates coordinates from the $B$ basis to the $A$ basis, and $P^{-1}$ translates them from the $A$ basis back to the $B$ basis.
- Since $A$ and $B$ do the same thing in different bases, of course they are similar.

If you are still confused, try to understand it with the example below:  
- Define:
    - $A$: a linear algebra student who only reads **English**.
    - $B$: another student who only reads **Chinese**.
    - $P$: translate **Chinese to English**.
    - $P^{-1}$: translate **English back to Chinese**.
    - $x$: a linear algebra problem in ***English***
- Can you see why $A=PBP^{-1}$ or $A\cdot x=PBP^{-1}\cdot x$?
    - $A\cdot x$: student $A$ solves problem $x$
    - $PBP^{-1}\cdot x$: student $B$ solves problem $x$, but because $B$ cannot directly read $x$ in English, this process takes 3 steps:
        1. $P^{-1}\cdot x$: translate $x$ from English to Chinese
        2. $BP^{-1}\cdot x$: student $B$ solves translated $x$ in Chinese
        3. $PBP^{-1}\cdot x$: translate $B$'s solution back to English
    - Both $A\cdot x$ and $PBP^{-1}\cdot x$ show same final results: getting problem $x$ solved in English. So $A\cdot x=PBP^{-1}\cdot x$, or $A=PBP^{-1}$.

To test if you have really digested this concept, think about this question:  
- If $A$ and $B$ are similar, then they have same rank and eigenvalues.
- Is above statement correct? Why?
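
After thinking it through, one way to sanity-check your answer is to build a similar pair and compare. This is only a numerical illustration on one example, not a proof; the matrices `B` and `P` are arbitrary values chosen for this sketch.

```python
import sympy

B = sympy.Matrix([[2, 1],
                  [0, 3]])        # illustrative matrix
P = sympy.Matrix([[1, 1],
                  [1, 2]])        # illustrative invertible matrix, det = 1
A = P * B * P.inv()               # A is similar to B by construction

print('A =', A)
print('rank :', A.rank(), B.rank())
print('trace:', A.trace(), B.trace())
print('eigenvalues:', A.eigenvals(), B.eigenvals())
```
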

### 6.4.1 Diagonalization

Transforming a matrix $A$ into a diagonal matrix $D$ via a similarity transformation is called **diagonalization** of $A$:  
$$D=P^{-1}AP$$

Why do we want to find the diagonal form of $A$?  
1. Many properties of a diagonal matrix are very easy to read off:
    - **Rank**: number of nonzero diagonal elements.
    - **Trace**: sum of diagonal elements.
    - **Eigenvalues**: the diagonal elements themselves (zeros included).
    - ... (many others)
2. You also know (or you don't?) that similar matrices share the same:
    - **Rank**, **trace**, **eigenvalues**, ...
3. $A^k$ is also similar to $D^k$ (since $A^k=PD^kP^{-1}$), and a power of $D$ is super easy to calculate:  
$$D^k=\begin{bmatrix}
{d_1}^k & 0 & \dots & 0 \\
0 & {d_2}^k & \cdots  & 0 \\
\vdots  & \vdots  & \ddots  & \vdots  \\
0 & 0 & \dots & {d_n}^k
\end{bmatrix}$$

Now you know why it is a good idea to diagonalize $A$.
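
The power shortcut can be checked numerically. In the sketch below the matrix `A` and the exponent `k` are arbitrary illustrative values; `np.linalg.eig` supplies the eigenvalues and a matrix `P` whose columns are eigenvectors, which works here because this `A` is diagonalizable.

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])               # illustrative matrix, eigenvalues 2 and 5
eigvals, P = np.linalg.eig(A)          # columns of P are eigenvectors of A
k = 5

D_k = np.diag(eigvals ** k)            # D^k: raise each diagonal entry to the power k
A_k = P @ D_k @ np.linalg.inv(P)       # A^k = P D^k P^{-1}

print(np.allclose(A_k, np.linalg.matrix_power(A, k)))
```
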

Now the question is how to diagonalize a matrix. In Python you can do it easily using `sympy`:

In [12]:
M = sympy.Matrix([[3,-2,4,-2],
                  [5,3,-3,-2],
                  [5,-2,2,-2],
                  [5,-2,-3,3]])
M.diagonalize() # return P, D

(Matrix([
 [0, 1, 1,  0],
 [1, 1, 1, -1],
 [1, 1, 1,  0],
 [1, 1, 0,  1]]),
 Matrix([
 [-2, 0, 0, 0],
 [ 0, 3, 0, 0],
 [ 0, 0, 5, 0],
 [ 0, 0, 0, 5]]))
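
It is worth confirming that the returned pair really satisfies $D=P^{-1}MP$. A short check, repeating the matrix so the snippet runs on its own:

```python
import sympy

M = sympy.Matrix([[3, -2,  4, -2],
                  [5,  3, -3, -2],
                  [5, -2,  2, -2],
                  [5, -2, -3,  3]])
P, D = M.diagonalize()

print(P.inv() * M * P == D)      # D = P^{-1} M P
print(P * D * P.inv() == M)      # equivalently M = P D P^{-1}
print(M.eigenvals())             # eigenvalue -> multiplicity
```

The diagonal of $D$ lists the eigenvalues of $M$ (repeated according to multiplicity), and the columns of $P$ are the corresponding eigenvectors.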

If you are interested in how to do it by hand, please read this [step-by-step explanation](https://yutsumura.com/how-to-diagonalize-a-matrix-step-by-step-explanation/).