# Matrix Properties

## Frobenius Norm

To calculate the Frobenius norm of a matrix $\boldsymbol{X}$, we take the square root of the sum of the squares of its elements:

$$\|\boldsymbol{X}\|_F = \sqrt{\sum_{i,\ j}X_{i,\ j}^2}$$

- It measures the "size" of the matrix in terms of Euclidean distance (similar to the $L^2$ norm for vectors)
    - Equivalently, it is the $L^2$ norm of the vector obtained by flattening all of the matrix's elements into a single vector
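As a minimal sketch of the formula above, the norm can be computed by hand and compared with NumPy's built-in function. The variable names (`frob_manual`, `frob_from_cols`, `frob_builtin`) are chosen here for illustration only.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Direct translation of the formula: square every element, sum, take the root.
frob_manual = np.sqrt(np.sum(X ** 2))

# Same value via the column vectors: take the L2 norm of each column,
# then the L2 norm of those column norms.
col_norms = np.linalg.norm(X, axis=0)
frob_from_cols = np.sqrt(np.sum(col_norms ** 2))

# NumPy's built-in Frobenius norm for comparison.
frob_builtin = np.linalg.norm(X, ord="fro")

print(np.isclose(frob_manual, frob_builtin))     # True
print(np.isclose(frob_from_cols, frob_builtin))  # True
```

All three values equal $\sqrt{30} \approx 5.477$ for this matrix.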

### Frobenius Norm with Python

In [52]:
import numpy as np
import torch
import tensorflow as tf

Note that with PyTorch and TensorFlow, the `.norm` function requires the data type of the tensor to be a float.

In [53]:
X = np.array([[1,2],[3,4]])
X_pt = torch.tensor(X, dtype=torch.float32)
X_tf = tf.Variable(X, dtype=tf.float32)

In [54]:
np.linalg.norm(X)

5.477225575051661

In [55]:
torch.norm(X_pt)

tensor(5.4772)

In [56]:
tf.norm(X_tf)

<tf.Tensor: shape=(), dtype=float32, numpy=5.477226>

## Matrix Multiplication

Matrix multiplication is a fundamental operation in linear algebra. It involves multiplying two matrices together to produce a new matrix.

To perform a matrix multiplication between $\boldsymbol{A}$ and $\boldsymbol{B}$, the number of *columns* in $\boldsymbol{A}$  must match the number of *rows* in $\boldsymbol{B}$.

The resulting matrix will have the same number of *rows* as $\boldsymbol{A}$ and the same number of columns as $\boldsymbol{B}$.

For example, if we have two matrices $\boldsymbol{A}$ with shape $(m, n)$ and $\boldsymbol{B}$ with shape $(n, p)$, the resulting matrix $\boldsymbol{C}$ will have shape $(m, p)$.
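The shape rule can be checked quickly in NumPy. This is only a sketch: the sizes `m`, `n`, `p` are arbitrary values picked for illustration, and the matrices are filled with ones so the result is easy to verify.

```python
import numpy as np

m, n, p = 3, 4, 2          # arbitrary sizes chosen for illustration
A = np.ones((m, n))        # shape (m, n)
B = np.ones((n, p))        # shape (n, p)

C = A @ B                  # inner dimensions (n and n) match
print(C.shape)             # (3, 2)

# Each entry of C sums n products of ones, so every entry equals n.
print(C[0, 0])             # 4.0
```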

### Matrix Multiplication (Matrix by Vector)

As an example, let's multiply a matrix $\boldsymbol{A}$ by a vector $\boldsymbol{B}$:

$$\boldsymbol{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \boldsymbol{B} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$$
$$\boldsymbol{A}\boldsymbol{B} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 1*5 + 2*6 \\ 3*5 + 4*6 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$$
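The worked example above can be reproduced row by row. In this sketch each output entry is computed as the dot product of one row of $\boldsymbol{A}$ with $\boldsymbol{B}$; the name `row_by_row` is introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([5, 6])

# Each output entry is the dot product of one row of A with B.
row_by_row = np.array([np.dot(row, B) for row in A])

print(row_by_row)   # [17 39]
print(A @ B)        # [17 39], same result from the built-in operator
```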

### Matrix Multiplication (Matrix by Vector) using Python

In [57]:
A = np.array([[3, 4], [5, 6], [7, 8]])
b = np.array([1, 2])
A_pt = torch.tensor([[3, 4], [5, 6], [7, 8]])
b_pt = torch.tensor([1, 2])
A_tf = tf.Variable([[3, 4], [5, 6], [7, 8]])
b_tf = tf.Variable([1, 2])

Even though the `.dot` function is used for dot products, which are between vectors, it can also be used for matrix multiplication in NumPy:

In [58]:
np.dot(A, b)

array([11, 17, 23])

In [59]:
torch.matmul(A_pt, b_pt)

tensor([11, 17, 23])

In [60]:
tf.linalg.matvec(A_tf, b_tf)

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([11, 17, 23])>

### Matrix Multiplication (Matrix by Matrix)

In [61]:
C = np.array([[1, 9], [2, 0]])
C_pt = torch.tensor([[1, 9],[2, 0]])
C_tf = tf.Variable([[1, 9], [2, 0]])

In [62]:
np.dot(A, C)

array([[11, 27],
       [17, 45],
       [23, 63]])

Matrix multiplication is not commutative: the order of the matrices matters, so in general $\boldsymbol{A}\boldsymbol{C} \neq \boldsymbol{C}\boldsymbol{A}$. With the matrices above, the reversed product is not even defined.

`np.dot(C, A)` will throw an error because the number of columns in $\boldsymbol{C}$ (2) does not match the number of rows in $\boldsymbol{A}$ (3):

`ValueError: shapes (2,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)`
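Even when both products are defined, they usually differ. The following sketch uses two small square matrices, `P` and `Q`, which are hypothetical and not part of the examples above.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])   # swaps columns (on the right) or rows (on the left)

PQ = P @ Q   # [[2, 1], [4, 3]]: the columns of P are swapped
QP = Q @ P   # [[3, 4], [1, 2]]: the rows of P are swapped

print(np.array_equal(PQ, QP))   # False
```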

In [63]:
torch.matmul(A_pt, C_pt)

tensor([[11, 27],
        [17, 45],
        [23, 63]])

In [64]:
tf.matmul(A_tf, C_tf)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[11, 27],
       [17, 45],
       [23, 63]])>

## Symmetric Matrices

A special matrix case with the following properties:
- Square matrix ($m = n$)
- $\boldsymbol{X} = \boldsymbol{X}^T$

In [65]:
X_symm = np.array([[0, 1, 2], [1, 7, 8], [2, 8, 9]])
X_symm

array([[0, 1, 2],
       [1, 7, 8],
       [2, 8, 9]])

In [66]:
X_symm.T

array([[0, 1, 2],
       [1, 7, 8],
       [2, 8, 9]])
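Rather than comparing the two outputs by eye, the check can be automated. The helper `is_symmetric` below is a hypothetical function written for this sketch, not a NumPy API. It uses exact comparison, which suits integer matrices; for floating-point data `np.allclose` would likely be a better choice.

```python
import numpy as np

def is_symmetric(matrix):
    """Return True if `matrix` is square and equal to its transpose."""
    matrix = np.asarray(matrix)
    # A symmetric matrix must be two-dimensional and square.
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False
    return bool(np.array_equal(matrix, matrix.T))

print(is_symmetric([[0, 1, 2], [1, 7, 8], [2, 8, 9]]))  # True
print(is_symmetric([[1, 2], [3, 4]]))                   # False
print(is_symmetric([[1, 2, 3], [4, 5, 6]]))             # False (not square)
```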

### Identity Matrices

A special case of a symmetric matrix:
- Every element along the main diagonal is 1
- All other elements are 0
- Notation: $\boldsymbol{I}_n$ where $n$ is the number of rows and columns
- $n$-length vector $\boldsymbol{v}$ will remain unchanged when multiplied by $\boldsymbol{I}_n$

In [67]:
I = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
I

tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]])

In [68]:
x_pt = torch.tensor([25, 2, 5])
x_pt

tensor([25,  2,  5])

In [69]:
torch.matmul(I, x_pt)

tensor([25,  2,  5])
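Identity matrices do not need to be typed out by hand. As a sketch, NumPy's `np.eye` builds one of any size (PyTorch and TensorFlow offer `torch.eye` and `tf.eye`). The names `I3`, `v`, and `M` are chosen here for illustration.

```python
import numpy as np

n = 3
I3 = np.eye(n, dtype=int)        # 3x3 identity matrix
v = np.array([25, 2, 5])
M = np.arange(9).reshape(3, 3)   # arbitrary 3x3 matrix

# Multiplying by the identity leaves vectors and matrices unchanged.
print(np.array_equal(I3 @ v, v))   # True
print(np.array_equal(I3 @ M, M))   # True
print(np.array_equal(M @ I3, M))   # True
```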

In [70]:
M_q = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
V_q = torch.tensor([[-1, 1, -2], [0, 1, 2]]).T # this is the same as V_q = torch.tensor([[-1, 0], [1, 1], [-2, 2]])

In [71]:
torch.matmul(I, V_q[:,0])

tensor([-1,  1, -2])

In [72]:
torch.matmul(M_q, V_q[:,0])

tensor([ -3,  -9, -15])

In [73]:
torch.matmul(M_q, V_q)

tensor([[ -3,   5],
        [ -9,  14],
        [-15,  23]])

## Matrix Inversion

Matrix inversion is a convenient approach for solving systems of linear equations.

- The matrix inverse of $\boldsymbol{X}$ is denoted by $\boldsymbol{X}^{-1}$
    - Satisfies the equation $\boldsymbol{X}^{-1}\boldsymbol{X} = \boldsymbol{I}_n$

As an example:

$$
\begin{aligned}
    y_1 &= a + b x_{1,1} + c x_{1,2} + \dots + m x_{1,m} \\
    y_2 &= a + b x_{2,1} + c x_{2,2} + \dots + m x_{2,m} \\
    &\ \,\vdots \\
    y_n &= a + b x_{n,1} + c x_{n,2} + \dots + m x_{n,m}
\end{aligned}
$$

This system can be written in matrix form as $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{w}$, where:

- the outcomes $\boldsymbol{y}$ are known
- the features (predictors) $\boldsymbol{X}$ are known
- the vector of weights $\boldsymbol{w}$ contains the unknowns, the parameters to be estimated

Viewed as a matrix equation:

$$
\begin{equation*}
\begin{bmatrix}
    y_1 \\
    y_2 \\
    \vdots \\
    y_n
\end{bmatrix}
=
\begin{bmatrix}
    1 & x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
    1 & x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\
    \vdots & \vdots & \vdots & \hspace{2em}  & \vdots \\
    1 & x_{n,1} & x_{n,2} & \cdots & x_{n,m}
\end{bmatrix}
\begin{bmatrix}
    a \\
    b \\
    c \\
    \vdots \\
    m \vphantom{\vdots}  % Aligns "m" vertically with matrix entries
\end{bmatrix}
\end{equation*}
$$

Assuming $\boldsymbol{X}^{-1}$ exists, matrix inversion can solve for the weights $\boldsymbol{w}$:

- $\boldsymbol{X}\boldsymbol{w} = \boldsymbol{y}$
- $\boldsymbol{X}^{-1}\boldsymbol{X}\boldsymbol{w}=\boldsymbol{X}^{-1}\boldsymbol{y}$
- $\boldsymbol{I}_n\boldsymbol{w}=\boldsymbol{X}^{-1}\boldsymbol{y}$
- $\boldsymbol{w}=\boldsymbol{X}^{-1}\boldsymbol{y}$

### Matrix Inversion with Python

In [74]:
X = np.array([[4, 2], [-5, -3]])
X

array([[ 4,  2],
       [-5, -3]])

In [75]:
Xinv = np.linalg.inv(X)
Xinv

array([[ 1.5,  1. ],
       [-2.5, -2. ]])

In [76]:
np.dot(Xinv, X)

array([[1.00000000e+00, 0.00000000e+00],
       [1.77635684e-15, 1.00000000e+00]])

In [77]:
y = np.array([4, -7])
y

array([ 4, -7])

In [78]:
w = np.dot(Xinv, y)
w

array([-1.,  4.])

In [79]:
np.dot(X, w)

array([ 4., -7.])
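As an aside, NumPy also provides `np.linalg.solve`, which solves $\boldsymbol{X}\boldsymbol{w} = \boldsymbol{y}$ directly without forming $\boldsymbol{X}^{-1}$ explicitly. This is a swapped-in alternative to the inversion approach shown here, sketched only for comparison; the names `w_inv` and `w_solve` are illustrative.

```python
import numpy as np

X = np.array([[4, 2], [-5, -3]])
y = np.array([4, -7])

# Approach used in this section: multiply y by the explicit inverse.
w_inv = np.linalg.inv(X) @ y

# Alternative: let NumPy solve the system directly.
w_solve = np.linalg.solve(X, y)

print(np.allclose(w_inv, w_solve))   # True
print(np.allclose(X @ w_solve, y))   # True: the solution reproduces y
```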

In [80]:
X_pt = torch.tensor(X, dtype=torch.float32)
Xinv_pt = torch.inverse(X_pt)
Xinv_pt

tensor([[ 1.5000,  1.0000],
        [-2.5000, -2.0000]])

In [81]:
X_tf = tf.Variable(X, dtype=tf.float32)
Xinv_tf = tf.linalg.inv(X_tf)
Xinv_tf

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 1.4999998,  0.9999998],
       [-2.4999995, -1.9999996]], dtype=float32)>

Matrix inversion can only be calculated if:

- All the columns are linearly independent
- The matrix is square
    - A square matrix avoids overdetermination ($n_{rows} > n_{col}$, i.e. $n_{equations} > n_{dimensions}$) as well as underdetermination ($n_{rows} < n_{col}$, i.e. $n_{equations} < n_{dimensions}$)


To check whether the columns of a matrix are linearly independent:

In [82]:
# import numpy as np

def is_lin_independent_np(matrix):
    # Input validation
    if not isinstance(matrix, np.ndarray):
        raise TypeError("Input must be a numpy array")

    if matrix.size == 0:
        return False  # Empty matrix

    # Calculate rank and compare with number of columns
    rank = np.linalg.matrix_rank(matrix)
    return rank == matrix.shape[1]

In [83]:
# import torch

def is_lin_independent_torch(matrix):
    # Input validation
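    if isinstance(matrix, torch.Tensor):
        # matrix_rank needs a floating-point (or complex) dtype
        tensor = matrix if matrix.is_floating_point() else matrix.to(torch.float32)
    else:
        try:
            tensor = torch.tensor(matrix, dtype=torch.float32)
        except Exception as exc:
            raise TypeError("Input must be convertible to a PyTorch tensor") from exc

    # Handle empty matrix
    if tensor.numel() == 0:
        return False

    # Calculate rank and compare with number of columns
    rank = torch.linalg.matrix_rank(tensor)
    return rank.item() == tensor.shape[1]  # .item() gives a Python int, so the result is a plain bool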

In [84]:
# import tensorflow as tf

def is_lin_independent_tf(matrix):
    # Input validation
    if isinstance(matrix, tf.Tensor):
        # matrix_rank needs a floating-point dtype
        tensor = matrix if matrix.dtype.is_floating else tf.cast(matrix, tf.float32)
    else:
        try:
            tensor = tf.convert_to_tensor(matrix, dtype=tf.float32)
        except Exception as exc:
            raise TypeError("Input must be convertible to a TensorFlow tensor") from exc

    # Handle empty matrix
    if tf.size(tensor) == 0:
        return False

    # Calculate rank and compare with number of columns
    rank = tf.linalg.matrix_rank(tensor)
    return bool(rank == tensor.shape[1])  # plain bool (assumes eager execution)

In [85]:
X = np.array([[-4, 1], [-8, 2]])
X

array([[-4,  1],
       [-8,  2]])

In [86]:
is_lin_independent_np(X)

False
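Because the second column of this matrix is $-\frac{1}{4}$ times the first, the matrix is singular and cannot be inverted. The sketch below shows what NumPy does in that case; the flag `inverted` is introduced here for illustration.

```python
import numpy as np

X = np.array([[-4, 1], [-8, 2]])

# A singular matrix has a determinant of zero.
print(np.isclose(np.linalg.det(X), 0.0))   # True

try:
    np.linalg.inv(X)
    inverted = True
except np.linalg.LinAlgError as err:
    # NumPy refuses to invert a singular matrix.
    inverted = False
    print("Cannot invert:", err)
```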

## Diagonal Matrices

Characteristics:

- Nonzero elements along the main diagonal, all other elements are zero (the identity matrix is a special case of a diagonal matrix)
- If the matrix is square, denoted as $diag(\boldsymbol{x})$ where $\boldsymbol{x}$ is a vector of the main-diagonal elements
- Computationally efficient:
    - Multiplication: $diag(\boldsymbol{x})\boldsymbol{y} = \boldsymbol{x} \odot\boldsymbol{y}$
    - Inversion: $diag(\boldsymbol{x})^{-1} = diag([1/x_1, \dots, 1/x_n]^T)$
        - The vector $\boldsymbol{x}$ cannot contain zeros
- Can be non-square (height $h$, width $w$) and still be computationally efficient:
    - if $h > w$, then simply append $h - w$ zeros to the product
    - if $w > h$, the last $w - h$ elements of the vector are dropped from the product
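The properties above can be checked with a small NumPy sketch. The vectors `x` and `y` hold arbitrary values chosen for illustration, and the non-square matrices `D_tall` and `D_wide` are built by hand from the square case.

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0])  # main-diagonal elements; none may be zero
y = np.array([1.0, 2.0, 3.0])
D = np.diag(x)                 # square 3x3 diagonal matrix

# Multiplication: diag(x) y is the element-wise product of x and y.
print(np.allclose(D @ y, x * y))                         # True

# Inversion: take reciprocals of the diagonal instead of a general inverse.
print(np.allclose(np.linalg.inv(D), np.diag(1.0 / x)))   # True

# Taller than wide (h > w): the product gains h - w trailing zeros.
D_tall = np.vstack([D, np.zeros((1, 3))])                # shape (4, 3)
tall_product = D_tall @ y                                # [2, 8, 15, 0]

# Wider than tall (w > h): the last w - h elements of y are ignored.
D_wide = D[:2, :]                                        # shape (2, 3)
wide_product = D_wide @ y                                # [2, 8]
```

In practice the element-wise form `x * y` avoids building the full matrix at all, which is where the efficiency comes from.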



## Orthogonal Matrices

In orthogonal matrices, orthonormal vectors:
- Make up the columns of the matrix
- Make up the rows of the matrix

This means that $\boldsymbol{A}^T\boldsymbol{A} = \boldsymbol{A}\boldsymbol{A}^T = \boldsymbol{I}$.

Which also means: $\boldsymbol{A}^T = \boldsymbol{A}^{-1}\boldsymbol{I} = \boldsymbol{A}^{-1}$.

Calculating $\boldsymbol{A}^T$ is computationally cheap, and therefore so is calculating $\boldsymbol{A}^{-1}$ for an orthogonal matrix.

### Orthogonal Matrices with Python

In [87]:
I = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
I

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Demonstrating that any two rows are orthogonal to each other (`I[0]` selects a row; because $\boldsymbol{I}$ is symmetric, its columns are identical to its rows):

In [88]:
np.dot(I[0], I[2])

0

In [89]:
np.dot(I[1], I[2])

0

In [90]:
np.dot(I[0], I[1])

0

Demonstrating that each row (and therefore each column) has unit norm:

In [91]:
np.linalg.norm(I[0])

1.0

In [92]:
np.linalg.norm(I[1])

1.0

In [93]:
np.linalg.norm(I[2])

1.0

Demonstrating the orthogonality of the identity matrix:

In [94]:
np.dot(I.T, I)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Using PyTorch on another matrix:

In [95]:
K = torch.tensor([[2/3, 1/3, 2/3], [-2/3, 2/3, 1/3], [1/3, 2/3, -2/3]])
K

tensor([[ 0.6667,  0.3333,  0.6667],
        [-0.6667,  0.6667,  0.3333],
        [ 0.3333,  0.6667, -0.6667]])

In [96]:
torch.dot(K[0], K[1])

tensor(0.)

In [97]:
torch.dot(K[0], K[2])

tensor(0.)

In [98]:
torch.dot(K[1], K[2])

tensor(0.)

In [99]:
torch.norm(K[0])

tensor(1.)

In [100]:
torch.norm(K[1])

tensor(1.)

In [101]:
torch.norm(K[2])

tensor(1.)

Demonstrating the orthogonality of the matrix $\boldsymbol{K}$:

In [102]:
torch.matmul(K.T, K)

tensor([[1.0000, 0.0000, 0.0000],
        [0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 1.0000]])
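The property $\boldsymbol{A}^T = \boldsymbol{A}^{-1}$ can also be checked numerically. The sketch below rebuilds $\boldsymbol{K}$ in NumPy and compares its transpose with its explicitly computed inverse, using `np.allclose` to tolerate floating-point rounding.

```python
import numpy as np

K = np.array([[ 2/3, 1/3,  2/3],
              [-2/3, 2/3,  1/3],
              [ 1/3, 2/3, -2/3]])

# For an orthogonal matrix, the transpose is the inverse.
print(np.allclose(K.T, np.linalg.inv(K)))   # True

# Both orderings of the product give the identity matrix.
print(np.allclose(K.T @ K, np.eye(3)))      # True
print(np.allclose(K @ K.T, np.eye(3)))      # True
```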