# Lecture 2: Matrix Calculus & Probability Practice

### 1. Linear Algebra: Projections and Eigen-Analysis

Projecting a vector onto a subspace means finding the point in that subspace that is nearest (in Euclidean distance) to the vector.

Implementation: The Projection Matrix

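The projection of a vector $v$ onto the subspace spanned by the columns of a matrix $X$ is $u = Pv$, where the projection matrix is $P = X(X^TX)^{-1}X^T$. This formula requires the columns of $X$ to be linearly independent, so that $X^TX$ is invertible.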
 

In [7]:
import numpy as np

# Define a subspace (columns of X) and a vector v to project
X = np.array([[5.0]])  # 1x1 X: its column spans all of R^1, so P = [[1.]] and u equals v
v = np.array([-2.0])   # the vector to project

# Calculate the Projection Matrix: P = X @ inv(XT @ X) @ XT
XTX_inv = np.linalg.inv(X.T @ X)
P = X @ XTX_inv @ X.T

# The projected vector u
u = P @ v
print(f"Projected vector u: {u}")

Projected vector u: [-2.]
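The cell above works in one dimension, where projection leaves $v$ unchanged. For a genuine 2D plane in 3D space, here is a minimal sketch; the plane (spanned by the hypothetical columns $(1,0,1)$ and $(0,1,1)$) and the vector $v=(1,2,0)$ are illustrative choices, not from the lecture. It also checks three properties that follow from the formula: $P^2 = P$, $P = P^T$, and the residual $v - u$ is orthogonal to every column of $X$.

```python
import numpy as np

# Hypothetical plane in R^3 spanned by the two columns of X
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 2.0, 0.0])

# P = X (X^T X)^{-1} X^T; solve() computes (X^T X)^{-1} X^T without forming the inverse
P = X @ np.linalg.solve(X.T @ X, X.T)
u = P @ v          # nearest point to v inside the plane, approximately [0, 1, 1]
r = v - u          # residual, approximately [1, 1, -1]

print("u =", u)
print("P idempotent:", np.allclose(P @ P, P))            # projecting twice changes nothing
print("P symmetric:", np.allclose(P, P.T))
print("residual orthogonal:", np.allclose(X.T @ r, 0))   # r is perpendicular to the plane
```

Using `np.linalg.solve` instead of `np.linalg.inv` gives the same $P$ but is the usual numerically safer choice.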


Eigenvalues and Volume

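The absolute value of a matrix's determinant is the factor by which it scales volume: the volume of the output shape divided by the volume of the input shape. Its sign records whether the transformation flips orientation. Because the determinant equals the product of the eigenvalues, a matrix that is not full rank squashes volume to zero, which means at least one eigenvalue is zero.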


In [8]:
# Create a matrix A
A = np.array([[5, 7], [5, 6]])

# Calculate eigenvalues and determinant
eigenvalues = np.linalg.eigvals(A)
det_A = np.linalg.det(A)

print(f"Eigenvalues: {eigenvalues}")
print(f"Product of Eigenvalues: {np.prod(eigenvalues)}")
print(f"Determinant: {det_A}") # Should match the product (up to floating-point round-off)

Eigenvalues: [-0.43717104 11.43717104]
Product of Eigenvalues: -4.999999999999997
Determinant: -4.999999999999999
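To connect the determinant to volume directly, this sketch maps the unit square through the same $A$: the square becomes the parallelogram spanned by $A$'s columns, whose signed area is computed with the 2D cross product. The rank-deficient matrix `B` is a hypothetical example chosen so that its second column is twice the first.

```python
import numpy as np

A = np.array([[5, 7], [5, 6]], dtype=float)

# The unit square (spanned by e1, e2) maps to the parallelogram spanned by A's columns
a1, a2 = A[:, 0], A[:, 1]

# Signed area via the 2D cross product: 5*6 - 5*7 = -5
signed_area = a1[0] * a2[1] - a1[1] * a2[0]
print("signed area:", signed_area)       # area 5, negative sign = orientation flipped
print("det(A):", np.linalg.det(A))       # approximately -5, matching the signed area

# Hypothetical rank-deficient matrix: column 2 = 2 * column 1
B = np.array([[1, 2], [2, 4]], dtype=float)
print("rank(B):", np.linalg.matrix_rank(B))   # 1: the plane is squashed onto a line
print("eigenvalues of B:", np.linalg.eigvals(B))  # one is (numerically) zero, the other 5
print("det(B):", np.linalg.det(B))            # (numerically) zero
```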


### 2. Matrix Calculus and Convexity

Lecture 2 introduces the gradient as the direction of steepest ascent and the Hessian (the matrix of second partial derivatives) as a way to characterize the local curvature, or "shape", of a function.
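Both objects can be estimated numerically, which is a handy way to check hand-derived results. The sketch below uses central finite differences on a hypothetical function $f(x) = x_0^2 + 3x_1^2 + x_0x_1$ (not from the lecture); the helper names `numerical_gradient` and `numerical_hessian` and the step sizes are illustrative assumptions. It also checks the steepest-ascent claim: no unit direction gives a larger directional derivative $\nabla f \cdot d$ than $\|\nabla f\|$.

```python
import numpy as np

# Hypothetical test function (not from the lecture)
def f(x):
    return x[0]**2 + 3 * x[1]**2 + x[0] * x[1]

def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def numerical_hessian(f, x, h=1e-4):
    """Hessian estimated by central differences of the numerical gradient."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x)
        e[i] = h
        H[:, i] = (numerical_gradient(f, x + e) - numerical_gradient(f, x - e)) / (2 * h)
    return H

x0 = np.array([1.0, 2.0])
g = numerical_gradient(f, x0)   # analytic: (2*x0 + x1, 6*x1 + x0) = (4, 13)
H = numerical_hessian(f, x0)    # analytic: [[2, 1], [1, 6]]

# Steepest ascent: compare directional derivatives along random unit directions
rng = np.random.default_rng(0)
dirs = rng.normal(size=(1000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print("max directional derivative:", np.max(dirs @ g))
print("norm of gradient:          ", np.linalg.norm(g))   # never exceeded (Cauchy-Schwarz)
print("Hessian eigenvalues:", np.linalg.eigvalsh(H))      # both positive: bowl-shaped
```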

Quadratic Forms and Definiteness

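A symmetric matrix $A$ is positive definite (PD) if the quadratic form $x^TAx > 0$ for all non-zero $x$, or equivalently if all of its eigenvalues are positive. The function $f(x) = x^TAx$ is then bowl-shaped (strictly convex); more generally, a twice-differentiable function whose Hessian is PD everywhere is strictly convex.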

In [9]:
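# Classify a symmetric matrix by the signs of its eigenvalues.
# eigvalsh is intended for symmetric matrices and returns real eigenvalues;
# tol absorbs round-off so numerically zero eigenvalues are treated as zero.
def check_definiteness(matrix, tol=1e-10):
    vals = np.linalg.eigvalsh(matrix)
    if np.all(vals > tol): return "Positive Definite (Convex)"
    if np.all(vals >= -tol): return "Positive Semi-Definite"
    if np.all(vals < -tol): return "Negative Definite"
    if np.all(vals <= tol): return "Negative Semi-Definite"
    return "Indefinite (Saddle Points)"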

# Example: the 1x1 Hessian of the convex loss f(x) = 3x^2
A_convex = np.array([[6]])
print(f"Matrix A is: {check_definiteness(A_convex)}")

Matrix A is: Positive Definite (Convex)
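An eigenvalue test is not the only option. A common alternative, swapped in here as an assumption rather than the lecture's method, is to attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is PD; `np.linalg.cholesky` raises `LinAlgError` otherwise. The sketch also checks the definition $x^TAx > 0$ directly on random vectors. The helper `is_positive_definite` and both test matrices are hypothetical.

```python
import numpy as np

def is_positive_definite(A):
    """PD test via Cholesky; assumes A is symmetric, as in the definition above."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A_pd = np.array([[2.0, 1.0], [1.0, 6.0]])       # eigenvalues 4 +/- sqrt(5), both > 0
A_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])  # quadratic form x0^2 - x1^2

print(is_positive_definite(A_pd))       # True
print(is_positive_definite(A_saddle))   # False

# Evaluate x^T A x for each row x of a batch of random non-zero vectors
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
q_pd = np.einsum("ni,ij,nj->n", xs, A_pd, xs)
q_saddle = np.einsum("ni,ij,nj->n", xs, A_saddle, xs)
print(np.all(q_pd > 0))       # True: the PD form is positive everywhere sampled
print(np.any(q_saddle < 0))   # True: the saddle's form takes negative values too
```

A random-sample check can only refute definiteness, never prove it; the Cholesky or eigenvalue test is the reliable one.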


Common Gradient Identities

Several gradient identities come up repeatedly in machine learning:

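• $\nabla_x(b^Tx) = b$

• $\nabla_x(x^TAx) = (A + A^T)x$, which simplifies to $2Ax$ for symmetric $A$

• $\nabla_A \log|\det A| = A^{-T}$ for invertible $A$, which simplifies to $A^{-1}$ for symmetric $A$ (e.g., a covariance matrix)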
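Identities like these are easy to verify numerically. This sketch compares each one against a central-difference gradient; the helper `num_grad`, the random vectors, and the fixed non-symmetric matrix `A` are illustrative assumptions. Using a non-symmetric `A` shows why the general forms $(A + A^T)x$ and $A^{-T}$ are needed, while the symmetric `S` shows the simplified forms. `np.linalg.slogdet` returns `(sign, log|det|)`, so index `[1]` gives $\log|\det A|$.

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.normal(size=3)
x = rng.normal(size=3)
A = np.array([[3.0, 1.0, 0.0],
              [2.0, 4.0, 1.0],
              [0.0, 1.0, 5.0]])   # hypothetical non-symmetric, invertible (det = 47)
S = A @ A.T                       # symmetric positive definite

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of scalar f with respect to every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

def logabsdet(M):
    return np.linalg.slogdet(M)[1]   # log|det M|

# 1) grad_x (b^T x) = b
print(np.allclose(num_grad(lambda z: b @ z, x), b))
# 2) grad_x (x^T A x) = (A + A^T) x; it is 2 A x only when A is symmetric
print(np.allclose(num_grad(lambda z: z @ A @ z, x), (A + A.T) @ x, atol=1e-6))
print(np.allclose(num_grad(lambda z: z @ S @ z, x), 2 * S @ x, atol=1e-5))
# 3) grad_A log|det A| = A^{-T}; it is A^{-1} only when A is symmetric
print(np.allclose(num_grad(logabsdet, A), np.linalg.inv(A).T, atol=1e-5))
print(np.allclose(num_grad(logabsdet, S), np.linalg.inv(S), atol=1e-5))
```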

### 3. Probability: Monte Carlo Estimation

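The lecture concludes with expected value and the Law of Large Numbers. A Monte Carlo estimate approximates an expectation by averaging a function over random samples; the Law of Large Numbers guarantees that this average converges to the true expectation as the number of samples grows.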

Implementation: Law of Large Numbers

To estimate $E[g(x)]$, we draw samples $x_1, \dots, x_n$ from a density $p(x)$ and compute $\frac{1}{n}\sum_{i=1}^{n} g(x_i)$.

In [10]:
# Let g(x) = x^2 and x be sampled from a standard normal distribution
def g(x):
    return x**2

samples = np.random.normal(0, 1, 10000)
monte_carlo_estimate = np.mean(g(samples))

print(f"Monte Carlo Estimate of E[x^2]: {monte_carlo_estimate}")
# For N(0,1), E[x^2] is the variance + mean^2 = 1 + 0 = 1.

Monte Carlo Estimate of E[x^2]: 1.018694184596239
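The Law of Large Numbers says the estimate converges, but not how fast. This sketch repeats the estimate for growing sample sizes and reports the standard error of the mean (sample standard deviation divided by $\sqrt{n}$), which shrinks like $1/\sqrt{n}$. The seed and the particular sample sizes are arbitrary choices made for reproducibility; exact digits will vary with them.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator so the run is reproducible

def g(x):
    return x**2

true_value = 1.0   # E[x^2] for x ~ N(0, 1): variance 1 + mean^2 0
for n in [100, 10_000, 1_000_000]:
    samples = rng.standard_normal(n)
    values = g(samples)
    estimate = values.mean()
    # Standard error of the mean: sample std / sqrt(n)
    std_err = values.std(ddof=1) / np.sqrt(n)
    print(f"n={n:>9}: estimate={estimate:.4f}, "
          f"|error|={abs(estimate - true_value):.4f}, std_err={std_err:.4f}")
```

Each 100-fold increase in $n$ cuts the standard error by roughly a factor of 10, which is why Monte Carlo estimates get expensive when high precision is needed.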


Summary Analogy

To visualize these concepts, imagine you are shining a flashlight (the matrix/operator) at a soccer ball (the unit sphere), and the shape it casts is an ellipsoid. For a symmetric matrix, the eigenvectors point along the ellipsoid's axes, and the absolute values of the eigenvalues tell you how much the ball was stretched in those directions; for a general matrix, the singular vectors and singular values play this role. If the ellipsoid collapses into something flat, such as a disk or a line, the matrix has "squashed" a dimension: its determinant is zero, and you have lost the ability to "undo" the transformation (there is no inverse).
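A 2D version of the analogy (unit circle instead of a ball) can be checked numerically. In this sketch the symmetric matrix `A` and the singular matrix `B` are hypothetical examples: the longest and shortest radii of the image of the circle under `A` match its eigenvalues 4 and 2 (and its singular values), while `B` flattens the circle onto a line segment.

```python
import numpy as np

# Hypothetical symmetric matrix with eigenvalues 2 and 4
A = np.array([[3.0, 1.0], [1.0, 3.0]])

theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle, shape (2, 2000)
ellipse = A @ circle                                  # image of the circle under A

radii = np.linalg.norm(ellipse, axis=0)
print("semi-axes from the image:", radii.max(), radii.min())   # approximately 4 and 2
print("eigenvalues:", np.linalg.eigvalsh(A))                    # ascending, approximately 2 and 4
print("singular values:", np.linalg.svd(A, compute_uv=False))   # descending, approximately 4 and 2

# Hypothetical singular matrix: the circle collapses onto a line segment
B = np.array([[1.0, 2.0], [2.0, 4.0]])
print("smallest image radius:", np.linalg.norm(B @ circle, axis=0).min())  # close to 0
print("det(B):", np.linalg.det(B))   # (numerically) zero, so B has no inverse
```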