# Understanding Matrix Multiplication with Einstein Summation

This notebook explains how the Einstein summation convention (available in NumPy as `np.einsum`) expresses the multiplication of non-square matrices and makes the dimensions of the resulting matrix easy to read off.

**The Core Problem:** When multiplying two matrices, say **A** and **B**, the number of columns in the first matrix (**A**) must equal the number of rows in the second matrix (**B**). The resulting matrix, **C**, will then have the same number of rows as **A** and the same number of columns as **B**.

Let's formalize this:
- Matrix **A** has dimensions `(m, n)`: m rows, n columns.
- Matrix **B** has dimensions `(n, p)`: n rows, p columns.
- The resulting matrix **C** then has dimensions `(m, p)`.

Keeping track of these indices can sometimes be confusing. Einstein summation provides a clear and concise way to represent this operation.

## 1. Standard Matrix Multiplication (The Loop Method)

The standard formula for an element at row `i` and column `j` of the resulting matrix **C** is:

$$ C_{ij} = \sum_{k=0}^{n-1} A_{ik} B_{kj} $$

This formula tells us to:
1. Take the `i`-th row of matrix A.
2. Take the `j`-th column of matrix B.
3. Multiply them element-wise and sum the results.

Let's see this with a Python example.

In [5]:
import numpy as np

# Let's define two non-square matrices
# A has shape (3, 2) -> 3 rows, 2 columns
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])

# B has shape (2, 4) -> 2 rows, 4 columns
B = np.array([
    [7, 8, 9, 10],
    [11, 12, 13, 14]
])

# The inner dimensions match (A's columns = 2, B's rows = 2).
# The resulting matrix C should have shape (3, 4).

print(f"Shape of A: {A.shape}")
print(f"Shape of B: {B.shape}")

# Using numpy's built-in matrix multiplication operator `@`
C_numpy = A @ B

print("\nResult using NumPy's '@' operator:")
print(C_numpy)
print(f"Shape of resulting matrix C: {C_numpy.shape}")

Shape of A: (3, 2)
Shape of B: (2, 4)

Result using NumPy's '@' operator:
[[ 29  32  35  38]
 [ 65  72  79  86]
 [101 112 123 134]]
Shape of resulting matrix C: (3, 4)
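The formula for `C_ij` can also be spelled out as explicit loops, which is the "loop method" in this section's title. The following is a minimal sketch, not an optimized implementation: `matmul_loops` is a hypothetical helper name, and the matrices are redefined so the cell runs on its own.

```python
import numpy as np

# Same matrices as in the cell above
A = np.array([[1, 2], [3, 4], [5, 6]])            # shape (3, 2)
B = np.array([[7, 8, 9, 10], [11, 12, 13, 14]])   # shape (2, 4)

def matmul_loops(A, B):
    """Multiply two 2-D arrays with explicit loops (illustrative helper)."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")

    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension, summed over
                C[i, j] += A[i, k] * B[k, j]
    return C

C_loops = matmul_loops(A, B)
print(C_loops)
print(np.array_equal(C_loops, A @ B))
```

The innermost loop over `k` is the sum in the formula; the two outer loops only enumerate the positions of the `(m, p)` output.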


## 2. Introducing Einstein Summation Convention

The Einstein summation convention is a notation that simplifies these kinds of operations. The core rules are:

1.  **Repeated indices are summed over.** If an index appears more than once on the input side (here, in both operands), it's the index we sum across (like `k` in the formula above).
2.  **Indices that are not repeated define the output.** The indices that appear only once on the input side will be the indices of the resulting matrix.

The notation is expressed as a string: `"input_indices -> output_indices"`.

For our matrix multiplication, we can describe the matrices as:
- Matrix A has its two dimensions labeled `i` and `j`, written `ij`.
- Matrix B has its two dimensions labeled `j` and `k`, written `jk`.

The input side of the multiplication is therefore written as `ij,jk`.

- The index `j` is **repeated**, so it's the one we sum over. This corresponds to the inner dimension that must match. It plays the role that `k` played in the formula in Section 1; only the letter differs.
- The indices `i` and `k` are **not repeated**. These will form the dimensions of our output matrix.

So, the complete `einsum` string is: **`ij,jk->ik`**

This simple string tells us everything:
- We are multiplying a matrix `(i, j)` with a matrix `(j, k)`.
- The `j` dimension is summed over.
- The resulting matrix will have dimensions `(i, k)`.
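The two rules can be applied mechanically to the index string itself. The sketch below uses only the standard library and a hypothetical helper, `classify_indices`, to split the letters of `ij,jk` into summed and output indices. It is meant as an illustration of the rules as stated here, not as a description of how any library parses the string.

```python
from collections import Counter

def classify_indices(inputs):
    """Split the index letters of e.g. 'ij,jk' into summed and output indices.

    Illustrative helper: an index that occurs more than once is summed over,
    an index that occurs exactly once is kept in the output.
    """
    counts = Counter(inputs.replace(",", ""))
    summed = [idx for idx, n in counts.items() if n > 1]
    free = [idx for idx, n in counts.items() if n == 1]
    return summed, free

summed, free = classify_indices("ij,jk")
print("summed over:   ", summed)
print("output indices:", free)
```

For `ij,jk` this reports `j` as the summed index and `i`, `k` as the output indices, which is exactly the right-hand side `ik`.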

### Applying `einsum` to our Example

- Matrix **A** has shape `(3, 2)`, so index `i` runs over 3 values and `j` over 2.
- Matrix **B** has shape `(2, 4)`, so index `j` runs over 2 values and `k` over 4.

The convention `ij,jk->ik` perfectly describes our operation:

- **Input:** An `(i, j)` matrix and a `(j, k)` matrix.
- **Condition:** The `j` dimension has the same size (2) in both matrices.
- **Output:** An `(i, k)` matrix, which corresponds to a shape of `(3, 4)`.

This illustrates the concept: **The resulting matrix gets its rows from the first matrix (`i`) and its columns from the second matrix (`k`).**

Let's verify this with `numpy.einsum`.

In [6]:
# Using the same matrices A and B

# Perform the multiplication using einsum
C_einsum = np.einsum('ij,jk->ik', A, B)

print("Result using np.einsum('ij,jk->ik'):")
print(C_einsum)
print(f"Shape of resulting matrix C: {C_einsum.shape}")

# We can verify that this is identical to the standard method
print("\nAre the results from '@' and 'einsum' the same?")
print(np.array_equal(C_numpy, C_einsum))

Result using np.einsum('ij,jk->ik'):
[[ 29  32  35  38]
 [ 65  72  79  86]
 [101 112 123 134]]
Shape of resulting matrix C: (3, 4)

Are the results from '@' and 'einsum' the same?
True
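The matching condition on `j` can be checked from the other direction by passing shapes that do not agree. In this sketch, `B_bad` is a made-up matrix with 3 rows instead of 2, so `j` would need two different sizes at once; `np.einsum` is expected to raise a `ValueError` in that case.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2): j has size 2
B_bad = np.ones((3, 4), dtype=int)       # shape (3, 4): j would need size 3

try:
    np.einsum('ij,jk->ik', A, B_bad)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(f"einsum rejected the shapes: {err}")
```

The exact wording of the error message depends on the NumPy version, so the example prints it rather than relying on its text.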


## 3. Conclusion

The Einstein summation convention offers a compact way to think about matrix and tensor operations, because the index string states which dimensions are summed and which remain.

For the multiplication of two matrices `A` and `B`:

1.  **Identify the indices:** Let `A` be `ij` and `B` be `jk`.
2.  **Find the common index:** The index `j` appears in both. This represents the inner dimension that must match and will be summed over during the multiplication.
3.  **Determine the output indices:** The remaining indices, `i` and `k`, define the shape of the resulting matrix.

The expression `ij,jk->ik` is a complete and unambiguous description of the operation. It makes explicit that the output matrix's shape is `(rows of first matrix, columns of second matrix)`.
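As a closing illustration, the three steps can be turned into a small shape calculator. This is a sketch under simplifying assumptions: `einsum_output_shape` is a hypothetical helper, it handles only explicit strings of the form `inputs->output`, and it does not call NumPy at all.

```python
def einsum_output_shape(subscripts, *shapes):
    """Infer the output shape for a string such as 'ij,jk->ik' (illustrative)."""
    inputs, output = subscripts.split("->")
    sizes = {}
    for labels, shape in zip(inputs.split(","), shapes):
        if len(labels) != len(shape):
            raise ValueError(f"'{labels}' does not match shape {shape}")
        for label, size in zip(labels, shape):
            # Step 2: a shared index must have the same size everywhere
            if sizes.setdefault(label, size) != size:
                raise ValueError(
                    f"index '{label}' has conflicting sizes "
                    f"{sizes[label]} and {size}"
                )
    # Step 3: the output indices pick out the sizes of the result
    return tuple(sizes[label] for label in output)

print(einsum_output_shape("ij,jk->ik", (3, 2), (2, 4)))  # (3, 4)
```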