# Day 02 — Matrices & Dot Product (Prediction Engine)

## Objective
Understand **how a Machine Learning model outputs numbers**.

This notebook builds the prediction engine used across ML:
- Dataset as a **matrix**
- Parameters as **vectors + bias**
- Prediction via **dot product and broadcasting**

This is the mathematical core of linear models and neural networks.


## From Vectors to Matrices

From Day 01:

> One data sample = one vector

When we stack multiple samples together, we form a **matrix**.

If:
- $m$ samples
- $n$ features per sample

Then the dataset is:

$$
X \in \mathbb{R}^{m \times n}
$$

Rows → samples  
Columns → features
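
As a minimal sketch of this stacking, the snippet below builds a small dataset matrix from three sample vectors and reads off $m$ and $n$ from its shape. The names `sample_1`, `sample_2`, `sample_3`, and `X_stacked` and their values are illustrative assumptions.

```python
import numpy as np

# Hypothetical samples: each one is a feature vector with n = 2 features
sample_1 = [2, 3]
sample_2 = [4, 5]
sample_3 = [6, 7]

# Stacking the vectors as rows gives the dataset matrix
X_stacked = np.array([sample_1, sample_2, sample_3])

m, n = X_stacked.shape            # m = number of samples, n = number of features
print(m, n)                       # 3 2

first_sample = X_stacked[0]       # row 0: one sample
first_feature = X_stacked[:, 0]   # column 0: one feature across all samples
```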


In [1]:
import numpy as np


## Matrix Shape Intuition

Consider:
- 3 samples
- 2 features per sample

The dataset matrix:

$$
X =
\begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}
$$

Shape of $X$:

$$
(3 \times 2)
$$

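**Rule (Non-Negotiable):**
Matrix multiplication is only defined when the inner dimensions match: an $(m \times n)$ matrix can multiply an $(n \times p)$ matrix (or a length-$n$ vector), and the result has shape $(m \times p)$ (or length $m$).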
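
The shape rule can be checked directly in NumPy. This is a small sketch with placeholder arrays (`A`, `w_ok`, and `w_bad` are illustrative names): the product works when the vector length equals the number of columns, and NumPy raises `ValueError` otherwise.

```python
import numpy as np

A = np.ones((3, 2))       # 3 samples, 2 features
w_ok = np.ones(2)         # length 2 matches the 2 columns of A
w_bad = np.ones(3)        # length 3 does not match the 2 columns of A

result = A @ w_ok         # (3, 2) @ (2,) -> (3,)
print(result.shape)       # (3,)

try:
    A @ w_bad             # (3, 2) @ (3,): inner dimensions differ
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)    # True
```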


In [2]:
# Dataset: 3 samples, 2 features
X = [
    [2, 3],
    [4, 5],
    [6, 7]
]

X


[[2, 3], [4, 5], [6, 7]]

## Weights Vector

A model assigns importance to each feature using **weights**.

If there are $n$ features:

$$
w =
\begin{bmatrix}
w_1 \\
w_2 \\
\vdots \\
w_n
\end{bmatrix}
\in \mathbb{R}^{n}
$$

Weights define the **direction** of the model in feature space.


In [3]:
W = [0.5, 1.0]
W


[0.5, 1.0]

## Dot Product — Core Operation

The dot product maps two vectors of equal length to a single scalar.

For input vector $x$ and weights $w$:

$$
x \cdot w = \sum_{i=1}^{n} x_i w_i
$$

This scalar is the **raw prediction** of a linear model.

Every linear model reduces to repeated dot products.


In [4]:
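def dot_product(x, w):
    # zip() silently truncates to the shorter input, so check lengths first
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Multiply matching elements and add up the products
    return sum(a * b for a, b in zip(x, w))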


dot_product(X[2], W)


10.0
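
Written out by hand for the third sample, `X[2] = [6, 7]`, and `W = [0.5, 1.0]`, the sum behind this output is:

$$
x \cdot w = (6)(0.5) + (7)(1.0) = 3 + 7 = 10
$$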

## Prediction for One Sample

For a single input vector $x$:

$$
\hat{y} = x \cdot w
$$

This is a **linear map** from $\mathbb{R}^n \to \mathbb{R}$.
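
One way to see what "linear" means here is to check the two defining properties numerically. The sketch below uses a helper named `predict_one` and made-up inputs, all of which are assumptions for illustration.

```python
def predict_one(x, w):
    # Same computation as the dot product: sum of element-wise products
    return sum(a * b for a, b in zip(x, w))

w = [0.5, 1.0]
x1 = [2, 3]
x2 = [4, 5]

# Additivity: f(x1 + x2) == f(x1) + f(x2)
x_sum = [a + b for a, b in zip(x1, x2)]
additive = predict_one(x_sum, w) == predict_one(x1, w) + predict_one(x2, w)

# Homogeneity: f(3 * x1) == 3 * f(x1)
x_scaled = [3 * a for a in x1]
homogeneous = predict_one(x_scaled, w) == 3 * predict_one(x1, w)

print(additive, homogeneous)  # True True
```

The equality checks are exact only because these particular values are exactly representable as floats; for arbitrary inputs a tolerance-based comparison such as `math.isclose` is safer.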

No learning yet — only computation.


In [5]:
predictions = [dot_product(x, W) for x in X]
predictions


[4.0, 7.0, 10.0]

## Why Bias Is Required

The equation:

$$
\hat{y} = Xw
$$

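forces $\hat{y} = 0$ whenever every feature is zero, so the model must pass through the origin.

This is **too restrictive** for most real datasets, where the target usually has a nonzero baseline even when all features are zero.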

To fix this, we introduce a **bias term**.
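
The restriction is easy to demonstrate. In this sketch, the weight vectors in `candidate_weights` and the value of `b_example` are arbitrary assumptions. Without a bias, an all-zero input always produces zero, whatever the weights are.

```python
import numpy as np

x_zero = np.zeros(2)   # a sample whose features are all 0

# Hypothetical weight vectors: the output at x = 0 is zero for each of them
candidate_weights = [[0.5, 1.0], [-3.0, 7.0], [100.0, 0.1]]
no_bias_outputs = [float(x_zero @ np.array(w_try)) for w_try in candidate_weights]
print(all(out == 0 for out in no_bias_outputs))   # True

# With a bias, the prediction at x = 0 becomes b itself
b_example = 1.5        # hypothetical bias value
with_bias = float(x_zero @ np.array([0.5, 1.0]) + b_example)
print(with_bias)       # 1.5
```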


## Linear Model with Bias

A practical linear model adds a bias to every prediction:

$$
\hat{y} = Xw + b
$$

Where:
- $X \in \mathbb{R}^{m \times n}$ → dataset
- $w \in \mathbb{R}^{n}$ → weights
- $b \in \mathbb{R}$ → bias
- $\hat{y} \in \mathbb{R}^{m}$ → predictions

Geometric intuition:
- $w$ controls **direction** (the orientation of the line or hyperplane)
- $b$ controls **position** (how far it is shifted away from the origin)


In [6]:
b = 1.5  # bias term

predictions_with_bias = [dot_product(x, W) + b for x in X]
predictions_with_bias


[5.5, 8.5, 11.5]

## Broadcasting Insight

When we write:

$$
\hat{y} = Xw + b
$$

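The scalar $b$ is **added to every element** of the vector $Xw$.

NumPy calls this **broadcasting**: the smaller operand is stretched to match the shape of the larger one before the element-wise operation.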

You will rely on this behavior heavily in NumPy and deep learning frameworks.
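
A short sketch of this behavior in NumPy follows. The array names (`scores`, `bias`, `M`, `row`) are illustrative assumptions, and `scores` stands in for the vector $Xw$.

```python
import numpy as np

scores = np.array([4.0, 7.0, 10.0])   # stands in for the vector Xw
bias = 1.5

# Scalar + vector: the scalar is applied to every element
shifted = scores + bias                # array([ 5.5,  8.5, 11.5])

# Same result as writing the bias out as a full-length vector
explicit = scores + np.full(scores.shape, bias)
print(np.array_equal(shifted, explicit))   # True

# Broadcasting also applies a row vector to every row of a matrix
M = np.zeros((3, 2))
row = np.array([10.0, 20.0])
M_shifted = M + row                    # shape (3, 2); each row is [10., 20.]
```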


## Matrix–Vector Multiplication

Using matrix notation:

$$
\hat{y} = Xw + b
$$

Here each row of $X$ takes a dot product with $w$, so $\hat{y}_i = x_i \cdot w + b$ for sample $i$.

This computes predictions for **all samples at once**.


In [7]:
X_np = np.array(X)
W_np = np.array(W)

X_np @ W_np + b


array([ 5.5,  8.5, 11.5])
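
Expanded by hand with the same numbers, the computation is:

$$
\hat{y} =
\begin{bmatrix}
2 & 3 \\
4 & 5 \\
6 & 7
\end{bmatrix}
\begin{bmatrix}
0.5 \\
1.0
\end{bmatrix}
+ 1.5
=
\begin{bmatrix}
4 \\
7 \\
10
\end{bmatrix}
+ 1.5
=
\begin{bmatrix}
5.5 \\
8.5 \\
11.5
\end{bmatrix}
$$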

## ML Interpretation

At this stage:
- No loss function
- No gradients
- No updates

This is **forward computation only**.

Learning will modify $w$ and $b$ later using derivatives.


## Why This Matters

This exact computation appears in:
- Linear Regression
- Logistic Regression
- Neural Networks (inside every layer)

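Deep learning stacks layers of the form:
$$
XW + b
$$

Here $W$ is a weight **matrix** (one column per output unit) rather than a single vector $w$. Each layer is followed by a nonlinear activation; without it, stacked layers would collapse into a single linear map.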
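
As a rough sketch of how this computation repeats inside a network, the snippet below chains two layers. The layer sizes, the random weights, and the ReLU activation between the layers are assumptions made for illustration, not something defined earlier in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

X_in = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])  # 3 samples, 2 features

# Hypothetical layer sizes: 2 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

hidden = np.maximum(0, X_in @ W1 + b1)   # XW + b, then ReLU (assumed activation)
output = hidden @ W2 + b2                # XW + b again

print(hidden.shape, output.shape)        # (3, 4) (3, 1)
```

Each bias vector is broadcast across the rows, just as the scalar $b$ was broadcast across the vector $Xw$.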


## Day 02 Summary — Prediction Engine Locked

- Dataset = matrix $X$
- Weights = vector $w$
- Bias = scalar $b$
- Prediction = $Xw + b$

From now on:
> Models output numbers via linear algebra, not magic.


## Stop Condition

You proceed only if you can answer:

1. Why is bias required?  
   → To avoid forcing predictions through the origin

2. What does $Xw$ compute?  
   → Dot product for every sample

3. What is broadcasting?  
   → Stretching a smaller operand (here the scalar $b$) so it is added to every element of a vector

If any answer is shaky, repeat Day 02.
