# Lecture 1: The Ultimate Guide to Representing Data (A NumPy Masterclass)

Welcome to the first lecture and notebook for the series on Linear Algebra for Machine Learning! In this session, we'll build the entire foundation for our journey. We'll cover:

1.  **The "Why":** A tour of real-world machine learning problems to see how data is represented.
2.  **The "What":** Formal definitions of the core data containers: Scalars, Vectors, Matrices, and Tensors.
3.  **The "How":** A practical, hands-on masterclass in NumPy to create, inspect, and manipulate these objects.

## Setup

First, let's import NumPy and our visualization libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Set plot style for better visuals
# (this style name exists in Matplotlib 3.6+; older releases call it 'seaborn-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

---

## Part 1: Why Linear Algebra? The Language of Data

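Machine learning models operate on numbers, so real-world data such as tables and images must first be encoded numerically. Linear algebra supplies the containers for that encoding (vectors, matrices, and tensors) along with the operations that models apply to them.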

### The Tour of Data Representation

In [None]:
# Example 1: Tabular Data (A single house as a vector)
house_features = np.array([
    1500,  # Square footage
    3,     # Number of bedrooms
    2,     # Number of bathrooms
    1990   # Year built
])

print(f"House features as a vector:\n{house_features}")
print(f"Vector shape: {house_features.shape}")

In [None]:
# Example 2: A dataset of multiple houses (a matrix)
houses_dataset = np.array([
    [1500, 3, 2, 1990],  # House 1
    [2000, 4, 3, 2000],  # House 2
    [1200, 2, 1, 1975]   # House 3
])

print(f"Houses dataset as a matrix:\n{houses_dataset}")
print(f"\nMatrix shape: {houses_dataset.shape}")

### Visual Example: Images as Tensors
A color image can be stored as a 3D tensor whose three axes are height, width, and color channel (red, green, blue).

In [None]:
# Create a simple 10x10 color image (3D Tensor)
color_image = np.zeros((10, 10, 3), dtype=np.uint8)

# Make the top-left quadrant red
color_image[0:5, 0:5] = [255, 0, 0]  # Red

# Make the bottom-right quadrant blue
color_image[5:10, 5:10] = [0, 0, 255] # Blue

print(f"Image Tensor Shape: {color_image.shape}")

plt.imshow(color_image)
plt.title('10x10 Color Image as a 3D Tensor')
plt.show()
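
To make the three axes concrete, here is a minimal sketch that rebuilds part of the image above and reads a single pixel and a single channel from it. The names `img`, `pixel`, and `red_channel` are illustrative, not part of the lecture's code.

```python
import numpy as np

# Rebuild a 10x10 RGB image with a red top-left quadrant
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[0:5, 0:5] = [255, 0, 0]

# Indexing the first two axes (row, column) returns one pixel: its 3 channel values
pixel = img[0, 0]
print(f"Pixel at row 0, col 0: {pixel}")

# Fixing the last axis returns one full channel as a 2D matrix
red_channel = img[:, :, 0]
print(f"Red channel shape: {red_channel.shape}")
print(f"Pixels with any red: {np.count_nonzero(red_channel)}")
```

Each channel on its own is an ordinary matrix, so everything that applies to matrices later in this notebook applies to it as well.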

---

## Part 2: The NumPy Masterclass

Now we'll dive deep into the essential NumPy operations for creating and manipulating these data structures.

### 2.1 Creating Arrays & Inspecting Attributes

In [None]:
M = np.array([
    [4, 1800, 25],
    [3, 1500, 40]
])

print(f"Matrix:\n{M}")
print(f"Shape (rows, columns): {M.shape}")
print(f"Number of dimensions: {M.ndim}")
print(f"Size (total elements): {M.size}")
print(f"Data type: {M.dtype}")
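
The same attributes distinguish the four containers named at the start of the lecture. As a rough sketch (the values are arbitrary and the variable names are illustrative), a scalar has 0 dimensions, a vector 1, a matrix 2, and a tensor 3 or more:

```python
import numpy as np

# Arbitrary example values, chosen only to illustrate the number of dimensions
scalar = np.array(3.5)                  # 0 dimensions: a single number
vector = np.array([1500, 3, 2, 1990])   # 1 dimension: an ordered list of numbers
matrix = np.array([[4, 1800, 25],
                   [3, 1500, 40]])      # 2 dimensions: rows and columns
tensor = np.zeros((10, 10, 3))          # 3 dimensions: e.g. height, width, channels

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name:>6}: ndim={arr.ndim}, shape={arr.shape}, size={arr.size}")
```

Note that `size` is always the product of the entries in `shape`; for the scalar, the empty shape `()` gives a size of 1.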

### 2.2 Array Creation Routines

In [None]:
print("np.zeros:")
print(np.zeros((2, 3)))

print("\nnp.ones:")
print(np.ones(4))

print("\nnp.arange:")
# start=5, stop=15 (exclusive), step=2
print(np.arange(5, 15, 2))

print("\nnp.linspace:")
# 5 evenly spaced values from 0 to 10, endpoint included
print(np.linspace(0, 10, 5))

print("\nnp.random.randint (a 3x3 matrix of integers from 1 to 10):")
print(np.random.randint(1, 11, size=(3, 3)))
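
The `np.random.randint` call above returns different values on every run, and its upper bound (11) is exclusive. Below is a minimal sketch of one way to get repeatable draws, assuming NumPy's `Generator` API is available (NumPy 1.17 or later). The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators built from the same (arbitrary) seed produce the same stream
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Like np.random.randint, the upper bound of `integers` is exclusive by default,
# so this draws integers from 1 to 10
first_draw = rng_a.integers(1, 11, size=(3, 3))
second_draw = rng_b.integers(1, 11, size=(3, 3))

print(first_draw)
print(f"Identical draws: {np.array_equal(first_draw, second_draw)}")
```

Seeding is optional for the exercises in this notebook, but it makes printed results comparable between runs.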

### 2.3 The Most Important Skill: Indexing and Slicing

In [None]:
M = np.random.randint(10, 100, size=(5, 5))
print(f"Original 5x5 Matrix:\n{M}")

print(f"\nElement at row 0, col 1: {M[0, 1]}")

print(f"\nSecond row (index 1): {M[1, :]}")

print(f"\nThird column (index 2): {M[:, 2]}")

print(f"\nTop-right 2x2 sub-grid:\n{M[0:2, 3:5]}")
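
One detail worth knowing when slicing: a basic slice such as `M[0:2, 3:5]` returns a view onto the original array rather than a copy, so writing through the slice changes the original. Here is a small sketch using a hypothetical array named `grid`:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)

# A basic slice is a view: it shares memory with `grid`
top_right = grid[0:2, 3:5]
top_right[0, 0] = -1
print(f"grid[0, 3] after writing through the view: {grid[0, 3]}")

# Calling .copy() gives an independent array
independent = grid[0:2, 3:5].copy()
independent[0, 0] = 99
print(f"grid[0, 3] after writing to the copy: {grid[0, 3]}")

print(f"View shares memory: {np.shares_memory(grid, top_right)}")
print(f"Copy shares memory: {np.shares_memory(grid, independent)}")
```

If a sub-grid is going to be modified independently of its source, taking an explicit `.copy()` avoids surprises.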

**Boolean Indexing**

In [None]:
print(f"Original Matrix:\n{M}")

# Find all values in the matrix greater than 50
mask = M > 50
print(f"\nBoolean Mask (M > 50):\n{mask}")

print(f"\nValues greater than 50: {M[mask]}")
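
A boolean mask can also be used to count matches or to assign values in place. Here is a short sketch with a small, hand-written array (`scores` and `capped` are hypothetical names, not part of the lecture's data):

```python
import numpy as np

scores = np.array([
    [12, 75, 50],
    [88, 43, 61]
])

mask = scores > 50

# True counts as 1, so summing the mask counts the matching elements
count = mask.sum()
print(f"Number of values above 50: {count}")

# Selecting with a mask always returns a flat 1D array, in row-by-row order
print(f"Values above 50: {scores[mask]}")

# Assigning through a mask changes only the matching elements
capped = scores.copy()
capped[mask] = 50
print(f"Capped at 50:\n{capped}")
```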

### 2.4 Reshaping and Broadcasting

In [None]:
# Reshaping
v = np.arange(12)
print(f"Original vector: {v}")
M_reshaped = v.reshape(4, 3)
print(f"\nReshaped 4x3 Matrix:\n{M_reshaped}")

# Broadcasting: the shape-(3,) vector is added to every row of the 4x3 matrix
curve = np.array([100, 200, 300])
print(f"\nMatrix to broadcast against:\n{M_reshaped}")
print(f"\nVector to broadcast: {curve}")
result = M_reshaped + curve
print(f"\nResult after broadcasting:\n{result}")
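
Broadcasting works here because the last dimension of the matrix (3) matches the length of the vector. The sketch below, using illustrative names, shows what happens when the lengths do not line up, and one way to add one value per row instead:

```python
import numpy as np

M = np.arange(12).reshape(4, 3)
per_column_offsets = np.array([100, 200, 300])   # shape (3,): one value per column
per_row_offsets = np.array([10, 20, 30, 40])     # shape (4,): one value per row

# (4, 3) + (3,): the vector is added to every row
shifted_columns = M + per_column_offsets

# (4, 3) + (4,): the trailing dimensions (3 and 4) differ, so NumPy raises an error
mismatch_raised = False
try:
    M + per_row_offsets
except ValueError as err:
    mismatch_raised = True
    print(f"Broadcasting failed: {err}")

# Reshaping the vector to (4, 1) makes it a column, which broadcasts across columns
shifted_rows = M + per_row_offsets[:, np.newaxis]

print(f"\nShifted per column:\n{shifted_columns}")
print(f"\nShifted per row:\n{shifted_rows}")
```

NumPy compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1.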

---

## Part 3: Exercises

Now it's your turn to practice. Complete the following exercises to solidify your understanding.

### Exercise 1: User Feature Matrix
Create a matrix representing a batch of 5 user feature vectors, where each user is described by `[age, city_id, num_friends, avg_daily_minutes]`.

In [None]:
# Your solution here
user_data = np.array([
    [34, 101, 150, 45.5],
    [22, 102, 250, 90.1],
    [45, 101, 80, 30.2],
    [28, 103, 400, 120.0],
    [31, 102, 180, 60.8]
])

print(user_data)

### Exercise 2: Matrix Slicing
Using the matrix from Exercise 1, select:
1. The data for the first 3 users.
2. Only the `age` (1st column) and `num_friends` (3rd column) for all users.

In [None]:
# Your solution here
first_three_users = user_data[0:3]
print(f"First three users:\n{first_three_users}")

# Note: `num_friends` is the 3rd column, so its index is 2
age_and_friends = user_data[:, [0, 2]]
print(f"\nAge and friends data for all users:\n{age_and_friends}")

### Exercise 3: Boolean Indexing
Using the matrix from Exercise 1, find all users who are older than 30 and have more than 100 friends.

In [None]:
# Your solution here
age_mask = user_data[:, 0] > 30
friends_mask = user_data[:, 2] > 100

# Combine masks with the logical AND operator `&`
combined_mask = age_mask & friends_mask

filtered_users = user_data[combined_mask]
print(f"Users older than 30 with >100 friends:\n{filtered_users}")
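
The two masks can also be combined in a single expression. Each comparison needs its own parentheses because `&` binds more tightly than `>` in Python. The sketch below works on a copy of the Exercise 1 data (`users` is an illustrative name):

```python
import numpy as np

# Same values as the Exercise 1 matrix, repeated so this cell runs on its own
users = np.array([
    [34, 101, 150, 45.5],
    [22, 102, 250, 90.1],
    [45, 101, 80, 30.2],
    [28, 103, 400, 120.0],
    [31, 102, 180, 60.8]
])

# Parentheses around each comparison are required
selected = users[(users[:, 0] > 30) & (users[:, 2] > 100)]

# np.logical_and is an equivalent, more explicit spelling
selected_alt = users[np.logical_and(users[:, 0] > 30, users[:, 2] > 100)]

print(f"Selected users:\n{selected}")
print(f"Both spellings agree: {np.array_equal(selected, selected_alt)}")
```

Python's `and` keyword does not work here: it asks each array for a single truth value, which NumPy refuses with a `ValueError` for arrays holding more than one element.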

### Exercise 4: Matrix Normalization (Standardization)
Create a random 5x5 matrix and standardize it. Standardization is a common preprocessing step in ML: for each *column*, subtract the column's mean and divide by the column's standard deviation, so that every column ends up with mean 0 and standard deviation 1.

In [None]:
# Your solution here
data = np.random.rand(5, 5) * 100

# Calculate mean and std dev for each column (axis=0)
col_means = data.mean(axis=0)
col_stds = data.std(axis=0)

# Standardize the data using broadcasting
standardized_data = (data - col_means) / col_stds

print("Original Data:")
print(data.round(2))
print("\nStandardized Data (Column means should be ~0, std dev ~1):")
print(standardized_data.round(2))
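
Rather than checking the printed numbers by eye, the result can be verified numerically. Below is a minimal sketch on a freshly generated matrix; the seed and the variable names are arbitrary, and `np.allclose` is used because floating-point results are only approximately 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for repeatability only
sample = rng.random((5, 5)) * 100

# Same recipe as above: per-column mean and standard deviation (axis=0)
standardized = (sample - sample.mean(axis=0)) / sample.std(axis=0)

means_ok = np.allclose(standardized.mean(axis=0), 0.0)
stds_ok = np.allclose(standardized.std(axis=0), 1.0)

print(f"All column means ~0: {means_ok}")
print(f"All column std devs ~1: {stds_ok}")
```

By default, `std` divides by the number of rows (`ddof=0`). Passing `ddof=1` gives the sample standard deviation instead, in which case the check should use the same setting.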

### Exercise 5 (Challenge): Tensor Operations
Create a random 3D tensor of shape `(32, 32, 3)` representing a small color image. Use slicing to turn a 5x5 patch in the center of the image pure blue: set the blue channel to 1.0 and the red and green channels to 0. Visualize the result.

In [None]:
# Your solution here
image = np.random.rand(32, 32, 3)

# Find the center and define the patch boundaries
# (a 32x32 image has no single center pixel; this patch covers indices 14 through 18)
center_x, center_y = 16, 16
patch_size = 5
start_x = center_x - patch_size // 2
end_x = start_x + patch_size
start_y = center_y - patch_size // 2
end_y = start_y + patch_size

# Set the blue channel (index 2) to 1.0 and others to 0
image[start_y:end_y, start_x:end_x, 0] = 0 # Red
image[start_y:end_y, start_x:end_x, 1] = 0 # Green
image[start_y:end_y, start_x:end_x, 2] = 1 # Blue

plt.imshow(image)
plt.title('Image with Blue Center Patch')
plt.show()

---

## Next Steps

Congratulations on completing the first lecture! You now have a solid foundation in how data is represented numerically and how to manipulate it with NumPy.

If you're comfortable with all the material here, you're ready to move on to **Lecture 2: The Dot Product - The Heart of Machine Learning!**