# Lesson 01: Introduction to NumPy - GonKen 2025 Edition

## Learning Objectives
By the end of this lesson, students should be able to:
- Understand the core functionalities of NumPy
- Create and manipulate multidimensional arrays
- Perform mathematical and statistical operations on arrays
- Understand broadcasting and vectorization
- Apply NumPy to solve intermediate to advanced data problems

## Section 1: Introduction to NumPy
What is NumPy and why is it important?

NumPy (Numerical Python) is a fundamental package for scientific computing with Python. It provides powerful N-dimensional array objects, broadcasting functions, and tools for integrating C/C++ and Fortran code. It also supports various mathematical functions for linear algebra, Fourier transforms, and statistical operations.

In [None]:
import numpy as np
# Note: np is a common alias used to reference the NumPy library.

### 2.2 Array Types: `dtype`  
**Concept:** Understand data types (`dtype`) in NumPy.

Every NumPy array has an associated **data type** (abbreviated as `dtype`), which determines:

- What kind of values it can store (e.g., integers, floats, booleans, strings)
- How much memory each element uses
- How operations are applied internally (e.g., integer division vs float division)

In [None]:
import numpy as np

a = np.array([1.0, 2.0, 3.0])
print(a.dtype)   # Output: float64
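
The effect of `dtype` on memory use and on operations can be checked directly. The following is a minimal sketch; the variable names `ints` and `floats` are illustrative and not part of the lesson.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype="int32")
floats = np.array([1, 2, 3], dtype="float64")

# Memory per element and in total depends on the dtype
print(ints.itemsize, floats.itemsize)   # 4 8   (bytes per element)
print(ints.nbytes, floats.nbytes)       # 12 24 (bytes in total)

# Floor division keeps the integer dtype; true division promotes to float
floor_result = ints // 2
true_result = ints / 2
print(floor_result.dtype)   # int32
print(true_result.dtype)    # float64
```

Here the same three values take half the memory as `int32`, and only the `/` operator changes the result type.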

You can specify the `dtype` explicitly during array creation:

In [None]:
np.array([1, 2, 3], dtype='int32')     # 32-bit integers
np.array([1, 2, 3], dtype='float64')   # 64-bit floating point
np.array([True, False], dtype='bool')  # Boolean values

**Why it matters:**

- Operations behave differently depending on `dtype`:  
  e.g., floor division `1 // 2` on `int32` values gives `0`, while true division `1 / 2` always returns a float (`0.5`).

- Choosing an appropriate `dtype` improves **performance** and **memory efficiency**, especially for large datasets.

> Use `array.dtype` to inspect, and use the `dtype=` parameter to control how data is stored.

### 3. Array Shapes and Dimensions  
**Concept:** Learn about `.shape`, `.ndim`, and `.size` attributes of NumPy arrays.

NumPy arrays can have **one or more dimensions**, and understanding their structure is essential when performing mathematical operations, broadcasting, reshaping, or feeding data into machine learning models.

In [None]:
import numpy as np

a = np.array([[1, 2], [3, 4]])

Now let’s inspect its structure:

In [None]:
print("Shape:", a.shape)   # (2, 2) → 2 rows, 2 columns
print("Dimensions:", a.ndim)  # 2 → it's a 2D array
print("Size:", a.size)     # 4 → total number of elements

**Attributes explained:**

- `.shape`: A tuple indicating the number of elements along each axis (rows, columns, depth, etc.)  
- `.ndim`: Number of axes (dimensions).  
  - 1D → vector, 2D → matrix, 3D → tensor, etc.  
- `.size`: Total number of elements (product of dimensions).

You can create arrays of various dimensions:

In [None]:
np.array([1, 2, 3])                # 1D → shape: (3,)
np.array([[1, 2], [3, 4]])         # 2D → shape: (2, 2)
np.zeros((2, 3, 4))                # 3D → shape: (2, 3, 4)

> These attributes are especially useful when debugging shape mismatches or preparing data for machine learning pipelines (e.g., reshaping input features to `(n_samples, n_features)`).
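
The three attributes are tied together: `.ndim` is the length of `.shape`, and `.size` is the product of its entries. A small sketch of that relationship, using an illustrative array named `t`:

```python
import math
import numpy as np

t = np.zeros((2, 3, 4))

print(t.shape)   # (2, 3, 4)
print(t.ndim)    # 3
print(t.size)    # 24

# ndim counts the axes; size multiplies their lengths
print(t.ndim == len(t.shape))         # True
print(t.size == math.prod(t.shape))   # True
```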

### 4 Array Operations
### 4.1 Arithmetic Operations  
**Concept:** Perform element-wise arithmetic on arrays.

NumPy supports **vectorized operations**, meaning you can apply arithmetic operations directly to arrays **without using loops**. These operations are performed **element-by-element**, and they are fast because the looping happens in compiled C code rather than in the Python interpreter.

In [None]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)  # [5 7 9]
print(a - b)  # [-3 -3 -3]
print(a * b)  # [ 4 10 18]
print(a / b)  # [0.25 0.4  0.5 ]

If the arrays are of the same shape, operations are applied to **corresponding elements**.
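
To make "corresponding elements" concrete, the sketch below compares an explicit Python loop with the vectorized expression. The names `loop_result` and `vectorized_result` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Explicit loop: pair up elements by position and add them one at a time
loop_result = [x + y for x, y in zip(a, b)]

# Vectorized form: the same element-wise addition in a single expression
vectorized_result = a + b

print(np.array_equal(loop_result, vectorized_result))  # True
```

Both forms compute the same values; the vectorized one simply leaves the looping to NumPy.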

#### 4.2 Scalar operations also apply:

In [None]:
print(a + 10)  # [11 12 13]
print(a * 2)   # [2 4 6]

#### 4.3 Broadcasting in action:
NumPy can apply operations between arrays of different shapes using **broadcasting** rules:

In the example below, `b` (1D) is broadcast across the rows of `a` (2D), as long as the shapes are **compatible**.

> **Tip:** Vectorized operations using NumPy are not just cleaner—they are **much faster** than Python loops. Always prefer them when working with numeric data.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])

print(a + b)
# Output:
# [[11 22 33]
#  [14 25 36]]

### Section 5: Indexing and Slicing
### 5.1 One-Dimensional Indexing

**Concept:** Access and manipulate elements of a 1D NumPy array using indexing and slicing.

A **1D array** is similar to a Python list and supports powerful indexing features.

You can access individual elements using **zero-based indexing**:

In [None]:
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[0])   # 10
print(a[3])   # 40

You can also use **negative indexing** to count from the end:

In [None]:
print(a[-1])  # 50
print(a[-2])  # 40

#### Slicing:
Slicing extracts **a range of elements**:

In [None]:
print(a[1:4])    # [20 30 40]
print(a[:3])     # [10 20 30] → start to index 2
print(a[2:])     # [30 40 50] → index 2 to end

#### Step slicing:
A third value in the slice sets the step:

In [None]:
print(a[::2])    # [10 30 50] → every second element

Basic slicing does **not create a copy**; it gives you a **view** of the original data. Modifying a slice changes the original array unless you copy it explicitly with `.copy()`.

> **Tip:** Indexing and slicing are essential for selecting and transforming data before applying mathematical operations or feeding into models.
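
Because a basic slice is a view onto the original data, writing to the slice changes the source array. A minimal sketch of the difference between a view and an explicit copy (the names `view` and `independent` are illustrative):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

view = a[1:4]            # basic slice: shares memory with `a`
view[0] = 99             # writes through to the original
print(a)                 # [10 99 30 40 50]

independent = a[1:4].copy()   # explicit copy: owns its own data
independent[0] = -1
print(a)                 # still [10 99 30 40 50]

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```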

### 5.2 Multi-Dimensional Indexing  
**Concept:** Access and manipulate elements in 2D (or higher-dimensional) arrays using row-column indices.

In multi-dimensional arrays, NumPy uses **comma-separated indexing** to specify positions along each axis.

#### Example (2D array):

In [None]:
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])

This is a 2×3 array (2 rows, 3 columns).

#### Accessing individual elements:

In [None]:
print(b[0, 0])  # 1  → first row, first column
print(b[1, 2])  # 6  → second row, third column

#### Row and column slicing:

In [None]:
print(b[0])       # [1 2 3]  → entire first row
print(b[:, 1])    # [2 5]    → second column from all rows
print(b[1, :])    # [4 5 6]  → all columns from second row
print(b[0:2, 1:]) # [[2 3] [5 6]] → sub-matrix: both rows, columns 1 onward

#### Negative indices:
You can use negative values to index from the end:

In [None]:
print(b[-1, -1])  # 6 → last row, last column

> Multi-dimensional indexing is powerful for extracting **submatrices**, accessing **features**, and reshaping data for modeling workflows.
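
The same index expressions can also appear on the left-hand side of an assignment, which is how elements, rows, and columns are modified in place. A short sketch, reusing the 2×3 array from this section with illustrative replacement values:

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])

b[0, 0] = 10          # single element
b[:, 1] = 0           # entire second column (the scalar fills every row)
b[1, :] = [7, 8, 9]   # entire second row

print(b)
# [[10  0  3]
#  [ 7  8  9]]
```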

### Section 6: Useful NumPy Functions
### 6.1 Array Initialization  
**Concept:** Create arrays quickly using built-in NumPy functions.

NumPy provides a variety of functions to **initialize arrays** of specific shapes, sizes, and values. These are essential for creating placeholder data, structured grids, or clean matrices for computation.

#### Common array creation functions:
**1. `np.zeros(shape)`**  
Creates an array filled with zeros.

In [None]:
import numpy as np

np.zeros((2, 3))  
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]]

**2. `np.ones(shape)`**  
Creates an array filled with ones.

In [None]:
np.ones((3, 2))  
# Output:
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]

**3. `np.full(shape, value)`**  
Creates an array filled with a specified value.

In [None]:
np.full((2, 2), 7)  
# Output:
# [[7 7]
#  [7 7]]

**4. `np.eye(n)`**  
Creates an identity matrix (1s on the diagonal, 0s elsewhere).

In [None]:
np.eye(3)  
# Output:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

**5. `np.arange(start, stop, step)`**  
Creates values in a range with a specified step (like Python's `range()`).

In [None]:
np.arange(0, 10, 2)  
# Output: [0 2 4 6 8]

**6. `np.linspace(start, stop, num)`**  
Creates `num` evenly spaced values between `start` and `stop` (inclusive).

In [None]:
np.linspace(0, 1, 5)  
# Output: [0.   0.25 0.5  0.75 1.  ]

> **Tip:** Use these initialization functions when you need a clean slate, such as preparing weights in machine learning, building masks, or scaffolding array-based computations.
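
`np.arange` and `np.linspace` differ mainly in how they treat the end of the interval. The sketch below contrasts the two; the variable names are illustrative.

```python
import numpy as np

# arange: fixed step, `stop` is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)        # [0.   0.25 0.5  0.75]

# linspace: fixed number of points, `stop` is included by default
by_count = np.linspace(0, 1, 5)
print(by_count)       # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace exclude `stop` as well
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)      # [0.   0.25 0.5  0.75]
```

As a rule of thumb, `np.arange` fits when the step size matters and `np.linspace` fits when the number of points matters.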

### Section 7: Reshaping and Flattening
**Concept:** Transform the structure of arrays using `.reshape()` and `.flatten()`.

In many data science and machine learning tasks, we need to **restructure arrays**—for example, converting a 1D array into 2D format, or flattening a multi-dimensional array into a 1D vector.

NumPy provides two powerful tools for this:

#### 7.1 `reshape(new_shape)`  
Changes the shape of an array without modifying the underlying data.

In [None]:
import numpy as np

a = np.arange(6)         # [0 1 2 3 4 5]
b = a.reshape((2, 3))    # 2 rows, 3 columns

print(b)
# Output:
# [[0 1 2]
#  [3 4 5]]

- The new shape must be **compatible** with the total number of elements.
- You can use `-1` to let NumPy infer one of the dimensions:

In [None]:
a.reshape((3, -1))  # Output: shape (3, 2)
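
The compatibility requirement can be seen by trying shapes that do and do not divide the element count. A minimal sketch; the `failed` flag is only there to make the outcome visible.

```python
import numpy as np

a = np.arange(6)

print(a.reshape(3, -1).shape)   # (3, 2): NumPy infers 6 / 3 = 2
print(a.reshape(-1, 1).shape)   # (6, 1): a single column

failed = False
try:
    a.reshape(4, -1)            # 6 elements cannot be split into 4 equal rows
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```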

#### 7.2 `flatten()`  
Converts a multi-dimensional array into a **1D array**.

In [None]:
print(b.flatten())  # Output: [0 1 2 3 4 5]

- `.flatten()` returns a **copy** of the array.
- If you want a **view**, use `.ravel()` instead; it returns a view whenever possible (faster, but changes to it also affect the original array).

> **Why it matters:**  
Reshaping is crucial for:
- Preparing input features for ML models (e.g. `(samples, features)`)
- Processing image data (e.g. reshaping 3D pixels into vectors)
- Aligning arrays for broadcasting or matrix operations
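
The copy-versus-view difference between `.flatten()` and `.ravel()` shows up as soon as the result is modified. A small sketch, assuming a contiguous array so that `.ravel()` can return a view (the names `flat` and `rav` are illustrative):

```python
import numpy as np

b = np.arange(6).reshape((2, 3))

flat = b.flatten()   # always a copy
rav = b.ravel()      # a view here, because `b` is contiguous

flat[0] = 100        # does not touch `b`
print(b[0, 0])       # 0

rav[0] = 100         # writes through to `b`
print(b[0, 0])       # 100
```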

### Section 8: Broadcasting
**Concept:** Apply operations to arrays of different shapes.

**Broadcasting** is a powerful NumPy feature that allows you to perform arithmetic operations between arrays of **different shapes**—as long as they follow certain compatibility rules.

Instead of manually resizing arrays, NumPy **automatically "stretches" dimensions** where possible to match shapes and perform **element-wise operations** efficiently.

![title](images/broadcasting.png)

#### 8.1 Example 1: 2D + 1D
Here, `a` (shape `(3, 1)`) is stretched across columns to `(3, 3)`, and `b` (shape `(3,)`) is treated as `(1, 3)` and stretched across rows to match.

In [None]:
import numpy as np

a = np.array([[1], [2], [3]])       # Shape: (3, 1)
b = np.array([4, 5, 6])             # Shape: (3,) → becomes (1, 3)

print(a + b)
# Output:
# [[5 6 7]
#  [6 7 8]
#  [7 8 9]]

#### 8.2 Example 2: Scalar + Array

A scalar is broadcast across the entire array.

In [None]:
x = np.array([1, 2, 3])
print(x + 10)   # [11 12 13]

#### Broadcasting Rules Summary

1. **Compare shapes right-to-left** (trailing dimensions).  
2. Two dimensions are compatible when:
   - They are equal, or  
   - One of them is 1  
3. Missing dimensions are padded with 1 from the left.

#### When to use broadcasting:

- Adding a vector to every row of a matrix  
- Scaling rows or columns  
- Efficient data transformations without looping

> **Tip:** Broadcasting makes your code cleaner and faster. However, always verify dimensions to avoid subtle bugs or shape mismatches.
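
The broadcasting rules can be checked without building any arrays. The sketch below uses `np.broadcast_shapes`, assuming NumPy 1.20 or later; the example shapes are illustrative.

```python
import numpy as np

# Each call returns the shape the operands would broadcast to
s1 = np.broadcast_shapes((3, 1), (3,))        # (3,) is padded to (1, 3)
s2 = np.broadcast_shapes((2, 3), (3,))        # vector added to every row
s3 = np.broadcast_shapes((4, 1, 5), (3, 1))   # (3, 1) is padded to (1, 3, 1)
print(s1, s2, s3)   # (3, 3) (2, 3) (4, 3, 5)

incompatible = False
try:
    np.broadcast_shapes((2, 3), (2,))   # trailing dimensions 3 and 2 differ, neither is 1
except ValueError as err:
    incompatible = True
    print("cannot broadcast:", err)
```

Checking shapes this way is a cheap way to catch a mismatch before running the actual computation.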

### Section 9: Aggregation Functions 
**Concept:** Summarize or reduce array values using built-in aggregation functions.

Aggregation functions in NumPy **reduce multiple values into a single summary statistic**, such as a sum, mean, max, or standard deviation. These are extremely useful for analyzing datasets, computing statistics, or validating model outputs.

#### 9.1 Common Aggregation Functions:

In [None]:

import numpy as np

a = np.array([[1, 2],
              [3, 4]])

print(a.sum())       # Total sum: 10
print(a.mean())      # Average: 2.5
print(a.std())       # Standard deviation: ≈ 1.118 (population std, ddof=0)
print(a.min())       # Minimum value: 1
print(a.max())       # Maximum value: 4
print(a.prod())      # Product of all elements: 24

#### 9.2 Aggregation Across Axes

Use the `axis` argument to perform aggregations row-wise or column-wise:
- `axis=0`: collapses the rows → one result per **column**  
- `axis=1`: collapses the columns → one result per **row**

In [None]:
print(a.sum(axis=0))  # Sum of each column → [4 6]
print(a.sum(axis=1))  # Sum of each row    → [3 7]

### Real-world examples:

- Sum of pixel values in an image  
- Mean score across students  
- Standard deviation of sensor readings  
- Column-wise normalization of a dataset

> **Tip:** Always specify the `axis` explicitly when working with 2D or higher-dimensional arrays—it helps prevent confusion and bugs.
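
With a square array it is easy to mix up the two axes, so a non-square example makes the effect of `axis` easier to see. A minimal sketch using an illustrative 2×3 array `m`; `keepdims=True` is a standard option that keeps the reduced axis with length 1.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]: 2 rows, 3 columns

col_sums = m.sum(axis=0)         # rows collapsed: one value per column
row_sums = m.sum(axis=1)         # columns collapsed: one value per row
print(col_sums)                  # [3 5 7]
print(row_sums)                  # [ 3 12]

# keepdims=True keeps the reduced axis, which helps with later broadcasting
kept = m.sum(axis=1, keepdims=True)
print(kept.shape)                # (2, 1)
```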

## Section 10: Check-Your-Understanding Questions
1. Create a 1D array with values from 0 to 9.
2. Find the mean and standard deviation of [5, 10, 15].
3. Create a 3x3 matrix with values from 1 to 9.
4. Extract the diagonal elements of that matrix.
5. Create a matrix and normalize it (subtract mean and divide by std dev).
6. Demonstrate broadcasting on a 2x1 and 1x2 array.

These questions prepare you for a forthcoming set of 20 practice problems (beginner → advanced), which will drill deeper into NumPy use cases.