[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wasim/Data-Science/blob/main/data-analyst-roadmap/02_numpy/02_array_operations.ipynb)

# NumPy Array Operations - Mathematics, Broadcasting & Vectorization

## 📚 Learning Objectives
- Perform element-wise and matrix operations
- Understand and apply broadcasting rules
- Leverage vectorization for performance
- Work with universal functions (ufuncs)
- Apply aggregation functions

---

In [None]:
import numpy as np
import time

np.random.seed(42)

## 1. Element-wise Operations

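NumPy applies arithmetic operators element by element by default. The operands must have the same shape or be broadcastable to a common shape (see Section 4).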

In [None]:
# Create sample arrays
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

print("Array a:", a)
print("Array b:", b)

# Arithmetic operations
print("\n--- Arithmetic Operations ---")
print("Addition (a + b):", a + b)
print("Subtraction (a - b):", a - b)
print("Multiplication (a * b):", a * b)
print("Division (b / a):", b / a)
print("Floor Division (b // a):", b // a)
print("Modulo (b % a):", b % a)
print("Power (a ** 2):", a ** 2)
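
The two division operators above return different dtypes, which is easy to miss when reading printed output. A minimal sketch, using illustrative arrays `x` and `y` that are not part of the original notebook:

```python
import numpy as np

# Illustrative integer arrays (assumed for this sketch)
x = np.array([10, 20, 30])
y = np.array([3, 4, 7])

true_div = x / y     # true division produces floats, even for integer inputs
floor_div = x // y   # floor division of integer arrays stays integer

print(true_div.dtype)   # float64
print(floor_div.dtype)  # an integer dtype (platform dependent, e.g. int64)
print(floor_div)        # [3 5 4]
```

If an integer result is needed after `/`, the array would have to be converted explicitly, for example with `astype(int)`.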

In [None]:
# Operations with scalars
arr = np.array([1, 2, 3, 4, 5])

print("Original:", arr)
print("Add 10:", arr + 10)
print("Multiply by 3:", arr * 3)
print("Square:", arr ** 2)
print("Divide by 2:", arr / 2)

## 2. Universal Functions (ufuncs)

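Universal functions (ufuncs) apply a function to every element of an array in compiled code, which makes them much faster than an equivalent Python loop.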

In [None]:
# Mathematical functions
arr = np.array([1, 4, 9, 16, 25])

print("Array:", arr)
print("Square root:", np.sqrt(arr))
print("Exponential:", np.exp(arr[:3]))  # First 3 elements
print("Logarithm (natural):", np.log(arr))
print("Log base 10:", np.log10(arr))
print("Absolute value:", np.abs([-1, -2, 3, -4]))
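
Functions such as `np.sqrt` and `np.add` are instances of `np.ufunc`, and binary ufuncs also carry methods like `reduce` and `accumulate`. The sketch below illustrates this on a small array; `values` is an illustrative name, not part of the original notebook.

```python
import numpy as np

values = np.array([1, 2, 3, 4])  # illustrative data

# Both the unary sqrt and the binary add are ufunc objects
print(isinstance(np.sqrt, np.ufunc))  # True
print(isinstance(np.add, np.ufunc))   # True

# Binary ufuncs expose reduce and accumulate
total = np.add.reduce(values)        # same result as np.sum(values)
running = np.add.accumulate(values)  # same result as np.cumsum(values)

print(total)    # 10
print(running)  # [ 1  3  6 10]
```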

In [None]:
# Trigonometric functions
angles = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])

print("Angles (radians):", angles)
print("Sine:", np.sin(angles))
print("Cosine:", np.cos(angles))
print("Tangent:", np.tan(angles[:4]))  # Exclude pi/2

# Convert degrees to radians
degrees = np.array([0, 30, 45, 60, 90])
radians = np.deg2rad(degrees)
print("\nDegrees to radians:", radians)

In [None]:
# Rounding functions
arr = np.array([1.23, 2.67, 3.45, 4.89, 5.12])

print("Original:", arr)
print("Round:", np.round(arr))
print("Floor:", np.floor(arr))
print("Ceiling:", np.ceil(arr))
print("Truncate:", np.trunc(arr))
print("Round to 1 decimal:", np.round(arr, 1))

## 3. Aggregation Functions

Reduce arrays to single values or along specific axes.

In [None]:
# 1D aggregations
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print("Array:", arr)
print("\n--- Aggregations ---")
print(f"Sum: {np.sum(arr)}")
print(f"Mean: {np.mean(arr)}")
print(f"Median: {np.median(arr)}")
print(f"Standard deviation: {np.std(arr):.2f}")
print(f"Variance: {np.var(arr):.2f}")
print(f"Min: {np.min(arr)}")
print(f"Max: {np.max(arr)}")
print(f"Range: {np.ptp(arr)}")  # Peak to peak

In [None]:
# 2D aggregations with axis parameter
arr_2d = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print("2D Array:\n", arr_2d)
print("\n--- Aggregations ---")
print("Overall sum:", np.sum(arr_2d))
print("Sum along axis 0 (one sum per column):", np.sum(arr_2d, axis=0))
print("Sum along axis 1 (one sum per row):", np.sum(arr_2d, axis=1))

print("\nMean along columns:", np.mean(arr_2d, axis=0))
print("Mean along rows:", np.mean(arr_2d, axis=1))

print("\nMax along columns:", np.max(arr_2d, axis=0))
print("Min along rows:", np.min(arr_2d, axis=1))
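
One way to remember the `axis` parameter is that the named axis is the one that disappears from the result's shape. A small sketch, assuming a hypothetical array `grid` with the same values as `arr_2d`; the `keepdims` argument shown here is not used elsewhere in this notebook.

```python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)  # same values as arr_2d above

col_sums = grid.sum(axis=0)                    # axis 0 collapses -> shape (4,)
row_sums = grid.sum(axis=1)                    # axis 1 collapses -> shape (3,)
row_sums_2d = grid.sum(axis=1, keepdims=True)  # axis kept with size 1 -> shape (3, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
print(col_sums)  # [15 18 21 24]
print(row_sums)  # [10 26 42]
```

Keeping the reduced axis with `keepdims=True` can be convenient when the result is later combined with the original array through broadcasting.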

In [None]:
# Cumulative operations
arr = np.array([1, 2, 3, 4, 5])

print("Array:", arr)
print("Cumulative sum:", np.cumsum(arr))
print("Cumulative product:", np.cumprod(arr))

# 2D cumulative
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:\n", arr_2d)
print("Cumsum along axis 0:\n", np.cumsum(arr_2d, axis=0))
print("Cumsum along axis 1:\n", np.cumsum(arr_2d, axis=1))

## 4. Broadcasting

Broadcasting lets NumPy perform element-wise operations on arrays of different but compatible shapes, without explicitly replicating the smaller array.

### Broadcasting Rules:
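1. If the arrays have a different number of dimensions, pad the shape of the smaller one with ones on the left
2. Shapes are compared dimension by dimension, starting from the right; two dimensions are compatible if they are equal or if one of them is 1 (otherwise NumPy raises a `ValueError`)
3. Each size-1 dimension is stretched to match the other array, so the result takes the larger size along every dimension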
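
The rules above can be checked on shapes alone, without creating any arrays. The following sketch relies on `np.broadcast_shapes`, which is available in NumPy 1.20 and later; the shapes are illustrative.

```python
import numpy as np

# Rule 1: (3,) is padded on the left to (1, 3) before comparison
print(np.broadcast_shapes((3, 3), (3,)))    # (3, 3)

# Rules 2 and 3: each size-1 dimension stretches to match the other shape
print(np.broadcast_shapes((3, 1), (1, 3)))  # (3, 3)
print(np.broadcast_shapes((4, 3), (1,)))    # (4, 3)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails
try:
    np.broadcast_shapes((2, 3), (4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```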

In [None]:
# Example 1: Scalar and array
arr = np.array([1, 2, 3, 4])
scalar = 10

print("Array:", arr)
print("Scalar:", scalar)
print("Result:", arr + scalar)  # Scalar is broadcast to [10, 10, 10, 10]

In [None]:
# Example 2: 1D and 2D arrays
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
arr_1d = np.array([10, 20, 30])

print("2D Array:\n", arr_2d)
print("\n1D Array:", arr_1d)
print("\nBroadcast addition:\n", arr_2d + arr_1d)
# 1D array is broadcast to each row

In [None]:
# Example 3: Column vector and 1D array (treated as a row)
col_vector = np.array([[1], [2], [3]])  # shape (3, 1)
row_vector = np.array([10, 20, 30])     # shape (3,), broadcast as (1, 3)

print("Column vector (3x1):\n", col_vector)
print("\nRow vector (shape (3,)):", row_vector)
print("\nBroadcast multiplication (3x3):\n", col_vector * row_vector)

In [None]:
# Practical example: Center each column (subtract the column mean)
data = np.random.randint(0, 100, size=(4, 3))
print("Original data:\n", data)

# Calculate mean for each column
col_means = data.mean(axis=0)
print("\nColumn means:", col_means)

# Subtract mean from each column (broadcasting)
centered = data - col_means
print("\nCentered data:\n", centered)
print("\nNew column means:", centered.mean(axis=0))  # Should be ~0

## 5. Vectorization - Speed Comparison

Vectorization replaces explicit Python loops with array operations that NumPy executes in compiled code.

In [None]:
# Create large arrays
size = 1_000_000
a = np.random.rand(size)
b = np.random.rand(size)

# Method 1: Python loop (SLOW)
# time.perf_counter() is a high-resolution clock intended for timing code
start = time.perf_counter()
result_loop = []
for i in range(len(a)):
    result_loop.append(a[i] + b[i])
loop_time = time.perf_counter() - start

# Method 2: NumPy vectorization (FAST)
start = time.perf_counter()
result_vector = a + b
vector_time = time.perf_counter() - start

print(f"Loop time: {loop_time:.4f} seconds")
print(f"Vectorized time: {vector_time:.4f} seconds")
print(f"Speedup: {loop_time/vector_time:.1f}x faster!")
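
A single timing run can be noisy. As a hedged alternative, the sketch below uses the standard-library `timeit` module and takes the best of several repeats. The helper `add_with_loop`, the smaller array size, and the seed are assumptions made for this example, not part of the original notebook.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
u = rng.random(10_000)
v = rng.random(10_000)

def add_with_loop(p, q):
    """Add two 1D arrays element by element in pure Python."""
    return [p[i] + q[i] for i in range(len(p))]

# Both approaches should produce the same numbers
same = np.allclose(add_with_loop(u, v), u + v)
print(same)

# Best of 3 repeats, 5 calls each, to reduce timing noise
loop_best = min(timeit.repeat(lambda: add_with_loop(u, v), number=5, repeat=3))
vec_best = min(timeit.repeat(lambda: u + v, number=5, repeat=3))
print(f"loop: {loop_best:.5f}s, vectorized: {vec_best:.5f}s")
```

The exact timings depend on the machine, so no specific speedup is claimed here.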

In [None]:
# Vectorization example: Calculate distance from origin
# Formula: sqrt(x^2 + y^2)

x = np.random.rand(1000)
y = np.random.rand(1000)

# Vectorized approach
distances = np.sqrt(x**2 + y**2)

print("First 10 distances:", distances[:10])
print(f"Mean distance: {distances.mean():.3f}")
print(f"Max distance: {distances.max():.3f}")

## 6. Matrix Operations

In [None]:
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix A:\n", A)
print("\nMatrix B:\n", B)

# Element-wise multiplication
print("\nElement-wise (A * B):\n", A * B)

# Matrix multiplication (matrix product)
print("\nMatrix multiplication (A @ B):\n", A @ B)
# Alternative for 2D arrays: np.dot(A, B) or np.matmul(A, B)

In [None]:
# Matrix operations
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("Matrix:\n", matrix)
print("\nTranspose:\n", matrix.T)
print("\nDiagonal:", np.diag(matrix))
print("\nTrace (sum of diagonal):", np.trace(matrix))

In [None]:
# Linear algebra operations
A = np.array([[4, 2], [3, 1]])

print("Matrix A:\n", A)
print("\nDeterminant:", np.linalg.det(A))

# Inverse (if determinant != 0)
A_inv = np.linalg.inv(A)
print("\nInverse:\n", A_inv)

# Verify: A @ A_inv should be identity matrix
print("\nA @ A_inv (should be identity):\n", np.round(A @ A_inv))
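
Rounding hides small floating-point errors in the check above. A tolerance-based comparison is one alternative, sketched below with `np.allclose`. The names `M` and `rhs` are hypothetical; the last part shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly.

```python
import numpy as np

M = np.array([[4.0, 2.0], [3.0, 1.0]])  # same values as A above
M_inv = np.linalg.inv(M)

# Compare against the identity matrix within a floating-point tolerance
is_identity = np.allclose(M @ M_inv, np.eye(2))
print(is_identity)  # True

# Solve M @ x = rhs directly (rhs is a made-up right-hand side)
rhs = np.array([10.0, 6.0])
x = np.linalg.solve(M, rhs)
print(x)
print(np.allclose(M @ x, rhs))  # True
```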

## 7. Comparison Operations

In [None]:
# Element-wise comparisons
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])

print("Array a:", a)
print("Array b:", b)
print("\na == b:", a == b)
print("a > b:", a > b)
print("a >= 3:", a >= 3)

# Logical operations
print("\n(a > 2) & (a < 5):", (a > 2) & (a < 5))  # AND
print("(a < 2) | (a > 4):", (a < 2) | (a > 4))  # OR
print("~(a > 3):", ~(a > 3))  # NOT
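
The boolean arrays produced by these comparisons can also be used to select elements. A minimal sketch, assuming an illustrative array `vals` with the same values as `a`:

```python
import numpy as np

vals = np.array([1, 2, 3, 4, 5])  # same values as a above

# Each comparison needs its own parentheses because & binds tighter than > and <
mask = (vals > 2) & (vals < 5)

print(vals[mask])              # [3 4]
print(np.count_nonzero(mask))  # 2

# np.logical_and is the function form of & for boolean arrays
same = np.array_equal(mask, np.logical_and(vals > 2, vals < 5))
print(same)  # True
```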

In [None]:
# Comparison functions
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 2, 3, 1, 6])

print("Array a:", a)
print("Array b:", b)
print("\nElement-wise maximum:", np.maximum(a, b))
print("Element-wise minimum:", np.minimum(a, b))

# Clipping values
arr = np.array([1, 5, 10, 15, 20])
clipped = np.clip(arr, 5, 15)  # Clip between 5 and 15
print("\nOriginal:", arr)
print("Clipped [5, 15]:", clipped)

## 8. Practical Examples

In [None]:
# Example 1: Calculate compound interest
# Formula: A = P(1 + r)^t

principal = 1000  # Initial amount
rate = 0.05       # 5% annual interest
years = np.arange(1, 11)  # 1 to 10 years

amount = principal * (1 + rate) ** years

print("Year\tAmount")
for year, amt in zip(years, amount):
    print(f"{year}\t${amt:.2f}")

In [None]:
# Example 2: Temperature conversion (Celsius to Fahrenheit)
# Formula: F = (C * 9/5) + 32

celsius = np.array([0, 10, 20, 30, 37, 100])
fahrenheit = (celsius * 9/5) + 32

print("Celsius\tFahrenheit")
for c, f in zip(celsius, fahrenheit):
    print(f"{c}°C\t{f}°F")

In [None]:
# Example 3: Grade calculation with weights
# Assignments: 30%, Midterm: 30%, Final: 40%

students = ['Alice', 'Bob', 'Charlie', 'Diana']
assignments = np.array([85, 90, 78, 92])
midterm = np.array([88, 85, 82, 95])
final = np.array([90, 88, 85, 93])

# Calculate weighted average
weights = np.array([0.30, 0.30, 0.40])
grades = np.array([assignments, midterm, final])
final_grades = np.dot(weights, grades)

print("Student\t\tFinal Grade")
for student, grade in zip(students, final_grades):
    print(f"{student}\t\t{grade:.1f}")

## 🎯 Key Takeaways

1. **Element-wise operations** are the default in NumPy
2. **Universal functions (ufuncs)** provide fast element-wise operations
3. **Broadcasting** allows operations on arrays of different shapes
4. **Vectorization** is typically much faster than explicit Python loops
5. Use **aggregation functions** with `axis` parameter for multi-dimensional arrays
6. **Matrix operations** use `@` or `np.dot()` for multiplication

## 📝 Practice Exercises

1. Create a 10x10 matrix and normalize each row to have mean=0 and std=1
2. Calculate the Euclidean distance between two points using vectorization
3. Implement a function to calculate moving average using NumPy
4. Create a temperature dataset and convert between Celsius and Kelvin

In [None]:
# Your practice code here
