# Introduction to NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

## Why NumPy?

### Advantages:
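1. **Performance**: Vectorized operations on NumPy arrays are typically much faster than equivalent loops over Python lists, often by one to two orders of magnitude depending on the operation and array size
2. **Memory Efficient**: Stores elements of one fixed type in a contiguous buffer, which usually takes less memory than a list of Python objects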
3. **Convenience**: Rich functionality for mathematical operations
4. **Foundation**: Core library for pandas, matplotlib, scikit-learn, and more
5. **Vectorization**: Perform operations on entire arrays without loops

### Key Features:
- Multi-dimensional array object (ndarray)
- Broadcasting capabilities
- Linear algebra operations
- Random number generation
- Fourier transforms

## Installation and Import

First, install NumPy if you haven't already:
```bash
pip install numpy
```

In [None]:
# Import NumPy with the standard alias
import numpy as np

# Check NumPy version
print("NumPy version:", np.__version__)

## Creating NumPy Arrays

There are multiple ways to create NumPy arrays:

In [None]:
# 1. From Python lists
list_1d = [1, 2, 3, 4, 5]
array_1d = np.array(list_1d)
print("1D Array:", array_1d)
print("Type:", type(array_1d))

# 2. From nested lists (2D array)
list_2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
array_2d = np.array(list_2d)
print("\n2D Array:\n", array_2d)

# 3. Using arange (like range)
array_range = np.arange(0, 10, 2)  # start, stop (exclusive), step
print("\nArange:", array_range)

# 4. Using linspace (evenly spaced values)
array_linspace = np.linspace(0, 1, 5)  # start, stop (inclusive), number of points
print("\nLinspace:", array_linspace)
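
The difference between `arange` and `linspace` is easy to miss: `arange` takes a step and excludes the stop value, while `linspace` takes a count and includes the stop value by default. A small sketch of that contrast (the variable names are illustrative, not from the cell above):

```python
import numpy as np

# arange: step-based, the stop value (1) is excluded
stepped = np.arange(0, 1, 0.25)

# linspace: count-based, the stop value (1) is included by default
spaced = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
spaced_open = np.linspace(0, 1, 4, endpoint=False)

print("arange(0, 1, 0.25):", stepped)
print("linspace(0, 1, 5):", spaced)
print("linspace(0, 1, 4, endpoint=False):", spaced_open)
```

With floating-point steps, `linspace` is usually the safer choice because the number of points is explicit rather than derived from rounding.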

In [None]:
# Special array creation functions

# Array of zeros
zeros = np.zeros((3, 4))  # 3 rows, 4 columns
print("Zeros:\n", zeros)

# Array of ones
ones = np.ones((2, 3))
print("\nOnes:\n", ones)

# Identity matrix
identity = np.eye(4)
print("\nIdentity Matrix:\n", identity)

# Array with a constant value
full = np.full((2, 3), 7)
print("\nFull Array:\n", full)

# Empty array (uninitialized values)
empty = np.empty((2, 2))
print("\nEmpty Array:\n", empty)
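
When a new array should match the shape and dtype of an existing one, the `*_like` variants of these functions can save some typing. This is a minimal sketch; `template` is a hypothetical array introduced only for illustration:

```python
import numpy as np

# Hypothetical template array with an explicit integer dtype
template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

zeros_like = np.zeros_like(template)     # same shape and dtype, filled with 0
ones_like = np.ones_like(template)       # same shape and dtype, filled with 1
sevens_like = np.full_like(template, 7)  # same shape and dtype, filled with 7

print("zeros_like:\n", zeros_like, "- dtype:", zeros_like.dtype)
print("sevens_like:\n", sevens_like, "- dtype:", sevens_like.dtype)

# By contrast, np.zeros and np.ones default to float64 unless dtype is given
print("np.zeros default dtype:", np.zeros((2, 3)).dtype)
```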

## Array Attributes

NumPy arrays have important attributes that describe their properties:

In [None]:
# Create a sample array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print("Array:\n", arr)
print("\nShape (dimensions):", arr.shape)  # (rows, columns)
print("Number of dimensions:", arr.ndim)
print("Size (total elements):", arr.size)
print("Data type:", arr.dtype)
print("Item size (bytes):", arr.itemsize)
print("Total bytes:", arr.nbytes)
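
These attributes are related: `size` is the product of `shape`, and `nbytes` is `size * itemsize`. The sketch below checks that and compares the array's buffer with a rough estimate of the memory used by an equivalent Python list. The list estimate relies on `sys.getsizeof`, which is an approximation and specific to the Python implementation, so treat the numbers as indicative only:

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# size is the product of the shape; nbytes is size * itemsize
print("size == prod(shape):", arr.size == int(np.prod(arr.shape)))
print("nbytes == size * itemsize:", arr.nbytes == arr.size * arr.itemsize)

# Rough estimate: the list object itself plus each integer object it references
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print("NumPy buffer (bytes):", arr.nbytes)
print("Python list estimate (bytes):", list_bytes)
```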

## Data Types in NumPy

Every NumPy array has a single data type (`dtype`) shared by all of its elements. Choosing an appropriate type keeps storage compact:

In [None]:
# Integer types
int_array = np.array([1, 2, 3], dtype=np.int32)
print("Int32 array:", int_array, "- dtype:", int_array.dtype)

# Float types
float_array = np.array([1.5, 2.7, 3.9], dtype=np.float64)
print("Float64 array:", float_array, "- dtype:", float_array.dtype)

# Boolean type
bool_array = np.array([True, False, True], dtype=np.bool_)
print("Boolean array:", bool_array, "- dtype:", bool_array.dtype)

# Converting data types
converted = int_array.astype(np.float64)
print("Converted to float:", converted, "- dtype:", converted.dtype)
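
Converting in the other direction, from float to integer, deserves some care: `astype` drops the fractional part rather than rounding. A short sketch of that behavior, plus a way to look up the range of an integer type with `np.iinfo` (the sample values are made up for illustration):

```python
import numpy as np

floats = np.array([1.9, -1.9, 2.5])

# astype truncates toward zero: 1.9 -> 1, -1.9 -> -1, 2.5 -> 2
truncated = floats.astype(np.int32)

# Round first if rounding is what you want (np.round rounds halves to even)
rounded = np.round(floats).astype(np.int32)

print("Truncated:", truncated)
print("Rounded:  ", rounded)

# Small integer types have a limited range; values outside it cannot be stored
info = np.iinfo(np.int8)
print("int8 range:", info.min, "to", info.max)
```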

## Array Indexing and Slicing

Access elements using indices, as with Python lists, plus multi-dimensional and boolean indexing:

In [None]:
# 1D array indexing
arr_1d = np.array([10, 20, 30, 40, 50])
print("Array:", arr_1d)
print("First element:", arr_1d[0])
print("Last element:", arr_1d[-1])
print("Slice [1:4]:", arr_1d[1:4])
print("Every other element:", arr_1d[::2])

# 2D array indexing
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D Array:\n", arr_2d)
print("Element at [1, 2]:", arr_2d[1, 2])  # row index 1, column index 2 (zero-based)
print("First row:", arr_2d[0, :])
print("Second column:", arr_2d[:, 1])
print("Subarray:\n", arr_2d[0:2, 1:3])
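
One important difference from Python lists: basic slicing of a NumPy array returns a view onto the same data, not a copy. Writing through the slice changes the original array. A minimal sketch, with illustrative variable names:

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

# A slice is a view: it shares memory with the original array
view = original[1:4]
view[0] = 99  # also changes original[1]

# Use .copy() when you need independent data
independent = original[1:4].copy()
independent[1] = -1  # original is not affected

print("Original after editing the view:", original)
print("Shares memory (view):", np.shares_memory(original, view))
print("Shares memory (copy):", np.shares_memory(original, independent))
```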

In [None]:
# Boolean indexing
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("Array:", arr)

# Create boolean mask
mask = arr > 5
print("Mask (arr > 5):", mask)

# Apply mask
filtered = arr[mask]
print("Filtered array:", filtered)

# Direct boolean indexing
even_numbers = arr[arr % 2 == 0]
print("Even numbers:", even_numbers)

# Multiple conditions (use & for "and", | for "or"; wrap each condition in parentheses)
result = arr[(arr > 3) & (arr < 8)]
print("Numbers greater than 3 and less than 8:", result)
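
Boolean masks can also be combined with `|` (or) and `~` (not), used on the left side of an assignment, or passed to `np.where` to build a new array. The following is a small sketch of those patterns; the variable names are illustrative:

```python
import numpy as np

arr = np.arange(1, 11)  # 1 through 10

# "or" and "not" on masks
outside = arr[(arr < 3) | (arr > 8)]
odd_numbers = arr[~(arr % 2 == 0)]

# np.where(condition, value_if_true, value_if_false) returns a new array
evens_replaced = np.where(arr % 2 == 0, -1, arr)

# Assigning through a mask modifies the array in place, so work on a copy here
capped = arr.copy()
capped[capped > 5] = 5

print("Outside 3..8:", outside)
print("Odd numbers:", odd_numbers)
print("Evens replaced with -1:", evens_replaced)
print("Capped at 5:", capped)
```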

## Array Operations

NumPy supports vectorized operations (element-wise operations without loops):

In [None]:
# Arithmetic operations
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print("a:", a)
print("b:", b)
print("\nAddition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
print("Power:", a ** 2)
print("Square root:", np.sqrt(a))

# Operations with scalars
print("\na + 10:", a + 10)
print("a * 2:", a * 2)
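
To make "without loops" concrete, the sketch below computes the same expression twice, once with an explicit Python loop and once with a vectorized expression, and checks that the results agree. It also shows that `/` always produces floats while `//` keeps integers. The expression itself is arbitrary and chosen only for illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Explicit loop over Python numbers
loop_result = [x * y + 1 for x, y in zip(a.tolist(), b.tolist())]

# Vectorized: the loop runs inside NumPy's compiled code
vector_result = a * b + 1

print("Loop:      ", loop_result)
print("Vectorized:", vector_result)
print("Same values:", loop_result == vector_result.tolist())

# True division returns floats; floor division keeps the integer dtype
print("b / a dtype: ", (b / a).dtype)
print("b // a:", b // a)
```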

In [None]:
# Matrix operations
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

print("Matrix A:\n", matrix_a)
print("\nMatrix B:\n", matrix_b)

# Element-wise multiplication
print("\nElement-wise multiplication:\n", matrix_a * matrix_b)

# Matrix multiplication (dot product)
print("\nMatrix multiplication (dot):\n", np.dot(matrix_a, matrix_b))
# or
print("\nMatrix multiplication (@):\n", matrix_a @ matrix_b)

# Transpose
print("\nTranspose of A:\n", matrix_a.T)
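
Beyond multiplication and transposition, the `np.linalg` module covers common linear algebra tasks. As a brief sketch, here is the determinant, the inverse, and the solution of a small linear system `A x = b`. The right-hand side `b` is a made-up example:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])  # hypothetical right-hand side

det = np.linalg.det(A)      # determinant of A
inv = np.linalg.inv(A)      # inverse of A
x = np.linalg.solve(A, b)   # solves A @ x = b without forming the inverse

print("Determinant:", det)
print("Inverse:\n", inv)
print("Solution x:", x)
print("Check A @ x == b:", np.allclose(A @ x, b))
```

For solving systems, `np.linalg.solve` is generally preferred over multiplying by the inverse, since it tends to be both faster and more numerically stable.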

## Statistical Operations

NumPy provides many statistical functions:

In [None]:
data = np.array([12, 15, 18, 21, 24, 27, 30, 33])

print("Data:", data)
print("\nMean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data))  # population std (ddof=0) by default
print("Variance:", np.var(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
print("Sum:", np.sum(data))
print("Cumulative sum:", np.cumsum(data))

# For 2D arrays, pass axis to aggregate along rows (axis=0) or columns (axis=1)
data_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D Data:\n", data_2d)
print("Mean of each column:", np.mean(data_2d, axis=0))
print("Mean of each row:", np.mean(data_2d, axis=1))
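
By default, `np.std` and `np.var` compute the population statistic (dividing by `N`). If your data is a sample and you want the sample statistic (dividing by `N - 1`), pass `ddof=1`. A short sketch using the same data as above:

```python
import numpy as np

data = np.array([12, 15, 18, 21, 24, 27, 30, 33])

population_var = np.var(data)       # divides by N
sample_var = np.var(data, ddof=1)   # divides by N - 1

population_std = np.std(data)
sample_std = np.std(data, ddof=1)

print("Population variance:", population_var)
print("Sample variance:    ", sample_var)
print("Population std:     ", population_std)
print("Sample std:         ", sample_std)
```

This matters when comparing results with other tools, some of which default to the sample statistic.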

## Array Reshaping

Change the shape of arrays while preserving data:

In [None]:
# Original array
arr = np.arange(12)
print("Original array:", arr)
print("Shape:", arr.shape)

# Reshape to 2D
reshaped_2d = arr.reshape(3, 4)
print("\nReshaped to (3, 4):\n", reshaped_2d)

# Reshape to 3D
reshaped_3d = arr.reshape(2, 3, 2)
print("\nReshaped to (2, 3, 2):\n", reshaped_3d)

# Flatten back to 1D (always returns a copy)
flattened = reshaped_2d.flatten()
print("\nFlattened:", flattened)

# Ravel (returns a view when possible, otherwise a copy)
raveled = reshaped_2d.ravel()
print("Raveled:", raveled)
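
Two details are worth seeing in action: passing `-1` lets NumPy infer one dimension from the total size, and `flatten` copies while `ravel` can return a view. The sketch below uses `np.shares_memory` to show the difference; the variable names are illustrative:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size
auto_cols = arr.reshape(3, -1)   # inferred shape (3, 4)
auto_rows = arr.reshape(-1, 6)   # inferred shape (2, 6)
print("Shapes:", auto_cols.shape, auto_rows.shape)

# flatten always copies; ravel returns a view for a contiguous array like this one
flat_copy = auto_cols.flatten()
flat_view = auto_cols.ravel()
print("flatten shares memory:", np.shares_memory(arr, flat_copy))
print("ravel shares memory:  ", np.shares_memory(arr, flat_view))

# The total number of elements must be preserved
reshape_failed = False
try:
    arr.reshape(5, -1)  # 12 is not divisible by 5
except ValueError as err:
    reshape_failed = True
    print("Reshape error:", err)
```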

## Broadcasting

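Broadcasting lets NumPy combine arrays of different but compatible shapes in element-wise operations. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1: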

In [None]:
# Broadcasting with scalar
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Array:\n", arr)
print("\nArray + 10:\n", arr + 10)

# Broadcasting with 1D array
row = np.array([10, 20, 30])
print("\nArray + [10, 20, 30]:\n", arr + row)

# Broadcasting with column
col = np.array([[100], [200]])
print("\nArray + [[100], [200]]:\n", arr + col)

# Practical example: normalize data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mean = data.mean(axis=0)
std = data.std(axis=0)
normalized = (data - mean) / std
print("\nOriginal data:\n", data)
print("Normalized data:\n", normalized)
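
Not every pair of shapes can be broadcast. As a rough sketch of the rule, the example below shows two compatible combinations, one incompatible combination that raises `ValueError`, and a common fix using `np.newaxis`. The arrays are filled with ones purely for illustration:

```python
import numpy as np

a = np.ones((2, 3))
row = np.ones(3)        # shape (3,): treated as (1, 3), stretched to (2, 3)
col = np.ones((2, 1))   # shape (2, 1): stretched to (2, 3)
bad = np.ones(2)        # shape (2,): last dimension 2 does not match 3

print("a + row:", (a + row).shape)
print("a + col:", (a + col).shape)
print("col + row:", (col + row).shape)

broadcast_failed = False
try:
    a + bad
except ValueError as err:
    broadcast_failed = True
    print("Incompatible shapes:", err)

# Adding an axis turns shape (2,) into (2, 1), which does broadcast
fixed = a + bad[:, np.newaxis]
print("a + bad[:, np.newaxis]:", fixed.shape)
```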

## Concatenation and Stacking

Combine multiple arrays:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print("Array A:\n", a)
print("\nArray B:\n", b)

# Vertical stack (vstack)
vstacked = np.vstack((a, b))
print("\nVertical stack:\n", vstacked)

# Horizontal stack (hstack)
hstacked = np.hstack((a, b))
print("\nHorizontal stack:\n", hstacked)

# Concatenate with axis
concat_axis0 = np.concatenate((a, b), axis=0)
print("\nConcatenate axis 0:\n", concat_axis0)

concat_axis1 = np.concatenate((a, b), axis=1)
print("\nConcatenate axis 1:\n", concat_axis1)
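
The functions above join arrays along an existing axis. When you instead want to combine arrays along a new axis, `np.stack` is one option. A small sketch comparing the resulting shapes:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# concatenate joins along an existing axis: (2, 2) + (2, 2) -> (4, 2)
joined = np.concatenate((a, b), axis=0)

# stack adds a new axis: two (2, 2) arrays -> (2, 2, 2)
stacked_first = np.stack((a, b))           # new axis at position 0
stacked_last = np.stack((a, b), axis=-1)   # new axis at the end

print("concatenate shape:", joined.shape)
print("stack (axis=0) shape:", stacked_first.shape)
print("stack (axis=-1) shape:", stacked_last.shape)
print("stacked_last[0, 0]:", stacked_last[0, 0])  # pairs a[0, 0] with b[0, 0]
```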

## Sorting

Sort arrays in various ways:

In [None]:
# 1D sorting
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print("Original:", arr)
print("Sorted:", np.sort(arr))
print("Original (unchanged):", arr)

# Sort in place
arr.sort()
print("Sorted in place:", arr)

# 2D sorting
arr_2d = np.array([[3, 1, 4], [9, 2, 6], [5, 8, 7]])
print("\nOriginal 2D:\n", arr_2d)
print("Sort along axis 0 (within each column):\n", np.sort(arr_2d, axis=0))
print("Sort along axis 1 (within each row):\n", np.sort(arr_2d, axis=1))

# Get indices that would sort the array
arr = np.array([3, 1, 4, 1, 5])
indices = np.argsort(arr)
print("\nOriginal:", arr)
print("Sort indices:", indices)
print("Sorted using indices:", arr[indices])
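
`np.sort` has no descending option, and `argsort` is most useful when one array should be ordered by another. The sketch below shows one way to handle both. The names and scores are hypothetical data made up for this example:

```python
import numpy as np

# Hypothetical data: one score per name
scores = np.array([88, 95, 70, 95, 60])
names = np.array(["Ana", "Ben", "Cy", "Dee", "Eli"])

# Descending order: sort ascending, then reverse
descending = np.sort(scores)[::-1]

# Order one array by another; kind="stable" keeps ties in their original order
ascending_order = np.argsort(scores, kind="stable")
descending_order = np.argsort(-scores, kind="stable")

print("Scores, descending:", descending)
print("Names by ascending score:", names[ascending_order])
print("Names by descending score:", names[descending_order])
```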

## Random Number Generation

NumPy provides extensive random number capabilities:

In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Random floats between 0 and 1
rand_floats = np.random.random(5)
print("Random floats:", rand_floats)

# Random integers (the upper bound is exclusive, so this draws from 1 to 99)
rand_ints = np.random.randint(1, 100, size=10)
print("Random integers (1-99):", rand_ints)

# Random from normal distribution
normal = np.random.randn(5)  # mean=0, std=1
print("Normal distribution:", normal)

# Random from custom normal distribution
custom_normal = np.random.normal(loc=50, scale=10, size=5)
print("Custom normal (mean=50, std=10):", custom_normal)

# Random choice from array
choices = np.random.choice([10, 20, 30, 40, 50], size=5)
print("Random choices:", choices)

# Shuffle array
arr = np.arange(10)
np.random.shuffle(arr)
print("Shuffled array:", arr)
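
The functions above belong to NumPy's legacy global random state. Newer code often uses the `Generator` API instead, created with `np.random.default_rng`, which keeps the random state in an object rather than in a global. A sketch of rough equivalents to the calls above (the seed value is arbitrary):

```python
import numpy as np

# A Generator with its own seeded state
rng = np.random.default_rng(42)

floats = rng.random(5)                                   # floats in [0, 1)
ints = rng.integers(1, 100, size=10)                     # 1 to 99 (upper bound exclusive)
ints_incl = rng.integers(1, 100, size=10, endpoint=True) # 1 to 100 (upper bound inclusive)
normal = rng.normal(loc=50, scale=10, size=5)            # mean 50, std 10
choices = rng.choice([10, 20, 30, 40, 50], size=5)

arr = np.arange(10)
rng.shuffle(arr)  # shuffles in place

print("Floats:", floats)
print("Integers:", ints)
print("Shuffled:", arr)

# The same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print("Reproducible:", np.array_equal(rng_again.random(5), floats))
```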

## Practical Example: Data Analysis

Let's apply NumPy to a simulated data analysis scenario:

In [None]:
# Simulate student test scores (100 students, 5 tests)
np.random.seed(123)
# randint excludes the upper bound, so use 101 to allow scores of 100
scores = np.random.randint(60, 101, size=(100, 5))

print("Student scores shape:", scores.shape)
print("\nFirst 5 students:\n", scores[:5])

# Calculate statistics
print("\n--- Overall Statistics ---")
print("Mean score:", np.mean(scores))
print("Median score:", np.median(scores))
print("Standard deviation:", np.std(scores))
print("Minimum score:", np.min(scores))
print("Maximum score:", np.max(scores))

# Per-test statistics
print("\n--- Per-Test Statistics ---")
test_means = np.mean(scores, axis=0)
print("Mean per test:", test_means)
print("Hardest test (lowest mean):", np.argmin(test_means) + 1)
print("Easiest test (highest mean):", np.argmax(test_means) + 1)

# Per-student statistics
student_means = np.mean(scores, axis=1)
print("\n--- Per-Student Statistics ---")
print("Best student (highest average):", np.argmax(student_means) + 1)
print("Best student's average:", np.max(student_means))

# Find students who passed all tests (score >= 70)
passing_grade = 70
all_passed = np.all(scores >= passing_grade, axis=1)
num_passed_all = np.sum(all_passed)
print(f"\nStudents who passed all tests (>={passing_grade}):", num_passed_all)

# Count perfect scores (100)
perfect_scores = np.sum(scores == 100)
print(f"Number of perfect scores: {perfect_scores}")

## Performance Comparison: NumPy vs Python Lists

Let's measure how a vectorized NumPy operation compares with a Python list comprehension:

In [None]:
import time

# Create large datasets
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)

# Time the Python list comprehension (perf_counter is a high-resolution timer)
start = time.perf_counter()
result_list = [x * 2 for x in python_list]
list_time = time.perf_counter() - start

# Time the equivalent vectorized NumPy operation
start = time.perf_counter()
result_numpy = numpy_array * 2
numpy_time = time.perf_counter() - start

print(f"Python list time: {list_time:.4f} seconds")
print(f"NumPy array time: {numpy_time:.4f} seconds")
print(f"NumPy is {list_time/numpy_time:.1f}x faster!")
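
A single timed run can be noisy. For a steadier measurement, one option is the standard library's `timeit` module, which repeats the operation several times. The sketch below takes the best of a few repeats; the array size and repeat counts are arbitrary choices, and the actual speedup will vary by machine:

```python
import timeit
import numpy as np

size = 100_000
python_list = list(range(size))
numpy_array = np.arange(size)

runs = 10  # executions per timing

# timeit.repeat returns one total time per repeat; take the best and average per run
list_time = min(timeit.repeat(lambda: [x * 2 for x in python_list],
                              number=runs, repeat=3)) / runs
numpy_time = min(timeit.repeat(lambda: numpy_array * 2,
                               number=runs, repeat=3)) / runs

print(f"Python list: {list_time * 1e3:.3f} ms per run")
print(f"NumPy array: {numpy_time * 1e3:.3f} ms per run")
print(f"Ratio: {list_time / numpy_time:.1f}x")
```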

## Common NumPy Functions Reference

### Mathematical Functions:
- `np.abs()` - Absolute value
- `np.sqrt()` - Square root
- `np.exp()` - Exponential
- `np.log()` - Natural logarithm
- `np.sin()`, `np.cos()`, `np.tan()` - Trigonometric functions
- `np.round()` - Round to a given number of decimals (default 0; halves round to the nearest even value)

### Statistical Functions:
- `np.mean()` - Average
- `np.median()` - Median value
- `np.std()` - Standard deviation
- `np.var()` - Variance
- `np.percentile()` - Percentiles
- `np.corrcoef()` - Pearson correlation coefficient matrix

### Aggregate Functions:
- `np.sum()` - Sum of elements
- `np.prod()` - Product of elements
- `np.min()`, `np.max()` - Minimum and maximum
- `np.argmin()`, `np.argmax()` - Index of min/max
- `np.cumsum()` - Cumulative sum
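
A few of the functions listed above behave in ways that can surprise newcomers. The sketch below illustrates `np.round`, `np.percentile`, and `np.corrcoef` on small made-up inputs:

```python
import numpy as np

# np.round rounds halves to the nearest even value: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2
halves = np.round(np.array([0.5, 1.5, 2.5]))

# The second argument is the number of decimals to keep
two_decimals = np.round(3.14159, 2)

# The 50th percentile is the median
median = np.percentile(np.array([1, 2, 3, 4, 5]), 50)

# corrcoef returns a matrix; the off-diagonal entry is the correlation of x and y
x = np.arange(10.0)
y = 2 * x + 1  # perfectly linear in x
corr_matrix = np.corrcoef(x, y)

print("Rounded halves:", halves)
print("Two decimals:", two_decimals)
print("Median via percentile:", median)
print("Correlation matrix:\n", corr_matrix)
print("Correlation of x and y:", corr_matrix[0, 1])
```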

## Exercises

Try these exercises to practice NumPy:

1. Create a 5x5 matrix with values from 1 to 25
2. Extract all odd numbers from the matrix
3. Replace all even numbers with -1
4. Calculate the mean of each row
5. Create a function that normalizes a 2D array (subtract mean, divide by std)
6. Generate 1000 random numbers from a normal distribution and calculate the percentage within 1 standard deviation
7. Create two 3x3 matrices and perform matrix multiplication
8. Find the correlation between two random arrays of 100 elements each

Good luck!

## Summary

NumPy is essential for:
- Fast numerical computations
- Working with large datasets
- Scientific and statistical analysis
- Building on data science libraries that depend on it

**Key Takeaways:**
- For numerical data, NumPy arrays are typically faster and more memory efficient than Python lists
- Use vectorized operations instead of loops
- Broadcasting allows operations on arrays of different shapes
- Rich set of mathematical and statistical functions
- Foundation for pandas, matplotlib, scikit-learn, and more

**Next Steps:**
- Practice with real datasets
- Learn pandas (built on NumPy)
- Explore matplotlib for visualization
- Study linear algebra with NumPy