# Chapter 19: NumPy Basics

Master the NumPy library — fast, efficient array-oriented computing and vectorized operations



### What is NumPy? (Slide 3)


<p><strong>NumPy</strong> (Numerical Python) is the foundational package for scientific computing in Python.</p>
<p><strong>Why NumPy?</strong></p>
<ul>
<li><code>ndarray</code> — a fast, memory-efficient multidimensional array object</li>
<li>Vectorized operations — no Python loops needed for math</li>
<li>Often one to two orders of magnitude faster than plain Python loops over lists for numerical work (the exact speedup depends on the operation and array size)</li>
<li>Broadcasting — arithmetic between arrays of different shapes</li>
<li>Linear algebra, Fourier transforms, random number generation</li>
</ul>
<p><strong>Used By:</strong></p>
<ul>
<li>pandas, scikit-learn, TensorFlow, PyTorch, SciPy, matplotlib</li>
<li>Much of the Python data science and ML ecosystem builds on NumPy or interoperates with its arrays</li>
</ul>


> **Note:** Install with `pip install numpy`


### Creating ndarrays (Slide 4)


In [1]:
import numpy as np

# np.array(data) — Convert list/nested list into ndarray
data = [6, 7.5, 8, 0, 1]
arr = np.array(data)
print(arr)        # [6.  7.5 8.  0.  1. ]
print(arr.dtype)  # float64 (auto-inferred from input)

# 2D array from nested lists
data2d = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2d = np.array(data2d)

# Key attributes:
# .shape    → tuple of dimensions (rows, cols)
# .ndim     → number of dimensions (1D, 2D, 3D...)
# .size     → total number of elements
# .dtype    → data type of elements
# .itemsize → bytes per element
print(arr2d.shape)     # (2, 4)
print(arr2d.ndim)      # 2
print(arr2d.size)      # 8
print(arr2d.dtype)     # int64 (int32 on Windows with NumPy older than 2.0)
print(arr2d.itemsize)  # 8 bytes per element for int64


[6.  7.5 8.  0.  1. ]
float64
(2, 4)
2
8
int64
8


> **Note:** np.array auto-infers the dtype from input data
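The dtype inference described in the note can be sketched with a few small inputs. This is a minimal illustration; the variable names (`ints`, `mixed`, `grid`) are made up for the example.

```python
import numpy as np

# Integers only: NumPy picks the platform default integer type.
ints = np.array([1, 2, 3])

# A single float is enough to upcast the whole array to float64.
mixed = np.array([1, 2, 3.5])

# Nested lists of equal length become a 2D array.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(ints.dtype.kind)                   # i  (signed integer)
print(mixed.dtype)                       # float64
print(grid.shape, grid.ndim, grid.size)  # (2, 3) 2 6
```

Passing `dtype=` explicitly, as in `np.array(data, dtype=np.float32)`, skips the inference when a specific type is required.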


### Array Creation Functions (Slide 5)


In [2]:
# np.zeros(shape)     → array filled with 0s
# np.ones(shape)      → array filled with 1s
# np.full(shape, val) → array filled with custom value
# np.empty(shape)     → uninitialized (garbage values, fast!)
print(np.zeros(5))          # [0. 0. 0. 0. 0.]
print(np.ones((3, 4)))      # 3x4 matrix of 1s
print(np.full((2, 3), 7))   # 2x3 matrix filled with 7

# np.arange(start, stop, step) → like range() but returns ndarray
# np.linspace(start, stop, num) → evenly spaced between start & stop
print(np.arange(0, 20, 2))      # [0  2  4 ... 18]
print(np.linspace(0, 1, 5))     # [0. 0.25 0.5 0.75 1.]

# np.eye(n) → identity matrix (1s on diagonal)
print(np.eye(3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# np.zeros_like(arr) → zeros with same shape as another array
arr = np.array([[1, 2], [3, 4]])
print(np.zeros_like(arr))  # [[0 0] [0 0]]


[0. 0. 0. 0. 0.]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[7 7 7]
 [7 7 7]]
[ 0  2  4  6  8 10 12 14 16 18]
[0.   0.25 0.5  0.75 1.  ]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[0 0]
 [0 0]]


> **Note:** np.empty is fast but contains uninitialized data — use with caution
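A short sketch of two details from this slide: `np.arange` excludes the stop value while `np.linspace` includes it by default, and an `np.empty` buffer should be filled before it is read. The names `a`, `b`, and `buf` are illustrative only.

```python
import numpy as np

# arange excludes the stop value, like range().
a = np.arange(0, 1, 0.25)
print(a)          # [0.   0.25 0.5  0.75]

# linspace includes both endpoints by default.
b = np.linspace(0, 1, 5)
print(b)          # [0.   0.25 0.5  0.75 1.  ]

# np.empty only allocates memory; write to every element before reading.
buf = np.empty(4)
buf[:] = a * 2
print(buf)        # [0.  0.5 1.  1.5]
```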


### Data Types (dtype) (Slide 6)


In [3]:
# .dtype — check data type of array elements
# .astype(new_type) casts to a different type and returns a new array by default (copy=True)

arr_f = np.array([1, 2, 3], dtype=np.float64)
arr_i = np.array([1, 2, 3], dtype=np.int32)
print(arr_f.dtype)  # float64
print(arr_i.dtype)  # int32

# Common dtypes: int8, int16, int32, int64
#                float16, float32, float64
#                bool, str_, bytes_, complex128  (np.string_ was removed in NumPy 2.0)

# Casting with astype (truncates floats!)
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9])
int_arr = arr.astype(np.int32)
print(int_arr)  # [ 3 -1 -2  0 12] — decimals truncated!

# String to float
numeric_str = np.array(['1.25', '-9.6', '42'])
print(numeric_str.astype(float))  # [ 1.25 -9.6  42. ]

# Memory savings: float32 = half the memory of float64
big = np.arange(1000, dtype=np.float64)    # 8000 bytes
small = np.arange(1000, dtype=np.float32)  # 4000 bytes


float64
int32
[ 3 -1 -2  0 12]
[ 1.25 -9.6  42.  ]


> **Note:** Use the smallest dtype that safely holds your values to save memory on large arrays; smaller types lose precision (floats) or overflow sooner (ints)
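The trade-off behind choosing a smaller dtype can be checked directly. The sketch below uses `.nbytes` to confirm the memory figures and shows two hazards of narrow types; the specific values are chosen only to make the effects visible.

```python
import numpy as np

big = np.arange(1000, dtype=np.float64)
small = big.astype(np.float32)
print(big.nbytes, small.nbytes)   # 8000 4000

# float32 cannot represent every integer above 2**24 exactly.
x64 = np.float64(16777217.0)      # 2**24 + 1
x32 = np.float32(x64)             # rounds to 16777216.0
print(x64 == x32)                 # False

# Integer arrays wrap around silently on overflow.
tiny = np.array([127], dtype=np.int8)
wrapped = tiny + np.array([1], dtype=np.int8)
print(wrapped)                    # [-128]
```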


### Vectorized Arithmetic (Slide 7)


In [4]:
# Vectorization: operations on entire arrays without for-loops
# All arithmetic operators (+, -, *, /, **) work element-wise
# Scalar values are broadcast to every element automatically
# Comparisons (>, <, ==) return boolean arrays
# Runs in optimized C code, typically far faster than an equivalent Python loop

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

print(arr * arr)    # squares each element
print(arr - arr)    # zeros
print(arr + 10)     # adds 10 to every element (scalar broadcast)
print(1 / arr)      # reciprocal of each element
print(arr ** 0.5)   # square root of each element

# Comparison → returns boolean array
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
print(arr2 > arr)
# [[False  True False]
#  [ True False  True]]

# Illustrative timing for 1M elements: Python list loop ~150 ms vs NumPy ~2 ms (varies by machine and operation)


[[ 1.  4.  9.]
 [16. 25. 36.]]
[[0. 0. 0.]
 [0. 0. 0.]]
[[11. 12. 13.]
 [14. 15. 16.]]
[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]
[[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]
[[False  True False]
 [ True False  True]]


> **Note:** Vectorization: batch ops in C, not Python loops = massive speedup
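Speed claims like the one above are easy to measure on your own machine. The sketch below times a list comprehension against the vectorized equivalent with the standard-library `timeit` module. The function names are hypothetical, and the actual ratio depends on hardware, array size, and the operation.

```python
import timeit
import numpy as np

n = 1_000_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # explicit int64 so squares cannot overflow

def square_with_loop():
    # Pure-Python element-by-element work.
    return [v * v for v in values]

def square_vectorized():
    # One call; the loop runs inside NumPy's compiled code.
    return arr * arr

loop_time = timeit.timeit(square_with_loop, number=3)
numpy_time = timeit.timeit(square_vectorized, number=3)

print(f"Python loop: {loop_time:.4f} s for 3 runs")
print(f"NumPy:       {numpy_time:.4f} s for 3 runs")
```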


### Broadcasting (Slide 8)


In [5]:
# Broadcasting: arithmetic between different-shaped arrays
# Rules:
#   1) Arrays with fewer dims get 1s prepended to shape
#   2) Size-1 dims are stretched to match the other
#   3) If sizes differ and neither is 1 → Error!

arr = np.arange(12).reshape((4, 3))
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

# .mean(axis=1, keepdims=True) — row means as column vector
row_means = arr.mean(axis=1, keepdims=True)
print(arr - row_means)  # Each row centered to 0

# Scalar broadcast — 2 is applied to every element
print(arr * 2)

# Column vector broadcast — added to each column
col = np.array([[10], [20], [30], [40]])
print(arr + col)

# This FAILS: shapes (3,4) and (3,5) can't broadcast
# np.ones((3, 4)) + np.ones((3, 5))  → ValueError (operands could not be broadcast together)


[[-1.  0.  1.]
 [-1.  0.  1.]
 [-1.  0.  1.]
 [-1.  0.  1.]]
[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]]
[[10 11 12]
 [23 24 25]
 [36 37 38]
 [49 50 51]]


> **Note:** Broadcasting avoids copying data — very memory efficient
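The three rules can be seen in a case that trips up many newcomers: subtracting per-column means works directly, but per-row means need an explicit trailing axis. This is a small sketch and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(4, 3)

# Column means have shape (3,), which lines up with the trailing dimension.
col_means = arr.mean(axis=0)
centered_cols = arr - col_means            # (4, 3) - (3,) -> (4, 3)

# Row means have shape (4,); compared against (4, 3), the trailing sizes 3 and 4 clash.
row_means = arr.mean(axis=1)
error_message = None
try:
    arr - row_means
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Adding a trailing axis gives shape (4, 1), which stretches across the columns.
centered_rows = arr - row_means[:, np.newaxis]
print(centered_rows.mean(axis=1))          # [0. 0. 0. 0.]
```

Passing `keepdims=True` to `mean`, as in the cell above, produces the `(4, 1)` shape without the manual `np.newaxis` step.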


### Basic Indexing & Slicing (Slide 9)


In [6]:
# arr[start:stop] — slice (just like Python lists)
# ⚠️ Slices return a VIEW, not a copy! Modifying it changes original!
# .copy() — explicitly create an independent copy

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

arr_slice = arr[5:8]     # VIEW of [5, 6, 7]
arr_slice[1] = 12345     # Modifies original!
print(arr)  # [ 0  1  2  3  4  5 12345  7  8  9]

arr_copy = arr[5:8].copy()  # Safe independent copy

# 2D indexing: arr2d[row, col]
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[0])        # [1 2 3] — first row
print(arr2d[0, 2])     # 3 — row 0, col 2
print(arr2d[:2])       # first two rows
print(arr2d[:2, 1:])   # rows 0-1, cols 1 onwards
print(arr2d[1, :2])    # row 1, first 2 cols
print(arr2d[:, :1])    # all rows, first col only


[    0     1     2     3     4     5 12345     7     8     9]
[1 2 3]
3
[[1 2 3]
 [4 5 6]]
[[2 3]
 [5 6]]
[4 5]
[[1]
 [4]
 [7]]


> **Note:** ⚠️ Slices are VIEWS — modifying them changes the original!
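Whether two arrays share a buffer can be checked rather than guessed. The sketch below uses `np.shares_memory` and the `.base` attribute to tell a view from a copy.

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]           # slice: shares arr's buffer
copy = arr[2:5].copy()    # independent buffer

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(view.base is arr)              # True

view[0] = 99      # writes through to arr
copy[1] = -1      # leaves arr untouched
print(arr[2], arr[3])                # 99 3
```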


### Boolean Indexing (Slide 10)


In [7]:
# arr[condition] — select elements where condition is True
# ~mask          — negate (NOT) a boolean array
# &              — AND two conditions (not Python 'and'!)
# |              — OR two conditions (not Python 'or'!)
# ⚠️ Boolean indexing ALWAYS returns a copy (unlike slicing)

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe'])
data = np.random.randn(6, 4)  # 6 rows, 4 cols random

mask = (names == 'Bob')
print(mask)  # [ True False False  True False False]

print(data[mask])   # Only rows where name is 'Bob'
print(data[~mask])  # Everything EXCEPT Bob

# Combine conditions with & (AND) and | (OR)
mask2 = (names == 'Bob') | (names == 'Will')
print(data[mask2])  # Bob and Will rows

# Set values conditionally — one liner!
data[data < 0] = 0  # Replace all negatives with 0
print(data)


[ True False False  True False False]
[[ 0.81056942 -1.6922134   0.745289   -1.38516254]
 [ 0.2845095   0.27310102 -0.62706153 -1.67771348]]
[[-1.35920304 -0.145703    0.10246742 -0.5801937 ]
 [ 0.77022802 -0.8044325  -0.11531427  1.03836646]
 [ 0.20405152 -0.32222983 -0.29778195 -1.04186345]
 [ 0.19979169  0.63932769 -1.61418748  1.22483611]]
[[ 0.81056942 -1.6922134   0.745289   -1.38516254]
 [ 0.77022802 -0.8044325  -0.11531427  1.03836646]
 [ 0.2845095   0.27310102 -0.62706153 -1.67771348]
 [ 0.20405152 -0.32222983 -0.29778195 -1.04186345]]
[[0.81056942 0.         0.745289   0.        ]
 [0.         0.         0.10246742 0.        ]
 [0.77022802 0.         0.         1.03836646]
 [0.2845095  0.27310102 0.         0.        ]
 [0.20405152 0.         0.         0.        ]
 [0.19979169 0.63932769 0.         1.22483611]]


> **Note:** Combine conditions with & and | (not Python 'and'/'or'), and wrap each condition in parentheses
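A compact sketch of why `&` is required and why the result is a copy. The `temps` data is invented for the example.

```python
import numpy as np

temps = np.array([12.5, 18.0, 25.3, 31.1, 8.4])

# Parentheses matter: & binds more tightly than the comparison operators.
mild = (temps > 10) & (temps < 30)
print(temps[mild])        # [12.5 18.  25.3]

# Python's 'and' asks for one truth value for the whole array, which is ambiguous.
raised = False
try:
    (temps > 10) and (temps < 30)
except ValueError:
    raised = True
print(raised)             # True

# Boolean indexing returns a copy, so editing the selection leaves temps alone.
selected = temps[mild]
selected[0] = 0.0
print(temps[0])           # 12.5
```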


### Fancy Indexing (Slide 11)


In [8]:
# Fancy indexing: use integer arrays to select specific rows/elements
# arr[[4, 3, 0]]          — select rows 4, 3, 0 in that order
# arr[[-3, -5]]           — negative indices from the end
# arr[[r1,r2], [c1,c2]]   — select elements (r1,c1) and (r2,c2)
# arr[np.ix_(rows, cols)]  — rectangular subarray
# ⚠️ Fancy indexing ALWAYS returns a copy

arr = np.arange(32).reshape((8, 4))

# Select rows in specific order
print(arr[[4, 3, 0, 6]])

# Negative indices (from the end)
print(arr[[-3, -5, -7]])

# Select individual elements: (1,0), (5,3), (7,1)
print(arr[[1, 5, 7], [0, 3, 1]])  # [ 4 23 29]

# np.ix_ for rectangular cross-selection
print(arr[np.ix_([1, 5, 7], [0, 2, 3])])
# [[ 4  6  7]
#  [20 22 23]
#  [28 30 31]]


[[16 17 18 19]
 [12 13 14 15]
 [ 0  1  2  3]
 [24 25 26 27]]
[[20 21 22 23]
 [12 13 14 15]
 [ 4  5  6  7]]
[ 4 23 29]
[[ 4  6  7]
 [20 22 23]
 [28 30 31]]


> **Note:** Fancy indexing always copies — use np.ix_ for cross-indexing
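Two points from this slide are worth seeing side by side: reading through a fancy index gives a copy, while assigning through one writes into the original, and paired index lists select individual elements whereas `np.ix_` selects a block. A minimal sketch:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Reading: the result is a copy, so changing it does not touch arr.
picked = arr[[0, 2]]
picked[0, 0] = -1
print(arr[0, 0])          # 0

# Writing: assignment through a fancy index modifies arr itself.
arr[[0, 2], [1, 3]] = 100             # sets elements (0, 1) and (2, 3)

pairs = arr[[0, 2], [1, 3]]           # two elements, shape (2,)
block = arr[np.ix_([0, 2], [1, 3])]   # rows 0 and 2 crossed with cols 1 and 3, shape (2, 2)
print(pairs)              # [100 100]
print(block)
# [[100   3]
#  [  9 100]]
```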


### Transposing & Reshaping (Slide 12)


In [9]:
# .T              — transpose (swap rows/cols). Returns a view
# .reshape(shape)  — change shape without changing data. Use -1 to auto-infer
# .flatten()       — collapse to 1D, always returns a COPY
# .ravel()         — collapse to 1D, returns a VIEW when possible
# .swapaxes(a, b)  — swap two axes. Returns a view

arr = np.arange(15).reshape((3, 5))
print(arr.T)        # (5, 3) transposed

# Matrix dot product using transpose
result = np.dot(arr.T, arr)  # (5,3) @ (3,5) = (5,5)

# Reshape with -1 (auto-infer one dimension)
arr = np.arange(24)
print(arr.reshape((4, 6)))     # 4x6
print(arr.reshape((2, 3, 4)))  # 2x3x4
print(arr.reshape((6, -1)))    # 6x4 (auto-calculated)

# flatten (copy) vs ravel (view)
mat = arr.reshape((4, 6))
print(mat.flatten())  # Always a new copy
print(mat.ravel())    # View if possible (faster)

# Swap axes
arr3d = np.arange(24).reshape((2, 3, 4))
print(arr3d.swapaxes(1, 2).shape)  # (2, 4, 3)


[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
(2, 4, 3)


> **Note:** reshape returns a view when possible; flatten always copies
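The "view when possible" wording can be tested with `np.shares_memory`. In this sketch, `ravel` returns a view for a contiguous array but has to copy once the array has been transposed.

```python
import numpy as np

arr = np.arange(6)
mat = arr.reshape(2, 3)

print(np.shares_memory(arr, mat))             # True: reshape gave a view
print(np.shares_memory(mat, mat.flatten()))   # False: flatten always copies
print(np.shares_memory(mat, mat.ravel()))     # True: contiguous, so a view

# After a transpose the elements are no longer in row-major order,
# so ravel cannot describe the result as a view and copies instead.
transposed = mat.T
print(np.shares_memory(mat, transposed))          # True: .T is a view
print(np.shares_memory(mat, transposed.ravel()))  # False
```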


### Universal Functions (ufuncs) (Slide 13)


In [10]:
# ufuncs: fast vectorized wrappers for element-wise operations
# --- Unary (one input) ---
# np.sqrt(arr)   — square root
# np.exp(arr)    — e^x
# np.log(arr)    — natural log
# np.abs(arr)    — absolute value
# np.sign(arr)   — returns -1, 0, or 1

arr = np.array([4, 9, 16, 25, 36])
print(np.sqrt(arr))          # [2. 3. 4. 5. 6.]
print(np.sign([-3, 0, 5]))   # [-1  0  1]

# --- Binary (two inputs) ---
# np.maximum(x, y) — element-wise max
# np.minimum(x, y) — element-wise min
# np.add(x, y)     — element-wise addition
# np.modf(arr) is unary but returns two arrays: (fractional, integral) parts

x = np.array([5, 3, 8, 2])
y = np.array([1, 7, 4, 9])
print(np.maximum(x, y))  # [5 7 8 9]
print(np.add(x, y))      # [ 6 10 12 11]

arr = np.array([3.7, -1.2, 5.8])
frac, whole = np.modf(arr)
print(frac)   # [ 0.7 -0.2  0.8]
print(whole)  # [ 3. -1.  5.]


[2. 3. 4. 5. 6.]
[-1  0  1]
[5 7 8 9]
[ 6 10 12 11]
[ 0.7 -0.2  0.8]
[ 3. -1.  5.]


> **Note:** ufuncs run in compiled C — much faster than Python math
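Beyond plain calls, ufuncs accept an `out=` argument and expose methods such as `reduce` and `accumulate`. The sketch below shows all three on small inputs.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

# out= writes the result into an existing array instead of allocating a new one.
result = np.empty_like(x)
np.sqrt(x, out=result)
print(result)                         # [1. 2. 3.]

# reduce applies a binary ufunc along an axis; np.add.reduce matches x.sum().
total = np.add.reduce(x)
print(total)                          # 14.0

# accumulate keeps the intermediate results, here a running maximum.
running_max = np.maximum.accumulate(np.array([3, 1, 4, 1, 5]))
print(running_max)                    # [3 3 4 4 5]
```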


### Conditional Logic: np.where (Slide 14)


In [11]:
# np.where(condition, x, y) — vectorized if/else
#   Where condition is True  → use value from x
#   Where condition is False → use value from y
#   Works element-wise on entire arrays
#   Can be nested for multiple conditions

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = np.where(cond, xarr, yarr)
print(result)  # [1.1 2.2 1.3 1.4 2.5]

# Replace negatives with 0, keep positives
arr = np.random.randn(4, 4)
print(np.where(arr > 0, arr, 0))

# Classify values into categories
print(np.where(arr > 0, 'positive', 'negative'))

# Nested conditions (multi-category)
result = np.where(arr > 1, 'high',
         np.where(arr > 0, 'medium', 'low'))


[1.1 2.2 1.3 1.4 2.5]
[[0.         0.         0.81617983 0.        ]
 [0.55780133 0.         0.         0.5588822 ]
 [0.         0.38809293 0.         0.34040072]
 [0.78666888 0.         1.01367536 0.31832388]]
[['negative' 'negative' 'positive' 'negative']
 ['positive' 'negative' 'negative' 'positive']
 ['negative' 'positive' 'negative' 'positive']
 ['positive' 'negative' 'positive' 'positive']]


> **Note:** np.where replaces loop-based if/else logic on arrays
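Nested `np.where` calls become hard to read past two levels. One alternative, not used in the cell above, is `np.select`, which takes a list of conditions and a matching list of choices and uses the first condition that holds. A sketch on a small fixed array:

```python
import numpy as np

arr = np.array([-0.5, 0.3, 1.7, 0.0])

# Nested np.where, as in the cell above.
nested = np.where(arr > 1, 'high',
         np.where(arr > 0, 'medium', 'low'))

# np.select: conditions are checked in order; 'default' covers everything else.
selected = np.select([arr > 1, arr > 0], ['high', 'medium'], default='low')

print(nested)     # ['low' 'medium' 'high' 'low']
print(selected)   # ['low' 'medium' 'high' 'low']
```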


### Mathematical & Statistical Methods (Slide 15)


In [12]:
# Aggregation methods — compute summary from array:
# .sum()     — sum all elements
# .mean()    — arithmetic mean
# .std()     — standard deviation
# .var()     — variance
# .min()     — minimum value
# .max()     — maximum value
# .argmin()  — INDEX of min value
# .argmax()  — INDEX of max value
# .cumsum()  — cumulative sum (running total)
# .cumprod() — cumulative product
# axis=0 → down columns | axis=1 → across rows

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.sum())          # 45
print(arr.mean())         # 5.0
print(arr.sum(axis=0))    # [12 15 18] col sums
print(arr.sum(axis=1))    # [ 6 15 24] row sums
print(arr.cumsum(axis=0)) # running total down cols

# Boolean aggregation
arr2 = np.random.randn(100)
print((arr2 > 0).sum())   # count positives
print((arr2 > 0).any())   # True if any positive
print((arr2 > 0).all())   # True if ALL positive


45
5.0
[12 15 18]
[ 6 15 24]
[[ 1  2  3]
 [ 5  7  9]
 [12 15 18]]
55
True
False


> **Note:** axis=0 → down columns, axis=1 → across rows
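The axis convention is easiest to remember as "the axis you name is the one that disappears". The sketch below applies it to a small, invented `scores` table and also shows the `ddof` argument of `.std()`, which defaults to the population formula.

```python
import numpy as np

scores = np.array([[80.0, 90.0, 70.0],
                   [60.0, 75.0, 95.0]])    # 2 rows, 3 columns

col_mean = scores.mean(axis=0)       # collapses the rows: one value per column
row_mean = scores.mean(axis=1)       # collapses the columns: one value per row
best_col = scores.argmax(axis=1)     # column index of each row's maximum

print(col_mean)    # [70.  82.5 82.5]
print(best_col)    # [1 2]

# .std() divides by n by default (ddof=0); pass ddof=1 for the sample estimate.
data = np.array([1.0, 2.0, 3.0, 4.0])
pop_std = data.std()
sample_std = data.std(ddof=1)
print(pop_std < sample_std)          # True
```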


### Sorting Arrays (Slide 16)


In [13]:
# np.sort(arr)       — returns a SORTED COPY
# arr.sort()         — sorts IN-PLACE (modifies original)
# np.sort(arr, axis) sorts along an axis (axis=0 within each column, axis=1 within each row)
# np.argsort(arr)    — returns INDICES that would sort the array

arr = np.array([5, 3, 8, 1, 7, 2])

print(np.sort(arr))  # [1 2 3 5 7 8] — copy, original unchanged

arr.sort()           # Modifies arr directly
print(arr)           # [1 2 3 5 7 8]

# Sort 2D arrays
arr2d = np.array([[5, 3, 1], [8, 2, 7]])
print(np.sort(arr2d, axis=0))  # Sort each column
print(np.sort(arr2d, axis=1))  # Sort each row

# Quantiles via sorting
large_arr = np.random.randn(1000)
large_arr.sort()
print(large_arr[int(0.05 * len(large_arr))])  # 5th percentile
print(large_arr[int(0.95 * len(large_arr))])  # 95th percentile

# argsort — useful for sorting parallel arrays
arr = np.array([30, 10, 40, 20])
print(np.argsort(arr))  # [1 3 0 2] — indices of sorted order


[1 2 3 5 7 8]
[1 2 3 5 7 8]
[[5 2 1]
 [8 3 7]]
[[1 3 5]
 [2 7 8]]
-1.6908596798138043
1.6545250317556373
[1 3 0 2]


> **Note:** argsort returns sort indices — great for sorting parallel arrays
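Sorting parallel arrays with `argsort` might look like the sketch below, where the `names` and `ages` arrays are invented for the example. It also shows `np.percentile`, which interpolates between values rather than indexing into a sorted array.

```python
import numpy as np

names = np.array(['Ann', 'Ben', 'Cat', 'Dan'])
ages = np.array([30, 10, 40, 20])

order = np.argsort(ages)           # indices that sort ages ascending
print(order)                       # [1 3 0 2]
print(names[order])                # ['Ben' 'Dan' 'Ann' 'Cat']
print(ages[order])                 # [10 20 30 40]

# Reverse the index array for descending order.
print(names[order[::-1]])          # ['Cat' 'Ann' 'Dan' 'Ben']

# np.percentile uses linear interpolation by default.
median = np.percentile(ages, 50)
print(median)                      # 25.0
```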


### Unique & Set Operations (Slide 17)


In [14]:
# np.unique(arr)          — sorted unique values (fast set())
# np.isin(arr, vals)      — test each element membership → bool array
# np.intersect1d(x, y)    — elements in BOTH arrays
# np.union1d(x, y)        — all unique elements from both
# np.setdiff1d(x, y)      — elements in x NOT in y
# np.setxor1d(x, y)       — elements in either but NOT both

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe'])
print(np.unique(names))  # ['Bob' 'Joe' 'Will']

# Test membership
values = np.array([6, 0, 0, 3, 2, 5, 6])
print(np.isin(values, [2, 3, 6]))  
# [ True False False  True  True False  True]

# Set operations
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 5, 6, 7])

print(np.intersect1d(x, y))  # [3 4 5]  — in both
print(np.union1d(x, y))      # [1 2 3 4 5 6 7]
print(np.setdiff1d(x, y))    # [1 2]    — in x not y
print(np.setxor1d(x, y))     # [1 2 6 7] — exclusive


['Bob' 'Joe' 'Will']
[ True False False  True  True False  True]
[3 4 5]
[1 2 3 4 5 6 7]
[1 2]
[1 2 6 7]


> **Note:** np.unique, intersect1d, union1d, setdiff1d, and setxor1d return sorted unique values; np.isin returns a boolean array shaped like its input
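`np.unique` can also report how often each value occurs, which the cell above does not show. A short sketch using the `return_counts` option, with `np.isin` producing a mask for filtering:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Bob'])

# return_counts=True adds a second array with the frequency of each unique value.
values, counts = np.unique(names, return_counts=True)
print(values)     # ['Bob' 'Joe' 'Will']
print(counts)     # [3 2 2]

# np.isin gives a boolean mask with the same shape as its first argument.
mask = np.isin(names, ['Bob', 'Will'])
print(names[mask])   # ['Bob' 'Will' 'Bob' 'Will' 'Bob']
```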


### Random Number Generation (Slide 18)


In [15]:
# Modern API (NumPy 1.17+): np.random.default_rng(seed)
# rng.random(size)              — uniform [0, 1)
# rng.standard_normal(shape)    — normal distribution (mean=0, std=1)
# rng.integers(low, high, size) — random ints [low, high)
# rng.uniform(low, high, size)  — uniform in [low, high)
# rng.choice(arr, size)         — random pick from array
# rng.shuffle(arr)              — shuffle in-place
# rng.permutation(n)            — shuffled copy of range(n)

rng = np.random.default_rng(seed=42)  # Reproducible!

print(rng.random(5))                       # [0.77, 0.43, ...]
print(rng.standard_normal((3, 3)))         # 3x3 normal
print(rng.integers(0, 10, size=(2, 4)))    # 2x4 ints
print(rng.uniform(1.0, 5.0, size=5))      # uniform floats
print(rng.choice(['a', 'b', 'c'], size=6)) # random picks

arr = np.arange(10)
rng.shuffle(arr)         # shuffles in-place
print(arr)
print(rng.permutation(10))  # returns shuffled copy


[0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]
[[-1.30217951  0.1278404  -0.31624259]
 [-0.01680116 -0.85304393  0.87939797]
 [ 0.77779194  0.0660307   1.12724121]]
[[5 4 4 2]
 [0 5 8 0]]
[4.31052469 3.5266576  4.03235096 2.41810387 4.8827921 ]
['b' 'c' 'c' 'c' 'c' 'a']
[8 4 6 3 2 0 5 1 9 7]
[5 4 9 6 0 7 8 1 2 3]


> **Note:** Prefer default_rng(seed) over the legacy np.random.* functions (such as np.random.randn) when results must be reproducible
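Reproducibility here means that two generators built from the same seed produce the same stream. The sketch below checks that, and also shows sampling without replacement through the `replace` argument of `choice`. The seed value is arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)

a = rng_a.integers(0, 100, size=5)
b = rng_b.integers(0, 100, size=5)
print(np.array_equal(a, b))      # True: same seed, same sequence

# replace=False draws distinct items, like dealing cards.
sample = rng_a.choice(10, size=4, replace=False)
print(len(np.unique(sample)))    # 4
```

Each generator keeps its own state, so passing an `rng` object into functions is one way to avoid hidden global state.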


### Linear Algebra (Slide 19)


In [16]:
# x @ y  or  np.dot(x, y) — matrix multiplication (not element-wise!)
# np.linalg.inv(mat)       — inverse of square matrix
# np.linalg.det(mat)       — determinant
# np.linalg.eig(mat)       — eigenvalues and eigenvectors
# np.linalg.solve(A, b)    — solve linear equations Ax = b

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])

# Matrix multiply: (2x3) @ (3x2) = (2x2)
print(x @ y)          # Python 3.5+ preferred syntax
print(np.dot(x, y))   # Same result

from numpy.linalg import inv, det, eig, solve

mat = np.array([[1, 2], [3, 4]])
print(inv(mat))     # Inverse
print(det(mat))     # Determinant: approximately -2.0 (floating-point rounding)

eigenvalues, eigenvectors = eig(mat)
print(eigenvalues)

# Solve: 3x + y = 9, x + 2y = 8
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
print(solve(A, b))  # [2. 3.] ✓


[[ 28  64]
 [ 67 181]]
[[ 28  64]
 [ 67 181]]
[[-2.   1. ]
 [ 1.5 -0.5]]
-2.0000000000000004
[-0.37228132  5.37228132]
[2. 3.]


> **Note:** @ operator is the preferred way for matrix multiplication
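Results from `np.linalg` are floating-point, so it is safer to verify them with `np.allclose` than with `==`. The sketch below checks a `solve` result and an eigendecomposition for the same 2x2 system used in the cell above.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# solve works on the system directly; forming the inverse first is usually
# less accurate and slower.
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                  # True

x_via_inverse = np.linalg.inv(A) @ b
print(np.allclose(x, x_via_inverse))          # True

# Eigenvectors are the COLUMNS of the second return value: A @ v = lambda * v.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))  # True
```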


### File I/O with Arrays (Slide 20)


In [17]:
# --- Binary formats (fast, compact) ---
# np.save('file', arr)                  — save as .npy
# np.load('file.npy')                   — load .npy
# np.savez('file', a=arr1, b=arr2)      — multiple arrays → .npz
# np.savez_compressed('file', data=arr) — compressed .npz
#
# --- Text formats (human-readable, slower) ---
# np.savetxt('f.csv', arr, delimiter=',') — save as CSV
# np.loadtxt('f.csv', delimiter=',')      — load CSV

arr = np.arange(10)

# Binary save/load
np.save('my_array', arr)         # my_array.npy
loaded = np.load('my_array.npy')
print(loaded)

# Multiple arrays
np.savez('archive', a=arr, b=arr**2)
arch = np.load('archive.npz')
print(arch['a'])  # Original
print(arch['b'])  # Squared

# Compressed (saves disk space for large arrays)
np.savez_compressed('compressed', data=arr)

# Text I/O (CSV)
np.savetxt('data.csv', arr.reshape(2, 5), delimiter=',')
loaded_txt = np.loadtxt('data.csv', delimiter=',')


[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[ 0  1  4  9 16 25 36 49 64 81]


> **Note:** .npy is binary (fast) — .csv is human-readable (slower)
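A round-trip sketch that writes to a temporary directory, so it leaves no files behind. It opens the `.npz` archive in a `with` block so the file handle is closed, and saves the CSV with an integer format so the values read back exactly. The file names are arbitrary.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    # .npz archive holding two named arrays
    npz_path = os.path.join(tmp, 'archive.npz')
    np.savez(npz_path, a=arr, b=arr ** 2)
    with np.load(npz_path) as archive:      # closes the file on exit
        stored_names = sorted(archive.files)
        squares = archive['b']

    # CSV: fmt='%d' writes plain integers instead of scientific notation
    csv_path = os.path.join(tmp, 'data.csv')
    np.savetxt(csv_path, arr.reshape(2, 5), delimiter=',', fmt='%d')
    from_csv = np.loadtxt(csv_path, delimiter=',', dtype=int)

print(stored_names)   # ['a', 'b']
print(from_csv)
```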


### Example: Random Walks (Slide 21)


In [18]:
# Demonstrates NumPy's power: simulate 5000 random walks at once!
# rng.choice([-1, 1], size) — generate random steps
# .cumsum()                 — running total → walk path
# (condition).argmax()      — first index where condition is True
# .any(axis=1)              — True if any element in row is True

import numpy as np
rng = np.random.default_rng(seed=42)

# Single walk (1000 steps)
nsteps = 1000
draws = rng.choice([-1, 1], size=nsteps)
walk = draws.cumsum()

print(f"Min position:  {walk.min()}")
print(f"Max position:  {walk.max()}")
print(f"First crossing +/-10: {(np.abs(walk) >= 10).argmax()}")

# 5000 walks simultaneously (possible with loops, but far slower)
nwalks = 5000
draws = rng.choice([-1, 1], size=(nwalks, nsteps))
walks = draws.cumsum(axis=1)  # cumsum across each row

print(f"Max across all walks: {walks.max()}")
print(f"Min across all walks: {walks.min()}")

hits30 = (np.abs(walks) >= 30).any(axis=1)
print(f"Walks reaching +/-30: {hits30.sum()}")


Min position:  -19
Max position:  23
First crossing +/-10: 115
Max across all walks: 118
Min across all walks: -128
Walks reaching +/-30: 3390


> **Note:** NumPy simulates 5000 walks simultaneously in milliseconds
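One caveat with the `argmax` trick: on a boolean row that is all `False`, `argmax` returns 0, which looks like a crossing at the first step. A hedged sketch of computing crossing times only for walks that actually reach the threshold, using a smaller simulation and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
steps = rng.choice([-1, 1], size=(200, 500))   # 200 walks, 500 steps each
walks = steps.cumsum(axis=1)

beyond = np.abs(walks) >= 30
reached = beyond.any(axis=1)                   # which walks ever hit +/-30

# Filter first, then take argmax, so never-crossing walks are excluded
# instead of being reported as crossing at index 0.
crossing_times = beyond[reached].argmax(axis=1)

print(f"Walks reaching +/-30: {reached.sum()} of {len(walks)}")
if reached.any():
    print(f"Mean crossing step: {crossing_times.mean():.1f}")
```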


### NumPy Best Practices (Slide 22)


<p><strong>✅ Do:</strong></p>
<ul>
<li><strong>Vectorize</strong> — replace Python loops with array operations</li>
<li><strong>Use smallest dtype</strong> — float32 vs float64 halves memory</li>
<li><strong>Preallocate arrays</strong> — <code>np.zeros(n)</code> instead of appending in loops</li>
<li><strong>Leverage broadcasting</strong> — avoid manual reshaping or tiling</li>
<li><strong>Know views vs copies</strong> — slicing = view, fancy/boolean indexing = copy</li>
<li><strong>Profile first</strong> — use <code>%timeit</code> before optimizing</li>
</ul>
<p><strong>❌ Don't:</strong></p>
<ul>
<li>Use Python <code>for</code> loops over array elements</li>
<li>Grow arrays with <code>np.append()</code> in a loop (slow!)</li>
<li>Ignore dtype — unintended float64 wastes memory</li>
<li>Forget that slices are views (mutations propagate!)</li>
</ul>
<p><strong>Memory Rule of Thumb:</strong></p>
<ul>
<li>1 million float64 values ≈ 8 MB</li>
<li>1 million float32 values ≈ 4 MB</li>
<li>1 million int8 values ≈ 1 MB</li>
</ul>
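The preallocation advice above can be illustrated with three ways of building the same array. This is a sketch for comparison; the variable names are made up.

```python
import numpy as np

n = 1000

# Anti-pattern: np.append allocates and copies a new array on every iteration.
grown = np.array([], dtype=np.float64)
for i in range(n):
    grown = np.append(grown, i ** 0.5)

# Better: allocate once, then fill in place.
prealloc = np.empty(n)
for i in range(n):
    prealloc[i] = i ** 0.5

# Best when the computation allows it: no Python loop at all.
vectorized = np.sqrt(np.arange(n))

print(np.allclose(grown, prealloc), np.allclose(prealloc, vectorized))  # True True
```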
