# 1. What is NumPy and Why It’s Better Than Lists

NumPy (Numerical Python) is a Python library for working with large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on them.

| Feature          | Python List         | NumPy Array                        |
| ---------------- | ------------------- | ---------------------------------- |
| Speed            | Slow for large data | Very fast due to C backend         |
| Memory Usage     | Inefficient         | Efficient, fixed-type data storage |
| Operations       | Manual looping      | Vectorized operations              |
| Type Consistency | Mixed types allowed | Homogeneous (better performance)   |
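
The memory row of the table can be checked with a rough measurement. The sketch below is only an estimate: it assumes a 64-bit integer dtype (set explicitly here), and it counts every list element as its own Python object, which slightly overstates the list cost for small cached integers.

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype set explicitly so the result is platform-independent

# A list stores pointers to separate int objects, so count the container and the objects
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# An array stores the raw values back to back in one buffer
array_bytes = arr.nbytes

print("List (container + int objects):", list_bytes, "bytes")
print("NumPy array data buffer:", array_bytes, "bytes")
```

The array buffer is exactly `n * 8` bytes, while the list needs a pointer plus a full Python object per element.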


In [4]:
import numpy as np
import time

# List operation
lst = list(range(1000000))
start = time.time()
lst = [x * 2 for x in lst]
list_time = time.time() - start
print("List time:", list_time)  # print the stored value instead of measuring again

# NumPy array operation
arr = np.arange(1000000)
start = time.time()
arr = arr * 2
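arr_time = time.time() - start
print("NumPy time:", arr_time)  # print the stored value instead of measuring again
eff = list_time // arr_time  # floor division: keeps only the whole-number part of the ratio
print("numpy array is faster than python list by : ", eff)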

List time: 0.04579973220825195
NumPy time: 0.0023746490478515625
numpy array is faster than python list by :  19.0
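
A single `time.time()` measurement is noisy, so the ratio changes from run to run. A steadier comparison, sketched here with the standard-library `timeit` module, repeats each operation and keeps the best run. The array size and repeat counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np

n = 100_000
lst = list(range(n))
arr = np.arange(n)

runs = 10  # how many times each statement is executed per measurement

# Repeat each measurement 3 times and keep the fastest, then average per run
list_time = min(timeit.repeat(lambda: [x * 2 for x in lst], number=runs, repeat=3)) / runs
numpy_time = min(timeit.repeat(lambda: arr * 2, number=runs, repeat=3)) / runs

print(f"List:  {list_time:.6f} s per run")
print(f"NumPy: {numpy_time:.6f} s per run")
print(f"Speed-up: {list_time / numpy_time:.1f}x")
```

The exact speed-up depends on the machine, the array size, and the NumPy version, so treat the number as indicative only.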


In [20]:
import numpy as np
import time
mark_list = [100, 90, 80, 70, 60, 50, 40, 50, 100, 50]
start=time.time()
# Mean using regular Python list
total = 0
for i in range(len(mark_list)):
    total += mark_list[i]

print("Mean (List):", total // len(mark_list))  # use '/' for float result
list_time=time.time() - start
print("List time:", time.time() - start)

# Mean using NumPy
mark_np = np.array([100, 90, 80, 70, 60, 50, 40, 50, 100, 50])
start=time.time()

print("Mean (Numpy):", mark_np.mean())
arr_time=time.time() - start
print("NumPy time:", time.time() - start)
eff=list_time//arr_time

print("numpy array is faster than python list by : ", eff)

Mean (List): 69
List time: 0.0002703666687011719
Mean (Numpy): 69.0
NumPy time: 0.00019550323486328125
numpy array is faster than python list by :  1.0


In [22]:
import numpy as np
import time
mark_list = list(range(1000000))
start=time.time()
# Mean using regular Python list
total = 0
for i in range(len(mark_list)):
    total += mark_list[i]

print("Mean (List):", total // len(mark_list))  # use '/' for float result
list_time=time.time() - start
print("List time:", time.time() - start)

# Mean using NumPy
mark_np =  np.arange(1000000)
start=time.time()

print("Mean (Numpy):", mark_np.mean())
arr_time=time.time() - start
print("NumPy time:", time.time() - start)
eff=list_time//arr_time

print("numpy array is faster than python list by : ", eff)

Mean (List): 499999
List time: 0.10429954528808594
Mean (Numpy): 499999.5
NumPy time: 0.0009922981262207031
numpy array is faster than python list by :  107.0


# 🔰 2. Types of Arrays (1D, 2D, 3D, Matrix)

A NumPy array is a grid of values, all of the same data type, and is indexed by a tuple of nonnegative integers.

It’s like a powerful version of a Python list, but optimized for math and performance.
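
"All of the same data type" has a practical consequence: when the inputs are mixed, NumPy converts every element to one common type. A small sketch (variable names are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # int and float -> every element becomes a float
mixed_text = np.array([1, "two", 3])    # number and string -> every element becomes a string

print(ints.dtype)             # an integer dtype (the exact width depends on the platform)
print(mixed_numbers.dtype)    # float64
print(mixed_text.dtype.kind)  # 'U' means Unicode string
print(mixed_numbers)
```

A Python list would keep each element's original type, which is flexible but prevents the tightly packed storage that makes arrays fast.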

| Feature         | Python List  | NumPy Array           |
| --------------- | ------------ | --------------------- |
| Speed           | Slow         | Fast (C-backed)       |
| Memory          | More         | Less (tightly packed) |
| Data Types      | Mixed        | Same type only        |
| Broadcasting    | ❌ No         | ✅ Yes                 |
| Math Operations | Manual loops | Element-wise magic ✨  |



In [23]:
chess = np.zeros((8,8), dtype=int)
chess[1::2, ::2] = 1
chess[::2, 1::2] = 1
print(chess)


[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]
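
The same pattern can also be built without slicing, by noting that a cell is 1 exactly when its row index plus column index is odd. This alternative sketch uses `np.indices` and compares the result with the slicing version:

```python
import numpy as np

# Slicing version, as in the cell above
chess = np.zeros((8, 8), dtype=int)
chess[1::2, ::2] = 1
chess[::2, 1::2] = 1

# Index-arithmetic version: rows[i, j] == i and cols[i, j] == j
rows, cols = np.indices((8, 8))
chess_alt = (rows + cols) % 2

print(np.array_equal(chess, chess_alt))  # True
```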


In [24]:
#1-d Array
arr1 = np.array([10, 20, 30])
print(arr1)
print(arr1.shape)


[10 20 30]
(3,)


In [25]:
#2-D Array 
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(arr2)
print(arr2.shape)


[[1 2 3]
 [4 5 6]]
(2, 3)


In [26]:
#3-D Array 
arr3 = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print(arr3)
print(arr3.shape)


[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
(2, 2, 2)


🧮 Special Types of Arrays in NumPy


In [39]:
#✅ 1. np.zeros() → All Zeros
zeros = np.zeros((2, 3))
print(zeros)
print()
#✅ 2. np.ones() → All Ones
ones=np.ones((2,3))
print(ones)
print()

#✅ 3. np.eye() → Identity Matrix (2D)
identity = np.eye(4)
print(identity)
print()

#✅ 4. np.identity() → Same as eye() (only for square matrices)
I = np.identity(3)
print(I)
print()

#✅ 5. np.full(shape, fill_value)
filled = np.full((2, 3), 99)
print(filled)
print()
#✅ 6. np.random.rand() → Random Float Array
randoms = np.random.rand(2, 3)
print(randoms)
print()
#✅ 7. np.random.randint() → Random Integers
rand_ints = np.random.randint(1, 100, (3, 3))
print(rand_ints)



[[0. 0. 0.]
 [0. 0. 0.]]

[[1. 1. 1.]
 [1. 1. 1.]]

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

[[99 99 99]
 [99 99 99]]

[[0.19410612 0.25095026 0.94184601]
 [0.23216813 0.16308814 0.16923426]]

[[85 69 76]
 [36 90 71]
 [68 35  2]]
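
A few other constructors are often used alongside these. They are not part of the list above, so take this as an optional sketch: `np.arange` and `np.linspace` build ranges, `np.zeros_like` copies the shape of an existing array, and `np.random.default_rng` gives a seeded generator so random results can be reproduced.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # start, stop (exclusive), step
points = np.linspace(0, 1, 5)   # 5 evenly spaced values, both ends included

template = np.array([[1, 2, 3], [4, 5, 6]])
same_shape = np.zeros_like(template)  # zeros with the shape and dtype of template

rng = np.random.default_rng(seed=42)           # seeded generator: same numbers on every run
rand_floats = rng.random((2, 3))               # floats in [0, 1)
rand_ints = rng.integers(1, 100, size=(3, 3))  # integers from 1 to 99 (upper bound exclusive)

print(evens)       # [0 2 4 6 8]
print(points)      # [0.   0.25 0.5  0.75 1.  ]
print(same_shape)
print(rand_floats)
print(rand_ints)
```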


# NumPy Array Properties
| Property     | Description                         | Code           | Output             |
| ------------ | ----------------------------------- | -------------- | ------------------ |
| Shape        | Tuple of dimensions                 | `arr.shape`    | `(2, 3)`           |
| Dimensions   | Number of axes (1D, 2D, etc.)       | `arr.ndim`     | `2`                |
| Size         | Total number of elements            | `arr.size`     | `6`                |
| Data type    | Type of elements (int, float, etc.) | `arr.dtype`    | `int64` or `int32` |
| Item size    | Bytes per element                   | `arr.itemsize` | `8`                |
| Total memory | Total memory in bytes               | `arr.nbytes`   | `48` (6×8 bytes)   |


In [40]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape)
print("Dimensions:", arr.ndim)
print("Size:", arr.size)
print("Data type:", arr.dtype)
print("Item size (bytes):", arr.itemsize)
print("Total memory (bytes):", arr.nbytes)


Shape: (2, 3)
Dimensions: 2
Size: 6
Data type: int64
Item size (bytes): 8
Total memory (bytes): 48
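
The default integer type can differ between platforms, which is why the table lists `int64` or `int32`. Passing `dtype` explicitly removes that ambiguity and changes the item size and total memory accordingly. A short sketch:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

as_int32 = np.array(values, dtype=np.int32)
as_int64 = np.array(values, dtype=np.int64)
as_float = as_int32.astype(np.float64)  # astype returns a converted copy

for name, arr in [("int32", as_int32), ("int64", as_int64), ("float64", as_float)]:
    print(name, "| itemsize:", arr.itemsize, "| nbytes:", arr.nbytes)
```

Six elements take 24 bytes as `int32` and 48 bytes as `int64` or `float64`.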


# Array Operations

In [46]:
import numpy as np

# 1. Create arrays
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])

print (a)
print(b)
print("\n--- Arithmetic Operations ---")
print("a + b =", a + b)        # [11 22 33]
print("a - b =", a - b)        # [9 18 27]
print("a * b =", a * b)        # [10 40 90]
print("a / b =", a / b)        # [10. 10. 10.]

print("\n--- Scalar Operations ---")
print("a * 2 =", a * 2)        # [20 40 60]
print("a + 5 =", a + 5)        # [15 25 35]

print("\n--- Comparison Operations ---")
print("a > 15:", a > 15)       # [False  True  True]
print("a == 20:", a == 20)     # [False  True False]

print("\n--- Logical Operations ---")
bool1 = np.array([True, False, True])
bool2 = np.array([False, False, True])
print("logical_and:", np.logical_and(bool1, bool2))  # [False False  True]
print("logical_or:", np.logical_or(bool1, bool2))    # [ True False  True]

print("\n--- Aggregate Operations ---")
c = np.array([[1, 2, 3], [4, 5, 6]])
print("Array c:\n", c)
print("Sum:", c.sum())                 # 21
print("Mean:", c.mean())               # 3.5
print("Max:", c.max())                 # 6
print("Min:", c.min())                 # 1
print("Standard Deviation:", c.std()) # ~1.71
print("Row-wise sum:", c.sum(axis=1)) # [6 15]
print("Column-wise mean:", c.mean(axis=0)) # [2.5 3.5 4.5]


[10 20 30]
[1 2 3]

--- Arithmetic Operations ---
a + b = [11 22 33]
a - b = [ 9 18 27]
a * b = [10 40 90]
a / b = [10. 10. 10.]

--- Scalar Operations ---
a * 2 = [20 40 60]
a + 5 = [15 25 35]

--- Comparison Operations ---
a > 15: [False  True  True]
a == 20: [False  True False]

--- Logical Operations ---
logical_and: [False False  True]
logical_or: [ True False  True]

--- Aggregate Operations ---
Array c:
 [[1 2 3]
 [4 5 6]]
Sum: 21
Mean: 3.5
Max: 6
Min: 1
Standard Deviation: 1.707825127659933
Row-wise sum: [ 6 15]
Column-wise mean: [2.5 3.5 4.5]
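
Comparisons can be combined with the element-wise operators `&` (and), `|` (or), and `~` (not), which behave like `np.logical_and`, `np.logical_or`, and `np.logical_not` on boolean arrays. The parentheses are required because these operators bind more tightly than the comparisons. A short sketch:

```python
import numpy as np

a = np.array([10, 20, 30])

in_range = (a > 10) & (a < 30)   # [False  True False]
outside = (a < 15) | (a > 25)    # [ True False  True]
not_twenty = ~(a == 20)          # [ True False  True]

print(in_range, outside, not_twenty)
print("Values in range:", a[in_range])              # [20]
print("Count above 15:", np.count_nonzero(a > 15))  # 2
```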


# What is Indexing and Slicing?
Just like Python lists, NumPy arrays let you:

Index: Get a specific element

Slice: Get a group of elements (a sub-array)



# Indexing


In [48]:
import numpy as np

print("🔹 1D Array Indexing")
arr1d = np.array([10, 20, 30, 40, 50])
print("Array:", arr1d)
print("arr1d[0] =", arr1d[0])        # First element
print("arr1d[-1] =", arr1d[-1])      # Last element
print("arr1d[1:4] =", arr1d[1:4])    # Slice from index 1 to 3
print("arr1d[::-1] =", arr1d[::-1])  # Reversed array

print("\n🔹 2D Array Indexing")
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print("Array:\n", arr2d)
print("arr2d[0, 0] =", arr2d[0, 0])  # First row, first column
print("arr2d[2, 1] =", arr2d[2, 1])  # Third row, second column
print("arr2d[-1, -1] =", arr2d[-1, -1])  # Last element
print("arr2d[1, :] =", arr2d[1, :])  # Entire second row
print("arr2d[:, 2] =", arr2d[:, 2])  # All rows, third column

print("\n🔹 Negative Indexing")
print(arr1d[-1])  # 50 (last element)
print(arr2d[-1, -2])  # 8 (second-to-last column of last row)


🔹 1D Array Indexing
Array: [10 20 30 40 50]
arr1d[0] = 10
arr1d[-1] = 50
arr1d[1:4] = [20 30 40]
arr1d[::-1] = [50 40 30 20 10]

🔹 2D Array Indexing
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
arr2d[0, 0] = 1
arr2d[2, 1] = 8
arr2d[-1, -1] = 9
arr2d[1, :] = [4 5 6]
arr2d[:, 2] = [3 6 9]

🔹 Negative Indexing
50
8
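
Besides single positions, NumPy arrays also accept a list of indices or a boolean mask inside the brackets. Python lists do not support this. A brief sketch using the same arrays as above:

```python
import numpy as np

arr1d = np.array([10, 20, 30, 40, 50])
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

picked = arr1d[[0, 2, 4]]                # integer-array indexing: positions 0, 2 and 4
large = arr1d[arr1d > 25]                # boolean mask: keep elements where the condition holds
diagonal = arr2d[[0, 1, 2], [0, 1, 2]]   # pairs (0,0), (1,1), (2,2)

print(picked)    # [10 30 50]
print(large)     # [30 40 50]
print(diagonal)  # [1 5 9]
```

Unlike basic slices, both of these forms return a new array rather than a view of the original.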


# Slicing
`array[start : stop : step]`

| Part    | Meaning                                    |
| ------- | ------------------------------------------ |
| `start` | Where to begin (inclusive)                 |
| `stop`  | Where to stop (exclusive)                  |
| `step`  | Interval between elements (default is `1`) |


In [49]:
import numpy as np

# --------------------------
# 🔹 1D Slicing Examples
# --------------------------
arr1d = np.array([10, 20, 30, 40, 50, 60])
print("🔹 1D Array:", arr1d)

print("arr1d[1:4]   =", arr1d[1:4])     # Elements at index 1 to 3
print("arr1d[:3]    =", arr1d[:3])      # First 3 elements
print("arr1d[3:]    =", arr1d[3:])      # From index 3 to end
print("arr1d[::2]   =", arr1d[::2])     # Every 2nd element
print("arr1d[::-1]  =", arr1d[::-1])    # Reverse the array
print("arr1d[-3:]   =", arr1d[-3:])     # Last 3 elements

# --------------------------
# 🔹 2D Slicing Examples
# --------------------------
arr2d = np.array([[ 1,  2,  3,  4],
                  [ 5,  6,  7,  8],
                  [ 9, 10, 11, 12],
                  [13, 14, 15, 16]])
print("\n🔹 2D Array:\n", arr2d)

# Full rows or columns
print("arr2d[1, :]      =", arr2d[1, :])     # Entire 2nd row
print("arr2d[:, 2]      =", arr2d[:, 2])     # Entire 3rd column

# Submatrices
print("arr2d[1:3, 1:3]  =\n", arr2d[1:3, 1:3])  # Middle 2×2 sub-matrix
print("arr2d[:2, :2]    =\n", arr2d[:2, :2])    # Top-left 2×2
print("arr2d[2:, 2:]    =\n", arr2d[2:, 2:])    # Bottom-right 2×2

# Stepped slicing (every second column or row)
print("arr2d[:, ::2]    =\n", arr2d[:, ::2])    # Every second column
print("arr2d[::2, :]    =\n", arr2d[::2, :])    # Every second row

# Reverse slicing
print("arr2d[::-1, :]   =\n", arr2d[::-1, :])   # Rows reversed
print("arr2d[:, ::-1]   =\n", arr2d[:, ::-1])   # Columns reversed


🔹 1D Array: [10 20 30 40 50 60]
arr1d[1:4]   = [20 30 40]
arr1d[:3]    = [10 20 30]
arr1d[3:]    = [40 50 60]
arr1d[::2]   = [10 30 50]
arr1d[::-1]  = [60 50 40 30 20 10]
arr1d[-3:]   = [40 50 60]

🔹 2D Array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
arr2d[1, :]      = [5 6 7 8]
arr2d[:, 2]      = [ 3  7 11 15]
arr2d[1:3, 1:3]  =
 [[ 6  7]
 [10 11]]
arr2d[:2, :2]    =
 [[1 2]
 [5 6]]
arr2d[2:, 2:]    =
 [[11 12]
 [15 16]]
arr2d[:, ::2]    =
 [[ 1  3]
 [ 5  7]
 [ 9 11]
 [13 15]]
arr2d[::2, :]    =
 [[ 1  2  3  4]
 [ 9 10 11 12]]
arr2d[::-1, :]   =
 [[13 14 15 16]
 [ 9 10 11 12]
 [ 5  6  7  8]
 [ 1  2  3  4]]
arr2d[:, ::-1]   =
 [[ 4  3  2  1]
 [ 8  7  6  5]
 [12 11 10  9]
 [16 15 14 13]]


| Code              | Description            |
| ----------------- | ---------------------- |
| `arr2d[1, :]`     | Row 1, all columns     |
| `arr2d[:, 2]`     | All rows, column 2     |
| `arr2d[1:3, 1:3]` | Middle 2×2 submatrix   |
| `arr2d[:2, :2]`   | Top-left 2×2 block     |
| `arr2d[2:, 2:]`   | Bottom-right 2×2 block |
| `arr2d[:, ::2]`   | Every 2nd column       |
| `arr2d[::2, :]`   | Every 2nd row          |
| `arr2d[::-1, :]`  | Rows reversed          |
| `arr2d[:, ::-1]`  | Columns reversed       |
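
One difference from Python lists is worth knowing here: a NumPy slice is a view of the original data, not a copy. Writing to the slice changes the original array, so use `.copy()` when an independent block is needed. A minimal sketch:

```python
import numpy as np

arr2d = np.arange(1, 17).reshape(4, 4)  # the same 4x4 array holding 1..16

block = arr2d[1:3, 1:3]                 # a view: shares memory with arr2d
block_copy = arr2d[1:3, 1:3].copy()     # an independent copy

block[0, 0] = -1                        # also changes arr2d[1, 1]

print(arr2d[1, 1])                          # -1
print(block_copy[0, 0])                     # 6 (the copy still holds the old value)
print(np.shares_memory(arr2d, block))       # True
print(np.shares_memory(arr2d, block_copy))  # False
```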


# Reshaping and Flattening

🔹 What is Reshaping?

Reshaping means changing the shape of an array without changing its data.

📌 Rule: The total number of elements must stay the same.



In [54]:
a = np.arange(12)  # array from 0 to 11
print(a)
print()
print(a.reshape(3, 4)) # 3 rows, 4 cols
print()
print(a.reshape(4, 3))    # 4 rows, 3 cols
print()
print(a.reshape(2, 2, 3)) # 3D: 2 blocks of 2×3
print()
# 📌 -1 tells NumPy to infer that dimension from the total number of elements:
print()
print(a.reshape(2, -1))  # Inferred shape: (2, 6)


[ 0  1  2  3  4  5  6  7  8  9 10 11]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

[[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]


[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
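
The rule about the total number of elements is enforced: asking for a shape that does not multiply out to the array's size raises a `ValueError`. This sketch shows the failure case and the `-1` shortcut side by side:

```python
import numpy as np

a = np.arange(12)

try:
    a.reshape(5, 3)  # 5 * 3 = 15, but the array has 12 elements
except ValueError as err:
    print("Cannot reshape:", err)

inferred = a.reshape(-1, 4)  # NumPy works out the first dimension: 12 / 4 = 3
print(inferred.shape)        # (3, 4)
```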


🔸 What is Flattening?

Flattening means converting any multi-dimensional array into a 1D array (a straight line).

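| Method        | Description                                                                |
| ------------- | -------------------------------------------------------------------------- |
| `flatten()`   | Always returns a **copy** of the flattened array                           |
| `ravel()`     | Returns a **view** when possible (faster but linked), otherwise a copy     |
| `reshape(-1)` | Also flattens; like `ravel()`, returns a view when possible                |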



In [60]:
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(arr2d.reshape(6,1))

flat = arr2d.flatten()
print(flat)  # [1 2 3 4 5 6]


[[1]
 [2]
 [3]
 [4]
 [5]
 [6]]
[1 2 3 4 5 6]
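
The difference between a copy and a view only shows up when the flattened result is modified. In this sketch, a change made through `flatten()` leaves the original untouched, while a change made through `ravel()` reaches the original array:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

flat_copy = arr2d.flatten()  # independent copy
flat_view = arr2d.ravel()    # view, because arr2d is stored contiguously

flat_copy[0] = 100           # does not affect arr2d
flat_view[1] = 200           # changes arr2d[0, 1]

print(arr2d)
print(np.shares_memory(arr2d, flat_copy))        # False
print(np.shares_memory(arr2d, flat_view))        # True
print(np.shares_memory(arr2d, arr2d.T.ravel()))  # False: the transposed layout forces a copy
```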


# Array Modification

In [62]:
import numpy as np

print("🔹 ORIGINAL ARRAY")
arr = np.array([10, 20, 30, 40, 50])
print("Original:", arr)

# -----------------------------
# ✅ Modify values directly
# -----------------------------
print("\n🔹 MODIFYING ELEMENTS")
arr[2] = 99
arr[1:3] = [77, 88]
print("After modifying:", arr)

# -----------------------------
# ✅ Append and Insert
# -----------------------------
print("\n🔹 APPENDING & INSERTING")
appended = np.append(arr, [60, 70])
inserted = np.insert(arr, 1, 15)  # insert 15 at index 1
print("Appended:", appended)
print("Inserted:", inserted)

# -----------------------------
# ✅ Delete elements
# -----------------------------
print("\n🔹 DELETING ELEMENTS")
deleted = np.delete(arr, 2)  # remove element at index 2
print("Deleted index 2:", deleted)


# -----------------------------
# ✅ Conditional Replacement
# -----------------------------
print("\n🔹 CONDITIONAL MODIFICATION")
arr3 = np.array([5, 10, 15, 20, 25])
arr3[arr3 > 15] = 999
print("Replace >15 with 999:", arr3)


🔹 ORIGINAL ARRAY
Original: [10 20 30 40 50]

🔹 MODIFYING ELEMENTS
After modifying: [10 77 88 40 50]

🔹 APPENDING & INSERTING
Appended: [10 77 88 40 50 60 70]
Inserted: [10 15 77 88 40 50]

🔹 DELETING ELEMENTS
Deleted index 2: [10 77 40 50]

🔹 CONDITIONAL MODIFICATION
Replace >15 with 999: [  5  10  15 999 999]
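
Note that `np.append`, `np.insert`, and `np.delete` return new arrays and leave the original unchanged, whereas assignment through an index or mask edits the array in place. When the original should be kept, `np.where` and `np.clip` give non-destructive versions of conditional replacement. A short sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
appended = np.append(arr, 60)
print(arr.size, appended.size)  # 5 6 (arr itself was not changed)

arr3 = np.array([5, 10, 15, 20, 25])
capped = np.where(arr3 > 15, 999, arr3)  # new array; arr3 is left untouched
clipped = np.clip(arr3, 10, 20)          # limit every value to the range 10..20

print(arr3)     # [ 5 10 15 20 25]
print(capped)   # [  5  10  15 999 999]
print(clipped)  # [10 10 15 20 20]
```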


# 🔹 Broadcasting & Vectorization in Python (NumPy)

✅ What is Vectorization?

Vectorization means performing operations on entire arrays (or vectors) without writing explicit Python loops. The loop still happens, but inside NumPy's compiled code.

✅ It's faster, more readable, and more efficient.

✅ What is Broadcasting?

Broadcasting is how NumPy handles arithmetic between arrays of different shapes. It compares the shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.

It "broadcasts" the smaller array so its shape matches the larger one.

🧠 Think of it like this:
You have a matrix (2D) and want to add a vector (1D) to each row. NumPy broadcasts the vector across the rows.

You don’t need a loop: NumPy applies the vector to every row internally, without making repeated copies of it.


| Concept           | Description                           | Use Case                    |
| ----------------- | ------------------------------------- | --------------------------- |
| **Vectorization** | Replacing loops with array operations | Fast math on whole datasets |
| **Broadcasting**  | Combining arrays of different shapes  | Add a 1D array to 2D matrix |
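
To see how shapes are matched, it helps to try one compatible pair and one incompatible pair. The sketch below assumes NumPy 1.20 or newer for `np.broadcast_shapes`; the arrays are made up for illustration.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)

grid = col + row                   # (3, 1) and (4,) broadcast to (3, 4)
print(np.broadcast_shapes(col.shape, row.shape))  # (3, 4)
print(grid)

try:
    np.ones((2, 3)) + np.ones(2)   # last dimensions are 3 and 2: not equal, neither is 1
except ValueError as err:
    print("Shapes do not broadcast:", err)
```

Each entry of `grid` is the sum of one value from `col` and one from `row`, so the first row is `[1 2 3 4]` and the last is `[21 22 23 24]`.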


In [2]:
import numpy as np

# 2D array: each row is a sample, each column a feature
data = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

# 1D array: mean values to subtract from each column (feature-wise normalization)
column_means = np.mean(data, axis=0)  # shape: (3,)
print("Column Means:", column_means)

# Broadcasting: subtract the mean of each column from all rows
normalized = data - column_means  # Vectorized + Broadcasting

print("\nOriginal Data:\n", data)
print("\nNormalized Data (mean-centered):\n", normalized)


Column Means: [40. 50. 60.]

Original Data:
 [[10 20 30]
 [40 50 60]
 [70 80 90]]

Normalized Data (mean-centered):
 [[-30. -30. -30.]
 [  0.   0.   0.]
 [ 30.  30.  30.]]
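
Mean-centering is often followed by scaling each column to unit standard deviation (z-score standardization). The same broadcasting pattern covers it. This is a sketch that assumes no column has zero standard deviation, since that would cause a division by zero.

```python
import numpy as np

data = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
], dtype=float)

column_means = data.mean(axis=0)  # shape (3,)
column_stds = data.std(axis=0)    # shape (3,), assumed non-zero

# Both (3,) arrays are broadcast across the three rows
standardized = (data - column_means) / column_stds

print(standardized)
print("Column means after:", standardized.mean(axis=0))
print("Column stds after:", standardized.std(axis=0))
```

After this step every column has a mean of about 0 and a standard deviation of about 1.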


✅ Practice Questions

Q1. Create a 2D NumPy array and subtract the row-wise mean (Hint: axis=1).

Q2. Create two arrays, `A = np.array([[1, 2, 3], [4, 5, 6]])` and `B = np.array([10, 20, 30])`, then add B to A using broadcasting.



# Data Cleaning in Python using NumPy


✅ Why Data Cleaning?
Real-world data is:

Messy

Incomplete (missing values)

Inconsistent (wrong types, impossible values)

Noisy (outliers)

Cleaning it helps ensure:

Better model performance

Accurate insights

Fewer runtime errors

| Task                    | NumPy Function                                            |
| ----------------------- | --------------------------------------------------------- |
| Handling missing values | `np.isnan()`, `np.nan`, `np.nanmean()`, `np.nan_to_num()` |
| Replacing values        | `np.where()`, slicing                                     |
| Type conversion         | `.astype()`                                               |
| Removing outliers       | Based on z-scores or thresholds                           |
| Standardization         | Mean centering, scaling                                   |
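
The outlier row of the table can be illustrated with a z-score filter. This is a minimal sketch on made-up values; the cutoff of 2.0 is an assumption chosen for this tiny sample, and real datasets often use a different threshold or a different method altogether.

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 95])  # 95 is the suspicious reading

# z-score: how many standard deviations each value lies from the mean
z_scores = (values - values.mean()) / values.std()

threshold = 2.0  # assumed cutoff for this example
filtered = values[np.abs(z_scores) <= threshold]

print("z-scores:", np.round(z_scores, 2))
print("Kept values:", filtered)  # [10 12 11 13 12]
```

Keep in mind that a large outlier inflates the mean and standard deviation used to detect it, so this simple filter works best when outliers are rare.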


In [3]:
import numpy as np

# -------------------------------
# 🔹 Step 1: Create Raw Data with NaNs
# -------------------------------

# Simulated dataset with missing values (np.nan)
data = np.array([
    [72.5, 80.0, np.nan],
    [68.0, np.nan, 70.0],
    [75.0, 85.5, 90.0],
    [np.nan, 60.0, 65.0]
])

print("🔹 Original Data:\n", data)

# -------------------------------
# 🔍 Step 2: Detect Missing Values
# -------------------------------

# Identify where NaNs are present
nan_mask = np.isnan(data)
print("\n🔍 Missing Values (True = NaN):\n", nan_mask)

# -------------------------------
# 📊 Step 3: Impute Missing Values with Column Mean
# -------------------------------

# Compute mean of each column ignoring NaNs
column_means = np.nanmean(data, axis=0)
print("\n📊 Column-wise Means (ignoring NaNs):", column_means)

# Replace NaNs with respective column mean
# np.where(condition, value_if_true, value_if_false)
# Apply column-wise mean wherever np.isnan() is True
cleaned_data = np.where(np.isnan(data), column_means, data)
print("\n✅ Data After Filling Missing Values:\n", cleaned_data)

# -------------------------------
# 🔁 Step 4: Type Conversion (optional)
# -------------------------------

# Convert data type to float32 for memory efficiency
cleaned_data = cleaned_data.astype('float32')
print("\n🔁 Data Type After Conversion:", cleaned_data.dtype)

# -------------------------------
# 🏁 Final Cleaned Output
# -------------------------------

print("\n🏁 Final Cleaned Dataset:\n", cleaned_data)


🔹 Original Data:
 [[72.5 80.   nan]
 [68.   nan 70. ]
 [75.  85.5 90. ]
 [ nan 60.  65. ]]

🔍 Missing Values (True = NaN):
 [[False False  True]
 [False  True False]
 [False False False]
 [ True False False]]

📊 Column-wise Means (ignoring NaNs): [71.83333333 75.16666667 75.        ]

✅ Data After Filling Missing Values:
 [[72.5        80.         75.        ]
 [68.         75.16666667 70.        ]
 [75.         85.5        90.        ]
 [71.83333333 60.         65.        ]]

🔁 Data Type After Conversion: float32

🏁 Final Cleaned Dataset:
 [[72.5      80.       75.      ]
 [68.       75.166664 70.      ]
 [75.       85.5      90.      ]
 [71.833336 60.       65.      ]]


| Step  | Description                                                              |
| ----- | ------------------------------------------------------------------------ |
| **1** | Creates a 2D NumPy array with some missing values (`np.nan`)             |
| **2** | Detects where missing values are using `np.isnan()`                      |
| **3** | Calculates **column-wise mean** with `np.nanmean()` and fills the NaNs   |
| **4** | Converts the data to `float32` to reduce memory use (with some loss of precision) |
| **5** | Displays the final cleaned data                                          |
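
Filling with the column mean is only one way to handle missing values. Two simple alternatives, sketched below on the same data, are dropping every row that contains a NaN and replacing NaNs with a fixed value through `np.nan_to_num`. Which one is suitable depends on the dataset; the fill value of 0.0 here is just an example.

```python
import numpy as np

data = np.array([
    [72.5, 80.0, np.nan],
    [68.0, np.nan, 70.0],
    [75.0, 85.5, 90.0],
    [np.nan, 60.0, 65.0]
])

# Option A: keep only the rows without any NaN
rows_with_nan = np.isnan(data).any(axis=1)  # one True/False per row
complete_rows = data[~rows_with_nan]
print("Complete rows:\n", complete_rows)

# Option B: replace every NaN with a constant
zero_filled = np.nan_to_num(data, nan=0.0)
print("NaNs replaced by 0.0:\n", zero_filled)

# NaN never compares equal to itself, which is why np.isnan() is needed
print(np.nan == np.nan)  # False
```

Dropping rows is safe but can discard a lot of data: here only one of the four rows survives.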
