1. Python List vs NumPy Array: memory usage and performance difference
Python List:

Stores references to objects, not raw data
Each element can be of a different type
Extra memory is used for the pointers and for each element's own object header
Operations are slow because element-wise work runs in interpreted Python loops

NumPy Array:

Stores data in a contiguous block of memory
All elements share the same data type (dtype)
No per-element pointer overhead for numeric dtypes
Operations are fast because they run in compiled C loops internally
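
A rough way to see the memory difference is to compare the bytes used by a list plus its elements against the data buffer of an equivalent array. This is a minimal sketch, assuming CPython and a 64-bit integer dtype; the names `n`, `py_list`, `list_bytes`, and `array_bytes` are illustrative. `sys.getsizeof` gives only an approximate total, because CPython shares small integer objects between containers.

```python
import sys
import numpy as np

n = 1000  # illustrative size, not taken from the original notebook
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# List: the container holds one pointer per element,
# and every element is a separate Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array: one contiguous buffer of 8-byte integers.
array_bytes = arr.nbytes

print("List (approx.) bytes:", list_bytes)
print("Array data bytes:", array_bytes)  # 8000
```

The exact list figure varies with the Python version, but it should come out several times larger than the array's buffer.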



In [1]:
import numpy as np

# Python list: element-wise multiplication needs an explicit loop
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]

list_result = []
for a, b in zip(list_1, list_2):
    list_result.append(a * b)

print("List mult:", list_result)

# NumPy array: the * operator multiplies element-wise, with no Python loop
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])

array_result = arr1 * arr2
print("Array mult:", array_result)

List mult: [5, 12, 21, 32]
Array mult: [ 5 12 21 32]
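
The speed difference only becomes visible on larger inputs. The sketch below times both approaches with the standard `timeit` module. The size `n`, the repeat count, and the function names `multiply_lists` and `multiply_arrays` are assumptions chosen for illustration, and the measured times depend on the machine.

```python
import timeit
import numpy as np

n = 100_000  # illustrative size
list_1 = list(range(n))
list_2 = list(range(n))
# int64 is requested explicitly so the products cannot overflow
arr1 = np.arange(n, dtype=np.int64)
arr2 = np.arange(n, dtype=np.int64)

def multiply_lists():
    # Element-wise product computed in a Python-level loop
    return [a * b for a, b in zip(list_1, list_2)]

def multiply_arrays():
    # Element-wise product computed inside NumPy's compiled code
    return arr1 * arr2

list_time = timeit.timeit(multiply_lists, number=20)
array_time = timeit.timeit(multiply_arrays, number=20)

print(f"List:  {list_time:.4f} s for 20 runs")
print(f"Array: {array_time:.4f} s for 20 runs")
```

Both functions produce the same values; only the place where the loop runs differs.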


2. Broadcasting in NumPy
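Broadcasting lets NumPy perform element-wise operations on arrays of different shapes without making explicit copies of the smaller array. NumPy follows these rules:

Compare the shapes from right to left (trailing dimensions first)
Two dimensions are compatible if they are equal or if one of them is 1; a missing leading dimension is treated as 1
If any pair of dimensions is incompatible, NumPy raises a ValueError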

In [3]:
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

vector = np.array([10, 20, 30])

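# matrix has shape (3, 3) and vector has shape (3,);
# vector is treated as shape (1, 3) and added to every row
result = matrix + vector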
print(result)

[[11 22 33]
 [14 25 36]
 [17 28 39]]
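
The rules in this section also cover two cases the example does not show: a column vector of shape (3, 1), and a shape that cannot be broadcast at all. The following is a small sketch; the names `column`, `too_short`, and `broadcast_ok` are illustrative.

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)   # shape (3, 3)
column = np.array([[100], [200], [300]])  # shape (3, 1)
too_short = np.array([1, 2])              # shape (2,)

# (3, 3) + (3, 1): the trailing dimensions are 3 and 1, so the
# column is stretched across all three columns of the matrix.
col_result = matrix + column
print(col_result)

# (3, 3) + (2,): the trailing dimensions are 3 and 2, neither is 1,
# so the shapes are incompatible.
try:
    matrix + too_short
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print("Broadcast failed:", exc)
```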


3. Missing values in Pandas
Missing values mean data is absent or undefined.

Representation in Pandas:

NaN for numeric data
None for object data

Internally, pandas converts most missing values to NaN; a numeric column that contains one is stored as float64.
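
The conversion can be observed directly on small Series. This is a minimal sketch with illustrative names (`numeric`, `text`). How the text column stores its missing entry depends on the pandas version, so the sketch only inspects it through `isna()`.

```python
import numpy as np
import pandas as pd

numeric = pd.Series([1, None, 3])    # None in numeric data becomes NaN
text = pd.Series(["a", None, "c"])   # missing entry in text data

print(numeric.dtype)            # float64
print(numeric.isna().tolist())  # [False, True, False]
print(text.isna().tolist())     # [False, True, False]
```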

In [5]:
# DataFrame with missing values
import pandas as pd
import numpy as np

data = {
    "Math": [80, 90, np.nan, 70],
    "Physics": [85, np.nan, 88, 92],
    "Chemistry": [78, 82, 80, np.nan]
}

df = pd.DataFrame(data)
print(df)
 

   Math  Physics  Chemistry
0  80.0     85.0       78.0
1  90.0      NaN       82.0
2   NaN     88.0       80.0
3  70.0     92.0        NaN


In [6]:
# Detect missing values (True marks a missing entry)
print(df.isnull())

    Math  Physics  Chemistry
0  False    False      False
1  False     True      False
2   True    False      False
3  False    False       True


In [7]:
# Replace missing values with each column's mean (mean() skips NaN by default)
df_filled = df.fillna(df.mean())
print(df_filled)

   Math    Physics  Chemistry
0  80.0  85.000000       78.0
1  90.0  88.333333       82.0
2  80.0  88.000000       80.0
3  70.0  92.000000       80.0
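
Filling with the mean is one option. Two related steps are counting how many values are missing per column and dropping incomplete rows instead of filling them. The sketch below rebuilds the same marks table so it runs on its own; the names `missing_per_column` and `complete_rows` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Math": [80, 90, np.nan, 70],
    "Physics": [85, np.nan, 88, 92],
    "Chemistry": [78, 82, 80, np.nan],
})

# Summing the boolean mask counts the missing entries in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# dropna() keeps only the rows that have no missing value
complete_rows = df.dropna()
print(complete_rows)
```

With this data every column has one missing value, and only row 0 is complete, so dropping rows discards most of the table while filling keeps all four rows.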


4. Boolean indexing in Pandas
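Boolean indexing selects the rows for which a condition evaluates to True.

No manual loop is needed: the comparison produces a boolean mask (a Series of True/False values), and indexing the DataFrame with that mask keeps only the rows marked True.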

In [8]:
data = {
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [22, 25, 19, 30, 24],
    "Marks": [85, 90, 70, 88, 76],
    "Attendance": [90, 85, 95, 80, 92],
    "Result": ["Pass", "Pass", "Pass", "Pass", "Pass"]
}

df = pd.DataFrame(data)

In [9]:
# Filter rows with two conditions:
#   Age greater than 21
#   Marks greater than 80
# Each condition needs its own parentheses because & binds more tightly than >


filtered = df[(df["Age"] > 21) & (df["Marks"] > 80)]
print(filtered)

  Name  Age  Marks  Attendance Result
0    A   22     85          90   Pass
1    B   25     90          85   Pass
3    D   30     88          80   Pass
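
To make the mask itself visible, it can be stored in a variable before it is applied. The sketch below also shows the `|` (or) and `~` (not) operators, which combine masks the same way `&` does. It rebuilds a reduced version of the table so it runs on its own; the names `mask`, `either`, and `rest` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [22, 25, 19, 30, 24],
    "Marks": [85, 90, 70, 88, 76],
})

# The mask is a boolean Series with one entry per row
mask = (df["Age"] > 21) & (df["Marks"] > 80)
print(mask.tolist())  # [True, True, False, True, False]

# | keeps rows where at least one condition is True
either = df[(df["Age"] > 21) | (df["Marks"] > 80)]
print(either["Name"].tolist())  # ['A', 'B', 'D', 'E']

# ~ inverts the mask, selecting the rows the filter rejected
rest = df[~mask]
print(rest["Name"].tolist())  # ['C', 'E']
```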


5. Purpose of groupby() in Pandas
groupby() splits the data into groups, applies an aggregation to each group, and combines the results.

Common aggregations:

Mean
Sum
Count
Min
Max

In [10]:
# Example: average salary per department

data = {
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Salary": [60000, 45000, 65000, 70000, 48000, 72000]
}

df = pd.DataFrame(data)

In [11]:
# Group rows by department, then average the Salary column within each group

avg_salary = df.groupby("Department")["Salary"].mean()
print(avg_salary)

Department
Finance    71000.0
HR         46500.0
IT         62500.0
Name: Salary, dtype: float64
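
Several of the aggregations listed in this section can be computed in one pass by giving `agg()` a list of function names. This is a minimal sketch on the same salary data; the variable name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Salary": [60000, 45000, 65000, 70000, 48000, 72000],
})

# One row per department, one column per aggregation
summary = df.groupby("Department")["Salary"].agg(["mean", "sum", "count", "min", "max"])
print(summary)
```

The result is a DataFrame indexed by department, so a single value can be read with, for example, `summary.loc["IT", "sum"]`.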
