### 1. Explain the difference between a Python list and a NumPy array in terms of memory usage and performance. Then, write a program that creates both and performs element-wise multiplication.

## Answer: 
### Difference Between Python List and NumPy Array

## 1. Memory Usage

### Python List
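- Stores an array of references (pointers) to separately allocated Python objects.
- Elements can be of different data types.
- Uses more memory, because each element carries its own object header (type and reference count) in addition to the pointer held by the list.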

### NumPy Array
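- Stores the raw values in one contiguous block of memory.
- All elements share the same data type (dtype).
- Uses less memory: each element takes only its item size (for example, 8 bytes for int64).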
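
The memory difference can be measured roughly as follows. This is a minimal sketch: the variable names are illustrative, and the list total is only an approximation, since CPython caches small integers and `sys.getsizeof` reports per-object sizes that vary by Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list itself holds one pointer per element...
list_container_bytes = sys.getsizeof(py_list)
# ...and each element is a separate Python int object with its own header.
list_total_bytes = list_container_bytes + sum(sys.getsizeof(x) for x in py_list)

# The array stores raw values back to back: n * itemsize bytes of data.
array_data_bytes = np_array.nbytes

print("List (container + int objects):", list_total_bytes, "bytes")
print("NumPy array data:", array_data_bytes, "bytes")
```

The exact list figure depends on the interpreter, but the array data size is always `n * itemsize`.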

---

## 2. Performance

### Python List
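- Slower for numerical computations, because every element is a Python object handled by the interpreter.
- Requires an explicit loop or list comprehension for element-wise operations.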

### NumPy Array
- Faster due to vectorized operations.
- Implemented in C for optimized performance.
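
The speed difference can be checked with a rough timing comparison. This is a sketch, not a rigorous benchmark: the array size and repeat count are arbitrary choices, and the measured times depend on the machine.

```python
import timeit
import numpy as np

n = 100_000
list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n, dtype=np.int64)
arr2 = np.arange(n, dtype=np.int64)

# Element-wise multiplication with a Python-level loop (list comprehension)
list_time = timeit.timeit(lambda: [a * b for a, b in zip(list1, list2)], number=20)

# Element-wise multiplication with a single vectorized NumPy operation
array_time = timeit.timeit(lambda: arr1 * arr2, number=20)

print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy array:        {array_time:.4f} s")

# Both approaches produce the same values
same_values = np.array_equal(arr1 * arr2, [a * b for a, b in zip(list1, list2)])
print("Same result:", same_values)
```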


In [1]:
import numpy as np

# Python List
list1 = [1, 2, 3, 4, 5]
list2 = [10, 20, 30, 40, 50]

list_result = [a * b for a, b in zip(list1, list2)]

print("Python List Result:")
print(list_result)


# NumPy Array
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])

array_result = arr1 * arr2

print("\nNumPy Array Result:")
print(array_result)


Python List Result:
[10, 40, 90, 160, 250]

NumPy Array Result:
[ 10  40  90 160 250]


### 2. What is broadcasting in NumPy? Create a 3×3 NumPy array and add a 1D array to it using broadcasting. Explain how NumPy applies the operation internally.

# Answer:
### Broadcasting in NumPy

## What is Broadcasting?

Broadcasting is a technique in NumPy that allows arithmetic operations 
between arrays of different but compatible shapes.

NumPy behaves as if the smaller array were stretched to match the shape 
of the larger array, but it does not actually copy the data: the same 
values are reused along the stretched dimension.

This makes computations:
- Faster
- Memory efficient
- Cleaner (no loops required)


In [2]:
import numpy as np

# Create 3x3 array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Create 1D array
arr = np.array([10, 20, 30])

# Broadcasting addition
result = matrix + arr

print("Original 3x3 Matrix:")
print(matrix)

print("\n1D Array:")
print(arr)

print("\nResult after Broadcasting Addition:")
print(result)


Original 3x3 Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

1D Array:
[10 20 30]

Result after Broadcasting Addition:
[[11 22 33]
 [14 25 36]
 [17 28 39]]
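
How the operation is applied can be inspected with `np.broadcast_to`, which returns the stretched view without copying. This is a sketch of the idea rather than NumPy's internal code. Shapes are compared from the right: (3, 3) against (3,) treats the 1D array as shape (1, 3) and then repeats it along the first axis.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
arr = np.array([10, 20, 30])

# Read-only view of arr with the shape of matrix; no data is copied
stretched = np.broadcast_to(arr, matrix.shape)

print("Stretched shape:", stretched.shape)
# A stride of 0 on the first axis means every row re-reads the same 3 values
print("Stretched strides:", stretched.strides)
print("Shares memory with arr:", np.shares_memory(stretched, arr))

# Adding the stretched view gives the same result as matrix + arr
print(np.array_equal(matrix + arr, matrix + stretched))
```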


### 3. What are missing values in Pandas and how are they represented? Create a DataFrame with missing values and write code to:
### • Detect missing values
### • Replace them with the column mean
### Explain each step.

# Answer:
### Missing Values in Pandas

## What are Missing Values?

Missing values are data entries that are not available or undefined in a dataset.

They occur due to:
- Data not collected
- Errors during data entry
- Data corruption

## How are Missing Values Represented?

In Pandas, missing values are represented as:

- NaN (Not a Number) → for numeric columns
- None → for object columns
- NaT → for datetime columns

Most commonly used representation: NaN
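
The three representations can be seen side by side in a small example. This is a sketch with made-up column names; the dtype shown for the text column depends on the pandas version (object or string).

```python
import numpy as np
import pandas as pd

# One missing entry per column, each written in a different way
demo = pd.DataFrame({
    "score": [1.5, np.nan, 3.0],                                  # NaN in a float column
    "label": ["a", None, "c"],                                    # None in a text column
    "when": pd.to_datetime(["2024-01-01", None, "2024-01-03"]),   # becomes NaT
})

print(demo.dtypes)

# isna() treats NaN, None, and NaT uniformly as missing
print(demo.isna())
missing_per_column = demo.isna().sum()
print(missing_per_column)
```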



In [3]:
import pandas as pd
import numpy as np

# Create DataFrame with missing values
data = {
    "Name": ["Aman", "Riya", "Karan", "Neha", "Rahul"],
    "Marks": [85, np.nan, 78, np.nan, 90],
    "Age": [20, 21, np.nan, 22, 23]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)


# -----------------------------
# Detect Missing Values
# -----------------------------

print("\nDetect Missing Values (True = Missing):")
print(df.isnull())

print("\nCount of Missing Values in Each Column:")
print(df.isnull().sum())


# -----------------------------
# Replace Missing Values with Column Mean
# -----------------------------

df["Marks"] = df["Marks"].fillna(df["Marks"].mean())
df["Age"] = df["Age"].fillna(df["Age"].mean())

print("\nDataFrame After Replacing Missing Values with Column Mean:")
print(df)


Original DataFrame:
    Name  Marks   Age
0   Aman   85.0  20.0
1   Riya    NaN  21.0
2  Karan   78.0   NaN
3   Neha    NaN  22.0
4  Rahul   90.0  23.0

Detect Missing Values (True = Missing):
    Name  Marks    Age
0  False  False  False
1  False   True  False
2  False  False   True
3  False   True  False
4  False  False  False

Count of Missing Values in Each Column:
Name     0
Marks    2
Age      1
dtype: int64

DataFrame After Replacing Missing Values with Column Mean:
    Name      Marks   Age
0   Aman  85.000000  20.0
1   Riya  84.333333  21.0
2  Karan  78.000000  21.5
3   Neha  84.333333  22.0
4  Rahul  90.000000  23.0


## Explanation of Each Step

1. Creating DataFrame:
   - We created a DataFrame containing NaN values using np.nan.

2. Detecting Missing Values:
   - df.isnull() → Returns True where value is missing.
   - df.isnull().sum() → Counts missing values in each column.

3. Replacing Missing Values:
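   - df["Marks"].mean() calculates the average of the available values (NaN is skipped by default).
   - fillna() returns a new column in which NaN is replaced with that mean.
   - Assigning the result back to df["Marks"] updates the DataFrame; inplace=True is not used here.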

Why Use Mean?
- Keeps the column mean unchanged (although it reduces the column's variance).
- Common technique in data preprocessing.
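
The replacement step can also be applied to every numeric column at once. This is a sketch of one possible shortcut, using the same data as above; `filled` and `column_means` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Karan", "Neha", "Rahul"],
    "Marks": [85, np.nan, 78, np.nan, 90],
    "Age": [20, 21, np.nan, 22, 23],
})

# Mean of each numeric column, skipping NaN
column_means = df.mean(numeric_only=True)

# fillna() with a Series matches its index to the column names
filled = df.fillna(column_means)
print(filled)

# The mean is preserved, but the spread of the column shrinks
print("Mean before/after:", df["Marks"].mean(), filled["Marks"].mean())
print("Std before/after: ", df["Marks"].std(), filled["Marks"].std())
```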


### 4. Explain boolean indexing in NumPy or Pandas. Create a DataFrame with at least 5 columns and filter rows based on two conditions using logical operators. Explain how filtering works internally.

# Answer:
### Boolean Indexing in Pandas / NumPy

## What is Boolean Indexing?

Boolean indexing is a method of selecting data based on conditions.

It uses:
- True → Select the row
- False → Ignore the row

In Pandas and NumPy, we create a Boolean mask (True/False values)
and apply it to filter data.

Logical operators used:
- &  → AND
- |  → OR
- ~  → NOT

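Important:
Wrap each condition in parentheses, because & and | bind more tightly
than comparison operators such as > and ==. Python's and / or keywords
cannot be used to combine masks element-wise.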
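
The three operators and the effect of leaving out the parentheses can be tried on a small example. This is a sketch with a made-up four-row DataFrame.

```python
import pandas as pd

small = pd.DataFrame({"Age": [20, 22, 19, 23], "Marks": [85, 92, 75, 60]})

both = small[(small["Age"] > 20) & (small["Marks"] > 80)]    # AND
either = small[(small["Age"] > 20) | (small["Marks"] > 80)]  # OR
not_over_20 = small[~(small["Age"] > 20)]                    # NOT

print(both.index.tolist(), either.index.tolist(), not_over_20.index.tolist())

# Without parentheses, & is evaluated first, so the expression becomes
# small["Age"] > (20 & small["Marks"]) > 80, a chained comparison that fails
error_message = None
try:
    small[small["Age"] > 20 & small["Marks"] > 80]
except ValueError as exc:
    error_message = str(exc)

print("Error without parentheses:", error_message)
```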


In [4]:
import pandas as pd
import numpy as np

# Create DataFrame with 5 columns
data = {
    "Name": ["Aman", "Riya", "Karan", "Neha", "Rahul", "Simran"],
    "Age": [20, 22, 19, 23, 21, 24],
    "Marks": [85, 92, 75, 88, 60, 95],
    "City": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Delhi"],
    "Attendance": [90, 85, 80, 95, 70, 98]
}

df = pd.DataFrame(data)

df


     Name  Age  Marks    City  Attendance
0    Aman   20     85   Delhi          90
1    Riya   22     92  Mumbai          85
2   Karan   19     75   Delhi          80
3    Neha   23     88    Pune          95
4   Rahul   21     60  Mumbai          70
5  Simran   24     95   Delhi          98


In [5]:
filtered_df = df[(df["Age"] > 20) & (df["Marks"] > 80)]

filtered_df


     Name  Age  Marks    City  Attendance
1    Riya   22     92  Mumbai          85
3    Neha   23     88    Pune          95
5  Simran   24     95   Delhi          98


## How Boolean Filtering Works Internally

Step 1:
Condition 1 → (df["Age"] > 20)

Output:
[False, True, False, True, True, True]

Step 2:
Condition 2 → (df["Marks"] > 80)

Output:
[True, True, False, True, False, True]

Step 3:
Combine using AND (&)

Final Boolean Mask:
[False, True, False, True, False, True]

Step 4:
Pandas keeps rows where mask = True

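Important:
No explicit Python loops are used.
The comparisons and the mask are evaluated by vectorized routines
implemented in C, so filtering is fast.
The result is a new DataFrame containing copies of the selected rows.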
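
The four steps above can be reproduced by hand with NumPy. This is an equivalent sketch, not the actual code path inside Pandas; `kept_positions` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [20, 22, 19, 23, 21, 24],
    "Marks": [85, 92, 75, 88, 60, 95],
})

# Steps 1 and 2: each comparison yields a Boolean array
age_mask = (df["Age"] > 20).to_numpy()
marks_mask = (df["Marks"] > 80).to_numpy()

# Step 3: element-wise AND of the two masks
mask = age_mask & marks_mask
print(mask)

# Step 4: positions where the mask is True are the rows to keep
kept_positions = np.flatnonzero(mask)
manual_result = df.iloc[kept_positions]
print(manual_result)

# Same rows as the one-line Boolean indexing expression
direct_result = df[(df["Age"] > 20) & (df["Marks"] > 80)]
print(manual_result.equals(direct_result))
```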


### 5. What is the purpose of the groupby() function in Pandas? Create a DataFrame with categorical data (e.g., department & salary) and calculate the average salary per department using groupby(). Explain the output.

# Answer:
### groupby() Function in Pandas

## Purpose of groupby()

The groupby() function is used to:

- Split data into groups based on one or more columns
- Apply an aggregation function (mean, sum, count, etc.)
- Combine the results into a new DataFrame or Series

This follows the Split → Apply → Combine strategy.

Common aggregation functions:
- mean()
- sum()
- count()
- max()
- min()
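
Several of these aggregations can be computed in one call with `agg()`. This is a minimal sketch using department and salary data like the example in this answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Salary": [60000, 50000, 70000, 65000, 52000, 72000],
})

# One row per department, one column per aggregation function
summary = df.groupby("Department")["Salary"].agg(["mean", "sum", "count", "max", "min"])
print(summary)
```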


In [6]:
import pandas as pd

# Create DataFrame with categorical data
data = {
    "Employee": ["Aman", "Riya", "Karan", "Neha", "Rahul", "Simran"],
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Salary": [60000, 50000, 70000, 65000, 52000, 72000],
    "Experience": [2, 3, 4, 5, 2, 6],
    "City": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Delhi"]
}

df = pd.DataFrame(data)

df


  Employee Department  Salary  Experience    City
0     Aman         IT   60000           2   Delhi
1     Riya         HR   50000           3  Mumbai
2    Karan         IT   70000           4   Delhi
3     Neha    Finance   65000           5    Pune
4    Rahul         HR   52000           2  Mumbai
5   Simran    Finance   72000           6   Delhi


In [7]:
avg_salary = df.groupby("Department")["Salary"].mean()

avg_salary


Department
Finance    68500.0
HR         51000.0
IT         65000.0
Name: Salary, dtype: float64

## How groupby() Works Internally

Step 1: Split
Data is divided based on unique values in the "Department" column.

IT → Rows 0 and 2
HR → Rows 1 and 4
Finance → Rows 3 and 5

Step 2: Apply
The mean() function is applied to the Salary column
within each group.

Step 3: Combine
The results are combined into a new Series where:
- Index = Department
- Value = Average Salary

Important:
groupby() does not change the original DataFrame.
It returns a new aggregated object.
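
The Split → Apply → Combine steps can be written out by hand by iterating over the groups. This is a sketch for illustration only; in practice the single `groupby(...).mean()` call is preferred.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Aman", "Riya", "Karan", "Neha", "Rahul", "Simran"],
    "Department": ["IT", "HR", "IT", "Finance", "HR", "Finance"],
    "Salary": [60000, 50000, 70000, 65000, 52000, 72000],
})

manual = {}
# Split: iterating over a groupby yields (group key, sub-DataFrame) pairs
for department, group in df.groupby("Department"):
    # Apply: compute the mean salary within this group
    manual[department] = group["Salary"].mean()

# Combine: collect the per-group results into one Series
manual_series = pd.Series(manual, name="Salary")
print(manual_series)

# The original DataFrame is unchanged
print(df.shape)
```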
