
# 📝 Assignment: Pandas & User-Defined Functions

---

## Dataset
Create the following DataFrame in Python:

In [None]:
import pandas as pd

# Create dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"]
}

df = pd.DataFrame(data)
df

---

## Question 1: Define Functions and Apply

Write **5 user-defined functions** that operate on DataFrame columns and create new columns:

1. **Age in 5 Years** – Add 5 to the `Age` column.
2. **Salary in Thousands** – Convert `Salary` to thousands.
3. **Is Senior** – Return `"Yes"` if age ≥ 30, else `"No"`.
4. **Department Length** – Length of the department name.
5. **Bonus** – 10% of the `Salary`.

Apply these functions to create new columns in `df`.
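Here is one possible solution. The function names and new column names (`Age_in_5_Years`, `Salary_in_Thousands`, `Is_Senior`, `Dept_Length`, `Bonus`) are our own choices, because the assignment does not prescribe them. Each function takes a single value and is applied to a column with `Series.apply`.

```python
import pandas as pd

# Recreate the assignment dataset so this cell runs on its own
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

def age_in_5_years(age):
    """Age five years from now."""
    return age + 5

def salary_in_thousands(salary):
    """Salary expressed in thousands (50000 -> 50.0)."""
    return salary / 1000

def is_senior(age):
    """'Yes' for employees aged 30 or older, otherwise 'No'."""
    return "Yes" if age >= 30 else "No"

def department_length(department):
    """Number of characters in the department name."""
    return len(department)

def bonus(salary):
    """Bonus equal to 10% of the salary."""
    return salary * 0.10

# Series.apply calls each function once per value in the column
df["Age_in_5_Years"] = df["Age"].apply(age_in_5_years)
df["Salary_in_Thousands"] = df["Salary"].apply(salary_in_thousands)
df["Is_Senior"] = df["Age"].apply(is_senior)
df["Dept_Length"] = df["Department"].apply(department_length)
df["Bonus"] = df["Salary"].apply(bonus)
print(df)
```

`age_in_5_years`, `salary_in_thousands`, and `bonus` use only arithmetic, so they would also accept a whole column (for example, `age_in_5_years(df["Age"])`), which is faster on large data. `is_senior` uses a Python `if`, so it needs `.apply`.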

---

## Question 2: Aggregate and Transform

1. Calculate **average age** and **total salary**.
2. Increase **Salary** by 5% for all employees using a function.
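The sketch below takes one approach. It assumes the aggregates are computed on the original salaries (before the raise) and that the raise overwrites the `Salary` column. `raise_salary` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

# 1. Column-level aggregates on the original data
avg_age = df["Age"].mean()
total_salary = df["Salary"].sum()
print("Average age:", avg_age)        # Average age: 28.0
print("Total salary:", total_salary)  # Total salary: 305000

# 2. User-defined function that raises a salary by a percentage (default 5%)
def raise_salary(salary, pct=5):
    return salary * (1 + pct / 100)

df["Salary"] = df["Salary"].apply(raise_salary)
print(df[["Name", "Salary"]])
```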

---

## Question 3: Conditional Column

1. Create a new column `"Tax"` which is **20% of Salary if Salary ≥ 60000**, else **10%**.
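This sketch uses a named function on the original dataset, that is, without the 5% raise from Question 2. The comment shows a vectorized `np.where` alternative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

def tax(salary):
    # 20% for salaries of 60000 or more, 10% otherwise
    rate = 0.20 if salary >= 60000 else 0.10
    return salary * rate

df["Tax"] = df["Salary"].apply(tax)
# Vectorized alternative (needs `import numpy as np`):
# df["Tax"] = np.where(df["Salary"] >= 60000, df["Salary"] * 0.20, df["Salary"] * 0.10)
print(df[["Name", "Salary", "Tax"]])
```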

---

## Question 4: Combine Columns

1. Create a new column `"Profile"` combining `"Name"` and `"Department"` as `"Name (Department)"`.
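In the possible solution below, `df.apply(..., axis=1)` passes each row as a Series to the function, so both columns are available. `make_profile` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

def make_profile(row):
    # Build "Name (Department)" from one row
    return f"{row['Name']} ({row['Department']})"

df["Profile"] = df.apply(make_profile, axis=1)
# String concatenation gives the same result without apply:
# df["Profile"] = df["Name"] + " (" + df["Department"] + ")"
print(df[["Name", "Department", "Profile"]])
```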

---

## Question 5: Custom Transformation

1. Create a column `"Age Group"`:

   * `"Young"` if Age < 25
   * `"Mid"` if 25 ≤ Age < 35
   * `"Senior"` if Age ≥ 35
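This sketch handles the three bands with an `if`/`elif` chain, checking the lower bound first. `age_group` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

def age_group(age):
    if age < 25:
        return "Young"
    elif age < 35:  # here 25 <= age < 35
        return "Mid"
    else:           # age >= 35
        return "Senior"

df["Age Group"] = df["Age"].apply(age_group)
print(df[["Name", "Age", "Age Group"]])
```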

# 📝 Assignment: Lambda Functions on Lists and DataFrames

---

## Dataset
Use the following list and DataFrame:

In [None]:
from functools import reduce
import pandas as pd

# List of numbers
numbers = [1, 2, 3, 4, 5, 6]

# Example DataFrame
data = {'A': [1, 2, 3], 'B': [10, 20, 30]}
df = pd.DataFrame(data)

numbers, df

---

## Question 1: Lambda with `map`

Double each number using `map` and a lambda function.
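In this minimal solution, `map` calls the lambda once per element. Because `map` returns a lazy iterator, `list()` is used to materialize the result.

```python
numbers = [1, 2, 3, 4, 5, 6]

doubled = list(map(lambda x: x * 2, numbers))
print(doubled)  # [2, 4, 6, 8, 10, 12]
```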

---

## Question 2: Lambda with `filter`

Select only even numbers using `filter` and a lambda function.
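In this minimal solution, `filter` keeps the elements for which the lambda returns a truthy value.

```python
numbers = [1, 2, 3, 4, 5, 6]

evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4, 6]
```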

---

## Question 3: Lambda with `reduce`

Compute the **product of all numbers** using `reduce` and a lambda function.
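In this minimal solution, `reduce` folds the list from left to right: it calls the two-argument lambda with the running product and the next number.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

product = reduce(lambda acc, x: acc * x, numbers)
print(product)  # 720
```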

---

## Question 4: Lambda on Multiple Variables

Add pairs of numbers `(1, 2)`, `(3, 4)`, `(5, 6)` using a lambda with multiple arguments.
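The sketch below shows two ways to feed pairs to a two-argument lambda. `itertools.starmap` unpacks each tuple into the arguments. Alternatively, `map` can take two parallel sequences.

```python
from itertools import starmap

pairs = [(1, 2), (3, 4), (5, 6)]

# starmap calls the lambda as f(a, b) for each (a, b) tuple
sums = list(starmap(lambda a, b: a + b, pairs))
print(sums)  # [3, 7, 11]

# map with two iterables passes one element from each per call
sums_map = list(map(lambda a, b: a + b, [1, 3, 5], [2, 4, 6]))
print(sums_map)  # [3, 7, 11]
```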

---

## Question 5: Advanced Lambda (Filter + Map)

Filter numbers greater than 3 and double them using a combination of `filter` and `map`.
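In this minimal solution, the inner `filter` runs first and its output feeds the outer `map`.

```python
numbers = [1, 2, 3, 4, 5, 6]

result = list(map(lambda x: x * 2, filter(lambda x: x > 3, numbers)))
print(result)  # [8, 10, 12]
```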

---

## Question 6: Custom Calculation with Lambda

Calculate x² + 3x + 5 (in Python, `x**2 + 3*x + 5`; `^` is bitwise XOR, not exponentiation) for each number using `map` and a lambda function.
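In this minimal solution, the polynomial is written with Python's `**` power operator.

```python
numbers = [1, 2, 3, 4, 5, 6]

values = list(map(lambda x: x**2 + 3 * x + 5, numbers))
print(values)  # [9, 15, 23, 33, 45, 59]
```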

---

## Question 7: Lambda Inside List Comprehension

Use a **lambda inside a list comprehension** to **square each number**.
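In this minimal solution, the lambda is defined and immediately called for each element. This satisfies the exercise, although in everyday code `[n ** 2 for n in numbers]` is simpler.

```python
numbers = [1, 2, 3, 4, 5, 6]

squares = [(lambda x: x ** 2)(n) for n in numbers]
print(squares)  # [1, 4, 9, 16, 25, 36]
```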

---

## Question 8: Lambda with Pandas DataFrame

Create a new column `C` in `df` that is the **sum of columns `A` and `B`** using `lambda` and `apply`.
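In this minimal solution, `axis=1` makes `apply` pass each row (as a Series) to the lambda. For plain arithmetic, `df["A"] + df["B"]` gives the same result without `apply`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

df["C"] = df.apply(lambda row: row["A"] + row["B"], axis=1)
print(df)
```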

---

## Question 9: Conditional Lambda in DataFrame

Create a new column `D`:

* `"High"` if `C` ≥ 20
* `"Low"` otherwise
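This sketch first recreates `C` (vectorized here) so that it runs on its own. It then applies a conditional lambda to that single column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
df["C"] = df["A"] + df["B"]

df["D"] = df["C"].apply(lambda c: "High" if c >= 20 else "Low")
print(df)
```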

---

## Question 10: Lambda for Multiple Column Operations

Create a new column `E` which is `(A * 2 + B * 3)` using `lambda` and `apply`.
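In this minimal solution, as with `C`, `axis=1` gives the lambda access to several columns of the same row.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

df["E"] = df.apply(lambda row: row["A"] * 2 + row["B"] * 3, axis=1)
print(df)
```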

# 📝 Assignment: Pandas & Lambda Functions (Advanced)

---

## Dataset
Use the same dataset as before:

In [None]:
import pandas as pd

# Create dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"]
}

df = pd.DataFrame(data)
df

---

## Question 1: Basic Lambda Functions

Create **5 new columns** using **lambda functions**:

1. **Age in 5 Years** – Add 5 to `Age`.
2. **Salary in Thousands** – Convert `Salary` to thousands.
3. **Is Senior** – `"Yes"` if Age ≥ 30, else `"No"`.
4. **Department Length** – Length of the department name.
5. **Bonus** – 10% of `Salary`.
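This possible solution mirrors the earlier user-defined functions, but with inline lambdas. The new column names are our own choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

df["Age_in_5_Years"] = df["Age"].apply(lambda a: a + 5)
df["Salary_in_Thousands"] = df["Salary"].apply(lambda s: s / 1000)
df["Is_Senior"] = df["Age"].apply(lambda a: "Yes" if a >= 30 else "No")
df["Dept_Length"] = df["Department"].apply(lambda d: len(d))
df["Bonus"] = df["Salary"].apply(lambda s: s * 0.10)
print(df)
```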

---

## Question 2: Aggregate with Lambda

1. Calculate **average age** and **total salary** using `lambda` inside `agg`.
2. Increase `Salary` by 5% using `lambda`.
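This sketch calls `agg` on a one-column DataFrame (`df[["Age"]]`) rather than on the Series `df["Age"]`. With a lambda, `Series.agg` may first try the function element by element (newer pandas versions warn about this), whereas `DataFrame.agg` passes each column to the lambda as a whole Series. It assumes the aggregates use the original salaries.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

# 1. DataFrame.agg hands each column to the lambda and returns one value per column
avg_age = df[["Age"]].agg(lambda col: col.mean())["Age"]
total_salary = df[["Salary"]].agg(lambda col: col.sum())["Salary"]
print("Average age:", avg_age)
print("Total salary:", total_salary)

# 2. Increase every salary by 5%
df["Salary"] = df["Salary"].apply(lambda s: s * 1.05)
print(df[["Name", "Salary"]])
```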

---

## Question 3: Conditional Column with Lambda

Create a column `"Tax"`:

* 20% of Salary if Salary ≥ 60000
* 10% otherwise
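In this possible solution, the whole rule fits in a conditional expression inside the lambda. The sketch uses the original salaries.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

df["Tax"] = df["Salary"].apply(lambda s: s * 0.20 if s >= 60000 else s * 0.10)
print(df[["Name", "Salary", "Tax"]])
```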

---

## Question 4: Combine Columns using Lambda

Create `"Profile"` column: `"Name (Department)"`
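This possible solution uses a row-wise lambda with `axis=1`, so the lambda can read both `Name` and `Department`.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

df["Profile"] = df.apply(lambda row: f"{row['Name']} ({row['Department']})", axis=1)
print(df[["Name", "Department", "Profile"]])
```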

---

## Question 5: Age Grouping with Lambda

Create `"Age_Group"` column:

* `"Young"` if Age < 25
* `"Mid"` if 25 ≤ Age < 35
* `"Senior"` if Age ≥ 35
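In this possible solution, a lambda body must be a single expression, so the three bands become nested conditional expressions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 45000, 80000, 70000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

df["Age_Group"] = df["Age"].apply(
    lambda a: "Young" if a < 25 else ("Mid" if a < 35 else "Senior")
)
print(df[["Name", "Age", "Age_Group"]])
```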

---

This **assignment demonstrates:**

* Applying **lambda functions** to columns
* Conditional logic with `lambda`
* Combining columns with `apply(lambda row: ..., axis=1)`
* Aggregation using `lambda`

---

# 📝 Assignment: Global and Local Variables

---

## Question 1: Local vs Global Scope

**Question:**
What will be the output of the following code? Explain why.

In [None]:
x = 10  # global variable

def func():
    x = 5  # local variable
    return x

print(func())
print(x)

**Answer:**
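The program prints `5` and then `10`. The assignment `x = 5` inside `func` creates a new local variable that shadows the global `x` only within the function, so the global keeps its value. The code below is the same example with the results captured for checking.

```python
x = 10  # global variable

def func():
    x = 5  # assignment inside the function creates a new local x
    return x

result = func()
print(result)  # 5
print(x)       # 10: the global x was never modified
```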

---

## Question 2: Modifying Global Variable Inside Function

**Question:**
Modify the global variable `count` inside a function using the `global` keyword.

In [None]:
count = 0

def increment():
    # TODO: increase global count by 1
    pass

increment()
increment()
print(count)

**Answer:**
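Declaring `global count` tells Python that assignments inside `increment` refer to the module-level `count`. Without it, `count += 1` would raise `UnboundLocalError` for the same reason as in Question 3.

```python
count = 0

def increment():
    global count  # use the module-level count instead of creating a local one
    count += 1

increment()
increment()
print(count)  # 2
```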

---

## Question 3: Local Variable Overwriting Global

**Question:**
Predict the output of this code and explain.

In [None]:
value = 100

def update():
    value = value + 50
    return value

print(update())

**Answer:**
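The call raises `UnboundLocalError` instead of printing 150. Because `value` is assigned inside `update`, Python treats it as a local name for the whole function body, so `value + 50` reads a local that has not been set yet. The sketch below shows the error and two common fixes. `update_global` and `update_pure` are illustrative names.

```python
value = 100

def update():
    value = value + 50  # `value` is local here, so the right-hand side fails
    return value

try:
    print(update())
    raised = False
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
    raised = True

# Fix 1: declare the name global so the function updates the module-level value
def update_global():
    global value
    value = value + 50
    return value

# Fix 2: pass the value in and return the new one, leaving the global alone
def update_pure(current):
    return current + 50

print(update_pure(value))  # 150 (global value is still 100)
print(update_global())     # 150 (global value is now 150)
```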

---

## Question 4: Banking Balance Update

**Problem:**
You have a global variable `balance` representing your bank balance. Write a function `deposit(amount)` that adds money to the global balance. Then call the function to deposit 500 and 300, and print the final balance.

```python

balance = 1000

def deposit(amount):
    # TODO: update global balance
    pass

deposit(500)
deposit(300)
print(balance)
```

**Solution:**
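This possible solution follows the same pattern as `count` in Question 2. The `global` declaration lets `deposit` rebind the module-level `balance`.

```python
balance = 1000

def deposit(amount):
    global balance  # update the module-level balance
    balance += amount

deposit(500)
deposit(300)
print(balance)  # 1800
```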

---

## Question 5: Temperature Conversion

**Problem:**
You have a global variable `temp_celsius`. Write a function `to_fahrenheit()` that converts it to Fahrenheit and returns the result, **without modifying the global variable**. Print both the converted temperature and the original Celsius.

```python
temp_celsius = 25

def to_fahrenheit():
    # TODO: convert to Fahrenheit using local variable
    pass

f = to_fahrenheit()
print("Fahrenheit:", f)
print("Celsius:", temp_celsius)
```

**Solution:**
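Reading a global inside a function needs no `global` declaration; only assigning to it does. In this possible solution, the converted value is kept in a local variable, so `temp_celsius` is unchanged.

```python
temp_celsius = 25

def to_fahrenheit():
    fahrenheit = temp_celsius * 9 / 5 + 32  # local variable; the global is only read
    return fahrenheit

f = to_fahrenheit()
print("Fahrenheit:", f)          # Fahrenheit: 77.0
print("Celsius:", temp_celsius)  # Celsius: 25
```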

# 📝 Assignment: Standard vs Dense Ranking in Pandas


## Dataset

In [None]:
import pandas as pd

# Example DataFrame
data = {
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 80, 95, 70, 80]
}

df = pd.DataFrame(data)
df

---

## Question 1: Standard Ranking (`rank(method='min')`)

**Problem:**
Assign **standard ranks** to students based on their scores (higher score = better rank). Use method `'min'` for standard ranking.

**Explanation / Output:**

| Student | Score | Standard\_Rank |
| ------- | ----- | -------------- |
| Alice   | 95    | 1.0            |
| Bob     | 80    | 3.0            |
| Charlie | 95    | 1.0            |
| David   | 70    | 5.0            |
| Eva     | 80    | 3.0            |

* **Note:** Standard ranking leaves gaps after ties.
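The code below produces the table above. `ascending=False` gives the highest score rank 1, and `method='min'` gives tied scores the smallest rank in their block: the two 95s share 1 and the two 80s share 3.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 80, 95, 70, 80],
})

df["Standard_Rank"] = df["Score"].rank(method="min", ascending=False)
print(df)
```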

---

## Question 2: Dense Ranking (`rank(method='dense')`)

**Problem:**
Assign **dense ranks** to students based on their scores (higher score = better rank). Use method `'dense'`.

**Explanation / Output:**

| Student | Score | Dense\_Rank |
| ------- | ----- | ----------- |
| Alice   | 95    | 1           |
| Bob     | 80    | 2           |
| Charlie | 95    | 1           |
| David   | 70    | 3           |
| Eva     | 80    | 2           |

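* **Note:** Dense ranking does **not leave gaps** after ties. `rank()` returns floats for every method, so the integer ranks above assume a cast with `.astype(int)`.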
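The code below is one way to get the dense ranks. `method='dense'` is applied to the descending scores, and `.astype(int)` turns the float ranks returned by `rank()` into plain integers.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 80, 95, 70, 80],
})

df["Dense_Rank"] = df["Score"].rank(method="dense", ascending=False).astype(int)
print(df)
```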

---

## Question 3: Rank in Ascending Order

**Problem:**
Compute **dense rank in ascending order** (lower score = better rank). Compare with standard ascending rank.

**Explanation / Output:**

| Student | Score | Standard\_Asc\_Rank | Dense\_Asc\_Rank |
| ------- | ----- | ------------------- | ---------------- |
| Alice   | 95    | 4.0                 | 3                |
| Bob     | 80    | 2.0                 | 2                |
| Charlie | 95    | 4.0                 | 3                |
| David   | 70    | 1.0                 | 1                |
| Eva     | 80    | 2.0                 | 2                |

* Standard ranking in ascending order leaves **gaps**.
* Dense ranking assigns **consecutive integers** even when scores tie.
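The code below computes both ascending ranks. `ascending=True` is the default. With `method='min'`, the tied 80s get 2.0 and the tied 95s get 4.0, so ranks 3 and 5 are skipped. Dense ranking gives 1, 2, 3.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 80, 95, 70, 80],
})

df["Standard_Asc_Rank"] = df["Score"].rank(method="min")
df["Dense_Asc_Rank"] = df["Score"].rank(method="dense").astype(int)
print(df)
```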


# 📝 Assignment: Row-Level Aggregation in Pandas

---

## Dataset

In [None]:
import pandas as pd

# Example DataFrame
data = {
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Math": [90, 75, 85, 60, 95],
    "Science": [85, 80, 78, 70, 88],
    "English": [92, 70, 80, 65, 90]
}

df = pd.DataFrame(data)
df

---

## Question: Compute Row-Level Aggregates

**Problem:**
For each student, calculate:

1. **Total Score** across all subjects
2. **Average Score** across all subjects
3. **Maximum Score** among subjects

Add these as **new columns** in the DataFrame.
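This sketch uses row-wise reductions, where `axis=1` reduces across the subject columns of each row. The column names `Total_Score`, `Average_Score`, and `Max_Score` are our own choices.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Math": [90, 75, 85, 60, 95],
    "Science": [85, 80, 78, 70, 88],
    "English": [92, 70, 80, 65, 90],
})

subjects = ["Math", "Science", "English"]
df["Total_Score"] = df[subjects].sum(axis=1)
df["Average_Score"] = df[subjects].mean(axis=1)
df["Max_Score"] = df[subjects].max(axis=1)
print(df)
```

Selecting `subjects` explicitly keeps the text column `Student` out of the row-wise calculations.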

---

# 📝 Assignment: `.apply()` vs `.agg()` vs `.transform()` with `groupby`

Using `groupby` makes the difference between these methods clear: `.apply()` and `.agg()` can reduce the number of rows (for example, to one row per group), whereas `.transform()` preserves the original number of rows. The dataset below has 20 rows.

---

## Dataset

In [None]:
import pandas as pd
import numpy as np

# 20-row dataset
np.random.seed(1)
data = {
    "Student": [f"Student{i}" for i in range(1, 21)],
    "Class": ["A"]*5 + ["B"]*5 + ["C"]*5 + ["D"]*5,
    "Math": np.random.randint(50, 101, 20),
    "Science": np.random.randint(50, 101, 20),
    "English": np.random.randint(50, 101, 20)
}

df = pd.DataFrame(data)
df

---

## Question: Compute **Class-wise Average Math Score**

Use `.groupby()` with `.apply()`, `.agg()`, and `.transform()` and observe the difference.

---

### 1. Using `.apply()`

**Observation:**

* Output is a **Series indexed by group**, with one value per class.
* **Shape reduced**: only 4 rows (one per group).
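The sketch below rebuilds the 20-row dataset with the same seed and applies a lambda to each class's `Math` values. Selecting the column before `.apply()` keeps the example simple. Applying to the whole grouped DataFrame also works, but recent pandas versions may warn when the grouping column is passed to the function.

```python
import numpy as np
import pandas as pd

# Same 20-row dataset and seed as the assignment cell
np.random.seed(1)
df = pd.DataFrame({
    "Student": [f"Student{i}" for i in range(1, 21)],
    "Class": ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5,
    "Math": np.random.randint(50, 101, 20),
    "Science": np.random.randint(50, 101, 20),
    "English": np.random.randint(50, 101, 20),
})

# The lambda receives each class's Math values as a Series and returns a scalar
avg_apply = df.groupby("Class")["Math"].apply(lambda s: s.mean())
print(avg_apply)
print(avg_apply.shape)  # (4,)
```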

---

### 2. Using `.agg()`

**Observation:**

* Output is a **DataFrame indexed by group**, with one row per class and multiple columns.
* Shape reduced: **still one row per group**.
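This sketch passes a list of aggregation names to `.agg()`. That list produces one column per statistic while keeping one row per class. The choice of `"min"` and `"max"` alongside `"mean"` is ours, made to show multiple columns.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({
    "Student": [f"Student{i}" for i in range(1, 21)],
    "Class": ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5,
    "Math": np.random.randint(50, 101, 20),
    "Science": np.random.randint(50, 101, 20),
    "English": np.random.randint(50, 101, 20),
})

stats_agg = df.groupby("Class")["Math"].agg(["mean", "min", "max"])
print(stats_agg)
print(stats_agg.shape)  # (4, 3)
```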

---

### 3. Using `.transform()`

**Observation:**

* Output **keeps all 20 rows**. Each student gets the **class average** in the new column.
* **Key difference:** `.transform()` preserves the **original DataFrame shape**, suitable for adding **group-level statistics back to the original rows**.
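In this sketch, `.transform("mean")` returns a Series aligned with the original index, so it can be assigned directly as a new column. `Class_Avg_Math` is an assumed column name.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({
    "Student": [f"Student{i}" for i in range(1, 21)],
    "Class": ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5,
    "Math": np.random.randint(50, 101, 20),
    "Science": np.random.randint(50, 101, 20),
    "English": np.random.randint(50, 101, 20),
})

df["Class_Avg_Math"] = df.groupby("Class")["Math"].transform("mean")
print(df[["Student", "Class", "Math", "Class_Avg_Math"]])
print(df.shape)  # (20, 6)
```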

---

### ✅ Key Takeaways

| Method         | Output Shape                | Notes                                             |
| -------------- | --------------------------- | ------------------------------------------------- |
| `.apply()`     | Depends on the function (one row per group for scalar results) | Flexible, can return Series or DataFrame          |
| `.agg()`       | Reduced (one row per group) | Multiple aggregations at once                     |
| `.transform()` | Same as original DataFrame  | Good for **adding group-level stats to each row** |



# 📝 Assignment: Using `.transform()` in Pandas

---

## Dataset

In [None]:
import pandas as pd
import numpy as np

# New dataset: 15 employees in 3 departments
np.random.seed(2)
data = {
    "Employee": [f"Emp{i}" for i in range(1, 16)],
    "Department": ["HR"]*5 + ["IT"]*5 + ["Finance"]*5,
    "Salary": np.random.randint(40000, 90000, 15),
    "Bonus": np.random.randint(2000, 10000, 15)
}

df = pd.DataFrame(data)
df

---

## Question: Compute Department-Level Statistics Using `.transform()`

**Problem:**
For each employee, compute:

1. **Department Average Salary** → new column `Dept_Avg_Salary`
2. **Department Maximum Bonus** → new column `Dept_Max_Bonus`

Use `.groupby()` and `.transform()` so that **all 15 rows are preserved**.

---

### Solution
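This is one possible solution, rebuilding the dataset with the same seed. Each `transform` call returns one value per employee, so all 15 rows are kept.

```python
import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame({
    "Employee": [f"Emp{i}" for i in range(1, 16)],
    "Department": ["HR"] * 5 + ["IT"] * 5 + ["Finance"] * 5,
    "Salary": np.random.randint(40000, 90000, 15),
    "Bonus": np.random.randint(2000, 10000, 15),
})

df["Dept_Avg_Salary"] = df.groupby("Department")["Salary"].transform("mean")
df["Dept_Max_Bonus"] = df.groupby("Department")["Bonus"].transform("max")
print(df)
```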

**Explanation:**

* `.transform()` computes **group-level statistics** while **keeping original DataFrame shape**.
* Each employee row gets the **department-level metric**, useful for comparisons.



# 📝 Assignment: Using `.transform()` in Pandas

---

## Dataset

In [None]:
import pandas as pd
import numpy as np

# New dataset: 12 products in 3 categories
np.random.seed(3)
data = {
    "Product": [f"Prod{i}" for i in range(1, 13)],
    "Category": ["Electronics"]*4 + ["Clothing"]*4 + ["Toys"]*4,
    "Price": np.random.randint(50, 500, 12),
    "Units_Sold": np.random.randint(10, 100, 12)
}

df = pd.DataFrame(data)
df

---

## Question: Compute Category-Level Metrics Using `.transform()`

**Problem:**
For each product, calculate:

1. **Category Average Price** → new column `Category_Avg_Price`
2. **Category Total Units Sold** → new column `Category_Total_Units`

Use `.groupby()` and `.transform()` so that **all 12 rows are preserved**.

---

### Solution
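In this possible solution, `"mean"` and `"sum"` are broadcast back to every product in the category, so all 12 rows are kept.

```python
import numpy as np
import pandas as pd

np.random.seed(3)
df = pd.DataFrame({
    "Product": [f"Prod{i}" for i in range(1, 13)],
    "Category": ["Electronics"] * 4 + ["Clothing"] * 4 + ["Toys"] * 4,
    "Price": np.random.randint(50, 500, 12),
    "Units_Sold": np.random.randint(10, 100, 12),
})

df["Category_Avg_Price"] = df.groupby("Category")["Price"].transform("mean")
df["Category_Total_Units"] = df.groupby("Category")["Units_Sold"].transform("sum")
print(df)
```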

**Notes:**

* `.transform()` allows you to **broadcast group-level calculations** to each row.
* The shape of `df` remains **the same** as the original DataFrame.
* This is useful for adding **group-level statistics** without losing row-level detail.



# 📝 Assignment: Cumulative Sum with `cumsum()`

---

## Dataset

In [None]:
import pandas as pd

# New dataset: daily sales for 10 days
data = {
    "Day": [f"Day{i}" for i in range(1, 11)],
    "Sales": [250, 400, 150, 300, 500, 200, 450, 350, 100, 300]
}

df = pd.DataFrame(data)
df

---

## Question: Compute Cumulative Sales

**Problem:**
For each day, calculate the **cumulative sum of sales** and store it in a new column called `Cumulative_Sales`. Use `cumsum()`.

---

### Solution
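In this minimal solution, `cumsum()` on the `Sales` column returns the running totals aligned with the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [f"Day{i}" for i in range(1, 11)],
    "Sales": [250, 400, 150, 300, 500, 200, 450, 350, 100, 300],
})

df["Cumulative_Sales"] = df["Sales"].cumsum()
print(df)
```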

**Notes:**

* `cumsum()` calculates the **running total** for a column.
* Each row shows the sum of the current row and all previous rows.


# 📝 Assignment: Lead and Lag in Pandas

---

## Dataset

In [None]:
import pandas as pd

# Sample dataset: sales data for 12 days
data = {
    "Day": [f"Day{i}" for i in range(1, 13)],
    "Sales": [200, 250, 300, 280, 260, 310, 330, 290, 270, 350, 340, 360]
}

df = pd.DataFrame(data)
df

---

## Question 1: Create a Lag Column

**Problem:**
Create a new column `Lag_1` which contains the **sales of the previous day** (lag by 1).
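In this minimal solution, `shift(1)` moves every value down one row. The first day has no previous day, so it becomes `NaN`, which also makes the column float.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [f"Day{i}" for i in range(1, 13)],
    "Sales": [200, 250, 300, 280, 260, 310, 330, 290, 270, 350, 340, 360],
})

df["Lag_1"] = df["Sales"].shift(1)
print(df)
```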

---

## Question 2: Create a Lead Column

**Problem:**
Create a new column `Lead_1` which contains the **sales of the next day** (lead by 1).
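In this minimal solution, `shift(-1)` moves every value up one row. The last day has no next day, so it becomes `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [f"Day{i}" for i in range(1, 13)],
    "Sales": [200, 250, 300, 280, 260, 310, 330, 290, 270, 350, 340, 360],
})

df["Lead_1"] = df["Sales"].shift(-1)
print(df)
```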

---

## Question 3: Difference with Previous Day

**Problem:**
Create a new column `Diff_Lag` which shows the **difference between current day sales and previous day sales**.
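In this minimal solution, the difference is computed by subtracting the lagged column. `df["Sales"].diff()` gives the same result in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [f"Day{i}" for i in range(1, 13)],
    "Sales": [200, 250, 300, 280, 260, 310, 330, 290, 270, 350, 340, 360],
})

df["Diff_Lag"] = df["Sales"] - df["Sales"].shift(1)  # same as df["Sales"].diff()
print(df)
```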

---

## Question 4: Difference with Next Day

**Problem:**
Create a new column `Diff_Lead` which shows the **difference between next day sales and current day sales**.
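In this minimal solution, the difference is next day minus today. The equivalent `-df["Sales"].diff(-1)` is shown in the comment.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": [f"Day{i}" for i in range(1, 13)],
    "Sales": [200, 250, 300, 280, 260, 310, 330, 290, 270, 350, 340, 360],
})

df["Diff_Lead"] = df["Sales"].shift(-1) - df["Sales"]  # same as -df["Sales"].diff(-1)
print(df)
```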

---

✅ **Notes:**

* `shift(1)` → lag by 1 (previous row)
* `shift(-1)` → lead by 1 (next row)
* These operations are very useful in **time series analysis** for computing differences, growth rates, or moving calculations.



# 📝 Assignment: First and Last Values in Pandas

---

## Dataset

In [None]:
import pandas as pd

# Sample dataset: sales data for 12 employees in 3 departments
data = {
    "Employee": [f"Emp{i}" for i in range(1, 13)],
    "Department": ["HR"]*4 + ["IT"]*4 + ["Finance"]*4,
    "Sales": [250, 300, 280, 260, 310, 330, 290, 270, 400, 420, 380, 390]
}

df = pd.DataFrame(data)
df

---

## Question 1: First Sale Overall

**Problem:**
Find the **first sales value** in the dataset.
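In this minimal solution, `iloc[0]` selects by position, so it returns the first row regardless of the index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": [f"Emp{i}" for i in range(1, 13)],
    "Department": ["HR"] * 4 + ["IT"] * 4 + ["Finance"] * 4,
    "Sales": [250, 300, 280, 260, 310, 330, 290, 270, 400, 420, 380, 390],
})

first_sale = df["Sales"].iloc[0]
print(first_sale)  # 250
```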

---

## Question 2: Last Sale Overall

**Problem:**
Find the **last sales value** in the dataset.
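In this minimal solution, negative positions count from the end, so `iloc[-1]` is the last row.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": [f"Emp{i}" for i in range(1, 13)],
    "Department": ["HR"] * 4 + ["IT"] * 4 + ["Finance"] * 4,
    "Sales": [250, 300, 280, 260, 310, 330, 290, 270, 400, 420, 380, 390],
})

last_sale = df["Sales"].iloc[-1]
print(last_sale)  # 390
```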

---

## Question 3: First Sale per Department

**Problem:**
Find the **first sale for each department** using `groupby()` and `first()`.
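In this minimal solution, `groupby` sorts the department keys by default (Finance, HR, IT). `first()` returns the first non-null sale within each group, in the original row order.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": [f"Emp{i}" for i in range(1, 13)],
    "Department": ["HR"] * 4 + ["IT"] * 4 + ["Finance"] * 4,
    "Sales": [250, 300, 280, 260, 310, 330, 290, 270, 400, 420, 380, 390],
})

first_by_dept = df.groupby("Department")["Sales"].first()
print(first_by_dept)
```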

---

## Question 4: Last Sale per Department

**Problem:**
Find the **last sale for each department** using `groupby()` and `last()`.
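This minimal solution mirrors the previous question: `last()` returns the last non-null sale within each department.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": [f"Emp{i}" for i in range(1, 13)],
    "Department": ["HR"] * 4 + ["IT"] * 4 + ["Finance"] * 4,
    "Sales": [250, 300, 280, 260, 310, 330, 290, 270, 400, 420, 380, 390],
})

last_by_dept = df.groupby("Department")["Sales"].last()
print(last_by_dept)
```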

---

✅ **Notes:**

* `iloc[0]` → first row
* `iloc[-1]` → last row
* `groupby().first()` → first value **per group**
* `groupby().last()` → last value **per group**
* These are useful for **time series, grouped summaries, or reporting first/last transactions**.



