## 🧭 Overview: Python `map()` Function

The built-in `map()` function applies a **function** to every item in a **sequence** or any other iterable (for example, a list or a column of IDs).  
It returns a lazy **iterator**, which you can convert into a list using `list()`.

**Basic idea:**  
> `map(function, sequence)` → applies *function* to each element in *sequence*.

In data analysis, `map()` is helpful when you want to transform or summarize data efficiently without writing explicit loops.
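
As a minimal sketch of the basic idea (the `square` function and the `numbers` list are made up for illustration and are not part of the grades example), the snippet below shows that `map()` returns an iterator, that `list()` collects its values, and that the iterator can be consumed only once.

```python
def square(x):
    """Return x squared (hypothetical helper for illustration)."""
    return x * x

numbers = [1, 2, 3, 4]

# map() returns a lazy iterator; nothing is computed yet
squares_iter = map(square, numbers)

# list() pulls every value out of the iterator
squares = list(squares_iter)
print(squares)  # [1, 4, 9, 16]

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(squares_iter)
print(leftover)  # []
```

This is why the cells in this lesson wrap `map()` in `list()` right away instead of keeping the iterator around.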

In this example, we will:

- Load the `grades.csv` file into a pandas DataFrame.  
- Create a helper function named `lowest_grade()` that finds the minimum exam score for a given student ID.  
- Apply that function to every student using `map()`.  
- Convert the results into a list with `list()` (or into a DataFrame) for easier viewing, i.e., `results = list(map(function, sequence))`.



In [29]:
# ============================================
# 🧩 Lesson: Python map() Function
# File: grades.csv
# ============================================

# Import relevant modules
import pandas as pd
import numpy as np

# Step 1 – Load the dataset and save it as a pandas DataFrame
grades = pd.read_csv("grades.csv")

In [2]:
# Preview the first few rows
print("Preview of grades dataset:")
print(grades.head())

Preview of grades dataset:
   exam  student_id  grade
0     1           1   86.0
1     1           2   65.0
2     1           3   70.0
3     1           4   98.0
4     1           5   89.0


In [3]:
# display first few rows of grades
grades.head()

   exam  student_id  grade
0     1           1   86.0
1     1           2   65.0
2     1           3   70.0
3     1           4   98.0
4     1           5   89.0


In [4]:
# Step 2 – Inspect the column names (important for identifying the student ID column)
print("\nColumn names in dataset:")
print(grades.columns)


Column names in dataset:
Index(['exam', 'student_id', 'grade'], dtype='object')


### 🧩 Concept: Understanding `dtype='object'` in Column Output

- When you print `grades.columns`, pandas returns an **Index object** that stores the column names.  
- The `dtype='object'` here means that the **column names themselves** are stored as generic Python objects (in this case, strings).  
- It does **not** describe the data types of the dataset’s columns.  
- To see the data types for each column, use:
  `grades.dtypes`


In [5]:
# Actual column data types
grades.dtypes

exam            int64
student_id      int64
grade         float64
dtype: object

In [6]:
# Make a clean copy so we do not mutate the original
grades_clean = grades.copy()


In [20]:
# If you have NaNs, decide how to handle them (we'll treat missing as 0 for parity with earlier lesson)
grades_clean['grade'] = grades_clean['grade'].fillna(0)
print(grades_clean)

    exam  student_id  grade
0      1           1   86.0
1      1           2   65.0
2      1           3   70.0
3      1           4   98.0
4      1           5   89.0
5      1           6    0.0
6      1           7   75.0
7      1           8   56.0
8      1           9   90.0
9      1          10   81.0
10     2           1   79.0
11     2           2   60.0
12     2           3   78.0
13     2           4   75.0
14     2           5    0.0
15     2           6   80.0
16     2           7   87.0
17     2           8   82.0
18     2           9   95.0
19     2          10   96.0
20     3           1   78.0
21     3           2   80.0
22     3           3   87.0
23     3           4    0.0
24     3           5   89.0
25     3           6   90.0
26     3           7  100.0
27     3           8   72.0
28     3           9   73.0
29     3          10   75.0
30     4           1    0.0
31     4           2   80.0
32     4           3   81.0
33     4           4   82.0
34     4           5

In [30]:
# --- Step 3 – Define a simplified function using grades_clean ---
import pandas as pd

def lowest_grade(student_id):
    """Return the lowest grade for the given student_id."""
    # Filter rows where student_id matches, then take the minimum grade
    student_grades = grades_clean.loc[grades_clean['student_id'] == student_id, 'grade']
    return student_grades.min()  # this expression is explained in the section below
    
student_ids = grades_clean['student_id'].unique()
lowest_list = list(map(lowest_grade, student_ids))

lowest_df = pd.DataFrame({
    'student_id': student_ids,
    'lowest_grade': lowest_list
}).sort_values('student_id', ignore_index=True)

print(lowest_df)



   student_id  lowest_grade
0           1           0.0
1           2           0.0
2           3          70.0
3           4           0.0
4           5           0.0
5           6           0.0
6           7          75.0
7           8          56.0
8           9          73.0
9          10          75.0
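
Several students show a lowest grade of 0.0 because missing grades were filled with 0 before the minimum was taken. The sketch below uses a small made-up Series (`sample` is hypothetical, not data from `grades.csv`) to show how that choice changes the result compared with the default behavior of `Series.min()`, which skips NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical grades for one student; the second exam is missing
sample = pd.Series([88.0, np.nan, 92.0], name="grade")

# By default, Series.min() skips NaN values
min_skipping_nan = sample.min()

# Filling NaN with 0 first makes the missing exam count as a zero
min_with_zero_fill = sample.fillna(0).min()

print(min_skipping_nan)    # 88.0
print(min_with_zero_fill)  # 0.0
```

Whether a missing exam should count as a zero is a grading decision, so it helps to make the `fillna(0)` step explicit, as this notebook does.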


### 🧩 Understanding `grades_clean.loc[grades_clean['student_id'] == student_id, 'grade']`

This expression is fundamental when working with **pandas** DataFrames.  
It combines **filtering**, **selection**, and **assignment** concepts.

---

### 💡 What each part means

| Piece | Explanation |
|-------|--------------|
| `grades_clean` | The name of your pandas **DataFrame**; think of it as a spreadsheet stored in memory. |
| `grades_clean['student_id']` | Accesses the **column** named `'student_id'`. This returns a list-like object (called a *Series*) containing the student ID for each row. |
| `grades_clean['student_id'] == student_id` | Creates a **Boolean mask**: a Series holding `True` or `False` for every row. For example, if `student_id = 3`, the mask might look like `[False, False, True, False, ...]`, marking which rows belong to student 3. |
| `grades_clean.loc[...]` | The `.loc[]` indexer **selects rows and columns by label or by Boolean mask**. Inside the brackets, you specify which rows and which column(s) you want. |
| `grades_clean.loc[grades_clean['student_id'] == student_id, 'grade']` | This says: “From the DataFrame `grades_clean`, select all rows where the `'student_id'` column equals the given `student_id`, and from those rows, return only the `'grade'` column.” |
| `student_grades = ...` | The equals sign `=` here is **assignment**, not mathematical equality. It means “store whatever is on the right-hand side in the variable `student_grades`.” |
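
The pieces in the table can be run one at a time. The following is a minimal sketch on a made-up four-row DataFrame (`toy` is a hypothetical stand-in for `grades_clean`) that prints the Boolean mask before using it inside `.loc[]`.

```python
import pandas as pd

# Hypothetical miniature of the grades data
toy = pd.DataFrame({
    "exam": [1, 1, 2, 2],
    "student_id": [3, 4, 3, 4],
    "grade": [80.0, 91.0, 95.0, 67.0],
})

# Step 1: compare the column to the ID to get a Boolean Series
mask = toy["student_id"] == 3
print(mask.tolist())  # [True, False, True, False]

# Step 2: use the mask for the rows and the label 'grade' for the column
student_grades = toy.loc[mask, "grade"]
print(student_grades.tolist())  # [80.0, 95.0]

# Step 3: reduce the selected grades to a single value
print(student_grades.min())  # 80.0
```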

---

### 🧠 Example (Step-by-Step)

If your dataset looks like this:

| exam | student_id | grade |
|------|-------------|-------|
| 1 | 3 | 80 |
| 2 | 3 | 95 |
| 3 | 3 | 88 |

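then running the following prints the selected grades and their minimum (output shown as comments):
```python
student_grades = grades_clean.loc[grades_clean['student_id'] == 3, 'grade']
print(student_grades)
# 0    80.0
# 1    95.0
# 2    88.0
# Name: grade, dtype: float64

print(student_grades.min())
# 80.0
```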


In [31]:
import pandas as pd
import numpy as np

# --- Base assumptions ---
# DataFrame 'grades_clean' already exists with columns: exam, student_id, grade
# Missing grades (NaN) were already replaced with 0 in an earlier cell.

# 1) Simple helper function
def lowest_grade(student_id):
    """
    Return the lowest grade for the given student_id.
    - Filters the 'grades_clean' DataFrame to that student
    - Relies on missing grades (NaN) having been filled with 0 earlier
    - Returns the minimum value
    """
    student_grades = grades_clean.loc[grades_clean['student_id'] == student_id, 'grade']
    return student_grades.min()

# "return" belongs inside your function definition.
# "return" sends a value back to the code that called the function.
# Here, it finds the minimum (lowest) grade for one student
# and gives that value back when lowest_grade(student_id) is called.

# ============================================
# 2) Apply the function to every student_id
# ============================================

# Create an array of all distinct student IDs in the dataset.
# .unique() means “list each student_id only once.”
student_ids = grades_clean['student_id'].unique()

# ============================================
# 3) Apply the function to each student using map()
# ============================================

# The map() function applies another function to each item in a sequence.
# Here, map() takes each student_id from the array and applies lowest_grade(student_id).
# It returns an iterator (a map object), so we convert it to a list to view the results.
lowest_list = list(map(lowest_grade, student_ids))


# ============================================
# 4) Combine results into a new DataFrame
# ============================================

# Create a new DataFrame called lowest_df that pairs:
# each student_id with their lowest_grade from lowest_list
# pd.DataFrame() is like building a new spreadsheet from two columns of data.
lowest_df = pd.DataFrame({
    'student_id': student_ids,       # column 1: all student IDs
    'lowest_grade': lowest_list      # column 2: the lowest grade for each student
})

# Sort by student_id so the output appears in logical order (1, 2, 3, etc.)
# ignore_index=True resets the index numbers to 0, 1, 2, ...
lowest_df = lowest_df.sort_values('student_id', ignore_index=True)


# ============================================
# 🖨️ Display results
# ============================================

# print() sends the DataFrame’s contents to the screen.
# You’ll see each student_id alongside their lowest grade.
print("\n✅ Lowest grade per student (via map):")
print(lowest_df)

# 5) (Nice to have) Quick count check
print(f"\nStudents counted: {lowest_df['student_id'].nunique()} (expected {len(student_ids)})")

✅ Lowest grade per student (via map):
   student_id  lowest_grade
0           1           0.0
1           2           0.0
2           3          70.0
3           4           0.0
4           5           0.0
5           6           0.0
6           7          75.0
7           8          56.0
8           9          73.0
9          10          75.0

Students counted: 10 (expected 10)


In [21]:
import pandas as pd

# --- Base assumptions ---
# DataFrame 'grades_clean' already exists with columns: exam, student_id, grade
# Missing grades (NaN) were already replaced with 0 in an earlier cell.

# 1) Simple helper function
def lowest_grade(student_id):
    """
    Return the lowest grade for the given student_id.
    - Filters the 'grades_clean' DataFrame to that student
    - Relies on missing grades (NaN) having been filled with 0 earlier
    - Returns the minimum value
    """
    student_grades = grades_clean.loc[grades_clean['student_id'] == student_id, 'grade']
    return student_grades.min()

# 2) Get unique student IDs
student_ids = grades_clean['student_id'].unique()

# 3) Apply the function to each student using map()
lowest_list = list(map(lowest_grade, student_ids))

# 4) (Optional) Put results in a small table for readability
lowest_df = pd.DataFrame({
    'student_id': student_ids,
    'lowest_grade': lowest_list
}).sort_values('student_id', ignore_index=True)

print("✅ Lowest grade per student (via map):")
print(lowest_df)

# 5) (Nice to have) Quick count check
print(f"\nStudents counted: {lowest_df['student_id'].nunique()} (expected {len(student_ids)})")


✅ Lowest grade per student (via map):
   student_id  lowest_grade
0           1           0.0
1           2           0.0
2           3          70.0
3           4           0.0
4           5           0.0
5           6           0.0
6           7          75.0
7           8          56.0
8           9          73.0
9          10          75.0

Students counted: 10 (expected 10)
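
One way to sanity-check a `map()` result is to compute the same per-student minimum with pandas' `groupby`, which is a different technique from the one this lesson teaches. The sketch below runs both on a small made-up DataFrame (`toy` is hypothetical, so the numbers are illustrative only) and confirms that they agree.

```python
import pandas as pd

# Hypothetical miniature of the grades data
toy = pd.DataFrame({
    "exam": [1, 1, 2, 2],
    "student_id": [1, 2, 1, 2],
    "grade": [86.0, 65.0, 79.0, 60.0],
})

def lowest_grade(student_id, df=toy):
    """Return the lowest grade for one student in df."""
    return df.loc[df["student_id"] == student_id, "grade"].min()

student_ids = toy["student_id"].unique()

# Approach from this lesson: map() over the distinct IDs
via_map = list(map(lowest_grade, student_ids))

# Cross-check: group rows by student and take each group's minimum
via_groupby = toy.groupby("student_id")["grade"].min()

print(via_groupby.tolist() == [float(g) for g in via_map])  # True
```

The comparison lines up here because the IDs already appear in ascending order; `groupby` sorts its keys, while `unique()` keeps the order of first appearance.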


## 🧠 Key Takeaways

- **`map(function, sequence)`** applies the given function to every element of a sequence.  
- `map()` returns an **iterator**; use `list()` to collect all results at once.  
- It’s useful for applying the **same operation** to each element without writing a manual `for` loop.  
- Example of the loop that `map()` replaces:
  ```python
  lowest_list = []
  for student_id in student_ids:
      lowest_list.append(lowest_grade(student_id))
  ```


## 🧩 Concept: Function Definition with Default DataFrame Argument

The line below defines a **function** named `lowest_grade` that accepts two parameters:
1. **`student_id`** – a required argument that identifies which student’s grades to analyze.  
2. **`df=grades_clean`** – an optional argument with a **default value**, used when the caller does not pass a DataFrame.

```python
def lowest_grade(student_id, df=grades_clean):
    ...  # the function body appears in the next cell
```


In [10]:
# ============================================
# 🧩 Step 3 – Define a function to find the lowest grade for a given student (LONG format)
# Data columns present: ['exam', 'student_id', 'grade']
# ============================================

def lowest_grade(student_id, df=grades_clean):   # This line defines a function named `lowest_grade` that accepts two parameters
    """
    Return the lowest 'grade' for a single student_id.
    Works with LONG-format data where each row is (exam, student_id, grade).
    """
    # Filter rows for this student_id, then take the minimum of the 'grade' column
    return df.loc[df['student_id'] == student_id, 'grade'].min()

# Step 4 ‚Äì Collect distinct student IDs
student_ids = grades_clean['student_id'].unique().tolist()

# Step 5 ‚Äì Use map() to apply the function to every student_id
lowest_grades_iter = map(lowest_grade, student_ids)

# Step 6 ‚Äì Materialize the iterator into a list
lowest_grades_list = list(lowest_grades_iter)

# Step 7 ‚Äì Package results in a small summary table
lowest_grades_df = pd.DataFrame({
    'student_id': student_ids,
    'lowest_grade': lowest_grades_list
}).sort_values('student_id', ignore_index=True)

print("\nLowest grade for each student (via map):")
print(lowest_grades_df)

# (Nice to have) Quick verification: counts should match unique student count
print(f"\nStudents counted: {lowest_grades_df['student_id'].nunique()} (expected {len(student_ids)})")



Lowest grade for each student (via map):
   student_id  lowest_grade
0           1           0.0
1           2           0.0
2           3          70.0
3           4           0.0
4           5           0.0
5           6           0.0
6           7          75.0
7           8          56.0
8           9          73.0
9          10          75.0

Students counted: 10 (expected 10)
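
A detail worth knowing about `df=grades_clean`: Python evaluates a default argument once, when the `def` statement runs, so the default stays bound to whatever object the name referred to at that moment. The sketch below illustrates this with two small made-up DataFrames (`first`, `second`, and `data` are hypothetical names, not part of the lesson's dataset).

```python
import pandas as pd

# Two hypothetical versions of a grades table
first = pd.DataFrame({"student_id": [1, 1], "grade": [50.0, 90.0]})
second = pd.DataFrame({"student_id": [1, 1], "grade": [70.0, 95.0]})

data = first

def lowest_grade(student_id, df=data):
    """Return the lowest grade for student_id in df."""
    return df.loc[df["student_id"] == student_id, "grade"].min()

# Rebinding the name does not change the default captured at definition time
data = second

from_default = lowest_grade(1)            # still reads from `first`
from_explicit = lowest_grade(1, df=data)  # reads from `second`

print(from_default)   # 50.0
print(from_explicit)  # 70.0
```

In a notebook, this means that re-creating `grades_clean` in a later cell does not affect the function until the `def` cell is run again, unless the DataFrame is passed explicitly.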


In [11]:
def lowest_grade(student_id):
    """
    Find the lowest grade across all exams for a given student_id.
    Treat missing exam grades as zeros.
    """
    student_grades = grades.loc[grades['student_id'] == student_id, 'grade']
    return student_grades.fillna(0).min()


In [12]:
def lowest_grade(student_id):
    """Find lowest grade across all exams for student with given student_id.
    Treat missing exam grades as zeros."""
    return grades.loc[grades['student_id'] == student_id, 'grade'].fillna(0).min()

In [13]:
# test lowest_grade on student_id 1
assert lowest_grade(1) == 0.0, 'test failed'
print('test passed')

test passed


In [14]:
# sequence containing all distinct student ids
student_ids = grades['student_id'].unique()
student_ids

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [15]:
# apply lowest_grade to each student id
list(map(lowest_grade, student_ids))

[0.0, 0.0, 70.0, 0.0, 0.0, 0.0, 75.0, 56.0, 73.0, 75.0]
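
The bare list above does not show which value belongs to which student. One possible way to keep them together, sketched here on a small made-up DataFrame (`toy` is hypothetical, not `grades.csv`), is to `zip` the IDs with the mapped results and build a dictionary.

```python
import pandas as pd

# Hypothetical miniature of the grades data; student 2 missed exam 1
toy = pd.DataFrame({
    "exam": [1, 1, 2, 2],
    "student_id": [1, 2, 1, 2],
    "grade": [86.0, None, 79.0, 60.0],
})

def lowest_grade(student_id):
    """Lowest grade for a student, counting missing grades as 0."""
    student_grades = toy.loc[toy["student_id"] == student_id, "grade"]
    return student_grades.fillna(0).min()

student_ids = toy["student_id"].unique()

# zip() pairs each ID with the matching result from map()
lowest_by_student = {
    int(sid): float(low)
    for sid, low in zip(student_ids, map(lowest_grade, student_ids))
}
print(lowest_by_student)  # {1: 79.0, 2: 0.0}
```

The `int()` and `float()` conversions turn NumPy scalars into plain Python numbers so the dictionary prints cleanly.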