# Lesson 4: Filtering Data in Pandas

Hello, friend! Today's topic is **Filtering Data**. It's about focusing on the data that matters to us. We'll use **pandas**, a Python library, to help us with this.

## The Goal 🎯
Master data filtering in pandas. By the end, you'll be able to pick the necessary data from a big dataset.

---

## Basics of Data Filtering 📋

Filtering data in pandas is like finding your favorite outfit in a wardrobe. The easiest way to filter is to keep only the rows whose value in one column meets a condition. Let's illustrate this using a DataFrame of students' details.

```python
import pandas as pd

# Data of students
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve'],
    'age': [12, 13, 14, 13, 12],
    'grade_level': [6, 7, 8, 7, 6]
}

students_df = pd.DataFrame(data)

# Filter 7th grade students
grade_seven_students = students_df[students_df['grade_level'] == 7]

print(grade_seven_students)
# Outputs:
#    name  age  grade_level
# 1   Bob   13            7
# 3  Dave   13            7
```

The code above creates a DataFrame and selects only the rows where the `grade_level` is 7. This works the same way as boolean indexing in NumPy.

---

## Understanding Boolean Masking 🕵️‍♀️

One of the magic tricks of pandas is **Boolean masking**. A Boolean mask is a Series of `True`/`False` values, one per row: rows marked `True` are kept and rows marked `False` are hidden.

### Example: Creating a Boolean Series
```python
# Boolean Series for 7th grade
is_grade_seven = students_df['grade_level'] == 7
print(is_grade_seven)
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# Name: grade_level, dtype: bool
```

### Filtering Using Boolean Series
```python
# Filtering using Boolean Series
grade_seven_students = students_df[is_grade_seven]

print(grade_seven_students)
# Outputs:
#    name  age  grade_level
# 1   Bob   13            7
# 3  Dave   13            7
```

Only rows where the Boolean Series is `True` are selected.
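
A mask is useful for more than filtering. Here is a small sketch that rebuilds the `students_df` data from above and reuses the same mask in two other ways. The names `match_count` and `grade_seven_names` are made up for this illustration, and `.loc` is just one of several ways to apply a mask.

```python
import pandas as pd

students_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve'],
    'age': [12, 13, 14, 13, 12],
    'grade_level': [6, 7, 8, 7, 6]
})

is_grade_seven = students_df['grade_level'] == 7

# True counts as 1 and False as 0, so sum() counts the matching rows
match_count = int(is_grade_seven.sum())

# .loc accepts the same mask and can pick a column at the same time
grade_seven_names = students_df.loc[is_grade_seven, 'name'].tolist()

print(match_count)        # 2
print(grade_seven_names)  # ['Bob', 'Dave']
```

Summing a mask is a quick way to check how many rows a filter will keep before you apply it.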

---

## Advanced Data Filtering 🚀

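Sometimes, we need to filter data using multiple conditions. pandas lets us combine Boolean Series element by element with the operators **And (`&`)**, **Or (`|`)**, and **Not (`~`)**. Wrap each condition in parentheses, and don't use Python's `and`, `or`, or `not` keywords here, because they don't work on a whole Series.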

### Example: Multiple Conditions
```python
# Filter 7th grade students who are 13 years old
grade_seven_and_thirteen = students_df[(students_df['grade_level'] == 7) & (students_df['age'] == 13)]

print(grade_seven_and_thirteen)
# Outputs:
#    name  age  grade_level
# 1   Bob   13            7
# 3  Dave   13            7
```
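
The example above only shows `&`. As a rough sketch of the other two operators, the block below rebuilds the same `students_df` and tries `|` and `~`. The variable names `sixth_or_thirteen` and `not_grade_seven` are invented for this illustration.

```python
import pandas as pd

students_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve'],
    'age': [12, 13, 14, 13, 12],
    'grade_level': [6, 7, 8, 7, 6]
})

# OR (|): keep a row if at least one condition is True
sixth_or_thirteen = students_df[
    (students_df['grade_level'] == 6) | (students_df['age'] == 13)
]

# NOT (~): flip every True to False and every False to True
not_grade_seven = students_df[~(students_df['grade_level'] == 7)]

print(sixth_or_thirteen['name'].tolist())  # ['Alice', 'Bob', 'Dave', 'Eve']
print(not_grade_seven['name'].tolist())    # ['Alice', 'Charlie', 'Eve']
```

Notice that `~` goes in front of the whole parenthesized condition, so the comparison runs first and the result is flipped afterwards.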

### Using `isin()` Method
The `isin()` method checks, for each value in a pandas Series, whether it appears in a list of values, and returns a Boolean Series.

```python
# Filter students who are in 6th or 7th grade
middle_school_students = students_df[students_df['grade_level'].isin([6, 7])]

print(middle_school_students)
# Outputs:
#     name  age  grade_level
# 0  Alice   12            6
# 1    Bob   13            7
# 3   Dave   13            7
# 4    Eve   12            6
```

---

## Lesson Summary 📚

This lesson covered:
1. **Basic filtering** by column values.
2. **Boolean masking** for filtering rows.
3. **Advanced filtering** with multiple conditions and the `isin()` method.

### Key Takeaways:
- Boolean masking is a powerful way to filter data.
- Logical operators (`&`, `|`, `~`) allow complex filtering.
- The `isin()` method simplifies filtering with lists of values.

Keep practicing these skills on different datasets. Practice makes perfect! Stay tuned for the next lesson. 🚀


## School Data Filtering: Who's in Sixth Grade?

Imagine you're helping out at the school's administrative office, and you need to find which students are in the 6th grade. The given code creates a DataFrame from a school data dictionary and applies a boolean mask to keep only the 6th-grade students. Run the code to witness the magic of data filtering with pandas!

```python
import pandas as pd

# Data of students in a school
school_data = {'student_id': [101, 102, 103, 104, 105],
               'name': ['Liam', 'Olivia', 'Noah', 'Emma', 'James'],
               'grade': [5, 6, 5, 6, 6]}

school_df = pd.DataFrame(school_data)

# Filter 6th grade students using a boolean mask
sixth_grade_students = school_df[school_df['grade'] == 6]
print(sixth_grade_students)
```

The code snippet uses pandas to create a DataFrame from a dictionary and filters students in the 6th grade using a Boolean mask. Here's a walkthrough of the process:

---

### Code Explanation:
1. **Create a DataFrame**:
   - `school_data` is a dictionary with keys: `student_id`, `name`, and `grade`.
   - The `pd.DataFrame()` function converts this dictionary into a DataFrame.

2. **Boolean Masking**:
   - A Boolean condition (`school_df['grade'] == 6`) checks where the `grade` is equal to 6.
   - This creates a Boolean Series that is used to filter rows in the DataFrame.

3. **Filter Results**:
   - The rows where the condition evaluates to `True` are selected.
   - The filtered DataFrame contains only the 6th-grade students.

---

### Expected Output:
The filtered DataFrame (`sixth_grade_students`) will display:
```plaintext
   student_id    name  grade
1         102  Olivia      6
3         104    Emma      6
4         105   James      6
```
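
The numbers 1, 3, and 4 on the left of that output are the original row labels, which filtering keeps as they were. If you would rather have fresh labels starting at 0, one option is `reset_index(drop=True)`. The sketch below assumes that is what you want; the name `renumbered` is made up for this illustration.

```python
import pandas as pd

school_df = pd.DataFrame({
    'student_id': [101, 102, 103, 104, 105],
    'name': ['Liam', 'Olivia', 'Noah', 'Emma', 'James'],
    'grade': [5, 6, 5, 6, 6]
})

sixth_grade_students = school_df[school_df['grade'] == 6]

# Filtered rows keep the labels they had in the full DataFrame
print(sixth_grade_students.index.tolist())  # [1, 3, 4]

# drop=True discards the old labels instead of adding them as a column
renumbered = sixth_grade_students.reset_index(drop=True)
print(renumbered.index.tolist())            # [0, 1, 2]
```

Keeping the original labels is handy when you need to trace a row back to the full dataset, so renumbering is a choice rather than a requirement.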

---

This demonstrates how pandas' Boolean indexing simplifies the process of filtering data in a DataFrame. 🎉

## Filtering School Records with Pandas

Imagine you're helping out at the school's administrative office, and you need to find which students are in the 6th grade. The given code creates a DataFrame from a school data dictionary and applies a boolean mask to keep only the 6th-grade students. Run the code to witness the magic of data filtering with pandas!

```python
import pandas as pd

# Data of students in a school
school_data = {'student_id': [101, 102, 103, 104, 105],
               'name': ['Liam', 'Olivia', 'Noah', 'Emma', 'James'],
               'grade': [5, 6, 5, 6, 6]}

school_df = pd.DataFrame(school_data)

# Filter 6th grade students using a boolean mask
sixth_grade_students = school_df[school_df['grade'] == 6]
print(sixth_grade_students)
```

This code efficiently filters students in the 6th grade using pandas' Boolean indexing. Here's a detailed breakdown:

---

### Code Functionality:
1. **Creating the DataFrame**:
   - The `school_data` dictionary contains student information with keys:
     - `student_id`: A unique ID for each student.
     - `name`: The name of each student.
     - `grade`: The grade of each student.
   - `pd.DataFrame(school_data)` converts this dictionary into a tabular DataFrame format.

2. **Boolean Mask**:
   - `school_df['grade'] == 6` generates a Boolean Series:
     - `True` for rows where `grade` is 6.
     - `False` otherwise.

3. **Applying the Boolean Mask**:
   - `school_df[school_df['grade'] == 6]` selects only the rows where the Boolean Series is `True`.

4. **Storing Filtered Data**:
   - The resulting subset is assigned to `sixth_grade_students`.

5. **Output**:
   - `print(sixth_grade_students)` displays the DataFrame containing only 6th-grade students.
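
The steps above can also be written one statement at a time, which makes the intermediate Boolean Series visible. This is only a sketch of the same filter; the variable name `is_sixth_grade` is an assumption added for readability.

```python
import pandas as pd

school_df = pd.DataFrame({
    'student_id': [101, 102, 103, 104, 105],
    'name': ['Liam', 'Olivia', 'Noah', 'Emma', 'James'],
    'grade': [5, 6, 5, 6, 6]
})

# Step 2: build the mask on its own so it can be inspected
is_sixth_grade = school_df['grade'] == 6
print(is_sixth_grade.tolist())  # [False, True, False, True, True]

# Steps 3 and 4: apply the mask and store the result
sixth_grade_students = school_df[is_sixth_grade]
print(sixth_grade_students['name'].tolist())  # ['Olivia', 'Emma', 'James']
```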

---

### Output:
```plaintext
   student_id    name  grade
1         102  Olivia      6
3         104    Emma      6
4         105   James      6
```

---

### How It Works:
- The filter isolates students in the 6th grade (`grade == 6`).
- The `sixth_grade_students` DataFrame now holds:
  - IDs, names, and grades of students in the 6th grade.

---

### Why It’s Useful:
Pandas simplifies data manipulation and analysis, enabling tasks like filtering, sorting, and grouping with concise and readable code. 🎓

## Schools Data Filter Debugging

Alright, Stellar Navigator! Your next mission is to refine the data filter. Adjust the current code to display students from the 6th grade with an 'A' score. Use your knowledge of logical operators to edit the filtering criteria in the DataFrame.

Ready for this task? Go for it!

```python
import pandas as pd

# Example DataFrame simulating a school's student records
student_records = pd.DataFrame({
    'StudentID': [101, 102, 103, 104, 105],
    'Score': ['A', 'B', 'C', 'B', 'A'],
    'Grade': [5, 5, 6, 5, 6]
})

# Filter the DataFrame to show students in 6th grade with Score A
sixth_grade_A = student_records[(student_records['Grade'] == 5) & (student_records['Score'] == 'B')]
print(sixth_grade_A)
```


Your task involves adjusting the filter criteria to display **students in the 6th grade with an 'A' score**. Here’s the corrected and updated code:

---

### Updated Code:
```python
import pandas as pd

# Example DataFrame simulating a school's student records
student_records = pd.DataFrame({
    'StudentID': [101, 102, 103, 104, 105],
    'Score': ['A', 'B', 'C', 'B', 'A'],
    'Grade': [5, 5, 6, 5, 6]
})

# Filter the DataFrame to show students in 6th grade with Score A
sixth_grade_A = student_records[(student_records['Grade'] == 6) & (student_records['Score'] == 'A')]
print(sixth_grade_A)
```

---

### Explanation of Changes:
1. **Filtering Conditions**:
   - `(student_records['Grade'] == 6)` ensures the filter focuses on 6th-grade students.
   - `(student_records['Score'] == 'A')` selects students who achieved an 'A' score.
   - Combined using `&` for the logical AND operation, so both conditions must be true.

2. **Correct Output**:
   - The original filter targeted **5th-grade students with 'B' scores**, which doesn't meet the new requirement.
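
One way to make a mix-up like this easier to spot is to give each condition its own name before combining them. The sketch below does that with the same `student_records` data and also runs the original conditions for comparison. The names `in_sixth_grade`, `has_a_score`, and `fifth_grade_B` are invented for this illustration.

```python
import pandas as pd

student_records = pd.DataFrame({
    'StudentID': [101, 102, 103, 104, 105],
    'Score': ['A', 'B', 'C', 'B', 'A'],
    'Grade': [5, 5, 6, 5, 6]
})

# Named masks read almost like the task description
in_sixth_grade = student_records['Grade'] == 6
has_a_score = student_records['Score'] == 'A'
sixth_grade_A = student_records[in_sixth_grade & has_a_score]
print(sixth_grade_A['StudentID'].tolist())  # [105]

# The original conditions select different students entirely
fifth_grade_B = student_records[
    (student_records['Grade'] == 5) & (student_records['Score'] == 'B')
]
print(fifth_grade_B['StudentID'].tolist())  # [102, 104]
```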

---

### Output:
```plaintext
   StudentID Score  Grade
4        105     A      6
```

This reflects only **students in the 6th grade** with an **'A' score**.

---

### Mission Accomplished! 🌟
You’ve successfully refined the filter criteria. This technique of combining conditions is essential for advanced data filtering in pandas. 🎓 Keep exploring!

## Data Filtering with Multiple Grades

Hey there, Space Voyager! We've got data on different schools, and we want to keep only the elementary schools in Riverside. However, our code isn't working as expected. Can you spot the mistake and fix it? Remember to use the skills from our filtering lesson. Good luck!

```python
import pandas as pd

# Data about schools
school_data = {
    'name': ['Greenfield Elementary', 'Sunnydale High', 'Riverside Elementary 1', 'Riverside Elementary 2'],
    'type': ['Elementary', 'High', 'Elementary', 'Elementary'],
    'city': ['Greenfield', 'Sunnydale', 'Riverside', 'Riverside']
}

schools_df = pd.DataFrame(school_data)

# Filter schools of type 'Elementary'
elementary_schools = schools_df[(schools_df['type'] == 'Elementary') and (schools_df['city'] == 'Riverside')]
print(elementary_schools)
```

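The issue in the code lies in the use of the `and` operator. Python's `and` tries to reduce each Series to a single `True` or `False`, which pandas refuses to do, so the line raises a `ValueError` ("The truth value of a Series is ambiguous"). Instead, we need to use the bitwise `&` operator for element-wise logical operations when filtering in pandas. Here's the corrected code: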

---

### Fixed Code:
```python
import pandas as pd

# Data about schools
school_data = {
    'name': ['Greenfield Elementary', 'Sunnydale High', 'Riverside Elementary 1', 'Riverside Elementary 2'],
    'type': ['Elementary', 'High', 'Elementary', 'Elementary'],
    'city': ['Greenfield', 'Sunnydale', 'Riverside', 'Riverside']
}

schools_df = pd.DataFrame(school_data)

# Filter schools of type 'Elementary' in Riverside
elementary_schools = schools_df[(schools_df['type'] == 'Elementary') & (schools_df['city'] == 'Riverside')]
print(elementary_schools)
```

---

### Explanation of Changes:
1. **Logical Operator Fix**:
   - Replaced `and` with `&` for element-wise operations in pandas.
   - Parentheses are required around each condition because `&` binds more tightly than `==`.

2. **Output DataFrame**:
   - The fixed code now correctly filters rows where `type` is `'Elementary'` and `city` is `'Riverside'`.

---

### Correct Output:
```plaintext
                     name         type       city
2  Riverside Elementary 1  Elementary  Riverside
3  Riverside Elementary 2  Elementary  Riverside
```

---

### Key Reminder:
- Always use `&` for **AND**, `|` for **OR**, and `~` for **NOT** when filtering in pandas.
- Enclose conditions in parentheses to avoid precedence issues.
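
To see the difference for yourself, here is a small sketch that tries both operators on the same `schools_df` data. The `try`/`except` wrapper and the names `is_elementary`, `in_riverside`, and `error_message` are assumptions added for this illustration, so the failing line doesn't stop the script.

```python
import pandas as pd

schools_df = pd.DataFrame({
    'name': ['Greenfield Elementary', 'Sunnydale High',
             'Riverside Elementary 1', 'Riverside Elementary 2'],
    'type': ['Elementary', 'High', 'Elementary', 'Elementary'],
    'city': ['Greenfield', 'Sunnydale', 'Riverside', 'Riverside']
})

is_elementary = schools_df['type'] == 'Elementary'
in_riverside = schools_df['city'] == 'Riverside'

error_message = None
try:
    # `and` asks for one True/False answer from a whole Series
    schools_df[is_elementary and in_riverside]
except ValueError as err:
    error_message = str(err)
    print('and failed:', error_message)

# `&` compares the two masks row by row instead
riverside_elementary = schools_df[is_elementary & in_riverside]
print(riverside_elementary['name'].tolist())
# ['Riverside Elementary 1', 'Riverside Elementary 2']
```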

You’re all set, Space Voyager! 🚀

## Navigating the Stars: Sophomore Achievers

Stellar Navigator, let's test your skills a bit more!

Add the missing code to filter and display students who are in the 8th or 10th grade. Use the pandas `isin()` method on the `grade_level` column to complete this task. Shine bright, you've got this!

```python
import pandas as pd

# Data of students
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Mary', 'John'],
    'age': [12, 13, 14, 15, 12],
    'grade_level': [10, 7, 8, 9, 10]
}

students_df = pd.DataFrame(data)

# TODO: Filter students who are in 8th or 10th grade
filtered_students = _____

print(filtered_students)
```

Here's the completed code using pandas' `isin()` method to filter students in the 8th or 10th grade:

---

### Completed Code:
```python
import pandas as pd

# Data of students
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Mary', 'John'],
    'age': [12, 13, 14, 15, 12],
    'grade_level': [10, 7, 8, 9, 10]
}

students_df = pd.DataFrame(data)

# Filter students who are in 8th or 10th grade
filtered_students = students_df[students_df['grade_level'].isin([8, 10])]

print(filtered_students)
```

---

### Explanation:
1. **`isin()` Method**:
   - Used to check if a value in a column exists in a specified list.
   - `students_df['grade_level'].isin([8, 10])` creates a boolean mask where `True` corresponds to rows where the `grade_level` is 8 or 10.

2. **Boolean Indexing**:
   - Applied the boolean mask to the DataFrame to filter rows matching the condition.
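
As a quick check on what `isin()` is doing, the sketch below compares it with the same filter written as two conditions joined by `|`, and then flips the mask with `~`. The names `with_isin`, `with_or`, and `other_grades` are made up for this illustration.

```python
import pandas as pd

students_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Mary', 'John'],
    'age': [12, 13, 14, 15, 12],
    'grade_level': [10, 7, 8, 9, 10]
})

# The same mask built two ways
with_isin = students_df['grade_level'].isin([8, 10])
with_or = (students_df['grade_level'] == 8) | (students_df['grade_level'] == 10)
print(with_isin.equals(with_or))  # True

# ~ flips the mask: everyone NOT in 8th or 10th grade
other_grades = students_df[~with_isin]
print(other_grades['name'].tolist())  # ['Bob', 'Mary']
```

With two values the `|` version is still readable, but `isin()` stays short no matter how many values are in the list.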

---

### Output:
```plaintext
      name  age  grade_level
0    Alice   12           10
2  Charlie   14            8
4     John   12           10
```

---

Keep shining, Stellar Navigator! You're ready to conquer data! 🌟

## Charting the Schools - Data Filtering in Pandas

Chart your course through the data universe, Stellar Navigator! Your mission: create filters that guide us to 10th-grade students excelling in tests. Grab your cosmic compass and draft the code to seek out these high-achieving sophomores.

```python
import pandas as pd

# Sample data from the School and Education System
data = {
    'student_id': [1, 2, 3, 4, 5],
    'test_score': [89, 92, 78, 90, 85],
    'grade_level': [10, 10, 9, 10, 9]
}

students_df = pd.DataFrame(data)

# TODO: Use a filter to find students in the 10th grade who have a test score higher than 90.
print(high_achieving_sophomores)
```

Here's the completed code to find the 10th-grade students who have a test score higher than 90:

---

### Code Solution:
```python
import pandas as pd

# Sample data from the School and Education System
data = {
    'student_id': [1, 2, 3, 4, 5],
    'test_score': [89, 92, 78, 90, 85],
    'grade_level': [10, 10, 9, 10, 9]
}

students_df = pd.DataFrame(data)

# Filter students in the 10th grade who have a test score higher than 90
high_achieving_sophomores = students_df[(students_df['grade_level'] == 10) & (students_df['test_score'] > 90)]

print(high_achieving_sophomores)
```

---

### Explanation:
1. **Filtering Conditions**:
   - `students_df['grade_level'] == 10`: Selects rows where the student is in the 10th grade.
   - `students_df['test_score'] > 90`: Selects rows where the test score is greater than 90.

2. **Logical AND (`&`)**:
   - Combines the two conditions so only students meeting both criteria are included.

3. **Boolean Indexing**:
   - The combined boolean mask is applied to filter the DataFrame.
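
A detail that is easy to miss: "higher than 90" means `>`, so a score of exactly 90 doesn't qualify. The sketch below runs the same data with `>` and with `>=` to show the difference. The names `strictly_above_90` and `at_least_90` are invented for this illustration.

```python
import pandas as pd

students_df = pd.DataFrame({
    'student_id': [1, 2, 3, 4, 5],
    'test_score': [89, 92, 78, 90, 85],
    'grade_level': [10, 10, 9, 10, 9]
})

in_tenth_grade = students_df['grade_level'] == 10

# > excludes the boundary value, >= includes it
strictly_above_90 = students_df[in_tenth_grade & (students_df['test_score'] > 90)]
at_least_90 = students_df[in_tenth_grade & (students_df['test_score'] >= 90)]

print(strictly_above_90['student_id'].tolist())  # [2]
print(at_least_90['student_id'].tolist())        # [2, 4]
```

Student 4 is in the 10th grade with a score of exactly 90, so that student appears only in the `>=` result.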

---

### Output:
```plaintext
   student_id  test_score  grade_level
1           2          92           10
```

---

You're navigating the data galaxy like a pro, Stellar Navigator! Keep charting your course to data brilliance! 🌟



You're on the last stretch, Space Voyager! It's time to showcase your skills by implementing a pandas data filtering operation from scratch. Use the schools data to keep only the Charter type schools. Then, print the DataFrame that contains your filtered data.

```python
import pandas as pd

# TODO: Create a dictionary with school data including school names, types, and grades

# TODO: Convert the dictionary to a DataFrame

# TODO: Filter the DataFrame to include only Charter type schools

# TODO: Print the DataFrame with filtered Charter type school
```

Here's a complete solution to filter and display only Charter type schools:

---

### Code Solution:
```python
import pandas as pd

# Create a dictionary with school data including school names, types, and grades
school_data = {
    'school_name': ['Greenfield Elementary', 'Sunnydale High', 'Riverside Charter', 'Mountain View Charter'],
    'school_type': ['Elementary', 'High', 'Charter', 'Charter'],
    'grade_level': ['K-5', '9-12', 'K-8', 'K-12']
}

# Convert the dictionary to a DataFrame
schools_df = pd.DataFrame(school_data)

# Filter the DataFrame to include only Charter type schools
charter_schools = schools_df[schools_df['school_type'] == 'Charter']

# Print the DataFrame with filtered Charter type schools
print(charter_schools)
```

---

### Explanation:
1. **Dictionary Creation**:
   - The `school_data` dictionary contains school names, types, and grade levels.

2. **Convert to DataFrame**:
   - The `pd.DataFrame()` function converts the dictionary into a Pandas DataFrame.

3. **Filter Charter Schools**:
   - `schools_df['school_type'] == 'Charter'`: Creates a boolean mask for rows where `school_type` is `'Charter'`.
   - This mask is applied to `schools_df` to filter the rows.

4. **Display Results**:
   - The filtered DataFrame, `charter_schools`, is printed.
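
One thing to watch with text filters: `==` compares strings exactly, so `'charter'` would not match `'Charter'`. The sketch below uses a small hypothetical dataset with inconsistent capitalization (it is not part of the exercise data) and lowercases the column with `.str.lower()` before comparing. The names `messy_df`, `exact_match`, and `normalized_match` are made up for this illustration.

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization (not from the exercise)
messy_df = pd.DataFrame({
    'school_name': ['Riverside Charter', 'Mountain View Charter', 'Sunnydale High'],
    'school_type': ['Charter', 'charter', 'High']
})

# Exact comparison misses the lowercase entry
exact_match = messy_df[messy_df['school_type'] == 'Charter']
print(exact_match['school_name'].tolist())
# ['Riverside Charter']

# Lowercasing every value first makes the comparison case-insensitive
normalized_match = messy_df[messy_df['school_type'].str.lower() == 'charter']
print(normalized_match['school_name'].tolist())
# ['Riverside Charter', 'Mountain View Charter']
```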

---

### Output:
```plaintext
             school_name school_type grade_level
2      Riverside Charter     Charter         K-8
3  Mountain View Charter     Charter        K-12
```

---

You're blazing through data challenges like a true explorer! Keep reaching for the stars, Space Voyager! 🚀✨