# Lesson 1: Deep Dive into Conditional Selection


Hello! Today we're delving deeper into **Conditional Selection**. As you might remember, it's a technique for selecting data in a DataFrame that meets given conditions. It's a key tool for data analysis as it allows us to focus on the most pertinent information.

In today's lesson, we'll explore more complex conditional selection scenarios and learn about an important method: **`where()`**. Our journey will start with a refresher on conditional selection, move on to sophisticated compound conditions, and finally, dive into the `where()` method. Let's get started!

---

## Recap of Conditional Selection

Before venturing into uncharted territory, let's refresh our memory on conditional selection. With this technique, pandas sifts through our data and returns the elements that meet specific conditions by comparing columns of our DataFrame against them.

For instance, given a pandas DataFrame `scores_df` of students' names and their test scores:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'Dave'], 'Score': [88, 92, 95, 80]}
scores_df = pd.DataFrame(data)

print(scores_df)
#       Name  Score
# 0    Alice     88
# 1      Bob     92
# 2  Charlie     95
# 3     Dave     80
```

To find out who scored more than 90:

```python
print(scores_df[scores_df['Score'] > 90])
#       Name  Score
# 1      Bob     92
# 2  Charlie     95
```

By using `scores_df['Score'] > 90`, we created a boolean mask, then used it to keep only the rows where the condition evaluates to `True`. Pretty cool, right?
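
To see what the mask looks like on its own, here is a small sketch that rebuilds `scores_df` from above and prints the boolean values before using them as a filter. The variable names `mask` and `high_scorers` are chosen here for illustration.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'Score': [88, 92, 95, 80],
})

# The comparison runs element-wise and yields one boolean per row
mask = scores_df['Score'] > 90
print(mask.tolist())
# [False, True, True, False]

# Indexing with the mask keeps only the rows marked True
high_scorers = scores_df[mask]
print(high_scorers['Name'].tolist())
# ['Bob', 'Charlie']
```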

---

## Compound Conditional Selection

In real-world scenarios, you might need to select data based on multiple conditions. For this, you can use **compound conditions**.

### Operators:
- **`&` (and)**: Requires all conditions to be true.
- **`|` (or)**: Requires at least one condition to be true.
- **`~` (not)**: Negates a condition.

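**Important:** Wrap each condition in parentheses when combining with `&` or `|`. These operators bind more tightly than comparisons such as `>` and `==`, so conditions without parentheses are evaluated in the wrong order.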
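
The sketch below rebuilds `scores_df` from the recap and shows an `|` filter written with parentheses, followed by the same filter without them. The names `extremes` and `failed` are illustrative. The exact exception raised can depend on the pandas version and the column dtypes, so the example catches two likely types rather than promising a specific one.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'Score': [88, 92, 95, 80],
})

# With parentheses: each comparison is evaluated first, then combined with |
extremes = scores_df[(scores_df['Score'] > 90) | (scores_df['Score'] < 85)]
print(extremes['Name'].tolist())
# ['Bob', 'Charlie', 'Dave']

# Without parentheses, Python evaluates `90 | scores_df['Score']` first,
# and the resulting chained comparison cannot be reduced to one True/False
try:
    scores_df[scores_df['Score'] > 90 | scores_df['Score'] < 85]
    failed = False
except (ValueError, TypeError) as exc:
    failed = True
    print(type(exc).__name__)
```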

### Example 1: Using `&` (and)

Find students who scored more than 85 **and** whose names start with 'A':

```python
print(scores_df[(scores_df['Score'] > 85) & (scores_df['Name'].str.startswith('A'))])
#     Name  Score
# 0  Alice     88
```

There's Alice! She meets both conditions.

### Example 2: Using `~` (not)

Exclude 'Bob' from the DataFrame:

```python
print(scores_df[~(scores_df['Name'] == 'Bob')])
#       Name  Score
# 0    Alice     88
# 2  Charlie     95
# 3     Dave     80
```

Adios, Bob!

---

## `.where()` Method in Depth

The pandas **`where()`** method is helpful when you want to select data but replace the data that doesn't satisfy the condition with a custom value instead of discarding it.

### Example: Replace Scores of 85 or Below with 'Fail'

```python
print(scores_df['Score'].where(scores_df['Score'] > 85, other='Fail'))
# 0      88
# 1      92
# 2      95
# 3    Fail
# Name: Score, dtype: object
```

Here, records not meeting the condition (`Score > 85`) were replaced with 'Fail'. How handy is that!
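
If `other` is left out, `where()` fills the non-matching positions with `NaN` instead of a custom value. A minimal sketch, again rebuilding `scores_df` and using the illustrative names `passed`, `filtered`, and `kept`; it also shows that the result keeps the original length, unlike boolean indexing, which drops rows.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'Score': [88, 92, 95, 80],
})

passed = scores_df['Score'] > 85

# Boolean indexing drops the rows that fail the condition
filtered = scores_df['Score'][passed]
print(len(filtered))
# 3

# where() keeps every row; without `other`, failing rows become NaN
# (the result is float64 because NaN is a floating-point value)
kept = scores_df['Score'].where(passed)
print(len(kept), kept.isna().sum())
# 4 1
```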

---

## Lesson Takeaways

We covered a significant portion of our adventure:
- Reviewed **conditional selection**.
- Learned about **compound conditions** using `&`, `|`, and `~`.
- Explored the **`where()`** method for replacing data that doesn't meet conditions.

### Pro Tip:
Practice makes perfect! Engage in problem-solving to solidify your understanding of complex conditional selection and the `where()` method. Every solved problem brings you closer to becoming a data analysis expert. On your marks, get set, **code!** 🚀


## Identifying Top Students in Math or History

Imagine you're helping to analyze data for an education conference. You are interested in identifying students who excel in either math or history. The provided code answers the question: Who are the students scoring more than 85 in either math or history?

```python
import pandas as pd

# DataFrame representing students and their test scores in two subjects
df = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eli'],
    'math_score': [82, 95, 78, 98, 89],
    'history_score': [75, 88, 90, 64, 80]
})

# Select students who scored more than 85 in math or history
top_students = df[(df['math_score'] > 85) | (df['history_score'] > 85)]
print(top_students)
```

The provided code identifies students who scored more than 85 in either math or history. Here's an explanation and the expected output:

### Explanation:
1. A pandas DataFrame `df` is created with students' names, math scores, and history scores.
2. The condition `df['math_score'] > 85` checks for students scoring more than 85 in math.
3. Similarly, `df['history_score'] > 85` checks for scores exceeding 85 in history.
4. The `|` operator (logical OR) combines these conditions, meaning the filter includes rows where **either** condition is true.
5. The filtered rows are stored in `top_students` and printed.

### Output:
```
   student  math_score  history_score
1      Bob          95             88
2  Charlie          78             90
3     Dana          98             64
4      Eli          89             80
```

### Key Points:
- **Bob** scored 95 in math and 88 in history.
- **Charlie** scored 90 in history.
- **Dana** scored 98 in math.
- **Eli** scored 89 in math.

These students meet the criteria of scoring more than 85 in at least one subject.

## Flexing with Logical Operators

Your data selections are looking great! Now, let's switch up the logical operators. Modify the starter code to select students with a math score `<= 80` or with study hours `> 10`. Replace the `&` with `|` and change the condition for the math score to `<=`.

Onward to mastery!

```python
import pandas as pd

# Grades DataFrame
grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'],
    'math_score': [91, 85, 79, 92, 88],
    'study_hours': [15, 8, 5, 18, 11]
})

# TODO: Select students with a math score <= 80 OR study_hours > 10
top_students = grades[(grades['math_score'] > 80) & (grades['study_hours'] > 10)]
print(top_students)
```


To modify the code to select students with a **math score ≤ 80 OR study_hours > 10**, replace the `&` operator with `|` and adjust the condition for `math_score` to `<= 80`. Here's the updated code:

### Updated Code:
```python
import pandas as pd

# Grades DataFrame
grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'],
    'math_score': [91, 85, 79, 92, 88],
    'study_hours': [15, 8, 5, 18, 11]
})

# Select students with a math score <= 80 OR study_hours > 10
top_students = grades[(grades['math_score'] <= 80) | (grades['study_hours'] > 10)]
print(top_students)
```

### Explanation:
1. **Condition 1**: `grades['math_score'] <= 80` selects students with math scores of 80 or lower.
2. **Condition 2**: `grades['study_hours'] > 10` selects students who studied for more than 10 hours.
3. **Logical OR (`|`)**: Combines the two conditions, selecting rows that satisfy **either** condition.

### Expected Output:
```
   student  math_score  study_hours
0    Alice          91           15
2  Charlie          79            5
3    Diana          92           18
4     Evan          88           11
```

### Key Points:
- **Alice**, **Diana**, and **Evan** have study hours > 10.
- **Charlie** has a math score ≤ 80. 
- These students meet at least one of the conditions.
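
The `~` operator from the lesson can express the same `|` filter in a different way: a row satisfies "A or B" exactly when it is not the case that both A and B fail. The sketch below is an illustrative check of that equivalence on the `grades` data; the names `low_math`, `long_hours`, `direct`, and `via_not` are made up for this example.

```python
import pandas as pd

grades = pd.DataFrame({
    'student': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan'],
    'math_score': [91, 85, 79, 92, 88],
    'study_hours': [15, 8, 5, 18, 11],
})

low_math = grades['math_score'] <= 80
long_hours = grades['study_hours'] > 10

# Direct form: at least one condition holds
direct = grades[low_math | long_hours]

# Equivalent form: exclude rows where both conditions fail
via_not = grades[~(~low_math & ~long_hours)]

print(direct.equals(via_not))
# True
print(direct['student'].tolist())
# ['Alice', 'Charlie', 'Diana', 'Evan']
```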


## Academic Prospects: Filtering Excellence

Fantastic! Now that you've seen the code run, it's time to exercise your stellar coding muscles. Your mission, should you choose to accept it, involves filtering a DataFrame. Add the code to select only the students who excel both academically and in dedicated study.

```python
import pandas as pd

# DataFrame with students’ data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Daisy', 'Eve'],
    'grade': [85, 73, 95, 88, 76],
    'study_hours': [15.5, 9.0, 10.5, 12.0, 8.5]
})

# TODO: Write code to select only the students who have a grade above 80 and study more than 10 hours
print(students[____])
```

Here’s how to filter the DataFrame to select only students who have a **grade above 80 AND study more than 10 hours**:

### Completed Code:
```python
import pandas as pd

# DataFrame with students’ data
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Daisy', 'Eve'],
    'grade': [85, 73, 95, 88, 76],
    'study_hours': [15.5, 9.0, 10.5, 12.0, 8.5]
})

# Select students who have a grade above 80 and study more than 10 hours
top_students = students[(students['grade'] > 80) & (students['study_hours'] > 10)]
print(top_students)
```

### Explanation:
1. **Condition 1**: `students['grade'] > 80` filters students with grades above 80.
2. **Condition 2**: `students['study_hours'] > 10` filters students who study more than 10 hours.
3. **Logical AND (`&`)**: Combines both conditions, selecting rows that satisfy **both**.

### Expected Output:
```
      name  grade  study_hours
0    Alice     85         15.5
2  Charlie     95         10.5
3    Daisy     88         12.0
```

### Key Insights:
- **Alice**, **Charlie**, and **Daisy** excel academically with grades above 80 and are diligent in their studies with more than 10 study hours.
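
When a filter has several parts, storing each condition in a variable can make the selection easier to read and to check. A small sketch on the same `students` data, with the mask names `good_grade` and `long_study` chosen for illustration:

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Daisy', 'Eve'],
    'grade': [85, 73, 95, 88, 76],
    'study_hours': [15.5, 9.0, 10.5, 12.0, 8.5],
})

# Each mask is a boolean Series that can be inspected on its own
good_grade = students['grade'] > 80
long_study = students['study_hours'] > 10

# Summing a boolean Series counts the True values
print(good_grade.sum(), long_study.sum())
# 3 3

# Combine the named masks with & to require both conditions
top_students = students[good_grade & long_study]
print(top_students['name'].tolist())
# ['Alice', 'Charlie', 'Daisy']
```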



## Math Grade Review with Conditional Selection

Now you're in the pilot seat, Space Voyager! The DataFrame is ready at your command. Let's identify those whose math grades require review. Add the necessary code, applying what you've learned about the `where()` method.

```python
import pandas as pd

grades = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cathy', 'Dan'],
    'Math': [75, 88, 82, 95],
    'Science': [92, 76, 88, 89]
})
# TODO: Identify and label Math grades below 85 as 'Review Needed'.
print(grades)
```

Here’s the updated code to identify Math grades below 85 and label them as `'Review Needed'` using the `where()` method:

### Completed Code:
```python
import pandas as pd

grades = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cathy', 'Dan'],
    'Math': [75, 88, 82, 95],
    'Science': [92, 76, 88, 89]
})

# Identify and label Math grades below 85 as 'Review Needed'
grades['Math'] = grades['Math'].where(grades['Math'] >= 85, other='Review Needed')

print(grades)
```

### Explanation:
1. **`where()` Method**: Keeps values that meet the condition (`grades['Math'] >= 85`) and replaces the rest with the specified value (`'Review Needed'`).
2. **Condition**: Only Math grades greater than or equal to 85 remain unchanged.
3. **Replacement**: Grades below 85 are replaced with `'Review Needed'`.

### Expected Output:
```
    Name           Math  Science
0   Anna  Review Needed       92
1    Ben             88       76
2  Cathy  Review Needed       88
3    Dan             95       89
```

### Key Insights:
- **Anna** and **Cathy** need to review their Math scores, while **Ben** and **Dan** keep their grades because they scored 85 or above.
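
One side effect of the solution above is that the `Math` column ends up mixing numbers and text, so it is no longer a numeric column. If the numbers are still needed, one option is to write the labels to a separate column instead. This is a sketch of that alternative, not part of the original exercise; the column name `Math_Status` is hypothetical.

```python
import pandas as pd

grades = pd.DataFrame({
    'Name': ['Anna', 'Ben', 'Cathy', 'Dan'],
    'Math': [75, 88, 82, 95],
    'Science': [92, 76, 88, 89],
})

# Hypothetical extra column: keep the grade where it passes, label it otherwise
grades['Math_Status'] = grades['Math'].where(grades['Math'] >= 85, other='Review Needed')

# The original column is untouched and stays numeric
print(pd.api.types.is_numeric_dtype(grades['Math']))
# True

# Count how many students were labeled for review
print((grades['Math_Status'] == 'Review Needed').sum())
# 2
```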