<font color='darkred'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Exercises

For these exercises, add your functions to the *apputil\.py* file. If you like, you're welcome to adjust the *app\.py* file, but it is not required. *These exercises use the [Titanic dataset](https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/titanic.csv).*


## Exercise 1: Survival Patterns

In this exercise, you will analyze survival patterns on the Titanic by passenger class, gender, and age group. Name the function `exercise1()`.

1. Create a new column in the Titanic dataset that divides passengers into age categories:  
    - Child (<13)  
    - Teen (13–19)  
    - Adult (20–59)  
    - Senior (60+)  

2. Group the passengers by class, gender, and age group.  

3. For each group, calculate:  
    - The total number of passengers  
    - The number of survivors  
    - The survival rate  

4. Return a table that includes the results for all combinations of class, gender, and age group.  

5. Order the results so they are easy to interpret.  

6. Use Plotly to display your results visually in a function called `visualize_exercise1()`. You are free to choose the chart type that you think best communicates the findings. Be creative: try different approaches and compare which one tells the clearest story.  
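
Step 1 does not say where fractional ages such as 12.9 or 19.5 belong. One option, assuming each range runs from its lower bound up to (but not including) the next range's lower bound, is `pd.cut` with `right=False`. The ages below are made up to probe each edge; missing ages stay `NaN` and are dropped later by `groupby`.

```python
import pandas as pd

# Made-up ages placed on or near each bin edge (not real passenger data)
ages = pd.Series([0.42, 12.9, 13, 19.5, 20, 59, 60, 80, None])

# right=False closes each bin on the left: [0, 13), [13, 20), [20, 60), [60, inf)
age_group = pd.cut(
    ages,
    bins=[0, 13, 20, 60, float("inf")],
    labels=["Child", "Teen", "Adult", "Senior"],
    right=False,
)
print(pd.DataFrame({"Age": ages, "AgeGroup": age_group}))
```

Because `pd.cut` returns an ordered categorical, sorting by `AgeGroup` follows Child, Teen, Adult, Senior rather than alphabetical order.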


In [None]:
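import pandas as pd


def exercise1():
    """
    Returns a DataFrame with passenger counts, survivor counts, and survival
    rates grouped by passenger class, sex, and age group.
    """
    # Load the Titanic dataset from the given URL
    df = pd.read_csv("https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/titanic.csv")
    # Bin ages into the four categories; right=False makes each bin [low, high),
    # so 13 is a Teen, 20 an Adult, and 60 a Senior. Missing ages become NaN.
    df["AgeGroup"] = pd.cut(
        df["Age"],
        bins=[0, 13, 20, 60, float("inf")],
        labels=["Child", "Teen", "Adult", "Senior"],
        right=False,
    )
    # Group by class, sex, and age group; observed=False keeps every combination,
    # even empty ones. Passengers with a missing age are excluded (dropna default).
    grouped = (
        df.groupby(["Pclass", "Sex", "AgeGroup"], observed=False)["Survived"]
          # Count the total passengers and sum the survivors
          .agg(["count", "sum"])
          # Reset the index so Pclass, Sex, and AgeGroup are columns
          .reset_index()
          # Rename the columns to Total and Survived
          .rename(columns={"count": "Total", "sum": "Survived"})
    )
    # Calculate the survival rate for each group (NaN where Total is 0)
    grouped["SurvivalRate"] = grouped["Survived"] / grouped["Total"]
    # Sort by class, then sex, then age group in Child -> Senior order
    return grouped.sort_values(["Pclass", "Sex", "AgeGroup"]).reset_index(drop=True)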

## Exercise 2: Survival by Family Size

Using the Titanic dataset, write a function named `exercise2()` to investigate the relationship between family size and survival.

1. Create a new column in the Titanic dataset that represents the total family size for each passenger. Family size is defined as the number of siblings/spouses aboard plus the number of parents/children aboard, plus the passenger themself.  

2. Group the passengers by family size and calculate for each group:  
    - The total number of passengers  
    - The number of survivors  
    - The survival rate  

3. Return a table showing these results, sorted so the values are clear and easy to interpret.  

4. Use Plotly to display your results visually in a function called `visualize_exercise2()`. You are free to choose the chart type that you think best communicates the findings. Be creative — try different approaches and compare which one tells the clearest story.  
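
Because `Survived` is coded 0/1, the survival rate in step 2 equals the group mean, so all three statistics can come from one named aggregation. A sketch on a few made-up rows that reuse the dataset's column names:

```python
import pandas as pd

# Made-up rows using the dataset's column names (not real passengers)
toy = pd.DataFrame({
    "SibSp":    [0, 0, 1, 1, 3],
    "Parch":    [0, 0, 0, 2, 1],
    "Survived": [1, 0, 1, 0, 0],
})
# Family size = siblings/spouses + parents/children + the passenger
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

# The mean of a 0/1 column within a group is the share of 1s, i.e. the survival rate
summary = (
    toy.groupby("FamilySize")["Survived"]
       .agg(Total="count", Survived="sum", SurvivalRate="mean")
       .reset_index()
)
print(summary)
```

Here family sizes 1, 1, 2, 4, 5 give a rate of 0.5 for size 1 (one of two survived) and 1.0, 0.0, 0.0 for the single passengers in sizes 2, 4, and 5.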

In [None]:
import pandas as pd


def exercise2():
    """
    Calculates survival rate grouped by family size
    (siblings/spouses + parents/children + self).
    """
    # Load the Titanic dataset from the given URL
    df = pd.read_csv("https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/titanic.csv")
    # Create a FamilySize column by adding siblings/spouses, parents/children, and the passenger
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # Group the data by FamilySize and focus on the Survived column
    grouped = (
        df.groupby("FamilySize")["Survived"]
          # Count the total passengers and sum the survivors
          .agg(["count", "sum"])
          # Reset the index so FamilySize is a column
          .reset_index()
          # Rename the columns to Total and Survived
          .rename(columns={"count": "Total", "sum": "Survived"})
    )
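    # Calculate the survival rate for each family size
    grouped["SurvivalRate"] = grouped["Survived"] / grouped["Total"]

    # groupby already sorts by FamilySize; sorting explicitly makes the order intentional
    return grouped.sort_values("FamilySize").reset_index(drop=True)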

## Exercise 3: Family Size and Wealth

Using the Titanic dataset, write a function named `exercise3()` to explore the relationship between family size, passenger class, and ticket fare.  

1. Create a new column in the Titanic dataset that represents the total family size for each passenger. Family size is defined as the number of siblings/spouses aboard plus the number of parents/children aboard, plus the passenger themself.  

2. Group the passengers by family size and passenger class. For each group, calculate:  
   - The total number of passengers  
   - The average ticket fare  
   - The minimum and maximum ticket fares (to capture variation in wealth)  

3. Return a table with these results, sorted so that the values are clear and easy to interpret (for example, by class and then family size).  

4. Use Plotly to display your results visually in a function called `visualize_exercise3()`.  
   - Experiment with different chart types.  
   - Be creative — try multiple visualizations and compare which one tells the clearest story.  
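
The long table from step 3 can optionally be reshaped so that classes sit side by side, which makes fare differences across classes easier to scan. A sketch with `DataFrame.pivot` on made-up summary rows; family size/class combinations that never occur would appear as `NaN`.

```python
import pandas as pd

# Made-up rows in the long format of the exercise 3 table (not real fares)
long_table = pd.DataFrame({
    "Pclass":     [1, 1, 3, 3],
    "FamilySize": [1, 2, 1, 2],
    "AvgFare":    [80.0, 95.5, 8.0, 14.25],
})

# One row per family size, one column per class
wide = long_table.pivot(index="FamilySize", columns="Pclass", values="AvgFare")
print(wide.round(2))
```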

In [None]:
import pandas as pd
import plotly.express as px

def exercise3():
    """
    Analyzes the relationship between family size, passenger class, and ticket fare.
    Returns a table with total passengers, average, minimum, and maximum fares.
    """
    # Load the Titanic dataset from the given URL
    df = pd.read_csv(
        "https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/titanic.csv"
    )
    # Family size = siblings/spouses + parents/children + the passenger
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # Group by class and family size
    grouped = (
        df.groupby(["Pclass", "FamilySize"])["Fare"]
          .agg(TotalPassengers="count",
               AvgFare="mean",
               MinFare="min",
               MaxFare="max")
          .reset_index()
    )
    # Sort for readability
    grouped = grouped.sort_values(by=["Pclass", "FamilySize"]).reset_index(drop=True)

    return grouped