[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/class_demo_notebook.ipynb)

# Final Project Example: Yankees Games & Noise Complaints

**Research Question:** Do noise complaints increase on Yankees game days?

## 1. Load the Data

In [None]:
import pandas as pd
import seaborn as sns

# Load datasets from GitHub
base_url = 'https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/master/files/cfc/'

complaints = pd.read_csv(base_url + 'nyc_311_noise_sample.csv')
yankees_games = pd.read_csv(base_url + 'yankees_home_games_2023.csv')

# Check size - this goes in Slide 3
print(f"Dataset size: {len(complaints):,} complaints")
print(f"Yankees games: {len(yankees_games)} home games")

## 2. Create Comparison Groups

In [None]:
# Convert dates
complaints['date'] = pd.to_datetime(complaints['created_date']).dt.date
yankees_games['date'] = pd.to_datetime(yankees_games['game_date']).dt.date

# THE KEY LINE - Creates comparison for Slide 4
game_dates = yankees_games['date'].tolist()
complaints['is_game_day'] = complaints['date'].isin(game_dates)

# Check group sizes
complaints['is_game_day'].value_counts()
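
The date conversion above matters because `isin` only matches exact values. Here is a minimal sketch on made-up rows (the `toy_*` names and values are hypothetical, not from the real datasets). It shows that raw timestamps fail to match game dates, while dates with the time stripped match as intended.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the notebook's
toy_complaints = pd.DataFrame({
    'created_date': ['2023-04-01 22:15:00', '2023-04-02 01:30:00', '2023-04-03 19:05:00']
})
toy_games = pd.DataFrame({'game_date': ['2023-04-01', '2023-04-03']})

# Without stripping the time, 22:15 on April 1 is not equal to midnight on April 1
raw_match = (
    pd.to_datetime(toy_complaints['created_date'])
    .isin(pd.to_datetime(toy_games['game_date']))
    .tolist()
)

# With .dt.date, both sides are plain calendar dates and compare equal
toy_complaints['date'] = pd.to_datetime(toy_complaints['created_date']).dt.date
toy_games['date'] = pd.to_datetime(toy_games['game_date']).dt.date
date_match = toy_complaints['date'].isin(toy_games['date'].tolist()).tolist()

print("Raw timestamps:", raw_match)
print("Dates only:    ", date_match)
```

If `value_counts()` ever shows zero game-day complaints, a mismatch like the first case is a likely cause.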

## 3. Calculate Metrics (For Slide 5)

In [None]:
# Calculation 1: Total complaints
totals = complaints.groupby('is_game_day').size()
print("Total complaints by group:")
print(totals)

In [None]:
# Calculation 2: Average per day
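# Day counts for 2023, hard-coded: 81 regular-season home games,
# and 365 - 81 = 284 remaining days. This assumes one home game per date
# and that the complaints sample spans the full calendar year.
game_days = 81
non_game_days = 284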

avg_game = totals[True] / game_days
avg_non_game = totals[False] / non_game_days

print(f"Average on game days: {avg_game:.1f}")
print(f"Average on non-game days: {avg_non_game:.1f}")
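
The day counts used as denominators can also be derived from the data instead of typed in. The sketch below runs on made-up rows (all `toy_*` and `n_*` names are hypothetical). It counts unique game dates, so a doubleheader counts once, and it measures the calendar span of the complaints, so days with zero complaints still count toward the denominator.

```python
import pandas as pd

# Hypothetical toy data: two games on April 1 (a doubleheader) and one on April 3
toy_games = pd.DataFrame({'game_date': ['2023-04-01', '2023-04-01', '2023-04-03']})
toy_complaints = pd.DataFrame({
    'created_date': ['2023-04-01 10:00', '2023-04-02 11:00', '2023-04-03 12:00',
                     '2023-04-04 13:00', '2023-04-04 14:00']
})

game_dates = set(pd.to_datetime(toy_games['game_date']).dt.date)
complaint_dates = pd.to_datetime(toy_complaints['created_date']).dt.date

# Calendar span covered by the complaints, inclusive of both endpoints
first_day, last_day = complaint_dates.min(), complaint_dates.max()
n_total_days = (last_day - first_day).days + 1

# Only count game dates that fall inside the span the complaints cover
n_game_days = len({d for d in game_dates if first_day <= d <= last_day})
n_non_game_days = n_total_days - n_game_days

print(f"Game days: {n_game_days}, non-game days: {n_non_game_days}")
```

On the real data, comparing these derived counts with the hard-coded 81 and 284 is a quick way to check the assumptions behind them.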

In [None]:
# Calculation 3: Percent increase
pct_increase = ((avg_game - avg_non_game) / avg_non_game) * 100
print(f"\nGAME DAY EFFECT: {pct_increase:.1f}% increase")
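
The percent-increase formula is used again in the borough section, so it can help to wrap it in a small function. This is a sketch only; `percent_change` is a hypothetical helper name, and the numbers in the example are made up to keep the arithmetic easy to check.

```python
def percent_change(new_value, baseline):
    """Return how much larger new_value is than baseline, as a percentage."""
    if baseline == 0:
        # Dividing by zero would be meaningless, so fail loudly instead
        raise ValueError("baseline must be non-zero")
    return (new_value - baseline) / baseline * 100


# Made-up example: 150 complaints per day versus a baseline of 100
example = percent_change(150.0, 100.0)
print(f"{example:.1f}% increase")
```

A negative result means the first value is lower than the baseline, so the same function also reports decreases.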

## 4. Create Visualization with Seaborn

In [None]:
# Create DataFrame for plotting
plot_data = pd.DataFrame({
    'Day Type': ['Non-Game Days', 'Game Days'],
    'Average Complaints': [avg_non_game, avg_game]
})

# Bar chart for Slide 7
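# Passing hue with legend=False avoids the palette-without-hue deprecation warning (seaborn 0.13+)
ax = sns.barplot(data=plot_data, x='Day Type', y='Average Complaints',
                 hue='Day Type', palette=['steelblue', 'coral'], legend=False)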
# Build the title from the computed value so it cannot drift from the data
ax.set_title(f'Yankees Games Increase Noise Complaints by {pct_increase:.1f}%')
ax.set_ylabel('Average Complaints per Day')

# Add value labels on bars
for i, v in enumerate(plot_data['Average Complaints']):
    ax.text(i, v + 10, f'{v:.0f}', ha='center')

# RIGHT-CLICK → SAVE IMAGE AS → Use in PowerPoint
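
As an alternative to right-clicking, the chart can be written to a file from code, still without importing `matplotlib.pyplot`. The sketch below uses made-up values and a hypothetical file name. It relies on the fact that `sns.barplot` returns a matplotlib `Axes`, whose parent figure has a `savefig` method.

```python
import os
import tempfile

import pandas as pd
import seaborn as sns

# Made-up values standing in for the real averages
toy_plot_data = pd.DataFrame({
    'Day Type': ['Non-Game Days', 'Game Days'],
    'Average Complaints': [100.0, 150.0]
})

ax = sns.barplot(data=toy_plot_data, x='Day Type', y='Average Complaints')
ax.set_ylabel('Average Complaints per Day')

# Hypothetical output path; any writable location works
out_path = os.path.join(tempfile.gettempdir(), 'game_day_chart.png')

# dpi=200 gives a sharper image for slides; bbox_inches='tight' trims extra whitespace
ax.figure.savefig(out_path, dpi=200, bbox_inches='tight')
print(f"Saved chart to {out_path}")
```

In Colab the saved file should appear in the session's file browser, where it can be downloaded.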

## 5. Borough Breakdown (Secondary Analysis)

In [None]:
# Focus on Bronx vs Brooklyn
bronx_brooklyn = complaints[complaints['borough'].isin(['BRONX', 'BROOKLYN'])]

# Calculate by borough
borough_analysis = bronx_brooklyn.groupby(['borough', 'is_game_day']).size().unstack(fill_value=0)
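# Rename by key rather than by position so the labels cannot be swapped
borough_analysis = borough_analysis.rename(columns={False: 'Non-Game Days', True: 'Game Days'})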

# Calculate percent increase by borough
for borough in ['BRONX', 'BROOKLYN']:
    game = borough_analysis.loc[borough, 'Game Days'] / game_days
    non_game = borough_analysis.loc[borough, 'Non-Game Days'] / non_game_days
    pct = ((game - non_game) / non_game) * 100
    print(f"{borough}: {pct:.1f}% increase on game days")
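
The same borough calculation can be done without a loop by working on whole columns. Below is a minimal sketch on made-up rows. The `toy_*` names and day counts are hypothetical and chosen only so the result is easy to verify by hand.

```python
import pandas as pd

# Made-up complaints: Bronx has 3 game-day and 2 non-game-day rows,
# Brooklyn has 2 game-day and 3 non-game-day rows
toy = pd.DataFrame({
    'borough': ['BRONX'] * 5 + ['BROOKLYN'] * 5,
    'is_game_day': [True, True, True, False, False,
                    True, True, False, False, False],
})

# Hypothetical day counts for the toy period
toy_game_days = 1
toy_non_game_days = 2

counts = toy.groupby(['borough', 'is_game_day']).size().unstack(fill_value=0)
counts = counts.rename(columns={False: 'Non-Game Days', True: 'Game Days'})

# Per-day averages for every borough at once
avg_game = counts['Game Days'] / toy_game_days
avg_non_game = counts['Non-Game Days'] / toy_non_game_days

# Percent increase per borough, as a Series indexed by borough
pct_by_borough = (avg_game - avg_non_game) / avg_non_game * 100
print(pct_by_borough.round(1))
```

By hand: the Bronx averages 3 per game day against 1 per non-game day, a 200% increase, and Brooklyn averages 2 against 1.5, roughly 33%.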

## 6. Create Borough Comparison Chart

In [None]:
# Create data for borough comparison
borough_plot = []
for borough in ['BRONX', 'BROOKLYN']:
    for day_type in ['Non-Game Days', 'Game Days']:
        count = borough_analysis.loc[borough, day_type]
        days = non_game_days if day_type == 'Non-Game Days' else game_days
        avg = count / days
        borough_plot.append({
            'Borough': borough.title(),
            'Day Type': day_type,  # keep the full label; stripping 'Non-' would merge both groups into 'Game Days'
            'Average Complaints': avg
        })

borough_df = pd.DataFrame(borough_plot)

# Create grouped bar chart
ax = sns.barplot(data=borough_df, x='Borough', y='Average Complaints', hue='Day Type')
ax.set_title('Borough Analysis: Bronx Shows Stronger Effect')
ax.legend(title='Day Type')

## Key Takeaways

1. **31.5% increase** in noise complaints on game days
2. **Bronx shows stronger effect** (38%) than Brooklyn (26%)
3. Every number here goes into your slides
4. Simple pandas and seaborn operations tell the whole story

**Note:** This notebook calls only seaborn for plotting. No explicit `matplotlib.pyplot` import is needed, because seaborn returns matplotlib `Axes` objects that can be customized directly.