[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/ps/problem_set_2.ipynb)

**IMPORTANT**: Save your own copy!
1. Click File → Save a copy in Drive
2. Rename it to include your name, for example "PS2_YourName"
3. Work in YOUR copy, not the original


---


# Problem Set 2: DC Crime Analysis with Pandas
## CCJS 418E - Fall 2025

**Due**: Sunday, October 26, 11:59pm

**Submit**: If you are working on Colab, download your notebook **TWICE**: once as `.ipynb` and once as `.py`. **UPLOAD BOTH FILES.**
- [Submit here](https://umd.instructure.com/courses/1389501/assignments/7376099)

**Code Review**: Tuesday, October 29 in class

---

## Scenario

On August 7, 2025, the federal government initiated an emergency intervention in DC's public safety system, deploying federal resources to assist the Metropolitan Police Department. Local community organizations want to understand whether this "Federal Surge" has had any measurable impact on crime patterns.

You've been asked to analyze DC crime data from 2025 to help these organizations understand:
- What crime patterns looked like before the federal intervention
- Whether there have been changes since August 7
- Which wards have been most affected

This analysis will inform community advocacy and help residents understand the effectiveness of the federal intervention.

You may use AI tools to help write and debug your code. During the code review, you'll explain your logic and demonstrate you understand how pandas operations work.
- [Link to TerpAI](https://terpai.umd.edu/chat)

## Setup: Import Libraries and Load Data

First, import pandas and load the DC crime data. The dataset contains incidents reported from 30 days before to 30 days after August 7, 2025.

**Note**: The dataset includes helpful pre-calculated columns:
- `intervention_period`: Labels crimes as "Pre-Intervention" or "Post-Intervention" (before/after August 7)
- `days_before_after`: Number of days from intervention (negative = before, positive = after)
- `is_violent`, `is_property`, `is_weekend`: Boolean flags for crime types and timing
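
The file already ships with these columns, so nothing needs to be recomputed. For intuition only, here is a minimal sketch of how columns like these could be derived from a date column. The toy rows are made up, and two choices are assumptions rather than facts about the real dataset: that August 7 itself counts as day 0 and is labeled "Post-Intervention", and that a weekend means Saturday or Sunday.

```python
import pandas as pd

# Toy data: three made-up report dates (not taken from the real dataset)
toy = pd.DataFrame(
    {"report_dat": pd.to_datetime(["2025-08-02", "2025-08-07", "2025-08-10"])}
)

INTERVENTION_DATE = pd.Timestamp("2025-08-07")

# Signed day offset: negative = before the intervention, positive = after
toy["days_before_after"] = (toy["report_dat"] - INTERVENTION_DATE).dt.days

# Assumption: day 0 is labeled Post-Intervention
toy["intervention_period"] = toy["days_before_after"].apply(
    lambda d: "Pre-Intervention" if d < 0 else "Post-Intervention"
)

# Assumption: weekend = Saturday (5) or Sunday (6); Monday is 0
toy["is_weekend"] = toy["report_dat"].dt.dayofweek.isin([5, 6])

print(toy)
```

If the real columns disagree with this sketch for incidents on August 7, trust the dataset, not the sketch.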

In [None]:
# Import pandas
import pandas as pd

# Load the DC crime data
# Dataset columns: report_dat, offense, ward, shift, day_of_week, 
# intervention_period, days_before_after, is_after_intervention,
# is_violent, is_property, is_weekend

url = "https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/refs/heads/master/files/cfc/ps/dc_crime_2025_sample.csv"
df = pd.read_csv(url)

# Display basic information about the dataset
print("DC Crime Data loaded successfully!")
print(f"Shape of data: {df.shape}")
print(f"\nColumn names: {list(df.columns)}")


In [None]:

# Important constants for calculating daily averages
DAYS_BEFORE_INTERVENTION = 30  # The thirty days before Aug 7
DAYS_AFTER_INTERVENTION = 30   # The thirty days after Aug 7

print(f"\nDays before intervention: {DAYS_BEFORE_INTERVENTION}")
print(f"Days after intervention: {DAYS_AFTER_INTERVENTION}")

## Part 1: Data Exploration and Basic Operations
*Using concepts from lectures: Loading data, .head(), .info(), column operations*

Let's explore the dataset to understand DC's crime landscape in 2025.
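
As a warm-up, the exploration methods used in this part can be sketched on a tiny made-up DataFrame. The rows and offense labels below are hypothetical and only borrow two column names from the real file; the point is to show what each call returns, not what the DC data contains.

```python
import pandas as pd

# Toy data: made-up incidents, not the real DC file
toy = pd.DataFrame({
    "offense": ["THEFT/OTHER", "ROBBERY", "THEFT/OTHER", "BURGLARY", "THEFT/OTHER"],
    "ward": [1, 2, 2, 5, 1],
})

print(toy.head(3))  # first 3 rows (the default is 5)
toy.info()          # column names, dtypes, and non-null counts

n_rows = len(toy)                      # one row per incident
n_offenses = toy["offense"].nunique()  # number of distinct offense labels
top_offenses = toy["offense"].value_counts().head(2)  # counts, sorted high to low

print(n_rows, n_offenses)
print(top_offenses)
```

`value_counts()` returns a Series whose index holds the category labels and whose values hold the counts, which is why `.head()` on it gives the most common categories.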

In [None]:
# Task 1.1: Display the first 10 rows of the dataset
# YOUR CODE HERE

In [None]:
# Task 1.2: Use .info() to understand the data types
# YOUR CODE HERE

In [None]:
# Task 1.3: Calculate basic statistics for the entire dataset
# How many total crimes are in the dataset?
# How many unique offense types are there?
# How many unique wards?

total_crimes = None  # YOUR CODE HERE
unique_offenses = None  # YOUR CODE HERE
unique_wards = None  # YOUR CODE HERE

print(f"Total crimes: {total_crimes}")
print(f"Unique offense types: {unique_offenses}")
print(f"Unique wards: {unique_wards}")

In [None]:
# Task 1.4: What are the top 5 most common crime types in DC?
# Use value_counts() to find out

top_5_crimes = None  # YOUR CODE HERE

print("Top 5 most common offenses in DC:")
print(top_5_crimes)

In [None]:
# Task 1.5: How are crimes distributed across the intervention periods?
# Count how many crimes occurred before vs after the intervention

period_counts = None  # YOUR CODE HERE - use value_counts() on intervention_period

print("Crimes by intervention period:")
print(period_counts)

## Part 2: Overall Impact - Filtering to Compare Before and After
*Using concepts from lectures: Boolean filtering, multiple conditions*

**Focus**: Use filtering to understand the city-wide impact of the intervention. Did crime decrease overall? Did violent and property crimes respond differently?
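
Boolean filtering with one condition, two conditions, and a numeric range can be sketched as follows. The DataFrame is a made-up stand-in that reuses three of the dataset's column names; the values are not real.

```python
import pandas as pd

# Toy data: made-up rows using the dataset's column names
toy = pd.DataFrame({
    "intervention_period": [
        "Pre-Intervention", "Pre-Intervention", "Post-Intervention",
        "Post-Intervention", "Pre-Intervention",
    ],
    "is_violent": [True, False, True, False, False],
    "days_before_after": [-10, -3, 2, 15, -1],
})

# One condition: a boolean Series keeps the rows where it is True
pre = toy[toy["intervention_period"] == "Pre-Intervention"]

# Two conditions: wrap each in parentheses and join with & (and) or | (or)
violent_pre = toy[
    (toy["is_violent"]) & (toy["intervention_period"] == "Pre-Intervention")
]

# Range filter: .between() includes both endpoints by default
last_week = toy[toy["days_before_after"].between(-7, -1)]

print(len(pre), len(violent_pre), len(last_week))  # prints: 3 1 2
```

The parentheses matter because `&` and `|` bind more tightly than `==` in Python, so leaving them out usually raises an error or filters on the wrong thing.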

In [None]:
# Task 2.1: Filter the data to create separate DataFrames for before and after
pre_intervention = None  # YOUR CODE HERE - filter where intervention_period == 'Pre-Intervention'
post_intervention = None  # YOUR CODE HERE - filter where intervention_period == 'Post-Intervention'

print(f"Crimes before intervention: {len(pre_intervention)}")
print(f"Crimes after intervention: {len(post_intervention)}")

# Calculate daily averages (total crimes divided by number of days)
daily_pre = None  # YOUR CODE HERE - divide by DAYS_BEFORE_INTERVENTION
daily_post = None  # YOUR CODE HERE - divide by DAYS_AFTER_INTERVENTION

print(f"\nAverage daily crimes BEFORE: {daily_pre:.1f}")
print(f"Average daily crimes AFTER: {daily_post:.1f}")
print(f"Percent Change (new - old) / old: {((daily_post - daily_pre) / daily_pre * 100):+.1f}%")

In [None]:
# Task 2.2: Filter for violent crimes city-wide
# The dataset includes an 'is_violent' column (True/False)

violent_crimes = None  # YOUR CODE HERE - filter where is_violent == True

print(f"Total violent crimes: {len(violent_crimes)}")
print(f"Percentage of all crimes that are violent: {len(violent_crimes)/len(df)*100:.1f}%")

In [None]:
# Task 2.3: Compare violent crime rates before and after the intervention
# Here "rate" means the share of all crimes in a period that are violent:
# what percentage of crimes were violent before vs after?

violent_pre = None  # YOUR CODE HERE - filter for violent crimes AND pre-intervention
violent_post = None  # YOUR CODE HERE - filter for violent crimes AND post-intervention

# Calculate percentages
violent_rate_pre = None  # YOUR CODE HERE - (violent_pre count / pre_intervention count) * 100
violent_rate_post = None  # YOUR CODE HERE - (violent_post count / post_intervention count) * 100

print(f"Violent crime rate BEFORE intervention: {violent_rate_pre:.1f}%")
print(f"Violent crime rate AFTER intervention: {violent_rate_post:.1f}%")
print(f"Change: {violent_rate_post - violent_rate_pre:+.1f} percentage points")

In [None]:
# Task 2.4: Focus on property crimes city-wide
# The dataset includes an 'is_property' column (True/False)
# Have property crimes decreased since the intervention?

property_pre = None  # YOUR CODE HERE - filter for property crimes AND pre-intervention
property_post = None  # YOUR CODE HERE - filter for property crimes AND post-intervention

# Calculate daily averages
daily_property_pre = None  # YOUR CODE HERE
daily_property_post = None  # YOUR CODE HERE

print(f"Average daily property crimes BEFORE: {daily_property_pre:.1f}")
print(f"Average daily property crimes AFTER: {daily_property_post:.1f}")
print(f"Change: {((daily_property_post - daily_property_pre) / daily_property_pre * 100):+.1f}%")

In [None]:
# Task 2.5: Look at the immediate impact - the first week after intervention
# Filter for crimes where days_before_after is in specific ranges

week_before = None  # YOUR CODE HERE - filter where days_before_after is between -7 and -1 (inclusive)
week_after = None  # YOUR CODE HERE - filter where days_before_after is between 1 and 7 (inclusive)

print(f"Crimes in week BEFORE intervention: {len(week_before)}")
print(f"Crimes in week AFTER intervention: {len(week_after)}")
print(f"Immediate change: {((len(week_after) - len(week_before)) / len(week_before) * 100):+.1f}%")

## Part 3: Geographic Analysis - Identifying High-Crime Wards
*Using concepts from lectures: Groupby, creating columns, sorting*

**Focus**: Crime isn't evenly distributed across DC. Identify which wards have the most crime and create a column to flag them for later analysis.
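
The count-rank-flag pattern used in this part can be sketched on toy data. The ward numbers and counts are made up, and `is_top_ward` is a hypothetical column name chosen for this example.

```python
import pandas as pd

# Toy data: ten made-up incidents across four wards
toy = pd.DataFrame({"ward": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]})

# Count rows per ward; the result is a Series indexed by ward
counts = toy.groupby("ward").size()

# Labels (ward numbers) of the 2 largest counts, largest first
top_2 = counts.nlargest(2).index

# Flag every row whose ward is one of those labels
toy["is_top_ward"] = toy["ward"].isin(top_2)

# The mean of a True/False column is the share of True values
share_in_top = toy["is_top_ward"].mean() * 100

print(list(top_2))             # prints: [4, 1]
print(f"{share_in_top:.1f}%")  # prints: 70.0%
```

Taking `.index` is the key step: `nlargest()` returns counts, but `.isin()` needs the ward labels those counts belong to.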

<center>
<img src='https://planning.dc.gov/sites/default/files/dc/sites/op/page_content/images/2022Wards_small.png' alt='Map of DC wards'>
</center>

In [None]:
# Task 3.1: Identify the high-crime wards
# Count how many crimes occurred in each ward

ward_crime_counts = None  # YOUR CODE HERE - group by ward, use .size()

print("Crimes by ward:")
print(ward_crime_counts.sort_values(ascending=False))

# Now identify the top 3 wards with the most crime
# The .index gives us the ward numbers (the labels) from the Series
top_3_ward_numbers = ward_crime_counts.nlargest(3).index

print(f"\nTop 3 high-crime wards: {list(top_3_ward_numbers)}")

In [None]:
# Task 3.2: Create a flag column for high-crime wards
# Use the top 3 ward numbers you just identified to create a True/False column

df['is_high_crime_ward'] = None  # YOUR CODE HERE - use .isin() with top_3_ward_numbers

# Verify it worked
high_ward_crimes = None  # YOUR CODE HERE - count crimes where is_high_crime_ward == True
print(f"\nCrimes in top 3 wards: {high_ward_crimes} ({high_ward_crimes / len(df) * 100:.1f}%)")
print(f"This means {high_ward_crimes / len(df) * 100:.1f}% of DC's crime is concentrated in just 3 wards")

In [None]:
# Task 3.3: Which wards have the lowest crime?
# Find the 3 wards with the fewest crimes

bottom_3_wards = None  # YOUR CODE HERE - use .nsmallest() on ward_crime_counts

print("3 wards with lowest crime:")
print(bottom_3_wards)

## Part 4: Breaking Down the Impact - Using Groupby to Find Patterns
*Using concepts from lectures: .groupby(), .size(), .sum(), .mean()*

**Focus**: Now that we know how crime changed overall after the intervention, let's use groupby to break down the change by different categories. Which wards saw the biggest changes? Which days of the week? Which offense types?
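
A minimal sketch of the groupby patterns in this part, on made-up rows: counting rows per group with `.size()`, adding up a True/False column with `.sum()`, and filtering first when the question is about one period only.

```python
import pandas as pd

# Toy data: made-up rows using the dataset's column names
toy = pd.DataFrame({
    "intervention_period": ["Pre-Intervention"] * 4 + ["Post-Intervention"] * 3,
    "ward": [1, 1, 2, 2, 1, 2, 2],
    "is_violent": [True, False, False, True, False, True, False],
})

# Rows per period
by_period = toy.groupby("intervention_period").size()

# Violent incidents per period: True counts as 1 and False as 0
violent_by_period = toy.groupby("intervention_period")["is_violent"].sum()

# Filter to one period, then count rows per ward within it
pre_only = toy[toy["intervention_period"] == "Pre-Intervention"]
pre_by_ward = pre_only.groupby("ward").size()

print(by_period)
print(violent_by_period)
print(pre_by_ward)
```

Each result is a Series indexed by the grouping column, so a single value can be looked up by label, for example `by_period["Pre-Intervention"]`.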

In [None]:
# Task 4.1: Compare crime levels by intervention period using groupby
# Group by intervention_period and count total crimes

crimes_by_period = None  # YOUR CODE HERE - groupby intervention_period, use .size()

print("Total crimes by intervention period:")
print(crimes_by_period)

# Calculate daily averages for each period
print("\nDaily averages:")
print(f"Pre-intervention: {crimes_by_period['Pre-Intervention'] / DAYS_BEFORE_INTERVENTION:.1f} crimes/day")
print(f"Post-intervention: {crimes_by_period['Post-Intervention'] / DAYS_AFTER_INTERVENTION:.1f} crimes/day")

In [None]:
# Task 4.2: How many violent vs property crimes in each period?
# Group by intervention_period and sum the is_violent and is_property columns

violence_by_period = None  # YOUR CODE HERE - groupby intervention_period, sum is_violent
property_by_period = None  # YOUR CODE HERE - groupby intervention_period, sum is_property

print("Violent crimes by period:")
print(violence_by_period)
print("\nProperty crimes by period:")
print(property_by_period)

In [None]:
# Task 4.3: Which wards have the most crimes in each period?
# This helps us see if crime shifted geographically after the intervention

# For pre-intervention period
wards_pre = None  # YOUR CODE HERE - filter for pre, group by ward, use .size()
top_5_wards_pre = None  # YOUR CODE HERE - get top 5 using .nlargest()

# For post-intervention period
wards_post = None  # YOUR CODE HERE - filter for post, group by ward, use .size()
top_5_wards_post = None  # YOUR CODE HERE - get top 5 using .nlargest()

print("Top 5 wards by crime count:")
print("\nPre-intervention:")
print(top_5_wards_pre)
print("\nPost-intervention:")
print(top_5_wards_post)

In [None]:
# Task 4.4: Which specific offense types are most common in each period?
# This shows whether the mix of crimes has changed

# For pre-intervention period
offenses_pre = None  # YOUR CODE HERE - filter for pre, group by offense, use .size()
top_5_offenses_pre = None  # YOUR CODE HERE - get top 5 using .nlargest()

# For post-intervention period
offenses_post = None  # YOUR CODE HERE - filter for post, group by offense, use .size()
top_5_offenses_post = None  # YOUR CODE HERE - get top 5 using .nlargest()

print("Top 5 offense types:")
print("\nPre-intervention:")
print(top_5_offenses_pre)
print("\nPost-intervention:")
print(top_5_offenses_post)

In [None]:
# Task 4.5: Analyze day-of-week patterns before and after
# Has the federal presence changed which days see the most crime?

# For pre-intervention period
dow_pre = None  # YOUR CODE HERE - filter pre, group by day_of_week, use .size()

# For post-intervention period  
dow_post = None  # YOUR CODE HERE - filter post, group by day_of_week, use .size()

# Sort both to see which day is busiest
dow_pre_sorted = None  # YOUR CODE HERE
dow_post_sorted = None  # YOUR CODE HERE

print("Crimes by day of week (sorted by count):")
print("\nPre-intervention:")
print(dow_pre_sorted)
print("\nPost-intervention:")
print(dow_post_sorted)

## Part 5: Advanced Comparisons
*Combining filtering, grouping, and calculations*

**Focus**: Answer more nuanced questions by combining the techniques you've learned.
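
One way to combine the techniques is to filter to a period, group by ward, and take the mean of a True/False column to get a percentage. The sketch below uses made-up rows; subtracting the two results at the end is an extra step beyond what the tasks require, shown only because pandas lines the values up by ward automatically.

```python
import pandas as pd

# Toy data: made-up rows using the dataset's column names
toy = pd.DataFrame({
    "ward": [1, 1, 1, 1, 2, 2],
    "intervention_period": [
        "Pre-Intervention", "Pre-Intervention",
        "Post-Intervention", "Post-Intervention",
        "Pre-Intervention", "Post-Intervention",
    ],
    "is_violent": [True, False, False, False, True, True],
})

pre = toy[toy["intervention_period"] == "Pre-Intervention"]
post = toy[toy["intervention_period"] == "Post-Intervention"]

# Mean of a boolean column = fraction of True values; times 100 = percent
pct_violent_pre = pre.groupby("ward")["is_violent"].mean() * 100
pct_violent_post = post.groupby("ward")["is_violent"].mean() * 100

# Both Series are indexed by ward, so subtraction aligns on ward number
change_pp = pct_violent_post - pct_violent_pre

print(change_pp)  # ward 1: -50.0, ward 2: 0.0 (percentage points)
```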

In [None]:
# Task 5.1: Compare violent crime rates across wards before and after
# For each ward, what percentage of crimes are violent in each period?

# Pre-intervention: for each ward, calculate percent violent
pre_data = df[df['intervention_period'] == 'Pre-Intervention']
ward_violent_pct_pre = None  # YOUR CODE HERE - group by ward, calculate mean of is_violent, multiply by 100

# Post-intervention: for each ward, calculate percent violent
post_data = df[df['intervention_period'] == 'Post-Intervention']
ward_violent_pct_post = None  # YOUR CODE HERE - group by ward, calculate mean of is_violent, multiply by 100

print("Violent crime percentage by ward:")
print("\nPre-intervention:")
print(ward_violent_pct_pre)
print("\nPost-intervention:")
print(ward_violent_pct_post)

In [None]:
# Task 5.2: Weekend vs weekday crime patterns
# Are there different offense types on weekends vs weekdays?

weekend_crimes = None  # YOUR CODE HERE - filter where is_weekend == True
weekday_crimes = None  # YOUR CODE HERE - filter where is_weekend == False

# What are the top 5 offenses on weekends?
weekend_offenses = None  # YOUR CODE HERE - group weekend_crimes by offense, use .size()
top_5_weekend = None  # YOUR CODE HERE - get top 5

# What are the top 5 offenses on weekdays?
weekday_offenses = None  # YOUR CODE HERE - group weekday_crimes by offense, use .size()
top_5_weekday = None  # YOUR CODE HERE - get top 5

print("Top 5 weekend offenses:")
print(top_5_weekend)
print("\nTop 5 weekday offenses:")
print(top_5_weekday)

In [None]:
# Task 5.3: Compare high-crime wards to other wards
# Did high-crime wards improve more or less than other wards?

# High-crime wards (where is_high_crime_ward == True)
high_ward_pre = None  # YOUR CODE HERE - filter for high crime wards AND pre-intervention
high_ward_post = None  # YOUR CODE HERE - filter for high crime wards AND post-intervention

# Other wards (where is_high_crime_ward == False)
other_ward_pre = None  # YOUR CODE HERE
other_ward_post = None  # YOUR CODE HERE

# Calculate daily averages
high_daily_pre = len(high_ward_pre) / DAYS_BEFORE_INTERVENTION
high_daily_post = len(high_ward_post) / DAYS_AFTER_INTERVENTION
other_daily_pre = len(other_ward_pre) / DAYS_BEFORE_INTERVENTION
other_daily_post = len(other_ward_post) / DAYS_AFTER_INTERVENTION

print("Daily crime rates:")
print(f"\nHigh-crime wards:")
print(f"  Before: {high_daily_pre:.1f} crimes/day")
print(f"  After: {high_daily_post:.1f} crimes/day")
print(f"\nOther wards:")
print(f"  Before: {other_daily_pre:.1f} crimes/day")
print(f"  After: {other_daily_post:.1f} crimes/day")

## Part 6: Policy Impact Summary

Create a comprehensive summary that community organizations can use to understand the federal intervention's impact. Use specific numbers from your analysis above.
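
When writing up findings, it helps to keep percent change and percentage-point change separate. The sketch below uses made-up numbers and a hypothetical helper, `percent_change`, that applies the same (new - old) / old formula printed in Task 2.1.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent: (new - old) / old * 100."""
    return (new - old) / old * 100

# Made-up totals: 120 incidents in 30 days before, 90 in 30 days after
daily_before = 120 / 30
daily_after = 90 / 30
change = percent_change(daily_before, daily_after)

# The + in the format spec forces a sign, so drops and rises are explicit
finding = (
    f"Daily crimes went from {daily_before:.1f} to {daily_after:.1f} "
    f"({change:+.1f}%)"
)
print(finding)  # prints: Daily crimes went from 4.0 to 3.0 (-25.0%)

# Made-up shares: a share moving from 20% to 25% of all crimes
share_before, share_after = 20.0, 25.0
points = share_after - share_before                    # +5.0 percentage points
relative = percent_change(share_before, share_after)   # +25.0 percent
print(f"{points:+.1f} percentage points, {relative:+.1f}% relative")
```

A share that moves from 20% to 25% rose by 5 percentage points but by 25 percent, so a summary should say which of the two it is reporting.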

In [None]:
# Create a data-driven summary of the federal intervention's impact
# Include specific statistics from your analysis

print("=" * 70)
print("FEDERAL INTERVENTION IMPACT ANALYSIS: DC CRIME DATA")
print("Intervention Date: August 7, 2025")
print("=" * 70)
print()

# YOUR CODE HERE - Add 5-7 key findings that would matter to community organizations
# Use SPECIFIC NUMBERS from your analysis above
# Consider:
# - Overall crime trends (daily averages before vs after from Task 2.1)
# - Which crime types have been most affected (violent vs property from Tasks 2.3-2.4)
# - Which wards have seen changes (from Task 4.3)
# - Immediate impact (first week from Task 2.5) vs sustained impact (overall from Task 2.1)
# - Day of week patterns that have changed (from Task 4.5)
# - Geographic concentration findings (from Part 3)
# - High-crime ward response (from Task 5.3)

print("KEY FINDINGS:")
print()
print("1. Overall Impact:")
# YOUR CODE HERE

## Submission Checklist

Before submitting, make sure:
- [ ] All code cells run without errors
- [ ] All tasks are completed (look for `YOUR CODE HERE` markers)
- [ ] Your results make sense (check for reasonable percentages and counts)
- [ ] You've calculated daily averages correctly (dividing by the right number of days)
- [ ] You've included the policy impact summary with specific numbers from your analysis
- [ ] File is saved as PS2_YourFirstName_YourLastName.ipynb

Remember: During code review, be prepared to:
- Explain what any line of your code does
- Modify filters or groupby operations when asked
- Describe what the federal intervention analysis reveals
- Calculate daily averages
- Discuss whether your findings suggest the intervention is working
- Explain the difference between .size(), .count(), and .sum()
- Show how you combined multiple conditions with & and |
- Explain why we use groupby for some questions and filtering for others
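
To prepare for the `.size()`, `.count()`, and `.sum()` question, a small made-up example with missing values shows where the three differ. The rows below are hypothetical; the real dataset may not contain missing offenses at all.

```python
import pandas as pd

# Toy data: two offense values are deliberately missing
toy = pd.DataFrame({
    "ward": [1, 1, 1, 2, 2],
    "offense": ["ROBBERY", None, "THEFT/OTHER", "BURGLARY", None],
    "is_violent": [True, False, False, False, False],
})

grouped = toy.groupby("ward")

sizes = grouped.size()                  # rows per group, missing values included
counts = grouped["offense"].count()     # non-missing offense values per group
sums = grouped["is_violent"].sum()      # adds the values: True = 1, False = 0

print(sizes)   # ward 1: 3, ward 2: 2
print(counts)  # ward 1: 2, ward 2: 1
print(sums)    # ward 1: 1, ward 2: 0
```

In short, `.size()` answers "how many rows?", `.count()` answers "how many rows have a value in this column?", and `.sum()` answers "what do the values add up to?".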