[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/ps/problem_set_3.ipynb)

**IMPORTANT**: Save your own copy!
1. Click File → Save a copy in Drive
2. Rename it: "PS3_YourName"
3. Work in YOUR copy

---

# Problem Set 3: COVID-19 and NYPD Arrests
## CCJS 418E - Fall 2025

- **Due**: November 23, 11:59pm
- **Submit**: Download as .ipynb AND .py - upload both

---

## Scenario

On March 16, 2020, New York City went into lockdown. Within days, the streets emptied. Schools closed. Subways ran near-empty. And NYPD arrests dropped sharply.

But here's what's interesting: **Not all arrest types dropped by the same amount.**

Some categories fell by 70%, others barely changed. This natural experiment provides insight into different types of law enforcement activity.

You're analyzing NYPD arrest data to understand:

- Which types of arrests depend on proactive enforcement vs. responding to calls for service?
- Did arrest patterns vary across NYC's five boroughs?
- When the city reopened, which arrest types returned to pre-COVID levels?

This analysis uses real administrative data to examine how external shocks affect law enforcement operations.

You may use AI tools to help write and debug your code.
- [TerpAI](https://terpai.umd.edu/chat)

## Setup: Import Libraries and Load Data

**Dataset**: NYPD arrests 2019-2021

This dataset covers the critical period around COVID-19:
- **2019**: Pre-COVID baseline
- **2020**: COVID shock and initial response
- **2021**: Recovery period

**Columns**:
- **ARREST_KEY**: Unique identifier for each arrest
- **ARREST_DATE**: Date the arrest occurred (MM/DD/YYYY)
- **PD_DESC**: Detailed description of the offense (NYPD classification)
- **OFNS_DESC**: General offense category (e.g., ASSAULT, ROBBERY, DANGEROUS DRUGS)
- **LAW_CAT_CD**: Severity level - F (Felony) or M (Misdemeanor)
- **ARREST_BORO**: Borough where arrest occurred - B (Bronx), K (Brooklyn), M (Manhattan), Q (Queens), S (Staten Island)
- **ARREST_PRECINCT**: NYPD precinct number (1-123)
- **AGE_GROUP**: Age category of person arrested (<18, 18-24, 25-44, 45-64, 65+)
- **PERP_SEX**: Gender (M/F)
- **PERP_RACE**: Racial/ethnic category

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

url = "https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/refs/heads/master/files/cfc/ps/nypd_arrests_2019_2021_clean.csv"
df = pd.read_csv(url)

print(f"Loaded {len(df):,} arrests")
print(f"Columns: {list(df.columns)}")

In [None]:
# Key date: NYC COVID-19 lockdown
COVID_LOCKDOWN = pd.to_datetime('2020-03-16')

print(f"COVID-19 lockdown date: {COVID_LOCKDOWN.strftime('%B %d, %Y')}")
print("This was the day NYC schools closed and citywide COVID restrictions began")
print("Everything that follows stems from this moment\n")

---

## Part 1: March 16, 2020

**Focus**: Establish the baseline and identify when enforcement patterns shifted.

Before we can understand COVID's impact, we need to know what "normal" looked like. How many arrests did NYPD make in a typical month? When exactly did things change?

**Why this matters**: Understanding the timing helps us separate COVID effects from other policy changes. If arrests dropped abruptly around March 16, that points to COVID. If the decline was gradual, other factors might be involved.

### Task 1.1: Explore the data

Display the first 10 rows and use `.info()` to examine the data structure and data types.

In [None]:
# Display first 10 rows to understand the data structure
# YOUR CODE HERE

In [None]:
# Use .info() to check data types and missing values
# YOUR CODE HERE

### Task 1.2: Convert to datetime and extract components

Convert ARREST_DATE to datetime format, then extract the year, month, and month name into separate columns.
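
As a rough illustration of the technique (not the solution for this dataset), the sketch below converts a string column to datetime and pulls out date parts with the `.dt` accessor. The `toy` DataFrame and its `event_date` column are made up for the example.

```python
import pandas as pd

# Toy data (made up for illustration); dates are strings in MM/DD/YYYY format
toy = pd.DataFrame({"event_date": ["01/15/2019", "03/16/2020", "12/31/2021"]})

# An explicit format avoids ambiguity between month-first and day-first dates
toy["event_date"] = pd.to_datetime(toy["event_date"], format="%m/%d/%Y")

# The .dt accessor exposes date components of a datetime column
toy["year"] = toy["event_date"].dt.year
toy["month"] = toy["event_date"].dt.month
toy["month_name"] = toy["event_date"].dt.month_name()

print(toy)
```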

In [None]:
# Convert ARREST_DATE to datetime format
df['ARREST_DATE'] = None  # YOUR CODE HERE

print(df['ARREST_DATE'].dtype)
print(df['ARREST_DATE'].head())

In [None]:
# Extract year, month, and month name for analysis
df['year'] = None  # YOUR CODE HERE
df['month'] = None  # YOUR CODE HERE
df['month_name'] = None  # YOUR CODE HERE

print(df[['ARREST_DATE', 'year', 'month', 'month_name']].head(10))

### Task 1.3: Create the COVID period indicator

Create a `covid_period` column that labels each arrest as 'Pre-COVID' (before March 16, 2020) or 'COVID' (on/after March 16, 2020).
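
One common pattern for this kind of labeling is to fill the column with a default value and then overwrite the rows that meet a condition using `.loc[]`. The sketch below shows the idea on a made-up three-row DataFrame; the names `toy`, `event_date`, `period`, and `cutoff` are assumptions for the example.

```python
import pandas as pd

# Toy data (made up): one date before, one on, and one after the cutoff
toy = pd.DataFrame(
    {"event_date": pd.to_datetime(["2020-03-15", "2020-03-16", "2020-03-17"])}
)
cutoff = pd.to_datetime("2020-03-16")

# Start with one label everywhere, then overwrite rows on/after the cutoff
toy["period"] = "Pre-COVID"
toy.loc[toy["event_date"] >= cutoff, "period"] = "COVID"

print(toy)
```

Note that `>=` places the cutoff date itself in the later period, which matches the "on/after" wording of the task.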

In [None]:
covid_start = pd.to_datetime('2020-03-16')

# Create covid_period column: 'Pre-COVID' or 'COVID'
# Hint: Use conditional logic with .loc[] to assign values based on date comparison
df['covid_period'] = None  # YOUR CODE HERE

print(df['covid_period'].value_counts())
print(f"\n{df['covid_period'].value_counts(normalize=True)['COVID']*100:.1f}% of arrests occurred during COVID period")

### Task 1.4: Establish the baseline - Arrests by year

Count total arrests per year and calculate the year-over-year percentage changes using the formula: `((new_year - old_year) / old_year) * 100`

For example, to calculate the change from 2019 to 2020: `((arrests_2020 - arrests_2019) / arrests_2019) * 100`
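
To make the arithmetic concrete, here is a small sketch of the formula wrapped in a helper function. The function name `pct_change` and the counts are hypothetical, chosen only so the result is easy to check by hand.

```python
def pct_change(old_value, new_value):
    """Percentage change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100


# Hypothetical counts, not taken from the dataset
arrests_old = 200_000
arrests_new = 150_000

change = pct_change(arrests_old, arrests_new)
print(f"{change:+.1f}%")  # prints -25.0%
```

A negative result means a decrease; a positive result means an increase.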

In [None]:
# Count total arrests per year
arrests_by_year = None  # YOUR CODE HERE

print("Total arrests by year:")
print(arrests_by_year)

# Calculate year-over-year changes
change_2019_2020 = None  # YOUR CODE HERE
change_2020_2021 = None  # YOUR CODE HERE

print(f"\n2019→2020: {change_2019_2020:+.1f}%")
print(f"2020→2021: {change_2020_2021:+.1f}%")

**Reflection Questions**:
- Did arrests recover by 2021, or are we in a "new normal"?
- What does the 2019→2020 drop tell us about COVID's immediate impact?

---

## Part 2: Comparing Different Types of Arrests

**Focus**: Examine whether different arrest categories responded differently to COVID restrictions.

**Background**:
Law enforcement activities can be classified into different types:
- **Proactive enforcement**: Officer-initiated encounters (drug possession, quality-of-life violations, traffic stops)
- **Reactive enforcement**: Responding to 911 calls (assault, robbery, burglary)

**Research Question**: Did COVID affect these categories differently?

**Hypothesis**: If COVID limited officer mobility and resources, proactive enforcement should decline more than reactive enforcement, since departments must still respond to emergency calls.

**Why this matters**: Understanding which activities are more flexible helps explain law enforcement resource allocation. If certain arrest types can vary substantially without affecting public safety, that suggests they're driven by policy choices rather than crime levels. This informs resource planning and policy decisions.

### Task 2.1: Testing the Hypothesis - Felonies vs Misdemeanors

Felonies tend to be more serious and reactive (someone calls 911 about an assault).
Misdemeanors include more discretionary enforcement (officer sees marijuana, decides to arrest).

**Prediction**: Misdemeanor arrests should drop more than felonies.

Calculate how many felonies and misdemeanors occurred in each year, then compute the percentage change from 2019 to 2020 for each category using the same formula from Task 1.4: `((2020_value - 2019_value) / 2019_value) * 100`
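
Grouping by two columns and counting rows produces a Series with a two-level index, which `.loc` can then look up by both labels. The sketch below shows this on made-up data; `toy`, `severity`, and `counts` are names invented for the example.

```python
import pandas as pd

# Toy data (made up): one row per arrest
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "severity": ["F", "M", "M", "F", "M", "F"],
})

# .size() counts rows in each (year, severity) group
counts = toy.groupby(["year", "severity"]).size()
print(counts)

# Look up a single cell by both index labels
felonies_2019 = counts.loc[2019, "F"]
misdemeanors_2019 = counts.loc[2019, "M"]
print(felonies_2019, misdemeanors_2019)
```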

In [None]:
print("Distribution of arrest severity:")
print(df['LAW_CAT_CD'].value_counts())

# Group by year AND severity to see the pattern
arrests_by_severity = None  # YOUR CODE HERE

print("\nArrests by year and severity:")
print(arrests_by_severity)

In [None]:
# Extract specific values using .loc for comparison
# Example: arrests_by_severity.loc[2019, 'F'] gets felonies in 2019

felony_2019 = None  # YOUR CODE HERE
felony_2020 = None  # YOUR CODE HERE
misdemeanor_2019 = None  # YOUR CODE HERE
misdemeanor_2020 = None  # YOUR CODE HERE

# Calculate percent change from 2019 to 2020 (a negative value means a drop)
felony_drop = None  # YOUR CODE HERE
misdemeanor_drop = None  # YOUR CODE HERE

print(f"Felonies changed: {felony_drop:+.1f}%")
print(f"Misdemeanors changed: {misdemeanor_drop:+.1f}%")
print(f"\nDifference: {abs(misdemeanor_drop - felony_drop):.1f} percentage points")


### Task 2.1b: Visualize the Discretionary Enforcement Pattern

Create a bar plot using `sns.barplot()` to visualize how felonies and misdemeanors changed across the three years.

- **x-axis**: year
- **y-axis**: number of arrests
- **hue**: LAW_CAT_CD (to show felonies vs misdemeanors as different colored bars)
- **data**: your grouped dataframe from the previous step
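
A grouped bar plot needs a "long" DataFrame with one row per bar. The sketch below uses made-up counts and a hypothetical `severity` column to show how `hue` splits each year into side-by-side bars.

```python
import pandas as pd
import seaborn as sns

# Toy counts (made up): one row per (year, category) combination
toy_counts = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "severity": ["F", "M", "F", "M"],
    "arrests": [100, 150, 80, 90],
})

# hue draws one colored bar per category within each year
ax = sns.barplot(data=toy_counts, x="year", y="arrests", hue="severity")
ax.set_title("Toy example: counts by year and category")
ax.set_ylabel("Number of arrests")
```

If your grouped result is a Series with a two-level index, calling `.reset_index(name='arrests')` on it is one way to get a DataFrame in this shape.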

In [None]:
# For the barplot, use sns.barplot() with year on x-axis, arrests on y-axis, hue=LAW_CAT_CD
# YOUR CODE HERE



**Reflection**: What does this reveal about discretionary enforcement? Does the visualization support the hypothesis?

### Task 2.2: When Did the Floor Fall Out? - Monthly Trends

Create a monthly time series and identify which month had the fewest arrests.

In [None]:
# Create a monthly period column for a proper time series (use the .dt.to_period() method)
df['year_month'] = None  # YOUR CODE HERE

# Count arrests per month
monthly_arrests = None  # YOUR CODE HERE

monthly_df = monthly_arrests.reset_index()
monthly_df.columns = ['year_month', 'arrests']

print(monthly_df)

In [None]:
# Find the lowest month
monthly_sorted = None  # YOUR CODE HERE

print("5 months with fewest arrests:")
print(monthly_sorted.head(5))

# Use .iloc[0] to get first row by position
lowest = monthly_sorted.iloc[0]
print(lowest)


**Note**: `.iloc[0]` gets first row by position. `.loc` would use label-based indexing.
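
The difference matters most after sorting, because index labels travel with their rows. A minimal sketch on made-up data (the names `toy` and `toy_sorted` are invented for the example):

```python
import pandas as pd

# Toy data (made up); the default index labels are 0, 1, 2
toy = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "arrests": [300, 100, 200]})

# After sorting, the rows appear in label order 1, 2, 0
toy_sorted = toy.sort_values("arrests")

by_position = toy_sorted.iloc[0]  # first row after sorting: Feb, 100
by_label = toy_sorted.loc[0]      # row whose index label is 0: Jan, 300

print(by_position["month"], by_label["month"])
```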

### Task 2.2b: Visualize the Collapse

Create a line plot using `sns.lineplot()` to visualize the monthly arrest trend over time.

- **x-axis**: year_month
- **y-axis**: arrests (monthly count)
- **data**: monthly_df from the previous step

In [None]:
# Create a line plot showing monthly arrests over time
# YOUR CODE HERE



### Task 2.3: Which Specific Crimes Disappeared?

Identify the top 10 most common offense types overall (across all years), then calculate arrests for each of these offense types **grouped by both offense type AND year**.

You should end up with a result showing how many arrests each offense type had in 2019, 2020, and 2021.
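
The general pattern is: rank categories with `value_counts()`, keep the most frequent ones, filter with `.isin()`, then group. The sketch below applies it to made-up data with a top 2 instead of a top 10; all names are invented for the example.

```python
import pandas as pd

# Toy data (made up): offense "A" appears 4 times, "B" 3 times, "C" once
toy = pd.DataFrame({
    "offense": ["A", "A", "A", "B", "B", "C", "A", "B"],
    "year": [2019, 2019, 2020, 2019, 2020, 2019, 2020, 2020],
})

# value_counts() sorts from most to least frequent, so head(2) is the top 2
top = toy["offense"].value_counts().head(2)
top_names = top.index

# Keep only rows whose offense is one of the top names
toy_top = toy[toy["offense"].isin(top_names)]

# Count rows for each (offense, year) pair
by_year = toy_top.groupby(["offense", "year"]).size()
print(by_year)
```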

In [None]:
# Find the top 10 most common offense types
top_offenses = None  # YOUR CODE HERE

print("Top 10 offense types overall:")
print(top_offenses)

# This assumes you used value_counts to make top_offenses
top_offenses_df = top_offenses.reset_index()
top_offenses_df.columns = ['offense', 'count']
top_names = top_offenses_df['offense']

df_top = df[df['OFNS_DESC'].isin(None)]  # YOUR CODE HERE: replace None with the top offense names

# How did each offense type change by year?
offense_by_year = None  # YOUR CODE HERE

print("\nTop offenses by year:")
print(offense_by_year.head(30))

**Reflection**: Looking at the offense types that dropped most:
- Are they proactive enforcement (drugs, traffic, quality-of-life)?
- Or reactive to 911 calls (assault, robbery, burglary)?

---

## Part 3: Geographic Variation Across NYC Boroughs

**Focus**: Examine whether arrest patterns changed uniformly across NYC's five boroughs.

NYC has 5 boroughs with different characteristics:
- **Manhattan (M)**: Commercial center, many commuters
- **Brooklyn (K)**: Most populous borough
- **Queens (Q)**: Diverse, residential
- **Bronx (B)**: High population density
- **Staten Island (S)**: Smallest population, most suburban

**Research Question**: Did all boroughs experience similar changes in arrest patterns during COVID?

**Why this matters**: Geographic variation can reveal whether changes were driven by uniform policy decisions or varied by local conditions and enforcement priorities. Comparing shares and percentage changes (not just raw counts) allows fairer comparisons across boroughs of different sizes.

**Note**: This dataset has no population column, so we'll use **shares and percentages** (not per-capita rates) to compare boroughs.

<img src='https://www.nycgo.com/images/460x285/BoroughsMap_460x285_B.jpg' width='400'>

### Task 3.1: Calculate each borough's share of arrests by year

Calculate arrests by borough and year, then determine what percentage of **that year's total arrests** came from each borough.

For example: If Brooklyn had 50,000 arrests in 2019 and NYC had 200,000 total arrests in 2019, Brooklyn would be 25% of that year's total.

**Approach:**
1. Group by borough and year to get arrests per borough per year
2. Calculate total arrests per year (across all boroughs)
3. Merge these together
4. Calculate: `pct_of_total = (borough_arrests / year_total) * 100`
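
The steps above can be sketched on made-up counts as follows. The numbers are chosen to reproduce the 25% example, and the names `borough_counts`, `boro`, `year_totals`, and `merged` are hypothetical. This toy version sums pre-computed counts to get yearly totals; with row-level data you would count rows instead.

```python
import pandas as pd

# Toy counts (made up): two boroughs, two years
borough_counts = pd.DataFrame({
    "boro": ["K", "M", "K", "M"],
    "year": [2019, 2019, 2020, 2020],
    "arrests": [50, 150, 30, 70],
})

# Total per year across all boroughs
year_totals = (
    borough_counts.groupby("year")["arrests"].sum().reset_index(name="year_total")
)

# Attach each year's total to every borough row for that year
merged = borough_counts.merge(year_totals, on="year")

# Share of that year's total
merged["pct_of_total"] = merged["arrests"] / merged["year_total"] * 100
print(merged)
```

A quick sanity check is that the shares within each year sum to 100.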

In [None]:
# First, count arrests by borough and year
borough_year = None  # YOUR CODE HERE

# Convert to DataFrame for easier manipulation
borough_year_df = borough_year.reset_index(name='arrests')
print(borough_year_df)

# Calculate what percentage of ALL arrests came from each borough in each year
# Step 1: Create a DataFrame that counts the number of arrests each year
# Hint: use groupby, and reset_index with name=
# Step 2: Merge this onto borough_year_df on 'year'
# Step 3: Create a pct_of_total column

### Task 3.2: Merge Analysis - Felony Percentage by Borough

Calculate what percentage of each borough's arrests were felonies **in each year** (group by both borough AND year). Did some boroughs shift toward more serious enforcement during COVID?

**Approach**: You'll need to merge two dataframes:
- Table 1: Total arrests by borough and year
- Table 2: Felony arrests by borough and year
- Then calculate: `felony_pct = (felony_arrests / total_arrests) * 100`

In [None]:
# We want to know: Did some boroughs shift toward more serious enforcement?
# Calculate: What % of each borough's arrests were felonies?

# Table 1: Total arrests by borough and year
# YOUR CODE HERE

# Table 2: Felony arrests by borough and year
# YOUR CODE HERE

# Merge and calculate felony_pct
# YOUR CODE HERE



### Task 3.3: Calculate percentage change by borough

Calculate the percentage change in arrests from 2019 to 2020 for each borough using the same formula from Task 1.4: `((2020_value - 2019_value) / 2019_value) * 100`
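
One way to line up two years side by side is to filter each year into its own DataFrame and merge them on the grouping key. The `suffixes` argument of `merge` keeps the two count columns distinguishable. The sketch below uses made-up counts and hypothetical names.

```python
import pandas as pd

# Toy counts (made up)
counts = pd.DataFrame({
    "boro": ["K", "M", "K", "M"],
    "year": [2019, 2019, 2020, 2020],
    "arrests": [200, 100, 150, 50],
})

# One DataFrame per year, keeping only the key and the count
before = counts[counts["year"] == 2019][["boro", "arrests"]]
after = counts[counts["year"] == 2020][["boro", "arrests"]]

# Both frames have an 'arrests' column, so suffixes rename them on merge
change = before.merge(after, on="boro", suffixes=("_2019", "_2020"))

change["pct_change"] = (
    (change["arrests_2020"] - change["arrests_2019"]) / change["arrests_2019"] * 100
)
print(change)
```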

In [None]:
# Calculate percent change 2019 → 2020 for each borough
# This shows: Which borough saw the biggest DROP in arrests?
# Step 1: Filter borough_year_df to 2019 and store the result in a new DataFrame
# Step 2: Filter borough_year_df to 2020 and store the result in a new DataFrame
# Step 3: Merge these two DataFrames on borough
# Step 4: Calculate the percentage change

# YOUR CODE HERE



### Task 3.3b: Visualize percentage changes

Create a bar plot using `sns.barplot()` showing which boroughs saw the steepest percentage drops.

- **x-axis**: ARREST_BORO (borough codes)
- **y-axis**: pct_change (the percentage change column you calculated)
- **data**: your merged dataframe from the previous step

In [None]:
# Create a barplot showing percentage change by borough
# This visualization should show which boroughs saw the steepest percentage drops
# Use sns.barplot()
# YOUR CODE HERE

**Interpretation Questions**:
- Did all boroughs experience similar percentage drops?
- Which borough saw the smallest percentage decrease?
- What might explain why some boroughs' enforcement dropped more than others?

---

## Part 4: Age Patterns and School Closures

**Focus**: Compare arrest patterns across age groups before and during COVID.

**Background**: COVID-19 closed NYC schools from March 2020 through much of 2021. This created a natural experiment: what happens to arrests of different age groups when schools close?

Before COVID:
- School resource officers operated in many schools
- Students had routine contact with law enforcement during school hours
- Truancy policies brought police into contact with teenagers

**Research Question**: Did juvenile arrests (age <18) change differently than adult arrests during COVID?

**Hypothesis**: If school attendance affects arrest patterns, juvenile arrests should decline more than adult arrests when schools close.

**Why this matters**: This analysis helps quantify the relationship between school operations and juvenile justice system contact. The data can inform decisions about resource allocation and school-based law enforcement programs.

In [None]:
# Create is_juvenile column (age < 18)
df['is_juvenile'] = None  # YOUR CODE HERE
print(df['is_juvenile'].value_counts())

# Calculate: What share of arrests in 2019 were juveniles?
juv_2019_share = None  # YOUR CODE HERE
print(f"\nIn 2019, {juv_2019_share:.1f}% of arrests were juveniles")

### Task 4.1: Compare juvenile vs adult arrest drops

Calculate the percentage drop in arrests from 2019 to 2020 separately for juveniles (is_juvenile=True) and adults (is_juvenile=False) using the percentage change formula from Task 1.4.

Then answer: Did juvenile arrests drop more or less than adult arrests?

Also calculate what share of 2020 arrests were juveniles: `(juvenile_arrests_2020 / total_arrests_2020) * 100`
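
Because `True` counts as 1 and `False` as 0, the mean of a boolean column equals the share of `True` rows, which is the same quantity as the count ratio above. A sketch on made-up data (the names `toy`, `age_group`, and `is_minor` are invented for the example):

```python
import pandas as pd

# Toy data (made up): 1 of 3 rows in 2019 and 1 of 5 rows in 2020 are "<18"
toy = pd.DataFrame({
    "age_group": ["<18", "18-24", "25-44", "<18", "45-64", "25-44", "18-24", "65+"],
    "year": [2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020],
})

# Age groups are stored as text, so the comparison is a string match
toy["is_minor"] = toy["age_group"] == "<18"

# Mean of a boolean column = fraction of True values in each year
share_by_year = toy.groupby("year")["is_minor"].mean() * 100
print(share_by_year.round(1))
```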

In [None]:
# Group by year and juvenile status
# Calculate percent drops 2019 → 2020
# YOUR CODE HERE

# Suggested approach:
# juv_by_year = df.groupby(['year', 'is_juvenile']).size().reset_index(name='arrests')
# Extract 2019 and 2020 values for juveniles and adults
# Calculate percent change for each



---

## Part 5: The "New Normal" - Which Arrests Never Came Back

**Focus**: Identify permanent vs. temporary shifts in enforcement.

By 2021, NYC was reopening:
- Restaurants and businesses reopened
- Schools were back (hybrid at first)
- Subway ridership was recovering

But arrests didn't fully recover. Some types stayed at COVID levels.

**Why this matters**: Permanent drops might indicate:
- NYPD learned these arrests were unnecessary
- Political pressure changed priorities
- Permanent policy reforms (e.g., marijuana decriminalization)

Temporary drops suggest the arrests were just delayed, not eliminated.

### Task 5.1: Calculate recovery rates

Calculate recovery rates: what percentage of the 2019-2020 drop was recovered by 2021?
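
A worked example may help with interpreting the result. The sketch below wraps the recovery formula, `(2021 - 2020) / (2019 - 2020) * 100`, in a hypothetical helper called `recovery_rate` and applies it to made-up counts.

```python
def recovery_rate(baseline, trough, later):
    """Share of the baseline-to-trough drop that was regained by the later period."""
    return (later - trough) / (baseline - trough) * 100


# Hypothetical counts: 1000 at baseline, 600 at the low point
half = recovery_rate(1000, 600, 800)    # regained 200 of the 400 lost
full = recovery_rate(1000, 600, 1000)   # back to baseline
none = recovery_rate(1000, 600, 600)    # stayed at the low point

print(half, full, none)  # prints 50.0 100.0 0.0
```

The formula divides by the size of the drop, so it is undefined when the baseline and the low point are equal.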

In [None]:
# Recovery formula: (2021 - 2020) / (2019 - 2020) * 100
# 100% = full recovery, 0% = stayed at COVID levels, >100% = exceeded baseline


# Calculate for felonies and misdemeanors
# YOUR CODE HERE

# Suggested approach:
# felony_2021 = arrests_by_severity.loc[2021, 'F']
# felony_recovery = ((felony_2021 - felony_2020) / (felony_2019 - felony_2020)) * 100

### Task 5.2: Which offense types never recovered?

For the top 15 offense types, calculate 2021 arrests as a percentage of 2019 levels.

In [None]:
# Step 1: Determine the top 15 offense types across all years
# Step 2: Create a DataFrame of 2019 arrests for those 15 offense types
# Step 3: Create a DataFrame of 2021 arrests for those 15 offense types
# Step 4: Count 2019 arrests for each of the 15 offense types
# Step 5: Count 2021 arrests for each of the 15 offense types
# Step 6: Merge the two count DataFrames on offense type
# Step 7: Create a column showing 2021 arrests as a percentage of 2019 arrests
# YOUR CODE HERE



### Task 5.2b: Visualize the Recovery Gap

Create a bar plot using `sns.barplot()` showing which offense types recovered vs. stayed suppressed.

- **x-axis**: recovery percentage (2021 as % of 2019)
- **y-axis**: offense type
- **data**: your merged dataframe from the previous step, sorted by recovery percentage

**Hint**: Use `orient='h'` in barplot for horizontal bars to make offense names easier to read.
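
For a horizontal bar plot, the numeric column goes on `x` and the category on `y`. The sketch below uses made-up percentages and hypothetical column names; the dashed reference line at 100 is an optional addition that marks the 2019 level.

```python
import pandas as pd
import seaborn as sns

# Toy data (made up): 2021 arrests as a percentage of 2019
toy = pd.DataFrame({
    "offense": ["TYPE A", "TYPE B", "TYPE C"],
    "pct_of_2019": [95.0, 40.0, 70.0],
})

# Sorting first puts the least-recovered category at the top of the plot
toy_sorted = toy.sort_values("pct_of_2019")

ax = sns.barplot(data=toy_sorted, x="pct_of_2019", y="offense", orient="h")
ax.axvline(100, linestyle="--", color="gray")  # 100 = same as 2019
ax.set_xlabel("2021 arrests as % of 2019")
```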

In [None]:
# Create a barplot showing recovery rates for different offense types
# YOUR CODE HERE


**Interpretation**:
- Which offense types are still below 80% of 2019 levels?
- Are these discretionary or reactive crimes?
- What might explain why some arrests never came back?

---

## Part 6: Executive Summary - Key Findings (EXTRA CREDIT)

**This section is optional extra credit. Complete Parts 1-5 first.**

**Focus**: Synthesize your analysis into a clear summary of findings.

You've completed the analysis. Now create an executive summary that presents the key empirical patterns you discovered.

- **Your task**: Write a structured summary of findings
- **Your audience**: Policy analysts and researchers
- **Your goal**: Present what the data reveals about arrest patterns during COVID
- **Your evidence**: Specific numbers and percentages from your analysis above

In [None]:
print("=" * 70)
print("EXECUTIVE SUMMARY: COVID-19 AND NYPD ARREST PATTERNS")
print("Analysis Period: 2019-2021")
print("")
print("")
print("=" * 70)
print()

print("EXECUTIVE SUMMARY:")
print("This analysis examines how COVID-19 restrictions affected different")
print("categories of arrests using NYPD administrative data (N=500K+ arrests).")
print()

# YOUR FINDINGS HERE - Use specific numbers from your analysis
# Include at least 5 key findings:

print("1. DIFFERENTIAL IMPACT BY ARREST TYPE")
# YOUR CODE HERE - Use the misdemeanor vs felony drop percentages
# Example: print(f"   - Misdemeanors dropped {misdemeanor_drop:.1f}% vs felonies {felony_drop:.1f}%")

print("\n2. SCHOOL CLOSURES INTERRUPTED CRIMINALIZATION OF YOUTH")
# YOUR CODE HERE - Use the juvenile vs adult drop percentages

print("\n3. GEOGRAPHIC VARIATION")
# YOUR CODE HERE - Which borough maintained highest enforcement?

print("\n4. PERSISTENCE OF CHANGES")
# YOUR CODE HERE - Which offense types stayed below 80% of 2019?

print("\n5. IMPLICATIONS")
print("   Based on this analysis, we recommend:")
print("   - [YOUR INTERPRETATION based on discretionary enforcement finding]")
print("   - [YOUR INTERPRETATION based on school-based enforcement finding]")
print("   - [YOUR INTERPRETATION based on recovery patterns]")

## Submission Checklist

Before submitting, make sure:
- [ ] All code cells run without errors
- [ ] All visualizations use seaborn
- [ ] Used pd.to_datetime() and the .dt accessor correctly
- [ ] Used .iloc and .loc for indexing
- [ ] Created covid_period column
- [ ] Completed all merge operations
- [ ] All 4 visualizations (Tasks 2.1b, 2.2b, 3.3b, 5.2b) are complete
- [ ] Executive summary (extra credit, if attempted) has specific numbers from your analysis
- [ ] Interpretations are based on your findings

**Key concepts to understand**:
- What discretionary vs reactive policing means
- Why the felony/misdemeanor comparison tests this
- What it means that some arrests never recovered
- How your findings relate to resource allocation decisions