[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/8_datetime_operations_solutions.ipynb)

**IMPORTANT**: Save your own copy!
1. Click File → Save a copy in Drive
2. Rename it
3. Work in YOUR copy, not the original

---

# 8. Datetime Operations with Pandas - SOLUTIONS
## CCJS 418E: Coding for Criminology

This notebook contains worked solutions for Lab 8 exercises.

## Setup: Import Libraries and Load Data

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/refs/heads/master/files/cfc/nypd_arrests_2013_2015_garner.csv')

# Convert ARREST_DATE to datetime (needed for all exercises)
df['ARREST_DATE'] = pd.to_datetime(df['ARREST_DATE'])

# Define key dates for reference
pullback_start = pd.to_datetime('2014-12-01')
pullback_end = pd.to_datetime('2015-02-28')

# Create the period column (needed for some exercises)
df['period'] = 'Baseline'
df.loc[df['ARREST_DATE'].between(pullback_start, pullback_end), 'period'] = 'Pullback'
df.loc[df['ARREST_DATE'] > pullback_end, 'period'] = 'Recovery'

print(f"Loaded {len(df):,} arrest records")
df.head()
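
The three-way period labeling above can be sanity-checked on a few hand-picked dates. The sketch below is only an illustration: the `toy` frame and its dates are made up (not the NYPD data), chosen to sit on and around the two boundary dates, so it shows that `.between()` includes both endpoints.

```python
import pandas as pd

pullback_start = pd.to_datetime('2014-12-01')
pullback_end = pd.to_datetime('2015-02-28')

# Hypothetical dates on and around the period boundaries (not real records)
toy = pd.DataFrame({'ARREST_DATE': pd.to_datetime([
    '2014-11-30', '2014-12-01', '2015-02-28', '2015-03-01'])})

# Same labeling logic as the setup cell
toy['period'] = 'Baseline'
toy.loc[toy['ARREST_DATE'].between(pullback_start, pullback_end), 'period'] = 'Pullback'
toy.loc[toy['ARREST_DATE'] > pullback_end, 'period'] = 'Recovery'

print(toy)
```

Both boundary dates (December 1 and February 28) land in the Pullback group, which is the behavior the later day-count arithmetic relies on.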

## Quick Check #1 - SOLUTION

**Task:** Extract the year, month, and day of week from ARREST_DATE.

In [None]:
# SOLUTION
# Extract date components using .dt
df['year'] = df['ARREST_DATE'].dt.year
df['month'] = df['ARREST_DATE'].dt.month
df['day_of_week'] = df['ARREST_DATE'].dt.day_name()

print("Date components extracted successfully!")
print(df[['ARREST_DATE', 'year', 'month', 'day_of_week']].head(10))
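
Day names sort alphabetically by default (Friday, Monday, Saturday, ...), which can make tables and plots harder to read. One possible workaround, sketched here on made-up dates, is to store the names as an ordered categorical; `dates` and `day_order` are illustrative names, not part of the lab.

```python
import pandas as pd

# Hypothetical dates: 2015-01-11 is a Sunday, 2015-01-05 a Monday, 2015-01-07 a Wednesday
dates = pd.Series(pd.to_datetime(['2015-01-11', '2015-01-05', '2015-01-07']))

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

day_names = dates.dt.day_name()
day_numbers = dates.dt.dayofweek  # Monday=0 ... Sunday=6

# An ordered categorical sorts Monday -> Sunday rather than alphabetically
day_cat = pd.Categorical(day_names, categories=day_order, ordered=True)
counts = pd.Series(day_cat).value_counts().sort_index()
print(counts)
```

Days with no records appear with a count of 0, because a categorical remembers all of its categories.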

## Quick Check #2 - SOLUTION

**Task:** How many unique quarters are in the dataset?

In [None]:
# SOLUTION
df['quarter'] = df['ARREST_DATE'].dt.to_period('Q')
print(f"Unique quarters: {df['quarter'].nunique()}")
print(f"\nQuarters in dataset:")
print(sorted(df['quarter'].unique()))

# Clean up
df = df.drop(columns=['quarter'])
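
To see what `.dt.to_period('Q')` actually produces, it can help to try it on a few dates. This is a small sketch on made-up dates (the `dates` Series is hypothetical); it shows the quarter labels and the calendar date each quarter starts on.

```python
import pandas as pd

# Hypothetical dates spanning two quarters
dates = pd.Series(pd.to_datetime(['2014-12-15', '2015-01-02', '2015-03-31']))

quarters = dates.dt.to_period('Q')

print(quarters.astype(str).tolist())   # quarter labels such as '2014Q4'
print(quarters.nunique())              # number of distinct quarters
print(quarters.dt.start_time.tolist()) # first day of each quarter
```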

## Quick Check #3 - SOLUTION

**Task:** Filter the data for arrests that occurred in January 2015 only.

In [None]:
# SOLUTION
# Method 1: Comparison operators
jan_2015_v1 = df[(df['ARREST_DATE'] >= pd.to_datetime('2015-01-01')) &
                 (df['ARREST_DATE'] <= pd.to_datetime('2015-01-31'))]

# Method 2: .between()
jan_2015_v2 = df[df['ARREST_DATE'].between(pd.to_datetime('2015-01-01'),
                                            pd.to_datetime('2015-01-31'))]

print(f"January 2015 arrests (Method 1): {len(jan_2015_v1):,}")
print(f"January 2015 arrests (Method 2): {len(jan_2015_v2):,}")
# Compare the filtered rows themselves, not just how many there are
print(f"Same result? {jan_2015_v1.equals(jan_2015_v2)}")

## Quick Check #4 - SOLUTION

**Task:** Calculate the percentage change in daily arrest rates from pullback to recovery.

In [None]:
# SOLUTION
# First, create the filtered datasets
baseline = df[df['ARREST_DATE'] < pullback_start]
pullback = df[df['ARREST_DATE'].between(pullback_start, pullback_end)]
recovery = df[df['ARREST_DATE'] > pullback_end]

# Calculate number of days in each period
baseline_days = (pullback_start - df['ARREST_DATE'].min()).days
pullback_days = (pullback_end - pullback_start).days + 1
recovery_days = (df['ARREST_DATE'].max() - pullback_end).days

# Calculate daily rates
baseline_daily = len(baseline) / baseline_days
pullback_daily = len(pullback) / pullback_days
recovery_daily = len(recovery) / recovery_days

# Calculate percentage change from pullback to recovery
pct_change_recovery = ((recovery_daily - pullback_daily) / pullback_daily) * 100

print(f"Daily rate during pullback: {pullback_daily:.1f} arrests/day")
print(f"Daily rate during recovery: {recovery_daily:.1f} arrests/day")
print(f"Percentage change: {pct_change_recovery:+.1f}%")

# Compare to original baseline
print(f"\nOriginal baseline: {baseline_daily:.1f} arrests/day")
print(f"Recovery as % of baseline: {(recovery_daily / baseline_daily * 100):.1f}%")
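
The `+ 1` in `pullback_days` is easy to get wrong, so here is a small check of the arithmetic. Subtracting two dates counts the gaps between them, so an inclusive window needs one extra day. The `pct_change` helper and the rates passed to it are made up for illustration.

```python
import pandas as pd

pullback_start = pd.to_datetime('2014-12-01')
pullback_end = pd.to_datetime('2015-02-28')

# Subtraction counts gaps between dates; add 1 to include both endpoints
pullback_days = (pullback_end - pullback_start).days + 1

# Cross-check by listing every calendar day in the window
days_listed = len(pd.date_range(pullback_start, pullback_end))
print(pullback_days, days_listed)

def pct_change(old, new):
    """Percentage change from old to new: ((new - old) / old) * 100."""
    return ((new - old) / old) * 100

# Hypothetical daily rates, for illustration only
print(f"{pct_change(100.0, 80.0):+.1f}%")
```

The baseline and recovery day counts need no `+ 1` because each of those windows excludes one of its two endpoints (the baseline stops the day before `pullback_start`, and the recovery starts the day after `pullback_end`).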

## Exercise 1: Day of Week Analysis - SOLUTION

**Task:** Calculate which day of the week has the highest average arrest count. Did this pattern change during the pullback?

In [None]:
# SOLUTION
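# Group by day of week to see the overall pattern.
# Note: these are totals, not averages. Each weekday occurs almost equally often
# over the full date range, so the ranking should be close to the ranking by average.
day_counts = df.groupby('day_of_week').size().sort_values(ascending=False)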

print("Total arrests by day of week:")
print(day_counts)
print(f"\nHighest arrest day: {day_counts.index[0]} with {day_counts.values[0]:,} arrests")

In [None]:
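# Now check if the pattern changed during pullback
day_period = df.groupby(['period', 'day_of_week']).size().reset_index(name='arrests')

# The periods differ in length, so convert counts to arrests per calendar day
# (uses the *_days values computed in Quick Check #4)
period_days = {'Baseline': baseline_days, 'Pullback': pullback_days, 'Recovery': recovery_days}
day_period['arrests_per_day'] = day_period['arrests'] / day_period['period'].map(period_days)

# Order weekdays chronologically instead of alphabetically
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

print("\nArrests per calendar day, by day of week and period:")
print(day_period.pivot(index='day_of_week', columns='period', values='arrests_per_day').reindex(day_order))

# Visualize
sns.barplot(data=day_period,
            x='day_of_week',
            y='arrests_per_day',
            order=day_order,
            hue='period',
            palette='Set2')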

print("\nInterpretation: The pattern shows arrests dropped across ALL days during")
print("the pullback, but the relative ordering (which days are busiest) stayed similar.")
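
The task asks for the highest *average* arrest count, which is not always the same as the highest total. One way to get a true average is to count arrests per calendar date first and then average those daily counts by weekday. The sketch below uses made-up records (the `toy` frame is hypothetical) in which the total and the average point to different days.

```python
import pandas as pd

# Hypothetical arrest records: one row per arrest
toy = pd.DataFrame({'ARREST_DATE': pd.to_datetime([
    '2015-01-05', '2015-01-05', '2015-01-05',   # Monday: 3 arrests
    '2015-01-12',                               # Monday: 1 arrest
    '2015-01-06', '2015-01-06', '2015-01-06',   # Tuesday: 3 arrests
])})

# Step 1: count arrests on each calendar date
daily = toy.groupby('ARREST_DATE').size().reset_index(name='arrests')

# Step 2: label each date with its weekday, then average across dates
daily['day_of_week'] = daily['ARREST_DATE'].dt.day_name()
avg_by_day = daily.groupby('day_of_week')['arrests'].mean()
total_by_day = daily.groupby('day_of_week')['arrests'].sum()

print(avg_by_day)
print(total_by_day)
```

Here Monday has the larger total but Tuesday has the larger average. Adding `'period'` to both `groupby` calls would give averages within each period. Dates with zero arrests do not appear in `daily`, so this approach assumes every date has at least one record.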

## Exercise 2: Quarterly Analysis - SOLUTION

**Task:** Create quarterly periods and count arrests by quarter.

In [None]:
# SOLUTION
# Create quarterly periods and convert to timestamp for easier plotting
df['year_quarter'] = df['ARREST_DATE'].dt.to_period('Q').dt.to_timestamp()

# Count arrests by quarter
quarterly_arrests = df.groupby('year_quarter').size().reset_index(name='arrests')

print("Quarterly arrest counts:")
print(quarterly_arrests)

# Visualize the trend
sns.lineplot(data=quarterly_arrests,
             x='year_quarter',
             y='arrests',
             marker='o')

print("\nInterpretation: The quarterly view smooths out monthly fluctuations,")
print("making the pullback period (Q4 2014 - Q1 2015) clearly visible.")
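
Grouping by `.dt.to_period('Q')` only returns quarters that contain at least one record. That is unlikely to matter for a dataset this large, but for sparser data `resample()` is a possible alternative because it also reports empty quarters as 0. The sketch below uses made-up dates and a made-up borough column (the `toy` frame is hypothetical).

```python
import pandas as pd

# Hypothetical arrest dates; nothing falls in the second quarter of 2015
toy = pd.DataFrame({
    'ARREST_DATE': pd.to_datetime(['2015-01-10', '2015-02-20', '2015-07-04']),
    'ARREST_BORO': ['K', 'M', 'Q'],
})

# Approach used in the solution: only quarters with arrests appear
by_period = toy.groupby(toy['ARREST_DATE'].dt.to_period('Q')).size()

# resample() walks the calendar, so the empty quarter shows up as 0
# ('QS' labels each quarter by its start date)
by_resample = toy.resample('QS', on='ARREST_DATE').size()

print(by_period)
print(by_resample)
```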

## Exercise 3: Recovery Speed - SOLUTION

**Task:** Calculate how long it took for monthly arrests to return to at least 90% of the baseline average.

In [None]:
# SOLUTION
# First, calculate baseline monthly average
baseline_monthly = baseline.groupby(baseline['ARREST_DATE'].dt.to_period('M')).size()
baseline_avg = baseline_monthly.mean()

print(f"Baseline average: {baseline_avg:.1f} arrests per month")
print(f"90% of baseline: {baseline_avg * 0.9:.1f} arrests per month")

# Calculate monthly arrests in recovery period
recovery_monthly = recovery.groupby(recovery['ARREST_DATE'].dt.to_period('M')).size().reset_index(name='arrests')
recovery_monthly.columns = ['month', 'arrests']

# Convert period to timestamp for display
recovery_monthly['month'] = recovery_monthly['month'].dt.to_timestamp()

print("\nRecovery period monthly arrests:")
print(recovery_monthly)

# Find first month that hit 90% threshold
threshold = baseline_avg * 0.9
recovery_months = recovery_monthly[recovery_monthly['arrests'] >= threshold]

if len(recovery_months) > 0:
    first_recovery_month = recovery_months.iloc[0]['month']
    months_to_recover = len(recovery_monthly[recovery_monthly['month'] < first_recovery_month])
    print(f"\nFirst month to reach 90% of baseline: {first_recovery_month.strftime('%Y-%m')}")
    print(f"Time to recovery: {months_to_recover} months after pullback ended")
else:
    print("\nArrests did not return to 90% of baseline during the recovery period.")
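
The threshold search above can also be wrapped in a small function, which makes it easy to try other thresholds such as 95% or 100%. This is only a sketch: `months_to_threshold` is a made-up helper name, and the monthly counts and baseline average are invented numbers.

```python
import pandas as pd

def months_to_threshold(monthly_counts, threshold):
    """Count the months before the first month at or above threshold.

    Returns None if no month reaches the threshold.
    """
    reached = monthly_counts[monthly_counts >= threshold]
    if reached.empty:
        return None
    first_month = reached.index[0]
    return int((monthly_counts.index < first_month).sum())

# Hypothetical monthly counts for the recovery period, indexed by month
recovery_monthly = pd.Series(
    [700, 850, 920, 880],
    index=pd.period_range('2015-03', periods=4, freq='M'))

baseline_avg = 1000  # hypothetical baseline monthly average

print(months_to_threshold(recovery_monthly, 0.9 * baseline_avg))
print(months_to_threshold(recovery_monthly, 2.0 * baseline_avg))
```

With these invented numbers the threshold is first met in May 2015, two months into the recovery period. Note that the last month in a real dataset may be incomplete, which would make its count look artificially low.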

## Exercise 4: Borough Analysis - SOLUTION

**Task:** Investigate whether different boroughs experienced the pullback differently.

In [None]:
# SOLUTION
# Count arrests by borough and period
borough_period = df.groupby(['ARREST_BORO', 'period']).size().reset_index(name='arrests')

print("Arrests by borough and period:")
print(borough_period.pivot(index='ARREST_BORO', columns='period', values='arrests'))

# Calculate percentage change for each borough
results = []

# Skip missing borough values so an empty group cannot cause a divide-by-zero
for borough in df['ARREST_BORO'].dropna().unique():
    # Filter to this borough
    boro_baseline = df[(df['ARREST_BORO'] == borough) & (df['period'] == 'Baseline')]
    boro_pullback = df[(df['ARREST_BORO'] == borough) & (df['period'] == 'Pullback')]
    
    # Calculate daily rates
    baseline_rate = len(boro_baseline) / baseline_days
    pullback_rate = len(boro_pullback) / pullback_days
    
    # Calculate percentage change
    pct_change = ((pullback_rate - baseline_rate) / baseline_rate) * 100
    
    results.append({
        'Borough': borough,
        'Baseline_Daily': baseline_rate,
        'Pullback_Daily': pullback_rate,
        'Pct_Change': pct_change
    })

results_df = pd.DataFrame(results).sort_values('Pct_Change')

print("\nPercentage change by borough (baseline to pullback):")
print(results_df)

# Visualize
sns.barplot(data=results_df,
            y='Borough',
            x='Pct_Change',
            palette='Set2')

print("\nInterpretation: All boroughs saw decreases during the pullback,")
print("but some (like Brooklyn and Bronx) had larger decreases than others.")
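
The loop above can also be written without iteration by reshaping the counts into a borough-by-period table and dividing each column by its period length. The sketch below is one possible version on made-up data: the `toy` records, the borough codes, and the day counts in `period_days` are all hypothetical.

```python
import pandas as pd

# Hypothetical records: one row per arrest, with a borough code and period label
toy = pd.DataFrame({
    'ARREST_BORO': ['K', 'K', 'K', 'K', 'M', 'M', 'M'],
    'period': ['Baseline', 'Baseline', 'Baseline', 'Pullback',
               'Baseline', 'Baseline', 'Pullback'],
})

# Hypothetical period lengths in days
period_days = pd.Series({'Baseline': 3, 'Pullback': 2})

# Rows = boroughs, columns = periods, values = arrest counts
counts = toy.groupby(['ARREST_BORO', 'period']).size().unstack(fill_value=0)

# Divide each column by its period length to get daily rates
daily_rates = counts.div(period_days, axis='columns')

# Same percentage change formula as in the loop
pct_change = ((daily_rates['Pullback'] - daily_rates['Baseline'])
              / daily_rates['Baseline']) * 100
print(pct_change.sort_values())
```

In the real notebook, `period_days` would be built from `baseline_days`, `pullback_days`, and `recovery_days`.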

## Exercise 5: Most Common Offense Descriptions - SOLUTION

**Task:** Find the top 5 most common offense descriptions for felonies and misdemeanors.

In [None]:
# SOLUTION
# Top 5 felony offense descriptions
felony_offenses = df[df['LAW_CAT_CD'] == 'F'].groupby('OFNS_DESC').size().sort_values(ascending=False)

print("Top 5 Felony Offense Descriptions:")
print(felony_offenses.head(5))

# Top 5 misdemeanor offense descriptions
misdemeanor_offenses = df[df['LAW_CAT_CD'] == 'M'].groupby('OFNS_DESC').size().sort_values(ascending=False)

print("\nTop 5 Misdemeanor Offense Descriptions:")
print(misdemeanor_offenses.head(5))
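
As an aside, `groupby(...).size().sort_values(ascending=False)` can be shortened with `value_counts()`, which counts and sorts in descending order in one step. The sketch below uses made-up rows (the `toy` frame and its offense labels are hypothetical).

```python
import pandas as pd

# Hypothetical records with a law category and an offense description
toy = pd.DataFrame({
    'LAW_CAT_CD': ['F', 'F', 'F', 'M', 'M', 'M'],
    'OFNS_DESC': ['ROBBERY', 'ROBBERY', 'BURGLARY',
                  'PETIT LARCENY', 'PETIT LARCENY', 'ASSAULT 3'],
})

# Filter rows and select the column in one .loc call, then count
top_felonies = toy.loc[toy['LAW_CAT_CD'] == 'F', 'OFNS_DESC'].value_counts().head(5)
top_misdemeanors = toy.loc[toy['LAW_CAT_CD'] == 'M', 'OFNS_DESC'].value_counts().head(5)

print(top_felonies)
print(top_misdemeanors)
```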

## Exercise 6: Pullback Analysis for Specific Offense - SOLUTION

**Task:** Analyze how the most common offense changed during the pullback.

In [None]:
# SOLUTION
# Choose the most common offense from Exercise 5 (likely DANGEROUS DRUGS for felonies)
# Let's analyze the most common felony offense
most_common_felony = felony_offenses.index[0]

print(f"Analyzing: {most_common_felony}")

# Filter for this offense
offense_df = df[df['OFNS_DESC'] == most_common_felony]

# Count by period
offense_by_period = offense_df.groupby('period').size()

print(f"\nArrests for {most_common_felony} by period:")
print(offense_by_period)

# Calculate daily rates
baseline_count = offense_by_period.get('Baseline', 0)
pullback_count = offense_by_period.get('Pullback', 0)

baseline_rate = baseline_count / baseline_days
pullback_rate = pullback_count / pullback_days

# Calculate percentage change
pct_change = ((pullback_rate - baseline_rate) / baseline_rate) * 100

print(f"\nDaily rate during baseline: {baseline_rate:.1f} arrests/day")
print(f"Daily rate during pullback: {pullback_rate:.1f} arrests/day")
print(f"Percentage change: {pct_change:+.1f}%")

# Compare to overall average from lecture (26.4% decrease)
# Check the sign explicitly: abs() would treat a 30% increase as a 30% drop
print("\nOverall average decrease was approximately 26.4%")
if pct_change < -26.4:
    print(f"{most_common_felony} dropped MORE than the overall average.")
    print("This suggests it's a more discretionary offense type.")
elif pct_change < 0:
    print(f"{most_common_felony} dropped LESS than the overall average.")
    print("This suggests it's a less discretionary offense type.")
else:
    print(f"{most_common_felony} did not drop during the pullback.")

## Summary

These exercises demonstrated key datetime operations:

1. **Extracting date components** using `.dt` accessor (Quick Check #1, Exercise 1)
2. **Creating time periods** with `.dt.to_period()` (Quick Check #2, Exercises 2 and 3)
3. **Filtering by date ranges** using comparison operators and `.between()` (Quick Checks #3 and #4)
4. **Calculating time differences** using `.days` (Quick Check #4)
5. **Creating period indicators** using `.loc` assignment (Setup)
6. **Grouping by multiple categories** including time periods (Exercises 1, 4, 6)

### Key Datetime Principles:

- Always convert string dates to datetime using `pd.to_datetime()`
- Use the `.dt` accessor to reach date components and methods on a datetime column
- Use `.to_period('M')` for proper monthly aggregation (not just `.dt.month`)
- Use `.between()` for cleaner date range filtering
- Calculate daily rates when comparing periods of different lengths
- Use percentage change formula: `((new - old) / old) * 100`
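
The point about `.to_period('M')` versus `.dt.month` is easiest to see on a few dates that span more than one year. In the sketch below (the `dates` Series is made up), `.dt.month` merges January 2014 with January 2015, while `.to_period('M')` keeps them apart.

```python
import pandas as pd

# Hypothetical dates: two Januaries from different years, plus one February
dates = pd.Series(pd.to_datetime(['2014-01-15', '2015-01-20', '2015-02-03']))

# Month number only: both Januaries fall into group 1
by_month_number = dates.groupby(dates.dt.month).size()

# Year-month period: each calendar month is its own group
by_year_month = dates.groupby(dates.dt.to_period('M')).size()

print(by_month_number)
print(by_year_month)
```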

### Remember:

- Datetime operations enable before/after comparisons
- Natural experiments (like the NYPD pullback) create opportunities for causal analysis
- Different categories (boroughs, offense types) can respond differently to the same event
- Visualization helps reveal patterns that numbers alone might hide