
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/8_datetime_operations.ipynb)

**IMPORTANT**: Save your own copy!
1. Click File → Save a copy in Drive
2. Rename it
3. Work in YOUR copy, not the original


---


# 8. Datetime Operations with Pandas
## CCJS 418E: Coding for Criminology

Today's Goals:
- Convert string dates to datetime objects
- Extract year, month, day from dates
- Filter data by date ranges
- Create time period indicators
- Use `.iloc` and `.loc` for indexing
- Aggregate data by time periods

We will learn to work with dates and analyze time-based patterns.

**Research Context:**

In December 2014, NYPD officers reduced enforcement activities following political tensions in New York City. This created a natural experiment to study:
- Which arrests depend on officer-initiated contact vs. citizen complaints
- How enforcement patterns change during pullback periods
- Whether different arrest types respond differently to policy changes

**Key Research Question:** When police reduce enforcement, which types of arrests change most? What does this reveal about discretionary vs. mandatory policing activities?


## Setup: Import Libraries and Load Data

We'll use NYPD arrest data from 2013-2015 to analyze enforcement patterns.

In [None]:
# First, import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/refs/heads/master/files/cfc/nypd_arrests_2013_2015_garner.csv')
print(f"Loaded {len(df):,} arrest records")
print(f"\nColumns: {list(df.columns)}")
print(f"\nFirst few rows:")
df.head()

---

## Part 1: Understanding Datetime Objects

**Problem**: Our ARREST_DATE column is stored as text, not as a datetime object.

**Why this matters**: You can't do date math on "12/03/2014" - Python sees it as text. But datetime objects allow:

- **Filtering by date ranges**
```python
df[df['ARREST_DATE'] >= pd.to_datetime('2014-01-01')]
```

- **Extracting year, month, day**
```python
df['ARREST_DATE'].dt.year
```

- **Calculating time differences**
```python
(end_date - start_date).days
```

- **Sorting chronologically**
```python
df.sort_values('ARREST_DATE')
```
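
A minimal sketch of these four operations on a small made-up DataFrame. The `toy` frame and its dates are invented for illustration and are not taken from the NYPD file; the `format=` string assumes month/day/year order.

```python
import pandas as pd

# Hypothetical toy data; dates are stored as MM/DD/YYYY strings
toy = pd.DataFrame({'ARREST_DATE': ['12/03/2014', '01/15/2014', '07/04/2013']})
toy['ARREST_DATE'] = pd.to_datetime(toy['ARREST_DATE'], format='%m/%d/%Y')

# Filtering by a date range
recent = toy[toy['ARREST_DATE'] >= pd.to_datetime('2014-01-01')]

# Extracting the year
years = toy['ARREST_DATE'].dt.year.tolist()

# Calculating a time difference in days
span_days = (toy['ARREST_DATE'].max() - toy['ARREST_DATE'].min()).days

# Sorting chronologically
ordered = toy.sort_values('ARREST_DATE')

print(len(recent), years, span_days)
print(ordered)
```

None of these lines would behave as intended if `ARREST_DATE` were still a column of strings.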


### Check the data type

Let's first examine what type ARREST_DATE currently is.

In [None]:
# Check the data type of ARREST_DATE
print(f"Data type: {df['ARREST_DATE'].dtype}")
print(f"\nFirst few dates:")
print(df['ARREST_DATE'].head())

# It's currently 'object' which means text/string

### String vs Datetime: The Problem

When dates are stored as strings, Python can't understand them as dates.

In [None]:
# Let's see the problem with string dates
print("As a string (current state):")
print(f"  First value: {df['ARREST_DATE'].head(1).values[0]}")
print(f"  Type: {type(df['ARREST_DATE'].head(1).values[0])}")

# Try to filter by date - THIS WON'T WORK CORRECTLY:
# String comparison gives wrong results
print("\nString comparison problem:")
print("  '2015-12-01' > '2015-02-01':", '2015-12-01' > '2015-02-01')  # Correct
print("  '2015-2-01' > '2015-12-01':", '2015-2-01' > '2015-12-01')   # WRONG! (string comparison)

print("\nWe need to convert to datetime to do date operations properly!")

### Converting to Datetime

Use `pd.to_datetime()` to convert strings to datetime objects.
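
A hedged sketch on a made-up Series, assuming the dates are written month/day/year as in "12/03/2014" (check this against your own file before relying on it). Passing `format=` removes the month/day ambiguity, and `errors='coerce'` turns unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw values; the last one is deliberately invalid
raw = pd.Series(['12/03/2014', '01/15/2015', 'not a date'])

# Assumption: month/day/year order
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

print(parsed)
print(f"Unparseable values: {parsed.isna().sum()}")
```

Counting the `NaT` values after conversion is a quick way to see whether any rows failed to parse.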

In [None]:
# Convert ARREST_DATE to datetime
df['ARREST_DATE'] = pd.to_datetime(df['ARREST_DATE'])

# Check the result
print(f"Data type after conversion: {df['ARREST_DATE'].dtype}")
print(f"\nFirst few dates:")
print(df['ARREST_DATE'].head())

# Now it's datetime64[ns] - a proper datetime object!

### The .dt Accessor: Your Gateway to Date Components

Once a column is datetime, you can use the `.dt` accessor to extract parts of the date.

**Think of `.dt` as unlocking date-specific operations:**

```python
df['date_column'].dt.year        # Extract year (2014, 2015, etc.)
df['date_column'].dt.month       # Extract month (1-12)
df['date_column'].dt.day         # Extract day (1-31)
df['date_column'].dt.day_name()  # Day of week ('Monday', 'Tuesday', etc.)
df['date_column'].dt.to_period('M')  # Convert to monthly period (2014-01, 2014-02, etc.)
```

**Important**: `.dt` ONLY works on datetime columns, not strings!
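
To see what that failure looks like, here is a small sketch with an invented two-row Series (`text_dates` is a hypothetical name, not part of the lecture data):

```python
import pandas as pd

# Hypothetical column of date strings that has not been converted yet
text_dates = pd.Series(['2014-12-03', '2015-01-15'])

dt_failed = False
try:
    text_dates.dt.year
except AttributeError as err:
    dt_failed = True
    print(f"Error on strings: {err}")

# After conversion, the same call works
real_dates = pd.to_datetime(text_dates)
years = real_dates.dt.year.tolist()
print(years)
```

If you hit this `AttributeError` in your own work, check `df.dtypes` first and convert the column with `pd.to_datetime()`.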

In [None]:
# Examples of .dt accessor
# Show on the whole column first
print("Extracting date components from the entire column:")
print("\nFirst 5 years:")
print(df['ARREST_DATE'].dt.year.head())

print("\nFirst 5 months:")
print(df['ARREST_DATE'].dt.month.head())

print("\nFirst 5 day names:")
print(df['ARREST_DATE'].dt.day_name().head())

### 🎯 QUICK CHECK #1
Extract the year, month, and day of week from ARREST_DATE.
Store them in new columns called 'year', 'month', and 'day_of_week'.

Use the `.dt` accessor!

In [None]:
# Your code here:




<details>
<summary>Click for solution</summary>

```python
# Extract date components using .dt
df['year'] = df['ARREST_DATE'].dt.year
df['month'] = df['ARREST_DATE'].dt.month
df['day_of_week'] = df['ARREST_DATE'].dt.day_name()

print(df[['ARREST_DATE', 'year', 'month', 'day_of_week']].head(10))
```
</details>

---

## Part 2: Understanding `.iloc` vs `.loc`

Before we work with time series data, let's understand two important indexing methods:
- **`.iloc`**: Position-based indexing (like array indices: 0, 1, 2...)
- **`.loc`**: Label-based indexing (uses index labels or boolean conditions)

We'll need both of these throughout the lecture.
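
The difference becomes visible once the index labels no longer match the row positions, which happens after any filter. A small sketch with made-up values (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the default index 0, 1, 2, 3
toy = pd.DataFrame({'LAW_CAT_CD': ['M', 'F', 'M', 'F'],
                    'ARREST_BORO': ['K', 'M', 'B', 'Q']})

# Filtering keeps the original index labels (here 1 and 3)
felonies = toy[toy['LAW_CAT_CD'] == 'F']

first_by_position = felonies.iloc[0]   # first row of the filtered data
row_by_label = felonies.loc[3]         # the row whose index label is 3

print(felonies.index.tolist())
print(first_by_position['ARREST_BORO'], row_by_label['ARREST_BORO'])
```

In this sketch `felonies.loc[0]` would raise a `KeyError`, because no row in the filtered data carries the label 0.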

### Position-Based Indexing (.iloc)

Use `.iloc` when you want to select rows by position. Positions start at 0, so the 1st row is position 0, the 2nd row is position 1, and so on.

In [None]:
# Position-based indexing with .iloc
print("First 5 rows using .iloc[0:5]:")
print(df[['ARREST_DATE', 'LAW_CAT_CD']].iloc[0:5])

print("\nLast 3 rows using .iloc[-3:]:")
print(df[['ARREST_DATE', 'LAW_CAT_CD']].iloc[-3:])


# Get a specific row by position
print(f"\n10th row (position 9): {df['ARREST_DATE'].iloc[9]}")

### .loc vs Non-.loc Filtering: What's the Difference?

You might be wondering: "We've been filtering with `df[df['column'] == value]` - why do we need `.loc`?"

**Short answer:** For simple row filtering, they're the same! But `.loc` gives you more power.

**Example - These are equivalent for filtering:**
```python
# Without .loc
felonies = df[df['LAW_CAT_CD'] == 'F']

# With .loc
felonies = df.loc[df['LAW_CAT_CD'] == 'F']
```

**But .loc lets you do MORE:**

1. **Select specific columns from filtered rows:**
```python
# Get just ARREST_DATE and ARREST_BORO for felonies
df.loc[df['LAW_CAT_CD'] == 'F', ['ARREST_DATE', 'ARREST_BORO']]
```

2. **Assign values to filtered rows:**
```python
# Label felonies in a new column (rows that don't match are left as NaN)
df.loc[df['LAW_CAT_CD'] == 'F', 'serious'] = 'Yes'
```

**When to use which:**
- `df[condition]` - Simple filtering, getting all columns
- `df.loc[condition]` - Filtering + column selection OR when assigning values
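
One detail of `.loc` assignment is easy to miss: rows that do not match the condition are left as missing values if the column did not exist before. A sketch on invented data (the `toy` frame and the 'No' default are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'LAW_CAT_CD': ['F', 'M', 'F', 'V']})

# Assigning through .loc creates the column; unmatched rows become NaN
toy.loc[toy['LAW_CAT_CD'] == 'F', 'serious'] = 'Yes'
missing_before = toy['serious'].isna().sum()
print(toy)

# Setting a default first avoids the missing values
toy['serious'] = 'No'
toy.loc[toy['LAW_CAT_CD'] == 'F', 'serious'] = 'Yes'
print(toy)
```

Setting a default and then overwriting the matching rows is the same pattern used later in the lecture to build the `period` column.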

### Label-Based Indexing (.loc)

Use `.loc` when you want to:
1. Select by label values (like month names)
2. Use boolean conditions to filter rows
3. Assign values to filtered rows

In [None]:
# Label-based indexing with .loc

# Example 1: Simple filtering (same as df[condition])
print("Method 1 - Without .loc:")
felonies_v1 = df[df['LAW_CAT_CD'] == 'F']
print(f"  Found {len(felonies_v1):,} felonies")

print("\nMethod 2 - With .loc (same result):")
felonies_v2 = df.loc[df['LAW_CAT_CD'] == 'F']
print(f"  Found {len(felonies_v2):,} felonies")

# Example 2: .loc's extra power - select specific columns
print("\n.loc's extra power - filter AND select columns:")
felony_dates = df.loc[df['LAW_CAT_CD'] == 'F', ['ARREST_DATE', 'ARREST_BORO']].head()
print(felony_dates)

print("\n.loc is especially useful for:")
print("  1. Filtering with conditions")
print("  2. Selecting specific columns from filtered data")
print("  3. Assigning new values to filtered rows (we'll use this later!)")

### 🎯 QUICK CHECK #2
Use `.iloc` to select rows 100 through 105 (6 rows total).
Then use `.loc` to select all felony arrests (LAW_CAT_CD == 'F') and show just the ARREST_DATE column for the first 5.

In [None]:
# Your code here:




<details>
<summary>Click for solution</summary>

```python
# Position-based with .iloc
print("Rows 100-105 using .iloc:")
print(df.iloc[100:106])  # Remember: end is exclusive

# Label-based with .loc
print("\nFirst 5 felony arrests using .loc:")
felonies = df.loc[df['LAW_CAT_CD'] == 'F', 'ARREST_DATE'].head(5)
print(felonies)
```
</details>

---

## Part 3: Understanding Time Periods

When analyzing trends over time, we often want to group dates into periods (months, quarters, years).

**The `.to_period()` method converts dates to time periods.**

### Why Use .to_period()?

Compare these two approaches:

**Bad approach** - Extract month as number:
```python
df['month'] = df['ARREST_DATE'].dt.month  # Returns 1, 2, 3, ...
# Problem: Can't tell which year! All Januarys become "1"
```

**Good approach** - Use periods:
```python
df['year_month'] = df['ARREST_DATE'].dt.to_period('M')  # Returns 2014-01, 2014-02, ...
# Each month is unique and sorts correctly!
```
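
A small sketch of why this matters for counting, using three made-up dates (the `dates` Series is hypothetical):

```python
import pandas as pd

# Hypothetical dates: two Januarys in different years plus one February
dates = pd.Series(pd.to_datetime(['2014-01-10', '2015-01-20', '2015-02-05']))

by_month_number = dates.groupby(dates.dt.month).size()
by_period = dates.groupby(dates.dt.to_period('M')).size()

print(by_month_number)  # January 2014 and January 2015 are pooled together
print(by_period)        # each year-month gets its own row
```

Grouping by month number is still useful for seasonal questions (for example, "are Decembers usually slow?"), but it cannot show a trend across years.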

### Create Monthly Periods

In [None]:
# Create monthly period using .dt.to_period('M')
# This converts each date to its month: 2014-12-15 becomes 2014-12
df['year_month'] = df['ARREST_DATE'].dt.to_period('M')

print("Original date vs Monthly period:")
print(df[['ARREST_DATE', 'year_month']].head(10))

print(f"\nData type: {df['year_month'].dtype}")
print(f"Unique months: {df['year_month'].nunique()}")
print(f"Date range: {df['year_month'].min()} to {df['year_month'].max()}")

### Other Period Types

You can create different time periods:

In [None]:
# Different period types
df['year_quarter'] = df['ARREST_DATE'].dt.to_period('Q')  # Quarterly
df['year_only'] = df['ARREST_DATE'].dt.to_period('Y')     # Yearly

print("Different period types:")
print(df[['ARREST_DATE', 'year_month', 'year_quarter', 'year_only']].head(10))

# Clean up demo columns
df = df.drop(columns=['year_quarter', 'year_only'])

### 🎯 QUICK CHECK #3
Create a quarterly period column called 'quarter' using `.dt.to_period('Q')`.
How many unique quarters are in the dataset?

In [None]:
# Your code here:




<details>
<summary>Click for solution</summary>

```python
df['quarter'] = df['ARREST_DATE'].dt.to_period('Q')
print(f"Unique quarters: {df['quarter'].nunique()}")
print(f"\nQuarters in dataset:")
print(df['quarter'].unique())

# Clean up
df = df.drop(columns=['quarter'])
```
</details>

---

## Part 4: Exploratory Analysis - Finding the Pullback Period

Before we can study the pullback effect, we need to see it in the data!

Let's count arrests by month and visualize the trend to identify when enforcement dropped.

### Count Arrests by Month

In [None]:
# Count arrests by month using groupby
monthly_arrests = df.groupby('year_month').size()

print("Monthly arrests:")
print(monthly_arrests)

# Find the month with fewest arrests
min_month = monthly_arrests.idxmin()
min_arrests = monthly_arrests.min()

print(f"\nMonth with fewest arrests: {min_month} ({min_arrests:,} arrests)")

### Visualize the Trend

Let's plot monthly arrests to see the pullback visually using seaborn.

In [None]:
# Create time series plot with seaborn
plt.figure(figsize=(14, 6))

# Convert to DataFrame for seaborn
monthly_df = monthly_arrests.reset_index()
monthly_df.columns = ['year_month', 'arrests']
monthly_df['year_month_str'] = monthly_df['year_month'].astype(str)

# Create the plot
sns.lineplot(data=monthly_df, x='year_month_str', y='arrests', marker='o', linewidth=2)

plt.title('NYPD Monthly Arrests (2013-2015)', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Number of Arrests')
plt.xticks(range(0, len(monthly_df), 3), monthly_df['year_month_str'][::3], rotation=45)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nWhat do you notice about the pattern?")
print("When do arrests drop dramatically?")

### Define the Pullback Period

Based on the visualization, we can see arrests drop sharply around December 2014 and stay low through early 2015.

**Research context**: Political tensions escalated in early December 2014, and the enforcement pullback lasted through February 2015.

In [None]:
# Define key dates based on what we observed
pullback_start = pd.to_datetime('2014-12-01')
pullback_end = pd.to_datetime('2015-02-28')

print(f"Pullback period: {pullback_start.strftime('%B %d, %Y')} to {pullback_end.strftime('%B %d, %Y')}")
print(f"\nThese are datetime objects we'll use for filtering:")
print(f"  Type: {type(pullback_start)}")

---

## Part 5: Filtering by Date Ranges

Now that we've identified the pullback period, let's filter the data to compare different time periods.

### Method 1: Using Comparison Operators

The straightforward way to filter by dates.

In [None]:
# Create three datasets using comparison operators:
# 1. Baseline: before Dec 1, 2014
baseline = df[df['ARREST_DATE'] < pullback_start]

# 2. Pullback: Dec 1, 2014 - Feb 28, 2015
pullback = df[(df['ARREST_DATE'] >= pullback_start) & (df['ARREST_DATE'] <= pullback_end)]

# 3. Recovery: after Feb 28, 2015
recovery = df[df['ARREST_DATE'] > pullback_end]

print(f"Baseline arrests: {len(baseline):,}")
print(f"Pullback arrests: {len(pullback):,}")
print(f"Recovery arrests: {len(recovery):,}")

### Method 2: Using .between()

A cleaner way to filter for dates in a range.

In [None]:
# Using .between() for the pullback period
pullback_between = df[df['ARREST_DATE'].between(pullback_start, pullback_end)]

print(f"Pullback arrests (using .between): {len(pullback_between):,}")

# Verify it's the same as Method 1
print(f"Same result as Method 1? {len(pullback_between) == len(pullback)}")

# .between() is especially useful for complex date ranges
print("\nExample: Arrests in Q4 2014 (Oct 1 - Dec 31):")
q4_2014 = df[df['ARREST_DATE'].between(pd.to_datetime('2014-10-01'), pd.to_datetime('2014-12-31'))]
print(f"  Count: {len(q4_2014):,}")

### 🎯 QUICK CHECK #4
Filter the data for arrests that occurred in January 2015 only.
How many arrests were there?

Try both methods: comparison operators and `.between()`

In [None]:
# Your code here:




<details>
<summary>Click for solution</summary>

```python
# Method 1: Comparison operators
jan_2015_v1 = df[(df['ARREST_DATE'] >= pd.to_datetime('2015-01-01')) &
                 (df['ARREST_DATE'] <= pd.to_datetime('2015-01-31'))]

# Method 2: .between()
jan_2015_v2 = df[df['ARREST_DATE'].between(pd.to_datetime('2015-01-01'),
                                            pd.to_datetime('2015-01-31'))]

print(f"January 2015 arrests (Method 1): {len(jan_2015_v1):,}")
print(f"January 2015 arrests (Method 2): {len(jan_2015_v2):,}")
print(f"Same result? {len(jan_2015_v1) == len(jan_2015_v2)}")
```
</details>
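
One caveat worth checking on other datasets: `pd.to_datetime('2015-01-31')` means midnight at the start of January 31, so a `<=` or `.between()` filter would miss any record later that day if the column carried clock times. The sketch below uses made-up timestamps to show the difference. It assumes nothing about the NYPD file, whose dates may have no time component at all.

```python
import pandas as pd

# Hypothetical timestamps that include a time of day
stamps = pd.Series(pd.to_datetime(
    ['2015-01-01 00:00', '2015-01-31 18:30', '2015-02-01 00:00']))

start = pd.to_datetime('2015-01-01')
end = pd.to_datetime('2015-01-31')          # midnight at the START of Jan 31
next_month = pd.to_datetime('2015-02-01')

# Closed range ending at midnight: misses 18:30 on Jan 31
closed = stamps[stamps.between(start, end)]

# Half-open range [start, next_month): keeps all of January
half_open = stamps[(stamps >= start) & (stamps < next_month)]

# Same half-open range written with .between()
left_closed = stamps[stamps.between(start, next_month, inclusive='left')]

print(len(closed), len(half_open), len(left_closed))
```

When a column holds dates only, both approaches give the same rows; the half-open version is simply the safer habit.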

---

## Part 6: Calculating Daily Rates

To fairly compare periods of different lengths, we need to calculate daily arrest rates.

### Understanding .days for Time Differences

When you subtract two datetime objects, you get a `timedelta` object. Use `.days` to get the number of days.
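
A short sketch using the pullback dates from this lecture. It also shows why an inclusive count needs `+ 1`: subtraction gives the gap between two dates, not the number of calendar days covered.

```python
import pandas as pd

start = pd.to_datetime('2014-12-01')
end = pd.to_datetime('2015-02-28')

gap = end - start          # a Timedelta object
print(gap, type(gap).__name__)

elapsed = gap.days         # days between the two dates
inclusive = gap.days + 1   # calendar days when both endpoints count
print(elapsed, inclusive)
```

December (31) + January (31) + February (28) adds up to the inclusive count.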

In [None]:
# Calculate number of days in each period
# Subtracting datetimes gives a timedelta object
time_difference = pullback_start - df['ARREST_DATE'].min()

print(f"Time difference object: {time_difference}")
print(f"Type: {type(time_difference)}")

# Extract just the number of days using .days
baseline_days = (pullback_start - df['ARREST_DATE'].min()).days
pullback_days = (pullback_end - pullback_start).days + 1  # +1 to include both endpoints
recovery_days = (df['ARREST_DATE'].max() - pullback_end).days

print(f"\nBaseline period: {baseline_days} days")
print(f"Pullback period: {pullback_days} days")
print(f"Recovery period: {recovery_days} days")

### Calculate Daily Arrest Rates

In [None]:
# Calculate daily rates for each period
baseline_daily = len(baseline) / baseline_days
pullback_daily = len(pullback) / pullback_days
recovery_daily = len(recovery) / recovery_days

print("Daily arrest rates:")
print(f"  Baseline: {baseline_daily:.1f} arrests/day")
print(f"  Pullback: {pullback_daily:.1f} arrests/day")
print(f"  Recovery: {recovery_daily:.1f} arrests/day")

### Understanding Percentage Change

**Percentage change formula:**

```
percentage_change = ((new_value - old_value) / old_value) × 100
```

**Example:**
- Old value: 100 arrests/day
- New value: 60 arrests/day
- Change: (60 - 100) / 100 = -0.40 = -40%

A negative percentage means a decrease.
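
The formula can be wrapped in a small helper to check the worked example. The function name `percent_change` is a hypothetical helper for illustration, not something defined elsewhere in this notebook.

```python
def percent_change(old_value, new_value):
    """Percentage change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100

drop = percent_change(100, 60)      # falling from 100 to 60
rebound = percent_change(60, 100)   # rising from 60 back to 100

print(f"Drop:    {drop:+.1f}%")
print(f"Rebound: {rebound:+.1f}%")
```

The two numbers differ in size because each uses a different starting value as its denominator. Keep this in mind when comparing a drop with the recovery that follows it.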

In [None]:
# Calculate percentage change from baseline to pullback
pct_change = ((pullback_daily - baseline_daily) / baseline_daily) * 100

print(f"Percentage change during pullback: {pct_change:.1f}%")

if pct_change < 0:
    print(f"\nInterpretation: Arrests dropped by {abs(pct_change):.1f}% during the pullback")
else:
    print(f"\nInterpretation: Arrests increased by {pct_change:.1f}% during the pullback")

### 🎯 QUICK CHECK #5
Calculate the percentage change in daily arrest rates from pullback to recovery.
Did enforcement return to baseline levels?

In [None]:
# Your code here:




<details>
<summary>Click for solution</summary>

```python
# Calculate percentage change from pullback to recovery
pct_change_recovery = ((recovery_daily - pullback_daily) / pullback_daily) * 100

print(f"Daily rate during pullback: {pullback_daily:.1f} arrests/day")
print(f"Daily rate during recovery: {recovery_daily:.1f} arrests/day")
print(f"Percentage change: {pct_change_recovery:+.1f}%")

# Compare to original baseline
print(f"\nOriginal baseline: {baseline_daily:.1f} arrests/day")
print(f"Recovery as % of baseline: {(recovery_daily / baseline_daily * 100):.1f}%")
```
</details>

---

## Part 7: Creating Time Period Indicators

Instead of keeping three separate filtered datasets, we can create a column that labels each arrest by period. This makes groupby operations easier.

### Using .loc to Assign Period Labels

We'll use `.loc` with boolean conditions to assign labels.
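
Before running this on the full dataset, it can help to trace the three steps on a few made-up dates that sit exactly on the boundaries. The `toy` frame below is hypothetical; the two cutoff dates are the ones defined earlier in the lecture.

```python
import pandas as pd

pullback_start = pd.to_datetime('2014-12-01')
pullback_end = pd.to_datetime('2015-02-28')

# Hypothetical dates chosen to sit on and around the boundaries
toy = pd.DataFrame({'ARREST_DATE': pd.to_datetime(
    ['2014-11-30', '2014-12-01', '2015-02-28', '2015-03-01'])})

# Step 1: default label for every row
toy['period'] = 'Baseline'

# Step 2: overwrite rows inside the pullback window (both endpoints included)
toy.loc[toy['ARREST_DATE'].between(pullback_start, pullback_end), 'period'] = 'Pullback'

# Step 3: overwrite rows after the window
toy.loc[toy['ARREST_DATE'] > pullback_end, 'period'] = 'Recovery'

print(toy)
```

Both boundary dates end up labeled 'Pullback' because `.between()` includes its endpoints by default.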

In [None]:
# Create period indicator using .loc assignment
# Step 1: Initialize all rows as 'Baseline'
df['period'] = 'Baseline'

# Step 2: Update rows in the pullback period
df.loc[df['ARREST_DATE'].between(pullback_start, pullback_end), 'period'] = 'Pullback'

# Step 3: Update rows in the recovery period
df.loc[df['ARREST_DATE'] > pullback_end, 'period'] = 'Recovery'

print("Distribution of arrests by period:")
print(df['period'].value_counts().sort_index())

# Verify our counts match the filtered datasets
print("\nVerification (should match our earlier filtered datasets):")
print(f"  Baseline: {(df['period'] == 'Baseline').sum():,} (should be {len(baseline):,})")
print(f"  Pullback: {(df['period'] == 'Pullback').sum():,} (should be {len(pullback):,})")
print(f"  Recovery: {(df['period'] == 'Recovery').sum():,} (should be {len(recovery):,})")

### Compare Periods Using Groupby

Now we can easily compare periods using groupby.
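
Dividing one Series by another matches values by index label rather than by position, which is what makes a counts-divided-by-days calculation safe. A sketch with invented numbers (both Series are hypothetical):

```python
import pandas as pd

# Hypothetical counts and period lengths, deliberately listed in different orders
counts = pd.Series({'Baseline': 7000, 'Pullback': 450, 'Recovery': 3000})
days = pd.Series({'Recovery': 300, 'Baseline': 700, 'Pullback': 90})

# Division matches values by index label, not by position
rates = counts / days
print(rates)
```

If a label were missing from either Series, the result for that label would be `NaN`, so it is worth checking that the period names match exactly.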

In [None]:
# Count arrests by period
period_counts = df.groupby('period').size()

print("Arrests by period:")
print(period_counts)

# Calculate daily rates for each period
period_days = {'Baseline': baseline_days, 'Pullback': pullback_days, 'Recovery': recovery_days}
period_daily_rates = period_counts / pd.Series(period_days)

print("\nDaily arrest rates by period:")
print(period_daily_rates.round(1))

---

## Part 8: Visualizing the Pullback Effect with Context

Let's create an informative visualization showing the baseline average and marking the pullback period.

In [None]:
# Create enhanced time series plot
plt.figure(figsize=(14, 6))

# Prepare data
monthly_df = monthly_arrests.reset_index()
monthly_df.columns = ['year_month', 'arrests']
monthly_df['month_num'] = range(len(monthly_df))

# Find indices for pullback period
pullback_start_idx = list(monthly_arrests.index).index(pd.Period('2014-12'))
pullback_end_idx = list(monthly_arrests.index).index(pd.Period('2015-02'))

# Plot the line
sns.lineplot(data=monthly_df, x='month_num', y='arrests', marker='o', linewidth=2, markersize=4)

# Shade the pullback period
plt.axvspan(pullback_start_idx, pullback_end_idx, alpha=0.2, color='red', label='Pullback Period')

# Add vertical lines at boundaries
plt.axvline(x=pullback_start_idx, color='red', linestyle='--', alpha=0.7)
plt.axvline(x=pullback_end_idx, color='red', linestyle='--', alpha=0.7)

# Add horizontal line showing baseline average
baseline_avg = monthly_arrests.iloc[:pullback_start_idx].mean()
plt.axhline(y=baseline_avg, color='blue', linestyle=':', alpha=0.5, linewidth=2, label='Baseline Average')

# Labels
x_labels = [str(period) for period in monthly_arrests.index]
plt.title('NYPD Monthly Arrests (2013-2015): Pullback Effect', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Number of Arrests')
plt.xticks(range(0, len(monthly_df), 3), [x_labels[i] for i in range(0, len(x_labels), 3)], rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Baseline average: {baseline_avg:.0f} arrests/month")
print(f"Pullback average: {monthly_arrests.iloc[pullback_start_idx:pullback_end_idx+1].mean():.0f} arrests/month")

---

## Part 9: Differential Impact by Arrest Type

**Research Question**: Did all arrest types drop equally during the pullback, or were some affected more than others?

**Hypothesis**: Discretionary arrests (misdemeanors) should drop more than mandatory arrests (felonies).

### Compare Felonies vs Misdemeanors

Let's see how different offense levels changed during the pullback.
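
The analysis relies on grouping by two columns and then pivoting one of them into columns with `.unstack()`. A sketch of that pattern on a handful of made-up records (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical toy records
toy = pd.DataFrame({
    'period':     ['Baseline', 'Baseline', 'Baseline', 'Pullback', 'Pullback'],
    'LAW_CAT_CD': ['F', 'M', 'M', 'F', 'M'],
})

# Count each (period, category) pair, then pivot categories into columns
table = toy.groupby(['period', 'LAW_CAT_CD']).size().unstack(fill_value=0)
print(table)

# Look up a single cell by row label and column label
baseline_misd = table.loc['Baseline', 'M']
print(baseline_misd)
```

The `fill_value=0` argument matters when some period has no arrests in a category; without it that cell would be `NaN`.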

In [None]:
# Group by period and law category
period_lawcat = df.groupby(['period', 'LAW_CAT_CD']).size().unstack(fill_value=0)

print("Arrests by period and law category:")
print(period_lawcat)

# Calculate percentage change for each category
print("\nPercentage change during pullback (from baseline):")

for lawcat in ['F', 'M']:  # Felony, Misdemeanor
    if lawcat in period_lawcat.columns:
        baseline_count = period_lawcat.loc['Baseline', lawcat]
        pullback_count = period_lawcat.loc['Pullback', lawcat]

        baseline_rate = baseline_count / baseline_days
        pullback_rate = pullback_count / pullback_days

        pct_change = ((pullback_rate - baseline_rate) / baseline_rate) * 100

        cat_name = 'Felony' if lawcat == 'F' else 'Misdemeanor'
        print(f"  {cat_name}: {pct_change:+.1f}%")

### Visualize the Comparison

In [None]:
# Create side-by-side comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Absolute counts by period - using seaborn
period_lawcat_df = period_lawcat.reset_index()
period_lawcat_melted = period_lawcat_df.melt(id_vars='period', value_vars=['F', 'M'],
                                               var_name='Law Category', value_name='Arrests')
period_lawcat_melted['Law Category'] = period_lawcat_melted['Law Category'].map({'F': 'Felony', 'M': 'Misdemeanor'})

sns.barplot(data=period_lawcat_melted, x='period', y='Arrests', hue='Law Category', ax=ax1)
ax1.set_title('Arrests by Law Category and Period', fontweight='bold')
ax1.set_xlabel('Period')
ax1.set_ylabel('Total Arrests')

# Plot 2: Percentage change from baseline
categories = []
pct_changes = []

for lawcat, name in [('F', 'Felony'), ('M', 'Misdemeanor')]:
    if lawcat in period_lawcat.columns:
        baseline_count = period_lawcat.loc['Baseline', lawcat]
        pullback_count = period_lawcat.loc['Pullback', lawcat]

        baseline_rate = baseline_count / baseline_days
        pullback_rate = pullback_count / pullback_days

        pct_change = ((pullback_rate - baseline_rate) / baseline_rate) * 100
        categories.append(name)
        pct_changes.append(pct_change)

pct_df = pd.DataFrame({'Category': categories, 'Pct_Change': pct_changes})
colors = ['red' if x < 0 else 'green' for x in pct_changes]
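# seaborn 0.13+ expects `hue` whenever `palette` is given; legend=False hides the redundant legend
sns.barplot(data=pct_df, y='Category', x='Pct_Change', hue='Category', palette=colors, legend=False, ax=ax2)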
ax2.axvline(x=0, color='black', linestyle='-', linewidth=0.5)
ax2.set_title('Percentage Change During Pullback', fontweight='bold')
ax2.set_xlabel('% Change from Baseline')
ax2.set_ylabel('')

plt.tight_layout()
plt.show()

**Interpretation**:
- Which type of arrest dropped more during the pullback?
- What does this tell us about discretionary vs. mandatory policing?
- Did enforcement return to baseline levels after the pullback?

---

## Hands-On Exercises: Your Turn to Analyze

Use everything you learned today to answer these questions:

### Exercise 1: Day of Week Analysis
Calculate which day of the week has the highest average arrest count. Did this pattern change during the pullback?

Hint: Use the 'day_of_week' column you created earlier and groupby with 'period'.

In [None]:
# Your code here:




### Exercise 2: Quarterly Analysis
Create quarterly periods using `.dt.to_period('Q')`.
Count arrests by quarter and create a visualization showing the trend.

In [None]:
# Your code here:




### Exercise 3: Recovery Speed
Calculate how long it took for monthly arrests to return to at least 90% of the baseline average after the pullback ended.

Hint: Filter for recovery period months, compare each to baseline average using .loc.

In [None]:
# Your code here:




### Exercise 4: Borough Analysis
Investigate whether different boroughs (ARREST_BORO) experienced the pullback differently.
Calculate the percentage change for each borough from baseline to pullback.

Use the period column and groupby to make this easier!

In [None]:
# Your code here:




---

## Using AI for Datetime Questions

### Effective Prompts:

```
I have a pandas DataFrame with a column 'arrest_date' that contains dates.

How do I:
1. Convert it from string to datetime format
2. Extract the year, month, and day of week
3. Filter for dates between Jan 1, 2020 and Dec 31, 2020
4. Create monthly aggregates using .dt.to_period('M')
5. Create a column that labels dates as 'before' or 'after' a cutoff date using .loc assignment

Please show examples with explanation.
```

### Common AI Questions:
- "What's the difference between `.dt.to_period('M')` and `.dt.month`?"
- "How do I filter a DataFrame for dates in a specific range using .between()?"
- "What's the difference between `.iloc` and `.loc`?"
- "How do I convert a string like '12/25/2020' to datetime?"
- "How do I calculate the number of days between two dates using .days?"
- "Why do I get an error when using `.dt.year`?" (Answer: column must be datetime type!)
- "How do I use .loc to assign values to filtered rows?"

### When Asking for Help:
Always include:
1. Your data types (`df.dtypes`)
2. Sample of your data (`df.head()`)
3. What you tried
4. The error message (if any)

---

## Summary: Key Datetime Operations

**What we learned:**

1. **Converting to datetime**: `pd.to_datetime()` converts strings to datetime objects
2. **The .dt accessor**: Unlocks date-specific operations (`.dt.year`, `.dt.month`, etc.)
3. **Indexing**: `.iloc` (position) vs `.loc` (label/condition)
4. **Time periods**: Using `.dt.to_period('M')` for proper monthly periods
5. **Filtering by dates**: Using comparison operators and `.between()`
6. **Time differences**: `.days` attribute to get number of days
7. **Creating indicators**: Using `.loc` assignment to label time periods
8. **Percentage change**: `((new - old) / old) * 100`
9. **Visualization**: Using seaborn for cleaner time series plots

**Why this matters for research:**

- **Natural experiments**: Events like the NYPD pullback create opportunities to study causal effects
- **Before/after comparisons**: Datetime operations enable rigorous temporal analysis
- **Differential effects**: We can identify which outcomes are most sensitive to policy changes
- **Policy evaluation**: Understanding timing helps separate different factors affecting outcomes

## Quick Reference Card

```python
# Converting to datetime
df['date_col'] = pd.to_datetime(df['date_col'])

# Extracting date components (requires .dt accessor)
df['year'] = df['date_col'].dt.year
df['month'] = df['date_col'].dt.month
df['day'] = df['date_col'].dt.day
df['day_of_week'] = df['date_col'].dt.day_name()

# Creating period columns for aggregation
df['year_month'] = df['date_col'].dt.to_period('M')  # Monthly (2014-01, 2014-02)
df['year_quarter'] = df['date_col'].dt.to_period('Q')  # Quarterly (2014Q1, 2014Q2)

# Filtering by dates (always wrap strings in pd.to_datetime!)
cutoff_date = pd.to_datetime('2020-03-16')
before = df[df['date_col'] < cutoff_date]
after = df[df['date_col'] >= cutoff_date]
between = df[df['date_col'].between(start_date, end_date)]

# Time differences (get number of days)
num_days = (end_date - start_date).days

# Creating period indicators with .loc assignment
df['period'] = 'Baseline'  # Initialize
df.loc[df['date_col'].between(date1, date2), 'period'] = 'Period2'
df.loc[df['date_col'] > date2, 'period'] = 'Period3'

# Indexing
df.iloc[0:5]        # First 5 rows (position-based)
df.iloc[-5:]        # Last 5 rows (position-based)
df.loc[condition]   # Filter by condition (label-based)
df.loc[condition, 'column'] = value  # Assign to filtered rows

# Time aggregation
monthly_counts = df.groupby(df['date_col'].dt.to_period('M')).size()

# Percentage change
pct_change = ((new_value - old_value) / old_value) * 100
```

## Before Next Class

1. **Practice datetime operations:**
   - Convert string dates to datetime
   - Extract different date components
   - Filter by various date ranges

2. **Experiment with aggregation:**
   - Create monthly, quarterly aggregates
   - Use `.iloc` and `.loc` for different tasks
   - Combine groupby with time periods

3. **Apply to your own questions:**
   - Think of time-based questions about the data
   - Create before/after comparisons
   - Visualize trends over time

4. **Use AI when stuck:**
   - Ask about datetime formatting issues
   - Get help with complex date filtering
   - Learn about additional datetime methods