[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zjelveh/zjelveh.github.io/blob/master/files/cfc/7_visualization_simplified_solutions.ipynb)

**IMPORTANT**: Save your own copy!
1. Click File → Save a copy in Drive
2. Rename it
3. Work in YOUR copy, not the original

---

# Data Visualization with Seaborn - SOLUTIONS
## CCJS 418E: Coding for Criminology

This notebook contains worked solutions for the Lab 7 exercises.

In [None]:
import pandas as pd
import seaborn as sns

# Load the pretrial decisions data
df = pd.read_csv('https://raw.githubusercontent.com/zjelveh/zjelveh.github.io/refs/heads/master/files/cfc/bail_decisions_monthly.csv')
df.drop(columns=['pct_felony'], inplace=True)
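# Drop the statewide totals; .copy() lets us add columns below without a SettingWithCopyWarning
df = df[df.county != 'STATEWIDE'].copy()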

# Create the large_county column needed for some exercises
df['large_county'] = df.county.isin(['Baltimore City', 'Prince George',
                                      'Baltimore', 'Anne Arundel', 'Montgomery'])

# Create the post_reform column needed for some exercises
df['post_reform'] = df['months_from_reform'] > 0

df.head()
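
The two helper columns above are boolean (True/False) flags. As a minimal sketch of what `isin()` and `>` return, here is the same logic on a made-up four-row table. The county list is shortened and the values are invented for illustration; they are not from the real dataset.

```python
import pandas as pd

# Toy rows (invented for illustration, not from the bail dataset)
toy = pd.DataFrame({
    'county': ['Baltimore City', 'Garrett', 'Montgomery', 'Kent'],
    'months_from_reform': [-2, 0, 3, 5],
})

# isin() is True where the county appears in the list, False otherwise
toy['large_county'] = toy['county'].isin(['Baltimore City', 'Montgomery'])

# A comparison gives one True/False per row.
# Because we use > (not >=), month 0 itself is flagged False.
toy['post_reform'] = toy['months_from_reform'] > 0

print(toy)
```

Note that with `> 0`, the reform month itself (month 0) is grouped with the pre-reform rows.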

## Exercise 1: Line Plot Practice - SOLUTION

**Task:** Filter the data to just Baltimore City and create a line plot showing how `pct_hwob` (cash bail) changed over time.

In [None]:
# SOLUTION
# Filter to just Baltimore City
baltimore_data = df[df['county'] == 'Baltimore City']

# Create line plot showing cash bail (HWOB) over time
sns.lineplot(data=baltimore_data,
             x='months_from_reform',
             y='pct_hwob',
             marker='o')

print("\nInterpretation: We can clearly see cash bail usage dropped dramatically")
print("after reform (month 0) in Baltimore City.")

## Exercise 2: Grouped Line Plot - SOLUTION

**Task:** Create a line plot comparing `pct_hdob` (detention rates) over time for large vs small counties.

In [None]:
# SOLUTION
# Use the large_county column we created earlier
sns.lineplot(data=df,
             x='months_from_reform',
             y='pct_hdob',
             hue='large_county',
             errorbar=None,
             palette='Set2')

print("\nInterpretation: Large counties (True) and small counties (False)")
print("both show similar patterns - detention rates increased slightly after reform.")
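
Each group contains many counties per month, so `sns.lineplot` has to combine them; by default it plots the mean at each x value. The sketch below reproduces that calculation with pandas on invented numbers. The `county_size` column is a hypothetical stand-in for `large_county`, using text labels to keep the output easy to read.

```python
import pandas as pd

# Toy data (invented): several county rows per month and group
toy = pd.DataFrame({
    'months_from_reform': [-1, -1, -1, 1, 1, 1],
    'county_size': ['large', 'large', 'small', 'large', 'large', 'small'],
    'pct_hdob': [10.0, 20.0, 30.0, 20.0, 40.0, 50.0],
})

# Mean detention rate per group and month: these are the values
# each line would pass through when errorbar=None
line_values = (toy.groupby(['county_size', 'months_from_reform'])['pct_hdob']
                  .mean()
                  .unstack('county_size'))

print(line_values)
```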

## Exercise 3: Bar Chart Comparison - SOLUTION

**Task:** Create a bar chart comparing average `pct_hdob` (detention) across the top 5 largest counties.

In [None]:
# SOLUTION
# Filter to just the top 5 largest counties
top5_counties = df[df['county'].isin(['Baltimore City', 'Prince George', 
                                       'Baltimore', 'Anne Arundel', 'Montgomery'])]

# Create bar chart comparing detention rates
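# Note: recent seaborn versions (0.13+) warn if palette is used without hue,
# so we color by county and hide the redundant legend
sns.barplot(data=top5_counties,
            x='county',
            y='pct_hdob',
            hue='county',
            legend=False,
            errorbar=None,
            palette='Set2')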

print("\nInterpretation: This shows the average detention rate for each of the")
print("5 largest counties, aggregated across all time periods.")

## Exercise 4: Histogram Exploration - SOLUTION

**Task:** Create a histogram of `pct_hwob` with 15 bins.

In [None]:
# SOLUTION
# Create histogram of cash bail percentages
sns.histplot(data=df,
             x='pct_hwob',
             bins=15)

print("\nInterpretation: The distribution shows many county-months had very low")
print("cash bail usage (near 0%), likely after the reform. There's also a cluster")
print("of higher usage rates, probably from before the reform.")

## Exercise 5: Histogram Comparison - SOLUTION

**Task:** Create a histogram comparing the distribution of `pct_hwob` before and after reform.

In [None]:
# SOLUTION
# Compare distributions before and after reform
sns.histplot(data=df,
             x='pct_hwob',
             hue='post_reform',
             bins=15,
             stat='density',
             palette='Set2')

print("\nInterpretation: Before reform (False), cash bail usage was spread across")
print("a wide range. After reform (True), usage became concentrated near 0%,")
print("showing the reform successfully reduced cash bail. The distribution became")
print("LESS variable after reform - more concentrated around low values.")

## Exercise 6: Scatter Plot - SOLUTION

**Task:** Create a scatter plot showing the relationship between `pct_ror` and `pct_hwob`.

In [None]:
# SOLUTION
# Create scatter plot of ROR vs cash bail
sns.scatterplot(data=df,
                x='pct_ror',
                y='pct_hwob',
                alpha=0.5)

print("\nInterpretation: There appears to be a NEGATIVE relationship - as ROR")
print("(release) rates increase, cash bail rates tend to decrease. This makes")
print("sense: judges have limited options, so using more of one means less of another.")
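
To put a number on "negative relationship", one option is the correlation coefficient from pandas. The sketch below uses invented values; a result below 0 means one variable tends to fall as the other rises, and a value near -1 means the points lie close to a downward-sloping line.

```python
import pandas as pd

# Toy data (invented): ROR goes up while cash bail goes down
toy = pd.DataFrame({
    'pct_ror':  [20, 30, 40, 50, 60],
    'pct_hwob': [45, 38, 30, 21, 15],
})

# Pearson correlation between the two columns (ranges from -1 to 1)
r = toy['pct_ror'].corr(toy['pct_hwob'])
print("Correlation:", round(r, 2))
```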

## Exercise 7: Advanced Scatter Plot - SOLUTION

**Task:** Create a scatter plot of `pct_ror` vs `pct_hwob` with `hue='post_reform'`.

In [None]:
# SOLUTION
# Scatter plot colored by reform period
sns.scatterplot(data=df,
                x='pct_ror',
                y='pct_hwob',
                hue='post_reform',
                alpha=0.5,
                palette='Set2')

print("\nInterpretation: Before reform (False), points are spread across higher")
print("cash bail values. After reform (True), points cluster in the lower region")
print("showing less cash bail usage, regardless of ROR rate.")

In [None]:
# SOLUTION - Adding separate regression lines
# This is a bit advanced! We create two separate regplots, one for each period

# Filter data by period (post_reform is True/False, so ~ selects the False rows)
pre_reform = df[~df['post_reform']]
post_reform = df[df['post_reform']]

# Plot regression lines for each period; both calls draw on the same axes
sns.regplot(data=pre_reform,
            x='pct_ror',
            y='pct_hwob',
            scatter=True,
            label='Pre-reform',
            color='blue',
            scatter_kws={'alpha': 0.3})

ax = sns.regplot(data=post_reform,
                 x='pct_ror',
                 y='pct_hwob',
                 scatter=True,
                 label='Post-reform',
                 color='orange',
                 scatter_kws={'alpha': 0.3})

# The label= text only shows up once we ask for a legend
ax.legend()

print("\nInterpretation: The regression lines show the negative relationship")
print("exists in both periods, but the post-reform line is lower (less cash bail)")
print("across all ROR levels.")
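
As an alternative sketch, `sns.lmplot` accepts `hue=` directly, so it fits one regression line per group and adds a legend in a single call. The data below is invented, and `ci=None` simply turns off the shaded confidence band to keep the picture simple.

```python
import pandas as pd
import seaborn as sns

# Toy data (invented): four pre-reform and four post-reform points
toy = pd.DataFrame({
    'pct_ror':  [20, 30, 40, 50, 25, 35, 45, 55],
    'pct_hwob': [50, 44, 37, 30, 20, 16, 11, 5],
    'post_reform': [False] * 4 + [True] * 4,
})

# One call: scatter points plus a separate regression line per hue level
g = sns.lmplot(data=toy, x='pct_ror', y='pct_hwob', hue='post_reform',
               ci=None, scatter_kws={'alpha': 0.3})
```

Unlike `regplot`, `lmplot` creates its own figure and returns a grid object rather than a single Axes.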

## Exercise 8: Scatter Plot II - SOLUTION

**Task:** Create a scatter plot of `month` vs `n_total_hearings` with `hue='year'`.

In [None]:
# SOLUTION
# First, convert year to string so seaborn treats it as categorical
df['year_str'] = df['year'].astype(str)

# Create scatter plot
sns.scatterplot(data=df,
                x='month',
                y='n_total_hearings',
                hue='year_str',
                alpha=0.6,
                palette='Set2')

print("\nInterpretation: There's a clear SEASONAL PATTERN! Total hearings tend to")
print("be lower in certain months (likely summer months) and higher in others.")
print("This pattern repeats across years, shown by the vertical clustering of different")
print("colored points at similar months.")
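
A seasonal pattern read off a scatter plot can be double-checked with a small table. The sketch below, on invented numbers, uses `pivot_table` to show the average number of hearings for each month (rows) and year (columns); a repeating pattern shows up as the same months being low in every column.

```python
import pandas as pd

# Toy data (invented): three months in each of two years
toy = pd.DataFrame({
    'year':  [2016, 2016, 2016, 2017, 2017, 2017],
    'month': [1, 7, 12, 1, 7, 12],
    'n_total_hearings': [100, 80, 110, 90, 70, 100],
})

# Rows = month, columns = year, cells = average hearings
by_month = toy.pivot_table(index='month', columns='year',
                           values='n_total_hearings', aggfunc='mean')
print(by_month)
```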

## Summary

These exercises demonstrated:

1. **Line plots** show trends over time (Exercises 1-2)
2. **Bar charts** compare categories (Exercise 3)
3. **Histograms** reveal distributions (Exercises 4-5)
4. **Scatter plots** explore relationships between variables (Exercises 6-8)

### Key Seaborn Principles:

- Always specify `data=` and `x=`, plus `y=` for plots with two variables (histograms only need `x=`)
- Use `hue=` to add grouping by color
- Use `palette='Set2'` or `'colorblind'` for nice colors
- Use `alpha=0.5` to make overlapping points visible
- Use `errorbar=None` to turn off confidence intervals
- Filter your data BEFORE plotting to focus on specific subsets

### Remember:

- Seaborn makes visualization simple - it is built on matplotlib, but you don't need to call matplotlib directly for basic plots
- Always interpret your visualizations - what story do they tell?
- Pandas preparation (filtering, grouping) often comes before visualization
- Different plot types answer different questions - choose wisely!