# Urban Crime Analytics & Geospatial Intelligence System

## 1. Research Topic and Background
### 1.1. Introduction
This study examines arrest patterns across two major American metropolitan police departments, the New York Police Department (NYPD) and the Los Angeles Police Department (LAPD), from 2010 to 2019. Understanding temporal and spatial crime patterns is crucial for effective law enforcement resource allocation, community safety initiatives, and evidence-based policy development. Weisburd et al. [1] highlighted the importance of examining micro-geographic concentrations of crime, while MacDonald et al. [3] demonstrated how policy changes have influenced arrest patterns in major urban centers. This analysis builds on these works by comparing how arrest patterns manifest differently in America's two largest cities.

### 1.2. Research Questions
- How do temporal arrest patterns differ between NYPD and LAPD, and what might explain these differences?
- What geographic clustering patterns emerge in arrests, and how do these reflect the different urban structures of New York and Los Angeles?
- How have arrest patterns evolved over the 2010-2019 period, and what policy implications might this suggest?

### 1.3. Key Criminological Concepts
1. **Temporal crime patterns**: Cyclical variations in criminal activity based on time of day, day of week, month, or year
2. **Crime hotspots**: Geographic areas with disproportionately high concentrations of criminal activity
3. **Enforcement discretion**: The latitude officers have in deciding whether to make arrests for certain offenses
4. **Broken windows policing**: Enforcement strategy targeting minor offenses to prevent more serious crime
5. **Density-crime relationship**: Theoretical frameworks linking population density to crime rates and patterns
6. **Enforcement density**: The concentration of police resources relative to population and geography

### 1.4. Data Sources
The datasets used in this analysis come from publicly available police arrest records from the NYPD and LAPD, standardized to allow direct comparison. The NYPD dataset (https://data.cityofnewyork.us/Public-Safety/NYPD-Arrests-Data-Historic-/8h9b-rp9u/about_data) contains more than 5 million records, of which 3,414,946 are relevant to this analysis. The LAPD dataset (https://data.lacity.org/Public-Safety/Arrest-Data-from-2010-to-2019/yru6-6re4/about_data) contains 1,320,817 records over the same period.

These data were originally collected for administrative and operational purposes by each department, then released as part of open data initiatives to increase transparency in policing. While useful for research, these datasets reflect enforcement actions rather than actual crime rates, a crucial distinction when interpreting patterns.
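
As a rough illustration of how a larger export might be narrowed to the 2010 to 2019 study window, the sketch below filters a small made-up table by arrest year. The column names `ARREST_KEY` and `ARREST_DATE`, the date format, and the sample rows are assumptions for illustration; they are not taken from the project's loader.

```python
import pandas as pd

# Hypothetical sample rows; the real exports contain millions of records.
sample = pd.DataFrame({
    "ARREST_KEY": [1, 2, 3, 4],
    "ARREST_DATE": ["12/31/2009", "01/01/2010", "06/15/2015", "01/01/2020"],
})

def filter_to_study_window(df, date_col="ARREST_DATE", start=2010, end=2019):
    """Keep only rows whose arrest year falls within [start, end]."""
    # Unparseable dates become NaT and are excluded by the mask below.
    dates = pd.to_datetime(df[date_col], format="%m/%d/%Y", errors="coerce")
    mask = dates.dt.year.between(start, end)
    return df.loc[mask].copy()

in_window = filter_to_study_window(sample)
print(in_window)
```

Parsing with `errors="coerce"` is one possible design choice: malformed dates are silently dropped, so a real pipeline would probably want to count them first.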

### 1.5. Ethical and Legal Considerations
This analysis uses de-identified, aggregated data to protect individual privacy. However, historical biases in policing that have disproportionately affected minority communities must be acknowledged when interpreting these results. As Sampson and Loeffler [4] note, arrest concentrations often reflect a complex interplay of actual crime patterns, policy decisions, and potential enforcement biases.

### 1.6. Data Validity Assessment
While these datasets are comprehensive, several potential limitations exist:
1. Arrest data may reflect enforcement priorities rather than actual crime rates
2. Geocoding accuracy varies, particularly in densely populated areas
3. Cross-city comparisons require careful interpretation due to differences in legal codes, department policies, and reporting practices
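
The geocoding concern in item 2 could be screened with a simple bounding-box check, sketched below. The column names `Latitude` and `Longitude` and the box limits are assumptions (a rough rectangle around New York City, not an official boundary), so this is only an illustration of the idea.

```python
import pandas as pd

# Rough, assumed bounding box around New York City (not an official boundary).
NYC_BOUNDS = {"lat": (40.4, 41.0), "lon": (-74.3, -73.6)}

def flag_suspect_coordinates(df, bounds, lat_col="Latitude", lon_col="Longitude"):
    """Return a boolean Series: True where coordinates are missing or outside the box."""
    lat_ok = df[lat_col].between(*bounds["lat"])
    lon_ok = df[lon_col].between(*bounds["lon"])
    # Missing values fail both range checks, so they are flagged as well.
    return ~(lat_ok & lon_ok)

sample = pd.DataFrame({
    "Latitude": [40.75, 0.0, None, 40.68],
    "Longitude": [-73.99, 0.0, -73.95, -73.94],
})
suspect = flag_suspect_coordinates(sample, NYC_BOUNDS)
print(f"{suspect.mean():.0%} of sample rows have suspect coordinates")
```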

In [None]:
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt

# Add the project root to the python path to import src modules
sys.path.append(os.path.abspath('..'))

from src.io import load_data
from src.standardize import rename_nypd_columns, rename_lapd_columns, clean_nypd_data, clean_lapd_data
from src.data_processing import (
    standardize_age_categories, 
    standardize_gender, 
    standardize_race_ethnicity, 
    standardize_offense_categories,
    create_aligned_datasets,
    filter_datasets_by_year_range
)
import src.visualization as viz

# Define paths
DATA_DIR = '../data'
NYPD_PATH = os.path.join(DATA_DIR, 'nypd_aligned.csv')
LAPD_PATH = os.path.join(DATA_DIR, 'lapd_aligned.csv')

# Load data
print("Loading datasets...")
nypd_df = load_data(NYPD_PATH)
lapd_df = load_data(LAPD_PATH)

nypd_final = None
lapd_final = None

if nypd_df is not None and lapd_df is not None:
    # Check if data is already aligned (has standardized columns)
    required_aligned_cols = {'Data_Source', 'Arrest_Year', 'Offense_Std'}
    is_nypd_aligned = required_aligned_cols.issubset(nypd_df.columns)
    is_lapd_aligned = required_aligned_cols.issubset(lapd_df.columns)

    if is_nypd_aligned and is_lapd_aligned:
        print("Data is already aligned. Skipping ETL pipeline.")
        nypd_final, lapd_final = filter_datasets_by_year_range(nypd_df, lapd_df)
    else:
        print("Running ETL pipeline on raw data...")
        # 1. Rename columns to standard names
        nypd_df = rename_nypd_columns(nypd_df)
        lapd_df = rename_lapd_columns(lapd_df)

        # 2. Clean individual datasets
        nypd_df = clean_nypd_data(nypd_df)
        lapd_df = clean_lapd_data(lapd_df)

        # 3. Standardize categories across datasets
        nypd_df, lapd_df = standardize_age_categories(nypd_df, lapd_df)
        nypd_df, lapd_df = standardize_gender(nypd_df, lapd_df)
        nypd_df, lapd_df = standardize_race_ethnicity(nypd_df, lapd_df)
        nypd_df, lapd_df = standardize_offense_categories(nypd_df, lapd_df)

        # 4. Create aligned datasets with common columns
        nypd_aligned, lapd_aligned, common_cols = create_aligned_datasets(nypd_df, lapd_df)

        # 5. Filter to overlapping years
        nypd_final, lapd_final = filter_datasets_by_year_range(nypd_aligned, lapd_aligned)
    
    print(f"Final NYPD shape: {nypd_final.shape}")
    print(f"Final LAPD shape: {lapd_final.shape}")
else:
    print("Failed to load one or both datasets.")
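
The `src` helpers called in the cell above are not shown in this notebook. As a purely hypothetical sketch of what a step such as `standardize_gender` might do, the example below maps each dataset's raw sex codes onto a shared column. The function name `standardize_gender_sketch`, the input column names, the output column `Gender_Std`, and the code mapping are all assumptions, not the project's implementation.

```python
import pandas as pd

# Assumed raw codes; the departments' actual coding schemes may differ.
GENDER_MAP = {"M": "Male", "F": "Female"}

def standardize_gender_sketch(nypd_df, lapd_df, nypd_col="PERP_SEX", lapd_col="Sex Code"):
    """Hypothetical stand-in: map raw sex codes to a shared Gender_Std column."""
    standardized = []
    for df, col in ((nypd_df, nypd_col), (lapd_df, lapd_col)):
        df = df.copy()  # avoid mutating the caller's frame
        # Codes outside the mapping (including missing values) become "Unknown".
        df["Gender_Std"] = df[col].str.upper().map(GENDER_MAP).fillna("Unknown")
        standardized.append(df)
    return standardized[0], standardized[1]

nypd_demo = pd.DataFrame({"PERP_SEX": ["M", "F", "U"]})
lapd_demo = pd.DataFrame({"Sex Code": ["f", "M", None]})
nypd_std, lapd_std = standardize_gender_sketch(nypd_demo, lapd_demo)
print(nypd_std["Gender_Std"].tolist(), lapd_std["Gender_Std"].tolist())
```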

## 3.1. Temporal Crime Pattern
The primary goal of this visualization is to identify how population density might influence arrest patterns in these contrasting urban environments. New York City, with its extremely high population density, consistently shows significantly higher arrest volumes than Los Angeles, despite LA County's larger geographic area. This visualization helps explore whether this disparity is merely a function of overall population or if density itself creates unique crime dynamics.

Additional aims of this visualization include:
- Identifying whether temporal crime patterns are universal across different urban environments or are influenced by local density factors
- Examining how arrest patterns have evolved over a decade in both cities
- Detecting specific time periods where enforcement activities peak in each jurisdiction
- Providing visual evidence for how urban structure (compact vs. sprawling) might influence not just the volume but also the timing of arrests
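
To compare the timing of arrests independently of their volume, each department's counts can be converted to shares before plotting. The sketch below shows one way to do this for day of week; the column names `Data_Source` and `Arrest_Date` and the sample rows are assumptions, and this is not necessarily how the `viz` functions work internally.

```python
import pandas as pd

# Hypothetical aligned records; column names are assumptions for illustration.
records = pd.DataFrame({
    "Data_Source": ["NYPD", "NYPD", "NYPD", "LAPD", "LAPD"],
    "Arrest_Date": pd.to_datetime(
        ["2015-03-02", "2015-03-04", "2015-03-04", "2015-03-04", "2015-03-07"]
    ),
})

def weekday_shares(df, date_col="Arrest_Date", source_col="Data_Source"):
    """Share of each department's arrests on each weekday (each row sums to 1)."""
    weekday = df[date_col].dt.day_name()
    counts = pd.crosstab(df[source_col], weekday)
    return counts.div(counts.sum(axis=1), axis=0)

shares = weekday_shares(records)
print(shares.round(2))
```

Working with shares rather than raw counts keeps the roughly 2.6-fold volume gap between the departments from dominating the comparison of weekly rhythm.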

In [None]:
if nypd_final is not None and lapd_final is not None:
    # Yearly Trend
    fig_year = viz.plot_crime_by_year(nypd_final, lapd_final)
    fig_year.show()
    
    # Monthly Seasonality
    fig_month = viz.plot_crime_by_month(nypd_final, lapd_final)
    fig_month.show()
    
    # Day of Week
    fig_week = viz.plot_crime_by_weekday(nypd_final, lapd_final)
    fig_week.show()
    
    # Day of Month
    fig_dom = viz.plot_crime_by_day_of_month(nypd_final, lapd_final)
    fig_dom.show()

## 3.2. Map Visualization
This visualization explores the geographic distribution and intensity of crime in America's two largest cities, New York City and Los Angeles. New York City is the most populous city in the United States with 8.8 million residents (2020 census) concentrated in just 302 square miles, creating an exceptional population density of 29,302 people per square mile. In contrast, Los Angeles County is the most populous county in the United States with 9.8 million residents spread across 4,751 square miles, resulting in a much lower population density of 2,063 people per square mile. The City of Los Angeles itself contains 3.9 million residents across 503 square miles, with a density of 7,754 people per square mile—still only about a quarter of NYC's density.

Building on our previous temporal analysis—which revealed significant differences in arrest volumes despite LA County's larger geographic area—these heat maps illuminate how these dramatic population density differences and contrasting urban structures shape the spatial distribution of criminal activity.

The primary goal of this visualization is to demonstrate the "density effect" in criminal activity. Our earlier analysis showed that the NYPD consistently records about 2.6 times as many arrests as the LAPD, and these maps help explain why: arrests in New York concentrate intensely in specific areas because of the city's compact geography and high population density, while Los Angeles's sprawling urban landscape disperses them across a much wider area.

Additional aims of this visualization include:
- Identifying how different urban designs influence crime clustering patterns
- Examining the relationship between transportation infrastructure and crime hotspots
- Visualizing the stark contrast in crime density between a transit-oriented city (NYC) and a car-dependent city (LA)
- Providing visual evidence for why per-square-mile crime rates differ so dramatically between these cities
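
One simple way to quantify the clustering that a heat map shows visually is to bin coordinates on a regular grid and ask what fraction of points falls in the busiest cell. The sketch below does this on synthetic points over a unit square; the helper names `grid_counts` and `top_cell_share` are hypothetical, and this is not a description of how `viz.create_crime_density_comparison` is implemented.

```python
import numpy as np

def grid_counts(lat, lon, lat_range, lon_range, bins=4):
    """Count points per cell on a regular lat/lon grid."""
    counts, _, _ = np.histogram2d(lat, lon, bins=bins, range=[lat_range, lon_range])
    return counts

def top_cell_share(counts, k=1):
    """Fraction of all counted points that fall in the k busiest cells."""
    flat = np.sort(counts.ravel())[::-1]
    return flat[:k].sum() / flat.sum()

rng = np.random.default_rng(0)
# Synthetic coordinates on a unit square, not real arrest locations.
clustered = rng.normal(loc=0.625, scale=0.03, size=(1000, 2))
dispersed = rng.uniform(0.0, 1.0, size=(1000, 2))

unit = (0.0, 1.0)
clustered_share = top_cell_share(grid_counts(clustered[:, 0], clustered[:, 1], unit, unit))
dispersed_share = top_cell_share(grid_counts(dispersed[:, 0], dispersed[:, 1], unit, unit))
print(f"busiest-cell share: clustered={clustered_share:.2f}, dispersed={dispersed_share:.2f}")
```

The result depends on the grid resolution, so any comparison between cities would need cells of equal physical size.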

In [None]:
if nypd_final is not None and lapd_final is not None:
    fig = viz.create_crime_density_comparison(nypd_final, lapd_final)
    plt.show()

## 3.3. Categorical Data Visualization
This visualization examines the demographic patterns of arrests across NYC and LA, America's two most populous urban centers with dramatically different population densities and urban structures. Building on our previous temporal and spatial analyses—which revealed both higher arrest volumes and more concentrated crime hotspots in dense NYC compared to sprawling LA—this demographic breakdown explores whether these contrasting urban environments also produce different patterns in who gets arrested.

The primary goal of this visualization is to identify how population density and urban structure might influence not just where and when arrests occur, but who experiences them. While our previous analyses showed that NYPD makes significantly more arrests per capita and per square mile than LAPD, these charts help us understand whether the demographic composition of those arrests differs in ways that might reflect each city's unique population distribution, enforcement priorities, or community-police dynamics.

Additional aims of this visualization include:
- Examining racial disparities in arrests between high-density and lower-density urban environments
- Identifying similarities and differences in age and gender patterns across these contrasting cities
- Comparing offense type distributions to understand different enforcement priorities or criminal opportunities in dense versus sprawling urban settings
- Providing context for how demographic factors intersect with the spatial patterns observed in our crime density maps
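
The percentage breakdowns compared in this section can be computed as row-normalized cross-tabulations. The sketch below assumes a combined frame with a `Data_Source` column and a standardized category column such as `Gender_Std`; both names and the sample rows are illustrative assumptions.

```python
import pandas as pd

def category_percentages(df, category_col, source_col="Data_Source"):
    """Percentage of each department's arrests in each category (rows sum to 100)."""
    return pd.crosstab(df[source_col], df[category_col], normalize="index") * 100

# Hypothetical records for illustration only.
demo = pd.DataFrame({
    "Data_Source": ["NYPD"] * 4 + ["LAPD"] * 2,
    "Gender_Std": ["Male", "Male", "Male", "Female", "Male", "Female"],
})
pct = category_percentages(demo, "Gender_Std")
print(pct.round(1))
```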

In [None]:
if nypd_final is not None and lapd_final is not None:
    # Race Distribution
    fig_race = viz.plot_race_distribution(nypd_final, lapd_final)
    fig_race.show()
    
    # Gender Distribution
    fig_gender = viz.plot_gender_distribution(nypd_final, lapd_final)
    fig_gender.show()
    
    # Age Distribution
    fig_age = viz.plot_age_distribution(nypd_final, lapd_final)
    fig_age.show()
    
    # Offense Type Distribution
    fig_offense = viz.plot_offense_distribution(nypd_final, lapd_final)
    fig_offense.show()

### 3.3.1 Analysis Interpretation
This comprehensive visualization provides a detailed demographic breakdown of arrests made by the NYPD and LAPD across four key dimensions: race, gender, age, and offense type. The charts reveal both striking similarities and notable differences in arrest patterns between these departments.
1. Race Distribution
- Black: NYPD shows a significantly higher proportion of Black arrestees (48.3%) compared to LAPD (28.7%), a nearly 20-percentage-point difference.
- Hispanic: LAPD shows a markedly higher proportion of Hispanic arrestees (46.6%) compared to NYPD (33.9%).
- White: LAPD arrests a higher percentage of White individuals (18.8%) than NYPD (12.1%).
- Asian/Pacific Islander: NYPD shows a higher percentage (4.4%) compared to LAPD's minimal representation.
- Other category: Only present in LAPD data (5.1%), potentially representing demographic groups not captured in NYPD's classification system.

2. Gender Distribution
- Male predominance: Both departments show a strong male predominance in arrests, a pattern common across criminal justice data.
- Slight variation: NYPD shows a slightly higher proportion of male arrestees (82.9%) compared to LAPD (79.2%).
- Female representation: LAPD shows a somewhat higher proportion of female arrestees (20.8%) compared to NYPD (17.1%).

3. Age Distribution
- Broad similarity: Age distributions are broadly similar between departments, with no category listed below differing by more than 4.5 percentage points.
- Prime adult focus: Both departments show the highest concentration of arrests in the 25-44 age group (NYPD: 46.5%, LAPD: 45.4%).
- Young adult arrests: Both show similar proportions in the 18-24 category, though NYPD is slightly higher (26.0% vs. 22.4%).
- Older adults: LAPD shows slightly more arrests in the 45-64 category (23.6% vs. NYPD's 19.1%).
- Juvenile arrests: Nearly identical proportions for those under 18 (NYPD: 7.4%, LAPD: 7.0%).

4. Offense Type Distribution
- "Other" category dominance: Both departments classify a large proportion of arrests as "Other," though LAPD's proportion (63.9%) is significantly higher than NYPD's (49.6%).
- Drug offenses: NYPD shows a higher proportion of drug-related arrests (19.2%) compared to LAPD (12.5%).
- Violent crime: NYPD shows a higher proportion of violent crime arrests (17.8%) compared to LAPD (13.4%).
- Property crime: NYPD shows more than double the proportion of property crime arrests (9.7%) compared to LAPD (3.8%).
- Traffic violations: Present only in LAPD data, reflecting different enforcement or classification systems.
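
The percentage-point gaps discussed above can be recomputed directly from the quoted shares. The sketch below uses only the race figures stated in this section; the helper name `point_gaps` is hypothetical.

```python
# Race shares (percent) as quoted in the interpretation above.
nypd_race = {"Black": 48.3, "Hispanic": 33.9, "White": 12.1}
lapd_race = {"Black": 28.7, "Hispanic": 46.6, "White": 18.8}

def point_gaps(a, b):
    """Percentage-point difference a - b for categories present in both."""
    return {k: round(a[k] - b[k], 1) for k in a if k in b}

gaps = point_gaps(nypd_race, lapd_race)
for category, gap in gaps.items():
    # Positive values mean the NYPD share is higher.
    print(f"{category}: {gap:+.1f} percentage points")
```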

## 4. Conclusions
This comprehensive analysis of NYPD and LAPD arrest data from 2010 to 2019 provides compelling evidence that population density fundamentally shapes urban crime patterns. Our initial question, why the LAPD makes fewer arrests despite serving a larger geographic area, is answered through several converging lines of evidence. New York City's exceptional population density (29,302 people per square mile, compared with 7,754 in the City of Los Angeles) creates a "crime multiplier effect" by concentrating human interactions within confined spaces. This concentration leads to both higher per-capita arrest rates and a dramatically higher spatial concentration of criminal activity [5].

These findings align with criminological theory, particularly routine activities theory, which posits that crime occurs when motivated offenders and suitable targets converge in the absence of capable guardians [6]. High-density urban environments naturally increase the frequency of such convergences, creating more opportunities for criminal behavior. The "law of crime concentration" suggests that criminal activity tends to cluster in specific micro-geographic locations; this effect is observed much more intensely in New York City than in Los Angeles.

The temporal analysis showed that both cities exhibit similar weekly and seasonal arrest patterns, indicating trends that go beyond urban design. However, the NYPD consistently records about 2.6 times as many arrests as the LAPD. The mid-week peaks and seasonal variations observed in both cities are consistent with an ambient population effect, in which the number of people present in an area at a given time, rather than its resident population, shapes activity.

Our spatial analysis, conducted using heat maps, clearly demonstrated that crime in New York City clusters around transit corridors and central business districts, creating hotspots with significantly higher crime density compared to Los Angeles, which has a more dispersed crime pattern influenced by its highway systems across a sprawling landscape. This observation supports crime pattern theory, which highlights how urban "nodes," "paths," and "edges" shape opportunities for criminal activity. In NYC, the subway system creates high-traffic nodes and paths that concentrate crime, while LA's highway-oriented structure tends to distribute it more evenly across the city.[5]

The demographic analysis revealed that while age distributions are similar in both cities, the racial composition of arrests differs significantly. The NYPD arrests a higher proportion of Black individuals (48.3% compared to 28.7% in LA), while the LAPD arrests more Hispanic individuals (46.6% compared to 33.9% in NYC). These differences likely reflect both local demographics and varying enforcement strategies. Research on the "concentration of incarceration" suggests that contacts with the justice system often follow patterns of residential segregation and neighborhood inequality specific to each city.

The findings from this research may have significant policy implications. Research on urban density and crime suggests that cities with varying density profiles may require tailored approaches to law enforcement deployment, crime prevention, and urban design. For example, transit-oriented, densely populated cities like New York may benefit from targeted interventions at high-traffic locations, while car-centric, sprawling cities like Los Angeles might need more dispersed and mobile enforcement strategies.

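Furthermore, these results indicate that caution is necessary when comparing crime rates across cities with vastly different density profiles. Enforcement outcomes must be understood within the unique urban geography of each city. The 40-to-1 difference in arrests per square mile between the NYPD and LAPD corresponds to dividing LAPD arrests by Los Angeles County's 4,751 square miles; measured against the City of Los Angeles's 503 square miles, the gap narrows to roughly 4-to-1. This sensitivity to the chosen boundary demonstrates how simplistic geographic comparisons can be misleading without considering density.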
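
The per-square-mile ratio depends heavily on which land area is used for Los Angeles. The sketch below recomputes it from figures quoted in this report (arrest counts from Section 1.4, areas from Section 3.2); the constant and function names are hypothetical, and the areas are the rounded values given in the text.

```python
# Figures quoted in this report.
NYPD_ARRESTS = 3_414_946
LAPD_ARRESTS = 1_320_817
NYC_SQ_MI = 302
LA_CITY_SQ_MI = 503
LA_COUNTY_SQ_MI = 4_751

def arrests_per_sq_mile(arrests, area_sq_mi):
    """Arrests over the whole study period divided by land area."""
    return arrests / area_sq_mi

nyc_rate = arrests_per_sq_mile(NYPD_ARRESTS, NYC_SQ_MI)
ratio_vs_county = nyc_rate / arrests_per_sq_mile(LAPD_ARRESTS, LA_COUNTY_SQ_MI)
ratio_vs_city = nyc_rate / arrests_per_sq_mile(LAPD_ARRESTS, LA_CITY_SQ_MI)
volume_ratio = NYPD_ARRESTS / LAPD_ARRESTS

print(f"Raw arrest volume, NYPD vs LAPD: {volume_ratio:.1f} to 1")
print(f"Per square mile, using LA County area: {ratio_vs_county:.1f} to 1")
print(f"Per square mile, using City of LA area: {ratio_vs_city:.1f} to 1")
```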

Collectively, these findings show that urban density is a powerful factor that shapes not only crime volumes but also its spatial distribution, temporal patterns, and, to some extent, demographic characteristics. This "density effect" helps clarify the seeming paradox of LA County's lower arrest numbers despite its larger area. These insights are valuable for urban planners, law enforcement agencies, and policymakers as they consider how city design influences public safety. Future research should explore additional pairs of high and low-density cities to test the generalizability of these findings and investigate how emerging technologies and evolving urban mobility patterns may change these density-crime relationships in the post-pandemic urban landscape.

## References
1. Weisburd, D., Cave, B., Nelson, M., et al. 2016, Place Matters: Criminology for the 21st Century, Cambridge University Press, Cambridge. [online] https://assets.cambridge.org/97811070/29521/frontmatter/9781107029521_frontmatter.pdf
2. Oliveira, M. 2021, 'Disentangling community-level changes in crime trends during the COVID-19 pandemic in Chicago', Crime Science, vol. 10, no. 1, article 15. [online] https://doi.org/10.1186/s40163-021-00155-8
3. MacDonald, J., Fagan, J. & Geller, A. 2016, 'The effects of local police surges on crime and arrests in New York City', PLoS One, vol. 11, no. 6, e0157223. [online] https://doi.org/10.1371/journal.pone.0157223
4. Sampson, R.J. & Loeffler, C. 2010, 'Punishment's place: the local concentration of mass incarceration', Daedalus, vol. 139, no. 3, pp. 20-31. [online] https://direct.mit.edu/daed/article/139/3/20/26847/Punishment-s-place-the-local-concentration-of-mass
5. Glaeser, E.L. 2011, Triumph of the City: How Our Greatest Invention Makes Us Richer, Smarter, Greener, Healthier, and Happier.
6. Cohen, L.E. & Felson, M. 1979, 'Social change and crime rate trends: a routine activity approach', American Sociological Review, vol. 44, no. 4, pp. 588-608.

## Appendices

## Word Count

The following code will count the number of words in Markdown cells. Code cells are not included.

- `Main word count` is the number of words in the main body of the text, *excluding* references or appendices.
- `References and appendices word count` is the number of words in any references or appendices.

Only `Main word count` relates to the assignment word limit. There is no limit to the number of words that can be included in references or appendices. Please note that appendices should only be used to provide context or supporting information. *No marks will be directly awarded for material submitted in appendices*.

Important:

- Please do not modify the word count code!
- To exclude references from your word count **you must** have a cell that starts with the text `## References`. Everything below this cell will not count towards the main word count.
- If you are submitting additional material as appendices **you must** have a cell that starts with the text `## Appendices`. Everything below this cell will not count towards the main word count. If you do not have any appendices you can delete the `## Appendices` cell.
- Code comments should only be used to explain details of the implementation, not for discussing your findings. All analysis commentary **must** be written in Markdown cells. *No marks will be awarded for analysis discussion submitted as comments in code cells*.

In [None]:
%%js

// Run this cell to update your word count.

function wordcount() {
    let wordCount = 0
    let extraCount = 0
    let mainBody = true

    let cells = Jupyter.notebook.get_cells()
    cells.forEach((cell) => {
        if (cell.cell_type == 'markdown') {
            let text = cell.get_text()
            // Stop counting as main body when get to References or Appendices.
            if (text.startsWith('## References') ||
                text.startsWith('## Appendices')) {
                mainBody = false
            }
            if (text.startsWith('## Word Count')) {
                text = ''
            }
            if (text) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g)
                if (words) {
                    let cellCount = words.length
                    if (mainBody) {
                        wordCount += cellCount
                    } else {
                        extraCount += cellCount
                    }
                }
            }
        }
    })
    return [wordCount, extraCount]
}

let wc = wordcount()
element.append(`Main word count: ${wc[0]} (References and appendices word count: ${wc[1]})`)