# Boston Crime Data Analysis
This notebook explores where and at what time of day major crime incidents occur in Boston. I use Python's data science stack, including pandas, geopandas, and folium, to clean, filter, and map the data. Each section pairs the code with a short explanation so that the analysis is clear and reproducible for a technical audience.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Importing Required Libraries
I begin by importing the essential Python libraries for data manipulation and geospatial visualization. `pandas` is used for data handling, while `folium` and its plugins enable interactive mapping.

In [2]:
import pandas as pd

In [3]:
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

## Initializing the Base Map
I initialize a base map centered on Boston using Folium. This map serves as the foundation for subsequent geospatial visualizations.

In [4]:
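# `tiles` (not `titles`) selects the basemap; the variable is named base_map to avoid shadowing the built-in map()
base_map = folium.Map(location=[42.32,-71.0589], tiles="openstreetmap", zoom_start=10)

base_map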

## Loading the Boston Crime Dataset
The Boston crime dataset is loaded from a CSV file.

In [5]:
crimesBoston = pd.read_csv("/content/drive/MyDrive/geoSpatial_dataSet/crimes-in-boston/crime.csv", encoding="latin-1")
crimesBoston.head()

,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,I182070945,619,Larceny,LARCENY ALL OTHERS,D14,808,,2018-09-02 13:00:00,2018,9,Sunday,13,Part One,LINCOLN ST,42.357791,-71.139371,"(42.35779134, -71.13937053)"
1,I182070943,1402,Vandalism,VANDALISM,C11,347,,2018-08-21 00:00:00,2018,8,Tuesday,0,Part Two,HECLA ST,42.306821,-71.0603,"(42.30682138, -71.06030035)"
2,I182070941,3410,Towed,TOWED MOTOR VEHICLE,D4,151,,2018-09-03 19:27:00,2018,9,Monday,19,Part Three,CAZENOVE ST,42.346589,-71.072429,"(42.34658879, -71.07242943)"
3,I182070940,3114,Investigate Property,INVESTIGATE PROPERTY,D4,272,,2018-09-03 21:16:00,2018,9,Monday,21,Part Three,NEWCOMB ST,42.334182,-71.078664,"(42.33418175, -71.07866441)"
4,I182070938,3114,Investigate Property,INVESTIGATE PROPERTY,B3,421,,2018-09-03 21:05:00,2018,9,Monday,21,Part Three,DELHI ST,42.275365,-71.090361,"(42.27536542, -71.09036101)"


## Data Cleaning and Filtering
I start by removing records that lack location or district information. I then keep only the major crime categories and restrict the analysis to incidents from 2018 onwards.

In [6]:
# drop rows that lack coordinates or a district
crimesBoston.dropna(subset=['Lat','Long','DISTRICT'], inplace=True)

# keep only the major crime categories analyzed in this notebook
crimesBoston = crimesBoston[crimesBoston.OFFENSE_CODE_GROUP.isin(['Larceny', 'Auto Theft', 'Robbery', 'Larceny From Motor Vehicle', 'Residential Burglary',
    'Simple Assault', 'Harassment', 'Ballistics', 'Aggravated Assault', 'Other Burglary',
    'Arson', 'Commercial Burglary', 'HOME INVASION', 'Homicide', 'Criminal Harassment',
    'Manslaughter'])]

# restrict the analysis to incidents from 2018 onwards
crimesBoston = crimesBoston[crimesBoston.YEAR >= 2018]

crimesBoston.head()

,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
0,I182070945,619,Larceny,LARCENY ALL OTHERS,D14,808,,2018-09-02 13:00:00,2018,9,Sunday,13,Part One,LINCOLN ST,42.357791,-71.139371,"(42.35779134, -71.13937053)"
6,I182070933,724,Auto Theft,AUTO THEFT,B2,330,,2018-09-03 21:25:00,2018,9,Monday,21,Part One,NORMANDY ST,42.306072,-71.082733,"(42.30607218, -71.08273260)"
8,I182070931,301,Robbery,ROBBERY - STREET,C6,177,,2018-09-03 20:48:00,2018,9,Monday,20,Part One,MASSACHUSETTS AVE,42.331521,-71.070853,"(42.33152148, -71.07085307)"
19,I182070915,614,Larceny From Motor Vehicle,LARCENY THEFT FROM MV - NON-ACCESSORY,B2,181,,2018-09-02 18:00:00,2018,9,Sunday,18,Part One,SHIRLEY ST,42.325695,-71.068168,"(42.32569490, -71.06816778)"
24,I182070908,522,Residential Burglary,BURGLARY - RESIDENTIAL - NO FORCE,B2,911,,2018-09-03 18:38:00,2018,9,Monday,18,Part One,ANNUNCIATION RD,42.335062,-71.093168,"(42.33506218, -71.09316781)"
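
The three filtering steps can be tried on a tiny hand-made frame before running them on the full dataset. The sketch below uses invented rows (the incident IDs and values are illustrative, not taken from the Boston data) and a shortened category list, so it only demonstrates the mechanics of the filters.

```python
import numpy as np
import pandas as pd

# Toy data: invented rows that mimic the columns used by the filters
toy = pd.DataFrame({
    "INCIDENT_NUMBER": ["T1", "T2", "T3", "T4", "T5"],
    "OFFENSE_CODE_GROUP": ["Larceny", "Towed", "Robbery", "Robbery", "Larceny"],
    "DISTRICT": ["D14", "D4", None, "B2", "C6"],
    "YEAR": [2018, 2018, 2018, 2017, 2018],
    "Lat": [42.35, 42.34, 42.33, 42.31, np.nan],
    "Long": [-71.13, -71.07, -71.07, -71.08, -71.05],
})

major = ["Larceny", "Robbery"]  # shortened list, for illustration only

cleaned = (
    toy.dropna(subset=["Lat", "Long", "DISTRICT"])       # drops T3 (no district) and T5 (no latitude)
       .loc[lambda d: d.OFFENSE_CODE_GROUP.isin(major)]  # drops T2 (Towed)
       .loc[lambda d: d.YEAR >= 2018]                    # drops T4 (2017)
)

print(cleaned.INCIDENT_NUMBER.tolist())
```

Chaining the steps without `inplace=True` leaves the original frame untouched, which makes it easier to rerun a cell while experimenting.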


## Extracting Daytime Robbery Incidents
This section isolates robbery incidents that occurred during daytime hours, defined here as `HOUR` values 9 through 17 (9:00 AM up to 5:59 PM). This subset is used for focused spatial analysis.

In [7]:
daytime_robberies = crimesBoston[(crimesBoston.OFFENSE_CODE_GROUP == 'Robbery') & (crimesBoston.HOUR.isin(range(9,18)))]
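
Because `range(9, 18)` excludes its upper bound, the filter keeps `HOUR` values 9 to 17. A small check on invented hours, assuming `HOUR` holds integers as in the sample rows, suggests that `Series.between(9, 17)` selects the same rows:

```python
import pandas as pd

hours = pd.Series([0, 8, 9, 12, 17, 18, 23])  # invented HOUR values

mask_isin = hours.isin(range(9, 18))   # 18 is excluded by range()
mask_between = hours.between(9, 17)    # both ends inclusive by default

print(hours[mask_isin].tolist())
print(mask_isin.equals(mask_between))
```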

## Visualizing Daytime Robberies with Markers
I plot the locations of daytime robberies on an interactive map using Folium markers. This visualization helps identify spatial patterns in robbery incidents.

In [8]:
marker_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=13)

for n, row in daytime_robberies.iterrows():
  Marker([row['Lat'], row['Long']]).add_to(marker_map)


marker_map

## Clustering Crime Locations
To better understand the spatial distribution of all major crimes, I use marker clustering. This approach aggregates nearby incidents into a single marker with a count, making dense areas easier to interpret.

In [9]:
import math

cluster_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=13)

mc = MarkerCluster()
for n, row in crimesBoston.iterrows():
  if not math.isnan(row['Long']) and not math.isnan(row['Lat']):
    mc.add_child(Marker([row['Lat'], row['Long']]))

cluster_map.add_child(mc)

cluster_map
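
To build intuition for what aggregation does to dense areas, the sketch below groups points into grid cells and counts them. This is a deliberately simplified stand-in, not the algorithm that `MarkerCluster` runs in the browser; the coordinates, the `cell_size` parameter, and the `grid_counts` helper are all made up for illustration.

```python
import math
from collections import Counter

def grid_counts(points, cell_size=0.01):
    """Count points per grid cell of `cell_size` degrees (hypothetical helper)."""
    counts = Counter()
    for lat, lon in points:
        # Each point is assigned to the cell that contains it
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts

# Invented coordinates: three close together, one far away
points = [
    (42.3312, -71.0712),
    (42.3318, -71.0705),
    (42.3315, -71.0709),
    (42.3575, -71.1392),
]

counts = grid_counts(points)
print(sorted(counts.values(), reverse=True))
```

The three nearby points collapse into one cell while the distant point stays alone, which mirrors how a cluster marker replaces many overlapping pins with one count.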

## Bubble Map of Daytime Robberies by Hour
I create a bubble map in which each daytime robbery is drawn as a circle. The color of each circle encodes the time of day: green for incidents with `HOUR` at or below 12, and dark red for later hours.

In [10]:
bubble_map = folium.Map(location=[42.32, -71.0589], tiles='cartodbpositron', zoom_start=13)

def color_producer(val):
  if val <= 12:
    return 'forestgreen'
  else:
    return 'darkred'



for _, row in daytime_robberies.iterrows():
  Circle(
      location=[row['Lat'], row['Long']],
      radius=20, color=color_producer(row['HOUR'])
  ).add_to(bubble_map)

bubble_map
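
The color rule depends only on the hour, so it can be checked without drawing a map. The snippet below repeats the rule from `color_producer` and applies it to each hour in the daytime window:

```python
def color_producer(val):
    # Same rule as in the notebook cell: hour 12 and earlier vs. later hours
    return 'forestgreen' if val <= 12 else 'darkred'

# Map every hour in the daytime window (9 through 17) to its circle color
colors_by_hour = {hour: color_producer(hour) for hour in range(9, 18)}

for hour, color in colors_by_hour.items():
    print(hour, color)
```

Hour 12 falls in the green group, so green covers four hours (9 to 12) and dark red covers five (13 to 17).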

## Heatmap of Crime Density
A heatmap is generated to visualize the density of all major crime incidents across Boston. This helps highlight crime hotspots and areas requiring further investigation.

In [11]:
heat_map = folium.Map(location=[42.32, -71.0589], tiles='cartodbpositron', zoom_start=12)

HeatMap(data=crimesBoston[['Lat','Long']], radius=10).add_to(heat_map)

heat_map
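
`HeatMap` accepts points as `[lat, lng]` or `[lat, lng, weight]`. When many incidents share nearly the same coordinates, one option is to round the coordinates and pass the count as the weight, which reduces the number of points sent to the browser. The sketch below only prepares such a list from invented coordinates; the three-decimal rounding is an assumption chosen for illustration, not a setting used in this notebook.

```python
import pandas as pd

# Invented coordinates: three near-duplicates and one separate point
pts = pd.DataFrame({
    "Lat":  [42.3312, 42.3314, 42.3313, 42.3061],
    "Long": [-71.0708, -71.0709, -71.0711, -71.0827],
})

weighted = (
    pts.round(3)                    # snap coordinates to three decimals
       .groupby(["Lat", "Long"])
       .size()                      # number of incidents per snapped location
       .reset_index(name="weight")
)

# List of [lat, lng, weight] rows
heat_data = weighted[["Lat", "Long", "weight"]].values.tolist()
print(heat_data)
```

The resulting list could then be passed as `HeatMap(data=heat_data, radius=10)`.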

## Loading Boston Police District Boundaries
I load the shapefile containing Boston police district boundaries using GeoPandas. This geospatial data is used for district-level aggregation and mapping.

In [12]:
import geopandas as gpd

districts_full = gpd.read_file("/content/drive/MyDrive/geoSpatial_dataSet/Police_Districts/Police_Districts.shp")


districts = districts_full[["DISTRICT", "geometry"]].set_index("DISTRICT")
districts.head()

DISTRICT,geometry
A15,"MULTIPOLYGON (((-71.07416 42.39051, -71.07415 ..."
A7,"MULTIPOLYGON (((-70.99644 42.39557, -70.99644 ..."
A1,"POLYGON ((-71.052 42.36884, -71.05169 42.3687,..."
C6,"POLYGON ((-71.04406 42.35403, -71.04412 42.353..."
D4,"POLYGON ((-71.07416 42.35724, -71.07359 42.357..."


## Aggregating Crime Counts by District
I count the number of major crime incidents per police district. These counts are the input for the district-level analysis and visualization.

In [13]:
plot_dict = crimesBoston.DISTRICT.value_counts()
plot_dict.head()

DISTRICT,count
D4,2885
B2,2231
A1,2130
C11,1899
B3,1421
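
`value_counts()` returns a Series indexed by the distinct values and sorted by count in descending order. A minimal illustration with invented district labels, including the `normalize=True` option for shares instead of raw counts:

```python
import pandas as pd

districts_seen = pd.Series(["D4", "B2", "D4", "A1", "D4", "B2"])  # invented labels

counts = districts_seen.value_counts()                # raw counts per label
shares = districts_seen.value_counts(normalize=True)  # fraction of all rows per label

print(counts.to_dict())
print(shares.round(2).to_dict())
```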


## Choropleth Map of Crime by District
A choropleth map is created to visualize the distribution of major crimes across Boston's police districts. Districts are shaded according to the number of incidents. Because these are raw counts rather than per-capita rates, the map shows where incidents concentrate, not the relative risk in each district.

In [14]:
choropleth_map = folium.Map(location=[42.32,-71.0589], tiles='cartodbpositron', zoom_start=12)

Choropleth(
    geo_data=districts.__geo_interface__,
    data=plot_dict,
    key_on="feature.id",
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Major criminal incidents (2018 onward)'
).add_to(choropleth_map)

choropleth_map
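
With `key_on="feature.id"`, each GeoJSON feature is matched to a value in `data` through its `id`. For a GeoDataFrame indexed by `DISTRICT`, `__geo_interface__` should carry the index label as that `id`, but it can be worth confirming that the two sets of keys agree, since a district present in only one source will not be shaded as intended. The sketch below uses a hand-written stand-in for the GeoJSON and invented counts; none of the values come from the Boston data.

```python
# Hand-written stand-ins: a GeoJSON-like dict and a counts mapping (invented values)
geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "A1", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "D4", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "C6", "properties": {}, "geometry": None},
    ],
}
counts = {"A1": 10, "D4": 25, "E99": 3}

feature_ids = {f["id"] for f in geo["features"]}
count_keys = set(counts)

missing_counts = sorted(feature_ids - count_keys)   # polygons with no data value
unmatched_keys = sorted(count_keys - feature_ids)   # data values with no polygon

print(missing_counts, unmatched_keys)
```

In the notebook, the same comparison could be made between `districts.index` and `plot_dict.index`.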

## Conclusion
This notebook walked through a workflow for analyzing and visualizing Boston crime data: cleaning and filtering the records, mapping incidents with markers, clusters, bubbles, and a heatmap, and aggregating counts by police district for a choropleth map. Together, these views show where major crimes concentrate across the city and how daytime robberies divide between morning and afternoon hours.
