# 30 July 2022: Los Angeles Traffic Accidents, 2010–2022 — Heatmap

The next order of business is working out where accidents tend to occur most. My guess is that we'll see freeway corridors, since more traffic generally means more accidents, but maybe we'll also discover other points of curiosity.

_Note: To keep this notebook small enough to render on GitHub, I've cleared all the heatmaps from the outputs and inserted images instead._

In [1]:
%matplotlib inline

# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Load dataset. Force dtype 'str' to preserve leading zeroes in 'Time Occurred' col
traffic = pd.read_csv('../data/LA-traffic-collision-2010-to-present.csv', dtype='str')

traffic.head()

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,Victim Age,Victim Sex,Victim Descent,Premise Code,Premise Description,Address,Cross Street,Location
0,190319651,08/24/2019,08/24/2019,450,3,Southwest,356,997,TRAFFIC COLLISION,3036 3004 3026 3101 4003,22.0,M,H,101,STREET,JEFFERSON BL,NORMANDIE AV,"(34.0255, -118.3002)"
1,190319680,08/30/2019,08/30/2019,2320,3,Southwest,355,997,TRAFFIC COLLISION,3037 3006 3028 3030 3039 3101 4003,30.0,F,H,101,STREET,JEFFERSON BL,W WESTERN,"(34.0256, -118.3089)"
2,190413769,08/25/2019,08/25/2019,545,4,Hollenbeck,422,997,TRAFFIC COLLISION,3101 3401 3701 3006 3030,,M,X,101,STREET,N BROADWAY,W EASTLAKE AV,"(34.0738, -118.2078)"
3,190127578,11/20/2019,11/20/2019,350,1,Central,128,997,TRAFFIC COLLISION,0605 3101 3401 3701 3011 3034,21.0,M,H,101,STREET,1ST,CENTRAL,"(34.0492, -118.2391)"
4,190319695,08/30/2019,08/30/2019,2100,3,Southwest,374,997,TRAFFIC COLLISION,0605 4025 3037 3004 3025 3101,49.0,M,B,101,STREET,MARTIN LUTHER KING JR,ARLINGTON AV,"(34.0108, -118.3182)"


Here I am interested in the <code>Location</code> column, which appears to consist of a latitude/longitude pair.

In [2]:
locations = pd.DataFrame({'coords':traffic['Location']})
locations.value_counts()

coords              
(0.0, 0.0)              762
(33.9892, -118.3089)    596
(34.2012, -118.4662)    527
(33.9601, -118.2827)    526
(34.2216, -118.4488)    517
                       ... 
(34.0052, -118.2685)      1
(34.0052, -118.2991)      1
(34.1533, -118.4742)      1
(34.0052, -118.3518)      1
(34.692, -118.1746)       1
Length: 50947, dtype: int64

The first thing we need to do is drop the useless <code>(0.0, 0.0)</code> entries, and then I'll extract the latitude and longitude numbers, since we'll use those to plot points.

In [3]:
# Drop records without meaningful coords
to_drop = locations[locations['coords'] == '(0.0, 0.0)'].index
locations.drop(index=to_drop, inplace=True)

In [4]:
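# Extract lat/lon as floats (raw strings avoid invalid-escape warnings in the regex)
locations['lat'] = locations['coords'].str.extract(r'^\((.+?),', expand=False).astype(float)
locations['lon'] = locations['coords'].str.extract(r', (.+?)\)', expand=False).astype(float)
locations.head()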

Unnamed: 0,coords,lat,lon
0,"(34.0255, -118.3002)",34.0255,-118.3002
1,"(34.0256, -118.3089)",34.0256,-118.3089
2,"(34.0738, -118.2078)",34.0738,-118.2078
3,"(34.0492, -118.2391)",34.0492,-118.2391
4,"(34.0108, -118.3182)",34.0108,-118.3182
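
As an aside, the two extraction calls can be folded into a single pattern with named groups. This is only a sketch on a made-up sample `Series` in the same `(lat, lon)` format; the names `sample`, `pattern`, `parsed` and `valid` are mine, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample in the same "(lat, lon)" format as the Location column
sample = pd.Series(['(34.0255, -118.3002)', '(0.0, 0.0)', '(34.692, -118.1746)'])

# One pattern with two named groups yields a DataFrame with 'lat' and 'lon' columns
pattern = r'\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)'
parsed = sample.str.extract(pattern).astype(float)

# Keep only rows that are not the (0.0, 0.0) placeholder
valid = parsed[(parsed['lat'] != 0) | (parsed['lon'] != 0)]
print(valid)
```

Because the columns come back as floats, the placeholder can be filtered numerically instead of by matching the exact string.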


Now I'll map this data using a mapping library called folium.

In [None]:
import folium
from folium import plugins

m = folium.Map(location=[34.052235, -118.243683], zoom_start=10)

# Reformat dataset as (n, 2) array of [lat, lon] rows
locArr = locations[['lat', 'lon']].values

# Generate heatmap
m.add_child(plugins.HeatMap(locArr, radius=15))

Pretty cool! At the default zoom level, it looks like the entire city is a crash zone, but once you zoom in you can see problematic intersections.

![title](images/folium.plugins.HeatMap/1.png)
![title](images/folium.plugins.HeatMap/2.png)
![title](images/folium.plugins.HeatMap/3.png)
![title](images/folium.plugins.HeatMap/4.png)
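
To put names to the hot spots that show up when zoomed in, one option is to count collisions per `Address` / `Cross Street` pair. The sketch below runs on a few made-up rows rather than the real data, and the names `sample`, `top` and `collisions` are my own. On the full dataset the street strings might need tidying first, and the same intersection could appear with the two streets swapped.

```python
import pandas as pd

# Hypothetical rows using the dataset's 'Address' and 'Cross Street' columns
sample = pd.DataFrame({
    'Address': ['JEFFERSON BL', 'JEFFERSON BL', '1ST', 'JEFFERSON BL'],
    'Cross Street': ['NORMANDIE AV', 'NORMANDIE AV', 'CENTRAL', 'W WESTERN'],
})

# Count collisions per (Address, Cross Street) pair, most frequent first
top = (
    sample.groupby(['Address', 'Cross Street'])
    .size()
    .sort_values(ascending=False)
    .head(3)
    .reset_index(name='collisions')
)
print(top)
```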

I'm curious to try a heatmap with plotly. This time, I'll count up the incidents for each set of coordinates and feed that into the map as well, so first I'll re-create the dataset.

In [6]:
# Count incidents per coordinate pair (compute value_counts once)
counts = traffic['Location'].value_counts()
locations = pd.DataFrame({
    'coords': counts.index,
    'count': counts.values
})

# Drop the (0.0, 0.0) coordinate pair by value rather than relying on it being the first record
locations = locations[locations['coords'] != '(0.0, 0.0)'].copy()

# Extract lat and lon into new float cols
locations['lat'] = locations['coords'].str.extract(r'^\((.+?),', expand=False).astype(float)
locations['lon'] = locations['coords'].str.extract(r', (.+?)\)', expand=False).astype(float)

locations.head()

Unnamed: 0,coords,count,lat,lon
1,"(33.9892, -118.3089)",596,33.9892,-118.3089
2,"(34.2012, -118.4662)",527,34.2012,-118.4662
3,"(33.9601, -118.2827)",526,33.9601,-118.2827
4,"(34.2216, -118.4488)",517,34.2216,-118.4488
5,"(34.2355, -118.5536)",497,34.2355,-118.5536


In [None]:
import plotly.express as px

fig = px.density_mapbox(
    locations, lat='lat', lon='lon',
    z='count', center={'lat':34.052235, 'lon': -118.243683},
    zoom=10, mapbox_style='open-street-map', opacity=.8, radius=20)

fig.show()

I think I prefer this second heatmap. The legend, which specifies the total number of accidents at a given location from 2010–2022, is a very nice addition.

![title](images/plotly.express.density_mapbox/static_1.png)
![title](images/plotly.express.density_mapbox/static_2.png)
![title](images/plotly.express.density_mapbox/static_3.png)

The plotly function used above, <code>density_mapbox</code>, also supports animation, so let's see if I can create an animation that shows how the patterns change by year. This means creating yet another DataFrame, this time including the date of each accident.

In [8]:
df = traffic[['Location', 'Date Occurred']]

# Get index for all records with (0.0, 0.0)
to_drop = df[df['Location'] == '(0.0, 0.0)'].index

# Drop all records with (0.0, 0.0)
df = df.drop(index=to_drop)

# Extract year
df['year'] = df['Date Occurred'].str[-4:]

# Set count for each record to 1
df['count'] = 1

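# Sum counts per year and location (select 'count' so the string date column isn't aggregated)
df = df.groupby(['year', 'Location'], as_index=False)['count'].sum()

# Extract lat and lon into new float cols
df['lat'] = df['Location'].str.extract(r'^\((.+?),', expand=False).astype(float)
df['lon'] = df['Location'].str.extract(r', (.+?)\)', expand=False).astype(float)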

df.head()

Unnamed: 0,year,Location,count,lat,lon
0,2010,"(33.7065, -118.2928)",5,33.7065,-118.2928
1,2010,"(33.707, -118.2907)",1,33.707,-118.2907
2,2010,"(33.707, -118.2939)",2,33.707,-118.2939
3,2010,"(33.7089, -118.2855)",1,33.7089,-118.2855
4,2010,"(33.7096, -118.2879)",5,33.7096,-118.2879
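
The count-then-sum step above can also be written with `groupby(...).size()`, which counts rows per group directly. This is a minimal sketch on a made-up frame; `toy` and `yearly` are hypothetical names, and only the column names mirror the dataset.

```python
import pandas as pd

# Hypothetical records with the same two columns used above
toy = pd.DataFrame({
    'Location': ['(34.0, -118.3)', '(34.0, -118.3)', '(34.1, -118.2)', '(34.0, -118.3)'],
    'Date Occurred': ['01/05/2010', '07/19/2010', '03/02/2010', '11/30/2011'],
})

# Year is the last four characters of the MM/DD/YYYY string
toy['year'] = toy['Date Occurred'].str[-4:]

# size() counts rows per (year, Location) group, so no helper 'count' column is needed
yearly = toy.groupby(['year', 'Location']).size().reset_index(name='count')
print(yearly)
```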


In [None]:
fig = px.density_mapbox(
    df, lat='lat', lon='lon',
    z='count', center={'lat':34.052235, 'lon': -118.243683},
    animation_frame='year', animation_group='Location',
    zoom=10, mapbox_style='open-street-map', opacity=.8, radius=20)

fig.show()

This is more of a curiosity for me at this point, and I'm not sure how much useful information one can glean from this animation, but it's nevertheless fun to experiment with this functionality. If anything, it helps us see how consistent the problem areas are from year to year (plus the precipitous decline in accidents overall beginning in 2020, thanks to Covid).

![title](images/plotly.express.density_mapbox/animation_1.png)
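
The year-to-year trend could also be checked numerically by summing the per-location counts for each year. The sketch below assumes a frame shaped like `df` with `year` and `count` columns. The numbers in it are invented purely to show the mechanics and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical per-year, per-location counts shaped like `df` above (made-up values)
yearly = pd.DataFrame({
    'year': ['2018', '2018', '2019', '2019', '2020'],
    'count': [5, 3, 6, 4, 4],
})

# Total accidents per year, then percentage change from the previous year
totals = yearly.groupby('year')['count'].sum()
change = totals.pct_change() * 100
print(change.round(1))
```

On the real data, the final year probably covers only part of 2022, so its total would not be directly comparable with the full years.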