# New York City 311 Data

***Important! The output of this notebook has not been included because it would make the file too big for GitHub. Want to see the output? Go to https://colab.research.google.com/drive/1gSLCqA-gCMRhaOlZvryMi4i6DBW0BwFj.***

## Overview

In New York City, citizens with non-emergency complaints (e.g. missed trash collection, rodent infestations) can call 311 to file a Service Request. These requests are recorded and shared on New York's open data site at https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9.

## High-Level Description

The data runs from 2010 to the present and is updated daily. At the time of this writing, there are over 20 million rows, each representing a single service request, and over 40 columns describing aspects of that request, such as the street address being referenced, the type of complaint, the agency responsible, and the date of the request.

## Bring in Data via pandas

I'm going to bring in only the rows that have 'Pothole' in the `descriptor` field. I'll set an upper limit of 5 million rows.


In [None]:
import pandas as pd
import numpy as np
import datetime as dt
potholes = pd.read_csv("https://data.cityofnewyork.us/resource/fhrw-4uyv.csv?descriptor=Pothole&$limit=5000000")
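
The URL above packs the filter and the row limit into its query string. As a minimal sketch, the same URL can be assembled from its parts so that either value is easier to change. The helper name `build_query_url` is hypothetical; the endpoint and the `descriptor` and `$limit` parameters are copied from the cell above.

```python
from urllib.parse import urlencode

# Endpoint copied from the read_csv call above.
BASE_URL = "https://data.cityofnewyork.us/resource/fhrw-4uyv.csv"


def build_query_url(base_url, descriptor, limit):
    """Hypothetical helper: append a descriptor filter and a row limit."""
    params = {"descriptor": descriptor, "$limit": limit}
    # safe="$" keeps the dollar sign in "$limit" from being percent-encoded.
    return f"{base_url}?{urlencode(params, safe='$')}"


pothole_url = build_query_url(BASE_URL, "Pothole", 5_000_000)
print(pothole_url)
```

Passing `pothole_url` to `pd.read_csv` should then behave like the hard-coded string.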

Let's take a quick peek at what the data looks like.  Then we'll use pandas to work with it!

In [None]:
potholes.head()

In [None]:
potholes.shape

OK, we have around 570k rows, far fewer than our 5 million upper limit, but plenty to work with! Let's do a bit of cleanup. First, we'll do some date work.

In [None]:
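for col in ['created_date', 'closed_date', 'due_date', 'resolution_action_updated_date']:
    potholes[col] = pd.to_datetime(potholes[col])
    # Blank out implausible dates: anything before 2007 or after today.
    potholes.loc[potholes[col] < '2007-01-01', col] = pd.NaT
    # Index by `col` so only this column is blanked, not the whole row.
    potholes.loc[potholes[col] > pd.Timestamp(dt.date.today()), col] = pd.NaT

# Prefer closed_date; fall back to the last resolution update when it is missing.
potholes['resolved_date'] = potholes['closed_date'].fillna(potholes['resolution_action_updated_date'])
# Whole calendar days between creation and resolution (time of day is ignored).
potholes['days_to_close'] = (potholes['resolved_date'].dt.normalize() - potholes['created_date'].dt.normalize()).dt.days
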


# Get names of indexes for which days_to_close < 0
indexNames = potholes[potholes['days_to_close'] <0 ].index
# Drop them
potholes.drop(indexNames , inplace=True)
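
To make the effect of this cleanup concrete, here is a small sketch on made-up rows. The three rows are hypothetical; only the column names come from the real data. It shows an implausible creation date being blanked, a missing `closed_date` falling back to the resolution update, and a negative duration being dropped.

```python
import pandas as pd

# Hypothetical toy rows that reuse the real column names.
toy = pd.DataFrame({
    "created_date": pd.to_datetime(
        ["2001-05-01 09:00", "2015-03-01 23:50", "2015-03-10 09:00"]),
    "closed_date": pd.to_datetime(
        ["2015-01-02 10:00", "2015-03-02 00:10", None]),
    "resolution_action_updated_date": pd.to_datetime(
        ["2015-01-02 10:00", "2015-03-02 08:00", "2015-03-08 12:00"]),
})

# Blank out creation dates that fall before the cutoff.
cutoff = pd.Timestamp("2007-01-01")
toy.loc[toy["created_date"] < cutoff, "created_date"] = pd.NaT

# Fall back to the resolution update when closed_date is missing.
resolved = toy["closed_date"].fillna(toy["resolution_action_updated_date"])

# Difference in calendar days, ignoring the time of day.
toy["days_to_close"] = (
    resolved.dt.normalize() - toy["created_date"].dt.normalize()
).dt.days

# Drop only rows with a negative duration; unknown durations (NaN) stay.
kept = toy[~(toy["days_to_close"] < 0)]
print(kept[["created_date", "days_to_close"]])
```

Note that the second row counts as one day even though only 20 minutes elapsed, because the two timestamps fall on different calendar dates.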

Next, let's pull longitude and latitude out of the `location` field.

In [None]:
# Raw string avoids invalid-escape warnings; group 0 is longitude, group 1 is latitude.
new_lat_long = (potholes['location'].str.extract(r'.+(-\d{2}\.*\d*) (\d{2}\.*\d*).+')).astype(float)
potholes.loc[:, 'longitude'] = new_lat_long[0]
potholes.loc[:, 'latitude'] = new_lat_long[1]
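
To see what the pattern captures, here is a sketch that applies the same regular expression to a single string with the standard `re` module. The sample value is hypothetical, modeled on the "longitude first, then latitude" layout the pattern implies; check a few real `location` values before relying on it.

```python
import re

# Same pattern as in the cell above.
LOCATION_PATTERN = re.compile(r".+(-\d{2}\.*\d*) (\d{2}\.*\d*).+")

# Hypothetical sample value; real rows may be formatted differently.
sample_location = "POINT (-73.98513 40.75891)"

match = LOCATION_PATTERN.search(sample_location)
if match is None:
    raise ValueError(f"Unrecognized location format: {sample_location!r}")

# The first group is the negative number (longitude), the second is latitude.
longitude, latitude = (float(group) for group in match.groups())
print(longitude, latitude)
```

Because the first group requires a leading minus sign, the pattern only fits locations in the western hemisphere, which is fine for New York City.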

And let's remove "unspecified" boroughs and tickets that weren't closed.

In [None]:
indexNames = potholes[potholes['borough'] == 'Unspecified' ].index
potholes.drop(indexNames , inplace=True)

potholes.drop(potholes[potholes['status'] != "Closed"].index, axis=0, inplace=True)

In [None]:
potholes['resolution_description'].unique()

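Now let's create better, briefer resolution descriptions. The labels below are matched by position to the output of `unique()` in the previous cell, so they need to be rechecked whenever the data is refreshed.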

In [None]:
resolution_map = zip(potholes['resolution_description'].unique(), ["Repaired",
                                                                  "Did Not Find",
                                                                  "Repaired Already",
                                                                  "Duplicate",
                                                                  "Referred: Maintenance Unit",
                                                                  "Repaired: Capital Project",
                                                                  "No Description",
                                                                  "Rescheduled: Inaccessible",
                                                                  "Assigned: Field Crew",
                                                                  "Referred: Inspections Unit",
                                                                  "Future Maintenance Will Repair (Incomplete Description)",
                                                                  "Status Not Available",
                                                                  "Future Maintenance Will Repair (Complete Description)",
                                                                  "Not in DOT Jurisdiction (Not Specified)",
                                                                  "Completed or Corrected",
                                                                  "See Customer Notes",
                                                                  "Requires 6 Months for Response",
                                                                  "Not Repaired, was in Compliance",
                                                                  "Repair to be Scheduled",
                                                                  "Insufficient Information to Respond",
                                                                  "Not in DOT Jurisdiction (MTA)",
                                                                  "Not in DOT Jurisdiction (Parks and Rec)",
                                                                  "Referred: Barricaded",
                                                                  "Temporarily Repaired",
                                                                  "Not in DOT Jurisdiction (Other)",
                                                                  "Referred: Other DOT",
                                                                  "In Progress",
                                                                  "Referred: Dept. Environmental Protection",
                                                                  "Not in DOT Jurisdiction (State DOT)"
                                                                  ])

In [None]:
simple_map = zip(potholes['resolution_description'].unique(), ["Repaired",
                                                              "Not Repaired",
                                                              "Repaired",
                                                              "Duplicate",
                                                              "Not Repaired",
                                                              "Repaired",
                                                              "Unknown",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Unknown",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Repaired",
                                                              "Unknown",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired",
                                                              "Repaired",
                                                              "Not Repaired",
                                                              "Not Repaired"
                                                                  ])

In [None]:
potholes['shorter_resolution_desc'] = potholes['resolution_description'].map(dict(resolution_map))
potholes['shortest_resolution_desc'] = potholes['resolution_description'].map(dict(simple_map))
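
Two properties of `zip` are worth keeping in mind with the mapping cells above: it silently stops at the shorter input, and the object it returns can be consumed only once. The sketch below illustrates both with a hypothetical helper, `build_label_map`, and made-up description strings; it is one possible safeguard, not part of the original notebook.

```python
def build_label_map(descriptions, labels):
    """Hypothetical helper: pair descriptions with labels, rejecting mismatched lengths."""
    descriptions = list(descriptions)
    labels = list(labels)
    if len(descriptions) != len(labels):
        raise ValueError(
            f"{len(descriptions)} descriptions but {len(labels)} labels"
        )
    return dict(zip(descriptions, labels))


# Made-up description strings, for illustration only.
sample_descriptions = ["The pothole was repaired.", "No pothole was found."]
sample_labels = ["Repaired", "Did Not Find"]

label_map = build_label_map(sample_descriptions, sample_labels)

# A bare zip object is a one-shot iterator.
one_shot = zip(sample_descriptions, sample_labels)
first_pass = dict(one_shot)
second_pass = dict(one_shot)  # empty, because the iterator is already exhausted

print(label_map)
print(second_pass)
```

In practice this means that re-running the mapping cell without re-running the cells that define `resolution_map` and `simple_map` would map every row to `NaN`.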

In [None]:
potholes.drop(potholes[potholes['shorter_resolution_desc'] == "Duplicate"].index, axis=0, inplace=True)

We're going to use Plotly, specifically `plotly.express`, to visualize some pothole trends. Let's bring that in.

In [None]:
!pip install plotly
import plotly.express as px

In [None]:
fig = px.box(potholes,  x="borough", y="days_to_close")
fig.show()

Well, the extreme outliers make this hard to read. Let's constrain what's shown on the y axis. Yes, the viewer could zoom in on their own, but let's make it easier.

In [None]:
fig = px.box(potholes, x="borough", y="days_to_close",
             range_y=[0, 20],
             labels={"days_to_close": "Time to Resolution (days)", "borough": "Borough"})
fig.show()
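
The upper bound of 20 above was picked by eye. One way to choose a bound from the data is the common "1.5 × IQR" rule of thumb, sketched here with the standard library on made-up values. The numbers are hypothetical, and Plotly's own whisker calculation may differ in detail.

```python
from statistics import quantiles

# Hypothetical days-to-close values with one extreme outlier.
days_to_close = [0, 1, 1, 2, 2, 3, 3, 4, 5, 400]

# quantiles(..., n=4) returns the three quartile cut points.
q1, _median, q3 = quantiles(days_to_close, n=4)
iqr = q3 - q1

# Values above this fence are conventionally treated as outliers.
upper_fence = q3 + 1.5 * iqr
print(f"Q1 = {q1}, Q3 = {q3}, upper fence = {upper_fence}")
```

A value slightly above the largest per-borough fence would be a reasonable candidate for the top of `range_y`.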

Let's bin our dates by month, so we can show aggregate data by month!

In [None]:
potholes_by_month = potholes.groupby(['borough', pd.Grouper(key='created_date', freq='M')])['days_to_close'].median()
type(potholes_by_month)

In [None]:
potholes_by_month = potholes_by_month.reset_index()
potholes_by_month
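
Here is the same monthly aggregation on a few made-up rows, so the grouping is easy to check by hand. This sketch groups on `dt.to_period("M")` rather than `pd.Grouper(freq='M')`, because newer pandas releases deprecate the `'M'` frequency alias in favor of `'ME'`. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows that reuse the real column names.
toy = pd.DataFrame({
    "borough": ["BRONX", "BRONX", "BRONX", "QUEENS"],
    "created_date": pd.to_datetime(
        ["2019-01-03", "2019-01-20", "2019-02-05", "2019-01-15"]),
    "days_to_close": [2, 6, 1, 3],
})

# Group by borough and calendar month, then take the median duration.
monthly = (
    toy.groupby(["borough", toy["created_date"].dt.to_period("M")])
    ["days_to_close"]
    .median()
    .reset_index()
)
print(monthly)
```

The resulting `created_date` column holds monthly periods; converting it with `.dt.to_timestamp()` may be needed before handing it to a plotting library.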

In [None]:
fig = px.line(potholes_by_month, x='created_date', y='days_to_close', color='borough')
fig.show()

Interesting!  It looks like there's some uptick in the time it takes to resolve a ticket near the middle of the year.  We might want to take a closer look at that.

Now let's take another look at the data, using Bokeh. This time we'll check out resolution types by borough.

In [None]:
potholes_resolution_borough = potholes.groupby(['borough', 'shortest_resolution_desc'])['unique_key'].count()
potholes_resolution_borough = potholes_resolution_borough.reset_index()
potholes_resolution_borough = potholes_resolution_borough.pivot(index='borough', columns='shortest_resolution_desc', values='unique_key').transpose()
potholes_resolution_borough
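
The group, count, pivot, and transpose steps above produce a table with one row per resolution and one column per borough. As a sketch, `pd.crosstab` builds a table of the same shape in one call. It counts rows rather than non-null `unique_key` values and fills empty combinations with 0 instead of `NaN`, so treat it as a close alternative rather than an exact equivalent. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows that reuse the real column names.
toy = pd.DataFrame({
    "borough": ["BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"],
    "shortest_resolution_desc": [
        "Repaired", "Not Repaired", "Repaired", "Repaired", "Unknown"],
})

# Rows are resolutions, columns are boroughs, cells are request counts.
counts = pd.crosstab(toy["shortest_resolution_desc"], toy["borough"])
print(counts)
```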


In [None]:
from bokeh.core.properties import value
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import viridis
from math import pi

output_notebook()

resolutions = list(potholes_resolution_borough.index)
boroughs = list(potholes_resolution_borough.columns)
colors = viridis(len(boroughs))  # one color per borough

data = potholes_resolution_borough.copy()
data['resolution'] = data.index


# Recent Bokeh releases use `height` (formerly `plot_height`) and `legend_label` (formerly `legend`).
p = figure(x_range=resolutions, height=750, title="Resolutions by Borough",
           toolbar_location=None, tools="")

p.vbar_stack(boroughs, x='resolution', width=0.9, color=colors, source=data,
             legend_label=boroughs)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.major_label_orientation = pi/2

show(p)

