# Open Data Rat Complaints

We know that 311 service requests can be found on Open Data [here](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/about_data). This dataset is large, so rather than downloading it in full, we should query it through the API!

## Socrata API

In [3]:
# Ensure you have all of your packages installed
# !pip install sodapy

Collecting sodapy
  Downloading sodapy-2.2.0-py2.py3-none-any.whl (15 kB)
Installing collected packages: sodapy
Successfully installed sodapy-2.2.0


In [4]:
import pandas as pd
import os
from sodapy import Socrata
import time
import math

Using the [API documentation](https://dev.socrata.com/foundry/data.cityofnewyork.us/erm2-nwe9), we can see that the dataset id is "erm2-nwe9" (which is also in the open data hyperlink).

You should generally use an [application token](https://dev.socrata.com/docs/app-tokens.html) when accessing the Socrata Open Data API. Otherwise you'll be subject to strict throttling limits. I recommend saving your token as an environment variable, but you can also use a string (just be careful not to save your token in a public place, like a public GitHub repo).

In [23]:
# I would recommend saving your Socrata token as an environment variable!
app_token = os.getenv('SOCRATA_API')
# You could also assign it directly as a string. Without a token, use None (the default).
# app_token = None

# Save the 311 dataset id
dataset_id = 'erm2-nwe9'

The Socrata API lets you filter your data, which helps when you're using large datasets.
Some of the standard filters can be found [here](https://dev.socrata.com/docs/queries/).
It's often helpful to filter on dates, which can be a bit trickier ([link](https://dev.socrata.com/docs/transforms/)). The query below truncates `created_date` to the day with `date_trunc_ymd` before comparing it to a start date.

In [30]:
where_str = '''
1=1 AND (
((complaint_type = 'School Maintenance') AND (descriptor = 'Rodents/Mice'))
OR ((complaint_type = 'Food Establishment') AND (descriptor = 'Rodents/Insects/Garbage'))
OR ((complaint_type = 'Rodent') AND (descriptor = 'Rat Sighting'))
OR ((complaint_type = 'Rodent') AND (descriptor = 'Signs of Rodents'))
OR ((complaint_type = 'Maintenance or Facility') AND (descriptor = 'Rodent Sighting'))
OR ((complaint_type = 'Dead Animal') AND (descriptor = 'Rat or Mouse'))
OR ((complaint_type = 'UNSANITARY CONDITION') AND (descriptor = 'PESTS'))
) AND (date_trunc_ymd(created_date) >= '2019-01-01')
'''
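
The clause above is written by hand. As a minimal sketch, a clause of the same shape could also be assembled from a list of (complaint type, descriptor) pairs, which makes it easier to add or drop categories. `build_where` and its parameters are hypothetical names, not part of sodapy or this notebook, and doubling single quotes to escape them is an assumption about SoQL string literals. The date comparison here uses a plain timestamp instead of `date_trunc_ymd`, which should select the same rows for a `>=` filter.

```python
from datetime import date


def build_where(pairs, start):
    """Build a SoQL-style where clause from (complaint_type, descriptor) pairs.

    Hypothetical helper for illustration only.
    """
    def quote(value):
        # Assumption: SoQL string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    conditions = [
        f"(complaint_type = {quote(ctype)} AND descriptor = {quote(desc)})"
        for ctype, desc in pairs
    ]
    # Midnight at the start of the given day, as an ISO 8601 timestamp.
    start_ts = quote(start.isoformat() + "T00:00:00")
    return "(" + " OR ".join(conditions) + ")" + f" AND created_date >= {start_ts}"


pairs = [("Rodent", "Rat Sighting"), ("Rodent", "Signs of Rodents")]
where_example = build_where(pairs, date(2019, 1, 1))
print(where_example)
```

The resulting string can be passed as the `where` argument in the same way as `where_str`.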

In [34]:
%%time

# If the number of rows returned equals the limit, the results were truncated, so raise the limit
client = Socrata("data.cityofnewyork.us", app_token)
# Request timeout in seconds (the sodapy default is 10)
client.timeout = 120

results = client.get(dataset_id, where=where_str, limit=1000000)
opendata_df = pd.DataFrame.from_records(results)
print('Check if shape looks reasonable (does it match the provided limit?)')
opendata_df.shape

Check if shape looks reasonable
CPU times: user 11.3 s, sys: 1.38 s, total: 12.7 s
Wall time: 35.9 s


(450235, 38)

I also wrote a small helper module (saved in `socrata.py`) that wraps these steps for you.

In [35]:
from socrata import socrata_api_query

rat_df = socrata_api_query(
    dataset_id=dataset_id,
    where=where_str,
    limit=1000000
)
rat_df.shape
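
The contents of `socrata.py` are not shown here, so the following is not that module. It is only a rough sketch of one way to avoid guessing a single large limit: request the data in pages using `limit` and `offset` until a short page comes back. `fetch_all`, `fetch_page`, `page_size`, and `max_rows` are hypothetical names; the fake page function below stands in for a real request so the example runs offline.

```python
def fetch_all(fetch_page, page_size=50000, max_rows=1000000):
    """Collect rows page by page until a short page or max_rows is reached.

    fetch_page is any callable accepting limit= and offset= keyword
    arguments and returning a list of rows (hypothetical interface).
    """
    rows = []
    offset = 0
    while len(rows) < max_rows:
        # Never ask for more rows than we are still allowed to keep.
        limit = min(page_size, max_rows - len(rows))
        page = fetch_page(limit=limit, offset=offset)
        rows.extend(page)
        if len(page) < limit:
            # A short page means the server has no more matching rows.
            break
        offset += len(page)
    return rows


# Offline stand-in for the API: seven fake records served in slices.
fake_rows = [{"unique_key": str(i)} for i in range(7)]


def fake_page(limit, offset):
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_page, page_size=3)
capped_rows = fetch_all(fake_page, page_size=3, max_rows=5)
print(len(all_rows), len(capped_rows))

# With sodapy, the page function might look like this (assumes the client,
# dataset_id, and where_str defined earlier; a stable order keeps pages consistent):
# fetch_page = lambda limit, offset: client.get(
#     dataset_id, where=where_str, order=":id", limit=limit, offset=offset
# )
```

The combined list can then be turned into a DataFrame with `pd.DataFrame.from_records`, as in the earlier cell.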