## A quick review on accessing data via APIs

Let's get some street tree data from the San Francisco Open Data Portal and use it to practice with APIs.

In [2]:
%matplotlib inline

import pandas as pd

import json      # library for working with JSON-formatted text strings
import requests  # library for accessing content from web URLs

import pprint    # library for cleanly printing Python data structures
pp = pprint.PrettyPrinter()

First get familiar with the API endpoint from the portal documentation:
https://data.sfgov.org/City-Infrastructure/Street-Tree-Map/337t-q2b4

Under Export / SODA (Socrata Open Data API) we can see the API endpoint URL and the available columns.

In [3]:
# download data
endpoint_url = "https://data.sfgov.org/resource/337t-q2b4.json?"
response = requests.get(endpoint_url)
results = response.text
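
The request above downloads whatever the endpoint returns by default. As a minimal sketch, assuming the portal follows the usual SODA query conventions (`$limit` and `$select` are assumptions here, so check the API documentation for the dataset), a query string could be built like this. The helper name `build_soda_url` is hypothetical.

```python
from urllib.parse import urlencode

def build_soda_url(base_url, limit=None, select=None):
    """Return base_url with optional SODA-style query parameters appended."""
    params = {}
    if limit is not None:
        params["$limit"] = limit              # assumed SODA parameter
    if select:
        params["$select"] = ",".join(select)  # assumed SODA parameter
    if not params:
        return base_url
    # Keep "$" and "," readable instead of percent-encoding them
    return base_url + "?" + urlencode(params, safe="$,")

url = build_soda_url(
    "https://data.sfgov.org/resource/337t-q2b4.json",
    limit=5,
    select=["treeid", "qspecies"],
)
print(url)
```

The resulting string can be passed to `requests.get` in the same way as `endpoint_url`.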

In [5]:
# print the first 500 characters to see a sample of the data
print(results[:500])

print(type(results))

[ {
  "permitnotes" : "Permit Number 20649",
  "qspecies" : "Tree(s) ::",
  "plantdate" : "1983-04-25T00:00:00",
  "qcaretaker" : "Private",
  "siteorder" : "1",
  "treeid" : "60867",
  "qlegalstatus" : "Permitted Site",
  "qsiteinfo" : "Sidewalk: Curb side : Cutout",
  "planttype" : "Tree"
}
, {
  "permitnotes" : "Permit Number 43835",
  "qspecies" : "Tree(s) ::",
  "plantdate" : "2001-04-29T00:00:00",
  "qcaretaker" : "Private",
  "siteorder" : "60",
  "treeid" : "44642",
  "qlegalstatus" : "P
<class 'str'>


In [6]:
# parse the string into Python objects (loads = "load string"); here the result is a list of dictionaries
data = json.loads(results)
print(type(data))


<class 'list'>
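
The type returned by `json.loads` depends on the top-level JSON value: an array becomes a `list` and an object becomes a `dict`. A small sketch with a made-up two-record string (the values are copied from the sample printed above):

```python
import json

sample = '[{"treeid": "60867", "planttype": "Tree"}, {"treeid": "44642", "planttype": "Tree"}]'
parsed = json.loads(sample)

print(type(parsed))         # the top-level JSON array becomes a list
print(type(parsed[0]))      # each JSON object inside it becomes a dict
print(parsed[0]["treeid"])  # values are reached with ordinary indexing
```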


In [7]:
# A general way to do it: build one list of values per column.
# Every list uses the same "permitnotes" filter so that all columns
# end up with the same length.

dictionary = {'permitnotes': [d['permitnotes'] for d in data if "permitnotes" in d],
              'qspecies': [d['qspecies'] for d in data if "permitnotes" in d],
              'treeid': [d['treeid'] for d in data if "permitnotes" in d],
              'planttype': [d['planttype'] for d in data if "permitnotes" in d],
              'qcaretaker': [d['qcaretaker'] for d in data if "permitnotes" in d]}



df = pd.DataFrame.from_dict(dictionary)
df.head()

| | permitnotes | planttype | qcaretaker | qspecies | treeid |
|---|---|---|---|---|---|
| 0 | Permit Number 20649 | Tree | Private | Tree(s) :: | 60867 |
| 1 | Permit Number 43835 | Tree | Private | Tree(s) :: | 44642 |
| 2 | Permit Number 43922 | Tree | Private | Tree(s) :: | 44994 |
| 3 | Permit Number 39461 | Tree | Private | Robinia x ambigua :: Locust | 36641 |
| 4 | Permit Number 39461 | Tree | Private | Robinia x ambigua :: Locust | 36617 |
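
Indexing with `d['qspecies']` raises a `KeyError` whenever a record lacks that field. One possible alternative, sketched here with a hypothetical helper `records_to_columns`, uses `dict.get` so that missing fields become `None` and every column keeps the same length:

```python
def records_to_columns(records, columns):
    """Build a {column: [values]} dict, using None where a record lacks a key."""
    return {col: [rec.get(col) for rec in records] for col in columns}

# Toy records; the second one has no "permitnotes" field
records = [
    {"treeid": "60867", "permitnotes": "Permit Number 20649"},
    {"treeid": "44642"},
]
columns = records_to_columns(records, ["treeid", "permitnotes"])
print(columns)
```

The resulting dictionary can be passed to `pd.DataFrame.from_dict` in the same way as `dictionary`.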


In this particular case, the JSON data happens to be a list of simple dictionaries. That enables us to use a much simpler approach to convert it to a dataframe:

In [9]:
# Convert the list of dicts to a pandas dataframe
df2 = pd.DataFrame.from_records(data)
df2.head()

| | permitnotes | plantdate | planttype | plotsize | qcaretaker | qlegalstatus | qsiteinfo | qspecies | siteorder | treeid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Permit Number 20649 | 1983-04-25T00:00:00 | Tree | | Private | Permitted Site | Sidewalk: Curb side : Cutout | Tree(s) :: | 1 | 60867 |
| 1 | Permit Number 43835 | 2001-04-29T00:00:00 | Tree | | Private | Permitted Site | Sidewalk: Curb side : Cutout | Tree(s) :: | 60 | 44642 |
| 2 | Permit Number 43922 | 2001-04-25T00:00:00 | Tree | | Private | Permitted Site | Sidewalk: Curb side : Cutout | Tree(s) :: | 106 | 44994 |
| 3 | Permit Number 39461 | 1998-05-20T00:00:00 | Tree | | Private | Permitted Site | Sidewalk: Curb side : Cutout | Robinia x ambigua :: Locust | 13 | 36641 |
| 4 | Permit Number 39461 | 1998-05-20T00:00:00 | Tree | | Private | Permitted Site | Sidewalk: Curb side : Cutout | Robinia x ambigua :: Locust | 8 | 36617 |


## A quick tutorial on JSON 

In [10]:
import numpy as np
import pandas as pd
import requests 
import pprint    # library for cleanly printing Python data structures
pp = pprint.PrettyPrinter()

## What is JSON?

* **JSON** (JavaScript Object Notation) is a widely used, semi-structured text format for storing nested data.

```javascript
{
    "field1": "value1",
    "field2": ["list", "of", "values"],
    "myfield3": {"is_recursive": true, "a null value": null}
}
```

A few key points:
* JSON is a recursive format: the value of a JSON field can itself be a JSON object.
* JSON closely matches Python dictionaries:
```python
d = {
    "field1": "value1",
    "field2": ["list", "of", "values"],
    "myfield3": {"is_recursive": True, "a null value": None}
}
print(d['myfield3'])
```
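
The main differences are in the literals: JSON `true`, `false`, and `null` correspond to Python `True`, `False`, and `None`. A short sketch using the standard `json` module to round-trip the example above:

```python
import json

text = ('{"field1": "value1", "field2": ["list", "of", "values"], '
        '"myfield3": {"is_recursive": true, "a null value": null}}')

d = json.loads(text)   # JSON text -> Python objects
print(d["myfield3"])   # the nested object is a nested dict

back = json.dumps(d)   # Python objects -> JSON text
print(back)
```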

## Getting the JSON Data

For this tutorial I'm using the Stop Data from the City of Berkeley Open Data Portal, Public Safety Department. https://data.cityofberkeley.info/api/views/6e9j-pj9p/rows.json?accessType=DOWNLOAD. I downloaded the JSON file directly from the website rather than using the API because I wanted to show you how to work with JSON and nested dictionaries. The API output is a simple list, which you already know how to work with. If you want to get the data through the API, see footnote (1) at the bottom of this notebook.

In [11]:
# To see a sample of the data, I'll look at the first 20 lines of my file
!head -n 20 data/stops.json

{
  "meta" : {
    "view" : {
      "id" : "6e9j-pj9p",
      "name" : "Berkeley PD - Stop Data",
      "attribution" : "Berkeley Police Department",
      "averageRating" : 0,
      "category" : "Public Safety",
      "createdAt" : 1444171604,
      "description" : "This data was extracted from the Department’s Public Safety Server and covers the data beginning January 26, 2015.  On January 26, 2015 the department began collecting data pursuant to General Order B-4 (issued December 31, 2014).  Under that order, officers were required to provide certain data after making all vehicle detentions (including bicycles) and pedestrian detentions (up to five persons).  This data set lists stops by police in the categories of traffic, suspicious vehicle, pedestrian and bicycle stops.  Incident number, date and time, location and disposition codes are also listed in this data.\r\n\r\nAddress data has been changed from a specific address, where applicable, and listed as the block where 

## Loading the JSON Data

In [12]:
# import the entire JSON data file into a Python dictionary
import json
with open("data/stops.json", "rb") as f:
    stops_json = json.load(f)

The `stops_json` variable is now a dictionary encoding the data in the file:

In [13]:
type(stops_json)

dict
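
Note that `json.load` reads from an open file, while `json.loads` parses a string that is already in memory. A self-contained sketch, using a temporary file in place of `data/stops.json` (the toy content is made up):

```python
import json
import os
import tempfile

toy = {"meta": {"view": {"name": "toy dataset"}}, "data": [[1, "a"], [2, "b"]]}

# Write the toy structure to a temporary JSON file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(toy, f)
    path = f.name

# Read it back with json.load, as the notebook does for stops.json
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)
os.remove(path)  # clean up the temporary file

print(type(loaded), list(loaded.keys()))
```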

### Let's examine what keys are in the top level json object

We can list the keys to determine what data is stored in the object. 

In [14]:
stops_json.keys()

dict_keys(['meta', 'data'])

We see that the `stops_json` dictionary has two elements, 'meta' and 'data'. Let's explore what these two objects are.

In [15]:
print(type(stops_json['meta']))
print(type(stops_json['data']))

<class 'dict'>
<class 'list'>


### Digging into the Metadata

We see that 'meta' is itself a dictionary nested in the `stops_json` dictionary. We can investigate it by examining its keys.

In [16]:
stops_json['meta'].keys()  
# We see meta has one element in it, 'view'. 

dict_keys(['view'])

In [17]:
# We can print the meta dictionary
meta = stops_json['meta']
pp.pprint(meta)

{'view': {'attribution': 'Berkeley Police Department',
          'averageRating': 0,
          'category': 'Public Safety',
          'columns': [{'dataTypeName': 'meta_data',
                       'fieldName': ':sid',
                       'flags': ['hidden'],
                       'format': {},
                       'id': -1,
                       'name': 'sid',
                       'position': 0,
                       'renderTypeName': 'meta_data'},
                      {'dataTypeName': 'meta_data',
                       'fieldName': ':id',
                       'flags': ['hidden'],
                       'format': {},
                       'id': -1,
                       'name': 'id',
                       'position': 0,
                       'renderTypeName': 'meta_data'},
                      {'dataTypeName': 'meta_data',
                       'fieldName': ':position',
                       'flags': ['hidden'],
                       'format': {},
              

                                                  {'count': 19,
                                                   'item': '37.8658096520001'},
                                                  {'count': 18,
                                                   'item': '37.8672009970001'},
                                                  {'count': 17,
                                                   'item': '37.8690965960001'},
                                                  {'count': 16,
                                                   'item': '37.8752049780001'},
                                                  {'count': 15,
                                                   'item': '37.8808345250001'},
                                                  {'count': 14,
                                                   'item': '37.8785003840001'},
                                                  {'count': 13,
                                                   'item': '37.858532199

The `meta` dictionary contains another dictionary under the key `view`. Let's explore what is in the `view` dictionary.

In [18]:
# getting the keys
stops_json['meta']['view'].keys()

dict_keys(['id', 'name', 'attribution', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'indexUpdatedAt', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'columns', 'grants', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])

Notice that this is a nested/recursive data structure. As we dig deeper we reveal more and more keys and the corresponding data:

```
stops_json
|-> meta
    |-> view
        |-> id
        |-> name
        |-> attribution
        ...
|-> data
    | ...
```
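
One way to get this kind of overview without printing the whole object is to walk the structure recursively and list only the keys. A minimal sketch; `outline` is a hypothetical helper, and the toy dictionary only mimics the shape of `stops_json`:

```python
def outline(obj, max_depth=2, depth=0):
    """Return indented lines describing the keys of nested dictionaries."""
    lines = []
    if isinstance(obj, dict) and depth < max_depth:
        for key, value in obj.items():
            lines.append("    " * depth + f"|-> {key} ({type(value).__name__})")
            # Recurse into the value; non-dict values add no further lines
            lines.extend(outline(value, max_depth, depth + 1))
    return lines

toy = {
    "meta": {"view": {"id": "6e9j-pj9p", "name": "Berkeley PD - Stop Data"}},
    "data": [[1, "a"], [2, "b"]],
}
print("\n".join(outline(toy, max_depth=3)))
```

Limiting the depth keeps the printout short even when the nested object is large.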

There is a key called `description` in the `view` sub-dictionary. This likely contains a description of the data:

In [19]:
print(stops_json['meta']['view']['description'])

This data was extracted from the Department’s Public Safety Server and covers the data beginning January 26, 2015.  On January 26, 2015 the department began collecting data pursuant to General Order B-4 (issued December 31, 2014).  Under that order, officers were required to provide certain data after making all vehicle detentions (including bicycles) and pedestrian detentions (up to five persons).  This data set lists stops by police in the categories of traffic, suspicious vehicle, pedestrian and bicycle stops.  Incident number, date and time, location and disposition codes are also listed in this data.

Address data has been changed from a specific address, where applicable, and listed as the block where the incident occurred.  Disposition codes were entered by officers who made the stop.  These codes included the person(s) race, gender, age (range), reason for the stop, enforcement action taken, and whether or not a search was conducted.

The officers of the Berkeley Police Dep

### Columns Metadata

Another potentially useful key in the metadata dictionary is `columns`. Its value is a list:

In [20]:
type(stops_json['meta']['view']['columns'])

list

In [22]:
# We can examine the list by printing the name of each column:
for c in stops_json['meta']['view']['columns']:
    print(c["name"])


sid
id
position
created_at
created_meta
updated_at
updated_meta
meta
Incident Number
Call Date/Time
Location
Incident Type
Dispositions
Location - Latitude
Location - Longitude


### Examining the Data Field

In [23]:
# looking at a few entries in the data field
stops_json['data'][0:2]

[[1,
  '29A1B912-A0A9-4431-ADC9-FB375809C32E',
  1,
  1444146408,
  '932858',
  1444146408,
  '932858',
  None,
  '2015-00004825',
  '2015-01-26T00:10:00',
  'SAN PABLO AVE / MARIN AVE',
  'T',
  'M',
  None,
  None],
 [2,
  '1644D161-1113-4C4F-BB2E-BF780E7AE73E',
  2,
  1444146408,
  '932858',
  1444146408,
  '932858',
  None,
  '2015-00004829',
  '2015-01-26T00:50:00',
  'SAN PABLO AVE / CHANNING WAY',
  'T',
  'M',
  None,
  None]]
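
Each row is a plain list whose positions line up with the `columns` metadata. Pairing the two with `zip` makes a single row easier to read; a sketch using the column names and the first row printed above:

```python
names = ["sid", "id", "position", "created_at", "created_meta", "updated_at",
         "updated_meta", "meta", "Incident Number", "Call Date/Time", "Location",
         "Incident Type", "Dispositions", "Location - Latitude",
         "Location - Longitude"]

row = [1, "29A1B912-A0A9-4431-ADC9-FB375809C32E", 1, 1444146408, "932858",
       1444146408, "932858", None, "2015-00004825", "2015-01-26T00:10:00",
       "SAN PABLO AVE / MARIN AVE", "T", "M", None, None]

record = dict(zip(names, row))  # column name -> value for this row
print(record["Incident Number"], record["Location"])
```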

## Building a Dataframe from JSON

In [24]:
# Load the data from JSON and assign column titles
stops = pd.DataFrame(
    stops_json['data'],
    columns=[c['name'] for c in stops_json['meta']['view']['columns']])

# Remove columns that are missing descriptions
bad_cols = [c['name'] for c in stops_json['meta']['view']['columns'] if "description" not in c]

stops.drop(bad_cols, axis=1, inplace=True)
stops.head()

| | Incident Number | Call Date/Time | Location | Incident Type | Dispositions | Location - Latitude | Location - Longitude |
|---|---|---|---|---|---|---|---|
| 0 | 2015-00004825 | 2015-01-26T00:10:00 | SAN PABLO AVE / MARIN AVE | T | M | | |
| 1 | 2015-00004829 | 2015-01-26T00:50:00 | SAN PABLO AVE / CHANNING WAY | T | M | | |
| 2 | 2015-00004831 | 2015-01-26T01:03:00 | UNIVERSITY AVE / NINTH ST | T | M | | |
| 3 | 2015-00004848 | 2015-01-26T07:16:00 | 2000 BLOCK BERKELEY WAY | 1194 | BM4ICN | | |
| 4 | 2015-00004849 | 2015-01-26T07:43:00 | 1700 BLOCK SAN PABLO AVE | 1194 | BM4ICN | | |
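
The same selection can be expressed without pandas, which may help show what the dataframe step does. A sketch on toy data (the metadata and rows are invented, shaped like the Berkeley file):

```python
toy = {
    "meta": {"view": {"columns": [
        {"name": "sid"},  # no description, so this column is dropped
        {"name": "Incident Number", "description": "ID"},
        {"name": "Location", "description": "Block"},
    ]}},
    "data": [
        [1, "2015-00004825", "SAN PABLO AVE / MARIN AVE"],
        [2, "2015-00004829", "SAN PABLO AVE / CHANNING WAY"],
    ],
}

cols = toy["meta"]["view"]["columns"]
# Positions and names of the columns that carry a description
keep = [(i, c["name"]) for i, c in enumerate(cols) if "description" in c]

# One dict per row, restricted to the kept columns
records = [{name: row[i] for i, name in keep} for row in toy["data"]]
print(records[0])
```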


### Footnotes:
(1) Downloading the stop data with the API

In [25]:
endpoint_url = "https://data.cityofberkeley.info/resource/4p7k-drdw.json"
response = requests.get(endpoint_url)
result = response.text

# print the first 100 characters to see a sample of the data
print(result[:100])

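# parse the string into Python objects (loads = "load string");
# this endpoint returns a JSON array, so the result is a list.
# A separate name avoids overwriting the `stops` dataframe built earlier.
stops_api = json.loads(result)
print(type(stops_api))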

[{"call_date_time":"2017-08-07T07:18:25.000","dispositions":"M; ","incident_type":"T","location":" S
<class 'list'>
