## Understanding JSON structured data

Web APIs typically return JSON-formatted results, so let's start there.

### What is JSON?

* **JSON**: JavaScript Object Notation is a widely used, semi-structured text format for storing and exchanging nested data.

```javascript
{
    "field1": "value1",
    "field2": ["list", "of", "values"],
    "myfield3": {"is_recursive": true, "a null value": null}
}
```

A few key points:
* JSON is a recursive (hierarchical) format: the value of a field can itself be a JSON object or array.
* JSON closely matches Python dictionaries, apart from spellings such as `true`/`True` and `null`/`None`:
```python
d = {
    "field1": "value1",
    "field2": ["list", "of", "values"],
    "myfield3": {"is_recursive": True, "a null value": None}
}
print(d['myfield3'])
```
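
To make the correspondence concrete, here is a minimal sketch of moving between JSON text and Python objects with the standard `json` module. The sample string is an abbreviated version of the example above, not real API output.

```python
import json

# JSON text as it might arrive from a web API (a plain Python string).
text = '{"field1": "value1", "myfield3": {"is_recursive": true, "a null value": null}}'

# json.loads parses JSON text into Python objects:
# object -> dict, array -> list, true -> True, null -> None.
parsed = json.loads(text)
print(parsed["myfield3"])

# json.dumps goes the other way, from Python objects back to JSON text.
round_trip = json.dumps(parsed)
print(round_trip)
```

Note that the result of `json.loads` is an ordinary dictionary, so the usual indexing with square brackets applies.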



## Introduction to Accessing Data via APIs and Processing JSON Formatted Results

Let's get some street tree data from the San Francisco Open Data Portal and use it to practice with APIs.

In [45]:
%matplotlib inline

import pandas as pd

import json      # library for working with JSON-formatted text strings
import requests  # library for accessing content from web URLs

import pprint    # library for cleanly printing Python data structures
pp = pprint.PrettyPrinter()

First get familiar with the API endpoint from the portal documentation:
https://data.sfgov.org/City-Infrastructure/Tree-Caretakers-in-San-Francisco/frdy-nem6

Under Export / SODA (Socrata Open Data API) we can see the API endpoint URL and the available columns.

In [8]:
# download data
endpoint_url = "https://data.sfgov.org/resource/frdy-nem6.json"
response = requests.get(endpoint_url)
results = response.text
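
SODA endpoints also accept query parameters such as `$limit` (maximum number of rows) and `$select` (which columns to return). The sketch below only builds such a URL with the standard library and does not send a request; the particular limit and column choices are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

endpoint_url = "https://data.sfgov.org/resource/frdy-nem6.json"

# SODA query parameters: $limit caps the row count, $select picks columns.
# The values below are illustrative assumptions.
params = {"$limit": 5, "$select": "treeid,qspecies,qcaretaker"}

# safe="$," keeps the dollar signs and commas readable in the final URL.
query_url = endpoint_url + "?" + urlencode(params, safe="$,")
print(query_url)
```

With `requests`, the same dictionary can be passed directly as `requests.get(endpoint_url, params=params)`, which handles the encoding itself.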

In [12]:
# print the first 500 characters to see a sample of the data
print(results[:500])

print(type(results))

[ {
  "siteorder" : "1",
  "xcoord" : "6016267.25355",
  "location" : {
    "latitude" : "37.7363616200932",
    "human_address" : "{\"address\": \"\", \"city\": \"\", \"state\": \"\", \"zip\": \"\"}",
    "needs_recoding" : false,
    "longitude" : "-122.38620200123"
  },
  "qlegalstatus" : "DPW Maintained",
  "ycoord" : "2096084.36716",
  "planttype" : "Tree",
  "dbh" : "16",
  "qaddress" : "9 Young Ct",
  "latitude" : "37.7363616200932",
  "qcaretaker" : "Private",
  "qsiteinfo" : "Sidewalk: 
<class 'str'>


In [21]:
# parse the JSON string into Python objects (loads = "load string");
# here the result is a list of dictionaries, one per tree
data = json.loads(results)
print(data[:3])
print(type(data))


[{'siteorder': '1', 'xcoord': '6016267.25355', 'location': {'latitude': '37.7363616200932', 'human_address': '{"address": "", "city": "", "state": "", "zip": ""}', 'needs_recoding': False, 'longitude': '-122.38620200123'}, 'qlegalstatus': 'DPW Maintained', 'ycoord': '2096084.36716', 'planttype': 'Tree', 'dbh': '16', 'qaddress': '9 Young Ct', 'latitude': '37.7363616200932', 'qcaretaker': 'Private', 'qsiteinfo': 'Sidewalk: Curb side : Cutout', 'longitude': '-122.38620200123', 'qspecies': 'Pyrus calleryana :: Ornamental Pear', 'plotsize': 'Width 3ft', 'treeid': '196949'}, {'siteorder': '1', 'xcoord': '5993354.86667', 'location': {'latitude': '37.738391538344', 'human_address': '{"address": "", "city": "", "state": "", "zip": ""}', 'needs_recoding': False, 'longitude': '-122.465506999949'}, 'qlegalstatus': 'DPW Maintained', 'ycoord': '2097295.22775', 'planttype': 'Tree', 'dbh': '2', 'qaddress': '9 Yerba Buena Ave', 'latitude': '37.738391538344', 'qcaretaker': 'Private', 'qsiteinfo': 'Sidew

Note that printing the first three items this way is very hard to read because the output has no structured formatting. To fix this, use the `pprint` pretty printer:

In [23]:
pp.pprint(data[:3])

[{'dbh': '16',
  'latitude': '37.7363616200932',
  'location': {'human_address': '{"address": "", "city": "", "state": "", '
                                '"zip": ""}',
               'latitude': '37.7363616200932',
               'longitude': '-122.38620200123',
               'needs_recoding': False},
  'longitude': '-122.38620200123',
  'planttype': 'Tree',
  'plotsize': 'Width 3ft',
  'qaddress': '9 Young Ct',
  'qcaretaker': 'Private',
  'qlegalstatus': 'DPW Maintained',
  'qsiteinfo': 'Sidewalk: Curb side : Cutout',
  'qspecies': 'Pyrus calleryana :: Ornamental Pear',
  'siteorder': '1',
  'treeid': '196949',
  'xcoord': '6016267.25355',
  'ycoord': '2096084.36716'},
 {'dbh': '2',
  'latitude': '37.738391538344',
  'location': {'human_address': '{"address": "", "city": "", "state": "", '
                                '"zip": ""}',
               'latitude': '37.738391538344',
               'longitude': '-122.465506999949',
               'needs_recoding': False},
  'longitud
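
The pretty-printed output shows two things worth practicing: `location` is a nested dictionary, and `human_address` is a JSON string stored inside the JSON. A minimal sketch of reaching both, using one record trimmed by hand from the output above:

```python
import json

# One record trimmed from the output above; only the fields needed here.
record = {
    "treeid": "196949",
    "location": {
        "latitude": "37.7363616200932",
        "longitude": "-122.38620200123",
        "human_address": '{"address": "", "city": "", "state": "", "zip": ""}',
        "needs_recoding": False,
    },
}

# Nested dictionaries are reached by chaining keys.
# The API returns numbers as strings, so convert explicitly.
latitude = float(record["location"]["latitude"])

# human_address is itself JSON text, so it needs its own json.loads call.
address = json.loads(record["location"]["human_address"])

print(latitude)
print(sorted(address))
```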

Now that we can see the structure of the data, let's explore how to get it into a pandas DataFrame so we can work with it more easily.

In [25]:
# One general (but tedious) way to do it: build each column by hand.
# Keep only the records that contain "permitnotes" so all columns have equal length.
with_notes = [d for d in data if "permitnotes" in d]

# .get() returns None for a missing field instead of raising a KeyError.
dictionary = {'permitnotes': [d['permitnotes'] for d in with_notes],
              'qspecies': [d.get('qspecies') for d in with_notes],
              'treeid': [d.get('treeid') for d in with_notes],
              'planttype': [d.get('planttype') for d in with_notes],
              'qcaretaker': [d.get('qcaretaker') for d in with_notes]}



df = pd.DataFrame.from_dict(dictionary)
df.head()

| | permitnotes | qspecies | treeid | planttype | qcaretaker |
|---|---|---|---|---|---|
| 0 | Permit Number 776557 | Acer rubrum :: Red Maple | 115737 | Tree | Private |
| 1 | Permit Number 769765 | Prunus serrulata 'Kwanzan' :: Kwanzan Flowerin... | 102172 | Tree | Private |
| 2 | Permit Number 769765 | Prunus serrulata 'Kwanzan' :: Kwanzan Flowerin... | 102171 | Tree | Private |
| 3 | Permit Number 50252 | Tristaniopsis laurina :: Swamp Myrtle | 82730 | Tree | Private |
| 4 | Permit Number 51916 | Prunus cerasifera :: Cherry Plum | 89325 | Tree | Private |
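
Not every record carries every field (`permitnotes`, for example, appears only on some trees), which is why the hand-built approach has to guard against missing keys. A small sketch of two plain-Python tools for this, using two records trimmed by hand from the data shown in this notebook:

```python
# Two trimmed records: only the second one has "permitnotes".
records = [
    {"treeid": "196949", "qspecies": "Pyrus calleryana :: Ornamental Pear"},
    {"treeid": "115737", "qspecies": "Acer rubrum :: Red Maple",
     "permitnotes": "Permit Number 776557"},
]

# Union of keys across all records, since no single record lists them all.
all_keys = sorted(set().union(*(r.keys() for r in records)))
print(all_keys)

# dict.get returns None instead of raising KeyError for a missing key,
# so every column ends up with one entry per record.
columns = {key: [r.get(key) for r in records] for key in all_keys}
print(columns["permitnotes"])
```

A dictionary of equal-length lists like `columns` can then be passed to `pd.DataFrame`.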


In this particular case, the JSON data happens to be a list of simple dictionaries. That enables us to use a much simpler approach to convert it to a dataframe:

In [26]:
# Convert the list of dicts to a pandas DataFrame
df2 = pd.DataFrame.from_records(data)
df2.head()

| | siteorder | xcoord | location | qlegalstatus | ycoord | planttype | dbh | qaddress | latitude | qcaretaker | qsiteinfo | longitude | qspecies | plotsize | treeid | plantdate | permitnotes | qcareassistant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 6016267.25355 | {'latitude': '37.7363616200932', 'human_addres... | DPW Maintained | 2096084.36716 | Tree | 16 | 9 Young Ct | 37.7363616200932 | Private | Sidewalk: Curb side : Cutout | -122.38620200123 | Pyrus calleryana :: Ornamental Pear | Width 3ft | 196949 | | | |
| 1 | 1 | 5993354.86667 | {'latitude': '37.738391538344', 'human_address... | DPW Maintained | 2097295.22775 | Tree | 2 | 9 Yerba Buena Ave | 37.738391538344 | Private | Sidewalk: Curb side : Yard | -122.465506999949 | Acer rubrum :: Red Maple | Width 4ft | 203422 | | | |
| 2 | 1 | 5993642.27748 | {'latitude': '37.7377517864641', 'human_addres... | Significant Tree | 2097056.19499 | Tree | 3 | 9x Yerba Buena Ave | 37.7377517864641 | Private | Sidewalk: Curb side : Cutout | -122.46449593033 | Acer rubrum :: Red Maple | | 115737 | 2016-02-24T00:00:00.000 | Permit Number 776557 | |
| 3 | 7 | 6018697.2048701 | {'latitude': '37.7392189485182', 'human_addres... | DPW Maintained | 2097076.1109207 | Tree | 12 | 9X Newhall St | 37.7392189485182 | DPW | Sidewalk: Curb side : Cutout | -122.377869364283 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 3X3 | 16473 | | | |
| 4 | 10 | 6018697.2048701 | {'latitude': '37.7392189485182', 'human_addres... | Permitted Site | 2097076.1109207 | Tree | 12 | 9X Newhall St | 37.7392189485182 | Private | Sidewalk: Curb side : Cutout | -122.377869364283 | Eucalyptus nicholii :: Nichol's Willow-Leafed ... | 3X3 | 16476 | | | |
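
Notice that the `location` column still holds whole dictionaries, because `from_records` only unpacks the top level. One way to deal with that is to flatten each record before building the DataFrame. The helper below is a hypothetical sketch (the name `flatten` and the `_` separator are assumptions, not part of the original notebook); pandas also offers `pd.json_normalize` for the same purpose.

```python
def flatten(record, parent_key="", sep="_"):
    """Return a copy of record with nested dicts merged into top-level keys."""
    flat = {}
    for key, value in record.items():
        # Prefix nested keys with the name of the field that contained them.
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# A record trimmed by hand from the data above.
record = {
    "treeid": "196949",
    "dbh": "16",
    "location": {"latitude": "37.7363616200932", "needs_recoding": False},
}

flat = flatten(record)
print(flat)
```

Applying `flatten` to every item in `data` before calling `pd.DataFrame.from_records` would give one column per nested field instead of a single column of dictionaries.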


### Police Stops in San Francisco

Let's examine a second dataset from the San Francisco Open Data Portal for practice: Police Stops.

Go to the City Open Data Portal and get the URL for a JSON request for the Police Stops dataset. Here is a shortcut to the dataset: https://data.sfgov.org/Public-Safety/Police-Department-Calls-for-Service/hz9m-tj6z

Once you have it, review the notebook above, load this data, find the columns it contains, and convert it to a pandas DataFrame. I'll review this in the next video.