## Starting Off

`

In [2]:
data = [ {}, 
         {'class': 'DS033020',
         'start_date': '03-30-20',
         'instructors': [{'name':'Sean Abu Wilson',
                     'dob' : '02-06',
                    'preferred_name' :'SeanAbu'},
                    {'name': 'Yish Lim',
                     'dob' : '07-22',
                    'preferred_name': 'Yish'},
                    {'name': 'Matthew Wasserman',
                    'dob' : '08-26',
                    'preferred_name': 'Matt'}]},
         {}
       ]

1. What type of data structure is `data`?
2. How many elements are in `data`? (i.e. what is the length of `data`)
3. Write out the code to access the value `'07-22'`.
4. Write code that will compile all of the instructors preferred name into a list. 

In [3]:
# your code here

# Using APIs and JSON Data

## Objectives
You will be able to:
* Access and manipulate data inside a JSON file
* Pull data from an API and parse/transform the data

## Agenda

* Review JSON Schemas
* Introduce APIs.
* Walk through how to make an API request. 
* Read the documentation of Yelp's API


### What is the difference between a JSON and a python dictionary?

## Loading the JSON file

As before, we begin by importing the json package, opening a file with python's built in function, and then loading that data in.

In [None]:
import json
f = open('data.json')
data = json.load(f)

## Exploring JSON Schemas  

Recall that JSON files have a nested structure. The most granular level of raw data will be individual numbers (float/int) and strings. These in turn will be stored in the equivalent of python lists and dictionaries. Because these can be combined, we'll start exploring by checking the type of our root object, and start mapping out the hierarchy of the json file.

In [2]:
type(data)

dict

In [3]:
data['businesses'][0]

{'id': 'hYWsMDz0ms7TOnFTcsxYcw',
 'alias': 'manolis-ice-cream-pastries-and-cakes-austin',
 'name': 'Manolis Ice Cream, Pastries, & Cakes',
 'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/aAmOnvITNNjs04XnKFvSGw/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/manolis-ice-cream-pastries-and-cakes-austin?adjust_creative=F3qypMKtTmS20ke5XNx-Sg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=F3qypMKtTmS20ke5XNx-Sg',
 'review_count': 156,
 'categories': [{'alias': 'desserts', 'title': 'Desserts'},
  {'alias': 'icecream', 'title': 'Ice Cream & Frozen Yogurt'},
  {'alias': 'bakeries', 'title': 'Bakeries'}],
 'rating': 5.0,
 'coordinates': {'latitude': 30.24441, 'longitude': -97.75808},
 'transactions': ['pickup'],
 'price': '$',
 'location': {'address1': '603 W Live Oak St',
  'address2': None,
  'address3': '',
  'city': 'Austin',
  'zip_code': '78704',
  'country': 'US',
  'state': 'TX',
  'display_address': ['603 W Live Oak St', 'Austin, TX 78704']

As you can see, in this case, the first level of the hierarchy is a dictionary. Let's explore what keys are within this:

In [4]:
data.keys()

dict_keys(['businesses', 'total', 'region'])

In this case, there is only a single key, 'albums', so we'll continue on down the pathway exploring and mapping out the hierarchy. Once again, let's start by checking the type of this nested data structure.

In [5]:
type(data['businesses'])

list

Now we have a list of within a dictionary. Once again, let's investigate what's within this list (JSON calls the equivalent of Python dictionaries Objects.)

In [6]:
data['businesses'][0]

{'id': 'hYWsMDz0ms7TOnFTcsxYcw',
 'alias': 'manolis-ice-cream-pastries-and-cakes-austin',
 'name': 'Manolis Ice Cream, Pastries, & Cakes',
 'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/aAmOnvITNNjs04XnKFvSGw/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/manolis-ice-cream-pastries-and-cakes-austin?adjust_creative=F3qypMKtTmS20ke5XNx-Sg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=F3qypMKtTmS20ke5XNx-Sg',
 'review_count': 156,
 'categories': [{'alias': 'desserts', 'title': 'Desserts'},
  {'alias': 'icecream', 'title': 'Ice Cream & Frozen Yogurt'},
  {'alias': 'bakeries', 'title': 'Bakeries'}],
 'rating': 5.0,
 'coordinates': {'latitude': 30.24441, 'longitude': -97.75808},
 'transactions': ['pickup'],
 'price': '$',
 'location': {'address1': '603 W Live Oak St',
  'address2': None,
  'address3': '',
  'city': 'Austin',
  'zip_code': '78704',
  'country': 'US',
  'state': 'TX',
  'display_address': ['603 W Live Oak St', 'Austin, TX 78704']

In [7]:
data['businesses'][0].keys()

dict_keys(['id', 'alias', 'name', 'image_url', 'is_closed', 'url', 'review_count', 'categories', 'rating', 'coordinates', 'transactions', 'price', 'location', 'phone', 'display_phone', 'distance'])

At this point, you also probably have a good idea of the general structure of the json file. However, there is still a lot more we could investigate further:

## What is an API?

**Application Program Interfaces**, or APIs, are commonly used to retrieve data from remote websites. Sites like Reddit, Twitter, and Facebook all offer certain data through their APIs. 

To use an API, you make a request to a remote web server, and retrieve the data you need.

Python with two built-in modules, `urllib` and `urllib2` to handle these requests but these could be very confusing  and the documentation is not clear.

To make these things simpler, one easy-to-use third-party library, known as` Requests`, is what most developers prefer to use it instead or urllib/urllib2. With this library, you can access content like web page headers, form data, files, and parameters via simple Python commands. It also allows you to access the response data in a simple way.

![](logo.png)

Below is how you would install and import the requests library before making any requests. 
```python
# Uncomment and install requests if you dont have it already
# !pip install requests

# Import requests to working environment
import requests
```

In [8]:
import requests



## The `.get()` Method

Now we have requests library ready in our working environment, we can start making some requests using the `.get()` method as shown below:


We can use a simple GET request to retrieve information from the OpenNotify API.




OpenNotify has several API endpoints. An endpoint is a server route that is used to retrieve different data from the API. For example, the /comments endpoint on the Reddit API might retrieve information about comments, whereas the /users endpoint might retrieve data about users. To access them, you would add the endpoint to the base url of the API.



In [9]:
# Make a get request to get the latest position of the international space station from the opennotify api.
response = requests.get("http://api.open-notify.org/iss-now.json")
# Print the status code of the response.
print(response.status_code)

200


In [10]:
response.

<Response [200]>


GET is by far the most used HTTP method. We can use GET request to retrieve data from any destination. 
GET is by far the most used HTTP method. We can use GET request to retrieve data from any destination. 

## Status Codes
The request we make may not be always successful. The best way is to check the status code which gets returned with the response. Here is how you would do this. 


In [None]:
# Code here 
response.status_code == requests.codes.ok

So this is a good check to see if our request was successful. Depending on the status of the web server, the access rights of the clients and availibility of requested information. A web server may return a number of status codes within the response. Wikipedia has an exhaustive details on all these codes. [Check them out here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

### Common status codes

* 200 — everything went okay, and the result has been returned (if any)
* 301 — the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 401 — the server thinks you’re not authenticated. This happens when you don’t send the right credentials to access an API (we’ll talk about authentication in a later post).
* 400 — the server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 403 — the resource you’re trying to access is forbidden — you don’t have the right permissions to see it.
* 404 — the resource you tried to access wasn’t found on the server.

In [11]:
request2 = requests.get('http://api.open-notify.org/fake-endpoint')
print(request2.status_code)

404


### Hitting the right endpoint
iss-pass wasn’t a valid endpoint, so we got a 404 status code in response. We forgot to add `.json` at the end, as the API documentation states.

We’ll now make a GET request to http://api.open-notify.org/iss-pass.json.


## Response Contents
Once we know that our request was successful and we have a valid response, we can check the returned information using `.text` property of the response object. 
```python
print (resp.text)
```

In [12]:
people = requests.get('http://api.open-notify.org/astros.json')
print(people.text)

{"people": [{"name": "Christina Koch", "craft": "ISS"}, {"name": "Alexander Skvortsov", "craft": "ISS"}, {"name": "Luca Parmitano", "craft": "ISS"}, {"name": "Andrew Morgan", "craft": "ISS"}, {"name": "Oleg Skripochka", "craft": "ISS"}, {"name": "Jessica Meir", "craft": "ISS"}], "number": 6, "message": "success"}


### Query parameters

If you look at the documentation for the OpenNotify API, we see that the ISS Pass endpoint requires two parameters.

We can do this by adding an optional keyword argument, params, to our request. In this case, there are two parameters we need to pass:

* lat — The latitude of the location we want.
* lon — The longitude of the location we want.
We can make a dictionary with these parameters, and then pass them into the requests.get function.

We can also do the same thing directly by adding the query parameters to the url, like this: http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74.

It’s almost always preferable to setup the parameters as a dictionary, because requests takes care of some things that come up, like properly formatting the query parameters.

We’ll make a request using the coordinates of New York City, and see what response we get.

In [13]:
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of New York City.
parameters = {"lat": 40.71, "lon": -74}


# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)


# Print the content of the response (the data the server returned)
print(response.content)


b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1571930507, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 618, \n      "risetime": 1571932240\n    }, \n    {\n      "duration": 642, \n      "risetime": 1571938042\n    }, \n    {\n      "duration": 396, \n      "risetime": 1571943927\n    }, \n    {\n      "duration": 425, \n      "risetime": 1571992509\n    }, \n    {\n      "duration": 649, \n      "risetime": 1571998160\n    }\n  ]\n}\n'


In [14]:
type(response.content)

bytes

Now we want to convert the response into a python data structure. We do this by using the `json` package.

In [15]:
import json

In [16]:
json.loads(response.content)

{'message': 'success',
 'request': {'altitude': 100,
  'datetime': 1571930507,
  'latitude': 40.71,
  'longitude': -74.0,
  'passes': 5},
 'response': [{'duration': 618, 'risetime': 1571932240},
  {'duration': 642, 'risetime': 1571938042},
  {'duration': 396, 'risetime': 1571943927},
  {'duration': 425, 'risetime': 1571992509},
  {'duration': 649, 'risetime': 1571998160}]}

In [17]:
results = json.loads(response.content)

In [18]:
results['response'][0]

{'duration': 618, 'risetime': 1571932240}

So this returns a lot of information which by default is not really human understandable due to data encoding, HTML tags and other styling information that only a web browser can truly translate. In later lessons we shall look at how we can use ** Regular Exprerssions**  to clean this information and extract the required bits and pieces for analysis. 

## Response Headers
The response of an HTTP request can contain many headers that holds different bits of information. We can use `.header` property of the response object to access the header information as shown below:


In [19]:
# Code here 
dict(response.headers)

{'Server': 'nginx/1.10.3',
 'Date': 'Thu, 24 Oct 2019 15:23:45 GMT',
 'Content-Type': 'application/json',
 'Content-Length': '519',
 'Connection': 'keep-alive',
 'Via': '1.1 vegur'}

The content of the headers is our required element. You can see the key-value pairs holding various pieces of  information about the resource and request. Let's try to parse some of these values using the requests library:

```python
print(resp.headers['Content-Length'])  # length of the response
print(resp.headers['Date'])  # Date the response was sent
print(resp.headers['server'])   # Server type (google web service - GWS)
```


## HTTP POST method 

Sometimes we need to send one or more files simultaneously to the server. For example, if a user is submitting a form and the form includes different fields for uploading files, like user profile picture, user resume, etc. Requests can handle multiple files on a single request. This can be achieved by putting the files to a list of tuples in the form (`field_name, file_info)`.


```python
import requests

url = 'http://httpbin.org/post'  
file_list = [  
    ('image', ('fi.png', open('fi.png', 'rb'), 'image/png')),
    ('image', ('fi2.jpeg', open('fi2.jpeg', 'rb'), 'image/png'))
]

r = requests.post(url, files=file_list)  
print(r.text)  
```

![](quota.png)


## Generating Access Tokens

As discussed, in order to use many APIs, one needs to use OAuth which requires an access token. As such, our first step will be to generate this login information so that we can start making some requests.  

With that, lets go grab an access token from an API site and make some API calls!
Point your browser over to this [yelp page](https://www.yelp.com/developers/v3/manage_app) and start creating an app in order to obtain and api access token:


![](./images/yelp_app.png)

You can either sign in to an existing Yelp account, or create a new one, if needed.

On the page you see above, simply fill out some sample information such as "Flatiron Edu API Example" for the app name, or whatever floats your boat. Afterwards, you should be presented with an API key that you can use to make requests!

With that, it's time to start making some api calls!

As a general rule of thumb, don't store passwords in a main file like this! Instead, you would normally store those passwords under a sub file like passwords.py which you would then import.

Or even better, as an environment variable that could then be imported!


In [20]:

client_id = 'bVX1Jsfp4dkIOqw5HOVplg' #Your client ID goes here (as a string)
api_key = 'RTzp-q-TgkJW_NFQogubFvZNRDziXyoR38VbtZMWibDI-FlvB25OE7GmafFEqhTL8_Bk2HlcX24-hRWLMP7Nc6WHO_VXMXldpPBjP0LoPv5EFFELMSI2oll8njhbXHYx' #Your api key goes here (as a string)

## An Example Request with OAuth <a id="oauth_request"></a>
https://www.yelp.com/developers/documentation/v3/get_started

Look at the documentation in the link above and answer the following questions:

- How many api endpoints does the API offer?
- If you are searching for businesses, what are the required arguments you need to pass to the API?
- What is limit on the number of results an API request will return to you?
- When trying to pull reviews from the API what inromation do you need to pass in your API request?

In [21]:
term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 10

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }

url_params = {
                'term': term.replace(' ', '+'),
                'location': location.replace(' ', '+'),
                'limit': SEARCH_LIMIT
            }
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text[:1000])

<Response [200]>
<class 'str'>
{"businesses": [{"id": "jeWIYbgBho9vBDhc5S1xvg", "alias": "chanos-cantina-astoria", "name": "Chano's Cantina", "image_url": "https://s3-media1.fl.yelpcdn.com/bphoto/B34FXjfQrAxMkWUpb3Pv5A/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/chanos-cantina-astoria?adjust_creative=bVX1Jsfp4dkIOqw5HOVplg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=bVX1Jsfp4dkIOqw5HOVplg", "review_count": 162, "categories": [{"alias": "cocktailbars", "title": "Cocktail Bars"}, {"alias": "newmexican", "title": "New Mexican Cuisine"}, {"alias": "beerbar", "title": "Beer Bar"}], "rating": 4.0, "coordinates": {"latitude": 40.756621, "longitude": -73.929336}, "transactions": ["pickup", "delivery", "restaurant_reservation"], "price": "$$", "location": {"address1": "35-55 31st", "address2": "", "address3": "", "city": "Astoria", "zip_code": "11106", "country": "US", "state": "NY", "display_address": ["35-55 31st", "Astoria, NY 11106"]}, "phone": "+1

## Breaking Down the Request

As you can see, there are three main parts to our request.  
  
They are:
* The url
* The header
* The parameters
  
The url is fairly straightforward and is simply the base url as described in the documentation (again more details in the upcoming lesson).

The header is a dictionary of key-value pairs. In this case, we are using a fairly standard header used by many APIs. It has a strict form where 'Authorization' is the key and 'Bearer YourApiKey' is the value.

The parameters are the filters which we wish to pass into the query. These will be embedded into the url when the request is made to the api. Similar to the header, they form key-value pairs. Valid key parameters by which to structure your queries, are described in the API documentation which we'll look at further shortly. A final important note however, is the need to replace spaces with "+". This is standard to many requests as URLs cannot contain spaces. (Note that the header itself isn't directly embedded into the url itself and as such, the space between 'Bearer' and YourApiKey is valid.)


## The Response

As before, our response object has both a status code, as well as the data itself. With that, let's start with a little data exploration!

In [22]:
response.json().keys()


dict_keys(['businesses', 'total', 'region'])

Now let's go a bit further and start to preview what's stored in each of the values for these keys.


In [23]:
for key in response.json().keys():
    print(key)
    value = response.json()[key] #Use standard dictionary formatting
    print(type(value)) #What type is it?
    print('\n\n') #Seperate out data

businesses
<class 'list'>



total
<class 'int'>



region
<class 'dict'>





Let's continue to preview these further to get a little better acquainted.


In [24]:
yelp_data =response.json()
yelp_data['businesses'][:2]



[{'id': 'jeWIYbgBho9vBDhc5S1xvg',
  'alias': 'chanos-cantina-astoria',
  'name': "Chano's Cantina",
  'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/B34FXjfQrAxMkWUpb3Pv5A/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/chanos-cantina-astoria?adjust_creative=bVX1Jsfp4dkIOqw5HOVplg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=bVX1Jsfp4dkIOqw5HOVplg',
  'review_count': 162,
  'categories': [{'alias': 'cocktailbars', 'title': 'Cocktail Bars'},
   {'alias': 'newmexican', 'title': 'New Mexican Cuisine'},
   {'alias': 'beerbar', 'title': 'Beer Bar'}],
  'rating': 4.0,
  'coordinates': {'latitude': 40.756621, 'longitude': -73.929336},
  'transactions': ['pickup', 'delivery', 'restaurant_reservation'],
  'price': '$$',
  'location': {'address1': '35-55 31st',
   'address2': '',
   'address3': '',
   'city': 'Astoria',
   'zip_code': '11106',
   'country': 'US',
   'state': 'NY',
   'display_address': ['35-55 31st', 'Astoria, NY 11106']},
  'phon

As you can see, we're primarily interested in the 'bussinesses' entry. 


## Summary <a id="sum"></a>

Congratulations! We've covered a lot here! We took some of your previous knowledge with HTTP requests and OAuth in order to leverage an enterprise API! Then we made some requests to retrieve information that came back as a json format. We then transformed this data into a dataframe using the Pandas package. In the next lab, we'll break down how to read API documentation and then put it all together to make a nifty map!