# Section 09: JSON & APIs

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:

<img src="images/schema_detailed.png" width="500">


You can see that the top-level structure is a dictionary with a key named 'response'. Its value is also a dictionary, with two keys: 'docs' and 'meta'. As you continue down the schema hierarchy, you'll notice that the vast majority of the values, in this case, are dictionaries.

## Loading the Data File

As before, start by loading this data from the file. Here's how to open the file and load its contents.

In [None]:
import json

In [None]:
# The with block closes the file automatically once the data is loaded
with open('ny_times_response.json', 'r') as f:
    data = json.load(f)

In [None]:
print(type(data))
print(data.keys())

You should see two additional keys, 'status' and 'copyright', which were not shown in the schema documentation.

## Loading Specific Data

Looking at the schema, you might be interested in retrieving a specific piece of data, such as the articles' headlines. Notice that 'headline' is a key under **'docs'**, which is under 'response'. So the path is roughly: **data['response']['docs']['headline']**. While this is close to the code you'll use to extract headlines, something is a bit off. If you look closely at the schema outline, the 'docs' subheading is actually a list. Each item within this list is a dictionary with the keys shown above, so you need to index into or loop over the list before asking for 'headline'. Breaking it into steps, you have:

In [None]:
data['response']['docs'][0].keys()

In [None]:
docs = data['response']['docs']
print(type(docs), len(docs))

In [None]:
for doc in docs:
    print(doc['headline'])

Or if you want to just print the main headlines themselves:

In [None]:
for doc in docs:
    print(doc['headline']['main'])
    print('\n')
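
The same loop can also collect the headlines into a list instead of printing them. The sketch below assumes a small, invented stand-in for `data['response']['docs']` (the headline values are made up, and `sample_docs` and `main_headlines` are names chosen for illustration). It uses `dict.get` so that a record without a main headline yields `None` rather than raising a `KeyError`.

```python
# Hypothetical stand-in for data['response']['docs']; values are invented.
sample_docs = [
    {'headline': {'main': 'First example headline'}},
    {'headline': {'main': 'Second example headline'}},
    {'headline': {}},  # a record with no main headline
]

# .get returns None instead of raising KeyError when a key is missing.
main_headlines = [doc.get('headline', {}).get('main') for doc in sample_docs]
print(main_headlines)
```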

## Transforming JSON to Alternative Formats

You've also previously looked at how to transform JSON into DataFrames. Investigating the schema, a good option for this could again be the 'docs' subheading. While this still has nested data itself, it's often easier to load the entire section as a DataFrame and then use additional functions to break apart the internally nested data from there.

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(data['response']['docs'])
df

In [None]:
df.headline[0]

## Breaking Out Nested Data

Now that you have the data loaded, it's time to clean it up by breaking out some of the nested data. For example, you should notice that the headline entries are actually dictionaries. You could transform these into individual columns with something like this:

In [None]:
keys = df.headline.iloc[0].keys()  # Get dictionary keys
# Keep track of columns we make for subsequent preview
new_cols = []
# Create a new feature for each of these keys
for key in keys:
    new_col = 'headline_{}'.format(key)  # Create new column name
    df[new_col] = df.headline.map(lambda x: x[key])  # Create a new column
    new_cols.append(new_col)
df[new_cols].head()

Wahoo! This is a good general strategy for transforming nested JSON: create a DataFrame and then break out nested features into their own column features.
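
The same idea can be sketched without pandas, which may help show what "breaking out" nested data means. This is a minimal illustration, not the method used above: `flatten_record` is a hypothetical helper, and the sample records (including the `kicker` and `word_count` fields and all values) are invented for the example.

```python
def flatten_record(record, parent_key='', sep='_'):
    """Flatten nested dictionaries into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten_record(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# Hypothetical records shaped like the 'docs' entries; values are invented.
sample_docs = [
    {'headline': {'main': 'First story', 'kicker': None}, 'word_count': 120},
    {'headline': {'main': 'Second story', 'kicker': 'Opinion'}, 'word_count': 340},
]
flat_docs = [flatten_record(doc) for doc in sample_docs]
print(flat_docs[0])
```

A list of flat dictionaries like `flat_docs` can be passed straight to `pd.DataFrame`. pandas also provides `pd.json_normalize`, which performs a similar flattening.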

## Outputting to JSON

Finally, take a look at how you can write data back to JSON. As with loading, you first open a file (this time in write mode) and then use the json package to write the data to that file.

In [None]:
with open('output.json', 'w') as f:
    json.dump(data, f)
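
To check that a write succeeded, one option is to read the file back and compare it with the original object. The sketch below assumes a small invented `payload` in place of the real `data`, and it writes to a temporary directory so that it leaves no files behind. The `indent` argument is optional and only makes the file easier to read.

```python
import json
import os
import tempfile

# Invented stand-in for the loaded NY Times data
payload = {'status': 'OK', 'response': {'docs': [{'headline': {'main': 'Example'}}]}}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.json')
    with open(path, 'w') as f:
        json.dump(payload, f, indent=2)  # indent makes the file human-readable
    with open(path, 'r') as f:
        restored = json.load(f)

print(restored == payload)
```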

## Summary
There you have it! In this lesson, you took another look at JSON, examining an example schema diagram and retrieving information from it. You also looked at a general procedure for transforming nested data into pandas DataFrames (create a DataFrame, and then break apart nested data using lambda functions to create additional columns). Finally, you took a brief look at saving data to JSON files.

# Introduction to APIs

## Introduction 

**_APIs_** (short for **_Application Programming Interfaces_**) are an important aspect of the modern internet. APIs are what allow services on the internet to play nicely with each other and work together.

### What is an API made of?

APIs are very common in the tech world, which means that there are many, many different kinds that you're going to run into. While each API you work with will be unique in some way, there are some common traits you can expect to see overall. An API has three main components, listed below:

* **Access Permissions:** Is the user allowed to ask for data or services?
* **Request:** The service being asked for (e.g., "given my current GPS location, return the map around that place," as in Pokémon Go). A Request has two main parts:

    * **Methods:** Once access is permitted, which questions can be asked.

    * **Parameters:** Additional details that can be sent with requests or responses.

* **Response:** The data or service as a result of the request.

We'll look more deeply at how to use these components in the upcoming lessons for this section. For now, our goal is to understand that APIs:

* Provide a standardized way of letting us interact with 3rd party software/services
* Consist of a **_Request_** and a **_Response_**
* Can have special **_Access Permissions_** depending on the API and the user making the request. 
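
As a rough illustration of how the parameters of a request travel with it, the sketch below builds a query string using only the standard library. The endpoint `https://api.example.com/v1/search` and the parameter values are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = 'https://api.example.com/v1/search'  # hypothetical endpoint
params = {'query': 'godfather', 'api-key': 'YOUR_KEY'}  # placeholder values

# Parameters are encoded as key=value pairs after the '?' in the URL
request_url = base_url + '?' + urlencode(params)
print(request_url)

# Parsing the URL again recovers the same parameters
parsed = parse_qs(urlsplit(request_url).query)
print(parsed)
```

The `requests` library does this encoding for you when you pass a dictionary through its `params` argument, as in `requests.get(base_url, params=params)`.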


# The New York Times API

NYT has [several different APIs](https://developer.nytimes.com/apis) for various data. Let's look at the Movie Reviews API.

<img src="images/nytimes_movie_schema_detailed.png" width=500>

More about the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/overview).

## Getting Data from APIs: Requests

`requests` is a third-party library that allows you to send HTTP requests using Python. With this library, you can access content like web page headers, form data, files, and parameters via simple Python commands. It also allows you to access the response data in a simple way.


In [None]:
import requests

In [None]:
response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?query=godfather&api-key=')

In [None]:
response.text

## Response Codes

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

What is a 401?
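
A 401 is the "Unauthorized" status: the server did not accept the credentials sent with the request, which is what a request with an empty `api-key` parameter would be expected to produce. As a small sketch, Python's standard `http` module can translate status codes into their reason phrases. The particular codes chosen here are just examples.

```python
from http import HTTPStatus

# Look up the standard reason phrase for a few common status codes.
codes = (200, 401, 404, 429)
phrases = {code: HTTPStatus(code).phrase for code in codes}

for code, phrase in phrases.items():
    print(code, phrase)
```

With `requests`, the numeric code of a response is available as `response.status_code`.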

## API Keys!!

https://developer.nytimes.com/get-started



In [None]:
from config2 import yish_key, yish_secret

In [None]:
response = requests.get('https://api.nytimes.com/svc/movies/v2/reviews/search.json?query=godfather&api-key={}'.format(yish_key))

In [None]:
response.json()

### Should I publicly share my passwords on Github?

When using an API that requires an API key and password, you should **NEVER** hardcode these values into your main file. When you upload your project to a public GitHub repository, it is visible to everyone and vulnerable to attack. Assume that if you put sensitive information publicly on the internet, it will be found and abused.

To this end, how can we easily access our API key without opening ourselves up to vulnerabilities?

There are many ways to store sensitive information!

1. Create a `config.py` file to store passwords
2. Create a `.gitignore` file in your repository and list `config.py` in it, so Git never uploads it to GitHub
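
Another common option, not listed above, is to keep the key in an environment variable and read it at runtime. The sketch below is one way this might look. `load_api_key` is a hypothetical helper and `NYT_API_KEY` is a hypothetical variable name chosen for illustration. The demonstration passes in a stand-in mapping so that no real secret is needed.

```python
import os

def load_api_key(var_name='NYT_API_KEY', environ=None):
    """Return the API key stored in an environment variable.

    `NYT_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var_name)
    if not key:
        raise RuntimeError('Set the {} environment variable first'.format(var_name))
    return key

# Demonstration with a stand-in mapping so no real secret is needed.
demo_key = load_api_key(environ={'NYT_API_KEY': 'not-a-real-key'})
print(demo_key)
```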

## Making API Requests

Getting movie reviews from movies released in 2020.

In [None]:
uri = 'https://api.nytimes.com/svc/movies/v2/reviews/search.json?opening-date=2020-01-01&api-key=' 
# uniform resource identifier

In [None]:
results = requests.get(uri+yish_key).json()
len(results['results'])

In [None]:
results['results'][0].keys()

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(results['results'])

In [None]:
df

## Pagination with Offset

In [None]:
uri2 = 'https://api.nytimes.com/svc/movies/v2/reviews/search.json?opening-date=2020-01-01&offset=20&api-key='

results2 = requests.get(uri2+yish_key).json()
results2


In [None]:
def get2020reviews(n): # where n*20 is the number of reviews we want
    results = []
    for i in range(0, n):
        if i != 0:
            # Use the loop index i (not n) so each request fetches the next page of 20
            uri = 'https://api.nytimes.com/svc/movies/v2/reviews/search.json?opening-date=2020-01-01&offset={}&api-key='.format(i*20)
        else:
            uri = 'https://api.nytimes.com/svc/movies/v2/reviews/search.json?opening-date=2020-01-01&api-key='
        
        response = requests.get(uri+yish_key).json()
        if response['status'] == 'OK':
            results += response['results']
            
    return pd.DataFrame(results)
        

In [None]:
get2020reviews(4)
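
The offset logic can be checked without calling the API by separating it from the network request. The sketch below is an illustration under assumptions, not the function above: `collect_pages` and `fake_fetch` are hypothetical names, the reviews are invented, and the page size of 20 mirrors the offsets used in this section. Unlike `get2020reviews`, it stops at the first page whose status is not 'OK' instead of skipping it.

```python
def collect_pages(fetch_page, n_pages, page_size=20):
    """Call fetch_page(offset) for each page and concatenate the results."""
    results = []
    for page in range(n_pages):
        offset = page * page_size  # 0, 20, 40, ...
        response = fetch_page(offset)
        if response.get('status') != 'OK':
            break  # stop at the first failed page
        results.extend(response.get('results') or [])
    return results

# Stand-in for the API: 45 fake reviews served 20 at a time.
fake_reviews = [{'id': i} for i in range(45)]

def fake_fetch(offset):
    return {'status': 'OK', 'results': fake_reviews[offset:offset + 20]}

collected = collect_pages(fake_fetch, n_pages=3)
print(len(collected))
```

Passing the fetching function in as an argument keeps the paging loop testable. In real use it could be replaced by a function that builds the URI for a given offset and calls `requests.get`.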