# Working with Known JSON Schemas - Lab

## Introduction
In this lab, you'll practice working with JSON files whose schema you know beforehand.

## Objectives

You will be able to:

* Use the `json` module to load and parse JSON documents
* Extract data using predefined JSON schemas
* Convert JSON to a pandas dataframe

## Reading a JSON Schema

Here's the JSON schema provided for a section of the NY Times API:
<img src="images/nytimes_movie_schema.png" width=500>

Or, for a fully expanded view:

<img src="images/nytimes_movie_schema_detailed.png" width=500>

You can read more in the documentation [here](https://developer.nytimes.com/docs/movie-reviews-api/1/routes/reviews/%7Btype%7D.json/get).
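Knowing the schema ahead of time means you can write down the keys and types you expect, then check a payload against them before extracting anything. The sketch below assumes the top-level keys and value types that appear in the loaded data later in this lab (`status`, `copyright`, `has_more`, `num_results`, `results`). The `EXPECTED_TOP_LEVEL` mapping and the `check_top_level` helper are hypothetical illustrations, not part of the NY Times documentation.

```python
import json

# Hypothetical hand-written summary of the top-level schema: key -> expected Python type
EXPECTED_TOP_LEVEL = {
    "status": str,
    "copyright": str,
    "has_more": bool,
    "num_results": int,
    "results": list,
}


def check_top_level(payload):
    """Hypothetical helper: return a list of problems (empty if the payload matches)."""
    problems = []
    for key, expected_type in EXPECTED_TOP_LEVEL.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}, "
                            f"got {type(payload[key]).__name__}")
    return problems


# Hypothetical minimal payload with every expected key present
sample = json.loads('{"status": "OK", "copyright": "(c)", "has_more": true, '
                    '"num_results": 0, "results": []}')
print(check_top_level(sample))  # []
print(check_top_level({}))      # five "missing key" messages
```

A full validator would also descend into each entry of `results`; this sketch stops at the top level to keep the idea visible.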



## Loading the JSON Data

Open the JSON file located at `ny_times_movies.json`, and use the `json` module to load the data into a variable called `data`.

In [57]:
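# Your code here
# Load the JSON document with pandas. Note: the instructions ask for the json module;
# pd.read_json instead returns a DataFrame with one row per entry in 'results'.
import pandas as pd

data = pd.read_json('ny_times_movies.json')

data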


| | status | copyright | has_more | num_results | results |
|---|---|---|---|---|---|
| 0 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Can You Ever Forgive Me', '... |
| 1 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Charm City', 'mpaa_rating':... |
| 2 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Horn from the Heart: The Pa... |
| 3 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'The Price of Everything', '... |
| 4 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Impulso', 'mpaa_rating': ''... |
| 5 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Watergate', 'mpaa_rating': ... |
| 6 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Barbara', 'mpaa_rating': ''... |
| 7 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Over the Limit', 'mpaa_rati... |
| 8 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'The Kindergarten Teacher', ... |
| 9 | OK | Copyright (c) 2018 The New York Times Company.... | True | 20 | {'display_title': 'Classical Period', 'mpaa_ra... |
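The instructions for this step ask for the standard-library `json` module rather than `pandas.read_json`. A minimal sketch of that route follows. So that it runs without the real file, it parses a tiny hypothetical stand-in for `ny_times_movies.json` (made-up titles, critics, and URLs, and only three of the real top-level keys) through `io.StringIO`. With `json`, `data` is a plain `dict` and `data['results']` a plain `list` of dicts, which is likely the shape the lab's "Run this cell without changes" cells were written for.

```python
import io
import json

# Hypothetical, heavily trimmed stand-in for ny_times_movies.json: a few of the
# real top-level keys, but only two made-up review records under 'results'.
sample_text = """
{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"display_title": "Movie A", "byline": "CRITIC ONE",
     "link": {"type": "article", "url": "http://example.com/movie-a.html"}},
    {"display_title": "Movie B", "byline": "CRITIC TWO",
     "link": {"type": "article", "url": "http://example.com/movie-b.html"}}
  ]
}
"""

# json.load accepts any file-like object. With the real file you would write:
#     with open('ny_times_movies.json') as f:
#         data = json.load(f)
data = json.load(io.StringIO(sample_text))

print(type(data))          # <class 'dict'>
print(list(data.keys()))   # ['status', 'num_results', 'results']

results = data['results']  # a plain list of dicts, one per review
print(len(results))        # 2
```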


Run the code below to investigate its contents:

In [25]:
# Run this cell without changes
print("`data` has type", type(data))
print("The keys are", list(data.keys()))

`data` has type <class 'pandas.core.frame.DataFrame'>
The keys are ['status', 'copyright', 'has_more', 'num_results', 'results']


## Loading Results

Create a variable `results` that contains the value associated with the `'results'` key.

In [62]:
# your code here
#variable named results to display values under the results key

results = data['results']

print(results)



0     {'display_title': 'Can You Ever Forgive Me', '...
1     {'display_title': 'Charm City', 'mpaa_rating':...
2     {'display_title': 'Horn from the Heart: The Pa...
3     {'display_title': 'The Price of Everything', '...
4     {'display_title': 'Impulso', 'mpaa_rating': ''...
5     {'display_title': 'Watergate', 'mpaa_rating': ...
6     {'display_title': 'Barbara', 'mpaa_rating': ''...
7     {'display_title': 'Over the Limit', 'mpaa_rati...
8     {'display_title': 'The Kindergarten Teacher', ...
9     {'display_title': 'Classical Period', 'mpaa_ra...
10    {'display_title': 'Bad Times at the El Royale'...
11    {'display_title': 'Beautiful Boy', 'mpaa_ratin...
12    {'display_title': 'The Oath', 'mpaa_rating': '...
13    {'display_title': 'Bikini Moon', 'mpaa_rating'...
14    {'display_title': 'Goosebumps 2: Haunted Hallo...
15    {'display_title': 'The Sentence', 'mpaa_rating...
16    {'display_title': 'All Square', 'mpaa_rating':...
17    {'display_title': 'Sadie', 'mpaa_rating': 

Below we display this variable as a table using pandas:

In [63]:
# Run this cell without changes
import pandas as pd
df = pd.DataFrame(results)
df

| | results |
|---|---|
| 0 | {'display_title': 'Can You Ever Forgive Me', '... |
| 1 | {'display_title': 'Charm City', 'mpaa_rating':... |
| 2 | {'display_title': 'Horn from the Heart: The Pa... |
| 3 | {'display_title': 'The Price of Everything', '... |
| 4 | {'display_title': 'Impulso', 'mpaa_rating': ''... |
| 5 | {'display_title': 'Watergate', 'mpaa_rating': ... |
| 6 | {'display_title': 'Barbara', 'mpaa_rating': ''... |
| 7 | {'display_title': 'Over the Limit', 'mpaa_rati... |
| 8 | {'display_title': 'The Kindergarten Teacher', ... |
| 9 | {'display_title': 'Classical Period', 'mpaa_ra... |
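The table above has a single `results` column because `data` was loaded with `pd.read_json`, which makes `data['results']` a Series whose elements are whole dicts. The sketch below uses hypothetical records to show the difference: wrapping the Series keeps one column of dicts, a list of dicts gives one column per top-level key, and `pd.json_normalize` additionally spreads nested keys such as `link.url` into their own columns.

```python
import pandas as pd

# Hypothetical two-record sample shaped like entries of the NYT 'results' list
records = [
    {"display_title": "Movie A", "byline": "CRITIC ONE",
     "link": {"type": "article", "url": "http://example.com/movie-a.html"}},
    {"display_title": "Movie B", "byline": "CRITIC TWO",
     "link": {"type": "article", "url": "http://example.com/movie-b.html"}},
]

# A Series of dicts, like data['results'] after pd.read_json
results = pd.Series(records, name="results")

# Wrapping the Series itself gives a single column holding whole dicts
one_col = pd.DataFrame(results)
print(one_col.columns.tolist())  # ['results']

# A list of dicts gives one column per top-level key; nested dicts stay as values
wide = pd.DataFrame(list(results))
print(wide.columns.tolist())

# json_normalize also flattens nested keys into dotted column names like 'link.url'
flat = pd.json_normalize(list(results))
print(flat.columns.tolist())
```

`pd.json_normalize` is available as a top-level function in pandas 1.0 and later.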


## Data Analysis

Now that you have a general sense of the data, answer some questions about it.

### How many results are in the file?

The metadata says this:

In [28]:
# Run this cell without changes
data['num_results']

0     20
1     20
2     20
3     20
4     20
5     20
6     20
7     20
8     20
9     20
10    20
11    20
12    20
13    20
14    20
15    20
16    20
17    20
18    20
19    20
Name: num_results, dtype: int64

Double-check that by looking at `results`. Does it line up?

In [29]:
# Your code here
print(results)

0     {'display_title': 'Can You Ever Forgive Me', '...
1     {'display_title': 'Charm City', 'mpaa_rating':...
2     {'display_title': 'Horn from the Heart: The Pa...
3     {'display_title': 'The Price of Everything', '...
4     {'display_title': 'Impulso', 'mpaa_rating': ''...
5     {'display_title': 'Watergate', 'mpaa_rating': ...
6     {'display_title': 'Barbara', 'mpaa_rating': ''...
7     {'display_title': 'Over the Limit', 'mpaa_rati...
8     {'display_title': 'The Kindergarten Teacher', ...
9     {'display_title': 'Classical Period', 'mpaa_ra...
10    {'display_title': 'Bad Times at the El Royale'...
11    {'display_title': 'Beautiful Boy', 'mpaa_ratin...
12    {'display_title': 'The Oath', 'mpaa_rating': '...
13    {'display_title': 'Bikini Moon', 'mpaa_rating'...
14    {'display_title': 'Goosebumps 2: Haunted Hallo...
15    {'display_title': 'The Sentence', 'mpaa_rating...
16    {'display_title': 'All Square', 'mpaa_rating':...
17    {'display_title': 'Sadie', 'mpaa_rating': 

In [30]:
# Yes: `results` has 20 entries, indexed 0 through 19, matching num_results (20)
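Rather than eyeballing printed output (which pandas truncates), the count can be checked in code. Under the `json` loading route, `num_results` is a single int; under `pd.read_json`, it is repeated on every row. A small sketch on a hypothetical three-record payload covering both cases:

```python
import pandas as pd

# Hypothetical mini payload: 'num_results' metadata plus the records it describes
payload = {
    "status": "OK",
    "num_results": 3,
    "results": [{"display_title": t} for t in ("Movie A", "Movie B", "Movie C")],
}

# With the json module, num_results is a plain int and can be compared directly
assert len(payload["results"]) == payload["num_results"]

# Building a DataFrame from the whole payload broadcasts the scalar metadata onto
# every row (one row per result), so num_results becomes a column of repeated values
df = pd.DataFrame(payload)
counts = df["num_results"].unique()
print(len(df), counts)

# The metadata should hold a single value equal to the number of rows
assert len(counts) == 1 and len(df) == counts[0]
```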

### How many unique critics are there?

A critic's name can be identified using the `'byline'` key. Assign your answer to the variable `unique_critics`.

In [69]:
# Your code here
# Build a flat DataFrame with one row per review; each element of `results` is a dict
results_df = pd.DataFrame(list(results))

# Show the distinct critic names stored under the 'byline' key
print(results_df['byline'].unique())

# The check below compares unique_critics to an integer, so store the count
unique_critics = results_df['byline'].nunique()

unique_critics


['A.O. SCOTT' 'BEN KENIGSBERG' 'GLENN KENNY' 'JEANNETTE CATSOULIS'
 'MANOHLA DARGIS' 'KEN JAWOROWSKI' 'TEO BUGBEE']


7

This code checks your answer.

In [70]:
# Run this cell without changes
assert unique_critics == 7
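Counting distinct critics does not require pandas: each `'byline'` is a plain string, so a set comprehension over a list of review dicts gives the count directly. A sketch on hypothetical records (the names are placeholders, not critics from the file), with `collections.Counter` added to show how many reviews each critic wrote:

```python
from collections import Counter

# Hypothetical records shaped like entries of the NYT 'results' list
results = [
    {"display_title": "Movie A", "byline": "CRITIC ONE"},
    {"display_title": "Movie B", "byline": "CRITIC TWO"},
    {"display_title": "Movie C", "byline": "CRITIC ONE"},
]

# A set keeps each distinct byline once; its size is the number of unique critics
unique_critics = len({review["byline"] for review in results})
print(unique_critics)  # 2

# Counter tallies reviews per critic, most prolific first
reviews_per_critic = Counter(review["byline"] for review in results)
print(reviews_per_critic.most_common())  # [('CRITIC ONE', 2), ('CRITIC TWO', 1)]
```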

## Flattening Data

Create a list `review_urls` that contains the URL for each review. This can be found using the `'url'` key nested under `'link'`.

In [71]:
# Your code here (create more cells as needed)
# Each review stores its URL under review['link']['url'];
# the condition skips any review missing either key
review_urls = [review['link']['url'] for review in results
               if 'link' in review and 'url' in review['link']]

print(review_urls)



['http://www.nytimes.com/2018/10/16/movies/can-you-ever-forgive-me-review-melissa-mccarthy.html', 'http://www.nytimes.com/2018/10/16/movies/charm-city-review-baltimore.html', 'http://www.nytimes.com/2018/10/16/movies/horn-from-the-heart-review-paul-butterfield.html', 'http://www.nytimes.com/2018/10/16/movies/the-price-of-everything-review-documentary.html', 'http://www.nytimes.com/2018/10/16/movies/impulso-review-documentary.html', 'http://www.nytimes.com/2018/10/11/movies/watergate-review-documentary.html', 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html', 'http://www.nytimes.com/2018/10/11/movies/over-the-limit-review.html', 'http://www.nytimes.com/2018/10/11/movies/the-kindergarten-teacher-review.html', 'http://www.nytimes.com/2018/10/11/movies/classical-period-review.html', 'http://www.nytimes.com/2018/10/11/movies/bad-times-at-the-el-royale-review.html', 'http://www.nytimes.com/2018/10/11/movies/beautiful-boy-review-steve-carell.html', 'http://www.nytimes.com/2018/10

The following code will check your answer:

In [72]:
# Run this cell without changes

# review_urls should be a list
assert type(review_urls) == list

# The length should be 20, same as the length of reviews
assert len(review_urls) == 20

# The data type contained should be string
assert type(review_urls[0]) == str and type(review_urls[-1]) == str

# Spot checking a specific value
assert review_urls[6] == 'http://www.nytimes.com/2018/10/11/movies/barbara-review.html'
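The comprehension used to build `review_urls` skips any review missing `'link'` or `'url'`, so an incomplete record would only surface as a failed length check. A sketch on hypothetical records contrasting that filter with chained `dict.get` calls, which keep exactly one entry per review and use `None` where the URL is absent:

```python
# Hypothetical records; the second deliberately lacks a 'link' key
results = [
    {"display_title": "Movie A",
     "link": {"type": "article", "url": "http://example.com/movie-a.html"}},
    {"display_title": "Movie B"},
]

# Filtering drops incomplete records, so the list can end up shorter than results
filtered = [r["link"]["url"] for r in results if "link" in r and "url" in r["link"]]

# Chained .get() keeps one entry per review and yields None where the URL is missing
aligned = [r.get("link", {}).get("url") for r in results]

print(filtered)  # ['http://example.com/movie-a.html']
print(aligned)   # ['http://example.com/movie-a.html', None]
```

Which behavior is preferable depends on whether downstream code needs the URL list to stay aligned index-for-index with `results`.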

## Summary
In this lab you practiced extracting and transforming data from JSON files with known schemas.