# Module 1: JSON files

There are a lot of different file types. Some are more common than others, and some require more work than others. JSON is one of the most important file types.

During this training we will go into detail about JSON files and how to work with them. For those familiar with Python and dictionaries, it will feel quite similar. The training follows this outline:
1. JSON file basics
2. The json library
3. Navigating a JSON file structure
4. JSON to information
5. Nested JSON files

Enjoy!

## Section 1: JSON file basics

JSON, or **J**ava**S**cript **O**bject **N**otation, is one of the gold standards for information exchange in the world of data. When transporting data, for example through an API, JSON is often the way to go. JSON was designed to be readable by many programming languages, and that includes Python.

Because of its structure and its widespread availability and accessibility, knowledge of JSON is essential for an aspiring data engineer. So let's have a look at those so-called JSON files. In the example below we will retrieve a JSON file through an API. The example is about Game of Thrones.

In [None]:
import requests
import pprint

URL = "https://anapioficeandfire.com/api/characters/583"
jon_snow_json = requests.get(URL).json()

pprint.pprint(jon_snow_json)

The example above shows the structure of a JSON file. As you can see, a JSON file is very structured.
JSON files are based on key-value pairs, with each key corresponding to a specific value. The keys can be used to navigate around the JSON file.
One important thing to note is that JSON files are extremely flexible. Almost any string can be used as a key.
There is also a lot of flexibility in the values: a value can be a string, a number, a boolean, null, a list, or another JSON object.
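
To make the flexibility of keys and values concrete, here is a minimal sketch. The record and all of its keys are hypothetical and only serve as an illustration. It uses `json.loads`, which is covered in Section 2, to read a small JSON text and print the Python type of each value.

```python
import json

# A hypothetical JSON document, written as a Python string.
json_text = """
{
    "name": "Example Person",
    "age": 36,
    "active": true,
    "nickname": null,
    "languages": ["Python", "SQL"],
    "address": {"city": "Example City", "country": "Exampleland"}
}
"""

record = json.loads(json_text)

# Each key maps to one value, and every value can have a different type.
for key, value in record.items():
    print(key, "->", type(value).__name__)
```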

The Jon Snow JSON above was retrieved from a service about the world of Game of Thrones (spoilers). We retrieved information about Jon Snow, an important character in the books and show. Each key holds a value with what is known about the character. For example, you can see the following things in the JSON file.
- The name, which is of course Jon Snow.
- There is also information about the father, and the mother (both are empty in this case).
- We can find in which seasons and books the character appears.
- And, interestingly, a list of aliases for the character.

If you look closely in the JSON, you can find these points of information, as well as other information.

One thing to remember is that you can see a JSON document as one long string. Using Python and the json library, we can decode such strings into Python objects and work with them. We can also encode Python objects as JSON strings.

For those with experience in Python: a JSON structure is very similar to the structure of a Python dictionary.
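
Similar does not mean identical. The sketch below uses a hypothetical dictionary to show a few differences that appear when a Python dictionary is encoded as JSON and decoded again: keys become strings, tuples become lists, and `None` and `True` are written as `null` and `true`.

```python
import json

# A hypothetical dictionary using Python features that JSON does not have.
python_data = {1: "integer key", "point": (1, 2), "missing": None, "ok": True}

# Encode to JSON text: the integer key becomes "1", the tuple becomes an array.
json_text = json.dumps(python_data)
print(json_text)
# {"1": "integer key", "point": [1, 2], "missing": null, "ok": true}

# Decode again: the result is close to, but not equal to, the original dictionary.
round_trip = json.loads(json_text)
print(round_trip)
# {'1': 'integer key', 'point': [1, 2], 'missing': None, 'ok': True}
print(round_trip == python_data)
# False
```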

## Section 2: The json library

Now that we have had a look at the structure of a JSON file, we want to work with it! And we want to work with it in Python!
For working with JSON files we can use the appropriately named json library.

With the json library we can do a lot of things with JSON files, including:
- Decode JSON files so that we can use them within Python.
- Encode Python objects so that we can store them as JSON files.

The json library is essential in working with JSON files within Python. As with (almost) all decent libraries, there is an extensive amount of documentation that can help you understand the functionalities of the library. 

While working with Python it is essential that you learn how to read documentation. This will help speed your work up, and improve your understanding of the library. So, have a look: https://docs.python.org/3/library/json.html. 

Let's see what we can do with the json library. We're going to have a look at four of the functionalities of the library:
- json.dumps
- json.dump
- json.loads
- json.load

Let's first create a Python dictionary that we can use as a basis for our examples. See below.

In [None]:
import json

json_structure_example = {"name": "Roger Federer",
                          "age": 40,
                          "occupation": "Professional tennis player"}

Using the 'json.dumps' function we can convert the Python dictionary to a JSON string. Python then treats the result as a regular string.

In [None]:
# Use json.dumps
json_string = json.dumps(json_structure_example)

print(json_string)
print(type(json_string))

We can also save our dictionary as a JSON file, using the 'json.dump' function.

In [None]:
# Use json.dump
file_name = "my_first_json.json"

with open(file_name, "w") as file:
    json.dump(json_structure_example, file, indent=4)

So, we can use 'json.dumps' to create a JSON string, and we can use 'json.dump' to write the JSON directly to a file.
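
Both functions accept optional formatting arguments. The following is a small sketch, reusing the dictionary from above under the hypothetical name `player`, of how `indent` and `sort_keys` change the output of `json.dumps`. The same arguments can be passed to `json.dump`.

```python
import json

player = {"name": "Roger Federer",
          "age": 40,
          "occupation": "Professional tennis player"}

# Default output: everything on a single line.
compact = json.dumps(player)

# indent=4 spreads the output over multiple lines,
# sort_keys=True orders the keys alphabetically.
pretty = json.dumps(player, indent=4, sort_keys=True)

print(compact)
print(pretty)
```

The formatting only changes how the text looks. Decoding `compact` and `pretty` gives the same dictionary.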

Now let's look at reading JSON. The 'json.loads' function reads a JSON string and converts it to a Python object, in this case a dictionary.

In [None]:
print(json_string)
print(type(json_string))

converted_json_string = json.loads(json_string)
print(converted_json_string)
print(type(converted_json_string))

And using the 'json.load' function, we can read a JSON file and load it as a Python dictionary.

In [None]:
file_name = "my_first_json.json"

with open(file_name, "r") as file:
    loaded_json = json.load(file)

print(loaded_json)
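
Decoding only works when the text is valid JSON. As a minimal sketch, the hypothetical string below uses single quotes, which Python accepts but JSON does not, so `json.loads` raises a `json.JSONDecodeError` that we can catch.

```python
import json

# A hypothetical string that is NOT valid JSON: JSON requires double quotes.
broken_text = "{'name': 'Roger Federer'}"

try:
    data = json.loads(broken_text)
except json.JSONDecodeError as error:
    # Fall back to None and report what went wrong.
    data = None
    print("Could not decode JSON:", error)
```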

Now that we have seen the most important functions of the json library, it's your turn to try them out. Please complete the following assignments.

#### Assignment 1: The json library 1

Create your own Python dictionary, with your name, age and occupation.

In [None]:
### FILL IN
my_dictionary = {"name": "",
                 "age": 0,
                 "occupation": ""}

#### Assignment 2: The json library 2

Convert your dictionary to a JSON string, and print it.
Use the 'json.dumps' function.

In [None]:
### FILL IN

#### Assignment 3: The json library 3

Save your dictionary as a JSON file with the name "my_second_json.json".
Use the 'json.dump' function.

In [None]:
### FILL IN

#### Assignment 4: The json library 4

Read your saved JSON file and print it. It should have the name "my_second_json.json".
Use the 'json.load' function.

In [None]:
### FILL IN

#### Assignment 5: The json library 5

Create a JSON string from your Python dictionary, and then convert it back to a Python dictionary.
First use the 'json.dumps' function, and then use the 'json.loads' function.

In [None]:
### FILL IN

Good job! These steps should give you some insight into how JSON files are structured, and how we can read, load and save them within Python.

## Section 3: Navigating a JSON file structure

We have had a first taste of the json library: saving a JSON file and loading it again.
Let's move on to the most important part of working with JSON files: navigating them.

The json library can convert a JSON file to a Python dictionary, and that's how we can navigate it. This means that you can index JSON files by their keys.

Let's have a look at an example. 

In [None]:
### You can retrieve any key of a dictionary by indexing it on the dictionary.
json_structure_example = {"name": "Roger Federer",
                          "age": 40,
                          "occupation": "Professional tennis player"}

print(json_structure_example["name"])

In [None]:
for key in json_structure_example:
    print(key)
    print(json_structure_example[key])
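
Indexing with a key that does not exist raises a `KeyError`. A few standard dictionary tools help with that, sketched below on the same example data under the hypothetical name `player`. The key 'nationality' is deliberately missing.

```python
player = {"name": "Roger Federer",
          "age": 40,
          "occupation": "Professional tennis player"}

# .get returns None, or a default of your choice, when a key is missing.
print(player.get("nationality"))             # None
print(player.get("nationality", "unknown"))  # unknown

# The `in` operator checks whether a key exists.
print("age" in player)                       # True

# .items() yields each key together with its value.
for key, value in player.items():
    print(f"{key}: {value}")
```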

This is how you can access information in JSON files. Now it's your turn to try it. Let's use the example about Jon Snow.

In [None]:
import requests
import pprint

URL = "https://anapioficeandfire.com/api/characters/583"
jon_snow_json = requests.get(URL).json()

pprint.pprint(jon_snow_json)

In the following assignments you will try and navigate the JSON file.

#### Assignment 6: Navigating a JSON file structure 1

From the 'jon_snow_json', print all keys.
Use a for loop.

In [None]:
### FILL IN

#### Assignment 7: Navigating a JSON file structure 2

From the 'jon_snow_json', print all keys that have empty values (for example None or an empty string).
Use a for loop.

In [None]:
### FILL IN

#### Assignment 8: Navigating a JSON file structure 3

From the 'jon_snow_json', print every value that is part of a list.
Use a for loop.

In [None]:
### FILL IN

JSON files can also be manipulated. You can add or adjust information as needed. In the following assignments we will have a small taste of that.
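
As a minimal sketch of what manipulation looks like, the example below changes a small dictionary in three ways. The dictionary `player` and its keys 'country' and 'titles' are hypothetical and chosen for illustration only.

```python
import json

player = {"name": "Roger Federer",
          "age": 40,
          "titles": ["Wimbledon"]}

# Overwrite the value of an existing key.
player["age"] = 41

# Add a new key-value pair by assigning to a key that does not exist yet.
player["country"] = "Switzerland"

# Extend a value that is a list.
player["titles"].append("US Open")

print(json.dumps(player, indent=4))
```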

#### Assignment 9: Navigating a JSON file structure 4

In the loaded JSON, set the following values for the following keys.
'mother': 'Lyanna Stark'
'father': 'Rhaegar Targaryen'

In [None]:
### FILL IN

#### Assignment 10: Navigating a JSON file structure 5

In the loaded JSON, add the following values to the 'tvSeries' key: 'Season 7' and 'Season 8'.

In [None]:
### FILL IN

Great job! Keep it going! :)

## Section 4: JSON to information

Manipulating a JSON file through a dictionary is quite straightforward, but the complexity grows with larger JSON files and dictionaries. The most important skill to develop is retrieving and storing information from JSON files / dictionaries.

JSON files are readable by computers and by humans. But sometimes you'll want to convert a JSON file to another structure, such as lists or pandas DataFrames. In this section we'll have a look at extracting information from JSON files.
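
The general pattern can be sketched on a small, hypothetical list of records: loop over the entries, pick the keys you need, and store them in lists or in a list of row dictionaries. The data and the variable names `records`, `names` and `rows` are assumptions made for this illustration.

```python
# A hypothetical list of records, shaped like a typical API response.
records = [
    {"name": "House A", "region": "North", "words": "First"},
    {"name": "House B", "region": "South", "words": ""},
    {"name": "House C", "region": "North"},  # the "words" key is missing here
]

names = []
rows = []
for record in records:
    # Collect a single value per record in a plain list.
    names.append(record["name"])
    # Collect several values per record as one row dictionary.
    rows.append({
        "name": record["name"],
        "region": record["region"],
        "words": record.get("words", ""),  # default for missing keys
    })

print(names)
print(rows)
```

A list of row dictionaries like `rows` can then be passed to `pandas.DataFrame(rows)` to obtain one column per key.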

First we'll download a larger JSON file. And then you'll get to work on some larger assignments.

In [None]:
import requests
import pprint

URL = "https://anapioficeandfire.com/api/houses"
larger_json_file_1 = requests.get(URL, params={"region": "The North", "page": 5}).json()

# Print the full JSON response.
pprint.pprint(larger_json_file_1)

We retrieved another JSON file. As you can see, this is a larger file. This file contains information on some houses present in the world of Game of Thrones. There are 10 entries in total.

#### Assignment 11: JSON to information 1

Retrieve all values for the following keys: 'name', 'region', 'coatOfArms', 'words', and put them in separate lists.
Use a for loop to fill the lists.

In [None]:
### FILL IN

#### Assignment 12: JSON to information 2

Create a pandas DataFrame with information for each house from the JSON file. The following pieces of information should be present in the columns: 'name', 'region', 'coatOfArms', 'words', 'seats', 'titles', 'currentLord'.
Use a for loop to fill the pandas DataFrame.

In [None]:
### FILL IN

Now let's retrieve another JSON file. It's important to work with a few different files with a bit of variation. This will help you understand the structure of JSON files even better.

In [None]:
import requests
import pprint

URL = "http://hp-api.herokuapp.com/api/characters"
potter_json = requests.get(URL).json()

# Print the full JSON response.
pprint.pprint(potter_json)

We just retrieved an even larger JSON file on Harry Potter characters. Now it's your turn to retrieve information from this JSON file and convert it to a pandas DataFrame.

#### Assignment 13: JSON to information 3

Fill lists with information on each character from the JSON file. The following pieces of information should be present in the separate lists: 'name', 'actor', 'dateOfBirth', 'gender', 'eyeColour'.
Use a for loop to fill the lists.

In [None]:
### FILL IN

#### Assignment 14: JSON to information 4

Create a pandas dataframe with the information for characters from the JSON file, but only for characters that are wizards. Use the 'wizard' key in order to determine whether a character is a wizard.

The following pieces of information should be present in the columns: 'name', 'house', 'wand', 'patronus'.
Use a for loop to fill the pandas dataframe.

In [None]:
### FILL IN

#### Assignment 15: JSON to information 5

Create a pandas dataframe with information on the characters and their wands in the JSON file, but only for characters that have wands. Have a closer look at the 'wand' key.

The following pieces of information should be present in the columns: 'name', 'core', 'length', 'wood'.
Use a for loop to fill the pandas dataframe.

In [None]:
### FILL IN

#### Assignment 16: JSON to information 6

This is a smaller, easier assignment. Use the powerful pandas library to read the JSON file and convert it to a pandas DataFrame in one step.
Use the pandas.read_json (https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) method.

In [None]:
### FILL IN

## Section 5: Nested JSON files

Up until now the JSON files were quite clear and structured. But sometimes the structure of JSON files can become more and more complex when objects are nested within other objects. For those familiar with Python, you can see it as dictionaries within dictionaries. Depending on the size it can be quite difficult to work with, but with enough knowledge of the structure you should be able to quickly work with it.

Let's have a look at an example.

In [None]:
# Nested JSON example.
import json

nested_json = {"species": "Tiger",
               "binomial_name": "Panthera tigris",
               "status": "Endangered",
               "subspecies": [{"name" : "Amur tiger",
                               "binomial_name": "Panthera tigris altaica",
                               "status": "Endangered"},
                             {"name" : "Bengal tiger",
                               "binomial_name": "Panthera tigris tigris",
                               "status": "Endangered"},
                             {"name" : "South China tiger",
                               "binomial_name": "Panthera tigris amoyensis",
                               "status": "Critically Endangered"},
                             {"name" : "Malayan tiger",
                               "binomial_name": "Panthera tigris jacksoni",
                               "status": ""},
                             {"name" : "Indo-Chinese tiger",
                               "binomial_name": "Panthera tigris corbetti",
                               "status": ""},
                             {"name" : "Sumatran tiger",
                               "binomial_name": "Panthera tigris sumatrae",
                               "status": "Critically endangered"},
                             {"name" : "Bali tiger",
                               "binomial_name": "Panthera tigris balica",
                               "status": "Extinct"},
                             {"name" : "Javan tiger",
                               "binomial_name": "Panthera tigris sondaica",
                               "status": "Extinct"},
                             {"name" : "Caspian tiger",
                               "binomial_name": "Panthera tigris virgata",
                               "status": "Extinct"}],
               }

print(json.dumps(nested_json, indent=4))

As you can see, the structure remains visible. The most important aspect to note is that nested JSON objects can have keys with the same name as keys at other levels of the JSON object. See below for an example illustrating duplicate key names.

In [None]:
# EXAMPLE
print(nested_json["binomial_name"])
print(nested_json['subspecies'][0]["binomial_name"])

The example above illustrates that a nested JSON is not too different from a simpler JSON. It only requires more attention to the structure and to the navigation within the JSON. Let's try it out. Below are some assignments. First we'll retrieve a nested JSON. Below you'll see a large nested JSON about Pikachu.
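
Before moving to the larger file, the navigation pattern for deeper nesting can be sketched on a small structure. The dictionary `library` and all of its keys are hypothetical. The idea is to chain keys and list indexes one level at a time, and to loop over nested lists when every entry is needed.

```python
# A hypothetical nested structure: a dictionary holding a list of dictionaries,
# each of which holds another dictionary.
library = {
    "name": "City Library",
    "books": [
        {"name": "Book One", "details": {"pages": 120, "language": "en"}},
        {"name": "Book Two", "details": {"pages": 300, "language": "nl"}},
    ],
}

# Chain keys and list indexes, one level at a time.
print(library["name"])                          # City Library
print(library["books"][1]["details"]["pages"])  # 300

# Loop over the nested list and collect a value from each matching entry.
long_books = []
for book in library["books"]:
    if book["details"]["pages"] > 200:
        long_books.append(book["name"])

print(long_books)  # ['Book Two']
```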

In [None]:
import requests
import pprint

URL = "https://pokeapi.co/api/v2/pokemon/pikachu"
nested_json = requests.get(URL).json()

# Print the full JSON response.
pprint.pprint(nested_json)

#### Assignment 17: Nested JSON files 1

Create a pandas DataFrame with each move Pikachu can perform from the game/version_group gold-silver.

Information that needs to be present in the DataFrame: 'name', 'level_learned_at', 'move_learn_method'.

In [None]:
### FILL IN

#### Assignment 18: Nested JSON files 2

Create your own dictionary with each game/version-group as a key. As the value, keep a counter of the number of occurrences of each game/version-group. Lastly, save the dictionary as a JSON file.

In [None]:
### FILL IN