### A Data Pipeline (with a JSON Exercise)
This notebook introduces JSON in the first half and builds a simple data pipeline in the second half.

In [None]:
import json

In [None]:
with open("zipcoder.json", "r") as read_file:
    zc = json.load(read_file)

In [None]:
print(zc)

In [None]:
type(zc)

In [None]:
# A notebook cell only displays its last bare expression, so print the type explicitly.
print(type(zc['children'][0]['firstName']))
print(zc['children'][0]['firstName'])

In [None]:
data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian",
        'age': 85,
        'theta': 34.78992397
    }
}

In [None]:
type(data)

In [None]:
with open("data_file.json", "w") as write_file:
    json.dump(data, write_file)

In [None]:
json_string = json.dumps(data)

In [None]:
print(json_string[19:])

In [None]:
print(json.dumps(data, indent=2))

## Data Types Mapping

`json.load` and `json.loads` convert JSON values to Python types as follows (JSON -> Python):

- object -> dict
- array -> list
- string -> str
- number (int) -> int
- number (real) -> float
- true -> True
- false -> False
- null -> None
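
The mapping above can be checked directly. The following is a minimal sketch, assuming a made-up JSON string (`sample`) that contains one value of each kind; the key names are illustrative only.

```python
import json

# A hypothetical JSON document with one value for each row of the mapping.
sample = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "s": "text", '
    '"i": 7, "r": 1.5, "t": true, "f": false, "n": null}'
)

decoded = json.loads(sample)

# Record the name of the Python type each JSON value was decoded into.
decoded_types = {key: type(value).__name__ for key, value in decoded.items()}

for key, type_name in decoded_types.items():
    print(key, "->", type_name)
```

Note that `true` and `false` both decode to Python's `bool`, and `null` decodes to `None`, whose type name is `NoneType`.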


In [None]:
with open("data_file.json", "r") as read_file:
    data = json.load(read_file)

In [None]:
print(data)

In [None]:
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod J. Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""


In [None]:
type(json_string)

In [None]:
data = json.loads(json_string)
print(data, type(data))

### A Simple Data Pipeline

This section uses the [Requests Python package](https://requests.readthedocs.io/en/latest/).

In [None]:
import json
import requests

In [None]:
response = requests.get("https://jsonplaceholder.typicode.com/todos")
todos = json.loads(response.text)

In [None]:
todos == response.json()

In [None]:
print(response)

In [None]:
type(todos)

In [None]:
len(todos)

In [None]:
print(todos)

In [None]:
todos[1:5]

In [None]:
# Map of userId to number of complete TODOs for that user
todos_by_user = {}

# Increment complete TODOs count for each user.
for todo in todos:
    if todo["completed"]:
        try:
            # Increment the existing user's count.
            todos_by_user[todo["userId"]] += 1
        except KeyError:
            # This user has not been seen. Set their count to 1.
            todos_by_user[todo["userId"]] = 1

# Create a sorted list of (userId, num_complete) tuples.
top_users = sorted(todos_by_user.items(), key=lambda x: x[1], reverse=True)

# Get the maximum number of complete TODOs.
max_complete = top_users[0][1]

# Create a list of all users who have completed
# the maximum number of TODOs.
users = []
for user, num_complete in top_users:
    if num_complete < max_complete:
        break
    users.append(str(user))

max_users = " and ".join(users)


In [None]:
print(users)
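
The counting cell above uses `try`/`except KeyError` to build the per-user totals. As an alternative sketch (not the notebook's own method), the same result can be obtained with `collections.Counter`. The `sample_todos` list and the `top_completers` helper are hypothetical names introduced here for illustration; the sample records only imitate the shape of the downloaded data.

```python
from collections import Counter

# Hypothetical sample in the same shape as the downloaded todos.
sample_todos = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 1, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 2, "id": 4, "title": "d", "completed": True},
    {"userId": 3, "id": 5, "title": "e", "completed": True},
    {"userId": 3, "id": 6, "title": "f", "completed": True},
]


def top_completers(todos):
    """Return (list of user ids as strings, max completed count)."""
    # Count completed TODOs per userId in one pass.
    counts = Counter(t["userId"] for t in todos if t["completed"])
    if not counts:
        return [], 0
    max_complete = max(counts.values())
    # Keep every user tied for the maximum, in first-seen order.
    winners = [str(uid) for uid, n in counts.items() if n == max_complete]
    return winners, max_complete


sample_users, sample_max = top_completers(sample_todos)
print(sample_users, sample_max)
```

With the real data, calling `top_completers(todos)` should give the same user list as the `users` variable built above, since ties are kept in the order the users first appear.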

For your final task, you’ll create a JSON file that contains the completed TODOs for each of the users who completed the maximum number of TODOs.

All you need to do is filter todos and write the resulting list to a file. For the sake of originality, you can call the output file filtered_data_file.json. There are many ways you could go about this, but here’s one:

In [None]:
# Define a predicate that keeps only the completed TODOs
# of the users with the maximum number of completed TODOs.
def keep(todo):
    is_complete = todo["completed"]
    has_max_count = str(todo["userId"]) in users
    return is_complete and has_max_count

# Write filtered TODOs to file.
with open("filtered_data_file.json", "w") as data_file:
    filtered_todos = list(filter(keep, todos))
    json.dump(filtered_todos, data_file, indent=2)
    

But _hold on_, what does that `filtered_todos = list(filter(keep, todos))` line do?

In [83]:
# The list(filter(keep, todos)) call above is equivalent to the following loop.

tlist = []
for t in todos:
    if keep(t):
        tlist.append(t)
# tlist is result

In [84]:
print(tlist)

[{'userId': 5, 'id': 81, 'title': 'suscipit qui totam', 'completed': True}, {'userId': 5, 'id': 83, 'title': 'quidem at rerum quis ex aut sit quam', 'completed': True}, {'userId': 5, 'id': 85, 'title': 'et quia ad iste a', 'completed': True}, {'userId': 5, 'id': 86, 'title': 'incidunt ut saepe autem', 'completed': True}, {'userId': 5, 'id': 87, 'title': 'laudantium quae eligendi consequatur quia et vero autem', 'completed': True}, {'userId': 5, 'id': 89, 'title': 'sequi ut omnis et', 'completed': True}, {'userId': 5, 'id': 90, 'title': 'molestiae nisi accusantium tenetur dolorem et', 'completed': True}, {'userId': 5, 'id': 91, 'title': 'nulla quis consequatur saepe qui id expedita', 'completed': True}, {'userId': 5, 'id': 92, 'title': 'in omnis laboriosam', 'completed': True}, {'userId': 5, 'id': 93, 'title': 'odio iure consequatur molestiae quibusdam necessitatibus quia sint', 'completed': True}, {'userId': 5, 'id': 95, 'title': 'vel nihil et molestiae iusto assumenda nemo quo ut', 'c
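
A third way to write the same filtering step is a list comprehension. The sketch below uses hypothetical stand-ins (`sample_todos`, `sample_users`, `keep_sample`) so that it runs on its own; in the notebook, `todos`, `users`, and `keep` from the earlier cells would take their place.

```python
# Hypothetical stand-ins for the notebook's `todos` and `users`.
sample_todos = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 2, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
]
sample_users = ["2"]


def keep_sample(todo):
    """Same logic as keep(): completed, and owned by a top user."""
    return todo["completed"] and str(todo["userId"]) in sample_users


# Three equivalent ways to select the matching items.
with_filter = list(filter(keep_sample, sample_todos))

with_loop = []
for t in sample_todos:
    if keep_sample(t):
        with_loop.append(t)

with_comprehension = [t for t in sample_todos if keep_sample(t)]

print(with_filter == with_loop == with_comprehension)
```

Which form to use is mostly a matter of taste; the comprehension is often considered the most readable when the predicate is short.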

## Now...

Figure out which of the cells above are needed to create a "top-users-complete.py" file which, when run from the Python interpreter, writes the correct data to a JSON file.

Do that: make the script, run it, and prove that the two outputs are equal.

This is the exercise of taking notebook code that works and turning it into a Python script that can be run from anywhere.
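
One way to check that the two outputs are equal is to load both files and compare the parsed data, rather than comparing the raw text. This is only a sketch: `json_files_equal` is a hypothetical helper, and the name of the file your script writes is up to you.

```python
import json


def json_files_equal(path_a, path_b):
    """Return True if both files contain the same JSON data.

    Comparing parsed data ignores differences in indentation
    and whitespace between the two files.
    """
    with open(path_a, "r") as file_a, open(path_b, "r") as file_b:
        return json.load(file_a) == json.load(file_b)
```

For example, if the script writes a file called `script_output.json` (an assumed name), then `json_files_equal("filtered_data_file.json", "script_output.json")` should return `True`.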
