In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("practice.ipynb")

In [2]:
import practice_test

# Lab-P8: Lists and Dictionaries

This lab is designed to help you prepare for p8. We will focus on dictionaries, mutating lists, binning, and copying.

## Learning Objectives:

In this lab, you will practice how to...
* Integrate relevant information from various sources (e.g. multiple csv files)
* Build appropriate data structures for organization and informative presentation (e.g. list of dictionaries)
* Practice good coding style

## Note on Academic Misconduct:

**IMPORTANT**: p8 and p9 are two parts of the same data analysis. You **cannot** switch project partners between these two projects. That is, if you partner up with someone for p8, you must keep that partnership until the end of p9. You must acknowledge this to the lab TA to receive lab attendance credit.

You may do these lab exercises only with your project partner; you are not allowed to start working on lab-p8 with one person, then do the project with a different partner. Now may be a good time to review [our course policies](https://cs220.cs.wisc.edu/f22/syllabus.html).

#### Please make sure `small_movies.csv` and `small_mapping.csv` are in your `lab-p8` folder before continuing.

## Introduction:

In p8 and p9, we will be working on the [IMDb Movies Dataset](https://www.imdb.com/interfaces/). We will use Python to discover some cool facts about our favorite movies, cast, and directors.

In lab-p8, you will work with a small subset of movies, and practice writing some code to parse the data stored in `small_mapping.csv` and `small_movies.csv`. You can then use this code to parse the much larger datasets in p8 and p9.

## The Data:

Open `small_movies.csv` and `small_mapping.csv` in any spreadsheet viewer, and see what the data looks like. When seen with a good spreadsheet viewer, this is what some of `small_movies.csv` will look like:

|title|year|duration|genres|rating|directors|cast|
| ----  |-----------------|-----------------------------|----------|---------------------------------------------------------------------|----------------------------------------------|--------|
|tt3104988|2018|120|"Comedy, Drama, Romance"|6.9|nm0160840|"nm2090422, nm6525901, nm0000706, nm2110418, nm0523734"|
|tt4846340|2016|127|"Biography, Drama, History"|7.8|nm0577647|"nm0378245, nm0818055, nm1847117"|

However, if you open the raw version of `small_movies.csv`, you are more likely to see something like this. It's the same data, but is sometimes a little harder to read:
```
title,year,duration,genres,rating,directors,cast
tt3104988,2018,120,"Comedy, Drama, Romance",6.9,nm0160840,"nm2090422, nm6525901, nm0000706, nm2110418, nm0523734"
tt4846340,2016,127,"Biography, Drama, History",7.8,nm0577647,"nm0378245, nm0818055, nm1847117"
```
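
The quotes in the raw file matter: they tell a CSV parser that the commas inside `"Comedy, Drama, Romance"` belong to one field. The sketch below illustrates this with a shortened sample line (three columns instead of seven, held in a string rather than read from disk). The variable names are made up for this illustration.

```python
import csv
import io

# Sample text in the same shape as small_movies.csv (shortened to three columns)
raw_text = 'title,year,genres\ntt3104988,2018,"Comedy, Drama, Romance"\n'

# csv.reader accepts any iterable of lines; io.StringIO stands in for an open file
rows = list(csv.reader(io.StringIO(raw_text)))

# A naive split on commas cuts the quoted genres field into separate pieces
naive = raw_text.splitlines()[1].split(",")

print(rows[1])     # three fields, genres kept together
print(len(naive))  # five pieces, genres broken apart
```

This is why the lab reads the files with the `csv` module instead of splitting each line by hand.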

The `title`, `directors`, and `cast` members are represented by their unique *IMDb ID* instead of their actual *names*. Now would be a good time to open `small_mapping.csv` to observe the data stored there. It should look like this:

|         |           |
| --------|-----------|
|tt3104988|Crazy Rich Asians|
|nm0160840|Jon M. Chu|
|nm2090422|Constance Wu|
|nm6525901|Henry Golding|
|nm0000706|Michelle Yeoh|
|nm2110418|Gemma Chan|
|nm0523734|Lisa Lu|
|tt4846340|Hidden Figures|
|nm0577647|Theodore Melfi|
|nm0378245|Taraji P. Henson|
|nm0818055|Octavia Spencer|
|nm1847117|Janelle Monáe|

Note that this file does **not** have a header. This file maps the *IMDb IDs* to the actual *names* (so for example, the *name* of the movie with the *IMDb ID* of *tt3104988* is *Crazy Rich Asians*).

## Segment 2: Loading data from csv file

In this segment, you will learn to parse the data in `small_movies.csv` and `small_mapping.csv`, and convert them into useful data structures.

However, before we do any of that, **open `small_movies.csv` and `small_mapping.csv` with Excel or some other spreadsheet viewer** and look at the format in which the data is stored. Inspecting this data will be extremely useful when you try to read it in Python.

In [3]:
# it is considered a good coding practice to place all import statements at the top of the notebook
# place all your import statements in this cell if you need to import any more modules for this project

# we have imported the csv module for you
import csv

### Task 2.1: Process `small_movies.csv`

In [5]:
# copy/paste the 'process_csv' function from lab-p6 or lab-p7 here
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
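
`process_csv` closes the file by hand. A common alternative, shown here only as a sketch and not as a required change, is a `with` block, which closes the file automatically even if reading fails partway. The function name `process_csv_with` and the temporary sample file are made up for this illustration.

```python
import csv
import os
import tempfile

def process_csv_with(filename):
    # the file is closed automatically when the with block ends
    with open(filename, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))

# write a tiny sample file (two rows in the shape of small_mapping.csv)
sample_path = os.path.join(tempfile.mkdtemp(), "sample_mapping.csv")
with open(sample_path, "w", encoding="utf-8", newline="") as out_file:
    out_file.write("tt4846340,Hidden Figures\nnm1847117,Janelle Monáe\n")

sample_rows = process_csv_with(sample_path)
print(sample_rows)
```

Either version returns the same list of lists, so the rest of the lab works with whichever one you paste in.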

In [6]:
# read data from "small_movies.csv"
file_data = process_csv("small_movies.csv")

**Question 1:** What is the **header** of the file `small_movies.csv`?

Your output **must** be a **list** of **strings**.

In [7]:
# extract the header into csv_header variable
csv_header = file_data[0]

csv_header

['title', 'year', 'duration', 'genres', 'rating', 'directors', 'cast']

In [8]:
grader.check("q1")

**Question 2:** What is the **data** (without the header) in the file `small_movies.csv`?

Your output **must** be a **list** of **lists**.

In [11]:
# extract just the data rows into csv_rows variable
csv_rows = process_csv("small_movies.csv")
csv_rows = csv_rows[1:]

csv_rows

[['tt3104988',
  '2018',
  '120',
  'Comedy, Drama, Romance',
  '6.9',
  'nm0160840',
  'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'],
 ['tt4846340',
  '2016',
  '127',
  'Biography, Drama, History',
  '7.8',
  'nm0577647',
  'nm0378245, nm0818055, nm1847117']]

In [12]:
grader.check("q2")

### Task 2.2: Convert your `small_mapping` to a `dict`

**Question 3:** What is the **data** in the file `small_mapping.csv`?

Your output **must** be a **list** of **lists**. There is no header in `small_mapping.csv`, so you should **not** slice off the first row here.

In [13]:
# use process_csv to read `small_mapping.csv` into a list of lists data structure
mapping_rows = process_csv("small_mapping.csv")

mapping_rows

[['tt3104988', 'Crazy Rich Asians'],
 ['nm0160840', 'Jon M. Chu'],
 ['nm2090422', 'Constance Wu'],
 ['nm6525901', 'Henry Golding'],
 ['nm0000706', 'Michelle Yeoh'],
 ['nm2110418', 'Gemma Chan'],
 ['nm0523734', 'Lisa Lu'],
 ['tt4846340', 'Hidden Figures'],
 ['nm0577647', 'Theodore Melfi'],
 ['nm0378245', 'Taraji P. Henson'],
 ['nm0818055', 'Octavia Spencer'],
 ['nm1847117', 'Janelle Monáe']]

In [14]:
grader.check("q3")

Currently `mapping_rows` is a **list** of **lists**.  To make it more useful, let us convert it to a **dict** with the *ID* as the **key** and the *name* as the **value**, like this:

```python
{'tt3104988':'Crazy Rich Asians',
'nm0160840': 'Jon M. Chu',
'nm2090422': 'Constance Wu',
'nm6525901': 'Henry Golding',
'nm0000706': 'Michelle Yeoh',
'nm2110418': 'Gemma Chan',
'nm0523734': 'Lisa Lu',
'tt4846340': 'Hidden Figures',
'nm0577647': 'Theodore Melfi',
'nm0378245': 'Taraji P. Henson',
'nm0818055': 'Octavia Spencer',
'nm1847117': 'Janelle Monáe'}
```

**Question 4:** Display the **data** in the file `small_mapping.csv` as a **dictionary**.

Your output **must** be a **dictionary**.

It is acceptable for you to *hardcode* **column indices** here. This is because this csv file does not have a header, and it is **implicit** that the first string is the ID and that the second string is the name associated with the ID. For csv files which have headers, you should **never** hardcode **column indices**.
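
To see why hardcoded indices are risky for files with headers, here is a small sketch. The first header and row are copied from `small_movies.csv`; the reordered header and row are hypothetical and only show what happens when the column order changes.

```python
# Header and one row copied from small_movies.csv
header = ['title', 'year', 'duration', 'genres', 'rating', 'directors', 'cast']
row = ['tt4846340', '2016', '127', 'Biography, Drama, History', '7.8',
       'nm0577647', 'nm0378245, nm0818055, nm1847117']

# Look the column position up by name instead of writing row[4]
rating_idx = header.index('rating')
rating = row[rating_idx]

# Hypothetical file with the same data but a different column order
shuffled_header = ['rating', 'title', 'year']
shuffled_row = ['7.8', 'tt4846340', '2016']
same_rating = shuffled_row[shuffled_header.index('rating')]

print(rating_idx, rating, same_rating)
```

With `header.index(...)`, the lookup keeps working after the reorder, whereas `row[4]` would fail on the three-column row.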

In [19]:
mapping_dict = {} # initialize an empty dictionary into the variable mapping_dict
# iterate over each row of the small_mapping dataset
for row in mapping_rows:
    mapping_dict[row[0]] = row[1] # row[0] is the ID, row[1] is the name

mapping_dict

{'tt3104988': 'Crazy Rich Asians',
 'nm0160840': 'Jon M. Chu',
 'nm2090422': 'Constance Wu',
 'nm6525901': 'Henry Golding',
 'nm0000706': 'Michelle Yeoh',
 'nm2110418': 'Gemma Chan',
 'nm0523734': 'Lisa Lu',
 'tt4846340': 'Hidden Figures',
 'nm0577647': 'Theodore Melfi',
 'nm0378245': 'Taraji P. Henson',
 'nm0818055': 'Octavia Spencer',
 'nm1847117': 'Janelle Monáe'}

In [20]:
grader.check("q4")

**Question 5:** What is the **value** associated with the key *nm0160840*?

In [21]:
# we have done this one for you
nm0160840_value = mapping_dict["nm0160840"]

nm0160840_value

'Jon M. Chu'

In [22]:
grader.check("q5")

**Question 6:** What is the **value** associated with the key *tt4846340*?

In [25]:
# replace the ... with your code
tt4846340_value = mapping_dict["tt4846340"]

tt4846340_value

'Hidden Figures'

In [26]:
grader.check("q6")

### Task 2.3: Convert your `small_movies` into a `list` of `dicts`

### Task 2.3.1: Convert a list of lists to a list of dictionaries

Now, let's go back to `small_movies.csv`. Your variable `csv_rows` (defined above in q2) should look like this:

```python
[['tt3104988',
  '2018',
  '120',
  'Comedy, Drama, Romance',
  '6.9',
  'nm0160840',
  'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'],
 ['tt4846340',
  '2016',
  '127',
  'Biography, Drama, History',
  '7.8',
  'nm0577647',
  'nm0378245, nm0818055, nm1847117']]
```

It's a list of lists without its header. To make it easier to access data, let us convert it to a **list** of **dictionaries**. The data structure should look like:

```python
[{'title': 'tt3104988',
  'year': '2018',
  'duration': '120',
  'genres': 'Comedy, Drama, Romance',
  'rating': '6.9',
  'directors': 'nm0160840',
  'cast': 'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'},
 {'title': 'tt4846340',
  'year': '2016',
  'duration': '127',
  'genres': 'Biography, Drama, History',
  'rating': '7.8',
  'directors': 'nm0577647',
  'cast': 'nm0378245, nm0818055, nm1847117'}]
```

**Question 7.1:** Display the **first** movie in the file `small_movies.csv` as a **dictionary**.

In [31]:
first_movie = {} # initialize an empty dictionary
first_movie["title"] = csv_rows[0][csv_header.index("title")] # extract the title of the movie
first_movie["year"] = csv_rows[0][csv_header.index("year")]
first_movie["duration"] = csv_rows[0][csv_header.index("duration")]
first_movie["genres"] = csv_rows[0][csv_header.index("genres")]
first_movie["rating"] = csv_rows[0][csv_header.index("rating")]
first_movie["directors"] = csv_rows[0][csv_header.index("directors")]
first_movie["cast"] = csv_rows[0][csv_header.index("cast")]
# TODO: add the other columns to first_movie

first_movie

{'title': 'tt3104988',
 'year': '2018',
 'duration': '120',
 'genres': 'Comedy, Drama, Romance',
 'rating': '6.9',
 'directors': 'nm0160840',
 'cast': 'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'}

In [32]:
grader.check("q7-1")

**Question 7.2:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

In [45]:
raw_movies_list = [] # use this empty list to append your dictionary
# TODO: loop through all the rows of csv_rows
# TODO: create a dictionary similar to the one in q7.1 for each movie
# TODO: add each movie dictionary to raw_movies_list

for idx in range(len(csv_rows)):
    movie = {}
    movie["title"] = csv_rows[idx][csv_header.index("title")] # extract the title of the movie
    movie["year"] = csv_rows[idx][csv_header.index("year")]
    movie["duration"] = csv_rows[idx][csv_header.index("duration")]
    movie["genres"] = csv_rows[idx][csv_header.index("genres")]
    movie["rating"] = csv_rows[idx][csv_header.index("rating")]
    movie["directors"] = csv_rows[idx][csv_header.index("directors")]
    movie["cast"] = csv_rows[idx][csv_header.index("cast")]
    raw_movies_list.append(movie)

raw_movies_list

[{'title': 'tt3104988',
  'year': '2018',
  'duration': '120',
  'genres': 'Comedy, Drama, Romance',
  'rating': '6.9',
  'directors': 'nm0160840',
  'cast': 'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'},
 {'title': 'tt4846340',
  'year': '2016',
  'duration': '127',
  'genres': 'Biography, Drama, History',
  'rating': '7.8',
  'directors': 'nm0577647',
  'cast': 'nm0378245, nm0818055, nm1847117'}]

In [46]:
grader.check("q7-2")
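
The loop for q7.2 writes one assignment per column. A possible variation, sketched here on a shortened three-column sample, loops over the header as well, so the same code handles any number of columns. The names `sample_header`, `sample_rows`, and `sample_movies` are made up for this illustration.

```python
# Shortened sample: three of the seven columns from small_movies.csv
sample_header = ['title', 'year', 'duration']
sample_rows = [['tt3104988', '2018', '120'],
               ['tt4846340', '2016', '127']]

sample_movies = []
for row in sample_rows:
    movie = {}
    # pair each column name with the value at the same position in the row
    for idx in range(len(sample_header)):
        movie[sample_header[idx]] = row[idx]
    sample_movies.append(movie)

print(sample_movies)
```

The values are still strings at this point, just as in `raw_movies_list`.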

**Question 8:** What is the `title` *ID* of the **first** movie in your list?

Your output **must** be a **string**.

In [47]:
# we have done this one for you
first_movie_title = raw_movies_list[0]['title']

first_movie_title

'tt3104988'

In [48]:
grader.check("q8")

**Question 9:** What is the `duration` of the **second** movie in your list?

Your output **must** be a **string**. You **must** answer this question by querying the value from the `raw_movies_list` data structure.

In [49]:
# compute and store the answer in the variable 'second_movie_duration', then display it
second_movie_duration = raw_movies_list[1]["duration"]

second_movie_duration

In [50]:
grader.check("q9")

**Question 10:** What are the `genres` of the **second** movie in your list?

Your output **must** be a **string**. You **must** answer this question by querying the value from the `raw_movies_list` data structure.

In [53]:
# compute and store the answer in the variable 'second_movie_genres', then display it
second_movie_genres = raw_movies_list[1]["genres"]

second_movie_genres

In [54]:
grader.check("q10")

### Task 2.3.2: Convert the `int` and `float` values to the correct type

Did you notice that currently all the values in the dictionaries are **strings**? We should convert them into correct types. In particular, the `year` and `duration` should be data type **int** and the `rating` should be a **float**. 

After converting the **int** and **float** values to the right types, your list of dictionaries should look like: 

```python
[{'title': 'tt3104988',
  'year': 2018,
  'duration': 120,
  'genres': 'Comedy, Drama, Romance',
  'rating': 6.9,
  'directors': 'nm0160840',
  'cast': 'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'},
 {'title': 'tt4846340',
  'year': 2016,
  'duration': 127,
  'genres': 'Biography, Drama, History',
  'rating': 7.8,
  'directors': 'nm0577647',
  'cast': 'nm0378245, nm0818055, nm1847117'}]
```
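
Why the types matter: strings are compared character by character, so numeric comparisons on text can give surprising answers. The sketch below uses two made-up durations to show the difference.

```python
durations_as_text = ['120', '95']   # made-up durations, stored as text

# convert each string to an int
durations_as_int = []
for text in durations_as_text:
    durations_as_int.append(int(text))

# '9' comes after '1', so the text '95' counts as larger than '120'
longest_text = max(durations_as_text)
longest_int = max(durations_as_int)

# float() handles values with a decimal point
rating = float('6.9')

print(longest_text, longest_int, rating)
```

Once `year`, `duration`, and `rating` are numbers, sorting and comparing them behaves the way you would expect.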

**Question 11:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

The `year` and `duration` values **must** be of data type **int** and the `rating` value **must** be a **float**.

In [76]:
mostly_raw_movies_list = [] # use this empty list to append your dictionary
for movie in raw_movies_list: # loop (directly) through movies in raw_movies_list
    new_movie = {} # create an empty dictionary to insert values for each movie
    new_movie['title'] = movie["title"] # extract the title of movie
    new_movie['year'] = int(movie["year"]) # convert the year of movie into an int
    new_movie['duration'] = int(movie["duration"]) # convert the duration into an int
    new_movie['genres'] = movie["genres"]
    new_movie['rating'] = float(movie["rating"]) # convert the rating into a float
    new_movie['directors'] = movie["directors"]
    new_movie['cast'] = movie["cast"]
    # add new_movie to mostly_raw_movies_list
    mostly_raw_movies_list.append(new_movie)

mostly_raw_movies_list

[{'title': 'tt3104988',
  'year': 2018,
  'duration': 120,
  'genres': 'Comedy, Drama, Romance',
  'rating': 6.9,
  'directors': 'nm0160840',
  'cast': 'nm2090422, nm6525901, nm0000706, nm2110418, nm0523734'},
 {'title': 'tt4846340',
  'year': 2016,
  'duration': 127,
  'genres': 'Biography, Drama, History',
  'rating': 7.8,
  'directors': 'nm0577647',
  'cast': 'nm0378245, nm0818055, nm1847117'}]

In [77]:
grader.check("q11")

**Question 12:** What is the `type` of the `duration` of the **first** movie in your list?

In [78]:
# we have done this one for you
first_movie_duration_type = type(mostly_raw_movies_list[0]["duration"])

first_movie_duration_type

int

In [79]:
grader.check("q12")

**Question 13:** What is the `type` of the `rating` of the **first** movie in your list?

In [80]:
# we have done this one for you
first_movie_rating_type = type(mostly_raw_movies_list[0]["rating"])

first_movie_rating_type

float

In [81]:
grader.check("q13")

### Task 2.3.3: Convert the `genres`, `directors`, and `cast` to list of strings

Run the next cell and observe its output.

In [82]:
# these are the 'genres' of the first movie
mostly_raw_movies_list[0]["genres"]

'Comedy, Drama, Romance'

Notice that the `genres` are stored as a *single* **string**. It would be much more useful to store this value as a **list** of *three different* **strings**, with each **string** representing a *single* genre: `['Comedy', 'Drama', 'Romance']`. Unfortunately, the CSV file format cannot represent **lists**, so **lists** of **strings** are often represented as a *single* **string** with the values separated by a comma. So, we will have to convert the **string** into a **list** of **strings** ourselves.
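
As a quick sketch of what that conversion involves, the snippet below splits the genres string from the first movie in two ways. The variable names are made up for this illustration.

```python
genres_text = 'Comedy, Drama, Romance'

# splitting on a bare comma leaves a leading space on the later pieces
by_comma = genres_text.split(',')

# splitting on comma-plus-space gives clean values in one step
by_comma_space = genres_text.split(', ')

# alternatively, strip the whitespace from each piece after splitting
cleaned = []
for piece in by_comma:
    cleaned.append(piece.strip())

# a string with no separator still becomes a one-item list
single = 'nm0160840'.split(', ')

print(by_comma, by_comma_space, cleaned, single)
```

Note that a movie with a single director still ends up with a **list** containing one **string**, which keeps the data structure uniform.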

In the `small_movies.csv` and `movies.csv` datasets, the `directors` and `cast` are similarly stored as a *single* **string** with the values separated by a comma and a space (`, `). We are now going to convert the values corresponding to the keys `genres`, `cast`, and `directors` to a **list** of **strings**. The output **must** be a **list** of **dictionaries** in the following format:

```python
   {
        'title': <title-id>,
        'year': <the year as an integer>,
        'duration': <the duration as an integer>,
        'genres': [<genre1>, <genre2>, ...],
        'rating': <the rating as a float>,
        'directors': [<director-id1>, <director-id2>, ...],
        'cast': [<actor-id1>, <actor-id2>, ....]
    }
```

After converting the strings to list of strings, the **list** of **dictionaries** should look like: 

```python
    [{'title': 'tt3104988',
      'year': 2018,
      'duration': 120,
      'genres': ['Comedy', 'Drama', 'Romance'],
      'rating': 6.9,
      'directors': ['nm0160840'],
      'cast': ['nm2090422', 'nm6525901', 'nm0000706', 'nm2110418', 'nm0523734']},
     {'title': 'tt4846340',
      'year': 2016,
      'duration': 127,
      'genres': ['Biography', 'Drama', 'History'],
      'rating': 7.8,
      'directors': ['nm0577647'],
      'cast': ['nm0378245', 'nm0818055', 'nm1847117']}]
```

**Question 14:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

The `genres`, `directors`, and `cast` values **must** be **lists** of **strings**.

**Hint:** Recall that there is a **string method** that enables you to perform this. If you don't know where to start, please review the [lecture slides](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-lecture-material/-/tree/main/f22/meena_lec_notes/lec-14) from the October 10 lecture.

In [103]:
semi_raw_movies_list = [] # use this empty list to append your dictionary
for movie in mostly_raw_movies_list: # loop through movies in mostly_raw_movies_list
    new_movie = {} # create an empty dictionary to insert values for each movie
    new_movie['title'] = movie["title"] # extract the title of movie
    new_movie['year'] = movie["year"] # already an int
    new_movie['duration'] = movie["duration"] # already an int
    new_movie['genres'] = movie["genres"].split(", ") # split the genres into a list of strings
    new_movie['rating'] = movie["rating"] # already a float
    new_movie['directors'] = movie["directors"].split(", ") # split the director IDs into a list of strings
    new_movie['cast'] = movie["cast"].split(", ") # split the cast IDs into a list of strings
    # add new_movie to semi_raw_movies_list
    semi_raw_movies_list.append(new_movie)

semi_raw_movies_list

[{'title': 'tt3104988',
  'year': 2018,
  'duration': 120,
  'genres': ['Comedy', 'Drama', 'Romance'],
  'rating': 6.9,
  'directors': ['nm0160840'],
  'cast': ['nm2090422', 'nm6525901', 'nm0000706', 'nm2110418', 'nm0523734']},
 {'title': 'tt4846340',
  'year': 2016,
  'duration': 127,
  'genres': ['Biography', 'Drama', 'History'],
  'rating': 7.8,
  'directors': ['nm0577647'],
  'cast': ['nm0378245', 'nm0818055', 'nm1847117']}]

In [104]:
grader.check("q14")

**Question 15:** What are the `genres` of the **second** movie in your list?

In [105]:
# we have done this one for you
second_movie_genres_list = semi_raw_movies_list[1]["genres"]

second_movie_genres_list

['Biography', 'Drama', 'History']

In [106]:
grader.check("q15")

**Question 16:** How **many** `cast` members are there in the **second** movie?

You **must** answer this question by querying the value from the `semi_raw_movies_list` data structure.

In [107]:
# replace the ... with your code
second_movie_num_cast = len(semi_raw_movies_list[1]["cast"])

second_movie_num_cast

3

In [108]:
grader.check("q16")

## Segment 3: Mapping IDs to Actual Names

You may have noticed that `title`, `directors`, and `cast` are represented by *IDs* rather than actual *names*. To make our data more intuitive, we next need to **convert** these *IDs* to actual *names*. The output **must** be a **list** of **dictionaries** in the following format:

```python
   {
        'title': "the movie name",
        'year': <the year as an integer>,
        'duration': <the duration as an integer>,
        'genres': [<genre1>, <genre2>, ...],
        'rating': <the rating as a float>,
        'directors': ["director-name1", "director-name2", ...],
        'cast': ["actor-name1", "actor-name2", ....]
    }
```

After converting the IDs to actual names, your **list** of **dictionaries** should look like:

```python
    [{'title': 'Crazy Rich Asians',
      'year': 2018,
      'duration': 120,
      'genres': ['Comedy', 'Drama', 'Romance'],
      'rating': 6.9,
      'directors': ['Jon M. Chu'],
      'cast': ['Constance Wu',
       'Henry Golding',
       'Michelle Yeoh',
       'Gemma Chan',
       'Lisa Lu']},
     {'title': 'Hidden Figures',
      'year': 2016,
      'duration': 127,
      'genres': ['Biography', 'Drama', 'History'],
      'rating': 7.8,
      'directors': ['Theodore Melfi'],
      'cast': ['Taraji P. Henson', 'Octavia Spencer', 'Janelle Monáe']}]
```
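
Every ID in `small_movies.csv` appears in `small_mapping.csv`, so a plain dictionary lookup is enough for this lab. As a sketch of what a more defensive lookup could look like, the hypothetical helper `lookup_names` below keeps an ID unchanged when it is missing from the mapping instead of raising a `KeyError`. The helper, `sample_mapping`, and the ID `nm9999999` are all made up for this illustration.

```python
# Two entries copied from small_mapping.csv
sample_mapping = {'tt3104988': 'Crazy Rich Asians', 'nm0160840': 'Jon M. Chu'}

def lookup_names(ids, mapping):
    """Return the names for a list of IDs; unknown IDs are kept as they are."""
    names = []
    for some_id in ids:
        if some_id in mapping:
            names.append(mapping[some_id])
        else:
            names.append(some_id)  # assumption: fall back to the raw ID
    return names

known = lookup_names(['nm0160840'], sample_mapping)
unknown = lookup_names(['nm9999999'], sample_mapping)
print(known, unknown)
```

Whether a missing ID should be kept, skipped, or treated as an error depends on the analysis; this sketch picks one option only to show the `in` check.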

### Task 3.1: Find the Actual Names

Run the cell below and try to explain its output.

In [109]:
title_id = semi_raw_movies_list[0]["title"] # extract the title of the first movie
title = mapping_dict[title_id] # recall the dictionary mapping_dict from q4
print(title_id + ": " + title)

tt3104988: Crazy Rich Asians


**Question 17.1:** List the `title` of the **second** movie in your list.

Your output **must** be a **string** of the *name* and **not** the *ID*. You **must** answer this question by querying values from the `semi_raw_movies_list` and `mapping_dict` data structures.

In [110]:
# compute and store the answer in the variable 'title_second', then display it
title_second = mapping_dict[semi_raw_movies_list[1]["title"]]

title_second

In [111]:
grader.check("q17-1")

**Question 17.2:** List the **names** of the `cast` of the **first** movie in your list.

Your output **must** be a **list** of **strings**.

In [131]:
cast_names = [] # create an empty list to store the names of the cast members
first_movie = semi_raw_movies_list[0] # extract the dictionary of the first movie
# TODO: iterate over the IDs of the cast members of first_movie
# take each cast member's ID and lookup the value from the mapping_dict dict
# add each cast member's name to the list 'cast_names'

print (first_movie)

print (first_movie["cast"][0])

for ID in range(len(first_movie["cast"])):
    cast_names.append(mapping_dict[first_movie["cast"][ID]])
cast_names

{'title': 'tt3104988', 'year': 2018, 'duration': 120, 'genres': ['Comedy', 'Drama', 'Romance'], 'rating': 6.9, 'directors': ['nm0160840'], 'cast': ['nm2090422', 'nm6525901', 'nm0000706', 'nm2110418', 'nm0523734']}
nm2090422


['Constance Wu', 'Henry Golding', 'Michelle Yeoh', 'Gemma Chan', 'Lisa Lu']

In [132]:
grader.check("q17-2")

**Question 17.3:** List the **names** of all the `directors` of **both** movies in your list.

Your output **must** be a **list** of **strings**.

In [None]:
# compute and store the answer in the variable 'directors', then display it
directors = [] # create an empty list to store the names of the directors
for movie in semi_raw_movies_list: # iterate over each movie in semi_raw_movies_list
    for director_id in movie["directors"]: # iterate over the IDs of the directors
        # take each director's ID and lookup the value from the mapping_dict dict
        # add each director's name to the list 'directors'
        directors.append(mapping_dict[director_id])

# display the variable 'directors'
directors


In [None]:
grader.check("q17-3")

### Task 3.2: Convert to Actual Names

Use your `mapping_dict` and `semi_raw_movies_list` to finish this task.
There are three columns (`title`, `directors`, and `cast`) which need to be converted.

We will convert these columns *incrementally*. First, we will convert the `title` values to be the **names** instead of the **IDs**.

**Question 18.1:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

The `title` values **must** be **actual names** instead of **IDs**. The `directors` and `cast` values **must** be the **IDs**.

In [142]:
movies_list_v1 = [] # use this empty list to append your dictionary
# TODO: iterate over the movies in semi_raw_movies_list
    # TODO: create a new empty dictionary for each movie
    # TODO: find the actual title from the title ID and add to the new dictionary
    # TODO: add the other columns to the new dictionary
    # TODO: add the new dictionary to movies_list_v1
    
for movie in range(len(semi_raw_movies_list)):
    new_movie = {}
    new_movie["title"] = mapping_dict[semi_raw_movies_list[movie]["title"]]
    new_movie["year"] = semi_raw_movies_list[movie]["year"]
    new_movie["duration"] = semi_raw_movies_list[movie]["duration"]
    new_movie["genres"] = semi_raw_movies_list[movie]["genres"]
    new_movie["rating"] = semi_raw_movies_list[movie]["rating"]
    new_movie["directors"] = semi_raw_movies_list[movie]["directors"]
    new_movie["cast"] = semi_raw_movies_list[movie]["cast"]
    
    movies_list_v1.append(new_movie)
movies_list_v1

[{'title': 'Crazy Rich Asians',
  'year': 2018,
  'duration': 120,
  'genres': ['Comedy', 'Drama', 'Romance'],
  'rating': 6.9,
  'directors': ['nm0160840'],
  'cast': ['nm2090422', 'nm6525901', 'nm0000706', 'nm2110418', 'nm0523734']},
 {'title': 'Hidden Figures',
  'year': 2016,
  'duration': 127,
  'genres': ['Biography', 'Drama', 'History'],
  'rating': 7.8,
  'directors': ['nm0577647'],
  'cast': ['nm0378245', 'nm0818055', 'nm1847117']}]

In [143]:
grader.check("q18-1")
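
One detail worth noticing when a new dictionary is built from an old one: assigning a **list** value, as in `new_movie["genres"] = ...`, does not copy the list. Both dictionaries then refer to the same list object. The sketch below uses a shortened movie dictionary and a made-up extra genre to show the difference between sharing a list and copying it.

```python
original = {'title': 'tt3104988', 'genres': ['Comedy', 'Drama', 'Romance']}

# plain assignment: both dictionaries point at the same list object
shared = {}
shared['genres'] = original['genres']

# list(...) builds a new list with the same items
independent = {}
independent['genres'] = list(original['genres'])

# mutating through 'shared' also changes what 'original' sees
shared['genres'].append('Musical')  # hypothetical extra genre

print(original['genres'])     # now includes 'Musical'
print(independent['genres'])  # unchanged
```

This does not affect the answers in this lab, because the lists are never mutated after they are created, but it becomes important whenever you modify a list that came from another data structure.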

We will now convert the `directors` values to be the **names** instead of the **IDs**.

**Question 18.2:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

The `title` and `directors` values **must** be **actual names** instead of **IDs**. The `cast` values **must** be the **IDs**.

In [151]:
movies_list_v2 = [] # use this empty list to append your dictionary
# TODO: copy/paste your loop for q18.1 where you defined movies_list_v1
# TODO: inside the loop, for each movie define a new list for the names of the directors
# TODO: loop through the director IDs of the movie and add their names to list
# TODO: assign the value of the key 'directors' to be this new list
# TODO: add the new dictionary to movies_list_v2

for movie in range(len(semi_raw_movies_list)):
    direct = []
    new_movie = {}
    new_movie["title"] = mapping_dict[semi_raw_movies_list[movie]["title"]]
    new_movie["year"] = semi_raw_movies_list[movie]["year"]
    new_movie["duration"] = semi_raw_movies_list[movie]["duration"]
    new_movie["genres"] = semi_raw_movies_list[movie]["genres"]
    new_movie["rating"] = semi_raw_movies_list[movie]["rating"]
    for name in range (len(semi_raw_movies_list[movie]["directors"])):
        direct.append(mapping_dict[semi_raw_movies_list[movie]["directors"][name]])
    new_movie["directors"] = direct
    new_movie["cast"] = semi_raw_movies_list[movie]["cast"]
    movies_list_v2.append(new_movie)

movies_list_v2

[{'title': 'Crazy Rich Asians',
  'year': 2018,
  'duration': 120,
  'genres': ['Comedy', 'Drama', 'Romance'],
  'rating': 6.9,
  'directors': ['Jon M. Chu'],
  'cast': ['nm2090422', 'nm6525901', 'nm0000706', 'nm2110418', 'nm0523734']},
 {'title': 'Hidden Figures',
  'year': 2016,
  'duration': 127,
  'genres': ['Biography', 'Drama', 'History'],
  'rating': 7.8,
  'directors': ['Theodore Melfi'],
  'cast': ['nm0378245', 'nm0818055', 'nm1847117']}]

In [150]:
grader.check("q18-2")

Finally, we will convert the `cast` values to be the **names** instead of the **IDs**, to finish our definition of `movies_list`.

**Question 18.3:** Display the **data** in the file `small_movies.csv` as a **list** of **dictionaries**.

The `title`, `directors` and `cast` values **must** be **actual names** instead of **IDs**.

In [155]:
movies_list = [] # use this empty list to append your dictionary
# TODO: copy/paste your loop for q18.2 where you defined movies_list_v2
# TODO: replace the IDs of the cast members with their names

for movie in range(len(semi_raw_movies_list)):
    direct = []
    cast = []
    new_movie = {}
    new_movie["title"] = mapping_dict[semi_raw_movies_list[movie]["title"]]
    new_movie["year"] = semi_raw_movies_list[movie]["year"]
    new_movie["duration"] = semi_raw_movies_list[movie]["duration"]
    new_movie["genres"] = semi_raw_movies_list[movie]["genres"]
    new_movie["rating"] = semi_raw_movies_list[movie]["rating"]
    for name in range (len(semi_raw_movies_list[movie]["directors"])):
        direct.append(mapping_dict[semi_raw_movies_list[movie]["directors"][name]])
    new_movie["directors"] = direct
    for name in range (len(semi_raw_movies_list[movie]["cast"])):
        cast.append(mapping_dict[semi_raw_movies_list[movie]["cast"][name]])
    new_movie["cast"] = cast
    
    movies_list.append(new_movie)

movies_list

[{'title': 'Crazy Rich Asians',
  'year': 2018,
  'duration': 120,
  'genres': ['Comedy', 'Drama', 'Romance'],
  'rating': 6.9,
  'directors': ['Jon M. Chu'],
  'cast': ['Constance Wu',
   'Henry Golding',
   'Michelle Yeoh',
   'Gemma Chan',
   'Lisa Lu']},
 {'title': 'Hidden Figures',
  'year': 2016,
  'duration': 127,
  'genres': ['Biography', 'Drama', 'History'],
  'rating': 7.8,
  'directors': ['Theodore Melfi'],
  'cast': ['Taraji P. Henson', 'Octavia Spencer', 'Janelle Monáe']}]

In [156]:
grader.check("q18-3")

We are now ready to test your data structure `movies_list`.

**Question 19:** What is the movie `title` of the **first** movie in your list?

In [157]:
# we have done this one for you
first_movie_title = movies_list[0]["title"]

first_movie_title

'Crazy Rich Asians'

In [158]:
grader.check("q19")

**Question 20:** Who are the `directors` of the **second movie** in your list?

Your output **must** be a **list** of **strings**. You **must** answer this question by querying the value from the `movies_list` data structure.

In [161]:
# compute and store the answer in the variable 'second_movie_directors', then display it
second_movie_directors = movies_list[1]["directors"]

second_movie_directors

In [162]:
grader.check("q20")

## Great work! You are now ready to start [p8](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-f22-projects/-/tree/main/p8).