# Cinematic choices



In [None]:
# Don't change this cell; just run it.
import numpy as np  # The array library.

# The OKpy testing system.
from client.api.notebook import Notebook
ok = Notebook('boolean_arrays.ok')

## The data

We start by loading in some data on movies.

You can find a little more on this dataset from the [dataset
page](https://github.com/matthew-brett/datasets/tree/master/movie_metadata).

It is a table of data about films, including title, year, budget,
[IMDB](https://www.imdb.com/) rating and so on.

The next cell loads in the data, and puts the data into arrays.

We will cover the code to do this soon, but for now, just run this cell.

In [None]:
# Get the library for loading tables
import pandas as pd
# Load the table
mmd = pd.read_csv('movies.csv')
# Restrict ourselves to US movies (budget should be USD)
us_mmd = mmd[mmd['country'] == 'USA']
# Select the top 25 US films by IMDB rating
top_25 = us_mmd.head(25)
# Put the film data into arrays for our use.
titles = np.array(top_25['title'])
budgets = np.array(top_25['budget'])
gross_earnings = np.array(top_25['gross'])

You now have three arrays, each with 25 elements, corresponding to the top 25
movies made in the USA, by IMDB rating.

Here are the movie titles:

In [None]:
titles

The corresponding budgets in dollars:

In [None]:
budgets

The corresponding gross earnings in dollars:

In [None]:
gross_earnings

## Percent profit

As a warm-up, make a new array `pct_profit` with one element per film, where
each element is the film's gross earnings divided by its budget, multiplied by
100.  A value of 100 therefore means the film grossed exactly its budget.
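
Array arithmetic works element by element, so a single expression can handle
every film at once.  Here is a minimal sketch on made-up numbers; the names
`toy_gross` and `toy_budgets`, and their values, are illustrative assumptions
and not part of the movie data.

```python
import numpy as np

# Hypothetical values, for illustration only.
toy_gross = np.array([300.0, 50.0, 120.0])
toy_budgets = np.array([100.0, 100.0, 40.0])

# Division and multiplication are applied to each pair of elements in turn,
# giving one result per position.
toy_pct = toy_gross / toy_budgets * 100
print(toy_pct)
```

The result has the same number of elements as the two input arrays.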

In [None]:
#- Calculate percent profit
pct_profit = ...
# Show the result.
pct_profit

In [None]:
_ = ok.grade('q_pct_profit')

## Big winners

These are the US films with the highest IMDB score, so we expect them to have
been reasonably successful, and therefore to have fairly high values in
`pct_profit`.  In the next cell, make a Boolean array `profit_gt_500` that has
True for each film that grossed more than 500% of its budget, and False
otherwise.
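
As a reminder of how comparisons behave on arrays, here is a small sketch with
invented values.  The name `toy_pct` and its contents are assumptions made up
for this example.

```python
import numpy as np

# Hypothetical percentages, for illustration only.
toy_pct = np.array([300.0, 50.0, 650.0, 500.0])

# A comparison against a single number is applied to every element,
# and gives back an array of True / False values of the same length.
toy_gt_500 = toy_pct > 500
print(toy_gt_500)
```

Notice that `>` is a strict comparison, so an element exactly equal to 500
gives False.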

In [None]:
profit_gt_500 = ...
# Show the result.
profit_gt_500

In [None]:
_ = ok.grade('q_profit_gt_500')

How many of these 25 films made more than 500% of their budget in gross
earnings?  Put the result in the variable `n_big_earners`.
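
One way to count True values in a Boolean array is sketched below, on a
made-up array.  The names `toy_flags` and `n_true` are hypothetical;
`np.count_nonzero` and `np.sum` are standard NumPy functions.

```python
import numpy as np

# Hypothetical Boolean array, for illustration only.
toy_flags = np.array([True, False, True, True])

# count_nonzero counts the True elements (False counts as zero).
n_true = np.count_nonzero(toy_flags)
print(n_true)

# np.sum gives the same answer, because True is treated as 1 and False as 0.
print(np.sum(toy_flags))
```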

In [None]:
n_big_earners = ...
# Show the result.
n_big_earners

In [None]:
_ = ok.grade('q_n_big_earners')

Use [Boolean indexing](https://uob-ds.github.io/cfd2021/data-frames/boolean_indexing) to find
the film titles for films with gross earnings over 500% of their budget.  Store
these film titles in the array `big_earner_titles`.
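
Here is a minimal sketch of Boolean indexing on invented data.  The arrays
`toy_titles` and `toy_pct` are assumptions made up for this example, not the
real film data.

```python
import numpy as np

# Hypothetical titles and percentages, for illustration only.
toy_titles = np.array(['Alpha', 'Beta', 'Gamma'])
toy_pct = np.array([650.0, 80.0, 900.0])

# Putting a Boolean array inside the square brackets keeps only the
# elements at positions where the Boolean array is True.
selected = toy_titles[toy_pct > 500]
print(selected)
```

The Boolean array must have the same length as the array being indexed, so
that each True or False lines up with one element.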

In [None]:
big_earner_titles = ...
# Show the result.
big_earner_titles

In [None]:
_ = ok.grade('q_big_earner_titles')

## Relative losers

The world does not just contain winners, even among these highly-rated films.

Use Boolean indexing to find and show the film titles where percent profit was
less than 100.  Put these titles into the variable `relative_loser_titles`.

In [None]:
relative_loser_titles = ...
# Show the result.
relative_loser_titles

In [None]:
_ = ok.grade('q_relative_losers')

You have just discovered you are a big snob for high earners.  You want to tell
the world.  First you make a new copy of the `titles` array to prepare to give
your own version of some titles.

In [None]:
# Run this cell.
# Make a new copy of the titles array using the .copy method.
my_titles = titles.copy()
my_titles
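
To see why the copy matters, here is a small sketch with made-up arrays.  The
names `original`, `alias` and `duplicate` are hypothetical.  Plain assignment
gives a second name for the same array, whereas `.copy()` gives an independent
array.

```python
import numpy as np

# Hypothetical array, for illustration only.
original = np.array(['Alpha', 'Beta'], dtype=object)

alias = original             # Another name for the same array.
duplicate = original.copy()  # A new, independent array.

# Changing the copy leaves the original untouched.
duplicate[0] = 'Changed'
print(original)

# Changing through the alias changes the original too.
alias[1] = 'Also changed'
print(original)
```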

Now express your disapproval of the lower-earning films by replacing their
titles in `my_titles` with the new title "Who cares?".  Use Boolean indexing to
do the assignment.
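
Boolean indexing also works on the left-hand side of an assignment.  The
sketch below uses invented data; `toy_titles`, `toy_pct` and the replacement
text are assumptions for illustration.  The example array is created with
`dtype=object` so that a longer replacement string is not cut short, as can
happen with NumPy's fixed-width string arrays.

```python
import numpy as np

# Hypothetical titles and percentages, for illustration only.
toy_titles = np.array(['Alpha', 'Beta', 'Gamma'], dtype=object)
toy_pct = np.array([650.0, 80.0, 90.0])

# Every element at a True position is replaced by the value on the right.
toy_titles[toy_pct < 100] = 'Not for me'
print(toy_titles)
```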

In [None]:
my_titles...
# Show the result.
my_titles

In [None]:
_ = ok.grade('q_my_titles')

## Done.

Congratulations, you're done with the assignment!  Be sure to:

- **run all the tests** (the next cell has a shortcut for that).
- **Save and Checkpoint** from the `File` menu.

In [None]:
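# For your convenience, you can run this cell to run all the tests at once!
import os
# q[:-3] drops the last three characters of each test file name
# (the '.py' extension), leaving the question name.
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]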