<a href="https://colab.research.google.com/github/veyselberk88/Data-Science-Tools-and-Ecosystem/blob/main/lec07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Lecture 07: Building Tables

Associated Textbook Sections: [6.0, 6.1, 6.2](https://ccsf-math-108.github.io/textbook/chapters/06/Tables.html)

---

## Overview

* [Creating Tables](#Creating-Tables)
* [Columns and Rows of Tables](#Columns-and-Rows-of-Tables)
* [Attribute Types](#Attribute-Types)
* [Exploring the Tallest Buildings](#Exploring-the-Tallest-Buildings)
* [Exploring Movies](#Exploring-Movies)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np

---

## Creating Tables

Here are *some* of the ways that you will create a table in this class:
* `Table.read_table(filename)` - a table from a CSV file
* `Table()` - an empty table
* `select`, `drop`, `where`, `sort`, etc. - a table from existing tables
* `with_column` and `with_columns` - a table from an existing table with additional columns
* `with_row` and `with_rows` - a table from an existing table with additional rows
* `take` and `exclude` - a table formed from a subset of rows from an existing table

---

### Demo: Using `read_table`

As of February 2024, the tallest buildings in the United States (according to Wikipedia) **should** be stored in the file `tallest_buildings.csv`. This file is located in the same folder as this Jupyter Notebook.

In [None]:
tallest_buildings = Table.read_table('tallest_buildings.csv')
tallest_buildings

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City
Willis Tower,108,1974,1450,Chicago
111 West 57th Street,84,2021,1428,New York City
One Vanderbilt,93,2020,1401,New York City
432 Park Avenue,85,2015,1396,New York City
Trump International Hotel and Tower,98,2009,1388,Chicago
30 Hudson Yards,103,2019,1270,New York City
Empire State Building,102,1931,1250,New York City
Bank of America Tower,55,2009,1200,New York City


---

### Demo: Tables from Tables

Most (not all) of the table methods create a new table and do not modify the original table.

In [None]:
tallest_buildings.select('Name', 'Height (ft)')

Name,Height (ft)
One World Trade Center,1776
Central Park Tower,1550
Willis Tower,1450
111 West 57th Street,1428
One Vanderbilt,1401
432 Park Avenue,1396
Trump International Hotel and Tower,1388
30 Hudson Yards,1270
Empire State Building,1250
Bank of America Tower,1200


In [None]:
tallest_buildings

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City
Willis Tower,108,1974,1450,Chicago
111 West 57th Street,84,2021,1428,New York City
One Vanderbilt,93,2020,1401,New York City
432 Park Avenue,85,2015,1396,New York City
Trump International Hotel and Tower,98,2009,1388,Chicago
30 Hudson Yards,103,2019,1270,New York City
Empire State Building,102,1931,1250,New York City
Bank of America Tower,55,2009,1200,New York City


In [None]:
name_height = tallest_buildings.select('Name', 'Height (ft)')
name_height

Name,Height (ft)
One World Trade Center,1776
Central Park Tower,1550
Willis Tower,1450
111 West 57th Street,1428
One Vanderbilt,1401
432 Park Avenue,1396
Trump International Hotel and Tower,1388
30 Hudson Yards,1270
Empire State Building,1250
Bank of America Tower,1200


In [None]:
buildings_above_1500 = tallest_buildings.where('Height (ft)', are.above_or_equal_to(1500))
buildings_above_1500

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City


---

## Columns and Rows of Tables

* Columns:
    * Labeled NumPy arrays
    * Column labels are strings
    * All column values have the same data type
    * `t.select(column_labels_or_indexes)` - creates a table with the specified columns of table `t`.
    * `t.column(column_label_or_index)` - returns an array of the values in the specified column
* Rows:
    * `Row` data type (... kind of like a `list` where the items have labels)
    * `t.take(row_indexes)` - creates a table with the specified rows
    * `t.row(row_index)` - creates a `Row` object for the single specified row

---

### Demo: Rows and Columns

Compare the `row` and `take` table methods.

In [None]:
a = tallest_buildings.take(0)
a

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City


In [None]:
b = tallest_buildings.row(0)
b

Row(Name='One World Trade Center', Floors=104, Year=2014, Height (ft)=1776, City='New York City')

In [None]:
type(a), type(b)

(datascience.tables.Table, datascience.tables.Row)

In [None]:
tallest_buildings.row(0).item(0)

'One World Trade Center'

In [None]:
tallest_buildings.row(0).item(0), tallest_buildings.row(0).item(3)

('One World Trade Center', 1776)

---

Pass an array of row indexes to `take` to select more than one row.

In [None]:
tallest_buildings.take(np.arange(5))

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City
Willis Tower,108,1974,1450,Chicago
111 West 57th Street,84,2021,1428,New York City
One Vanderbilt,93,2020,1401,New York City


In [None]:
tallest_buildings.take([0, 1, 2, 3, 4])  # a list of indexes selects the same rows as np.arange(5)

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City
Willis Tower,108,1974,1450,Chicago
111 West 57th Street,84,2021,1428,New York City
One Vanderbilt,93,2020,1401,New York City


---

Compare the `select` and `column` table methods.

In [None]:
tallest_buildings.select('Name')

Name
One World Trade Center
Central Park Tower
Willis Tower
111 West 57th Street
One Vanderbilt
432 Park Avenue
Trump International Hotel and Tower
30 Hudson Yards
Empire State Building
Bank of America Tower


In [None]:
tallest_buildings.column('Name')

array(['One World Trade Center', 'Central Park Tower', 'Willis Tower',
       '111 West 57th Street', 'One Vanderbilt', '432 Park Avenue',
       'Trump International Hotel and Tower', '30 Hudson Yards',
       'Empire State Building', 'Bank of America Tower',
       'St. Regis Chicago', 'Aon Center', '875 North Michigan Avenue',
       'Comcast Technology Center', 'Wilshire Grand Center',
       '3 World Trade Center', 'The Brooklyn Tower*', '53W53',
       'Chrysler Building', 'The New York Times Building', 'The Spiral',
       'Bank of America Plaza', 'U.S. Bank Tower', 'Franklin Center',
       'One57', 'JPMorgan Chase Tower', '35 Hudson Yards',
       'Two Prudential Plaza', '1 Manhattan West', 'Wells Fargo Plaza',
       '50 Hudson Yards', '4 World Trade Center', 'One Chicago East Tower',
       'Comcast Center', '311 South Wacker Drive',
       '220 Central Park South', '70 Pine Street', 'Key Tower',
       'One Liberty Place', '2 Manhattan West',
       'Four Seasons Hotel New 

---

### Demo: Creating a Table from Scratch

Create a table containing information on the major east-west streets north of the Ocean campus and how far they are from campus.

[Google Maps near CCSF - Ocean Campus](https://goo.gl/maps/QVR57VvqKWqLeSA9A)

In [None]:
from IPython.display import IFrame
IFrame('https://www.google.com/maps/embed?pb=!1m10!1m8!1m3!1d6311.10985715617\
7!2d-122.4451173!3d37.7301236!3m2!1i1024!2i768!4f13.1!5e0!3m2!1sen!2sus!4v1675\
197446609!5m2!1sen!2sus', 600, 450)

In [None]:
streets = make_array('Judson', 'Staples', 'Flood', 'Hearst')
streets

array(['Judson', 'Staples', 'Flood', 'Hearst'],
      dtype='<U7')

In [None]:
Table()

In [None]:
northside = Table()
northside

In [None]:
type(northside)

datascience.tables.Table

In [None]:
northside = Table().with_column('Streets', streets)
northside

Streets
Judson
Staples
Flood
Hearst


In [None]:
northside.with_column('Blocks from campus', np.arange(4))

Streets,Blocks from campus
Judson,0
Staples,1
Flood,2
Hearst,3


In [None]:
northside

Streets
Judson
Staples
Flood
Hearst


In [None]:
northside = northside.with_column('Blocks from campus', np.arange(4))
northside

Streets,Blocks from campus
Judson,0
Staples,1
Flood,2
Hearst,3


---

Update `northside` to include Monterey by adding it using `with_row`.

In [None]:
monterey_data = ['Monterey', 4]
monterey_data

['Monterey', 4]

In [None]:
northside = northside.with_row(monterey_data)
northside

Streets,Blocks from campus
Judson,0
Staples,1
Flood,2
Hearst,3
Monterey,4


---

Add multiple columns to a table at once using `with_columns`.

In [None]:
streets = make_array('Judson', 'Staples', 'Flood', 'Hearst', 'Monterey')
blocks = np.arange(5)
northside_again = Table().with_columns(
    'Streets', streets,
    'Blocks from campus', blocks
)
northside_again

Streets,Blocks from campus
Judson,0
Staples,1
Flood,2
Hearst,3
Monterey,4


---

## Attribute Types


---

### Types of Attributes

All values in a column of a table should be of the same type and be comparable to each other in some way.
* **Numerical** --- Each value is from a numerical scale
    * Numerical measurements are ordered
    * Differences are meaningful
* **Categorical** --- Each value is from a fixed inventory
    * May or may not have an ordering
    * Categories are the same or different


---

### “Numerical” Attributes

Sometimes numbers represent categorical data:
* 94112 and 94110 are San Francisco ZIP codes
* Subtracting 94110 from 94112 doesn't yield a meaningful value
* ZIP codes are categorical, even though numbers were used for the categories
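
One way to make this distinction concrete is to store ZIP codes as strings, so they behave like labels. The short sample array below is made up for illustration; only the two ZIP codes themselves come from the text above.

```python
import numpy as np

# ZIP codes stored as strings so they are treated as labels, not quantities.
# The repeated values are invented sample data.
zip_codes = np.array(['94112', '94110', '94112', '94112'])

# Counting how often each category appears is meaningful for categorical data.
labels, counts = np.unique(zip_codes, return_counts=True)
print(labels)
print(counts)

# The difference can be computed, but it has no real-world interpretation.
difference = 94112 - 94110
print(difference)
```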

---

## Exploring the Tallest Buildings

---

### Tallest Buildings Attributes

In [None]:
tallest_buildings

Name,Floors,Year,Height (ft),City
One World Trade Center,104,2014,1776,New York City
Central Park Tower,98,2020,1550,New York City
Willis Tower,108,1974,1450,Chicago
111 West 57th Street,84,2021,1428,New York City
One Vanderbilt,93,2020,1401,New York City
432 Park Avenue,85,2015,1396,New York City
Trump International Hotel and Tower,98,2009,1388,Chicago
30 Hudson Yards,103,2019,1270,New York City
Empire State Building,102,1931,1250,New York City
Bank of America Tower,55,2009,1200,New York City


* `'Name'` and `'City'` are categorical attributes
* The other attributes are numerical

---

### Summarizing Height

Remember that `select` and `column` produce different data types.

In [None]:
tallest_buildings.select('Height (ft)')

Height (ft)
1776
1550
1450
1428
1401
1396
1388
1270
1250
1200


In [None]:
tallest_buildings.column('Height (ft)')

array([1776, 1550, 1450, 1428, 1401, 1396, 1388, 1270, 1250, 1200, 1198,
       1136, 1128, 1121, 1100, 1079, 1066, 1050, 1046, 1046, 1043, 1023,
       1018, 1007, 1004, 1002, 1000,  994,  994,  991,  981,  978,  974,
        971,  961,  952,  952,  948,  945,  935,  935,  932,  928,  922,
        915,  912,  912,  901,  900,  897,  896,  886,  878,  875,  871,
        871,  870,  869,  868,  861,  859,  859,  858,  853,  850,  850,
        848,  847,  847,  847,  844,  844,  844,  841,  835,  821,  820,
        818,  817,  817,  814,  813,  809,  808,  806,  805,  802])

---

### Average

* The average is one way to summarize numerical data like the heights of buildings.
* Use `np.average` to calculate the average of an array of values.
* Another name that we will use for average is `mean`.
* `np.mean` calculates the mean of an array of values.
* For our class, we will use average and mean interchangeably.
* The `np.average` function also calculates weighted averages.
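
A small sketch of these points, using only the ten heights visible in the table preview rather than the full data set. The weighted example uses invented group averages and group sizes, so treat those numbers as assumptions.

```python
import numpy as np

# Heights (ft) of the ten buildings shown in the table preview.
heights = np.array([1776, 1550, 1450, 1428, 1401, 1396, 1388, 1270, 1250, 1200])

# np.average and np.mean agree when no weights are given.
plain_average = np.average(heights)
plain_mean = np.mean(heights)
print(plain_average, plain_mean)

# Weighted average: hypothetical group averages, weighted by hypothetical group sizes.
group_averages = np.array([1000, 1400])
group_sizes = np.array([3, 1])
weighted = np.average(group_averages, weights=group_sizes)
print(weighted)  # (1000*3 + 1400*1) / 4
```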

---

#### Demo: Average

Calculate the average height of the buildings in the data set.

In [None]:
heights = tallest_buildings.column('Height (ft)')
heights

array([1776, 1550, 1450, 1428, 1401, 1396, 1388, 1270, 1250, 1200, 1198,
       1136, 1128, 1121, 1100, 1079, 1066, 1050, 1046, 1046, 1043, 1023,
       1018, 1007, 1004, 1002, 1000,  994,  994,  991,  981,  978,  974,
        971,  961,  952,  952,  948,  945,  935,  935,  932,  928,  922,
        915,  912,  912,  901,  900,  897,  896,  886,  878,  875,  871,
        871,  870,  869,  868,  861,  859,  859,  858,  853,  850,  850,
        848,  847,  847,  847,  844,  844,  844,  841,  835,  821,  820,
        818,  817,  817,  814,  813,  809,  808,  806,  805,  802])

In [None]:
average_height = np.average(heights)
average_height

978.47126436781605

---

### Median

* The median can also be used to summarize numerical data like the heights of buildings
* The median is:
    * The middle value of an odd number of sorted data points
    * The average of the two middle values of an even number of sorted data points
* Use `np.median` to calculate the median of an array of numerical values.
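
The odd and even cases can be checked on a few heights taken from the table preview. This is a minimal sketch; the array names are chosen here for illustration.

```python
import numpy as np

# Odd number of values: the median is the middle value after sorting.
odd_heights = np.array([1270, 1200, 1250])
odd_median = np.median(odd_heights)
print(odd_median)

# Even number of values: the median is the average of the two middle values.
even_heights = np.array([1388, 1270, 1200, 1250])
even_median = np.median(even_heights)
print(even_median)  # average of 1250 and 1270
```

Note that `np.median` sorts the values internally, so the input array does not need to be sorted first.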

---

#### Demo: Median

Calculate the median height of the buildings in the table.

In [None]:
median_height = np.median(heights)
median_height

922.0

---

### The Salesforce Tower

The Salesforce Tower in San Francisco is one of the tallest buildings in the country. See if it is in the `tallest_buildings` table.

In [None]:
tallest_buildings.where('City','San Francisco')

Name,Floors,Year,Height (ft),City
Transamerica Pyramid,48,1972,853,San Francisco
181 Fremont,66,2018,802,San Francisco


---

There must have been a mistake in creating the `tallest_buildings.csv` data file, because the Salesforce Tower is on [Wikipedia's page](https://en.wikipedia.org/wiki/List_of_tallest_buildings_in_the_United_States), but not in the table. Create a new table that includes all the information in `tallest_buildings` and the Salesforce Tower information.

In [None]:
tallest_buildings_with_SFT = tallest_buildings.with_row(
    ['Salesforce Tower', 61, 2018, 1070, 'San Francisco']
)
tallest_buildings_with_SFT.where('City', 'San Francisco')

Name,Floors,Year,Height (ft),City
Transamerica Pyramid,48,1972,853,San Francisco
181 Fremont,66,2018,802,San Francisco
Salesforce Tower,61,2018,1070,San Francisco


---

## Exploring Movies

---

### Loading the Data

Explore the `top_movies.csv` data set that we generated from [www.the-numbers.com/market](https://www.the-numbers.com/market/). Try to apply several of the methods and attributes seen so far.

In [None]:
movies = Table.read_table('top_movies.csv')
movies

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold
1995,Batman Forever,Super Hero,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Warner Bros.,184031112,445482201,42306002
1996,Independence Day,Science Fiction,Live Action,Original Screenplay,Adventure,PG-13,20th Century Fox,306169255,729403223,69269062
1997,Men in Black,Science Fiction,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Sony Pictures,250650052,575020703,54607854
1998,Titanic,Historical Fiction,Live Action,Original Screenplay,Drama,PG-13,Paramount Pictures,488192879,1096091898,104092298
1999,Star Wars Ep. I: The Phantom Menace,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG,20th Century Fox,430443350,892237879,84732942
2000,How the Grinch Stole Christmas,Kids Fiction,Live Action,Based on Fiction Book/Short Story,Adventure,PG,Universal,254257385,496721750,47172056
2001,Harry Potter and the Sorcerer’s Stone,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG,Warner Bros.,300404434,558879624,53074988
2002,Spider-Man,Super Hero,Live Action,Based on Comic/Graphic Novel,Adventure,PG-13,Sony Pictures,403706375,731674375,69484746
2003,Finding Nemo,Kids Fiction,Digital Animation,Original Screenplay,Adventure,G,Walt Disney,339714367,593232548,56337374
2004,Shrek 2,Kids Fiction,Digital Animation,Based on Fiction Book/Short Story,Adventure,PG,Dreamworks SKG,441226247,748166240,71050925


---

### Data Attributes

* `'Year'`, `'Total for Year ($)'`, `'Total in 2022 dollars'`, and `'Tickets Sold'` are numerical attributes
* The other attributes are categorical

---

### Demo: Explore Movies

Calculate the average ticket price and add that information to the table.

In [None]:
total_gross = movies.column('Total for Year ($)')
number_of_tickets_sold = movies.column('Tickets Sold')
ave_ticket_price = total_gross / number_of_tickets_sold
ave_ticket_price

array([  4.35000008,   4.42000001,   4.59000004,   4.69000001,
         5.08000005,   5.39000007,   5.66000004,   5.81000001,
         6.03000003,   6.21000004,   6.41000011,   6.55000002,
         6.88000003,   7.17999995,   7.5       ,   7.89000002,
         7.93      ,   7.96000002,   8.13000008,   8.17000009,
         8.43000003,   8.65      ,   8.97000009,   9.11000005,
         9.15999998,   9.41000011,  10.40000006,  10.53000008,
        10.53000005,  10.53000107])

In [None]:
total_gross[:5]

array([184031112, 306169255, 250650052, 488192879, 430443350])

In [None]:
movies = movies.with_column('Average Ticket Price', ave_ticket_price)
movies

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
1995,Batman Forever,Super Hero,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Warner Bros.,184031112,445482201,42306002,4.35
1996,Independence Day,Science Fiction,Live Action,Original Screenplay,Adventure,PG-13,20th Century Fox,306169255,729403223,69269062,4.42
1997,Men in Black,Science Fiction,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Sony Pictures,250650052,575020703,54607854,4.59
1998,Titanic,Historical Fiction,Live Action,Original Screenplay,Drama,PG-13,Paramount Pictures,488192879,1096091898,104092298,4.69
1999,Star Wars Ep. I: The Phantom Menace,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG,20th Century Fox,430443350,892237879,84732942,5.08
2000,How the Grinch Stole Christmas,Kids Fiction,Live Action,Based on Fiction Book/Short Story,Adventure,PG,Universal,254257385,496721750,47172056,5.39
2001,Harry Potter and the Sorcerer’s Stone,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG,Warner Bros.,300404434,558879624,53074988,5.66
2002,Spider-Man,Super Hero,Live Action,Based on Comic/Graphic Novel,Adventure,PG-13,Sony Pictures,403706375,731674375,69484746,5.81
2003,Finding Nemo,Kids Fiction,Digital Animation,Original Screenplay,Adventure,G,Walt Disney,339714367,593232548,56337374,6.03
2004,Shrek 2,Kids Fiction,Digital Animation,Based on Fiction Book/Short Story,Adventure,PG,Dreamworks SKG,441226247,748166240,71050925,6.21


---

Practice filtering the data using the `where` and `take` table methods.

In [None]:
movies.where('Year', are.between(2000,2005))

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
2000,How the Grinch Stole Christmas,Kids Fiction,Live Action,Based on Fiction Book/Short Story,Adventure,PG,Universal,254257385,496721750,47172056,5.39
2001,Harry Potter and the Sorcerer’s Stone,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG,Warner Bros.,300404434,558879624,53074988,5.66
2002,Spider-Man,Super Hero,Live Action,Based on Comic/Graphic Novel,Adventure,PG-13,Sony Pictures,403706375,731674375,69484746,5.81
2003,Finding Nemo,Kids Fiction,Digital Animation,Original Screenplay,Adventure,G,Walt Disney,339714367,593232548,56337374,6.03
2004,Shrek 2,Kids Fiction,Digital Animation,Based on Fiction Book/Short Story,Adventure,PG,Dreamworks SKG,441226247,748166240,71050925,6.21


In [None]:
movies.where('Year', are.equal_to(2007))

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
2007,Spider-Man 3,Super Hero,Live Action,Based on Comic/Graphic Novel,Adventure,PG-13,Sony Pictures,336530303,515067453,48914288,6.88


In [None]:
movies.where('Movie', are.containing('Harry'))

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
2001,Harry Potter and the Sorcerer’s Stone,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG,Warner Bros.,300404434,558879624,53074988,5.66
2011,Harry Potter and the Deathly Hallows: Part II,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG-13,Warner Bros.,381011219,505932930,48046812,7.93


In [None]:
movies.where('Movie', are.containing('the'))


Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
2000,How the Grinch Stole Christmas,Kids Fiction,Live Action,Based on Fiction Book/Short Story,Adventure,PG,Universal,254257385,496721750,47172056,5.39
2001,Harry Potter and the Sorcerer’s Stone,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG,Warner Bros.,300404434,558879624,53074988,5.66
2005,Star Wars Ep. III: Revenge of the Sith,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG-13,20th Century Fox,380270577,624687848,59324582,6.41
2006,Pirates of the Caribbean: Dead Man’s Chest,Historical Fiction,Live Action,Based on Theme Park Ride,Adventure,PG-13,Walt Disney,423315812,680536715,64628368,6.55
2009,Transformers: Revenge of the Fallen,Science Fiction,Animation/Live Action,Based on TV,Action,PG-13,Paramount Pictures,402111870,564565065,53614916,7.5
2011,Harry Potter and the Deathly Hallows: Part II,Fantasy,Animation/Live Action,Based on Fiction Book/Short Story,Adventure,PG-13,Warner Bros.,381011219,505932930,48046812,7.93
2014,Guardians of the Galaxy,Super Hero,Animation/Live Action,Based on Comic/Graphic Novel,Action,PG-13,Walt Disney,333055258,429262158,40765637,8.17
2018,Black Panther,Super Hero,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Walt Disney,700059566,809179714,76845177,9.11


In [None]:
movies.where('Movie', are.containing('The'))

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
1999,Star Wars Ep. I: The Phantom Menace,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG,20th Century Fox,430443350,892237879,84732942,5.08
2008,The Dark Knight,Super Hero,Live Action,Based on Comic/Graphic Novel,Action,PG-13,Warner Bros.,531001578,778753016,73955652,7.18
2012,The Avengers,Super Hero,Animation/Live Action,Based on Comic/Graphic Novel,Action,PG-13,Walt Disney,623357910,824617936,78311295,7.96
2015,Star Wars Ep. VII: The Force Awakens,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG-13,Walt Disney,742208942,927100845,88043765,8.43
2017,Star Wars Ep. VIII: The Last Jedi,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG-13,Walt Disney,517218368,607169382,57660910,8.97
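
The two results differ because the match is case-sensitive, and the `'the'` search also returns Black Panther because "Panther" contains the letters "the". Both behaviors are consistent with a plain substring check, which can be sketched with Python's `in` operator. This is an illustration of the idea on three titles from the outputs above, not the library's actual implementation.

```python
# Three titles taken from the outputs above.
titles = ['Black Panther', 'The Dark Knight', 'Guardians of the Galaxy']

# Case-sensitive substring checks, analogous to the two where calls.
lower_matches = [title for title in titles if 'the' in title]
upper_matches = [title for title in titles if 'The' in title]

# Lowercasing each title first gives a case-insensitive match.
any_case_matches = [title for title in titles if 'the' in title.lower()]

print(lower_matches)
print(upper_matches)
print(any_case_matches)
```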


In [None]:
sorted_movies = movies.sort('Total in 2022 dollars', descending=True)  # largest totals first
top_5_movies = sorted_movies.take(np.arange(5))
top_5_movies

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold
1998,Titanic,Historical Fiction,Live Action,Original Screenplay,Drama,PG-13,Paramount Pictures,488192879,1096091898,104092298
2019,Avengers: Endgame,Super Hero,Animation/Live Action,Based on Comic/Graphic Novel,Action,PG-13,Walt Disney,858373000,986754117,93708843
2015,Star Wars Ep. VII: The Force Awakens,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG-13,Walt Disney,742208942,927100845,88043765
1999,Star Wars Ep. I: The Phantom Menace,Science Fiction,Animation/Live Action,Original Screenplay,Adventure,PG,20th Century Fox,430443350,892237879,84732942
2012,The Avengers,Super Hero,Animation/Live Action,Based on Comic/Graphic Novel,Action,PG-13,Walt Disney,623357910,824617936,78311295


---

In [None]:
movies.sort('Average Ticket Price', descending=True).show(5)

Year,Movie,Categorization,Creative Type,Production Method,Genre,MPAA Rating,Distributor,Total for Year ($),Total in 2022 dollars,Tickets Sold,Average Ticket Price
2024,Wonka,Kids Fiction,Live Action,Based on Fiction Book/Short Story,Musical,PG,Warner Bros.,68138426,68138426,6470885,10.53
2022,Top Gun: Maverick,Contemporary Fiction,Live Action,Original Screenplay,Action,PG-13,Paramount Pictures,718732821,718732816,68255728,10.53
2023,Barbie,Contemporary Fiction,Live Action,Based on Toy,Comedy,PG-13,Warner Bros.,636225983,636225980,60420321,10.53
2021,Spider-Man: No Way Home,Super Hero,Animation/Live Action,Based on Comic/Graphic Novel,Action,PG-13,Sony Pictures,572984769,580147075,55094689,10.4
2020,Bad Boys For Life,Contemporary Fiction,Live Action,Original Screenplay,Action,R,Sony Pictures,204417855,228748139,21723470,9.41


## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and derived from the <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a> offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>