# Queries about Movies

For each question in this notebook, write your answer twice: once using relational algebra operators and once using SQL.

This notebook will be autograded. To check whether an answer is correct, run all of the cells in order: each `assert` cell tests `_`, the output of the most recently executed cell. Running the cells out of order will lead to unpredictable results.



In [38]:
import warnings
warnings.filterwarnings('ignore')

from reframe import Relation
from sols import *

In [39]:
cast = Relation('/home/faculty/millbr02/pub/cast.csv',sep=',')
title = Relation('/home/faculty/millbr02/pub/titles.csv',sep=',')
release_date = Relation('/home/faculty/millbr02/pub/release_dates.csv',sep=',')

In [40]:
cast.query("type == 'actor' & year == 2015").njoin(release_date.query("country == 'Norway'")).project(['title','name']).sort(['name'])

Unnamed: 0,title,name
1511,Steve Jobs,$hutter Boy
378,Straight Outta Compton,$hutter Boy
150,Spy,50 Cent
0,Southpaw,50 Cent
6599,Joy,?dgar (II) Ram?rez
5988,Point Break,?dgar (II) Ram?rez
866,The Last Witch Hunter,?lafur Darri ?lafsson
2346,The Gunman,?scar Foronda
7001,Stup,?ystein Martinsen
362,Straight Outta Compton,A. Russell Andrews


In [41]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [42]:
%config SqlMagic.autopandas = True

In [43]:
%sql postgresql://vannjo02:@localhost/movies

'Connected: vannjo02@movies'

In [44]:
%%sql

select title, name
from moviecast natural join release_date
where type = 'actor' and year = 2015 and country = 'Norway'
order by name
limit 10

10 rows affected.


Unnamed: 0,title,name
0,Spy,50 Cent
1,Southpaw,50 Cent
2,Pitch Perfect 2,Aakomon Jones
3,Pan,Aaran Mitra
4,Insurgent,Aaron Brewstar
5,Max,Aaron Dozzi
6,Ted 2,Aaron F. Randell
7,Avengers: Age of Ultron,Aaron Himelstein
8,The Night Before (II),Aaron (II) Hill
9,Star Wars: Episode VII - The Force Awakens,Aaron (IV) Kennedy
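
The pairing above of a relational algebra pipeline with a SQL query can be tried on a toy dataset. The sketch below is not part of the assignment. It assumes Python's built-in `sqlite3` module in place of the PostgreSQL `movies` database, and the two miniature tables and their rows are invented for illustration. It shows selection (`WHERE`), a natural join, and projection (the `SELECT` list) in a single query.

```python
import sqlite3

# Hypothetical miniature tables; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE moviecast (title TEXT, year INTEGER, name TEXT, type TEXT);
    CREATE TABLE release_date (title TEXT, year INTEGER, country TEXT);
    INSERT INTO moviecast VALUES
        ('Movie A', 2015, 'Actor One', 'actor'),
        ('Movie A', 2015, 'Actress Two', 'actress'),
        ('Movie B', 2014, 'Actor Three', 'actor');
    INSERT INTO release_date VALUES
        ('Movie A', 2015, 'Norway'),
        ('Movie A', 2015, 'USA'),
        ('Movie B', 2014, 'Norway');
""")

# NATURAL JOIN matches rows on every column name the two tables share
# (here: title and year). WHERE is the selection step and the SELECT list
# is the projection step.
rows = conn.execute("""
    SELECT title, name
    FROM moviecast NATURAL JOIN release_date
    WHERE type = 'actor' AND year = 2015 AND country = 'Norway'
    ORDER BY name
""").fetchall()

print(rows)
```

Only the 2015 actor row whose movie has a Norwegian release survives all three conditions, so the result has a single row.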


# Questions Start Here

### 1. Display the title and name for 2007 movies in which the lead actor's name and character name are the same.

In [45]:
cast.query("name == character & year == 2007 & n == 1").project(['title','name'])

Unnamed: 0,title,name
108240,Maharadhi,Balakrishna
238307,Good Luck with That,Igor Breakenback
301632,My Name Is Bruce,Bruce Campbell
466377,"Yi daegeun, Yi daikeun",Lee Dae-Geun
1591633,The Last Lecture by Randy Pausch,Randy Pausch
1756349,The Minis,Dennis Rodman


In [46]:

assert _.equals(mov1t1(cast))



In [47]:
%%sql
select title, name from moviecast where (name = character and n = 1) and year = 2007;

6 rows affected.


Unnamed: 0,title,name
0,Maharadhi,Balakrishna
1,Good Luck with That,Igor Breakenback
2,My Name Is Bruce,Bruce Campbell
3,"Yi daegeun, Yi daikeun",Lee Dae-Geun
4,The Last Lecture by Randy Pausch,Randy Pausch
5,The Minis,Dennis Rodman


In [48]:

assert _.equals(mov1t2())

### 2. What are the ten most common movie character names of all time from most common to least?

In [52]:
cast.groupby(['character']).count('year').sort(['count_year'],ascending=False).project(['character']).head(10)

Unnamed: 0,character
561267,Himself
292790,Dancer
418072,Extra
1123701,Reporter
336853,Doctor
1065528,Policeman
1265204,Student
987567,Nurse
114639,Bartender
1026794,Party Guest


In [53]:
assert _.equals(mov2t1(cast))

In [55]:
%%sql

select character, count(year) from moviecast group by character order by count desc limit 10;

10 rows affected.


Unnamed: 0,character,count
0,Himself,18883
1,Dancer,11266
2,Extra,9291
3,Reporter,7708
4,Doctor,6941
5,Policeman,6558
6,Student,6529
7,Nurse,6252
8,Bartender,6241
9,Party Guest,6130


In [56]:
assert _.equals(mov2t2())
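
The group, count, sort, and limit steps used for this question can be checked by hand on a small list. The following is a minimal sketch, assuming Python's built-in `sqlite3` module instead of the PostgreSQL database; the table name `roles`, the column name `role`, and the sample values are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical character names, invented for illustration.
characters = ["Doctor", "Himself", "Dancer", "Himself", "Doctor", "Himself"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (role TEXT)")
conn.executemany("INSERT INTO roles VALUES (?)", [(c,) for c in characters])

# Group by name, count each group, sort by the count (largest first),
# and keep only the top two groups.
top_sql = conn.execute("""
    SELECT role, COUNT(*) AS appearances
    FROM roles
    GROUP BY role
    ORDER BY appearances DESC
    LIMIT 2
""").fetchall()

# The same computation in plain Python.
top_py = Counter(characters).most_common(2)

print(top_sql)
print(top_py)
```

The sample data has no ties among the top two counts, so both approaches return the same rows in the same order. When counts tie, neither approach promises a particular order among the tied names unless a second sort key is added.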

### 3. List the name of each actress that has played the role of Stella more than one time.


In [57]:
cast.query("character == 'Stella' & type == 'actress'").groupby(['name']).count('title').query("count_title > 1").project(['name'])

Unnamed: 0,name
2,Abi Burgess
20,Amy Gross
121,Elise Vargas
324,Martina Tremante
327,Mary Wickes
366,Ophelia Shtruhl
383,Perla Liberatori
427,Sandra Loncaric
440,Sina Tkotsch
473,Suzanne Gonzales


In [58]:
assert _.equals(mov3t1(cast))

In [60]:
%%sql

select name, count(title) from moviecast where character = 'Stella' and type = 'actress' group by name having count(title) > 1;

11 rows affected.


Unnamed: 0,name,count
0,Suzanne Gonzales,2
1,Ophelia Shtruhl,2
2,Perla Liberatori,3
3,Sina Tkotsch,2
4,Martina Tremante,2
5,Sandra Loncaric,2
6,Mary Wickes,3
7,Zoey Vargas,2
8,Elise Vargas,2
9,Abi Burgess,2


In [61]:
assert _.equals(mov3t2())
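
The difference between filtering rows (`where`) and filtering groups (`having`) is the core of this question. The sketch below illustrates it under stated assumptions: it uses Python's built-in `sqlite3` module rather than PostgreSQL, and the table `roles`, its columns, and its rows are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical (name, role) pairs, invented for illustration.
roles = [
    ("Actress A", "Stella"),
    ("Actress A", "Stella"),
    ("Actress B", "Stella"),
    ("Actress C", "Stella"),
    ("Actress C", "Stella"),
    ("Actress C", "Stella"),
    ("Actress B", "Blanche"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (name TEXT, role TEXT)")
conn.executemany("INSERT INTO roles VALUES (?, ?)", roles)

# WHERE removes rows before grouping; HAVING removes groups after counting.
sql_result = conn.execute("""
    SELECT name, COUNT(*) AS times
    FROM roles
    WHERE role = 'Stella'
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
""").fetchall()

# The same steps in plain Python: select rows, count per name, filter counts.
counts = Counter(name for name, role in roles if role == "Stella")
py_result = sorted((name, n) for name, n in counts.items() if n > 1)

print(sql_result)
print(py_result)
```

In the relational algebra version above, the second `query` call plays the part of `having`, because it runs after `groupby` and `count`.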

### 4. Display all information about the entire cast, in "n"-order, of the 2007 version of "Sleuth".

In [62]:
cast.query("title == 'Sleuth' & year == 2007").sort(['n'])

Unnamed: 0,title,year,name,type,character,n
293979,Sleuth,2007,Michael Caine,actor,Andrew,1.0
1168226,Sleuth,2007,Jude Law,actor,Milo,2.0
1631579,Sleuth,2007,Harold Pinter,actor,Man on T.V.,3.0
233198,Sleuth,2007,Kenneth Branagh,actor,Other Man on T.V.,
336873,Sleuth,2007,Alec (II) Cawthorne,actor,Inspector Doppler,
2452309,Sleuth,2007,Eve (II) Channing,actress,Marguerite Wyke,
3015769,Sleuth,2007,Carmel O'Sullivan,actress,Maggie,


In [63]:
assert _.equals(mov4t1(cast))

In [64]:
%%sql

select * from moviecast where (title = 'Sleuth' and year = 2007) order by n;

7 rows affected.


Unnamed: 0,index,title,year,name,type,character,n
0,293979,Sleuth,2007,Michael Caine,actor,Andrew,1.0
1,1168226,Sleuth,2007,Jude Law,actor,Milo,2.0
2,1631579,Sleuth,2007,Harold Pinter,actor,Man on T.V.,3.0
3,336873,Sleuth,2007,Alec (II) Cawthorne,actor,Inspector Doppler,
4,233198,Sleuth,2007,Kenneth Branagh,actor,Other Man on T.V.,
5,2452309,Sleuth,2007,Eve (II) Channing,actress,Marguerite Wyke,
6,3015769,Sleuth,2007,Carmel O'Sullivan,actress,Maggie,


In [65]:
assert _.equals(mov4t2())
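
In the two outputs above, the rows with a billing number agree, but the rows with a missing `n` come back in a different relative order (compare Kenneth Branagh and Alec (II) Cawthorne). Sorting on `n` alone does not decide the order of rows that share a missing value. The sketch below shows one way to make such an ordering deterministic; it is an illustration only, not a change to the graded answer. It assumes Python's built-in `sqlite3` module, and the table and rows are hypothetical. SQLite sorts NULLs first by default, whereas PostgreSQL sorts them last in ascending order, so the sketch spells out the NULL placement explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (name TEXT, n REAL)")

# Hypothetical rows; None is stored as NULL (a missing billing position).
conn.executemany(
    "INSERT INTO roles VALUES (?, ?)",
    [("Name D", None), ("Name B", 2.0), ("Name C", None), ("Name A", 1.0)],
)

# "n IS NULL" is 0 for present values and 1 for NULL, so NULL rows go last.
# "n" orders the rows that have a value, and "name" breaks any remaining
# ties, including ties among the NULL rows.
rows = conn.execute(
    "SELECT name, n FROM roles ORDER BY n IS NULL, n, name"
).fetchall()

print(rows)
```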

### 5. Display the number of roles available to actresses for each year from 2000 through 2015, ordered by year.

In [66]:
cast.query("type == 'actress' & year >= 2000 & year <= 2015").groupby(['year']).count('name').project(['year','count_name'])

Unnamed: 0,year,count_name
0,2000,17227
1,2001,17610
2,2002,18633
3,2003,19236
4,2004,22013
5,2005,25682
6,2006,29294
7,2007,31190
8,2008,34870
9,2009,41294


In [67]:
assert _.equals(mov5t1(cast))

In [69]:
%%sql

select year, count(name) from moviecast where (type = 'actress' and year >= 2000 and year <= 2015) group by year order by year;

16 rows affected.


Unnamed: 0,year,count
0,2000,17227
1,2001,17611
2,2002,18633
3,2003,19236
4,2004,22013
5,2005,25682
6,2006,29294
7,2007,31190
8,2008,34870
9,2009,41294


In [70]:
assert _.equals(mov5t2())