# Order Based Window Functions Lab

### Introduction

In this lesson, we'll practice working with order-based window functions.  Let's get started.

### Loading our Data

Once again, we'll be working with our IMDB movie data.  Let's load it up.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/data-eng-10-21/window-functions/main/data"
movies_df = pd.read_csv(f'{url}/movies.csv')
names_df = pd.read_csv(f'{url}/names.csv')
ratings_df = pd.read_csv(f'{url}/ratings.csv')
title_principals_df = pd.read_csv(f'{url}/title_principals.csv')

In [3]:
movies_df = movies_df.drop(columns = ['worlwide_gross_income'])

And then we can take a look at our movies dataframe.

In [None]:
movies_df[:2]

From here, we can create our database, and populate our tables.

In [3]:
import sqlite3
conn = sqlite3.connect('imdb.db')

In [35]:
movies_df.to_sql('movies', conn,
                 index = False, 
                 if_exists = 'replace')
names_df.to_sql('names', conn,
                index = False,
                if_exists = 'replace')
ratings_df.to_sql('ratings', conn, 
                  index = False,
                  if_exists = 'replace')
title_principals_df.to_sql('movie_roles', conn, 
                           index = False,
                           if_exists = 'replace')

### Finding top earning movies

Let's start by finding the 10 top-earning movies of 2015.  Display the title, genre, and income, and add a column that displays each movie's rank within its genre.

> The results look like the following.

<img src="./rank_genre.png" width="80%">

In [42]:
query = """

"""
pd.read_sql(query, conn)

Unnamed: 0,title,genre,income,rank_by_genre
0,Star Wars - Il risveglio della Forza,"Action, Adventure, Sci-Fi",2068224000.0,1
1,Jurassic World,"Action, Adventure, Sci-Fi",1670401000.0,2
2,Fast & Furious 7,"Action, Adventure, Thriller",1515048000.0,1
3,Avengers: Age of Ultron,"Action, Adventure, Sci-Fi",1402809000.0,3
4,Minions,"Animation, Adventure, Comedy",1159443000.0,1
5,Spectre,"Action, Adventure, Thriller",880674600.0,2
6,Inside Out,"Animation, Adventure, Comedy",858071400.0,2
7,Temper,"Action, Drama",740300000.0,1
8,Mission: Impossible - Rogue Nation,"Action, Adventure, Thriller",682716600.0,3
9,Gopala Gopala,"Comedy, Drama, Fantasy",660100000.0,1
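
As a hint for the ranking exercise above, here is a minimal sketch of `RANK()` with `PARTITION BY`, run against a toy in-memory table rather than the lab's `movies` table. The table name `toy_movies` and its values are made up for illustration, and the sketch assumes the SQLite library bundled with Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Toy data (hypothetical values), not the IMDB tables used in this lab.
# Assumes SQLite >= 3.25, which introduced window functions.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE toy_movies (title TEXT, genre TEXT, income REAL)")
demo_conn.executemany(
    "INSERT INTO toy_movies VALUES (?, ?, ?)",
    [
        ("A", "Action", 300.0),
        ("B", "Action", 500.0),
        ("C", "Comedy", 400.0),
        ("D", "Comedy", 100.0),
    ],
)

rank_query = """
SELECT title, genre, income,
       -- restart the ranking for every genre, highest income first
       RANK() OVER (PARTITION BY genre ORDER BY income DESC) AS rank_by_genre
FROM toy_movies
ORDER BY income DESC
"""
rank_rows = demo_conn.execute(rank_query).fetchall()
for row in rank_rows:
    print(row)
```

The outer `ORDER BY` sorts the whole result, while the `ORDER BY` inside `OVER` only controls how rows are ranked within each genre, which is why ranks can appear interleaved in the output.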


Now let's get an overview of how movies performed.  For each genre, rank its years by total income, and order the entire result by total income, from highest to lowest.

In [60]:
query = """


"""

pd.read_sql(query, conn)

Unnamed: 0,year,genre,total_income,ranked_income
0,2018,"Action, Adventure, Sci-Fi",8195401000.0,1
1,2015,"Action, Adventure, Sci-Fi",7595056000.0,2
2,2017,"Action, Adventure, Fantasy",5002522000.0,1
3,2014,"Action, Adventure, Sci-Fi",4925353000.0,3
4,2016,"Animation, Adventure, Comedy",4895481000.0,1
5,2016,"Action, Adventure, Sci-Fi",4710719000.0,4
6,2013,"Animation, Adventure, Comedy",4322058000.0,2
7,2013,"Action, Adventure, Sci-Fi",4185565000.0,5
8,2012,"Action, Adventure, Sci-Fi",3756771000.0,6
9,2011,"Animation, Adventure, Comedy",3616207000.0,3
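
One possible shape for this kind of query is to aggregate first and rank second. The sketch below does that on a toy table: a common table expression sums income per year and genre, and the outer query ranks those totals within each genre. The names `toy_movies` and `yearly` and all values are hypothetical, and window functions again assume SQLite 3.25 or newer.

```python
import sqlite3

# Toy data (hypothetical values), not the IMDB tables used in this lab.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE toy_movies (year INTEGER, genre TEXT, income REAL)")
demo_conn.executemany(
    "INSERT INTO toy_movies VALUES (?, ?, ?)",
    [
        (2017, "Action", 100.0),
        (2017, "Action", 50.0),
        (2018, "Action", 300.0),
        (2017, "Comedy", 200.0),
        (2018, "Comedy", 120.0),
    ],
)

yearly_query = """
WITH yearly AS (
    -- step 1: one row per (year, genre) with the summed income
    SELECT year, genre, SUM(income) AS total_income
    FROM toy_movies
    GROUP BY year, genre
)
SELECT year, genre, total_income,
       -- step 2: rank each genre's years by their total income
       RANK() OVER (PARTITION BY genre ORDER BY total_income DESC) AS ranked_income
FROM yearly
ORDER BY total_income DESC
"""
yearly_rows = demo_conn.execute(yearly_query).fetchall()
for row in yearly_rows:
    print(row)
```

Splitting the work into a CTE keeps the aggregation and the window function in separate steps, which tends to be easier to read and debug than combining them in one `SELECT`.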


So it looks like `Action, Adventure, Sci-Fi` and `Animation, Adventure, Comedy` dominate our list.  Now let's calculate a running total of earnings for `Action, Adventure, Sci-Fi` movies in 2018, limited to the top 10 movies and ordered from highest earnings to lowest.

In [65]:
query = """


"""

pd.read_sql(query, conn)

Unnamed: 0,title,year,genre,running_income
0,Avengers: Infinity War,2018,"Action, Adventure, Sci-Fi",2048360000.0
1,Black Panther,2018,"Action, Adventure, Sci-Fi",3395641000.0
2,Jurassic World - Il regno distrutto,2018,"Action, Adventure, Sci-Fi",4727599000.0
3,Venom,2018,"Action, Adventure, Sci-Fi",5583684000.0
4,Ready Player One,2018,"Action, Adventure, Sci-Fi",6166578000.0
5,Bumblebee,2018,"Action, Adventure, Sci-Fi",6634567000.0
6,Rampage: Furia animale,2018,"Action, Adventure, Sci-Fi",7062595000.0
7,Solo: A Star Wars Story,2018,"Action, Adventure, Sci-Fi",7455520000.0
8,Pacific Rim: La rivolta,2018,"Action, Adventure, Sci-Fi",7746450000.0
9,Maze Runner - La rivelazione,2018,"Action, Adventure, Sci-Fi",8034626000.0


> <img src="./action-sci-fi.png" width="60%">
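
A running total like the one above can be sketched with `SUM()` used as a window function. The example below uses a toy table with made-up values, and it spells out the window frame explicitly so each row adds only itself to the rows before it. It assumes SQLite 3.25 or newer.

```python
import sqlite3

# Toy data (hypothetical values), not the IMDB tables used in this lab.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE toy_movies (title TEXT, income REAL)")
demo_conn.executemany(
    "INSERT INTO toy_movies VALUES (?, ?)",
    [("A", 500.0), ("B", 300.0), ("C", 200.0), ("D", 100.0)],
)

running_query = """
SELECT title, income,
       SUM(income) OVER (
           ORDER BY income DESC
           -- accumulate from the first row up to the current row
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_income
FROM toy_movies
ORDER BY income DESC
LIMIT 3
"""
running_rows = demo_conn.execute(running_query).fetchall()
for row in running_rows:
    print(row)
```

`LIMIT` is applied after the window function is computed, so the running total of the rows that remain is unaffected by the rows that are cut off.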

### Window Functions with Join

Now let's move on to using window functions to answer questions about various actors.

To start, let's take a look at the `names` table, which contains our actors, writers and directors.

In [6]:
query = """SELECT * FROM names LIMIT 1"""
pd.read_sql(query, conn)

Unnamed: 0,imdb_name_id,name,height,place_of_birth,children,birth_year
0,nm0000001,Fred Astaire,177.0,"Omaha, Nebraska, USA",2,1899


Next is the `movie_roles` table, which joins our individuals to the corresponding movies.

In [66]:
import pandas as pd
query = """SELECT * FROM movie_roles LIMIT 1"""
pd.read_sql(query, conn)

Unnamed: 0,imdb_title_id,imdb_name_id,category
0,tt0000009,nm0063086,actress


Let's start by finding the directors whose movies earned the highest average income between 2000 and 2010, and who had more than 3 `movie_roles` in that time.  Limit the results to the top ten.

> The answer will look like the following.

<img src="./movie_directors.png" width="40%">

In [80]:
query = """


"""
pd.read_sql(query, conn)

Unnamed: 0,name,avg_income,num_roles
0,Peter Jackson,727483500.0,5
1,Andrew Adamson,645302300.0,4
2,Carlos Saldanha,549887500.0,4
3,Sam Raimi,528879000.0,5
4,Gore Verbinski,516293400.0,6
5,Michael Bay,486304600.0,5
6,Roland Emmerich,457234000.0,4
7,Chris Columbus,431109000.0,5
8,Christopher Nolan,418716800.0,6
9,Steven Spielberg,384431900.0,7
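
The join-and-filter pattern behind a query like this can be sketched on toy tables. In the example below, the `toy_` table names, the people, and the values are all hypothetical; the `imdb_title_id`, `imdb_name_id`, and `category` columns mirror the ones shown in the lab's tables, and the `income` column on the movies side is an assumption. `HAVING` filters on the per-director count after grouping, which `WHERE` cannot do.

```python
import sqlite3

# Toy tables (hypothetical names and values), loosely mirroring the lab's schema.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE toy_movies (imdb_title_id TEXT, income REAL);
CREATE TABLE toy_names (imdb_name_id TEXT, name TEXT);
CREATE TABLE toy_roles (imdb_title_id TEXT, imdb_name_id TEXT, category TEXT);
INSERT INTO toy_movies VALUES ('t1', 100.0), ('t2', 300.0), ('t3', 500.0), ('t4', 50.0);
INSERT INTO toy_names VALUES ('n1', 'Dir X'), ('n2', 'Dir Y');
INSERT INTO toy_roles VALUES
    ('t1', 'n1', 'director'),
    ('t2', 'n1', 'director'),
    ('t3', 'n2', 'director'),
    ('t4', 'n2', 'actor');
""")

director_query = """
SELECT n.name,
       AVG(m.income) AS avg_income,
       COUNT(*) AS num_roles
FROM toy_names n
JOIN toy_roles r ON r.imdb_name_id = n.imdb_name_id
JOIN toy_movies m ON m.imdb_title_id = r.imdb_title_id
WHERE r.category = 'director'      -- row-level filter, applied before grouping
GROUP BY n.imdb_name_id, n.name
HAVING COUNT(*) > 1                -- group-level filter, applied after grouping
ORDER BY avg_income DESC
"""
director_rows = demo_conn.execute(director_query).fetchall()
for row in director_rows:
    print(row)
```

Here `Dir Y` is dropped because only one of that person's roles is a directing role, so the group fails the `HAVING` condition.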


Now let's take a look at the revenue earned by movies from `Peter Jackson`, `Michael Bay`, `Christopher Nolan` or `Steven Spielberg`.  Organize the entire set of movies from most to least revenue, and add a column that ranks each movie's revenue, from most to least, among that director's movies.

> <img src="./directors-rank.png" width="80%">

In [90]:
query = """


"""
pd.read_sql(query, conn)

Unnamed: 0,title,name,income,rank_by_director
0,Il Signore degli Anelli - Il ritorno del re,Peter Jackson,1142271000.0,1
1,Transformers 3,Michael Bay,1123794000.0,1
2,Transformers 4 - L'era dell'estinzione,Michael Bay,1104054000.0,2
3,Il cavaliere oscuro - Il ritorno,Christopher Nolan,1081133000.0,1
4,Lo Hobbit - Un viaggio inaspettato,Peter Jackson,1017004000.0,2
5,Il cavaliere oscuro,Christopher Nolan,1005455000.0,2
6,The Hobbit: The Desolation of Smaug,Peter Jackson,958366900.0,3
7,Lo Hobbit - La battaglia delle cinque armate,Peter Jackson,956019800.0,4
8,Il Signore degli Anelli - Le due torri,Peter Jackson,951227400.0,5
9,Il Signore degli Anelli - La compagnia dell'An...,Peter Jackson,887934300.0,6


Now let's move to actors.  Find the running average revenue of Tom Cruise movies and of Brad Pitt movies released between 1992 and 1995 (inclusive), ordered by actor and then by date published.

> The result is below.

<img src="./pitt_and_cruise.png" width="80%">

In [32]:
query = """


"""
pd.read_sql(query, conn)

Unnamed: 0,name,title,income,year,avg_running_income
0,Brad Pitt,In mezzo scorre il fiume,43440294.0,1992,43440290.0
1,Brad Pitt,Fuga dal mondo dei sogni,14110589.0,1992,28775440.0
2,Brad Pitt,Kalifornia,2395231.0,1993,19982040.0
3,Brad Pitt,A letto con l'amico,3134381.0,1994,15770120.0
4,Brad Pitt,Intervista col vampiro,223664608.0,1994,57349020.0
5,Brad Pitt,Vento di passioni,160638883.0,1994,74564000.0
6,Brad Pitt,Seven,327333559.0,1995,110673900.0
7,Brad Pitt,L'esercito delle 12 scimmie,168839459.0,1995,117944600.0
8,Tom Cruise,Cuori ribelli,137783840.0,1992,137783800.0
9,Tom Cruise,Codice d'onore,243240178.0,1992,190512000.0
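
A running average works like a running total, with `AVG()` in place of `SUM()` and a `PARTITION BY` so the average restarts for each person. The sketch below uses a single toy table with made-up actors and values instead of the lab's joined tables, orders by `year` rather than a full publication date, and assumes SQLite 3.25 or newer.

```python
import sqlite3

# Toy data (hypothetical names and values), not the IMDB tables used in this lab.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE toy_credits (name TEXT, title TEXT, year INTEGER, income REAL)"
)
demo_conn.executemany(
    "INSERT INTO toy_credits VALUES (?, ?, ?, ?)",
    [
        ("Actor A", "M1", 1992, 100.0),
        ("Actor A", "M2", 1993, 300.0),
        ("Actor A", "M3", 1994, 200.0),
        ("Actor B", "M4", 1992, 50.0),
        ("Actor B", "M5", 1995, 150.0),
    ],
)

avg_query = """
SELECT name, title, income, year,
       AVG(income) OVER (
           PARTITION BY name          -- restart the average for each actor
           ORDER BY year
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS avg_running_income
FROM toy_credits
WHERE year BETWEEN 1992 AND 1995      -- BETWEEN is inclusive on both ends
ORDER BY name, year
"""
avg_rows = demo_conn.execute(avg_query).fetchall()
for row in avg_rows:
    print(row)
```

When two movies share the same value in the window's `ORDER BY` column, their relative order is not guaranteed, so ordering by a full date (or adding a tiebreaker column) gives more stable results on real data.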


### Summary

In this lesson, we saw how we can use window functions to calculate running totals and running averages, and to rank rows within groups.

Take a look at other rank based functions [here](https://mode.com/sql-tutorial/sql-window-functions/).