# Lab | SQL self and cross joins

In this lab, you will be using the [Sakila](https://dev.mysql.com/doc/sakila/en/) database of movie rentals.

### Instructions

1. Get all pairs of actors who worked together.
2. Get all pairs of customers who have rented the same film more than 3 times.
3. Get all possible pairs of actors and films.

![DB schema](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/database-sakila-schema.png)

### **SOLUTIONS**

In [1]:
import pymysql
from sqlalchemy import create_engine
import pandas as pd
import getpass  # To get the password without showing the input

In [3]:
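from urllib.parse import quote_plus  # escapes characters such as '@' or '/' in the password

password = getpass.getpass()
# URL-encode the password so special characters do not break the connection URL
connection_string = 'mysql+pymysql://root:' + quote_plus(password) + '@localhost/sakila'
engine = create_engine(connection_string)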
%load_ext sql
%sql {connection_string}

'Connected: root@sakila'

In [11]:
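%%sql
-- 1 Get all pairs of actors that worked together
-- Self join: film_actor is joined to itself on film_id, so two different
-- actors from the same film land on one row.
-- Note: `!=` returns every pair twice, as (A, B) and (B, A).
-- Using `fa1.actor_id < fa2.actor_id` instead would keep each pair once.

select fa1.film_id, concat(a1.first_name, ' ', a1.last_name) actor_1,
                    concat(a2.first_name, ' ', a2.last_name) actor_2
from sakila.actor a1
inner join film_actor fa1 on a1.actor_id = fa1.actor_id
inner join film_actor fa2 on (fa1.film_id = fa2.film_id) and (fa1.actor_id != fa2.actor_id)
inner join actor a2 on a2.actor_id = fa2.actor_id
order by fa1.film_id
limit 20;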

   mysql+pymysql://root:***@localhost/bank
 * mysql+pymysql://root:***@localhost/sakila
20 rows affected.


film_id,actor_1,actor_2
1,PENELOPE GUINESS,CHRISTIAN GABLE
1,PENELOPE GUINESS,LUCILLE TRACY
1,PENELOPE GUINESS,SANDRA PECK
1,PENELOPE GUINESS,JOHNNY CAGE
1,PENELOPE GUINESS,MENA TEMPLE
1,PENELOPE GUINESS,WARREN NOLTE
1,PENELOPE GUINESS,OPRAH KILMER
1,PENELOPE GUINESS,ROCK DUKAKIS
1,PENELOPE GUINESS,MARY KEITEL
1,CHRISTIAN GABLE,PENELOPE GUINESS
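
The last row above mirrors the first one (PENELOPE GUINESS with CHRISTIAN GABLE, then the reverse), because `!=` keeps both orderings of every pair. The following is a minimal sketch of the difference between `!=` and `<` in a self join. It uses Python's built-in `sqlite3` with a tiny made-up `film_actor` table rather than the real Sakila data, and the helper name `actor_pairs` is hypothetical.

```python
import sqlite3

# Toy stand-in for Sakila's film_actor table (made-up rows, not real Sakila data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER)")
conn.executemany(
    "INSERT INTO film_actor VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (1, 20), (3, 20)],
)


def actor_pairs(op):
    """Self-join film_actor on film_id, comparing the two actor ids with `op`."""
    sql = f"""
        SELECT fa1.film_id, fa1.actor_id, fa2.actor_id
        FROM film_actor fa1
        JOIN film_actor fa2
          ON fa1.film_id = fa2.film_id
         AND fa1.actor_id {op} fa2.actor_id
        ORDER BY fa1.film_id, fa1.actor_id, fa2.actor_id
    """
    return conn.execute(sql).fetchall()


mirrored = actor_pairs("!=")  # each pair appears as (a, b) and as (b, a)
unique = actor_pairs("<")     # each pair appears once, smaller actor_id first

print(len(mirrored), len(unique))  # 8 4
print(unique)
```

With `<`, the row count halves and every pair is listed with the smaller `actor_id` first. The same condition could be swapped into the MySQL query above if unique pairs are wanted.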


In [13]:
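%%sql
-- 2 Get all pairs of customers that have rented the same film more than 3 times.
-- Self join on film: rentals of customer 1 are matched to rentals of
-- customer 2 through inventory copies of the same film.
-- Note: count(*) counts matching rental combinations, so repeat rentals of
-- one film inflate it; count(distinct f1.film_id) would count shared films.
-- Note: `<>` returns each pair twice, as (A, B) and (B, A).

select c1.customer_id, c2.customer_id, count(*) as num_films
from sakila.customer c1
inner join rental r1 on r1.customer_id = c1.customer_id
inner join inventory i1 on r1.inventory_id = i1.inventory_id
inner join film f1 on i1.film_id = f1.film_id
inner join inventory i2 on i2.film_id = f1.film_id
inner join rental r2 on r2.inventory_id = i2.inventory_id
inner join customer c2 on r2.customer_id = c2.customer_id
where c1.customer_id <> c2.customer_id
group by c1.customer_id, c2.customer_id
having count(*) > 3
order by num_films desc
limit 20;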

   mysql+pymysql://root:***@localhost/bank
 * mysql+pymysql://root:***@localhost/sakila
20 rows affected.


customer_id,customer_id_1,num_films
111,24,8
24,111,8
181,7,8
7,181,8
237,376,8
376,237,8
596,181,8
181,596,8
317,201,8
201,317,8
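
The `num_films` column above comes from `count(*)`, which counts joined rental rows rather than distinct films. As a rough illustration of how the two can differ, here is a small sketch using Python's built-in `sqlite3`. The flattened `rentals` table and its rows are hypothetical stand-ins for the `rental` and `inventory` joins, not Sakila data.

```python
import sqlite3

# Hypothetical flattened table: one row per rental, already resolved to a film.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rentals (customer_id INTEGER, film_id INTEGER)")
conn.executemany(
    "INSERT INTO rentals VALUES (?, ?)",
    [
        (1, 100), (1, 100),  # customer 1 rented film 100 twice
        (2, 100), (2, 100),  # customer 2 rented film 100 twice
        (1, 200), (2, 200),  # both rented film 200 once
    ],
)

rows = conn.execute(
    """
    SELECT r1.customer_id, r2.customer_id,
           COUNT(*) AS rental_combinations,           -- joined rows
           COUNT(DISTINCT r1.film_id) AS shared_films  -- distinct films in common
    FROM rentals r1
    JOIN rentals r2
      ON r1.film_id = r2.film_id
     AND r1.customer_id < r2.customer_id  -- keep each pair once
    GROUP BY r1.customer_id, r2.customer_id
    """
).fetchall()

print(rows)  # [(1, 2, 5, 2)]
```

Here the pair shares only 2 films, yet `COUNT(*)` reports 5 because the two repeat rentals of film 100 multiply to 4 combinations. Which of the two counts the exercise intends depends on how "rented the same film more than 3 times" is read.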


In [14]:
%%sql
-- 3 Get all possible pairs of actors and films.

select concat(a.first_name,' ', a.last_name) as actor_name, f.title
from sakila.actor a
cross join sakila.film as f
limit 10;

   mysql+pymysql://root:***@localhost/bank
 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


actor_name,title
THORA TEMPLE,ACADEMY DINOSAUR
JULIA FAWCETT,ACADEMY DINOSAUR
MARY KEITEL,ACADEMY DINOSAUR
REESE WEST,ACADEMY DINOSAUR
BELA WALKEN,ACADEMY DINOSAUR
JAYNE SILVERSTONE,ACADEMY DINOSAUR
MERYL ALLEN,ACADEMY DINOSAUR
BURT TEMPLE,ACADEMY DINOSAUR
JOHN SUVARI,ACADEMY DINOSAUR
GREGORY GOODING,ACADEMY DINOSAUR
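
A cross join has no join condition, so it returns every combination of rows from the two tables, and the row count is the product of the table sizes. The standard Sakila data has 200 actors and 1,000 films, so without the `limit` the query above would return 200,000 rows. The sketch below checks the product rule on toy tables with Python's built-in `sqlite3`; the table contents are made up for illustration.

```python
import sqlite3

# Toy actor and film tables (made-up rows, not the Sakila data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (name TEXT)")
conn.execute("CREATE TABLE film (title TEXT)")
conn.executemany("INSERT INTO actor VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO film VALUES (?)", [("F1",), ("F2",)])

# CROSS JOIN pairs every actor with every film.
pairs = conn.execute(
    """
    SELECT a.name, f.title
    FROM actor a
    CROSS JOIN film f
    ORDER BY a.name, f.title
    """
).fetchall()

n_actors = conn.execute("SELECT COUNT(*) FROM actor").fetchone()[0]
n_films = conn.execute("SELECT COUNT(*) FROM film").fetchone()[0]

print(len(pairs), n_actors * n_films)  # 6 6
```

Because the result grows multiplicatively, keeping a `limit` on exploratory cross joins, as the query above does, avoids pulling the full product into the notebook.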
