In [1]:
import pymysql
from sqlalchemy import create_engine
import pandas as pd
import getpass  # To get the password without showing the input

In [2]:
password = getpass.getpass()
connection_string = 'mysql+pymysql://root:' + password + '@localhost/bank'
engine = create_engine(connection_string)
%load_ext sql
%sql {connection_string}

 ······


'Connected: root@bank'

### **SELF JOINS**

As the name suggests, a `self join` joins a table to itself. It is useful for querying hierarchical data or for comparing rows within the same table.

In this example we look for accounts that belong to the same district.
In the query below, note the `<>` operator (*not equal to*), which stops an account from being paired with itself. The join is also an example of _compound_ conditions: two conditions combined with `and` in the `on` clause.

In [11]:
%%sql
select a1.account_id, a2.account_id, a1.district_id
from bank.account a1
join bank.account a2
on a1.account_id <> a2.account_id
and a1.district_id = a2.district_id
order by a1.district_id, a1.account_id
limit 3;

 * mysql+pymysql://root:***@localhost/bank
3 rows affected.


account_id,account_id_1,district_id
2,22,1
2,36,1
2,17,1
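
With `<>`, every pair of accounts is returned twice, once in each order. As a minimal sketch (not part of the original notebook), the cell below rebuilds the idea on an in-memory SQLite database so it runs without the `bank` server. The toy `account` rows and the helper `same_district_pairs` are assumptions made for illustration. Replacing `<>` with `<` is one common way to keep each pair only once.

```python
import sqlite3

# Toy stand-in for bank.account; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_id integer, district_id integer)")
conn.executemany(
    "insert into account values (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20)],
)

def same_district_pairs(op):
    """Self join on district_id; op compares the two account ids ('<>' or '<')."""
    query = f"""
        select a1.account_id, a2.account_id, a1.district_id
        from account a1
        join account a2
          on a1.account_id {op} a2.account_id
         and a1.district_id = a2.district_id
        order by a1.district_id, a1.account_id, a2.account_id
    """
    return conn.execute(query).fetchall()

mirrored = same_district_pairs("<>")  # (1, 2, 10) and (2, 1, 10) both appear
unique = same_district_pairs("<")     # each pair appears exactly once

print(len(mirrored), len(unique))  # 6 3
print(unique)                      # [(1, 2, 10), (1, 3, 10), (2, 3, 10)]
```

Account 4 is alone in its district, so it has no partner and does not appear in either result.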


# 3.03 Activity 1

Keep working on the `bank` database.

For each account, let's find its `owner` and its `disponent`.

**Solution:**

In [29]:
%%sql
select d1.account_id, d1.client_id as disp, d2.client_id as owner from bank.disp d1
join bank.disp d2
on d1.account_id = d2.account_id and d1.type <> d2.type
order by d1.account_id
limit 5;

 * mysql+pymysql://root:***@localhost/bank
5 rows affected.


account_id,disp,owner
2,3,2
2,2,3
3,5,4
3,4,5
8,11,10


In [30]:
%%sql 
select d1.account_id, d1.type as Type1, d2.type as Type2
from bank.disp d1
join bank.disp d2 on d1.account_id = d2.account_id and d1.type <> d2.type
limit 3;

 * mysql+pymysql://root:***@localhost/bank
3 rows affected.


account_id,Type1,Type2
2,DISPONENT,OWNER
2,OWNER,DISPONENT
3,DISPONENT,OWNER


As you can see, every account appears twice, once for each ordering of the owner/disponent pair. Let's try to solve this problem:

- Method 1: Use a `where` clause to keep only one of the two types

In [37]:
%%sql
select d1.account_id, d1.client_id as disp, d2.client_id as owner from bank.disp as d1
join bank.disp as d2
on d1.account_id = d2.account_id and d1.type <> d2.type
where d1.type = 'DISPONENT'
limit 3;

 * mysql+pymysql://root:***@localhost/bank
3 rows affected.


account_id,disp,owner
2,3,2
3,5,4
8,11,10


- Method 2: Create a temporary table with a row number and apply a filter that keeps only the odd-numbered rows

In [36]:
%%sql
drop temporary table if exists combo;

create temporary table combo
select d1.account_id, d1.type as Type1, d2.type as Type2, row_number() over(order by d1.account_id, d1.type) as RowNumber
from bank.disp d1
join bank.disp d2
on d1.account_id = d2.account_id and d1.type <> d2.type;

select * from combo
where RowNumber % 2 = 1
limit 3;

 * mysql+pymysql://root:***@localhost/bank
0 rows affected.
1738 rows affected.
3 rows affected.


account_id,Type1,Type2,RowNumber
2,DISPONENT,OWNER,1
3,DISPONENT,OWNER,3
8,DISPONENT,OWNER,5
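
The two methods can be compared side by side on toy data. The sketch below is an illustration only: it uses an in-memory SQLite table named `disp` with made-up rows (account 4 is hypothetical), and it emulates the `RowNumber % 2 = 1` filter with a Python slice rather than a window function. It assumes the joined rows are sorted by account and then by `d1.type`, so that the `DISPONENT` row of each pair comes first.

```python
import sqlite3

# Toy stand-in for bank.disp; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table disp (account_id integer, client_id integer, type text)")
conn.executemany(
    "insert into disp values (?, ?, ?)",
    [
        (2, 2, "OWNER"), (2, 3, "DISPONENT"),
        (3, 4, "OWNER"), (3, 5, "DISPONENT"),
        (4, 6, "OWNER"),  # no disponent, so the join never matches this account
    ],
)

base = """
    select d1.account_id, d1.client_id as disp, d2.client_id as owner
    from disp d1
    join disp d2
      on d1.account_id = d2.account_id and d1.type <> d2.type
"""

# Unfiltered: each account shows up twice, once per ordering of the pair.
both = conn.execute(base + " order by d1.account_id, d1.type").fetchall()

# Method 1: keep only the rows where d1 is the disponent.
method1 = conn.execute(
    base + " where d1.type = 'DISPONENT' order by d1.account_id"
).fetchall()

# Method 2, emulated in Python: row numbers 1, 3, 5, ... are indexes 0, 2, 4, ...
method2 = both[::2]

print(both)     # [(2, 3, 2), (2, 2, 3), (3, 5, 4), (3, 4, 5)]
print(method1)  # [(2, 3, 2), (3, 5, 4)]
print(method2 == method1)  # True
```

The odd-row filter gives the right answer only because the sort order places the disponent first in every pair. The `where` filter in Method 1 does not depend on row order, which makes it the more robust choice.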


### **CROSS JOINS**

A `cross join` is used when you want every combination of rows from two tables. It returns the Cartesian product of the joined tables: each row from one table is paired with every row in the other table.

Let's say we want to find all the combinations of card types and account ownership types. The query below uses subqueries, which we have not talked about yet; we will cover them in greater detail later.

In [41]:
%%sql
select * from (
  select distinct type as card_type from bank.card
) sub1
cross join (
  select distinct type as user_type from bank.disp
) sub2;

 * mysql+pymysql://root:***@localhost/bank
6 rows affected.


card_type,user_type
gold,OWNER
junior,OWNER
classic,OWNER
gold,DISPONENT
junior,DISPONENT
classic,DISPONENT


Potential uses of cross joins:
- All possible pairs of clients and services (bank)
- All possible pairs of clients and products (real estate)
- All possible pairs between users of a dating service

`CROSS JOIN`s can cause performance issues because they are computationally expensive: the number of rows returned is the number of rows in the first table multiplied by the number of rows in the second.
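
To make the row-count arithmetic concrete, here is a small sketch on an in-memory SQLite database. The table names `card_types` and `user_types` and the helper `cross_join_size` are hypothetical; the values are the distinct types shown in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table card_types (card_type text)")
conn.execute("create table user_types (user_type text)")
conn.executemany(
    "insert into card_types values (?)",
    [("gold",), ("junior",), ("classic",)],
)
conn.executemany(
    "insert into user_types values (?)",
    [("OWNER",), ("DISPONENT",)],
)

# Every card type is paired with every user type.
rows = conn.execute(
    "select card_type, user_type from card_types cross join user_types"
).fetchall()

n_cards = conn.execute("select count(*) from card_types").fetchone()[0]
n_users = conn.execute("select count(*) from user_types").fetchone()[0]
print(len(rows), n_cards * n_users)  # 6 6

def cross_join_size(rows_left, rows_right):
    """Number of rows a cross join returns for two tables of the given sizes."""
    return rows_left * rows_right

# Two modest tables of 10,000 rows each already produce 100 million rows.
print(cross_join_size(10_000, 10_000))  # 100000000
```

This is why the example query reduces each table to its distinct types in a subquery before cross joining, rather than cross joining the full `card` and `disp` tables.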