<div style="float:right; padding-top: 15px; padding-right: 15px">
    <div>
        <a href="https://whiteboxml.com">
            <img src="https://whiteboxml.com/static/img/logo/black_bg_white.svg" width="250">
        </a>
    </div>
</div>

# joins and relationships

## 1. introduction

* joins let a query leverage the relationships between entities (tables).
* not every database engine supports every type of join (we will see workarounds).

## 2. sample database (same will be used for lab!)

In [3]:
# let's load jupyter sql extension

%load_ext sql
%config SqlMagic.autocommit = False

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [4]:
# load database

%sql sqlite:///data/publications.db

getting tables in publications database:

In [5]:
%%sql tables <<

SELECT 
    name
FROM 
    sqlite_master 
WHERE 
    type ='table' AND 
    name NOT LIKE 'sqlite_%';

 * sqlite:///data/publications.db
Done.
Returning data to local variable tables


In [6]:
tables.DataFrame()

Unnamed: 0,name
0,authors
1,discounts
2,employee
3,jobs
4,pub_info
5,publishers
6,roysched
7,sales
8,stores
9,titleauthor


## 3. types of joins

### inner join

goal: get a table of publishers with the number of titles each one has published, sorted by that number. important concepts:
* aggregation functions and GROUP BY
* sorted results
* column alias

In [10]:
%%sql inner <<

SELECT 
pub_name,
count(titles.title_id) as titles_published
FROM publishers
JOIN titles
ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY titles_published DESC;

 * sqlite:///data/publications.db
Done.
Returning data to local variable inner


In [11]:
inner.DataFrame()

Unnamed: 0,pub_name,titles_published
0,Binnet & Hardley,7
1,Algodata Infosystems,6
2,New Moon Books,5


### left join (every row of the first table in the join is kept, even without a match)

In [14]:
%%sql left <<

SELECT 
pub_name,
count(titles.title_id) as titles_published
FROM publishers
LEFT JOIN titles
ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY titles_published DESC;

 * sqlite:///data/publications.db
Done.
Returning data to local variable left


In [15]:
left.DataFrame()

Unnamed: 0,pub_name,titles_published
0,Binnet & Hardley,7
1,Algodata Infosystems,6
2,New Moon Books,5
3,Scootney Books,0
4,Ramona Publishers,0
5,Lucerne Publishing,0
6,GGG&G,0
7,Five Lakes Publishing,0
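
The difference between the two results above can be reproduced on a tiny in-memory database. The sketch below uses Python's `sqlite3` module with made-up data (the publishers `Alpha Press` and `Beta Books` are hypothetical, not rows of the publications database). It also shows why the queries count `titles.title_id` rather than `*`: `count(column)` skips NULLs, so a publisher with no titles gets 0, while `count(*)` would count the NULL-filled row produced by the left join.

```python
import sqlite3

# Toy data, assumed for illustration only (not the publications database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (pub_id INTEGER PRIMARY KEY, pub_name TEXT);
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, pub_id INTEGER);
    INSERT INTO publishers VALUES (1, 'Alpha Press'), (2, 'Beta Books');
    INSERT INTO titles VALUES ('T1', 1), ('T2', 1);
""")

# Same query shape as above; only the join keyword changes.
query = """
    SELECT
        pub_name,
        count(titles.title_id) AS titles_published,  -- ignores NULLs
        count(*) AS joined_rows                      -- counts every joined row
    FROM publishers
    {join} titles
    ON publishers.pub_id = titles.pub_id
    GROUP BY pub_name
    ORDER BY titles_published DESC;
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
conn.close()

print(inner_rows)  # only publishers with at least one title
print(left_rows)   # every publisher, with 0 titles where nothing matched
```

With the inner join, `Beta Books` disappears; with the left join it is kept with `titles_published` equal to 0 and `joined_rows` equal to 1.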


### right join (not supported by sqlite before version 3.39, just change order of tables)

goal: get number of units sold for each title with title name, type and price, even for those titles without sales

trick: to perform a right join, change order of tables and do a left join

In [27]:
%%sql right <<

SELECT
titles.title,
titles.type,
titles.price,
sum(sales.qty) AS units_sold
FROM titles
LEFT JOIN sales
ON titles.title_id = sales.title_id
GROUP BY titles.title, titles.type, titles.price;

 * sqlite:///data/publications.db
Done.
Returning data to local variable right


In [30]:
right.DataFrame()

Unnamed: 0,title,type,price,units_sold
0,But Is It User Friendly?,popular_comp,22.95,30.0
1,Computer Phobic AND Non-Phobic Individuals: Be...,psychology,21.59,20.0
2,Cooking with Computers: Surreptitious Balance ...,business,11.95,25.0
3,Emotional Security: A New Algorithm,psychology,7.99,25.0
4,Fifty Years in Buckingham Palace Kitchens,trad_cook,11.95,20.0
5,Is Anger the Enemy?,psychology,10.95,108.0
6,Life Without Fear,psychology,7.0,25.0
7,Net Etiquette,popular_comp,,
8,"Onions, Leeks, and Garlic: Cooking Secrets of ...",trad_cook,20.95,40.0
9,Prolonged Data Deprivation: Four Case Studies,psychology,19.99,15.0
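
The table swap can be checked on toy data. The sketch below is a minimal illustration with Python's `sqlite3` module and hypothetical rows (`First Title`, `Unsold Title`), not the publications database. It writes what would conceptually be `sales RIGHT JOIN titles` as `titles LEFT JOIN sales`. It also shows that `sum()` returns NULL for a title with no sales, which is the empty `units_sold` value in the output above; wrapping it in `coalesce(..., 0)` is one possible way to report 0 instead.

```python
import sqlite3

# Toy data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (title_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE sales (title_id TEXT, qty INTEGER);
    INSERT INTO titles VALUES ('T1', 'First Title'), ('T2', 'Unsold Title');
    INSERT INTO sales VALUES ('T1', 10), ('T1', 5);
""")

# Intended meaning: sales RIGHT JOIN titles (keep every title).
# Equivalent rewrite: put titles first and use LEFT JOIN.
rows = conn.execute("""
    SELECT
        titles.title,
        sum(sales.qty) AS units_sold,                      -- NULL when no sales
        coalesce(sum(sales.qty), 0) AS units_sold_filled   -- 0 when no sales
    FROM titles
    LEFT JOIN sales
    ON titles.title_id = sales.title_id
    GROUP BY titles.title
    ORDER BY titles.title;
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Python receives the SQL NULL as `None`, so the unsold title comes back as `('Unsold Title', None, 0)`.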


### full outer join (not supported by mysql, nor by sqlite before version 3.39)

goal: get full list of employees and jobs, even for jobs with no employees and employees with no assigned job...

trick: take the UNION of two left joins, one starting from each table

In [7]:
%%sql outer <<

SELECT 
employee.fname, 
employee.hire_date, 
jobs.job_desc, 
jobs.job_id
FROM jobs
LEFT JOIN employee
on jobs.job_id = employee.job_id
UNION
SELECT 
employee.fname, 
employee.hire_date, 
jobs.job_desc, 
jobs.job_id
FROM employee
LEFT JOIN jobs
on jobs.job_id = employee.job_id;

 * sqlite:///data/publications.db
Done.
Returning data to local variable outer


In [8]:
outer.DataFrame()

Unnamed: 0,fname,hire_date,job_desc,job_id
0,,,New Hire - Job not specified,1.0
1,Anabela,1993-01-27 00:00:00,Public Relations Manager,8.0
2,Ann,1991-07-16 00:00:00,Business Operations Manager,3.0
3,Annette,1990-02-21 00:00:00,Managing Editor,6.0
4,Aria,1991-10-26 00:00:00,Productions Manager,10.0
5,Carine,1992-07-07 00:00:00,Sales Representative,13.0
6,Carlos,1989-04-21 00:00:00,Publisher,5.0
7,Daniel,1990-01-01 00:00:00,Operations Manager,11.0
8,Diego,1991-12-16 00:00:00,Managing Editor,6.0
9,Elizabeth,1990-07-24 00:00:00,Designer,14.0
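
The UNION of two left joins can be verified on a small example. The sketch below uses Python's `sqlite3` module with hypothetical rows (employees `Ana` and `Bo`, jobs `Editor` and `Designer`), chosen so that one job has no employee and one employee has no job. Note that `UNION` removes duplicate rows: that is what collapses the matched pairs returned by both halves, but it would also merge two genuinely identical rows.

```python
import sqlite3

# Toy data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, job_desc TEXT);
    CREATE TABLE employee (fname TEXT, job_id INTEGER);
    INSERT INTO jobs VALUES (1, 'Editor'), (2, 'Designer');
    INSERT INTO employee VALUES ('Ana', 1), ('Bo', NULL);
""")

rows = conn.execute("""
    -- first half: every job, with its employees if any
    SELECT employee.fname, jobs.job_desc
    FROM jobs
    LEFT JOIN employee ON jobs.job_id = employee.job_id
    UNION
    -- second half: every employee, with the job if any
    SELECT employee.fname, jobs.job_desc
    FROM employee
    LEFT JOIN jobs ON jobs.job_id = employee.job_id;
""").fetchall()
conn.close()

result = set(rows)
print(result)
```

The result contains the matched pair once, plus one row for the job without employees and one row for the employee without a job.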


## 4. combined queries (with)

In [39]:
%%sql with_example <<

WITH 
employees_custom AS 
(
SELECT *
FROM employee
),
jobs_custom AS
(
SELECT *
FROM jobs
JOIN employees_custom
ON employees_custom.job_id = jobs.job_id
)
SELECT * from jobs_custom;

 * sqlite:///data/publications.db
Done.
Returning data to local variable with_example


In [40]:
with_example.DataFrame()

Unnamed: 0,job_id,job_desc,min_lvl,max_lvl,emp_id,fname,minit,lname,job_id:1,job_lvl,pub_id,hire_date
0,10,Productions Manager,75,165,A-C71970F,Aria,,Cruz,10,87,1389,1991-10-26 00:00:00
1,6,Managing Editor,140,225,A-R89858F,Annette,,Roulet,6,152,9999,1990-02-21 00:00:00
2,3,Business Operations Manager,175,225,AMD15433F,Ann,M,Devon,3,200,9952,1991-07-16 00:00:00
3,8,Public Relations Manager,100,175,ARD36773F,Anabela,R,Domingues,8,100,877,1993-01-27 00:00:00
4,5,Publisher,150,250,CFH28514M,Carlos,F,Hernadez,5,211,9999,1989-04-21 00:00:00
5,13,Sales Representative,25,100,CGS88322F,Carine,G,Schmitt,13,64,1389,1992-07-07 00:00:00
6,11,Operations Manager,75,150,DBT39435M,Daniel,B,Tonini,11,75,877,1990-01-01 00:00:00
7,6,Managing Editor,140,225,DWR65030M,Diego,W,Roel,6,192,1389,1991-12-16 00:00:00
8,14,Designer,25,100,ENL44273F,Elizabeth,N,Lincoln,14,35,877,1990-07-24 00:00:00
9,4,Chief Financial Officier,175,250,F-C16315M,Francisco,,Chang,4,227,9952,1990-11-03 00:00:00
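
A `WITH` clause becomes more useful when the named subquery does some work of its own before being joined. The sketch below is one possible illustration with Python's `sqlite3` module; the table contents and the name `headcount` are hypothetical. The named subquery aggregates employees per job, and the main query joins that result back to `jobs`.

```python
import sqlite3

# Toy data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, job_desc TEXT);
    CREATE TABLE employee (fname TEXT, job_id INTEGER);
    INSERT INTO jobs VALUES (1, 'Editor'), (2, 'Designer');
    INSERT INTO employee VALUES ('Ana', 1), ('Bo', 1), ('Cy', 2);
""")

rows = conn.execute("""
    WITH headcount AS (
        -- step 1: count employees per job
        SELECT job_id, count(*) AS employees
        FROM employee
        GROUP BY job_id
    )
    -- step 2: attach the job description to each count
    SELECT jobs.job_desc, headcount.employees
    FROM jobs
    JOIN headcount ON headcount.job_id = jobs.job_id
    ORDER BY headcount.employees DESC;
""").fetchall()
conn.close()

print(rows)
```

Splitting the query this way keeps each step readable on its own, which helps as queries grow.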


## 5. lab time! let's start together

In this challenge you will write a MySQL SELECT query that joins various tables to figure out what titles each author has published at which publishers. Your output should have at least the following columns:

    AUTHOR ID - the ID of the author
    LAST NAME - author last name
    FIRST NAME - author first name
    TITLE - name of the published title
    PUBLISHER - name of the publisher where the title was published

https://github.com/ironhack-datalabs/dataV2-labs/tree/master/module-1/MySQL-Select

In [16]:
%%sql challenge_1 <<

SELECT
authors.au_id AS author_id,
authors.au_lname AS last_name,
authors.au_fname AS first_name,
titles.title,
publishers.pub_name AS publisher
FROM titles
JOIN titleauthor
ON titles.title_id = titleauthor.title_id
JOIN authors
ON authors.au_id = titleauthor.au_id
JOIN publishers
ON titles.pub_id = publishers.pub_id;

 * sqlite:///data/publications.db
Done.
Returning data to local variable challenge_1


In [17]:
challenge_1.DataFrame()

Unnamed: 0,author_id,last_name,first_name,title,publisher
0,172-32-1176,White,Johnson,Prolonged Data Deprivation: Four Case Studies,New Moon Books
1,213-46-8915,Green,Marjorie,The Busy Executive's Database Guide,Algodata Infosystems
2,213-46-8915,Green,Marjorie,You Can Combat Computer Stress!,New Moon Books
3,238-95-7766,Carson,Cheryl,But Is It User Friendly?,Algodata Infosystems
4,267-41-2394,O'Leary,Michael,Cooking with Computers: Surreptitious Balance ...,Algodata Infosystems
5,267-41-2394,O'Leary,Michael,"Sushi, Anyone?",Binnet & Hardley
6,274-80-9391,Straight,Dean,Straight Talk About Computers,Algodata Infosystems
7,409-56-7008,Bennet,Abraham,The Busy Executive's Database Guide,Algodata Infosystems
8,427-17-2319,Dull,Ann,Secrets of Silicon Valley,Algodata Infosystems
9,472-27-2349,Gringlesby,Burt,"Sushi, Anyone?",Binnet & Hardley


<div style="padding-top: 25px; float: right">
    <div>    
        <i>&nbsp;&nbsp;© Copyright by</i>
    </div>
    <div>
        <a href="https://whiteboxml.com">
            <img src="https://whiteboxml.com/static/img/logo/black_bg_white.svg" width="125">
        </a>
    </div>
</div>