<img src="https://bit.ly/2VnXWr2" width="100" align="left">

# Lab | SQL Select

## Challenge 1 - Who Has Published What, and Where?

In this challenge you will write a `SELECT` query that joins several tables to find out which titles each author has published with which publisher. Your output should have at least the following columns:

- `AUTHOR_ID` - the ID of the author
- `LAST_NAME` - author last name
- `FIRST_NAME` - author first name
- `TITLE` - name of the published title
- `PUBLISHER` - name of the publisher where the title was published
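
The join chain this challenge asks for can be tried locally before running it on BigQuery. The block below is a minimal sketch, assuming an in-memory SQLite database with hypothetical toy rows; only the table and column names (`authors`, `titleauthor`, `titles`, `publishers`, `au_id`, `title_id`, `pub_id`) follow the lab dataset, and SQLite's dialect is not identical to BigQuery Standard SQL.

```python
import sqlite3

# Hypothetical toy data for illustration only; not the real publications dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT, au_lname TEXT, au_fname TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE titles (title_id TEXT, title TEXT, pub_id TEXT);
CREATE TABLE publishers (pub_id TEXT, pub_name TEXT);
INSERT INTO authors VALUES ('A1', 'Doe', 'Jane'), ('A2', 'Roe', 'Rick');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T1'), ('A2', 'T2');
INSERT INTO titles VALUES ('T1', 'First Book', 'P1'), ('T2', 'Second Book', 'P2');
INSERT INTO publishers VALUES ('P1', 'Pub One'), ('P2', 'Pub Two');
""")

# titleauthor is the bridge table: it links each author to each of their titles,
# and titles in turn carries the publisher key.
rows = conn.execute("""
SELECT a.au_id      AS author_id,
       a.au_lname   AS last_name,
       a.au_fname   AS first_name,
       t.title      AS title,
       p.pub_name   AS publisher
FROM authors AS a
INNER JOIN titleauthor AS ta ON a.au_id = ta.au_id
INNER JOIN titles AS t ON ta.title_id = t.title_id
INNER JOIN publishers AS p ON t.pub_id = p.pub_id
ORDER BY a.au_id, t.title
""").fetchall()

for row in rows:
    print(row)
```

Each row of `titleauthor` produces exactly one output row here, which is why the row count of the result can be compared against the row count of `titleauthor` as a sanity check.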

In [1]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Users/Miguel/Documents/GitHub/Ironhack exercises/Modulo 1/ironhack_service_account_big_query.json"
from google.cloud import bigquery

In [2]:
client = bigquery.Client()

In [3]:
query_1 = '''
SELECT authors.au_id AS author_id, authors.au_lname AS last_name , authors.au_fname AS first_name , titles.title AS title , publishers.pub_name AS publisher

FROM  `ironhack-data-analytics-265219.publications.authors` authors

INNER JOIN `ironhack-data-analytics-265219.publications.titleauthor` titleauthor

ON authors.au_id = titleauthor.au_id

INNER JOIN `ironhack-data-analytics-265219.publications.titles` titles

ON titleauthor.title_id=titles.title_id

INNER JOIN `ironhack-data-analytics-265219.publications.publishers` publishers

ON titles.pub_id = publishers.pub_id'''

In [4]:
query_job_1= client.query(query=query_1)

In [5]:
dataframe_1 = query_job_1.to_dataframe()

In [15]:
dataframe_1

| | author_id | last_name | first_name | title | publisher |
|---|---|---|---|---|---|
| 0 | 807-91-6654 | Panteley | Sylvia | Onions, Leeks, and Garlic: Cooking Secrets of ... | Binnet & Hardley |
| 1 | 722-51-5454 | DeFrance | Michel | The Gourmet Microwave | Binnet & Hardley |
| 2 | 712-45-1867 | del Castillo | Innes | Silicon Valley Gastronomic Treats | Binnet & Hardley |
| 3 | 899-46-2035 | Ringer | Anne | Is Anger the Enemy? | New Moon Books |
| 4 | 899-46-2035 | Ringer | Anne | The Gourmet Microwave | Binnet & Hardley |
| 5 | 998-72-3567 | Ringer | Albert | Is Anger the Enemy? | New Moon Books |
| 6 | 998-72-3567 | Ringer | Albert | Life Without Fear | New Moon Books |
| 7 | 172-32-1176 | White | Johnson | Prolonged Data Deprivation: Four Case Studies | New Moon Books |
| 8 | 486-29-1786 | Locksley | Charlene | Emotional Security: A New Algorithm | New Moon Books |
| 9 | 486-29-1786 | Locksley | Charlene | Net Etiquette | Algodata Infosystems |


In [7]:
query_2='''
SELECT COUNT(*) AS books
FROM  `ironhack-data-analytics-265219.publications.titleauthor` titleauthor
'''

In [8]:
query_job_2 = client.query(query=query_2)

In [9]:
dataframe_2 = query_job_2.to_dataframe()

In [10]:
dataframe_2

| | books |
|---|---|
| 0 | 25 |


In [11]:
dataframe_1.count()

author_id     25
last_name     25
first_name    25
title         25
publisher     25
dtype: int64

## Challenge 2 - Who Has Published How Many, and Where?

Building on your solution to Challenge 1, query how many titles each author has published with each publisher.

To check that your output is correct, sum the `TITLE_COUNT` column. The sum should equal the total number of records in the `titleauthor` table.

_Hint: To count the titles published by an author, use `COUNT`. Also look at `GROUP BY`, because you will be counting rows within different groups of data. Consult the references and work through them on your own. These features will be formally covered in the Temp Tables and Subqueries lesson._
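
The `COUNT` plus `GROUP BY` pattern and the suggested sum check can be illustrated on a small scale. This is a minimal sketch, assuming an in-memory SQLite database with hypothetical toy rows; to stay short it groups by `pub_id` rather than joining to `publishers` for the name.

```python
import sqlite3

# Hypothetical toy data for illustration only; not the real publications dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE titles (title_id TEXT, pub_id TEXT);
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A1', 'T2'), ('A2', 'T1'), ('A2', 'T3');
INSERT INTO titles VALUES ('T1', 'P1'), ('T2', 'P1'), ('T3', 'P2');
""")

# One output row per (author, publisher) pair; COUNT tallies the rows in each group.
rows = conn.execute("""
SELECT ta.au_id, t.pub_id, COUNT(t.title_id) AS title_count
FROM titleauthor AS ta
INNER JOIN titles AS t ON ta.title_id = t.title_id
GROUP BY ta.au_id, t.pub_id
ORDER BY ta.au_id, t.pub_id
""").fetchall()

# Sanity check from the challenge text: the counts should add up to the
# number of rows in titleauthor.
total_rows = conn.execute("SELECT COUNT(*) FROM titleauthor").fetchone()[0]
count_sum = sum(title_count for _, _, title_count in rows)

print(rows)
print(count_sum == total_rows)
```

With the notebook's own result, the equivalent check would be comparing the sum of the `title_count` column of the Challenge 2 DataFrame against the `books` count obtained from `titleauthor`.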

In [16]:
query_3 = '''
SELECT authors.au_id AS author_id, authors.au_lname AS last_name , authors.au_fname AS first_name , publishers.pub_name AS publisher , COUNT(titles.title_id) AS title_count

FROM  `ironhack-data-analytics-265219.publications.authors` authors

INNER JOIN `ironhack-data-analytics-265219.publications.titleauthor` titleauthor

ON authors.au_id = titleauthor.au_id

INNER JOIN `ironhack-data-analytics-265219.publications.titles` titles

ON titleauthor.title_id=titles.title_id

INNER JOIN `ironhack-data-analytics-265219.publications.publishers` publishers

ON titles.pub_id = publishers.pub_id

GROUP BY 1,2,3,4'''

In [17]:
query_job_3 = client.query(query=query_3)

In [18]:
dataframe_3 = query_job_3.to_dataframe()

In [19]:
dataframe_3

| | author_id | last_name | first_name | publisher | title_count |
|---|---|---|---|---|---|
| 0 | 807-91-6654 | Panteley | Sylvia | Binnet & Hardley | 1 |
| 1 | 722-51-5454 | DeFrance | Michel | Binnet & Hardley | 1 |
| 2 | 712-45-1867 | del Castillo | Innes | Binnet & Hardley | 1 |
| 3 | 899-46-2035 | Ringer | Anne | New Moon Books | 1 |
| 4 | 899-46-2035 | Ringer | Anne | Binnet & Hardley | 1 |
| 5 | 998-72-3567 | Ringer | Albert | New Moon Books | 2 |
| 6 | 172-32-1176 | White | Johnson | New Moon Books | 1 |
| 7 | 486-29-1786 | Locksley | Charlene | New Moon Books | 1 |
| 8 | 486-29-1786 | Locksley | Charlene | Algodata Infosystems | 1 |
| 9 | 846-92-7186 | Hunter | Sheryl | Algodata Infosystems | 1 |


## Challenge 3 - Best Selling Authors

Which 3 authors have sold the most titles? Write a query to find out.

Requirements:

- Your output should have the following columns:
  - `AUTHOR_ID` - the ID of the author
  - `LAST_NAME` - author last name
  - `FIRST_NAME` - author first name
  - `TOTAL` - total number of titles sold by this author
- Your output should be ordered by `TOTAL` from high to low.
- Only output the top 3 best-selling authors.


_Hint: To calculate the total number of titles sold by an author, use the [SUM function](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#sum). Refer to the reference to learn how to use it._

In [23]:
query_4 = '''
SELECT authors.au_id AS author_id, authors.au_lname AS last_name , authors.au_fname AS first_name , SUM(sales.qty) AS total

FROM  `ironhack-data-analytics-265219.publications.authors` authors

INNER JOIN `ironhack-data-analytics-265219.publications.titleauthor` titleauthor

ON authors.au_id = titleauthor.au_id

INNER JOIN `ironhack-data-analytics-265219.publications.titles` titles

ON titleauthor.title_id=titles.title_id

INNER JOIN `ironhack-data-analytics-265219.publications.sales` sales

ON sales.title_id = titles.title_id

GROUP BY 1,2,3

ORDER BY total DESC
LIMIT 3'''

In [24]:
query_job_4 = client.query(query=query_4)

In [25]:
dataframe_4 = query_job_4.to_dataframe()

In [26]:
dataframe_4

| | author_id | last_name | first_name | total |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 213-46-8915 | Green | Marjorie | 50 |


## Challenge 4 - Best Selling Authors Ranking

Now modify your solution in Challenge 3 so that the output will display all 23 authors instead of the top 3. Note that the authors who have sold 0 titles should also appear in your output (ideally display `0` instead of `NULL` as the `TOTAL`). Also order your results based on `TOTAL` from high to low.
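
Keeping authors with no sales depends on the join type: an `INNER JOIN` drops any author without a matching row, while a `LEFT JOIN` keeps them with `NULL` values that `COALESCE` can turn into `0`. The block below is a minimal sketch of that difference, assuming an in-memory SQLite database with hypothetical toy rows; for brevity it joins `titleauthor` straight to `sales` and skips the `titles` table.

```python
import sqlite3

# Hypothetical toy data for illustration only; not the real publications dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT, au_lname TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
CREATE TABLE sales (title_id TEXT, qty INTEGER);
INSERT INTO authors VALUES ('A1', 'Ada'), ('A2', 'Bo'), ('A3', 'Cy');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T2');  -- A3 has no titles
INSERT INTO sales VALUES ('T1', 10), ('T1', 5);             -- T2 has no sales
""")

query = """
SELECT a.au_id, a.au_lname, COALESCE(SUM(s.qty), 0) AS total
FROM authors AS a
LEFT JOIN titleauthor AS ta ON a.au_id = ta.au_id
LEFT JOIN sales AS s ON ta.title_id = s.title_id
GROUP BY a.au_id, a.au_lname
ORDER BY total DESC, a.au_id
"""

# LEFT JOIN keeps every author; SUM over no matching rows is NULL, shown as 0.
ranking = conn.execute(query).fetchall()

# Same query with INNER JOIN, for comparison: authors without sales disappear.
inner_only = conn.execute(query.replace("LEFT JOIN", "INNER JOIN")).fetchall()

print(ranking)
print(inner_only)
```

A quick way to confirm the real query behaves as intended is to compare its row count with the number of rows in `authors`.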

In [27]:
query_5 = '''
SELECT authors.au_id AS author_id, authors.au_lname AS last_name , authors.au_fname AS first_name , COALESCE(SUM(sales.qty),0) AS total

FROM  `ironhack-data-analytics-265219.publications.authors` authors

-- LEFT JOINs keep authors that have no titles or no sales (their total becomes 0)
LEFT JOIN `ironhack-data-analytics-265219.publications.titleauthor` titleauthor

ON authors.au_id = titleauthor.au_id

LEFT JOIN `ironhack-data-analytics-265219.publications.titles` titles

ON titleauthor.title_id=titles.title_id

LEFT JOIN `ironhack-data-analytics-265219.publications.sales` sales

ON sales.title_id = titles.title_id

GROUP BY 1,2,3

ORDER BY total DESC'''

In [28]:
query_job_5 = client.query(query=query_5)

In [29]:
dataframe_5 = query_job_5.to_dataframe()

In [30]:
dataframe_5

| | author_id | last_name | first_name | total |
|---|---|---|---|---|
| 0 | 899-46-2035 | Ringer | Anne | 148 |
| 1 | 998-72-3567 | Ringer | Albert | 133 |
| 2 | 846-92-7186 | Hunter | Sheryl | 50 |
| 3 | 427-17-2319 | Dull | Ann | 50 |
| 4 | 213-46-8915 | Green | Marjorie | 50 |
| 5 | 724-80-9391 | MacFeather | Stearns | 45 |
| 6 | 267-41-2394 | O'Leary | Michael | 45 |
| 7 | 807-91-6654 | Panteley | Sylvia | 40 |
| 8 | 722-51-5454 | DeFrance | Michel | 40 |
| 9 | 238-95-7766 | Carson | Cheryl | 30 |
