### Exploring the Northwind database using SQL

The Northwind database was originally created by Microsoft. It simulates a wholesale business called "Northwind Traders" that imports and exports foods worldwide.

In this exercise, I explore the Northwind database using PostgreSQL, focusing on pivoting tables.

In [13]:
#Import libraries
import pandas as pd
from sqlalchemy import create_engine

In [14]:
# Create database connection
engine = create_engine('postgresql+psycopg2://tharinduabeysinghe:####@localhost/northwind')

# Run a query and load the result into a DataFrame
def execute_sql_query(sql):
    # Load data into a pandas DataFrame
    df = pd.read_sql_query(sql, con=engine)
    return df

#### Pivot and Unpivot

Pivot and Unpivot transform data by rotating it from rows to columns and vice versa, which often makes it easier to read. They are core tools for summarizing, analyzing, and reporting data.

When we apply Pivot, each unique value in a column is turned into its own column, and the data is aggregated with a function such as sum, avg, or count. Unpivot is the reverse of Pivot: it converts columns back into rows.
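
As a small illustration of the two operations outside SQL, the sketch below pivots and then unpivots a toy table with pandas. The data and the names `long_df`, `wide_df`, and `back_df` are made up for this example and do not come from the Northwind database.

```python
import pandas as pd

# Hypothetical toy data: one row per (year, region) with an order count
long_df = pd.DataFrame({
    "order_year": [1996, 1996, 1997, 1997],
    "ship_region": ["CA", "WA", "CA", "WA"],
    "orders": [1, 2, 3, 12],
})

# Pivot: each unique ship_region value becomes its own column,
# and the order counts are aggregated with sum
wide_df = long_df.pivot_table(
    index="order_year",
    columns="ship_region",
    values="orders",
    aggfunc="sum",
    fill_value=0,
).reset_index()
wide_df.columns.name = None  # drop the leftover "ship_region" axis label

# Unpivot: melt the region columns back into (region, orders) rows
back_df = wide_df.melt(
    id_vars="order_year", var_name="ship_region", value_name="orders"
)
```

Here `wide_df` has one row per year and one column per region, while `back_df` returns to one row per year and region.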

Many pivot tasks in SQL can be completed using aggregate functions with the FILTER clause. FILTER lets you compute multiple conditional aggregates in the same query, and it is often the simplest and cleanest way to pivot a table. It falls short when the categories are dynamic or unknown in advance, or when dozens of values need to be pivoted, because every output column has to be written out by hand. In those cases we can use a dedicated pivot feature, depending on the DBMS: for example, `crosstab` from the `tablefunc` extension in PostgreSQL or `PIVOT` in SQL Server.
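
When the list of categories is long, one possible workaround is to generate the FILTER columns in Python instead of typing them. The helper below, `build_filter_pivot_sql`, is a hypothetical name introduced for this sketch. It assumes the `public.orders` table and the column names used in this notebook, and it only accepts alphabetic region codes because the values are interpolated directly into the SQL text.

```python
def build_filter_pivot_sql(regions):
    """Build a FILTER-based pivot query with one count column per region code."""
    columns = []
    for region in regions:
        # Reject anything but plain letters, since the value is pasted into SQL text
        if not region.isalpha():
            raise ValueError(f"unexpected region code: {region!r}")
        columns.append(
            f"    COUNT(order_id) FILTER (WHERE ship_region = '{region}') AS orders_{region}"
        )
    select_list = ",\n".join(columns)
    return (
        "SELECT\n"
        "    EXTRACT(YEAR FROM order_date)::int AS order_year,\n"
        f"{select_list}\n"
        "FROM public.orders\n"
        "WHERE ship_country = 'USA'\n"
        "GROUP BY order_year"
    )


# Example: generate the query text for two regions
sql_text = build_filter_pivot_sql(["AK", "CA"])
```

The region list could itself come from a `SELECT DISTINCT ship_region` query, which is one way to cope with categories that are not known in advance.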

The code below queries the number of orders shipped to each region of the United States per year.


In [15]:
# Define the SQL query to count orders shipped to each US region per year
sql = '''
SELECT 
    EXTRACT(YEAR FROM order_date)::int AS order_year,
    COUNT(order_id) FILTER (WHERE ship_region = 'AK') AS orders_AK,
    COUNT(order_id) FILTER (WHERE ship_region = 'CA') AS orders_CA,
    COUNT(order_id) FILTER (WHERE ship_region = 'ID') AS orders_ID,
    COUNT(order_id) FILTER (WHERE ship_region = 'MT') AS orders_MT,
    COUNT(order_id) FILTER (WHERE ship_region = 'NM') AS orders_NM,
    COUNT(order_id) FILTER (WHERE ship_region = 'OR') AS orders_OR,
    COUNT(order_id) FILTER (WHERE ship_region = 'WA') AS orders_WA,
    COUNT(order_id) FILTER (WHERE ship_region = 'WY') AS orders_WY
FROM public.orders
WHERE ship_country = 'USA'
GROUP BY order_year
'''
        
# Execute query
execute_sql_query(sql)

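| order_year | orders_ak | orders_ca | orders_id | orders_mt | orders_nm | orders_or | orders_wa | orders_wy |
|---|---|---|---|---|---|---|---|---|
| 1997 | 4 | 3 | 17 | 2 | 6 | 14 | 12 | 2 |
| 1996 | 2 | 0 | 3 | 0 | 6 | 5 | 2 | 5 |
| 1998 | 4 | 1 | 11 | 1 | 6 | 9 | 5 | 2 |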


The next query returns the annual sales (dollar amount) of each employee. The thought process behind the query is described below.

First, three tables are joined to get the following details:
1. employees - employee names
2. orders - order date
3. order_details - quantity of sales, unit price of each food item, and discounts

The employees and orders tables are joined on the employee_id column. The result is then joined with the order_details table on the order_id column.

A pivot table is then built to return the final output, in the following steps:
1. Filter the data by year.
2. Calculate the dollar amount of each order line as quantity × unit price × (1 − discount).
3. Sum these amounts for each employee name.

In [16]:

sql = '''
WITH sales AS (
    SELECT 
        CONCAT(first_name, ' ', last_name) AS employee_name,
        EXTRACT(YEAR FROM o.order_date)::int AS order_year,
        d.quantity,
        d.unit_price,
        d.discount
    FROM employees e
    JOIN orders o ON o.employee_id = e.employee_id
    JOIN order_details d ON d.order_id = o.order_id
)
SELECT 
    employee_name,
    ROUND(SUM(quantity * unit_price * (1 - discount)) FILTER (WHERE order_year = 1996)::numeric, 2) AS sales_1996,
    ROUND(SUM(quantity * unit_price * (1 - discount)) FILTER (WHERE order_year = 1997)::numeric, 2) AS sales_1997,
    ROUND(SUM(quantity * unit_price * (1 - discount)) FILTER (WHERE order_year = 1998)::numeric, 2) AS sales_1998
FROM sales
GROUP BY employee_name
'''        
# Execute query
execute_sql_query(sql)

| employee_name | sales_1996 | sales_1997 | sales_1998 |
|---|---|---|---|
| Robert King | 15232.16 | 60471.19 | 48864.88 |
| Nancy Davolio | 35764.52 | 93148.08 | 63195.01 |
| Laura Callahan | 22240.12 | 56032.62 | 48589.54 |
| Michael Suyama | 16642.61 | 43126.37 | 14144.15 |
| Andrew Fuller | 21757.06 | 70444.14 | 74336.55 |
| Steven Buchanan | 18383.92 | 30716.47 | 19691.89 |
| Janet Leverling | 18223.96 | 108026.16 | 76562.73 |
| Margaret Peacock | 49945.12 | 128809.79 | 54135.94 |
| Anne Dodsworth | 9894.51 | 26310.39 | 41103.16 |
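
The same calculation can be sketched in pandas, which may help when checking the arithmetic by hand. The rows below are invented stand-ins for the joined employees, orders, and order_details data, so the employee names and numbers are assumptions and not Northwind values.

```python
import pandas as pd

# Hypothetical rows standing in for the output of the three-table join
sales = pd.DataFrame({
    "employee_name": ["Ann Lee", "Ann Lee", "Bo Chan", "Bo Chan"],
    "order_year": [1996, 1997, 1996, 1996],
    "quantity": [10, 5, 2, 4],
    "unit_price": [2.0, 10.0, 50.0, 25.0],
    "discount": [0.0, 0.1, 0.0, 0.5],
})

# Dollar amount of each order line after the discount
sales["line_total"] = (
    sales["quantity"] * sales["unit_price"] * (1 - sales["discount"])
)

# One row per employee, one column per year, summing the line totals
pivot = sales.pivot_table(
    index="employee_name",
    columns="order_year",
    values="line_total",
    aggfunc="sum",
).round(2)
pivot.columns = [f"sales_{year}" for year in pivot.columns]
pivot = pivot.reset_index()
```

An employee with no sales in a given year ends up with a missing value (NaN) in that column, which mirrors the NULL that `SUM(...) FILTER (...)` returns in SQL when no rows match.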
