### Exploring the Northwind database using SQL

The Northwind database was originally created by Microsoft. It simulates a wholesale business called "Northwind Traders" that imports and exports foods worldwide.

In this exercise, I explore the Northwind database in PostgreSQL by pivoting its tables.

```python
# Import libraries
import pandas as pd
from sqlalchemy import create_engine
```

```python
# Create database connection
engine = create_engine('postgresql+psycopg2://tharinduabeysinghe:####@localhost/northwind')

# Run a query and load the result into a DataFrame
def execute_sql_query(sql):
    # Load data into a pandas DataFrame
    df = pd.read_sql_query(sql, con=engine)
    return df
```

#### Pivot and Unpivot

Pivot and unpivot reshape data by rotating rows into columns and columns into rows, which can make a result easier to read and compare. They are widely used when summarizing, analyzing, and reporting data.

When we pivot, each unique value in a column becomes its own column, and the data is aggregated with a function such as SUM, AVG, or COUNT. Unpivot is the reverse of pivot: it converts columns back into rows, although it cannot recover the individual rows that the aggregation collapsed.

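To make the mechanics concrete, here is a minimal pure-Python sketch of both operations. The long-format rows are made-up sample values, not Northwind data, and the helper names `pivot` and `unpivot` are hypothetical. `pivot` gives each distinct region its own column and sums the values per year; `unpivot` turns those columns back into rows. The two 1997 CA rows come back as a single row holding their total, which is why unpivot reverses the reshaping but not the aggregation.

```python
from collections import defaultdict

# Made-up long-format rows: (order_year, ship_region, orders).
long_rows = [
    (1996, "CA", 1),
    (1996, "WA", 2),
    (1997, "CA", 3),
    (1997, "CA", 1),
    (1997, "WA", 4),
]


def pivot(rows):
    """Give each distinct region its own column and SUM the values per year."""
    regions = sorted({region for _, region, _ in rows})
    # Every year starts with all region columns set to 0.
    totals = defaultdict(lambda: dict.fromkeys(regions, 0))
    for year, region, orders in rows:
        totals[year][region] += orders
    return regions, {year: totals[year] for year in sorted(totals)}


def unpivot(wide):
    """Turn each region column back into a (year, region, value) row."""
    return [
        (year, region, value)
        for year, columns in wide.items()
        for region, value in columns.items()
    ]


regions, wide = pivot(long_rows)
print(regions)        # ['CA', 'WA']
print(wide)           # {1996: {'CA': 1, 'WA': 2}, 1997: {'CA': 4, 'WA': 4}}
print(unpivot(wide))  # [(1996, 'CA', 1), (1996, 'WA', 2), (1997, 'CA', 4), (1997, 'WA', 4)]
```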
```python
# Define the SQL query
sql = "SELECT * FROM categories;"

# Execute the query
execute_sql_query(sql)
```

|   | category_id | category_name | description | picture |
|---|---|---|---|---|
| 0 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales | [] |
| 1 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, an... | [] |
| 2 | 3 | Confections | Desserts, candies, and sweet breads | [] |
| 3 | 4 | Dairy Products | Cheeses | [] |
| 4 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal | [] |
| 5 | 6 | Meat/Poultry | Prepared meats | [] |
| 6 | 7 | Produce | Dried fruit and bean curd | [] |
| 7 | 8 | Seafood | Seaweed and fish | [] |


Many pivot tasks in SQL can be completed with aggregate functions and the FILTER clause. FILTER restricts the rows that a single aggregate sees, so you can compute multiple conditional aggregates in the same query. This is often the simplest and cleanest way to pivot a table. When the categories are dynamic or unknown in advance, or when dozens of values need to be pivoted, writing one aggregate with FILTER per value is no longer enough. In that case we can use the pivot tooling of the DBMS, such as the PIVOT operator in SQL Server and Oracle or the crosstab() function from PostgreSQL's tablefunc extension (both generally still expect the output columns to be listed in the query).

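When the category values are not known in advance, one workaround that keeps the FILTER approach (a different technique from PIVOT or crosstab()) is to read the distinct values first and generate one FILTER column per value. This sketch assumes an in-memory SQLite table standing in for `public.orders`, filled with made-up rows; SQLite supports FILTER on aggregates from version 3.30.0, and it uses `strftime` where PostgreSQL uses `EXTRACT`. The helper `build_pivot_sql` is hypothetical. Region values are passed as `?` parameters (psycopg2 would use `%s` instead), and only the column aliases are interpolated, so they are restricted to alphanumeric codes.

```python
import sqlite3

# Hypothetical stand-in for public.orders: an in-memory SQLite table with
# made-up rows, so the sketch runs without the PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, ship_region TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "1996-07-04", "WA"),
        (2, "1996-08-01", "OR"),
        (3, "1997-01-15", "WA"),
        (4, "1997-02-20", "WA"),
        (5, "1997-03-05", "OR"),
    ],
)


def build_pivot_sql(regions):
    """Return a query with one COUNT(...) FILTER (WHERE ...) column per region.

    Region values are bound as ? parameters; only the column aliases are
    interpolated, so they must be plain alphanumeric codes.
    """
    columns = []
    for region in regions:
        if not region.isalnum():
            raise ValueError(f"unsafe region code: {region!r}")
        columns.append(
            f"COUNT(order_id) FILTER (WHERE ship_region = ?) AS orders_{region.lower()}"
        )
    return (
        "SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year, "
        + ", ".join(columns)
        + " FROM orders GROUP BY order_year ORDER BY order_year"
    )


# Discover the categories first, then generate the pivot for exactly those values.
regions = [row[0] for row in conn.execute(
    "SELECT DISTINCT ship_region FROM orders ORDER BY ship_region")]
sql = build_pivot_sql(regions)
rows = conn.execute(sql, regions).fetchall()
print(regions)  # ['OR', 'WA']
print(rows)     # [(1996, 1, 1), (1997, 1, 2)]
```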
The code below counts the orders shipped to each U.S. state (the ship_region column) per year.


```python
# One COUNT ... FILTER column per state turns ship_region values into columns
sql = '''SELECT EXTRACT(YEAR FROM order_date)::int AS order_year,
            COUNT(order_id) FILTER (WHERE ship_region = 'AK') AS orders_ak,
            COUNT(order_id) FILTER (WHERE ship_region = 'CA') AS orders_ca,
            COUNT(order_id) FILTER (WHERE ship_region = 'ID') AS orders_id,
            COUNT(order_id) FILTER (WHERE ship_region = 'MT') AS orders_mt,
            COUNT(order_id) FILTER (WHERE ship_region = 'NM') AS orders_nm,
            COUNT(order_id) FILTER (WHERE ship_region = 'OR') AS orders_or,
            COUNT(order_id) FILTER (WHERE ship_region = 'WA') AS orders_wa,
            COUNT(order_id) FILTER (WHERE ship_region = 'WY') AS orders_wy
        FROM public.orders
        WHERE ship_country = 'USA'
        GROUP BY order_year'''

# Execute the query
execute_sql_query(sql)
```

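|   | order_year | orders_ak | orders_ca | orders_id | orders_mt | orders_nm | orders_or | orders_wa | orders_wy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1997 | 4 | 3 | 17 | 2 | 6 | 14 | 12 | 2 |
| 1 | 1996 | 2 | 0 | 3 | 0 | 6 | 5 | 2 | 5 |
| 2 | 1998 | 4 | 1 | 11 | 1 | 6 | 9 | 5 | 2 |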

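To go the other way, here is a minimal unpivot sketch. It assumes the pivoted result above has been stored in a table with the hypothetical name `usa_orders_pivot`; to stay runnable without the PostgreSQL server, it loads the numbers from that output into an in-memory SQLite table. One SELECT per state column, stacked with UNION ALL, turns each `orders_xx` column back into `(order_year, ship_region, orders)` rows. The UNION ALL query is plain SQL that PostgreSQL also accepts; there, `CROSS JOIN LATERAL (VALUES ...)` is a common alternative that reads the table only once.

```python
import sqlite3

# Load the pivoted result shown above into an in-memory SQLite table
# (a stand-in so the sketch runs without the PostgreSQL server).
conn = sqlite3.connect(":memory:")
regions = ["ak", "ca", "id", "mt", "nm", "or", "wa", "wy"]
conn.execute(
    "CREATE TABLE usa_orders_pivot (order_year INTEGER, "
    + ", ".join(f"orders_{r} INTEGER" for r in regions)
    + ")"
)
conn.executemany(
    "INSERT INTO usa_orders_pivot VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1997, 4, 3, 17, 2, 6, 14, 12, 2),
        (1996, 2, 0, 3, 0, 6, 5, 2, 5),
        (1998, 4, 1, 11, 1, 6, 9, 5, 2),
    ],
)

# Unpivot: one SELECT per state column, stacked with UNION ALL, turns each
# orders_xx column back into (order_year, ship_region, orders) rows.
unpivot_sql = " UNION ALL ".join(
    f"SELECT order_year, '{r.upper()}' AS ship_region, orders_{r} AS orders "
    "FROM usa_orders_pivot"
    for r in regions
) + " ORDER BY order_year, ship_region"

long_rows = conn.execute(unpivot_sql).fetchall()
print(len(long_rows))                          # 24 (3 years x 8 states)
print(long_rows[:3])                           # [(1996, 'AK', 2), (1996, 'CA', 0), (1996, 'ID', 3)]
print(sum(orders for _, _, orders in long_rows))  # 122
```

The 24 unpivoted rows sum to 122, the same total as the pivoted table.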