### Exploring the Northwind database using SQL

The Northwind database was originally created by Microsoft. It simulates a wholesale business called "Northwind Traders" that imports and exports foods worldwide.

In this exercise, I explore the Northwind database using PostgreSQL.

```python
# Import libraries
import pandas as pd
from sqlalchemy import create_engine
```

```python
# Create database connection
engine = create_engine('postgresql+psycopg2://tharinduabeysinghe:####@localhost/northwind')

# Run a query and load the result into a pandas DataFrame
def execute_sql_query(sql):
    df = pd.read_sql_query(sql, con=engine)
    return df
```
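
If you want to try queries without a PostgreSQL server, the same "run SQL, get a table back" pattern can be sketched with Python's built-in `sqlite3` module. This is a minimal stand-in, not part of the original setup: `run_query` is a hypothetical helper that returns column names and rows instead of a DataFrame, and the tiny `order_details` table holds made-up values.

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Hypothetical stdlib stand-in for execute_sql_query: returns (column names, rows)."""
    cur = conn.execute(sql, params)
    columns = [desc[0] for desc in cur.description]  # column names from the cursor
    rows = cur.fetchall()
    return columns, rows

# In-memory database with a small, made-up order_details table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [(1, 11, 50), (2, 11, 40), (3, 12, 36)],
)

cols, rows = run_query(
    conn,
    "SELECT product_id, quantity FROM order_details WHERE product_id = ? ORDER BY quantity DESC",
    (11,),
)
print(cols)  # ['product_id', 'quantity']
print(rows)  # [(11, 50), (11, 40)]
```

Passing values through `params` (the `?` placeholder) rather than formatting them into the SQL string keeps the query safe from injection.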

### Window Functions

A window function performs a calculation across a set of rows that are related to the current row. Unlike an aggregate with GROUP BY, it does not collapse those rows into one: every row stays in the result and gets its own computed value.

A window function call has two parts: the function (for example ROW_NUMBER(), RANK() or AVG()) and the window, written in the OVER(...) clause. The window tells the function which rows to look at for each row (PARTITION BY) and in what order (ORDER BY).
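
To see the difference between collapsing rows with GROUP BY and keeping them with a window, here is a minimal sketch using Python's built-in `sqlite3` module instead of PostgreSQL. It assumes an SQLite build with window-function support (version 3.25 or later), and the four order lines are made up.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for PostgreSQL; the data is made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (product_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?)",
    [(11, 50), (11, 40), (12, 36), (12, 100)],
)

# Aggregate with GROUP BY: one row per product (the order lines are collapsed)
grouped = conn.execute(
    "SELECT product_id, AVG(quantity) FROM order_details "
    "GROUP BY product_id ORDER BY product_id"
).fetchall()

# Same aggregate as a window function: every order line is kept,
# and each row carries the average of its partition
windowed = conn.execute(
    "SELECT product_id, quantity, "
    "AVG(quantity) OVER (PARTITION BY product_id) AS avg_quantity "
    "FROM order_details ORDER BY product_id, quantity DESC"
).fetchall()

print(grouped)   # [(11, 45.0), (12, 68.0)]
print(windowed)  # [(11, 50, 45.0), (11, 40, 45.0), (12, 100, 68.0), (12, 36, 68.0)]
```

Both queries compute the same averages; only the window version keeps the four input rows.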

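The query below applies three ranking functions over the same window: all order lines for product 11, ordered by quantity in descending order. ROW_NUMBER() numbers every row uniquely, RANK() gives tied rows the same rank and skips the following ranks, and DENSE_RANK() gives tied rows the same rank without leaving gaps.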

```python
sql = '''SELECT product_id,
            quantity,
            unit_price,
            Row_number()
                OVER(
                ORDER BY quantity DESC) AS up,
            Rank()
                OVER(
                ORDER BY quantity DESC) AS r_up,
            Dense_rank()
                OVER(
                ORDER BY quantity DESC) AS dr_up
        FROM   order_details
        WHERE  product_id = 11'''

# Execute query
execute_sql_query(sql)
```

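|   | product_id | quantity | unit_price | up | r_up | dr_up |
|---|---|---|---|---|---|---|
| 0 | 11 | 50 | 21.0 | 1 | 1 | 1 |
| 1 | 11 | 50 | 21.0 | 2 | 1 | 1 |
| 2 | 11 | 50 | 16.8 | 3 | 1 | 1 |
| 3 | 11 | 40 | 21.0 | 4 | 4 | 2 |
| 4 | 11 | 40 | 21.0 | 5 | 4 | 2 |
| 5 | 11 | 35 | 21.0 | 6 | 6 | 3 |
| 6 | 11 | 30 | 16.8 | 7 | 7 | 4 |
| 7 | 11 | 30 | 16.8 | 8 | 7 | 4 |
| 8 | 11 | 30 | 21.0 | 9 | 7 | 4 |
| 9 | 11 | 25 | 21.0 | 10 | 10 | 5 |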
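
The three ranking rules can be spelled out in plain Python. This sketch only illustrates the semantics, not how PostgreSQL implements them; `rank_functions` is a hypothetical helper, and it reuses the quantities from the output above.

```python
def rank_functions(values):
    """Mimic ROW_NUMBER(), RANK() and DENSE_RANK() OVER (ORDER BY value DESC).

    Returns a list of (value, row_number, rank, dense_rank) tuples.
    """
    ordered = sorted(values, reverse=True)  # the window's ORDER BY ... DESC
    result = []
    rank = dense = 0
    previous = None
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the current row number after a tie
            dense += 1         # DENSE_RANK only advances by one
            previous = value
        result.append((value, row_number, rank, dense))
    return result

# Quantities from the product 11 output above
quantities = [50, 50, 50, 40, 40, 35, 30, 30, 30, 25]
ranked = rank_functions(quantities)
for row in ranked:
    print(row)
```

The printed `row_number`, `rank` and `dense_rank` columns match `up`, `r_up` and `dr_up` in the table. For ROW_NUMBER(), SQL does not guarantee the order among tied rows (the three rows with quantity 50) unless the ORDER BY includes a tiebreaker such as `order_id`.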


By changing the window (the OVER clause), we change which rows the function is applied to. You can use different columns in the PARTITION BY and ORDER BY clauses.

The query below partitions the rows by product_id, so ROW_NUMBER() restarts at 1 for each product; within a product, order lines are numbered by quantity in descending order.

```python
sql = '''SELECT product_id,
                quantity,
                Row_number()
                OVER(
                    PARTITION BY product_id
                    ORDER BY quantity DESC) AS up
        FROM   order_details
        WHERE  product_id IN ( 11, 12, 13 )
        AND quantity > 20'''

# Execute query
execute_sql_query(sql)
```

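|   | product_id | quantity | up |
|---|---|---|---|
| 0 | 11 | 50 | 1 |
| 1 | 11 | 50 | 2 |
| 2 | 11 | 50 | 3 |
| 3 | 11 | 40 | 4 |
| 4 | 11 | 40 | 5 |
| 5 | 11 | 35 | 6 |
| 6 | 11 | 30 | 7 |
| 7 | 11 | 30 | 8 |
| 8 | 11 | 30 | 9 |
| 9 | 11 | 25 | 10 |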
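
PARTITION BY splits the rows into groups, and the function starts over in each group. A plain-Python sketch of `ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY quantity DESC)`, using a hypothetical helper and made-up rows, looks like this:

```python
from itertools import groupby
from operator import itemgetter

def row_number_by_partition(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY quantity DESC).

    rows: iterable of (product_id, quantity) tuples.
    Returns (product_id, quantity, up) tuples, numbered from 1 within each product.
    """
    # Sort by the partition key, then by quantity descending inside each partition
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    result = []
    for product_id, partition in groupby(ordered, key=itemgetter(0)):
        # enumerate restarts for every partition, just like ROW_NUMBER()
        for up, (_, quantity) in enumerate(partition, start=1):
            result.append((product_id, quantity, up))
    return result

# Made-up sample rows for two products
sample = [(12, 36), (11, 50), (12, 100), (11, 40), (11, 25)]
print(row_number_by_partition(sample))
# [(11, 50, 1), (11, 40, 2), (11, 25, 3), (12, 100, 1), (12, 36, 2)]
```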


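Aggregate functions such as AVG() can also be used as window functions. With PARTITION BY and no ORDER BY, the window is the whole partition, so every row of a product gets that product's average quantity.

```python
sql = '''SELECT product_id,
              quantity,
              ROUND(CAST(AVG(quantity)
                OVER(PARTITION BY product_id) AS numeric),2) AS Avg_quantity
        FROM   order_details
        WHERE  product_id IN ( 11, 12, 13 )
        AND quantity > 30 '''

# Execute query
execute_sql_query(sql)
```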

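|   | product_id | quantity | avg_quantity |
|---|---|---|---|
| 0 | 11 | 40 | 44.17 |
| 1 | 11 | 50 | 44.17 |
| 2 | 11 | 40 | 44.17 |
| 3 | 11 | 50 | 44.17 |
| 4 | 11 | 50 | 44.17 |
| 5 | 11 | 35 | 44.17 |
| 6 | 12 | 36 | 57.33 |
| 7 | 12 | 100 | 57.33 |
| 8 | 12 | 36 | 57.33 |
| 9 | 13 | 80 | 63.0 |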
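
An aggregate used as a window function first computes one value per partition and then attaches it to every row of that partition. The sketch below does this in two passes in plain Python; `avg_over_partition` is a hypothetical helper, and the input is the visible rows for products 11 and 12 from the output above (product 13 is left out because only one of its rows is shown).

```python
from collections import defaultdict

def avg_over_partition(rows, ndigits=2):
    """Mimic ROUND(AVG(quantity) OVER (PARTITION BY product_id), 2).

    Pass 1 accumulates a sum and count per partition; pass 2 attaches the
    partition's average to every row, so no rows are dropped.
    """
    totals = defaultdict(lambda: [0, 0])  # product_id -> [sum, count]
    for product_id, quantity in rows:
        totals[product_id][0] += quantity
        totals[product_id][1] += 1
    return [
        (product_id, quantity, round(totals[product_id][0] / totals[product_id][1], ndigits))
        for product_id, quantity in rows
    ]

rows = [(11, 40), (11, 50), (11, 40), (11, 50), (11, 50), (11, 35),
        (12, 36), (12, 100), (12, 36)]
result = avg_over_partition(rows)
for row in result:
    print(row)
```

Product 11 averages 265 / 6 ≈ 44.17 and product 12 averages 172 / 3 ≈ 57.33, matching the `avg_quantity` column.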


Here is a question about the Northwind dataset, generated by ChatGPT:

“For each customer, list their orders along with the order date, total order amount, and the running total of their spending over time.”

The query below answers it.

```python
sql = '''
-- Build one row per order: customer, order, date, total units ordered
-- and the amount spent on the order after discount
WITH cte
     AS (SELECT o.customer_id,
                o.order_id,
                o.order_date,
                Sum(od.quantity) AS total_order_amount,  -- total units in the order
                Sum(od.unit_price * od.quantity * ( 1 - od.discount )) AS order_total
         FROM orders o
            JOIN order_details od
                ON o.order_id = od.order_id
         GROUP BY o.customer_id,
                  o.order_id,
                  o.order_date)
-- Then use a window function to accumulate each customer's spending over time;
-- the CAST is needed because PostgreSQL's ROUND(x, 2) takes a NUMERIC argument
SELECT customer_id,
       order_id,
       order_date,
       total_order_amount,
       Round(Cast(Sum(order_total)
                    OVER (
                      PARTITION BY customer_id
                      ORDER BY order_date) AS NUMERIC), 2) AS running_total
FROM cte
ORDER BY customer_id,
          order_date '''

# Execute query
execute_sql_query(sql)
```

|   | customer_id | order_id | order_date | total_order_amount | running_total |
|---|---|---|---|---|---|
| 0 | ALFKI | 10643 | 1997-08-25 | 38 | 814.50 |
| 1 | ALFKI | 10692 | 1997-10-03 | 20 | 1692.50 |
| 2 | ALFKI | 10702 | 1997-10-13 | 21 | 2022.50 |
| 3 | ALFKI | 10835 | 1998-01-15 | 17 | 2868.30 |
| 4 | ALFKI | 10952 | 1998-03-16 | 18 | 3339.50 |
| ... | ... | ... | ... | ... | ... |
| 825 | WOLZA | 10792 | 1997-12-23 | 28 | 1666.85 |
| 826 | WOLZA | 10870 | 1998-02-04 | 5 | 1826.85 |
| 827 | WOLZA | 10906 | 1998-02-25 | 15 | 2254.35 |
| 828 | WOLZA | 10998 | 1998-04-03 | 69 | 2940.35 |
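
The running total above uses `ORDER BY order_date` with the default window frame. In both PostgreSQL and SQLite that default is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which treats rows with the same `order_date` as peers: if a customer placed two orders on one day, both rows would show the combined total. The sketch below shows this with Python's built-in `sqlite3` standing in for PostgreSQL (assuming SQLite 3.25 or later); the `order_totals` table and its rows are made up, with two same-day orders for customer `A`.

```python
import sqlite3

# Assumptions: SQLite >= 3.25 stands in for PostgreSQL; all data is made up
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_totals (customer_id TEXT, order_id INTEGER, order_date TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?, ?, ?)",
    [
        ("A", 1, "1997-08-25", 100.0),
        ("A", 2, "1997-10-03", 50.0),
        ("A", 3, "1997-10-03", 25.0),  # same date as order 2
        ("B", 4, "1997-09-01", 10.0),
    ],
)

query = """
SELECT customer_id, order_id,
       -- default frame (RANGE ... CURRENT ROW): same-date orders are peers
       SUM(order_total) OVER (PARTITION BY customer_id ORDER BY order_date) AS by_date,
       -- order_id tiebreaker plus an explicit ROWS frame: strictly row-by-row
       SUM(order_total) OVER (PARTITION BY customer_id ORDER BY order_date, order_id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_row
FROM order_totals
ORDER BY customer_id, order_date, order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('A', 1, 100.0, 100.0)
# ('A', 2, 175.0, 150.0)
# ('A', 3, 175.0, 175.0)
# ('B', 4, 10.0, 10.0)
```

Whether the default behaviour is what you want depends on the question; adding `order_id` to the window's ORDER BY is one way to make each order's running total distinct.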


#### Reference:

- [SQL Window Functions](https://www.youtube.com/watch?v=rIcB4zMYMas&t=77s)