### Exploring the Northwind database using SQL

The Northwind database was originally created by Microsoft. It simulates a wholesale business called "Northwind Traders" that imports and exports foods worldwide.

In this exercise, I explore the Northwind database using PostgreSQL.

In [2]:
# Import libraries
import pandas as pd
from sqlalchemy import create_engine

In [3]:
# Create database connection
engine = create_engine('postgresql+psycopg2://tharinduabeysinghe:####@localhost/northwind')

# Run a query and load the result into a DataFrame
def execute_sql_query(sql):
    # Load data into a pandas DataFrame
    df = pd.read_sql_query(sql, con=engine)
    return df

In [None]:
# Define your SQL query
sql = "SELECT * FROM categories;"

# Execute query
execute_sql_query(sql)

Unnamed: 0,category_id,category_name,description,picture
0,1,Beverages,"Soft drinks, coffees, teas, beers, and ales",[]
1,2,Condiments,"Sweet and savory sauces, relishes, spreads, an...",[]
2,3,Confections,"Desserts, candies, and sweet breads",[]
3,4,Dairy Products,Cheeses,[]
4,5,Grains/Cereals,"Breads, crackers, pasta, and cereal",[]
5,6,Meat/Poultry,Prepared meats,[]
6,7,Produce,Dried fruit and bean curd,[]
7,8,Seafood,Seaweed and fish,[]


### Subqueries

A subquery is a query nested inside another query. Subqueries are mostly used to add a computed column to the main query's result, to filter rows, or to build a consolidated source from which to select data. A subquery is always written in parentheses and can appear in different places, usually in the SELECT, FROM, or WHERE clause, depending on its purpose. Subqueries can also be nested inside one another, to any practical depth.
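
As a rough illustration of the FROM placement mentioned above, the sketch below uses a subquery as a consolidated source (a derived table). It is a minimal example under stated assumptions: it runs against an in-memory SQLite database from Python's standard library rather than the PostgreSQL connection used in this notebook, and the miniature `order_details` table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for the Northwind order_details table; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER);
    INSERT INTO order_details VALUES
        (1, 10, 5), (1, 11, 3), (2, 10, 7), (3, 12, 2);
""")

# The subquery in FROM produces one row per order (total quantity per order).
# The outer query then averages those per-order totals.
sql = """
    SELECT AVG(order_quantity) AS avg_quantity_per_order
    FROM (SELECT order_id, SUM(quantity) AS order_quantity
          FROM order_details
          GROUP BY order_id) AS per_order
"""
avg_quantity = conn.execute(sql).fetchone()[0]
print(avg_quantity)
conn.close()
```

The alias (`per_order`) on the derived table is written out deliberately, since many database engines require a subquery in FROM to be named.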

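The nested query below pulls the names and total ordered quantities of the 10 most ordered products. The subquery in the SELECT clause is correlated: it is evaluated for each product row of the outer query.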

In [None]:
sql = '''SELECT product_name,
                (SELECT Sum(quantity)
                 FROM public.order_details o
                 WHERE o.product_id = p.product_id) AS product_quantity
         FROM public.products p
         ORDER BY product_quantity DESC
         LIMIT 10'''

# Execute query
execute_sql_query(sql)

Unnamed: 0,product_name,product_quantity
0,Camembert Pierrot,1577
1,Raclette Courdavault,1496
2,Gorgonzola Telino,1397
3,Gnocchi di nonna Alice,1263
4,Pavlova,1158
5,Rhönbräu Klosterbier,1155
6,Guaraná Fantástica,1125
7,Boston Crab Meat,1103
8,Tarte au sucre,1083
9,Chang,1057


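The query below pulls the 10 cities with the largest share of shipped order lines. Because orders is joined to order_details, count(o.order_id) counts order line items rather than distinct orders. The nested query counts all rows in order_details, and the outer query divides each city's count by that total. The result, perc, is a fraction rounded to two decimals, so 0.05 means 5%.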

In [9]:
sql = '''SELECT 
            ship_city,
            ROUND(cast(count(o.order_id) as numeric) / (SELECT count(*) as total_orders FROM order_details), 2) as perc
        FROM orders o
        INNER JOIN order_details d on o.order_id = d.order_id
        GROUP BY 1
        ORDER BY 2 DESC
        LIMIT 10'''
        
# Execute query
execute_sql_query(sql)

Unnamed: 0,ship_city,perc
0,Boise,0.05
1,Graz,0.05
2,Rio de Janeiro,0.04
3,Sao Paulo,0.04
4,London,0.04
5,Cunewalde,0.04
6,México D.F.,0.03
7,Albuquerque,0.03
8,Cork,0.03
9,Brandenburg,0.02


When a subquery is used as a filter for the main query, it is placed in the WHERE clause. The outer query uses operators such as IN, >, and < to filter rows based on the subquery's output.
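
The two filter patterns described above can be sketched as follows. This is only an illustrative example, not a query from this exercise: it uses an in-memory SQLite database from Python's standard library instead of the PostgreSQL connection, and the small `products` and `order_details` tables and their rows are assumptions made up for the demonstration.

```python
import sqlite3

# Hypothetical miniature tables; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, product_name TEXT, unit_price REAL);
    CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'Chai', 18.0), (2, 'Chang', 19.0), (3, 'Ikura', 31.0);
    INSERT INTO order_details VALUES (100, 1, 10), (101, 3, 4);
""")

# Comparison operator with a scalar subquery:
# keep products priced above the average unit price.
above_avg = conn.execute("""
    SELECT product_name
    FROM products
    WHERE unit_price > (SELECT AVG(unit_price) FROM products)
    ORDER BY product_name
""").fetchall()

# IN with a subquery that returns a list of values:
# keep products that appear in at least one order line.
ordered = conn.execute("""
    SELECT product_name
    FROM products
    WHERE product_id IN (SELECT product_id FROM order_details)
    ORDER BY product_name
""").fetchall()

print(above_avg)
print(ordered)
conn.close()
```

With a comparison operator such as > or <, the subquery has to return a single value, while IN accepts a subquery that returns a whole column of values.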

### Common Table Expressions (CTEs)
A CTE is a named temporary result set. CTEs are defined with the WITH keyword and used like a subquery. A CTE can be referenced only within the single statement (SELECT, INSERT, UPDATE, or DELETE) that defines it. It is not stored as a table in the database, and once that statement finishes, the CTE is no longer available.

The query below uses a CTE to pull the order year, unit price, and quantity of each order line, then aggregates total sales per year in the outer query. Discounts are not applied, so the totals are gross amounts.

In [8]:
sql = '''WITH yearlysales
              AS (SELECT Date_part('year', o.order_date) AS orderyear,
                    od.unit_price,
                    od.quantity
                FROM orders o
                LEFT JOIN order_details od
                       ON od.order_id = o.order_id)
         SELECT orderyear,
             Sum(unit_price * quantity) AS TotalSales
         FROM   yearlysales
         GROUP  BY orderyear
         ORDER  BY orderyear; '''

# Execute query
execute_sql_query(sql)

Unnamed: 0,orderyear,totalsales
0,1996.0,226298.50135
1,1997.0,658388.749487
2,1998.0,469771.339604
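
The scoping of a CTE to its own statement, described at the start of this section, can be checked with a small experiment. The sketch below is an assumption-laden illustration: it uses an in-memory SQLite database from Python's standard library rather than PostgreSQL, and the tiny `order_details` table and the CTE name `order_totals` are hypothetical.

```python
import sqlite3

# Hypothetical miniature of order_details; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_id INTEGER, unit_price REAL, quantity INTEGER);
    INSERT INTO order_details VALUES (1, 10.0, 2), (1, 5.0, 1), (2, 4.0, 3);
""")

# Inside the statement that defines it, the CTE behaves like a named subquery.
with_cte = conn.execute("""
    WITH order_totals AS (
        SELECT order_id, SUM(unit_price * quantity) AS total
        FROM order_details
        GROUP BY order_id)
    SELECT order_id, total
    FROM order_totals
    ORDER BY order_id
""").fetchall()

# A later statement cannot see the CTE, because no table with that name exists.
try:
    conn.execute("SELECT * FROM order_totals")
    cte_visible_later = True
except sqlite3.OperationalError:
    cte_visible_later = False

print(with_cte)
print(cte_visible_later)
conn.close()
```

If the intermediate result needs to be reused across several statements, a view or a temporary table is the usual alternative to a CTE.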
