<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/0_Intro/3_Database_Overview.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Contoso Database Overview

### Overview
1. Become familiar with the dataset.
2. Review all of the tables.
3. Investigate the sales table further.

In [43]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# If running in Google Colab, install PostgreSQL and restore the database
if 'google.colab' in sys.modules:
    # Install PostgreSQL
    !sudo apt-get install postgresql -qq > /dev/null 2>&1

    # Start PostgreSQL service (suppress output)
    !sudo service postgresql start > /dev/null 2>&1

    # Set password for the 'postgres' user to avoid authentication errors (suppress output)
    !sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';" > /dev/null 2>&1

    # Create the 'contoso_100k' database (suppress output)
    !sudo -u postgres psql -c "CREATE DATABASE contoso_100k;" > /dev/null 2>&1

    # Download the PostgreSQL .sql dump
    !wget -q -O contoso_100k.sql https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql

    # Restore the dump file into the PostgreSQL database (suppress output)
    !sudo -u postgres psql contoso_100k < contoso_100k.sql > /dev/null 2>&1

    # Replace ipython-sql with jupysql (suppress output)
    !pip uninstall -y ipython-sql > /dev/null 2>&1
    !pip install jupysql > /dev/null 2>&1

# Load the sql extension for SQL magic
%load_ext sql

# Connect to the PostgreSQL database
%sql postgresql://postgres:password@localhost:5432/contoso_100k

# Enable automatic conversion of SQL results to pandas DataFrames
%config SqlMagic.autopandas = True

# Disable named parameters for SQL magic
%config SqlMagic.named_parameters = "disabled"

# Display pandas floats to two decimal places
pd.options.display.float_format = '{:.2f}'.format



## 🏢 Introduction to the Contoso Dataset

In this course, we'll use the **Contoso Dataset**, a comprehensive synthetic dataset that simulates a retail business environment. It's well suited to practicing SQL queries and data analysis techniques.

### 📊 What's in the Contoso Dataset?

We will primarily focus on the **Sales Database**, which includes key business metrics related to transactions and revenue. The dataset contains:

- 🛒 **Sales Transactions** – Detailed records of customer purchases.  
- 📦 **Product Information** – Data on products, categories, and subcategories.  
- 🏢 **Store Details** – Information about different store locations.  
- 📅 **Date and Time Data** – Timestamps to analyze sales trends over time.  

### 🎯 Why Are We Using the Contoso Dataset?

- 🏪 **Realistic Business Data** – Simulates real-world sales transactions, making it ideal for practicing SQL.  
- 📊 **Versatile Analysis Opportunities** – The sales-focused dataset allows us to perform a variety of common analyses expected of data analysts across different industries.  
- 🔢 **Diverse Data Types** – Includes a mix of numerical, categorical, and date-based data, enabling a wide range of SQL operations.  

### 📚 Where Do I Get This Dataset?

For the entirety of the course, we will be using the contents of this downloadable [dataset](https://github.com/lukebarousse/Int_SQL_Data_Analytics_Course/releases/download/v.0.0.0/contoso_100k.sql).
> NOTE: A variety of "Contoso" datasets are available online; however, you'll need to use this one if you want to follow along with the course.

We sourced this dataset from the [Contoso Data Generator V2](https://github.com/sql-bi/Contoso-Data-Generator-V2).  

---

## Database Exploration

### 🩻 ERD Diagram

Below is the Entity Relationship Diagram (ERD) showing how the tables relate to each other. 

**Note**: This is a simplified ERD, so it doesn't show every column.

<img src="../Resources/images/0.3_ERD.png" alt="ERD" style="width: 80%; height: auto;">

### 📊 Tables

There are 6 tables in total:

1. **`currencyexchange`** - Tracks historical exchange rates by currency and date. Necessary for adjusting sales or profit into a common currency.
2. **`date`** - Date dimension with attributes (e.g., year, quarter). Mostly redundant because we're using SQL date functions directly.
3. **`store`** - Information about stores (e.g., location, type). 
4. **`customer`** - Contains customer details (e.g., demographics, region). 
5. **`product`** - Details about products (e.g., category, price). 
6. **`sales`** - Transaction records (e.g., product, quantity, order date). 

Let's take a look at all the tables in the database.

> **TIP: Gemini Chat Help**
>   
> If you're having trouble with a query, you can use the Gemini Chat to help you generate the query.
> <img src="../Resources/images/0.3_gemini.png" width="50%" alt="Gemini Chat">

In [44]:
%%sql

SELECT *
FROM information_schema.tables
WHERE table_schema = 'public';

| | table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | contoso_100k | public | currencyexchange | BASE TABLE | | | | | | YES | NO | |
| 1 | contoso_100k | public | customer | BASE TABLE | | | | | | YES | NO | |
| 2 | contoso_100k | public | sales | BASE TABLE | | | | | | YES | NO | |
| 3 | contoso_100k | public | date | BASE TABLE | | | | | | YES | NO | |
| 4 | contoso_100k | public | product | BASE TABLE | | | | | | YES | NO | |
| 5 | contoso_100k | public | store | BASE TABLE | | | | | | YES | NO | |


> `information_schema` is a system schema that provides metadata about database objects (tables, columns, schemas, constraints, etc.) using standardized SQL views. It allows you to query database structure without accessing system-specific catalogs.
>
> The names after the dot (e.g., `information_schema.tables`, `information_schema.columns`) are system views that expose metadata about different database objects.
>
> Each view contains structured information:
> - `information_schema.tables` → Lists all tables and views.
> - `information_schema.columns` → Lists all columns in all tables.
> - `information_schema.schemata` → Lists all schemas in the database.
> - `information_schema.views` → Lists all views.
> - `information_schema.table_constraints` → Lists constraints (PK, FK, UNIQUE).
>
> Note on `public`: `public` is PostgreSQL's default schema for user-created tables. Filtering on it excludes system schemas such as `pg_catalog` and `information_schema`, which hold PostgreSQL’s internal metadata.
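PostgreSQL exposes this metadata through `information_schema`. The sketch below runs without a PostgreSQL server by using Python's built-in `sqlite3` module instead. SQLite doesn't implement `information_schema`. Its rough equivalents are the `sqlite_master` table and the `PRAGMA table_info` command. The two toy tables and their columns are hypothetical and are not the Contoso schema.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (toy tables, hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (storekey INTEGER PRIMARY KEY, countryname TEXT);
    CREATE TABLE sales (orderkey INTEGER, storekey INTEGER, quantity INTEGER);
""")

# Roughly: SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# Roughly: SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'sales'
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(sales)")]

print(tables)   # ['sales', 'store']
print(columns)  # [('orderkey', 'INTEGER'), ('storekey', 'INTEGER'), ('quantity', 'INTEGER')]
```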

#### Currency Exchange

Tracks historical exchange rates by currency and date. Necessary for adjusting sales or profit into a common currency.

> **Student Note:** The exchange rate is already included in the `sales` table (the `exchangerate` column), so this table is just for reference.
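`sales` already carries an `exchangerate`. The minimal sketch below shows what a lookup against `currencyexchange` would involve: find the rate for a given date and currency pair, then multiply. The rates are the first `currencyexchange` rows for 2015-01-01, rounded to two decimals. The `convert` helper is hypothetical. Reading `exchange` as "one unit of `fromcurrency` equals `exchange` units of `tocurrency`" is an assumption, so check it against the data before relying on it.

```python
# First currencyexchange rows for 2015-01-01 (rounded to two decimals).
# Keys are (date, fromcurrency, tocurrency).
rates = {
    ("2015-01-01", "AUD", "AUD"): 1.00,
    ("2015-01-01", "AUD", "CAD"): 0.95,
    ("2015-01-01", "AUD", "EUR"): 0.67,
    ("2015-01-01", "AUD", "GBP"): 0.53,
    ("2015-01-01", "AUD", "USD"): 0.82,
}

def convert(amount: float, date: str, from_currency: str, to_currency: str) -> float:
    """Hypothetical helper: look up the (date, from, to) rate, then multiply.

    Assumes 1 unit of from_currency = rate units of to_currency.
    """
    return amount * rates[(date, from_currency, to_currency)]

usd = convert(100.0, "2015-01-01", "AUD", "USD")
print(round(usd, 2))  # 82.0
```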

In [45]:
%%sql

SELECT *
FROM currencyexchange
LIMIT 5

| | date | fromcurrency | tocurrency | exchange |
|---|---|---|---|---|
| 0 | 2015-01-01 | AUD | AUD | 1.0 |
| 1 | 2015-01-01 | AUD | CAD | 0.95 |
| 2 | 2015-01-01 | AUD | EUR | 0.67 |
| 3 | 2015-01-01 | AUD | GBP | 0.53 |
| 4 | 2015-01-01 | AUD | USD | 0.82 |


> **REVIEW: `SELECT *` & `LIMIT` Usage**
>
> We avoid running `SELECT *` without `LIMIT` because:
> - It costs **computer resources**. If you're running on the cloud, that means more money. 💰
> - It costs **time**, because larger results take longer to load.
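This sketch shows the effect on a small scale. It uses Python's built-in `sqlite3` module and a hypothetical 10,000-row toy table. Without `LIMIT`, every row is sent back to the client. With `LIMIT 5`, only five rows are. Without an `ORDER BY`, the query doesn't guarantee which five rows you get.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, quantity INTEGER)")
# Hypothetical toy data: 10,000 rows standing in for a large table.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(i, i % 5 + 1) for i in range(10_000)],
)

# No LIMIT: every row is returned to the client.
all_rows = conn.execute("SELECT * FROM sales").fetchall()

# LIMIT 5: only five rows come back, enough to check the columns.
preview = conn.execute("SELECT * FROM sales LIMIT 5").fetchall()

print(len(all_rows))  # 10000
print(len(preview))   # 5
```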

#### Date

Date dimension with attributes (e.g., year, quarter). 

> **Student Note:** We'll be using dates from the `sales` table. This table is just for reference.
> 
> Date tables are useful when the fact table lacks the date attributes you need, or when you want to filter by date attributes (common in tools like Power BI).
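The date table is mostly derived data, so its attributes can be rebuilt from the date itself. This Python sketch reproduces a few of the columns in the sample output. The Sunday = 1 weekday numbering is inferred from the sample rows (2015-01-01 is a Thursday with `dayofweeknumber` 5). Columns whose encodings aren't documented here, such as `yearquarternumber` and `workingdaynumber`, are left out. `strftime("%A")` returns English day names under the default C locale.

```python
from datetime import date, timedelta

def date_row(d: date) -> dict:
    """Build a few date-dimension attributes for one day (subset of columns)."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "date": d.isoformat(),
        "datekey": d.year * 10_000 + d.month * 100 + d.day,  # e.g. 20150101
        "year": d.year,
        "quarter": f"Q{quarter}",
        "yearquarter": f"Q{quarter}-{d.year}",
        "dayofweek": d.strftime("%A"),
        # isoweekday() is Mon=1..Sun=7; remap so Sun=1, Mon=2, ..., Sat=7
        "dayofweeknumber": d.isoweekday() % 7 + 1,
    }

start = date(2015, 1, 1)
rows = [date_row(start + timedelta(days=i)) for i in range(5)]
for r in rows:
    print(r["date"], r["datekey"], r["yearquarter"], r["dayofweek"], r["dayofweeknumber"])
```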

In [46]:
%%sql

SELECT *
FROM date
LIMIT 5

| | date | datekey | year | yearquarter | yearquarternumber | quarter | yearmonth | yearmonthshort | yearmonthnumber | month | monthshort | monthnumber | dayofweek | dayofweekshort | dayofweeknumber | workingday | workingdaynumber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 | 20150101 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Thursday | Thu | 5 | 0 | 0 |
| 1 | 2015-01-02 | 20150102 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Friday | Fri | 6 | 1 | 1 |
| 2 | 2015-01-03 | 20150103 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Saturday | Sat | 7 | 0 | 1 |
| 3 | 2015-01-04 | 20150104 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Sunday | Sun | 1 | 0 | 1 |
| 4 | 2015-01-05 | 20150105 | 2015 | Q1-2015 | 8061 | Q1 | January 2015 | Jan 2015 | 24181 | January | Jan | 1 | Monday | Mon | 2 | 1 | 2 |


#### Store

Information about stores (e.g., location, type). 

In [47]:
%%sql

SELECT *
FROM store
LIMIT 5

| | storekey | storecode | geoareakey | countrycode | countryname | state | opendate | closedate | description | squaremeters | status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 1 | 1 | AU | Australia | Australian Capital Territory | 2008-01-01 | | Contoso Store Australian Capital Territory | 595.0 | |
| 1 | 20 | 2 | 3 | AU | Australia | Northern Territory | 2008-01-12 | 2016-07-07 | Contoso Store Northern Territory | 665.0 | Closed |
| 2 | 30 | 3 | 5 | AU | Australia | South Australia | 2012-01-07 | 2015-08-08 | Contoso Store South Australia | 2000.0 | Restructured |
| 3 | 35 | 3 | 5 | AU | Australia | South Australia | 2015-12-08 | | Contoso Store South Australia | 3000.0 | |
| 4 | 40 | 4 | 6 | AU | Australia | Tasmania | 2010-01-01 | | Contoso Store Tasmania | 2000.0 | |


### Main Tables: `customer`, `product`, `sales`

> NOTE: These tables are all very wide, with many columns.
> 
> Instead of showing the entire table first, let's look at the column names.

First, let's take a look at all the columns in the database.

- We can see all the columns in the database by using the `information_schema.columns` view.

In [48]:
%%sql

SELECT *
FROM information_schema.columns

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | contoso_100k | pg_catalog | pg_replication_slots | xmin | 9 | | YES | xid | | | ... | NO | | | | | | NO | NEVER | | NO |
| 1 | contoso_100k | pg_catalog | pg_replication_slots | catalog_xmin | 10 | | YES | xid | | | ... | NO | | | | | | NO | NEVER | | NO |
| 2 | contoso_100k | pg_catalog | pg_replication_slots | restart_lsn | 11 | | YES | pg_lsn | | | ... | NO | | | | | | NO | NEVER | | NO |
| 3 | contoso_100k | pg_catalog | pg_replication_slots | confirmed_flush_lsn | 12 | | YES | pg_lsn | | | ... | NO | | | | | | NO | NEVER | | NO |
| 4 | contoso_100k | pg_catalog | pg_class | relpages | 10 | | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2151 | contoso_100k | pg_catalog | pg_stat_all_indexes | schemaname | 3 | | YES | name | | | ... | NO | | | | | | NO | NEVER | | NO |
| 2152 | contoso_100k | pg_catalog | pg_statistic_ext | stxname | 3 | | NO | name | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2153 | contoso_100k | pg_catalog | pg_type | typdefaultbin | 30 | | YES | pg_node_tree | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2154 | contoso_100k | pg_catalog | pg_statio_user_tables | relname | 3 | | YES | name | | | ... | NO | | | | | | NO | NEVER | | NO |


#### Customer

Contains customer details (e.g., demographics, region). 

In [49]:
%%sql

SELECT *
FROM information_schema.columns
WHERE table_name = 'customer'

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | contoso_100k | public | customer | customerkey | 1 | | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 1 | contoso_100k | public | customer | geoareakey | 2 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2 | contoso_100k | public | customer | startdt | 3 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 3 | contoso_100k | public | customer | enddt | 4 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 4 | contoso_100k | public | customer | birthday | 18 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 5 | contoso_100k | public | customer | age | 19 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 6 | contoso_100k | public | customer | latitude | 23 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 7 | contoso_100k | public | customer | longitude | 24 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 8 | contoso_100k | public | customer | middleinitial | 9 | | YES | character varying | 5.0 | 20.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 9 | contoso_100k | public | customer | surname | 10 | | YES | character varying | 50.0 | 200.0 | ... | NO | | | | | | NO | NEVER | | YES |


> **REVIEW: `WHERE` Clause**
>
> The `WHERE` clause is used to filter rows based on specific conditions.
>
> In this case, we're filtering the `information_schema.columns` view to return only rows where the `table_name` column equals `'customer'`.

Finally, let's select the columns we want to view from the `customer` table.

In [50]:
%%sql

SELECT
    customerkey,
    continent,
    gender,
    givenname,
    surname,
    countryfull,
    birthday,
    company
FROM
    customer

| | customerkey | continent | gender | givenname | surname | countryfull | birthday | company |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | Australia | male | Julian | McGuigan | Australia | 1965-03-24 | Cut Rite Lawn Care |
| 1 | 23 | Australia | female | Rose | Dash | Australia | 1990-05-10 | Rack N Sack |
| 2 | 36 | Australia | female | Annabelle | Townsend | Australia | 1964-07-16 | id Boutiques |
| 3 | 120 | Australia | male | Jamie | Hetherington | Australia | 1946-12-11 | Showbiz Pizza Place |
| 4 | 180 | Australia | male | Gabriel | Bosanquet | Australia | 1955-04-24 | Dubrow's Cafeteria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104985 | 2099639 | North America | male | Miroslav | Slach | United States | 1945-04-30 | Strength Gurus |
| 104986 | 2099656 | North America | male | Wilfredo | Lozada | United States | 1945-08-24 | Williams Bros. |
| 104987 | 2099697 | North America | male | Phillipp | Maier | United States | 1966-12-08 | Excella |
| 104988 | 2099711 | North America | female | Katerina | Pavlícková | United States | 1941-01-01 | Lawnscape Garden Maintenance |


#### Product

Details about products (e.g., category, price). 

In [51]:
%%sql

SELECT *
FROM information_schema.columns
WHERE table_name = 'product'

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | contoso_100k | public | product | productkey | 1 | | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 1 | contoso_100k | public | product | productcode | 2 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2 | contoso_100k | public | product | weight | 8 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 3 | contoso_100k | public | product | cost | 9 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 4 | contoso_100k | public | product | price | 10 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 5 | contoso_100k | public | product | categorykey | 11 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 6 | contoso_100k | public | product | subcategorykey | 13 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 7 | contoso_100k | public | product | categoryname | 12 | | YES | character varying | 50.0 | 200.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 8 | contoso_100k | public | product | subcategoryname | 14 | | YES | character varying | 50.0 | 200.0 | ... | NO | | | | | | NO | NEVER | | YES |
| 9 | contoso_100k | public | product | productname | 3 | | YES | character varying | 100.0 | 400.0 | ... | NO | | | | | | NO | NEVER | | YES |


In [52]:
%%sql

SELECT
    productkey,
    productcode,
    productname,
    cost,
    price,
    categoryname,
    subcategoryname
FROM
    product
ORDER BY
    productkey

| | productkey | productcode | productname | cost | price | categoryname | subcategoryname |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 101001 | Contoso 512MB MP3 Player E51 Silver | 6.62 | 12.99 | Audio | MP4&MP3 |
| 1 | 2 | 101002 | Contoso 512MB MP3 Player E51 Blue | 6.62 | 12.99 | Audio | MP4&MP3 |
| 2 | 3 | 101003 | Contoso 1G MP3 Player E100 White | 7.40 | 14.52 | Audio | MP4&MP3 |
| 3 | 4 | 101004 | Contoso 2G MP3 Player E200 Silver | 11.00 | 21.57 | Audio | MP4&MP3 |
| 4 | 5 | 101005 | Contoso 2G MP3 Player E200 Red | 11.00 | 21.57 | Audio | MP4&MP3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2512 | 2513 | 505026 | Contoso Bluetooth Active Headphones L15 Red | 43.07 | 129.99 | Cell phones | Cell phones Accessories |
| 2513 | 2514 | 505027 | Contoso Bluetooth Active Headphones L15 White | 43.07 | 129.99 | Cell phones | Cell phones Accessories |
| 2514 | 2515 | 505028 | Contoso In-Line Coupler E180 White | 1.71 | 3.35 | Cell phones | Cell phones Accessories |
| 2515 | 2516 | 505029 | Contoso In-Line Coupler E180 Black | 1.71 | 3.35 | Cell phones | Cell phones Accessories |


#### Sales

Transaction records (e.g., product, quantity, order date). 

In [53]:
%%sql

SELECT *
FROM information_schema.columns
WHERE table_name = 'sales'

| | table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octet_length | ... | is_identity | identity_generation | identity_start | identity_increment | identity_maximum | identity_minimum | identity_cycle | is_generated | generation_expression | is_updatable |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | contoso_100k | public | sales | exchangerate | 13 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 1 | contoso_100k | public | sales | linenumber | 2 | | NO | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 2 | contoso_100k | public | sales | orderdate | 3 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 3 | contoso_100k | public | sales | deliverydate | 4 | | YES | date | | | ... | NO | | | | | | NO | NEVER | | YES |
| 4 | contoso_100k | public | sales | customerkey | 5 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 5 | contoso_100k | public | sales | storekey | 6 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 6 | contoso_100k | public | sales | productkey | 7 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 7 | contoso_100k | public | sales | quantity | 8 | | YES | integer | | | ... | NO | | | | | | NO | NEVER | | YES |
| 8 | contoso_100k | public | sales | unitprice | 9 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |
| 9 | contoso_100k | public | sales | netprice | 10 | | YES | double precision | | | ... | NO | | | | | | NO | NEVER | | YES |


In [54]:
%%sql

SELECT
    orderkey,
    orderdate,
    customerkey,
    storekey,
    productkey,
    quantity,
    unitprice,
    currencycode,
    exchangerate
FROM
    sales

| | orderkey | orderdate | customerkey | storekey | productkey | quantity | unitprice | currencycode | exchangerate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 2015-01-01 | 947009 | 400 | 48 | 1 | 112.46 | GBP | 0.64 |
| 1 | 1000 | 2015-01-01 | 947009 | 400 | 460 | 1 | 749.75 | GBP | 0.64 |
| 2 | 1001 | 2015-01-01 | 1772036 | 430 | 1730 | 2 | 54.38 | USD | 1.00 |
| 3 | 1002 | 2015-01-01 | 1518349 | 660 | 955 | 4 | 315.04 | USD | 1.00 |
| 4 | 1002 | 2015-01-01 | 1518349 | 660 | 62 | 7 | 135.75 | USD | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 199868 | 3398034 | 2024-04-20 | 664396 | 999999 | 1651 | 7 | 159.99 | EUR | 0.94 |
| 199869 | 3398034 | 2024-04-20 | 664396 | 999999 | 1646 | 1 | 159.99 | EUR | 0.94 |
| 199870 | 3398035 | 2024-04-20 | 267690 | 999999 | 1575 | 2 | 60.99 | CAD | 1.38 |
| 199871 | 3398035 | 2024-04-20 | 267690 | 999999 | 415 | 5 | 326.00 | CAD | 1.38 |


---
## Further Investigation

We'll use the `sales` table the most, so let's explore it further. This walkthrough also previews how we'll query the tables throughout the course.

Why? 
- Become familiar with our tables
- Understand common definitions (e.g. net revenue)

### **Steps Breakdown:**

1. **🧮 Calculate `net_revenue`**: Multiply `quantity * netprice * exchangerate` to get actual revenue in USD.  

2. **📅 Filter for Recent Sales**: Use `WHERE` to return only sales from **2020 onwards** (`'YYYY-MM-DD'` format).  

3. **👥 Add Customer Info**: Use `LEFT JOIN` to join the `sales` table (`s`) with `customer` (`c`) to include customer details.  

4. **📦 Add Product Info**: Use `LEFT JOIN` to bring in product details (`productname`, `categoryname`) by joining `product` (`p`) to `sales`.  

5. **📊 Flag Large Orders**: Create a **"High"/"Low"** flag based on `net_revenue > 1000` to easily identify high-value orders. 


### **Steps:**

1. Add in the calculation for `net_revenue`: `quantity * netprice * exchangerate`
- Calculation Explanation: 
    - **Net Revenue**
        - Definition: The total revenue after accounting for discounts, promotions, and adjustments. It's the actual price paid by customers. 
        - We're using net revenue because it's the most accurate representation of sales performance and the revenue the business earns from each transaction.
        - Formula: `netprice` * `quantity`
    - We multiply by `exchangerate` because not every sale is in USD; the `currencycode` column shows each sale's currency.
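Here is a quick worked example of the formula. The sketch computes net revenue for two hypothetical order lines whose values are made up, not taken from the dataset. The `net_revenue` function mirrors the `quantity * netprice * exchangerate` expression used in the queries.

```python
def net_revenue(quantity: int, netprice: float, exchangerate: float) -> float:
    """Mirror of the SQL expression: quantity * netprice * exchangerate."""
    return quantity * netprice * exchangerate

# Hypothetical order lines (not from the dataset).
lines = [
    {"quantity": 2, "netprice": 50.00, "exchangerate": 1.00},  # USD sale: rate of 1.00
    {"quantity": 3, "netprice": 20.00, "exchangerate": 0.50},  # non-USD sale
]

# 2 * 50.00 * 1.00 = 100.0 and 3 * 20.00 * 0.50 = 30.0
total = sum(net_revenue(**line) for line in lines)
print(total)  # 130.0
```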

> **REVIEW: `ORDER BY` Clause** 
>
> The `ORDER BY` clause is used to sort the results of a query by one or more columns.
>
> In this case, we're sorting the results by the `orderdate` column.
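Many rows share the same `orderdate`, and SQL doesn't guarantee the order of rows that tie on every sort key. This `sqlite3` sketch uses hypothetical rows, with SQLite standing in for PostgreSQL. It adds `orderkey` as a second sort key so the output is repeatable. Dates stored as `'YYYY-MM-DD'` text also sort in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, orderdate TEXT, net_revenue REAL)")
# Hypothetical rows, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(3, "2015-01-02", 40.0), (2, "2015-01-01", 10.0), (1, "2015-01-01", 75.0)],
)

# Sort by date first, then break ties on orderkey for a deterministic result.
rows = conn.execute(
    "SELECT orderkey, orderdate FROM sales ORDER BY orderdate, orderkey"
).fetchall()
print(rows)  # [(1, '2015-01-01'), (2, '2015-01-01'), (3, '2015-01-02')]
```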


In [55]:
%%sql

SELECT 
  orderdate,
  quantity * netprice * exchangerate AS net_revenue
FROM sales
ORDER BY  --Added
    orderdate
LIMIT 10

| | orderdate | net_revenue |
|---|---|---|
| 0 | 2015-01-01 | 423.28 |
| 1 | 2015-01-01 | 108.75 |
| 2 | 2015-01-01 | 1146.75 |
| 3 | 2015-01-01 | 950.25 |
| 4 | 2015-01-01 | 1302.91 |
| 5 | 2015-01-01 | 58.73 |
| 6 | 2015-01-01 | 224.98 |
| 7 | 2015-01-01 | 263.11 |
| 8 | 2015-01-01 | 578.52 |
| 9 | 2015-01-01 | 63.49 |


2. Let's take a look at more recent sales. We'll use the `WHERE` clause to return only sales on or after 2020-01-01 (dates are written in `'YYYY-MM-DD'` format).
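The difference between `>=` and `>` matters at the boundary. Several later queries in this notebook filter with `>`, which is why their results start on 2020-01-02. This `sqlite3` sketch shows both operators on three hypothetical rows. It differs from PostgreSQL in one way. PostgreSQL casts the literal to `date`, but here the dates are plain text compared as strings. That works only because `'YYYY-MM-DD'` strings sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "2019-12-31"), (2, "2020-01-01"), (3, "2020-01-02")],  # hypothetical rows
)

# '>=' keeps orders placed on 2020-01-01 itself; '>' drops them.
on_or_after = conn.execute(
    "SELECT orderkey FROM sales WHERE orderdate >= ? ORDER BY orderkey", ("2020-01-01",)
).fetchall()
strictly_after = conn.execute(
    "SELECT orderkey FROM sales WHERE orderdate > ? ORDER BY orderkey", ("2020-01-01",)
).fetchall()

print(on_or_after)     # [(2,), (3,)]
print(strictly_after)  # [(3,)]
```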

In [60]:
%%sql

SELECT 
  orderdate,
  quantity * netprice * exchangerate AS net_revenue
FROM sales
WHERE
  orderdate >= '2020-01-01'
ORDER BY
    orderdate
LIMIT 10

| | orderdate | net_revenue |
|---|---|---|
| 0 | 2020-01-01 | 1396.2 |
| 1 | 2020-01-01 | 601.47 |
| 2 | 2020-01-01 | 50.59 |
| 3 | 2020-01-01 | 2781.54 |
| 4 | 2020-01-01 | 66.12 |
| 5 | 2020-01-01 | 743.04 |
| 6 | 2020-01-01 | 1976.0 |
| 7 | 2020-01-01 | 391.55 |
| 8 | 2020-01-01 | 353.19 |
| 9 | 2020-01-01 | 50.4 |


3. We'll often want to see which customers placed each order. Add customer information by using `LEFT JOIN` to join the `sales` table with the `customer` table. Assign the alias `s` to the `sales` table and `c` to the `customer` table.

> **REVIEW: `JOINS`**
>- `JOIN`s combine rows from two or more tables based on a related column between them.
>   <div style="margin-left: 40px;">
>   <img src="../Resources/images/0.3_joins_overview.png" alt="Left Join" style="width: 40%; height: auto;">
> </div> 
>
>- A `LEFT JOIN` returns all rows from the left table (Table A) and the matching rows from the right table (Table B). Where there's no match, the right table's columns are `NULL`. 
>   <div style="margin-left: 40px;">
>   <img src="../Resources/images/0.3_sql_left_join.png" alt="Left Join" style="width: 40%; height: auto;">
> </div> 
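Here's a minimal `sqlite3` sketch of that behavior using hypothetical toy tables. Order 3 references a customer that doesn't exist. The `LEFT JOIN` keeps that order and fills its customer columns with `NULL` (`None` in Python), while an inner `JOIN` drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER);
    CREATE TABLE customer (customerkey INTEGER, givenname TEXT);
    INSERT INTO sales VALUES (1, 10), (2, 20), (3, 99);
    INSERT INTO customer VALUES (10, 'Ana'), (20, 'Ben');
""")

# LEFT JOIN: every sales row survives; unmatched customer columns are NULL.
rows = conn.execute("""
    SELECT s.orderkey, c.givenname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    ORDER BY s.orderkey
""").fetchall()

# Inner JOIN for comparison: only rows with a match are kept.
inner = conn.execute(
    "SELECT COUNT(*) FROM sales s JOIN customer c ON s.customerkey = c.customerkey"
).fetchone()[0]

print(rows)   # [(1, 'Ana'), (2, 'Ben'), (3, None)]
print(inner)  # 2
```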


In [61]:
%%sql

SELECT 
  s.orderdate,
  s.quantity * s.netprice * s.exchangerate AS net_revenue,
  c.givenname,-- Added
  c.surname,-- Added
  c.countryfull,-- Added
  c.continent-- Added
FROM 
  sales s
LEFT JOIN customer c ON s.customerkey = c.customerkey -- Added
WHERE
    s.orderdate > '2020-01-01'
ORDER BY
    s.orderdate
LIMIT 10

| | orderdate | net_revenue | givenname | surname | countryfull | continent |
|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 8.23 | Chuck | Cecil | United States | North America |
| 1 | 2020-01-02 | 23.74 | Swen | Thalberg | Germany | Europe |
| 2 | 2020-01-02 | 24.07 | Swen | Thalberg | Germany | Europe |
| 3 | 2020-01-02 | 2986.18 | Swen | Thalberg | Germany | Europe |
| 4 | 2020-01-02 | 181.32 | Swen | Thalberg | Germany | Europe |
| 5 | 2020-01-02 | 582.57 | Finlay | Connolly | United Kingdom | Europe |
| 6 | 2020-01-02 | 59.1 | Finlay | Connolly | United Kingdom | Europe |
| 7 | 2020-01-02 | 9.54 | Finlay | Connolly | United Kingdom | Europe |
| 8 | 2020-01-02 | 236.28 | Mary | Rex | United States | North America |
| 9 | 2020-01-02 | 81.38 | Mary | Rex | United States | North America |


4. Next, we want to see the type of product ordered, so we pull product information (e.g., name and category name) from the `product` table. Use a `LEFT JOIN` to join the `product` table to the `sales` table, and assign the alias `p` to the `product` table.

> **TIP: Gemini Explain Code**
>
> If you're having trouble understanding a query, you can use the Gemini Chat to explain the code.
>
> <img src="../Resources/images/0.3_gemini_explain.png" width="50%" alt="Gemini Chat">
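A second `LEFT JOIN` chains the same way. In this hypothetical `sqlite3` sketch, `customerkey` and `productkey` are unique in their tables, so each join adds columns without adding rows. If a key appeared more than once in the right-hand table, the matching sales rows would be duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, productkey INTEGER);
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, givenname TEXT);
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, categoryname TEXT);
    INSERT INTO sales VALUES (1, 10, 100), (2, 10, 200), (3, 20, 100);
    INSERT INTO customer VALUES (10, 'Ana'), (20, 'Ben');
    INSERT INTO product VALUES (100, 'Audio'), (200, 'Computers');
""")

# Two LEFT JOINs off the sales table, as in the query below this step.
rows = conn.execute("""
    SELECT s.orderkey, c.givenname, p.categoryname
    FROM sales s
    LEFT JOIN customer c ON s.customerkey = c.customerkey
    LEFT JOIN product p ON s.productkey = p.productkey
    ORDER BY s.orderkey
""").fetchall()

# Unique keys on the right-hand side: one output row per sales row.
sales_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(rows)                      # [(1, 'Ana', 'Audio'), (2, 'Ana', 'Computers'), (3, 'Ben', 'Audio')]
print(len(rows) == sales_count)  # True
```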

In [62]:
%%sql

SELECT 
  s.orderdate,
  s.quantity * s.netprice * s.exchangerate AS net_revenue,
  c.givenname,
  c.surname,
  c.countryfull,
  c.continent,
  p.productkey,-- Added
  p.productname,-- Added
  p.categoryname,-- Added
  p.subcategoryname-- Added
FROM
    sales s
LEFT JOIN customer c ON s.customerkey = c.customerkey 
LEFT JOIN product p ON s.productkey = p.productkey -- Added
WHERE
    s.orderdate > '2020-01-01'
ORDER BY
    s.orderdate
LIMIT 10

| | orderdate | net_revenue | givenname | surname | countryfull | continent | productkey | productname | categoryname | subcategoryname |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 2405.45 | Nora | Greece | Italy | Europe | 162 | Adventure Works 52" LCD HDTV X790W Black | TV and Video | Televisions |
| 1 | 2020-01-02 | 274.21 | Nora | Greece | Italy | Europe | 458 | WWI Desktop PC1.80 E1800 White | Computers | Desktops |
| 2 | 2020-01-02 | 333.96 | Magnus | Krane | United States | North America | 779 | Contoso Dual USB Power Adapter - power adapter... | Computers | Computers Accessories |
| 3 | 2020-01-02 | 293.22 | Magnus | Krane | United States | North America | 63 | WWI 2GB Spy Video Recorder Pen M300 Blue | Audio | Recording Pen |
| 4 | 2020-01-02 | 635.01 | Magnus | Krane | United States | North America | 1135 | Fabrikam SLR Camera 35" M358 Blue | Cameras and camcorders | Digital SLR Cameras |
| 5 | 2020-01-02 | 30.06 | Magnus | Krane | United States | North America | 77 | NT Bluetooth Active Headphones E202 Silver | Audio | Bluetooth Headphones |
| 6 | 2020-01-02 | 20.39 | Samson | Mathilda | Netherlands | Europe | 1663 | MGS Hand Games men M300 Yellow | Games and Toys | Boxed Games |
| 7 | 2020-01-02 | 9.89 | Filomena | Trevisan | Italy | Europe | 1680 | MGS Hand Games for students E400 Silver | Games and Toys | Boxed Games |
| 8 | 2020-01-02 | 1911.0 | Brent | Osburn | United States | North America | 1415 | The Phone Company Touch Screen Phones Infrared... | Cell phones | Touch Screen Phones |
| 9 | 2020-01-02 | 1542.84 | Brent | Osburn | United States | North America | 439 | WWI Desktop PC2.30 M2300 Brown | Computers | Desktops |


5. Finally, let's add a condition that flags whether `net_revenue` is above 1000: if it is, assign "HIGH"; if not, assign "LOW". This helps us quickly identify high-value orders.

> **REVIEW: `CASE WHEN` Statement**
>
> The `CASE WHEN` statement is used to create conditional logic in SQL.
>
> Here, `CASE WHEN` returns 'HIGH' when the net revenue expression is above 1000 and falls through to `ELSE` ('LOW') otherwise.
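A small `sqlite3` sketch with hypothetical rows shows how the flag behaves at the threshold. The condition uses a strict `>`, so an order worth exactly 1000 is labeled 'LOW'. As in the notebook's query, the revenue expression is repeated inside `CASE` rather than referencing the `net_revenue` alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (orderkey INTEGER, quantity INTEGER, netprice REAL, exchangerate REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, 1, 999.0, 1.0), (2, 2, 600.0, 1.0), (3, 1, 1000.0, 1.0)],  # hypothetical rows
)

rows = conn.execute("""
    SELECT
        orderkey,
        quantity * netprice * exchangerate AS net_revenue,
        CASE WHEN quantity * netprice * exchangerate > 1000 THEN 'HIGH' ELSE 'LOW' END AS high_low
    FROM sales
    ORDER BY orderkey
""").fetchall()
print(rows)  # [(1, 999.0, 'LOW'), (2, 1200.0, 'HIGH'), (3, 1000.0, 'LOW')]
```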

In [63]:
%%sql

SELECT 
  s.orderdate,
  s.quantity * s.netprice * s.exchangerate AS net_revenue,
  c.givenname,
  c.surname,
  c.countryfull,
  c.continent,
  p.productkey,
  p.productname,
  p.categoryname,
  p.subcategoryname,
  CASE WHEN s.quantity * s.netprice * s.exchangerate > 1000 THEN 'HIGH' ELSE 'LOW' END AS high_low
FROM 
  sales s
LEFT JOIN customer c ON s.customerkey = c.customerkey
LEFT JOIN product p ON s.productkey = p.productkey
WHERE
  orderdate >= '2020-01-01'
ORDER BY
    s.orderdate

| | orderdate | net_revenue | givenname | surname | countryfull | continent | productkey | productname | categoryname | subcategoryname | high_low |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 | 50.40 | Steven | Short | United States | North America | 1744 | MGS Zoo Tycoon Marine Mania E120 | Games and Toys | Download Games | LOW |
| 1 | 2020-01-01 | 1396.20 | Steven | Short | United States | North America | 1091 | Contoso SLR Camera 35" M358 Silver Grey | Cameras and camcorders | Digital SLR Cameras | HIGH |
| 2 | 2020-01-01 | 601.47 | Steven | Short | United States | North America | 1360 | Contoso In front of Centrex L15 White | Cell phones | Home & Office Phones | LOW |
| 3 | 2020-01-01 | 50.59 | Laura | Bailey | United Kingdom | Europe | 2501 | Contoso Phone Tough Skin Case E140 Pink | Cell phones | Cell phones Accessories | LOW |
| 4 | 2020-01-01 | 2781.54 | Robert | Baader | Germany | Europe | 1478 | The Phone Company Smart phones Expert M400 Black | Cell phones | Smart phones & PDAs | HIGH |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124446 | 2024-04-20 | 914.61 | Karlotta | Rivière | France | Europe | 1651 | Contoso DVD 9-Inch Player Portable M300 Silver | Music, Movies and Audio Books | Movie DVD | LOW |
| 124447 | 2024-04-20 | 150.18 | Karlotta | Rivière | France | Europe | 1646 | Contoso DVD 9-Inch Player Portable M300 Black | Music, Movies and Audio Books | Movie DVD | LOW |
| 124448 | 2024-04-20 | 147.78 | Michael | Wilson | Canada | North America | 1575 | SV DVD Player M140 Gold | Music, Movies and Audio Books | Movie DVD | LOW |
| 124449 | 2024-04-20 | 2019.62 | Michael | Wilson | Canada | North America | 415 | Proseware Laptop8.9 E089 White | Computers | Laptops | HIGH |
