## Exercises - Basic SQL Queries

Here are some exercises for which you can write SQL queries to evaluate yourself.

* Ensure that the required database and user for retail data exist. **We might provide the database as part of our labs.** Here are the instructions to use `psql` to set up the database (if required) and the tables.

```shell
psql -U postgres -h localhost -p 5432 -W
```

```sql
CREATE DATABASE itversity_retail_db;
CREATE USER itversity_retail_user WITH ENCRYPTED PASSWORD 'retail_password';
GRANT ALL ON DATABASE itversity_retail_db TO itversity_retail_user;
```

* Create Tables using the script provided. You can either use `psql` or **SQL Workbench**.

```shell
psql -U itversity_retail_user \
  -h localhost \
  -p 5432 \
  -d itversity_retail_db \
  -W
```

* You can drop the existing tables.

```sql
DROP TABLE order_items;
DROP TABLE orders;
DROP TABLE customers;
DROP TABLE products;
DROP TABLE categories;
DROP TABLE departments;
```

* Once the tables are dropped, you can run the script below to create the tables used in the exercises.

```sql
\i /data/retail_db/create_db_tables_pg.sql
```

* Data shall be loaded using the script provided.

```sql
\i /data/retail_db/load_db_tables_pg.sql
```

* Run queries to validate that we have data in all six tables.

In [1]:
%load_ext sql

In [2]:
%env DATABASE_URL=postgresql://retail_user:retail_password@localhost:5432/retail_db

env: DATABASE_URL=postgresql://retail_user:retail_password@localhost:5432/retail_db


### Exercise 1 - Customer order count

Get order count per customer for the month of 2014 January.
* Tables - orders and customers
* Data should be sorted in descending order by count and ascending order by customer id.
* Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.

In [4]:
%%sql
select * from orders limit 3;

 * postgresql://retail_user:***@localhost:5432/retail_db
3 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED
2,2013-07-25 00:00:00,256,PENDING_PAYMENT
3,2013-07-25 00:00:00,12111,COMPLETE


In [5]:
%sql select * from customers limit 3;

 * postgresql://retail_user:***@localhost:5432/retail_db
3 rows affected.


customer_id,customer_fname,customer_lname,customer_email,customer_password,customer_street,customer_city,customer_state,customer_zipcode
1,Richard,Hernandez,XXXXXXXXX,XXXXXXXXX,6303 Heather Plaza,Brownsville,TX,78521
2,Mary,Barrett,XXXXXXXXX,XXXXXXXXX,9526 Noble Embers Ridge,Littleton,CO,80126
3,Ann,Smith,XXXXXXXXX,XXXXXXXXX,3422 Blue Pioneer Bend,Caguas,PR,725


In [195]:
%%sql

-- An inner join suffices here: the WHERE filter on o.order_date
-- would discard any unmatched (NULL-extended) rows anyway.
SELECT c.customer_id,
       c.customer_fname,
       c.customer_lname,
       COUNT(o.order_id) AS customer_order_count
FROM customers c
JOIN orders o ON c.customer_id = o.order_customer_id
WHERE to_char(o.order_date, 'yyyy-MM') = '2014-01'
GROUP BY c.customer_id,
         c.customer_fname,
         c.customer_lname
ORDER BY customer_order_count DESC,
         c.customer_id
LIMIT 30;

 * postgresql://retail_user:***@localhost:5432/retail_db
30 rows affected.


customer_id,customer_fname,customer_lname,customer_order_count
8622,Shirley,Smith,5
9676,Theresa,Smith,5
7,Melissa,Wilcox,4
222,Frank,Ruiz,4
2444,Kenneth,Smith,4
2485,Mary,Hernandez,4
2555,Mary,Long,4
3128,Karen,Turner,4
3199,Ashley,Hernandez,4
3610,Jordan,Smith,4
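
The month filter can also be written as a half-open date range on the raw column, which avoids wrapping `order_date` in `to_char`. The following is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` with a few invented toy rows in place of the PostgreSQL `retail_db`, and the variable names `order_count_sql` and `order_counts` are made up for illustration. The SQL itself uses only constructs that PostgreSQL also accepts.

```python
import sqlite3

# Toy rows invented for illustration; they are not taken from retail_db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_fname TEXT, customer_lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     order_customer_id INTEGER, order_status TEXT);
INSERT INTO customers VALUES
  (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe');
INSERT INTO orders VALUES
  (10, '2014-01-05 00:00:00', 2, 'COMPLETE'),
  (11, '2014-01-20 00:00:00', 2, 'CLOSED'),
  (12, '2014-01-31 00:00:00', 1, 'PENDING'),
  (13, '2014-02-01 00:00:00', 1, 'COMPLETE'),
  (14, '2013-12-31 00:00:00', 3, 'COMPLETE');
""")

# Half-open range: from 2014-01-01 (inclusive) to 2014-02-01 (exclusive).
order_count_sql = """
SELECT c.customer_id, c.customer_fname, c.customer_lname,
       COUNT(o.order_id) AS customer_order_count
FROM customers c
JOIN orders o ON c.customer_id = o.order_customer_id
WHERE o.order_date >= '2014-01-01'
  AND o.order_date < '2014-02-01'
GROUP BY c.customer_id, c.customer_fname, c.customer_lname
ORDER BY customer_order_count DESC, c.customer_id
"""
order_counts = conn.execute(order_count_sql).fetchall()
print(order_counts)
```

Only customers with at least one January 2014 order appear, sorted by count descending and then by customer id. In PostgreSQL, a range predicate on the bare column also leaves room for an index on `order_date` to be used, which a `to_char(...)` expression would normally prevent.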


### Exercise 2 - Dormant Customers

Get the details of customers who have not placed any order in January 2014.
* Tables - orders and customers
* Data should be sorted in ascending order by customer_id
* Output should contain all the fields from customers

***********************************************************************************

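There are 30 dormant customers over the whole period, but none for 2014-01 when the date filter is applied in the `WHERE` clause: a `WHERE` condition on `o.order_date` discards the unmatched customer rows that the outer join preserved.

Yet it still works in this example? https://www.postgresqltutorial.com/postgresql-full-outer-join/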



In [202]:
%%sql

SELECT c.customer_id,
       c.customer_fname,
       c.customer_lname,
       COUNT(o.order_id) AS customer_order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.order_customer_id
-- WHERE to_char(o.order_date, 'yyyy-MM') ~ '2014-01'
GROUP BY (c.customer_id,
          c.customer_fname,
          c.customer_lname)
HAVING COUNT(o.order_id) < 1
ORDER BY customer_id
LIMIT 250;

 * postgresql://retail_user:***@localhost:5432/retail_db
30 rows affected.


customer_id,customer_fname,customer_lname,customer_order_count
219,Mary,Harrell,0
339,Mary,Greene,0
469,Randy,Smith,0
1187,Dorothy,Vazquez,0
1481,Grace,Smith,0
1808,Albert,Ellison,0
2073,Donna,Stephens,0
2096,Jose,Tanner,0
2450,James,Smith,0
4555,Mary,Smith,0
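
One way to restrict the dormant-customer check to January 2014 is to move the date condition out of `WHERE` and into the join condition, or to use `NOT EXISTS`. The sketch below is illustrative only: it runs on Python's built-in `sqlite3` with invented toy rows and a trimmed `customers` table (the real one has more columns), and the names `anti_join_sql`, `not_exists_sql`, `dormant_left` and `dormant_not_exists` are hypothetical. The SQL is written so that PostgreSQL should accept it unchanged.

```python
import sqlite3

# Toy rows invented for illustration; the real customers table has more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_fname TEXT, customer_lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     order_customer_id INTEGER, order_status TEXT);
INSERT INTO customers VALUES
  (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe'), (4, 'Di', 'Fox');
INSERT INTO orders VALUES
  (10, '2014-01-05 00:00:00', 1, 'COMPLETE'),
  (11, '2014-02-03 00:00:00', 2, 'COMPLETE'),
  (12, '2013-12-30 00:00:00', 3, 'CLOSED'),
  (13, '2014-01-15 00:00:00', 3, 'PENDING');
""")

# Variant 1: the date filter lives in ON, so customers without a January
# order survive the LEFT JOIN as NULL-extended rows and are kept by IS NULL.
anti_join_sql = """
SELECT c.*
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.order_customer_id
 AND o.order_date >= '2014-01-01'
 AND o.order_date < '2014-02-01'
WHERE o.order_id IS NULL
ORDER BY c.customer_id
"""

# Variant 2: the same question phrased with NOT EXISTS.
not_exists_sql = """
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.order_customer_id = c.customer_id
      AND o.order_date >= '2014-01-01'
      AND o.order_date < '2014-02-01'
)
ORDER BY c.customer_id
"""

dormant_left = conn.execute(anti_join_sql).fetchall()
dormant_not_exists = conn.execute(not_exists_sql).fetchall()
print(dormant_left)
print(dormant_not_exists)
```

In the toy data, customer 2 ordered only in February and customer 4 never ordered, so both variants return exactly those two customers. Customer 3 is excluded because one of their orders falls in January.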


In [222]:
%%sql
SELECT *
FROM order_items
WHERE order_item_subtotal < 50
LIMIT 10;

 * postgresql://retail_user:***@localhost:5432/retail_db
10 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
5,4,897,2,49.98,24.99
27,10,917,1,21.99,21.99
31,11,1014,1,49.98,49.98
106,42,627,1,39.99,39.99
119,48,1014,1,49.98,49.98
120,48,886,2,49.98,24.99
137,58,775,2,19.98,9.99
174,71,627,1,39.99,39.99
214,94,1014,1,49.98,49.98
248,110,278,1,44.99,44.99


### Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for the month of 2014 January
* Tables - orders, order_items and customers
* Data should be sorted in descending order by revenue and then ascending order by customer_id
* Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.
* If a customer has placed no orders, the revenue for that customer should be 0.
* Consider only COMPLETE and CLOSED orders

***********************************************************************************************************

Not handled yet: the requirement that revenue should be 0 when a customer has no orders.

So far the result does not include the customers whose order count is 0 over the entire time range.

Luckily, if the query is right, 2014-01 would not contain any customer with an order count of 0 either.

Counting by order_item_order_id gives 359 customers with no orders in 2014-01 (??), but the revenue for these customers has not yet been turned into 0.

In [246]:
%%sql

SELECT c.customer_id,
       c.customer_fname,
       c.customer_lname,
       -- COUNT(o.order_id),
       COUNT(oi.order_item_order_id),
       SUM(oi.order_item_subtotal) AS customer_revenue
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.order_customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_item_order_id
WHERE to_char(o.order_date, 'yyyy-MM') = '2014-01'
  AND o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY c.customer_id,
         c.customer_fname,
         c.customer_lname
-- HAVING COUNT(oi.order_item_order_id) < 1
HAVING SUM(oi.order_item_subtotal) IS NOT NULL
ORDER BY customer_revenue DESC,
         c.customer_id
-- LIMIT 1000;
LIMIT 20;

 * postgresql://retail_user:***@localhost:5432/retail_db
20 rows affected.


customer_id,customer_fname,customer_lname,count,customer_revenue
2555,Mary,Long,13,2954.630000000001
3465,Mary,Gardner,10,2929.74
3710,Ashley,Smith,12,2739.82
1780,Larry,Sharp,11,2689.65
986,Catherine,Hawkins,5,2629.9
9676,Theresa,Smith,12,2599.84
1847,Mary,Smith,8,2589.87
11901,Mary,Smith,10,2469.8700000000003
4618,Andrea,Smith,8,2429.82
10896,Victoria,Smith,11,2419.78
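
The requirement that customers without qualifying orders show a revenue of 0 can be approached by placing the date and status conditions in the `ON` clause of the `LEFT JOIN` and wrapping the sum in `COALESCE`. This is a minimal sketch, not the reference solution: it assumes Python's built-in `sqlite3` with invented toy rows and trimmed tables, and the names `revenue_sql` and `revenue_rows` are hypothetical. The SQL avoids engine-specific functions so that it should carry over to PostgreSQL.

```python
import sqlite3

# Toy rows invented for illustration; tables are trimmed to the columns used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_fname TEXT, customer_lname TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     order_customer_id INTEGER, order_status TEXT);
CREATE TABLE order_items (order_item_id INTEGER PRIMARY KEY,
                          order_item_order_id INTEGER,
                          order_item_subtotal REAL);
INSERT INTO customers VALUES
  (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe'), (4, 'Di', 'Fox');
INSERT INTO orders VALUES
  (10, '2014-01-05 00:00:00', 1, 'COMPLETE'),
  (11, '2014-01-09 00:00:00', 1, 'PENDING_PAYMENT'),
  (12, '2014-02-02 00:00:00', 2, 'COMPLETE'),
  (13, '2014-01-20 00:00:00', 4, 'CLOSED');
INSERT INTO order_items VALUES
  (1, 10, 100.0), (2, 10, 50.0), (3, 11, 999.0), (4, 12, 70.0), (5, 13, 25.5);
""")

# Filters sit in ON rather than WHERE, so every customer keeps at least one
# row; COALESCE turns the NULL sum of unmatched customers into 0.
revenue_sql = """
SELECT c.customer_id, c.customer_fname, c.customer_lname,
       COALESCE(SUM(oi.order_item_subtotal), 0) AS customer_revenue
FROM customers c
LEFT JOIN orders o
  ON c.customer_id = o.order_customer_id
 AND o.order_date >= '2014-01-01'
 AND o.order_date < '2014-02-01'
 AND o.order_status IN ('COMPLETE', 'CLOSED')
LEFT JOIN order_items oi ON o.order_id = oi.order_item_order_id
GROUP BY c.customer_id, c.customer_fname, c.customer_lname
ORDER BY customer_revenue DESC, c.customer_id
"""
revenue_rows = conn.execute(revenue_sql).fetchall()
print(revenue_rows)
```

In the toy data, the pending order of customer 1 and the February order of customer 2 contribute nothing, while customers 2 and 3 still appear with a revenue of 0. In PostgreSQL, floating-point noise such as `2954.630000000001` could additionally be trimmed with something like `round(SUM(oi.order_item_subtotal)::numeric, 2)`; that step is left out here because it is not portable to SQLite.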


### Exercise 4 - Revenue Per Category

Get the revenue generated for each category for the month of 2014 January
* Tables - orders, order_items, products and categories
* Data should be sorted in ascending order by category_id.
* Output should contain all the fields from category along with the revenue as category_revenue.
* Consider only COMPLETE and CLOSED orders

We need orders for the order date and status, order_items for the subtotal (which gives the revenue), categories for the category id, department id and category name, and products to link order items to categories.

In [21]:
%%sql
-- 58 unique category ids
select * from categories limit 1;

 * postgresql://retail_user:***@localhost:5432/retail_db
1 rows affected.


category_id,category_department_id,category_name
1,2,Football


In [27]:
%%sql
-- 55 unique category ids? --> 3 category ids generate no revenue --> the SUM is None; how can it be turned into 0?
select * from products limit 1;

 * postgresql://retail_user:***@localhost:5432/retail_db
1 rows affected.


product_id,product_category_id,product_name,product_description,product_price,product_image
1,2,Quest Q64 10 FT. x 10 FT. Slant Leg Instant U,,59.98,http://images.acmesports.sports/Quest+Q64+10+FT.+x+10+FT.+Slant+Leg+Instant+Up+Canopy


In [6]:
%%sql
select * from orders limit 1;

 * postgresql://retail_user:***@localhost:5432/retail_db
1 rows affected.


order_id,order_date,order_customer_id,order_status
1,2013-07-25 00:00:00,11599,CLOSED


In [8]:
%%sql
select * from order_items limit 1;

 * postgresql://retail_user:***@localhost:5432/retail_db
1 rows affected.


order_item_id,order_item_order_id,order_item_product_id,order_item_quantity,order_item_subtotal,order_item_product_price
1,1,957,1,299.98,299.98


In [49]:
%%sql
SELECT c.category_id,
       c.category_department_id,
       c.category_name,
       SUM(oi.order_item_subtotal) AS category_revenue,
       COUNT(o.order_id) AS category_order_count
FROM categories c
LEFT JOIN products p ON c.category_id = p.product_category_id
JOIN order_items oi ON p.product_id = oi.order_item_product_id
JOIN orders o ON oi.order_item_order_id = o.order_id
WHERE to_char(o.order_date, 'yyyy-MM') ~ '2014-01'
  AND o.order_status IN ('COMPLETE',
                         'CLOSED')
GROUP BY (c.category_id,
          c.category_department_id,
          c.category_name)
ORDER BY c.category_id
LIMIT 35;

 * postgresql://retail_user:***@localhost:5432/retail_db
33 rows affected.


category_id,category_department_id,category_name,category_revenue,category_order_count
2,2,Soccer,1094.88,7
3,2,Baseball & Softball,3214.409999999999,20
4,2,Basketball,1299.98,2
5,2,Lacrosse,1299.69,9
6,2,Tennis & Racquet,1124.75,11
7,2,Hockey,1433.0,21
9,3,Cardio Equipment,133156.7700000003,460
10,3,Strength Training,3388.96,5
11,3,Fitness Accessories,1509.73,12
12,3,Boxing & MMA,3998.460000000001,19
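
The question raised in the notes for this exercise (how to show 0 instead of an empty sum for categories without revenue) can be handled by filtering orders inside a derived table and left-joining that table to the categories. The sketch below is one possible approach under stated assumptions: it uses Python's built-in `sqlite3` with invented toy rows and trimmed tables, and the names `jan`, `category_sql` and `category_rows` are hypothetical.

```python
import sqlite3

# Toy rows invented for illustration; tables are trimmed to the columns used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY,
                         category_department_id INTEGER, category_name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY,
                       product_category_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     order_status TEXT);
CREATE TABLE order_items (order_item_id INTEGER PRIMARY KEY,
                          order_item_order_id INTEGER,
                          order_item_product_id INTEGER,
                          order_item_subtotal REAL);
INSERT INTO categories VALUES (1, 2, 'Football'), (2, 2, 'Soccer'), (3, 3, 'Golf');
INSERT INTO products VALUES (100, 1), (200, 2), (201, 2);
INSERT INTO orders VALUES
  (1, '2014-01-10 00:00:00', 'COMPLETE'),
  (2, '2014-01-12 00:00:00', 'CANCELED'),
  (3, '2014-02-01 00:00:00', 'CLOSED');
INSERT INTO order_items VALUES
  (1, 1, 200, 40.0), (2, 1, 201, 10.0), (3, 2, 100, 99.0), (4, 3, 100, 60.0);
""")

# The derived table "jan" keeps only items of COMPLETE/CLOSED January 2014
# orders; LEFT JOINs from categories keep categories with no such items.
category_sql = """
SELECT c.category_id, c.category_department_id, c.category_name,
       COALESCE(SUM(jan.order_item_subtotal), 0) AS category_revenue
FROM categories c
LEFT JOIN products p ON c.category_id = p.product_category_id
LEFT JOIN (
    SELECT oi.order_item_product_id, oi.order_item_subtotal
    FROM order_items oi
    JOIN orders o ON oi.order_item_order_id = o.order_id
    WHERE o.order_date >= '2014-01-01'
      AND o.order_date < '2014-02-01'
      AND o.order_status IN ('COMPLETE', 'CLOSED')
) jan ON p.product_id = jan.order_item_product_id
GROUP BY c.category_id, c.category_department_id, c.category_name
ORDER BY c.category_id
"""
category_rows = conn.execute(category_sql).fetchall()
print(category_rows)
```

Filtering inside the derived table matters: if the order conditions were simply added to the `ON` clause of a `LEFT JOIN orders`, items from cancelled or out-of-range orders would stay in the result and still be summed. Here, the Football items belong to a cancelled order and a February order, so Football reports 0, as does Golf, which has no products at all.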


### Exercise 5 - Product Count Per Department

Get the product count for each department.
* Tables - departments, categories, products
* Data should be sorted in ascending order by department_id
* Output should contain all the fields from department and the product count as product_count

In [45]:
%%sql
select * from departments limit 100;

 * postgresql://retail_user:***@localhost:5432/retail_db
6 rows affected.


department_id,department_name
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop


In [48]:
%%sql

SELECT d.department_id,
       d.department_name,
       COUNT(p.product_id) AS product_count
FROM departments d
LEFT JOIN categories c ON d.department_id = c.category_department_id
LEFT JOIN products p ON c.category_id = p.product_category_id
GROUP BY d.department_id,
         d.department_name
ORDER BY d.department_id;

 * postgresql://retail_user:***@localhost:5432/retail_db
6 rows affected.


department_id,department_name,product_count
2,Fitness,168
3,Footwear,168
4,Apparel,140
5,Golf,120
6,Outdoors,336
7,Fan Shop,149
