# Practice 5 - Products and revenue value
This notebook uses four tables from the Olist database: olist_orders_dataset (renamed to orders), olist_order_items_dataset (renamed to order_items), olist_products_dataset (renamed to products), and product_category_name_translation (renamed to translation).<br>
It answers three questions:

1. **What is the total revenue each year?**
2. **Which product category brings in the highest revenue each year?**
3. **Which product category has the highest number of canceled orders each year?**

## Load the SQL extension and connect to the database

In [1]:
%load_ext sql
%sql mysql+mysqlconnector://root:***@localhost/olist

'Connected: root@olist'

### Relational schema
<img src="files/photos/P5.png">

## SQL queries
### Total revenue

To find the total revenue each year, I only consider delivered orders, since they are completed and normally fully paid. The revenue of an order is the sum of its item prices plus freight values from order_items.

In [2]:
%%sql
SELECT YEAR(order_purchase_timestamp) AS year,
       SUM(revenue) AS total_revenue
FROM (SELECT order_id,
             ROUND(SUM(price) + SUM(freight_value),2) AS revenue
      FROM order_items
      GROUP BY order_id) a
JOIN orders o
ON a.order_id = o.order_id
WHERE order_status = 'delivered'
GROUP BY year
ORDER BY year;

 * mysql+mysqlconnector://root:***@localhost/olist
3 rows affected.


| year | total_revenue |
|---|---|
| 2016 | 46653.74 |
| 2017 | 6921535.24 |
| 2018 | 8451584.77 |
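
The two-step aggregation above (sum price plus freight per order, then sum the delivered orders per year) can be checked on a handful of rows. This is a minimal sketch using Python's built-in sqlite3 module and hypothetical toy rows, not the Olist data; SQLite has no YEAR(), so strftime('%Y', ...) is used as a stand-in.

```python
import sqlite3

# Hypothetical toy rows that mimic the columns used above; not real Olist data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, order_status TEXT, order_purchase_timestamp TEXT);
CREATE TABLE order_items (order_id TEXT, price REAL, freight_value REAL);
INSERT INTO orders VALUES
  ('o1', 'delivered', '2017-03-01 10:00:00'),
  ('o2', 'delivered', '2017-07-15 09:30:00'),
  ('o3', 'canceled',  '2017-08-02 12:00:00'),
  ('o4', 'delivered', '2018-01-20 16:45:00');
INSERT INTO order_items VALUES
  ('o1', 100.0, 10.0), ('o1', 50.0, 5.0),  -- two items in one order
  ('o2', 20.0, 2.5),
  ('o3', 999.0, 0.0),                       -- canceled, so it must be excluded
  ('o4', 80.0, 8.0);
""")

# Step 1: revenue per order (price + freight). Step 2: sum delivered orders per year.
rows = conn.execute("""
SELECT CAST(strftime('%Y', o.order_purchase_timestamp) AS INTEGER) AS year,
       SUM(a.revenue) AS total_revenue
FROM (SELECT order_id, ROUND(SUM(price) + SUM(freight_value), 2) AS revenue
      FROM order_items
      GROUP BY order_id) a
JOIN orders o ON a.order_id = o.order_id
WHERE o.order_status = 'delivered'
GROUP BY year
ORDER BY year
""").fetchall()
print(rows)  # [(2017, 187.5), (2018, 88.0)]
```

Order o1 contributes 165.0 and o2 contributes 22.5 to 2017, while the canceled o3 is dropped by the status filter.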


Assuming that Olist started in 2016, I also want to know the year-over-year percentage change in total revenue. Since 2016 has no previous year to compare against, COALESCE reports its change as 0.

In [3]:
%%sql
SELECT t2.*,
       COALESCE(ROUND((t2.total_revenue - t1.total_revenue)/
                      t1.total_revenue*100,2),0)
       AS percentage_change
FROM (SELECT YEAR(order_purchase_timestamp) AS year,
             SUM(revenue) AS total_revenue
      FROM (SELECT order_id,
                   ROUND(SUM(price) + SUM(freight_value),2)
                   AS revenue
            FROM order_items
            GROUP BY order_id) a
      JOIN orders o
      ON a.order_id = o.order_id
      WHERE order_status = 'delivered'
      GROUP BY year) t1
RIGHT JOIN (SELECT YEAR(order_purchase_timestamp) AS year,
                  SUM(revenue) AS total_revenue
           FROM (SELECT order_id,
                        ROUND(SUM(price) + SUM(freight_value),2)
                        AS revenue
                 FROM order_items
                 GROUP BY order_id) a
           JOIN orders o
           ON a.order_id = o.order_id
           WHERE order_status = 'delivered'
           GROUP BY year) t2
ON t2.year = t1.year + 1
ORDER BY t2.year;

 * mysql+mysqlconnector://root:***@localhost/olist
3 rows affected.


| year | total_revenue | percentage_change |
|---|---|---|
| 2016 | 46653.74 | 0.0 |
| 2017 | 6921535.24 | 14735.97 |
| 2018 | 8451584.77 | 22.11 |
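
The self-join on t2.year = t1.year + 1 repeats the same subquery twice. Since the notebook already uses RANK() OVER, the server supports window functions, and LAG() can read the previous year's total directly. The sketch below shows this alternative with Python's sqlite3 module (window functions need SQLite 3.25 or newer) on hypothetical yearly totals; it is not the author's query. One design difference: LAG() uses the previous row even if a year were missing, whereas the self-join only matches consecutive years.

```python
import sqlite3

# Hypothetical yearly totals standing in for the output of the revenue query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearly (year INTEGER, total_revenue REAL)")
conn.executemany("INSERT INTO yearly VALUES (?, ?)",
                 [(2016, 100.0), (2017, 250.0), (2018, 300.0)])

# LAG() fetches the previous year's total, replacing the self-join.
# COALESCE turns the first year's NULL (no previous row) into 0, as in the notebook.
rows = conn.execute("""
SELECT year, total_revenue,
       COALESCE(ROUND((total_revenue - LAG(total_revenue) OVER (ORDER BY year))
                      / LAG(total_revenue) OVER (ORDER BY year) * 100, 2), 0)
       AS percentage_change
FROM yearly
ORDER BY year
""").fetchall()
print(rows)  # [(2016, 100.0, 0), (2017, 250.0, 150.0), (2018, 300.0, 20.0)]
```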


The following query ranks sellers by their total revenue from delivered orders and keeps the top 3 in each year:

In [4]:
%%sql
SELECT year, seller_id, total_revenue, value_rank
FROM (SELECT year, seller_id,
             SUM(t1.revenue) AS total_revenue,
             RANK() OVER (PARTITION BY year ORDER BY SUM(t1.revenue) DESC)
             AS value_rank
      FROM (SELECT order_id, YEAR(order_purchase_timestamp) AS year
            FROM orders
            WHERE order_status = 'delivered') o
      JOIN (SELECT seller_id, order_id,
                   ROUND(SUM(price) + SUM(freight_value),2)
                   AS revenue
            FROM order_items
            GROUP BY seller_id, order_id) t1
      ON o.order_id = t1.order_id
      GROUP BY year, seller_id) t2
WHERE value_rank <= 3
ORDER BY year, value_rank;

 * mysql+mysqlconnector://root:***@localhost/olist
9 rows affected.


| year | seller_id | total_revenue | value_rank |
|---|---|---|---|
| 2016 | 620c87c171fb2a6dd6e8bb4dec959fc6 | 5309.75 | 1 |
| 2016 | 822b63912576852aea9a8436d72317b7 | 2934.28 | 2 |
| 2016 | 46dc3b2cc0980fb8ec44634e21d2718e | 2426.91 | 3 |
| 2017 | 53243585a1d6dc2643021fd1853d8905 | 185858.39 | 1 |
| 2017 | 7e93a43ef30c4f03f38b393420bc753a | 155059.42 | 2 |
| 2017 | 4a3ca9315b744ce9f8e9374361493884 | 145094.22 | 3 |
| 2018 | 4869f7a5dfa277a7dca6462dcf3b52b2 | 148906.43 | 1 |
| 2018 | 955fee9216a65b617aa5c0531780ce60 | 135344.66 | 2 |
| 2018 | 1025f0e2d44d7041d6cf58b6550e0bfa | 130288.87 | 3 |
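
RANK() gives tied sellers the same rank, so a "top 3" filter can return more than three rows per year when there is a tie; ROW_NUMBER() always cuts at exactly three but breaks ties arbitrarily. A minimal sketch of the difference, using Python's sqlite3 module and hypothetical per-seller totals with a tie for third place:

```python
import sqlite3

# Hypothetical per-seller yearly totals; s3 and s4 tie for third place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seller_totals (year INTEGER, seller_id TEXT, total_revenue REAL)")
conn.executemany("INSERT INTO seller_totals VALUES (?, ?, ?)", [
    (2017, "s1", 500.0), (2017, "s2", 400.0),
    (2017, "s3", 300.0), (2017, "s4", 300.0),
    (2017, "s5", 100.0),
])

# Same top-N pattern as above, with the ranking function swapped in.
query = """
SELECT year, seller_id, total_revenue, value_rank
FROM (SELECT year, seller_id, total_revenue,
             {fn}() OVER (PARTITION BY year ORDER BY total_revenue DESC) AS value_rank
      FROM seller_totals) t
WHERE value_rank <= 3
ORDER BY year, value_rank, seller_id
"""
by_rank = conn.execute(query.format(fn="RANK")).fetchall()
by_row_number = conn.execute(query.format(fn="ROW_NUMBER")).fetchall()
print(len(by_rank), len(by_row_number))  # 4 3
```

In the Olist output above each year happens to return exactly three rows, so no tie occurred at rank 3.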


### Revenue by product category

The queries in this part group results by product category. First, a quick check of whether every product category name has an English translation:

In [5]:
%%sql
SELECT product_category_name
FROM products p
WHERE NOT EXISTS (SELECT product_category_name
                 FROM translation t
                 WHERE  p.product_category_name = t.product_category_name)
GROUP BY product_category_name;

 * mysql+mysqlconnector://root:***@localhost/olist
4 rows affected.


| product_category_name |
|---|
| moveis_cozinha_area_de_servico_jantar_e_jardi |
| pc_gamer |
| portateis_cozinha_e_preparadores_de_alimentos |
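
A NULL category name never compares equal to anything, so the correlated subquery finds no match and NOT EXISTS keeps those rows; GROUP BY then collapses all NULLs into a single group. This would account for a fourth row beyond the three untranslated names. A minimal sketch with Python's sqlite3 module and hypothetical rows:

```python
import sqlite3

# Hypothetical products: one translated, one untranslated, two with no category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id TEXT, product_category_name TEXT);
CREATE TABLE translation (product_category_name TEXT, product_category_name_english TEXT);
INSERT INTO products VALUES
  ('p1', 'brinquedos'), ('p2', 'pc_gamer'), ('p3', NULL), ('p4', NULL);
INSERT INTO translation VALUES ('brinquedos', 'toys');
""")

# Same NOT EXISTS pattern as the notebook query; NULL = 'brinquedos' is never true.
rows = conn.execute("""
SELECT product_category_name
FROM products p
WHERE NOT EXISTS (SELECT 1
                  FROM translation t
                  WHERE p.product_category_name = t.product_category_name)
GROUP BY product_category_name
""").fetchall()
print(rows)  # one NULL group plus 'pc_gamer' (row order not guaranteed)
```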


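So three category names have no English translation. The query reports 4 rows, and the fourth one (not visible in the output above) should be the NULL group, i.e. products whose category name is missing. I also need to know whether an order contains only one product or several.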

In [6]:
%%sql
SELECT order_id,
       COUNT(DISTINCT product_id) AS num_products
FROM order_items
GROUP BY order_id
ORDER BY num_products DESC
LIMIT 10;

 * mysql+mysqlconnector://root:***@localhost/olist
10 rows affected.


| order_id | num_products |
|---|---|
| ca3625898fbd48669d50701aba51cd5f | 8 |
| ad850e69fce9a512ada84086651a2e7d | 7 |
| 77df84f9195be22a4e9cb72ca9e8b4c2 | 7 |
| 7d8f5bfd5aff648220374a2df62e84d5 | 7 |
| 5a3b1c29a49756e75f1ef513383c0c12 | 6 |
| 1c11d0f4353b31ac3417fbfa5f0f2a8a | 6 |
| aa0b425987bdeae4a29c616a2bc3a08a | 6 |
| 3990f96693d321ac142fff312bf3706a | 6 |
| 5efc0b7fe9df7f0c567404abaa4d25fc | 6 |
| 200f4d883fcc701355e46b8c6035743f | 6 |


Since one order can contain several different products, the next query first aggregates order_items by both order_id and product_id, so that each product's revenue is attributed to its own category. The translation table is joined with a LEFT JOIN so that a top category without an English name is still kept. As before, only delivered orders are included.

In [7]:
%%sql
SELECT year, product_category_name, product_category_name_english, total_revenue
FROM (SELECT year, p.product_category_name, t.product_category_name_english,
             SUM(t1.revenue) AS total_revenue,
             RANK() OVER (PARTITION BY year ORDER BY SUM(t1.revenue) DESC)
             AS value_rank
      FROM (SELECT order_id, YEAR(order_purchase_timestamp) AS year
            FROM orders
            WHERE order_status = 'delivered') o
      JOIN (SELECT order_id, product_id,
                   ROUND(SUM(price) + SUM(freight_value),2)
                   AS revenue
            FROM order_items
            GROUP BY order_id, product_id) t1
      ON o.order_id = t1.order_id
      JOIN products p
      ON t1.product_id = p.product_id
      LEFT JOIN translation t
      ON p.product_category_name = t.product_category_name
      GROUP BY year, p.product_category_name, t.product_category_name_english) t3
WHERE value_rank = 1;

 * mysql+mysqlconnector://root:***@localhost/olist
3 rows affected.


| year | product_category_name | product_category_name_english | total_revenue |
|---|---|---|---|
| 2016 | moveis_decoracao | furniture_decor | 6899.35 |
| 2017 | cama_mesa_banho | bed_bath_table | 580949.2 |
| 2018 | beleza_saude | health_beauty | 866810.34 |
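
To see why the (order_id, product_id) grain matters, consider one order that mixes two categories, one of them untranslated. The sketch below runs the same join pattern with Python's sqlite3 module on hypothetical rows: the order's revenue is split per product before it reaches the category level, and the LEFT JOIN keeps the untranslated category with a NULL English name instead of dropping it.

```python
import sqlite3

# Hypothetical data: order o1 holds two units of a toy and one untranslated item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id TEXT, product_id TEXT, price REAL, freight_value REAL);
CREATE TABLE products (product_id TEXT, product_category_name TEXT);
CREATE TABLE translation (product_category_name TEXT, product_category_name_english TEXT);
INSERT INTO order_items VALUES
  ('o1', 'p1', 30.0, 3.0), ('o1', 'p1', 30.0, 3.0), ('o1', 'p2', 100.0, 10.0);
INSERT INTO products VALUES ('p1', 'brinquedos'), ('p2', 'pc_gamer');
INSERT INTO translation VALUES ('brinquedos', 'toys');
""")

# Revenue per (order, product), then summed per category; LEFT JOIN keeps untranslated names.
rows = conn.execute("""
SELECT p.product_category_name, t.product_category_name_english,
       SUM(i.revenue) AS total_revenue
FROM (SELECT order_id, product_id,
             ROUND(SUM(price) + SUM(freight_value), 2) AS revenue
      FROM order_items
      GROUP BY order_id, product_id) i
JOIN products p ON i.product_id = p.product_id
LEFT JOIN translation t ON p.product_category_name = t.product_category_name
GROUP BY p.product_category_name, t.product_category_name_english
ORDER BY total_revenue DESC
""").fetchall()
print(rows)  # [('pc_gamer', None, 110.0), ('brinquedos', 'toys', 66.0)]
```

With an inner JOIN to translation, the pc_gamer row would disappear even though it has the highest revenue in this toy example.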


### Canceled orders by category

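The last query has the same structure as the previous one, with two changes: it keeps canceled orders instead of delivered ones, and it counts rows instead of summing revenue. Note that COUNT(order_id) counts rows of order_items, and each row there is one item, so an order with two units of the same product is counted twice. The result is therefore the number of canceled items per category, which can be higher than the number of distinct canceled orders.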

In [8]:
%%sql
SELECT year, product_category_name, product_category_name_english, total_canceled_orders
FROM (SELECT year, p.product_category_name, t.product_category_name_english,
             SUM(t1.num_canceled_orders) AS total_canceled_orders,
             RANK() OVER (PARTITION BY year ORDER BY SUM(t1.num_canceled_orders) DESC)
             AS value_rank
      FROM (SELECT order_id, YEAR(order_purchase_timestamp) AS year
            FROM orders
            WHERE order_status = 'canceled') o
      JOIN (SELECT order_id, product_id,
                   COUNT(order_id)
                   AS num_canceled_orders
            FROM order_items
            GROUP BY order_id, product_id) t1
      ON o.order_id = t1.order_id
      JOIN products p
      ON t1.product_id = p.product_id
      LEFT JOIN translation t
      ON p.product_category_name = t.product_category_name
      GROUP BY year, p.product_category_name, t.product_category_name_english) t3
WHERE value_rank = 1;

 * mysql+mysqlconnector://root:***@localhost/olist
3 rows affected.


| year | product_category_name | product_category_name_english | total_canceled_orders |
|---|---|---|---|
| 2016 | brinquedos | toys | 3 |
| 2017 | esporte_lazer | sports_leisure | 25 |
| 2018 | beleza_saude | health_beauty | 27 |
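
If the goal is strictly the number of canceled orders, COUNT(DISTINCT order_id) per category avoids counting multi-unit orders more than once. A minimal sketch contrasting the two counts, using Python's sqlite3 module and hypothetical rows (this is an alternative, not the query that produced the table above):

```python
import sqlite3

# Hypothetical canceled orders; o1 has two units of the same toy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, order_status TEXT);
CREATE TABLE order_items (order_id TEXT, product_id TEXT);
CREATE TABLE products (product_id TEXT, product_category_name TEXT);
INSERT INTO orders VALUES ('o1', 'canceled'), ('o2', 'canceled'), ('o3', 'canceled');
INSERT INTO order_items VALUES ('o1', 'p1'), ('o1', 'p1'), ('o2', 'p2'), ('o3', 'p3');
INSERT INTO products VALUES ('p1', 'brinquedos'), ('p2', 'brinquedos'), ('p3', 'esporte_lazer');
""")

# COUNT(*) counts item rows; COUNT(DISTINCT o.order_id) counts each canceled order once.
rows = conn.execute("""
SELECT p.product_category_name,
       COUNT(*) AS canceled_items,
       COUNT(DISTINCT o.order_id) AS canceled_orders
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
JOIN products p ON i.product_id = p.product_id
WHERE o.order_status = 'canceled'
GROUP BY p.product_category_name
ORDER BY p.product_category_name
""").fetchall()
print(rows)  # [('brinquedos', 3, 2), ('esporte_lazer', 1, 1)]
```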
