# Customers and Products Analysis Using SQL


## Introduction

The goal of this project is to analyse data from a sales records database for scale model cars
and extract information for decision-making. The scale model cars database contains eight tables:

- **customers:** customer data
- **employees:** all employee information
- **offices:** sales office information
- **orders:** customers' sales orders
- **orderdetails:** line items for each sales order
- **payments:** customers' payment records
- **products:** a list of scale model cars
- **productlines:** a list of product line categories

Good analysis starts with questions. Below are the questions we want to answer for this project.

- *Question 1*: Which products should we order more of or less of?
- *Question 2*: How should we match marketing and communication strategies to customer behaviours?
- *Question 3*: How much can we spend on acquiring new customers?

## Scale Model Cars Database

First of all, we'll connect our Jupyter Notebook to our database file using the following code.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///stores.db

Next, let's explore the database by writing a query to display the table names, the number of columns
and the number of rows in each of them.

In [2]:
%%sql

SELECT 'Customers' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('customers')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM customers

 UNION ALL

SELECT 'Products' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('products')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM products
  
 UNION ALL

SELECT 'ProductLines' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('productlines')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM productlines

 UNION ALL

SELECT 'Orders' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('orders')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM orders

 UNION ALL

SELECT 'OrderDetails' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('orderdetails')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM orderdetails

 UNION ALL

SELECT 'Payments' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('payments')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM payments

 UNION ALL

SELECT 'Employees' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('employees')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM employees

 UNION ALL

SELECT 'Offices' AS table_name,
       (SELECT COUNT(*)
          FROM pragma_table_info('offices')) AS number_of_columns,
       COUNT(*) AS number_of_rows
  FROM offices;

 * sqlite:///stores.db
Done.


table_name,number_of_columns,number_of_rows
Customers,13,122
Products,9,110
ProductLines,4,7
Orders,7,326
OrderDetails,5,2996
Payments,4,273
Employees,8,23
Offices,9,7
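
The eight near-identical `SELECT` blocks above can also be generated in a loop. The following is a minimal sketch, assuming Python's built-in `sqlite3` module rather than the `%%sql` magic. It runs against an in-memory database with two toy tables invented for illustration (not `stores.db`), and `summarise_tables` is a hypothetical helper name.

```python
import sqlite3


def summarise_tables(conn):
    """Return (table_name, number_of_columns, number_of_rows) per user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    summary = []
    for name in tables:
        # Column count comes from the same pragma the notebook query uses.
        n_cols = conn.execute(
            "SELECT COUNT(*) FROM pragma_table_info(?)", (name,)
        ).fetchone()[0]
        # Table names cannot be bound as parameters, so the name is quoted and
        # interpolated; it comes from sqlite_master, not from user input.
        n_rows = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        summary.append((name, n_cols, n_rows))
    return summary


# Toy database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE offices (officeCode TEXT, city TEXT);
    CREATE TABLE employees (employeeNumber INTEGER, lastName TEXT, officeCode TEXT);
    INSERT INTO offices VALUES ('1', 'City A'), ('2', 'City B');
    INSERT INTO employees VALUES (1, 'Name A', '1'), (2, 'Name B', '1'), (3, 'Name C', '2');
    """
)

table_summary = summarise_tables(conn)
for row in table_summary:
    print(row)
```

Pointing `sqlite3.connect` at the real database file instead of `":memory:"` should produce the same kind of summary for all eight tables, without having to list them by hand.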


## Question 1: Which Products Should We Order More of or Less of?
  
Now that we know the database a little better, we can answer the first question: *Which products should we order more of or less of?* 

This question calls for an inventory report covering two measures: low stock and product performance. Restocking based on these measures improves supply planning and the customer experience by preventing the best-selling products from going out of stock.

- **Low stock** is the total quantity ordered of each product divided by its quantity in stock. The higher the ratio, the closer the product is to selling out; a ratio above 1 means more units have been ordered than are currently in stock.
- **Product performance** is the sum of sales per product, that is, `SUM(quantityOrdered * priceEach)`.
- **Priority products for restocking** are those with high product performance that are on the brink of being out of stock.

The query below computes both measures for every product and lists the ten best-performing products, with the low-stock ratio shown alongside.

In [3]:
%%sql

SELECT p.productCode, p.productName, p.productLine,
       ROUND(SUM(od.quantityOrdered * 1.0) / p.quantityInStock, 2) AS low_stock,
       ROUND(SUM(od.quantityOrdered * od.priceEach), 2) AS product_performance
  FROM orderdetails AS od
  JOIN products AS p
    ON od.productCode = p.productCode
 GROUP BY p.productCode
 ORDER BY product_performance DESC, low_stock DESC
 LIMIT 10;

 * sqlite:///stores.db
Done.


productCode,productName,productLine,low_stock,product_performance
S18_3232,1992 Ferrari 360 Spider red,Classic Cars,0.22,276839.98
S12_1108,2001 Ferrari Enzo,Classic Cars,0.28,190755.86
S10_1949,1952 Alpine Renault 1300,Classic Cars,0.13,190017.96
S10_4698,2003 Harley-Davidson Eagle Drag Bike,Motorcycles,0.18,170686.0
S12_1099,1968 Ford Mustang,Classic Cars,13.72,161531.48
S12_3891,1969 Ford Falcon,Classic Cars,0.92,152543.02
S18_1662,1980s Black Hawk Helicopter,Planes,0.2,144959.91
S18_2238,1998 Chrysler Plymouth Prowler,Classic Cars,0.21,142530.63
S18_1749,1917 Grand Touring Sedan,Vintage Cars,0.34,140535.6
S12_2823,2002 Suzuki XREO,Motorcycles,0.1,135767.03


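Classic cars are the priority for restocking: six of the ten best-performing products belong to this product line. The 1968 Ford Mustang stands out in particular, because its low-stock ratio of 13.72 means far more units have been ordered than are currently in stock.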
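
The query above sorts by `product_performance` first, so `low_stock` only breaks ties. One way to combine both criteria is to shortlist the products with the highest low-stock ratio and then rank that shortlist by performance. The following is a minimal sketch of that idea, assuming Python's built-in `sqlite3` module and a toy in-memory database; the rows, the CTE names `stats` and `low_stock_top`, and the parameters `n_low_stock` and `n_priority` are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (productCode TEXT PRIMARY KEY, quantityInStock INTEGER);
    CREATE TABLE orderdetails (productCode TEXT, quantityOrdered INTEGER, priceEach REAL);
    -- Toy rows, invented for illustration only.
    INSERT INTO products VALUES ('A', 10), ('B', 1000), ('C', 50);
    INSERT INTO orderdetails VALUES
        ('A', 30, 100.0), ('A', 20, 100.0),
        ('B', 40, 200.0),
        ('C', 100, 10.0);
    """
)

PRIORITY_QUERY = """
WITH stats AS (
    -- Both measures per product, as defined in the text.
    SELECT p.productCode,
           SUM(od.quantityOrdered) * 1.0 / p.quantityInStock AS low_stock,
           SUM(od.quantityOrdered * od.priceEach) AS product_performance
      FROM orderdetails AS od
      JOIN products AS p
        ON od.productCode = p.productCode
     GROUP BY p.productCode
),
low_stock_top AS (
    -- Shortlist: products closest to selling out.
    SELECT productCode
      FROM stats
     ORDER BY low_stock DESC
     LIMIT :n_low_stock
)
SELECT productCode,
       ROUND(low_stock, 2) AS low_stock,
       ROUND(product_performance, 2) AS product_performance
  FROM stats
 WHERE productCode IN (SELECT productCode FROM low_stock_top)
 ORDER BY product_performance DESC
 LIMIT :n_priority;
"""

priority = conn.execute(
    PRIORITY_QUERY, {"n_low_stock": 2, "n_priority": 2}
).fetchall()
print(priority)  # [('A', 5.0, 5000.0), ('C', 2.0, 1000.0)]
```

In the toy data, product `B` has the highest sales but is amply stocked, so it drops out of the shortlist. Whether this two-step ranking suits the real store depends on how the two measures should be weighted, which the project does not specify.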
 
## Question 2: How Should We Match Marketing and Communication Strategies to Customer Behaviours?

In the first part of this project, we explored products. Now we'll explore customer information by answering the second question: *How should we match marketing and communication strategies to customer behaviours?*

This involves categorising customers: finding the VIP (very important person) customers and those who are less engaged.
- VIP customers bring in the most profit for the store.
- Less-engaged customers bring in less profit.

Before we begin, let's compute how much profit each customer generates.

In [4]:
%%sql

SELECT o.customerNumber,
       ROUND(SUM(od.quantityOrdered * (od.priceEach - p.buyPrice)), 2) AS profit
  FROM products AS p
  JOIN orderdetails AS od
    ON p.productCode = od.productCode
  JOIN orders AS o
    ON od.orderNumber = o.orderNumber
 GROUP BY o.customerNumber;

 * sqlite:///stores.db
Done.


customerNumber,profit
103,10063.8
112,31312.72
114,70311.07
119,60875.3
121,41391.52
124,236769.39
128,27728.34
129,28092.43
131,58669.1
141,326519.66


## Finding the VIP and Less Engaged Customers

In [5]:
%%sql

-- Top 5 VIP customers

WITH

revenue AS (
SELECT o.customerNumber,
       ROUND(SUM(od.quantityOrdered * (od.priceEach - p.buyPrice)), 2) AS profit
  FROM products AS p
  JOIN orderdetails AS od
    ON p.productCode = od.productCode
  JOIN orders AS o
    ON od.orderNumber = o.orderNumber
 GROUP BY o.customerNumber
)
 
SELECT c.contactLastName, c.contactFirstName, c.city, c.country,
       r.profit
  FROM customers AS c
  JOIN revenue AS r
    ON c.customerNumber = r.customerNumber
 ORDER BY r.profit DESC
 LIMIT 5;

 * sqlite:///stores.db
Done.


contactLastName,contactFirstName,city,country,profit
Freyre,Diego,Madrid,Spain,326519.66
Nelson,Susan,San Rafael,USA,236769.39
Young,Jeff,NYC,USA,72370.09
Ferguson,Peter,Melbourne,Australia,70311.07
Labrune,Janine,Nantes,France,60875.3


In [6]:
%%sql

-- Top 5 least engaged customers

WITH

revenue AS (
SELECT o.customerNumber,
       ROUND(SUM(od.quantityOrdered * (od.priceEach - p.buyPrice)), 2) AS profit
  FROM products AS p
  JOIN orderdetails AS od
    ON p.productCode = od.productCode
  JOIN orders AS o
    ON od.orderNumber = o.orderNumber
 GROUP BY o.customerNumber
)

SELECT c.contactLastName, c.contactFirstName, c.city, c.country,
       r.profit
  FROM customers AS c
  JOIN revenue AS r
    ON c.customerNumber = r.customerNumber
 ORDER BY r.profit ASC
 LIMIT 5;

 * sqlite:///stores.db
Done.


contactLastName,contactFirstName,city,country,profit
Young,Mary,Glendale,USA,2610.87
Taylor,Leslie,Brickhaven,USA,6586.02
Ricotti,Franco,Milan,Italy,9532.93
Schmitt,Carine,Nantes,France,10063.8
Smith,Thomas,London,UK,10868.04


Now that we have identified the VIP customers and the least-engaged ones, we can plan how to drive loyalty and attract more customers.
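
The profit-per-customer subquery is repeated in several cells. One option is to define it once as a view and query it for both ends of the ranking. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and a toy in-memory database; the rows and the view name `customer_profit` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (productCode TEXT PRIMARY KEY, buyPrice REAL);
    CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);

    -- Toy rows, invented for illustration only.
    INSERT INTO products VALUES ('A', 50.0), ('B', 20.0);
    INSERT INTO orders VALUES (1, 101), (2, 102), (3, 103);
    INSERT INTO orderdetails VALUES
        (1, 'A', 10, 80.0),   -- 10 * (80 - 50) = 300
        (2, 'B', 5, 30.0),    --  5 * (30 - 20) = 50
        (3, 'A', 2, 90.0),    --  2 * (90 - 50) = 80
        (3, 'B', 10, 25.0);   -- 10 * (25 - 20) = 50

    -- Same profit formula as the notebook query, defined once.
    CREATE TEMP VIEW customer_profit AS
    SELECT o.customerNumber,
           ROUND(SUM(od.quantityOrdered * (od.priceEach - p.buyPrice)), 2) AS profit
      FROM products AS p
      JOIN orderdetails AS od
        ON p.productCode = od.productCode
      JOIN orders AS o
        ON od.orderNumber = o.orderNumber
     GROUP BY o.customerNumber;
    """
)

vip = conn.execute(
    "SELECT customerNumber, profit FROM customer_profit "
    "ORDER BY profit DESC LIMIT 1"
).fetchall()
least_engaged = conn.execute(
    "SELECT customerNumber, profit FROM customer_profit "
    "ORDER BY profit ASC LIMIT 1"
).fetchall()

print(vip)            # [(101, 300.0)]
print(least_engaged)  # [(102, 50.0)]
```

A temporary view lasts only for the connection, so it leaves the database file unchanged while keeping the profit formula in a single place.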

## Question 3: How Much Can We Spend on Acquiring New Customers?

Before answering this question, let's find the share of new customers arriving each month, both by count and by payment amount. That way we can check whether it's worth spending money on acquiring new customers.

The following query computes these shares.

In [7]:
%%sql

WITH 

payment_with_year_month_table AS (
SELECT *, 
       CAST(SUBSTR(paymentDate, 1, 4) AS INTEGER) * 100 + CAST(SUBSTR(paymentDate, 6, 2) AS INTEGER) AS year_month
  FROM payments p
),

customers_by_month_table AS (
SELECT p1.year_month, COUNT(*) AS number_of_customers, SUM(p1.amount) AS total
  FROM payment_with_year_month_table p1
 GROUP BY p1.year_month
),

new_customers_by_month_table AS (
SELECT p1.year_month, 
       COUNT(*) AS number_of_new_customers,
       SUM(p1.amount) AS new_customer_total,
       (SELECT number_of_customers
          FROM customers_by_month_table c
        WHERE c.year_month = p1.year_month) AS number_of_customers,
       (SELECT total
          FROM customers_by_month_table c
         WHERE c.year_month = p1.year_month) AS total
  FROM payment_with_year_month_table p1
 WHERE p1.customerNumber NOT IN (SELECT customerNumber
                                   FROM payment_with_year_month_table p2
                                  WHERE p2.year_month < p1.year_month)
 GROUP BY p1.year_month
)

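-- Both counts are integers, so the first ratio uses integer division and is
-- truncated to a whole percent; use 100.0 instead of 100 for fractional values.
SELECT year_month,
       ROUND(number_of_new_customers*100/number_of_customers,1) AS number_of_new_customers_props,
       ROUND(new_customer_total*100/total,1) AS new_customers_total_props
  FROM new_customers_by_month_table;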

 * sqlite:///stores.db
Done.


year_month,number_of_new_customers_props,new_customers_total_props
200301,100.0,100.0
200302,100.0,100.0
200303,100.0,100.0
200304,100.0,100.0
200305,100.0,100.0
200306,100.0,100.0
200307,75.0,68.3
200308,66.0,54.2
200309,80.0,95.9
200310,69.0,69.3


Only the first ten rows of the result are displayed above. Over the full result, the share of new customers has been decreasing since 2003, and 2004 has the lowest values. The database also contains payments from 2005, but no month of 2005 appears in the result. Because the query returns only months with at least one new customer, this means the store has not had any new customers since September 2004, so it makes sense to spend money on acquiring new customers.
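
As a cross-check, the share of new customers can also be derived from each customer's first payment month. The following is a minimal sketch, assuming Python's built-in `sqlite3` module, payment dates stored as `YYYY-MM-DD` text, and a toy in-memory `payments` table with invented rows. It differs from the query above in two ways: it counts distinct customers rather than payments, and it keeps months without new customers (reported as 0.0) instead of dropping them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (customerNumber INTEGER, paymentDate TEXT, amount REAL);
    -- Toy rows, invented for illustration only.
    INSERT INTO payments VALUES
        (1, '2003-01-15', 100.0),
        (2, '2003-01-20', 200.0),
        (1, '2003-02-03', 150.0),
        (3, '2003-02-10', 50.0),
        (2, '2003-03-01', 300.0);
    """
)

NEW_CUSTOMER_QUERY = """
WITH first_payment AS (
    -- The month in which each customer paid for the first time.
    SELECT customerNumber,
           MIN(strftime('%Y%m', paymentDate)) AS first_month
      FROM payments
     GROUP BY customerNumber
),
monthly AS (
    -- Distinct paying customers per month.
    SELECT strftime('%Y%m', paymentDate) AS year_month,
           COUNT(DISTINCT customerNumber) AS paying_customers
      FROM payments
     GROUP BY year_month
),
new_monthly AS (
    SELECT first_month AS year_month,
           COUNT(*) AS new_customers
      FROM first_payment
     GROUP BY first_month
)
SELECT m.year_month,
       -- 100.0 forces floating-point division.
       ROUND(COALESCE(n.new_customers, 0) * 100.0 / m.paying_customers, 1)
           AS new_customer_pct
  FROM monthly AS m
  LEFT JOIN new_monthly AS n
    ON n.year_month = m.year_month
 ORDER BY m.year_month;
"""

new_customer_share = conn.execute(NEW_CUSTOMER_QUERY).fetchall()
print(new_customer_share)
# [('200301', 100.0), ('200302', 50.0), ('200303', 0.0)]
```

With the `LEFT JOIN`, a month such as `200303` in the toy data shows up explicitly with 0.0 percent, which makes a period without new customers visible in the output itself.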

To determine how much money we can spend on acquiring new customers, we can compute the Customer Lifetime Value (LTV), which represents the average profit a customer generates. We can then decide how much of that profit to spend on marketing.

In [8]:
%%sql

-- Customer Lifetime Value (LTV)

WITH

revenue AS (
SELECT o.customerNumber,
       ROUND(SUM(od.quantityOrdered * (od.priceEach - p.buyPrice)), 2) AS profit
  FROM products AS p
  JOIN orderdetails AS od
    ON p.productCode = od.productCode
  JOIN orders AS o
    ON od.orderNumber = o.orderNumber
 GROUP BY o.customerNumber
)

SELECT ROUND(AVG(profit), 2) as LTV
  FROM revenue;

 * sqlite:///stores.db
Done.


LTV
39039.59


LTV tells us how much profit an average customer generates during their lifetime with our store, and we can use it to predict future profit. For example, if we gain ten new customers next month, we can expect them to generate about 390,396 dollars in profit (10 × 39,039.59) over their lifetime with the store. Based on this prediction, we can decide how much to spend on acquiring new customers.
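
The arithmetic behind this prediction can be sketched in a few lines of Python. The LTV value is taken from the query output above; the function names and the `budget_share` parameter (the fraction of expected profit one is willing to reinvest in acquisition) are hypothetical and not part of the original analysis.

```python
LTV = 39039.59  # average profit per customer, from the LTV query output


def expected_profit(new_customers, ltv=LTV):
    """Expected lifetime profit from a number of new customers."""
    return round(new_customers * ltv, 2)


def max_acquisition_budget(new_customers, budget_share, ltv=LTV):
    """Upper bound on acquisition spend for a given (assumed) budget share."""
    return round(expected_profit(new_customers, ltv) * budget_share, 2)


profit_10 = expected_profit(10)
budget_10 = max_acquisition_budget(10, budget_share=0.10)
print(profit_10)  # 390395.9
print(budget_10)  # 39039.59
```

The 10 percent share is only a placeholder. Any spend per customer below the LTV leaves a positive expected margin, so the right share is a business decision rather than something the data dictates.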

## Conclusion

In this project, we found that classic cars sell frequently and are the highest-performing products, so they are the priority for restocking. Based on the profit each customer generates, we identified the VIP customers and the least-engaged ones, which lets us plan how to drive loyalty and attract more customers.

As we saw, the number of new customers has been decreasing, so it makes sense to spend money on acquiring new customers. We computed the Customer Lifetime Value (LTV), which we can now use to decide how much to spend on acquiring them.