# Customers Who Never Order

**QUESTION**

Suppose that a website contains two tables, the Customers table and the Orders table.
Write a SQL query to find all customers who never order anything.

EXAMPLES:

Table: Customers

    +----+-------+
    | Id | Name  |
    +----+-------+
    | 1  | Joe   |
    | 2  | Henry |
    | 3  | Sam   |
    | 4  | Max   |
    +----+-------+

Table: Orders

    +----+------------+
    | Id | CustomerId |
    +----+------------+
    | 1  | 3          |
    | 2  | 1          |
    +----+------------+

Expected Output:

    +-----------+
    | Customers |
    +-----------+
    | Henry     |
    | Max       |
    +-----------+

**TECHNIQUES:**
  - SELECT DISTINCT
  - NOT IN (...)
  - LEFT JOIN ... ON ...
  - WITH cte AS (...)

**REFERENCE:**
  - https://leetcode.com/articles/customers-who-never-order/



## Prepare the test data

- Install `sqlalchemy` and the appropriate database driver (e.g. `mysqlclient` for MySQL).
- Define the connection URL in the `TEST_URL` environment variable.
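As a minimal sketch of the second step, the variable can be set from Python before the connection cell runs. The URL follows SQLAlchemy's `dialect+driver://user:password@host:port/database` form; the user, password, host, port, and database name below are placeholders (assumptions, not values from this notebook).

```python
import os

# Placeholder credentials and database name; replace with your own.
# "mysql+mysqldb" selects the MySQL dialect with the mysqlclient driver.
EXAMPLE_URL = "mysql+mysqldb://user:password@localhost:3306/testdb"

# setdefault keeps an existing TEST_URL if one is already exported in the shell.
os.environ.setdefault("TEST_URL", EXAMPLE_URL)

print(os.environ["TEST_URL"])
```

Using `setdefault` rather than plain assignment means a URL exported in the shell takes precedence over the placeholder.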

In [1]:
import pandas as pd
import numpy as np
import os
import sqlalchemy as db

# Create Database Connection.
# Connection info is defined in an environment variable.
CONN_URL = os.environ['TEST_URL']
engine = db.create_engine(CONN_URL)

In [2]:
# Populate the test data
def populate_data(engine, table, data, path):
    """Upload the data to a database table and/or write it to a TSV file."""
    df_tmp = pd.DataFrame(data)
    # Save it to a table
    if engine:
        print("> Uploading data to {} table...".format(table))
        df_tmp.to_sql(table, con=engine, index=False, if_exists='replace')
    # Save it to a file
    if path:
        print("> Saving data to {}...".format(path))
        df_tmp.to_csv(path, index=False, sep="\t")

# Customers
customers_data = {"Id": [1, 2, 3, 4], "Name": ['Joe', 'Henry', 'Sam', 'Max']}
customers_path = "/tmp/customers.tsv"
populate_data(engine, "Customers", customers_data, customers_path)

# Orders
orders_data = {"Id": [1, 2], "CustomerId": [3, 1]}
orders_path = "/tmp/orders.tsv"
populate_data(engine, "Orders", orders_data, orders_path)


    > Uploading data to Customers table...
    > Saving data to /tmp/customers.tsv...
    > Uploading data to Orders table...
    > Saving data to /tmp/orders.tsv...


## Pandas Solutions

### Read Data

In [3]:
# Read data from database tables
df_customers = pd.read_sql("SELECT * FROM Customers", engine)
df_orders = pd.read_sql("SELECT * FROM Orders", engine)

display(df_customers.head())
display(df_orders.head())

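    +---+----+-------+
    |   | Id | Name  |
    +---+----+-------+
    | 0 | 1  | Joe   |
    | 1 | 2  | Henry |
    | 2 | 3  | Sam   |
    | 3 | 4  | Max   |
    +---+----+-------+

    +---+----+------------+
    |   | Id | CustomerId |
    +---+----+------------+
    | 0 | 1  | 3          |
    | 1 | 2  | 1          |
    +---+----+------------+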


In [10]:
# Or, read them from TSV files
df_customers = pd.read_csv(customers_path, sep="\t")
df_orders = pd.read_csv(orders_path, sep="\t")

display(df_customers.head())
display(df_orders.head())

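    +---+----+-------+
    |   | Id | Name  |
    +---+----+-------+
    | 0 | 1  | Joe   |
    | 1 | 2  | Henry |
    | 2 | 3  | Sam   |
    | 3 | 4  | Max   |
    +---+----+-------+

    +---+----+------------+
    |   | Id | CustomerId |
    +---+----+------------+
    | 0 | 1  | 3          |
    | 1 | 2  | 1          |
    +---+----+------------+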


### ~ isin()

- Use **unique()** to get the unique customer IDs from the Orders table.
- Use **isin** to flag the customers who have orders, then negate the mask with **~**.
- Use **rename** to rename the `Name` column to `Customers`.
- Wrap the final pandas Series in **pd.DataFrame** to keep the result as a DataFrame.
- Optionally call **reset_index**.
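The steps above can also be run one at a time on in-memory data, which makes the intermediate boolean mask visible. This is a minimal sketch that rebuilds the sample tables directly instead of reading them from the database; selecting with `[["Customers"]]` is used here as one alternative to wrapping a Series in `pd.DataFrame`.

```python
import pandas as pd

# Same sample data as the tables in the question
df_customers = pd.DataFrame({"Id": [1, 2, 3, 4], "Name": ["Joe", "Henry", "Sam", "Max"]})
df_orders = pd.DataFrame({"Id": [1, 2], "CustomerId": [3, 1]})

# Step 1: unique customer IDs that appear in Orders
ordered_ids = df_orders["CustomerId"].unique()

# Step 2: boolean mask of customers with orders, then its negation
has_order = df_customers["Id"].isin(ordered_ids)
no_order = ~has_order

# Step 3: filter, rename, keep only the renamed column, reset the index.
# Double brackets select a one-column DataFrame rather than a Series.
result = (
    df_customers[no_order]
    .rename(columns={"Name": "Customers"})[["Customers"]]
    .reset_index(drop=True)
)

print(has_order.tolist())
print(result)
```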

In [5]:
pd.DataFrame(
    df_customers[~ df_customers.Id.isin(df_orders['CustomerId'].unique())]\
        .rename(columns={'Name':'Customers'})['Customers']\
        .reset_index(drop=True)
)

    +---+-----------+
    |   | Customers |
    +---+-----------+
    | 0 | Henry     |
    | 1 | Max       |
    +---+-----------+


### Merge

- Select the unique customer IDs from the Orders table and keep them as a DataFrame.
- Then **merge** it into the Customers DataFrame with a left join.
- Set **indicator** to `True`. This adds a column called `_merge`, whose values here are `left_only` (no matching order) or `both`.
- Keep the `left_only` rows and, again, wrap the final result in a DataFrame.
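To see what the `_merge` column looks like before filtering, the following sketch runs the same left merge on in-memory copies of the sample tables and prints the intermediate result. The final selection with `.loc` is an illustrative variant, not the notebook's exact expression.

```python
import pandas as pd

# Same sample data as the tables in the question
df_customers = pd.DataFrame({"Id": [1, 2, 3, 4], "Name": ["Joe", "Henry", "Sam", "Max"]})
df_orders = pd.DataFrame({"Id": [1, 2], "CustomerId": [3, 1]})

# Unique customer IDs from Orders, renamed to "Id" so both sides share the join key
df_unique_orders = pd.DataFrame({"Id": df_orders["CustomerId"].unique()})

# Left merge; indicator=True adds the categorical "_merge" column
df_m = df_customers.merge(df_unique_orders, how="left", on="Id", indicator=True)
print(df_m)

# Keep customers found only on the left side, i.e. those without orders
result = (
    df_m.loc[df_m["_merge"] == "left_only", ["Name"]]
    .rename(columns={"Name": "Customers"})
    .reset_index(drop=True)
)
print(result)
```

Because the merge is a left join, every customer row survives and the indicator alone decides which rows to keep.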

In [6]:
df_unique_orders = pd.DataFrame(df_orders['CustomerId'].unique(), columns=['Id'])
df_m = df_customers.merge(df_unique_orders, how='left', on='Id', indicator=True)
pd.DataFrame(
    df_m[df_m['_merge'] == 'left_only']['Name'].values, columns=['Customers']
)

    +---+-----------+
    |   | Customers |
    +---+-----------+
    | 0 | Henry     |
    | 1 | Max       |
    +---+-----------+


## SQL Solutions

### NOT IN

In [7]:
SQL = """
SELECT Name AS 'Customers'
    FROM Customers
    WHERE Id NOT IN (SELECT DISTINCT CustomerId FROM Orders);
"""
pd.read_sql(SQL, engine)

    +---+-----------+
    |   | Customers |
    +---+-----------+
    | 0 | Henry     |
    | 1 | Max       |
    +---+-----------+


### JOIN

In [8]:
SQL = """
SELECT c.Name AS 'Customers'
    FROM Customers c
    LEFT JOIN (
        SELECT DISTINCT CustomerId FROM Orders
    ) t
    ON c.Id = t.CustomerId
    WHERE t.CustomerId IS NULL;
"""
pd.read_sql(SQL, engine)

    +---+-----------+
    |   | Customers |
    +---+-----------+
    | 0 | Henry     |
    | 1 | Max       |
    +---+-----------+


### WITH

In [9]:
SQL = """
WITH T AS (
    SELECT DISTINCT CustomerId FROM Orders
)
SELECT C.Name AS 'Customers'
    FROM Customers C
    LEFT JOIN T
    ON C.Id = T.CustomerId
    WHERE T.CustomerId IS NULL;
"""
pd.read_sql(SQL, engine)

    +---+-----------+
    |   | Customers |
    +---+-----------+
    | 0 | Henry     |
    | 1 | Max       |
    +---+-----------+
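
For readers without a database server at hand, the three SQL approaches can be cross-checked against an in-memory SQLite database using only Python's standard library. This is a sketch under assumptions: SQLite stands in for the notebook's database, the column alias is written without quotes, and an `ORDER BY` is added so the row order is deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the notebook's database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Orders (Id INTEGER, CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")],
)
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 3), (2, 1)])

queries = {
    "NOT IN": """
        SELECT Name AS Customers
            FROM Customers
            WHERE Id NOT IN (SELECT DISTINCT CustomerId FROM Orders)
            ORDER BY Id
    """,
    "JOIN": """
        SELECT c.Name AS Customers
            FROM Customers c
            LEFT JOIN (SELECT DISTINCT CustomerId FROM Orders) t
            ON c.Id = t.CustomerId
            WHERE t.CustomerId IS NULL
            ORDER BY c.Id
    """,
    "WITH": """
        WITH T AS (SELECT DISTINCT CustomerId FROM Orders)
        SELECT C.Name AS Customers
            FROM Customers C
            LEFT JOIN T
            ON C.Id = T.CustomerId
            WHERE T.CustomerId IS NULL
            ORDER BY C.Id
    """,
}

# Run each query and collect the customer names it returns
results = {label: [row[0] for row in conn.execute(sql)] for label, sql in queries.items()}
for label, names in results.items():
    print(label, names)

conn.close()
```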
