# **Customers Who Never Order**

## **Problem Statement**
You are given two tables, `Customers` and `Orders`. Write a SQL query to find all customers who **never placed any orders**, returning their names in a column named `Customers`.

---

## **Table: Customers**

| Column Name | Type    |
|-------------|---------|
| Id          | int     |
| NameCust    | varchar |

- `Id` is the primary key for this table.

---

## **Table: Orders**

| Column Name | Type    |
|-------------|---------|
| Id          | int     |
| CustomerId  | int     |

- `CustomerId` is a foreign key that references `Id` in the `Customers` table.

---

## **Example**

### **Input**

**Customers Table:**

| Id | NameCust |
|----|----------|
| 1  | Joe      |
| 2  | Henry    |
| 3  | Sam      |
| 4  | Max      |

**Orders Table:**

| Id | CustomerId |
|----|------------|
| 1  | 3          |
| 2  | 1          |

### **Expected Output**

| Customers |
|-----------|
| Henry     |
| Max       |

---

## **Approach 1: PySpark DataFrame API**

### **Steps**
1. Create DataFrames for `Customers` and `Orders`.
2. Perform a **left anti join** on `Customers.Id = Orders.CustomerId` to keep only the customers whose `Id` never appears as a `CustomerId` in `Orders`.
3. Select the `NameCust` column and rename it to `Customers`.
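A left anti join keeps each row of the left table that has no matching row in the right table, and returns only the left table's columns. The sketch below imitates that behaviour in plain Python on the example data (no Spark involved). The helper name `left_anti_join` is hypothetical and is only meant to illustrate the semantics.

```python
from typing import Callable, List


def left_anti_join(left: List[tuple], right: List[tuple],
                   matches: Callable[[tuple, tuple], bool]) -> List[tuple]:
    """Hypothetical helper: keep each left row that matches no right row.

    As in a left anti join, only the left row's columns are returned.
    """
    return [left_row for left_row in left
            if not any(matches(left_row, right_row) for right_row in right)]


# Example rows as (Id, NameCust) and (Id, CustomerId) tuples
customers = [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")]
orders = [(1, 3), (2, 1)]

# The join condition mirrors Customers.Id == Orders.CustomerId
no_orders = left_anti_join(customers, orders, lambda c, o: c[0] == o[1])
print([name for _, name in no_orders])  # ['Henry', 'Max']
```

This nested loop is O(n·m) and exists only for illustration; Spark chooses its own physical strategy for the real join.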

### **Code**

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Step 1: Initialize Spark
spark = SparkSession.builder.appName("CustomersWithoutOrders").getOrCreate()

# Step 2: Define Customers and Orders data
customers_data = [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")]
orders_data = [(1, 3), (2, 1)]

customers_df = spark.createDataFrame(customers_data, ["Id", "NameCust"])
orders_df = spark.createDataFrame(orders_data, ["Id", "CustomerId"])

# Step 3: Left anti join keeps only customers with no matching order
no_orders_df = customers_df.join(
    orders_df, customers_df.Id == orders_df.CustomerId, how="left_anti"
)

# Step 4: Select the names and rename the column to match the expected output
no_orders_df.select(col("NameCust").alias("Customers")).show()
```

**Output:**

```
+---------+
|Customers|
+---------+
|    Henry|
|      Max|
+---------+
```

---

## **Approach 2: SQL Query in PySpark**

### **Steps**
1. Register DataFrames as SQL views.
2. Use a `LEFT JOIN` so that every customer is kept; customers without a matching record in `Orders` get `NULL` in the `Orders` columns.
3. Use `WHERE o.Id IS NULL` to keep only the customers who did not order.
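The `IS NULL` filter works because of what the left join produces before filtering. The plain-Python sketch below (no Spark; `left_join` is a hypothetical helper) builds that intermediate result on the example data, using `None` for `NULL`, and then applies the filter. It assumes `Orders.Id` is never `NULL` in a stored order, so a `None` in that position can only come from a customer with no match.

```python
# Example rows as (Id, NameCust) and (Id, CustomerId) tuples
customers = [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")]
orders = [(1, 3), (2, 1)]


def left_join(customers, orders):
    """Hypothetical helper mimicking `Customers c LEFT JOIN Orders o ON c.Id = o.CustomerId`.

    Each output row is (c.Id, c.NameCust, o.Id, o.CustomerId).
    """
    rows = []
    for c_id, name in customers:
        matched = [o for o in orders if o[1] == c_id]
        if matched:
            # One output row per matching order
            rows.extend((c_id, name, o_id, o_cust) for o_id, o_cust in matched)
        else:
            # No match: the Orders columns are filled with NULL (None here)
            rows.append((c_id, name, None, None))
    return rows


joined = left_join(customers, orders)
# WHERE o.Id IS NULL keeps only the customers that had no matching order
result = [name for _, name, o_id, _ in joined if o_id is None]
print(result)  # ['Henry', 'Max']
```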

### **Code**

```python
# Register the DataFrames from Approach 1 as temporary SQL views
customers_df.createOrReplaceTempView("Customers")
orders_df.createOrReplaceTempView("Orders")

result = spark.sql("""
    SELECT c.NameCust AS Customers
    FROM Customers c
    LEFT JOIN Orders o ON c.Id = o.CustomerId
    WHERE o.Id IS NULL
""")

result.show()
```

**Output:**

```
+---------+
|Customers|
+---------+
|    Henry|
|      Max|
+---------+
```

---

## **Summary**

| Approach         | Method                | Key Technique           |
|------------------|-----------------------|-------------------------|
| **Approach 1**   | PySpark DataFrame API | `left_anti` join        |
| **Approach 2**   | SQL in PySpark        | `LEFT JOIN` + `IS NULL` |