# **Biggest Single Number**

## **Problem Statement**
You are given a table `my_numbers` with a single column `num`. The column may contain duplicate values.  
Write a query to find the **largest number** that appears **only once** in the table.  
If no such number exists, return `null`.

---

### **Sample Input**

| num |
|-----|
| 8   |
| 8   |
| 3   |
| 3   |
| 1   |
| 4   |
| 5   |
| 6   |

### **Expected Output**

| num |
|-----|
| 6   |
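
In the sample, `8` and `3` each appear twice, so the candidates are `1`, `4`, `5`, and `6`, and the largest is `6`. As a quick sanity check outside Spark, the same rule can be sketched in plain Python. The helper name `biggest_single_number` is an illustrative assumption, not part of the original solution, and `None` stands in for SQL `null`.

```python
from collections import Counter
from typing import Iterable, Optional


def biggest_single_number(nums: Iterable[int]) -> Optional[int]:
    """Return the largest value that occurs exactly once, or None if there is none."""
    counts = Counter(nums)  # value -> number of occurrences
    singles = [n for n, c in counts.items() if c == 1]
    return max(singles, default=None)  # None plays the role of SQL NULL


sample = [8, 8, 3, 3, 1, 4, 5, 6]
print(biggest_single_number(sample))        # 6
print(biggest_single_number([8, 8, 3, 3]))  # None (every value is duplicated)
```

The second call covers the "no such number" case from the problem statement, which the sample input does not exercise.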

---

## **Approach 1: PySpark DataFrame API**

### **Steps**
1. Group by `num` and count occurrences.
2. Filter for numbers that occur only once.
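3. Take the maximum of those. Because this is a global aggregate, it returns a single `null` row when no number occurs exactly once.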

### **Code**

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, max as spark_max

# Step 1: Initialize Spark
spark = SparkSession.builder.appName("BiggestSingleNumber").getOrCreate()

# Step 2: Sample Data
data = [
    (8,),
    (8,),
    (3,),
    (3,),
    (1,),
    (4,),
    (5,),
    (6,)
]
columns = ["num"]

# Step 3: Create DataFrame
my_numbers_df = spark.createDataFrame(data, columns)

# Step 4: Process Data
result_df = my_numbers_df.groupBy("num") \
                         .agg(count("*").alias("cnt")) \
                         .filter(col("cnt") == 1) \
                         .agg(spark_max("num").alias("num"))

# Step 5: Display Result
result_df.show()
```

```text
+---+
|num|
+---+
|  6|
+---+
```



---

## **Approach 2: SQL Query in PySpark**

### **Steps**
1. Group numbers and count their frequencies.
2. Select only those with count = 1.
3. Get the max of that filtered list. `MAX()` over an empty set yields `NULL`, which covers the case where no number is unique.

### **Code**

```python
my_numbers_df.createOrReplaceTempView("my_numbers")

spark.sql("""
    SELECT MAX(num) AS num
    FROM (
        SELECT num
        FROM my_numbers
        GROUP BY num
        HAVING COUNT(*) = 1
    ) AS singles
""").show()
```

```text
+---+
|num|
+---+
|  6|
+---+
```
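
The sample data never triggers the `null` case, so it can be worth checking the query against a table where every value is duplicated. The sketch below runs the same SQL text against an in-memory SQLite database from Python's standard library. SQLite is swapped in here purely as a lightweight stand-in for Spark SQL, and the `run_query` helper is a hypothetical name used for illustration. The query uses only standard `GROUP BY`, `HAVING`, and `MAX()`, but engine differences are possible in general.

```python
import sqlite3

# Same query text as in the Spark SQL cell above.
QUERY = """
    SELECT MAX(num) AS num
    FROM (
        SELECT num
        FROM my_numbers
        GROUP BY num
        HAVING COUNT(*) = 1
    ) AS singles
"""


def run_query(values):
    """Load values into an in-memory my_numbers table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_numbers (num INTEGER)")
        conn.executemany(
            "INSERT INTO my_numbers (num) VALUES (?)",
            [(v,) for v in values],
        )
        # A global aggregate always returns exactly one row.
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


print(run_query([8, 8, 3, 3, 1, 4, 5, 6]))  # 6
print(run_query([8, 8, 3, 3]))              # None, i.e. SQL NULL
```

Wrapping the filter in a subquery and applying `MAX()` outside it is what guarantees a single `NULL` row; ordering the singles and taking the first row instead would return no rows at all in that case.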



---

## **Summary**

| Approach     | Method               | Highlights                                        |
|--------------|----------------------|---------------------------------------------------|
| Approach 1   | PySpark DataFrame API| GroupBy with count, filter, then max              |
| Approach 2   | SQL Query in PySpark | Clean SQL using `GROUP BY`, `HAVING`, and `MAX()` |
