# **Consecutive Available Seats**

## **Problem Statement**
We are given a cinema seat table like this:

| seat_id | free |
|---------|------|
| 1       | 1    |
| 2       | 0    |
| 3       | 1    |
| 4       | 1    |
| 5       | 0    |
| 6       | 1    |

We need to return the `seat_id`s of seats that are **part of at least two consecutive available seats (`free = 1`)**, ordered by `seat_id`.
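
As a quick reference for the expected result, the rule can be sketched in plain Python without Spark. The helper name `consecutive_free_seats` and the list-of-tuples input are assumptions made for illustration and are not part of the notebook. The sketch treats a seat as qualifying when it is free and the seat numbered one lower or one higher is also free.

```python
def consecutive_free_seats(seats):
    """Return the free seat_ids that have a free neighbour (seat_id - 1 or seat_id + 1)."""
    # Hypothetical helper: `seats` is a list of (seat_id, free) tuples.
    free_ids = {seat_id for seat_id, free in seats if free == 1}
    return sorted(
        seat_id
        for seat_id in free_ids
        if (seat_id - 1) in free_ids or (seat_id + 1) in free_ids
    )


# Same sample rows as the notebook's code cells
seats = [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0), (6, 1)]
print(consecutive_free_seats(seats))  # [3, 4]
```

Seats 1 and 6 are free but isolated, so only seats 3 and 4 qualify.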

---

## **Approach 1: PySpark DataFrame API**

### **Steps**
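1. Define a **window** ordered by `seat_id`.
2. Use `lag()` and `lead()` over that window to fetch the `free` flag of the previous and next seat. This must happen before any filtering; otherwise occupied seats drop out and every remaining neighbor looks free.
3. Keep the rows where the seat itself is free (`free == 1`) and **either the previous or the next seat is also free**.
4. Return the `seat_id`s ordered by `seat_id`.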
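
To make the window step concrete, here is a minimal plain-Python sketch of what `lag()` and `lead()` compute over rows sorted by `seat_id`. The function name `with_neighbours` is hypothetical, and `None` stands in for the NULL that Spark returns at the first and last row.

```python
def with_neighbours(rows):
    """Attach the previous and next `free` flag to each (seat_id, free) row."""
    ordered = sorted(rows)  # plays the role of Window.orderBy("seat_id")
    result = []
    for i, (seat_id, free) in enumerate(ordered):
        # Equivalent of lag("free", 1): value from the previous row, if any
        prev_free = ordered[i - 1][1] if i > 0 else None
        # Equivalent of lead("free", 1): value from the next row, if any
        next_free = ordered[i + 1][1] if i < len(ordered) - 1 else None
        result.append((seat_id, free, prev_free, next_free))
    return result


rows = [(1, 1), (2, 0), (3, 1), (4, 1), (5, 0), (6, 1)]
annotated = with_neighbours(rows)

# Keep free seats whose previous or next row is also free
selected = [
    seat_id
    for seat_id, free, prev_free, next_free in annotated
    if free == 1 and (prev_free == 1 or next_free == 1)
]
print(selected)  # [3, 4]
```

Note that `lag` and `lead` look at the neighboring row, not the neighboring seat number, so this matches the problem only when the `seat_id` values have no gaps.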

### **Code**

```python
from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import col, lag, lead

# Initialize Spark
spark = SparkSession.builder.appName("CinemaSeats").getOrCreate()

# Sample data: (seat_id, free)
data = [
    (1, 1),
    (2, 0),
    (3, 1),
    (4, 1),
    (5, 0),
    (6, 1),
]
columns = ["seat_id", "free"]

# Create DataFrame
cinema_df = spark.createDataFrame(data, columns)

# Window over all rows ordered by seat_id.
# Without partitionBy, Spark moves all rows to a single partition, which is fine for small data.
window = Window.orderBy("seat_id")

# Find free seats whose previous or next seat is also free.
# lag/lead compare neighboring rows, so this assumes seat_id values have no gaps.
result_df = (
    cinema_df
    .withColumn("prev_free", lag("free", 1).over(window))
    .withColumn("next_free", lead("free", 1).over(window))
    .filter((col("free") == 1) & ((col("prev_free") == 1) | (col("next_free") == 1)))
    .select("seat_id")
    .orderBy("seat_id")
)

result_df.show()
```



```text
+-------+
|seat_id|
+-------+
|      3|
|      4|
+-------+
```



---

## **Approach 2: SQL Query in PySpark**

### **Steps**
1. Register the DataFrame as a temporary view.
2. Use the SQL window functions `LAG()` and `LEAD()` to get the availability of the previous and next seat.
3. Keep the rows where the seat is free and at least one neighboring seat is also free.

### **Code**

```python
# Register the DataFrame as a temporary view so it can be queried with SQL
cinema_df.createOrReplaceTempView("cinema")

spark.sql("""
    SELECT seat_id
    FROM (
        SELECT seat_id,
               free,
               LAG(free) OVER (ORDER BY seat_id) AS prev_free,
               LEAD(free) OVER (ORDER BY seat_id) AS next_free
        FROM cinema
    ) t
    WHERE free = 1 AND (prev_free = 1 OR next_free = 1)
    ORDER BY seat_id
""").show()
```


```text
+-------+
|seat_id|
+-------+
|      3|
|      4|
+-------+
```



---

## **Summary**

| Approach     | Method                | Highlights                                                   |
|--------------|-----------------------|--------------------------------------------------------------|
| Approach 1   | PySpark DataFrame API | Uses `lag()` and `lead()` over a window ordered by `seat_id` |
| Approach 2   | SQL Query in PySpark  | Same adjacency check with `LAG()` and `LEAD()` on a temp view |