# PySpark SQL isin() Function: Checking Values in a List Easily

## Introduction to the `isin()` Function

The `isin()` function in PySpark is used to check whether a column's value exists in a given list of values. It works like an SQL `IN` clause, allowing you to filter rows where a column's value is present in the provided list. This function is especially useful when you want to match a column's value against a set of predefined options or categories.


## Basic Syntax

```
DataFrame.filter(column.isin(list_of_values))
```

### Parameters

- **`column`**: The `Column` to check, for example `df.product_name`.
- **`list_of_values`**: The values to match the column against, passed either as separate arguments or as a single Python list.


## Why Use `isin()`?

- It expresses a filter against a set of values in a single call, which suits filters on specific categories, IDs, or other predefined lists.
- It is more readable than chaining multiple equality conditions with `|`, and it can be more efficient, particularly as the number of values grows.


## Practical Examples

### 1. Filtering Rows Based on a List of Values

**Scenario**: You have a DataFrame with product names, and you want to filter for only "Product A" and "Product C".

**Code Example**:

```python
from pyspark.sql import SparkSession

# Reuse the active session; notebook environments usually predefine `spark`
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("Product A", 500),
    ("Product B", 300),
    ("Product C", 700)
], ["product_name", "sales"])

# Filter rows where product_name is in the given list
df.filter(df.product_name.isin("Product A", "Product C")).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```



### 2. Using a Python List with `isin()`

**Scenario**: Instead of hardcoding the values directly, you have a Python list containing the products you want to filter.

**Code Example**:

```python
# Define a Python list of products
product_list = ["Product A", "Product C"]

# Use the Python list with isin()
df.filter(df.product_name.isin(product_list)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```



### 3. Combining `isin()` with Other Filters

**Scenario**: You want to filter rows where the product name is either "Product A" or "Product C", and the sales are greater than 400.

**Code Example**:

```python
# Combine isin() with another condition
df.filter(df.product_name.isin("Product A", "Product C") & (df.sales > 400)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```



### 4. Using `isin()` with Multiple Columns

**Scenario**: You want to check if values from multiple columns are present in different lists.

**Code Example**:

```python
df_multi = spark.createDataFrame([
    ("Product A", "Category 1", 500),
    ("Product B", "Category 2", 300),
    ("Product C", "Category 1", 700)
], ["product_name", "category", "sales"])

# Check if product_name and category match different lists
df_multi.filter(
    df_multi.product_name.isin("Product A", "Product C") &
    df_multi.category.isin("Category 1")
).show()
```

```
+------------+----------+-----+
|product_name|  category|sales|
+------------+----------+-----+
|   Product A|Category 1|  500|
|   Product C|Category 1|  700|
+------------+----------+-----+
```



### 5. Using `isin()` to Filter Numeric Columns

**Scenario**: You want to filter rows where the sales value is in a predefined list of numbers.

**Code Example**:

```python
# Filter rows where sales are in the list of values
df.filter(df.sales.isin(300, 500)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product B|  300|
+------------+-----+
```



### 6. Handling Null Values in `isin()`

**Scenario**: You have a DataFrame with null values in the column you filter on, and you want to know how `isin()` treats them. A null never matches: for a null input, `isin()` evaluates to null rather than true, so `filter()` drops the row.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("Product A", 500),
    ("Product B", None),
    ("Product C", 700)
], ["product_name", "sales"])

# The row with null sales is dropped: isin() yields null for it, not True
df_with_nulls.filter(df_with_nulls.sales.isin(500, 700)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```

