# PySpark SQL and(), or(), not() Functions: Mastering Boolean Logic

## Introduction to Boolean Logic with `and()`, `or()`, and `not()`

Boolean logic in PySpark combines or negates the conditions used in DataFrame filters and conditional expressions. In the Python API, the operations this guide calls `and()`, `or()`, and `not()` are written with the `Column` operators `&`, `|`, and `~`, because Python's own `and`, `or`, and `not` keywords cannot be applied to columns. They are most often used in filters where several criteria must hold at once or where specific rows should be excluded.


## Basic Syntax:

```
DataFrame.filter((condition1) & (condition2))
DataFrame.filter((condition1) | (condition2))
DataFrame.filter(~(condition))
```

### Functions:

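- **`and()`**, written `&`: returns true only if both conditions are true.
- **`or()`**, written `|`: returns true if at least one condition is true.
- **`not()`**, written `~`: returns true if the condition is false.

Wrap each condition in its own parentheses: in Python, `&` and `|` bind more tightly than comparison operators such as `>` and `==`.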
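
To see why the operators are needed instead of Python's keywords, here is a minimal sketch in plain Python. The `Cond` class is hypothetical and only records the expression as text; it is not PySpark code. It mimics two behaviours of a PySpark `Column`: `&`, `|`, and `~` build a new expression, while converting the object to a plain `bool` (which is what `and`, `or`, and `not` do) raises a `ValueError`.

```python
class Cond:
    """Hypothetical stand-in for a PySpark Column; it only records expression text."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):   # called for: self & other
        return Cond(f"({self.text} AND {other.text})")

    def __or__(self, other):    # called for: self | other
        return Cond(f"({self.text} OR {other.text})")

    def __invert__(self):       # called for: ~self
        return Cond(f"(NOT {self.text})")

    def __bool__(self):
        # A PySpark Column also raises ValueError when converted to bool.
        raise ValueError("cannot convert a condition to bool; use &, | or ~")


a = Cond("sales > 600")
b = Cond("sales < 300")
c = Cond("product_name = 'Product C'")

# Explicit parentheses: (a OR b) AND (NOT c)
combined = (a | b) & ~c
print(combined.text)
# ((sales > 600 OR sales < 300) AND (NOT product_name = 'Product C'))

# Without parentheses, & binds tighter than |, so this means a OR (b AND (NOT c))
loose = a | b & ~c
print(loose.text)
# (sales > 600 OR (sales < 300 AND (NOT product_name = 'Product C')))

# The `and` keyword calls bool(a) first, which fails
try:
    a and b
except ValueError as exc:
    error_message = str(exc)
    print("and failed:", error_message)
```

The second result shows that grouping changes the meaning of the filter, so it is safer to parenthesize every sub-condition than to rely on operator precedence.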


## Why Use Boolean Logic?

Real queries rarely depend on a single condition. Combining expressions lets one `filter()` call state every criterion a row must meet, or must not meet, so the selection logic stays in a single readable expression.


## Practical Examples

### 1. Using `and()` for Combining Conditions

**Scenario**: You have a DataFrame with sales data, and you want to filter rows where sales are greater than 300 and less than 600.

**Code Example**:

```python
df = spark.createDataFrame([
    ("Product A", 500),
    ("Product B", 300),
    ("Product C", 700)
], ["product_name", "sales"])

# & is the logical AND: keep rows where sales are strictly between 300 and 600
df.filter((df.sales > 300) & (df.sales < 600)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
+------------+-----+
```



### 2. Using `or()` for Multiple Conditions

**Scenario**: You want to filter rows where either the product name is "Product A" or the sales are greater than 600.

**Code Example**:

```python
# | is the logical OR: keep rows where product_name is 'Product A' or sales > 600
df.filter((df.product_name == "Product A") | (df.sales > 600)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```



### 3. Using `not()` to Negate Conditions

**Scenario**: You want to filter rows where sales are not equal to 300.

**Code Example**:

```python
# ~ is the logical NOT: keep rows where sales are not equal to 300
# (equivalent to df.sales != 300)
df.filter(~(df.sales == 300)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product C|  700|
+------------+-----+
```



### 4. Combining `and()`, `or()`, and `not()` for Complex Filtering

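**Scenario**: You want to keep rows where sales are either greater than 600 or less than 300, excluding any row whose product name is "Product C". With this data the result is empty: Product C is the only row that passes the sales test, and it is then excluded by name.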

**Code Example**:

```python
# Combine | (OR), & (AND), and ~ (NOT) in one filter
df.filter(((df.sales > 600) | (df.sales < 300)) & (~(df.product_name == "Product C"))).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
+------------+-----+
```



### 5. Using `and()` with String Columns

**Scenario**: You want to filter rows where the product name starts with "Product A" and sales are greater than 400.

**Code Example**:

```python
# Use & to combine a string-column condition with a numeric-column condition
df.filter((df.product_name.startswith("Product A")) & (df.sales > 400)).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
+------------+-----+
```



### 6. Using `or()` and `not()` with Null Values

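**Scenario**: You have a DataFrame with some null values, and you want to keep rows where either sales are null or the product name is not "Product B". All three rows match: Product B passes because its sales value is null, and the other two pass the name test.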

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("Product A", 500),
    ("Product B", None),
    ("Product C", 700)
], ["product_name", "sales"])

# | with ~: keep rows where sales is null OR product_name is not 'Product B'
df_with_nulls.filter((df_with_nulls.sales.isNull()) | (~(df_with_nulls.product_name == "Product B"))).show()
```

```
+------------+-----+
|product_name|sales|
+------------+-----+
|   Product A|  500|
|   Product B| null|
|   Product C|  700|
+------------+-----+
```
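
Null values matter here because Spark SQL uses three-valued logic: a comparison against null yields null rather than true or false, and `filter()` keeps only rows where the condition is true. The following plain-Python sketch models those rules with `None` standing in for null. The helper functions (`sql_eq`, `sql_not`, `sql_or`) and the third row, which has a null product name, are hypothetical and added only to show a row being dropped.

```python
def sql_eq(x, y):
    """Equality: null if either side is null."""
    return None if x is None or y is None else x == y


def sql_not(x):
    """NOT: null stays null."""
    return None if x is None else (not x)


def sql_or(x, y):
    """OR: true wins over null; otherwise null wins over false."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


rows = [
    ("Product A", 500),
    ("Product B", None),
    (None, 700),  # hypothetical row with a null product name
]


def predicate(row):
    name, sales = row
    # sales IS NULL OR NOT (product_name = 'Product B')
    return sql_or(sales is None, sql_not(sql_eq(name, "Product B")))


# A filter keeps a row only when the predicate is exactly true
kept = [row for row in rows if predicate(row) is True]
print(kept)
# [('Product A', 500), ('Product B', None)]
```

The row with the null name is dropped even though its name is not "Product B", because the negated comparison evaluates to null. An explicit `isNull()` check on that column would be needed to keep it.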

