## PySpark SQL lower() and upper() Functions: Changing String Case Made Easy

## Introduction to `lower()` and `upper()` Functions

The `lower()` and `upper()` functions in PySpark are used to convert string columns to lowercase and uppercase, respectively. These functions are useful for standardizing text data, especially when performing comparisons, searches, or cleaning inconsistent case formatting.


## Basic Syntax

```python
from pyspark.sql.functions import col, lower, upper

# df is an existing DataFrame; "column_name" is the string column to convert
df.select(lower(col("column_name")).alias("lowercase_column"))
df.select(upper(col("column_name")).alias("uppercase_column"))
```

### Functions

- **`lower()`**: Returns a column with each string value converted to lowercase.
- **`upper()`**: Returns a column with each string value converted to uppercase.


## Why Use `lower()` and `upper()`?

- These functions are helpful for making text comparisons case-insensitive, standardizing case across datasets, or preparing text data for display.
- They are often used in data cleaning processes to handle inconsistent capitalization in names, addresses, product descriptions, and other text fields.


## Practical Examples

### 1. Converting a Column to Lowercase

**Scenario**: You have a DataFrame with product names in mixed case, and you want to convert all product names to lowercase.

**Code Example**:

```python
from pyspark.sql.functions import lower, upper

df = spark.createDataFrame([
    ("Product A",),
    ("Product B",),
    ("Product C",)
], ["product_name"])

# Convert product_name to lowercase
df.select(lower(df.product_name).alias("lowercase_product_name")).show()
```

```
+----------------------+
|lowercase_product_name|
+----------------------+
|             product a|
|             product b|
|             product c|
+----------------------+
```



### 2. Converting a Column to Uppercase

**Scenario**: You want to convert product names to uppercase for display purposes.

**Code Example**:

```python
# Convert product_name to uppercase
df.select(upper(df.product_name).alias("uppercase_product_name")).show()
```

```
+----------------------+
|uppercase_product_name|
+----------------------+
|             PRODUCT A|
|             PRODUCT B|
|             PRODUCT C|
+----------------------+
```



### 3. Combining `lower()` and `upper()` with Other Functions

**Scenario**: You want to concatenate first and last names and then convert the result to either lowercase or uppercase.

**Code Example**:

```python
from pyspark.sql.functions import concat, lit

df_names = spark.createDataFrame([
    ("John", "Doe"),
    ("Jane", "Smith"),
    ("Tom", "Brown")
], ["first_name", "last_name"])

# Concatenate first_name and last_name and convert to lowercase
df_names.select(
    lower(concat(df_names.first_name, lit(" "), df_names.last_name)).alias("lowercase_full_name")
).show()

# Concatenate first_name and last_name and convert to uppercase
df_names.select(
    upper(concat(df_names.first_name, lit(" "), df_names.last_name)).alias("uppercase_full_name")
).show()
```

```
+-------------------+
|lowercase_full_name|
+-------------------+
|           john doe|
|         jane smith|
|          tom brown|
+-------------------+

+-------------------+
|uppercase_full_name|
+-------------------+
|           JOHN DOE|
|         JANE SMITH|
|          TOM BROWN|
+-------------------+
```



### 4. Using `lower()` and `upper()` for Case-Insensitive Matching

**Scenario**: You want to make a case-insensitive comparison, such as filtering rows where the product name is "product a" regardless of its case.

**Code Example**:

```python
# Filter rows where product_name is 'product a', ignoring case
df.filter(lower(df.product_name) == "product a").show()
```

```
+------------+
|product_name|
+------------+
|   Product A|
+------------+
```



### 5. Handling Null Values in `lower()` and `upper()`

**Scenario**: A column contains null values, and you want to confirm that converting case does not fail on them. Both functions return null for a null input rather than raising an error.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("John",),
    (None,),
    ("Tom",)
], ["name"])

# Convert to lowercase; the null row passes through as null
df_with_nulls.select(lower(df_with_nulls.name).alias("lowercase_name")).show()

# Convert to uppercase; the null row passes through as null
df_with_nulls.select(upper(df_with_nulls.name).alias("uppercase_name")).show()
```

```
+--------------+
|lowercase_name|
+--------------+
|          john|
|          null|
|           tom|
+--------------+

+--------------+
|uppercase_name|
+--------------+
|          JOHN|
|          null|
|           TOM|
+--------------+
```

