# PySpark SQL trim(), ltrim(), rtrim() Functions: Trimming Spaces Easily

## Introduction to Trimming Functions

The `trim()`, `ltrim()`, and `rtrim()` functions in PySpark are used to remove unwanted spaces from strings:

- **`trim()`** removes leading and trailing spaces.
- **`ltrim()`** removes only leading (left) spaces.
- **`rtrim()`** removes only trailing (right) spaces.

These functions are essential for cleaning up string data, especially when working with user inputs, text data, or fields prone to extra spaces.


## Basic Syntax

```python
from pyspark.sql.functions import col, trim, ltrim, rtrim

# df is any DataFrame; "column" is the name of a string column in it
df.select(trim(col("column")).alias("trimmed_column"))
df.select(ltrim(col("column")).alias("left_trimmed_column"))
df.select(rtrim(col("column")).alias("right_trimmed_column"))
```


## Why Use Trimming Functions?

- Removing extra spaces ensures that string values are clean and ready for comparison, display, or further processing.
- Trimming is often used in data cleaning pipelines, especially when dealing with inconsistent string data that may have spaces accidentally added during entry or manipulation.


## Practical Examples

### 1. Trimming Leading and Trailing Spaces

**Scenario**: You have a DataFrame where names contain extra spaces at the beginning and end, and you want to remove them using `trim()`.

**Code Example**:

```python
from pyspark.sql.functions import trim, ltrim, rtrim

df = spark.createDataFrame([
    ("  John  ",),
    ("  Jane   ",),
    ("  Tom  ",)
], ["name"])

# Remove leading and trailing spaces using trim()
df.select(trim(df.name).alias("trimmed_name")).show()
```

```text
+------------+
|trimmed_name|
+------------+
|        John|
|        Jane|
|         Tom|
+------------+
```



### 2. Removing Leading Spaces with `ltrim()`

**Scenario**: The names have extra spaces at the beginning, and you want to remove only those leading spaces using `ltrim()`.

**Code Example**:

```python
# Remove leading spaces using ltrim()
df.select(ltrim(df.name).alias("left_trimmed_name")).show()
```

```text
+-----------------+
|left_trimmed_name|
+-----------------+
|           John  |
|          Jane   |
|            Tom  |
+-----------------+
```



### 3. Removing Trailing Spaces with `rtrim()`

**Scenario**: You want to remove trailing spaces from the names using `rtrim()`.

**Code Example**:

```python
# Remove trailing spaces using rtrim()
df.select(rtrim(df.name).alias("right_trimmed_name")).show()
```

```text
+------------------+
|right_trimmed_name|
+------------------+
|              John|
|              Jane|
|               Tom|
+------------------+
```



### 4. Combining `trim()`, `ltrim()`, and `rtrim()`

**Scenario**: You want to show the difference between trimming functions by applying all three to the same column.

**Code Example**:

```python
df.select(
    trim(df.name).alias("trimmed_name"),
    ltrim(df.name).alias("left_trimmed_name"),
    rtrim(df.name).alias("right_trimmed_name")
).show()
```

```text
+------------+-----------------+------------------+
|trimmed_name|left_trimmed_name|right_trimmed_name|
+------------+-----------------+------------------+
|        John|           John  |              John|
|        Jane|          Jane   |              Jane|
|         Tom|            Tom  |               Tom|
+------------+-----------------+------------------+
```



### 5. Trimming Spaces and Concatenating Strings

**Scenario**: You want to trim spaces from a first-name field and then concatenate it with another column, such as a last name, to build a full name.

**Code Example**:

```python
from pyspark.sql.functions import concat, lit

# Trim the first name, then join it to the last name with a single space
df_names = spark.createDataFrame([
    ("  John  ", "Doe"),
    ("  Jane   ", "Smith"),
    ("  Tom  ", "Brown")
], ["first_name", "last_name"])

df_names.select(
    concat(trim(df_names.first_name), lit(" "), df_names.last_name).alias("full_name")
).show()
```

```text
+----------+
| full_name|
+----------+
|  John Doe|
|Jane Smith|
| Tom Brown|
+----------+
```



### 6. Handling Null Values in Trimming

**Scenario**: Some values in the column are null. `trim()` returns null for a null input instead of raising an error, so the rest of the DataFrame is processed normally.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("  John  ",),
    (None,),
    ("  Tom  ",)
], ["name"])

# trim() passes null values through unchanged
df_with_nulls.select(trim(df_with_nulls.name).alias("trimmed_name")).show()
```

```text
+------------+
|trimmed_name|
+------------+
|        John|
|        null|
|         Tom|
+------------+
```

