# PySpark SQL length() Function: How to Calculate String Length

## Introduction to the `length()` Function

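The `length()` function in PySpark returns the number of characters in a string column, including any trailing spaces. For binary columns it returns the number of bytes. It is comparable to `LENGTH()` or `CHAR_LENGTH()` in most SQL dialects and to `len()` in Python. SQL Server's `LEN()` is a close relative, but it ignores trailing spaces, whereas `length()` counts them. The function is helpful for checking the length of values such as product codes, names, or descriptions.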


## Basic Syntax:

```python
from pyspark.sql.functions import length

DataFrame.select(length(column).alias("column_length"))
```

### Parameter:

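- **`column`**: The string (or binary) column whose length you want to calculate, passed either as a `Column` object or as a column name. In the PySpark signature this parameter is named `col`.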


## Why Use `length()`?

- The `length()` function is useful when analyzing string columns, especially when you need to validate the length of values or check if they meet certain criteria (like maximum length for a username).
- It is often used in data validation, formatting, and preparation tasks.


## Practical Examples

### 1. Calculating the Length of a Single Column

**Scenario**: You have a DataFrame with product codes, and you want to calculate the length of each code.

**Code Example**:

```python
from pyspark.sql.functions import length

df = spark.createDataFrame([
    ("ABC12345",),
    ("XYZ98765",),
    ("LMN54321",)
], ["product_code"])

# Calculate the length of product_code
df.select(length(df.product_code).alias("code_length")).show()
```

```text
+-----------+
|code_length|
+-----------+
|          8|
|          8|
|          8|
+-----------+
```



### 2. Calculating the Length of Multiple Columns

**Scenario**: You have both `first_name` and `last_name` columns, and you want to calculate the length of both.

**Code Example**:

```python
df_names = spark.createDataFrame([
    ("John", "Doe"),
    ("Jane", "Smith"),
    ("Tom", "Brown")
], ["first_name", "last_name"])

# Calculate the length of first_name and last_name
df_names.select(
    length(df_names.first_name).alias("first_name_length"),
    length(df_names.last_name).alias("last_name_length")
).show()
```

```text
+-----------------+----------------+
|first_name_length|last_name_length|
+-----------------+----------------+
|                4|               3|
|                4|               5|
|                3|               5|
+-----------------+----------------+
```



### 3. Filtering Data Based on String Length

**Scenario**: You want to filter the DataFrame to only show product codes that are exactly 8 characters long. In the sample data every code is 8 characters long, so all three rows pass the filter.

**Code Example**:

```python
# Filter rows where the product_code length is 8
df.filter(length(df.product_code) == 8).show()
```

```text
+------------+
|product_code|
+------------+
|    ABC12345|
|    XYZ98765|
|    LMN54321|
+------------+
```



### 4. Handling Null Values in String Length

**Scenario**: Some columns contain null values, and you want to calculate lengths without encountering errors. `length()` returns null for a null input rather than raising an exception.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("John", None),
    (None, "Smith"),
    ("Tom", "Brown")
], ["first_name", "last_name"])

# Calculate the length of columns with null values
df_with_nulls.select(
    length(df_with_nulls.first_name).alias("first_name_length"),
    length(df_with_nulls.last_name).alias("last_name_length")
).show()
```

```text
+-----------------+----------------+
|first_name_length|last_name_length|
+-----------------+----------------+
|                4|            null|
|             null|               5|
|                3|               5|
+-----------------+----------------+
```



### 5. Combining `length()` with Other String Operations

**Scenario**: You want to calculate the length of concatenated columns, such as combining `first_name` and `last_name` and then calculating the length.

**Code Example**:

```python
from pyspark.sql.functions import concat, lit

# Build the concatenated expression once and reuse it
full_name = concat(df_names.first_name, lit(" "), df_names.last_name)

# Show the full name alongside its length
df_names.select(
    full_name.alias("full_name"),
    length(full_name).alias("full_name_length")
).show()
```

```text
+----------+----------------+
| full_name|full_name_length|
+----------+----------------+
|  John Doe|               8|
|Jane Smith|              10|
| Tom Brown|               9|
+----------+----------------+
```

