# PySpark SQL concat() Function: Concatenating Columns Made Simple

## Introduction to the `concat()` Function

The `concat()` function in PySpark is used to concatenate multiple columns or expressions into a single column. It is similar to concatenation functions in other programming languages and SQL, where you combine two or more strings (or columns) into one.


## Basic Syntax

```python
from pyspark.sql.functions import concat

# df is any DataFrame; column1, column2, ... are the columns to combine
df.select(concat(column1, column2, ...).alias("new_column"))
```

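### Parameters

- **`column1, column2, ...`**: The columns to concatenate, passed as `Column` objects or as column-name strings.
- String literals can be concatenated with columns, but they must be wrapped in `lit()`, because a bare string is interpreted as a column name. Any expression that evaluates to a column can be passed as well.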


## Why Use `concat()`?

- It is useful when you need to combine multiple columns into one, such as merging first and last names, joining city and state fields, or combining various string-based identifiers into a single column.
- `concat()` is often used when preparing data for display, reporting, or exporting as combined information.


## Practical Examples

### 1. Concatenating Two Columns

**Scenario**: You have a DataFrame with first and last names, and you want to concatenate them into a full name.

**Code Example**:

```python
from pyspark.sql.functions import concat

df = spark.createDataFrame([
    ("John", "Doe"),
    ("Jane", "Smith"),
    ("Tom", "Brown")
], ["first_name", "last_name"])

# Concatenate first_name and last_name into full_name
df.select(concat(df.first_name, df.last_name).alias("full_name")).show()
```

```
+---------+
|full_name|
+---------+
|  JohnDoe|
|JaneSmith|
| TomBrown|
+---------+
```



### 2. Concatenating with a Separator

**Scenario**: You want to concatenate `first_name` and `last_name` with a space in between.

**Code Example**:

```python
from pyspark.sql.functions import concat, lit

# Concatenate first_name and last_name with a space in between
df.select(concat(df.first_name, lit(" "), df.last_name).alias("full_name")).show()
```

```
+----------+
| full_name|
+----------+
|  John Doe|
|Jane Smith|
| Tom Brown|
+----------+
```



### 3. Concatenating Multiple Columns

**Scenario**: You have a DataFrame with multiple address fields, and you want to concatenate them into a full address.

**Code Example**:

```python
from pyspark.sql.functions import concat, lit

df_address = spark.createDataFrame([
    ("123", "Main St", "City", "State", "12345"),
    ("456", "Broadway", "City2", "State2", "67890")
], ["house_number", "street", "city", "state", "zip_code"])

# Concatenate multiple columns into a full address
df_address.select(concat(
    df_address.house_number, lit(" "),
    df_address.street, lit(", "),
    df_address.city, lit(", "),
    df_address.state, lit(" "),
    df_address.zip_code
).alias("full_address")).show(truncate=False)
```

```
+---------------------------------+
|full_address                     |
+---------------------------------+
|123 Main St, City, State 12345   |
|456 Broadway, City2, State2 67890|
+---------------------------------+
```



### 4. Concatenating Columns with Null Values

**Scenario**: Some columns contain null values. Because `concat()` returns null whenever any of its inputs is null, you want to substitute an empty string for each null so that the result is never null.

**Code Example**:

```python
from pyspark.sql.functions import coalesce, concat, lit

df_with_nulls = spark.createDataFrame([
    ("John", None),
    (None, "Smith"),
    ("Tom", "Brown")
], ["first_name", "last_name"])

# Replace each null with an empty string via coalesce() before concatenating
df_with_nulls.select(concat(
    coalesce(df_with_nulls.first_name, lit("")),
    lit(" "),
    coalesce(df_with_nulls.last_name, lit(""))
).alias("full_name")).show()
```

```
+---------+
|full_name|
+---------+
|    John |
|    Smith|
|Tom Brown|
+---------+
```



### 5. Concatenating Columns and Literals

**Scenario**: You want to concatenate columns with literal strings for formatting, such as creating a formatted full address.

**Code Example**:

```python
# Concatenate columns with literal strings for formatting
df_address.select(concat(
    lit("Address: "), df_address.house_number, lit(", "),
    df_address.street, lit(", "),
    df_address.city, lit(", "),
    df_address.state, lit(" "),
    df_address.zip_code
).alias("formatted_address")).show(truncate=False)
```

```
+-------------------------------------------+
|formatted_address                          |
+-------------------------------------------+
|Address: 123, Main St, City, State 12345   |
|Address: 456, Broadway, City2, State2 67890|
+-------------------------------------------+
```

