# PySpark SQL alias() Function: How to Rename Columns

## Introduction to the `alias()` Function

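The `alias()` method on a PySpark `Column` returns a new column expression with the name you give it. It is most often used inside `select()` and `agg()` (typically after `groupBy()`) to name a renamed column, a computed expression, or an aggregate. In `withColumn()`, the output name comes from the method's first argument, so an alias on the expression does not change it.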


## Basic Syntax

```
column.alias(newName)
```

### Parameters

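- **`newName`**: The alias for the column, given as a string. `alias()` returns a new `Column` and does not modify the source DataFrame; the alias becomes the column's name in the DataFrame produced by the operation that uses it.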


## Why Use `alias()`?

- It makes columns easier to reference in complex queries.
- It’s useful when you need a more meaningful or shorter name for columns in the result set.
- It’s often used when performing aggregation, joins, or other operations where column names might be ambiguous or need clarity.


## Practical Examples

### 1. Renaming a Single Column

**Scenario**: You have a DataFrame with employee data, and you want to rename the `SALARY` column to `EMPLOYEE_SALARY` in the result set.

**Code Example**:

```python
# `spark` is the active SparkSession (predefined in notebooks and the PySpark shell)
df = spark.createDataFrame([
    (1, "John", 3000),
    (2, "Jane", 4000),
    (3, "Tom", 3500)
], ["EMPLOYEE_ID", "NAME", "SALARY"])

# Rename the "SALARY" column to "EMPLOYEE_SALARY"
df.select("EMPLOYEE_ID", "NAME", df.SALARY.alias("EMPLOYEE_SALARY")).show()
```

```
+-----------+----+---------------+
|EMPLOYEE_ID|NAME|EMPLOYEE_SALARY|
+-----------+----+---------------+
|          1|John|           3000|
|          2|Jane|           4000|
|          3| Tom|           3500|
+-----------+----+---------------+
```



### 2. Renaming Multiple Columns

**Scenario**: You want to rename the `EMPLOYEE_ID` column to `ID` and the `SALARY` column to `EMP_SALARY` in a single `select()`.

**Code Example**:

```python
# Rename multiple columns using alias
df.select(df.EMPLOYEE_ID.alias("ID"), "NAME", df.SALARY.alias("EMP_SALARY")).show()
```

```
+---+----+----------+
| ID|NAME|EMP_SALARY|
+---+----+----------+
|  1|John|      3000|
|  2|Jane|      4000|
|  3| Tom|      3500|
+---+----+----------+
```



### 3. Using `alias()` with Expressions

**Scenario**: You want to compute a 10% increase on `SALARY` and name the resulting column `INCREASED_SALARY`. Because `1.1` is a floating-point value, some results show small rounding artifacts such as `3300.0000000000005`.

**Code Example**:

```python
# Using alias with an expression
df.select("EMPLOYEE_ID", "NAME", (df.SALARY * 1.1).alias("INCREASED_SALARY")).show()
```

```
+-----------+----+------------------+
|EMPLOYEE_ID|NAME|  INCREASED_SALARY|
+-----------+----+------------------+
|          1|John|3300.0000000000005|
|          2|Jane|            4400.0|
|          3| Tom|3850.0000000000005|
+-----------+----+------------------+
```



### 4. Using `alias()` in Aggregations

**Scenario**: You want to calculate the total salary for all employees and name the result `TOTAL_SALARY`.

**Code Example**:

```python
# Note: this import shadows Python's built-in sum() for the rest of the session
from pyspark.sql.functions import sum

# Using alias in aggregation
df.select(sum("SALARY").alias("TOTAL_SALARY")).show()
```

```
+------------+
|TOTAL_SALARY|
+------------+
|       10500|
+------------+
```



### 5. Using `alias()` with Grouping

**Scenario**: You want to group employees by name and calculate each group's total salary, naming the result `TOTAL_SALARY_PER_NAME`.

**Code Example**:

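```python
# Using alias with groupBy and aggregation
# (`sum` is the pyspark.sql.functions version imported in the previous example;
# the row order of a groupBy result is not guaranteed)
df.groupBy("NAME").agg(sum("SALARY").alias("TOTAL_SALARY_PER_NAME")).show()
```

```
+----+---------------------+
|NAME|TOTAL_SALARY_PER_NAME|
+----+---------------------+
|John|                 3000|
|Jane|                 4000|
| Tom|                 3500|
+----+---------------------+
```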

