# PySpark SQL sort() Function: Sorting DataFrames Made Easy

## Introduction to the `sort()` Function

The `sort()` method of a PySpark DataFrame returns a new DataFrame sorted by one or more columns. `orderBy()` is an alias for `sort()`, so the two accept the same arguments and can be used interchangeably to sort in ascending or descending order.


## Basic Syntax

```
DataFrame.sort(*cols, ascending=True)
```

### Parameters

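- **`cols`**: The column(s) to sort by, given as column names (strings), `Column` expressions, or a single list of either.
- **`ascending`**: A boolean or a list of booleans (default `True`). `True` sorts in ascending order and `False` in descending order. When sorting by multiple columns, you can pass a list with one value per column.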


## Why Use `sort()`?

- `sort()` puts rows in a defined order for analysis or presentation, which is particularly useful when you want to rank, prioritize, or visualize data.
- Because `orderBy()` is an alias for `sort()`, choosing between them is a matter of style: `orderBy()` mirrors SQL's `ORDER BY`, while `sort()` is shorter to type.


## Practical Examples

### 1. Sorting Data in Ascending Order

**Scenario**: You have a DataFrame with sales data, and you want to sort it by `SALES` in ascending order.

**Code Example**:

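```python
from pyspark.sql import SparkSession

# Notebooks usually predefine `spark`; this reuses it or creates a session
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ("ItemA", 100),
    ("ItemB", 200),
    ("ItemA", 300),
    ("ItemC", 400),
    ("ItemB", 500)
], ["ITEM", "SALES"])

# Sort by SALES in ascending order (ascending=True is the default)
df.sort("SALES", ascending=True).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemB|  200|
|ItemA|  300|
|ItemC|  400|
|ItemB|  500|
+-----+-----+
```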



### 2. Sorting Data in Descending Order

**Scenario**: You want to sort the sales data by `SALES` in descending order.

**Code Example**:

```python
# Sort by SALES in descending order
df.sort("SALES", ascending=False).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemB|  500|
|ItemC|  400|
|ItemA|  300|
|ItemB|  200|
|ItemA|  100|
+-----+-----+
```



### 3. Sorting by Multiple Columns

**Scenario**: You want to sort the data first by `ITEM` and then by `SALES` in ascending order.

**Code Example**:

```python
# Sort by ITEM and then by SALES in ascending order
df.sort("ITEM", "SALES").show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemA|  300|
|ItemB|  200|
|ItemB|  500|
|ItemC|  400|
+-----+-----+
```



### 4. Sorting by Multiple Columns with Mixed Orders

**Scenario**: You want to sort by `ITEM` in ascending order and `SALES` in descending order.

**Code Example**:

```python
# Sort by ITEM in ascending and SALES in descending order
df.sort(["ITEM", "SALES"], ascending=[True, False]).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  300|
|ItemA|  100|
|ItemB|  500|
|ItemB|  200|
|ItemC|  400|
+-----+-----+
```



### 5. Sorting with Expressions

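**Scenario**: You want to sort by a calculated value rather than a stored column, such as sales after a 10% increase. Multiplying by a positive constant preserves the order, so the result here matches a plain sort by `SALES`.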

**Code Example**:

```python
from pyspark.sql.functions import expr

# Sort by SALES after adding 10% to each value
df.sort(expr("SALES * 1.1")).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemB|  200|
|ItemA|  300|
|ItemC|  400|
|ItemB|  500|
+-----+-----+
```



### 6. Handling Null Values in Sorting

**Scenario**: You want to sort data while handling null values in the `SALES` column, ensuring nulls are placed either first or last.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("ItemA", 100),
    ("ItemB", None),
    ("ItemA", 300),
    ("ItemC", 400),
    ("ItemB", None)
], ["ITEM", "SALES"])

# Sort by SALES and place nulls first
df_with_nulls.sort(df_with_nulls.SALES.asc_nulls_first()).show()

# Sort by SALES and place nulls last
df_with_nulls.sort(df_with_nulls.SALES.asc_nulls_last()).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemB| null|
|ItemB| null|
|ItemA|  100|
|ItemA|  300|
|ItemC|  400|
+-----+-----+

+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemA|  300|
|ItemC|  400|
|ItemB| null|
|ItemB| null|
+-----+-----+
```

