# PySpark SQL orderBy() Function: How to Sort Data Easily

## Introduction to the `orderBy()` Function

The `orderBy()` method sorts the rows of a PySpark DataFrame by one or more columns and returns a new DataFrame; the original is left unchanged. It works like the SQL `ORDER BY` clause, sorting in ascending order by default and in descending order on request. `orderBy()` is an alias of `sort()`, so the two can be used interchangeably.


## Basic Syntax

```
DataFrame.orderBy(*cols, ascending=True)
```

### Parameters

- **`cols`**: The columns to sort by, given as column names (strings), `Column` expressions, or a single list of either.
- **`ascending`**: A boolean, or a list of booleans with one entry per sort column (`True` for ascending, `False` for descending). Defaults to `True`.


## Why Use `orderBy()`?

- Sorted data makes it easier to spot trends, find maximum or minimum values, and present results clearly.
- It is useful for reporting, ranking, and preparing data for further analysis or visualization.


## Practical Examples

### 1. Sorting Data in Ascending Order

**Scenario**: You have a DataFrame with sales data, and you want to sort it by `SALES` in ascending order.

**Code Example**:

```python
df = spark.createDataFrame([
    ("ItemA", 100),
    ("ItemB", 200),
    ("ItemA", 300),
    ("ItemC", 400),
    ("ItemB", 500)
], ["ITEM", "SALES"])

# Sort by SALES in ascending order
df.orderBy("SALES", ascending=True).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemB|  200|
|ItemA|  300|
|ItemC|  400|
|ItemB|  500|
+-----+-----+
```



### 2. Sorting Data in Descending Order

**Scenario**: You want to sort the sales data by `SALES` in descending order.

**Code Example**:

```python
# Sort by SALES in descending order
df.orderBy("SALES", ascending=False).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemB|  500|
|ItemC|  400|
|ItemA|  300|
|ItemB|  200|
|ItemA|  100|
+-----+-----+
```



### 3. Sorting by Multiple Columns

**Scenario**: You want to sort the data first by `ITEM` and then by `SALES` in ascending order.

**Code Example**:

```python
# Sort by ITEM and then by SALES in ascending order
df.orderBy("ITEM", "SALES").show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemA|  300|
|ItemB|  200|
|ItemB|  500|
|ItemC|  400|
+-----+-----+
```



### 4. Sorting by Multiple Columns with Mixed Orders

**Scenario**: You want to sort by `ITEM` in ascending order and `SALES` in descending order.

**Code Example**:

```python
# Sort by ITEM in ascending and SALES in descending order
df.orderBy(["ITEM", "SALES"], ascending=[True, False]).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  300|
|ItemA|  100|
|ItemB|  500|
|ItemB|  200|
|ItemC|  400|
+-----+-----+
```



### 5. Sorting by Column Expressions

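**Scenario**: You want to sort the data by the result of an expression rather than a stored column, such as `SALES` increased by 10%. Because multiplying by a positive constant preserves order, the result matches a plain ascending sort on `SALES`.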

**Code Example**:

```python
from pyspark.sql.functions import expr

# Sort by SALES after increasing it by 10%
df.orderBy(expr("SALES * 1.1")).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemB|  200|
|ItemA|  300|
|ItemC|  400|
|ItemB|  500|
+-----+-----+
```



### 6. Sorting Null Values

**Scenario**: You want to sort the data while handling null values in the `SALES` column, placing nulls either first or last.

**Code Example**:

```python
df_with_nulls = spark.createDataFrame([
    ("ItemA", 100),
    ("ItemB", None),
    ("ItemA", 300),
    ("ItemC", 400),
    ("ItemB", None)
], ["ITEM", "SALES"])

# Sort by SALES and place nulls first
df_with_nulls.orderBy(df_with_nulls.SALES.asc_nulls_first()).show()

# Sort by SALES and place nulls last
df_with_nulls.orderBy(df_with_nulls.SALES.asc_nulls_last()).show()
```

```
+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemB| null|
|ItemB| null|
|ItemA|  100|
|ItemA|  300|
|ItemC|  400|
+-----+-----+

+-----+-----+
| ITEM|SALES|
+-----+-----+
|ItemA|  100|
|ItemA|  300|
|ItemC|  400|
|ItemB| null|
|ItemB| null|
+-----+-----+
```

