---
# Sorting Data
Sorting by a single column and by multiple columns

---

In [2]:
import pandas as pd
import numpy as np
from IPython.display import display

In [3]:
from typing import Optional

# Helper that prints a horizontal line, for display purposes only
def printhr(s: Optional[str] = None, n: int = 40):
    """Print a horizontal rule of the character "=" of length n.

    Args:
        s (str, optional): Header message. Defaults to None.
        n (int, optional): Number of characters. Defaults to 40.
    """

    if s:
        print("=" * (n // 2), s, "=" * (n // 2))
    else:
        print("=" * n)

In [4]:
people = {
    "first": ["Lorem", "John", "Jane", "Foo"],
    "last": ["Ipsum", "Doe", "Doe", "Bar"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com", "foobar@email.com"],
    "score": [88, 90, 85, 90]
}
df = pd.DataFrame(people)
display(df)

|   | first | last | email | score |
|---|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 1 | John | Doe | john@gmail.com | 90 |
| 2 | Jane | Doe | jane@outlook.com | 85 |
| 3 | Foo | Bar | foobar@email.com | 90 |


---
## Sort by a Single Column or Row

`sort_values()` sorts a DataFrame along one axis. With the default `axis=0` (the "rows" axis) it reorders the rows using the values in the column whose label is passed to the **by** parameter. With `axis=1` (or `axis="columns"`) it reorders the columns using the values in the row whose index label is passed to **by**.

The sort order is controlled by the boolean **ascending** parameter, which defaults to `True`.

**Note:** `sort_values()` returns a new, sorted DataFrame and leaves the original df unchanged. To modify the original, set the **inplace** parameter to `True`; the call then returns `None`.
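A minimal sketch of the two behaviors described above, rebuilding the same `df` so it runs on its own: the default ascending sort returns a new DataFrame, while `inplace=True` reorders the frame itself and returns `None`. The names `asc`, `work` and `result` are chosen here for illustration; `work` is a copy so the original `df` keeps its order.

```python
import pandas as pd

# Same data as the notebook's `df`
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane", "Foo"],
    "last": ["Ipsum", "Doe", "Doe", "Bar"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com", "foobar@email.com"],
    "score": [88, 90, 85, 90],
})

# Default: ascending=True, returns a new DataFrame, df is left as-is
asc = df.sort_values(by="score")
print(asc["score"].tolist())  # [85, 88, 90, 90]
print(df["score"].tolist())   # [88, 90, 85, 90]

# inplace=True: the frame itself is reordered and the call returns None.
# `work` is a copy so that `df` keeps its original order.
work = df.copy()
result = work.sort_values(by="score", ascending=False, inplace=True)
print(result)                  # None
print(work["score"].tolist())  # [90, 90, 88, 85]
```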

---

In [5]:
# Sort by score in descending order
display(df)
printhr()

# Sort the rows
sorted_df = df.sort_values(by="score", ascending=False)
display(sorted_df)

|   | first | last | email | score |
|---|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 1 | John | Doe | john@gmail.com | 90 |
| 2 | Jane | Doe | jane@outlook.com | 85 |
| 3 | Foo | Bar | foobar@email.com | 90 |

|   | first | last | email | score |
|---|---|---|---|---|
| 1 | John | Doe | john@gmail.com | 90 |
| 3 | Foo | Bar | foobar@email.com | 90 |
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 2 | Jane | Doe | jane@outlook.com | 85 |


In [6]:
# Sort the columns (axis=1 or axis="columns").
# Use a new df without the numeric "score" column: sorting the columns compares
# the values within one row, and a str cannot be compared with an int.
people2 = {
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
}
df2 = pd.DataFrame(people2)

display(df2)
printhr()

# Reorder the columns by the values in the row with index label 1
sorted_df = df2.sort_values(by=1, axis="columns")
display(sorted_df)

|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |

|   | last | first | email |
|---|---|---|---|
| 0 | Ipsum | Lorem | lorem@yahoo.com |
| 1 | Doe | John | john@gmail.com |
| 2 | Doe | Jane | jane@outlook.com |
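The comment in the cell above says the `score` column had to be dropped before sorting the columns. A quick check of that claim, as a sketch on the original four-column `df`: sorting the columns by row 1 has to compare `"Doe"`, `"John"`, `"john@gmail.com"` and `90` with one another, and Python cannot order a str against an int, so pandas should raise a `TypeError`. The exact error message may vary between pandas versions; the names `raised`, `text_only` and `cols` are illustrative.

```python
import pandas as pd

# The notebook's original four-column `df`, including the numeric "score"
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane", "Foo"],
    "last": ["Ipsum", "Doe", "Doe", "Bar"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com", "foobar@email.com"],
    "score": [88, 90, 85, 90],
})

# Row 1 mixes str and int values, so ordering the columns by it should fail
try:
    df.sort_values(by=1, axis="columns")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Dropping "score" leaves only strings in each row, and the sort succeeds
text_only = df.drop(columns="score")
cols = text_only.sort_values(by=1, axis="columns").columns.tolist()
print(cols)  # ['last', 'first', 'email']
```

String comparison is case-sensitive and follows character codes, so uppercase letters sort before lowercase ones; that is why `"John"` lands before `"john@gmail.com"`.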


---
## Sort by Multiple Columns or Rows

When two or more rows share the same value in the sort column, a second (or, if needed, third, and so on) sort key decides their relative order.

To sort by several keys, pass a list of labels to the **by** parameter: the first label is the primary key and each later one only breaks ties left by the keys before it. Each key can have its own sort order; pass a list of booleans, one per key, to the **ascending** parameter.

**Note:** as with a single key, this returns a new DataFrame and leaves the original df unchanged. Set the **inplace** parameter to `True` to modify the original instead.
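One way to see why the tie-breaking works: a multi-key sort gives the same order as sorting by the secondary key first and then by the primary key with a stable algorithm. The sketch below relies on pandas' `kind="mergesort"` option, which the pandas documentation lists as stable; it illustrates the idea and is not how pandas implements multi-column sorts internally. The names `combined` and `two_pass` are illustrative.

```python
import pandas as pd

# Same data as the notebook's `df`
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane", "Foo"],
    "last": ["Ipsum", "Doe", "Doe", "Bar"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com", "foobar@email.com"],
    "score": [88, 90, 85, 90],
})

# One call: score descending, ties broken by last name ascending
combined = df.sort_values(by=["score", "last"], ascending=[False, True])

# Same order from two stable passes: tie-breaker first, primary key last.
# Rows with equal scores keep the last-name order produced by the first pass.
two_pass = (
    df.sort_values(by="last", kind="mergesort")
    .sort_values(by="score", ascending=False, kind="mergesort")
)

print(combined.index.tolist())  # [3, 1, 0, 2]
print(two_pass.index.tolist())  # [3, 1, 0, 2]
```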

---

In [7]:
# Sort by score. If scores are equal, sort
# the tied rows by last name
display(df)
printhr()

# A list is passed to both the by and ascending
# parameters to fine-tune the sorting. This sorts
# by score in descending order (False), and sorts
# the rows with the same score by last name in
# ascending order (True)
sorted_df = df.sort_values(["score", "last"], ascending=[False, True])
display(sorted_df)

|   | first | last | email | score |
|---|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 1 | John | Doe | john@gmail.com | 90 |
| 2 | Jane | Doe | jane@outlook.com | 85 |
| 3 | Foo | Bar | foobar@email.com | 90 |

|   | first | last | email | score |
|---|---|---|---|---|
| 3 | Foo | Bar | foobar@email.com | 90 |
| 1 | John | Doe | john@gmail.com | 90 |
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 2 | Jane | Doe | jane@outlook.com | 85 |


---
## Sort Using `nlargest()` and `nsmallest()`

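Return the first n rows ordered by the given column(s) in descending (`nlargest`) or ascending (`nsmallest`) order. The pandas documentation describes `df.nlargest(n, columns)` as equivalent to `df.sort_values(columns, ascending=False).head(n)`, but more performant.

Syntax:  
`DataFrame.nlargest(n, columns, keep="first")` / `DataFrame.nsmallest(n, columns, keep="first")` - DataFrames  
`Series.nlargest(n=5, keep="first")` / `Series.nsmallest(n=5, keep="first")` - Series

**Note:** these methods always return a new object and leave the original df unchanged; unlike `sort_values()`, they have no **inplace** parameter.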
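A short sketch of the DataFrame and Series forms and of the **keep** parameter, rebuilt on the same `df`. With the default `keep="first"`, ties are resolved in favor of the earlier rows; `keep="all"` returns every row tied with the n-th value, even if that yields more than n rows. Asking for more rows than exist is not an error. The variable names are illustrative.

```python
import pandas as pd

# Same data as the notebook's `df`
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane", "Foo"],
    "last": ["Ipsum", "Doe", "Doe", "Bar"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com", "foobar@email.com"],
    "score": [88, 90, 85, 90],
})

# DataFrame form: n plus the column(s) to rank by
top2 = df.nlargest(2, columns="score")

# Series form: no `columns` argument, just n
lowest = df["score"].nsmallest(1)

# keep="all": every row tied with the n-th largest value is kept,
# so n=1 can return more than one row
top1_all = df.nlargest(1, columns="score", keep="all")

# n larger than the number of rows is not an error; all rows come back
everything = df.nlargest(5, columns="score")

print(top2["score"].tolist())  # [90, 90]
print(lowest.tolist())         # [85]
print(sorted(top1_all.index))  # [1, 3]
print(len(everything))         # 4
```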

---

In [16]:
display(df)
printhr()

# n=5 is larger than the 4 rows in df, so every row is returned, just reordered
df_high = df.nlargest(5, columns="score")
df_low = df.nsmallest(5, columns="score")
display(df_high)
display(df_low)

|   | first | last | email | score |
|---|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 1 | John | Doe | john@gmail.com | 90 |
| 2 | Jane | Doe | jane@outlook.com | 85 |
| 3 | Foo | Bar | foobar@email.com | 90 |

|   | first | last | email | score |
|---|---|---|---|---|
| 1 | John | Doe | john@gmail.com | 90 |
| 3 | Foo | Bar | foobar@email.com | 90 |
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 2 | Jane | Doe | jane@outlook.com | 85 |

|   | first | last | email | score |
|---|---|---|---|---|
| 2 | Jane | Doe | jane@outlook.com | 85 |
| 0 | Lorem | Ipsum | lorem@yahoo.com | 88 |
| 1 | John | Doe | john@gmail.com | 90 |
| 3 | Foo | Bar | foobar@email.com | 90 |
