# 03.3 — Pandas: Sorting & Ranking

This notebook covers **sorting and ranking operations** in Pandas:

- `.sort_values(by=...)` to sort rows by one or more columns
- `.rank()` to assign ranks to values (tied values share the average rank by default)

Dataset used: **Titanic** (loaded from GitHub). This notebook is Google Colab-ready.

---

In [1]:
import pandas as pd

url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
df = pd.read_csv(url)
df.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## Sorting Rows

- Use `.sort_values(by=...)` to sort rows by one or more columns.
- Rows are sorted in ascending order by default; pass `ascending=False` for descending order.
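
As a quick illustration before the Titanic example, here is a minimal sketch on a small hand-made DataFrame (the `toy` data is hypothetical, not taken from the dataset). It shows that `.sort_values()` returns a new, reordered DataFrame and keeps the original index labels, so the source frame stays untouched.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only
toy = pd.DataFrame({
    'Pclass': [3, 1, 1, 2, 3],
    'Age': [22.0, 38.0, 35.0, 27.0, 4.0],
})

# Pclass ascending, then Age descending within each class
toy_sorted = toy.sort_values(by=['Pclass', 'Age'], ascending=[True, False])
print(toy_sorted)

# The original frame is unchanged; the sorted copy carries the old index labels
print(toy.index.tolist())         # [0, 1, 2, 3, 4]
print(toy_sorted.index.tolist())  # [1, 2, 3, 0, 4]
```

If the old index labels are not needed, calling `.reset_index(drop=True)` on the result gives a fresh 0..n-1 index.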


In [3]:
# Sort passengers by Age (ascending)
df_sorted_age = df.sort_values(by='Age')
print('-----df_sorted_age------')
print(df_sorted_age[['Name','Age']].head())

# Sort by Fare (descending)
df_sorted_fare = df.sort_values(by='Fare', ascending=False)
print('-----df_sorted_fare------')
print(df_sorted_fare[['Name','Fare']].head())

# Sort by multiple columns (Pclass ascending, Age descending)
df_sorted_multi = df.sort_values(by=['Pclass','Age'], ascending=[True,False])
print('-----df_sorted_multi------')
print(df_sorted_multi[['Pclass','Age']].head())

-----df_sorted_age------
                                Name   Age
803  Thomas, Master. Assad Alexander  0.42
755        Hamalainen, Master. Viljo  0.67
644           Baclini, Miss. Eugenie  0.75
469    Baclini, Miss. Helene Barbara  0.75
78     Caldwell, Master. Alden Gates  0.83
-----df_sorted_fare------
                                   Name      Fare
679  Cardeza, Mr. Thomas Drake Martinez  512.3292
258                    Ward, Miss. Anna  512.3292
737              Lesurer, Mr. Gustave J  512.3292
88           Fortune, Miss. Mabel Helen  263.0000
438                   Fortune, Mr. Mark  263.0000
-----df_sorted_multi------
     Pclass   Age
630       1  80.0
96        1  71.0
493       1  71.0
745       1  70.0
54        1  65.0


## Ranking Values

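- Use `.rank()` to assign ranks to values; the smallest value gets rank 1 by default.
- By default (`method='average'`), tied values share the average of the ranks they span.
- Tie-breaking methods (`method=...`): `'average'`, `'min'`, `'max'`, `'first'`, `'dense'`.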
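
To make the difference between the tie-breaking methods concrete, the sketch below ranks a tiny hypothetical Series (`fares`, invented for illustration) that contains one tie, and puts the results side by side.

```python
import pandas as pd

# Hypothetical values with one tie (20.0 appears twice)
fares = pd.Series([10.0, 20.0, 20.0, 30.0])

methods = ['average', 'min', 'max', 'first', 'dense']
comparison = pd.DataFrame({m: fares.rank(method=m) for m in methods})
comparison.insert(0, 'value', fares)
print(comparison)
#    value  average  min  max  first  dense
# 0   10.0      1.0  1.0  1.0    1.0    1.0
# 1   20.0      2.5  2.0  3.0    2.0    2.0
# 2   20.0      2.5  2.0  3.0    3.0    2.0
# 3   30.0      4.0  4.0  4.0    4.0    3.0
```

Note that `'min'` and `'max'` leave a gap after a tie (the next rank is 4), while `'dense'` does not, and `'first'` breaks ties by order of appearance.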


In [5]:
# Rank passengers by Fare
df['FareRank'] = df['Fare'].rank()
print('----------------------------------')
print(df[['Name','Fare','FareRank']].head())

# Rank with different method: dense
print('----------------------------------')
print(df['Fare'].rank(method='dense').head())

# Example: Top 5 highest Fare passengers
top5 = df.sort_values(by='Fare', ascending=False).head(5)
print('----------------------------------')
print(top5[['Name','Fare','FareRank']])

----------------------------------
                                                Name     Fare  FareRank
0                            Braund, Mr. Owen Harris   7.2500      77.0
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  71.2833     789.0
2                             Heikkinen, Miss. Laina   7.9250     232.5
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  53.1000     748.0
4                           Allen, Mr. William Henry   8.0500     264.0
----------------------------------
0     19.0
1    208.0
2     42.0
3    190.0
4     44.0
Name: Fare, dtype: float64
----------------------------------
                                   Name      Fare  FareRank
679  Cardeza, Mr. Thomas Drake Martinez  512.3292     890.0
258                    Ward, Miss. Anna  512.3292     890.0
737              Lesurer, Mr. Gustave J  512.3292     890.0
88           Fortune, Miss. Mabel Helen  263.0000     886.5
438                   Fortune, Mr. Mark  263.0000     886.5
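
In the output above, the most expensive tickets receive the largest rank (890.0) because `.rank()` ranks in ascending order by default. For a leaderboard where the highest value should be rank 1, one option is to pass `ascending=False`. The sketch below uses a small sample Series (`sample_fares`, a hypothetical name) rather than the full dataset, and picks `method='min'` as one possible way to give tied values the same whole-number rank.

```python
import pandas as pd

# Small sample of fares for illustration; 512.3292 appears twice
sample_fares = pd.Series([7.25, 512.3292, 263.0, 512.3292])

# Highest fare gets rank 1; tied values share the lowest rank of their group
leaderboard = sample_fares.rank(ascending=False, method='min')
print(leaderboard.tolist())  # [4.0, 1.0, 3.0, 1.0]
```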


## Best Practices

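- Set `ascending` explicitly rather than relying on the default, so the intended order is clear to readers.
- When sorting by multiple columns, pass a list to `by` and a list of the same length to `ascending`.
- Be aware of NaN values: `.sort_values()` places them at the end by default, in either sort direction (change this with `na_position='first'`), and `.rank()` leaves them as NaN.
- Ranking is useful for percentiles (`.rank(pct=True)`), leaderboards, and assigning order.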
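
The NaN and percentile behaviour can be checked on a small example. This is a minimal sketch using a hypothetical Series (`ages`) with one missing value; it relies on the `na_position` parameter of `.sort_values()` and the `pct` and `na_option` parameters of `.rank()`.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry at index 1
ages = pd.Series([3.0, np.nan, 1.0, 2.0])

# Sorting: NaN goes last in both directions unless na_position='first'
asc_order = ages.sort_values().index.tolist()
desc_order = ages.sort_values(ascending=False).index.tolist()
nan_first_order = ages.sort_values(na_position='first').index.tolist()
print(asc_order)        # [2, 3, 0, 1]
print(desc_order)       # [0, 3, 2, 1]
print(nan_first_order)  # [1, 2, 3, 0]

# Ranking: NaN stays NaN by default (na_option='keep')
ranks = ages.rank()
print(ranks.tolist())   # [3.0, nan, 1.0, 2.0]

# Percentile ranks: rank divided by the number of non-missing values
pct_ranks = ages.rank(pct=True)
print(pct_ranks.round(4).tolist())  # [1.0, nan, 0.3333, 0.6667]

# na_option='bottom' assigns missing values the highest rank instead
bottom_ranks = ages.rank(na_option='bottom')
print(bottom_ranks.tolist())  # [3.0, 4.0, 1.0, 2.0]
```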


## Exercises

1. Sort Titanic passengers by `Age` in descending order.
2. Find the top 10 passengers with the highest `Fare`.
3. Assign ranks to passengers by `Age`.
4. Compare results using `.rank(method='min')` and `.rank(method='dense')` on `Fare`.
