# Sorting and ranking

Sorting a dataset by one or more criteria is another important built-in operation. To sort lexicographically by row or column labels, use the methods [pandas.Series.sort_index](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_index.html) or [pandas.DataFrame.sort_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html), which return a new sorted object. With `ascending=False`, the sort order is reversed:

In [1]:
import numpy as np
import pandas as pd

rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s.sort_index(ascending=False)

6    1.276976
5   -0.787146
4    0.052761
3    0.111477
2   -0.431729
1    1.321374
0    1.376616
dtype: float64
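
With an integer index, the lexicographic ordering is easy to miss. The following minimal sketch uses a hypothetical series `s2` with string labels (not part of the example above) to show that `sort_index` orders by label and leaves the original object untouched:

```python
import pandas as pd

# Hypothetical labels chosen for illustration only
s2 = pd.Series([1, 2, 3, 4], index=["d", "a", "c", "b"])

ascending = s2.sort_index()                  # sorted by label, a to d
descending = s2.sort_index(ascending=False)  # sorted by label, d to a

print(ascending.index.tolist())   # ['a', 'b', 'c', 'd']
print(descending.index.tolist())  # ['d', 'c', 'b', 'a']
print(s2.index.tolist())          # ['d', 'a', 'c', 'b'], the original is unchanged
```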

To sort a series by its values, you can use the `sort_values` method:

In [2]:
s.sort_values()

5   -0.787146
2   -0.431729
4    0.052761
3    0.111477
6    1.276976
1    1.321374
0    1.376616
dtype: float64

By default, all missing values are sorted to the end of the series:

In [3]:
s = pd.Series(rng.normal(size=7))
s[s < 0] = np.nan

s.sort_values()

0    0.645635
3    0.875170
6    1.369406
5    1.554600
4    1.975220
1         NaN
2         NaN
dtype: float64
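
If missing values should come first instead, `sort_values` accepts the `na_position` argument. The sketch below uses a small hypothetical series `s_nan` with fixed values so that the result is reproducible; it also suggests that missing values stay at the end when the order is reversed with `ascending=False`:

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing values
s_nan = pd.Series([2.0, np.nan, 0.5, np.nan, 1.0])

default_order = s_nan.sort_values()                  # NaN last (default)
nan_first = s_nan.sort_values(na_position="first")   # NaN first
descending = s_nan.sort_values(ascending=False)      # NaN still last

print(default_order.dropna().tolist())  # [0.5, 1.0, 2.0]
print(nan_first.isna().tolist())        # [True, True, False, False, False]
print(descending.isna().tolist())       # [False, False, False, True, True]
```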

With a DataFrame, you can sort by index on either axis:

In [4]:
df = pd.DataFrame(rng.normal(size=(7, 3)))

df

| | 0 | 1 | 2 |
|---:|---:|---:|---:|
| 0 | 1.675691 | -0.875081 | 0.623822 |
| 1 | -0.25764 | -0.32803 | 1.33465 |
| 2 | -0.167721 | 1.448707 | -0.926976 |
| 3 | -0.509818 | 1.438268 | -0.559362 |
| 4 | -0.247201 | 1.840276 | -1.155154 |
| 5 | -0.428365 | 0.341423 | 0.202372 |
| 6 | -0.568956 | 1.68618 | -0.871047 |


In [5]:
df.sort_index(ascending=False)

| | 0 | 1 | 2 |
|---:|---:|---:|---:|
| 6 | -0.568956 | 1.68618 | -0.871047 |
| 5 | -0.428365 | 0.341423 | 0.202372 |
| 4 | -0.247201 | 1.840276 | -1.155154 |
| 3 | -0.509818 | 1.438268 | -0.559362 |
| 2 | -0.167721 | 1.448707 | -0.926976 |
| 1 | -0.25764 | -0.32803 | 1.33465 |
| 0 | 1.675691 | -0.875081 | 0.623822 |


In [6]:
df.sort_index(axis=1, ascending=False)

| | 2 | 1 | 0 |
|---:|---:|---:|---:|
| 0 | 0.623822 | -0.875081 | 1.675691 |
| 1 | 1.33465 | -0.32803 | -0.25764 |
| 2 | -0.926976 | 1.448707 | -0.167721 |
| 3 | -0.559362 | 1.438268 | -0.509818 |
| 4 | -1.155154 | 1.840276 | -0.247201 |
| 5 | 0.202372 | 0.341423 | -0.428365 |
| 6 | -0.871047 | 1.68618 | -0.568956 |


When sorting a DataFrame, you can use the data in one or more columns as sort keys by passing the column names to the `by` argument of `sort_values`:

In [7]:
df.sort_values(by=2)

| | 0 | 1 | 2 |
|---:|---:|---:|---:|
| 4 | -0.247201 | 1.840276 | -1.155154 |
| 2 | -0.167721 | 1.448707 | -0.926976 |
| 6 | -0.568956 | 1.68618 | -0.871047 |
| 3 | -0.509818 | 1.438268 | -0.559362 |
| 5 | -0.428365 | 0.341423 | 0.202372 |
| 0 | 1.675691 | -0.875081 | 0.623822 |
| 1 | -0.25764 | -0.32803 | 1.33465 |


To sort by several columns, you can pass a list of names.
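
Sorting by several columns only has a visible effect when the first key contains repeated values, which is unlikely with random floats. The following sketch therefore uses a small hypothetical DataFrame `frame` with columns `a` and `b` (names chosen for illustration); it also shows that `ascending` can take one flag per sort key:

```python
import pandas as pd

# Hypothetical data: column "a" has repeated values, so "b" decides within each group
frame = pd.DataFrame({"a": [1, 0, 1, 0], "b": [4, 7, -3, 2]})

# Sort by "a" first, then by "b" within equal values of "a"
by_a_then_b = frame.sort_values(by=["a", "b"])

# Same keys, but "a" ascending and "b" descending
mixed = frame.sort_values(by=["a", "b"], ascending=[True, False])

print(by_a_then_b.index.tolist())  # [3, 1, 2, 0]
print(mixed.index.tolist())        # [1, 3, 0, 2]
```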

Ranking assigns each value a rank from one to the number of valid data points. For a DataFrame, `rank` ranks the values within each column by default:

In [8]:
df.rank()

| | 0 | 1 | 2 |
|---:|---:|---:|---:|
| 0 | 7.0 | 1.0 | 6.0 |
| 1 | 4.0 | 2.0 | 7.0 |
| 2 | 6.0 | 5.0 | 2.0 |
| 3 | 2.0 | 4.0 | 4.0 |
| 4 | 5.0 | 7.0 | 1.0 |
| 5 | 3.0 | 3.0 | 5.0 |
| 6 | 1.0 | 6.0 | 3.0 |
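
The phrase "valid data points" matters when values are missing. As a minimal sketch with a hypothetical series `s_r`, missing values receive no rank by default, and the highest rank equals the number of non-missing values; `ascending=False` ranks from the largest value downwards:

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
s_r = pd.Series([3.0, np.nan, 1.0, 2.0])

ranks = s_r.rank()                      # smallest value gets rank 1
ranks_desc = s_r.rank(ascending=False)  # largest value gets rank 1

print(ranks.isna().tolist())         # [False, True, False, False]
print(ranks.dropna().tolist())       # [3.0, 1.0, 2.0]
print(ranks_desc.dropna().tolist())  # [1.0, 3.0, 2.0]
```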


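By default, `rank` breaks ties by assigning each tied value the average rank of its group. Other tie-breaking rules can be selected with the `method` argument, for example `method='max'`. Because the random values in `df` contain no ties, the result here is the same as the default ranking: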

In [9]:
df.rank(method='max')

| | 0 | 1 | 2 |
|---:|---:|---:|---:|
| 0 | 7.0 | 1.0 | 6.0 |
| 1 | 4.0 | 2.0 | 7.0 |
| 2 | 6.0 | 5.0 | 2.0 |
| 3 | 2.0 | 4.0 | 4.0 |
| 4 | 5.0 | 7.0 | 1.0 |
| 5 | 3.0 | 3.0 | 5.0 |
| 6 | 1.0 | 6.0 | 3.0 |


## Tie-breaking methods for `rank`

Method | Description
:----- | :----------
`average` | default: assigns the average rank to each entry in the same group
`min` | assigns the minimum rank to the whole group
`max` | assigns the maximum rank to the whole group
`first` | assigns ranks in the order in which the values appear in the data
`dense` | like `method='min'`, but the ranks always increase by 1 between groups rather than by the number of equal items in a group
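
The differences between these methods only show up when the data contain ties. The sketch below ranks a hypothetical series `s_ties` (values chosen so that 7 and 4 each occur twice) with every method from the table and collects the results side by side:

```python
import pandas as pd

# Hypothetical series in which 7 and 4 each appear twice
s_ties = pd.Series([7, -5, 7, 4, 2, 0, 4])

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per tie-breaking method
ranks_by_method = pd.DataFrame({m: s_ties.rank(method=m) for m in methods})

print(ranks_by_method["average"].tolist())  # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(ranks_by_method["min"].tolist())      # [6.0, 1.0, 6.0, 4.0, 3.0, 2.0, 4.0]
print(ranks_by_method["max"].tolist())      # [7.0, 1.0, 7.0, 5.0, 3.0, 2.0, 5.0]
print(ranks_by_method["first"].tolist())    # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
print(ranks_by_method["dense"].tolist())    # [5.0, 1.0, 5.0, 4.0, 3.0, 2.0, 4.0]
```

Note that only `dense` yields a highest rank (5) below the number of values (7), because each group of equal values consumes a single rank.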