# Sorting and ranking

Sorting data by a criterion is another important built-in operation. Sorting lexicographically by row or column index is already described in the section [Rearranging and sorting levels](indexing.ipynb#Rearranging-and-Sorting-Levels). In the following we look at sorting by values with [DataFrame.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html) and [Series.sort_values](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html). As a reminder, `sort_index` with `ascending=False` orders a Series by its index labels in descending order:

In [1]:
import numpy as np
import pandas as pd

rng = np.random.default_rng()
s = pd.Series(rng.normal(size=7))

s.sort_index(ascending=False)

6    0.807524
5   -0.045012
4   -0.276246
3    1.032609
2    1.067726
1    0.488613
0    1.324534
dtype: float64
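
The call above orders the Series by its index labels. For comparison, the following minimal sketch sorts the same kind of object by its values. The Series `t` and its fixed values are made up for illustration so that the result is reproducible:

```python
import pandas as pd

# Hypothetical data with fixed values so the result is reproducible
t = pd.Series([0.5, -1.2, 2.0, 0.1])

# Sort by value, largest first; the index labels travel with their values
by_value = t.sort_values(ascending=False)
print(by_value)

# Sort by index label, largest label first; the values are not compared
by_label = t.sort_index(ascending=False)
print(by_label)
```

In the first result the index labels appear out of order because they follow their values. In the second the labels are ordered and the values are not.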

By default, all missing values are sorted to the end of the Series:

In [2]:
s = pd.Series(rng.normal(size=7))
s[s < 0] = np.nan

s.sort_values()

5    0.186232
1    0.826051
6    1.649605
0         NaN
2         NaN
3         NaN
4         NaN
dtype: float64
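
If the missing values should come first instead, `sort_values` accepts the `na_position` argument. A minimal sketch, assuming a small hand-written Series `u` that is not part of the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values
u = pd.Series([1.5, np.nan, 0.3, np.nan, 0.9])

# Default behaviour: NaN values are placed last
print(u.sort_values())

# na_position="first" moves the NaN values to the front instead
nan_first = u.sort_values(na_position="first")
print(nan_first)
```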

With a DataFrame you can sort along either axis. The `by` argument specifies the column or row labels whose values determine the order:

In [3]:
df = pd.DataFrame(rng.normal(size=(7, 3)))

df.sort_values(by=2, ascending=False)

|   | 0 | 1 | 2 |
| :- | -: | -: | -: |
| 1 | -0.013109 | 0.060716 | 1.83768 |
| 0 | 0.095855 | -0.804874 | 1.20181 |
| 2 | 0.278646 | -0.608821 | 0.498333 |
| 5 | -0.680013 | 0.314085 | 0.382935 |
| 3 | -0.368188 | 1.192103 | -0.944575 |
| 6 | 1.097941 | -0.207889 | -0.974132 |
| 4 | 0.684861 | 1.651951 | -2.388397 |
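
`by` also accepts a list of labels, and `ascending` can then be a list of the same length, so that later columns break ties in earlier ones. A minimal sketch, assuming a hypothetical frame with the column names `a` and `b`:

```python
import pandas as pd

# Hypothetical frame with ties in column "a"
frame = pd.DataFrame({"a": [2, 1, 2, 1], "b": [0.1, 0.4, 0.3, 0.2]})

# Sort by "a" ascending and break ties by "b" descending
result = frame.sort_values(by=["a", "b"], ascending=[True, False])
print(result)
```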


With `axis=1`, `by` refers to row labels, and the columns are reordered according to the values in those rows:

In [4]:
df.sort_values(axis=1, by=[0,1], ascending=False)

|   | 2 | 0 | 1 |
| :- | -: | -: | -: |
| 0 | 1.20181 | 0.095855 | -0.804874 |
| 1 | 1.83768 | -0.013109 | 0.060716 |
| 2 | 0.498333 | 0.278646 | -0.608821 |
| 3 | -0.944575 | -0.368188 | 1.192103 |
| 4 | -2.388397 | 0.684861 | 1.651951 |
| 5 | 0.382935 | -0.680013 | 0.314085 |
| 6 | -0.974132 | 1.097941 | -0.207889 |
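
Because random values make the effect of `axis=1` hard to see, here is a minimal sketch with fixed numbers. The frame `small` and its column names `x`, `y` and `z` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical frame; with axis=1, `by` refers to the integer row labels 0 and 1
small = pd.DataFrame([[3, 1, 2], [9, 7, 8]], columns=["x", "y", "z"])

# Reorder the columns so that the values in row 0 are ascending
reordered = small.sort_values(axis=1, by=0)
print(reordered)
```

The rows stay where they are; only the column order changes.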


## Ranking

[DataFrame.rank](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) and [Series.rank](https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html) assign ranks from one to the number of valid data points in an array:

In [5]:
df.rank()

|   | 0 | 1 | 2 |
| :- | -: | -: | -: |
| 0 | 4.0 | 1.0 | 6.0 |
| 1 | 3.0 | 4.0 | 7.0 |
| 2 | 5.0 | 2.0 | 5.0 |
| 3 | 2.0 | 6.0 | 3.0 |
| 4 | 6.0 | 7.0 | 1.0 |
| 5 | 1.0 | 5.0 | 4.0 |
| 6 | 7.0 | 3.0 | 2.0 |


If ties occur, each member of a tied group is assigned the average rank of that group by default. Here the last two rows are duplicated to produce ties:

In [6]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df2 = pd.concat([df, df[5:]])

df2.rank()

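|   | 0 | 1 | 2 |
| :- | -: | -: | -: |
| 0 | 5.0 | 1.0 | 8.0 |
| 1 | 4.0 | 5.0 | 9.0 |
| 2 | 6.0 | 2.0 | 7.0 |
| 3 | 3.0 | 8.0 | 4.0 |
| 4 | 7.0 | 9.0 | 1.0 |
| 5 | 1.5 | 6.5 | 5.5 |
| 6 | 8.5 | 3.5 | 2.5 |
| 5 | 1.5 | 6.5 | 5.5 |
| 6 | 8.5 | 3.5 | 2.5 |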


With `method='min'`, on the other hand, every member of a tied group receives the smallest rank in the group:

In [7]:
df2.rank(method='min')

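|   | 0 | 1 | 2 |
| :- | -: | -: | -: |
| 0 | 5.0 | 1.0 | 8.0 |
| 1 | 4.0 | 5.0 | 9.0 |
| 2 | 6.0 | 2.0 | 7.0 |
| 3 | 3.0 | 8.0 | 4.0 |
| 4 | 7.0 | 9.0 | 1.0 |
| 5 | 1.0 | 6.0 | 5.0 |
| 6 | 8.0 | 3.0 | 2.0 |
| 5 | 1.0 | 6.0 | 5.0 |
| 6 | 8.0 | 3.0 | 2.0 |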


## Other methods with `rank`

Method | Description
:----- | :----------
`average` | default: assigns the average rank to each entry in a tied group
`min` | uses the minimum rank for the whole group
`max` | uses the maximum rank for the whole group
`first` | assigns ranks in the order in which the values appear in the data
`dense` | like `method='min'`, but the rank always increases by 1 between groups, regardless of how many items a group contains
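
To see how the methods in the table differ, they can be applied side by side to the same data. A minimal sketch, assuming a small hand-written Series `v` with a single tie:

```python
import pandas as pd

# Hypothetical data with one tie: the value 20 appears twice
v = pd.Series([10, 20, 20, 30])

# One column per tie-breaking method
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: v.rank(method=m) for m in methods})
print(comparison)
```

The two tied entries would otherwise occupy ranks 2 and 3. `average` gives both 2.5, `min` gives both 2, `max` gives both 3, and `first` gives 2 and 3 in order of appearance. `dense` gives both 2 and continues with 3 instead of 4 for the next value.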