# Sorting and ranking

Sorting a dataset by some criterion is another important built-in operation. To sort lexicographically by row or column index, use [pandas.Series.sort_index](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_index.html) or [pandas.DataFrame.sort_index](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html), both of which return a new sorted object. With `ascending=False`, the sort order is reversed:

In [1]:
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(7))

s.sort_index(ascending=False)

6    1.327001
5   -1.601136
4    1.641455
3   -0.724621
2    0.071338
1   -2.000136
0    0.317815
dtype: float64

To sort a series by its values, you can use the `sort_values` method:

In [2]:
s.sort_values()

1   -2.000136
5   -1.601136
3   -0.724621
2    0.071338
0    0.317815
6    1.327001
4    1.641455
dtype: float64

By default, missing values are sorted to the end of the Series:

In [3]:
s = pd.Series(np.random.randn(7))
s[s < 0] = np.nan

s.sort_values()

3    0.553928
1    0.733592
5    1.148924
0         NaN
2         NaN
4         NaN
6         NaN
dtype: float64
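The placement of missing values can be changed with the `na_position` parameter of `sort_values`. The following is a minimal sketch; the Series `s_nan` and its values are made up for illustration and are not part of the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values (illustrative only).
s_nan = pd.Series([0.5, np.nan, -1.2, np.nan, 2.0])

# na_position="first" moves the NaN entries to the top;
# the remaining values are still sorted in ascending order.
nan_first = s_nan.sort_values(na_position="first")

print(nan_first)
```

The default, `na_position="last"`, reproduces the behaviour shown above.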

With a DataFrame you can sort by index on either axis:

In [4]:
df = pd.DataFrame(np.random.randn(7, 3))

df

          0         1         2
0  0.799196 -0.067099 -0.411729
1 -0.810730  1.266002 -0.253321
2  0.363349  0.393135 -0.842521
3  0.635861  0.066355 -1.138751
4  2.500409  0.334972 -3.329376
5  0.824858  0.809429 -0.411104
6  0.491889  0.985929  1.102593


In [5]:
df.sort_index(ascending=False)

          0         1         2
6  0.491889  0.985929  1.102593
5  0.824858  0.809429 -0.411104
4  2.500409  0.334972 -3.329376
3  0.635861  0.066355 -1.138751
2  0.363349  0.393135 -0.842521
1 -0.810730  1.266002 -0.253321
0  0.799196 -0.067099 -0.411729


In [6]:
df.sort_index(axis=1, ascending=False)

          2         1         0
0 -0.411729 -0.067099  0.799196
1 -0.253321  1.266002 -0.810730
2 -0.842521  0.393135  0.363349
3 -1.138751  0.066355  0.635861
4 -3.329376  0.334972  2.500409
5 -0.411104  0.809429  0.824858
6  1.102593  0.985929  0.491889


When sorting a DataFrame, you can use the data in one or more columns as sort keys. To do this, you pass one or more column names to the `by` option of `sort_values`:

In [7]:
df.sort_values(by=2)

          0         1         2
4  2.500409  0.334972 -3.329376
3  0.635861  0.066355 -1.138751
2  0.363349  0.393135 -0.842521
0  0.799196 -0.067099 -0.411729
5  0.824858  0.809429 -0.411104
1 -0.810730  1.266002 -0.253321
6  0.491889  0.985929  1.102593


To sort by several columns, you can pass a list of names.
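As a minimal sketch of sorting by several columns, the following uses a small made-up DataFrame; the name `frame` and the column names `a` and `b` are assumptions for illustration only. Passing a list to `ascending` sets the direction for each key separately:

```python
import pandas as pd

# Hypothetical example data; "a" and "b" are illustrative column names.
frame = pd.DataFrame({"a": [1, 0, 1, 0], "b": [4, 7, -3, 2]})

# Sort by "a" first; rows with equal "a" are then ordered by "b".
by_a_then_b = frame.sort_values(by=["a", "b"])

# One boolean per key: "a" ascending, "b" descending.
mixed = frame.sort_values(by=["a", "b"], ascending=[True, False])

print(by_a_then_b)
print(mixed)
```

The keys are applied in the order given, so later columns only break ties left by earlier ones.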

Ranking assigns ranks from one to the number of valid data points in an array:

In [8]:
df.rank()

     0    1    2
0  5.0  1.0  4.0
1  1.0  7.0  6.0
2  2.0  4.0  3.0
3  4.0  2.0  2.0
4  7.0  3.0  1.0
5  6.0  5.0  5.0
6  3.0  6.0  7.0


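If ties appear in the ranking, `rank` by default assigns every member of a tied group the mean of the ranks the group spans. A different tie-breaking rule can be selected with the `method` argument, for example `max`. The random data here contains no ties, so the result matches the default ranking: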

In [10]:
df.rank(method='max')

     0    1    2
0  5.0  1.0  4.0
1  1.0  7.0  6.0
2  2.0  4.0  3.0
3  4.0  2.0  2.0
4  7.0  3.0  1.0
5  6.0  5.0  5.0
6  3.0  6.0  7.0


## Tie-breaking methods for `rank`

Method | Description
:----- | :----------
`average` | default: assigns the mean rank to every entry in a tied group
`min` | assigns the lowest rank in the tied group to every entry in it
`max` | assigns the highest rank in the tied group to every entry in it
`first` | assigns ranks to tied values in the order in which they appear in the data
`dense` | like `method='min'`, but the rank always increases by 1 between groups, regardless of how many entries a group contains
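Since the random data above contains no ties, the methods cannot be told apart there. The following sketch ranks a small hand-made Series with repeated values using each method; the Series `obj` and its values are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical data with ties: 7 and 4 each appear twice.
obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Rank the same data once per tie-breaking method and
# collect the results side by side, one column per method.
methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: obj.rank(method=m) for m in methods})

print(ranks)
```

The two 7s occupy ranks 6 and 7, so they receive 6.5 with `average`, 6 with `min`, 7 with `max`, 6 and 7 in order of appearance with `first`, and 5 with `dense`, because only five distinct values exist.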