### Preparing the Dataset

In [17]:
import pandas as pd
import os

column_subset = [
    "id",
    "make",
    "model",
    "year",
    "cylinders",
    "fuelType",
    "trany",
    "mpgData",
    "city08",
    "highway08",
]


file_path = os.path.join('resources','vehicles.csv')

df = pd.read_csv(
    file_path,
    usecols=column_subset,
    nrows=100,
)


df.head()

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993


## Sorting Your DataFrame on a Single Column

To sort the DataFrame based on the values in a single column, we’ll use ``.sort_values()``. 

By default, this will return a new DataFrame sorted in ascending order. 

It does not modify the original DataFrame.

In this example, we sort the DataFrame by the ``city08`` column, which represents city MPG for fuel-only cars:

In [25]:
df1 = df.sort_values("city08")

print('Sort DataFrame in ascending Order')

df1

Sort DataFrame in ascending Order


Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
...,...,...,...,...,...,...,...,...,...,...
76,23,4,Regular,31,10066,Mazda,626,Y,Manual 5-spd,1993
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
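
As a minimal sketch of the non-mutating behavior described above, the snippet below uses a small hand-made frame (the name ``toy`` and its values are illustrative assumptions, not rows from ``vehicles.csv``). It shows that ``.sort_values()`` returns a reordered copy whose index labels travel with their rows, while the original frame keeps its order.

```python
import pandas as pd

# Hypothetical toy data, not taken from vehicles.csv.
toy = pd.DataFrame({"city08": [19, 9, 23, 10]})

sorted_toy = toy.sort_values("city08")

# The returned frame is ordered by value; index labels move with the rows.
print(sorted_toy["city08"].tolist())  # [9, 10, 19, 23]
print(sorted_toy.index.tolist())      # [1, 3, 0, 2]

# The original frame is left untouched.
print(toy["city08"].tolist())         # [19, 9, 23, 10]
```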


### Changing the Sort Order

By default, ``.sort_values()`` has ``ascending`` set to ``True``.

To sort the DataFrame in descending order, pass ``False`` to this parameter.

In [26]:
print('Sort DataFrame in descending Order')

df2 = df.sort_values("city08",ascending=False)
df2

Sort DataFrame in descending Order


Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985


### Choosing a Sorting Algorithm


pandas lets us choose the sorting algorithm used by both ``.sort_values()`` and ``.sort_index()``.

The available algorithms include ``quicksort``, ``mergesort``, and ``heapsort``. Of these three, only ``mergesort`` is stable, meaning rows with equal keys keep their original relative order.

The algorithm used by default when sorting on a single column is ``quicksort``. For ``.sort_values()``, pandas applies ``kind`` only when sorting on a single column or label.

To change the sorting algorithm, use the ``kind`` parameter of ``.sort_values()`` or ``.sort_index()``, like this:

In [27]:
df3 = df.sort_values("city08",ascending=False,kind="mergesort")

df3

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
76,23,4,Regular,31,10066,Mazda,626,Y,Manual 5-spd,1993
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993
...,...,...,...,...,...,...,...,...,...,...
69,10,8,Regular,11,1006,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
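
One practical reason to pick ``mergesort`` is that it is a stable sort: rows that tie on the sort key keep their original relative order. The sketch below illustrates this on a hypothetical frame (``ties`` and its values are assumptions made up for the example).

```python
import pandas as pd

# Hypothetical data with repeated city08 values.
ties = pd.DataFrame(
    {
        "city08": [23, 9, 23, 9, 23],
        "model": ["a", "b", "c", "d", "e"],
    }
)

# A stable sort keeps tied rows in their original order:
# 'b' stays ahead of 'd', and 'a' ahead of 'c' ahead of 'e'.
stable = ties.sort_values("city08", kind="mergesort")
print(stable["model"].tolist())  # ['b', 'd', 'a', 'c', 'e']
```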


## Sorting Your DataFrame on Multiple Columns

To sort the DataFrame by two keys, we can pass a list of column names to ``.sort_values()``.

Here we pass the list ``["city08", "highway08"]``: rows are ordered by ``city08`` first, and ties in ``city08`` are broken by ``highway08``.

In [31]:
df4 = df.sort_values(["city08","highway08"])[["city08","highway08"]]

df4.head(20)

Unnamed: 0,city08,highway08
80,9,10
47,9,11
99,9,13
1,9,14
58,10,11
69,10,11
3,10,12
25,11,17
22,11,17
36,11,17
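
To make the tie-breaking role of the second column easier to see, here is a small sketch on hand-made values (the frame ``toy`` is an illustrative assumption). The first key decides the overall order, and the second key only matters among rows that share the same first key.

```python
import pandas as pd

# Hypothetical values chosen so that city08 has ties.
toy = pd.DataFrame(
    {
        "city08": [10, 9, 10, 9],
        "highway08": [12, 14, 11, 10],
    }
)

by_two = toy.sort_values(["city08", "highway08"])

# Rows with city08 == 9 come first, ordered by highway08 (10, then 14).
print(by_two.index.tolist())         # [3, 1, 2, 0]
print(by_two["highway08"].tolist())  # [10, 14, 11, 12]
```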


### Sorting by Multiple Columns in Descending Order

To sort in descending order, set ascending to False.

In [34]:
df5 = df.sort_values(["make","model"],ascending=False)[["make","model"]]

df5

Unnamed: 0,make,model
16,Volvo,240
17,Volvo,240
13,Volkswagen,Jetta III
15,Volkswagen,Jetta III
11,Volkswagen,Golf III / GTI
...,...,...
21,BMW,740il
20,BMW,740i
19,Audi,100
18,Audi,100


Please note, with textual data, the sort is case sensitive, meaning capitalized text will appear first in ascending order and last in descending order.
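
The case-sensitive ordering can be seen on a few made-up strings. The sketch below also shows one possible way around it, using the ``key`` parameter of ``.sort_values()``, which assumes pandas 1.1.0 or later; the frame ``names`` is hypothetical.

```python
import pandas as pd

# Hypothetical mixed-case values.
names = pd.DataFrame({"make": ["audi", "Volvo", "BMW", "ford"]})

# Default sort: uppercase letters order before lowercase ones.
default_order = names.sort_values("make")["make"].tolist()
print(default_order)  # ['BMW', 'Volvo', 'audi', 'ford']

# Case-insensitive sort: compare lowercased values instead.
folded_order = names.sort_values(
    "make", key=lambda s: s.str.lower()
)["make"].tolist()
print(folded_order)   # ['audi', 'BMW', 'ford', 'Volvo']
```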

### Sorting by Multiple Columns With Different Sort Orders

In this example, we sort the DataFrame by the ``make``, ``model``, and ``city08`` columns, with the first two columns sorted in ascending order and ``city08`` sorted in descending order. To do so, we pass ``ascending`` a list of booleans, one per sort column.

In [35]:
df6 = df.sort_values(["make","model","city08"],ascending=[True,True,False])[["make","model","city08"]]

df6

Unnamed: 0,make,model,city08
0,Alfa Romeo,Spider Veloce 2000,19
19,Audi,100,17
18,Audi,100,17
20,BMW,740i,14
21,BMW,740il,14
...,...,...,...
11,Volkswagen,Golf III / GTI,18
15,Volkswagen,Jetta III,20
13,Volkswagen,Jetta III,18
17,Volvo,240,19
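
A smaller sketch of the same idea, on hypothetical data, makes the per-column directions easier to verify by eye. Each entry in the ``ascending`` list lines up with the column at the same position in the ``by`` list.

```python
import pandas as pd

# Hypothetical rows: two makes, each with two city08 values.
toy = pd.DataFrame(
    {
        "make": ["Audi", "BMW", "Audi", "BMW"],
        "city08": [17, 14, 19, 16],
    }
)

# make ascending (A before B), city08 descending within each make.
mixed = toy.sort_values(["make", "city08"], ascending=[True, False])
print(mixed["make"].tolist())    # ['Audi', 'Audi', 'BMW', 'BMW']
print(mixed["city08"].tolist())  # [19, 17, 16, 14]
```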


## Sorting Your DataFrame on Its Index

Sorting by column values reorders the rows in your DataFrame, so the index becomes disorganized. 

A DataFrame can be sorted by its row index with ``.sort_index()``.

To illustrate the use of ``.sort_index()``, start by creating a new sorted DataFrame using ``.sort_values()``:

In [36]:
sort_df = df.sort_values(["make","model"])

sort_df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
19,17,6,Premium,24,10014,Audi,100,N,Manual 5-spd,1993
18,17,6,Premium,22,10013,Audi,100,Y,Automatic 4-spd,1993
20,14,8,Premium,20,10015,BMW,740i,N,Automatic 5-spd,1993
21,14,8,Premium,20,10016,BMW,740il,N,Automatic 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
12,21,4,Regular,29,10008,Volkswagen,Golf III / GTI,Y,Manual 5-spd,1993
13,18,4,Regular,26,10009,Volkswagen,Jetta III,N,Automatic 4-spd,1993
15,20,4,Regular,28,10010,Volkswagen,Jetta III,N,Manual 5-spd,1993
16,18,4,Regular,23,10011,Volvo,240,Y,Automatic 4-spd,1993


We’ve created a DataFrame that’s sorted on multiple columns. Notice how the row index is no longer in numeric order. To get the new DataFrame back to the original order, we can use ``.sort_index()``.

The default argument for ascending in ``.sort_index()`` is ``True``.

In [37]:
sort_df.sort_index()

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
95,17,6,Regular,25,10083,Pontiac,Grand Prix,Y,Automatic 3-spd,1993
96,17,6,Regular,27,10084,Pontiac,Grand Prix,N,Automatic 4-spd,1993
97,15,6,Regular,24,10085,Pontiac,Grand Prix,N,Automatic 4-spd,1993
98,15,6,Regular,24,10086,Pontiac,Grand Prix,N,Manual 5-spd,1993
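
The round trip from a value sort back to the original row order can be sketched on a tiny hypothetical frame (``toy`` is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical data with the default 0..n-1 index.
toy = pd.DataFrame({"make": ["Volvo", "Audi", "BMW"]})

# Sorting by value scrambles the index labels...
shuffled = toy.sort_values("make")
print(shuffled.index.tolist())   # [1, 2, 0]

# ...and sorting by index puts the rows back in their original order.
restored = shuffled.sort_index()
print(restored.index.tolist())   # [0, 1, 2]
print(restored["make"].tolist()) # ['Volvo', 'Audi', 'BMW']
```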


### Set Custom Index

If we want to set a custom index using the make and model columns, then we can pass a list to ``.set_index()``.

In [38]:
assigned_index_df = df.set_index(["make","model"])

assigned_index_df

make,model,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
Rolls-Royce,Brooklands/Brklnds L,9,8,Premium,13,10087,N,Automatic 4-spd,1993
Ferrari,Testarossa,9,12,Regular,14,10,N,Manual 5-spd,1985
Dodge,B350 Wagon 2WD,9,8,Regular,10,1007,N,Automatic 3-spd,1985
Dodge,B150/B250 Wagon 2WD,9,8,Regular,11,1004,N,Automatic 3-spd,1985
Dodge,B150/B250 Wagon 2WD,10,8,Regular,12,1000,N,Automatic 3-spd,1985
...,...,...,...,...,...,...,...,...,...
Toyota,Corolla,23,4,Regular,30,10006,Y,Manual 5-spd,1993
Toyota,Corolla,23,4,Regular,30,10005,Y,Automatic 4-spd,1993
Toyota,Corolla,23,4,Regular,31,10004,Y,Manual 5-spd,1993
Toyota,Corolla,23,4,Regular,26,10003,Y,Automatic 3-spd,1993


Using this method, we replace the default integer-based row index with two index levels, ``make`` and ``model``. This is called a MultiIndex, or hierarchical index.

The DataFrame ``assigned_index_df`` is now indexed by more than one key, and we can sort on those keys with ``.sort_index()``.

In [39]:
assigned_index_df.sort_index()

make,model,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
Alfa Romeo,Spider Veloce 2000,19,4,Regular,25,1,Y,Manual 5-spd,1985
Audi,100,17,6,Premium,24,10014,N,Manual 5-spd,1993
Audi,100,17,6,Premium,22,10013,Y,Automatic 4-spd,1993
BMW,740i,14,8,Premium,20,10015,N,Automatic 5-spd,1993
BMW,740il,14,8,Premium,20,10016,N,Automatic 5-spd,1993
...,...,...,...,...,...,...,...,...,...
Volkswagen,Golf III / GTI,21,4,Regular,29,10008,Y,Manual 5-spd,1993
Volkswagen,Jetta III,18,4,Regular,26,10009,N,Automatic 4-spd,1993
Volkswagen,Jetta III,20,4,Regular,28,10010,N,Manual 5-spd,1993
Volvo,240,18,4,Regular,23,10011,Y,Automatic 4-spd,1993
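
Here is a compact sketch of sorting on a two-level index, using made-up rows (the frame ``toy`` and its values are assumptions). Note that the ``model`` labels are strings, so they are compared character by character: ``"100"`` sorts before ``"80"``.

```python
import pandas as pd

# Hypothetical rows, not taken from vehicles.csv.
toy = pd.DataFrame(
    {
        "make": ["Volvo", "Audi", "BMW", "Audi"],
        "model": ["240", "100", "740i", "80"],
        "city08": [18, 17, 14, 20],
    }
)

indexed = toy.set_index(["make", "model"])

# Sorts by the first level (make), then by the second level (model).
sorted_indexed = indexed.sort_index()
print(sorted_indexed.index.tolist())
# [('Audi', '100'), ('Audi', '80'), ('BMW', '740i'), ('Volvo', '240')]
print(sorted_indexed["city08"].tolist())  # [17, 20, 14, 18]
```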


### Sorting by Index in Descending Order

In [40]:
assigned_index_df.sort_index(ascending=False)

make,model,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
Volvo,240,18,4,Regular,23,10011,Y,Automatic 4-spd,1993
Volvo,240,19,4,Regular,26,10012,Y,Manual 5-spd,1993
Volkswagen,Jetta III,18,4,Regular,26,10009,N,Automatic 4-spd,1993
Volkswagen,Jetta III,20,4,Regular,28,10010,N,Manual 5-spd,1993
Volkswagen,Golf III / GTI,18,4,Regular,26,10007,N,Automatic 4-spd,1993
...,...,...,...,...,...,...,...,...,...
BMW,740il,14,8,Premium,20,10016,N,Automatic 5-spd,1993
BMW,740i,14,8,Premium,20,10015,N,Automatic 5-spd,1993
Audi,100,17,6,Premium,24,10014,N,Manual 5-spd,1993
Audi,100,17,6,Premium,22,10013,Y,Automatic 4-spd,1993


### Working With the DataFrame axis

When we use ``.sort_index()`` without passing any explicit arguments, it uses ``axis=0`` as the default. The axis of a DataFrame refers to either the index (``axis=0``) or the columns (``axis=1``).

#### Using Column Labels to Sort

Setting axis to 1 sorts the columns of your DataFrame based on the column labels:

In [43]:
df.sort_index(axis=1)

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
...,...,...,...,...,...,...,...,...,...,...
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993


The columns of your DataFrame are sorted from left to right in ascending alphabetical order.
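
A one-row sketch, on hypothetical data, shows that ``axis=1`` reorders the column labels only and leaves the values attached to their columns:

```python
import pandas as pd

# Hypothetical single-row frame with columns in non-alphabetical order.
toy = pd.DataFrame({"year": [1985], "make": ["Dodge"], "city08": [23]})

by_columns = toy.sort_index(axis=1)

print(toy.columns.tolist())         # ['year', 'make', 'city08']
print(by_columns.columns.tolist())  # ['city08', 'make', 'year']
print(by_columns.loc[0, "make"])    # Dodge
```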

### Working With Missing Data 

The following piece of code creates a new column based on the existing mpgData column, mapping True where mpgData equals Y and NaN where it doesn’t:

In [44]:
df["mpgdata_"] = df["mpgData"].map({"Y":True})
df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgdata_
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993,
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985,
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
...,...,...,...,...,...,...,...,...,...,...,...
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993,True
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993,True
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993,True
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993,True


If we sort on a column with missing data, then the rows with the missing values will appear at the end of your DataFrame. 

In [46]:
df.sort_values("mpgdata_")

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgdata_
32,15,8,Premium,23,10026,Cadillac,Eldorado,Y,Automatic 4-spd,1993,True
46,18,6,Premium,24,10039,Nissan,Maxima,Y,Manual 5-spd,1993,True
55,18,6,Regular,26,10047,Dodge,Spirit,Y,Automatic 4-spd,1993,True
83,18,6,Regular,26,10072,Oldsmobile,Cutlass Ciera,Y,Automatic 4-spd,1993,True
49,18,6,Regular,26,10041,Dodge,Dynasty,Y,Automatic 4-spd,1993,True
...,...,...,...,...,...,...,...,...,...,...,...
52,21,4,Regular,26,10044,Dodge,Spirit,N,Automatic 3-spd,1993,
23,21,4,Regular,28,10018,Buick,Century,N,Automatic 3-spd,1993,
81,21,4,Regular,28,10070,Oldsmobile,Cutlass Ciera,N,Automatic 3-spd,1993,
53,22,4,Regular,29,10045,Dodge,Spirit,N,Manual 5-spd,1993,


To change that behavior and have the missing data appear first in the DataFrame, we can set ``na_position`` to ``"first"``. 

In [48]:
df.sort_values("mpgdata_",na_position="first")

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgdata_
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993,
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985,
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
...,...,...,...,...,...,...,...,...,...,...,...
74,17,6,Regular,25,10064,Mercury,Sable,Y,Automatic 4-spd,1993,True
85,17,6,Regular,27,10074,Oldsmobile,Cutlass Supreme,Y,Automatic 4-spd,1993,True
29,17,6,Regular,26,10023,Buick,Regal,Y,Automatic 4-spd,1993,True
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993,True
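
The effect of ``na_position`` is easier to check on a three-row sketch. The frame ``toy`` and its ``score`` column are hypothetical and stand in for any column containing ``NaN``.

```python
import pandas as pd

# Hypothetical column with one missing value at index 1.
toy = pd.DataFrame({"score": [2.0, float("nan"), 1.0]})

# Default: missing values go last.
na_last = toy.sort_values("score")
print(na_last.index.tolist())   # [2, 0, 1]

# na_position="first": missing values go first.
na_first = toy.sort_values("score", na_position="first")
print(na_first.index.tolist())  # [1, 2, 0]
```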


### Using .sort_values() In Place

Both ``.sort_values()`` and ``.sort_index()`` have returned DataFrame objects. That’s because sorting in pandas doesn’t work in place by default. 


However, we can modify the original DataFrame directly by specifying the optional parameter inplace with the value of True.

With inplace set to True, we modify the original DataFrame, so the sort methods return None.

```python
df.sort_values("city08", inplace=True)
```

or

```python
df.sort_index(inplace=True)
```
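
As a quick check of the ``None`` return value, the sketch below sorts a small hypothetical frame in place. Assigning the result back (``toy = toy.sort_values(..., inplace=True)``) would therefore replace the frame with ``None``, which is a common mistake.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"city08": [19, 9, 23]})

result = toy.sort_values("city08", inplace=True)

print(result)                   # None
print(toy["city08"].tolist())   # [9, 19, 23]
```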