## 8. How do I rename columns, remove columns, and reverse column order?

In this blog, we will learn how to rename column and row labels, drop columns and rows, and reverse the row and column order, so that the bottom rows come to the top and the rightmost columns move to the left.

### 8.1. Renaming columns and rows

In [1]:
import pandas as pd

We will use the UFO sightings report dataset to answer the above question. We can use the "columns" attribute to check the column names of a data frame. For example, "ufo.columns" returns a list-like Index object containing the names of all columns in the ufo data frame.

In [2]:
ufo = pd.read_csv("http://bit.ly/uforeports", parse_dates=["Time"])
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [3]:
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

In [4]:
ufo.index

RangeIndex(start=0, stop=18241, step=1)

#### 8.1.1. Renaming Columns

##### I. Renaming a subset of columns

We will use the data frame's "rename( )" method to change column names. Passing "inplace=True" changes the underlying data, while the default, "inplace=False", returns a modified copy, which lets us try the method out without changing the underlying data. We will also use two syntaxes for the "rename( )" method. Suppose we want to change the "Colors Reported" and "Shape Reported" columns so that they don't have spaces in their names.

In [5]:
#newer method
ufo.rename({"Colors Reported":"Colors_Reported", "Shape Reported":"Shape_Reported"}, axis="columns", inplace=True)

In [6]:
ufo.head()

Unnamed: 0,City,Colors_Reported,Shape_Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [7]:
ufo.columns

Index(['City', 'Colors_Reported', 'Shape_Reported', 'State', 'Time'], dtype='object')

Notice that in the next cell, which uses the older "columns=" keyword syntax, we do not pass "inplace=True", so the underlying data is not changed, as the output of "ufo.columns" after it confirms.

In [8]:
#old method
ufo.rename(columns={"Colors_Reported":"colors_reported", "Shape_Reported":"shape_reported"})

Unnamed: 0,City,colors_reported,shape_reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,2000-12-31 23:00:00
18237,Spirit Lake,,DISK,IA,2000-12-31 23:00:00
18238,Eagle River,,,WI,2000-12-31 23:45:00
18239,Eagle River,RED,LIGHT,WI,2000-12-31 23:45:00


In [9]:
ufo.columns

Index(['City', 'Colors_Reported', 'Shape_Reported', 'State', 'Time'], dtype='object')
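To make the effect of "inplace" easier to see in isolation, here is a minimal sketch on a tiny, made-up data frame (the values are illustrative stand-ins, not the real UFO data). It also shows that, by default, "rename( )" ignores keys that do not match any column.

```python
import pandas as pd

# Hypothetical stand-in for the UFO data (not the real dataset).
df = pd.DataFrame({"City": ["Ithaca"], "Colors Reported": [None], "State": ["NY"]})

# Without inplace=True, rename() returns a new data frame and leaves df untouched.
renamed = df.rename(columns={"Colors Reported": "Colors_Reported"})
print(list(df.columns))       # ['City', 'Colors Reported', 'State']
print(list(renamed.columns))  # ['City', 'Colors_Reported', 'State']

# Keys that match no existing column are ignored by default (errors="ignore").
same = df.rename(columns={"No Such Column": "x"})
print(list(same.columns))     # ['City', 'Colors Reported', 'State']
```

Because a misspelled key passes silently, it is worth checking "columns" after a rename, as we do above.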

##### II. Renaming all the columns

To rename all the columns, we can assign a list containing all our desired column names to "ufo.columns". Alternatively, we can pass our desired names while reading the dataset by setting "names" to that list and using "header=0". (Note that header=None means the dataset doesn't have column names, while header=0 says the first row has column names, which we want to replace with our desired names.)

In [10]:
ufo_cols = ["city", "colors reported", "shape reported", "state", "time"]

In [11]:
ufo.columns = ufo_cols

In [12]:
ufo.head()

Unnamed: 0,city,colors reported,shape reported,state,time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [13]:
#Alternatively
ufo = pd.read_csv("http://bit.ly/uforeports", header=0, names=ufo_cols, parse_dates=["time"])
ufo.head(3)

Unnamed: 0,city,colors reported,shape reported,state,time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
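The role of "header=0" is easiest to see by leaving it out. The sketch below reads a small CSV from an in-memory string (the text and the names "csv_text" and "new_names" are made up for illustration), once with "header=0" and once without.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the downloaded file.
csv_text = "City,State\nIthaca,NY\nHolyoke,CO\n"
new_names = ["city", "state"]

# header=0: the first line is a header row, and names= replaces it.
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

# Without header=0, passing names= makes pandas treat the old header line as data.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

print(replaced.shape)  # (2, 2)
print(kept.shape)      # (3, 2): the original header became the first data row
```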


We can also use string methods to change our column names by passing them as the mapper to the rename method. Here we change all column names to upper case. Notice that "inplace=True" is missing, so no change is made to the underlying data.

In [14]:
ufo.rename(str.upper, axis="columns")

Unnamed: 0,CITY,COLORS REPORTED,SHAPE REPORTED,STATE,TIME
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,2000-12-31 23:00:00
18237,Spirit Lake,,DISK,IA,2000-12-31 23:00:00
18238,Eagle River,,,WI,2000-12-31 23:45:00
18239,Eagle River,RED,LIGHT,WI,2000-12-31 23:45:00


In [15]:
ufo.columns

Index(['city', 'colors reported', 'shape reported', 'state', 'time'], dtype='object')
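The mapper does not have to be a built-in string method. As a small sketch on a made-up data frame, any function that takes a label and returns a new label should work, including a lambda:

```python
import pandas as pd

# Hypothetical two-column frame for illustration.
df = pd.DataFrame({"city": ["Ithaca"], "shape reported": ["TRIANGLE"]})

# Title-case each name and remove the spaces.
titled = df.rename(lambda name: name.title().replace(" ", ""), axis="columns")

print(list(titled.columns))  # ['City', 'ShapeReported']
print(list(df.columns))      # ['city', 'shape reported'] (unchanged, no inplace=True)
```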

If we want to replace " " with "_", we can use string methods on "ufo.columns" to modify the column names and assign the result back to "ufo.columns".

In [16]:
ufo.columns = ufo.columns.str.replace(" ", "_")

In [17]:
ufo

Unnamed: 0,city,colors_reported,shape_reported,state,time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
...,...,...,...,...,...
18236,Grant Park,,TRIANGLE,IL,2000-12-31 23:00:00
18237,Spirit Lake,,DISK,IA,2000-12-31 23:00:00
18238,Eagle River,,,WI,2000-12-31 23:45:00
18239,Eagle River,RED,LIGHT,WI,2000-12-31 23:45:00
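The ".str" methods on the column Index can also be chained. The following sketch, on a made-up frame with deliberately messy names, strips surrounding whitespace, lower-cases, and replaces spaces in one assignment:

```python
import pandas as pd

# Hypothetical frame with untidy column names.
df = pd.DataFrame({" City ": ["Ithaca"], "Shape Reported": ["TRIANGLE"]})

# Each .str call returns a new Index, so the calls can be chained.
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)  # treat " " as a literal, not a pattern
)

print(list(df.columns))  # ['city', 'shape_reported']
```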


If you simply want to add a prefix or add a suffix, you can use the data frame's add_prefix( ) and add_suffix( ) methods.

In [18]:
ufo.add_prefix("X_").head()

Unnamed: 0,X_city,X_colors_reported,X_shape_reported,X_state,X_time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [19]:
ufo.add_suffix("_Y").head()

Unnamed: 0,city_Y,colors_reported_Y,shape_reported_Y,state_Y,time_Y
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


#### 8.1.2. Renaming Index

We can set the name of the index by assigning to the data frame's "index.name" attribute. Other operations for changing index labels are similar to changing column names. Take a look at the code below.

In [20]:
ufo.index.name = "S.N."
ufo.head()

Unnamed: 0_level_0,city,colors_reported,shape_reported,state,time
S.N.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [21]:
ufo.rename({0:"0th", 1:"1th"}, axis="index", inplace=True)
ufo.head(3)

Unnamed: 0_level_0,city,colors_reported,shape_reported,state,time
S.N.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0th,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1th,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00


In [22]:
ufo.rename({2:"2nd", 3:"3rd"}, axis=0, inplace=True)
ufo.head()

Unnamed: 0_level_0,city,colors_reported,shape_reported,state,time
S.N.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0th,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1th,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2nd,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3rd,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [23]:
ufo.rename(index={4:"4th"}, inplace=True)
ufo.head()

Unnamed: 0_level_0,city,colors_reported,shape_reported,state,time
S.N.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0th,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1th,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2nd,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3rd,Abilene,,DISK,KS,1931-06-01 13:00:00
4th,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
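As an alternative to assigning "index.name" directly, the "rename_axis( )" method returns a new data frame with a named index, which is convenient in a chain of method calls. This is a sketch on a made-up frame, not part of the original walkthrough:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"city": ["Ithaca", "Holyoke"]})

# rename_axis() names the index without modifying df.
named = df.rename_axis("S.N.")

# Individual labels can then be changed with rename(index=...).
relabeled = named.rename(index={0: "first"})

print(named.index.name)       # S.N.
print(list(relabeled.index))  # ['first', 1]
print(df.index.name)          # None
```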


<hr>

### 8.2. Removing columns and rows

In [24]:
ufo = pd.read_csv("http://bit.ly/uforeports", parse_dates=["Time"])
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


In [25]:
ufo.shape

(18241, 5)

#### 8.2.1. Removing Columns

We can always choose which columns to load into our data frame using the "usecols" parameter of read_csv. Sometimes, though, when we are done using a column, we may want to drop it before exporting our data frame (another blog :) ). Pretend we don't need the "Colors Reported" column. We will use the data frame's "drop( )" method to drop the column. We can do this in at least three different ways. Take a look at the code below. We will use "inplace=True" for the third alternative.

In [26]:
ufo.drop("Colors Reported", axis="columns").head()

Unnamed: 0,City,Shape Reported,State,Time
0,Ithaca,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,OVAL,CO,1931-02-15 14:00:00
3,Abilene,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,LIGHT,NY,1933-04-18 19:00:00


In [27]:
#Alternatively
ufo.drop("Colors Reported", axis=1).head()

Unnamed: 0,City,Shape Reported,State,Time
0,Ithaca,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,OVAL,CO,1931-02-15 14:00:00
3,Abilene,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,LIGHT,NY,1933-04-18 19:00:00


In [28]:
#Alternatively
ufo.drop(columns="Colors Reported", inplace = True)

In [29]:
ufo.head()

Unnamed: 0,City,Shape Reported,State,Time
0,Ithaca,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,OVAL,CO,1931-02-15 14:00:00
3,Abilene,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,LIGHT,NY,1933-04-18 19:00:00


In [30]:
ufo.drop(["City", "State"], axis="columns").head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
1,OTHER,1930-06-30 20:00:00
2,OVAL,1931-02-15 14:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00


In [31]:
ufo.drop(["City", "State"], axis=1).head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
1,OTHER,1930-06-30 20:00:00
2,OVAL,1931-02-15 14:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00


We can also drop multiple columns at the same time, by replacing the string representing a single column name with a list containing column names for all columns we want to drop.

In [32]:
ufo.drop(columns=["City", "State"], inplace=True)

In [33]:
ufo.head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
1,OTHER,1930-06-30 20:00:00
2,OVAL,1931-02-15 14:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
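Here is a short sketch of the "usecols" approach mentioned earlier, using a made-up CSV string in place of the real file. It also shows the "errors" parameter of "drop( )", which is handy when a cell may be re-run after a column has already been removed.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the downloaded file.
csv_text = "City,Colors Reported,State\nIthaca,,NY\nHolyoke,,CO\n"

# Load only the columns we need instead of dropping the rest afterwards.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["City", "State"])
print(list(subset.columns))   # ['City', 'State']

# errors="ignore" skips labels that are absent instead of raising a KeyError.
trimmed = subset.drop(columns=["State", "Colors Reported"], errors="ignore")
print(list(trimmed.columns))  # ['City']
```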


#### 8.2.2. Removing Rows

In [34]:
ufo.shape

(18241, 2)

The method of dropping rows is similar to dropping columns, but we need to use "index" instead of "columns", and "0" instead of "1". Take a look at the code below.

In [35]:
ufo.drop(0, axis="index").head()

Unnamed: 0,Shape Reported,Time
1,OTHER,1930-06-30 20:00:00
2,OVAL,1931-02-15 14:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
5,DISK,1934-09-15 15:30:00


In [36]:
#Alternatively
ufo.drop(1, axis=0).head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
2,OVAL,1931-02-15 14:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
5,DISK,1934-09-15 15:30:00


In [37]:
#Alternatively
ufo.drop(index=2, inplace = True)

In [38]:
ufo.head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
1,OTHER,1930-06-30 20:00:00
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
5,DISK,1934-09-15 15:30:00


We can also use the "shape" attribute to confirm the row was dropped. As with columns, we can drop multiple rows at a time as well.

In [39]:
ufo.shape

(18240, 2)

In [40]:
ufo.drop([0, 1], axis="index").head()

Unnamed: 0,Shape Reported,Time
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
5,DISK,1934-09-15 15:30:00
6,CIRCLE,1935-06-15 00:00:00
7,DISK,1936-07-15 00:00:00


In [41]:
ufo.drop([3, 4], axis=0).head()

Unnamed: 0,Shape Reported,Time
0,TRIANGLE,1930-06-01 22:00:00
1,OTHER,1930-06-30 20:00:00
5,DISK,1934-09-15 15:30:00
6,CIRCLE,1935-06-15 00:00:00
7,DISK,1936-07-15 00:00:00


In [42]:
ufo.drop(index=[0, 1], inplace=True)

In [43]:
ufo.head()

Unnamed: 0,Shape Reported,Time
3,DISK,1931-06-01 13:00:00
4,LIGHT,1933-04-18 19:00:00
5,DISK,1934-09-15 15:30:00
6,CIRCLE,1935-06-15 00:00:00
7,DISK,1936-07-15 00:00:00


In [44]:
ufo.shape

(18238, 2)
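One detail worth noting from the outputs above is that "drop( )" works on index labels, not positions, and the remaining labels are not renumbered. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"shape": ["TRIANGLE", "OTHER", "OVAL", "DISK"]})

df = df.drop(index=2)            # removes the row labelled 2
print(list(df.index))            # [0, 1, 3]

# To drop by position, look the label up through df.index first.
df2 = df.drop(index=df.index[2]) # position 2 now holds label 3
print(list(df2.index))           # [0, 1]

# reset_index(drop=True) renumbers the remaining rows from zero.
renumbered = df.reset_index(drop=True)
print(list(renumbered.index))    # [0, 1, 2]
```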

<hr>

### 8.3. Reversing Row and Column order

We will use two datasets for this exercise: the first is a dataset of average alcohol consumption by country, and the second is a movie ratings dataset from IMDb.

In [45]:
drinks = pd.read_csv("http://bit.ly/drinksbycountry")
movies = pd.read_csv("http://bit.ly/imdbratings")

#### 8.3.1. Reversing Index order

We can reverse the row order by placing "::-1" in the row position of the loc accessor. This is the same slicing notation used to reverse a Python list.

In [46]:
drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [47]:
drinks.loc[::-1, :].head(3)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
192,Zimbabwe,64,18,4,4.7,Africa
191,Zambia,32,19,4,2.5,Africa
190,Yemen,6,0,0,0.1,Asia


We can also reset the index so that it starts from zero for the reversed data frame using the "reset_index( )" method with the parameter "drop=True".

In [48]:
drinks.loc[::-1, :].reset_index(drop=True).head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Zimbabwe,64,18,4,4.7,Africa
1,Zambia,32,19,4,2.5,Africa
2,Yemen,6,0,0,0.1,Asia
3,Vietnam,111,2,1,2.0,Asia
4,Venezuela,333,100,3,7.7,South America
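The same reversal can be written with the position-based iloc accessor. The sketch below uses a made-up three-row frame (values borrowed from the first rows of the drinks data) to check that both give the same result:

```python
import pandas as pd

# Hypothetical miniature of the drinks data.
df = pd.DataFrame(
    {"country": ["Afghanistan", "Albania", "Algeria"], "beer_servings": [0, 89, 25]}
)

by_label = df.loc[::-1]      # slice through the label-based accessor
by_position = df.iloc[::-1]  # slice through the position-based accessor
print(by_label.equals(by_position))  # True

# Renumber the reversed rows from zero.
reset = by_position.reset_index(drop=True)
print(list(reset["country"]))  # ['Algeria', 'Albania', 'Afghanistan']
print(list(reset.index))       # [0, 1, 2]
```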


#### 8.3.2. Reversing Column order

Reversing column order is similar to reversing row order, but we pass "::-1" in the column position of loc.

In [49]:
movies.head()

Unnamed: 0,star_rating,title,content_rating,genre,duration,actors_list
0,9.3,The Shawshank Redemption,R,Crime,142,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt..."
1,9.2,The Godfather,R,Crime,175,"[u'Marlon Brando', u'Al Pacino', u'James Caan']"
2,9.1,The Godfather: Part II,R,Crime,200,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv..."
3,9.0,The Dark Knight,PG-13,Action,152,"[u'Christian Bale', u'Heath Ledger', u'Aaron E..."
4,8.9,Pulp Fiction,R,Crime,154,"[u'John Travolta', u'Uma Thurman', u'Samuel L...."


In [50]:
movies.loc[:, ::-1].head()

Unnamed: 0,actors_list,duration,genre,content_rating,title,star_rating
0,"[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...",142,Crime,R,The Shawshank Redemption,9.3
1,"[u'Marlon Brando', u'Al Pacino', u'James Caan']",175,Crime,R,The Godfather,9.2
2,"[u'Al Pacino', u'Robert De Niro', u'Robert Duv...",200,Crime,R,The Godfather: Part II,9.1
3,"[u'Christian Bale', u'Heath Ledger', u'Aaron E...",152,Action,PG-13,The Dark Knight,9.0
4,"[u'John Travolta', u'Uma Thurman', u'Samuel L....",154,Crime,R,Pulp Fiction,8.9
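For completeness, here is a small sketch of the column reversal on a made-up one-row frame, along with an equivalent spelling that selects the columns by their reversed Index. Neither expression modifies the original data frame.

```python
import pandas as pd

# Hypothetical miniature of the movies data.
df = pd.DataFrame(
    {"star_rating": [9.3], "title": ["The Shawshank Redemption"], "genre": ["Crime"]}
)

reversed_cols = df.loc[:, ::-1]  # slice in the column position of loc
same = df[df.columns[::-1]]      # select columns using the reversed Index

print(list(reversed_cols.columns))  # ['genre', 'title', 'star_rating']
print(reversed_cols.equals(same))   # True
print(list(df.columns))             # ['star_rating', 'title', 'genre']
```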
