## Class Description:

In this class we will learn how to delete columns as part of our data cleaning task. We will cover the following:

1. How to delete columns
    * Using the drop() method to delete columns, with the following techniques:
        - Remove specific single column.
        - Remove specific multiple columns.
        - Remove columns based on column index.
        - Drop columns using iloc[] and drop() method.
        - Drop columns using loc[] and drop() method.



### Create a sample Dataset

In [33]:
import pandas as pd
df = pd.DataFrame({'Name':['Jane','Henry','Praise','Peter'],
                  'Age':[20,56,49,45],
                   'Sex':['Female','Male','Female','Male'],
                  'Location':['Russia','France','USA','Nigeria'],
                  'Status': ['Single','Married','Divoiced','Married'],
                  'Job':['Student','Retired','Doctor','Web Developer']}, index = (1,2,3,4))

df

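|   | Name | Age | Sex | Location | Status | Job |
|---|------|-----|-----|----------|--------|-----|
| 1 | Jane | 20 | Female | Russia | Single | Student |
| 2 | Henry | 56 | Male | France | Married | Retired |
| 3 | Praise | 49 | Female | USA | Divoiced | Doctor |
| 4 | Peter | 45 | Male | Nigeria | Married | Web Developer |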
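
Several of the techniques in this class refer to columns by position, so it can help to see which position each column label occupies. The following is a minimal sketch that rebuilds the sample dataset; the variable name `positions` is only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# Map each integer position to its column label (positions start at 0)
positions = dict(enumerate(df.columns))
print(positions)
```

Here 'Name' sits at position 0 and 'Job' at position 5.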


## Remove specific single column.

To demonstrate how to drop a specific column, let us drop the 'Job' column.

In [23]:
# Let us make a copy of our dataframe so the original df stays intact
df2 = df.copy()

In [24]:
'''
The axis = 1 parameter specifies that we are dropping based on column and not row.
To specify a row, you use axis = 0.
'''
df2.drop(['Job'], axis = 1)
df2

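|   | Name | Age | Sex | Location | Status | Job |
|---|------|-----|-----|----------|--------|-----|
| 1 | Jane | 20 | Female | Russia | Single | Student |
| 2 | Henry | 56 | Male | France | Married | Retired |
| 3 | Praise | 49 | Female | USA | Divoiced | Doctor |
| 4 | Peter | 45 | Male | Nigeria | Married | Web Developer |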


You will notice that despite performing the drop operation, our dataframe still has the 'Job' column. This is because the drop() method returns a new dataframe and leaves the original dataframe object unchanged. To modify the original dataframe object directly, we need to pass inplace = True. See the next code:

In [25]:
df2.drop(['Job'], axis = 1, inplace = True)
df2

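|   | Name | Age | Sex | Location | Status |
|---|------|-----|-----|----------|--------|
| 1 | Jane | 20 | Female | Russia | Single |
| 2 | Henry | 56 | Male | France | Married |
| 3 | Praise | 49 | Female | USA | Divoiced |
| 4 | Peter | 45 | Male | Nigeria | Married |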


You will notice that the 'Job' column is now gone because we used the inplace parameter.
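
Since drop() returns a new dataframe, another option is to assign that result to a variable instead of using inplace. The sketch below assumes you want to keep the original dataframe untouched; the name `trimmed` is only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# drop() returns a new dataframe; capture it instead of mutating df
trimmed = df.drop(['Job'], axis=1)

print(list(trimmed.columns))  # 'Job' is absent here
print(list(df.columns))       # df still has all six columns
```

With this approach both versions stay available, which can be handy if you need to compare the data before and after cleaning.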

## Remove specific multiple columns.

Let us remove the 'Location' and 'Status' columns this time.

In [26]:
df2.drop(['Location','Status'], axis = 1, inplace = True)
df2

|   | Name | Age | Sex |
|---|------|-----|-----|
| 1 | Jane | 20 | Female |
| 2 | Henry | 56 | Male |
| 3 | Praise | 49 | Female |
| 4 | Peter | 45 | Male |
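
The drop() method also accepts a `columns` keyword, which replaces the `labels` plus `axis = 1` pair, and an `errors` parameter that controls what happens when a label is missing. The sketch below is one possible way to use them; 'Salary' is a hypothetical label that does not exist in our dataset and is included only to show the effect of `errors='ignore'`.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# columns=[...] is equivalent to passing the labels with axis=1
result = df.drop(columns=['Location', 'Status'])

# 'Salary' is hypothetical (not in df); errors='ignore' skips it
# instead of raising a KeyError, and 'Location' is still dropped
safe = df.drop(columns=['Location', 'Salary'], errors='ignore')

print(list(result.columns))
print(list(safe.columns))
```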


## Remove columns based on column index.

We can drop columns based on their index location. The index of a column is its integer position in the dataframe, counting from 0.

Let us remove the column at index 0.

In [27]:
df2.drop(df2.columns[[0]], axis = 1, inplace = True)
df2

|   | Age | Sex |
|---|-----|-----|
| 1 | 20 | Female |
| 2 | 56 | Male |
| 3 | 49 | Female |
| 4 | 45 | Male |


You will notice that the 'Name' column is gone as Python begins its indexing from 0.

## Remove columns based on multiple column indexes.

We will recopy our dataframe and use the index positions to drop columns 0, 3 and 5.

In [35]:
df2 = df.copy()
df2.drop(df2.columns[[0,3,5]], axis = 1, inplace = True)
df2

|   | Age | Sex | Status |
|---|-----|-----|--------|
| 1 | 20 | Female | Single |
| 2 | 56 | Male | Married |
| 3 | 49 | Female | Divoiced |
| 4 | 45 | Male | Married |


This removes the 'Name', 'Location' and 'Job' columns.
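
It may help to see what `df2.columns[[0,3,5]]` evaluates to before it reaches drop(). The sketch below shows that the positions are first translated into column labels, and drop() then removes those labels; the names `labels` and `remaining` are only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# Indexing df.columns with a list of positions returns the matching labels
labels = df.columns[[0, 3, 5]]
print(list(labels))

# drop() receives labels, not positions
remaining = df.drop(labels, axis=1)
print(list(remaining.columns))
```

Because drop() works on labels, a dataframe with duplicate column names would lose every column sharing a selected name, not only the one at the given position.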

## Drop columns using iloc[] and drop() method.

iloc[] selects rows and columns by integer position. Here we use it to select a range of columns by position and then pass that selection to drop().

In [38]:
df2 = df.copy()
# Select columns at positions 1 and 2 from df2 itself and pass their labels to drop()
df2.drop(df2.iloc[:, 1:3].columns, axis = 1, inplace = True)
df2

|   | Name | Location | Status | Job |
|---|------|----------|--------|-----|
| 1 | Jane | Russia | Single | Student |
| 2 | Henry | France | Married | Retired |
| 3 | Praise | USA | Divoiced | Doctor |
| 4 | Peter | Nigeria | Married | Web Developer |


The slice 1:3 excludes its end position, so our iloc[] operation has dropped columns 1 and 2, which are 'Age' and 'Sex'.
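
To check which columns an iloc[] slice covers before dropping anything, you can inspect the labels of the selection first. This is a minimal sketch on the sample dataset; the names `selected` and `remaining` are only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# The ':' keeps every row; 1:3 covers positions 1 and 2 (position 3 is excluded)
selected = df.iloc[:, 1:3].columns
print(list(selected))

# Drop the selected labels and keep the result in a new dataframe
remaining = df.drop(selected, axis=1)
print(list(remaining.columns))
```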

## Drop columns using loc[] and drop() method.

loc[] is a label-based indexer: it references rows and columns in a dataframe by name. Unlike an iloc[] slice, a loc[] slice includes both of its endpoints.

In [39]:
df2.drop(df2.loc[:, 'Location':'Status'].columns, axis = 1, inplace = True)

df2

|   | Name | Job |
|---|------|-----|
| 1 | Jane | Student |
| 2 | Henry | Retired |
| 3 | Praise | Doctor |
| 4 | Peter | Web Developer |


Because df2 had already lost the 'Age' and 'Sex' columns in the previous step, this last operation leaves us with just the 'Name' and 'Job' columns.
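
The difference between loc[] and iloc[] slices is easy to miss, so here is a small sketch that runs a label slice on a fresh copy of the full sample dataset. The names `selected` and `wider` are only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset used throughout this class
df = pd.DataFrame({'Name': ['Jane', 'Henry', 'Praise', 'Peter'],
                   'Age': [20, 56, 49, 45],
                   'Sex': ['Female', 'Male', 'Female', 'Male'],
                   'Location': ['Russia', 'France', 'USA', 'Nigeria'],
                   'Status': ['Single', 'Married', 'Divoiced', 'Married'],
                   'Job': ['Student', 'Retired', 'Doctor', 'Web Developer']},
                  index=[1, 2, 3, 4])

# A label slice includes both endpoints: 'Location' and 'Status'
selected = df.loc[:, 'Location':'Status'].columns
print(list(selected))

# A wider label slice covers every column from 'Age' through 'Status'
wider = df.loc[:, 'Age':'Status'].columns
print(list(wider))

# Dropping the wider selection from the full dataset also leaves 'Name' and 'Job'
remaining = df.drop(wider, axis=1)
print(list(remaining.columns))
```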

### Wrap Up

By the end of this class, you should be able to clean a dataset by dropping columns in several ways: by name, by index position, and by iloc[] or loc[] selections.

For more classes, please follow [Data Science Arena](https://twitter.com/@xtian4zy).

Complete the exercise that accompanies this class and tag me on Twitter with your solution.