In [10]:
# Review of head/shape
# Inplace = True
# Renaming columns/dropping columns

In [2]:
# The data is in the csv file called titanic.csv
# create a dataframe object using it, and look at the head to start getting familiar with its structure
import pandas as pd
df = pd.read_csv('../titanic.csv', index_col='PassengerId')

# 1. Learn to interact with and manipulate dataframe columns

Let's take a look at the head of the data frame and the shape, just to get a quick overview.

In [3]:
df.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
df.shape

(891, 11)

### Quick knowledge check
We always want to be aware of what a row represents. 

What does each row in the dataframe represent? 

In [None]:
# Type answer here

Like most things in code, there are several ways to view the columns.

The first way is to look at the columns attribute of the dataframe.

In [5]:
# We are getting familiar with dataframe attributes: .shape and now .columns
df.columns

Index(['Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket',
       'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [6]:
# We can confirm that the number of columns matches the second element of the shape tuple

len(df.columns) == df.shape[1]

True
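To make the relationship between .shape and .columns concrete, here is a minimal sketch on a tiny stand-in frame. The name `toy` and its three rows are assumptions for illustration (values copied from the head shown earlier), not the full titanic.csv.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic data
toy = pd.DataFrame(
    {
        "Survived": [0, 1, 1],
        "Pclass": [3, 1, 3],
        "SibSp": [1, 1, 0],
    },
    index=pd.Index([1, 2, 3], name="PassengerId"),
)

# .shape is a plain tuple: (number of rows, number of columns)
n_rows, n_cols = toy.shape

# PassengerId is the index, so it is not counted among the columns
print(n_rows, n_cols)
print("PassengerId" in toy.columns)
```

This is why the full dataframe reports 11 columns even though the printed table appears to have 12: the index is displayed alongside the columns but is not one of them.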

A second way to see the columns is to use the built-in list() function:

In [7]:
list(df)

['Survived',
 'Pclass',
 'Name',
 'Sex',
 'Age',
 'SibSp',
 'Parch',
 'Ticket',
 'Fare',
 'Cabin',
 'Embarked']
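list(df) works because iterating over a dataframe yields its column labels. The sketch below, on a hypothetical two-row frame named `toy`, shows three spellings that should give the same list.

```python
import pandas as pd

# Hypothetical stand-in frame; only the column labels matter here
toy = pd.DataFrame({"Survived": [0, 1], "Pclass": [3, 1], "SibSp": [1, 1]})

names_from_list = list(toy)             # iterating a DataFrame yields column labels
names_from_attr = toy.columns.tolist()  # explicit conversion of the column Index
names_from_loop = [col for col in toy]  # the same iteration, written out

print(names_from_list)
```

Which one to use is mostly a matter of taste; `toy.columns.tolist()` is the most explicit about what is being listed.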

Consider the situation where you want to rename a column in the dataframe. Let's say you are getting tired of remembering that SibSp refers to siblings and spouses. We can rename it like so:

In [8]:
df.rename({'SibSp':'siblings_and_spouses'}, axis=1) # axis=1 tells rename to look for SibSp among the column labels, not the row index

PassengerId,Survived,Pclass,Name,Sex,Age,siblings_and_spouses,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


Great. Now print out the head of df.

In [None]:
df.head()

Looks like the change did not stick: the column is still named SibSp.
By default, rename returns a new dataframe and leaves the original untouched.
To change df itself, we either assign the result back to df or set the inplace parameter to True.

In [None]:
df.rename({'SibSp':'siblings_and_spouses'}, axis=1, inplace=True)

In [None]:
df.head()
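The behavior above can be checked on a small scale. This is a minimal sketch on hypothetical frames (`toy` and `other` are illustration names, not part of the notebook) showing that rename leaves the original alone unless the result is assigned back or inplace=True is passed.

```python
import pandas as pd

# Hypothetical stand-in frame with the column we want to rename
toy = pd.DataFrame({"SibSp": [1, 1, 0], "Parch": [0, 0, 0]})

# Without inplace, rename hands back a new DataFrame
renamed = toy.rename({"SibSp": "siblings_and_spouses"}, axis=1)
print(list(toy))      # original labels are still there
print(list(renamed))  # the copy carries the new label

# Option 1: assign the result back to the same name
reassigned = toy.rename({"SibSp": "siblings_and_spouses"}, axis=1)

# Option 2: inplace=True modifies the object and returns None
other = pd.DataFrame({"SibSp": [1]})
result = other.rename({"SibSp": "siblings_and_spouses"}, axis=1, inplace=True)
print(result)
```

Because the inplace call returns None, writing `df = df.rename(..., inplace=True)` would replace df with None, so use one approach or the other, not both.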

We can also change multiple columns at once with a dictionary:

In [None]:
df.rename(columns = {'Parch': 'parent_child_ratio', 'Pclass': 'ticket_class'}, inplace=True)

In [None]:
df.head()
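Two details about the dictionary form are worth a quick check: `columns=mapping` and `mapping, axis=1` are two spellings of the same request, and a key that matches no column is silently skipped unless `errors="raise"` is passed. The sketch below uses a hypothetical two-column frame named `toy`.

```python
import pandas as pd

# Hypothetical stand-in frame
toy = pd.DataFrame({"Parch": [0, 0], "Pclass": [3, 1]})
mapping = {"Parch": "parent_child_ratio", "Pclass": "ticket_class"}

# Two equivalent ways to rename columns
by_keyword = toy.rename(columns=mapping)
by_axis = toy.rename(mapping, axis=1)
same = list(by_keyword) == list(by_axis)

# A misspelled key ("parch" instead of "Parch") is ignored by default
typo = toy.rename(columns={"parch": "x"})
unchanged = list(typo) == list(toy)

# errors="raise" turns the silent skip into a KeyError
try:
    toy.rename(columns={"parch": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(same, unchanged, raised)
```

Checking df.head() after a rename, as the notebook does, is a simple way to catch a silently ignored typo.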

We can also interact directly with the .columns attribute by assigning a new list of labels to it:


In [None]:
df_columns = df.columns # saved for pair programming

df.columns = list('ABCDEFGHIJK')
# What will the columns of our dataframe look like now?
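One constraint on assigning to .columns is that the new list needs exactly one label per existing column. A minimal sketch on a hypothetical three-column frame named `toy`:

```python
import pandas as pd

# Hypothetical stand-in frame with three columns
toy = pd.DataFrame({"Survived": [0, 1], "Pclass": [3, 1], "SibSp": [1, 1]})

# Three columns, three new labels: this works
toy.columns = list("ABC")
after_assignment = list(toy)

# Three columns, two labels: pandas refuses with a ValueError
try:
    toy.columns = list("AB")
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(after_assignment)
print(type(mismatch_error).__name__)
```

That is why the cell above uses list('ABCDEFGHIJK'): eleven letters for eleven columns.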

If we find that a column is not useful, we can drop it with the drop method.



In [None]:
df.drop('A', axis=1)
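Like rename, drop returns a new dataframe by default, so the call above does not change df. The sketch below uses a hypothetical frame named `toy` to show that, along with dropping several columns at once through the `columns` keyword.

```python
import pandas as pd

# Hypothetical stand-in frame using the letter labels from above
toy = pd.DataFrame({"A": [0, 1], "B": [3, 1], "C": [1, 1]})

# drop returns a new DataFrame; toy keeps all three columns
without_a = toy.drop("A", axis=1)

# columns= accepts a list, so several columns can go in one call
without_a_b = toy.drop(columns=["A", "B"])

print(list(toy))
print(list(without_a))
print(list(without_a_b))
```

To keep the result, assign it to a name (for example `toy = toy.drop("A", axis=1)`) or pass inplace=True.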

# Pair Program 1:

Take 5 minutes with a partner to perform this activity.

We just renamed our columns to a useless series of letters. Luckily, we saved the original column names in the variable df_columns. Let's restore them using the .columns attribute. To make things neater, we want the column names to all be lowercase. You can do this any way you prefer, but a list comprehension can do it in one line.

Remember, list comprehensions look like this:
> [function(variable) for variable in iterable]
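As a refresher on the pattern, here is a small sketch that is deliberately unrelated to column names. The names `numbers` and `square` are made up for illustration.

```python
numbers = [1, 2, 3, 4]

def square(n):
    """Return n multiplied by itself."""
    return n * n

# List comprehension: apply square to every item in numbers
squares = [square(n) for n in numbers]

# The same result built with an explicit loop
squares_loop = []
for n in numbers:
    squares_loop.append(square(n))

print(squares)
```

Both versions build the same list; the comprehension just folds the loop and the append into one expression.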

In [None]:
# your answer here