# Introduction to Pandas

**Learning Objectives:**
  * Gain an introduction to the *pandas* library
  * Slice, dice and summarize data within a `DataFrame`
  

## Library import

The following line imports the *pandas* library under its conventional alias, `pd`.

In [1]:
import pandas as pd


## Data loading and DataFrame creation


The following example loads a CSV file containing the famous Titanic passenger data. Run the cell to load the data and create your first `DataFrame`.

In [10]:
titanic = pd.read_csv("https://raw.githubusercontent.com/thousandoaks/Python4DS101/master/data/titanic.csv", sep=",")
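
To experiment with `read_csv` without a network connection, the same call can be pointed at an in-memory buffer. The snippet below is only a sketch: `csv_text` and `sample` are illustrative names, and the two rows are copied from the Titanic file with a reduced set of columns.

```python
import io

import pandas as pd

# Two rows copied from the Titanic file, kept in memory so the example runs offline.
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22.0
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0
"""

# read_csv accepts a path, a URL, or any file-like object.
# sep="," is the default, so it can be omitted for comma-separated files.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # column types inferred by pandas
```

The names contain commas, but because they are wrapped in double quotes, each one is read as a single field.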


Let's display the first few records of the `DataFrame`; `head()` returns the first five rows by default:

In [3]:
titanic.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [7]:
titanic.describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292
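
In the summary above, the `count` for `Age` (714) is lower than for the other columns (891) because `describe()` counts only non-missing values. The sketch below reproduces that behaviour on four rows taken from the Titanic data; `rows`, `summary` and `missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Four passengers from the Titanic data; two of them have no recorded age.
rows = pd.DataFrame(
    {"Age": [35.0, np.nan, np.nan, 38.0], "Fare": [512.3292, 0.0, 0.0, 0.0]},
    index=[
        "Ward, Miss. Anna",
        'Parkes, Mr. Francis "Frank"',
        "Parr, Mr. William Henry Marsh",
        "Reuchlin, Jonkheer. John George",
    ],
)

# describe() ignores missing values, so the counts differ between columns.
summary = rows.describe()
print(summary.loc["count"])

# isna().sum() reports the number of missing entries per column directly.
missing = rows.isna().sum()
print(missing)
```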


The dataset contains the following columns:

* PassengerId: Unique identifier of each passenger.

* Survived: Survival indicator: 0 = did not survive, 1 = survived.

* Pclass: Ticket class: 1 (first), 2 (second) or 3 (third).

* Name: Name of the passenger.

* Sex: Sex of the passenger.

* Age: Age of the passenger in years.

* SibSp: Number of siblings and spouses travelling with the passenger.

* Parch: Number of parents and children travelling with the passenger.

* Ticket: Ticket number.

* Fare: Fare paid for the ticket.

* Cabin: Cabin number.

* Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).





## Basic Operations

### Index Manipulation

In [8]:
titanic.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [11]:
titanic.set_index('Name',inplace=True)
titanic

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Braund, Mr. Owen Harris",1,0,3,male,22.0,1,0,A/5 21171,7.2500,,S
"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
"Heikkinen, Miss. Laina",3,1,3,female,26.0,0,0,STON/O2. 3101282,7.9250,,S
"Futrelle, Mrs. Jacques Heath (Lily May Peel)",4,1,1,female,35.0,1,0,113803,53.1000,C123,S
"Allen, Mr. William Henry",5,0,3,male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
"Montvila, Rev. Juozas",887,0,2,male,27.0,0,0,211536,13.0000,,S
"Graham, Miss. Margaret Edith",888,1,1,female,19.0,0,0,112053,30.0000,B42,S
"Johnston, Miss. Catherine Helen ""Carrie""",889,0,3,female,,1,2,W./C. 6607,23.4500,,S
"Behr, Mr. Karl Howell",890,1,1,male,26.0,0,0,111369,30.0000,C148,C

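The cell above modifies `titanic` directly because of `inplace=True`. As a hedged alternative, the sketch below shows that `set_index` without `inplace` returns a new `DataFrame`, and that `reset_index` moves the index back into a regular column. The frame `people` is a three-row illustration built from values shown earlier.

```python
import pandas as pd

people = pd.DataFrame({
    "PassengerId": [1, 3, 5],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Allen, Mr. William Henry",
    ],
    "Fare": [7.25, 7.925, 8.05],
})

# Without inplace=True, set_index returns a new DataFrame and leaves `people` unchanged.
by_name = people.set_index("Name")
print(by_name.index.name)         # Name
print("Name" in by_name.columns)  # False: the column has become the index

# reset_index turns the index back into an ordinary column.
restored = by_name.reset_index()
print(restored.columns.tolist())
```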

### Sorting

In [12]:
titanic.sort_index(inplace=True)
titanic

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Abbing, Mr. Anthony",846,0,3,male,42.0,0,0,C.A. 5547,7.5500,,S
"Abbott, Mr. Rossmore Edward",747,0,3,male,16.0,1,1,C.A. 2673,20.2500,,S
"Abbott, Mrs. Stanton (Rosa Hunt)",280,1,3,female,35.0,1,1,C.A. 2673,20.2500,,S
"Abelson, Mr. Samuel",309,0,2,male,30.0,1,0,P/PP 3381,24.0000,,C
"Abelson, Mrs. Samuel (Hannah Wizosky)",875,1,2,female,28.0,1,0,P/PP 3381,24.0000,,C
...,...,...,...,...,...,...,...,...,...,...,...
"de Mulder, Mr. Theodore",287,1,3,male,30.0,0,0,345774,9.5000,,S
"de Pelsmaeker, Mr. Alfons",283,0,3,male,16.0,0,0,345778,9.5000,,S
"del Carlo, Mr. Sebastiano",362,0,2,male,29.0,1,0,SC/PARIS 2167,27.7208,,C
"van Billiard, Mr. Austin Blyler",154,0,3,male,40.5,0,2,A/5. 851,14.5000,,S

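In the sorted output above, names such as "de Mulder" and "van Billiard" appear at the very end because string sorting is case-sensitive: uppercase letters come before lowercase ones. A possible way to get a case-insensitive order, assuming pandas 1.1 or later, is the `key` argument of `sort_index`. The frame `fares` is a four-row illustration using values from the tables in this notebook.

```python
import pandas as pd

names = [
    "van Billiard, Mr. Austin Blyler",
    "Abbing, Mr. Anthony",
    "de Mulder, Mr. Theodore",
    "Ward, Miss. Anna",
]
fares = pd.DataFrame(
    {"Fare": [14.5, 7.55, 9.5, 512.3292]},
    index=pd.Index(names, name="Name"),
)

# Default behaviour: uppercase initials sort before lowercase ones.
default_order = fares.sort_index()
print(default_order.index.tolist())

# key is applied to the index before comparison; lowercasing makes the sort case-insensitive.
folded_order = fares.sort_index(key=lambda idx: idx.str.lower())
print(folded_order.index.tolist())
```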

In [13]:
titanic.sort_values(by='Fare', ascending=False, inplace=True)
titanic

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Ward, Miss. Anna",259,1,1,female,35.0,0,0,PC 17755,512.3292,,C
"Cardeza, Mr. Thomas Drake Martinez",680,1,1,male,36.0,0,1,PC 17755,512.3292,B51 B53 B55,C
"Lesurer, Mr. Gustave J",738,1,1,male,35.0,0,0,PC 17755,512.3292,B101,C
"Fortune, Miss. Alice Elizabeth",342,1,1,female,24.0,3,2,19950,263.0000,C23 C25 C27,S
"Fortune, Miss. Mabel Helen",89,1,1,female,23.0,3,2,19950,263.0000,C23 C25 C27,S
...,...,...,...,...,...,...,...,...,...,...,...
"Tornquist, Mr. William Henry",272,1,3,male,25.0,0,0,LINE,0.0000,,S
"Parkes, Mr. Francis ""Frank""",278,0,2,male,,0,0,239853,0.0000,,S
"Parr, Mr. William Henry Marsh",634,0,1,male,,0,0,112052,0.0000,,S
"Reuchlin, Jonkheer. John George",823,0,1,male,38.0,0,0,19972,0.0000,,S

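`sort_values` also accepts several columns, each with its own direction. The sketch below sorts a small illustrative frame (`sample`, built from rows shown earlier) by class in ascending order and then by fare in descending order within each class.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Heikkinen, Miss. Laina",
        "Allen, Mr. William Henry",
    ],
    "Pclass": [3, 1, 3, 3],
    "Fare": [7.25, 53.1, 7.925, 8.05],
}).set_index("Name")

# One sort direction per column: Pclass ascending, Fare descending.
ranked = sample.sort_values(by=["Pclass", "Fare"], ascending=[True, False])
print(ranked)
```

Because `inplace` is not set here, `sample` keeps its original order and `ranked` holds the sorted copy.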

## Column selection by name

In [14]:
titanic['Age']

Name,Age
"Ward, Miss. Anna",35.0
"Cardeza, Mr. Thomas Drake Martinez",36.0
"Lesurer, Mr. Gustave J",35.0
"Fortune, Miss. Alice Elizabeth",24.0
"Fortune, Miss. Mabel Helen",23.0
...,...
"Tornquist, Mr. William Henry",25.0
"Parkes, Mr. Francis ""Frank""",
"Parr, Mr. William Henry Marsh",
"Reuchlin, Jonkheer. John George",38.0


In [16]:
titanic[['Age','Sex','Fare']]

Name,Age,Sex,Fare
"Ward, Miss. Anna",35.0,female,512.3292
"Cardeza, Mr. Thomas Drake Martinez",36.0,male,512.3292
"Lesurer, Mr. Gustave J",35.0,male,512.3292
"Fortune, Miss. Alice Elizabeth",24.0,female,263.0000
"Fortune, Miss. Mabel Helen",23.0,female,263.0000
...,...,...,...
"Tornquist, Mr. William Henry",25.0,male,0.0000
"Parkes, Mr. Francis ""Frank""",,male,0.0000
"Parr, Mr. William Henry Marsh",,male,0.0000
"Reuchlin, Jonkheer. John George",38.0,male,0.0000

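The two cells above return different types: a single column name gives a `Series`, while a list of names gives a `DataFrame`, even when the list has only one element. A minimal sketch on an illustrative three-row frame (`sample`):

```python
import pandas as pd

sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
    "Fare": [7.25, 71.2833, 7.925],
})

ages = sample["Age"]             # single label -> Series
ages_frame = sample[["Age"]]     # list with one label -> one-column DataFrame
subset = sample[["Age", "Fare"]] # list of labels -> DataFrame with those columns

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame
print(ages_frame.shape)           # (3, 1)
print(subset.columns.tolist())
```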

## Row selection by name

In [18]:
titanic.loc['Knight, Mr. Robert J']

,"Knight, Mr. Robert J"
PassengerId,733
Survived,0
Pclass,2
Sex,male
Age,
SibSp,0
Parch,0
Ticket,239855
Fare,0.0
Cabin,


In [19]:
titanic.loc['Parkes, Mr. Francis "Frank"']

,"Parkes, Mr. Francis ""Frank"""
PassengerId,278
Survived,0
Pclass,2
Sex,male
Age,
SibSp,0
Parch,0
Ticket,239853
Fare,0.0
Cabin,
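
Beyond a single row label, `.loc` also takes a column label or lists of labels, and `.iloc` selects by position instead. The sketch below assumes a small illustrative frame (`sample`) indexed by name, using values from the tables in this notebook.

```python
import pandas as pd

sample = pd.DataFrame(
    {"Pclass": [2, 1, 1], "Fare": [0.0, 0.0, 512.3292]},
    index=pd.Index(
        [
            'Parkes, Mr. Francis "Frank"',
            "Parr, Mr. William Henry Marsh",
            "Ward, Miss. Anna",
        ],
        name="Name",
    ),
)

# One row label -> Series holding that passenger's values.
row = sample.loc["Ward, Miss. Anna"]

# Row label plus column label -> a single value.
fare = sample.loc["Ward, Miss. Anna", "Fare"]

# Lists of labels -> DataFrame restricted to those rows and columns.
two = sample.loc[["Ward, Miss. Anna", "Parr, Mr. William Henry Marsh"], ["Fare"]]

# iloc uses integer positions rather than labels; 0 is the first row.
first = sample.iloc[0]

print(fare)
print(two)
print(first.name)
```

Asking `.loc` for a label that is not in the index raises a `KeyError`, so the spelling of the name must match exactly.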


## Basic arithmetic operations

In [20]:
titanic.head()

Name,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
"Ward, Miss. Anna",259,1,1,female,35.0,0,0,PC 17755,512.3292,,C
"Cardeza, Mr. Thomas Drake Martinez",680,1,1,male,36.0,0,1,PC 17755,512.3292,B51 B53 B55,C
"Lesurer, Mr. Gustave J",738,1,1,male,35.0,0,0,PC 17755,512.3292,B101,C
"Fortune, Miss. Alice Elizabeth",342,1,1,female,24.0,3,2,19950,263.0,C23 C25 C27,S
"Fortune, Miss. Mabel Helen",89,1,1,female,23.0,3,2,19950,263.0,C23 C25 C27,S


In [22]:
titanic['Age'].mean()

29.69911764705882

In [23]:
titanic['Age'].std()

14.526497332334046

In [25]:
titanic['Fare'].max()

512.3292

In [26]:
titanic['Fare'].min()

0.0

In [28]:
titanic['PassengerId'].count()

891
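
The aggregations above skip missing values by default, which is why `count()` can be smaller than the number of rows. The sketch below uses an illustrative `Series` with one missing age; it also shows that `std()` computes the sample standard deviation (`ddof=1`) unless told otherwise.

```python
import numpy as np
import pandas as pd

# Three recorded ages and one missing value.
ages = pd.Series([35.0, 36.0, 24.0, np.nan], name="Age")

print(len(ages))      # 4: number of rows, missing or not
print(ages.count())   # 3: number of non-missing values

# mean() ignores NaN by default; skipna=False propagates it instead.
mean_default = ages.mean()
mean_strict = ages.mean(skipna=False)
print(mean_default, mean_strict)

# std() divides by n - 1 by default (ddof=1); ddof=0 gives the population value.
std_sample = ages.std()
std_population = ages.std(ddof=0)
print(std_sample, std_population)
```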

In [30]:
titanic['Fare']/1000

Name,Fare
"Ward, Miss. Anna",0.512329
"Cardeza, Mr. Thomas Drake Martinez",0.512329
"Lesurer, Mr. Gustave J",0.512329
"Fortune, Miss. Alice Elizabeth",0.263000
"Fortune, Miss. Mabel Helen",0.263000
...,...
"Tornquist, Mr. William Henry",0.000000
"Parkes, Mr. Francis ""Frank""",0.000000
"Parr, Mr. William Henry Marsh",0.000000
"Reuchlin, Jonkheer. John George",0.000000
