# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects that share the same index, with one Series per column. Let's use pandas to explore this topic!
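
To make the "Series sharing one index" picture concrete, here is a minimal sketch. The variable names (`first_name`, `county`, `people`) and the three hand-typed rows are illustrative assumptions, not part of the notebook's own code.

```python
import pandas as pd

# Two Series built on the same index labels (small hand-typed sample).
first_name = pd.Series(["Aleshia", "Evan", "France"], index=[0, 1, 2])
county = pd.Series(["Kent", "Buckinghamshire", "Bournemouth"], index=[0, 1, 2])

# A dict of Series becomes a DataFrame: one column per Series, aligned on the index.
people = pd.DataFrame({"first_name": first_name, "county": county})

# Selecting a single column hands back a Series carrying that same index.
col = people["county"]
print(type(col).__name__)  # Series
print(people.shape)        # (3, 2)
```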

<img src="iloc_loc.png">
Ref: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/

In [1]:
import pandas as pd
import numpy as np
import random

# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

In [2]:
df = pd.read_csv("uk-500.csv")
# df = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')

In [3]:
df.head()

Unnamed: 0,first_name,last_name,company_name,address,city,county,postal,phone1,phone2,email,web
0,Aleshia,Tomkiewicz,Alan D Rosenburg Cpa Pc,14 Taylor St,St. Stephens Ward,Kent,CT2 7PP,01835-703597,01944-369967,atomkiewicz@hotmail.com,http://www.alandrosenburgcpapc.co.uk
1,Evan,Zigomalas,Cap Gemini America,5 Binney St,Abbey Ward,Buckinghamshire,HP11 2AX,01937-864715,01714-737668,evan.zigomalas@gmail.com,http://www.capgeminiamerica.co.uk
2,France,Andrade,"Elliott, John W Esq",8 Moor Place,East Southbourne and Tuckton W,Bournemouth,BH6 3BE,01347-368222,01935-821636,france.andrade@hotmail.com,http://www.elliottjohnwesq.co.uk
3,Ulysses,Mcwalters,"Mcmahan, Ben L",505 Exeter Rd,Hawerby cum Beesby,Lincolnshire,DN36 5RP,01912-771311,01302-601380,ulysses@hotmail.com,http://www.mcmahanbenl.co.uk
4,Tyisha,Veness,Champagne Room,5396 Forth Street,Greets Green and Lyng Ward,West Midlands,B70 9DT,01547-429341,01290-367248,tyisha.veness@hotmail.com,http://www.champagneroom.co.uk


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 11 columns):
first_name      500 non-null object
last_name       500 non-null object
company_name    500 non-null object
address         500 non-null object
city            500 non-null object
county          500 non-null object
postal          500 non-null object
phone1          500 non-null object
phone2          500 non-null object
email           500 non-null object
web             500 non-null object
dtypes: object(11)
memory usage: 43.1+ KB


### 1. Selecting data using “iloc” indexer
*integer location-based indexer*

In [5]:
df.iloc[0]  # first row (integer position 0)

first_name                                   Aleshia
last_name                                 Tomkiewicz
company_name                 Alan D Rosenburg Cpa Pc
address                                 14 Taylor St
city                               St. Stephens Ward
county                                          Kent
postal                                       CT2 7PP
phone1                                  01835-703597
phone2                                  01944-369967
email                        atomkiewicz@hotmail.com
web             http://www.alandrosenburgcpapc.co.uk
Name: 0, dtype: object

In [6]:
df.iloc[2]  # third row (integer position 2)

first_name                                France
last_name                                Andrade
company_name                 Elliott, John W Esq
address                             8 Moor Place
city              East Southbourne and Tuckton W
county                               Bournemouth
postal                                   BH6 3BE
phone1                              01347-368222
phone2                              01935-821636
email                 france.andrade@hotmail.com
web             http://www.elliottjohnwesq.co.uk
Name: 2, dtype: object

In [7]:
df.iloc[:, 0].head() # -- First column of the dataframe

0    Aleshia
1       Evan
2     France
3    Ulysses
4     Tyisha
Name: first_name, dtype: object

In [8]:
df.iloc[:, 0:2].head() # -- First two columns

Unnamed: 0,first_name,last_name
0,Aleshia,Tomkiewicz
1,Evan,Zigomalas
2,France,Andrade
3,Ulysses,Mcwalters
4,Tyisha,Veness


In [9]:
# 1st, 4th, 7th and 25th rows with the 1st, 6th and 7th columns (positions are 0-based)
df.iloc[[0, 3, 6, 24], [0, 5, 6]]

Unnamed: 0,first_name,county,postal
0,Aleshia,Kent,CT2 7PP
3,Ulysses,Lincolnshire,DN36 5RP
6,Marg,Southampton,SO14 3TY
24,Tess,West Sussex,PO19 1RH


In [10]:
# first 5 rows and 5th, 6th, 7th columns of data frame
df.iloc[0:5, 4:7]

Unnamed: 0,city,county,postal
0,St. Stephens Ward,Kent,CT2 7PP
1,Abbey Ward,Buckinghamshire,HP11 2AX
2,East Southbourne and Tuckton W,Bournemouth,BH6 3BE
3,Hawerby cum Beesby,Lincolnshire,DN36 5RP
4,Greets Green and Lyng Ward,West Midlands,B70 9DT
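
The `iloc` rules used above can be checked on a small frame. This is a sketch with made-up data: the frame `toy` and its labels are assumptions chosen so that positions and index labels visibly differ.

```python
import pandas as pd

# Illustrative frame; the row labels are deliberately not 0..n-1.
toy = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4], "c": ["w", "x", "y", "z"]},
    index=[100, 200, 300, 400],
)

row = toy.iloc[0]                  # single integer -> Series (first row, label 100)
block = toy.iloc[0:2, 0:2]         # slices exclude the stop position
picked = toy.iloc[[0, 3], [0, 2]]  # lists pick exact positions
last = toy.iloc[-1]                # negative positions count from the end

print(row.name)              # 100
print(block.shape)           # (2, 2)
print(list(picked.columns))  # ['a', 'c']
```

Because `iloc` only looks at positions, `toy.iloc[0]` returns the row labelled 100, not a row labelled 0.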


### 2. Selecting data using “loc” indexer
*label-based indexer*

In [11]:
# loc selects by label, so first make 'last_name' the index (next cell)

In [12]:
df.set_index("last_name", inplace=True)
df.head()

last_name,first_name,company_name,address,city,county,postal,phone1,phone2,email,web
Tomkiewicz,Aleshia,Alan D Rosenburg Cpa Pc,14 Taylor St,St. Stephens Ward,Kent,CT2 7PP,01835-703597,01944-369967,atomkiewicz@hotmail.com,http://www.alandrosenburgcpapc.co.uk
Zigomalas,Evan,Cap Gemini America,5 Binney St,Abbey Ward,Buckinghamshire,HP11 2AX,01937-864715,01714-737668,evan.zigomalas@gmail.com,http://www.capgeminiamerica.co.uk
Andrade,France,"Elliott, John W Esq",8 Moor Place,East Southbourne and Tuckton W,Bournemouth,BH6 3BE,01347-368222,01935-821636,france.andrade@hotmail.com,http://www.elliottjohnwesq.co.uk
Mcwalters,Ulysses,"Mcmahan, Ben L",505 Exeter Rd,Hawerby cum Beesby,Lincolnshire,DN36 5RP,01912-771311,01302-601380,ulysses@hotmail.com,http://www.mcmahanbenl.co.uk
Veness,Tyisha,Champagne Room,5396 Forth Street,Greets Green and Lyng Ward,West Midlands,B70 9DT,01547-429341,01290-367248,tyisha.veness@hotmail.com,http://www.champagneroom.co.uk
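
The cell above uses `inplace=True`, which modifies `df` directly. As a hedged alternative sketch on made-up data (the names `toy`, `indexed`, and `restored` are assumptions), `set_index` without `inplace` returns a new frame, and `reset_index` moves the index back into a regular column.

```python
import pandas as pd

# Illustrative two-row frame.
toy = pd.DataFrame({"last_name": ["Andrade", "Veness"], "city": ["X", "Y"]})

indexed = toy.set_index("last_name")  # returns a new frame; toy itself is unchanged
restored = indexed.reset_index()      # index goes back to being an ordinary column

print(list(toy.columns))       # ['last_name', 'city']
print(indexed.index.name)      # last_name
print(list(restored.columns))  # ['last_name', 'city']
```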


In [13]:
df.loc['Andrade']

first_name                                France
company_name                 Elliott, John W Esq
address                             8 Moor Place
city              East Southbourne and Tuckton W
county                               Bournemouth
postal                                   BH6 3BE
phone1                              01347-368222
phone2                              01935-821636
email                 france.andrade@hotmail.com
web             http://www.elliottjohnwesq.co.uk
Name: Andrade, dtype: object

In [14]:
# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
df.loc[['Andrade', 'Veness'], 'city':'email']

last_name,city,county,postal,phone1,phone2,email
Andrade,East Southbourne and Tuckton W,Bournemouth,BH6 3BE,01347-368222,01935-821636,france.andrade@hotmail.com
Veness,Greets Green and Lyng Ward,West Midlands,B70 9DT,01547-429341,01290-367248,tyisha.veness@hotmail.com
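
Note that the label slice `'city':'email'` includes the `email` column. Unlike positional slices in `iloc`, label slices in `loc` include both endpoints. A minimal sketch on placeholder data (the frame `toy` and its single-letter values are assumptions):

```python
import pandas as pd

# Illustrative frame with placeholder values.
toy = pd.DataFrame(
    {"city": ["A", "B"], "county": ["C", "D"], "postal": ["E", "F"], "email": ["G", "H"]},
    index=["Andrade", "Veness"],
)

by_label = toy.loc[:, "city":"postal"]  # label slice: 'postal' is included
by_position = toy.iloc[:, 0:2]          # position slice: position 2 is excluded

print(list(by_label.columns))     # ['city', 'county', 'postal']
print(list(by_position.columns))  # ['city', 'county']
```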


In [15]:
# Select same rows, with just 'first_name', 'address' and 'city' columns
df.loc[['Andrade', 'Veness'], ['first_name', 'address', 'city']]

last_name,first_name,address,city
Andrade,France,8 Moor Place,East Southbourne and Tuckton W
Veness,Tyisha,5396 Forth Street,Greets Green and Lyng Ward


In [16]:
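# Add a random integer 'id' column and make it the index.
# Caveats: randint can repeat values, so 'id' is not guaranteed to be unique,
# and set_index replaces the current 'last_name' index, which is dropped.

df['id'] = [random.randint(0, 1000) for _ in range(df.shape[0])]

df.set_index('id', inplace=True)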

In [17]:
df.head()

id,first_name,company_name,address,city,county,postal,phone1,phone2,email,web
814,Aleshia,Alan D Rosenburg Cpa Pc,14 Taylor St,St. Stephens Ward,Kent,CT2 7PP,01835-703597,01944-369967,atomkiewicz@hotmail.com,http://www.alandrosenburgcpapc.co.uk
437,Evan,Cap Gemini America,5 Binney St,Abbey Ward,Buckinghamshire,HP11 2AX,01937-864715,01714-737668,evan.zigomalas@gmail.com,http://www.capgeminiamerica.co.uk
101,France,"Elliott, John W Esq",8 Moor Place,East Southbourne and Tuckton W,Bournemouth,BH6 3BE,01347-368222,01935-821636,france.andrade@hotmail.com,http://www.elliottjohnwesq.co.uk
472,Ulysses,"Mcmahan, Ben L",505 Exeter Rd,Hawerby cum Beesby,Lincolnshire,DN36 5RP,01912-771311,01302-601380,ulysses@hotmail.com,http://www.mcmahanbenl.co.uk
631,Tyisha,Champagne Room,5396 Forth Street,Greets Green and Lyng Ward,West Midlands,B70 9DT,01547-429341,01290-367248,tyisha.veness@hotmail.com,http://www.champagneroom.co.uk


In [18]:
# Select the row whose 'id' label is 281 (the ids are random, so a rerun may contain no 281, or several)
df.loc[281]

first_name                              Margarett
company_name                 Reid, Carleton B Esq
address                               3 August Rd
city                  Maybury and Sheerwater Ward
county                                     Surrey
postal                                   GU21 5QL
phone1                               01670-813697
phone2                               01903-424890
email                         margarett@gmail.com
web             http://www.reidcarletonbesq.co.uk
Name: 281, dtype: object
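
Because the `id` values are drawn at random, a label lookup such as `df.loc[281]` is not guaranteed to match exactly one row. The sketch below uses made-up data (the frame `toy` and its names are assumptions) to show what `loc` does with a duplicated label and with a missing one. It also shows `random.sample` as one possible way to draw unique ids, which is a swapped-in alternative and not what the notebook does.

```python
import random

import pandas as pd

# Illustrative frame whose index deliberately repeats the label 281.
toy = pd.DataFrame({"first_name": ["Ann", "Bob", "Cy"]}, index=[281, 7, 281])
toy.index.name = "id"

print(toy.index.is_unique)  # False

dupes = toy.loc[281]        # duplicated label -> DataFrame holding both rows
print(dupes.shape)          # (2, 1)

try:
    toy.loc[999]            # label not present in the index
    missing = False
except KeyError:
    missing = True
print(missing)              # True

# Alternative (assumption, not the notebook's approach): sampling without
# replacement gives ids that cannot repeat.
random.seed(0)
unique_ids = random.sample(range(1001), len(toy))
```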

## Next: Missing Data