<div style="color:#006666; padding:0px 10px; border-radius:5px; font-size:18px;"><h1 style='margin:10px 5px'>Select, Create and Conditional Filtering</h1>
</div>


# 1. Selecting specific data with .loc, .iloc, .at and .iat

In [1]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv(r"../Datasets/churn.csv")
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


Pandas offers four primary ways to select items:

1. __Dot notation__ : select a single column as an attribute.
2. __loc__   : select by row (index) labels and column names.
3. __iloc__  : select by integer row and column positions.
4. __at__ / __iat__ : select a single item, by label (`at`) or by integer position (`iat`).

__Dot Notation__

Select a single column as an attribute of the DataFrame. The result is a Series.

In [5]:
df.state

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object

In [6]:
type(df.state)

pandas.core.series.Series

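Dot notation can't be used for column names that contain a space character, such as `account length`, or for names that clash with an existing DataFrame attribute or method. Use bracket notation for those.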
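As a minimal sketch of that limitation, the snippet below builds a tiny stand-in frame (`demo` and `clash` are hypothetical names, not part of the churn dataset) and reads a column whose name contains a space through brackets. It also shows a column named `count`, where the attribute lookup returns the DataFrame method rather than the column.

```python
import pandas as pd

# Hypothetical miniature of the churn data, built inline so the example is self-contained.
demo = pd.DataFrame({
    "state": ["KS", "OH", "NJ"],
    "account length": [128, 107, 137],
})

# Attribute access works when the column name is a valid Python identifier.
states = demo.state

# "account length" contains a space, so `demo.account length` is a syntax error.
# Bracket notation accepts any column name.
lengths = demo["account length"]

# A column named like a DataFrame method is shadowed by that method.
clash = pd.DataFrame({"count": [1, 2, 3]})
method_not_column = clash.count      # the bound method DataFrame.count
column = clash["count"]              # the actual column
```

Bracket notation is therefore the safer default when column names come from an external file.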

### .loc example

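.loc takes up to two arguments inside the square brackets: one for the row labels (index names) and another for the column names. Plain bracket notation, `df['state']`, and `df.loc[:, 'state']` return the same column.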

In [8]:
df['state']

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object

In [9]:
df.loc[:,'state']

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object
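The cells above select a whole column. As a hedged sketch of the two-argument form, the snippet below uses a small stand-in frame with string row labels (`demo` and the labels `a`, `b`, `c` are assumptions for illustration) to show the row selector and the column selector working together.

```python
import pandas as pd

demo = pd.DataFrame(
    {"state": ["KS", "OH", "NJ"], "area code": [415, 415, 408]},
    index=["a", "b", "c"],  # hypothetical row labels
)

# Row label first, column label second: a single cell.
one_cell = demo.loc["b", "state"]

# A list of row labels and a list of column labels: a smaller DataFrame.
block = demo.loc[["a", "c"], ["state", "area code"]]

# One row label with ':' for all columns: that row as a Series.
row = demo.loc["c", :]
```

With the default integer index of the churn data, the row labels happen to be the numbers 0 to 3332, which is why `df.loc[1, 'state']` works there.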

__So what is the difference between dot notation and using `[]`?__

Dot notation is a convenience that exposes a column as an attribute. But if you try to create a new column with dot notation, it won't work: pandas silently sets a plain Python attribute on the DataFrame object, and no column appears.

In [11]:
df.state2 = 'a'
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [12]:
df.state2

'a'
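The same behaviour can be checked programmatically. The sketch below, on a hypothetical three-row frame named `demo`, confirms that the new name never reaches `columns`, while assigning through dot notation to a column that already exists does update it.

```python
import pandas as pd

demo = pd.DataFrame({"state": ["KS", "OH", "NJ"]})

# Assigning to a new name sets a Python attribute, not a column.
demo.state2 = "a"
is_column = "state2" in demo.columns

# Assigning to an existing column name through dot notation updates that column.
demo.state = "XX"
updated = demo["state"].tolist()
```

Because the two cases behave differently, it is less error-prone to assign with brackets or `.loc` in both.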

But you can create a new column with bracket-style indexing, for example through `.loc`.

In [13]:
# create a new column
df.loc[:, 'state2'] = 'a'
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn,state2
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,16.78,244.7,91,11.01,10.0,3,2.7,1,False,a
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,16.62,254.4,103,11.45,13.7,3,3.7,1,False,a
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,10.3,162.6,104,7.32,12.2,5,3.29,0,False,a
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,5.26,196.9,89,8.86,6.6,7,1.78,2,False,a
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,12.61,186.9,121,8.41,10.1,3,2.73,3,False,a


Alright, if you want to select more than one column at a time, put them all in a list.

In [None]:
# Wrong: the column names must be wrapped in a list
# df.loc[:, 'account length', 'area code', 'phone number', 'international plan'].head()

In [17]:
df.loc[:, ['account length', 'area code', 'phone number', 'international plan']].head()

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes
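To see why the commented-out form fails, here is a small sketch on a hypothetical stand-in frame (`demo`). Without the list, each comma-separated name is read as an extra indexer, and a two-dimensional frame accepts only two. The exception is caught broadly because its exact class may differ between pandas versions.

```python
import pandas as pd

demo = pd.DataFrame({
    "account length": [128, 107],
    "area code": [415, 415],
    "phone number": ["382-4657", "371-7191"],
})

try:
    # Three indexers (':' plus two names) on a frame with only two axes.
    demo.loc[:, "account length", "area code"]
    error_name = None
except Exception as exc:  # broad on purpose: the exact class may vary by version
    error_name = type(exc).__name__
    print(error_name, "-", exc)

# Wrapping the names in a list makes them a single column indexer.
subset = demo.loc[:, ["account length", "area code"]]
```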


If the columns are contiguous, you can slice them with the ':' notation. Unlike ordinary Python slices, a `.loc` label slice includes both endpoints.

In [18]:
df.loc[:,'account length':'international plan'].head()

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes


### .iloc example

`.iloc` selects by the integer positions of rows and columns, counting from 0.

In [19]:
df.iloc[[0,1,2,3,4], [1,2,3,4]]

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes


In [20]:
# Same selection with slices; the stop position is excluded
df.iloc[0:5, 1:5]

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes
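One detail worth checking when switching between the two accessors is how slices treat their end point. The sketch below, on a hypothetical four-column frame (`demo`), compares a label slice with `.loc` against a positional slice with `.iloc`.

```python
import pandas as pd

demo = pd.DataFrame({
    "state": ["KS", "OH"],
    "account length": [128, 107],
    "area code": [415, 415],
    "phone number": ["382-4657", "371-7191"],
})

# Label slice: both 'account length' and 'phone number' are included.
by_label = demo.loc[:, "account length":"phone number"]

# Positional slice: starts at position 1 and stops before position 3.
by_position = demo.iloc[:, 1:3]

print(list(by_label.columns))     # three columns
print(list(by_position.columns))  # two columns
```

This is why `df.iloc[0:5, 1:5]` above needs a stop of 5 to reach the column at position 4.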


### at and iat Example

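`at` and `iat` provide access to a scalar, that is, a single element of the DataFrame: `at` looks it up by label and `iat` by integer position.

__Advantage:__ For single-element access they are faster than .loc and .iloc, which carry extra overhead to support slices, lists and boolean masks.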

In [21]:
# access single element with iat
df.iat[1, 1]

np.int64(107)

In [22]:
# access single element with at
df.at[1, 'account length']

np.int64(107)

# 2. Gain speed using .at and .iat


The main advantage of using .at and .iat is speed. 
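A rough way to check this on your own machine is to time the four accessors on the same cell. The sketch below uses `timeit` on a synthetic frame (the frame, its column names and the repetition count are assumptions, not the churn data). Absolute numbers and even the ranking can vary with the pandas version and hardware, so no expected output is given.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "minutes": rng.random(1_000),
    "calls": rng.integers(0, 200, 1_000),
})

# Four ways to read the same cell (row label/position 500, first column).
accessors = {
    ".at": lambda: demo.at[500, "minutes"],
    ".iat": lambda: demo.iat[500, 0],
    ".loc": lambda: demo.loc[500, "minutes"],
    ".iloc": lambda: demo.iloc[500, 0],
}

repeats = 2_000
timings = {name: timeit.timeit(fn, number=repeats) for name, fn in accessors.items()}

# Report the average cost of one lookup, fastest first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>5}: {seconds / repeats * 1e6:.2f} microseconds per lookup")
```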

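A rough ordering, from fastest to slowest (exact timings depend on the pandas version and the data):

Vectorization > .at > .iat > .loc > .iloc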
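Vectorization heads that ordering because a single column-wide operation replaces many per-element lookups. As a minimal sketch, the snippet below computes a charge from minutes both ways on five sample values. The rate of 0.17 is an assumption chosen because it reproduces the `total day charge` values in the sample rows shown earlier; it is not documented in the dataset.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"total day minutes": [265.1, 161.6, 243.4, 299.4, 166.7]})
rate = 0.17  # assumed per-minute rate, for illustration only

# Element-wise: one .at read and one .at write per row.
looped = demo.copy()
looped["charge"] = 0.0
for i in looped.index:
    looped.at[i, "charge"] = looped.at[i, "total day minutes"] * rate

# Vectorized: one multiplication over the whole column.
vectorized = demo.copy()
vectorized["charge"] = vectorized["total day minutes"] * rate

same_result = np.allclose(looped["charge"], vectorized["charge"])
```

Both versions give the same numbers; the vectorized one avoids the Python-level loop, which is where most of the time goes on large frames.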