<div style="color:#006666; padding:0px 10px; border-radius:5px; font-size:18px;"><h1 style='margin:10px 5px'>Select, Create and Conditional Filtering</h1>
</div>


# 1. Selecting specific data with .loc, .iloc, .at and .iat

In [52]:
import numpy as np
import pandas as pd

In [53]:
df = pd.read_csv(r"../Datasets/churn.csv")
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


Pandas offers four primary ways to select items:

1. __Dot notation__: select a single column by name.
2. __loc__: select by row (index) labels and column labels.
3. __iloc__: select by integer row and column positions.
4. __at / iat__: select one item only, by labels (`at`) or by integer positions (`iat`).

__Dot Notation__

Selects a single column and returns it as a pandas Series.

In [54]:
df.state

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object

In [55]:
type(df.state)

pandas.core.series.Series

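Dot notation can't be used for column names that contain a space, or for names that clash with an existing DataFrame attribute or method (such as `count`). Bracket notation, `df['column name']`, works for any column name.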
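Both limitations can be seen on a small hypothetical DataFrame (the column names are chosen only to trigger them): a name with spaces is not a valid attribute, and `df.count` resolves to the `DataFrame.count` method rather than the column.

```python
import pandas as pd

# Hypothetical frame: one column name with spaces, one that clashes with a method
df = pd.DataFrame({"total day minutes": [265.1, 161.6], "count": [3, 5]})

# df.total day minutes  -> SyntaxError, so bracket notation is the only option
minutes = df["total day minutes"]

# Attribute access finds the DataFrame.count method first, not the column
print(callable(df.count))   # True
counts = df["count"]        # the actual column
print(counts.tolist())      # [3, 5]
```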

### .loc example

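`.loc` takes two arguments inside the square brackets: the row labels (index names) first, then the column labels. Plain bracket notation such as `df['state']` also selects a column by name; `df.loc[:, 'state']` does the same, where `:` means "all rows".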
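In the churn data the row labels happen to equal the row positions, which hides what `.loc` is doing. A minimal sketch on a hypothetical DataFrame with string row labels makes the label-based lookup explicit:

```python
import pandas as pd

# Hypothetical frame whose row labels ("a", "b", "c") differ from positions
df = pd.DataFrame(
    {"state": ["KS", "OH", "NJ"], "account length": [128, 107, 137]},
    index=["a", "b", "c"],
)

# .loc[row_labels, column_labels]
value = df.loc["b", "state"]                               # a single cell
subset = df.loc[["a", "c"], ["state", "account length"]]   # lists of labels
print(value)               # OH
print(subset.shape)        # (2, 2)
```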

In [56]:
df['state']

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object

In [57]:
df.loc[:,'state']

0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: state, Length: 3333, dtype: object

__So what is the difference between dot notation and using `[]`?__

Dot notation is a convenience that exposes an existing column as an attribute. It cannot create a new column: assigning to a new name such as `df.state2` silently sets a plain Python attribute on the DataFrame object, and no column appears.
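A small sketch on a hypothetical two-row frame shows both sides of this: dot assignment to a new name leaves the columns unchanged, bracket assignment creates the column, and dot assignment to a name that is already a column does update that column.

```python
import pandas as pd

df = pd.DataFrame({"state": ["KS", "OH"]})

df.state2 = "a"              # new name: stored as a plain attribute, no column
print(list(df.columns))      # ['state']

df["state2"] = "a"           # bracket notation creates a real column
print(list(df.columns))      # ['state', 'state2']

df.state = "TX"              # existing column name: the column itself is updated
print(df["state"].tolist())  # ['TX', 'TX']
```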

In [58]:
df.state2 = 'a'
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [59]:
df.state2

'a'

Bracket-style indexing, for example `df['state2'] = 'a'` or `df.loc[:, 'state2'] = 'a'` as below, does create a new column.

In [60]:
# create a new column
df.loc[:, 'state2'] = 'a'
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn,state2
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,16.78,244.7,91,11.01,10.0,3,2.7,1,False,a
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,16.62,254.4,103,11.45,13.7,3,3.7,1,False,a
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,10.3,162.6,104,7.32,12.2,5,3.29,0,False,a
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,5.26,196.9,89,8.86,6.6,7,1.78,2,False,a
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,12.61,186.9,121,8.41,10.1,3,2.73,3,False,a


To select more than one column at a time, pass the column names as a list.

In [61]:
# Wrong: the column names must be wrapped in a list; passing them as separate arguments raises an error
# df.loc[:, 'account length', 'area code', 'phone number', 'international plan'].head()

In [62]:
df.loc[:, ['account length', 'area code', 'phone number', 'international plan']].head()

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes


If the columns are contiguous, you can use a `:` slice of column names. Note that `.loc` label slices include both endpoints.

In [63]:
df.loc[:,'account length':'international plan'].head()

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes
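
The inclusive endpoint applies to row labels as well as column labels. A minimal sketch on a hypothetical 4 x 4 frame:

```python
import pandas as pd

# Hypothetical frame with string row labels and single-letter column names
df = pd.DataFrame(
    {"w": [1, 2, 3, 4], "x": [5, 6, 7, 8], "y": [9, 10, 11, 12], "z": [0, 0, 0, 0]},
    index=["r0", "r1", "r2", "r3"],
)

# .loc label slices keep BOTH the start and the end label
sub = df.loc["r1":"r2", "x":"y"]
print(list(sub.index), list(sub.columns))  # ['r1', 'r2'] ['x', 'y']
```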


### .iloc example

`.iloc` selects by integer position for both rows and columns (positions start at 0).

In [64]:
df.iloc[[0,1,2,3,4], [1,2,3,4]]

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes


In [65]:
# Same selection with slices; unlike .loc, the end position is excluded
df.iloc[0:5, 1:5]

Unnamed: 0,account length,area code,phone number,international plan
0,128,415,382-4657,no
1,107,415,371-7191,no
2,137,415,358-1921,no
3,84,408,375-9999,yes
4,75,415,330-6626,yes
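
Two details of positional slicing are worth checking on a toy example (hypothetical values): the end position of a slice is excluded, and negative positions count from the end, as with Python lists.

```python
import pandas as pd

df = pd.DataFrame({"w": [1, 2, 3, 4], "x": [5, 6, 7, 8], "y": [9, 10, 11, 12]})

first_two = df.iloc[0:2, 0:2]   # rows 0-1 and columns 0-1; position 2 is excluded
last_row = df.iloc[-1, :]       # -1 is the last row

print(first_two.values.tolist())  # [[1, 5], [2, 6]]
print(last_row.tolist())          # [4, 8, 12]
```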


### at and iat Example

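`at` and `iat` provide access to a scalar, that is, a single element of the DataFrame: `at` takes a row label and a column label, while `iat` takes integer row and column positions.

__Advantage:__ For single-element access they are faster than `.loc` and `.iloc`, largely because those indexers also have to handle slices, lists and boolean masks.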

In [66]:
# access single element with iat
df.iat[1, 1]

np.int64(107)

In [67]:
# access single element with at
df.at[1, 'account length']

np.int64(107)
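
When only a column label is at hand but `iat` is wanted, `Index.get_loc` converts the label to its position. Both accessors can also assign a single value in place. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"state": ["KS", "OH"], "account length": [128, 107]})

col_pos = df.columns.get_loc("account length")  # label -> integer position (1)
by_label = df.at[1, "account length"]           # 107
by_position = df.iat[1, col_pos]                # 107, same cell

# Single-cell assignment works with either accessor
df.at[0, "account length"] = 130
df.iat[1, col_pos] = 110
print(df["account length"].tolist())  # [130, 110]
```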

# 2. Gain speed using .at and .iat

The main advantage of `.at` and `.iat` is speed. As a rough rule of thumb, from fastest to slowest:

Vectorized operations on whole columns > `.at` / `.iat` > `.loc` / `.iloc`
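The ranking can be checked on your own machine with `timeit`. The sketch below uses a hypothetical 1,000-row frame; absolute numbers depend on hardware and pandas version, so treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 1,000 rows, two numeric columns
df = pd.DataFrame({"a": np.arange(1_000), "b": np.arange(1_000) * 2.0})

# Scalar lookups of the same cell through each accessor
n = 10_000
timings = {
    ".at": timeit.timeit(lambda: df.at[500, "b"], number=n),
    ".iat": timeit.timeit(lambda: df.iat[500, 1], number=n),
    ".loc": timeit.timeit(lambda: df.loc[500, "b"], number=n),
    ".iloc": timeit.timeit(lambda: df.iloc[500, 1], number=n),
}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:6s} {seconds / n * 1e6:.2f} us per lookup")

# Vectorized sum of a column vs. a Python loop of 1,000 .at lookups
vectorized = timeit.timeit(lambda: df["b"].sum(), number=10)
looped = timeit.timeit(lambda: sum(df.at[i, "b"] for i in range(len(df))), number=10)
print(f"vectorized {vectorized:.5f} s, loop with .at {looped:.5f} s")
```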

# __Filtering on One or More Conditions__

In [68]:
import numpy as np
import pandas as pd

In [69]:
df = pd.read_csv(r"../Datasets/churn.csv")
df

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.90,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,AZ,192,415,414-4276,no,yes,36,156.2,77,26.55,...,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3329,WV,68,415,370-3271,no,no,0,231.1,57,39.29,...,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False
3330,RI,28,510,328-8230,no,no,0,180.8,109,30.74,...,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False
3331,CT,184,510,364-6381,yes,no,0,213.8,105,36.35,...,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False


__Row Filter Mask__

In [70]:
row_filter_mask = df['account length'] > 100
row_filter_mask
# row_filter_mask is a boolean pandas Series, aligned with df's index

0        True
1        True
2        True
3       False
4       False
        ...  
3328     True
3329    False
3330    False
3331     True
3332    False
Name: account length, Length: 3333, dtype: bool
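
Because `True` counts as 1, summing or averaging a boolean mask is a quick way to see how many rows (or what share of rows) a condition matches before filtering. A sketch using the first five `account length` values shown above:

```python
import pandas as pd

df = pd.DataFrame({"account length": [128, 107, 137, 84, 75]})
row_filter_mask = df["account length"] > 100

print(row_filter_mask.tolist())  # [True, True, True, False, False]
print(row_filter_mask.sum())     # 3, the number of matching rows
print(row_filter_mask.mean())    # 0.6, the share of matching rows
```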

__Column Filter Mask__

In [71]:
column_filter_mask = df.columns.str.startswith('t')
column_filter_mask
# column_filter_mask is a boolean numpy.ndarray with one entry per column

array([False, False, False, False, False, False, False,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False, False])

In [72]:
df.loc[row_filter_mask, :]

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False
5,AL,118,510,391-8027,yes,no,0,223.4,98,37.98,...,101,18.75,203.9,118,9.18,6.3,6,1.70,0,False
6,MA,121,510,355-9993,no,yes,24,218.2,88,37.09,...,108,29.62,212.6,118,9.57,7.5,7,2.03,3,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3320,GA,122,510,411-5677,yes,no,0,140.0,101,23.80,...,77,16.69,120.1,133,5.40,9.7,4,2.62,4,True
3323,IN,117,415,362-5899,no,no,0,118.4,126,20.13,...,97,21.19,227.0,56,10.22,13.6,3,3.67,5,True
3324,WV,159,415,377-1164,no,no,0,169.8,114,28.87,...,105,16.80,193.7,82,8.72,11.6,4,3.13,1,False
3328,AZ,192,415,414-4276,no,yes,36,156.2,77,26.55,...,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False


In [73]:
df.loc[:,column_filter_mask]

Unnamed: 0,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge
0,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70
1,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70
2,243.4,114,41.38,121.2,110,10.30,162.6,104,7.32,12.2,5,3.29
3,299.4,71,50.90,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78
4,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73
...,...,...,...,...,...,...,...,...,...,...,...,...
3328,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67
3329,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59
3330,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81
3331,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35


In [74]:
df.loc[row_filter_mask, column_filter_mask]

Unnamed: 0,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge
0,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70
1,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70
2,243.4,114,41.38,121.2,110,10.30,162.6,104,7.32,12.2,5,3.29
5,223.4,98,37.98,220.6,101,18.75,203.9,118,9.18,6.3,6,1.70
6,218.2,88,37.09,348.5,108,29.62,212.6,118,9.57,7.5,7,2.03
...,...,...,...,...,...,...,...,...,...,...,...,...
3320,140.0,101,23.80,196.4,77,16.69,120.1,133,5.40,9.7,4,2.62
3323,118.4,126,20.13,249.3,97,21.19,227.0,56,10.22,13.6,3,3.67
3324,169.8,114,28.87,197.7,105,16.80,193.7,82,8.72,11.6,4,3.13
3328,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67
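
The same pattern in miniature, on a hypothetical four-row frame: the row mask is a boolean Series aligned on the index, the column mask is a boolean array with exactly one entry per column, and `.loc` applies both at once.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH"],
    "account length": [128, 107, 137, 84],
    "total day minutes": [265.1, 161.6, 243.4, 299.4],
    "total day calls": [110, 123, 114, 71],
})

row_mask = df["account length"] > 100         # boolean Series, one value per row
col_mask = df.columns.str.startswith("t")     # boolean ndarray, one value per column

result = df.loc[row_mask, col_mask]
print(list(result.index))    # [0, 1, 2]
print(list(result.columns))  # ['total day minutes', 'total day calls']
```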


# __Multiple Conditions__

In [75]:
filter1 = df['account length'] > 100
filter2 = df['total night calls'] < 90


In [76]:
# | (OR): keep the rows that satisfy at least one of the conditions
df.loc[(filter1 | filter2),:]

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.90,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
5,AL,118,510,391-8027,yes,no,0,223.4,98,37.98,...,101,18.75,203.9,118,9.18,6.3,6,1.70,0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3323,IN,117,415,362-5899,no,no,0,118.4,126,20.13,...,97,21.19,227.0,56,10.22,13.6,3,3.67,5,True
3324,WV,159,415,377-1164,no,no,0,169.8,114,28.87,...,105,16.80,193.7,82,8.72,11.6,4,3.13,1,False
3328,AZ,192,415,414-4276,no,yes,36,156.2,77,26.55,...,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3331,CT,184,510,364-6381,yes,no,0,213.8,105,36.35,...,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False


In [77]:
# & (AND): keep the rows that satisfy both conditions
df.loc[(filter1 & filter2),:]

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
22,AZ,130,415,358-1958,no,no,0,183.0,112,31.11,...,99,6.20,181.8,78,8.18,9.5,19,2.57,0,False
32,LA,172,408,383-1121,no,no,0,212.0,121,36.04,...,115,2.65,293.3,78,13.20,12.6,10,3.40,3,False
41,MD,135,408,383-6029,yes,yes,41,173.1,85,29.43,...,107,17.33,122.2,78,5.50,14.6,15,3.94,0,True
57,CO,121,408,370-7574,no,yes,30,198.4,129,33.73,...,77,6.40,181.2,77,8.15,5.8,3,1.57,3,True
60,ID,174,408,359-5893,no,no,0,192.1,97,32.66,...,94,14.44,166.6,54,7.50,11.4,4,3.08,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3306,AL,106,408,404-5283,no,yes,29,83.6,131,14.21,...,131,17.33,229.5,73,10.33,8.1,3,2.19,1,False
3307,OK,172,408,398-3632,no,no,0,203.9,109,34.66,...,123,19.89,160.7,65,7.23,17.8,4,4.81,4,False
3323,IN,117,415,362-5899,no,no,0,118.4,126,20.13,...,97,21.19,227.0,56,10.22,13.6,3,3.67,5,True
3324,WV,159,415,377-1164,no,no,0,169.8,114,28.87,...,105,16.80,193.7,82,8.72,11.6,4,3.13,1,False
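
When the conditions are written inline rather than stored in variables, two rules matter: each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`, and Python's `and` / `or` cannot be used on masks. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"account length": [128, 84, 137], "total night calls": [91, 89, 78]})

# Element-wise AND with each comparison wrapped in parentheses
both = df.loc[(df["account length"] > 100) & (df["total night calls"] < 90)]
print(list(both.index))  # [2]

# `and` tries to turn each whole Series into one bool, which pandas refuses
raised = False
try:
    df.loc[(df["account length"] > 100) and (df["total night calls"] < 90)]
except ValueError as err:
    raised = True
    print("ValueError:", err)  # The truth value of a Series is ambiguous...
```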


__Applying multiple row filters and a column filter together__

In [78]:
df.loc[(filter1 & filter2),column_filter_mask]

Unnamed: 0,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge
22,183.0,112,31.11,72.9,99,6.20,181.8,78,8.18,9.5,19,2.57
32,212.0,121,36.04,31.2,115,2.65,293.3,78,13.20,12.6,10,3.40
41,173.1,85,29.43,203.9,107,17.33,122.2,78,5.50,14.6,15,3.94
57,198.4,129,33.73,75.3,77,6.40,181.2,77,8.15,5.8,3,1.57
60,192.1,97,32.66,169.9,94,14.44,166.6,54,7.50,11.4,4,3.08
...,...,...,...,...,...,...,...,...,...,...,...,...
3306,83.6,131,14.21,203.9,131,17.33,229.5,73,10.33,8.1,3,2.19
3307,203.9,109,34.66,234.0,123,19.89,160.7,65,7.23,17.8,4,4.81
3323,118.4,126,20.13,249.3,97,21.19,227.0,56,10.22,13.6,3,3.67
3324,169.8,114,28.87,197.7,105,16.80,193.7,82,8.72,11.6,4,3.13


In [79]:
filter1 = df['account length'] > 100
filter2 = df['total night calls'] < 90
row_filter = df.loc[(filter1 & filter2),:]
row_filter

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
22,AZ,130,415,358-1958,no,no,0,183.0,112,31.11,...,99,6.20,181.8,78,8.18,9.5,19,2.57,0,False
32,LA,172,408,383-1121,no,no,0,212.0,121,36.04,...,115,2.65,293.3,78,13.20,12.6,10,3.40,3,False
41,MD,135,408,383-6029,yes,yes,41,173.1,85,29.43,...,107,17.33,122.2,78,5.50,14.6,15,3.94,0,True
57,CO,121,408,370-7574,no,yes,30,198.4,129,33.73,...,77,6.40,181.2,77,8.15,5.8,3,1.57,3,True
60,ID,174,408,359-5893,no,no,0,192.1,97,32.66,...,94,14.44,166.6,54,7.50,11.4,4,3.08,1,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3306,AL,106,408,404-5283,no,yes,29,83.6,131,14.21,...,131,17.33,229.5,73,10.33,8.1,3,2.19,1,False
3307,OK,172,408,398-3632,no,no,0,203.9,109,34.66,...,123,19.89,160.7,65,7.23,17.8,4,4.81,4,False
3323,IN,117,415,362-5899,no,no,0,118.4,126,20.13,...,97,21.19,227.0,56,10.22,13.6,3,3.67,5,True
3324,WV,159,415,377-1164,no,no,0,169.8,114,28.87,...,105,16.80,193.7,82,8.72,11.6,4,3.13,1,False


In [80]:
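column_filter_mask1 = df.columns.str.startswith('t')
column_filter_mask2 = df.columns.str.startswith('total')
# Column masks combine with | and & just like row masks. Every name starting with
# 'total' also starts with 't', so here the result equals column_filter_mask1 alone.
column_filter = df.loc[:,(column_filter_mask1 | column_filter_mask2)]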
column_filter

Unnamed: 0,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge
0,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70
1,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70
2,243.4,114,41.38,121.2,110,10.30,162.6,104,7.32,12.2,5,3.29
3,299.4,71,50.90,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78
4,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73
...,...,...,...,...,...,...,...,...,...,...,...,...
3328,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67
3329,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59
3330,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81
3331,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35


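The `.where()` method can also be used for filtering. Unlike `.loc`, it keeps the DataFrame's full shape and replaces values with NaN wherever the condition is False. Because NaN is a float, integer columns are upcast to float (note `192.0` instead of `192` in the output).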

In [81]:
# NaN = Not a Number (marks missing values)

In [82]:
df.where(filter1 & filter2)

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,AZ,192.0,415.0,414-4276,no,yes,36.0,156.2,77.0,26.55,...,126.0,18.32,279.1,83.0,12.56,9.9,6.0,2.67,2.0,False
3329,,,,,,,,,,,...,,,,,,,,,,
3330,,,,,,,,,,,...,,,,,,,,,,
3331,,,,,,,,,,,...,,,,,,,,,,
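
A small sketch of `.where()` on a hypothetical three-row frame: the result keeps the original shape, and the `other=` argument supplies a replacement value directly, instead of producing NaN first.

```python
import pandas as pd

df = pd.DataFrame({"account length": [128, 84, 137], "total night calls": [91, 89, 78]})
cond = df["account length"] > 100   # a row-wise Series condition, applied to every column

masked = df.where(cond)
print(masked.shape == df.shape)            # True: no rows are removed
print(masked["account length"].tolist())   # [128.0, nan, 137.0]

zeroed = df.where(cond, other=0)           # failing rows get 0 instead of NaN
print(zeroed["account length"].tolist())   # [128, 0, 137]
```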


In [83]:
df.where(filter1 & filter2).fillna(0) #Fill the null values with 0

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
1,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
2,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
3,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
4,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,AZ,192.0,415.0,414-4276,no,yes,36.0,156.2,77.0,26.55,...,126.0,18.32,279.1,83.0,12.56,9.9,6.0,2.67,2.0,False
3329,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
3330,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0
3331,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.00,...,0.0,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0


In [84]:
df.where(filter1 & filter2).dropna() # drop every row that contains a NaN

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
22,AZ,130.0,415.0,358-1958,no,no,0.0,183.0,112.0,31.11,...,99.0,6.20,181.8,78.0,8.18,9.5,19.0,2.57,0.0,False
32,LA,172.0,408.0,383-1121,no,no,0.0,212.0,121.0,36.04,...,115.0,2.65,293.3,78.0,13.20,12.6,10.0,3.40,3.0,False
41,MD,135.0,408.0,383-6029,yes,yes,41.0,173.1,85.0,29.43,...,107.0,17.33,122.2,78.0,5.50,14.6,15.0,3.94,0.0,True
57,CO,121.0,408.0,370-7574,no,yes,30.0,198.4,129.0,33.73,...,77.0,6.40,181.2,77.0,8.15,5.8,3.0,1.57,3.0,True
60,ID,174.0,408.0,359-5893,no,no,0.0,192.1,97.0,32.66,...,94.0,14.44,166.6,54.0,7.50,11.4,4.0,3.08,1.0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3306,AL,106.0,408.0,404-5283,no,yes,29.0,83.6,131.0,14.21,...,131.0,17.33,229.5,73.0,10.33,8.1,3.0,2.19,1.0,False
3307,OK,172.0,408.0,398-3632,no,no,0.0,203.9,109.0,34.66,...,123.0,19.89,160.7,65.0,7.23,17.8,4.0,4.81,4.0,False
3323,IN,117.0,415.0,362-5899,no,no,0.0,118.4,126.0,20.13,...,97.0,21.19,227.0,56.0,10.22,13.6,3.0,3.67,5.0,True
3324,WV,159.0,415.0,377-1164,no,no,0.0,169.8,114.0,28.87,...,105.0,16.80,193.7,82.0,8.72,11.6,4.0,3.13,1.0,False
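
`.where(...).dropna()` is not always identical to `.loc[mask]`: `dropna()` also removes rows that satisfied the condition but already contained a missing value, and the surviving columns stay upcast to float. A sketch on hypothetical data with one pre-existing NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"account length": [128, 137, 84], "total intl charge": [2.70, np.nan, 1.78]})
mask = df["account length"] > 100   # True for rows 0 and 1

via_loc = df.loc[mask]                # rows 0 and 1, original dtypes kept
via_where = df.where(mask).dropna()   # row 1 is lost too: it already had a NaN

print(list(via_loc.index))                # [0, 1]
print(list(via_where.index))              # [0]
print(via_where["account length"].dtype)  # float64
```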


# __Membership Filtering__

`isin()`, `between()`, `~`, `any()`, `all()`

__isin( )__: True if the value is in the given list-like object, else False.

In [85]:
filterr = df.state.isin(['AZ', 'LA'])
df.loc[filterr, :].head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
8,LA,117,408,335-4719,no,no,0,184.5,97,31.37,...,80,29.89,215.8,90,9.71,8.7,4,2.35,1,False
22,AZ,130,415,358-1958,no,no,0,183.0,112,31.11,...,99,6.2,181.8,78,8.18,9.5,19,2.57,0,False
32,LA,172,408,383-1121,no,no,0,212.0,121,36.04,...,115,2.65,293.3,78,13.2,12.6,10,3.4,3,False
33,AZ,12,408,360-1596,no,no,0,249.6,118,42.43,...,119,21.45,280.2,90,12.61,11.8,3,3.19,1,True
91,LA,155,415,334-1275,no,no,0,203.4,100,34.58,...,104,16.23,196.0,119,8.82,8.9,4,2.4,0,True


__~__: inverts a boolean mask. Combined with `isin()`, it is True if the value is *not* in the given list.

In [86]:
filterr = ~df['state'].isin(['AZ', 'LA'])
df.loc[filterr, :].head(9)

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False
5,AL,118,510,391-8027,yes,no,0,223.4,98,37.98,...,101,18.75,203.9,118,9.18,6.3,6,1.7,0,False
6,MA,121,510,355-9993,no,yes,24,218.2,88,37.09,...,108,29.62,212.6,118,9.57,7.5,7,2.03,3,False
7,MO,147,415,329-9001,yes,no,0,157.0,79,26.69,...,94,8.76,211.8,96,9.53,7.1,6,1.92,0,False
9,WV,141,415,330-8173,yes,yes,37,258.6,84,43.96,...,111,18.87,326.4,97,14.69,11.2,5,3.02,0,False
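
Both masks side by side on a hypothetical Series of state codes: `~` flips every element of the `isin()` mask, so the two selections partition the data.

```python
import pandas as pd

states = pd.Series(["KS", "AZ", "LA", "OH", "AZ"])

in_mask = states.isin(["AZ", "LA"])   # element-wise membership test
not_in_mask = ~in_mask                # element-wise NOT

print(states[in_mask].tolist())       # ['AZ', 'LA', 'AZ']
print(states[not_in_mask].tolist())   # ['KS', 'OH']
```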


__between( ):__ True if the value lies within the given range, else False. Both bounds are inclusive by default.

In [87]:
filterr = df['total day minutes'].between(100, 120)
df.loc[filterr, :].head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
23,SC,111,415,350-2565,no,no,0,110.4,103,18.77,...,102,11.67,189.6,105,8.53,7.7,6,2.08,2,False
29,HI,49,510,410-7789,no,no,0,119.3,117,20.28,...,109,18.28,178.7,90,8.04,11.1,1,3.0,1,False
107,NM,93,510,383-4361,no,yes,21,117.9,131,20.04,...,115,13.98,217.0,86,9.76,9.8,3,2.65,1,False
118,MO,112,510,409-1244,no,yes,36,113.7,117,19.33,...,82,13.39,177.6,118,7.99,10.0,3,2.7,2,False
151,NE,117,415,354-3436,no,no,0,102.8,119,17.48,...,91,17.57,299.0,105,13.46,10.1,7,2.73,1,False


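`between()` is a concise shorthand for two comparisons combined with `&`. Because both of its bounds are inclusive by default, the explicit equivalent uses `>=` and `<=`:

In [89]:
lower = df['total day minutes'] >= 100
upper = df['total day minutes'] <= 120
df.loc[lower & upper, :].head()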

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
23,SC,111,415,350-2565,no,no,0,110.4,103,18.77,...,102,11.67,189.6,105,8.53,7.7,6,2.08,2,False
29,HI,49,510,410-7789,no,no,0,119.3,117,20.28,...,109,18.28,178.7,90,8.04,11.1,1,3.0,1,False
107,NM,93,510,383-4361,no,yes,21,117.9,131,20.04,...,115,13.98,217.0,86,9.76,9.8,3,2.65,1,False
118,MO,112,510,409-1244,no,yes,36,113.7,117,19.33,...,82,13.39,177.6,118,7.99,10.0,3,2.7,2,False
151,NE,117,415,354-3436,no,no,0,102.8,119,17.48,...,91,17.57,299.0,105,13.46,10.1,7,2.73,1,False
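
The bound behaviour can be changed with the `inclusive` argument. The sketch below assumes pandas 1.3 or newer, where `inclusive` accepts the strings `"both"`, `"neither"`, `"left"` and `"right"`; the values are hypothetical and chosen to sit exactly on the bounds.

```python
import pandas as pd

minutes = pd.Series([100.0, 110.4, 120.0, 125.0])

inclusive = minutes.between(100, 120)                      # default: both bounds kept
exclusive = minutes.between(100, 120, inclusive="neither") # both bounds dropped

print(inclusive.tolist())  # [True, True, True, False]
print(exclusive.tolist())  # [False, True, False, False]

# Explicit equivalents of the two forms
print(inclusive.equals((minutes >= 100) & (minutes <= 120)))  # True
print(exclusive.equals((minutes > 100) & (minutes < 120)))    # True
```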


`any()` and `all()` are boolean reductions: they collapse each column (or each row) to a single `True`/`False`. `any()` returns `True` if at least one value is truthy; `all()` returns `True` only if every value is truthy.

__any()__: True if any of the items satisfies the condition.

1. `True` and non-zero values evaluate to `True`.
2. `False` and zero evaluate to `False`.

__Example:__ 

A lab test is conducted on 12 individuals, and two samples are taken from each person. Depending on the rule used, a person's result is positive if:
1. any of their samples is positive (`any()`), or
2. all of their samples are positive (`all()`).

In [90]:
# 2 samples (rows) x 12 people (columns) of random 0/1 results; your values will differ
df = pd.DataFrame(np.random.randint(0, 2, (2,12)), 
                  columns=["id"+str(i) for i in range(12)],
                  index=['sample1', 'sample2'])
df

Unnamed: 0,id0,id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,id11
sample1,0,1,0,1,0,1,0,1,0,1,0,0
sample2,1,0,0,0,0,1,1,1,0,0,0,1


In [91]:
df.any()

id0      True
id1      True
id2     False
id3      True
id4     False
id5      True
id6      True
id7      True
id8     False
id9      True
id10    False
id11     True
dtype: bool

In [92]:
df.any(axis='columns')

sample1    True
sample2    True
dtype: bool

By default (`axis='rows'`, i.e. `axis=0`), the reduction runs down each column and returns one result per column.

Set `axis='columns'` (`axis=1`) to return one result per row.

Similar logic applies for `all()`.

__all()__: True only if all of the items satisfy the condition.

In [93]:
df.all()

id0     False
id1     False
id2     False
id3     False
id4     False
id5      True
id6     False
id7      True
id8     False
id9     False
id10    False
id11    False
dtype: bool

In [94]:
df.all(axis='columns') # both False: each sample contains at least one 0

sample1    False
sample2    False
dtype: bool
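
Since `df.any()` and `df.all()` return a boolean Series indexed by column name, they can be fed straight back into `.loc` as a column filter, which answers the lab-test question directly. A sketch with hypothetical fixed values in the same layout (rows are samples, columns are people):

```python
import pandas as pd

# Hypothetical results: 1 = positive sample
df = pd.DataFrame(
    [[0, 1, 1], [0, 0, 1]],
    columns=["id0", "id1", "id2"],
    index=["sample1", "sample2"],
)

any_positive = df.loc[:, df.any()]   # people with at least one positive sample
all_positive = df.loc[:, df.all()]   # people whose samples are all positive

print(any_positive.columns.tolist())  # ['id1', 'id2']
print(all_positive.columns.tolist())  # ['id2']
```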