### Accessing rows in a DataFrame with the indexer objects .loc and .iloc, and how they differ from using a boolean mask



# Accessing a DataFrame with a boolean index

In [1]:
import pandas as pd
import numpy as np


In [2]:
df = pd.DataFrame({"color": ['red', 'blue', 'red', 'blue']},
 index=[True, False, True, False])

df

       color
True     red
False   blue
True     red
False   blue


With the .loc indexer we can select the rows whose index label is True.

In [3]:
df.loc[True]

      color
True    red
True    red


With the .iloc indexer we select rows by their integer position (counting from 0), regardless of the index labels.

In [11]:
df.iloc[1]

color    blue
dtype: object

In [7]:
df.iloc[2]

color    red
Name: True, dtype: object

In [8]:
df.iloc[3]

color    blue
dtype: object

In [9]:
df.iloc[0]

color    red
Name: True, dtype: object
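
The difference between label-based and position-based access is easier to see when the labels are not booleans. The following is a minimal sketch; the frame `demo` and its integer labels are made up for illustration and are not part of the notebook above.

```python
import pandas as pd

# Hypothetical frame: the index labels deliberately differ from the positions.
demo = pd.DataFrame({"color": ["red", "blue", "red", "blue"]},
                    index=[10, 20, 30, 40])

by_label = demo.loc[20]      # the row whose index label is 20
by_position = demo.iloc[1]   # the second row, counting from 0

# Both lookups land on the same row, reached in two different ways.
print(by_label["color"], by_position["color"])
```

Here .loc[20] and .iloc[1] return the same row, but only because label 20 happens to sit at position 1.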

In [21]:
df = pd.DataFrame([['red', 'rose', 'big'],
                   ['blue', 'violet', 'big'],
                   ['red', 'tulip', 'small'],
                   ['blue', 'harebell', 'small']],
                  columns=['color', 'name', 'size'])

The [] accessor (the "__getitem__" method) also accepts a list of True and False values. If the list has the same length as
the DataFrame, it returns the rows at the positions where the list is True:

In [22]:
df[[True,False,True,False]]

   color   name   size
0    red   rose    big
2    red  tulip  small
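
The "same length" requirement matters. As a small sketch of what happens otherwise, the snippet below rebuilds the same frame and passes a list that is too short; the variable names `kept` and `length_error` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["red", "rose", "big"], ["blue", "violet", "big"],
     ["red", "tulip", "small"], ["blue", "harebell", "small"]],
    columns=["color", "name", "size"],
)

# One flag per row: rows 0 and 2 are kept.
kept = df[[True, False, True, False]]
print(kept.index.tolist())

# A boolean list with the wrong number of flags is rejected.
try:
    df[[True, False]]
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)
```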


After selecting a single column from a DataFrame, we can use a simple == comparison to compare
every element in the column to a given value. The result is a pd.Series of True and False values:

In [23]:
df['size'] == 'small'


0    False
1    False
2     True
3     True
Name: size, dtype: bool

A boolean pd.Series is array-like, just like a boolean np.array or a plain list of True and False values. Thus we can
hand it to the __getitem__ or [] accessor, as in the example above. Unlike a plain list, a Series mask is matched to the rows by its index.

In [26]:
size_small_mask = df['size'] == 'small'

df[size_small_mask]

   color      name   size
2    red     tulip  small
3   blue  harebell  small
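
The same mask can also be handed to .loc, which additionally allows a column to be selected in the same step. This is a minimal sketch on the same data; the names `same_rows` and `names_only` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["red", "rose", "big"], ["blue", "violet", "big"],
     ["red", "tulip", "small"], ["blue", "harebell", "small"]],
    columns=["color", "name", "size"],
)

size_small_mask = df["size"] == "small"

# .loc with a boolean mask selects the same rows as the [] accessor.
same_rows = df.loc[size_small_mask]

# .loc can combine the row mask with a column label.
names_only = df.loc[size_small_mask, "name"]
print(names_only.tolist())
```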


In [48]:
df_new = pd.DataFrame({'name':['rose','violet','tulip','harebell'],
                   'color':['red','blue','red','blue'],
                   'size':['big','small','small','small']})


In [49]:
df_new.set_index('name',drop=True,inplace=True)
df_new

          color   size
name
rose        red    big
violet     blue  small
tulip       red  small
harebell   blue  small


We can create a mask based on the index values, just like on a column value.

In [50]:
rose_mask = df_new.index == 'rose'

df_new[rose_mask]

      color  size
name
rose    red   big


In [51]:
df_new.loc['rose']

color    red
size     big
Name: rose, dtype: object

### Note: the important difference is that when .loc finds only one row in the index that matches the label, it returns a pd.Series; when it finds more than one matching row, it returns a pd.DataFrame. A boolean mask always returns a pd.DataFrame. The return type of .loc with a single label therefore depends on the data, which makes it fragile to rely on.
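
To make the note concrete, here is a small sketch with a duplicated index label. The frame `flowers` and its "pink" rose are invented for this illustration.

```python
import pandas as pd

# Hypothetical frame in which the label "rose" appears twice.
flowers = pd.DataFrame(
    {"color": ["red", "blue", "pink"], "size": ["big", "small", "small"]},
    index=["rose", "violet", "rose"],
)

one_match = flowers.loc["violet"]   # a single matching row
two_matches = flowers.loc["rose"]   # two matching rows

# A boolean mask on the index, for comparison.
mask_one = flowers[flowers.index == "violet"]

print(type(one_match).__name__)    # Series
print(type(two_matches).__name__)  # DataFrame
print(type(mask_one).__name__)     # DataFrame
```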


This behavior can be controlled by passing .loc a list with a single entry, which forces it to return
a DataFrame.

In [54]:
type(df_new.loc['rose'])

pandas.core.series.Series

In [53]:
df_new.loc[['rose']]

      color  size
name
rose    red   big


In [55]:
type(df_new.loc[['rose']])

pandas.core.frame.DataFrame
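
One more difference shows up when the label is not in the index at all. The sketch below assumes a recent pandas version (1.0 or later) and uses a made-up label, 'daisy': the mask quietly returns an empty DataFrame, while .loc raises a KeyError.

```python
import pandas as pd

df_new = pd.DataFrame({"name": ["rose", "violet", "tulip", "harebell"],
                       "color": ["red", "blue", "red", "blue"],
                       "size": ["big", "small", "small", "small"]})
df_new = df_new.set_index("name")

# "daisy" is a hypothetical label that is not in the index.
empty = df_new[df_new.index == "daisy"]   # no match: empty DataFrame
print(len(empty))

try:
    df_new.loc[["daisy"]]                 # no match: KeyError
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)
```

Which behavior is preferable depends on the use case: the mask is convenient when "no rows" is a valid result, while .loc surfaces a missing label immediately.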