## Indexing DataFrames

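First, we create a DataFrame of random values. Because np.random.randn draws new values on every run, the table printed directly below comes from a different run than the outputs shown in the rest of this section.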

In [76]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 3), 
                  index = ['a' + str(i) for i in range(10)], 
                  columns = ['X', 'Y', 'Z'])
df

|    | X | Y | Z |
|----|---|---|---|
| a0 | 0.392927 | -2.100382 | -1.215565 |
| a1 | -0.107654 | 0.32574 | -0.049682 |
| a2 | -0.594777 | -0.161098 | 0.174579 |
| a3 | -0.019471 | -0.50776 | -0.684097 |
| a4 | 0.021074 | 1.800023 | 0.055341 |
| a5 | 0.07257 | 0.008762 | -0.512995 |
| a6 | -0.644128 | -1.522256 | -0.215864 |
| a7 | 0.019807 | -1.026799 | -0.44736 |
| a8 | 1.344719 | -1.412707 | 0.249155 |
| a9 | 0.614135 | -1.039814 | 0.745657 |


In [78]:
type(df[['X']])

pandas.core.frame.DataFrame

In [85]:
a = np.eye(2,3)
a.resize(3,2)

### Using loc indexer

A specific element can be selected using the loc indexer. This method is called <font color = 'blue'>_labelled indexing_</font>.

In [86]:
x = pd.Series([1,2,"2"])

In [89]:
print(x)
x[0] = "1"

0    1
1    2
2    2
dtype: object


In [90]:
print(x)

0    1
1    2
2    2
dtype: object


In [4]:
df.loc['a3', 'Y']

-1.2087244823092782

Note that we have used labels for indexing the row and the column.

### Using iloc indexer

This alternative method is called <font color = 'blue'>_positional indexing_</font>. In this method, the row and the column are indexed by their integer positions, counted from 0.

In [5]:
df.iloc[3, 1]

-1.2087244823092782
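The two indexers can be compared side by side on a small frame with fixed values. The sketch below is only an illustration: the frame `demo` and its contents are made up for this example and are not the random DataFrame used above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values (not the random df from this section)
demo = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a0', 'a1', 'a2', 'a3'],
                    columns=['X', 'Y', 'Z'])

by_label = demo.loc['a3', 'Y']    # row label 'a3', column label 'Y'
by_position = demo.iloc[3, 1]     # fourth row, second column (0-based)

# get_loc translates a label into its integer position
row_pos = demo.index.get_loc('a3')
col_pos = demo.columns.get_loc('Y')

print(by_label, by_position, row_pos, col_pos)
```

Both lookups address the same cell, because 'a3' sits at position 3 in the row index and 'Y' at position 1 in the column index.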

### Selecting a column 

A column of a DataFrame can be accessed using the label of the required column.

In [16]:
df.columns = ["X", 'Y', 'Z']

In [17]:
df['X']

a0   -0.546055
a1    0.322362
a2   -0.209497
a3    1.113290
a4    1.255462
a5    0.987666
a6   -0.530425
a7    0.425529
a8   -0.356411
a9    0.772900
Name: X, dtype: float64

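Alternatively, a column is also available as an attribute of the DataFrame, provided its label is a valid Python identifier that does not clash with an existing DataFrame attribute.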

In [18]:
df.X

a0   -0.546055
a1    0.322362
a2   -0.209497
a3    1.113290
a4    1.255462
a5    0.987666
a6   -0.530425
a7    0.425529
a8   -0.356411
a9    0.772900
Name: X, dtype: float64

In both cases, the result is a Series object containing the data of the indexed column. The name attribute of the resultant Series is the label of the indexed column.
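Attribute access is convenient, but it does not work for every column label. The following sketch uses a made-up frame `demo` whose column names ('my col' and 'count') were chosen only to show the two common cases where bracket indexing is needed.

```python
import pandas as pd

# Hypothetical frame; the column names are chosen to illustrate edge cases
demo = pd.DataFrame({'X': [1.0, 2.0],
                     'my col': [3.0, 4.0],
                     'count': [5, 6]},
                    index=['a0', 'a1'])

col = demo['X']                     # bracket access returns a Series
same = demo.X.equals(demo['X'])     # attribute access gives the same data
name = col.name                     # the Series is named after the column

# 'my col' is not a valid identifier, so only brackets work for it
spaced = demo['my col']

# 'count' clashes with the DataFrame.count method:
# demo.count is the method, demo['count'] is the column
attr_is_method = callable(demo.count)
count_col = demo['count']

print(name, same, attr_is_method)
```

Bracket indexing works for any label, so it is the safer choice in code that has to handle arbitrary column names.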

### Slicing the rows

A slice of rows can be selected using regular slicing notation as shown below.

In [19]:
df[1:4]

|    | X | Y | Z |
|----|---|---|---|
| a1 | 0.322362 | 0.607224 | 1.253364 |
| a2 | -0.209497 | 2.219572 | -0.503345 |
| a3 | 1.11329 | -1.208724 | 0.010915 |


In [27]:
df['a3']  # Raises KeyError: 'a3' is a row label, not a column label

KeyError: 'a3'

In [28]:
df.iloc[1:4]

|    | X | Y | Z |
|----|---|---|---|
| a1 | 0.322362 | 0.607224 | 1.253364 |
| a2 | -0.209497 | 2.219572 | -0.503345 |
| a3 | 1.11329 | -1.208724 | 0.010915 |


### Slicing using index

Another way of accessing a row (or slice of rows) is using labeled indexing.

In [29]:
df.loc['a1':'a4']

|    | X | Y | Z |
|----|---|---|---|
| a1 | 0.322362 | 0.607224 | 1.253364 |
| a2 | -0.209497 | 2.219572 | -0.503345 |
| a3 | 1.11329 | -1.208724 | 0.010915 |
| a4 | 1.255462 | -0.920016 | -0.053315 |


**Question:** Did you notice the difference between the results of df.iloc[1:4] and df.loc['a1':'a4']?
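The difference comes from how the two indexers treat the end of a slice: a positional slice excludes its stop position, while a label slice includes its stop label. A minimal sketch, using a made-up frame `demo` with fixed values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values (not the random df from this section)
demo = pd.DataFrame(np.arange(15).reshape(5, 3),
                    index=['a0', 'a1', 'a2', 'a3', 'a4'],
                    columns=['X', 'Y', 'Z'])

by_position = demo.iloc[1:4]       # positions 1, 2, 3 (stop is excluded)
by_label = demo.loc['a1':'a4']     # labels a1 through a4 (stop is included)

print(list(by_position.index))
print(list(by_label.index))
```

The label slice returns one more row than the positional slice, even though 'a4' sits at position 4.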


A single row (or slice of rows) can also be accessed using positional indexing.

In [30]:
df.iloc[1]

X    0.322362
Y    0.607224
Z    1.253364
Name: a1, dtype: float64

In [31]:
# dataframe_name.iloc[row, col]
# dataframe_name.loc[row, col]
df.iloc[1, :]

X    0.322362
Y    0.607224
Z    1.253364
Name: a1, dtype: float64

In [32]:
df.loc['a1']

X    0.322362
Y    0.607224
Z    1.253364
Name: a1, dtype: float64

In [54]:
df.loc['a1':"a4",:]

|    | X | Y | Z |
|----|---|---|---|
| a1 | 0.322362 | 0.607224 | 1.253364 |
| a2 | -0.209497 | 2.219572 | -0.503345 |
| a3 | 1.11329 | -1.208724 | 0.010915 |
| a4 | 1.255462 | -0.920016 | -0.053315 |


In [56]:
df.iloc[:,1]

a0    0.611003
a1    0.607224
a2    2.219572
a3   -1.208724
a4   -0.920016
a5    1.679077
a6    0.517255
a7    1.501272
a8   -1.536688
a9   -0.437823
Name: Y, dtype: float64

In [57]:
df.loc[:,'Y']

a0    0.611003
a1    0.607224
a2    2.219572
a3   -1.208724
a4   -0.920016
a5    1.679077
a6    0.517255
a7    1.501272
a8   -1.536688
a9   -0.437823
Name: Y, dtype: float64

In [59]:
df.iloc[:,1:]

|    | Y | Z |
|----|---|---|
| a0 | 0.611003 | 0.388817 |
| a1 | 0.607224 | 1.253364 |
| a2 | 2.219572 | -0.503345 |
| a3 | -1.208724 | 0.010915 |
| a4 | -0.920016 | -0.053315 |
| a5 | 1.679077 | -1.289337 |
| a6 | 0.517255 | 0.55017 |
| a7 | 1.501272 | 2.115656 |
| a8 | -1.536688 | -0.178436 |
| a9 | -0.437823 | 1.362477 |


## Fancy indexing

### Selecting multiple columns

Multiple columns can be selected by passing a list of column labels as the index.

In [60]:
df[['X']]

|    | X |
|----|---|
| a0 | -0.546055 |
| a1 | 0.322362 |
| a2 | -0.209497 |
| a3 | 1.11329 |
| a4 | 1.255462 |
| a5 | 0.987666 |
| a6 | -0.530425 |
| a7 | 0.425529 |
| a8 | -0.356411 |
| a9 | 0.7729 |


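Note how the output differs from that produced by df['X']: indexing with a list returns a DataFrame, even when the list holds a single label, whereas indexing with a bare label returns a Series.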

In [61]:
df[['X', 'Z']]

|    | X | Z |
|----|---|---|
| a0 | -0.546055 | 0.388817 |
| a1 | 0.322362 | 1.253364 |
| a2 | -0.209497 | -0.503345 |
| a3 | 1.11329 | 0.010915 |
| a4 | 1.255462 | -0.053315 |
| a5 | 0.987666 | -1.289337 |
| a6 | -0.530425 | 0.55017 |
| a7 | 0.425529 | 2.115656 |
| a8 | -0.356411 | -0.178436 |
| a9 | 0.7729 | 1.362477 |


In [69]:
df.loc[:,['X', 'Z']]

|    | X | Z |
|----|---|---|
| a0 | -0.546055 | 0.388817 |
| a1 | 0.322362 | 1.253364 |
| a2 | -0.209497 | -0.503345 |
| a3 | 1.11329 | 0.010915 |
| a4 | 1.255462 | -0.053315 |
| a5 | 0.987666 | -1.289337 |
| a6 | -0.530425 | 0.55017 |
| a7 | 0.425529 | 2.115656 |
| a8 | -0.356411 | -0.178436 |
| a9 | 0.7729 | 1.362477 |


In summary:

- df[label] or df[list of labels] selects columns
- df[slice] selects rows
- df[bool Series] selects rows
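These three behaviours of plain bracket indexing can be checked on a small frame. The sketch below uses a made-up frame `demo` with fixed values, so the selected rows and columns are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values (not the random df from this section)
demo = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a0', 'a1', 'a2', 'a3'],
                    columns=['X', 'Y', 'Z'])

cols = demo[['X', 'Z']]            # list of labels -> columns X and Z
rows_slice = demo[1:3]             # slice -> rows at positions 1 and 2
rows_mask = demo[demo['X'] > 3]    # boolean Series -> rows where X > 3

print(list(cols.columns))
print(list(rows_slice.index))
print(list(rows_mask.index))
```

Because the meaning of df[...] depends on the type of the key, loc and iloc are usually clearer when both rows and columns are involved.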

### Selecting multiple rows and columns

In [71]:
df.iloc[[1, 3, 7], [0, 2]]

|    | X | Z |
|----|---|---|
| a1 | 0.322362 | 1.253364 |
| a3 | 1.11329 | 0.010915 |
| a7 | 0.425529 | 2.115656 |


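Note that this type of indexing produces a different result than in NumPy: iloc with two lists selects every combination of the listed rows and columns (a 3 × 2 block here), whereas NumPy pairs the two lists element by element.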
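The contrast with NumPy can be made concrete. In the sketch below, the array `arr` and the frame `frame` are made up for illustration; np.ix_ is used to build the NumPy equivalent of the block selection that iloc performs.

```python
import numpy as np
import pandas as pd

# Hypothetical 8 x 3 array and a frame built from it
arr = np.arange(24).reshape(8, 3)
frame = pd.DataFrame(arr, columns=['X', 'Y', 'Z'])

# pandas: every listed row combined with every listed column
block = frame.iloc[[1, 3, 7], [0, 2]]

# NumPy: lists of equal length are paired element by element,
# so this picks the two elements (1, 0) and (3, 2)
paired = arr[[1, 3], [0, 2]]

# NumPy: lists of different lengths cannot be paired
try:
    arr[[1, 3, 7], [0, 2]]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True

# NumPy equivalent of the pandas block selection
same_block = arr[np.ix_([1, 3, 7], [0, 2])]

print(block.shape, paired, mismatch_raises)
```

Here np.ix_ turns the two lists into an open mesh, which gives NumPy the same row-by-column selection that iloc applies to a pair of lists.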

## Using Boolean index

To select only specific rows satisfying a criterion, we can use a boolean index as follows.

In [72]:
df.X

a0   -0.546055
a1    0.322362
a2   -0.209497
a3    1.113290
a4    1.255462
a5    0.987666
a6   -0.530425
a7    0.425529
a8   -0.356411
a9    0.772900
Name: X, dtype: float64

In [73]:
df.X >0

a0    False
a1     True
a2    False
a3     True
a4     True
a5     True
a6    False
a7     True
a8    False
a9     True
Name: X, dtype: bool

In [74]:
df[df.X > 0]  # When a bool Series is used as an index, the rows corresponding to True values are selected

|    | X | Y | Z |
|----|---|---|---|
| a1 | 0.322362 | 0.607224 | 1.253364 |
| a3 | 1.11329 | -1.208724 | 0.010915 |
| a4 | 1.255462 | -0.920016 | -0.053315 |
| a5 | 0.987666 | 1.679077 | -1.289337 |
| a7 | 0.425529 | 1.501272 | 2.115656 |
| a9 | 0.7729 | -0.437823 | 1.362477 |


In [75]:
cindex = [True, False, True]
df.loc[:, cindex]   # also works with the iloc indexer

|    | X | Z |
|----|---|---|
| a0 | -0.546055 | 0.388817 |
| a1 | 0.322362 | 1.253364 |
| a2 | -0.209497 | -0.503345 |
| a3 | 1.11329 | 0.010915 |
| a4 | 1.255462 | -0.053315 |
| a5 | 0.987666 | -1.289337 |
| a6 | -0.530425 | 0.55017 |
| a7 | 0.425529 | 2.115656 |
| a8 | -0.356411 | -0.178436 |
| a9 | 0.7729 | 1.362477 |
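Row and column masks can also be combined in a single selection. The sketch below uses a made-up frame `demo` with fixed values. It assumes that iloc, being purely positional, should be given a plain boolean array rather than a boolean Series, so the row mask is converted with to_numpy() first.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values (not the random df from this section)
demo = pd.DataFrame(np.arange(12).reshape(4, 3),
                    index=['a0', 'a1', 'a2', 'a3'],
                    columns=['X', 'Y', 'Z'])

row_mask = demo['X'] > 3            # boolean Series, aligned on the row index
col_mask = [True, False, True]      # one flag per column: keep X and Z

# loc accepts the boolean Series for rows and the boolean list for columns
subset = demo.loc[row_mask, col_mask]

# for iloc, pass the row mask as a plain NumPy array
subset_pos = demo.iloc[row_mask.to_numpy(), col_mask]

print(subset)
```

Both selections keep rows a2 and a3 and columns X and Z.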
