## Indexing Pandas DataFrames
Pandas supports three types of indexing.  
**(a)	Direct Indexing**  
Direct indexing, i.e. indexing with `[]`, selects columns by label. Allowable index values are:  
* A single label
* A list of labels

Slicing with `[]` is the exception: a slice selects rows, not columns. This is a convenience, since slicing rows is a very common operation.
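
The three cases above can be sketched as follows. The frame `df` and its labels are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with string row labels (assumed example data)
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["r0", "r1", "r2", "r3"],
    columns=["x", "y", "z"],
)

col = df["x"]          # single label: one column, returned as a Series
cols = df[["x", "z"]]  # list of labels: two columns, returned as a DataFrame
rows = df[1:3]         # slice: rows at positions 1 and 2, not columns
```

Note that the slice is the only form here that acts on rows; a single label or a list of labels always refers to columns.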

**(b)	Using loc indexer**  
The `.loc` indexer primarily supports label-based indexing, but it also accepts a boolean array. Allowable index values are:  
* A single label
* A list (or array) of labels 
* A slice using labels
* A Boolean array
* An expression that evaluates to any of the above
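
A minimal sketch of each `.loc` form listed above, using a small made-up frame (the names `df`, `r0` to `r3` and `x`, `y`, `z` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["r0", "r1", "r2", "r3"],
    columns=["x", "y", "z"],
)

one_row = df.loc["r1"]                  # single label: one row as a Series
some_rows = df.loc[["r0", "r2"]]        # list of labels
label_slice = df.loc["r1":"r3"]         # label slice: both endpoints included
masked = df.loc[df["x"] > 3]            # boolean array built from an expression
block = df.loc["r1":"r2", ["x", "z"]]   # rows and columns selected together
```

Unlike ordinary Python slicing, a label slice with `.loc` includes its end label.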

**(c)	Using iloc indexer**  
The `.iloc` indexer primarily supports position-based indexing, but it also accepts a boolean array. Allowable index values are:  
* An integer
* A list (or array) of integers 
* A slice using integers
* A Boolean array
* An expression that evaluates to any of the above
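
The `.iloc` forms can be sketched the same way on a made-up frame (all names here are assumptions for illustration). One detail worth hedging: the boolean mask is converted to a plain NumPy array first, because `.iloc` does not accept a boolean Series that carries its own index.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=["r0", "r1", "r2", "r3"],
    columns=["x", "y", "z"],
)

one_row = df.iloc[1]              # integer: row at position 1 as a Series
some_rows = df.iloc[[0, 2]]       # list of integers
pos_slice = df.iloc[1:3]          # integer slice: half-open, positions 1 and 2
mask = (df["x"] > 3).to_numpy()   # plain boolean array, no index attached
masked = df.iloc[mask]            # boolean array
block = df.iloc[1:3, [0, 2]]      # rows and columns selected by position
```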


## Hierarchical Indexing
A hierarchical index is a multilevel index (a `MultiIndex` in pandas). It can be used for the rows, for the columns, or for both. Hierarchical indexes provide a convenient way to store higher-dimensional data in a one-dimensional Series or a two-dimensional DataFrame.

The following code creates a Series with a two-level index.


In [1]:
id = [['a']*3 + ['b']*3,[1, 2, 3]*2]
id

[['a', 'a', 'a', 'b', 'b', 'b'], [1, 2, 3, 1, 2, 3]]

In [2]:
from pandas import Series, DataFrame
import numpy as np
x = Series(np.random.rand(6), index = id)
x

a  1    0.619333
   2    0.875343
   3    0.217505
b  1    0.680226
   2    0.478881
   3    0.640077
dtype: float64
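
As an alternative sketch, the same two-level index can be built explicitly with `pd.MultiIndex.from_product`. The level names `outer` and `inner` and the deterministic values are assumptions added for illustration; the notebook itself passes a list of lists and uses random data.

```python
import numpy as np
import pandas as pd

# Cartesian product of the two levels: (a,1), (a,2), (a,3), (b,1), ...
mi = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]],
    names=["outer", "inner"],  # hypothetical level names
)

# Fixed values instead of random ones, so the result is reproducible
s = pd.Series(np.arange(6.0), index=mi)
```

Naming the levels is optional, but it lets methods that take a `level` argument refer to a level by name instead of by position.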

The following code creates a DataFrame with a hierarchical index on the rows.

In [8]:
A = DataFrame(np.random.rand(6, 2), index = id, columns = ['x', 'y'])
A

            x         y
a 1  0.350943  0.359993
  2  0.400509  0.047719
  3  0.008612  0.339622
b 1  0.061692  0.554118
  2  0.147604  0.021106
  3  0.341680  0.427537


The following code creates a DataFrame with a hierarchical index on the columns.


In [6]:
B = DataFrame(np.random.rand(2, 6), index = ['x', 'y'], columns = id)
B

          a                            b                    
          1        2         3         1         2         3
x  0.033303  0.70482  0.366361  0.738885  0.258468  0.103365
y  0.599884  0.07677  0.231075  0.897133  0.330559  0.810639


### Indexing with hierarchical index
#### Basic indexing

**(a) Indexing Series**

A single index selects from the outer level.

In [5]:
x

a  1    0.727109
   2    0.578448
   3    0.636683
b  1    0.280511
   2    0.789969
   3    0.555514
dtype: float64

In [6]:
x['a']

1    0.727109
2    0.578448
3    0.636683
dtype: float64

The inner level can be specified as a second index.

In [7]:
x['a', 2]

0.5784476821645574

Alternatively, we can use a tuple.

In [8]:
x[('a', 2)]

0.5784476821645574

To select on the inner level across all outer labels, slice the outer level with `:`.

In [9]:
x

a  1    0.619333
   2    0.875343
   3    0.217505
b  1    0.680226
   2    0.478881
   3    0.640077
dtype: float64

In [10]:
x[:,2]

a    0.875343
b    0.478881
dtype: float64

**(b) Indexing a DataFrame**

Direct indexing selects a column, or the outermost level of a hierarchical column index.

In [11]:
A

            x         y
a 1  0.350943  0.359993
  2  0.400509  0.047719
  3  0.008612  0.339622
b 1  0.061692  0.554118
  2  0.147604  0.021106
  3  0.341680  0.427537


In [12]:
A['x']

a  1    0.350943
   2    0.400509
   3    0.008612
b  1    0.061692
   2    0.147604
   3    0.341680
Name: x, dtype: float64

In [69]:
print(B)
B.xs(1, level = 1,axis = 1)

          a                            b                    
          1        2         3         1         2         3
x  0.033303  0.70482  0.366361  0.738885  0.258468  0.103365
y  0.599884  0.07677  0.231075  0.897133  0.330559  0.810639


          a         b
x  0.033303  0.738885
y  0.599884  0.897133


Similarly, direct indexing on `B` selects on the outermost column level.

In [14]:
B['a']

          1        2         3
x  0.033303  0.70482  0.366361
y  0.599884  0.07677  0.231075


A second index selects on the inner level of the columns.

In [21]:
B['a', 1]

x    0.033303
y    0.599884
Name: (a, 1), dtype: float64

However, unlike with a Series, the following code does not work.

In [23]:
B[:, 1] # This does not work

TypeError: unhashable type: 'slice'

#### Indexing using cross section
The `xs` method of Series and DataFrame takes a `level` argument, which makes it easier to select data at a particular level of a MultiIndex.

In [26]:
x

a  1    0.619333
   2    0.875343
   3    0.217505
b  1    0.680226
   2    0.478881
   3    0.640077
dtype: float64

In [51]:
x.xs(1, level = 1)

a    0.619333
b    0.680226
dtype: float64

The above code gives a result analogous to the following Series indexing, shown here for the inner label 2.

In [28]:
x[:,2]

a    0.875343
b    0.478881
dtype: float64

We have seen that the above syntax does not work for DataFrames. For them, the `xs` method is useful.

In [36]:
print(A)
A.xs('a', level = 0)

            x         y
a 1  0.350943  0.359993
  2  0.400509  0.047719
  3  0.008612  0.339622
b 1  0.061692  0.554118
  2  0.147604  0.021106
  3  0.341680  0.427537


          x         y
1  0.350943  0.359993
2  0.400509  0.047719
3  0.008612  0.339622
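
As a hedged sketch on deterministic data (the frame below is made up and only mirrors the shape of `A`), selecting on the outer row level with `xs` gives the same rows as `.loc` with the outer label, while `level=1` reaches the inner level directly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for A: two-level row index, fixed values
rows_mi = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]])
A = pd.DataFrame(np.arange(12).reshape(6, 2), index=rows_mi, columns=["x", "y"])

via_xs = A.xs("a", level=0)   # outer label 'a'; the outer level is dropped
via_loc = A.loc["a"]          # same rows through label-based indexing
inner = A.xs(2, level=1)      # inner label 2 under every outer label
```

For the outer level the two spellings are interchangeable; `xs` becomes the more convenient one when the level of interest is not the outermost.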


For a DataFrame with a hierarchical index on the columns, we need to supply the `axis` argument.

In [20]:
print(B)  # printed for ready reference
B.xs(2, level = 1, axis = 1)

          a                             b                    
          1         2         3         1         2         3
x  0.242076  0.112891  0.552287  0.898636  0.655094  0.943601
y  0.104244  0.386338  0.001799  0.873708  0.087655  0.264081


          a         b
x  0.112891  0.655094
y  0.386338  0.087655
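
By default `xs` drops the level it selects on, as the output above shows. The sketch below, on made-up deterministic data, contrasts that with `drop_level=False`, which keeps the selected level in the result:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for B: two-level column index, fixed values
cols_mi = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]])
B = pd.DataFrame(np.arange(12).reshape(2, 6), index=["x", "y"], columns=cols_mi)

dropped = B.xs(2, level=1, axis=1)                  # columns: a, b
kept = B.xs(2, level=1, axis=1, drop_level=False)   # columns: (a, 2), (b, 2)
```

Keeping the level can be useful when the result is later combined with other selections from the same frame.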


#### Using Slicer
Earlier we saw that a tuple can be used as an index to select the value at a given inner-level label. However, the tuple syntax does not allow `:` slices inside it.

To slice within a tuple, we can use a **slicer**, i.e. a `slice` object, as shown below.

In [21]:
print(x)    # printed for ready reference
x.loc[(slice(None), 2)]

a  1    0.727109
   2    0.578448
   3    0.636683
b  1    0.280511
   2    0.789969
   3    0.555514
dtype: float64


a    0.578448
b    0.789969
dtype: float64

In [22]:
x.loc[('a', slice(1, 2))]  # label slice 1 to 2 includes both endpoints, like a half-open 1:3

a  1    0.727109
   2    0.578448
dtype: float64

**Note that the slicer performs inclusive slicing: both endpoints are included.**

In [23]:
B.loc[:,(slice(None), 1)]

          a         b
          1         1
x  0.242076  0.898636
y  0.104244  0.873708
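
The same selections can be written more compactly with `pd.IndexSlice`, which builds the tuple of `slice` objects from ordinary `:` syntax. This is an alternative spelling rather than what the notebook uses, and the data below is made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice  # helper that turns idx[:, 2] into (slice(None), 2)

# Hypothetical stand-ins for x and B, with fixed values
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]])
s = pd.Series(np.arange(6.0), index=mi)
B = pd.DataFrame(np.arange(12).reshape(2, 6), index=["x", "y"], columns=mi)

inner_two = s.loc[idx[:, 2]]        # same as s.loc[(slice(None), 2)]
a_one_two = s.loc[idx["a", 1:2]]    # same as s.loc[('a', slice(1, 2))], inclusive
first_cols = B.loc[:, idx[:, 1]]    # same as B.loc[:, (slice(None), 1)]
```

Slicing a MultiIndex by label generally expects the index to be sorted, which holds here because the levels are listed in sorted order.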

