## Hierarchical Indexing
A hierarchical index (a `MultiIndex` in pandas) is an index with more than one level. Such an index can label the rows, the columns, or both. Hierarchical indexes provide a convenient way to store arbitrarily high-dimensional data in a one-dimensional Series or a two-dimensional DataFrame.
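
As a rough illustration of the point about high-dimensional data, the sketch below stores a three-dimensional NumPy array in a Series with a three-level index. The array contents and the level names (`program`, `gender`, `year`) are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D array: 2 programs x 2 genders x 3 years (values are made up)
counts = np.arange(12).reshape(2, 2, 3)

# One index level per array dimension
idx = pd.MultiIndex.from_product(
    [['UG', 'PG'], ['Male', 'Female'], [2021, 2022, 2023]],
    names=['program', 'gender', 'year'],
)

# ravel() flattens in C order, which matches the order produced by from_product
s = pd.Series(counts.ravel(), index=idx)

print(s.index.nlevels)                    # number of index levels
print(s.loc[('PG', 'Female', 2022)])      # same cell as counts[1, 1, 1]
```

Each array axis becomes one index level, so a cell is addressed by one label per level.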

The following code prepares two parallel lists, one per index level, from which a two-level index will be built.


In [1]:
import pandas as pd
import numpy as np
arrays = [['a']*3 + ['b']*3,[1, 2, 3]*2]
arrays

[['a', 'a', 'a', 'b', 'b', 'b'], [1, 2, 3, 1, 2, 3]]

#### Creating `MultiIndex` from arrays
Next, we create a `MultiIndex` object using the `from_arrays` method.

In [2]:
mindex = pd.MultiIndex.from_arrays(arrays)
mindex

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 2),
            ('b', 3)],
           )
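
To see what such an object holds, the following sketch inspects a few attributes of the same index. It only reads attributes; the comments describe what each one is expected to contain.

```python
import pandas as pd

arrays = [['a'] * 3 + ['b'] * 3, [1, 2, 3] * 2]
mindex = pd.MultiIndex.from_arrays(arrays)

print(mindex.nlevels)                 # number of levels
print(mindex.levels)                  # unique labels of each level
print(mindex.codes)                   # per-row positions into those unique labels
print(mindex.get_level_values(0))     # outer-level label of every row
```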

Now we construct a Series using the `MultiIndex` object as its index.

In [3]:
x = pd.Series(np.random.rand(6), index = mindex)
x

a  1    0.960336
   2    0.519694
   3    0.669085
b  1    0.602780
   2    0.656330
   3    0.785381
dtype: float64

A `MultiIndex` is also created automatically when a list of arrays is supplied as the index.

In [4]:
pd.Series(np.random.randn(6), index = arrays)

a  1    0.941434
   2    0.688647
   3    0.843778
b  1   -0.196930
   2   -0.303005
   3   -1.151707
dtype: float64

#### Creating `MultiIndex` from tuples

In [5]:
tuples = [('UG', 'Male'), ('UG', 'Female'), ('PG', 'Male'), ('PG', 'Female')]
faculties = ['Science', 'Commerce', 'Arts']
mindex = pd.MultiIndex.from_tuples(tuples)
StudentCount = pd.DataFrame([[1300, 1600, 180, 225], 
                             [3400, 3850, 310, 340], 
                             [745, 912, 320, 390]],
                             index = faculties,
                             columns = mindex)
StudentCount

            UG            PG
          Male  Female  Male  Female
Science   1300    1600   180     225
Commerce  3400    3850   310     340
Arts       745     912   320     390


If the tuples are passed directly as column labels, each tuple is used as a single label and no hierarchical index is created.

In [6]:
tuples = [('UG', 'Male'), ('UG', 'Female'), ('PG', 'Male'), ('PG', 'Female')]
faculties = ['Science', 'Commerce', 'Arts']
StudentCount = pd.DataFrame([[1300, 1600, 180, 225], 
                             [3400, 3850, 310, 340], 
                             [745, 912, 320, 390]],
                             index = faculties,
                             columns = tuples)
StudentCount

          (UG, Male)  (UG, Female)  (PG, Male)  (PG, Female)
Science         1300          1600         180           225
Commerce        3400          3850         310           340
Arts             745           912         320           390
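
If a frame already carries tuple labels like this, one possible way to promote them to a hierarchical index is to rebuild the columns with `from_tuples`. This is a sketch rather than the only route; the names `df` and `flat_levels` are introduced here for illustration.

```python
import pandas as pd

tuples = [('UG', 'Male'), ('UG', 'Female'), ('PG', 'Male'), ('PG', 'Female')]
df = pd.DataFrame([[1300, 1600, 180, 225],
                   [3400, 3850, 310, 340],
                   [745, 912, 320, 390]],
                  index=['Science', 'Commerce', 'Arts'],
                  columns=tuples)

flat_levels = df.columns.nlevels    # flat index whose labels happen to be tuples
print(flat_levels)

# Promote the tuple labels to a two-level MultiIndex
df.columns = pd.MultiIndex.from_tuples(df.columns)
print(df.columns.nlevels)
print(df['UG'])                     # selecting on the outer level now works
```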


#### Creating `MultiIndex` from `DataFrame`

In [7]:
idf = pd.DataFrame(
    [('UG', 'Male'), ('UG', 'Female'), ('PG', 'Male'), ('PG', 'Female')]
)
faculties = ['Science', 'Commerce', 'Arts']
mindex = pd.MultiIndex.from_frame(idf, names = ['Program Type', 'Gender'])
SCount = pd.DataFrame([[1300, 1600, 180, 225],
                       [3400, 3850, 310, 340],
                       [745, 912, 320, 390]],
                      index = faculties,
                      columns = mindex)
SCount.index.name = 'Faculty'
SCount

Program Type    UG            PG
Gender        Male  Female  Male  Female
Faculty
Science       1300    1600   180     225
Commerce      3400    3850   310     340
Arts           745     912   320     390
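
The level names given to `from_frame` can be read back from the index and used to look up the labels of a single level. A small sketch, reusing the same index definition:

```python
import pandas as pd

idf = pd.DataFrame([('UG', 'Male'), ('UG', 'Female'),
                    ('PG', 'Male'), ('PG', 'Female')])
mindex = pd.MultiIndex.from_frame(idf, names=['Program Type', 'Gender'])

print(mindex.names)                         # names of the two levels
print(mindex.get_level_values('Gender'))    # labels of one level, looked up by name
```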


#### Creating `MultiIndex` from Cartesian product

In [8]:
arrays = [['UG', 'PG'], ['Male','Female']]
faculties = ['Science', 'Commerce', 'Arts']
mindex = pd.MultiIndex.from_product(arrays)
SCnt = pd.DataFrame([[1300, 1600, 180, 225],
                     [3400, 3850, 310, 340],
                     [745, 912, 320, 390]],
                    index = faculties,
                    columns = mindex)
SCnt

            UG            PG
          Male  Female  Male  Female
Science   1300    1600   180     225
Commerce  3400    3850   310     340
Arts       745     912   320     390
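
The constructors seen so far are different routes to the same kind of object. The sketch below builds the same four labels with `from_arrays`, `from_tuples`, and `from_product` and compares the results; the variable names are chosen for this example only.

```python
import pandas as pd

mi_arrays = pd.MultiIndex.from_arrays([['UG', 'UG', 'PG', 'PG'],
                                       ['Male', 'Female', 'Male', 'Female']])
mi_tuples = pd.MultiIndex.from_tuples([('UG', 'Male'), ('UG', 'Female'),
                                       ('PG', 'Male'), ('PG', 'Female')])
mi_product = pd.MultiIndex.from_product([['UG', 'PG'], ['Male', 'Female']])

# equals() checks whether two indexes contain the same labels in the same order
print(mi_arrays.equals(mi_tuples))
print(mi_tuples.equals(mi_product))
```

`from_product` is the most compact form when every combination of labels is present; the other two are needed when only some combinations occur.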


The following code creates a DataFrame with a hierarchical index on the rows.

In [9]:
A = pd.DataFrame(np.random.rand(6, 2), 
              index = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]]),
              columns = ['x', 'y'])
A

            x         y
a 1  0.696317  0.631555
  2  0.892435  0.039625
  3  0.139296  0.344128
b 1  0.855028  0.348601
  2  0.802479  0.368763
  3  0.535442  0.966694


The following code creates a DataFrame with a hierarchical index on the columns by transposing `A`.

In [10]:
B = A.T
B

          a                             b
          1         2         3         1         2         3
x  0.696317  0.892435  0.139296  0.855028  0.802479  0.535442
y  0.631555  0.039625  0.344128  0.348601  0.368763  0.966694


### Indexing with hierarchical index
#### Basic indexing

**(a) Indexing Series**

A single key selects on the outer level.

In [11]:
x

a  1    0.960336
   2    0.519694
   3    0.669085
b  1    0.602780
   2    0.656330
   3    0.785381
dtype: float64

In [12]:
x['a']

1    0.960336
2    0.519694
3    0.669085
dtype: float64

The inner level can be specified as a second key.

In [13]:
x['a', 2]

0.5196935678248816

Alternatively, we can pass the keys as a tuple.

In [14]:
x[('a', 2)]

0.5196935678248816

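Chained indexing gives the same value: `x['a']` selects on the outer level, and `[2]` then selects from the resulting Series.

In [23]: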
x['a'][2]

0.5196935678248816

All entries with a given inner-level label can be selected as shown below.

In [15]:
x[:,2]

a    0.519694
b    0.656330
dtype: float64

**(b) Indexing a DataFrame**

Direct indexing selects a column, or the outermost column level when the columns are hierarchical.

In [16]:
A

            x         y
a 1  0.696317  0.631555
  2  0.892435  0.039625
  3  0.139296  0.344128
b 1  0.855028  0.348601
  2  0.802479  0.368763
  3  0.535442  0.966694


In [17]:
A['x']

a  1    0.696317
   2    0.892435
   3    0.139296
b  1    0.855028
   2    0.802479
   3    0.535442
Name: x, dtype: float64

Rows are selected with `loc`. A single label selects on the outer row level, and a second label (or a tuple of labels) selects on the inner level as well.

In [18]:
A.loc['a']

          x         y
1  0.696317  0.631555
2  0.892435  0.039625
3  0.139296  0.344128


In [19]:
A.loc['a', 1]

x    0.696317
y    0.631555
Name: (a, 1), dtype: float64

In [24]:
A.loc[('a',1)]

x    0.696317
y    0.631555
Name: (a, 1), dtype: float64

In [25]:
B

          a                             b
          1         2         3         1         2         3
x  0.696317  0.892435  0.139296  0.855028  0.802479  0.535442
y  0.631555  0.039625  0.344128  0.348601  0.368763  0.966694


Indexing a DataFrame with MultiIndex columns is similar.

In [20]:
B['a']

          1         2         3
x  0.696317  0.892435  0.139296
y  0.631555  0.039625  0.344128


A second key selects on the inner column level.

In [21]:
B['a', 1]

x    0.696317
y    0.631555
Name: (a, 1), dtype: float64

In [26]:
B[('a', 1)]

x    0.696317
y    0.631555
Name: (a, 1), dtype: float64

However, unlike with a Series, direct indexing on a DataFrame cannot combine a slice with a label, so the following code does not work.

In [22]:
B[:, 1] # This does not work

TypeError: unhashable type: 'slice'

#### Indexing using `xs` method
The `xs` method of Series and DataFrame takes a `level` argument, which makes it easier to select data at a particular level of a `MultiIndex`.

In [27]:
x.xs(2, level = 1)

a    0.519694
b    0.656330
dtype: float64

For a Series, this gives the same result as the following code.

In [28]:
x[:,2]

a    0.519694
b    0.656330
dtype: float64

We have seen that this syntax does not work for DataFrames. The `xs` method, however, works there as well.

In [29]:
print(A)
A.xs('a', level = 0)

            x         y
a 1  0.696317  0.631555
  2  0.892435  0.039625
  3  0.139296  0.344128
b 1  0.855028  0.348601
  2  0.802479  0.368763
  3  0.535442  0.966694


          x         y
1  0.696317  0.631555
2  0.892435  0.039625
3  0.139296  0.344128


For a DataFrame with a hierarchical column index, we also need to supply the `axis` argument.

In [30]:
print(B)  # printed for ready reference
B.xs(2, level = 1, axis = 1)

          a                             b                    
          1         2         3         1         2         3
x  0.696317  0.892435  0.139296  0.855028  0.802479  0.535442
y  0.631555  0.039625  0.344128  0.348601  0.368763  0.966694


          a         b
x  0.892435  0.802479
y  0.039625  0.368763
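
In the outputs above, the level that was selected on no longer appears in the result. The `xs` method also accepts a `drop_level` argument to keep it. The sketch below uses a deterministic stand-in for `A` (integers instead of random numbers) so the result is easy to follow; `dropped` and `kept` are names introduced for this example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame A used above
A = pd.DataFrame(np.arange(12).reshape(6, 2),
                 index=pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]]),
                 columns=['x', 'y'])

dropped = A.xs(2, level=1)                    # inner level removed from the result
kept = A.xs(2, level=1, drop_level=False)     # inner level retained

print(dropped)
print(kept)
```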


#### Using Slicer
Earlier we saw that a tuple can be used as a key to select the value at a given inner-level label. However, the colon slice syntax cannot be written inside a parenthesized tuple.

To slice within a tuple key, we can use a **slicer**, that is, a `slice` object, as shown below.

In [31]:
print(x)    # printed for ready reference
x.loc[(slice(None), 2)]

a  1    0.960336
   2    0.519694
   3    0.669085
b  1    0.602780
   2    0.656330
   3    0.785381
dtype: float64


a    0.519694
b    0.656330
dtype: float64

In [32]:
x.loc[('a', slice(1,2))] # label-based slice: selects labels 1 and 2, both endpoints included

a  1    0.960336
   2    0.519694
dtype: float64

**It is important to note that a slicer performs inclusive slicing: both endpoints are part of the result.**
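
To make the difference from ordinary positional slicing concrete, the following sketch compares a label-based slicer with `iloc` on a small Series. The values 10 to 60 are made up so that the selected elements are easy to recognize.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60],
              index=pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]]))

by_label = s.loc[('a', slice(1, 2))]    # labels 1 and 2: both endpoints included
by_position = s.iloc[1:2]               # position 1 only: the stop is excluded

print(list(by_label))
print(list(by_position))
```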

In [34]:
print(B)
B.loc[:,(slice(None), 1)]

          a                             b                    
          1         2         3         1         2         3
x  0.696317  0.892435  0.139296  0.855028  0.802479  0.535442
y  0.631555  0.039625  0.344128  0.348601  0.368763  0.966694


          a         b
          1         1
x  0.696317  0.855028
y  0.631555  0.348601


In [35]:
A

            x         y
a 1  0.696317  0.631555
  2  0.892435  0.039625
  3  0.139296  0.344128
b 1  0.855028  0.348601
  2  0.802479  0.368763
  3  0.535442  0.966694


In [37]:
A.loc[('a', slice(1,2)),:]

            x         y
a 1  0.696317  0.631555
  2  0.892435  0.039625
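
Writing `slice(...)` objects by hand can get verbose. pandas also provides `pd.IndexSlice`, which lets the same selection be written with colon syntax. The sketch below again uses a deterministic stand-in for `A`; `with_slice` and `with_idx` are names chosen for this example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame A used above
A = pd.DataFrame(np.arange(12).reshape(6, 2),
                 index=pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]]),
                 columns=['x', 'y'])

idx = pd.IndexSlice

with_slice = A.loc[(slice(None), slice(1, 2)), :]    # explicit slice objects
with_idx = A.loc[idx[:, 1:2], :]                     # same selection, colon syntax

print(with_idx)
print(with_slice.equals(with_idx))
```

Both forms are label-based and inclusive, so inner labels 1 and 2 are returned for every outer label.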
