# 1. Hierarchical Indexing

- Multi-Indexing: Commonly used to handle higher-dimensional data in Pandas by incorporating multiple index levels within Series and DataFrames.

- Advantages: Compact representation of complex data; simpler indexing, slicing, and computations.

- Alternatives: Older pandas versions offered Panel and Panel4D for 3D and 4D data, but both have since been removed from the library (Panel was deprecated in 0.20 and removed in 0.25).

- Key Operations: Learn to create MultiIndex objects, work with multi-indexed data, and convert between simple and hierarchical indexes.

- Multi-indexing provides a powerful way to manage and manipulate higher-dimensional data within the familiar Pandas framework.

# 2. A Multiply Indexed Series

## 2.1 Representation of 2D Data in 1D Series

- Suppose we want to store two-dimensional data in a one-dimensional Series.
- For example, the population of each state for the years 2000 and 2010.

**Bad way:** 
- Define the index as a list of tuples.



In [137]:
import pandas as pd 
import numpy as np

In [138]:
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]

index

[('California', 2000),
 ('California', 2010),
 ('New York', 2000),
 ('New York', 2010),
 ('Texas', 2000),
 ('Texas', 2010)]

In [139]:
populations = [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]
populations

[33871648, 37253956, 18976457, 19378102, 20851820, 25145561]

In [140]:
pop = pd.Series(populations, index=index)
pop

(California, 2000)    33871648
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
(Texas, 2010)         25145561
dtype: int64

- We have stored 2D data in a 1D Series.
- What if we want the population of every state for the year 2000 only?
- With tuple keys we have to filter the index ourselves, here with a list comprehension.

In [141]:
pop[[i for i in pop.index if i[1]==2000]]  
     

(California, 2000)    33871648
(New York, 2000)      18976457
(Texas, 2000)         20851820
dtype: int64

- This approach works, but it is neither as clean nor as efficient as pandas' native slicing syntax.
- It is not suitable for large datasets.

**The better way:**
- Use a pandas `MultiIndex`.

In [142]:
# We defined index as tuples
index  

[('California', 2000),
 ('California', 2010),
 ('New York', 2000),
 ('New York', 2010),
 ('Texas', 2000),
 ('Texas', 2010)]

In [143]:

# Use defined tuple list to create multiindex
index_multi = pd.MultiIndex.from_tuples(index)
index_multi

MultiIndex([('California', 2000),
            ('California', 2010),
            (  'New York', 2000),
            (  'New York', 2010),
            (     'Texas', 2000),
            (     'Texas', 2010)],
           )

In [144]:
# Redefine series with this multiindex
pop_multi = pd.Series(populations, index=index_multi)
pop_multi   

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

- This represents our data in hierarchical form.

In [145]:
# Alternatively, reindex the tuple-indexed series with the MultiIndex
pop_multi = pop.reindex(index_multi)
pop_multi

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

**Access data from Multiindex Series**

In [146]:
# Start from the full multi-indexed series
pop_multi

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

In [147]:
pop_multi[:, 2000]  # all states, year 2000 only

California    33871648
New York      18976457
Texas         20851820
dtype: int64

In [148]:
pop_multi['California', 2000]   # California, year 2000

np.int64(33871648)

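## 2.2 MultiIndex as extra dimension
- Each level of a MultiIndex acts like an extra dimension, so the same data can also be held in a DataFrame, and further quantities become additional columns.
- All ufuncs work on multi-indexed data as well.
- Let's add a column with the under-18 population to our pop_multi data.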


In [149]:
pop_multi

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

In [150]:
pop_df = pd.DataFrame({
    'total': pop_multi,
    'under18': [9267089, 9284094, 4687374, 4318033, 5906301, 6879014],
})

pop_df

                    total  under18
California 2000  33871648  9267089
           2010  37253956  9284094
New York   2000  18976457  4687374
           2010  19378102  4318033
Texas      2000  20851820  5906301
           2010  25145561  6879014


In [151]:
fraction_u18 = pop_df['under18']/pop_df['total']
fraction_u18

California  2000    0.273594
            2010    0.249211
New York    2000    0.247010
            2010    0.222831
Texas       2000    0.283251
            2010    0.273568
dtype: float64

In [152]:
fraction_u18.shape

(6,)

In [153]:
pivot = fraction_u18.unstack()
pivot

                2000      2010
California  0.273594  0.249211
New York    0.247010  0.222831
Texas       0.283251  0.273568


In [154]:
pivot.shape

(3, 2)

# 3. Methods of MultiIndex Creation
1. Using a list of index arrays: `[[index], [subindex]]`
2. Using a dict of `{index_tuple: values}`
3. Using the explicit `pd.MultiIndex` constructors

In [155]:
# 1. Using list of index arrays
rng = np.random.RandomState(42)
df = pd.DataFrame(rng.rand(6,3), 
                  index=[['A', 'A', 'B', 'B', 'C', 'C'], [1,2,1,2,1,2]],
                  columns= ['col1', 'col2', 'col3']
                  )

df

         col1      col2      col3
A 1  0.374540  0.950714  0.731994
  2  0.598658  0.156019  0.155995
B 1  0.058084  0.866176  0.601115
  2  0.708073  0.020584  0.969910
C 1  0.832443  0.212339  0.181825
  2  0.183405  0.304242  0.524756


In [156]:
# 2. Using a dict with tuple keys (in a DataFrame, the tuples become a column MultiIndex)
df = pd.DataFrame({('A', 1): [0.374540, 0.950714, 0.731994],
                   ('A', 2): [0.598658, 0.156019,  0.155995],
                   ('B', 1): [0.058084, 0.866176,  0.601115],
                   ('B', 2): [0.708073, 0.020584,  0.969910],
                   ('C', 1): [0.832443, 0.212339,  0.181825],
                   ('C', 2): [0.183405,  0.304242, 0.524756]})

df

          A                   B                   C
          1         2         1         2         1         2
0  0.374540  0.598658  0.058084  0.708073  0.832443  0.183405
1  0.950714  0.156019  0.866176  0.020584  0.212339  0.304242
2  0.731994  0.155995  0.601115  0.969910  0.181825  0.524756


In [157]:
# 3. Explicit MultiIndex constructors

# Create multiindex using arrays
index = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
index

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

In [158]:
# Create multiindex using tuples
index = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
index


MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

In [159]:
# Create multiindex from product
index = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
index


MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

- **We can pass the resulting index to the `index` argument when creating a Series or DataFrame, or to `reindex`.**
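As a minimal sketch of that last point, the snippet below builds a named two-level index with `from_product` and uses it both in the `Series` constructor and in `reindex`. The variable names (`idx_demo`, `s`, `subset`) and the choice of states are illustrative assumptions, not part of the notebook above.

```python
import pandas as pd

# Build a two-level index from the Cartesian product of two lists
states = ['California', 'Texas']          # illustrative labels
years = [2000, 2010]
idx_demo = pd.MultiIndex.from_product([states, years], names=['state', 'year'])

# Pass the index to the constructor
s = pd.Series([33871648, 37253956, 20851820, 25145561], index=idx_demo)

# reindex accepts a MultiIndex too: it selects and reorders rows by full label tuples
wanted = pd.MultiIndex.from_tuples([('Texas', 2010), ('California', 2000)],
                                   names=['state', 'year'])
subset = s.reindex(wanted)
print(subset)
```

`from_product` is convenient when every combination of labels occurs; for irregular combinations, `from_tuples` or `from_arrays` is usually the better fit.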


## 3.2 MultiIndex level names

- We can also name the levels of a MultiIndex.
  1. Pass the `names` argument when constructing the MultiIndex.
  2. Assign to the `names` attribute of an existing index later.

In [160]:
# Create a DF
rng = np.random.RandomState(42)
df = pd.DataFrame(rng.rand(6,3), 
                  index=[['A', 'A', 'B', 'B', 'C', 'C'], [1,2,1,2,1,2]],
                  columns= ['col1', 'col2', 'col3']
                  )

df

         col1      col2      col3
A 1  0.374540  0.950714  0.731994
  2  0.598658  0.156019  0.155995
B 1  0.058084  0.866176  0.601115
  2  0.708073  0.020584  0.969910
C 1  0.832443  0.212339  0.181825
  2  0.183405  0.304242  0.524756


In [161]:
# Set the names of the index levels

df.index.names = ['index1', 'index2']
df

                   col1      col2      col3
index1 index2
A      1       0.374540  0.950714  0.731994
       2       0.598658  0.156019  0.155995
B      1       0.058084  0.866176  0.601115
       2       0.708073  0.020584  0.969910
C      1       0.832443  0.212339  0.181825
       2       0.183405  0.304242  0.524756


In [162]:
# Give the level names directly in the constructor

index_arr = [['A', 'A', 'B', 'B', 'C', 'C'], [1,2,1,2,1,2]]
index = pd.MultiIndex.from_arrays(index_arr, names= ('index1', 'index2'))

df = pd.DataFrame(rng.rand(6,3), 
                  index=index,
                  columns= ['col1', 'col2', 'col3']
                  )

df

                   col1      col2      col3
index1 index2
A      1       0.431945  0.291229  0.611853
       2       0.139494  0.292145  0.366362
B      1       0.456070  0.785176  0.199674
       2       0.514234  0.592415  0.046450
C      1       0.607545  0.170524  0.065052
       2       0.948886  0.965632  0.808397
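One reason to name levels is that the names can then be used wherever a level number is accepted. The sketch below illustrates this on a small frame; `df_named` and its integer values are made up for the example, and `set_names` is shown as a non-mutating alternative to assigning `index.names`.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['index1', 'index2'])
df_named = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['col1', 'col2'])

# Refer to a level by name instead of by position
first_level = df_named.index.get_level_values('index1')
wide = df_named['col1'].unstack(level='index2')

# set_names returns a new index and leaves the original untouched
new_index = df_named.index.set_names(['group', 'item'])

print(wide)
print(list(new_index.names))
```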


## 3.3 MultiIndex for columns

- Everything that works for the row index also works for the columns, which can carry a MultiIndex too.
- Let's build mock medical data with a MultiIndex on both rows and columns.
- Row MultiIndex: year and visit (1st and 2nd).
- Column MultiIndex: patient and data type (HR and Temp).

In [163]:
import pandas as pd
import numpy as np

row_index = pd.MultiIndex.from_product([[2013, 2014], ['1st', '2nd']], names = ['Year', 'Visit'])
col_index = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']], names= ['Patient', 'Data type'])

# Generate random data (unseeded, so the values differ on every run)

# HR array
data_hr = np.random.randint(50,80, 12)

# Temp array
data_temp = 37 + np.random.randn(12)

# Zip HR and Temp into a list of (HR, Temp) tuples
data_x = list(zip(data_hr, data_temp))
data_x  




[(np.int64(69), np.float64(34.35723446218502)),
 (np.int64(60), np.float64(36.96155525813875)),
 (np.int64(70), np.float64(37.42928572434304)),
 (np.int64(64), np.float64(35.711474783605524)),
 (np.int64(57), np.float64(39.727873737627235)),
 (np.int64(63), np.float64(35.66217960895434)),
 (np.int64(64), np.float64(35.730134324073575)),
 (np.int64(59), np.float64(37.51100347820892)),
 (np.int64(53), np.float64(36.666367344856575)),
 (np.int64(68), np.float64(37.33065665171826)),
 (np.int64(71), np.float64(38.07149035251934)),
 (np.int64(76), np.float64(37.05781337863972))]

In [164]:
# Make array from list of tuples
arr = np.array(data_x)
arr



array([[69.        , 34.35723446],
       [60.        , 36.96155526],
       [70.        , 37.42928572],
       [64.        , 35.71147478],
       [57.        , 39.72787374],
       [63.        , 35.66217961],
       [64.        , 35.73013432],
       [59.        , 37.51100348],
       [53.        , 36.66636734],
       [68.        , 37.33065665],
       [71.        , 38.07149035],
       [76.        , 37.05781338]])

In [165]:
# Reshape to 4 rows and 6 cols
arr_multi = arr.reshape(4,6)
arr_multi

array([[69.        , 34.35723446, 60.        , 36.96155526, 70.        ,
        37.42928572],
       [64.        , 35.71147478, 57.        , 39.72787374, 63.        ,
        35.66217961],
       [64.        , 35.73013432, 59.        , 37.51100348, 53.        ,
        36.66636734],
       [68.        , 37.33065665, 71.        , 38.07149035, 76.        ,
        37.05781338]])

In [166]:
# Make a DataFrame from the 2D array
# Label rows and columns with the MultiIndexes defined above
health_data = pd.DataFrame(arr_multi, columns=col_index, index=row_index)
health_data

Patient      Bob             Guido              Sue
Data type     HR       Temp     HR       Temp    HR       Temp
Year Visit
2013 1st    69.0  34.357234   60.0  36.961555  70.0  37.429286
     2nd    64.0  35.711475   57.0  39.727874  63.0  35.662180
2014 1st    64.0  35.730134   59.0  37.511003  53.0  36.666367
     2nd    68.0  37.330657   71.0  38.071490  76.0  37.057813


In [167]:
# Access data
health_data['Bob']

Data type     HR       Temp
Year Visit
2013 1st    69.0  34.357234
     2nd    64.0  35.711475
2014 1st    64.0  35.730134
     2nd    68.0  37.330657


In [170]:
health_data['Bob']['HR']

Year  Visit
2013  1st      69.0
      2nd      64.0
2014  1st      64.0
      2nd      68.0
Name: HR, dtype: float64

In [171]:
health_data.loc[2014, ('Bob','HR')]  # Bob's HR for every visit in 2014

Visit
1st    64.0
2nd    68.0
Name: (Bob, HR), dtype: float64

In [172]:
health_data.loc[(2014, '1st')]

Patient  Data type
Bob      HR           64.000000
         Temp         35.730134
Guido    HR           59.000000
         Temp         37.511003
Sue      HR           53.000000
         Temp         36.666367
Name: (2014, 1st), dtype: float64

In [181]:
idx=pd.IndexSlice
health_data.loc[idx[:, '2nd'], idx[:, 'HR']]

Patient      Bob Guido   Sue
Data type     HR    HR    HR
Year Visit
2013 2nd    64.0  57.0  63.0
2014 2nd    68.0  71.0  76.0


# 4. Indexing and Slicing a MultiIndex

## 4.1 In Series

In [182]:
# Population Series
pop_multi

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

In [183]:
# Partial indexing
pop_multi['California']  # access using only the first index level

2000    33871648
2010    37253956
dtype: int64

In [184]:
# Access Single element
pop_multi['California', 2010]  # Using both levels of indexing

np.int64(37253956)

In [185]:
# Partial slicing on the first level (requires a sorted index)
pop_multi['California': 'New York']

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
dtype: int64

In [187]:
# Slice the first level and pick a single value of the second level
pop_multi.loc['California':'New York', 2010]  # use loc for this

California  2010    37253956
New York    2010    19378102
dtype: int64

In [188]:
# Partial indexing on the lower level: pass an empty slice for the first level
pop_multi[:, 2010]

California    37253956
New York      19378102
Texas         25145561
dtype: int64

In [198]:
# Boolean masking
pop_multi[pop_multi> 20000000]

California  2000    33871648
            2010    37253956
Texas       2000    20851820
            2010    25145561
dtype: int64

In [199]:
# Slicing with boolean masking
pop_multi[pop_multi>20000000][:, 2010]

California    37253956
Texas         25145561
dtype: int64

In [205]:
# Fancy indexing
pop_multi[['California', 'Texas' ]]

California  2000    33871648
            2010    37253956
Texas       2000    20851820
            2010    25145561
dtype: int64

In [206]:
pop_multi.loc[['California', 'Texas'], 2000]

California  2000    33871648
Texas       2000    20851820
dtype: int64
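Selecting on a lower level with an empty slice, as in `pop_multi[:, 2010]` above, can also be written with the `xs` (cross-section) method, which takes the level explicitly. This is a small sketch on a reduced copy of the population data; the name `pop_demo` is an assumption made for the example.

```python
import pandas as pd

idx_demo = pd.MultiIndex.from_product([['California', 'Texas'], [2000, 2010]],
                                      names=['state', 'year'])
pop_demo = pd.Series([33871648, 37253956, 20851820, 25145561], index=idx_demo)

# Cross-section: all states for one year; the selected level is dropped from the result
year_2000 = pop_demo.xs(2000, level='year')
print(year_2000)
```

Naming the level makes the intent explicit, which helps once an index has more than two levels.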

## 4.2 In Dataframes

In [207]:
health_data

Patient      Bob             Guido              Sue
Data type     HR       Temp     HR       Temp    HR       Temp
Year Visit
2013 1st    69.0  34.357234   60.0  36.961555  70.0  37.429286
     2nd    64.0  35.711475   57.0  39.727874  63.0  35.662180
2014 1st    64.0  35.730134   59.0  37.511003  53.0  36.666367
     2nd    68.0  37.330657   71.0  38.071490  76.0  37.057813


In [210]:
health_data.loc[2013]

Patient     Bob             Guido              Sue
Data type    HR       Temp     HR       Temp    HR       Temp
Visit
1st        69.0  34.357234   60.0  36.961555  70.0  37.429286
2nd        64.0  35.711475   57.0  39.727874  63.0  35.662180


In [214]:
health_data['Guido', 'HR']

Year  Visit
2013  1st      60.0
      2nd      57.0
2014  1st      59.0
      2nd      71.0
Name: (Guido, HR), dtype: float64

In [216]:
health_data.loc[(2013, '2nd'), ('Guido', 'HR')]

np.float64(57.0)

In [223]:
health_data.loc[(2013, '1st')]

Patient  Data type
Bob      HR           69.000000
         Temp         34.357234
Guido    HR           60.000000
         Temp         36.961555
Sue      HR           70.000000
         Temp         37.429286
Name: (2013, 1st), dtype: float64

In [224]:
health_data

Patient      Bob             Guido              Sue
Data type     HR       Temp     HR       Temp    HR       Temp
Year Visit
2013 1st    69.0  34.357234   60.0  36.961555  70.0  37.429286
     2nd    64.0  35.711475   57.0  39.727874  63.0  35.662180
2014 1st    64.0  35.730134   59.0  37.511003  53.0  36.666367
     2nd    68.0  37.330657   71.0  38.071490  76.0  37.057813


In [226]:
health_data.loc[(:, '2nd'), (:, 'Temp')]  

SyntaxError: invalid syntax (3750018792.py, line 1)

- The bare `:` slice syntax doesn't work inside a tuple.
- Instead, we use either `slice(None)` or `pd.IndexSlice`.

In [231]:
idx = pd.IndexSlice
health_data.loc[idx[:, '2nd'], idx[:, 'Temp']]

Patient           Bob      Guido        Sue
Data type        Temp       Temp       Temp
Year Visit
2013 2nd    35.711475  39.727874  35.662180
2014 2nd    37.330657  38.071490  37.057813
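The other option for slicing inside a tuple, the built-in `slice(None)`, can be sketched as follows. The frame `demo` mirrors the shape of `health_data` but is filled with consecutive numbers (an assumption made so the result is reproducible), and both spellings are compared.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_product([[2013, 2014], ['1st', '2nd']],
                                       names=['Year', 'Visit'])
col_index = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                       names=['Patient', 'Data type'])
demo = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                    index=row_index, columns=col_index)

# slice(None) plays the role of a bare ":" inside a tuple
by_slice = demo.loc[(slice(None), '2nd'), (slice(None), 'Temp')]

# pd.IndexSlice allows the familiar ":" syntax instead
idx = pd.IndexSlice
by_indexslice = demo.loc[idx[:, '2nd'], idx[:, 'Temp']]

print(by_slice.equals(by_indexslice))
```

Both forms select the same rows and columns; `IndexSlice` is mostly a readability convenience.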


# 5. Rearranging Multi-Indices
- Sometimes rearranging a MultiIndex makes the data easier to handle.
- We have already seen one example, `.unstack()`, and will meet its inverse `.stack()` below.
- There are many more methods to control hierarchical indices and columns.

## 5.1 Sorted and unsorted indices
- Slicing a MultiIndex requires the **index to be sorted** (lexicographically).
- Otherwise pandas raises an error. Let's check:

In [240]:
# Create data with an unsorted MultiIndex
# (the values are random and unseeded, so they differ on every run)
index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1,2]])
data = pd.Series(np.random.rand(6), index = index)
data

a  1    0.236621
   2    0.399179
c  1    0.099896
   2    0.020623
b  1    0.150775
   2    0.164304
dtype: float64

In [243]:
# Try slicing
try:
    data['a':'b']
except KeyError as e:
    print(type(e))
    print(e)

<class 'pandas.errors.UnsortedIndexError'>
'Key length (1) was greater than MultiIndex lexsort depth (0)'


- Pandas raises an `UnsortedIndexError`, which is a subclass of `KeyError`.
- We can sort such an index with `sort_index()`.

In [244]:
data

a  1    0.236621
   2    0.399179
c  1    0.099896
   2    0.020623
b  1    0.150775
   2    0.164304
dtype: float64

In [254]:
# Sort the index; sort_index() returns a new, sorted Series
data = data.sort_index()
data

a  1    0.236621
   2    0.399179
b  1    0.150775
   2    0.164304
c  1    0.099896
   2    0.020623
dtype: float64

In [255]:
# try slicing on sorted data
data['a':'b']

a  1    0.236621
   2    0.399179
b  1    0.150775
   2    0.164304
dtype: float64
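Rather than waiting for the error, one can check up front whether an index is sorted. The sketch below uses the `is_monotonic_increasing` property for this; the integer values and the names `unsorted`, `sorted_` and `part` are placeholders chosen for reproducibility.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])
unsorted = pd.Series(np.arange(6), index=index)

# The first level runs a, c, b, so the index is not sorted
print(unsorted.index.is_monotonic_increasing)   # False

sorted_ = unsorted.sort_index()
print(sorted_.index.is_monotonic_increasing)    # True

# Label slicing on the first level now works
part = sorted_.loc['a':'b']
print(part)
```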

## 5.2 Stacking and Unstacking Indices

- Unstacking: moves an index level into the columns, e.g. turning a multi-indexed Series into a two-dimensional DataFrame.
- Stacking: the opposite of unstacking; moves a column level back into the index, returning the data to its stacked MultiIndex form.

In [258]:
pop_multi


California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

In [260]:
pop_multi.shape  # 1D array

(6,)

In [259]:
pop_multi.unstack()

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561


In [261]:
pop_multi.unstack().shape  # converted to 2d 

(3, 2)

In [265]:
pop_multi.unstack(level=0)  # Unstack by the first level (state name)

      California  New York     Texas
2000    33871648  18976457  20851820
2010    37253956  19378102  25145561


In [266]:
pop_multi.unstack(level=1)  # Unstack by the second level (year)

                2000      2010
California  33871648  37253956
New York    18976457  19378102
Texas       20851820  25145561


In [268]:
pop_multi.unstack().stack()  # Recovers the original series


California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

## 5.3 Index Setting and Resetting
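- Another way to rearrange hierarchical data is to convert the index labels into columns.
- This is done with `reset_index()`; for a Series, the optional `name` argument names the data column:

`reset_index(name=...)`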

In [275]:
pop_multi.index.names = ['state', 'year']
pop_multi

state       year
California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561
dtype: int64

- Now let's reset the index and give the data column a name.

In [276]:
pop_flat = pop_multi.reset_index(name='pop')
pop_flat

        state  year       pop
0  California  2000  33871648
1  California  2010  37253956
2    New York  2000  18976457
3    New York  2010  19378102
4       Texas  2000  20851820
5       Texas  2010  25145561


- **Raw data often looks like the table above. We can build a MultiIndex from its columns using:**
  1. `set_index`

In [277]:
pop_flat.set_index(['state', 'year'])

                      pop
state      year
California 2000  33871648
           2010  37253956
New York   2000  18976457
           2010  19378102
Texas      2000  20851820
           2010  25145561
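The two methods are inverses of each other, which the following sketch checks on a small flat table. The frame `flat` is a cut-down stand-in for `pop_flat`, and the `level` argument of `reset_index` is shown as a way to move only one level back into the columns.

```python
import pandas as pd

flat = pd.DataFrame({
    'state': ['California', 'California', 'Texas', 'Texas'],
    'year': [2000, 2010, 2000, 2010],
    'pop': [33871648, 37253956, 20851820, 25145561],
})

# Columns -> MultiIndex
indexed = flat.set_index(['state', 'year'])

# MultiIndex -> columns (round trip back to the flat table)
restored = indexed.reset_index()

# Move only the 'year' level back; 'state' stays as the index
only_year = indexed.reset_index(level='year')

print(restored.equals(flat))
print(only_year)
```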


# 6. Data Aggregation on Multi index

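- Aggregations such as mean(), max(), min() and sum() can be applied along an axis of multi-indexed data.
- Older pandas versions accepted a `level` parameter in these methods to aggregate within one index level; it was deprecated in pandas 1.3 and removed in 2.0, where `groupby(level=...)` is used instead.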

In [278]:
health_data

Patient      Bob             Guido              Sue
Data type     HR       Temp     HR       Temp    HR       Temp
Year Visit
2013 1st    69.0  34.357234   60.0  36.961555  70.0  37.429286
     2nd    64.0  35.711475   57.0  39.727874  63.0  35.662180
2014 1st    64.0  35.730134   59.0  37.511003  53.0  36.666367
     2nd    68.0  37.330657   71.0  38.071490  76.0  37.057813


In [291]:
# Average HR and Temp for each patient over all years and visits
data_mean = health_data.mean(axis=0)  # collapse rows
data_mean

Patient  Data type
Bob      HR           66.250000
         Temp         35.782375
Guido    HR           61.750000
         Temp         38.067981
Sue      HR           65.500000
         Temp         36.703912
dtype: float64

- We will see more of this when we cover the groupby functionality.
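As a preview, here is a minimal sketch of aggregating within one level using `groupby(level=...)`. The frame `demo` has the same layout as `health_data` but holds consecutive numbers (an assumption, so that the results are easy to verify by hand); aggregating over a column level is done here by transposing, which is one possible approach rather than the only one.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_product([[2013, 2014], ['1st', '2nd']],
                                       names=['Year', 'Visit'])
col_index = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                       names=['Patient', 'Data type'])
demo = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                    index=row_index, columns=col_index)

# Mean per year: averages the two visits within each year
yearly = demo.groupby(level='Year').mean()

# Mean per data type across patients: transpose, group on the level, transpose back
by_type = yearly.T.groupby(level='Data type').mean().T

print(yearly)
print(by_type)
```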