# Index

In [1]:
import numpy as np

In [2]:
import pandas as pd

## Various types of `pandas.Index`

The pandas `Index` class and its subclasses can be viewed as implementing an ordered multiset: labels keep their order, and duplicates are allowed.
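To make the "ordered multiset" description concrete, here is a small sketch. The labels and values are made up for illustration. It shows that a repeated label is kept in place, that pandas reports the index as non-unique, and that selecting a duplicated label returns every matching row rather than a single scalar.

```python
import pandas as pd

# Illustrative labels: 'alfa' appears twice, and insertion order is preserved.
idx = pd.Index(['alfa', 'bravo', 'alfa'])

print(list(idx))                   # ['alfa', 'bravo', 'alfa']
print(idx.is_unique)               # False
print(idx.has_duplicates)          # True
print(idx.duplicated().tolist())   # [False, False, True]

# Selecting a duplicated label returns all matching rows as a Series.
s = pd.Series([1, 2, 3], index=idx)
print(s.loc['alfa'].tolist())      # [1, 3]
```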

In [3]:
index_str = pd.Index(['alfa', 'bravo', 'charlie', 'delta'])

In [4]:
index_str

Index(['alfa', 'bravo', 'charlie', 'delta'], dtype='object')

In [5]:
index_int = pd.Index([5, 4, 1, 2], dtype=np.uint64)

In [6]:
index_int

UInt64Index([5, 4, 1, 2], dtype='uint64')

In [7]:
index_date = pd.date_range('2020-01-01', '2020-01-04')

In [8]:
index_date

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'], dtype='datetime64[ns]', freq='D')

In [9]:
index_quarter = pd.period_range('2020-03', freq='3M', periods=4)

In [10]:
index_quarter

PeriodIndex(['2020-03', '2020-06', '2020-09', '2020-12'], dtype='period[3M]')
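The subclass matters because it determines which lookups the index supports. The sketch below reuses the date range from above and attaches hypothetical values `0..3`; it checks the concrete types and shows partial-string slicing, which a `DatetimeIndex` supports and a plain object `Index` does not. Endpoints of a label slice are inclusive.

```python
import numpy as np
import pandas as pd

index_int = pd.Index([5, 4, 1, 2], dtype=np.uint64)
index_date = pd.date_range('2020-01-01', '2020-01-04')

# The concrete class and dtype follow from the kind of labels stored.
print(isinstance(index_date, pd.DatetimeIndex))  # True
print(index_int.dtype)                           # uint64

# Datetime labels accept date strings in slices (both endpoints included).
s = pd.Series(range(4), index=index_date)
print(s.loc['2020-01-02':'2020-01-03'].tolist())  # [1, 2]
```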

## `Series` and `Index`

In [11]:
rng = np.random.default_rng()

`Series` objects are assigned a `RangeIndex` by default.

In [12]:
s = pd.Series(rng.random(4))

In [13]:
s

0    0.612977
1    0.022185
2    0.036508
3    0.441480
dtype: float64

In [14]:
s.index

RangeIndex(start=0, stop=4, step=1)
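A `RangeIndex` only records `start`, `stop`, and `step`, so it is cheap to store. Its entries are still labels, not positions, which becomes visible once the data is reordered. This sketch uses fixed illustrative values instead of random ones so the results are reproducible.

```python
import pandas as pd

s = pd.Series([0.5, 0.1, 0.7, 0.3])

# Without an explicit index, pandas attaches a RangeIndex.
print(isinstance(s.index, pd.RangeIndex))          # True
print(s.index.start, s.index.stop, s.index.step)   # 0 4 1

# After sorting, label 0 and position 0 refer to different rows.
s_sorted = s.sort_values()
print(s_sorted.loc[0])    # 0.5  (label-based)
print(s_sorted.iloc[0])   # 0.1  (position-based)
```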

You can assign your own index.

In [15]:
index_str = pd.Index(['alfa', 'bravo', 'charlie', 'delta'])

In [16]:
index_str

Index(['alfa', 'bravo', 'charlie', 'delta'], dtype='object')

In [17]:
s = pd.Series(rng.random(4), index=index_str)

In [18]:
s

alfa       0.020006
bravo      0.234993
charlie    0.616254
delta      0.751752
dtype: float64

In [19]:
s.index

Index(['alfa', 'bravo', 'charlie', 'delta'], dtype='object')
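Besides passing `index=` to the constructor, an index can also be assigned to an existing `Series` through its `index` attribute. A minimal sketch with illustrative values: the new index must have the same length as the data, otherwise pandas raises `ValueError` and leaves the existing index in place.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4])

# Replace the default RangeIndex after construction; lengths match.
s.index = pd.Index(['alfa', 'bravo', 'charlie', 'delta'])
print(s.loc['charlie'])   # 0.3

# A length mismatch is rejected before anything is changed.
raised = False
try:
    s.index = pd.Index(['alfa', 'bravo'])
except ValueError as err:
    raised = True
    print('ValueError:', err)

print(list(s.index))      # ['alfa', 'bravo', 'charlie', 'delta']
```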

### Organizing data in a `Series` object by the index

In [20]:
s.sort_index(ascending=False)

delta      0.751752
charlie    0.616254
bravo      0.234993
alfa       0.020006
dtype: float64

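This is in contrast to `sort_values()`, which sorts the data by values. In this particular run the two orderings happen to coincide, because the random values increase along with the labels.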

In [21]:
s.sort_values(ascending=False)

delta      0.751752
charlie    0.616254
bravo      0.234993
alfa       0.020006
dtype: float64
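Because the outputs above coincide by chance, the difference is easier to see with data whose values deliberately do not follow the label order. The values below are illustrative.

```python
import pandas as pd

# Values chosen so that value order differs from label order.
s = pd.Series([3, 1, 2], index=['alfa', 'bravo', 'charlie'])

by_index = s.sort_index(ascending=False)
by_value = s.sort_values(ascending=False)

print(by_index.index.tolist())   # ['charlie', 'bravo', 'alfa']
print(by_value.index.tolist())   # ['alfa', 'charlie', 'bravo']
```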

You can change the order of the values by passing a new index to `reindex()`.

In [22]:
new_index = pd.Index(['bravo', 'alfa', 'delta', 'charlie'])

In [23]:
new_index

Index(['bravo', 'alfa', 'delta', 'charlie'], dtype='object')

In [24]:
s.reindex(new_index)

bravo      0.234993
alfa       0.020006
delta      0.751752
charlie    0.616254
dtype: float64
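`reindex()` does more than reorder: it conforms the data to whatever labels you pass. In this sketch, the label `'echo'` is a hypothetical label absent from the original index; it produces a missing value unless `fill_value` is given. The original `Series` is not modified.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4], index=['alfa', 'bravo', 'charlie', 'delta'])

# 'echo' is not in s.index, so its value is missing (NaN).
r = s.reindex(['bravo', 'echo'])
print(r.tolist())          # [0.2, nan]

# fill_value replaces the missing entries.
r_filled = s.reindex(['bravo', 'echo'], fill_value=0.0)
print(r_filled.tolist())   # [0.2, 0.0]
```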

## `Series` and `MultiIndex`

Hierarchical (multi-level) indexing enables you to store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures like `Series` (1-D) and `DataFrame` (2-D).

The `pandas.MultiIndex` object is the hierarchical analogue of the standard `pandas.Index` object, which typically stores the axis labels in pandas objects. You can think of a `MultiIndex` as an array of tuples where each tuple is unique.

In [25]:
tuples = [(i, j) for i in ['alfa', 'bravo'] for j in [101, 102, 103]]

In [26]:
tuples

[('alfa', 101),
 ('alfa', 102),
 ('alfa', 103),
 ('bravo', 101),
 ('bravo', 102),
 ('bravo', 103)]

In [27]:
multi_index = pd.MultiIndex.from_tuples(tuples)

In [28]:
multi_index

MultiIndex([( 'alfa', 101),
            ( 'alfa', 102),
            ( 'alfa', 103),
            ('bravo', 101),
            ('bravo', 102),
            ('bravo', 103)],
           )
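`from_tuples()` is one of several constructors. The sketch below builds the same index from a Cartesian product and from one array per level, and checks that all three are equal. The level names `'word'` and `'code'` in the last step are hypothetical, added only to show the `names` parameter.

```python
import pandas as pd

tuples = [(i, j) for i in ['alfa', 'bravo'] for j in [101, 102, 103]]
from_tuples = pd.MultiIndex.from_tuples(tuples)

# Cartesian product of the per-level values, in the same order as the comprehension.
from_product = pd.MultiIndex.from_product([['alfa', 'bravo'], [101, 102, 103]])

# One array per level, aligned element by element.
from_arrays = pd.MultiIndex.from_arrays([
    ['alfa'] * 3 + ['bravo'] * 3,
    [101, 102, 103] * 2,
])

print(from_tuples.equals(from_product))  # True
print(from_tuples.equals(from_arrays))   # True

# Optional level names (hypothetical here) make later level selection readable.
named = pd.MultiIndex.from_product(
    [['alfa', 'bravo'], [101, 102, 103]], names=['word', 'code']
)
print(list(named.names))   # ['word', 'code']
```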

In [29]:
s = pd.Series(
    data=[
        'Alfa/101',  'Alfa/102',  'Alfa/103',
        'Bravo/101', 'Bravo/102', 'Bravo/103',
    ],
    index=multi_index
)

In [30]:
s

alfa   101     Alfa/101
       102     Alfa/102
       103     Alfa/103
bravo  101    Bravo/101
       102    Bravo/102
       103    Bravo/103
dtype: object

Keep in mind that nothing prevents you from using tuples as atomic labels. What the `MultiIndex` adds is awareness of each tuple's components, which is what enables grouping, selection, and reshaping operations by level.

In [31]:
pd.Series(
    data=[
        'Alfa/101',  'Alfa/102',  'Alfa/103',
        'Bravo/101', 'Bravo/102', 'Bravo/103',
    ],
    index=tuples
)

(alfa, 101)      Alfa/101
(alfa, 102)      Alfa/102
(alfa, 103)      Alfa/103
(bravo, 101)    Bravo/101
(bravo, 102)    Bravo/102
(bravo, 103)    Bravo/103
dtype: object
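The following sketch contrasts the two representations. `tupleize_cols=False` is used to force a flat index whose labels are opaque tuples. Only the `MultiIndex` version exposes two levels, which is what lets `unstack()` reshape the data into a table and `groupby(level=0)` group by the first component.

```python
import pandas as pd

tuples = [(i, j) for i in ['alfa', 'bravo'] for j in [101, 102, 103]]
data = ['Alfa/101', 'Alfa/102', 'Alfa/103', 'Bravo/101', 'Bravo/102', 'Bravo/103']

s_multi = pd.Series(data, index=pd.MultiIndex.from_tuples(tuples))
# tupleize_cols=False keeps each tuple as a single opaque label.
s_flat = pd.Series(data, index=pd.Index(tuples, tupleize_cols=False))

print(s_multi.index.nlevels, s_flat.index.nlevels)   # 2 1

# Reshaping: move the inner level into columns.
table = s_multi.unstack()
print(table.shape)                # (2, 3)
print(table.loc['bravo', 102])    # Bravo/102

# Grouping by one component of the label.
print(s_multi.groupby(level=0).size().tolist())   # [3, 3]
```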

### Accessing data in a `Series` object with the multi-level index

In [32]:
s.loc[('alfa', 102)]

'Alfa/102'

In [33]:
s.loc[('alfa', slice(None))]

alfa  101    Alfa/101
      102    Alfa/102
      103    Alfa/103
dtype: object

In [34]:
s.loc[('bravo', [101, 103])]

bravo  101    Bravo/101
       103    Bravo/103
dtype: object

In [35]:
s.loc[('bravo', slice(101, 102))]

bravo  101    Bravo/101
       102    Bravo/102
dtype: object
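The examples above all fix the outer level. To select on an inner level instead, two common options are `xs()` with `level=` and a boolean mask built from `get_level_values()`. The sketch below rebuilds the same `Series`; note that `xs()` drops the selected level, while the mask keeps the full `MultiIndex`.

```python
import pandas as pd

s = pd.Series(
    ['Alfa/101', 'Alfa/102', 'Alfa/103', 'Bravo/101', 'Bravo/102', 'Bravo/103'],
    index=pd.MultiIndex.from_product([['alfa', 'bravo'], [101, 102, 103]]),
)

# xs() selects on an inner level and drops that level from the result.
cross = s.xs(102, level=1)
print(cross.index.tolist())    # ['alfa', 'bravo']
print(cross.tolist())          # ['Alfa/102', 'Bravo/102']

# A boolean mask over one level keeps both levels in the result.
mask = s.index.get_level_values(1) == 102
print(s[mask].index.tolist())  # [('alfa', 102), ('bravo', 102)]
```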

### Inspecting values in the multi-level index

In [36]:
multi_index

MultiIndex([( 'alfa', 101),
            ( 'alfa', 102),
            ( 'alfa', 103),
            ('bravo', 101),
            ('bravo', 102),
            ('bravo', 103)],
           )

In [37]:
multi_index.values

array([('alfa', 101), ('alfa', 102), ('alfa', 103), ('bravo', 101),
       ('bravo', 102), ('bravo', 103)], dtype=object)

In [38]:
multi_index.to_numpy()

array([('alfa', 101), ('alfa', 102), ('alfa', 103), ('bravo', 101),
       ('bravo', 102), ('bravo', 103)], dtype=object)

In [39]:
multi_index.to_list()

[('alfa', 101),
 ('alfa', 102),
 ('alfa', 103),
 ('bravo', 101),
 ('bravo', 102),
 ('bravo', 103)]

In [40]:
multi_index.levels

FrozenList([['alfa', 'bravo'], [101, 102, 103]])

In [41]:
s.index.get_level_values(0)

Index(['alfa', 'alfa', 'alfa', 'bravo', 'bravo', 'bravo'], dtype='object')

In [42]:
s.index.get_level_values(1)

Int64Index([101, 102, 103, 101, 102, 103], dtype='int64')
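`levels` lists each level's distinct labels only once; a parallel set of integer `codes` records which label every entry uses. The sketch below prints both. It also shows a consequence worth knowing: slicing a `MultiIndex` keeps the original `levels`, even labels no longer present, until `remove_unused_levels()` is called.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['alfa', 'bravo'], [101, 102, 103]])

# Each level stores its distinct labels once ...
print(mi.nlevels, mi.levshape)           # 2 (2, 3)
# ... and integer codes point into those labels for every entry.
print([c.tolist() for c in mi.codes])    # [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]

# Slicing keeps unused labels in the levels.
sub = mi[:3]
print(sub.levels[0].tolist())                          # ['alfa', 'bravo']
print(sub.remove_unused_levels().levels[0].tolist())   # ['alfa']
```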

### Organizing values in a multi-level index

In [43]:
multi_index

MultiIndex([( 'alfa', 101),
            ( 'alfa', 102),
            ( 'alfa', 103),
            ('bravo', 101),
            ('bravo', 102),
            ('bravo', 103)],
           )

In [44]:
multi_index.sortlevel(level=0, ascending=False, sort_remaining=True)

(MultiIndex([('bravo', 103),
             ('bravo', 102),
             ('bravo', 101),
             ( 'alfa', 103),
             ( 'alfa', 102),
             ( 'alfa', 101)],
            ),
 array([5, 4, 3, 2, 1, 0]))

In [45]:
multi_index.sortlevel(level=0, ascending=False, sort_remaining=False)

(MultiIndex([('bravo', 103),
             ('bravo', 102),
             ('bravo', 101),
             ( 'alfa', 103),
             ( 'alfa', 102),
             ( 'alfa', 101)],
            ),
 array([5, 4, 3, 2, 1, 0]))

In [46]:
multi_index.sortlevel(level=1, ascending=False, sort_remaining=True)

(MultiIndex([('bravo', 103),
             ( 'alfa', 103),
             ('bravo', 102),
             ( 'alfa', 102),
             ('bravo', 101),
             ( 'alfa', 101)],
            ),
 array([5, 2, 4, 1, 3, 0]))

In [47]:
multi_index.sortlevel(level=1, ascending=False, sort_remaining=False)

(MultiIndex([('bravo', 103),
             ( 'alfa', 103),
             ('bravo', 102),
             ( 'alfa', 102),
             ('bravo', 101),
             ( 'alfa', 101)],
            ),
 array([5, 2, 4, 1, 3, 0]))
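As the outputs show, `sortlevel()` returns a pair: the sorted index and an array of positions that produces it. This sketch sorts ascending by level 1 and uses that positional indexer to reorder a `Series` aligned with the original index.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['alfa', 'bravo'], [101, 102, 103]])

# Sort by level 1; ties are broken by the remaining level 0.
sorted_mi, indexer = mi.sortlevel(level=1, ascending=True, sort_remaining=True)
print(sorted_mi.tolist())
# [('alfa', 101), ('bravo', 101), ('alfa', 102), ('bravo', 102), ('alfa', 103), ('bravo', 103)]
print(indexer.tolist())   # [0, 3, 1, 4, 2, 5]

# The positions reorder anything aligned with the original index.
values = ['Alfa/101', 'Alfa/102', 'Alfa/103', 'Bravo/101', 'Bravo/102', 'Bravo/103']
s = pd.Series(values, index=mi)
print(s.iloc[indexer].tolist())
# ['Alfa/101', 'Bravo/101', 'Alfa/102', 'Bravo/102', 'Alfa/103', 'Bravo/103']
```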

### Organizing data in a `Series` object by the multi-level index

In [48]:
s

alfa   101     Alfa/101
       102     Alfa/102
       103     Alfa/103
bravo  101    Bravo/101
       102    Bravo/102
       103    Bravo/103
dtype: object

In [49]:
s.sort_index(ascending=False)

bravo  103    Bravo/103
       102    Bravo/102
       101    Bravo/101
alfa   103     Alfa/103
       102     Alfa/102
       101     Alfa/101
dtype: object

In [50]:
s.sort_index(level=0, ascending=False)

bravo  103    Bravo/103
       102    Bravo/102
       101    Bravo/101
alfa   103     Alfa/103
       102     Alfa/102
       101     Alfa/101
dtype: object

In [51]:
s.sort_index(level=0, ascending=False, sort_remaining=False)

bravo  103    Bravo/103
       102    Bravo/102
       101    Bravo/101
alfa   103     Alfa/103
       102     Alfa/102
       101     Alfa/101
dtype: object

In [52]:
s.sort_index(level=1, ascending=False)

bravo  103    Bravo/103
alfa   103     Alfa/103
bravo  102    Bravo/102
alfa   102     Alfa/102
bravo  101    Bravo/101
alfa   101     Alfa/101
dtype: object

In [53]:
s.sort_index(level=1, ascending=False, sort_remaining=False)

bravo  103    Bravo/103
alfa   103     Alfa/103
bravo  102    Bravo/102
alfa   102     Alfa/102
bravo  101    Bravo/101
alfa   101     Alfa/101
dtype: object
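`sort_index()` also accepts a list of levels together with a list of directions, one per level. In this sketch, level 1 is sorted ascending and ties are broken by level 0 descending. An alternative shown afterwards is `swaplevel()` followed by a plain `sort_index()`, which makes the codes the outer level.

```python
import pandas as pd

s = pd.Series(
    ['Alfa/101', 'Alfa/102', 'Alfa/103', 'Bravo/101', 'Bravo/102', 'Bravo/103'],
    index=pd.MultiIndex.from_product([['alfa', 'bravo'], [101, 102, 103]]),
)

# Level 1 ascending, then level 0 descending.
mixed = s.sort_index(level=[1, 0], ascending=[True, False])
print(mixed.tolist())
# ['Bravo/101', 'Alfa/101', 'Bravo/102', 'Alfa/102', 'Bravo/103', 'Alfa/103']

# Swap the levels so the codes come first, then sort normally.
swapped = s.swaplevel(0, 1).sort_index()
print(swapped.index.tolist()[:2])   # [(101, 'alfa'), (101, 'bravo')]
```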

### Reorganizing data in a `Series` object with multi-level indexes

In [54]:
s

alfa   101     Alfa/101
       102     Alfa/102
       103     Alfa/103
bravo  101    Bravo/101
       102    Bravo/102
       103    Bravo/103
dtype: object

You can reorganize the data by reordering the labels within a single level.

In [55]:
s.reindex(index=[102, 103, 101], level=1)

alfa   102     Alfa/102
       103     Alfa/103
       101     Alfa/101
bravo  102    Bravo/102
       103    Bravo/103
       101    Bravo/101
dtype: object

Or by building a new `MultiIndex` that specifies the order of every level.

In [56]:
multi_index

MultiIndex([( 'alfa', 101),
            ( 'alfa', 102),
            ( 'alfa', 103),
            ('bravo', 101),
            ('bravo', 102),
            ('bravo', 103)],
           )

In [57]:
multi_index.levels[0].to_numpy()

array(['alfa', 'bravo'], dtype=object)

In [58]:
multi_index.levels[1].to_numpy()

array([101, 102, 103])

In [59]:
new_multi_index = pd.MultiIndex.from_product([
    multi_index.levels[0].to_numpy(),
    [102, 103, 101]
])

In [60]:
new_multi_index

MultiIndex([( 'alfa', 102),
            ( 'alfa', 103),
            ( 'alfa', 101),
            ('bravo', 102),
            ('bravo', 103),
            ('bravo', 101)],
           )

In [61]:
s.reindex(new_multi_index)

alfa   102     Alfa/102
       103     Alfa/103
       101     Alfa/101
bravo  102    Bravo/102
       103    Bravo/103
       101    Bravo/101
dtype: object
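Reindexing with a full `MultiIndex` follows the same rules as with a flat index. In this sketch the tuple `('charlie', 101)` is a hypothetical label absent from the source, so its value comes back missing. `reindex()` also requires the source labels to be unique, so a `Series` with a repeated tuple raises `ValueError`; the exact message differs between pandas versions.

```python
import pandas as pd

s = pd.Series(
    ['Alfa/101', 'Alfa/102'],
    index=pd.MultiIndex.from_tuples([('alfa', 101), ('alfa', 102)]),
)

# ('charlie', 101) does not exist in s, so it becomes NaN.
target = pd.MultiIndex.from_tuples([('alfa', 102), ('charlie', 101)])
r = s.reindex(target)
print(r.tolist())   # ['Alfa/102', nan]

# A source index with duplicate tuples cannot be reindexed.
dup = pd.Series([1, 2], index=pd.MultiIndex.from_tuples([('alfa', 101), ('alfa', 101)]))
raised = False
try:
    dup.reindex(target)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```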