# Index objects

The index objects of pandas hold the axis labels and other metadata, such as the axis name. Any array or other sequence of labels you pass when constructing a Series or DataFrame is converted internally into an index:

In [1]:
import pandas as pd

obj = pd.Series(range(7), index=pd.date_range("2022-02-02", periods=7))

In [2]:
obj.index

DatetimeIndex(['2022-02-02', '2022-02-03', '2022-02-04', '2022-02-05',
               '2022-02-06', '2022-02-07', '2022-02-08'],
              dtype='datetime64[ns]', freq='D')

In [3]:
obj.index[3:]

DatetimeIndex(['2022-02-05', '2022-02-06', '2022-02-07', '2022-02-08'], dtype='datetime64[ns]', freq='D')

Index objects are immutable and therefore cannot be changed by the user:

In [4]:
obj.index[1] = '2022-02-03'

TypeError: Index does not support mutable operations
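Because an index cannot be modified in place, changing a label means building a new index and assigning it as a whole. The following is a minimal sketch of that pattern; the Series `s` and the names `new_labels` and `renamed` are hypothetical and chosen only for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Assigning to a single position of the index is rejected.
try:
    s.index[1] = "x"
except TypeError as err:
    print(f"TypeError: {err}")

# Instead, build a new index and replace the old one as a whole.
new_labels = pd.Index(["x" if label == "b" else label for label in s.index])
s.index = new_labels

# rename() returns a relabelled copy and leaves `s` untouched.
renamed = s.rename(index={"x": "y"})
```

Replacing the whole index keeps any other object that still references the old index unaffected.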

Immutability makes it safer to share index objects between data structures:

In [5]:
import numpy as np

labels = pd.Index(np.arange(3))

labels

Int64Index([0, 1, 2], dtype='int64')

In [6]:
obj2 = pd.Series(np.random.randn(3), index=labels)

In [7]:
obj2

0   -0.907581
1    0.394003
2    1.004699
dtype: float64

In [8]:
obj2.index is labels

True

In addition to being array-like, an index also behaves like a fixed-size set:

In [9]:
data = {'Code': ['U+0000', 'U+0001', 'U+0002', 'U+0003', 'U+0004', 'U+0005'],
        'Decimal': [0, 1, 2, 3, 4, 5],
        'Octal': ['001', '002', '003', '004', '004', '005']}
df = pd.DataFrame(data)

In [10]:
df

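| | Code | Decimal | Octal |
| :- | :--- | :------ | :---- |
| 0 | U+0000 | 0 | 001 |
| 1 | U+0001 | 1 | 002 |
| 2 | U+0002 | 2 | 003 |
| 3 | U+0003 | 3 | 004 |
| 4 | U+0004 | 4 | 004 |
| 5 | U+0005 | 5 | 005 |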


In [11]:
df.columns

Index(['Code', 'Decimal', 'Octal'], dtype='object')

In [12]:
'Code' in df.columns

True

In [13]:
'Key' in df.columns

False

## Axis indices with duplicate labels

Unlike Python sets, a pandas index can contain duplicate labels:

In [14]:
data2 = {'Code': ['U+0006', 'U+0007'],
         'Decimal': [6, 7],
         'Octal': ['006', '007']}
df2 = pd.DataFrame(data2)
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
dupe = pd.concat([df, df2])

dupe

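| | Code | Decimal | Octal |
| :- | :--- | :------ | :---- |
| 0 | U+0000 | 0 | 001 |
| 1 | U+0001 | 1 | 002 |
| 2 | U+0002 | 2 | 003 |
| 3 | U+0003 | 3 | 004 |
| 4 | U+0004 | 4 | 004 |
| 5 | U+0005 | 5 | 005 |
| 0 | U+0006 | 6 | 006 |
| 1 | U+0007 | 7 | 007 |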


For selections with duplicate labels, all occurrences of the label in question are selected:

In [15]:
dupe.loc[1]

| | Code | Decimal | Octal |
| :- | :--- | :------ | :---- |
| 1 | U+0001 | 1 | 002 |
| 1 | U+0007 | 7 | 007 |


In [16]:
dupe.loc[2]

Code       U+0002
Decimal         2
Octal         003
Name: 2, dtype: object

Data selection is one of the main operations that behaves differently with duplicates. Selecting a label that occurs several times from a DataFrame returns a DataFrame, while a unique label returns a Series; for a Series, the results are a Series and a scalar value respectively. This can complicate your code because the output type of indexing varies depending on whether a label is repeated. In addition, many pandas functions, such as `reindex`, require labels to be unique. You can use the `is_unique` property of the index to check whether its labels are unique:

In [17]:
dupe.index.is_unique

False
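The varying result type and one way of removing repeated labels can be sketched as follows. The Series `s` and the names `mask` and `deduplicated` are hypothetical; keeping the first occurrence of each label is only one possible policy and an assumption of this example:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "a", "b"])

# The result type depends on whether the label is repeated.
repeated = s.loc["a"]   # a Series, because "a" occurs twice
single = s.loc["b"]     # a scalar, because "b" occurs once
print(type(repeated), type(single))

# duplicated() marks every occurrence of a label after the first one.
mask = s.index.duplicated(keep="first")
deduplicated = s[~mask]

print(deduplicated.index.is_unique)
```

Code that has to work in both cases can check `is_unique` first, or normalise the index as above before selecting.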

To avoid duplicate labels, you can pass `ignore_index=True` when concatenating, for example:

In [18]:
dupe = pd.concat([df, df2], ignore_index=True)

dupe

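| | Code | Decimal | Octal |
| :- | :--- | :------ | :---- |
| 0 | U+0000 | 0 | 001 |
| 1 | U+0001 | 1 | 002 |
| 2 | U+0002 | 2 | 003 |
| 3 | U+0003 | 3 | 004 |
| 4 | U+0004 | 4 | 004 |
| 5 | U+0005 | 5 | 005 |
| 6 | U+0006 | 6 | 006 |
| 7 | U+0007 | 7 | 007 |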


## Some index methods and properties

Each index has a number of set logic methods and properties that answer other general questions about the data it contains. The following are some useful methods and properties:

Method | Description
:----- | :----------
`append` | concatenates additional index objects, creating a new index
`difference` | calculates the difference of two sets as an index
`intersection` | calculates the intersection
`union` | calculates the union set
`isin` | computes a boolean array indicating whether each value is contained in the passed collection
`delete` | computes a new index by deleting the element at position `i`
`drop` | computes a new index by deleting the passed values
`insert` | computes a new index by inserting an element at position `i`
`is_monotonic_increasing` | returns `True` if each element is greater than or equal to the previous element (this replaces `is_monotonic`, which was removed in pandas 2.0)
`is_unique` | returns `True` if the index does not contain duplicate values
`unique` | calculates the array of unique values in the index
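Several of the methods in the table can be tried on two small indexes. This is an illustrative sketch only; the indexes `left` and `right` and the label `'Hex'` are hypothetical:

```python
import pandas as pd

left = pd.Index(["Code", "Decimal", "Octal"])
right = pd.Index(["Octal", "Description"])

# Set logic: each call returns a new index, the inputs stay unchanged.
both = left.union(right)
common = left.intersection(right)
only_left = left.difference(right)

# Element-wise membership test, returned as a boolean array.
member = left.isin(["Code", "Key"])

# Positional and label-based edits also return new indexes.
inserted = left.insert(1, "Hex")     # insert before position 1
deleted = left.delete(0)             # remove the element at position 0
dropped = left.drop(["Octal"])       # remove by label

# append() concatenates and may therefore introduce duplicates.
combined = left.append(right)
print(combined.is_unique)
```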

## Re-indexing

An important method for pandas objects is re-indexing, i.e. creating a new object whose values are rearranged to match a new index. Consider, for example:

In [19]:
obj = pd.Series(range(7), index=pd.date_range("2022-02-02", periods=7))

In [20]:
obj

2022-02-02    0
2022-02-03    1
2022-02-04    2
2022-02-05    3
2022-02-06    4
2022-02-07    5
2022-02-08    6
Freq: D, dtype: int64

In [21]:
new_index = pd.date_range("2022-02-03", periods=7)

In [22]:
obj.reindex(new_index)

2022-02-03    1.0
2022-02-04    2.0
2022-02-05    3.0
2022-02-06    4.0
2022-02-07    5.0
2022-02-08    6.0
2022-02-09    NaN
Freq: D, dtype: float64

`reindex` returns a new object that conforms to the new index; the original is left unchanged. By default, labels in the new index that have no corresponding entry in the original object receive `NaN`, which is why the integer values above were converted to `float64`.
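The default behaviour can be checked on a small example. This is a sketch with a hypothetical Series `obj_small` and string labels chosen for brevity:

```python
import pandas as pd

obj_small = pd.Series([0, 1, 2], index=["a", "b", "c"])

# "d" does not exist in the original, so it receives NaN.
result = obj_small.reindex(["b", "c", "d"])

# NaN cannot be stored in an int64 column, so the dtype becomes float64.
print(result.dtype)

# Labels that were introduced by re-indexing can be found via isna().
missing = result.index[result.isna()]
print(list(missing))

# The original object is unchanged.
print(list(obj_small.index))
```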

For ordered data such as time series, it may be desirable to interpolate or fill values when re-indexing. The `method` option does this; for example, `ffill` fills values forward:

In [23]:
obj.reindex(new_index, method='ffill')

2022-02-03    1
2022-02-04    2
2022-02-05    3
2022-02-06    4
2022-02-07    5
2022-02-08    6
2022-02-09    6
Freq: D, dtype: int64

For a DataFrame, `reindex` can change either the (row) index, the columns or both. If only a sequence is passed, the rows in the result are re-indexed:

In [24]:
df.reindex(range(7))

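| | Code | Decimal | Octal |
| :- | :--- | :------ | :---- |
| 0 | U+0000 | 0.0 | 001 |
| 1 | U+0001 | 1.0 | 002 |
| 2 | U+0002 | 2.0 | 003 |
| 3 | U+0003 | 3.0 | 004 |
| 4 | U+0004 | 4.0 | 004 |
| 5 | U+0005 | 5.0 | 005 |
| 6 | NaN | NaN | NaN |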


The columns can be re-indexed with the keyword `columns`:

In [25]:
encoding = ['Octal', 'Code', 'Description']

df.reindex(columns=encoding)

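| | Octal | Code | Description |
| :- | :---- | :--- | :---------- |
| 0 | 001 | U+0000 | NaN |
| 1 | 002 | U+0001 | NaN |
| 2 | 003 | U+0002 | NaN |
| 3 | 004 | U+0003 | NaN |
| 4 | 004 | U+0004 | NaN |
| 5 | 005 | U+0005 | NaN |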
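Rows and columns can also be re-indexed in a single call by passing both keywords. A minimal sketch, assuming a shortened, hypothetical version of the DataFrame used above (`df_small`):

```python
import pandas as pd

df_small = pd.DataFrame({"Code": ["U+0000", "U+0001"],
                         "Decimal": [0, 1]})

# Reorder the rows, add a new row label and a new column in one step.
result = df_small.reindex(index=[1, 0, 2],
                          columns=["Decimal", "Code", "Description"])

print(result)
```

Labels that do not exist in the original, here row `2` and the column `Description`, are filled with `NaN`.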


### Arguments of the function `reindex`

Argument | Description
:------- | :----------
`labels` | New sequence to be used as index. Can be an index instance or another sequence-like Python data structure. An index is used exactly as it is, without being copied.
`axis` | The axis to re-index, either `index` (rows) or `columns`. The default is `index`. Alternatively, you can use `reindex(index=new_labels)` or `reindex(columns=new_labels)`.
`method` | Interpolation method; `ffill` fills forwards, while `bfill` fills backwards.
`fill_value` | Substitute value to use when re-indexing introduces missing data. The default, `np.nan`, gives labels that are absent from the original null values in the result.
`limit` | When filling forward or backward, the maximum number of elements to fill.
`tolerance` | When filling forward or backward, the maximum size of the gap to be filled for inexact matches.
`level` | Match a simple index against the given level of a `MultiIndex`; otherwise select a subset.
`copy` | If `True`, the underlying data is always copied, even if the new index matches the old index; if `False`, the data is not copied if the indices are equivalent.
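The effect of `fill_value`, `limit` and `tolerance` can be sketched on a short time series. The names `short`, `longer` and `shifted` are hypothetical, and the 12-hour tolerance is an arbitrary value chosen for illustration:

```python
import pandas as pd

short = pd.Series(range(3), index=pd.date_range("2022-02-02", periods=3))
longer = pd.date_range("2022-02-02", periods=6)

# fill_value replaces the default NaN for labels missing in the original.
filled = short.reindex(longer, fill_value=0)

# limit caps how many consecutive missing labels are filled forward.
limited = short.reindex(longer, method="ffill", limit=1)

# tolerance bounds the distance allowed for an inexact match.
shifted = pd.date_range("2022-02-02 06:00", periods=3)
near = short.reindex(shifted, method="nearest",
                     tolerance=pd.Timedelta("12h"))
too_far = short.reindex(shifted, method="nearest",
                        tolerance=pd.Timedelta("1h"))

print(filled, limited, near, too_far, sep="\n\n")
```

With `limit=1`, only the first missing day after the last known value is filled; the remaining days stay `NaN`.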