# Indexing and slicing

Indexing selects individual elements or a subset of your data. One-dimensional arrays behave much like Python lists; in a two-dimensional array such as the one below, a single integer index selects a whole row:

In [1]:
import numpy as np

In [2]:
data = np.random.randn(7, 3)
data

array([[-1.10712504, -0.60473472,  0.71961573],
       [ 0.66161012,  0.75139849,  0.19096721],
       [ 0.37037092,  0.52287763, -0.12497468],
       [-0.38862188,  0.96257794, -0.30217841],
       [ 1.36113782,  0.59738272, -0.30496318],
       [-0.11677632, -1.87508217, -0.29912067],
       [ 0.73419997,  0.78876697,  2.24380827]])

In [3]:
data[4]

array([ 1.36113782,  0.59738272, -0.30496318])

In [4]:
data[2:4]

array([[ 0.37037092,  0.52287763, -0.12497468],
       [-0.38862188,  0.96257794, -0.30217841]])

In [5]:
data[2:4] = np.random.randn(2, 3)

In [6]:
data

array([[-1.10712504, -0.60473472,  0.71961573],
       [ 0.66161012,  0.75139849,  0.19096721],
       [-0.76425117,  0.13522634,  0.3471234 ],
       [-0.12183032, -0.91938001,  0.1812069 ],
       [ 1.36113782,  0.59738272, -0.30496318],
       [-0.11677632, -1.87508217, -0.29912067],
       [ 0.73419997,  0.78876697,  2.24380827]])

<div class="alert alert-block alert-info">
<b>Note:</b>
    <p>Unlike slices of Python lists, array slices are views of the original array. This means that the data is not copied and that any changes to the view are reflected in the original array.</p>
    <p>If you want to make a copy of a part of an <code>ndarray</code>, you can copy the array explicitly – for example with <code>data[2:5].copy()</code>.</p>
</div>
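
The difference between a view and an explicit copy can be checked directly. The following is a minimal sketch with a small illustrative array; the names `arr`, `view` and `copied` are assumptions made for this example and do not appear elsewhere in this section:

```python
import numpy as np

arr = np.arange(10)            # illustrative array: 0, 1, ..., 9

view = arr[2:5]                # basic slice: shares memory with arr
copied = arr[2:5].copy()       # explicit copy: owns its own memory

view[:] = -1                   # writes through to arr
copied[:] = 99                 # leaves arr untouched

print(arr)                             # [ 0  1 -1 -1 -1  5  6  7  8  9]
print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, copied))   # False
```

`np.shares_memory` is a convenient way to find out whether two arrays refer to the same underlying data when you are unsure whether an operation returned a view.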

_Slicing_ in this way always results in array views with the same number of dimensions. However, if you mix integer indices and slices, you get arrays with fewer dimensions. For example, we can select the second row but only the first two columns as follows:

In [7]:
data[1, :2]

array([0.66161012, 0.75139849])

A colon by itself means that the entire axis is taken, so you can slice only the higher-dimensional axes:

In [8]:
data[:, :1]

array([[-1.10712504],
       [ 0.66161012],
       [-0.76425117],
       [-0.12183032],
       [ 1.36113782],
       [-0.11677632],
       [ 0.73419997]])
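
How integer indices and slices affect the number of dimensions is easiest to see from the resulting shapes. This is a small sketch on an illustrative 7×3 array; `grid` and the other variable names are assumptions made for this example:

```python
import numpy as np

grid = np.arange(21).reshape(7, 3)   # illustrative 7x3 array

row_int = grid[1, :2]       # integer index: the row axis is dropped
row_slice = grid[1:2, :2]   # slice: the row axis is kept with length 1
col_int = grid[:, 0]        # integer index: the column axis is dropped
col_slice = grid[:, :1]     # slice: the column axis is kept with length 1

print(row_int.shape)     # (2,)
print(row_slice.shape)   # (1, 2)
print(col_int.shape)     # (7,)
print(col_slice.shape)   # (7, 1)
```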

## Boolean indexing

Let’s consider an example where we have some data in an array and an array of names with duplicates. I will use the `randn` function in `numpy.random` here to generate some random normally distributed data:

In [9]:
names = np.array(['Liam', 'Olivia', 'Noah', 'Liam', 'Noah', 'Olivia', 'Liam', 'Emma', 'Oliver', 'Ava'])
data = np.random.randn(10, 4)

In [10]:
names

array(['Liam', 'Olivia', 'Noah', 'Liam', 'Noah', 'Olivia', 'Liam', 'Emma',
       'Oliver', 'Ava'], dtype='<U6')

In [11]:
data

array([[ 0.35867089,  1.54040782, -0.17802212,  0.96938897],
       [-1.01908575,  0.54822696, -2.05919441, -0.2672928 ],
       [-2.39936438,  0.47425433, -1.23474455,  0.23779314],
       [ 0.15202324,  0.12542864,  0.65341945,  1.19879063],
       [-0.40538607,  0.05871589, -0.21042647,  0.94201355],
       [-2.8102482 , -1.64395184,  0.54223449,  1.71858231],
       [ 0.28221344,  0.16457021,  0.73805459, -1.3346101 ],
       [-0.33565533, -0.10226983,  1.8015877 , -0.62330811],
       [-0.19469216,  0.73504504, -0.35507636, -0.74534923],
       [-0.45535431, -1.66129057,  0.13310822,  0.32116625]])

Suppose each name corresponds to a row in the data array and we want to select all rows with the corresponding name _Liam_. Like arithmetic operations, comparisons like `==` are vectorised with arrays. So comparing `names` with the string `'Liam'` results in a Boolean array:

In [12]:
names == 'Liam'

array([ True, False, False,  True, False, False,  True, False, False,
       False])

This Boolean array can be passed when indexing the array:

In [13]:
data[names == 'Liam']

array([[ 0.35867089,  1.54040782, -0.17802212,  0.96938897],
       [ 0.15202324,  0.12542864,  0.65341945,  1.19879063],
       [ 0.28221344,  0.16457021,  0.73805459, -1.3346101 ]])

Here, the Boolean array must have the same length as the array axis it indexes.
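
The length requirement can be illustrated with a mask that is one element too short. This is a sketch that assumes a current NumPy release, where a mismatched Boolean index raises an `IndexError`; the names `values`, `good_mask` and `bad_mask` are made up for this example:

```python
import numpy as np

values = np.arange(12).reshape(4, 3)                 # 4 rows, 3 columns
good_mask = np.array([True, False, True, False])     # length 4: matches axis 0
bad_mask = np.array([True, False, True])             # length 3: too short

selected = values[good_mask]     # rows 0 and 2
print(selected.shape)            # (2, 3)

try:
    values[bad_mask]
    mismatch_error = None
except IndexError as exc:        # assumption: behaviour of current NumPy releases
    mismatch_error = exc

print(type(mismatch_error).__name__)
```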

<div class="alert alert-block alert-info">
<b>Note:</b>
    <p>Selecting data from an array by Boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the returned array is unchanged.</p>
</div>
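
The copy semantics of Boolean selection can be verified in the same way as for slices. The sketch below uses a small illustrative array (`scores` and `positive` are assumed names) and also shows the contrasting case: assigning through a Boolean mask writes into the original array in place.

```python
import numpy as np

scores = np.array([3.0, -1.0, 4.0, -2.0])   # illustrative data
positive = scores > 0                        # Boolean mask

subset = scores[positive]    # Boolean selection returns a new array
subset[:] = 0.0              # changing the copy does not affect scores

print(np.shares_memory(scores, subset))   # False
print(scores)                             # [ 3. -1.  4. -2.]

scores[~positive] = 0.0      # assignment through a mask modifies scores
print(scores)                # [3. 0. 4. 0.]
```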

In the following example, I select the rows where `names == 'Liam'` and also index the columns:

In [14]:
data[names == 'Liam', 2:]

array([[-0.17802212,  0.96938897],
       [ 0.65341945,  1.19879063],
       [ 0.73805459, -1.3346101 ]])

To select everything except _Liam_, you can either use `!=` or negate the condition with `~`:

In [15]:
names != 'Liam'

array([False,  True,  True, False,  True,  True, False,  True,  True,
        True])

In [16]:
cond = names == 'Liam'
data[~cond]

array([[-1.01908575,  0.54822696, -2.05919441, -0.2672928 ],
       [-2.39936438,  0.47425433, -1.23474455,  0.23779314],
       [-0.40538607,  0.05871589, -0.21042647,  0.94201355],
       [-2.8102482 , -1.64395184,  0.54223449,  1.71858231],
       [-0.33565533, -0.10226983,  1.8015877 , -0.62330811],
       [-0.19469216,  0.73504504, -0.35507636, -0.74534923],
       [-0.45535431, -1.66129057,  0.13310822,  0.32116625]])

To select rows for two of the names, you can combine several Boolean conditions with the Boolean arithmetic operators `&` (and) and `|` (or).

<div class="alert alert-block alert-warning">
<b>Warning:</b>
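    <p>The Python keywords <code>and</code> and <code>or</code> do not work with Boolean arrays: they try to reduce each array to a single truth value, which raises a <code>ValueError</code> for arrays with more than one element. Use <code>&amp;</code> and <code>|</code> instead.</p>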
</div>
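
A short sketch of what goes wrong with the keywords, using an illustrative `labels` array that is an assumption of this example. Note also that each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==`:

```python
import numpy as np

labels = np.array(['Liam', 'Olivia', 'Noah', 'Liam'])   # illustrative data

try:
    # `or` asks for the truth value of a whole array, which is ambiguous
    (labels == 'Liam') or (labels == 'Olivia')
    keyword_error = None
except ValueError as exc:
    keyword_error = exc

print(type(keyword_error).__name__)   # ValueError

# Element-wise combination with the | operator works as intended
combined = (labels == 'Liam') | (labels == 'Olivia')
print(combined)                       # [ True  True False  True]
```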

In [17]:
mask = (names == 'Liam') | (names == 'Olivia')

In [18]:
mask

array([ True,  True, False,  True, False,  True,  True, False, False,
       False])

In [19]:
data[mask]

array([[ 0.35867089,  1.54040782, -0.17802212,  0.96938897],
       [-1.01908575,  0.54822696, -2.05919441, -0.2672928 ],
       [ 0.15202324,  0.12542864,  0.65341945,  1.19879063],
       [-2.8102482 , -1.64395184,  0.54223449,  1.71858231],
       [ 0.28221344,  0.16457021,  0.73805459, -1.3346101 ]])

## Integer array indexing

Integer array indexing allows you to select arbitrary elements of an array based on their N-dimensional index. Each integer array represents a number of indices into that dimension.
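
As a minimal sketch of how this looks in practice, the following example indexes an illustrative 8×4 array with lists of integers; `table` and the other variable names are assumptions made for this example:

```python
import numpy as np

table = np.arange(32).reshape(8, 4)   # illustrative 8x4 array

# One integer list: select whole rows in the given order
rows = table[[4, 3, 0, 6]]
print(rows[:, 0])        # [16 12  0 24]

# Negative indices count from the end of the axis
last = table[[-1, -3]]
print(last[:, 0])        # [28 20]

# One list per axis: the lists are paired up element by element,
# selecting the elements at (1, 0), (5, 3) and (7, 1)
pairs = table[[1, 5, 7], [0, 3, 1]]
print(pairs)             # [ 4 23 29]

# Like Boolean indexing, integer array indexing returns a copy
print(np.shares_memory(table, rows))   # False
```

Passing one list per axis does not select a rectangular block; the result is a one-dimensional array with one element per index pair.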

<div class="alert alert-block alert-info">
<b>See also:</b>
    <ul>
        <li><a href="https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing">Integer array indexing</a></li>
    </ul>
</div>