# Assignment 24: Fancy Indexing and Masking in NumPy #

### Goals for this Assignment ###

By the time you have completed this assignment, you should be able to:

- Use NumPy's _fancy indexing_ to access a subset of array elements by index at once
- Use NumPy's _masking_ to access a subset of array elements based on Boolean values at once

## Step 1: Use Fancy Indexing to Access Multiple NumPy Array Elements by Index ##

### Background: NumPy's Fancy Indexing ###

In the prior assignment, we saw that NumPy array elements can be individually accessed by index, as with:

In [1]:
import numpy as np
arr = np.array([9, 2, 6, 3, 1])
print(arr[1]) # prints 2

2


It turns out that you can access multiple elements of a NumPy array at once by passing a Python list or a NumPy array of integer indices between the square brackets (`[]`).
(Most other sequences of integers work as well, but a tuple does not: NumPy treats a tuple as a multidimensional index rather than as a list of positions.)
The end result is a NumPy array containing a copy of the requested elements.
This is illustrated in the next cell.

In [2]:
subset = arr[[1, 0, 4]]
print(subset) # prints [2 9 1]

[2 9 1]


The value of `subset` follows from the fact that `2`, `9`, and `1` are in `arr` at indices `1`, `0`, and `4`, respectively.
That is, we've effectively accessed multiple array elements at once by index, in the specific order requested.
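
As a small supplementary sketch (not part of the original assignment), the index list can also be a NumPy array, and the same index may appear more than once. The names `data` and `positions` are hypothetical and chosen only for illustration.

```python
import numpy as np

data = np.array([9, 2, 6, 3, 1])

# The indices may be supplied as a NumPy array instead of a Python list.
positions = np.array([1, 0, 4])
from_array = data[positions]
print(from_array)  # prints [2 9 1]

# An index may be repeated; the element is copied once per occurrence.
repeated = data[[4, 4, 0]]
print(repeated)  # prints [1 1 9]
```

In both cases the result has one element per index provided, in the order the indices were given.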

Note that `subset` here is a true copy, not a view.
We can see this in action in the following cell.

In [3]:
subset[0] = 42
print(subset) # prints [42  9  1]
print(arr) # prints [9 2 6 3 1]

[42  9  1]
[9 2 6 3 1]


As shown, while `subset[0]` corresponds to `arr[1]`, `subset` is actually a copy of the corresponding elements in `arr`, so `subset` can be modified independently of `arr`.

We can similarly use fancy indexing to assign to multiple array elements at once, as shown below.

In [4]:
arr = np.array([9, 2, 6, 3, 1]) # copied for convenience
arr[[0, 4]] = [42, 57]
print(arr) # prints [42  2  6  3 57]

[42  2  6  3 57]


In this case, since we use fancy indexing to provide two indices, we need to provide a list of two values to assign on the right-hand side of the `=`.
Each value provided on the right-hand side of the `=` will be assigned in the same order as the indices provided on the left-hand side of the `=`.
In other words, the line:

```python
arr[[0, 4]] = [42, 57]
```

...effectively does the same thing as:

```python
arr[0] = 42
arr[4] = 57
```

This all may rightfully raise the question: _why have fancy indexing_?
After all, it is really just a way to work with multiple array elements at a time, so it doesn't strictly add any functionality we didn't already have.
There are a few reasons to use it anyway:

- Fancy indexing can be more convenient and compact for performing certain operations, particularly those that do require us to manipulate multiple elements of an array.  Depending on the task at hand, it can be possible to avoid loops entirely, and instead combine fancy indexing with some operations we will see in later assignments.
- Instead of hard-coding the indices to access, we can instead provide an expression evaluating to these indices.  This allows us to write an expression which computes indices that we wish to access as the program runs, which is overall very flexible.
- In terms of performance, if we are accessing a large number of array elements, doing so with one use of fancy indexing will almost assuredly perform better than doing so with a Python loop.  With fancy indexing, most of the work is done in NumPy, and NumPy itself calls out to C to do most of its work.  C is generally much faster than Python, meaning going through NumPy (and transitively C) will very likely be much faster than doing everything directly in Python.
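
The second point above, computing the indices as the program runs, can be sketched as follows. This is a minimal illustration rather than part of the assignment; the names `values`, `even_positions`, `picked`, and `picked_loop` are hypothetical, and the loop version is included only for comparison.

```python
import numpy as np

values = np.array([9, 2, 6, 3, 1])

# Compute the indices at runtime instead of hard-coding them:
# here, every other position starting from index 0.
even_positions = np.arange(0, values.size, 2)
picked = values[even_positions]
print(even_positions)  # prints [0 2 4]
print(picked)          # prints [9 6 1]

# The same selection written with a Python-level loop, for comparison.
picked_loop = np.array([values[i] for i in range(0, values.size, 2)])
print(np.array_equal(picked, picked_loop))  # prints True
```

Both versions select the same elements; the fancy-indexing version hands the whole selection to NumPy in a single operation.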

As a more useful example of fancy indexing, we can use this to copy the values to a new, _reversed_ array, like so:

In [5]:
example = np.array([8, 4, 9])
reversed_example = example[[2, 1, 0]]
print(example) # prints [8 4 9]
print(reversed_example) # prints [9 4 8]

[8 4 9]
[9 4 8]


### Try this Yourself ###

In the next cell, define a function named `reverse_arr`, which will take a NumPy array and return a new NumPy array.
The returned NumPy array should contain the same elements as the input NumPy array, but in reverse order.
Your definition of `reverse_arr` **must** perform this reversing via fancy indexing.
It's expected that your body of `reverse_arr` will look similar to the code in the prior cell constructing `reversed_example`.
As a hint, you may find `range` useful to construct a list, in reverse order, of the indices in the original input array.

In [8]:
# Define reverse_arr here.  Leave the calls in place in order to test your code.
def reverse_arr(arr):
    n = arr.size
    reversed_indices = list(range(n - 1, -1, -1))
    return arr[reversed_indices]

print(reverse_arr(np.array([3, 8, 4, 5]))) # should print [5 4 8 3]
print(reverse_arr(np.array([4, 2, 4, 7, 9]))) # should print [9 7 4 2 4]
print(reverse_arr(np.array([]))) # should print []

[5 4 8 3]
[9 7 4 2 4]
[]


## Step 2: Use Masking to Conditionally Access Elements in a NumPy Array ##

### Background: Masking in NumPy ###

When indexing into a NumPy array, you can instead pass a list of Boolean values of the same length as the array being indexed.
(You can also pass a NumPy array of Booleans.)
For each corresponding index, `True` values will be retained in a new output array, and `False` values will not.
This is shown in the cell below.

In [6]:
arr = np.array([9, 2, 6, 3, 1]) # copied for convenience
new_arr = arr[[True, False, False, True, False]]
print(arr) # [9 2 6 3 1]
print(new_arr) # [9 3]

[9 2 6 3 1]
[9 3]


To explain the code in the prior cell a bit, the expression `arr[[True, False, False, True, False]]` effectively says the following:

- `arr` is assumed to be a NumPy array holding five elements.  The five elements correspond to each of the five Boolean values provided.
- For every `True` value in the list `[True, False, False, True, False]`, the value at the corresponding index in `arr` will be copied to the result NumPy array `new_arr`.  In this case, there is a `True` value at index `0` and index `3`, and so `new_arr` will contain the values in `arr` at those indices.  In other words, the expression `arr[[True, False, False, True, False]]` produces the same result as `arr[[0, 3]]`.
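
The equivalence between a mask and a list of indices can be checked directly. The following is a minimal sketch, not part of the original assignment; the names `mask`, `by_mask`, and `by_index` are hypothetical, and `np.flatnonzero` is used here only to show which positions of the mask are `True`.

```python
import numpy as np

arr = np.array([9, 2, 6, 3, 1])
mask = np.array([True, False, False, True, False])

by_mask = arr[mask]      # select with Boolean values
by_index = arr[[0, 3]]   # select the same positions by index
print(by_mask)   # prints [9 3]
print(by_index)  # prints [9 3]
print(np.array_equal(by_mask, by_index))  # prints True

# np.flatnonzero reports the positions of the True values in the mask.
print(np.flatnonzero(mask))  # prints [0 3]
```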

This ability to pass Boolean values in this manner is referred to as _masking_ in NumPy.
[Masking](https://en.wikipedia.org/wiki/Mask_(computing)) is a more general term in Computer Science, usually referring to somehow extracting certain desired values from a list-like structure of values, often with something that looks like a list of Boolean values.
The term "masking" refers to the idea that we put a mask over the whole structure, and the holes in the mask allow us to see just the parts we want to see.

As with fancy indexing, masking yields a whole new array, not a view.
This means you can modify the resulting array from masking without any fear of modifying the original data.
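
As a quick check of this claim, the sketch below modifies the result of a mask and confirms the original array is unchanged. The names `source` and `masked` are hypothetical, and `np.shares_memory` is used only as an additional confirmation that the two arrays do not share storage.

```python
import numpy as np

source = np.array([9, 2, 6, 3, 1])
masked = source[[True, False, False, True, False]]

# Modify the masked result; the original array should not change.
masked[0] = 42
print(masked)  # prints [42  3]
print(source)  # prints [9 2 6 3 1]

# The two arrays do not share underlying memory.
print(np.shares_memory(source, masked))  # prints False
```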

Masking has use cases, benefits, and functionality very similar to those of fancy indexing.
In practice, masking is very common, and is used as a way of concisely filtering through an input set and isolating out only those values you care about.
You can effectively use both masking and fancy indexing to do something more akin to a loop, but without all the extra syntax, and with high-performance vector-based operations.
You will use masking more in the next assignment to achieve higher-level tasks; the content in this assignment is presented first to help cut down the size of the later assignment and make it more digestible.

### Try this Yourself ###

In the next cell, the NumPy array `base` has been defined, which contains some integers.
Below it is a series of `print` calls which print out the values contained in some variables.
For each of these `print` calls, define a variable with the appropriate name, which is initialized by performing masking on `base` with some list of Boolean values.
The first one is done for you as an example.

In [9]:
base = np.array([9, 4, 2, 1, 2])

arr1 = base[[False, False, True, False, True]]
print(arr1) # should print [2 2]

arr2 = base[[True, False, False, False, False]]
print(arr2) # should print [9]

arr3 = base[[False, True, False, True, False]]
print(arr3) # should print [4 1]

arr4 = base[[True, False, False, True, True]]
print(arr4) # should print [9 1 2]

[2 2]
[9]
[4 1]
[9 1 2]


## Step 3: Submit via Canvas ##

Be sure to **save your work**, then log into [Canvas](https://canvas.csun.edu/).  Go to the COMP 502 course, and click "Assignments" on the left pane.  Then click "Assignment 24" and upload the `24_numpy_fancy_indexing_masking.ipynb` file.

You can turn in the assignment multiple times, but only the last version you submitted will be graded.

### Special Thanks to Dr. Glenn Bruns ###

Special thanks to [Dr. Glenn Bruns](https://csumb.edu/scd/glenn-bruns/) at California State University, Monterey Bay, for providing me with closely-related materials which were used in the creation of this assignment.