<a href="https://colab.research.google.com/github/xMCTH/DSFMCTH/blob/main/06_Numpy_indexing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6. Numpy indexing

Often we need to extract some information from a Pandas DataFrame. Here also, Pandas inherits many of the approaches used in Numpy. Therefore we start here by very briefly showing how to proceed with plain arrays before looking at the more complex DataFrames.

Note that Numpy indexing is very powerful and that we cover only a tiny fraction of the topic here. To learn more, you can for example read the [Numpy reference](https://numpy.org/doc/stable/reference/arrays.indexing.html#indexing).

In [None]:
# import numpy as np

We first create an array:

In [None]:
# my_array = np.random.normal(size=10)
# my_array

## Extracting and setting elements

The standard way to extract information from an array is to use the square bracket notation. If we want for example to extract the second element of the array we write:



In [None]:
# my_array[1]

Remember that **we start counting from 0** in Python, which is why the *second* element has index 1.

We can extend the notation and extract a range of elements by using the ```from_index:to_index (excluded)``` notation. Here ```excluded``` means that the **last index** specified is **not included**. For example if we want to recover elements with indices from 1 to 3 we write:

In [None]:
# my_array[1:4]
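
Because `my_array` contains random values, the effect of the excluded end index can be easier to see on an array whose values equal their own indices. The following is a minimal sketch; the name `demo` is introduced here only for illustration and is not used elsewhere in this notebook.

```python
import numpy as np

# Hypothetical helper array: each value equals its own index (0, 1, ..., 9)
demo = np.arange(10)

second = demo[1]      # single element at index 1, i.e. the second element
middle = demo[1:4]    # indices 1, 2 and 3; index 4 is excluded

print(second)         # 1
print(middle)         # [1 2 3]
print(len(middle))    # 3, which is to_index - from_index
```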

We can also set values in the array in the same manner. For example let's set the above elements to 10:

In [None]:
# my_array[1:4] = 10

In [None]:
# my_array
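
As a small sketch of the different ways to set values, the example below uses a hypothetical array `demo` of zeros so that the modified positions stand out. It shows setting a single element, writing one value to a whole slice, and writing one value per selected element.

```python
import numpy as np

# Hypothetical array of six zeros, used only for illustration
demo = np.zeros(6)

demo[0] = 5            # set a single element
demo[1:4] = 10         # the same value is written to indices 1, 2 and 3
demo[4:6] = [7, 8]     # one value per selected element (lengths must match)

print(demo)            # the values are now 5, 10, 10, 10, 7, 8
```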

Note that you can sometimes simplify the notation. For example, if you want to extract all elements from index 4 (the 5th element) **to the last one**, you don't have to specify the last index; you can simply leave it out after the ```:```:


In [None]:
# my_array[4:]
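
The same shortcut also works for the start index. Below is a minimal sketch on a hypothetical array `demo` whose values equal their indices, so that the selected ranges can be read off directly.

```python
import numpy as np

# Hypothetical helper array with values 0, 1, ..., 9
demo = np.arange(10)

tail = demo[4:]        # from index 4 to the end
head = demo[:4]        # from the start up to, but not including, index 4
everything = demo[:]   # both indices omitted: all elements

print(tail)            # [4 5 6 7 8 9]
print(head)            # [0 1 2 3]
print(len(everything)) # 10
```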

## Higher dimensions

We have seen before that we can create arrays with more than one dimension (think e.g. of the pixels of an image). For example:

In [None]:
# array2D = np.random.normal(size=(3,5))
# array2D

The indexing system works in the same way here. We just have to specify now for each dimension which rows/columns we want to extract with ```my_array[start_row:end_row, start_column:end_column]```:

In [None]:
# array2D[1:3, 0:2]

Here again, we can simplify the notation. If we want to select a few rows but **want to keep all columns**, we can again use the ```:``` notation like this:

In [None]:
# array2D[1:3, :]
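
To make the row and column selection easier to follow, here is a sketch on a hypothetical 3x5 array `grid` filled with the numbers 0 to 14 instead of random values. The last line additionally shows that a single index (rather than a range) can be used for one dimension.

```python
import numpy as np

# Hypothetical 3x5 array with values 0..14, filled row by row
grid = np.arange(15).reshape(3, 5)

block = grid[1:3, 0:2]   # rows 1 and 2, columns 0 and 1
rows = grid[1:3, :]      # rows 1 and 2, all columns
column = grid[:, 0]      # all rows, first column only

print(block)             # 2x2 array containing 5, 6 and 10, 11
print(rows.shape)        # (2, 5)
print(column)            # [ 0  5 10]
```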

## Working with sub-parts

Using indexing, we can also create a smaller array that we want to work on specifically. For example let's say we are only interested in the last three elements (indices 7 to 9). We can **extract** them and **assign** them to a new array:

In [None]:
# sub_array = my_array[7:10]

In [None]:
# my_array

In [None]:
# sub_array

Let's now modify an element of this subarray:

In [None]:
# sub_array[0] = 100

Let's check that ```sub_array``` has indeed changed:

In [None]:
# sub_array

Let's now also have a look at the original array:

In [None]:
# my_array

**The value in the original array has changed too!** The reason is that slicing an array **does not create an independent sub-array**: the slice is a view that still shares its data with the original one. Modifying elements of the slice in place, as we did above, therefore also modifies the original array. To be on the safe side, explicitly create a **copy** when creating a sub-array, so that it is independent from the original one:

In [None]:
# sub_array = my_array[7:10].copy()
# sub_array[0] = 200

In [None]:
# sub_array

In [None]:
# my_array
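
One way to check whether two arrays are still linked is `np.shares_memory`. The sketch below uses hypothetical names (`original`, `view`, `independent`) and a non-random array so that the effect of each assignment is easy to verify.

```python
import numpy as np

# Hypothetical array of floats 0.0, 1.0, ..., 9.0
original = np.arange(10.0)

view = original[7:10]                 # plain slice: shares data with original
independent = original[7:10].copy()   # explicit copy: has its own data

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

view[0] = 100          # also changes original[7]
independent[1] = 200   # leaves original[8] untouched

print(original[7], original[8])       # 100.0 8.0
```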

## Boolean indexing

Instead of using numerical indices to extract values from the array, we can also select them by some criteria. Let's create a new random array:

In [None]:
# my_array2 = np.random.normal(size=10)
# my_array2

How do we proceed now if, for example, we only want to recover the elements that are larger than 0?

Let's try to see what happens when we just write it down as we would in regular mathematics:

In [None]:
# my_array2 > 0

We see that the output is again an array, but instead of being filled with numbers, it contains only ```False``` and ```True```. Those values also exist in plain Python and are called booleans. For example:

In [None]:
# a = 3
# a > 10

We can now create an actual boolean array:

In [None]:
# bool_array = my_array2 > 0
# bool_array

We can now use this **boolean array** ```bool_array``` to extract values from any array of the same size. Imagine that you overlay ```bool_array``` on another array ```value_array``` and only select those values in ```value_array``` whose position is ```True``` in ```bool_array```. Naturally we can do this with the original array itself. Instead of passing an index as in ```my_array[i]```, we pass the entire ```bool_array```:

In [None]:
# from IPython.display import Image
# Image(url='https://github.com/guiwitz/ISDAwPython_day2/raw/master/images/logical_indexing.jpeg',width=700)

In [None]:
# my_array2[bool_array] 

Naturally this output array is smaller than the original one, as it only contains the values larger than 0.
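
The following sketch repeats these steps on a small array with known values, and also applies the same boolean array to a second array of the same size. The names `values`, `labels` and `mask` and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data: five numbers and one label per number
values = np.array([-1.5, 0.3, 2.0, -0.2, 0.7])
labels = np.array(["a", "b", "c", "d", "e"])

mask = values > 0             # False, True, True, False, True

positive = values[mask]       # keeps 0.3, 2.0 and 0.7
kept_labels = labels[mask]    # same mask applied to another array: b, c, e

print(positive)
print(kept_labels)
print(mask.sum())             # 3, since each True counts as 1
```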

## Exercise

1. Create a numpy array with values from 0 to 10 in steps of 0.5
2. Extract the last three elements of the array using slicing.
3. Apply a cosine function to the full array created in (1.) and store the output in a new array.
4. Create a boolean array telling which values in the array from (3.) are smaller than 0.
5. Recover only those values in a new array via indexing.

# Submit one question regarding anything in the course so far

Via this [link](https://docs.google.com/forms/d/e/1FAIpQLSeQskEubSUMw1lCRBBnOx5orKjCiMgQ-AWgkliu0Nnf-XEfvA/viewform?usp=sf_link) (mandatory)