# Array Creation, Indexing, and Slicing

In numpy everything is centered around the `ndarray` object, which is usually just called an *array*. To quote [the docs](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html), "\[an `ndarray`\] object represents a multidimensional, homogeneous array of fixed-size items". In many cases, a working analogy for an `ndarray` is a python `list` that has a fixed size and only contains items of the same type, e.g. all integers or all floats. In N dimensions the analogy is a nested list of lists where each list on the same level of nesting has the same number of elements. In fact, it is possible to convert between a python list of lists and an ndarray. That said, under the hood there are some significant differences between a list and an ndarray, and we will look at some of them later in this module.

Given that we can convert between lists and ndarrays, let's look at some valid and invalid examples of lists that can be converted to arrays.

In [None]:
a0 = 42
a1 = [5, 7, 2, 59]
a2 = [3, False, "a text"]
a3 = [4.5, 99.3, 24]
a4 = [[1, 0], [0, 1]]
a5 = [[1, 0, 0], [1, 1], [0, 0, 1]]
a6 = [[4]]
a7 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]

### Exercise 1

Can you guess which of the above lists can be converted into `ndarray`s?

<details>
<summary>Answers (click me to reveal)</summary>

a0. Valid. While `a0` is not a list itself, numpy is clever enough to understand that this is a scalar.

a1. Valid. `a1` is a typical example of a vector (1 dimensional array) that you will encounter frequently.

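a2. Invalid in the sense of our analogy, because the items are not of the same type. Note that `np.array(a2)` does not raise an error; numpy silently converts every item to a common type (here: strings), which is rarely what you want.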

a3. Valid. Another example of a 1d-array.

a4. Valid. A 2-dimensional array. This example represents a 2x2 identity matrix.

a5. Invalid. `len(a5[1]) == 2`, however the length of the other nested lists is 3.

a6. Valid. A 2-dimensional array with dimensions (1, 1), containing a single element. Note that this is different from a scalar.

a7. Valid. A 3-dimensional array (sometimes called a rank-3 tensor) with dimensions (3, 2, 2)
</details>
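The conversion in both directions can be sketched as follows. This is a minimal illustration; the variable names (`nested`, `arr`, `round_trip`, `scalar`, `single`) are made up for this example and are not used elsewhere in the notebook.

```python
import numpy as np

# a nested list where every list on the same level has the same length
nested = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]

arr = np.array(nested)       # list-of-lists -> ndarray
print(arr.shape)             # (3, 2, 2)

round_trip = arr.tolist()    # ndarray -> list-of-lists
print(round_trip == nested)  # True

# a scalar and a 1x1 matrix are different things
scalar = np.array(42)
single = np.array([[4]])
print(scalar.shape, scalar.ndim)  # () 0
print(single.shape, single.ndim)  # (1, 1) 2
```

`ndarray.tolist()` returns plain python objects, so the round trip compares equal to the original nested list.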

To create an `ndarray` from a given list, or any other valid sequence of sequences, numpy offers a convenience function called `array`. It takes a sequence of sequences of sequences ... of primitives as input and returns the ndarray object.

In [None]:
import numpy as np

my_first_array = np.array([[1, 2, 3], [4, 5, 6]])
print(my_first_array)

An array object has several attributes that describe it. The most important ones are:

- **shape**: An N-tuple containing the number of elements along each axis. In our list-of-lists analogy this is the length of each list at the respective nesting depth, e.g. if the shape is `(2, 3, 4)` then the outermost list contains 2 elements, each list at depth 1 contains 3 elements, and each list at depth 2 contains 4 elements. An array's shape is its most important attribute, and many problems you will encounter initially can be solved by investigating the shapes of the arrays involved.
- **dtype**: The data type of each element in the array. In our list-of-lists analogy the dtype is the item's type at the deepest level of nesting, i.e., the type of the items of those lists that don't contain lists themselves. Typically, these are numeric types, of which numpy offers a greater variety than python, e.g., `np.float32`, `np.uint8`, or `np.complex128` to name a few.
- **ndim**: In numpy a single dimension is called an *axis*, and the total number of axes that span an array is called its dimensionality. It is the nesting depth in our list-of-lists analogy, i.e., the dimensionality of an array is equal to the number of lists that need to be nested inside each other to represent the array as a list-of-lists. The dimensionality is sometimes called *rank*.

Here is how to access these attributes:

In [None]:
print("my_first_array has ...")
print(f"... a shape: {my_first_array.shape}")
print(f"... a data type: {my_first_array.dtype}")
print(f"... a dimensionality: {my_first_array.ndim}")
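To connect these attributes back to the list-of-lists analogy, here is a small sketch. The names `cube` and `as_lists` are made up for this example, and `ndarray.size` (the total number of elements) is used in addition to the three attributes listed above.

```python
import numpy as np

cube = np.zeros((2, 3, 4))   # 3-dimensional array filled with zeros
as_lists = cube.tolist()     # the equivalent nested python lists

# the shape is the list length at each nesting depth
print(cube.shape)                                            # (2, 3, 4)
print(len(as_lists), len(as_lists[0]), len(as_lists[0][0]))  # 2 3 4

# the dimensionality is the length of the shape tuple
print(cube.ndim == len(cube.shape))  # True

# the total number of elements is the product of the shape entries
print(cube.size)   # 24

# np.zeros defaults to double precision floats
print(cube.dtype)  # float64
```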

## Changing Shape, Axis Order, and Data Type

Since an array has the properties mentioned above, it makes sense to ask how they can be changed. The shape can be changed via `ndarray.reshape`:



In [None]:
new_array = my_first_array.reshape((1, 6))
print(new_array)

`reshape` takes the new shape (as a tuple) as input and, in this example, returns an array of shape (1, 6). Note that the original array is *not* modified in this process. `new_array` will either be a copy of the original array, or - more likely - a view into `my_first_array`; we will cover views in section 4 of this module.

A speciality of the `ndarray.reshape` method is that you are allowed to use a single `-1` in the tuple describing the new shape. The `-1` will be replaced with the appropriate number based on the size of the array (its total number of elements). For example, the following is allowed:

In [None]:
new_array = my_first_array.reshape(3, -1)
print(new_array)
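The behaviour of `-1` and the view-versus-copy remark can be checked with a short sketch. The names `base` and `grid` are made up for this example; `np.shares_memory` is a numpy helper that reports whether two arrays use overlapping memory.

```python
import numpy as np

base = np.arange(6)           # [0 1 2 3 4 5]
grid = base.reshape((2, -1))  # the -1 is inferred as 6 / 2 = 3
print(grid.shape)             # (2, 3)

# for a simple contiguous array like this one, reshape returns a view
print(np.shares_memory(base, grid))  # True

# writing through the view is visible in the original array
grid[0, 0] = 99
print(base[0])  # 99

# a shape that does not fit the number of elements is rejected
try:
    base.reshape((4, -1))  # 6 elements cannot be split into 4 rows
except ValueError as err:
    print(f"reshape failed: {err}")
```

Because the reshaped array may share memory with the original, it is worth making an explicit copy (e.g. with `np.copy`) before modifying it if the original must stay untouched.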

To change the datatype of the array you can use `ndarray.astype` and numpy will try - to the extent possible - to convert between the requested types.

In [None]:
array_one = np.array([1, -2, -3, 4, 42])
print(f"Array One: {array_one}")
array_two = array_one.astype(np.float32)
print(f"Array Two: {array_two}")

Be mindful when changing datatypes, because the result comes with all the pros and cons of the new datatype. This can lead to surprising results and hard-to-debug issues: your code will still run without raising an exception, but it will not produce the result you might expect. For example:

In [None]:
print(f"Something is off here: {array_one.astype(np.uint32)}")
print(f"or even more extreme after going back and forth: {array_one.astype(np.bool_).astype(int)}")
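What goes wrong in such conversions can be spelled out in a short sketch. The names `signed`, `unsigned`, `truncated`, and `flags` are made up for this example, and the float-to-int case is an extra illustration that is not part of the cell above.

```python
import numpy as np

signed = np.array([1, -2, -3, 4, 42])

# unsigned integers cannot hold negative numbers, so the values wrap around
unsigned = signed.astype(np.uint32)
print(unsigned)
print(int(unsigned[1]) == 2**32 - 2)  # True: -2 wrapped around modulo 2**32

# converting floats to integers drops the fractional part (no rounding)
truncated = np.array([1.7, -1.7, 0.5]).astype(int)
print(truncated)  # [ 1 -1  0]

# every non-zero value becomes True, so the original numbers are lost
flags = signed.astype(bool)
print(flags.astype(int))  # [1 1 1 1 1]
```

None of these conversions raises an exception, which is why it pays to check value ranges before changing the datatype.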

If you need to create a new array of a specific data type, e.g., because you are working with images whose pixel intensities range from 0 to 255 and you want to save some memory, then you can do that directly using the optional keyword argument `dtype=...` instead of calling `ndarray.astype` on the resulting object. Most array creation functions (more on them below) support the `dtype` argument.

In [None]:
print(f"The following is equivalent: {np.array([1, 2, 3], dtype=np.uint8)} and {np.array([1,2,3]).astype(np.uint8)}")

Finally, it is possible to reorder the axes of an array and to change an array's dimensionality. We mainly cover this here for completeness, and you will re-encounter it in section 3. To do so you can use

1. [`np.expand_dims`](https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html)
2. [`np.moveaxis`](https://numpy.org/doc/stable/reference/generated/numpy.moveaxis.html)

These allow you to perform operations like

In [None]:
some_array = np.array([x for x in range(3*3*4*5)]).reshape((3, 3, 4, 5))
new_array = np.expand_dims(some_array, (2))
print(f"Comparing the dimensions: {some_array.ndim} vs {new_array.ndim}")
print(f"... also the shapes: {some_array.shape} vs {new_array.shape}")

In [None]:
reordered = np.moveaxis(new_array, (0, 1), (1, 2))
print(f"Reordered axes: {reordered.shape}")

In practice, you will not need the above explicit form often. Instead of moving axes, many functions in the numpy API allow you to specify `axis=...`, which controls the axis along which the function operates, and instead of using `np.expand_dims`, you can also use `ndarray.reshape`. For example, you can achieve the above by inserting a clever `1` into the shape tuple via

In [None]:
print(f"The following is equivalent: {new_array.shape} and {some_array.reshape((3, 3, 1, 4, 5)).shape}")
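As a small sketch of the `axis=...` argument mentioned above, here is `np.sum` applied along different axes, followed by a check that `reshape` and `np.expand_dims` agree. The names `table` and `with_extra_axis` are made up for this example.

```python
import numpy as np

table = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

# axis=0 collapses the first axis (sums over the rows, one result per column)
print(np.sum(table, axis=0))  # [3 5 7]

# axis=1 collapses the second axis (sums over the columns, one result per row)
print(np.sum(table, axis=1))  # [ 3 12]

# adding an axis via reshape gives the same array as np.expand_dims
with_extra_axis = table.reshape(2, 3, 1)
print(np.array_equal(with_extra_axis, np.expand_dims(table, 2)))  # True
```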

### Exercise 2

In the cell below the next one, write code that changes the shape, dtype, and/or dimensionality of the arrays found below.

1. Turn `a0` into an array of double precision floating point numbers with shape (5, 5).
2. Turn `a1` into a 1-dimensional boolean array.
3. `a2` should have `shape=(?, 1, 2)`.

**Note:** A `?` in the shape is commonly found in tensorflow error messages (we will cover tensorflow in lab 2) and means that the dimension must exist and may have length > 1.

In [None]:
a0 = np.array([x for x in range(25)])
a1 = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
a2 = np.array([x for x in range(42)])

<details>
<summary>Answers (click me to reveal)</summary>

There are multiple solutions for each problem. Below are some equivalent ones.

Answer for 1.

```
a0.reshape((5,5)).astype(np.double)
a0.reshape((5,5)).astype(np.float64)
a0.astype(np.float64).reshape((5,5))
```

Answer for 2. (a1)
```
a1.reshape(-1).astype(bool)
a1.ravel().astype(bool)
a1.flatten().astype(bool)
```
**Note**: `ravel()` and `flatten()` have not been introduced yet, but produce the same result as `reshape(-1)`. While `ravel()` and `reshape(-1)` are equivalent (from the user's perspective), `flatten()` will always copy the array. It is important to keep this in mind, because it can have (significant) performance implications.


Answer for 3. (a2)
```
a2.reshape((-1, 1, 2))

# solution 2
a2 = a2.reshape((-1, 2))
a2 = np.expand_dims(a2, 1)
```

</details>
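The difference between `ravel()` and `flatten()` can be made visible with `np.shares_memory`. This is a minimal sketch for a simple contiguous array; the names `matrix`, `raveled`, and `flattened` are made up for this example.

```python
import numpy as np

matrix = np.eye(3)

raveled = matrix.ravel()      # avoids copying when it can
flattened = matrix.flatten()  # always copies

print(np.shares_memory(matrix, raveled))    # True for this contiguous array
print(np.shares_memory(matrix, flattened))  # False

# writing into the raveled array changes the original matrix ...
raveled[0] = 5
print(matrix[0, 0])   # 5.0

# ... while the flattened copy is independent
print(flattened[0])   # 1.0
```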

## Array Creation

On top of creating arrays from a python list (of lists), numpy offers several functions to create specialized arrays. These functions are more readable, and tend to be more efficient than creating the list-of-lists equivalent. For example, `foo = np.ones((3, 4, 5))` will create an array filled with ones with `foo.shape == (3, 4, 5)`. We will cover some of these creation functions below, and it is generally advisable to use them if possible. Additionally, many third-party libraries offer their own ways of creating specialized numpy arrays. For example, the library [imageio](https://github.com/imageio/imageio) allows you to load image data (from almost 200 different formats) as `ndarray`s.

Numpy's array creation functions (Full List): [https://numpy.org/doc/stable/reference/routines.array-creation.html](https://numpy.org/doc/stable/reference/routines.array-creation.html)

Commonly used array creation functions:

In [None]:
# from a list of lists (or other nested sequences)
a0 = np.array([[1, 2], [3, 4]]) 

# from other arrays
a1 = np.asarray(a0)             # like np.array but tries to avoid copying data if possible
a2 = np.copy(a0)                # creates a copy of an existing array

# filled with constant values
shape = (3, 4)
a3 = np.ones(shape)
a4 = np.zeros(shape)
a5 = np.full(shape, 42)

# filled with constant values, matching another array
a6 = np.ones_like(a0)
a7 = np.zeros_like(a0)
a8 = np.full_like(a0, 42)

# numeric ranges
a9 = np.arange(start=1, stop=5, step=1)    # [1, 2, 3, 4]
a10 = np.linspace(start=0, stop=1, num=50)  # 50 evenly spaced numbers from 0 to 1

# special matrix creation functions
a11 = np.eye(3)          # identity matrix
a12 = np.diag([2, 3, 4]) # matrix with diagonal elements set to 2, 3, 4
a13 = np.tri(5)          # 5x5 matrix with 1 on lower triangle and 0 elsewhere
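A few properties of these creation functions are easy to overlook, so here is a short sketch of them. The name `template` is made up for this example.

```python
import numpy as np

template = np.array([[1, 2], [3, 4]], dtype=np.uint8)

# np.ones / np.zeros default to double precision floats ...
print(np.ones((3, 4)).dtype)                   # float64
# ... unless a dtype is requested explicitly
print(np.zeros((3, 4), dtype=np.int32).dtype)  # int32

# the *_like functions copy both shape and dtype from the given array
print(np.ones_like(template).shape)  # (2, 2)
print(np.ones_like(template).dtype)  # uint8

# arange excludes the stop value, linspace includes it
print(np.arange(1, 5))       # [1 2 3 4]
print(np.linspace(0, 1, 5))  # [0.   0.25 0.5  0.75 1.  ]
```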

### Exercise 3

Which special function can be used to create the following arrays given as lists of lists?

**Note**: This is a great opportunity to familiarize yourself with the [relevant parts](https://numpy.org/doc/stable/reference/routines.array-creation.html) of the library's documentation (which is generally excellent). Parts 2 and 4 require you to use additional parameters of the function, which you can read about in the docs.


In [None]:
lists = {
    "1.": [3, 5, 7, 9, 11, 13],
    "2.": [[1., 0., 0., 0.],
           [1., 1., 0., 0.],
           [1., 1., 1., 0.]],
    "3.": [[0., 0., 0., 0.],
           [0., 0., 0., 0.],
           [0., 0., 0., 0.]],
    "4.": [[ 0.  , 10.  ],
           [ 0.25,  9.75],
           [ 0.5 ,  9.5 ],
           [ 0.75,  9.25],
           [ 1.  ,  9.  ]],
    "5.": [[1, 0, 0, 0],
           [0, 3, 0, 0],
           [0, 0, 3, 0],
           [0, 0, 0, 7]]
}

<details>
<summary>Answers (click me to reveal)</summary>

1. np.arange(3, 15, 2)
2. np.tri(3, 4)
3. np.zeros((3, 4))
4. np.linspace((0, 10), (1, 9), 5)
5. np.diag([1, 3, 3, 7])

</details>

### Exercise 4

Create the following arrays using special functions and methods for changing shape and datatype introduced above.

**Note**: The first one is a pattern you will see very regularly. It is particularly useful when debugging code that reorganizes the array in some way (`np.moveaxis` or `ndarray.reshape`), because it allows you to more easily track what happens.

In [None]:
lists = {
    "1.": [[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]],
    "2.": [[ True, False, False],
           [ True,  True, False],
           [ True,  True,  True],
           [ True,  True,  True]],
    "3.": [0, 0, 1]
}

<details>
<summary>Answers (click me to reveal)</summary>

1. np.arange(9).reshape(3,3)
2. np.tri(4, 3, dtype=bool)
3. np.linspace(0, 1, 3).astype(int)

</details>

## Indexing and Slicing

Creating an array is usually just the first step. In most cases, we want to do something with it, and the most basic thing we can do is access individual elements or segments of the array. Just like our lists of lists, `ndarray` supports indexing using integers and `slice(start, end, step)` objects (you probably know slices in their shorthand form `start:end:step`, e.g. `some_list[4:6]`). The syntax that you know - and expect - from lists carries over.

In [None]:
some_array = np.arange(16).reshape(4, 4)
print(some_array)
print(f"Access single element at (2, 2): {some_array[2][2]}")
print(f"Access single element at (3, 1): {some_array[3][1]}")

As you may have noticed, the indexing follows the convention you may be used to from linear algebra: the first index selects the row of the matrix, and the second index selects the column. Since it is so common to select elements (and ranges) along multiple dimensions, an `ndarray` also supports so-called tuple-indexing. This allows you to omit some of the brackets and instead use a tuple with one entry per dimension to access the elements.

In [None]:
print(f"Access single element at (2, 2): {some_array[2, 2]}")
print(f"Access single element at (3, 1): {some_array[3, 1]}")
print(f"Access range at (3, :3): {some_array[3, :3]}")


Tuple-indexing is more general (and more efficient) than the list-of-list style demonstrated above, because it allows us to easily select sub-matrices of the array. It is often easier and less error-prone to stick to tuple-style indexing, because unexpected things can happen when using the traditional list-of-list style indexing:

In [None]:
print(f"Access top left corner w/o tuple-indexing:\n {some_array[:2][:2]}")
print(f"Access top left corner w/ tuple-indexing:\n {some_array[:2, :2]}")


### Exercise 5

Why does the list-of-list style index produce a new array with shape (2, 4) instead of the expected (2, 2) array?

<details>
<summary>Answers (click me to reveal)</summary>

The tuple-style index uses a single tuple to describe the section of the array we wish to return. The list-of-lists style index gets executed as two separate calls (just like an actual list-of-lists). The first call takes the original array and returns a new array that consists of the first two elements of the original array ([[0 1 2 3], [4 5 6 7]]). The second call takes the new (2-dimensional) array and, again, returns the first two elements along the first dimension. This is exactly the previous array ([[0 1 2 3], [4 5 6 7]]).

</details>
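The two-step evaluation described in the answer can be traced explicitly. This is a minimal sketch; the names `grid`, `first_step`, `chained`, and `tupled` are made up for this example.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# list-of-lists style: two separate indexing operations
first_step = grid[:2]     # first two rows
chained = first_step[:2]  # first two rows of the result, again
print(first_step.shape)   # (2, 4)
print(chained.shape)      # (2, 4)

# tuple style: one operation that slices both axes at once
tupled = grid[:2, :2]
print(tupled.shape)       # (2, 2)
```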

While not being a unique feature of tuple-indexing per se, you will see the empty slice (`slice(None, None, None)` or `:` for short) much more frequently here than in pure list indexing.

In addition to this, numpy also offers `np.newaxis` as a valid index. Its alias is `None`, and in fact `(np.newaxis is None) == True`. As the name suggests, it will add a new axis at the desired location analogous to `np.expand_dims`, and is yet another way to add dimensions to an array; similar to using `1` in the shape-tuple of `ndarray.reshape`. This may sound unnecessary at first glance, but will become clear and incredibly useful in section 3 when we talk about array broadcasting.

Finally, there is the `Ellipsis` object `...`. This is a very funny object, and was introduced into python specifically for numpy use. Outside of numpy it doesn't have a lot of uptake; however, inside of numpy, and in particular in tuple-indexing, it can be used to refer to "everything that hasn't been mentioned explicitly". This is particularly useful in machine learning while working on batched data. For example, you may have a batch of images that are stacked into a 4 dimensional array in the form of `(batch, height, width, colors)`. With an ellipsis accessing the second image becomes `image_batch[1, ...]`.

Let's look at some examples.

In [None]:
some_array = np.arange(25).reshape(5, 5)
print(some_array)
print(f"Access a specific row:\n {some_array[3, :]}")
print(f"Access a specific column:\n {some_array[:, 3]}")
print(f"Select a local area around (2,3) and add a dimension:\n {some_array[None, 1:4, 2:5]}")

ellipsis_example = np.arange(3*4*4).reshape(3,4,4)
print(f"Select the last matrix in a (2d) stack of matrices\n {ellipsis_example[2, :, :]}")
print(f"Do the same with an ellipsis\n {ellipsis_example[2, ...]}")
print(f"Works along other axes, too \n {ellipsis_example[..., 3]}")
print(f"And can be combined with all the other index options \n {ellipsis_example[:-2, None, ..., 3]}")  # here the use of ... or : are equivalent
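When working with `np.newaxis` and `...`, it often helps to look only at the resulting shapes. The sketch below does this for the image batch mentioned above; the batch size and image dimensions are arbitrary values chosen for this example, and the names `image_batch`, `second_image`, `first_channel`, and `vector` are made up.

```python
import numpy as np

# hypothetical batch: 8 images, 32x32 pixels, 3 color channels
image_batch = np.zeros((8, 32, 32, 3))

second_image = image_batch[1, ...]   # "everything else" after the batch axis
print(second_image.shape)            # (32, 32, 3)

first_channel = image_batch[..., 0]  # "everything else" before the color axis
print(first_channel.shape)           # (8, 32, 32)

# np.newaxis (or None) inserts an axis of length 1 at its position
vector = np.arange(3)
print(vector[:, np.newaxis].shape)   # (3, 1)
print(vector[np.newaxis, :].shape)   # (1, 3)

# this matches np.expand_dims
print(np.array_equal(vector[:, None], np.expand_dims(vector, 1)))  # True
```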

### Exercise 6

Given the arrays in the cell below, extract the requested sub-arrays.

1. Reverse `a0`.
2. Extract a 3x3 neighborhood around the element at index (3, 7) of `a1`.
3. Extract the 2x4 sub-matrix of `a1` that has the value 52 in its bottom right corner.
4. Remove the first and last element along each dimension of `a1`. Example: `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` --> `5`
5. Extract the largest sub-array of `a2` that contains only ones.
6. Extract the element/cell with value `2` of `a3`.

In [None]:
a0 = np.arange(22)
a1 = np.arange(81).reshape(9,9)
a2 = np.tri(11)
a3 = np.array([0, 1, 2, 3]).reshape(2, *(1 for _ in range(21)), 2)


<details>
<summary>Answers (click me to reveal)</summary>

1. `a0[::-1]`
2. `a1[2:5, 6:9]` or `a1[2:5, 6:]`
3. `a1[4:6, 4:8]`
4. `a1[1:-1, 1:-1]`
5. `a2[5:, :6]`
6. `a3[1, ..., 0]`
</details>

Finally, we can also set specific elements of an array by indexing it, similar to how it works in the list-of-lists example. Again we get tuple-indexing support; however, this time we cannot use `None`/`np.newaxis`. If we index a range of elements and assign values to them, then all the values in this sub-array are set. One way to utilize this is to assign an array to the selected range, or - foreshadowing broadcasting - to set all elements to the same value using a scalar.

In [None]:
some_array = np.arange(3*3).reshape(3,3)
print(f"The original array:\n {some_array}")

# a simple assignment on a single cell
some_array[1, 1] = 42
print(f"Setting a single element\n {some_array}")

some_array[1, :] = [31, 21, 11]
print(f"Setting an entire row:\n {some_array}")

some_array[1:, ...] = 18
print(f"Setting a region to a constant value: \n {some_array}")
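The assigned values have to fit the selected region. The following sketch shows a sequence, a scalar, and an array being assigned, and what happens when the shapes do not match. The names `board` and `patch` are made up for this example.

```python
import numpy as np

board = np.zeros((3, 3), dtype=int)

board[0, :] = [1, 2, 3]  # a sequence with one value per selected cell
board[:, 2] = 9          # a scalar is written into every selected cell

patch = np.ones((2, 2), dtype=int)
board[1:, :2] = patch    # an array whose shape matches the selected region

print(board)
# [[1 2 9]
#  [1 1 9]
#  [1 1 9]]

# values that do not fit the selected region are rejected
try:
    board[1, :] = [1, 2]  # two values for three cells
except ValueError as err:
    print(f"assignment failed: {err}")
```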

### Exercise 7

Create the following arrays using only functions introduced in this notebook, without using `np.array(...)`.

1. Create a null vector of size 10 in which only the fifth value is 1.
2. Create a 2d array with 1 on the border and 0 inside.
3. Create an 8x8 matrix and fill it with a checkerboard pattern of 1 and 0.
4. Create a 4x4 matrix where the two non-overlapping 2x2 blocks along the diagonal have different values.

<details>
<summary>Answers (click me to reveal)</summary>

Exercise 7.1
```
vector = np.zeros(10)
vector[4] = 1
```

Exercise 7.2
```
matrix = np.ones((10,10))
matrix[1:-1,1:-1] = 0
```

Exercise 7.3
```
matrix = np.zeros((8,8))
matrix[1::2, ::2] = 1
matrix[::2, 1::2] = 1
```

Exercise 7.4
```
matrix = np.zeros((4,4))
matrix[:2, :2] = 3
matrix[2:, 2:] = 7
```

</details>

## Final Exercise

Combine what you have learned, and - using only functions from this notebook (except for `np.array()`) - create the following array:

In [None]:
[[ 1.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.],
 [ 1.,  1.,  0.,  0.,  7.,  0.,  0.,  0.,  0.],
 [ 1.,  1.,  1.,  0.,  6.,  0.,  0.,  0.,  0.],
 [ 1.,  1.,  1.,  1.,  5.,  0.,  0.,  0.,  0.],
 [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.],
 [42., 42., 42., 42.,  3.,  5.,  0.,  0.,  0.],
 [42., -2., -2., 42.,  2.,  0.,  6.,  0.,  0.],
 [42., -2., -2., 42.,  1.,  0.,  0.,  7.,  0.],
 [42., 42., 42., 42.,  0.,  0.,  0.,  0.,  8.]]

<details>
<summary>Answers (click me to reveal)</summary>

```
some_array = np.diag(np.arange(9)).astype(np.float32)
some_array[5:, :4] = 42
some_array[6:-1, 1:3] = -2
some_array[:4, :4] = np.tri(4)
some_array[4] = np.arange(9)
some_array[:, 4] = np.arange(9)[::-1]
```
</details>