# 3.2 ndarrays
An `ndarray` (N-dimensional array) is a multidimensional container of items that all have the same type and size.

- **Create**
  
The most common way to create an `ndarray` is to pass a Python list (or a nested list) to the `numpy.array` function.

In [1]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6]])
print(arr)

[[1 2 3]
 [4 5 6]]


- **Size**

The above code snippet creates an `ndarray` of size 2 by 3 (also written as 2 x 3). An `ndarray` must have a fixed size along each dimension. For example, `[[1,2,3],[4,5,6]]` is a 2-dimensional array of size 2 by 3: the first dimension has a size of 2 and the second dimension has a size of 3. If we treat the first dimension as the rows and the second dimension as the columns, we can lay this array out as a 2-by-3 table.

~~~
|  1  |  2  |  3  |
|  4  |  5  |  6  |
~~~

An example of an array without a fixed size for every dimension is `[[1,2,3],[4,5]]`. The first dimension has a size of 2, but along the second dimension the first item has a size of 3 whereas the second item has a size of 2. It's perfectly fine to create such an array as a normal Python list. However, if we try to create it as an `ndarray`, we will receive an error, as shown in the following code snippet.

In [2]:
import numpy as np

pyarr = [[1,2,3],[4,5]]
arr = np.array(pyarr)
print(arr)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In NumPy 1.24 and later, running the code snippet above raises the `ValueError` shown. Some older versions only emitted a deprecation warning and created a 1-d array with two elements: the first element is the list `[1,2,3]`, and the second element is the list `[4,5]`. Creating an `ndarray` from lists of unequal length is not supported because an `ndarray` stores its data like a table, with a fixed size along each dimension. In particular, NumPy does **not** pad the shorter list to produce

```
[[1,2,3],
 [4,5,None]]
```
Therefore, be cautious and always check that the code produces what we expect.
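
If a ragged structure is really what we want, a minimal sketch is to request it explicitly with `dtype=object`. This gives the 1-d array of two Python lists described above instead of raising an error, at the cost of most of the speed and convenience of a numeric `ndarray`:

```python
import numpy as np

pyarr = [[1, 2, 3], [4, 5]]

# Explicitly request an array of Python objects instead of numbers
arr = np.array(pyarr, dtype=object)

print(arr.shape)  # (2,): a 1-d array with two elements
print(arr[0])     # [1, 2, 3]: the first element is a Python list
print(arr[1])     # [4, 5]
```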

- **Type**

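The intended use of an `ndarray` is to contain items of the same type. To control the type of the items, we can pass the desired type (the `dtype`) as the second argument when creating the array with `numpy.array`. NumPy then converts every item to that type and raises an exception if an item cannot be converted, which is useful when the data is constructed elsewhere and we want to check that the items are as intended. Note that this is a conversion rather than a strict check: for example, a float such as `1.5` is silently truncated to `1` when converted to an integer type.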

In [3]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6]], np.int32)
print(arr)

[[1 2 3]
 [4 5 6]]


The previous code snippet tells `numpy.array` that the items are supposed to be (and will be converted into) `numpy.int32`. If we provide a text character as one of the items, an exception is raised because it can't be converted to a `numpy.int32`.

In [4]:
import numpy as np

arr = np.array([[1,2,3],[4,5,'a']], np.int32)
print(arr)

ValueError: invalid literal for int() with base 10: 'a'

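The data type (`dtype`) passed as the second argument of `numpy.array` has to be something NumPy recognises as a data type, such as a NumPy scalar type like `np.int32` or `np.float64`, or its name as a string such as `'int32'`. The list of NumPy scalar types can be found [here](https://numpy.org/doc/stable/reference/arrays.scalars.html).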
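
The sketch below illustrates the conversion behaviour described above: floats passed with an integer `dtype` are truncated towards zero rather than rejected, a dtype can be named by a string, and `ndarray.astype` (a standard NumPy method not otherwise covered in this lesson) converts an existing array to another type:

```python
import numpy as np

# Floats are converted (truncated towards zero), not rejected
arr = np.array([1.5, 2.7, -3.9], np.int32)
print(arr)        # [ 1  2 -3]
print(arr.dtype)  # int32

# A dtype can also be given by name as a string
named = np.array([1, 2, 3], dtype="int32")
print(named.dtype)  # int32

# astype returns a new array converted to another dtype
farr = arr.astype(np.float64)
print(farr.dtype)  # float64
```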

- **Information about an ndarray**

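Once we have created an `ndarray`, we can use its attributes to access information about it: `ndim` (number of dimensions), `shape` (size of each dimension), `size` (total number of items), and `dtype` (type of the items).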

In [5]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6]], np.int32)
print(f"Number of dimensions: {arr.ndim}")
print(f"Shape: {arr.shape}")
print(f"Number of items: {arr.size}")
print(f"Item type: {arr.dtype}")

Number of dimensions: 2
Shape: (2, 3)
Number of items: 6
Item type: int32
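
Because every item has the same type, every item also has the same size in memory. A short sketch with two further standard attributes, `itemsize` and `nbytes`, and with `len()`, which only reports the first dimension:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

# Every item has the same type, so every item has the same size in bytes
print(f"Bytes per item: {arr.itemsize}")  # 4 for int32
print(f"Total bytes: {arr.nbytes}")       # 24 = 6 items * 4 bytes

# A 1-d array has a one-element shape tuple
vec = np.array([1, 2, 3])
print(vec.ndim, vec.shape)  # 1 (3,)

# len() gives the size of the first dimension only
print(len(arr))  # 2
```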


- **Create ndarray of specific values**

NumPy provides `numpy.empty` to create an array of a specific shape without initialising its items, so the items contain whatever values happen to be in memory, as the output below shows. To create an `ndarray` of shape 2 by 3, we use `numpy.empty([2,3])`. By default, the items have `dtype=numpy.float64`; this can be changed by passing the `dtype=...` argument.



In [6]:
import numpy as np

arr = np.empty([2,3])
print(arr)

[[9.34577196e-307 9.34598246e-307 1.60218491e-306]
 [1.69119873e-306 1.24611673e-306 1.05699581e-307]]


We can then use the `.fill` method of the `ndarray` to set all the items to a given value.



In [7]:
import numpy as np

arr = np.empty([2,3])
arr.fill(3)
print(arr)

[[3. 3. 3.]
 [3. 3. 3.]]


As shortcuts for this process, NumPy also provides the functions `numpy.ones`, `numpy.zeros`, and `numpy.full`:

- `numpy.ones` creates an array filled with 1's
- `numpy.zeros` creates an array filled with 0's 
- `numpy.full` creates an array filled with a specified value.

In [8]:
import numpy as np

shape = [2,3]
arr = np.ones(shape)
print(arr)
print("")
arr = np.zeros(shape)
print(arr)
print("")
arr = np.full(shape, 3)
print(arr)

[[1. 1. 1.]
 [1. 1. 1.]]

[[0. 0. 0.]
 [0. 0. 0.]]

[[3 3 3]
 [3 3 3]]
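
Notice that the output of `numpy.full` above contains integers while the other two contain floats. A minimal sketch of where the item type comes from: `ones` and `zeros` default to `float64` unless `dtype` is given, while `full` infers the type from the fill value:

```python
import numpy as np

shape = (2, 3)

# ones and zeros default to float64, but accept a dtype argument
zeros_int = np.zeros(shape, dtype=np.int32)
print(zeros_int.dtype)  # int32

# full infers the dtype from the fill value when dtype is not given
full_int = np.full(shape, 3)
full_float = np.full(shape, 3.0)
print(full_int.dtype.kind, full_float.dtype)  # i float64

# empty also accepts dtype; fill then sets every item
filled = np.empty(shape, dtype=np.int32)
filled.fill(7)
print(filled)
```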


If we have an array and want to create an uninitialised array with the same shape (and the same data type), NumPy provides the function `numpy.empty_like`.

In [9]:
import numpy as np

ori = np.array([[1,2,3],[4,5,6]])
arr = np.empty_like(ori) 
arr.fill(4)
print(arr)

[[4 4 4]
 [4 4 4]]


The similar functions `numpy.ones_like`, `numpy.zeros_like`, and `numpy.full_like` create arrays with the same shape and data type as a given array, pre-filled with 1's, 0's, or a specified value respectively.
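
A short sketch of these `*_like` functions. One thing to watch is that the data type is inherited from the original array, so a float fill value is truncated when the original holds integers, unless `dtype` is passed explicitly:

```python
import numpy as np

ori = np.array([[1, 2, 3], [4, 5, 6]])

ones = np.ones_like(ori)       # same shape and dtype, filled with 1
zeros = np.zeros_like(ori)     # same shape and dtype, filled with 0
sevens = np.full_like(ori, 7)  # same shape and dtype, filled with 7

# The dtype is taken from ori (integers), so 2.5 is truncated to 2
truncated = np.full_like(ori, 2.5)
print(truncated.tolist())  # [[2, 2, 2], [2, 2, 2]]

# Pass dtype explicitly to override the inherited type
halves = np.full_like(ori, 2.5, dtype=np.float64)
print(halves.tolist())  # [[2.5, 2.5, 2.5], [2.5, 2.5, 2.5]]
```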

## 3.2.1 Indexing and slicing ndarrays
Indexing allows us to access a certain item in an array using an index. Slicing allows us to extract a group of items from an array using their indices.

**Basic**

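Basic indexing and slicing of a 1-D `ndarray` use the same syntax as a standard Python list. To access a single item, we use `arr[index]`, whereas for a range of items, we use `arr[start:stop:step]`. One important difference is that slicing a Python list creates a new list, whereas slicing an `ndarray` creates a view that shares its data with the original array (see the section on views and copies).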

The `start`, `stop`, and `step` are all optional. The item at `start` is included, while the item at `stop` is excluded. If `step` is not specified, it's assumed to be 1. If `stop` is not specified, the slice runs to the end of the array; if `start` is not specified, it starts from the beginning (with a negative `step`, these directions are reversed). When `step` is not specified, the second colon `:` is usually omitted. So to include every item, we can use `arr[:]`.

> To slice an array in reverse order, we can use a negative value as `step`. For example, with `arr = [0,1,2,3,4,5]` as in the code below, `arr[4:2:-1]` gives `[4, 3]`. If we only specify `arr[4:2]`, the output is an empty array.

> Negative values for `start` and `stop` count from the end of the array: `-1` refers to the last item, `-2` to the second last item, and so on.

In [10]:
import numpy as np

arr = [0,1,2,3,4,5]
narr = np.array(arr)

print(arr[2:4])
print(narr[2:4:2])

[2, 3]
[2]
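
The notes on reverse and negative indexing above can be checked with a small sketch on the same values:

```python
import numpy as np

narr = np.array([0, 1, 2, 3, 4, 5])

print(narr[4:2:-1])  # [4 3]: a negative step walks backwards
print(narr[4:2])     # []: a positive step cannot go from 4 down to 2
print(narr[-1])      # 5: the last item
print(narr[-3:])     # [3 4 5]: the last three items
print(narr[::2])     # [0 2 4]: every second item
print(narr[::-1])    # [5 4 3 2 1 0]: the whole array reversed
```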


**Two dimensions and above**
- Indexing single item

To access a single item in an ndarray with more than one dimension, we can use multiple pairs of square brackets similar to the Python list.

In [11]:
import numpy as np

arr = [ [1,2,3], [4,5,6] ]
narr = np.array(arr)

print(arr[1][2])
print(narr[1][2])

6
6


The example indexes the item at index `1` in the first dimension and index `2` in the second dimension. The item at index `1` in the first dimension is `[4,5,6]`, and the item at index `2` within it is `6`.

For an `ndarray`, we can also put all the indices inside a single pair of square brackets, separated by commas: the first index is for the first dimension, the second for the second dimension, and so on. So to access the same item as in the previous example, we use `narr[1,2]`. (Python passes the comma-separated indices to the array as the tuple `(1, 2)`.)

In [12]:
import numpy as np

arr = [ [1,2,3], [4,5,6] ]
narr = np.array(arr)

print(narr[1,2])

6


Note that comma-separated indexing does not work for a Python list. For a Python list such as `arr = [[1,2,3],[4,5,6]]`, to access the value 6, which is the third element of the second element, we need `arr[1][2]`. If we use `arr[1,2]`, an exception is raised.

In [13]:
arr = [ [1,2,3], [4,5,6] ]
print(arr[1][2])
print(arr[1,2])

6


TypeError: list indices must be integers or slices, not tuple
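
A small sketch of the comma-separated form: it is equivalent to indexing with a tuple, a single index returns a whole row, and putting a *list* inside the brackets is interpreted differently (it selects whole rows, as discussed under advanced indexing later in this section):

```python
import numpy as np

narr = np.array([[1, 2, 3], [4, 5, 6]])

# narr[1, 2] is the same as indexing with the tuple (1, 2)
idx = (1, 2)
print(narr[idx])  # 6

# A single index returns a whole row (a 1-d array)
print(narr[1])    # [4 5 6]

# Careful: a list inside the brackets is treated differently;
# here it selects rows 1 and 0 rather than one item
print(narr[[1, 0]].tolist())  # [[4, 5, 6], [1, 2, 3]]
```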

- Slicing

To slice an `ndarray` of multiple dimensions, we can apply the `start:stop:step` syntax to each dimension, separating the dimensions with commas. Take the following example of a 3-dimensional array of size 3 by 3 by 3.

In [14]:
import numpy as np

arr = [ 
    [ [1,2,3], [4,5,6], [7,8,9] ], 
    [ [11,12,13], [14,15,16], [17,18,19] ], 
    [ [21,22,23], [24,25,26], [27,28,29] ]
]
narr = np.array(arr)
print(narr[:2, 1:, 1:])

[[[ 5  6]
  [ 8  9]]

 [[15 16]
  [18 19]]]


The example slices the `ndarray` with `[:2, 1:, 1:]`. To break it down, for the first dimension, `:2` selects the items with index `0` and `1`, i.e.

```
[
    [[1,2,3], [4,5,6], [7,8,9]],
    [[11,12,13], [14,15,16], [17,18,19]]
]
```

For the second dimension `1:`, we are slicing items with an index of 1 and 2. Therefore we will get
```
[
    [[4,5,6], [7,8,9]],
    [[14,15,16], [17,18,19]]
]
```

For the third dimension `1:`, we are slicing items with an index of 1 and 2 again. Therefore we will have
```
[
    [[5,6], [8,9]],
    [[15,16], [18,19]]
]
```
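
A sketch on the same 3-d array showing two related behaviours: replacing a slice with a single integer drops that dimension from the result, and a `step` can be used in every dimension:

```python
import numpy as np

narr = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
    [[21, 22, 23], [24, 25, 26], [27, 28, 29]],
])

# Slicing every dimension keeps all three dimensions
sliced = narr[:2, 1:, 1:]
print(sliced.shape)  # (2, 2, 2)

# An integer index in the middle drops that dimension
mixed = narr[:2, 1, 1:]
print(mixed.shape)     # (2, 2)
print(mixed.tolist())  # [[5, 6], [15, 16]]

# A step can be used in each dimension too: every second item
stepped = narr[::2, ::2, ::2]
print(stepped.tolist())  # [[[1, 3], [7, 9]], [[21, 23], [27, 29]]]
```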

**Advanced indexing**
- Indexing with integer array

Aside from using the `start:stop:step` syntax, we can also index each dimension with a list of integers. However, the outcome is different from slicing: the lists are paired up element by element, and each position in the lists picks out a single item.

In [15]:
import numpy as np

arr = [ 
    [ [1,2,3], [4,5,6], [7,8,9] ], 
    [ [11,12,13], [14,15,16], [17,18,19] ], 
    [ [21,22,23], [24,25,26], [27,28,29] ]
]
narr = np.array(arr)
print(narr[[0,1], [0,2], [1,2]])

[ 2 19]


In this example, `narr[[0,1],[0,2],[1,2]]` picks two items. The first item is `narr[0,0,1]` (built from the first entry of each list) and the second item is `narr[1,2,2]` (built from the second entry of each list).

> The `start:stop:step` syntax and integer lists can be combined. However, the interaction is more complicated, so we won't discuss it in this lesson, but you may explore it further.
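
The pairing rule is perhaps easiest to see on a 2-d array. In this minimal sketch, each `(row, column)` pair picks one item, which we confirm against an explicit loop; index lists of different lengths cannot be paired and raise an `IndexError`:

```python
import numpy as np

narr = np.array([[1, 2, 3], [4, 5, 6]])

rows = [0, 1, 1]
cols = [2, 0, 2]

# The pairs (0, 2), (1, 0), (1, 2) are picked one by one
picked = narr[rows, cols]
print(picked)  # [3 4 6]

# The same result built with an explicit loop over the pairs
manual = np.array([narr[r, c] for r, c in zip(rows, cols)])
print((picked == manual).all())  # True

# The index lists must line up; lengths 2 and 3 cannot be paired
raised = False
try:
    narr[[0, 1], [0, 1, 2]]
except IndexError as err:
    raised = True
    print("IndexError:", err)
```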

- Indexing with boolean array

An `ndarray` can also be indexed with a boolean array. In the simplest case, the boolean array used as the index has the same shape as the array being indexed.

In [16]:
import numpy as np

arr = [ 
    [ [1,2,3], [4,5,6], [7,8,9] ], 
    [ [11,12,13], [14,15,16], [17,18,19] ], 
    [ [21,22,23], [24,25,26], [27,28,29] ]
]
narr = np.array(arr)
print(narr[narr % 10 < 5])

[ 1  2  3  4 11 12 13 14 21 22 23 24]


`narr % 10 < 5` produces a boolean array with the same shape as `narr`, populated with `True` and `False`. The output is a flattened (1-d) array of the items at the positions where the boolean array is `True`.
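
Two common follow-ups, sketched below: several conditions are combined with the element-wise operators `&` (and) and `|` (or) rather than Python's `and`/`or`, with parentheses around each comparison because `&` binds more tightly than `>`; and a boolean index can also appear on the left of an assignment to update only the selected items:

```python
import numpy as np

narr = np.array([[1, 5, 3], [4, 2, 6]])

# Combine conditions with & (and) or | (or); parentheses are required
mask = (narr > 1) & (narr < 5)
print(narr[mask])  # [3 4 2]

# A boolean index can also be used on the left of an assignment
narr[narr % 2 == 0] = 0
print(narr.tolist())  # [[1, 5, 3], [0, 0, 0]]
```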

## 3.2.2 View and copy of an ndarray
NumPy provides different methods to create a view or a copy of an ndarray. 

**View**

A view of an `ndarray` is a new array object that shares the same underlying data as the original array (a shallow copy). A change of value in either the original array or the view is reflected in the other array.

As shown in the example, updating the value in the original array changes the value in the view array, and vice versa.

In [17]:
import numpy as np

arr = [[1,2,3], [4,5,6]]
narr = np.array(arr)
narrview = narr.view()

print("Updating original array")
narr[0,0] = 0
print(f"\tOriginal: {narr[0,0]}")
print(f"\tView: {narrview[0,0]}")

print("Updating view array")
narrview[0,0] = 2
print(f"\tOriginal: {narr[0,0]}")
print(f"\tView: {narrview[0,0]}")

Updating original array
	Original: 0
	View: 0
Updating view array
	Original: 2
	View: 2


**Copy**

A copy of an `ndarray` is a deep copy of the original array with its own data. A change of value in one array is not reflected in the other array.

In [18]:
import numpy as np

arr = [[1,2,3], [4,5,6]]
narr = np.array(arr)
narrcopy = narr.copy()

print("Updating original array")
narr[0,0] = 0
print(f"\tOriginal: {narr[0,0]}")
print(f"\tCopy: {narrcopy[0,0]}")

print("Updating copy array")
narrcopy[0,0] = 2
print(f"\tOriginal: {narr[0,0]}")
print(f"\tCopy: {narrcopy[0,0]}")

Updating original array
	Original: 0
	Copy: 1
Updating copy array
	Original: 0
	Copy: 2
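
Views are also created implicitly. As a sketch: basic indexing and slicing return views, so writing through a row changes the original, whereas integer-array (advanced) indexing returns a copy. `numpy.shares_memory` is used here to check whether two arrays overlap in memory:

```python
import numpy as np

narr = np.array([[1, 2, 3], [4, 5, 6]])

# Basic indexing and slicing return views
row = narr[0]
row[0] = 100
print(narr[0, 0])  # 100: the change shows up in narr

# Integer array (advanced) indexing returns a copy
picked = narr[[0, 1], [0, 0]]
picked[0] = -1
print(narr[0, 0])  # still 100

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(narr, row))     # True
print(np.shares_memory(narr, picked))  # False
```

If an independent sub-array is needed, calling `.copy()` on the slice (for example `narr[0].copy()`) avoids the shared data.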


## 3.2.3 Operations on ndarrays

### 3.2.3.1 Arithmetic and comparison

When arithmetic and comparison operators are applied to `ndarrays`, they operate element-wise. This means that if we apply the `+` operator to two `ndarrays` with values `[1,2,3]` and `[4,5,6]`, the first item in the first array `(1)` is added to the first item in the second array `(4)`, the second item to the second item, and so on.

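Because of this behaviour, the `ndarrays` should generally have the same shape when we apply arithmetic and comparison operators between them. (NumPy also accepts some compatible shapes through a mechanism called broadcasting; the simplest case is combining an array with a single number, as in `narr % 10 < 5` earlier.) The arithmetic operators include `+`, `-`, `*`, `/`, `//`, `%`, `divmod()`, `**` or `pow()`, and the bitwise operators `<<`, `>>`, `&`, `^`, `|`, `~` (for integer and boolean arrays), whereas the comparison operators include `==`, `<`, `>`, `<=`, `>=`, `!=`.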

In [19]:
import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])
print(a + b)

[5 7 9]
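
A few more element-wise operations on the same two arrays, sketched below, together with what happens when the shapes do not match: NumPy raises a `ValueError` instead of guessing how to pair the items:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(b - a)   # [3 3 3]
print(a * b)   # [ 4 10 18]
print(b // a)  # [4 2 2]
print(b % a)   # [0 1 0]
print(a < b)   # [ True  True  True]
print(a == 2)  # [False  True False]: a single number is compared with every item

# Arrays whose shapes do not match (and cannot be broadcast) raise ValueError
raised = False
try:
    a + np.array([1, 2])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```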


### 3.2.3.2 Data analysis
NumPy provides various `ndarray` methods for analysing data, including but not limited to `.max`, `.argmax`, `.min`, `.argmin`, `.sum`, `.cumsum`, `.mean`, `.var`, and `.std`.

method|description
---------|---------------------
.max     |maximum value
.argmax  |index of the maximum value
.min     |minimum value
.argmin  |index of the minimum value
.sum     |sum of the values
.cumsum  |cumulative sum of the values
.mean    |mean of the values
.var     |variance of the values
.std     |standard deviation of the values

In [20]:
import numpy as np

narr = np.array([1,2,3,4,5,6])
print(f"maximum value: {narr.max()} at index {narr.argmax()}")
print(f"minimum value: {narr.min()} at index {narr.argmin()}")
print(f"total: {narr.sum()}")
print(f"cumulative sum: {narr.cumsum()}")
print(f"mean: {narr.mean()}")
print(f"variance: {narr.var()}")
print(f"standard deviation: {narr.std()}")

maximum value: 6 at index 5
minimum value: 1 at index 0
total: 21
cumulative sum: [ 1  3  6 10 15 21]
mean: 3.5
variance: 2.9166666666666665
standard deviation: 1.707825127659933
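
Two details worth knowing, sketched below: `.var` and `.std` divide by the number of items by default (the `ddof=0` setting), and passing `ddof=1` divides by one less, which gives the sample variance; when the maximum appears more than once, `.argmax` returns the index of its first occurrence:

```python
import numpy as np

narr = np.array([1, 2, 3, 4, 5, 6])

# var/std divide by n by default (ddof=0); ddof=1 divides by n - 1
print(narr.var())        # 2.9166666666666665
print(narr.var(ddof=1))  # 3.5
print(narr.std(ddof=1))  # square root of 3.5

# With ties, argmax returns the index of the first occurrence
ties = np.array([3, 7, 7, 1])
print(ties.argmax())  # 1
```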


All of these methods accept an optional `axis` argument that specifies the axis along which to operate. If no axis is specified, all items are included in the calculation. The number of axes of an `ndarray` equals the number of dimensions it has: `[1,2,3]` has one axis; `[[1,2,3],[4,5,6]]` has two axes; `[ [ [1,2,3],[4,5,6] ], [ [7,8,9],[10,11,12] ] ]` has three axes.

In [21]:
import numpy as np

narr = np.array([[1,5,3], [4,2,6]])
print(narr.max(axis=0))
print(narr.max(axis=1))

[4 5 6]
[5 6]


The axis index is counted from the outer-most dimension to the inner-most dimension. For the 2-d array of

```
[
    [1, 5, 3],
    [4, 2, 6]
]
```

when we use `.max(axis=0)`, we are requesting the maximum values along axis 0, i.e. the outermost dimension. Axis 0 is the vertical axis in the layout shown. Therefore, to find the maximum values along axis 0, we compare 1 and 4, 5 and 2, and 3 and 6. The maximum values along axis 0 are `[4,5,6]`.

Axis 1 is the next axis inward from axis 0; in this case, it is the horizontal axis. Finding the maximum values along axis 1 means we compare 1, 5, and 3, and then 4, 2, and 6. The maximum values along axis 1 are `[5,6]`.


To illustrate the concept of axes for a 3-d array, we use the following example,

```
[
    [ [7, 2, 9], [ 4, 11,  6] ], 
    [ [1, 8, 3], [10,  5, 12] ]
]
```

the maximum values along axis 0 are obtained by comparing each element in the first line with the corresponding element in the second line, i.e. `[ [7,8,9], [10,11,12] ]`.

The maximum values along axis 1 are obtained by comparing across the second dimension, meaning that, for the first line, we compare 7 and 4, 2 and 11, and 9 and 6. The outcome is `[ [7,11,9], [10,8,12] ]`. Another way to view this is to remove the outer-most dimension; the first item is then

```
[
    [7,  2, 9],
    [4, 11, 6]
]
```
Therefore, axis 1 in the original 3-d array becomes axis 0 in the extracted 2-d array, so we compare vertically in this extracted array.

The maximum values along axis 2 of the 3-d array are obtained by comparing along the third dimension, i.e. within each inner-most list. So the output of `.max(axis=2)` is `[[9,11], [8,12]]`.

> The concept of axes can be confusing. Spend some time testing it out to understand it better.

In [22]:
import numpy as np

narr = np.array([[[7,2,9],[4,11,6]], [[1,8,3],[10,5,12]]])
print(narr.max(axis=0))

[[ 7  8  9]
 [10 11 12]]
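
One way to test it out, as suggested above, is to loop over every axis of the same 3-d array. The sketch also shows a handy rule: reducing along an axis removes that axis from the shape, and `axis=-1` refers to the last axis:

```python
import numpy as np

narr = np.array([[[7, 2, 9], [4, 11, 6]], [[1, 8, 3], [10, 5, 12]]])
print(narr.shape)  # (2, 2, 3)

# Reducing along an axis removes that axis from the shape
for axis in range(narr.ndim):
    result = narr.max(axis=axis)
    print(f"axis={axis}, shape={result.shape}")
    print(result)

# axis=-1 refers to the last axis, the same as axis=2 here
print((narr.max(axis=-1) == narr.max(axis=2)).all())  # True
```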


### 3.2.3.3 All and any
As mentioned earlier, the arithmetic and comparison operators perform element-wise operations. Sometimes we want to check whether all the items fulfill certain criteria, or whether any item fulfills them. For this, NumPy provides the `.all` and `.any` methods for an `ndarray`. Both methods also accept the `axis` argument.

In [23]:
import numpy as np

narr = np.array([[1,5,3], [4,2,6]])
threshold = 3
morethan = narr > threshold
print("morethan array: ")
print(morethan)
print("")
print(f"All more than {threshold}? {morethan.all()}")
print(f"Any more than {threshold}? {morethan.any()}")

morethan array: 
[[False  True False]
 [ True False  True]]

All more than 3? False
Any more than 3? True
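
With the `axis` argument, the same checks are made per column or per row instead of over the whole array. A brief sketch on the same `morethan` array:

```python
import numpy as np

narr = np.array([[1, 5, 3], [4, 2, 6]])
morethan = narr > 3

# axis=0 checks each column, axis=1 checks each row
print(morethan.any(axis=0))  # [ True  True  True]
print(morethan.all(axis=1))  # [False False]
print(morethan.any(axis=1))  # [ True  True]

# The functions np.all and np.any give the same results
print(np.all(narr > 0))  # True
```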


### 3.2.3.4 Sort
`ndarray.sort` sorts the array in place, which means the array itself is modified in the process of sorting. It also accepts the `axis` argument. By default, the sort is applied along the last (inner-most) axis.

In [24]:
import numpy as np

narr = np.array([[1,5,3], [4,2,6]])
print(narr)

narr.sort()
print(narr)

[[1 5 3]
 [4 2 6]]
[[1 3 5]
 [2 4 6]]


> What are the outputs of different axes?

To avoid sorting the array in place, we can use `numpy.sort`, which returns a sorted copy of the array instead of modifying the original.

In [25]:
import numpy as np

narr = np.array([[1,5,3], [4,2,6]])
print(f"narr = \n{narr}")

sortednarr = np.sort(narr)
print(f"\nsorted narr = \n{sortednarr}")
print(f"\nnarr = \n{narr}")

narr = 
[[1 5 3]
 [4 2 6]]

sorted narr = 
[[1 3 5]
 [2 4 6]]

narr = 
[[1 5 3]
 [4 2 6]]


### 3.2.3.5 Reshape
Occasionally we need to change the shape of an array. We can use `ndarray.reshape` or `ndarray.resize` to achieve it. `ndarray.reshape` returns an array with the new shape and requires the new shape to hold exactly the same number of items, whereas `ndarray.resize` changes the array in place.

In [26]:
import numpy as np

narr = np.array([[1,2,3],[4,5,6]])
newnarr = narr.reshape((3,2))
print("After reshape")
print("narr = ")
print(narr)
print("newnarr = ")
print(newnarr)

narr.resize((3,2))
print("\nAfter resize")
print("narr = ")
print(narr)

After reshape
narr = 
[[1 2 3]
 [4 5 6]]
newnarr = 
[[1 2]
 [3 4]
 [5 6]]

After resize
narr = 
[[1 2]
 [3 4]
 [5 6]]


If there are not enough items to fill the new array, `ndarray.resize` fills the additional slots with zeros.

In [27]:
import numpy as np

narr = np.array([[1,2,3],[4,5,6]])
narr.resize((2,4))
print("After resize")
print("narr = ")
print(narr)

After resize
narr = 
[[1 2 3 4]
 [5 6 0 0]]


If there are not enough slots to fit all the items in the new array, the extra items are removed.

In [28]:
import numpy as np

narr = np.array([[1,2,3],[4,5,6]])
narr.resize((2,2))
print("After resize")
print("narr = ")
print(narr)

After resize
narr = 
[[1 2]
 [3 4]]


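In both cases, `ndarray.resize` raises no exception or warning. Therefore we must be cautious when resizing an array. (`ndarray.reshape`, on the other hand, raises a `ValueError` if the new shape does not hold exactly the same number of items.)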
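
A sketch of the safer alternatives: `reshape` accepts `-1` for one dimension and infers its size, and it refuses shapes with a different number of items. The separate function `numpy.resize` (not the method) also changes the size, but it fills extra slots by repeating the original items rather than with zeros:

```python
import numpy as np

narr = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks reshape to infer that dimension from the number of items
print(narr.reshape(3, -1).shape)  # (3, 2)

# reshape refuses shapes with a different number of items
raised = False
try:
    narr.reshape(2, 4)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# The function np.resize (unlike the method) repeats the items to fill the new shape
print(np.resize(narr, (2, 4)).tolist())  # [[1, 2, 3, 4], [5, 6, 1, 2]]
```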

