<center><img src=img/MScAI_brand.png width=70%></center>

# Numpy: Multidimensional Arrays and Fancy Indexing

### Multidimensional arrays


A 2-dimensional array is like a matrix in maths. We can again make a 2-dimensional array in several ways, e.g. using functions like `np.ones`:

In [3]:
import numpy as np
M = np.ones((3, 3))
print(M)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


Here, we must pass in the *shape* (not just the length). The shape is a tuple of integers.
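Other constructors take a shape tuple in the same way. A minimal sketch, assuming the standard Numpy functions `np.zeros` and `np.full` (not used elsewhere in these notes; the names `Z` and `F` are just illustrative):

```python
import numpy as np

Z = np.zeros((2, 3))      # 2 rows, 3 columns, every element 0.0
F = np.full((2, 3), 7.0)  # same shape, every element 7.0
print(Z.shape)            # (2, 3)
print(F)
```

Note the double parentheses: the shape `(2, 3)` is a single argument, a tuple.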

By the way, we can write `M` on its own in IPython or Jupyter Notebook, and we'll see the output:

In [7]:
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

But writing `print(M)` gives a slightly nicer output:

In [8]:
print(M)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


We can also pass in a list of lists to make a two-dimensional array composed of the elements of those lists:

In [9]:
X = [[1.0, 2.0], 
     [3.0, 4.0]] # just a list of lists
X = np.array(X)
print(X)

[[1. 2.]
 [3. 4.]]


Again, we can do operations on each element:

In [10]:
X + 10

array([[11., 12.],
       [13., 14.]])

Of course, `X` has not been altered by this:

In [11]:
X

array([[1., 2.],
       [3., 4.]])

We can find out the shape of an array:

In [12]:
X.shape

(2, 2)
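Two related attributes are worth knowing alongside `.shape`. The sketch below rebuilds the same `X` so that it runs on its own, and uses the standard attributes `.ndim` and `.size`:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(X.shape)  # (2, 2): a tuple with one entry per dimension
print(X.ndim)   # 2: the number of dimensions, i.e. len(X.shape)
print(X.size)   # 4: the total number of elements
```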

### Fancy Indexing

When we studied lists, they were always one-dimensional: 
```python
L = [4, 5, 6]
```
It requires just one integer to index `L`. 

(Ok, we can make *lists of lists*, but by the substitution model we only deal with one dimension at a time: `L = [[4, 5, 6], [7, 8, 9]]` and `L[0][0] == 4`.) 

We also saw list *slices*.

As we have seen, we can use indexing and slicing on 1D Numpy arrays, just like with lists. But for multidimensional arrays, we access elements using not a single number but a `tuple`.

```
Operation | Python list | Numpy array
----------|-------------|------------
Index     | L[int]      | a[tuple]
Shape     | len(L)      | a.shape
```

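This reflects the move from a single number (length of a list) to a `tuple` (*shape* of an array). We can combine indexing and slicing in that tuple arbitrarily. In these notes we call that *fancy indexing*. (Be aware that elsewhere the term often refers specifically to indexing with arrays of indices, which we don't cover here.)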

In [13]:
print(X)
X[0, 1] # equivalent to X[(0, 1)]

[[1. 2.]
 [3. 4.]]


2.0

We can also extract a single row:

In [14]:
X[0, :]

array([1., 2.])

Here `0` means "the first row" and `:` means "all columns" -- like writing `[:]` to mean "all elements" in a list slice, which we saw when studying lists. So we get all elements of the first row.

We get a single column in a similar way, and notice that it now "looks like a row".

In [15]:
X[:, 0]

array([1., 3.])
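The column "looks like a row" because an integer index removes that dimension, leaving a 1D array. If we want to keep the result 2D, one option is to use a slice of length 1 instead of an integer. A small sketch, rebuilding `X` so it runs on its own (`col_1d` and `col_2d` are illustrative names):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
col_1d = X[:, 0]     # integer index: that dimension is dropped
col_2d = X[:, 0:1]   # slice of length 1: that dimension is kept
print(col_1d.shape)  # (2,)
print(col_2d.shape)  # (2, 1)
print(col_2d)
```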

As the logical conclusion of this, we'll see expressions such as `X[:, :]`. You'll say wait, that's not a `tuple`! But it is: `:, :` is effectively a `tuple` of slices, where each slice omits both start and end values, so they take on default values.

In [18]:
print(X[:, :])

[[1. 2.]
 [3. 4.]]


In [26]:
print(X[0:X.shape[0], 0:X.shape[1]])

[[1. 2.]
 [3. 4.]]
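To make the "tuple of slices" idea concrete, we can build the tuple by hand using Python's built-in `slice` objects, where `slice(None)` plays the role of a bare `:`. This is only a sketch to show the equivalence; in practice we would write `X[:, :]`. The names `index`, `same` and `row_index` are illustrative:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
index = (slice(None), slice(None))  # what `:, :` stands for
same = np.array_equal(X[index], X[:, :])
print(same)                         # True

row_index = (0, slice(None))        # what `0, :` stands for
print(X[row_index])                 # [1. 2.]
```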


**Exercise**: what does `X[0]` mean and why?

### Reshaping

If we have, say, an array of 10 numbers, we can ask for it to be reshaped as 5$\times$2 or 2$\times$5. The numbers aren't changed -- it's just the *shape* of the table they are presented in:

In [11]:
X = np.array(range(10))
print(X)

[0 1 2 3 4 5 6 7 8 9]


In [12]:
print(X.reshape((5, 2)))

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [13]:
print(X.reshape((2, 5)))

[[0 1 2 3 4]
 [5 6 7 8 9]]


In [14]:
print(X.reshape((10, 1)))

[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]


In [15]:
print(X.reshape((1, 10)))

[[0 1 2 3 4 5 6 7 8 9]]


In [16]:
print(X.reshape((10,))) # a tuple of one element: (10,)

[0 1 2 3 4 5 6 7 8 9]


Notice that `X` itself has not been altered:

In [17]:
print(X)

[0 1 2 3 4 5 6 7 8 9]
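Two further details of `reshape`, sketched here with illustrative names `Y` and `failed`. First, one dimension may be given as `-1`, and Numpy infers it from the number of elements. Second, the new shape must account for exactly the same number of elements, otherwise we get a `ValueError`:

```python
import numpy as np

X = np.array(range(10))
Y = X.reshape((5, -1))  # -1 asks Numpy to infer this dimension: 10 / 5 = 2
print(Y.shape)          # (5, 2)

failed = False
try:
    X.reshape((3, 4))   # 3 * 4 = 12 elements, but X has only 10
except ValueError as err:
    failed = True
    print("ValueError:", err)
```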


A related idea is transposing, done using `.T`:

In [28]:
X = np.array([[1.0,  2.0,   3.0], 
              [4.0,  5.0,   6.0]])
print(X)
print("") # put a blank line for clarity
print(X.T)

[[1. 2. 3.]
 [4. 5. 6.]]

[[1. 4.]
 [2. 5.]
 [3. 6.]]
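Transposing swaps the roles of rows and columns, so the shape is reversed and element `(i, j)` of `X.T` is element `(j, i)` of `X`. A quick check, rebuilding the same `X` so the snippet runs on its own:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(X.shape)             # (2, 3)
print(X.T.shape)           # (3, 2)
# Element (2, 0) of X.T is element (0, 2) of X.
print(X.T[2, 0], X[0, 2])  # 3.0 3.0
```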


**Exercise**: If we have a 2D array and transpose it twice, what happens?


### Matrix multiplication

As we know, many Numpy operations work *per-element*. A binary operation like `*` is straightforward when the two inputs are the same shape:

In [4]:
X = np.array([[1.0,  2.0,   3.0], 
              [4.0,  5.0,   6.0]])
Y = np.array([[0.01, 0.1,   1.0], 
              [10.0, 100.0, 1000.0]])
print(X * Y) 

[[1.e-02 2.e-01 3.e+00]
 [4.e+01 5.e+02 6.e+03]]


That is sometimes called the *element-wise* product or *Hadamard product*. As defined in maths, it requires the two arrays to have the same shape.

### Broadcasting


Numpy also allows multiplication or other functions to work when the two array shapes are *broadcastable*: "the smaller array is 'broadcast' ('reused') across the larger array so that they have compatible shapes". This works as long as corresponding dimensions are *equal*, or one of them is equal to 1, or not present. We line the shapes up "right-aligned":

```
A      (2d array):  2 x 4
B      (1d array):      4
Result (2d array):  2 x 4
```

In [35]:
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]); print(A.shape)
B = np.array([10, 11, 12, 13]); print(B.shape)
C = A + B
print(C); print(C.shape)

(2, 4)
(4,)
[[11 13 15 17]
 [15 17 19 21]]
(2, 4)


A contrived example:
```
A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5
```
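We can check the contrived example directly. The array contents don't matter for the shape, so this sketch just fills them with zeros and ones:

```python
import numpy as np

A = np.zeros((8, 1, 6, 1))
B = np.ones((7, 1, 5))
C = A + B
print(C.shape)  # (8, 7, 6, 5)
```

Each dimension of length 1 (or missing, as on the left of `B`) is stretched to match the other array.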

In [36]:
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]); print(A.shape)
B = np.array([10, 11]); print(B.shape)
C = A + B

(2, 4)
(2,)


ValueError: operands could not be broadcast together with shapes (2,4) (2,) 

```
A      (2d array):  2 x 4
B      (1d array):      2
Result           :  incompatible
```

In [38]:
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]); print(A.shape)
B = np.array([10, 11]).reshape(2, 1); print(B.shape)
C = A + B
print(C)
print(C.shape)

(2, 4)
(2, 1)
[[11 12 13 14]
 [16 17 18 19]]
(2, 4)


```
A      (2d array):  2 x 4
B      (2d array):  2 x 1
Result (2d array):  2 x 4
```

The full rules: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
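The "right-aligned" rule described above can be written out as a few lines of plain Python. This is a sketch for understanding, not how Numpy implements broadcasting internally; `broadcast_shape` is a hypothetical helper name, and it returns `None` where Numpy would raise a `ValueError`:

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if incompatible."""
    result = []
    # Walk both shapes from the right; a missing dimension counts as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((2, 4), (4,)))             # (2, 4)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
print(broadcast_shape((2, 4), (2,)))             # None
# Cross-check one case against Numpy itself:
print((np.zeros((2, 4)) + np.zeros((2, 1))).shape == broadcast_shape((2, 4), (2, 1)))  # True
```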

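There is also true matrix multiplication, where the two matrices must be of *compatible* shapes (not the same thing as equal shapes): the number of columns of the first must equal the number of rows of the second, so $n\times k$ times $k\times m$ gives $n\times m$. In the following case they are $3\times 2$ and $2\times 2$: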

In [41]:
X = np.array([[1.0, 2.0], 
              [3.0, 4.0], 
              [5.0, 6.0]])
Y = np.array([[1.0, 10.0], 
              [100.0, 1000.0]])
print(X.shape)
print(Y.shape)
C = X @ Y
print(C)
print(C.shape)
print(np.dot(X, Y))

(3, 2)
(2, 2)
[[ 201. 2010.]
 [ 403. 4030.]
 [ 605. 6050.]]
(3, 2)
[[ 201. 2010.]
 [ 403. 4030.]
 [ 605. 6050.]]
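To see where one entry of the result comes from, and why the order of the operands matters, here is a short sketch using the same `X` and `Y` (rebuilt so that it runs on its own; `reversed_ok` is an illustrative name):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # shape (3, 2)
Y = np.array([[1.0, 10.0],
              [100.0, 1000.0]])  # shape (2, 2)
C = X @ Y
# Entry (0, 1) is row 0 of X dotted with column 1 of Y: 1*10 + 2*1000.
print(C[0, 1], np.sum(X[0, :] * Y[:, 1]))  # 2010.0 2010.0

reversed_ok = True
try:
    Y @ X  # (2, 2) @ (3, 2): inner dimensions 2 and 3 differ
except ValueError:
    reversed_ok = False
print(reversed_ok)  # False
```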


### Higher-dimensional arrays

So far we have seen 1D and 2D arrays, which are the most common. However, higher-dimensional arrays also arise often in applications.

A 3D array could arise as a single image: width $\times$ height $\times$ channels. 

A typical image has three channels (red, green, blue). The value at location (0, 0, 0) represents the intensity of the red channel at the top-left.

Some .png images have four channels (red, green, blue, alpha), where alpha represents transparency. Greyscale images have just one channel.
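As a small sketch of this layout, we can build a tiny image by hand. The 4$\times$4 size and the use of floats between 0 and 1 for intensity are assumptions made for illustration (many image formats use integers from 0 to 255 instead):

```python
import numpy as np

image = np.zeros((4, 4, 3))  # tiny 4x4 image, 3 channels, all black
image[0, 0, 0] = 1.0         # full red intensity at the top-left pixel
print(image.shape)           # (4, 4, 3)
print(image[0, 0, :])        # [1. 0. 0.]: red, green, blue at that pixel
red = image[:, :, 0]         # the red channel alone is a 2D array
print(red.shape)             # (4, 4)
```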

A 3D array could also arise by representing a *volume*:
<center><img src=img/numpy_3d.png width=50%></center>
<font size=1>https://ipython-books.github.io/13-introducing-the-multidimensional-array-in-numpy-for-fast-array-computations/</font>

* Weather simulation
* 3D grid of cells
* each cell stores temperature in a 100m $\times$ 100m $\times$ 100m cube
* value stored at array location (0, 0, 0): temperature at highest altitude at the south-west corner of the simulation area.

In [4]:
shape = (3, 4, 2)
X = np.random.random(shape); print(X)

[[[0.07070677 0.84423054]
  [0.08656932 0.55130132]
  [0.99199969 0.13164892]
  [0.7650867  0.66715329]]

 [[0.38941812 0.1506887 ]
  [0.64156382 0.48095295]
  [0.90036734 0.79358824]
  [0.68364997 0.50270988]]

 [[0.43883143 0.47931998]
  [0.45710398 0.24783698]
  [0.32377639 0.89628352]
  [0.00986422 0.39127776]]]


It's an unfortunate fact that printing 3D or higher data is ugly on a 2D screen. Numpy just prints out 2D at a time as shown. Think of this as a cube, sliced up. Look carefully at the brackets.

**Exercise**: given `X` as above, which value is stored at `X[0, 2, 1]`?

In [7]:
X[0, 2, 1]

0.13164891620764096

4D and higher are also possible. Colour video footage is effectively 4D: width $\times$ height $\times$ channels $\times$ time.

4D also arises when training neural network models with image data. Each image in the dataset is 3D: width $\times$ height $\times$ channels. 

But typical training algorithms process a *batch* of (e.g.) 10 images at a time. 

Consider a dataset `X` representing 10,000 square images each of 100$\times$100 pixels, each with 3 channels. Shape would be `(100, 100, 3, 10000)`. Typical code:
```python
for i in range(0, 10000, 10): # step=10
    NN.train(X[:, :, :, i:i+10])
```
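The loop above relies on a model object `NN` that we haven't defined. To see just the slicing at work, here is a runnable sketch on a scaled-down stand-in dataset (50 random 8$\times$8 images rather than 10,000 of 100$\times$100), which records the shape of each batch instead of training anything:

```python
import numpy as np

# Scaled-down stand-in for the dataset: 50 random images of 8x8 pixels.
X = np.random.random((8, 8, 3, 50))

batch_shapes = []
for i in range(0, X.shape[3], 10):  # step=10
    batch = X[:, :, :, i:i+10]      # all pixels and channels, 10 images
    batch_shapes.append(batch.shape)

print(len(batch_shapes))  # 5
print(batch_shapes[0])    # (8, 8, 3, 10)
```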

### Exercises

**Exercise**: what shape would we see with the following data?

* An old, greyscale CCTV camera which shoots 1 frame of 100$\times$100 pixels every second, for 1 minute.

**Exercise**: what shape would we see with the following data?

* A set of 4 temperature sensors, each recording a value once every  day for a year.

**Exercise**: Given the data `X`, use fancy indexing to extract each of the sub-arrays listed after it:

In [22]:
X = np.array([[11, 12, 13, 14], 
              [21, 22, 23, 24], 
              [31, 32, 33, 34], 
              [41, 42, 43, 44]])
print(X)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]
 [41 42 43 44]]


```python
[31 32 33 34] # observe this is 1D

[[31 32 33 34]] # observe this is 2D, of shape (1, 4)

[13 23 33 43]

[[11 12 13 14]
 [21 22 23 24]]

[[22 23]
 [32 33]]
```

### Solutions

* Greyscale CCTV => 1 channel
* 100$\times$100 (width and height)
* 1 per second, 60 seconds => 60
* => 1$\times$100$\times$100$\times$60
* That '1' is superfluous, so: 100$\times$100$\times$60


* 4 temperature sensors: 1 dimension of length 4. Not 4D!
* One value once every day for a year
* => 365$\times$4

In [23]:
X[2, :]

array([31, 32, 33, 34])

In [24]:
X[2:3, :]

array([[31, 32, 33, 34]])

In [25]:
X[:, 2]

array([13, 23, 33, 43])

In [26]:
X[:2, :]

array([[11, 12, 13, 14],
       [21, 22, 23, 24]])

In [27]:
X[1:3, 1:3]

array([[22, 23],
       [32, 33]])