In [1]:
from __future__ import print_function # print function for Python 2/3 compatibility

# Pre-tutorial on Arrays and NumPy

This tutorial introduces arrays and NumPy, which are used in the programming assignments of the Digital Image Processing course. Please note that this notebook has been modified from [1].

NumPy is a Python library for manipulating vectors and matrices (here referred to as **arrays**) in ways similar to MATLAB.

If you are familiar with MATLAB, there are several NumPy __[tutorials](https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html)__ and __[reference cheat sheets](http://mathesaurus.sourceforge.net/matlab-numpy.html)__ for MATLAB users.

## 1. What are arrays?

An array is a multidimensional grid of data. Tables in Microsoft Excel can be thought of as arrays with dimensions `r × c × p`, corresponding to `r` rows, `c` columns and `p` pages. NumPy arrays are similar, but they can also have more than three dimensions. The figure below illustrates arrays of different dimensions: <img src="figA.jpeg">

There are many good reasons why one might want to use arrays. For example, images are naturally stored as arrays.

An image as an array, with each entry being a pixel value: <img src="figC.png">

## 2. Create arrays using NumPy

The first thing to do is to import the library:

In [2]:
import numpy as np 

### Numpy arrays

In NumPy, arrays are similar to lists, except that all the entries have to be of the same _data type_ (`int8`, `int32`, `float`, `bool`, etc.).

NumPy arrays are created with the same syntax as a regular Python list, passed to the NumPy function `array` as illustrated below. This function converts a list into a one-dimensional array, a list of lists into a two-dimensional array, a list of lists of lists into a three-dimensional array, etc. The variable class of such objects is `numpy.ndarray`:

In [3]:
# defining arrays
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) # a vector, i.e. a one-dimensional array with 9 entries (shape (9,))
b = np.array([[5], [6], [3], [-1], [6], [9], [2], [5], [5]]) # a column vector, also called a 9x1 array
A = np.array([[1, 2, 3, 4, -1, -2, -3, -4], [5, 6, 7, 8, -5, -6, -7, -8]]) # a 2x8 matrix or array

# display the results
print('a = ', a)
print('a has', len(a.shape), 'dimension(s)', 'and is a variable of class', type(a), 'and data type', a.dtype, '\n')
print('b = \n',  b)
print('b has', len(b.shape), 'dimension(s)', 'and is a variable of class', type(b), 'and data type', b.dtype, '\n')
print('A = \n',  A)
print('A has', len(A.shape), 'dimension(s)', 'and is a variable of class', type(A), 'and data type', A.dtype, '\n')

a =  [1 2 3 4 5 6 7 8 9]
a has 1 dimension(s) and is a variable of class <type 'numpy.ndarray'> and data type int64 

b = 
 [[ 5]
 [ 6]
 [ 3]
 [-1]
 [ 6]
 [ 9]
 [ 2]
 [ 5]
 [ 5]]
b has 2 dimension(s) and is a variable of class <type 'numpy.ndarray'> and data type int64 

A = 
 [[ 1  2  3  4 -1 -2 -3 -4]
 [ 5  6  7  8 -5 -6 -7 -8]]
A has 2 dimension(s) and is a variable of class <type 'numpy.ndarray'> and data type int64 
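
As a small additional sketch (the names `T` and `m` are illustrative and not part of the original notebook), the following shows how the nesting depth of the list determines the number of dimensions, and how NumPy settles on one common data type when the entries are mixed:

```python
import numpy as np

# a list of lists of lists gives a three-dimensional array
T = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
print(T.ndim, T.shape)  # number of dimensions and size along each dimension

# mixing integers and floats: every entry is stored as a float
m = np.array([1, 2, 3.5])
print(m, m.dtype)
```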



One can use the attribute `shape` to show the array dimensions, the built-in function `type` to get the _variable type_ (i.e. a NumPy array) and the attribute `dtype` to get the _data type_ (`int64`, `float`, `bool`, etc.):

In [4]:
print(A.shape) # shape of A is 2x8

rA, cA = A.shape
print('rows =', rA)    # number of rows
print('columns =', cA) # and columns

(2, 8)
rows = 2
columns = 8
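
The sketch below collects these attributes in one place and also shows how a data type can be requested explicitly. The array name `img` and the choice of `uint8` are assumptions made for illustration (8-bit unsigned integers are a common storage type for grayscale images):

```python
import numpy as np

img = np.array([[0, 128, 255],
                [64, 32, 16]], dtype=np.uint8)  # request 8-bit unsigned integers

print(img.shape)   # size along each dimension: 2 rows, 3 columns
print(img.ndim)    # number of dimensions
print(img.size)    # total number of entries
print(img.dtype)   # data type of the entries
print(type(img))   # variable type of the array object

img_f = img.astype(float)  # converted copy with floating-point entries
print(img_f.dtype)
```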


### Array creation routines

NumPy has built-in functions for __[creating arrays](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html)__ from scratch.

For instance, it is often useful to create equally spaced arrays and sequences of numbers. Instead of creating such arrays by manually typing all the elements (as in the example above), there are more efficient ways, like the __[`arange()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange)__ and __[`linspace()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace)__ functions:

In [5]:
# Equally spaced arrays and sequences of numbers
c1 = np.arange(1, 10, 1)    # from 1 to 9, increasing with a step of 1
c2 = np.arange(1, 10, 2)    # from 1 to 9, increasing with a step of 2
c3 = np.arange(10, 1, -2)   # from 10 to 2, decreasing with a step of 2
c4 = np.arange( 0, 2, 0.3)  # with float arguments 
c5 = np.linspace( 0, 2, 9)  # 9 numbers from 0 to 2
    
print('c1 = ', c1)
print('c2 = ', c2)
print('c3 = ', c3)
print('c4 = ', c4)
print('c5 = ', c5)

c1 =  [1 2 3 4 5 6 7 8 9]
c2 =  [1 3 5 7 9]
c3 =  [10  8  6  4  2]
c4 =  [ 0.   0.3  0.6  0.9  1.2  1.5  1.8]
c5 =  [ 0.    0.25  0.5   0.75  1.    1.25  1.5   1.75  2.  ]
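
One difference between the two functions is worth keeping in mind: `arange` takes a step and excludes the stop value, whereas `linspace` takes the number of samples and includes the stop value by default. With non-integer steps the length of an `arange` result can be hard to predict because of floating-point rounding, so `linspace` is usually the safer choice in that case. A minimal sketch (variable names are illustrative):

```python
import numpy as np

r = np.arange(0, 1, 0.25)                 # step of 0.25, stop value 1 is excluded
s = np.linspace(0, 1, 5)                  # 5 samples, stop value 1 is included
t = np.linspace(0, 1, 4, endpoint=False)  # 4 samples, stop value 1 is excluded

print('r =', r)
print('s =', s)
print('t =', t)
```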


In many practical situations, the elements of an array are originally unknown but the size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. For instance, the function: 
* __[`zeros()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros)__ creates an array full of zeros
* __[`ones()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones)__ creates an array full of ones
* __[`empty()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html#numpy.empty)__ creates an array whose initial content is arbitrary (uninitialized) and depends on the state of the memory
* __[`full()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html#numpy.full)__ creates an array full of fill values. 

Only the shape of the new array needs to be specified, while e.g. data type is optional. Below are some examples of how to use such functions:

In [6]:
# Arrays with zeros, ones, arbitrary and some fixed value entries - here you need to specify the sizes of the dimensions (also known as axes in NumPy)
d1 = np.zeros((3, 4))        # 3 rows, 4 columns
d2 = np.ones((2, 3, 4))      # 2 blocks of 3 rows and 4 columns
d3 = np.empty((2, 3))        # 2 rows, 3 columns
d4 = np.full((3,3), np.inf)  # 3 rows, 3 columns

print('d1 = ',  d1, '\n')
print('d2 = ',  d2, '\n')
print('d3 = ',  d3, '\n')
print('d4 = ',  d4, '\n')

d1 =  [[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]] 

d2 =  [[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]] 

d3 =  [[ 0.3  0.6  0.9]
 [ 1.2  1.5  1.8]] 

d4 =  [[ inf  inf  inf]
 [ inf  inf  inf]
 [ inf  inf  inf]] 
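
In an image processing setting, these routines are handy for preallocating an image or a mask before filling it in. The sketch below assumes an 8-bit grayscale image of 4×6 pixels; the sizes and names are illustrative only:

```python
import numpy as np

black = np.zeros((4, 6), dtype=np.uint8)      # all pixels 0
white = np.full((4, 6), 255, dtype=np.uint8)  # all pixels 255
mask = np.ones((4, 6), dtype=bool)            # all entries True

print(black.dtype, white.max(), mask.all())
```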



Alternatively, the shape and type can be copied from a given input array:

In [7]:
c1 = np.zeros_like(d1)         # shape and type copied from d1
c2 = np.ones_like(d2)          # shape and type copied from d2
c3 = np.empty_like(d3)         # shape and type copied from d3
c4 = np.full_like(d4, np.inf)  # shape and type copied from d4

print('c1 = ',  c1, '\n')
print('c2 = ',  c2, '\n')
print('c3 = ',  c3, '\n')
print('c4 = ',  c4, '\n')

c1 =  [[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]] 

c2 =  [[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]] 

c3 =  [[ 0.  0.  0.]
 [ 0.  0.  0.]] 

c4 =  [[ inf  inf  inf]
 [ inf  inf  inf]
 [ inf  inf  inf]] 



### Creating a copy of an array

When making a copy of a NumPy array, please use __[`copy()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html#numpy.copy)__. A plain assignment only creates a second name for the same data, so note the difference between the following array assignments:

In [8]:
x = np.array([[1, 2, 3], [4, 5, 6]]) # create a 2D array x
print('original x =\n', x, '\n')

y = x # create a reference y of x
print('reference y =\n', y, '\n')

z = np.copy(x) # create a copy z
print('copy z =\n', z, '\n')

x[0,:] = 10 # assign 10 on first row

print('The three arrays after assigning x[0,:] = 10:\n')
print('original x =\n', x, '\n')
print('reference y =\n', y, '\n')
print('copy z =\n', z, '\n')

original x =
 [[1 2 3]
 [4 5 6]] 

reference y =
 [[1 2 3]
 [4 5 6]] 

copy z =
 [[1 2 3]
 [4 5 6]] 

The three arrays after assigning x[0,:] = 10:

original x =
 [[10 10 10]
 [ 4  5  6]] 

reference y =
 [[10 10 10]
 [ 4  5  6]] 

copy z =
 [[1 2 3]
 [4 5 6]] 
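
Plain slicing behaves like the reference `y` above rather than like the copy `z`: a slice is a *view* onto the same data. The following minimal sketch (names are illustrative) uses `np.shares_memory` to check this:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

row_view = x[0, :]         # slicing gives a view onto the data of x
row_copy = x[0, :].copy()  # an independent copy of the first row

row_view[0] = 99           # also changes x[0, 0]

print(x)
print(row_copy)
print(np.shares_memory(x, row_view), np.shares_memory(x, row_copy))
```

If a slice is going to be modified and the original must stay intact, calling `.copy()` on the slice is the safe option.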



## 3. Manipulating arrays with NumPy

### Arithmetic operations on arrays

Arithmetic operators on NumPy arrays apply *element-wise*. For instance, `*` means element-wise multiplication, and the **dot** function is used for matrix multiplication. (In MATLAB, `*` means matrix multiplication, and `.*` is used for element-wise multiplication.)

In [9]:
# Create some arrays a, b, c and d
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[3, 19, 20], [9, 16, 14]])
c = np.array([[7, 8], [9, 10], [11, 12]])
d = np.arange(1., 10., 1)

print('a = \n', a)
print('b = \n', b)
print('c = \n', c)
print('d = \n', d, '\n')
print('element-element multiplication a*b (see figure): \n', a*b, '\n') # element-wise array multiplication 
print('matrix multiplication a*c (see figure): \n', np.dot(a,c), '\n') # matrix multiplication
print('raise each element of d to the power of 2: \n', d**2, '\n')  # raise each element of d to the power of 2
print('divide each element of d by 2: \n', d/2, '\n') # divide each element of d by 2

a = 
 [[1 2 3]
 [4 5 6]]
b = 
 [[ 3 19 20]
 [ 9 16 14]]
c = 
 [[ 7  8]
 [ 9 10]
 [11 12]]
d = 
 [ 1.  2.  3.  4.  5.  6.  7.  8.  9.] 

element-element multiplication a*b (see figure): 
 [[ 3 38 60]
 [36 80 84]] 

matrix multiplication a*c (see figure): 
 [[ 58  64]
 [139 154]] 

raise each element of d to the power of 2: 
 [  1.   4.   9.  16.  25.  36.  49.  64.  81.] 

divide each element of d by 2: 
 [ 0.5  1.   1.5  2.   2.5  3.   3.5  4.   4.5] 



Element-wise array multiplication: <img src="figF.png">

Matrix multiplication: <img src="figE.png">
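
The two products also have different shape requirements: element-wise multiplication needs arrays of matching shape, whereas matrix multiplication needs the number of columns of the first array to equal the number of rows of the second. A minimal sketch reusing the values of `a`, `b` and `c` from above (`np.matmul` is shown only as an alternative to `np.dot` for 2D arrays):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])       # 2x3
b = np.array([[3, 19, 20], [9, 16, 14]])   # 2x3
c = np.array([[7, 8], [9, 10], [11, 12]])  # 3x2

elementwise = a * b    # shapes match: (2x3) * (2x3) -> 2x3
matrix = np.dot(a, c)  # inner sizes match: (2x3) . (3x2) -> 2x2

print(elementwise.shape, matrix.shape)
print(np.array_equal(matrix, np.matmul(a, c)))  # same result for 2D arrays

try:
    a * c              # (2x3) and (3x2) cannot be multiplied element-wise
except ValueError as err:
    print('element-wise product failed:', err)
```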

### Array functions

Arrays can be manipulated in other ways, apart from doing arithmetic operations on them. NumPy offers specific functions e.g. for __[array manipulation](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html)__, like changing array shape: 

In [10]:
b = np.arange(20) # vector of uniformly sampled integer values from 0 to 19
print('original 1x20 row vector b: \n', b, '\n')

b = np.reshape(b, (4, 5)) # give the array a new shape without changing its data, here from 20 entries to 4x5
print('b reshaped into 4x5 2D array: \n', b, '\n')

print('b reshaped back into 1x20 row vector using .flatten(): \n', b.flatten(), '\n') # the array collapsed into one dimension again with flatten

print('b reshaped back into 1x20 row vector using .ravel(): \n', b.ravel(), '\n') # or with ravel

original 1x20 row vector b: 
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] 

b reshaped into 4x5 2D array: 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]] 

b reshaped back into 1x20 row vector using .flatten(): 
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] 

b reshaped back into 1x20 row vector using .ravel(): 
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] 
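
The two functions print the same result here, but they differ in one respect: `flatten` always returns a copy, whereas `ravel` returns a view of the original data whenever possible. A small sketch to illustrate the difference (the names `f` and `r` are illustrative):

```python
import numpy as np

b = np.arange(20).reshape(4, 5)

f = b.flatten()  # always a copy
r = b.ravel()    # a view here, because b is stored contiguously

r[0] = -1        # modifies b as well
f[1] = -1        # does not modify b

print(b[0, :2])
print(np.shares_memory(b, f), np.shares_memory(b, r))
```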



Note that various functions can be chained, for example an array creation routine followed by a reshape:

In [11]:
b = np.arange(20).reshape(4,5)
print(b)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


Joining arrays:

In [12]:
c = np.random.randint(0, high=100, size=(3,1))
print('original 3x1 array c: \n', c, '\n')

C = np.hstack([c, c, c]) # concatenate 3x1 array c horizontally
print('c concatenated horizontally 3 times --> 3x3 array C: \n', C, '\n')

C = np.vstack([c.transpose(), c.transpose(), c.transpose()]) # concatenate transpose of array c vertically
print('the transpose of c concatenated vertically 3 times --> 3x3 array C: \n', C, '\n')

original 3x1 array c: 
 [[88]
 [16]
 [13]] 

c concatenated horizontally 3 times --> 3x3 array C: 
 [[88 88 88]
 [16 16 16]
 [13 13 13]] 

the transpose of c concatenated vertically 3 times --> 3x3 array C: 
 [[88 16 13]
 [88 16 13]
 [88 16 13]] 
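
For 2D arrays, `hstack` and `vstack` give the same result as `np.concatenate` along axis 1 and axis 0, respectively. The sketch below uses fixed values instead of random ones so that it is reproducible; the names `H` and `V` are illustrative:

```python
import numpy as np

c = np.array([[88], [16], [13]])  # fixed 3x1 array for a reproducible example

H = np.concatenate([c, c, c], axis=1)        # join along columns, like hstack
V = np.concatenate([c.T, c.T, c.T], axis=0)  # join along rows, like vstack

print(np.array_equal(H, np.hstack([c, c, c])))
print(np.array_equal(V, np.vstack([c.T, c.T, c.T])))
```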



Deleting sub-arrays, e.g. whole rows and columns:

In [13]:
D = np.random.randint(0, high=100, size=(5,5)) # create 5×5 random integer array D
print('original 2D array D: \n', D, '\n')

D = np.delete(D, 0, axis=0)  # delete first row of D
print('first row deleted from D: \n', D, '\n')

D = np.delete(D, -1, axis=1) # delete last column of D
print('last column deleted from D: \n', D, '\n')

original 2D array D: 
 [[31 99 52 21 16]
 [71 82 43 52 76]
 [90 89 33 98 51]
 [77 72 44 54 17]
 [60 20 78 83 48]] 

first row deleted from D: 
 [[71 82 43 52 76]
 [90 89 33 98 51]
 [77 72 44 54 17]
 [60 20 78 83 48]] 

last column deleted from D: 
 [[71 82 43 52]
 [90 89 33 98]
 [77 72 44 54]
 [60 20 78 83]] 



One can also apply __[mathematical functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html)__ on array elements, e.g. calculate the exponential of all elements in the input array:

In [14]:
M = 5 # window size
std = 1 # standard deviation (sigma)

# create a Gaussian window
n = np.arange(0, M) - (M - 1.0) / 2.0
w = np.exp(-n**2 / (2 * std**2) )

print('n =', n, '\n')
print('Gaussian window w =\n', w)

n = [-2. -1.  0.  1.  2.] 

Gaussian window w =
 [ 0.13533528  0.60653066  1.          0.60653066  0.13533528]
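
Since the course deals with images, a natural follow-up is a 2D Gaussian kernel. One common way to build it, shown here only as a sketch that is not part of the original notebook, is to take the outer product of the 1D window with itself and normalize the result so that its entries sum to one (the name `K` is illustrative):

```python
import numpy as np

M = 5    # window size
std = 1  # standard deviation (sigma)

n = np.arange(0, M) - (M - 1.0) / 2.0
w = np.exp(-n**2 / (2 * std**2))  # 1D Gaussian window, as above

K = np.outer(w, w)  # 2D kernel: K[i, j] = w[i] * w[j]
K = K / np.sum(K)   # normalize so that all entries sum to 1

print(K.shape)
print(np.sum(K))
```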


Or obtain specific information about arrays, like __[statistics](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html)__:

In [15]:
print('maximum value in array D =', np.max(D) )
print('minimum value in array D =', np.min(D) )
print('mean value in array D =', np.mean(D) )
print('standard deviation in array D =', np.std(D) )

maximum value in array D = 98
minimum value in array D = 20
mean value in array D = 65.375
standard deviation in array D = 21.776922992


It is important to note that many of these functions can operate either on the whole array or along a specified dimension (axis). For instance, the function `max` can return the maximum of the whole array or the maxima along an axis. By default, flattened input is used, i.e. the function returns the maximum of the whole array:

In [16]:
print('D = \n', D, '\n')
print('D flattened = \n', D.flatten(), '\n') # D flattened
print('maximum value in array D =', np.max(D) ) # max in array
print('maximum value in each column of D =', np.max(D, axis=0) ) # max in each column
print('maximum value in each row of D =', np.max(D, axis=1) ) # max in each row

D = 
 [[71 82 43 52]
 [90 89 33 98]
 [77 72 44 54]
 [60 20 78 83]] 

D flattened = 
 [71 82 43 52 90 89 33 98 77 72 44 54 60 20 78 83] 

maximum value in array D = 98
maximum value in each column of D = [90 89 78 98]
maximum value in each row of D = [82 98 77 83]


These things are crucial to understand in order to write vectorized code and avoid unnecessary `for` loops, i.e. in many cases, **one does not have to iterate through all individual elements or rows/columns in order to perform a specific operation on them**.
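
To make the point concrete, the sketch below computes the mean of each column twice, once with an explicit loop and once with a single call using the `axis` argument. The values of `D` are copied from the output above; the other names are illustrative:

```python
import numpy as np

D = np.array([[71, 82, 43, 52],
              [90, 89, 33, 98],
              [77, 72, 44, 54],
              [60, 20, 78, 83]])

# loop version: visit every column explicitly
col_means_loop = np.zeros(D.shape[1])
for j in range(D.shape[1]):
    col_means_loop[j] = np.mean(D[:, j])

# vectorized version: one call, reducing along axis 0 (over the rows)
col_means_vec = np.mean(D, axis=0)

print(col_means_loop)
print(col_means_vec)
```

Both versions give the same numbers; the vectorized one is shorter and typically much faster on large arrays.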

Remember that you can (and are encouraged to) look up the use of these (and other) functions from the NumPy documentation, including __[array manipulation](https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html)__, __[mathematical functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html)__ and __[statistics](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html)__, for instance.

## 4. Index and slice arrays

### Slicing

As is standard in Python, NumPy uses zero-based indexing. Entries in arrays can be accessed by indicating the name of the array and the entry position in each dimension inside square brackets: 

In [17]:
e = np.arange(10)**3  # define new array
print('e = ', e, '\n')
print('Third entry in e:', e[2], '\n') # access the third entry from left to right -> indexing
print('Entries 2 to 5 in e:', e[2:5], '\n') # access the third to fifth entries (indices 2, 3, 4; the stop index 5 is excluded) -> slicing
print('Every second entry from 2 to 8 in e:', e[1:8:2], '\n') # access every second entry from the second to the eighth (indices 1, 3, 5, 7) -> slicing

e =  [  0   1   8  27  64 125 216 343 512 729] 

Third entry in e: 8 

Entries 2 to 5 in e: [ 8 27 64] 

Every second entry from 2 to 8 in e: [  1  27 125 343] 
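
A few more slicing patterns that come up often, as a brief sketch: slices have the form `start:stop:step`, the stop index is excluded, omitted parts take default values, and negative indices count from the end of the array.

```python
import numpy as np

e = np.arange(10)**3

print(e[2:5])   # indices 2, 3 and 4; the stop index 5 is excluded
print(e[:3])    # first three entries (start defaults to 0)
print(e[-3:])   # last three entries (negative indices count from the end)
print(e[::-1])  # all entries in reverse order (step of -1)
```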



In the case of 2D arrays, always specify the row first and column second (row, col):

In [18]:
# recall b is defined above
b = np.arange(20).reshape(4,5)

print('b = \n', b, '\n')
print('Entry in row 3, column 2 in b:', b[2,1], '\n') # access entry in position 2,1 -> indexing
print('"Middle-bottom" minor array of b: \n', b[-2:,1:-1], '\n') # access entries in the last two rows and the second to fourth columns -> slicing
print('Second column of b: \n', b[:,1], '\n')
print('Third row of b: \n', b[2,:], '\n')
print('First two rows and columns of b: \n', b[:2, :2], '\n')
print('Last two rows and columns of b:\n', b[-2:,-2:], '\n')

b = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]] 

Entry in row 3, column 2 in b: 11 

"Middle-bottom" minor array of b: 
 [[11 12 13]
 [16 17 18]] 

Second column of b: 
 [ 1  6 11 16] 

Third row of b: 
 [10 11 12 13 14] 

First two rows and columns of b: 
 [[0 1]
 [5 6]] 

Last two rows and columns of b:
 [[13 14]
 [18 19]] 



Replace the second column and last row of `A` with `b` using indexing:

In [19]:
# create some arrays A and b
A = np.random.randint(0,high=100,size=(3,3))
b = np.random.randint(0,high=100,size=(3,1))

print('A = \n', A, '\n')
print('b = \n', b, '\n')

A[:,1] = b.flatten() # replace second column of A with b
print('Second column of A replaced with b = \n', A,  '\n')

A[-1,:] = b.flatten() # replace last row of A with b
print('Last row of A replaced with b = \n', A, '\n')

A = 
 [[72  6 99]
 [79 60 55]
 [89 25 63]] 

b = 
 [[ 7]
 [98]
 [67]] 

Second column of A replaced with b = 
 [[72  7 99]
 [79 98 55]
 [89 67 63]] 

Last row of A replaced with b = 
 [[72  7 99]
 [79 98 55]
 [ 7 98 67]] 



For 2D multichannel arrays, like RGB or HSV images, the additional last coordinate would specify the channel (row, col, ch).
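
As a minimal sketch of this (row, col, ch) convention, the example below builds a hypothetical 4×6 RGB image; the image size, the `uint8` data type and the channel order (red first) are assumptions made for illustration:

```python
import numpy as np

# hypothetical 4x6 RGB image with 8-bit entries, filled with zeros (black)
rgb = np.zeros((4, 6, 3), dtype=np.uint8)

rgb[:, :, 0] = 255          # set the first channel (red, assuming RGB order) everywhere
rgb[1, 2, :] = [0, 255, 0]  # set all three channels of the pixel in row 2, column 3

red = rgb[:, :, 0]          # a single channel is a 2D array
print(rgb.shape, red.shape)
print(rgb[1, 2, :])
```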

### Logical indexing

You can use the comparison operators `>, >=, <, <=, ==` (greater than, greater than or equal to, less than, less than or equal to, equal) to test entries in arrays, and the element-wise logical operators `|, &` (or, and) to combine such tests, as follows:

In [20]:
# some 2D array I
I = np.array([[17, 24, 1, 8, 15],
              [23, 5, 7, 14, 16],
              [4, 6, 13, 20, 22],
              [10, 12, 19, 21, 3],
              [11, 18, 25,  2,  9]])

L = I < 10 # boolean array size of I where True indicates elements of I < 10

print('I = \n', I, '\n')

print('I[row,col] < 10:\n', L, '\n') # boolean array: True where the entry of I is less than 10

print('Inverted boolean array, i.e. I[row,col] >= 10:\n', np.logical_not(L), '\n') # all other values

I = 
 [[17 24  1  8 15]
 [23  5  7 14 16]
 [ 4  6 13 20 22]
 [10 12 19 21  3]
 [11 18 25  2  9]] 

I[row,col] < 10:
 [[False False  True  True False]
 [False  True  True False False]
 [ True  True False False False]
 [False False False False  True]
 [False False False  True  True]] 

Inverted boolean array, i.e. I[row,col] >= 10:
 [[ True  True False False  True]
 [ True False False  True  True]
 [False False  True  True  True]
 [ True  True  True  True False]
 [ True  True  True False False]] 



This operation generates a new array of the same size as the one being tested, but with entries either `False` or `True` depending on how they evaluate. The resulting logical array can be used to *access* elements of another array. For instance, one can list or operate on all the elements of the array that meet the condition:

In [21]:
print('list of values in array I < 10: \n', I[L], '\n') # lists all element values of I that are less than 10.

print('list of values in array I >= 10: \n', I[np.logical_not(L)], '\n') # lists all other element values.

J = np.zeros_like(I) # J is a zero matrix with the same size as I

J[L] = I[L] # copy all the elements of I that are less than 10 to the corresponding locations in J

print(J, '\n')

J[J==0] = -1 # set all entries of J that are equal to zero to -1

print(J)

list of values in array I < 10: 
 [1 8 5 7 4 6 3 2 9] 

list of values in array I >= 10: 
 [17 24 15 23 14 16 13 20 22 10 12 19 21 11 18 25] 
 [17 24 15 23 14 16 13 20 22 10 12 19 21 11 18 25] 

[[0 0 1 8 0]
 [0 5 7 0 0]
 [4 6 0 0 0]
 [0 0 0 0 3]
 [0 0 0 2 9]] 

[[-1 -1  1  8 -1]
 [-1  5  7 -1 -1]
 [ 4  6 -1 -1 -1]
 [-1 -1 -1 -1  3]
 [-1 -1 -1  2  9]]
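
Conditions can also be combined with `&` and `|`. The sketch below reuses the values of `I` from above; the names `mid` and `ext` are illustrative. Note that each comparison has to be wrapped in parentheses:

```python
import numpy as np

I = np.array([[17, 24, 1, 8, 15],
              [23, 5, 7, 14, 16],
              [4, 6, 13, 20, 22],
              [10, 12, 19, 21, 3],
              [11, 18, 25, 2, 9]])

# parentheses are needed because & and | bind more tightly than the comparisons
mid = (I >= 10) & (I <= 20)  # True where 10 <= I <= 20
ext = (I < 5) | (I > 20)     # True where I < 5 or I > 20

print(I[mid])
print(I[ext])
print(np.count_nonzero(mid))  # number of entries that meet the condition
```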


Again, these things are crucial to understand in order to write vectorized code and avoid unnecessary `for` loops.

## 5. Help and Documentation

You have already used many of the tools NumPy has on offer. You might want to read more about how those functions are used, and you should! 
To do so, you can type the following in a notebook cell or in the Python interpreter:

`help(functionname)`

Or, you can also check the documentation and search the name of the function there.
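
As a brief sketch of what this looks like in practice (using `np.linspace` as an arbitrary example function), the documentation text shown by `help` is also stored in the function's docstring, which can be printed in part:

```python
import numpy as np

# full documentation, as described above (the output is long, so it is commented out here)
# help(np.linspace)

# the same text is stored in the function's docstring; print only its first lines
doc = np.linspace.__doc__
print('\n'.join(doc.strip().splitlines()[:3]))
```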

#### References

[1] https://github.com/karinsasaki/python-workshop-image-processing/blob/master/pre_tutorial/arrays-and-numpy.ipynb