# Arrays and NumPy

NumPy is a Python library for manipulating arrays in much the same way MATLAB does. It makes it easier to work with the tools of linear algebra: vectors and matrices (here referred to as **arrays**).

## 1. What are arrays?

An array is a multi-dimensional grid of data. Tables in Microsoft Excel can be thought of as arrays with dimensions 
`r × c × p` corresponding to `r` rows, `c` columns and `p` pages. NumPy arrays are similar, but they are not limited to three dimensions. The figures below illustrate arrays of different dimensions.

<img src="figA.jpeg" width=600>

There are many good reasons to use arrays. For example, images and the stoichiometric information of a collection of reactions are naturally stored as arrays (see the images below).

An image as an array, with each entry being a pixel value: <img src="figC.png">

Stoichiometric matrix with each entry describing how a reactant is affected by a reaction: <img src="figD.png">

## 2. Create arrays using NumPy

The first thing to do is to import the library:

In [None]:
import numpy as np 

In NumPy, arrays are similar to lists, except that all the entries have to be of the same _data type_ (int8, int32, float, boolean, etc). 

Creating NumPy arrays uses the same syntax as a regular Python list, wrapped in the NumPy function `array`, as illustrated below. This function converts a list into a one-dimensional array, a list of lists into a two-dimensional array, a list of lists of lists into a three-dimensional array, etc. The class of such objects is `numpy.ndarray`:

In [None]:
# defining arrays
row = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) # a row vector, stored as a one-dimensional array with 9 entries
col = np.array([[5],  # a column vector, also called a 9x1 array
                [6], 
                [3], 
                [-1], 
                [6], 
                [9], 
                [2],
                [5],
                [5]])
mat = np.array([[1, 2, 3, 4, -1, -2, -3, -4], [5, 6, 7, 8, -5, -6, -7, -8]]) # a 2x8 matrix or array

# display the results
print 'row:\n', row, '\nhas', len(row.shape), 'dimension(s) ;', 'is an object of class', type(row), '; with data type', row.dtype, '\n'
print 'col:\n', col, '\nhas', len(col.shape), 'dimension(s) ;', 'is an object of class', type(col), '; with data type', col.dtype, '\n'
print 'mat:\n', mat, '\nhas', len(mat.shape), 'dimension(s) ;', 'is an object of class', type(mat), '; with data type', mat.dtype, '\n'

Note the use of the attribute `shape`, which gives the number of entries in each dimension, the function `type` to get the _variable type_ (i.e. a NumPy array) and the attribute `dtype` for the _data type_ (int64, float, boolean, etc).
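
As a small sketch of the nesting rule described earlier, the snippet below builds a three-dimensional array from a list of lists of lists and inspects it. The name `cube` and its values are made up for illustration, and `print(...)` is written with a single argument so that it should run under both Python 2 and Python 3.

```python
import numpy as np

# Hypothetical example data: 2 blocks, each with 3 rows and 4 columns
cube = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
                 [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])

print("shape: {}".format(cube.shape))   # number of entries along each dimension
print("ndim: {}".format(cube.ndim))     # number of dimensions, same as len(cube.shape)
print("type: {}".format(type(cube)))    # the variable type, numpy.ndarray
print("dtype: {}".format(cube.dtype))   # the data type of the entries (an integer type)
```

The attribute `ndim` is a shortcut for `len(cube.shape)`, which is what the cell above computes by hand.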

In more practical situations, the elements of an array are not known in advance, but its size is. Hence, NumPy offers several functions to create arrays. For example, the functions `arange` and `linspace` create equally spaced sequences of numbers, the function `zeros` creates an array full of zeros, the function `ones` creates an array full of ones, and the function `empty` creates an array whose initial content is uninitialised, so it is arbitrary and depends on the state of the memory. For the last three you need to specify the size of each dimension. Here are some examples of how to use such functions:

In [None]:
# Equally spaced arrays and sequences of numbers
simple1 = np.arange(1,10,1) # from 1 to 9, increasing with a step of 1
simple2 = np.arange(1,10,2) # from 1 to 9, increasing with a step of 2
simple3 = np.arange(10,1,-1) # from 10 to 2, decreasing with a step of 1
simple4 = np.arange( 0, 2, 0.3 ) # it accepts float arguments
simple5 = np.linspace( 0, 2, 9 )  # 9 numbers from 0 to 2

# Arrays with zeros, ones and empty (uninitialised) entries - here you need to specify the sizes of the dimensions (also known as axes in NumPy)
zeros = np.zeros((3,4))  # 3 rows, 4 columns
ones = np.ones((2,3,4)) # 2 blocks, each with 3 rows and 4 columns
empty = np.empty((2,3))  # 2 rows, 3 columns

# display    
print "simple1:", simple1
print "simple2:", simple2
print "simple3:", simple3
print "simple4:", simple4
print "simple5:", simple5
print "zeros:\n", zeros
print "ones:\n", ones
print "empty:\n", empty
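
Two details of these functions are easy to miss, so here is a minimal sketch of them: `arange` stops before its end value while `linspace` includes it by default, and `zeros`/`ones` return floats unless a `dtype` is requested. The variable names are made up for illustration, and `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

# arange stops before the end value; linspace includes it by default
stepped = np.arange(0, 1, 0.25)      # 0.0, 0.25, 0.5, 0.75
spaced = np.linspace(0, 1, 5)        # 0.0, 0.25, 0.5, 0.75, 1.0

# zeros and ones return floats unless a dtype is requested
float_zeros = np.zeros((2, 3))
int_ones = np.ones((2, 3), dtype=np.int32)

print("stepped: {}".format(stepped))
print("spaced: {}".format(spaced))
print("float_zeros dtype: {}".format(float_zeros.dtype))
print("int_ones dtype: {}".format(int_ones.dtype))
```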

## 3. Manipulate arrays

### Arithmetic with Arrays: Addition, multiplication, subtraction, division, powers.

Unlike in MATLAB, arithmetic operators on arrays apply *element-wise*. When an arithmetic operation is used on arrays, a new array is created and filled with the result.

In [None]:
# a and b hold the same values as row and col above
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([[5], [6], [3], [-1], [6], [9], [2], [5], [5]])

aa = a*a
print 'a = ', a
print 'element-element multiplication a*a:', aa, '(see figure)' # element-wise array multiplication 
print 'array multiplication a*b:', np.dot(a,b), '(see figure)' # array multiplication (see figure)
# NOTE: Python version 3.5+ has a built-in matrix multiplication operator: a @ b
print 'raise each element of a to the power of 2:', a**2  # ** is the power operator (^ is bitwise XOR in Python)
print 'divide each element of a by 2:', a/2.0 # dividing by 2.0 avoids integer division in Python 2

print '\n Check by hand each of these; Look at the examples in the image below.'

Element-element array multiplication: <img src="figF.png">

Array multiplication: <img src="figE.png">
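
To make the difference between the two products concrete, here is a small sketch using two hypothetical 2x2 matrices (`m` and `n` are not part of the cells above) whose results are easy to check by hand. `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

# Hypothetical 2x2 matrices chosen so the results are easy to verify by hand
m = np.array([[1, 2],
              [3, 4]])
n = np.array([[0, 1],
              [1, 0]])

elementwise = m * n       # multiplies matching entries: [[0, 2], [3, 0]]
matrix = np.dot(m, n)     # rows of m times columns of n: [[2, 1], [4, 3]]
squared = m ** 2          # each entry squared: [[1, 4], [9, 16]]

print("m * n:\n{}".format(elementwise))
print("np.dot(m, n):\n{}".format(matrix))
print("m ** 2:\n{}".format(squared))
```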

### Array methods

Arrays can be manipulated in other ways, apart from doing arithmetic on them. They can be reshaped, you can obtain information about them, etc. To do that, NumPy arrays offer attributes and methods such as `shape`, `reshape` and `resize`, and Python's built-in function `len` works on them too. A quick reminder of the basics of using methods in Python:

Call methods with the syntax `variable.method()`. Provide arguments to methods in round brackets:

In [None]:
print mat.max() # mat is the 2x8 array defined above

Store output by assigning it to a variable:

In [None]:
maxmat = mat.max()
print maxmat

Connect multiple methods together. Method calls are evaluated from left to right, with each method operating on the output of the previous call:

In [None]:
b = np.arange(12).reshape(3,4)
print b
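
As a further sketch of chaining, the snippet below applies a few common array methods to the same kind of 3x4 grid. Many methods accept an `axis` argument that selects the dimension to work along; the name `grid` is made up for illustration, and `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

col_sums = grid.sum(axis=0)          # one sum per column: [12, 15, 18, 21]
row_max = grid.max(axis=1)           # one maximum per row: [3, 7, 11]
flipped_shape = grid.T.shape         # transpose swaps the axes: (4, 3)

# chained: reshape, then transpose, then take the mean of every entry
chained_mean = np.arange(12).reshape(3, 4).T.mean()   # 5.5

print("column sums: {}".format(col_sums))
print("row maxima: {}".format(row_max))
print("shape after transpose: {}".format(flipped_shape))
print("mean of all entries: {}".format(chained_mean))
```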

Remember that you can (and we encourage you to) look up the usage of these (and other) methods and functions in [the numpy documentation](https://docs.scipy.org/doc/numpy/reference/index.html)! 

## 4. Index and slice arrays

### Slicing

Entries in arrays can be accessed by indicating the name of the array and the entry position in each dimension inside square brackets:

In [None]:
e = np.arange(10)**3  # define new array
print 'e = ', e, '\n'
print 'Third entry in e:', e[2], '\n' # access the third entry from left to right -> indexing
print 'Third to fifth entries in e:', e[2:5], '\n' # access entries 3 to 5 (indices 2, 3 and 4) -> slicing

# recall b is defined above
print 'b = \n', b
print 'Entry in second row, second column of b (indices 1,1): ', b[1,1], '\n' # indices start at 0 -> indexing
print 'Middle-bottom minor array of b: \n', b[1:3,1:-1] # access entries in the second and third rows and columns -> slicing
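
A few more slicing patterns are sketched below: negative indices, step sizes, and the fact that a basic slice is a view on the original data rather than a copy. The names `v`, `window` and `independent` are made up for illustration, and `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

v = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]

last = v[-1]               # negative indices count from the end: 9
evens = v[::2]             # every second entry: [0 2 4 6 8]
backwards = v[::-1]        # the whole array in reverse order

# A basic slice is a view on the same data, not a copy
window = v[2:5]            # [2 3 4]
window[0] = 100            # this also changes v[2]

independent = v[2:5].copy()  # .copy() gives separate data
independent[1] = -1          # v is left untouched by this

print("last: {}".format(last))
print("v after editing the view: {}".format(v))
print("independent copy: {}".format(independent))
```

If you want to modify a slice without touching the original array, take a `.copy()` first.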

### Logical indexing

You can use the logical operators `>, >=, <, <=, ==, |, &` (greater than, greater than or equal to, less than, less than or equal to, equal, or, and) to test entries in arrays, as follows:

In [None]:
print a
print a < 3.5 # gives True for the elements of a that are less than 3.5, and False for the others

This operation generates a new array of the same size as that being tested, but with entries either `False` or `True` depending on how they evaluate.
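
The boolean array can itself be used as an index, which is what makes logical indexing useful in practice. The sketch below uses a hypothetical array `values`; note that each comparison needs its own round brackets when combined with `&` or `|`, because those operators bind more tightly than the comparisons. `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

mask = values < 3.5                              # boolean array, True where the test holds
small = values[mask]                             # keeps entries where mask is True: [1 2 3]
middle = values[(values > 2) & (values <= 6)]    # both conditions hold: [3 4 5 6]
edges = values[(values < 2) | (values == 9)]     # either condition holds: [1 9]
count_small = mask.sum()                         # True counts as 1, so this is 3

print("mask: {}".format(mask))
print("small: {}".format(small))
print("middle: {}".format(middle))
print("edges: {}".format(edges))
print("count_small: {}".format(count_small))
```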

## 5. Numeric Types

Python provides some native object types - `int` and `float` - for storing numeric data. But `numpy` provides its own types, offering much more control over the properties of the data you're working with. For example, `numpy.int32` and `numpy.int64`, and `numpy.float32` and `numpy.float64` offer 32-bit, and 64-bit implementations of integer and float values respectively. By specifying which of these types you would like to store your data in, you can control the precision with which your data is represented, and the amount of memory that that data will consume.

_(Note: this is a good point to mention another, more practical motivation for using `numpy` when working with numeric data. The array implementations in `numpy` are [in many cases much faster to operate on than the standard Python `list` objects, and require less memory to store](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists). One of the reasons for this is that they are designed to contain data of a single type, defined in the `dtype` attribute of the `numpy.ndarray` object.)_
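
As a rough sketch of the trade-off between memory and precision, the snippet below stores the same number of entries with three different data types and compares their sizes. The array length of 1000 is an arbitrary choice for illustration, and `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

n_entries = 1000   # hypothetical array length
as_float64 = np.zeros(n_entries, dtype=np.float64)
as_float32 = np.zeros(n_entries, dtype=np.float32)
as_uint8 = np.zeros(n_entries, dtype=np.uint8)

for arr in (as_float64, as_float32, as_uint8):
    # itemsize is the number of bytes per entry, nbytes the total for all entries
    print("{}: {} bytes per entry, {} bytes in total".format(arr.dtype, arr.itemsize, arr.nbytes))

# eps is the gap between 1.0 and the next representable number: smaller means more precise
print("float32 eps: {}".format(np.finfo(np.float32).eps))
print("float64 eps: {}".format(np.finfo(np.float64).eps))
```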

An important point to consider when working with integer data (as you often will be when handling image files) is the use of _signed_ vs _unsigned_ integers. These are implemented as `numpy.int*` and `numpy.uint*` objects respectively, and there is a subtle but important difference in the range of values that they can store. `numpy.int8` and `numpy.uint8` are 8-bit integers, allowing for 256 different values. The signed `int8` objects can hold values between -128 and 127, while the unsigned `uint8` objects can range between 0 and 255 - that is, they can't hold negative values. __Be careful with these different data types!__ Weird things can start to happen with your data if you handle these types inappropriately. For example, look at what happens when we try to subtract from an array of unsigned `uint8` integers:

In [None]:
unsigned = np.array([1, 3, 5, 7, 9, 11], dtype=np.uint8) # uint8 is an unsigned type
print unsigned
subtracted = unsigned - 4
print subtracted

You might have expected Python to raise an error when attempting to create a value less than 0 (the lowest value that can be represented in an unsigned integer), but instead the values are 'cycled', with -1 becoming 255, -2 becoming 254, etc. The same will happen with signed integer values, and the behaviour extends to 16-, 32-, and 64-bit integers as well (though I do at least get a warning from numpy when I 'cycle' a signed 64-bit integer).

In [None]:
signed = np.array([-128, -103, 5, -127, 9, 127], dtype=np.int8) # int8 is a signed type
print signed + 2

signed64 = np.int64(-9223372036854775808)
signed64 -= 1
print signed64
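
One way to guard against this kind of wrap-around, sketched below, is to check the range of a type with `np.iinfo` and to convert to a wider signed type before subtracting. The array `pixels` is a made-up example and the choice of `int16` is an assumption that suits 8-bit data; `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

# np.iinfo reports the smallest and largest value an integer type can hold
print("uint8 range: {} to {}".format(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max))
print("int8 range: {} to {}".format(np.iinfo(np.int8).min, np.iinfo(np.int8).max))

pixels = np.array([1, 3, 5, 7, 9, 11], dtype=np.uint8)   # hypothetical pixel values

wrapped = pixels - np.uint8(4)            # stays uint8, so 1 - 4 wraps around to 253
widened = pixels.astype(np.int16) - 4     # int16 has room for negative results: 1 - 4 = -3

print("wrapped: {}".format(wrapped))
print("widened: {}".format(widened))
```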

A complete list of `numpy` numeric types is available [here](https://docs.scipy.org/doc/numpy/user/basics.types.html).

## 6. Help and Documentation

You have already used many of the tools NumPy has on offer. You might want to read more about how those functions are used, and you should! 
To do so, you can type the following in a notebook cell or at the Python prompt:

`help(functionname)`

You can also check the documentation and search the name of the function there.
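
As a small illustration, the text shown by `help` is stored in the function's `__doc__` attribute, so a short preview can be printed directly. The choice of `np.linspace` here is arbitrary, and `print(...)` is written so that it should run under both Python 2 and Python 3.

```python
import numpy as np

# In an interactive session, help(np.linspace) shows the full documentation.
# The same text lives in the __doc__ attribute, so the first line can be
# printed as a quick summary:
doc_lines = np.linspace.__doc__.strip().splitlines()
print(doc_lines[0])
```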

## 7. Exercises

__1.__ Create a `3 × 3` random integer array `A` and two `3 × 1` integer vectors `a` and `b`. 

__2.__ Multiply `a` by the scalar 5 and name this new vector `c`.

__3.__ Compute the element-wise product of `a` and `b`. What do you get?

__4.__ What do you get for `A[1,2]`, `A[:3]`, `A[0:2, 0:2]`?

__5.__ Replace the second column of `A` with `b` (Hint: use indexing).

__6.__ Extract the following from `A`:
1. row 2, column 1 
2. row 3, all columns
3. rows 2,3 columns 2,3

__7.__ Compute the (mathematical) array product of `A` and `b`. What do you get? Can you do the element-wise product? Why/why not?

__8.__ Concatenate `b` with itself 3 times to get a `3 × 3` array `B`. Use functions `vstack` and `hstack`.

__9.__ Multiply `A` and `B` element-wise and assign the result to a new variable C.

__10.__ Use the function `shape` to save the dimensions of C in rC and cC. If necessary, use the documentation.

__11.__ Use help to get information about `len` - how does it differ from `shape`?

__12.__ Delete the first row of `C`.

__13.__ What are the dimensions of this new array?

__14.__ Find the elements of `C` that are less than `5`.

__15.__ Create a `24 × 3` matrix `Q`.

__16.__ Calculate the minimum, maximum, mean and standard deviation of each column of `Q`. Use help to find out about the functions min, max, mean and std.

... don't look at the solutions until you're done! ...  
.  
.  
.  
.  
.  
.  
.  
... or _really_ stuck! ...  
.  
.  
.  
.  
.  
.  
... did you try checking the [numpy documentation](https://docs.scipy.org/doc/numpy/reference/index.html) first? ...  
.  
.  
.  

## Solutions

In [None]:
# 1.
print '\n 1.'
A = np.random.randint(0,high=100,size=(3,3))
a = np.random.randint(0,high=100,size=(3,1))
b = np.random.randint(0,high=100,size=(3,1))
print 'A = \n', A
print 'a = \n', a
print 'b = \n', b

# 2.
print '\n 2.'
c = 5*a
print 'c = \n', c

# 3.
print '\n 3.'
print 'element-element product:', a*b

# 4. 
print '\n 4.'
print 'Entry 1,2: \n', A[1,2]
print 'First three rows (here, all of A): \n', A[:3]
print 'Top left minor array: \n', A[0:2, 0:2]

# 5. 
print '\n 5.'
A[:,1] = b[:,0] # b[:,0] flattens the 3x1 column so that it fits into the second column of A
print A

# 6. 
print '\n 6.'
print 'A = \n', A
print 'row 2, column 1:\n', A[1,0]
print 'row 3, all columns:\n', A[2,:]
print 'rows 2,3 columns 2,3:\n', A[1:3,1:3]

# 7. 
print '\n 7.'
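print np.dot(A,b)
print A*b # the element-wise product also runs: the 3x1 column b is broadcast across the columns of A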

# 8. 
print '\n 8.'
B = np.hstack([b,b,b])
print B

# 9. 
print '\n 9.'
C = A*B
print C

# 10. 
print '\n 10.'
rC, cC = np.shape(C)
print 'rC = ', rC
print 'cC = ', cC

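# 11. 
print '\n 11.'
help(len)
print 'len(C) =', len(C), '; C.shape =', C.shape # len gives only the number of rows, shape gives every dimension

# 12. 
print '\n 12.'
C = np.delete(C, 0, axis=0) # remove the first row; C = C[1:,:] would also work
print C

# 13. 
print '\n 13.'
print C.shape # one row fewer than before: (2, 3)

# 14. 
print '\n 14.'
print C[C < 5] # logical indexing; this may be empty for random data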

# 15. 
print '\n 15.'
Q = np.random.randint(0,high=100,size=(24,3))
print Q

# 16. 
print '\n 16.'
Qmax = np.max(Q, axis=0) # axis=0 works down the rows, giving one value per column
Qmin = np.min(Q, axis=0)
Qmean = np.mean(Q, axis=0)
Qstd = np.std(Q, axis=0)
print Qmax
print Qmin
print Qmean
print Qstd

#### References

https://docs.scipy.org/doc/numpy-dev/user/quickstart.html