# Python for High Performance Computing
# The <span style="font-family: Courier New, Courier, monospace;">NumPy</span> module
<hr style="border: solid 4px green">
<br>
<center> <img src="images/arc_logo.png" alt="Logo" style="float: center; width: 20%"></center>
<br>
## http://www.arc.ox.ac.uk
## support@arc.ox.ac.uk

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> (**Num**erical **Py**thon)
<hr style="border: solid 4px green">

### Best of two worlds: fast development & fast execution
* the *de facto* standard for scientific computing in Python
* extends Python with `ndarray`, a multi-dimensional array class, and its methods
* allows for *efficient* and *fast* math calculations on arrays
  * operations "expressed" in `NumPy` are faster than pure Python
  * some functionality is based on optimised, multi-threaded libraries (BLAS, LAPACK)
* interfaces easily with C and Fortran code
  * bridge to legacy code
  * a way to write fast executing functions
<br><br>

### <span style="font-family: Courier New, Courier, monospace;">NumPy</span> is the cornerstone of scientific Python computing
* a lot of other packages are built on `NumPy` and rely on it for performance
* almost everything in this course depends on `NumPy`

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> (**Num**erical **Py**thon)
<hr style="border: solid 4px green">

### Essential <span style="font-family: Courier New, Courier, monospace;">NumPy</span> methods
* multidimensional array objects (`numpy.ndarray`)
<br><br>

### Other
* matrices and linear algebra operations
* random number generation
* Fourier transforms
<br><br>

> Complete documentation at https://www.numpy.org/

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> & <span style="font-family: Courier New, Courier, monospace;">SciPy</span>: significant overlap
<hr style="border: solid 4px green">

### <span style="font-family: Courier New, Courier, monospace;">SciPy</span>
* loads a lot of <span style="font-family: Courier New, Courier, monospace;">NumPy</span> functionality in its own namespace, *e.g.* `randn`, `fft`
* overrides the linear algebra and FFT routines from <span style="font-family: Courier New, Courier, monospace;">NumPy</span> with more sophisticated versions
<br><br>

### The overlap exists because of
* development history (<span style="font-family: Courier New, Courier, monospace;">NumPy</span> evolved from `Numeric`)
* backward compatibility
<br><br>

### The good practice
* from `NumPy` use only the array classes and methods
* anything beyond that (*e.g.* linear algebra, fft, optimisation, ...) belongs to `SciPy`

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> *core* functionality: fast array operations
<hr style="border: solid 4px green">

### Overarching idea: specialised and optimised operations on arrays
* use the `numpy.ndarray` data type and associated methods
* get rid of loops by "vectorising operations"
<br><br>

### Features used in array-based computing
* ufuncs (**u**niversal **func**tion**s**)
* aggregations
* broadcasting
* slicing, masking, fancy indexing
* vectorisation

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> arrays
<hr style="border: solid 4px green">

### The standard Python library provides *lists* and *1D arrays* (<span style="font-family: Courier New, Courier, monospace;">array.array</span>)
* lists are general containers for objects
* arrays are 1D containers for objects of the same type
* functionality is limited
* large memory and performance overheads
<br><br>

### <span style="font-family: Courier New, Courier, monospace;">NumPy</span> provides *multidimensional arrays* (<span style="font-family: Courier New, Courier, monospace;">numpy.ndarray</span>)
* stores elements of the same data type in multiple dimensions
  * efficient storage (and memory access) of data (similar to and compatible with Fortran/C/C++ arrays)
* a lot more functionality than the standard library
  * efficient and fast mathematical operations
<br><br>

> Documentation: http://docs.scipy.org/doc/numpy1.10.0/reference/generated/numpy.ndarray.html
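
As a rough illustration of the storage overheads mentioned above, the sketch below compares the memory used by a list, an `array.array` and a `numpy.ndarray` holding the same values. The names (`values`, `std_array`, `np_array`) and the size `n` are illustrative only, and the `sys.getsizeof` figure for the list is approximate and platform dependent. The code uses `print()` calls with a single argument so that it runs under both Python 2 and Python 3.

```python
import sys
import array
import numpy as np

n = 1000
values = [float(i) for i in range(n)]

# a list stores pointers to separate Python float objects
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array.array and ndarray store raw 8-byte doubles contiguously
std_array = array.array('d', values)
np_array = np.array(values)

print(list_bytes)                           # platform dependent
print(std_array.itemsize * len(std_array))  # bytes of raw data in array.array
print(np_array.nbytes)                      # bytes of raw data in the ndarray
```

Both array types store the data compactly; what `numpy.ndarray` adds is multiple dimensions and fast arithmetic on the whole array.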

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> 1d arrays
<hr style="border: solid 4px green">

First, import `numpy` under the alias `np` to shorten code and improve readability.

In [1]:
import numpy as np

## Creating a 1d array
<hr style="border: solid 4px green">

Example 1: array created from a list

In [2]:
a = np.array( [-1, 0, 1] )
print a
type (a)

[-1  0  1]


numpy.ndarray

## Creating a 1d array (cont'd)
<hr style="border: solid 4px green">

Example 2: array created from another array ("copy constructor").

In [3]:
b = np.array( a )
print b

[-1  0  1]


## Creating a 1d array (cont'd)
<hr style="border: solid 4px green">

`NumPy` arrays are of type `ndarray`

In [4]:
print type(b), b.dtype

<type 'numpy.ndarray'> int64


## Creating a 1d array (cont'd)
<hr style="border: solid 4px green">

Example 3: `arange` for arrays, just like using `range` for lists

In [5]:
a = np.arange(4)
print a
a = np.arange( -2, 6, 2 )
print a, a.dtype

[0 1 2 3]
[-2  0  2  4] int64


## Creating a 1d array (cont'd)
<hr style="border: solid 4px green">

Example 4: use `linspace` to create equally spaced sample points in an interval

In [6]:
a = np.linspace(-10, 10, 5)
print a, a.dtype

[-10.  -5.   0.   5.  10.] float64
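
The difference between `arange` and `linspace` is easiest to see side by side. The following is a minimal sketch with arbitrarily chosen values: `arange` takes a step and excludes the stop value, whereas `linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
a = np.arange(0.0, 1.0, 0.25)

# linspace: fixed number of points, the stop value is included by default
b = np.linspace(0.0, 1.0, 5)

# retstep=True also returns the spacing between the points
c, step = np.linspace(0.0, 1.0, 5, retstep=True)

print(a)
print(b)
print(step)
```

With a non-integer step, `linspace` is usually the safer choice because the number of points does not depend on floating-point rounding.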


## Exercise
<hr style="border: solid 4px green">

Using the cell below, create an array of 16 equally spaced values between (and including) 0 and 1.  Check the size of the array.  How can the sum of all the elements in the array be computed?

## Exercise
<hr style="border: solid 4px green">

Can you guess what the following code snippet does?

In [12]:
b = np.zeros(3)
c = np.ones(3)
b, c

(array([ 0.,  0.,  0.]), array([ 1.,  1.,  1.]))

## Useful shortcuts
<hr style="border: solid 4px green">
Both `zeros` and `ones` are also useful for creating arrays before their contents are known.
<br><br>

Another function is `empty`.

In [None]:
d = np.empty(3)
print d

> *Caution*!  `empty` just allocates memory and *does not* guarantee entries are zero.
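
A typical safe pattern, sketched here with illustrative values, is to allocate with `empty` and then assign every entry before reading any of them. When the fill value is known in advance, `np.full` does both steps at once.

```python
import numpy as np

# allocate without initialising: the contents are arbitrary
d = np.empty(3)

# assign every entry before reading from the array
d[:] = 7.0      # equivalent to d.fill(7.0)

# np.full allocates and fills in one step
e = np.full(3, 7.0)

print(d)
print(e)
```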

## Array attributes
<hr style="border: solid 4px green">

`NumPy` keeps track of array metadata as "attributes" of the array structure.

In [13]:
# taking "a" from the previous example
a = np.linspace (-10, 10, 5)

# examine key array attributes
print a
print "Dimensions ", a.ndim
print "Shape      ", a.shape  # number of elements in each dim
print "Size       ", a.size   # total number of elements
print "Data type  ", a.dtype  # data type of element e.g. 64 bit float
print "Object type", type(a)

[-10.  -5.   0.   5.  10.]
Dimensions  1
Shape       (5,)
Size        5
Data type   float64
Object type <type 'numpy.ndarray'>


## Specify data type at array creation
<hr style="border: solid 4px green">

In [14]:
# array of double precision floats (the default!)
a = np.array ([1., 2., 3.])
print "Data type of a", a.dtype
# array of single precision floats
b = np.array ([1.1,2.2,3.3], np.float32)
print "Data type of b", b.dtype
# arrays of complex elements
c = np.array ([[1, 2, 3], [4, 5, 6]], complex)
print "Data type of c", c.dtype

Data type of a float64
Data type of b float32
Data type of c complex128


## Exercise
<hr style="border: solid 4px green">

Modify the creation of array `a` in the example above so that its data type is first a 32-bit and then a 64-bit integer.

## <span style="font-family: Courier New, Courier, monospace;">NumPy</span> multi-dimensional arrays
<hr style="border: solid 4px green">

### Example: from lists
* `array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on
* *e.g.* a 2D array or matrix can be created from a list of lists

In [3]:
mat = np.array( [[1,2,3], [4,5,6]] )
print mat
print "Dimensions: ", mat.ndim
print "Size:       ", mat.size
print "Shape:      ", mat.shape
print "Type:       ", mat.dtype

[[1 2 3]
 [4 5 6]]
Dimensions:  2
Size:        6
Shape:       (2, 3)
Type:        int64


## Array shape
<hr style="border: solid 4px green">

Work out the shape of the resulting arrays, then execute the cell to check your answer.

In [16]:
i = np.array( [[1,1,1], [2,2,2], [3,3,3], [4,4,4]] )
print "i", i.shape
j = np.array([[[1,1,1],[2,2,2],[3,3,3],[4,4,4]],
             [[1,1,1],[2,2,2],[3,3,3],[4,4,4]]] )
print "j", j.shape

i (4, 3)
j (2, 4, 3)


## Exercise
<hr style="border: solid 4px green">

In the cell below, generate an identity matrix of size 6x6.  Then compute its size in memory (in bytes) using the array attributes `size` and `itemsize`.  Then, force the matrix to be of 4 byte floats (add `dtype=np.float32`) and recalculate.

## Arrays and ufuncs
<hr style="border: solid 4px green">

### Ufuncs = overloaded and vectorised operations

### Pure Python -- using lists
```python
x = range (8)
y = [v + 5 for v in x]
```

### <span style="font-family: Courier New, Courier, monospace;">NumPy</span> -- using arrays is concise, loopless and fast
```python
x = np.array (x)
y = x + 5
```

## Example
<hr style="border: solid 4px green">

In [25]:
x = range (100000)
% timeit [v + 5 for v in x]
x = np.array(x)
% timeit x + 5

100 loops, best of 3: 14 ms per loop
The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 49.4 µs per loop


## Arrays and ufuncs (cont'd)
<hr style="border: solid 4px green">

### Overloaded operations
* arithmetic operators `+`, `-`, `*`, `/`, `%`
* bitwise operators `&`, `|`, *etc*
* comparison operators `<`, `>`, `<=`, `>=`, `==`, `!=`
* math functions `np.exp`, `np.log`, `np.sin`, *etc*
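
Each overloaded operator is shorthand for a call to the corresponding ufunc. The short sketch below, with an arbitrary example array, shows the equivalence for a few operators and how comparison results can be combined with the bitwise operators.

```python
import numpy as np

x = np.arange(5)

# each operator is shorthand for a ufunc call
plus = x + 5          # same as np.add(x, 5)
rem = x % 2           # same as np.mod(x, 2)
big = x > 2           # same as np.greater(x, 2), gives a boolean array

# boolean arrays combine with the bitwise operators;
# the parentheses are needed because & binds tighter than the comparisons
sel = (x > 0) & (x % 2 == 0)

print(plus)
print(rem)
print(big)
print(sel)
```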

## Arrays and aggregations
<hr style="border: solid 4px green">

### Aggregations = reducing operations on arrays
* *e.g.* finding the max, sum, mean of an array, for which <span style="font-family: Courier New, Courier, monospace;">NumPy</span>  has dedicated and fast methods.
<br>

### Example 1

In [26]:
x = np.random.random (100000) # random vector
# this involves a lot of type-checks
% timeit min (x) 
# this avoids most of the type checks and is over 100x faster here
% timeit x.min ()

100 loops, best of 3: 5.38 ms per loop
The slowest run took 8.23 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 25.2 µs per loop


### Example 2

In [27]:
# random matrix
A = np.random.random ((10, 10000))
# select first row and sum 
% timeit sum(A[0,:])
# use the sum method of the array class
% timeit A[0,:].sum()
# even computing the sums of *all* columns
#   is fast using the array method
% timeit A.sum(axis=0)

1000 loops, best of 3: 761 µs per loop
The slowest run took 5.56 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.19 µs per loop
The slowest run took 6.40 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 41.4 µs per loop


## Arrays and aggregations (cont'd)
<hr style="border: solid 4px green">

* `np.min()`, `np.max()`
* `np.sum()`, `np.prod()`
* `np.argmin()`, `np.argmax()` return the index of the min / max value
* `np.nanmin()`, `np.nanmax()`, ... do the same but ignore NaNs
* `np.mean()`, `np.std()`, `np.median()`, `np.percentile()`
* *etc.*
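
Two details of the aggregation functions are worth a quick look: the `axis` argument and the NaN-aware variants. The sketch below uses a small made-up matrix `A` containing one NaN.

```python
import numpy as np

A = np.array([[1.0, 5.0, 2.0],
              [4.0, np.nan, 6.0]])

# a NaN propagates through the ordinary aggregations
total = A.sum()

# the nan-aware versions ignore it
nan_total = np.nansum(A)
nan_max = np.nanmax(A)

# axis=0 aggregates down the columns, axis=1 along the rows
col_max = np.nanmax(A, axis=0)
row_sum = np.nansum(A, axis=1)

print(total)
print(nan_total)
print(col_max)
print(row_sum)
```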

## Exercise
<hr style="border: solid 4px green">

In the cell below, generate a 1D vector of size 10 using `arange`.  Use the absolute value function `numpy.abs()` as well as `argmin()` to find the index of the vector entry closest to the value 7.

## Array broadcasting
<hr style="border: solid 4px green">

### Broadcasting = set of *rules* by which ufuncs operate on arrays of different sizes & dimensions
<br><br>

### Examples

In [35]:
x13 = np.array([1, 2, 3])
print x13
print x13 * 2
print x13 + 2

[1 2 3]
[2 4 6]
[3 4 5]


## Array broadcasting rules
<hr style="border: solid 4px green">

### An operation involves two arrays
* shapes are compared axis by axis, starting from the trailing (last) axes
* the sizes of each pair of axes must either be the same or one of them must be one
* if neither holds, an error is raised
* the size of the result is the maximum size along each dimension of the input arrays
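
These rules can be checked directly on a few shapes. The sketch below uses arrays of ones with arbitrarily chosen shapes, and catches the `ValueError` that NumPy raises when two shapes cannot be broadcast together.

```python
import numpy as np

a = np.ones((4, 3))
b = np.ones(3)        # shape (3,), treated as (1, 3)
c = np.ones((4, 1))
d = np.ones(4)        # shape (4,), treated as (1, 4)

# compatible: each pair of trailing sizes is equal or contains a 1
print((a + b).shape)
print((c + b).shape)

# incompatible: the trailing sizes are 3 and 4
try:
    a + d
    failed = False
except ValueError:
    failed = True
print(failed)
```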

## Broadcasting rules (cont'd)
<hr style="border: solid 4px green">

Example 1
<img src="images/bcast_43x13.png" style="float: center; width: 40%">

In [36]:
x13 = np.array([10, 20, 30])
x43 = np.arange(12).reshape(4,3)
print x43, "+", x13, "=", x43+x13

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]] + [10 20 30] = [[10 21 32]
 [13 24 35]
 [16 27 38]
 [19 30 41]]


## Broadcasting rules (cont'd)
<hr style="border: solid 4px green">

Example 2: vector outer product
<img src="images/bcast_41x13.png" style="float: center; width: 40%">

In [37]:
x13 = np.array([10, 20, 30])
x41 = np.arange(4).reshape(4,1)
print x41, "*", x13, "=", x41*x13

[[0]
 [1]
 [2]
 [3]] * [10 20 30] = [[ 0  0  0]
 [10 20 30]
 [20 40 60]
 [30 60 90]]


## Broadcasting rules (cont'd)
<hr style="border: solid 4px green">

Example 3: compute an outer-product by forcing `x41` to be a column array.

In [38]:
x13 = np.array([10, 20, 30])
x41 = np.arange(4)
print x41, "*", x13, "=", x41[:,np.newaxis]*x13

[0 1 2 3] * [10 20 30] = [[ 0  0  0]
 [10 20 30]
 [20 40 60]
 [30 60 90]]


## Exercise
<hr style="border: solid 4px green">

Taking the above examples as a guide, generate an array that has the values
* 10, 100, 1000, ... in the first column,
* 11, 101, 1001, ... in the second,
* 12, 102, 1002, ... in the third,
* and so on (adding 1 to each new column).

Type your solution in the cell below.

## Accessing array elements
<hr style="border: solid 4px green">

### Basic indexing and slicing (similar to lists)
* but there is a lot more to this in `NumPy`

### Slicing

In [8]:
# a[start:stop:step] --> begin at 'start' and select every 'step'-th element up to, but not including, 'stop'
a = np.arange(11)
print "a", a
print "a[0:4]", a[0:4]
print "a[0:7:2]", a[0:7:2]
print "a[0::2]", a[0::2]
print "a[::2]", a[::2]   # "shorthand" of the above

a [ 0  1  2  3  4  5  6  7  8  9 10]
a[0:4] [0 1 2 3]
a[0:7:2] [0 2 4 6]
a[0::2] [ 0  2  4  6  8 10]
a[::2] [ 0  2  4  6  8 10]


## Accessing array elements (cont'd)
<hr style="border: solid 4px green">

### Negative indices are valid!
* this feature is called *wraparound* and is useful for accessing the last element

In [44]:
print a[-1]

10


## Accessing array elements (cont'd)
<hr style="border: solid 4px green">

Exercise:  can you guess the output of the following cell?

In [None]:
print a[2:-3:2]

## Accessing array elements (cont'd)
<hr style="border: solid 4px green">

Multi-dimensional arrays: can use either a tuple of indices or chained index notation.

In [17]:
# basic indexing of a 3d array
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
# using index notation
print "c[1][0][1]:", c[1][0][1] 
# using a tuple (more performant)
print "c[1,0,1]:", c[1,0,1]     

c[1][0][1]: 6
c[1,0,1]: 6


## Accessing array elements (cont'd)
<hr style="border: solid 4px green">

Using tuples is faster!

In [46]:
% timeit c[1][0][1] 
% timeit c[1,0,1]     

The slowest run took 13.94 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 445 ns per loop
The slowest run took 20.84 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 195 ns per loop


## Accessing array elements (cont'd)
<hr style="border: solid 4px green">

In [19]:
print c
# if fewer indices are given than the number of axes,
# the missing axes are taken as complete slices
print "c[1] = ", c[1]
print "c[1,0] = ", c[1,0]
# can use 3 dots for missing indices
print "c[1,0,...] = ", c[1,0,...] 

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
c[1] =  [[5 6]
 [7 8]]
c[1,0] =  [5 6]
c[1,0,...] =  [5 6]
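
The ellipsis can also stand for the leading axes, which is handy when only the last index matters. A small sketch, reusing the same 3d array `c`:

```python
import numpy as np

c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# "..." expands to as many ":" as are needed
last0 = c[..., 0]        # same as c[:, :, 0]
first1 = c[1, ...]       # same as c[1, :, :]

print(last0)
print(first1)
```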


## Slices and views
<hr style="border: solid 4px green">

### View = an array that refers to data from another array (a reference!)
* can create a view on an array by selecting a slice of an array
* no data is copied when a view is created

In [50]:
# create a 2d array
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print "a = ", a

# can assign a slice to a variable and change
# the array referred to by the slice
s = a[2:, 1:]
print s
s[:,:] = -2
print "s = ", s
print "a = ", a

a =  [[1 2 3]
 [4 5 6]
 [7 8 9]]
[[8 9]]
s =  [[-2 -2]]
a =  [[ 1  2  3]
 [ 4  5  6]
 [ 7 -2 -2]]


## Slices and views (cont'd)
<hr style="border: solid 4px green">

### Slicing creates a view, whose memory is shared with the original array.

In [51]:
# this will print true, because
# s points to memory that is already pointed to by a
print s.base is a
# although s and a are different Python objects
print id(s)
print id(a)

True
4404502320
4405100416
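
The `base` attribute only tells the whole story when the original array owns its data. As an alternative check, the sketch below uses `np.shares_memory`, which is available in reasonably recent NumPy versions; the array and variable names are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

view = a[1:, 1:]          # slicing gives a view
copy = a[1:, 1:].copy()   # an explicit, independent copy

print(np.shares_memory(a, view))
print(np.shares_memory(a, copy))

# writing through the view changes a, writing to the copy does not
view[0, 0] = -1
copy[1, 1] = -2
print(a)
```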


## Exercise
<hr style="border: solid 4px green">

Using a slice, change all the elements with values 7, 8, 10, 11 in matrix `m` below (*i.e.* the bottom right corner elements) to 1000.

In [20]:
m = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])

## Exercise
<hr style="border: solid 4px green">

Use the cell below to generate an 8x8 matrix of nonzero values and modify its border values to be zero.  Regenerate the matrix and fill it with a checkerboard pattern of zeros and ones.

## Reshaping arrays
<hr style="border: solid 4px green">

### An array can have its size and shape modified

In [72]:
a = np.arange(6)
print "a = ", a
print "shape of a:", a.shape
# modifying the shape attribute (no copy is made) requires that the size stays the same
a.shape = (3,2)
print "a = ", a

a =  [0 1 2 3 4 5]
shape of a: (6,)
a =  [[0 1]
 [2 3]
 [4 5]]


## Reshaping arrays (cont'd)
<hr style="border: solid 4px green">

In [25]:
# np.resize() returns a new array with the requested shape,
# repeating or truncating the data as needed
a = np.arange(6)
print a
print np.resize(a, (3, 2))
print np.resize(a, (4, 3))
print np.resize(a, (4,1))

[0 1 2 3 4 5]
[[0 1]
 [2 3]
 [4 5]]
[[0 1 2]
 [3 4 5]
 [0 1 2]
 [3 4 5]]
[[0]
 [1]
 [2]
 [3]]


## Exercise
<hr style="border: solid 4px green">

Use the cell below to resize array `a` with various sizes that do not multiply up to the original length of 6.

## Reshaping arrays (cont'd)
<hr style="border: solid 4px green">

### Check if arrays share the same data (*i.e.* not a copy) using <span style="font-family: Courier New, Courier, monospace;">base</span>

In [27]:
a1 = np.resize(a, (3, 2))
print a1.base is a
print id(a) == id(a1)

False
False


## Reshaping arrays (cont'd)
<hr style="border: solid 4px green">

### Careful!  <span style="font-family: Courier New, Courier, monospace;">reshape()</span> works differently

In [28]:
a1 = np.reshape(a, (3, 2))
print a1
print a1.base is a
print id(a) == id(a1)

[[0 1]
 [2 3]
 [4 5]]
True
False
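
The practical consequence of the difference is shown in the sketch below (values chosen for illustration): writing to a reshaped array changes the original, whereas writing to a resized array does not. It also shows `-1`, which asks `reshape` to infer one dimension from the array size.

```python
import numpy as np

a = np.arange(6)

# reshape returns a view when it can; -1 lets NumPy infer that dimension
r = a.reshape(2, -1)
# np.resize always returns a new array
s = np.resize(a, (2, 3))

r[0, 0] = 100    # visible in a
s[0, 1] = 200    # not visible in a

print(r.shape)
print(a)
print(s)
```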


## Fancy indexing
<hr style="border: solid 4px green">

### Indexing using another numpy array of integer or boolean values
* this is advanced indexing, which lets you do more than simple indexing

In [29]:
# create an array
p = np.array([[ 0,  1,  2],[ 3,  4,  5],
              [ 6,  7,  8],[ 9, 10, 11]])
print p

rows = [0,0,3,3]   # indices for rows
cols = [0,2,0,2]   # indices for columns
q = p[rows,cols]
print q

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[ 0  2  9 11]


## Fancy indexing (cont'd)
<hr style="border: solid 4px green">

Fancy indexing returns a copy (not a view like slicing)

In [30]:
# ... check if q is a view or a copy
q[0] = 1000
print q
print p

[1000    2    9   11]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
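
One subtlety: the copy is made only when fancy indexing is used to read values. When a fancy index appears on the left-hand side of an assignment, the original array is modified in place. A minimal sketch with the same `rows` and `cols` as above (`p_before` is an illustrative name):

```python
import numpy as np

p = np.arange(12).reshape(4, 3)
rows = [0, 0, 3, 3]
cols = [0, 2, 0, 2]

# reading with fancy indexing gives a copy ...
q = p[rows, cols]
q[:] = -1
p_before = p.copy()      # p is still untouched at this point

# ... but fancy indexing on the left-hand side writes into p itself
p[rows, cols] = 1000

print(p_before)
print(p)
```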


## Exercise
<hr style="border: solid 4px green">

Use `base` in the cell below to check whether `q` is a copy.  Then, do the same for a simple indexed slice of `p`, *e.g.* `p[1:3,0:2]`.

## Fancy indexing (cont'd)
<hr style="border: solid 4px green">

You can use logical expressions and boolean "masks" to find indices of elements of interest.

Example

In [83]:
# find indices of elements 
# with value less than zero
m = np.array( [[0,-1,4,20,99],[-3,-5,6,7,-10]] )
print m
print m[ m < 0 ]
print m<0

[[  0  -1   4  20  99]
 [ -3  -5   6   7 -10]]
[ -1  -3  -5 -10]
[[False  True False False False]
 [ True  True False False  True]]


## Fancy indexing (cont'd)
<hr style="border: solid 4px green">


Can you guess what the following code does?

In [84]:
a = np.arange(10)
print a
mask = np.ones(len(a), dtype=bool)
mask[[0,2,4]] = False  # set certain mask values to False
result = a[mask]
print result

[0 1 2 3 4 5 6 7 8 9]
[1 3 5 6 7 8 9]


> *Note*: Masks are particularly powerful when used in conjunction with the ufunc operations.
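
A few common combinations of masks and ufuncs are sketched below, using the matrix `m` from the earlier example. The variable names are illustrative.

```python
import numpy as np

m = np.array([[0, -1, 4, 20, 99], [-3, -5, 6, 7, -10]])

# count and sum the negative entries using the mask
neg = m < 0
n_neg = neg.sum()          # each True counts as 1
sum_neg = m[neg].sum()

# np.where chooses element-wise between two alternatives
clipped = np.where(neg, 0, m)

# combine conditions with & and | (the parentheses are required)
mid = m[(m > 0) & (m < 10)]

print(n_neg)
print(sum_neg)
print(clipped)
print(mid)
```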

## Array copies
<hr style="border: solid 4px green">

Simple assignment does not copy the data of an array: it only creates another reference (loosely, a "shallow" copy) to the same array.

In [34]:
a = np.array( [-2,6,2] )
print "a = ", a
b = a
a[0] = 20
print "b points to a!"
print "b =", b

a =  [-2  6  2]
b points to a!
b = [20  6  2]


## Array copies (cont'd)
<hr style="border: solid 4px green">


That is because all `b = a` does is copy a pointer.  Indeed:

In [35]:
print "id(a)=", id(a)
print "id(b)=", id(b)

id(a)= 139671260856192
id(b)= 139671260856192


## Array copies (cont'd)
<hr style="border: solid 4px green">


Use `copy()` to create a true or "deep" copy.

In [36]:
c = a.copy()
print "id(a)=", id(a)
print "id(c)=", id(c)

id(a)= 139671260856192
id(c)= 139671260856032


## Array copies (cont'd)
<hr style="border: solid 4px green">

In [37]:
# check c really is an independent copy of a 
c[0] = 0
print "c changes", c
print "a unaffected", a

c changes [0 6 2]
a unaffected [20  6  2]


## Vectorization
<hr style="border: solid 4px green">

### Vectorization = replacement of explicit loops by array expressions
* allows element-wise operations on arrays
* no explicit Python loops involved
* efficient element-wise operations
* uses all the "tricks" above (indexing, broadcasting, *etc.*)
<br><br>

### Vectorization is powerful
* one or two (or more) orders of magnitude faster than the pure Python equivalent
<br><br>

### Vectorization = tradeoff between time and space
* time is execution **time**, space is **memory**
* replacing loops may need extra variables for intermediary results
<br><br>
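
To make the memory side of the tradeoff concrete, the sketch below computes the same quantity twice: once as a plain array expression, which allocates several temporary arrays, and once reusing a single buffer through the `out` argument of the ufuncs. This is only one possible way to reduce temporaries, and the names `r1` and `r2` are illustrative.

```python
import numpy as np

x = np.random.randn(1000)
y = np.random.randn(1000)

# straightforward: x*x, y*y and their sum are each a temporary array
r1 = np.sqrt(x * x + y * y)

# reusing one buffer via the "out" argument reduces the temporaries
r2 = np.empty_like(x)
np.multiply(x, x, out=r2)      # r2 = x*x, written into the buffer
r2 += y * y                    # one temporary for y*y, added in place
np.sqrt(r2, out=r2)            # square root computed in place

print(np.allclose(r1, r2))
```

For small arrays the first form is perfectly adequate; the second matters mostly when arrays are large enough for the temporaries to strain memory.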

### Example: operations on arrays of matching size.

In [90]:
import math
x = np.random.randn(1000)
y = np.random.randn(1000)
% timeit [math.sqrt(x[i]*x[i] + y[i]*y[i]) for i in xrange(1000)]
% timeit np.sqrt(x*x + y*y)

1000 loops, best of 3: 849 µs per loop
The slowest run took 37.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.19 µs per loop


## Vectorization (cont'd)
<hr style="border: solid 4px green">

### Power and flexibility: all sorts of operations are possible

In [91]:
a = np.arange(10).reshape([2,5])
b = np.arange(10).reshape([2,5])

# multiply with a constant scalar
print -0.1*a

# multiply a and b, entry by entry
# (NB: this is not matrix multiplication)
print a*b

[[-0.  -0.1 -0.2 -0.3 -0.4]
 [-0.5 -0.6 -0.7 -0.8 -0.9]]
[[ 0  1  4  9 16]
 [25 36 49 64 81]]


## Vectorization (cont'd)
<hr style="border: solid 4px green">

### Careful -- type of data elements matters!

In [92]:
# divide a by (b + 1)
print a/(b+1).astype(float)
# what happens if the astype() is removed?
print a/(b+1)

[[ 0.          0.5         0.66666667  0.75        0.8       ]
 [ 0.83333333  0.85714286  0.875       0.88888889  0.9       ]]
[[0 0 0 0 0]
 [0 0 0 0 0]]
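
The second result above reflects Python 2, where `/` applied to integer arrays performs integer division. Under Python 3 (or with `from __future__ import division`) `/` is true division. To make the intent explicit regardless of the Python version, the two ufuncs can be called by name, as in this sketch:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = np.arange(10).reshape(2, 5)

# true division: always returns floats
t = np.true_divide(a, b + 1)
# floor division: keeps the integer type
f = np.floor_divide(a, b + 1)     # same as a // (b + 1)

print(t.dtype)
print(f.dtype.kind)
print(f)
```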


## Vectorization (cont'd)
<hr style="border: solid 4px green">

### Example: element-wise math functions on arrays

In [93]:
# efficient vectorised computation
np.sqrt(np.array([4, 9, 16]))

array([ 2.,  3.,  4.])

## Exercise
<hr style="border: solid 4px green">

Run the cell below and determine which method to square the elements of an array is faster.

Then, change the power from square to cube.  How do the times change?

Add two extra methods to compute the cube of all the elements in `x`
* using the power function `numpy.power()`
* using Einstein summation `numpy.einsum ('i,i,i->i', x,x,x)`

In [38]:
# multiplying two vectors
x = np.arange(10E7)
% timeit x*x
% timeit x**2
% timeit np.power(x, 2)
% timeit np.einsum ('i,i->i', x,x)

1 loop, best of 3: 137 ms per loop
10 loops, best of 3: 138 ms per loop
1 loop, best of 3: 801 ms per loop
1 loop, best of 3: 237 ms per loop


## Manipulating arrays
<hr style="border: solid 4px green">

#### Many methods for manipulating arrays (reshaping, joining, splitting, inserting, ...).

For example,
```python
concatenate((a1,a2),axis=0)
split(a, indices_or_sections, axis=0)
hstack
vstack
flatten
ravel(a)
stack(arrays[, axis])
tile(a, reps)
repeat(a, repeats[, axis])
unique(ar[, return_index, return_inverse, ...])
trim_zeros(filt[, trim])
fill(scalar)
xv, yv = meshgrid(x,y)
```
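
A few of these functions in action, as a minimal sketch on two small made-up matrices `a1` and `a2`:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((a1, a2), axis=0)   # same as np.vstack((a1, a2))
cols = np.hstack((a1, a2))                # joins along axis 1

top, bottom = np.split(rows, 2, axis=0)   # two equal parts

flat_copy = a1.flatten()    # always a copy
flat_view = a1.ravel()      # a view when possible

# coordinate matrices for a grid with 3 x-values and 2 y-values
xv, yv = np.meshgrid(np.arange(3), np.arange(2))

print(rows.shape)
print(cols.shape)
print(xv)
print(yv)
```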

## Math operations
<hr style="border: solid 4px green">

### <span style="font-family: Courier New, Courier, monospace;">NumPy</span> provides a host of math functions
<br><br>

### But so does the <span style="font-family: Courier New, Courier, monospace;">math</span> module
<br><br>

### Which one to use?
* `math`
  * mathematical functions defined by the C standard
  * fast operations on Python scalars
* `NumPy`
  * mathematical functions defined as ufuncs
  * slower than `math` on Python scalars
  * faster than `math` on NumPy arrays

In [98]:
import numpy
import math
x = 1.2
v = numpy.random.rand(1024)
% timeit numpy.sinh(x)
% timeit math.sinh(x)
% timeit numpy.sinh(v)
% timeit [math.sinh(vi) for vi in v]

The slowest run took 26.98 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 848 ns per loop
The slowest run took 10.47 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 205 ns per loop
The slowest run took 13.62 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.8 µs per loop
1000 loops, best of 3: 277 µs per loop


## Exercise
<hr style="border: solid 4px green">

Build two 1D arrays `x` and `y`, representing the Cartesian coordinates of a series of points.  Tip: use the `numpy` function `random.rand`.  Apply vectorised transformations to change these to polar coordinates, using `numpy.sqrt()` and `numpy.arctan2()`, and write your solution in the cell below.

## File operations
<hr style="border: solid 4px green">

### <span style="font-family: Courier New, Courier, monospace;">NumPy</span> provides an easy way to save data to a text file

In [None]:
# Generate an array of 5 random real numbers
pts = 5
x = np.arange(pts)
y = np.random.random(pts)

## File operations (cont'd)
<hr style="border: solid 4px green">

In [None]:
# data format specifiers: d = int, f = float, e = exponential
np.savetxt('savedata.txt', np.c_[x,y], header='DATA', footer='END', fmt='%d %1.4f')
# escape to shell to see file
!cat savedata.txt

## File operations (cont'd)
<hr style="border: solid 4px green">

In [None]:
# Reload data to an array
p = np.loadtxt('savedata.txt')
print p

## File operations (cont'd)
<hr style="border: solid 4px green">

### More flexibility with <span style="font-family: Courier New, Courier, monospace;">genfromtxt()</span>

In [None]:
p = np.genfromtxt ('savedata.txt', skip_header=2, skip_footer=1)
print p
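
One example of the extra flexibility is the handling of missing values. The sketch below is self-contained: instead of a file on disk it reads from an in-memory text buffer whose contents are made up for illustration, with the second row lacking its `y` value.

```python
import io
import numpy as np

# hypothetical file contents: the second row has a missing y value
text = u"# x, y\n0, 0.5\n1,\n2, 1.5\n"

# by default, missing fields become NaN
raw = np.genfromtxt(io.StringIO(text), delimiter=",")

# filling_values replaces missing fields at load time
filled = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=-1.0)

print(raw)
print(filled)
```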

## Exercise
<hr style="border: solid 4px green">

What do `numpy.save()` and `numpy.load()` do?  Write a snippet of code to save a variable to a binary file.  Read the data in from the file into another variable and compare with the first variable whose values were saved.

## <span style="font-family: Courier New, Courier, monospace;">numexpr</span>
<hr style="border: solid 4px green">

### Fast numerical expression evaluator for <span style="font-family: Courier New, Courier, monospace;">NumPy</span>
* an easy-to-use way to compute array-based operations
* faster than pure `NumPy` by up to an order of magnitude
<br><br>

### Accelerates expressions operating on <span style="font-family: Courier New, Courier, monospace;">NumPy</span> arrays
* faster execution
  * the expression is parsed once and evaluated as a whole
  * array operands are split into small chunks that easily fit in the cache of the CPU
* less memory
  * allocating memory for intermediate results is avoided
* parallelism
  * the operations are multi-threaded
  * support from Intel's VML (Vector Math Library), part of Math Kernel Library (MKL)

## <span style="font-family: Courier New, Courier, monospace;">numexpr</span> (cont'd)
<hr style="border: solid 4px green">

### Best acceleration of expression evaluation
* on large arrays
* CPU expensive operations that are not memory-bound (*e.g.* transcendental functions)

## <span style="font-family: Courier New, Courier, monospace;">numexpr</span> (cont'd)
<hr style="border: solid 4px green">

### Example

In [102]:
import numpy as np
import numexpr as ne
n = 10000000
# generate some large arrays
x = np.random.rand(n); y = np.random.rand(n); z = np.random.rand(n)
# normal numpy
%timeit r1 = np.sqrt(np.sum(x**2 + y**2 + z**2))
# numexpr
%timeit r2 = ne.evaluate('sum(x**2 + y**2 + z**2)'); r2 = ne.evaluate('sqrt(r2)')
# normal numpy (something more expensive than above)
%timeit r3 = np.sin(x**2) + np.arcsinh(y**2) + np.arctanh(z**2)
# numexpr
%timeit r4 = ne.evaluate('sin(x**2) + arcsinh(y**2) + arctanh(z**2)')

10 loops, best of 3: 141 ms per loop
10 loops, best of 3: 45.8 ms per loop
1 loop, best of 3: 915 ms per loop
10 loops, best of 3: 47.1 ms per loop


## Random number generation
<hr style="border: solid 4px green">

<span style="font-family: Courier New, Courier, monospace;">NumPy</span> provides functions for **R**andom **N**umber **G**eneration (RNG).

In [None]:
# create an array of 10 random real numbers 
# from a uniform distribution
print np.random.rand (10)

## Random number generation (cont'd)
<hr style="border: solid 4px green">

In [None]:
# create a matrix of numbers from a normal distribution
# (0 mean, 1 variance)
print np.random.randn (5,3)

## Random number generation (cont'd)
<hr style="border: solid 4px green">

In [None]:
# create a 5x5 matrix by reshaping a 1d array of 25
# random integers from 0 to 4 (the upper bound 5 is excluded)
print np.random.randint(0,high=5,size=25).reshape(5,5)

## Random number generation (cont'd)
<hr style="border: solid 4px green">

In [None]:
# generate sample from normal distribution 
# (mean=0, standard deviation=1)
print np.random.standard_normal((5,5))

## Exercise
<hr style="border: solid 4px green">

Explore other ways of generating random numbers.  What other distributions can you sample?

## RNG and parallel processing
<hr style="border: solid 4px green">

### RNG in parallel can be a source of error
What happens when you need to generate random numbers for different Python processes running concurrently on the same system?  (This is a frequent occurrence, for example in Monte Carlo simulations.)
<br><br>

### Caution!
By default, the RNG is seeded from `/dev/urandom` (or the Windows analogue).  Parallel processes that use the default, such as worker processes forked from an already seeded parent, may easily end up generating identical streams of numbers.


## RNG and parallel processing (cont'd)
<hr style="border: solid 4px green">

Consider the following code (which uses the `multiprocessing` module).

In [44]:
import numpy as np
from multiprocessing import Pool

def func (seed=None):
    # uncomment the following line to seed the RNG
    # np.random.seed (seed)
    # generate 4 random numbers with a uniform distribution
    return np.random.uniform (0, 1, 4)

pool = Pool (processes=8)
print np.array (pool.map(func, range(8)))

[[ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.89658483  0.40507974  0.48759049  0.60595859]
 [ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.21774402  0.87520553  0.57335391  0.02402425]
 [ 0.89658483  0.40507974  0.48759049  0.60595859]]


## RNG and parallel processing (cont'd)
<hr style="border: solid 4px green">

### Seed the RNG
Clearly, each process must use something other than the default to seed the RNG.  But what?
<br><br>

### Random seeds
A first idea is to use random integers as seeds.  But the probability of getting two identical seeds in two separate processes is surprisingly high.  So this first idea is a *bad* one, particularly when the number of processes is high.

* Classic problem: what is the probability that two people in a group share a birthday?  For a group of 23 people the chance is about 50%, and it rises to 97% for a group of 50.

* Similarly, if the seed is a 16-bit unsigned integer, there are 65,536 possible seeds.  The probability that at least two seeds are identical in a sample of 1000 is 99.95%.
<br>
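
The figures quoted above can be checked with a short calculation.  The sketch below assumes that seeds (or birthdays) are drawn independently and uniformly; the function name `collision_probability` is made up for this illustration.

```python
def collision_probability(n_samples, n_values):
    """Probability that at least two of n_samples independent, uniform
    draws from n_values equally likely values are identical."""
    p_all_distinct = 1.0
    for k in range(n_samples):
        # the (k+1)-th draw must avoid the k values already taken
        p_all_distinct *= (n_values - k) / float(n_values)
    return 1.0 - p_all_distinct

p_birthday_23 = collision_probability(23, 365)     # about 0.507
p_birthday_50 = collision_probability(50, 365)     # about 0.970
p_seed_16bit = collision_probability(1000, 2**16)  # 1000 random 16-bit seeds

print(p_birthday_23)
print(p_birthday_50)
print(p_seed_16bit)
```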


## RNG and parallel processing (cont'd)
<hr style="border: solid 4px green">

### Sequential seeds
A better idea!  A good quality RNG produces uncorrelated output for any two different seeds (including consecutive seeds).  Uncomment the seeding command in the code above to illustrate this.
<br><br>
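
A minimal sketch of sequential seeding follows.  It differs from the code above in one respect, which is an assumption of this example rather than part of the original: each task builds its own local `np.random.RandomState` from its seed instead of reseeding the global generator.  To keep the example self-contained, the tasks run sequentially here; with a process pool the same function could be passed to `pool.map(func, range(8))`.

```python
import numpy as np

def func(seed):
    # each task builds its own generator from its own seed
    rng = np.random.RandomState(seed)
    # generate 4 random numbers with a uniform distribution
    return rng.uniform(0, 1, 4)

# sequential seeds 0..7, one per task
samples = np.array([func(seed) for seed in range(8)])
print(samples)
```

Because every task is tied to a fixed seed, a run can be repeated exactly, which helps when debugging a simulation.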

### Dedicated parallel RNG
An even better idea if Monte Carlo simulations are the bulk of your research.

RandomState (https://pypi.python.org/pypi/randomstate/1.10.1)
* a good choice
* drop-in replacement for `numpy`'s own generator class (`numpy.random.RandomState`), so also easy to use
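
For comparison, independent streams can also be set up with `NumPy` alone.  The sketch below does not use the `randomstate` package; it assumes NumPy 1.17 or later, where `numpy.random` provides `SeedSequence` and `default_rng`.  The root seed `12345` and the function name `worker` are arbitrary choices for illustration, and the workers run sequentially to keep the example self-contained.

```python
import numpy as np

# one root seed for the whole run (12345 is an arbitrary example value)
root = np.random.SeedSequence(12345)

# derive one child seed per worker
child_seeds = root.spawn(8)

def worker(child_seed):
    # each worker builds its own generator from its own child seed
    rng = np.random.default_rng(child_seed)
    return rng.uniform(0, 1, 4)

streams = np.array([worker(s) for s in child_seeds])
print(streams)
```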

## Summary
<hr style="border: solid 4px green">

### The <span style="font-family: Courier New, Courier, monospace;">NumPy</span> extension module
* designed for scientific computing in Python
* multi-dimensional array structures
* fast array-oriented computing

## Exercise: <span style="font-family: Courier New, Courier, monospace;">NumPy</span> indexing and slicing
<hr style="border: solid 4px green">

### Use the cell below to work through the following steps:
1. Define a 1D `NumPy` array `x`, containing random numbers (real, double precision).
2. Using index slicing and negative indices, assign the "interior" of the array `x` (the entire array except the first and the last entry) to a new array variable `y`.
3. Using slicing, create yet another array `z` that contains every other element of `x`, starting with the first.
4. Using fancy indexing, create an array `z2` that contains all the negative elements of `z`.
5. Transform the array `x` into a 2D array and assign the result to `x2`.  Modify the first element of `x2`.  Now, verify whether the two arrays share the same data or not.  The result depends on the transformation method used: try both `reshape` and `resize`.  To compare `x` and `x2`, use the array method `flatten` to bring `x2` back to a single dimension.

> *Hint*: One way to compare `x` and `x2` is to use the logical `==` and the function `numpy.all`.  Another way is to use `np.testing.assert_almost_equal`.

In [26]:
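import numpy as np

# step 1: 10 random doubles, uniformly distributed in [-10, 10)
x = np.random.uniform(-10, 10, 10)
print(x)

# step 2: interior of x (everything except the first and last entries)
y = x[1:-1]
print(y)

# step 3: every other element of x, starting with the first
z = x[0::2]
print(z)

# step 4: negative elements of z, selected with a boolean mask
z2 = z[z < 0]
print(z2)

# step 5: reshape into two columns (-1 lets NumPy infer the number of rows)
x2 = np.reshape(x, (-1, 2))
print(x2)

# step 5: np.resize needs an explicit shape (a negative dimension is not valid)
x3 = np.resize(x, (x.size // 2, 2))
print(x3)


[-8.08512258 -6.09951458  1.44880284  4.45202971  0.43294371 -3.0268401
 -6.90761686 -5.22285429 -7.43169445  8.25902342]
[-6.09951458  1.44880284  4.45202971  0.43294371 -3.0268401  -6.90761686
 -5.22285429 -7.43169445]
[-8.08512258  1.44880284  0.43294371 -6.90761686 -7.43169445]
[-8.08512258 -6.90761686 -7.43169445]
[[-8.08512258 -6.09951458]
 [ 1.44880284  4.45202971]
 [ 0.43294371 -3.0268401 ]
 [-6.90761686 -5.22285429]
 [-7.43169445  8.25902342]]
[[-8.08512258 -6.09951458]
 [ 1.44880284  4.45202971]
 [ 0.43294371 -3.0268401 ]
 [-6.90761686 -5.22285429]
 [-7.43169445  8.25902342]]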
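
The verification asked for in step 5 can be sketched as follows.  `np.shares_memory` is not mentioned in the exercise; it is used here as one additional, direct way to test whether two arrays share data.  The value `99.0` is an arbitrary marker chosen because it lies outside the sampled range.

```python
import numpy as np

x = np.random.uniform(-10, 10, 10)

x2 = x.reshape(-1, 2)       # reshaping a contiguous array returns a view
x3 = np.resize(x, (5, 2))   # np.resize returns a new array (a copy)

# modify the first element through the reshaped array
x2[0, 0] = 99.0

# direct test: do the arrays share the same data?
shares_reshape = np.shares_memory(x, x2)
shares_resize = np.shares_memory(x, x3)

# test from the hint: compare element by element after flattening
same_after_reshape = np.all(x == x2.flatten())
same_after_resize = np.all(x == x3.flatten())

print(shares_reshape)
print(shares_resize)
print(same_after_reshape)
print(same_after_resize)
```

The change made through `x2` shows up in `x`, while `x3` keeps the old value because it holds its own copy of the data.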


## Exercise: <span style="font-family: Courier New, Courier, monospace;">NumPy</span> arrays and loops
<hr style="border: solid 4px green">

### Use the cell below to work through the following steps:
1. Define a large 1D `NumPy` array `x`, containing random numbers (real, double precision).
2. Create another array `y` of the same dimensions as `x` but with all entries zero.
3. Using different methods, define a function `apply_sin()` that takes both `x` and `y` as arguments, computes the sine of each element of `x` and stores the result in the corresponding element of `y`.
  * First, iterate over the indices generated by `range` and use the function `math.sin`, *e.g.*
  ```python
  for i in range(n):
      y[i] = math.sin(x[i])
  ```
  * Time the execution of the function.  Replace `range` with `xrange` (Python 2 only; in Python 3, `range` is already lazy and `xrange` no longer exists).  How does the execution time change?  What does `xrange` do?
  * Now, replace `math.sin` with `numpy.sin`.  How does the execution time change?  Which is the more expensive function for calculating the sine of a scalar?
  * Lastly, use `NumPy` and a vectorised approach, applying `numpy.sin` directly to the entire array `x`.  How does vectorisation improve performance over the iterated approach?  Can `math.sin` be mapped directly on the array?
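
One possible way to set up the comparison is sketched below.  The function names `apply_sin_loop` and `apply_sin_vectorised`, the array size and the use of `timeit` are choices made for this illustration; timings depend on the machine, so none are quoted.

```python
import math
import timeit
import numpy as np

n = 100000
x = np.random.uniform(0, 2 * math.pi, n)

def apply_sin_loop(x, y):
    # explicit Python loop, one scalar math.sin call per element
    for i in range(len(x)):
        y[i] = math.sin(x[i])

def apply_sin_vectorised(x, y):
    # a single NumPy call on the whole array, writing the result into y
    np.sin(x, out=y)

y_loop = np.zeros_like(x)
y_vec = np.zeros_like(x)

# time one execution of each version
t_loop = timeit.timeit(lambda: apply_sin_loop(x, y_loop), number=1)
t_vec = timeit.timeit(lambda: apply_sin_vectorised(x, y_vec), number=1)

print("loop: %.4f s, vectorised: %.4f s" % (t_loop, t_vec))
```

Both versions fill `y` in place, so the two results can be compared with `np.allclose(y_loop, y_vec)` before looking at the timings.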

<img src="../../images/reusematerial.png"; style="float: center; width: 90"; >