# Modules in Python

One of the advantages of Python that makes it so versatile for a wide range of tasks is the broad ecosystem of tools and packages that offer more specialized functionality on top of the "bare" Python.

## Loading Modules: the ``import`` Statement

For loading built-in and third-party modules, Python provides the ``import`` statement.

#### <font color='green'>Good</font>
import <font color='green'>sys</font>

from os import <font color='green'>path</font>

import statistics <font color='green'>as stats</font>

from custom_package import <font color='green'>mode</font>

from statistics import <font color='green'>mean, median</font>

#### <font color='red'>Bad:</font> wildcard imports can silently overwrite names that are already defined
from math import <font color='red'><b>*</b></font>

from pylab import <font color='red'><b>*</b></font>
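The "Bad" examples above use `math` and `pylab`. The sketch below swaps in two standard-library modules, `math` and `cmath`, because both define a function called `sqrt`. It shows how a later wildcard import quietly replaces a name from an earlier one, and how explicit imports avoid the problem. The aliases `real_sqrt` and `complex_sqrt` are illustrative names, not anything the modules provide.

```python
import math
import cmath

# Two wildcard imports that both define sqrt, log, exp, pi, ...
from math import *    # sqrt now means math.sqrt
from cmath import *   # ...and is silently replaced by cmath.sqrt

print(sqrt is cmath.sqrt)   # True: the second import won
print(sqrt(-1))             # a complex result, whereas math.sqrt(-1) raises ValueError

# Explicit imports (with aliases) keep the origin of every name visible
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

try:
    real_sqrt(-1)
except ValueError:
    print("math.sqrt rejects negative numbers")
print(complex_sqrt(-1))
```

Note that wildcard imports are only allowed at the top level of a module or notebook cell, not inside a function.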

Today we will import the **NumPy** module, a powerful and flexible maths package.

In [1]:
import numpy as np # Because I am too lazy to write numpy every time

##### NumPy supports arrays, which are very useful for numerical computations
* Arrays are N-dimensional: 1D (vector), 2D (matrix), ..., ND
* Operations on arrays are (generally) much faster than equivalent operations on lists
* Many packages use NumPy arrays to store data
* Arrays can be used to make calculations in one command, without `for` loops or list comprehensions
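As a small illustration of the last two points, the sketch below squares the same numbers once with a list comprehension and once with a single array expression, then creates a 3D array. The variable names are purely illustrative.

```python
import numpy as np

values = [0.5, 1.5, 2.5, 3.5]

# Plain Python: a list comprehension squares one element at a time
squares_list = [v ** 2 for v in values]

# NumPy: a single expression squares every element of the array
arr = np.array(values)
squares_arr = arr ** 2

print(squares_list)   # [0.25, 2.25, 6.25, 12.25]
print(squares_arr)

# Arrays can have any number of dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim)      # 3
```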

### Looking for help?

* Documentation: http://docs.scipy.org/doc/numpy/reference/
* Search engines are your friend! Stack Overflow answers are especially useful, e.g. search for "how do I create an empty array in numpy"
* Use the `help` function (in Jupyter, pressing Tab after `np.` lists the available options)

In [2]:
help(np.mean)

Help on function mean in module numpy:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
    Compute the arithmetic mean along the specified axis.
    
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    
    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the means are computed. The default is to
        compute the mean of the flattened array.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, a mean is performed over multiple axes,
        instead of a single axis or all the axes as before.
    dtype : data-type, optional
        Type to use in computing the mean.  For integer inputs, ...

### Creating an array from a list

In [3]:
a1d = np.array([3, 4, 5, 6])
a1d

array([3, 4, 5, 6])

In [4]:
a2d = np.array([[10., 20, 30], [9, 8, 5]])
a2d

array([[10., 20., 30.],
       [ 9.,  8.,  5.]])

Slicing works much like it does for lists, with the indices for the different dimensions of the array separated by commas. Can you guess what each of the following slices is equal to? Print them to check your understanding.

In [5]:
a2d[0,0]

10.0

In [6]:
a2d[0,1:]

array([20., 30.])

In [7]:
a2d[:,2]

array([30.,  5.])
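A few more slicing patterns that carry over from lists are sketched below: negative indices count from the end, a step of 2 skips every other element, and a single index on a 2D array selects a whole row. The variable names are illustrative.

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])

last_row = a2d[-1]           # negative indices count from the end: 9, 8, 5
last_item = a2d[1, -1]       # second row, last column: 5.0
every_other = a2d[:, ::2]    # all rows, every second column: 10, 30 and 9, 5
first_row = a2d[0]           # a single index selects a whole row

print(last_row, last_item)
print(every_other)
print(first_row)
```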

**Exercise** Create a 2D NumPy array from the following list and assign it to the variable "a":

In [8]:
# [[2, 3.2, 5.5, -6.4, -2.2, 2.4],
#  [1, 22, 4, 0.1, 5.3, -9],
#  [3, 1, 2.1, 21, 1.1, -2]]

**Exercise** Using indexing, how can you calculate the difference between adjacent items in an array without a loop?

In [10]:
x = np.array([1, 2, 3, 4, 5])
# Your code here

### Array attributes

In [11]:
a2d

array([[10., 20., 30.],
       [ 9.,  8.,  5.]])

#### ndarray.ndim
The number of axes (dimensions) of the array. In NumPy, the number of dimensions is sometimes referred to as the rank, which is not the same thing as the rank of a matrix in linear algebra.

In [12]:
a2d.ndim

2

#### ndarray.shape
The size of the array along each dimension, as a tuple of integers.

In [13]:
a2d.shape

(2, 3)
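Two other attributes worth knowing are `size` and `dtype`. The sketch below also contrasts them with Python's built-in `len`, which only looks at the first axis.

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])

print(a2d.size)    # total number of elements: 2 * 3 = 6
print(a2d.dtype)   # float64: 10. is a float, and every element shares one type
print(len(a2d))    # len() only reports the length of the first axis: 2
```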

### Functions for creating arrays
#### ``arange([start,] stop[, step,], dtype=None)``
#### Evenly spaced values, defined by the step size

In [14]:
np.arange(1, 9, 2)

array([1, 3, 5, 7])
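A short sketch of how `arange` treats its arguments: with a single argument it is the stop value, and the stop value itself is never included in the result.

```python
import numpy as np

r1 = np.arange(5)             # a single argument is the stop value: 0, 1, 2, 3, 4
r2 = np.arange(1, 9, 2)       # the stop value (9) is never included
r3 = np.arange(0, 1, 0.25)    # non-integer steps also work: 0, 0.25, 0.5, 0.75
print(r1, r2, r3)
```

For non-integer steps the NumPy documentation recommends `linspace` instead, because floating-point rounding can make the number of elements returned by `arange` hard to predict.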

#### ``linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)``
#### Evenly spaced values, defined by the number of points

In [16]:
np.linspace(0, 1, 11)   # start, end, number of points

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
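The `endpoint` and `retstep` parameters from the signature above change what `linspace` returns. A minimal sketch:

```python
import numpy as np

# endpoint=False leaves out the stop value: 5 points spaced by 0.2
pts = np.linspace(0, 1, 5, endpoint=False)
print(pts)          # 0, 0.2, 0.4, 0.6, 0.8

# retstep=True also returns the spacing between the points
pts2, step = np.linspace(0, 1, 5, retstep=True)
print(pts2, step)   # 0, 0.25, 0.5, 0.75, 1 with a step of 0.25
```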

**Exercise**

Create arrays of evenly spaced numbers

In [22]:
# Numbers from 1 to 10 in steps of 1


In [23]:
# From 0 to -2 in steps of -0.4


In [24]:
# 100 points from -pi to pi (hint: use np.pi)


#### Create an array filled with zeros

In [25]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])
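NumPy has several related constructors for filled arrays. The sketch below shows `ones`, `full`, `zeros_like` and `empty`; note that `empty` (the answer to the Stack Overflow question quoted earlier) only allocates memory, so its contents are arbitrary until you fill them.

```python
import numpy as np

ones = np.ones((2, 3))          # 2 x 3 array of 1.0
sevens = np.full((2, 2), 7)     # 2 x 2 array filled with 7
like = np.zeros_like(ones)      # zeros with the same shape and dtype as `ones`
blank = np.empty((2, 3))        # allocated but NOT initialised: values are arbitrary

print(ones)
print(sevens)
print(like)
```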

#### Create an array of random numbers

In [26]:
np.random.rand(4)       # From a uniform distribution between 0 and 1

array([0.22143784, 0.16680299, 0.86535731, 0.95405693])

In [27]:
np.random.normal(0, 1, size=4)      # Gaussian (mean, standard deviation, number of samples)

array([ 0.38204834,  1.8588799 , -1.09337928, -1.06913512])

In [29]:
np.random.randint(-10, 10, size=(5, 5)) # Random integers from -10 (inclusive) up to 10 (exclusive)
# How does this function work? Try uncommenting the next line to read the documentation for this function
#np.random.randint?

array([[-4,  1,  7, -9, -7],
       [ 4,  8,  0, -2,  5],
       [-7,  5,  3,  4,  7],
       [-2, -7, -4,  6,  4],
       [-4,  6,  2,  0, -9]])
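The cells above use the older `np.random.*` functions. Newer NumPy code often uses a random number generator object created with `np.random.default_rng`, which also makes results reproducible when you pass a seed. This is a minimal sketch of that alternative API; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily

uniform = rng.random(4)                          # uniform on [0, 1)
gaussian = rng.normal(0, 1, size=4)              # mean 0, standard deviation 1
integers = rng.integers(-10, 10, size=(5, 5))    # -10 up to 9: high is excluded

# A new generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(4)))  # True
```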

#### Grid generation
* A common task is to generate a pair of arrays that represent data coordinates.
* This is useful for interpolation and for contour plots or maps.
* When orthogonal 1D coordinate arrays already exist, NumPy's `meshgrid` function is very useful:

In [30]:
x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
print(x)
print(y)

[-5.  0.  5.]
[10. 20. 30. 40.]


In [31]:
x2d, y2d = np.meshgrid(x, y)
print(x2d)

[[-5.  0.  5.]
 [-5.  0.  5.]
 [-5.  0.  5.]
 [-5.  0.  5.]]


In [32]:
print(y2d)

[[10. 10. 10.]
 [20. 20. 20.]
 [30. 30. 30.]
 [40. 40. 40.]]
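Once the grid exists, a function of both coordinates can be evaluated at every grid point in one expression. The sketch below computes the distance from the origin (an illustrative choice of function) and shows how the optional `indexing='ij'` argument swaps the axis order.

```python
import numpy as np

x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
x2d, y2d = np.meshgrid(x, y)

# Distance from the origin at every grid point, computed without a loop
dist = np.sqrt(x2d**2 + y2d**2)
print(dist.shape)  # (4, 3): one row per y value, one column per x value

# indexing='ij' puts x along the first axis instead
xi, yi = np.meshgrid(x, y, indexing='ij')
print(xi.shape)    # (3, 4)
```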


Transpose arrays with `.T`

In [33]:
y2d.T

array([[10., 20., 30., 40.],
       [10., 20., 30., 40.],
       [10., 20., 30., 40.]])
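One common surprise: `.T` swaps the axes of a 2D array, but a 1D array has only one axis, so transposing it changes nothing. A sketch, using `reshape` to build an explicit column first:

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])
a1d = np.array([3, 4, 5, 6])

print(a2d.shape, a2d.T.shape)  # (2, 3) (3, 2)
print(a1d.T.shape)             # (4,): transposing a 1D array has no effect

col = a1d.reshape(-1, 1)       # an explicit column with shape (4, 1)
print(col.T.shape)             # (1, 4)
```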

### Statistical methods of arrays

In [34]:
print('array a1d                       :', a1d)
print('Minimum and maximum             :', a1d.min(), a1d.max())
print('Index of minimum and maximum    :', a1d.argmin(), a1d.argmax())
print('Sum and product of all elements :', a1d.sum(), a1d.prod())
print('Mean and standard deviation     :', a1d.mean(), a1d.std())
print('Median and 75th percentile      :', np.median(a1d), np.percentile(a1d, 75))

array a1d                       : [3 4 5 6]
Minimum and maximum             : 3 6
Index of minimum and maximum    : 0 3
Sum and product of all elements : 18 360
Mean and standard deviation     : 4.5 1.118033988749895
Median and 75th percentile      : 4.5 5.25
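By default `std()` divides by N (the population standard deviation), which is why the value above is sqrt(5/4). For the sample standard deviation, which divides by N - 1, pass `ddof=1`. A minimal sketch:

```python
import numpy as np

a1d = np.array([3, 4, 5, 6])

print(a1d.std())        # divides by N:     sqrt(5 / 4), about 1.118
print(a1d.std(ddof=1))  # divides by N - 1: sqrt(5 / 3), about 1.291
```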


### Operations over a given axis

In [35]:
print(a2d)
print('sum array   :', a2d.sum())
print('sum axis 0  :', a2d.sum(axis=0))
print('sum axis 1  :', a2d.sum(axis=1))

[[10. 20. 30.]
 [ 9.  8.  5.]]
sum array   : 82.0
sum axis 0  : [19. 28. 35.]
sum axis 1  : [60. 22.]
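A way to remember the `axis` argument: the axis you name is the one that gets collapsed. For the (2, 3) array `a2d`, `axis=0` collapses the rows and leaves one value per column, while `axis=1` leaves one value per row. The sketch below also uses `keepdims=True` to keep the collapsed axis with length 1, so the result can be subtracted from the original array (here, to centre each row; the variable names are illustrative).

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])   # shape (2, 3)

col_sums = a2d.sum(axis=0)    # collapses axis 0: one value per column, shape (3,)
row_means = a2d.mean(axis=1)  # collapses axis 1: one value per row, shape (2,)

# keepdims=True keeps the collapsed axis with length 1: shape (2, 1)
row_means_2d = a2d.mean(axis=1, keepdims=True)
centred = a2d - row_means_2d  # each row now has mean 0
print(centred.mean(axis=1))
```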


**Exercise** Using the array 'a' we created earlier, find:
* The maximum value
* The 90th percentile 
* The mean along axis 0
* The sum along axis 1

(If you haven't made 'a' yet, uncomment and run the following cell.)

In [36]:
#a = np.array([[2, 3.2, 5.5, -6.4, -2.2, 2.4],
#              [1, 22, 4, 0.1, 5.3, -9],
#              [3, 1, 2.1, 21, 1.1, -2]])

In [37]:
# Your code here

### Vectorisation: operations on whole arrays

In [41]:
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]


In [42]:
a**2

array([ 1,  4,  9, 16])

In [43]:
np.exp(a) # e raised to the power of a

array([ 2.71828183,  7.3890561 , 20.08553692, 54.59815003])

In [44]:
b = np.array([1, 10, 100, 1000])
a*b

array([   1,   20,  300, 4000])
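The same element-wise behaviour applies when one operand is a single number, and to comparisons. A comparison produces a boolean array, which can then be used to select elements. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

plus_ten = a + 10     # the scalar is applied to every element: 11, 12, 13, 14
mask = a > 2          # comparisons are element-wise: False, False, True, True
selected = a[mask]    # a boolean array selects the matching elements: 3, 4

print(plus_ten)
print(mask)
print(selected)
```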

All the maths we could apply to individual *ints*, *floats* and *complex* numbers can now be applied to arbitrarily large, multi-dimensional *arrays* of numbers using NumPy. Let's revisit our function for calculating pressure from depth.

In [45]:
def under_pressure(d, rho=1027.5, g=9.81):
    # Pressure from the water column (Pa): density rho (kg/m^3) * gravity g (m/s^2) * depth d (m)
    P = rho*g*d
    return P

Using Python's standard data types, we had to pass depths to the function one at a time to get pressures. With NumPy, we can use vectorisation to get a whole array of pressure values simply by passing an array of depths to the function.

Using a list:

In [48]:
depths_list = [10,20,30,40,50]
pressures = under_pressure(depths_list)
print(pressures)

TypeError: can't multiply sequence by non-int of type 'float'

Using an array:

In [49]:
depths_array = np.array([10,20,30,40,50])
pressures = under_pressure(depths_array)
print(pressures)

[100797.75 201595.5  302393.25 403191.   503988.75]
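If you want the function to accept plain lists as well, one option is to convert the input with `np.asarray` first. The sketch below is a hypothetical variant, named `under_pressure_flexible` here so it does not replace the original function.

```python
import numpy as np

def under_pressure_flexible(d, rho=1027.5, g=9.81):
    """Hypothetical variant of under_pressure that also accepts lists."""
    d = np.asarray(d)   # converts lists to arrays; existing arrays pass through
    return rho * g * d

print(under_pressure_flexible([10, 20, 30, 40, 50]))  # a list now works
print(under_pressure_flexible(10))                    # a single depth still works
```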


## Shape manipulation

In [50]:
b = np.array([[1, 2, 3], [4, 5, 6]])

In [51]:
b.flatten()

array([1, 2, 3, 4, 5, 6])

In [52]:
b.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [53]:
b.repeat(3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6])

In [57]:
np.tile(b,(3,1))

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
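A few variations on the calls above are sketched below. Passing `-1` to `reshape` lets NumPy work out that dimension from the total size. Passing `axis=0` to `repeat` repeats whole rows instead of flattening the array. Giving `np.tile` a single count tiles along the last axis.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

auto = b.reshape(-1, 2)         # NumPy infers the missing size: shape (3, 2)
rows = b.repeat(2, axis=0)      # each row repeated twice: shape (4, 3)
wide = np.tile(b, 2)            # each row becomes 1 2 3 1 2 3 and 4 5 6 4 5 6

print(auto.shape)
print(rows)
print(wide)
```

`reshape` requires the total number of elements to stay the same, so `b.reshape(4, 2)` raises a `ValueError` for this 6-element array.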

## Pointers revisited
### Copies vs. in-place operations


From `help(numpy)`:

<code>
Most of the functions in `numpy` return a copy of the array argument
(e.g., `np.sort`).  In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
</code>

In [58]:
foo = np.array([99,98,97])
bar = foo
# Method sort()
foo.sort()

In [59]:
print(foo)

[97 98 99]


In [60]:
print(bar)

[97 98 99]


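Using the array method `foo.sort()` sorted `foo` in place, and this also changed `bar`: the assignment `bar = foo` did not copy the array, it made both names refer to the same array object.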

In [None]:
foo = np.array([99,98,97])
bar = foo
# Function sort
foo = np.sort(foo)

In [None]:
print(foo)

In [None]:
print(bar)

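Using the function `np.sort()`, `bar` remains unchanged: `np.sort` returns a sorted copy, so `foo` is rebound to the new array while `bar` still refers to the original one.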

### If you are ever unsure

Prefer **functions** that take the form `module.function(variable)`, e.g. `np.sort(variable)`,

over **methods** that take the form `variable.method()`, e.g. `variable.sort()`,

or use `variable.copy()` (or `np.copy(variable)`) to make an explicit copy of an array when you need one.
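The sketch below shows an explicit copy protecting `bar` from an in-place sort. It also shows a related case worth knowing: slicing an array returns a *view* that shares memory with the original, so writing to the slice changes the original array too. `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

foo = np.array([99, 98, 97])
bar = foo.copy()        # an independent copy, not a second name for foo
foo.sort()              # sorts foo in place
print(foo, bar)         # bar keeps the original order

# Slicing returns a view that shares memory with the original array
row = np.array([1, 2, 3, 4])
view = row[1:3]
view[0] = 100           # this also changes row[1]
print(row)
print(np.shares_memory(row, view))   # True
```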