# ![](http://www.numpy.org/_static/numpy_logo.png) Introduction to NumPy
##### NumPy provides arrays, which are very useful for numerical computations
* Arrays are N-dimensional: 1D (vector), 2D (matrix), ..., ND
* Arrays are (generally) faster than lists
* Many packages use NumPy arrays to store data
* Arrays can be used to make calculations in one command, without for loops or list comprehensions

**See this brilliant guide: http://www.labri.fr/perso/nrougier/from-python-to-numpy/**

Let's import it...

In [None]:
import numpy as np

Vectorised:
```python
array2 = array1 * k + c
```

Non-vectorised, here with a list:
```python
for i in range(len(array1)):
    array2[i] = array1[i] * k + c
```
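
A runnable sketch of the two snippets above, with a rough timing comparison. The constants `k` and `c`, the input size and the helper names `vectorised` and `looped` are all illustrative assumptions; the actual timings will vary from machine to machine.

```python
import timeit

import numpy as np

k, c = 2.0, 1.0                            # arbitrary illustrative constants
array1 = np.arange(100_000, dtype=float)   # input as a NumPy array
list1 = array1.tolist()                    # the same values as a plain list

def vectorised():
    return array1 * k + c                  # one expression, no explicit loop

def looped():
    out = [0.0] * len(list1)
    for i in range(len(list1)):
        out[i] = list1[i] * k + c
    return out

# Both approaches give the same numbers
same = np.allclose(vectorised(), looped())
print("same result:", same)

# Total time for 10 calls of each version
print("vectorised:", timeit.timeit(vectorised, number=10))
print("looped    :", timeit.timeit(looped, number=10))
```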

### Do we still need lists?
 
* Lists can hold objects of different types. Arrays are homogeneous.
```python
example_list = [number, string, cat, dog]
example_array = [cat1, cat2, cat3]
```
* Lists can be nested 
```python
nested_list = [[1, 2], ['a', 'b', 'qwerty'], [1]]
```
Arrays can also be nested, but this negates some of the advantages of n-dimensional arrays.
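
To illustrate the point about nesting, here is a small sketch (the variable names `ragged` and `regular` are made up for this example). A ragged nested list can only be stored as a 1D array of Python objects, whereas a regular nested list becomes a true 2D numeric array:

```python
import numpy as np

# A ragged nested list is stored as an array of Python list objects
ragged = np.array([[1, 2], ['a', 'b', 'qwerty'], [1]], dtype=object)
print(ragged.dtype, ragged.shape)    # object (3,)

# A regular nested list becomes a 2D array of integers
regular = np.array([[1, 2], [3, 4], [5, 6]])
print(regular.shape)                 # (3, 2)
```

With `dtype=object` the elements are ordinary Python objects, so the usual fast element-wise arithmetic on numeric data is no longer available.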

### Looking for help?

* Documentation: http://docs.scipy.org/doc/numpy/reference/
* Use the `help` function (in an interactive session, pressing Tab shows the available options)
```python
    help(np.mean)
```

* Interactive help: NumPy has a built-in search function, `np.lookfor` (available in NumPy 1.x; it was removed in NumPy 2.0)

In [None]:
np.lookfor('weighted average')

### Creating an array from a list

In [None]:
a1d = np.array([3, 4, 5, 6])
a1d

In [None]:
a2d = np.array([[10.,   20, 30], [9, 8, 5]])
a2d

In [None]:
print( type( a1d[0] ) )
print( type( a2d[0,0] ) )

In [None]:
type(a1d)

The **core class** of NumPy is the `ndarray` (homogeneous n-dimensional array).

To find methods or attributes:
<code>
a1d.   ->tab
</code>

More on this below.

### Common mistakes

In [None]:
try:
    a = np.array(1, 2, 3, 4)   # WRONG: the values must be passed as a single sequence
except (TypeError, ValueError) as err:   # the exception type depends on the NumPy version
    print(err)

In [None]:
a = np.array([1,2,3,4]) # RIGHT

In [None]:
# ndarray() is the low-level constructor: the list is read as a shape and the values are uninitialised. Use np.array() instead
np.ndarray([1,2,3,4])

### Functions for creating arrays

#### ``arange([start,] stop[, step,], dtype=None)``
#### evenly spaced, defined by step

In [None]:
np.arange(1, 9, 2)

In [None]:
# for integers, np.arange works like range but returns an array instead of a range object
np.array( range(1,9,2) )

#### ``linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)``
#### evenly spaced, defined by length

In [None]:
np.linspace(0, 1, 10)   # start, end, num-points
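
The `retstep` and `endpoint` parameters from the signature above can be sketched as follows (the variable names are illustrative):

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 1, 10, retstep=True)
print(step)                 # (1 - 0) / (10 - 1) = 0.111...

# endpoint=False leaves out the stop value, as arange does
half_open = np.linspace(0, 1, 10, endpoint=False)
print(half_open)            # 0.0, 0.1, ..., 0.9
```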

## Exercise 1

In [None]:
# create array with seconds from 00:00 to 24:00, inclusive 
# (your code below)


####  Create array filled with zeros

In [None]:
np.zeros((2, 3))

By default, the dtype of the created array is float64 but other dtypes can be used:

In [None]:
np.zeros((2,2),dtype=int)

#### Filled with ones

In [None]:
np.ones((2, 3))

#### Create array with random numbers

In [None]:
np.random.rand(4)       # uniform in [0, 1)

In [None]:
np.random.normal(0,1,size=4)      # Gaussian (mean,std dev, num samples)

In [None]:
np.random.gamma(1,1,(2,2))      # Gamma (shape parameter, scale, output shape)
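
Recent NumPy versions (1.17 and later) also offer a `Generator` interface, created with `np.random.default_rng`. The sketch below is an alternative to the calls above, not a replacement prescribed by this tutorial; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # seeded generator for reproducibility

u = rng.random(4)                        # uniform in [0, 1)
g = rng.normal(loc=0, scale=1, size=4)   # Gaussian (mean, std dev, output shape)
print(u)
print(g)

# Re-creating the generator with the same seed reproduces the same numbers
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(u, rng2.random(4)))   # True
```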

#### Grid generation

* A common task is to generate a pair of arrays that represent data coordinates. 
* Useful for interpolation or for mapping contours.
* When orthogonal 1D coordinate arrays already exist, NumPy's `meshgrid` function is very useful:

In [None]:
x = np.linspace(-5, 5, 3)
y = np.linspace(10, 40, 4)
x2d, y2d = np.meshgrid(x, y)
print(x2d)
print(y2d)
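
A short sketch of what `meshgrid` returns and how the two grids are typically combined. The field `z = x + y` is a made-up example.

```python
import numpy as np

x = np.linspace(-5, 5, 3)      # 3 points along x
y = np.linspace(10, 40, 4)     # 4 points along y
x2d, y2d = np.meshgrid(x, y)

# With the default indexing='xy', both outputs have shape (len(y), len(x))
print(x2d.shape, y2d.shape)    # (4, 3) (4, 3)

# The grids can be combined element-wise to evaluate a field on every grid point
z = x2d + y2d
print(z[0, 0], z[-1, -1])      # 5.0 45.0
```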

### Transpose arrays 
This can be very useful when dealing with grids

In [None]:
print(y2d)
print(np.transpose(y2d)) # using a numpy function
print(y2d.transpose())   # using the method of y2d
print(y2d.T)             # using the property of y2d


## Array indexing

* Indices begin at 0, as in other Python sequences and in C/C++. Many other languages, such as MATLAB, R and Fortran, start at 1.
* In 2D, the first dimension corresponds to rows, the second to columns.
* The fastest varying dimension is the last dimension! The outer level of the hierarchy is the first dimension.
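
The points above can be checked on a small array (`m` is an illustrative name):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
print(m)
print(m[1, 2])                   # row 1, column 2 -> 5
print(m[0])                      # the whole first row: [0 1 2]
print(m.ravel())                 # memory order: the last index varies fastest
```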

In [None]:
a = np.arange(10, 100, 10)
a

In [None]:
a[2:9:3] # [start:end:step]

In [None]:
a[:3] # last is not included

In [None]:
a[-2] # negative index counts from the end

### Using indexes: how to calculate x[i]-x[i-1] without a loop?

In [None]:
x = np.random.rand(6)
x = np.sort(x)
print(x)

In [None]:
x[1:] - x[:-1]
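
NumPy also provides `np.diff`, which computes the same first-order differences. A small check:

```python
import numpy as np

x = np.sort(np.random.rand(6))
by_slicing = x[1:] - x[:-1]
by_diff = np.diff(x)                        # differences between neighbours
print(np.allclose(by_slicing, by_diff))     # True
```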

## Exercise 2

Create a 2D NumPy array from the following list and assign it to the variable "arr":

In [None]:
# [[2, 3.2, 5.5, -6.4, -2.2, 2.4],
#  [1, 22, 4, 0.1, 5.3, -9],
#  [3, 1, 2.1, 21, 1.1, -2]]

Can you guess what the following slices are equal to? Print them to check your understanding.

In [None]:
# arr[:, 3]

In [None]:
# arr[1:4, 0:4]

In [None]:
# arr[1:, 2]

How can you extract the last column and the second-to-last row?

In [None]:
# arr[]

In [None]:
# arr[]

### Fancy indexing

NumPy arrays can be indexed with slices, but also with boolean arrays (masks) or
integer arrays.

In [None]:
a = np.random.randint(1, 100, 6) # array of 6 random integers from 1 to 99 (the upper bound, 100, is excluded)
a

In [None]:
mask = ( a % 3 == 0 ) # Where divisible by 3 (% is the modulus operator).
mask

In [None]:
a[mask]
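
Indexing with an integer array works in a similar way. The sketch below uses fixed example values (an assumption, so that the output is predictable) and also shows that fancy indexing returns a copy rather than a view:

```python
import numpy as np

a = np.array([12, 7, 33, 40, 9, 5])   # fixed example values
idx = np.array([0, 2, 4])
print(a[idx])                          # [12 33  9]

mask = a % 3 == 0
print(np.flatnonzero(mask))            # positions where mask is True: [0 2 4]

# Indexing with a mask or an integer array returns a copy, not a view
picked = a[mask]
picked[0] = -1
print(a[0])                            # still 12
```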

### Array attributes

In [None]:
a2d

#### ndarray.ndim
the number of axes (dimensions) of the array. In NumPy, the number of dimensions is sometimes referred to as rank (not to be confused with the rank of a matrix in linear algebra).

In [None]:
a2d.ndim

#### ndarray.shape
the dimensions of the array

In [None]:
a2d.shape

This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

In [None]:
# Let's use those values

NLines,NCols = a2d.shape
print('NLines:', NLines,'NCols:',NCols)

#### ndarray.size
the total number of elements of the array

In [None]:
a2d.size

Note that `size` is not equal to `len()`. The latter returns the length of the first dimension.

In [None]:
len(a2d)

### Copies and views

In [None]:
original = np.array([99,98,97])

other = original
other[0] = 0

print('other is now ',other)
print('original is now ',original)

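Plain assignment does not copy anything: `other` is just a second name for the same array. NumPy is similarly frugal with slices, which return a view on the original data unless told to make a copy.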

In [None]:
original = np.array([99,98,97])
copy = original.copy()
copy[0] = 0

print('copy is now ',copy)
print('original is now ',original)
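
One way to check whether two arrays share data is `np.shares_memory`. A minimal sketch comparing a second name, a slice and an explicit copy (the names `alias` and `view` are illustrative):

```python
import numpy as np

original = np.array([99, 98, 97])

alias = original          # same object, just a second name
view = original[:2]       # a slice: a new array object sharing the same data
copy = original.copy()    # independent data

print(alias is original)                  # True
print(np.shares_memory(view, original))   # True
print(np.shares_memory(copy, original))   # False

view[0] = 0               # writes through to original
print(original)           # [ 0 98 97]
```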

### Copies vs. in-place operations


From help(numpy):

<code>
Most of the functions in `numpy` return a copy of the array argument
(e.g., `np.sort`).  In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
</code>

In [None]:
original = np.array([99,98,97])

# Function sort
sortedCopy = np.sort(original)
print('original:',original,'returned:',sortedCopy)

# Method sort()
original.sort()
print('original:', original)


### Statistical methods of arrays

In [None]:
a1d=np.random.normal(0,1,4)
print('array a1d                       :', a1d)
print('Minimum and maximum             :', a1d.min(), a1d.max())
print('Index of minimum and maximum    :', a1d.argmin(), a1d.argmax())
print('Sum and product of all elements :', a1d.sum(), a1d.prod())
print('Mean and standard deviation     :', a1d.mean(), a1d.std())


### Statistical functions

<https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html>
    

In [None]:
print('Median and percentile           :', np.median(a1d), np.percentile(a1d,75))

### Operations over a given axis

In [None]:
print(a2d)
print('sum  :',a2d.sum())
print('sum  :',a2d.sum(axis=0))
print('sum  :',a2d.sum(axis=1))
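
The `axis` argument names the dimension that is collapsed. A small sketch with the same values as `a2d`, plus the `keepdims` option:

```python
import numpy as np

a2d = np.array([[10., 20, 30], [9, 8, 5]])   # shape (2, 3)

col_sums = a2d.sum(axis=0)   # collapses the rows    -> shape (3,)
row_sums = a2d.sum(axis=1)   # collapses the columns -> shape (2,)
print(col_sums)              # [19. 28. 35.]
print(row_sums)              # [60. 22.]

# keepdims=True keeps the collapsed axis with length 1
print(a2d.sum(axis=1, keepdims=True).shape)   # (2, 1)
```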

### Vectorisation: operations on whole arrays

In [None]:
a=np.random.rand(4)
print(a)
np.exp(a/100.)/a

In [None]:
# Non-vectorised
r=np.zeros(a.shape)   # create an array of zeros to hold the results

for i in range(a.size):
    r[i] = np.exp(a[i]/100.)/a[i]
r

Vectorisation is generally faster than a for loop.
But for complicated algorithms it might not be possible, or it might not give the most readable code:
<code>
for i in range(len(a)):
    if isprime(i):
        r[i] = a[i]
    else:
        r[i] = a[i] + a[i-1]
</code>

## Exercise 3

Consider a 4x5 2D array of negative integers:

In [None]:
a = np.arange(-100, 0, 5).reshape(4, 5)
a

Suppose you want to return an array `result`, which has the squared value when an element in array `a` is greater than `-90` and less than `-40`, and is 1 otherwise.

* With a for loop it would look like this:

In [None]:
result = np.zeros(a.shape, dtype=a.dtype)

for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        if a[i, j] > -90 and a[i, j] < -40:
            result[i, j] = a[i, j]**2
        else:
            result[i, j] = 1
            
result

* Can you write a vectorised solution?

Hint: use np.logical_and() and np.logical_not() to create a condition


In [None]:
# Your code here
# 



## Masked arrays - how to handle (propagating) missing values

![](../figures/masked_array.png)

All operations related to masked arrays live in the `numpy.ma` submodule.

The simplest example of manual creation of a masked array:

In [None]:
a = np.ma.masked_array(data=[1, 2, 3],
                       mask=[True, True, False],
                       fill_value=-999)
a

Often, the task is to mask an array depending on a criterion.

In [None]:
a = np.linspace(1, 15, 15)

In [None]:
masked_a = np.ma.masked_greater_equal(a, 11)

In [None]:
masked_a
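
Masked values are left out of calculations. A short sketch using the same array, comparing masked and unmasked statistics (the fill value -999 is an arbitrary choice):

```python
import numpy as np

a = np.linspace(1, 15, 15)
masked_a = np.ma.masked_greater_equal(a, 11)

print(masked_a.count())        # number of unmasked values: 10
print(masked_a.mean())         # mean of 1..10 only: 5.5
print(a.mean())                # mean of all 15 values: 8.0
print(masked_a.filled(-999))   # masked entries replaced by a fill value
```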

### Solution to Exercise 3 

A less verbose and quicker approach would be:

In [None]:
a = np.arange(-100, 0, 5).reshape(4, 5)   # re-create the Exercise 3 array (`a` was reassigned above)
condition = np.logical_and(a > -90, a < -40)
condition

In [None]:
result[condition] = a[condition]**2
result[np.logical_not(condition)] = 1
print(result)

### A one-liner using ``np.where``:

In [None]:
result = np.where(condition, a**2, 1)
print(result)

## Exercise 4

* Create a "data" array of linearly spaced numbers in the interval (-10, 20) spaced by 0.5
* Calculate the logarithm of the array
* Create a condition: a True/False (boolean) array marking the values to be masked
* The array should be masked where any of the following conditions applies
    - greater than or equal to 10
    - greater than -1 and less than 1
* Mask the array depending on the condition
* Hint: use the `np.where` function

In [None]:
# Your code:
# Hint: use `np.linspace` or `np.arange` functions


In [None]:
# Your code:
# Hint: use np.isfinite
# condition = 

In [None]:
# Hint: use np.ma.masked_where(condition,arr)
#masked_arr=
#print(masked_arr)

## Shape manipulation

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
print('{} <-- array'.format(a))
print('{} <-- its shape'.format(a.shape))

In [None]:
a.flatten()

In [None]:
a.repeat(4)

In [None]:
a.reshape((3, 2))

In [None]:
print('Old shape: {}'.format(a.shape))
print('New shape: {}'.format(a.reshape((3, 2)).shape))
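
Two related details, sketched under the assumption that `a` is the 2x3 array defined above: `-1` lets NumPy infer one dimension in `reshape`, and `ravel` returns a view when possible whereas `flatten` always returns a copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

b = a.reshape(3, -1)     # -1 lets NumPy infer the remaining dimension
print(b.shape)           # (3, 2)

flat_copy = a.flatten()  # always a copy
flat_view = a.ravel()    # a view when possible
print(np.shares_memory(a, flat_copy))   # False
print(np.shares_memory(a, flat_view))   # True
```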

#### Add a dimension

In [None]:
a[..., np.newaxis].shape
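
The new axis can be placed at any position. A brief sketch, including `np.expand_dims` as an equivalent function call:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
print(a[..., np.newaxis].shape)          # (2, 3, 1)
print(a[np.newaxis, ...].shape)          # (1, 2, 3)
print(np.expand_dims(a, axis=1).shape)   # (2, 1, 3)
```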

## (One) solution to Exercise 4

In [None]:
# Your code:
arr = np.arange(-10,20.1,0.5)
arr=np.log(arr)
print(arr)

In [None]:
# Your code:
condition = np.logical_or(np.logical_or(abs(arr)<1,arr>=10), np.logical_not(np.isfinite(arr)))
print(condition)


In [None]:
# use np.ma.masked_where(condition,arr)
masked_arr=np.ma.masked_where(condition,arr)
print(masked_arr)

## Exercise 5

Generate a 5x5 2D array. The first value is 0, and the values grow left to right and top to bottom in increments of 0.1.

In [None]:
e4 = np.arange(0.,2.5,.1)
e4

In [None]:
e4.reshape([5,5])

## Broadcasting

The fact that NumPy operates on an element-wise basis means that in principle arrays must always match one another's shape. However, NumPy will also helpfully "broadcast" dimensions when possible. 

![](http://www.astroml.org/_images/fig_broadcast_visual_1.png)
[Image source](http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html)
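
A minimal sketch of the three cases usually drawn in such a figure, plus one case where the shapes cannot be broadcast (the names `col`, `row` and `table` are illustrative):

```python
import numpy as np

# A scalar is stretched to match the array
print(np.arange(3) + 5)                   # [5 6 7]

# A (3,) row is repeated along the rows of a (3, 3) array
print(np.ones((3, 3)) + np.arange(3))

# A (3, 1) column and a (3,) row are both stretched to (3, 3)
col = np.arange(3).reshape(3, 1)
row = np.arange(3)
table = col + row
print(table)

# Shapes that cannot be aligned raise an error
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError as err:
    print("cannot broadcast:", err)
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.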

## References
* [NumPy docs](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
* [SciPy lectures](http://www.scipy-lectures.org/)
* [NCAS Introduction to Scientific Computing Course](http://www.ceda.ac.uk/ncas-reading-2015/)