# Numpy & matplotlib

This notebook was written by Tim Hillel (tim.hillel@ucl.ac.uk) for the UCL Department of Civil, Environmental, and Geomatic Engineering (CEGE) Introduction to Python sessions.

Please contact the author before distributing or reusing the material below.


## Overview 

This notebook will introduce two new Python *libraries*:
* Numpy
* Matplotlib

## Numpy

Built-in data structures (e.g. lists and dictionaries) are fine for 1-D data, but what if we have data in two or more dimensions?

Many (all?) of you have used Matlab, where the answer is to use arrays.

Python gets Matlab-like array functionality from the *numpy* library.

Great documentation and tutorials are available for numpy: 

https://docs.scipy.org/doc/numpy/user/quickstart.html

### Importing libraries

Python is geared towards code reuse, and there is a huge number of libraries you can use to add functionality to Python.

Anaconda has the most useful libraries for data science pre-installed, including `numpy`

We can import a library using the `import` keyword, e.g.

    import numpy
    
However, we usually give numpy the alias *np* using the keyword `as`

In [None]:
import numpy as np

### Creating and manipulating arrays

Unlike Matlab, we need to explicitly call `numpy` functions when we are using arrays in Python.

For example, we can create an array using the `np.array()` function with a list of lists

In [None]:
np.array([[1,2,3],[3,2,1]])
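
As a rough illustration of why an array is more convenient than a list of lists for numerical work, the sketch below compares the two. The names `nested`, `arr`, `doubled` and `repeated` are illustrative and not part of the original notebook.

```python
import numpy as np

nested = [[1, 2, 3], [3, 2, 1]]   # a plain list of lists
arr = np.array(nested)            # the same data as a 2-D array

print(arr.ndim)    # number of dimensions: 2
print(arr.shape)   # (rows, columns): (2, 3)

# Arithmetic on an array is applied element by element...
doubled = arr * 2
print(doubled)

# ...whereas multiplying a list repeats its contents instead
repeated = nested * 2
print(len(repeated))   # 4 inner lists, values unchanged
```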

Numpy can also easily create arrays of regular format. Use the `arange` function to create a vector with the numbers 0 to 19, and store it as `a1`
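
A minimal sketch of how `arange` behaves, using different values from the exercise (the names `v` and `evens` are illustrative). It follows the same start/stop/step convention as Python's built-in `range`, so the stop value itself is not included.

```python
import numpy as np

# np.arange(stop) counts from 0 up to, but not including, stop
v = np.arange(6)
print(v)        # [0 1 2 3 4 5]

# np.arange(start, stop, step) works like Python's range()
evens = np.arange(2, 10, 2)
print(evens)    # [2 4 6 8]
```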

In [None]:
# create vector a1


Arrays have lots of methods you can use with them. Use the `reshape` method to turn the array into a 4x5 array, and store it as `a2`. Can you reshape it to 7x3? What happens if you try?
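
The sketch below shows `reshape` on a smaller, illustrative 12-element vector (the names `v` and `m` are assumptions for this example only). The new shape must contain exactly as many elements as the original, otherwise numpy raises a `ValueError`.

```python
import numpy as np

v = np.arange(12)        # 12 elements
m = v.reshape(3, 4)      # 3 rows x 4 columns = 12 elements, so this works
print(m)

# reshape returns a new array object; v itself keeps its original shape
print(v.shape)           # (12,)

# A shape whose total size is not 12 cannot be used
try:
    v.reshape(5, 2)      # 5 x 2 = 10 elements
except ValueError as err:
    print("reshape failed:", err)
```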

In [None]:
# reshape a1 to 4x5 and store the result in a2


Any 2D array can be turned back into a vector by reshaping it with `-1`. Try reshaping `a2` to a vector (do not store it!)
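
As a small sketch on an illustrative array `m` (not one of the notebook's arrays), `-1` tells numpy to work out the length of that dimension from the total number of elements.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

flat = m.reshape(-1)       # -1 means "work out this length for me"
print(flat)                # [0 1 2 3 4 5]

# -1 can also stand in for one dimension of a 2-D shape
cols = m.reshape(-1, 2)    # 2 columns requested, so numpy infers 3 rows
print(cols.shape)          # (3, 2)
```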

### Indexing arrays

Indexing arrays is similar to indexing lists, except now we can specify both a row and a column.
Get the 2nd value in the 3rd row of `a2`.

We can also slice arrays. Try extracting (from `a2`):
* the 2nd row of the array: `[5, 6, 7, 8, 9]`
* the 3rd column of the array: `[2, 7, 12, 17]`
* the top-right 3x2 sub-array: `[[3, 4], [8, 9], [13, 14]]`

*Hint*: remember the `:` can be used to select all values in a row or column
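
The indexing and slicing syntax can be sketched on a smaller, illustrative 3x4 array `m` (an assumption for this example, not one of the notebook's arrays). Indices are written as `[row, column]` and, as with lists, counting starts at 0.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m[0, 1])      # single element at row 0, column 1: 1
print(m[2, :])      # whole last row: [ 8  9 10 11]
print(m[:, 0])      # whole first column: [0 4 8]
print(m[1:, :2])    # rows 1 onwards, first two columns: [[4 5] [8 9]]
```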

### Attributes and methods

Arrays have a datatype, which we can check with the `dtype` *attribute*. Note, as it is an attribute, we do not call it!

In [None]:
a2.dtype

Try dividing `a2` by two, storing the result as `a3`, and then checking its `dtype`.

Numpy arrays have several other attributes and methods. Try checking the `shape` of `a3`. What about the `max` value? How about the index of the max value?
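
The difference between attributes and methods can be sketched on an illustrative array (the names `m`, `half`, `flat_index` and `row_col` are assumptions for this example). Using `np.unravel_index` to turn the flat index from `argmax` into a row and column is one possible approach, not necessarily the one intended by the exercise.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.dtype)         # an integer type (the exact name depends on the platform)

half = m / 2           # true division always produces floats
print(half.dtype)      # float64

print(half.shape)      # attribute, so no brackets: (2, 3)
print(half.max())      # method, so it is called: 2.5

# argmax returns the position in the flattened array (here 5)
flat_index = half.argmax()
# unravel_index converts that position to (row, column): row 1, column 2
row_col = np.unravel_index(flat_index, half.shape)
print(flat_index, row_col)
```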

### Random numbers

We can also use numpy to generate (pseudo) random numbers, using the `random` submodule. 

When generating random numbers, it is a good idea to set the `seed`, so that we can generate the same numbers when we repeat our experiments.

In [None]:
np.random.seed(42)

Create an array of uniform random floats with the same shape as `a3`, between -2 and 2, using the `rand` function in the `random` submodule. Call it `a4`.
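
A hedged sketch of two ideas used here, with illustrative bounds and names (`first`, `second`, `low`, `high` and `scaled` are not part of the notebook): resetting the seed repeats the same sequence, and the values from `rand`, which lie in [0, 1), can be stretched and shifted to another interval.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(2, 3)     # uniform floats in [0, 1), shape 2x3

np.random.seed(0)                # resetting the seed repeats the sequence
second = np.random.rand(2, 3)
print(np.array_equal(first, second))   # True

# Stretch and shift [0, 1) to [low, high): low + (high - low) * u
low, high = 5, 10                # illustrative bounds
scaled = low + (high - low) * np.random.rand(2, 3)
print(scaled.min() >= low, scaled.max() < high)
```

Note that `rand` takes the dimensions as separate arguments rather than as a single tuple.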

In [None]:
np.random.seed(42)
# create a4


Try calculating the mean and variance by hand (using `sum` and `size`).

Compare the answers you get to those from the `mean` and `var` methods.
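
The calculation can be sketched on a small, illustrative array `x` whose values are easy to check by hand. This assumes the population variance (dividing by N), which is what the `var` method computes by default.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 6.0]])

# mean = sum of the values / number of values = 12 / 4
mean_manual = x.sum() / x.size

# variance = mean of the squared deviations from the mean = 14 / 4
var_manual = ((x - mean_manual) ** 2).sum() / x.size

print(mean_manual, x.mean())   # 3.0 3.0
print(var_manual, x.var())     # 3.5 3.5 (var divides by N by default, ddof=0)
```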

### Boolean arrays and indexing

We can use comparison operators, e.g. `==` (is equal to) and `>=` (is greater than or equal to), on arrays to create boolean arrays.

Try creating a boolean array of all the values larger than 1 in `a4`

A boolean array can be used as a *mask*, which extracts only the elements with a true value, as follows:
    
    <array>[<boolean_mask>]
    
Try extracting all of the values in a4 smaller than -1
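
A minimal sketch of masking on an illustrative integer array (the names `x` and `mask` are assumptions for this example). The result of masking is always a 1-D array, because the selected elements need not form a rectangle.

```python
import numpy as np

x = np.array([[-3, 0, 4], [7, -1, 2]])

mask = x > 1          # boolean array with the same shape as x
print(mask)
# [[False False  True]
#  [ True False  True]]

print(x[mask])        # 1-D array of the selected values: [4 7 2]
print(x[x <= 0])      # the condition can also be written inline: [-3  0 -1]
print(mask.sum())     # True counts as 1, so this counts the matches: 3
```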

## Plotting

As with `numpy`, Python borrows its primary plotting interface from Matlab.

The main library for plotting in Python is `matplotlib`. Matplotlib has multiple interfaces, of which the most commonly used is `pyplot`. We normally give it the alias `plt`.

In [None]:
import matplotlib.pyplot as plt

We can generate a plot with `plt.plot()`. Try plotting the array `a4` and see what happens. Can you explain the plot?

Try adding `a3` and `a4` together and plotting all of the data as a single line. Remember we can *reshape* a 2D array into a vector.
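
The sketch below, on an illustrative 4x3 array `data`, shows how `plt.plot` treats a 2D array compared with a vector. The `matplotlib.use("Agg")` line is an assumption so that the sketch runs as a plain script without a display; it is not needed inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(12).reshape(4, 3) ** 2   # illustrative 4x3 array

# A 2-D array is plotted column by column: 3 columns give 3 lines,
# each with 4 points (one per row), drawn against the row index
lines_2d = plt.plot(data)

plt.figure()            # start a new, empty figure

# Reshaping to a vector first gives a single line through all 12 values
lines_1d = plt.plot(data.reshape(-1))

print(len(lines_2d), len(lines_1d))   # 3 1
```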

## Exercise

You will need to use the matplotlib documentation to achieve this! Feel free to work together and ask for help!

You are going to try and recreate the following plot:

<img src="data/example.jpg">

* Use the numpy random module to generate a 2x500 array of normally distributed numbers with mean 10 and standard deviation 20. Use the random seed 404 to generate the random points
* Plot the data as a 2D scatter plot (so that each column represents one data point (x, y)). Use orange crosses for the data points. Add the label 'data' to the points
* Add the mean to the plot as a large blue dot. Add the label 'mean' to the plot.
* Label the axes 'x' and 'y'.
* Add a legend
* Save the plot as normal_scatter.jpeg
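
As a starting point only, and deliberately not the solution, the sketch below shows the general shape of the `pyplot` calls involved, using different data, markers, colours, labels and file name from those asked for above. All names and values here are illustrative assumptions; check the matplotlib documentation for the options the exercise needs.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)                  # illustrative seed
pts = np.random.rand(2, 50)        # row 0 holds the x values, row 1 the y values

plt.figure()
plt.scatter(pts[0], pts[1], marker="^", color="green", label="points")
# s sets the marker size; here one larger marker highlights a single location
plt.scatter(pts[0].max(), pts[1].max(), s=150, color="red", label="upper corner")
plt.xlabel("horizontal")
plt.ylabel("vertical")
plt.legend()                       # builds the legend from the label= text above

# hypothetical output file, written to the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "demo_scatter.png")
plt.savefig(out_path)
```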