# Short Numpy Tutorial

This is a very short introduction to numpy, focused on its basic data structure, the `ndarray`. Numpy is arguably the most important scientific package in the Python ecosystem because it provides a common data structure on which many other packages build.

![Python scientific ecosystem](http://luispedro.org/files/talks/2013/EuBIAS/figures/sciwheel.png)

To make this tutorial work on Python 2 & Python 3, let's import some future features into Python 2:

In [66]:
from __future__ import print_function, division

In [67]:
# np is the standard abbreviation for numpy in the code
# Even the numpy docs use it
import numpy as np

## What is an ndarray?

The `ndarray` is the biggest contribution of numpy. An ndarray is

- a regular grid of N dimensions,
- homogeneous by default (all the elements have the same type),
- a contiguous block of memory with types corresponding to machine types (8-bit ints, 32-bit floats, 64-bit longs, ...).
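
The properties in the list above can be checked directly on an array. The following is a minimal sketch; the array `mixed` is an illustrative name, not part of the tutorial's running example.

```python
import numpy as np

# Mixing ints and floats: numpy picks one common type for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64

# Every element occupies the same number of bytes
print(mixed.itemsize)   # 8 bytes per float64 element
print(mixed.nbytes)     # 3 elements * 8 bytes = 24

# A freshly created array is stored as one contiguous block
print(mixed.flags['C_CONTIGUOUS'])  # True
```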

### Building an array (inline)

We can build an array from Python lists:

In [68]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])
print(arr)

[[ 1.2  2.3  4. ]
 [ 1.2  3.4  5.2]
 [ 0.   1.   1.3]
 [ 0.   1.   0.2]]


Ex: Check how to specify the type of an array. Create a 2D matrix with 3 rows and 4 columns, initialized with values of your choice, and specify the element type as float.

Ex: Create a three-dimensional array of 100x100x3 elements whose type is 32-bit integer. How many ways of doing it can you find?
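
One possible set of answers to the two exercises above is sketched below. The variable names are illustrative, and the list of constructors is not meant to be exhaustive.

```python
import numpy as np

# 3 rows x 4 columns, with the element type given explicitly
m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]], dtype=float)
print(m.dtype, m.shape)

# Several ways to get a 100x100x3 array of 32-bit integers
z = np.zeros((100, 100, 3), dtype=np.int32)    # filled with 0
o = np.ones((100, 100, 3), dtype='int32')      # filled with 1
f = np.full((100, 100, 3), 7, dtype=np.int32)  # filled with a chosen value
e = np.empty((100, 100, 3), dtype=np.int32)    # contents are uninitialized
r = np.arange(100 * 100 * 3, dtype=np.int32).reshape(100, 100, 3)
```

The dtype can be given either as a numpy type (`np.int32`) or as a string (`'int32'`).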

### Inspecting array properties

In [69]:
print(arr.dtype)
print(arr.ndim)
print(arr.shape)

float64
2
(4, 3)


This array has dtype `float64` (at least on my computer, probably on yours too), 2 dimensions, and a shape of 4 rows by 3 columns.

When constructing an array, we can explicitly specify the type:

In [70]:
iarr = np.array([1,2,3], dtype='uint8')
print(iarr)

[1 2 3]


Arithmetic operations on an array must respect its type: in-place operations keep the array's dtype.

In [71]:
arr *= 2.5
iarr *= 2
print(arr)
print(iarr)

[[  3.     5.75  10.  ]
 [  3.     8.5   13.  ]
 [  0.     2.5    3.25]
 [  0.     2.5    0.5 ]]
[2 4 6]
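
Respecting the type also means respecting its range. As a small sketch (the array `big` is a made-up example), a `uint8` element can only hold values from 0 to 255, so results that do not fit wrap around modulo 256:

```python
import numpy as np

big = np.array([100, 200, 250], dtype='uint8')
big *= 2       # results are stored back as uint8, modulo 256
print(big)     # [200 144 244]

# Converting to a wider type first avoids the wrap-around
original = np.array([100, 200, 250], dtype='uint8')
wide = original.astype('uint16') * 2
print(wide)    # [200 400 500]
```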


Ex: What is the problem with the following statement?

`iarr *= 2.5`

In [72]:
iarr *= 2.5 

TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'

In [None]:
"""Solution"""
iarr = iarr.astype(float)
iarr *= 2.5 

Has the type of the `iarr` array changed?
Notice that a numpy array has a fixed element type. If we do not take it into account, our code will not work!

In [None]:
print(iarr.dtype)
print(iarr)
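
An alternative to `astype`, sketched here under the same setup, is to multiply out of place: the expression `iarr * 2.5` creates a new float array and leaves `iarr` untouched. The name `scaled` is only illustrative.

```python
import numpy as np

iarr = np.array([1, 2, 3], dtype='uint8')

# Out-of-place: numpy allocates a new float64 array for the result
scaled = iarr * 2.5
print(scaled.dtype)   # float64
print(iarr.dtype)     # uint8, the original is unchanged

# In-place: the float result cannot be stored in a uint8 array
try:
    iarr *= 2.5
except TypeError as err:
    print("In-place multiplication failed:", err)
```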


## Indexing

### Slicing & Dicing

We can use Python's `[]` operator to index and slice the array. Below are some examples of how we can read parts of a matrix:

In [None]:
print(arr) # The whole matrix
print(arr[0,0]) # First row, first column
print(arr[1]) # The whole second row
print(arr[:,2]) # The whole third column

### Working with slices of an array

Slices share memory with the original array! In the following code, the variable `view` corresponds to a slice of the array `arr`.

In [None]:
# `view` is a slice of arr (its second row), not a copy.
# Modifying view[0] therefore also modifies arr[1,0].

print("Before: {}".format(arr[1,0]))

view = arr[1]
# adding 100
view[0] += 100

print("After: {}".format(arr[1,0]))

Ex: How can we avoid memory sharing between variables? We should use the `.copy()` method.

In [None]:
"""Solution"""
print("Before: {}".format(arr[1,0]))
view = arr[1].copy()  # copy only the row we need, not the whole array
view[0] += 100
print("After: {}".format(arr[1,0]))
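
Whether two arrays share memory can also be checked explicitly with `np.shares_memory`. The sketch below uses a fresh array `grid` (an illustrative name) so that it does not depend on the current contents of `arr`.

```python
import numpy as np

grid = np.arange(12.0).reshape(4, 3)

row_view = grid[1]          # basic slicing gives a view
row_copy = grid[1].copy()   # an explicit copy has its own memory

print(np.shares_memory(grid, row_view))  # True
print(np.shares_memory(grid, row_copy))  # False

row_copy[0] += 100   # does not touch grid
row_view[0] += 100   # changes grid[1, 0] as well
print(grid[1, 0])    # 103.0
```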

### Visual illustration of slicing

In [None]:
a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

![slicing](https://scipy-lectures.github.io/_images/numpy_indexing.png)

This image is taken from [scipy-lectures](https://scipy-lectures.github.io/intro/numpy/array_object.html), a more complete tutorial on numpy than what we have here.
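
The kinds of slices illustrated in such a figure can be tried on the array `a` defined above. The four expressions below are chosen for illustration and are assumed, not verified, to match the ones drawn in the image.

```python
import numpy as np

a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

print(a[0, 3:5])     # first row, columns 3 and 4: [3 4]
print(a[4:, 4:])     # bottom-right 2x2 corner: 44, 45, 54, 55
print(a[:, 2])       # the whole third column: 2, 12, ..., 52
print(a[2::2, ::2])  # every second row from row 2, every second column
```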

## Boolean operations

An important subset of operations with numpy arrays concerns using comparison operators to build boolean arrays. For example:


In [None]:
is_greater_one = (arr >= 1.)
print(is_greater_one)

In [None]:
print(arr)

Ex: Set all elements of `arr` that are bigger than 100 to -10.

In [None]:
"""Solution"""
arr[(arr>100)] = -10
print(arr)

Ex: Construct a second array `arr2` that contains only the values of `arr` that are between 30 and 50.

In [None]:
"""Solution"""
arr2 = arr[(arr>30)&(arr<50)]
print(arr2)
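
A few details of boolean indexing are easy to miss, so here is a small self-contained sketch on a made-up array `data`. Masks are combined with `&`, `|` and `~`, and each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

data = np.array([[5.0, 35.0, 60.0],
                 [45.0, 10.0, 31.0]])

# Parentheses are required around each comparison
mask = (data > 30) & (data < 50)

selected = data[mask]   # 1-D array of matching values, in row-major order
print(selected)         # the values 35, 45 and 31
print(mask.sum())       # number of matching elements: 3

outside = data[~mask]                # ~ negates the mask: 5, 60 and 10
clipped = np.where(mask, data, 0.0)  # keep matches, replace the rest by 0
print(clipped)
```

Note that `data[mask]` returns a flat copy of the selected values, while `np.where` keeps the original shape.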

## Basic functions on arrays



In [None]:
arr.mean()

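Also available: `max`, `min`, `sum`, `ptp` (peak-to-peak, i.e., difference between maximum and minimum values). In NumPy 2.0 and later, `ptp` is only available as the function `np.ptp(arr)`, not as an array method.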

These functions can also work *axis-wise*:

In [None]:
arr.mean(axis=0)
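
To see what the `axis` argument does, it helps to use a small array whose result is easy to verify by hand. This is a minimal sketch with an illustrative array `m`: `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)

col_means = m.mean(axis=0)     # one value per column: 2.5, 3.5, 4.5
row_means = m.mean(axis=1)     # one value per row: 2.0, 5.0
col_range = np.ptp(m, axis=0)  # max - min per column: 3.0, 3.0, 3.0

print(col_means.shape)  # (3,)
print(row_means.shape)  # (2,)
```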

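A useful trick is to combine boolean arrays with these functions. Since `True` counts as 1 and `False` as 0, the mean of a boolean array is the fraction of its elements that are `True`: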

In [None]:
is_greater_one = (arr > 1)
print(is_greater_one)
print(is_greater_one.mean())

## Broadcasting

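You can often perform operations between arrays of different shapes: numpy *broadcasts* the smaller array along the larger one. Here, a vector of 3 elements is added to every row of `arr`: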

In [None]:
print(arr)
print("Now adding [1,1,0] to *every row*")
print()
arr += np.array([1,1,0])
print(arr)

Ex: Add the vector [1,2,3,4] to each column.

In [None]:
"""Solution"""
print(arr)
arr3 = arr.transpose()
arr3 += np.array([1,2,3,4])
arr = arr3.transpose()
print(arr)
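
The same exercise can be solved without transposing, by giving the vector an explicit second axis with `np.newaxis`. This is a sketch on a fresh array; `table` and `offsets` are illustrative names.

```python
import numpy as np

table = np.zeros((4, 3))
offsets = np.array([1, 2, 3, 4])

# offsets[:, np.newaxis] has shape (4, 1), so it is broadcast
# across the 3 columns: every column receives [1, 2, 3, 4]
table += offsets[:, np.newaxis]
print(table)
```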

The exact [rules of how broadcasting works](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) are a bit complex to explain, but it generally works as expected. For example, if your data is a set of measurements for a sample, and the columns are the different types of measurements, you can easily remove the mean of each column with `arr - arr.mean(axis=0)`.
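
In short, numpy compares the shapes starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1. The sketch below illustrates this on a made-up array `data`, including a case where the shapes do not line up.

```python
import numpy as np

data = np.array([[1.0, 10.0, 100.0],
                 [3.0, 30.0, 300.0]])   # shape (2, 3)
col_mean = data.mean(axis=0)             # shape (3,)

# (2, 3) and (3,) agree on the last dimension, so the result is (2, 3)
print(np.broadcast(data, col_mean).shape)   # (2, 3)

centered = data - col_mean
print(centered.mean(axis=0))   # every column now has mean zero

# Shapes (2, 3) and (2,) do not line up, so this raises a ValueError
try:
    data - data.mean(axis=1)
except ValueError as err:
    print("Cannot broadcast:", err)
```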

## Footnotes

[homogeneous]: There is a loophole to get heterogeneous arrays, namely an array of `object`. Then, you can store any Python object. This comes at the cost of decreased computational efficiency (both in terms of processing time and memory usage).
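
As a small illustration of this loophole (a sketch only; the array `hetero` is a made-up example), an array created with `dtype=object` stores references to arbitrary Python objects:

```python
import numpy as np

hetero = np.array([1, 'two', 3.0, None], dtype=object)
print(hetero.dtype)                        # object
print([type(x).__name__ for x in hetero])  # int, str, float, NoneType
```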

In [None]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])
print('The original matrix is:\n', arr)
print()
print('The average value per column is: ', arr.mean(axis=0))
print('The average value per row is: ', arr.mean(axis=1))
print()
# here we make copies of the variable arr since we will modify it in two different ways
arr_aux1 = arr.copy()
arr_aux2 = arr.copy()

# we subtract the average values from the whole matrix, first per column and then per row

# arr.mean(axis=0) has shape (3,), so it is broadcast over every row
arr_aux1 -= arr.mean(axis=0)
print('The matrix after subtracting the average value of each column is: ')
print(arr_aux1)

# the row means have shape (4,), so we transpose to line them up with the last axis;
# the transpose is a view, so arr_aux2 itself is modified
arr_aux2_t = arr_aux2.transpose()
arr_aux2_t -= arr.mean(axis=1)
print('The matrix after subtracting the average value of each row is: ')
print(arr_aux2)
print()

# The normalization is performed by dividing the matrix by its mean.
print(arr.mean())
print('The original matrix after normalizing is:\n', arr / arr.mean())