<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

<p>This course has been prepared by Tamas Gal</p>

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built. NumPy provides high-performance multi-dimensional arrays that can be used as vectors or matrices.

The key features of NumPy are:

- ndarrays: n-dimensional arrays whose elements all share one data type, which makes them fast and space-efficient. Many built-in ndarray methods process data without explicit loops (e.g., computing the mean).
- Broadcasting: a set of rules that defines how operations behave between multi-dimensional arrays of different shapes.
- Vectorization: numeric operations apply to whole ndarrays at once, without explicit Python loops.
- Input/Output: simplifies reading data from and writing data to files.
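
As a minimal sketch of the first three points above, the following compares a plain Python loop with the equivalent array expressions. The variable names and data are illustrative and not part of the course material.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative data

# Loop version: accumulate the sum element by element
total = 0.0
for v in values:
    total += v
loop_mean = total / len(values)

# Built-in ndarray method: no explicit loop needed
array_mean = values.mean()

# Vectorization: one expression operates on every element
doubled = values * 2

# Broadcasting: a (2, 1) array combines with a (4,) array into a (2, 4) result
offsets = np.array([[0.0], [10.0]])
shifted = values + offsets

print(loop_mean, array_mean)
print(doubled)
print(shifted.shape)
```

Both means agree; the array version is shorter and avoids running the loop in Python.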

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">NumPy Documentation</a><br>
<i>Python for Data Analysis</i> by Wes McKinney<br>
<i>Python Data Science Handbook</i> by Jake VanderPlas



In [None]:
import numpy as np
import sys

print("Python version: {0}\n"
      "NumPy version: {1}"
      .format(sys.version, np.__version__))

In [None]:
def describe(np_obj):
    """Print some information about a NumPy object"""
    print("object type: {0}\n"
          "size: {o.size}\n"
          "ndim: {o.ndim}\n"
          "shape: {o.shape}\n"
          "dtype: {o.dtype}"
          .format(type(np_obj), o=np_obj))

In [None]:
from IPython.core.magic import register_line_magic

@register_line_magic
def shorterr(line):
    """Show only the exception message if one is raised."""
    try:
        output = eval(line)
    except Exception as e:
        print("\x1b[31m\x1b[1m{e.__class__.__name__}: {e}\x1b[0m".format(e=e))
    else:
        return output
    
del shorterr

## The basic data structure in NumPy: `ndarray`

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])
a

In [None]:
type(a)

### Array properties

In [None]:
a.size  # number of elements

In [None]:
a.ndim # 1 dimension (vector)

In [None]:
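a.shape  # a tuple with the length of each dimension

In [None]:
a.dtype  # all elements share one (integer) data type

In [None]:
# Try to change one of the values to a string: an integer array rejects it
try:
    a[1] = 'toto'
except ValueError as err:
    print("ValueError:", err)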

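### Arrays are mutable

In [None]:
b = np.array(['toto'])  # an array of strings gets a string dtype
b

In [None]:
print(id(a))
a[0] = 6  # modifies the array in place

print(id(a))  # same id: `a` is still the same object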

### Multi-Dimensional Arrays

In [None]:
b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
b

In [None]:
describe(b)

### Array Methods

In [None]:
a.min(), a.max(), a.mean(), a.sum()

In [None]:
b

In [None]:
b.sum()

In [None]:
b.sum(axis=0)

In [None]:
b.sum(axis=1)
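
The `axis` argument names the dimension that is collapsed by the reduction. A small sketch, reusing the same values as `b` above (the names `col_sums` and `row_sums` are illustrative):

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])  # shape (2, 5)

col_sums = b.sum(axis=0)  # collapses axis 0 (the rows): one sum per column
row_sums = b.sum(axis=1)  # collapses axis 1 (the columns): one sum per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```

The result has the shape of the input with the summed axis removed: `(5,)` for `axis=0` and `(2,)` for `axis=1`.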

## Operations with Arrays

In [None]:
a

In [None]:
a - 42

In [None]:
a * 42 / np.pi

In [None]:
a**np.e, np.e**a

In [None]:
a * a  # element-wise

In [None]:
a @ a  # use np.dot(a, a) if you are using < Python 3.5

In [None]:
a

In [None]:
a < 3

In [None]:
a == 4

In [None]:
(a > 3) & (a < 5)  # element-wise AND; the parentheses are required

In [None]:
a < np.array([2, 3, 5, 2, 1, 5])

In [None]:
np.sum(a > 2)

## Basic Indexing and Slicing

In [None]:
a[0]  # indexing starts at 0

In [None]:
a[-1]  # -1 refers to the last element

In [None]:
a[2:6:3]  # just like in Python: [start:end:step]

In [None]:
a[::-1]  # reversing an array

In [None]:
b[::-1]  # reverses axis 0

### Indexing and Slicing in Multiple Dimensions

In [None]:
b

In [None]:
b[0, 2]

In [None]:
b[0, 1:4]

In [None]:
b[:, 1:4]  # the `:` selects the whole axis

In [None]:
b[:, 2:5:2]

In [None]:
b[::-1, ::-1]  # reverses both axes

### Advanced Indexing

In [None]:
d = np.array([4, 3, 2, 5, 4, 5, 4, 4])

In [None]:
mask = np.array([True, False, False, True, False, False, True, True])
mask

In [None]:
d[mask]

In [None]:
d[[1, 3, 1, 6]]

#### Be careful with boolean indexing: the mask has to be a boolean array or a list of booleans.

In [None]:
d

In [None]:
d[[False, True, False, False, True, False, False, True]]

In [None]:
d[[0, 1, 0, 0, 1, 0, 0, 1]]  # integers are used as indices, even though True == 1 and False == 0

In [None]:
d[np.array([0, 1, 0, 0, 1, 0, 0, 1], dtype=bool)] 
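
In practice a mask is usually built from a comparison rather than typed out by hand. The sketch below, using the same values as `d`, contrasts a boolean mask with an integer list of the same length (the names `selected` and `picked` are illustrative):

```python
import numpy as np

d = np.array([4, 3, 2, 5, 4, 5, 4, 4])

mask = d > 3        # boolean array with the same shape as d
selected = d[mask]  # keeps the elements where the mask is True

# An integer list is interpreted as positions: d[0], d[1], d[0], ...
picked = d[[0, 1, 0, 0, 1, 0, 0, 1]]

print(selected)
print(picked)
```

The boolean mask returns only the elements greater than 3, while the integer list returns eight elements taken from positions 0 and 1.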

## The `dtype`

In [None]:
np.dtype

In [None]:
a, a.dtype

In [None]:
e = a * 42 / np.pi  # NumPy will choose the "right" `dtype` automatically
e, e.dtype
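
A short sketch of how the `dtype` changes implicitly and how it can be converted explicitly with `astype()`. The array names are illustrative; the default integer type depends on the platform, so only its kind is inspected here.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints * 1.5                   # promoted to a floating-point dtype
as_float32 = ints.astype(np.float32)  # explicit conversion returns a new array
truncated = np.array([1.7, 2.2]).astype(np.int64)  # drops the fractional part

print(ints.dtype.kind)   # 'i' for signed integers
print(floats.dtype)
print(as_float32.dtype)
print(truncated)
```

Note that `astype()` never modifies the original array; `ints` keeps its integer `dtype`.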

## Helper Functions to Create Arrays

In [None]:
np.arange(7)

In [None]:
np.ones(10)

In [None]:
np.zeros(5)

In [None]:
np.zeros((2, 4))

In [None]:
np.empty(20)  # uninitialized memory: the values are arbitrary

In [None]:
np.eye(5)

In [None]:
np.linspace(1, 2, 11)

In [None]:
np.ones_like(b)

In [None]:
np.ones(10, dtype='i2')

### Random numbers

In [None]:
np.random.randint(1, 10, (2, 20))

In [None]:
np.random.random((3, 4))

In [None]:
np.random.uniform(0, 5, 10)
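
NumPy (version 1.17 and later) also offers a `Generator` interface created with `np.random.default_rng()`. The sketch below mirrors the three calls above with that interface; the seed `42` and the variable names are arbitrary choices for illustration, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

ints = rng.integers(1, 10, size=(2, 20))  # integers in [1, 10)
unit = rng.random((3, 4))                 # floats in [0, 1)
scaled = rng.uniform(0, 5, 10)            # floats between 0 and 5

print(ints.shape, unit.shape, scaled.shape)
```

Passing a seed makes the sequence repeatable, which helps when results need to be compared between runs.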

## Broadcasting

In [None]:
g = np.array([1, 2, 3, 4])
h = np.array([5, 6, 7, 8])
g * h  # if the shapes match, operations are usually done element-by-element

In [None]:
g * 23  # as we have already seen, the rule relaxes when the shapes meet certain constraints

### Broadcasting rules
- NumPy compares the shapes dimension by dimension, starting with the trailing (rightmost) dimension
- two dimensions are compatible if they are equal or one of them is __1__
- a `ValueError` ("operands could not be broadcast together") is raised if the shapes are incompatible
- the shape of a successfully broadcast result takes the maximum size along each dimension of the input arrays

### Operation on two arrays with different shapes
```
A      (4d array):  5 x 1 x 4 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  5 x 7 x 4 x 5
```
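
The shape table above can be checked directly. This is a minimal sketch using zero-filled arrays, since only the shapes matter; it also triggers the error for two incompatible shapes.

```python
import numpy as np

A = np.zeros((5, 1, 4, 1))
B = np.zeros((7, 1, 5))

result = A + B
print(result.shape)

# Shapes (4,) and (3,) are incompatible: 4 != 3 and neither is 1
try:
    np.zeros(4) + np.zeros(3)
except ValueError as err:
    print("ValueError:", err)
```

The missing leading dimension of `B` is treated as if it had size 1, which is why a 3d array can be combined with a 4d array.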

In [None]:
arr_1 = np.array([[1, 2, 3], [4, 5, 6]])
arr_2 = np.array([[1], [2]])

print('arr_1 shape:', arr_1.shape)
print('arr_2 shape:', arr_2.shape)

arr_3 = arr_1 + arr_2
print('arr_3 shape:', arr_3.shape)

arr_3

In [None]:
i = np.arange(20).reshape(4, 5)
i

In [None]:
describe(i)

In [None]:
i * np.array([[0], [1], [2], [4]])

In [None]:
j = np.array([0, 10, 20, 30])
k = np.array([7, 8, 9])

In [None]:
%shorterr j+k

In [None]:
j[:, np.newaxis]  # inserts a new axis, making it two dimensional

In [None]:
j[:, np.newaxis] + k

## Universal Functions (`ufunc`)

#### A `ufunc` is a "vectorized" wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.

NumPy provides a bunch of `ufunc`s:
- Math operations (`add()`, `subtract()`, `square()`, `log10()`, ...)
- Trigonometric functions (`sin()`, `cos()`, `tan()`, `deg2rad()`, ...)
- Bit-twiddling functions (`bitwise_and()`, `right_shift()`, ...)
- Comparison functions (`greater()`, `less_equal()`, `fmax()`, ...)
- Floating functions (`isnan()`, `isinf()`, `floor()`, ...)
    
They are all instances of `np.ufunc`

In [None]:
type(np.cos)  # every ufunc is an instance of np.ufunc
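
The arithmetic operators on arrays are backed by these functions. A small sketch (the arrays `x` and `y` are illustrative) showing that `+` and `np.add()` give the same result, and that a `ufunc` records its number of inputs in the `nin` attribute:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

same = np.array_equal(np.add(x, y), x + y)  # operator and ufunc agree
is_ufunc = isinstance(np.cos, np.ufunc)
n_inputs = (np.cos.nin, np.add.nin)         # unary vs. binary ufunc

print(same, is_ufunc, n_inputs)
```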

### Create your own `ufunc` with `np.frompyfunc(func, nin, nout)`

In [None]:
m = np.random.randint(0, 100, 17)
m

In [None]:
def step_23(x):
    return 1 if x > 23 else 0

In [None]:
%shorterr step_23(m)

In [None]:
ustep_23 = np.frompyfunc(step_23, 1, 1)

In [None]:
ustep_23(m)
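
One thing to be aware of: a `ufunc` created with `np.frompyfunc()` returns arrays of `dtype` `object` and still calls the Python function once per element. For a simple threshold like this one, a built-in vectorized expression is a possible alternative. The sketch below uses a small fixed array instead of random numbers so the result is predictable; `via_where` and `via_cast` are illustrative names.

```python
import numpy as np

def step_23(x):
    return 1 if x > 23 else 0

m = np.array([5, 23, 24, 99])
ustep_23 = np.frompyfunc(step_23, 1, 1)

via_ufunc = ustep_23(m)             # dtype is object
via_where = np.where(m > 23, 1, 0)  # integer dtype, no Python-level loop
via_cast = (m > 23).astype(int)     # same values from the boolean mask

print(via_ufunc, via_ufunc.dtype)
print(via_where, via_where.dtype)
```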

## Views and Copies

In [None]:
n = np.arange(10)
n

In [None]:
o = n         # `o` will point to `n`
o[2] = 99
n             # changing `o` has changed `n`

In [None]:
p = n[5]      # single element access returns a copy
p

In [None]:
p = 9999
o             # o is not affected when `p` is changed

### Slices return (memory) views

In [None]:
o = np.arange(10)
q = o[2:4]    # slices return (memory) views
q

In [None]:
q[1] = 99  # changing elements of `q` also changes `o`
o
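
When an independent array is needed, an explicit `copy()` avoids this coupling. The sketch below (with illustrative names) also shows that indexing with a list of integers returns a copy, and uses `np.shares_memory()` to check whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(10)
view = base[2:4]             # basic slice: a view on the same memory
snapshot = base[2:4].copy()  # explicit, independent copy
fancy = base[[2, 3]]         # advanced indexing returns a copy

view[0] = 99                 # writes through to `base`

print(base)
print(snapshot, fancy)       # both unchanged
print(np.shares_memory(base, view), np.shares_memory(base, snapshot))
```

A view also exposes the array it refers to through its `base` attribute, so `view.base is base` holds here.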

## Acknowledgements
![](images/eu_asterics.png)

This tutorial was supported by the H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).