# NumPy basics

## Import convention

In [1]:
import numpy as np
np.random.seed(657743)

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use("bmh")


## Creating arrays from Python sequences

The core data structure in NumPy is the `ndarray`:

In [3]:
?np.ndarray

Init signature: np.ndarray(self, /, *args, **kwargs)
Docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
        strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items.  An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using `array`, `zeros` or `empty` (refer
to the See Also section below).  The parameters given here refer to
a low-level method (`ndarray(...)`) for instantiating an array.

For more information, refer to the `numpy` module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see N

Creating a $2\times 2$ array of floating-point numbers (note the garbage in the resulting array: the memory is allocated but never initialized):

In [4]:
np.ndarray((2,2), dtype=float)

array([[-2.14771588e+052,  4.25328604e-201],
       [-7.48758875e-086, -1.38755278e-246]])

By default, `float` is a 64-bit floating-point number:

In [5]:
np.ndarray((2,2), dtype=float).dtype

dtype('float64')
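
The docstring above recommends `array`, `zeros` or `empty` over the low-level constructor. A minimal sketch of the difference between an uninitialized and an initialized allocation follows; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# np.empty allocates memory without initializing it, like np.ndarray(...):
# the contents are arbitrary and must be overwritten before use.
uninitialized = np.empty((2, 2), dtype=float)

# np.zeros allocates the array and fills it with zeros.
initialized = np.zeros((2, 2), dtype=float)

# Shape and dtype are well defined in both cases; only the values differ.
print(uninitialized.shape, uninitialized.dtype)
print(initialized)
```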

This works, but is not very convenient. A higher-level option is `np.array`:

In [6]:
?np.array

Docstring:
array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    __array__ method returns an array, or any (nested) sequence.
dtype : data-type, optional
    The desired data-type for the array.  If not given, then the type will
    be determined as the minimum type required to hold the objects in the
    sequence.
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy will
    only be made if __array__ returns a copy, if obj is a nested sequence,
    or if a copy is needed to satisfy any of the other requirements
    (`dtype`, `order`, etc.).
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be in C order (row major) unless 'F' is
    specified, in which case it will be in Fortr

In [None]:
arr = np.array([[7, 2, 3.], [3, 9, 6]])

In [None]:
type(arr)

Each array has a known `shape`, `size` and `ndim`:

In [None]:
arr

In [None]:
print("Array shape is", arr.shape)
print("Array size is", arr.size)
print(f"Array has {arr.ndim} dimensions")

And `dtype`, `itemsize` and `nbytes`:

In [None]:
print("Array dtype is", arr.dtype)
print(f"Each item takes {arr.itemsize} bytes")
print(f"Array takes {arr.nbytes} bytes")
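
These attributes are related to each other. As a small sketch (the name `example` is illustrative and simply repeats the values of `arr`), `size` is the product of the entries of `shape`, and `nbytes` is `size` times `itemsize`:

```python
import numpy as np

example = np.array([[7, 2, 3.], [3, 9, 6]])  # same values as `arr` above

# One float in the input makes the whole array float64 (8 bytes per item).
print(example.dtype)

# size is the product of the shape entries.
print(example.size == example.shape[0] * example.shape[1])

# nbytes is the element count times the bytes per element.
print(example.nbytes == example.size * example.itemsize)
```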

## Creating arrays of special shape and/or type

Array with specific `shape` and `dtype`, filled with `0`'s:

In [None]:
zeros_array = np.zeros((2,6), dtype=bool)
zeros_array

In [None]:
arr

Array of `0`'s with the same shape as `arr`, but of different `dtype`:

In [None]:
zeros_like_array = np.zeros_like(arr, dtype=np.complex128)
zeros_like_array

Array with specific `shape` and `dtype`, filled with `1`'s:

In [None]:
ones_array = np.ones((3,9), dtype=np.float32)
ones_array

Array of `1`'s with the same shape as `zeros_array`, and of different `dtype`:

In [None]:
zeros_array

In [None]:
ones_like_array = np.ones_like(zeros_array, dtype=np.float32)
ones_like_array

Range arrays are very common for indexing and as a drop-in replacement for the built-in `range`.

The simplest form is `np.arange(n)`: **start** at the default `0`, **increment** by the default `1`, **end** at `n` (exclusive):

In [None]:
range_array = np.arange(10)
range_array

Or you can specify both **starting** (inclusive) and **ending** points (exclusive):

In [None]:
range_array = np.arange(-5, 5)
range_array

Or all three:

In [None]:
range_array = np.arange(0, 5, 2)
range_array

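A negative increment (or *step*) works as usual, but beware of the order of the bounds: with a negative step the start must be larger than the end, otherwise the result is empty: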

In [None]:
range_array = np.arange(0, 10, -2)

In [None]:
range_array

In [None]:
range_array = np.arange(10, 0, -2)
range_array

`np.arange` is not limited to integers (hence, it is a generalization of `range`):

In [None]:
range_float_array = np.arange(-0.5, 5., 0.5)
range_float_array
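
With a floating-point step, the number of elements is computed from the bounds and the step, so rounding can matter for steps that are not exactly representable. A hedged sketch of an alternative, `np.linspace`, which takes the number of points explicitly (the names `by_step` and `by_count` are illustrative):

```python
import numpy as np

by_step = np.arange(-0.5, 5.0, 0.5)    # step given, end point 5.0 excluded
by_count = np.linspace(-0.5, 4.5, 11)  # count given, end point 4.5 included

print(by_step.size, by_count.size)
print(np.allclose(by_step, by_count))

# With a negative step the bounds must be in decreasing order.
print(np.arange(0, 10, -2).size)   # empty: start is below the end
print(np.arange(10, 0, -2))        # counts down from 10, stops before 0
```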

## Basic indexing of numpy arrays

Integer and slicing notations:

In [None]:
range_float_array

Get the element at index `1` (the second element, since indexing is zero-based):

In [None]:
range_float_array[1]

Get a slice (right index is not included):

In [None]:
range_float_array[1:2]

Get a slice with the start omitted (the first five elements):

In [None]:
range_float_array[:5]

Get a slice with negative indices:

In [None]:
range_float_array[-5:-2]

Indexing 2D arrays:

In [None]:
arr

In [None]:
arr[:1, 1:]

In [None]:
arr[0, 1:]

In [None]:
arr[0, ::2]

Generally, basic indexing works much like indexing ordinary Python lists, but extends to multiple dimensions.
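
One difference from lists is worth a short sketch: a slice keeps the dimension it is applied to, while an integer index removes it. The name `grid` is illustrative and repeats the values of `arr`.

```python
import numpy as np

grid = np.array([[7, 2, 3.], [3, 9, 6]])  # same values as `arr` above

kept = grid[:1, 1:]     # slices on both axes: both dimensions are kept
dropped = grid[0, 1:]   # integer on the first axis: that dimension is removed

print(kept.shape)     # (1, 2)
print(dropped.shape)  # (2,)
```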

## Boolean and fancy indexing

In [None]:
random_array = np.random.randn(10)
random_array

Most operations (arithmetic, logical, etc.) are vectorized for NumPy arrays, so we do not need loops at all. For example, `>` is a **vectorized** operation, so a comparison creates a boolean **mask**:

In [None]:
random_array>0

Boolean masks can be used for indexing (including logical operations on the masks themselves, as they are vectorized as well!):

In [None]:
random_array[random_array>0]

In [None]:
random_array[(random_array>0) | (random_array<-1)]

In [None]:
random_array>0

In [None]:
random_array<-1

In [None]:
random_array

In [None]:
random_array[(random_array>0) & (random_array<1)]
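
The parentheses in the expressions above are required, because in Python `&` and `|` bind more tightly than comparison operators. A minimal sketch with a small hand-written array (the names are illustrative):

```python
import numpy as np

values = np.array([-1.5, -0.5, 0.5, 1.5])

inside = (values > 0) & (values < 1)   # element-wise AND of two masks
outside = ~inside                      # element-wise NOT

print(inside.tolist())    # [False, False, True, False]
print(outside.tolist())   # [True, True, False, True]

# The operators are shorthand for the corresponding NumPy functions.
print(np.array_equal(inside, np.logical_and(values > 0, values < 1)))
```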

As an alternative to boolean masks, fancy indexing selects elements with arrays of integer indices:

In [None]:
np.where(random_array>0)

In [None]:
ix0, = np.where(random_array>0)

In [None]:
ix0

In [None]:
random_array[random_array>0]

In [None]:
random_array[ix0]
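
The two selections above are equivalent. A self-contained sketch that checks this, using a separately seeded generator (the seed `0` and all names here are illustrative, not taken from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)     # hypothetical seed, for reproducibility only
values = rng.standard_normal(10)

mask = values > 0                  # boolean mask, same shape as `values`
indices, = np.where(mask)          # integer positions where the mask is True

# Both forms select the same elements.
print(np.array_equal(values[mask], values[indices]))

# np.flatnonzero returns the same positions for a 1D mask.
print(np.array_equal(indices, np.flatnonzero(mask)))
```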

In [None]:
random_array = np.random.randn(3, 4)
random_array

You can use other iterables as indexers, but note the difference:

In [None]:
random_array[[0],[2]]

In [None]:
random_array[0, 2]
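
The difference is in the type and shape of the result. A small sketch with a deterministic stand-in array (`table` is a hypothetical replacement for `random_array`):

```python
import numpy as np

table = np.arange(12).reshape(3, 4)   # stand-in for `random_array`

picked = table[[0], [2]]   # index lists: a 1D array holding one element
scalar = table[0, 2]       # two integers: a single NumPy scalar

print(picked.shape)        # (1,)
print(np.ndim(scalar))     # 0
print(picked[0] == scalar)
```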

## View vs. copy

We'll cover this in more detail later, but we can already test one of the main sources of bugs in numerical code:

In [None]:
arr

Creating a **view** and a **copy**:

In [None]:
arr_view = arr[:, :]
arr_copy = arr.copy()

In [None]:
arr_view

In [None]:
arr_copy

Changing one or more elements in a view (note how assignment to a slice writes through to the original array):

In [None]:
arr_view[1:, 1:] = 41

In [None]:
arr_new = arr_view[1:, 1:]

In [None]:
arr_view

In [None]:
arr

Changing an element(-s) in a copy:

In [None]:
arr_copy[1:, 1:] = 32

In [None]:
arr_copy

In [None]:
arr

How to check whether an array is a view of another array:

In [None]:
arr_view.base is arr

In [None]:
arr_copy.base is arr
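
Besides `.base`, `np.shares_memory` can be used for the same check. The sketch below also illustrates, under illustrative names, that basic slicing returns a view while fancy indexing returns a copy:

```python
import numpy as np

source = np.arange(6).reshape(2, 3)

sliced = source[:, 1:]        # basic slicing: a view
picked = source[:, [1, 2]]    # fancy indexing: a copy with the same values

print(np.shares_memory(source, sliced))  # True
print(np.shares_memory(source, picked))  # False

sliced[0, 0] = 100            # writes through to source[0, 1]
picked[0, 1] = -1             # leaves source[0, 2] untouched
print(source[0, 1], source[0, 2])        # 100 2
```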

## Changing array shape

In [None]:
arr

We can simply reshape it:

In [None]:
arr.reshape((6,1))

In [None]:
arr.reshape((6,))

In [None]:
arr.reshape((2,1,3))

Or expand dimensions (adding dimensions of size `1`):

In [None]:
np.expand_dims(arr, axis=1)

Or transpose it:

In [None]:
arr

In [None]:
arr.T

In [None]:
arr

By default, transposing an array with three or more dimensions reverses the order of its axes:

In [None]:
np.expand_dims(arr, axis=-1).T.shape, np.expand_dims(arr, axis=-1).shape

In [None]:
np.expand_dims(arr, axis=(-1, -2)).T.shape, np.expand_dims(arr, axis=(-1, -2)).shape

Generic transpose of an array:

In [None]:
arr_t = np.transpose(np.expand_dims(arr, axis=-1), axes=(1,2,0))

In [None]:
arr_t.shape

Note that `arr_t` (like other arrays created with similar operations) is a **view** into the original array:

In [None]:
arr_t.base is arr
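
The shape bookkeeping of these operations can be checked in isolation. A minimal sketch on a zero-filled array of the same shape as `arr` (the names `base` and `expanded` are illustrative):

```python
import numpy as np

base = np.zeros((2, 3))
expanded = np.expand_dims(base, axis=-1)        # shape (2, 3, 1)

print(expanded.T.shape)                         # (1, 3, 2): axes reversed
print(np.transpose(expanded, (1, 2, 0)).shape)  # (3, 1, 2): explicit axis order
print(np.shares_memory(base, expanded.T))       # True: no data was copied
```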

## Changing array type

It is very simple in general:

In [None]:
arr.dtype

In [None]:
arr>2

In [None]:
(arr>2).astype(np.int8)

In [None]:
arr

There are some peculiarities, though:

In [None]:
arr[1,2] = -67

In [None]:
arr

Note how `-67` turns into `189` when cast to `uint8`: the value wraps around modulo `256` (`189 + 67 = 256`, the number of distinct `uint8` values; the largest representable value is `255`). See also [Integer numbers storage in computer memory](https://medium.com/@luischaparroc/integer-numbers-storage-in-computer-memory-47af4b59009):

In [None]:
arr.astype(np.uint8)

In [None]:
arr.astype(np.float32)

In [None]:
arr.astype(np.complex128)
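
The wraparound can be reproduced and guarded against. The sketch below uses an integer source array (an assumption made here so that the example does not rely on how out-of-range floating-point values are converted, which may depend on the platform and NumPy version) and shows two possible safeguards, `np.can_cast` and `np.clip`:

```python
import numpy as np

signed = np.array([-67, 3, 200], dtype=np.int16)

# Integer-to-integer casts wrap around modulo 256 for uint8.
wrapped = signed.astype(np.uint8)
print(wrapped.tolist())                   # [189, 3, 200]

# np.can_cast tells whether every value survives the cast ("safe" rule).
print(np.can_cast(np.int16, np.uint8))    # False
print(np.can_cast(np.uint8, np.int16))    # True

# Clipping to the target range first avoids the wraparound.
limits = np.iinfo(np.uint8)
clipped = np.clip(signed, limits.min, limits.max).astype(np.uint8)
print(clipped.tolist())                   # [0, 3, 200]
```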

## Stacking arrays

In [None]:
arr_1 = np.random.randint(10, size=(10,))
arr_2 = np.random.randint(10, size=(10,))

In [None]:
arr_1, arr_2

Stacking arrays vertically:

In [None]:
arr_1.reshape((2,5))

In [None]:
arr_2.reshape((2,5))

In [None]:
np.vstack([arr_1.reshape((2,5)), arr_2.reshape((2,5))])

Stacking arrays horizontally:

In [None]:
np.hstack([arr_1, arr_2])

Stacking along an additional dimension:

In [None]:
np.hstack([np.expand_dims(arr_1, 1), np.expand_dims(arr_2, 1)])

In [None]:
arr_1.T

Stacking `1D` arrays:

In [None]:
np.vstack([arr_1, arr_2])

In [None]:
np.vstack([arr_1, arr_2]).T

All of these cost about the same: transpose and expand operations only create views, so the only data copy is made by the stacking call itself. See also `np.dstack` and `np.column_stack`.
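
As a quick reference, the resulting shapes of the stacking functions for two 1D arrays of length 10 can be compared side by side. This is a sketch with illustrative inputs (`first`, `second`), and it adds `np.stack`, which is not used in the cells above:

```python
import numpy as np

first = np.arange(10)
second = np.arange(10, 20)

print(np.vstack([first, second]).shape)         # (2, 10): each array becomes a row
print(np.hstack([first, second]).shape)         # (20,): joined end to end
print(np.column_stack([first, second]).shape)   # (10, 2): each array becomes a column
print(np.stack([first, second], axis=1).shape)  # (10, 2): new axis at position 1
print(np.dstack([first, second]).shape)         # (1, 10, 2): stacked along a third axis
```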

## Universal functions

For a full list of universal functions, see [ufunc reference](https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html).

In [None]:
arr_1

In [None]:
arr_2

In [None]:
arr

Sum all elements:

In [None]:
arr.sum()

Sum elements along specific axis:

In [None]:
arr.sum(axis=1)

In [None]:
arr

Sum elements along a specific axis, but preserve the number of dimensions:

In [None]:
arr.sum(axis=1, keepdims=True)

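`mean` reduces along an axis in the same way (strictly speaking, `sum` and `mean` are array methods rather than `ufunc`s; `sum` is implemented through the `reduce` method of the `np.add` ufunc):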

In [None]:
arr.mean(axis=0)
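
A short sketch of how these reductions relate to each other and why `keepdims` is useful. The name `matrix` is illustrative and repeats the initial values of `arr`.

```python
import numpy as np

matrix = np.array([[7, 2, 3.], [3, 9, 6]])  # initial values of `arr`

print(matrix.sum(axis=1).tolist())               # [12.0, 18.0]
print(matrix.sum(axis=1, keepdims=True).shape)   # (2, 1)

# The sum method gives the same result as reducing with the np.add ufunc.
print(np.array_equal(matrix.sum(axis=0), np.add.reduce(matrix, axis=0)))

# keepdims lets the result broadcast against the original array,
# e.g. to normalize each row so that it sums to 1.
normalized = matrix / matrix.sum(axis=1, keepdims=True)
print(np.allclose(normalized.sum(axis=1), 1.0))
```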

In [None]:
arr_1, arr_2

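Using masking with the `where` argument (without `out`, the entries where the mask is `False` are left uninitialized, so pass `out` to control them):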

In [None]:
arr_1 + arr_2

In [None]:
np.add(arr_1, arr_2, where=(arr_2<6))

In [None]:
np.add(arr_1, arr_2, where=(arr_2<6), out=np.zeros_like(arr_1))

In [None]:
arr_1

In [None]:
arr_2

In-place operations are straightforward: pass one of the inputs as `out`:

In [None]:
np.add(arr_1, arr_2, where=(arr_2<6), out=arr_2)

In [None]:
arr_2
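
The behaviour of `where` together with `out` can be verified on small hand-written arrays. In this sketch (all names are illustrative), positions where the mask is `False` keep whatever value `out` already holds:

```python
import numpy as np

left = np.array([1, 2, 3, 4])
right = np.array([10, 5, 7, 2])
mask = right < 6                   # [False, True, False, True]

# With an explicit `out`, unselected positions keep the value already in `out`.
result = np.add(left, right, where=mask, out=np.zeros_like(left))
print(result.tolist())             # [0, 7, 0, 6]

# Passing an input as `out` updates it in place; unselected entries are unchanged.
np.add(left, right, where=mask, out=right)
print(right.tolist())              # [10, 7, 7, 6]
```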