# `numpy`

   - https://docs.scipy.org/doc/numpy/reference/
   - http://scipy-lectures.org/intro/index.html
   - https://www.labri.fr/perso/nrougier/from-python-to-numpy/

#### this course is inspired by:
   - Arnaud Legout, Inria
   - Thierry Parmentelat, Inria
   - the numpy documentation
   - stackoverflow
   - ...

#### a quick history of `numpy`
   - Numpy 1.0 in 2006
   - Reused old projects Numeric and Numarray
   - No single coordinator, no single aim
   - Guido refused to add Numeric to the Python
      standard library
      +  Guido deemed the code not maintainable
   - Travis Oliphant merged Numeric and Numarray to create numpy

Many $\texttt{numpy}$ operations are implemented in C, which makes them much faster than equivalent pure-Python code.

# I) importing the $\texttt{numpy}$ `library` 

   - `by convention`, $\texttt{numpy}$ is imported under the `short` name `np`

In [None]:
import numpy as np

# II)  array in `numpy`  $\texttt{numpy.ndarray}$

## 1) internal memory layout: arrays are homogeneous

$\texttt{numpy.ndarray}$
   - it is the `core datatype` of $\texttt{numpy}$

a $\texttt{numpy.ndarray}$:
   - it is a `contiguous one-dimensional segment of computer memory`
   - it is `combined` with `indexing schemes` to handle `multiple dimensions`
   - each `indexing scheme` is a `view` on the `underlying segment`

#### elements
  
   - `all` the elements have the `same data type` (`unlike` python's containers)
   - elements are `directly` accessed

#### why ? 
   - for the sake of `speed`
   - it is `much faster` to access memory along a `single segment` (each `cell` has the same size)
   - than to access `small portions` of memory (the cells) scattered `everywhere` in the memory of the computer (cf. NTFS vs ext4 fragmentation)
   - (which is the case with python lists, whose elements are separate objects referenced by pointers!)
   - but python lists have other `advantages`

#### in consequence
   - a $\texttt{numpy.ndarray}$ is very `compact`
   - the methods we can apply are `optimized`

#### you can create $\texttt{numpy.ndarray}$
   - from `existing` $\texttt{python}$ sequences (e.g. lists or tuples)
   - by `constructing` directly $\texttt{numpy.ndarray}$ `objects`
   - by `reading files` with data such as `csv` files

## 2) creating $\texttt{numpy.ndarray}$ from existing $\texttt{python}$ sequences

In [None]:
tab = [1, 2, 3, 4, 5, 6, 7]
np.array(tab)        # from a list

# or

np.array([1, 2, 3, 4, 5, 6, 7])  # from a list
np.array((1, 2, 3, 4, 5, 6, 7))  # from a tuple

In [None]:
type(np.array ([1, 2, 3, 4, 5, 6, 7])) # we create a numpy.ndarray

   - you can `create` a $\texttt{numpy.ndarray}$ from a `python` generator with $\texttt{numpy.fromiter}$

In [None]:
p = (i for i in range(20) if i%2 == 0) # p is a generator of the even numbers from 0 to 18

In [None]:
p

In [None]:
a = np.fromiter(p, dtype = np.int8)
a
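Why $\texttt{numpy.fromiter}$ rather than $\texttt{numpy.array}$? A minimal sketch (the variable names are ours): $\texttt{numpy.array}$ does not iterate over a generator, it wraps the generator object itself in a $0$-dimensional array of dtype `object`. Note also that a generator can be consumed only once.

```python
import numpy as np

gen = (i for i in range(20) if i % 2 == 0)
wrapped = np.array(gen)               # the generator is NOT iterated
print(wrapped.ndim, wrapped.dtype)    # a 0-d array holding the generator object

gen = (i for i in range(20) if i % 2 == 0)
evens = np.fromiter(gen, dtype=np.int8)   # consumes the generator
again = np.fromiter(gen, dtype=np.int8)   # the generator is now exhausted
print(evens, again.size)
```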

## 3) deducing the type of the elements

   - `without` any indication $\texttt{numpy}$ `decides` on `its own`

### a) initialising with a homogeneous list

#### an `integer` array

In [None]:
np.array([0, 1, 2, 3, 4, 5, 6, 7]).dtype
# elements are all integers
# the array will be an integer-typed array

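   - here we `get` integers on $64$ bits (on most platforms; the default integer type can be $32$ bits, e.g. on Windows with $\texttt{numpy} < 2.0$)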

#### a `float` array

In [None]:
np.array([0.72, -1.45, 2.29]).dtype
# elements are all floats
# the array will be a floating-point typed array

#### a `Boolean` array

In [None]:
np.array([True, False]).dtype
# elements are all booleans (scalar type)
# the array will be a boolean typed array

#### a `character string` array

In [None]:
np.array(['héllo', 'world!']).dtype

   - all the elements must have the same `type`
   - the `longest` string has $6$ characters
   - `all` the elements will be `strings of (at most) $6$ characters`

#### the type is '<U6'
   - `U` for Unicode
   - $6$ is the maximum number of `characters` of a string (here the length of 'world!'); each character is stored on $4$ bytes
   - (`<` is for little endian, i.e. the order in which the bytes of each character are stored in memory)
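A small check of the explanation above (variable names are ours): the `itemsize` of a `'<U6'` array is $6 \times 4 = 24$ bytes, and a string longer than $6$ characters is `silently truncated` when it is assigned to an element.

```python
import numpy as np

s = np.array(['héllo', 'world!'])
print(s.dtype, s.itemsize)   # 6 characters x 4 bytes per character = 24 bytes

# a longer string is silently truncated to the 6 available characters
s[0] = 'hello, numpy'
print(s[0])
```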

## exercise$^{(*)}$


(\*) *the exercises must be completed in the students' free time*

### b) initialising with a heterogeneous list

   - for example, we mix `integers` and `floats`

In [None]:
[0, 1., 2, 3, 4, 5, 6, 7.]

   - containers in python are `heterogeneous`
   - `no` conversion is done

   - but in $\texttt{numpy}$ the `array type` is homogeneous
   - `conversions` are `performed`

In [None]:
np.array([0, 1., 2, 3, 4, 5, 6, 7.]).dtype

   - the integers $0$, $2$, ..., $6$ are `converted` to `floats`

#### the data type
   - is `deduced` as the *`smallest`* `type` that can `hold` all the elements
   - $\texttt{numpy}$ tries `not` to `lose` information
   - the elements are converted `automatically` and `silently`
   - it is `very different` from the $\texttt{Python}$ `philosophy`
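The "smallest type that can hold all the elements" can be queried directly with $\texttt{numpy.promote\_types}$, which returns the type numpy chooses for two input types. A short sketch; note the last case: no integer type holds both the int64 and uint64 ranges, so numpy falls back to float64, which can itself lose precision for very large integers.

```python
import numpy as np

print(np.promote_types(np.int8, np.uint8))      # int16: holds -128..255
print(np.promote_types(np.int64, np.float64))   # float64
print(np.promote_types(np.bool_, np.int8))      # int8
print(np.promote_types(np.int64, np.uint64))    # float64: no integer type holds both ranges
```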

   - `False` is converted to the `integer` $0$

In [None]:
np.array([False, 1, 2, 3, 4, 5, 6, 7]).dtype

   - here `False` is converted to the `float` $0.$

In [None]:
np.array([False, 1., 2, 3, 4, 5, 6, 7]).dtype

   - mixing `numbers` and `strings`

In [None]:
np.array([0, 1, 2, 3, 4, 5, 6, '7'])

   - mixing `strings` and `numbers`

In [None]:
np.array(['0', 1, 2, 3, 4, 5, 6, 7])

in both cases, the numbers are silently converted to `strings`: you obtain an array of strings, not an array of numbers!
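A quick way to check what happened (variable names are ours): the dtype `kind` is `'U'` (Unicode strings), each element is a string, and an explicit $\texttt{astype}$ is needed to get numbers back.

```python
import numpy as np

mixed = np.array([0, 1, 2, 3, 4, 5, 6, '7'])
print(mixed.dtype.kind)             # 'U': an array of Unicode strings
print(isinstance(mixed[0], str))    # True: '0' is a string, not the integer 0

# to get numbers back, convert explicitly
numbers = mixed.astype(np.int64)
print(numbers.sum())                # 28
```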

### c) modifying an element in a $\texttt{numpy.ndarray}$ can result in a loss of precision

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
a.dtype

In [None]:
a[0] = 3.14159
a[0]

   - you will `not` get a `float`
   - it will be `converted` to an `integer`
   - you have lost `precision`

   - you can `forbid` the modification of the `elements` of an array by setting $\texttt{numpy.ndarray.flags.writeable}$ to $\texttt{False}$
   - i.e. the array becomes `read-only` (an array with constant elements)

In [None]:
a = np.random.randint(1, 20, 10)

In [None]:
a[0] = 99

In [None]:
a.flags.writeable = False

In [None]:
try:
    a[0] = 100
except ValueError as e:
    print(e)

## 4) specifying the type of the elements using $\texttt{dtype}$

$\texttt{numpy}$ types are listed here: https://docs.scipy.org/doc/numpy/user/basics.types.html

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
a.dtype

###### computer memory
   - a `chunk` of memory of $8$ `bits` forms a `byte` (or `octet`)
   - the size of a `type` is often expressed as a multiple of one `byte`
   - $4$ bytes is $32$ bits, $8$ bytes is $64$ bits

   - you can `access` the number of `bytes` an `element` is `stored on`
   - one `byte` is $8$ bits ($64 = 8 \times 8$)

In [None]:
a.itemsize # 8 bytes (of 8 bits)

#### how many distinct integers can be represented on $8$ bits?
   - $2^{8} = 256$
   - from $0$ to $255$ if `unsigned` (from $0$ to $2^n-1$ on $n$ bits)
   - from $-128$ to $127$ if `signed` (from $-2^{n-1}$ to $2^{n-1} -1$ on $n$ bits)
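These bounds can be read from $\texttt{numpy.iinfo}$ (used again further below); a short sketch for the two $8$-bit integer types:

```python
import numpy as np

for t in (np.uint8, np.int8):
    info = np.iinfo(t)
    print(t.__name__, info.bits, info.min, info.max)   # uint8 8 0 255 / int8 8 -128 127
```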

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.int8)
a

In [None]:
a.itemsize # one byte

   - the size in `bytes` of the `array`

In [None]:
a.nbytes # the number of bytes

   - here we `only` computed the size of the data `buffer underlying` the $\texttt{ndarray}$
   - the $\texttt{ndarray}$ also has some memory `overhead` to store its other attributes (metadata)
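A rough way to see this overhead, assuming an array that `owns` its data (for such an array, $\texttt{sys.getsizeof}$ counts the buffer plus the object itself; for a view it does not count the shared buffer). The exact overhead depends on the platform and the numpy version.

```python
import sys
import numpy as np

a = np.zeros(1000, dtype=np.int8)
print(a.nbytes)            # 1000: the data buffer only
print(sys.getsizeof(a))    # buffer + ndarray object overhead (value depends on the platform)
```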

#### conversion from 64-bit integers to 32-bit or 16-bit integers
   - is not `safe`
   - i.e. values can be `truncated`

#### you must know what you are doing
   - because $\texttt{numpy}$ will obey your wishes

#### integers can be truncated

In [None]:
nmax = 2**64 - 1  # greatest unsigned 64-bit integer

In [None]:
np.array([nmax])

In [None]:
np.array([nmax], np.uint32)  # older numpy: nmax wraps modulo 2**32 (here 2**32-1); numpy >= 2.0 raises an OverflowError

In [None]:
2**32-1
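Converting an existing integer array with $\texttt{astype}$ to a narrower type does not saturate either: on common platforms, values outside the target range silently wrap around modulo $2^{bits}$ (two's complement). A small sketch with values of our own choosing:

```python
import numpy as np

big = np.array([300, 2**31], dtype=np.int64)
print(big.astype(np.int32))     # 2**31 does not fit in int32: it wraps to -2**31
print(big[:1].astype(np.int8))  # 300 - 256 = 44
```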

#### floats will be truncated too

In [None]:
np.array([1.22, 2.34, 3.57, 4.99], dtype=np.int32)  # you will not obtain floats !

your floats have been `truncated` (toward zero) without hesitation!
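If rounding is what you want, round `before` converting. A sketch comparing truncation, $\texttt{numpy.round}$ (halves go to the even neighbour) and $\texttt{numpy.floor}$ on a few values of our own:

```python
import numpy as np

x = np.array([-1.7, -0.5, 0.5, 1.7])
print(x.astype(np.int32))             # truncation toward zero: [-1  0  0  1]
print(np.round(x).astype(np.int32))   # round first: [-2  0  0  2]
print(np.floor(x).astype(np.int32))   # floor: [-2 -1  0  1]
```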

#### character strings

In [None]:
np.array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype='<U4') # your strings can hold up to 4 characters rather than 1

In [None]:
np.array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype='int16') # you will obtain integers on 16 bits

In [None]:
np.array(['0', '1', True], dtype=np.int16) # you will obtain integers on 16 bits

#### if you ask for nonsense, an error is raised

In [None]:
np.array(['0', '1', 'True'], dtype=np.int16) # 'True' cannot be converted to an integer: ValueError

In [None]:
# in order to avoid interrupting the execution: we can catch the error 
try:
    np.array(['0', '1', 'True'], dtype=np.int16)
except ValueError as e:
    print("we are here")
    print(e)    

### conversion from one type to another with the method $\texttt{numpy.ndarray.astype}$

In [None]:
d = np.array([1, 2, 3, 4, 5])
d.dtype

In [None]:
d.astype(np.int16)  # cast to another type

In [None]:
d.dtype # the original array has not changed

#### the method returns a new array
   - it does `not modify` the `original` one
   - you have to `assign` the new array to some variable

In [None]:
d = np.array([1, 2, 3, 4, 5])
d = d.astype(np.int32)

#### the conversion is not `safe`: values can be truncated

In [None]:
np.can_cast(np.int32, np.int64)  # ok to convert integers from 32-bits to 64-bits 

In [None]:
np.can_cast(np.int64, np.int32)  # it is not safe to convert from 64-bits to 32-bits

#### you can ask the method to `refuse` an `unsafe` conversion
   - the decision depends on the `types` only, not on the actual values

#### it is `not safe` to cast from a 64-bit integer to a 32-bit integer

In [None]:
try:
    d = np.array([1, 2, 3, 4, 5], dtype='int64')
    d.astype(np.int32, casting='safe')
except TypeError as e:
    print(e)

#### it is `safe` to cast from an unsigned 16-bit integer to a 32-bit integer

In [None]:
d = np.array([1, 2, 3, 4, 5], dtype='uint16')
d.astype(np.int32, casting='safe')

### predefined types for $\texttt{numpy.ndarray}$

https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html (note: $\texttt{numpy.sctypes}$ used below was removed in numpy 2.0)

In [None]:
np.sctypes # u for unsigned

### `min` and `max` values of `numpy` `types`

   - integer info

In [None]:
np.iinfo(np.int8).min

In [None]:
np.iinfo(np.int64).max

   - float info

In [None]:
np.finfo(np.float32).min

In [None]:
np.finfo(np.float64).max

### size of arrays

| attribute                         | what it gives                                        |
|-----------------------------------|------------------------------------------------------|
| $\texttt{numpy.ndarray.size}$     | total number of elements in the array                |
| $\texttt{numpy.ndarray.itemsize}$ | size in bytes of a single element                    |
| $\texttt{numpy.ndarray.nbytes}$   | total size in bytes of the underlying data           |
| $\texttt{numpy.ndarray.shape}$    | `shape` of the array (number of elements along each axis) |
| $\texttt{numpy.ndarray.ndim}$     | `number of dimensions` of the array                  |

In [None]:
d = np.array([1, 2, 3, 10, 20, 30])

In [None]:
print(f'array\n {d}\n') # formatting strings using f'...' and {}

In [None]:
print(f'd.size is {d.size} (number of elements)\n')

In [None]:
print(f'd.itemsize is {d.itemsize} (number of bytes of a single element)\n')

In [None]:
print(f'd.nbytes is {d.nbytes} (number of bytes of the elements in the array)\n')

In [None]:
print(f'd.shape is {d.shape} (shape of the array)\n')

In [None]:
print(f'd.ndim is {d.ndim} (number of dimensions of the array)\n')
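The same attributes on a two-dimensional array make the difference between `shape`, `ndim` and `size` more visible; a sketch with a $3 \times 4$ array of float64 chosen for illustration:

```python
import numpy as np

m = np.zeros((3, 4), dtype=np.float64)
print(m.shape)     # (3, 4)
print(m.ndim)      # 2
print(m.size)      # 12 = 3 * 4
print(m.itemsize)  # 8 bytes per float64
print(m.nbytes)    # 96 = 12 * 8
```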

## 5) other functions to create $\texttt{numpy.ndarray}$

| function                  | what it does                                               |
|---------------------------|------------------------------------------------------------|
| $\texttt{numpy.array}$    | create an array from existing data                         |
| $\texttt{numpy.empty}$    | return a new array without initializing its elements       |
| $\texttt{numpy.zeros}$    | return an array filled with $0.$ (float by default)        |
| $\texttt{numpy.ones}$     | return an array filled with $1.$ (float by default)        |
| $\texttt{numpy.linspace}$ | evenly spaced numbers over an interval, given their number |
| $\texttt{numpy.arange}$   | evenly spaced values within an interval, given the step    |
| $\texttt{numpy.random.*}$ | random sampling                                            |
| $\texttt{numpy.logspace}$ | numbers spaced evenly on a log scale                       |

#### a) creating `integer` ranges

In [None]:
np.arange(0, 10, 2) # from, to-excluded, step
                    # like the python range
    
np.arange(start=0, stop=10, step=2)

   - `stop` is `excluded` (here $10$)

#### b) creating `float` ranges with a `step`


In [None]:
np.arange(0., 3., 0.3) # from, to-excluded, step

   - `stop` is `excluded` (here $3$)

#### c) creating `float` ranges with a `number` of values

In [None]:
np.linspace(0, 1, 5) # from, to-included, number

   - `stop` is `included` (here $1$)
   - (it was `excluded` in $\texttt{numpy.arange}$)
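With a floating-point step, the length of the result of $\texttt{numpy.arange}$ is $\lceil (stop - start)/step \rceil$ computed in floating point, so rounding errors can add an unexpected element. When the number of values matters, $\texttt{numpy.linspace}$ is usually the safer choice. A sketch on a classic example:

```python
import numpy as np

r = np.arange(1, 1.3, 0.1)      # 1.3 should be excluded...
print(len(r), r)                 # ...but rounding errors give 4 elements, the last one ~1.3
l = np.linspace(1, 1.2, 3)       # exactly 3 values, 1.2 included
print(len(l), l)
```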

   - very useful to build the `input` array of a `function` (e.g. to plot it)

In [None]:
x = np.linspace(0, 6*np.pi, 80)
y = np.sin(x)

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
plt.figure(figsize=(3, 1))
plt.plot(x, y)

### exercise
   - create an array of shape $n \times 2$
   - initialized with `random numbers` ($\in [0, 1[$)
   - the first column is the $x$ coordinate, the second the $y$ coordinate
   - compute the distance of all the points to the origin

### correction

In [None]:
n = 5
pos = np.random.random((n, 2))  # n points, one (x, y) per row
dist = np.sqrt(np.power(pos[:, 0], 2) + np.power(pos[:, 1], 2))
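The same distances can be obtained with dedicated functions; a sketch comparing the correction with $\texttt{numpy.linalg.norm}$ along $\texttt{axis=1}$ (the norm of each row) and the $\texttt{numpy.hypot}$ ufunc:

```python
import numpy as np

n = 5
pos = np.random.random((n, 2))                   # n points, columns are x and y
dist = np.sqrt(pos[:, 0]**2 + pos[:, 1]**2)      # distance to the origin
dist_norm = np.linalg.norm(pos, axis=1)          # Euclidean norm of each row
dist_hypot = np.hypot(pos[:, 0], pos[:, 1])      # dedicated ufunc
print(np.allclose(dist, dist_norm), np.allclose(dist, dist_hypot))
```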

### creation of `multi-dimensional` arrays

   - the underlying array of a $\texttt{numpy.ndarray}$
   - is stored in a `contiguous one-dimensional segment of computer memory`
   
   
   - for now we have only created `flat` arrays like `vectors`
   
   
   - `but` with $\texttt{numpy}$ you can create `multi-dimensional` arrays
   - the underlying memory space is `still` a one-dimensional segment
   - `but` it is viewed as a `multi-dimensional array`  

In [None]:
#np.array?

In [None]:
np.array([[1, 2, 3], [4, 5, 6]], np.int32)
   # a (2 x 3) array of 32-bit integers

In [None]:
np.empty(shape=(3, 2))
   # an array of shape (3 rows x 2 columns)
   # its elements are not initialized
   # they contain whatever values were in memory

In [None]:
np.ones(shape=(3, 4), dtype=np.int16)
   # a (3 x 4) matrix of 16-bit integers
   # initialized to 1

In [None]:
np.zeros(shape=(2, 3, 4, 5))
   # 2 times 3 arrays of shape (4 x 5)
   # initialised with 0.

#### note
   - an array of `shape = (n, )` is a `one-dimensional` array: it is neither a row matrix nor a column matrix

In [None]:
np.array([1, 2, 3]) # a (3,) vector

In [None]:
np.array([1, 2, 3]).shape

In [None]:
np.array( [[1, 2, 3]] ).shape # a (1, 3) two-dimensional array (one row)
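To turn a $(n,)$ vector into an explicit row or column matrix, you can index with $\texttt{numpy.newaxis}$ or reshape with $-1$ (numpy infers that dimension). A short sketch:

```python
import numpy as np

v = np.array([1, 2, 3])
row = v[np.newaxis, :]     # shape (1, 3): a row matrix
col = v[:, np.newaxis]     # shape (3, 1): a column matrix
print(v.shape, row.shape, col.shape)
print(v.reshape(1, -1).shape, v.reshape(-1, 1).shape)  # same result with reshape
```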

#### example with $\texttt{numpy.random.randn}$

In [None]:
np.random.randn(3, 2)
   # a (3 x 2) array of random samples
   # "standard normal" distribution N(mu=0, sigma=1)

to generate samples from a normal distribution $\mathcal{N}(\mu, \sigma^2)$

In [None]:
sigma = 0.17
mu = 2.5

In [None]:
sigma * np.random.randn(3, 2) + mu
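Recent numpy versions also provide a random Generator ($\texttt{numpy.random.default\_rng}$) whose $\texttt{normal}$ method takes the mean, the standard deviation and the shape directly. A sketch; the seed $42$ is an arbitrary choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded for reproducible results
sigma, mu = 0.17, 2.5
samples = rng.normal(loc=mu, scale=sigma, size=(3, 2))
print(samples)

big = rng.normal(loc=mu, scale=sigma, size=100_000)
print(big.mean(), big.std())           # close to mu and sigma
```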

###  we can `reshape` $\texttt{numpy.ndarray}$ with the method $\texttt{ndarray.reshape}$

   - it changes the `shape` of an array (even a multi-dimensional one)
   - by creating a `new view` with a `new indexing` (when possible; otherwise a copy is returned)
   - it does not change the `original` array
   
   
   - the `two` arrays `share` the data

   - the new `shape` must `match` the number of elements of the `original array`

In [None]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print(a)

In [None]:
a.shape

   - `taking` a new `view` on the array $\texttt{a}$

In [None]:
b = a.reshape(2, 2, 3)  # b is a new view of a
print(b)
print('\nthe shape of b is ', b.shape)

   - the two arrays $\texttt{a}$ and $\texttt{b}$ `share` the same `underlying` array

   - access to elements depends on the `geometry` of the views
   - a is a `flat` array
   - b is a $2 \times 2 \times 3$ shaped-array

In [None]:
a[0] = 99 # we change a

In [None]:
b[0][0][0] # b sees the change

   - `several` arrays can share the same data

   - $\texttt{c}$ is a new view of $\texttt{b}$ which is a new view of $\texttt{a}$

In [None]:
c = b.reshape(2, 6) 

In [None]:
c.shape

In [None]:
a[0] = 300

In [None]:
c
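Two related tools, sketched below: a $-1$ in $\texttt{reshape}$ lets numpy infer one dimension, and $\texttt{numpy.shares\_memory}$ checks that two arrays share the same data. Asking for a shape with a different number of elements raises a $\texttt{ValueError}$.

```python
import numpy as np

a = np.arange(1, 13)
b = a.reshape(2, 2, 3)
c = b.reshape(2, -1)            # -1: numpy infers this dimension (here 6)
print(c.shape)
print(np.shares_memory(a, c))   # True: c is a view on the data of a

try:
    a.reshape(5, 2)             # 10 != 12 elements
    reshape_failed = False
except ValueError as e:
    reshape_failed = True
    print(e)
```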

### we can `flatten` an array

   - $\texttt{numpy.ravel}$ returns a `view` on the `original data` (when possible)
   - $\texttt{numpy.ndarray.flatten}$ always returns a `copy` of the `original data`
   - $\texttt{numpy.ndarray.flat}$ returns a flat `iterator` over the array

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])

In [None]:
a_ravel = a.ravel()
a_flatten = a.flatten()

   - if we modify `a_ravel`, `a` will be modified
   - if we modify `a_flatten`, `a` will `not` be modified

In [None]:
a_ravel[0] = 99 ; a[0][0]

In [None]:
a_flatten[2] = 77 ; a[1][0]

   - use of the `flat iterator`
   - for example access the $n^{th}$ element of a $p \times q$ array

In [None]:
a = np.random.randint(0, 20, 12).reshape(4, 3)
a

   - we get the index of the minimum with $\texttt{numpy.ndarray.argmin}$
   - we obtain an index into the `flattened` array
   - we use the `flat iterator` to get the element

In [None]:
a.flat[  a.argmin()  ]
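To go back from a flat index to the (row, column) position, $\texttt{numpy.unravel\_index}$ converts it using the shape of the array; a short sketch:

```python
import numpy as np

a = np.random.randint(0, 20, 12).reshape(4, 3)
flat_index = a.argmin()                           # index in the flattened array
row, col = np.unravel_index(flat_index, a.shape)  # back to (row, column)
print(flat_index, (row, col))
print(a.flat[flat_index], a[row, col], a.min())   # three times the same value
```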

###  we can `resize` $\texttt{numpy.ndarray}$ with the method $\texttt{ndarray.resize}$

   - it changes the shape of an array `in-place`
   - i.e. the `original array` itself is modified (and $\texttt{None}$ is returned)

   - here we keep the same number of elements as for the reshaping
   - (unlike $\texttt{reshape}$, $\texttt{ndarray.resize}$ can also change the number of elements: new elements are filled with zeros)

In [None]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print(a)

In [None]:
a.resize(2, 2, 3)  # indexes of a are modified
print(a)

In [None]:
a.shape

In [None]:
a.resize(2, 6) # indexes of a are modified
print(a)

In [None]:
a.shape

###  we can `tile` and `repeat` $\texttt{numpy.ndarray}$

   - you can `repeat` `elements` of an array `several times`

In [None]:
a = np.array([[1, 2], [3, 4]])
a

In [None]:
np.repeat(a, 3) # repeating elements 3 times in the flattened array

In [None]:
np.repeat(a, 3, axis=0) # repeating rows 3 times

In [None]:
np.repeat(a, (2, 3), axis=0) # repeating the first row 2 times
                             #       and the second row 3 times

In [None]:
np.repeat(a, (4, 3), axis=1) # repeating the first column 4 times
                             #       and the second column 3 times

   - $\texttt{numpy.tile}$ `repeats` the `whole` array along the `axes`

In [None]:
np.tile(a, 3)

In [None]:
np.tile(a, (2, 3))

###  we can `concatenate` $\texttt{numpy.ndarray}$ with the function $\texttt{numpy.concatenate}$

   - you concatenate along a dimension i.e. along an `axis`
   - for example for a 2-dimensional array:
      - $0$ is for `rows`
      - $1$ is for `columns`

   - $\texttt{numpy.concatenate}$ `returns` a `new` object of type $\texttt{numpy.ndarray}$

#### we can concatenate the rows i.e. `axis` $0$
   - rows are `stacked` on top of each other
   - (axis $0$ is the `default`)

In [None]:
a = np.array([[1, 2, 3, 4, 5, 6]])
b = np.array([[10, 20, 30, 40, 50, 60]])

In [None]:
a, b

In [None]:
a.shape, b.shape

In [None]:
np.concatenate((a, b), axis=0)

In [None]:
np.concatenate((a, b))

In [None]:
a.reshape(2, 3)

In [None]:
b.reshape(2, 3)

In [None]:
np.concatenate((a.reshape(2, 3), b.reshape(2, 3)), axis=0)

#### we can concatenate the columns i.e. `axis` $1$
   - the arrays are placed `side by side`
   - each `row` is `extended` with the `columns` of the other array

In [None]:
a, b  # one row, 6 columns

In [None]:
np.concatenate((a, b), axis=1) # 1 row, 12 columns

In [None]:
a.reshape(2, 3) # 2 rows, 3 columns

In [None]:
b.reshape(2, 3) # 2 rows, 3 columns

In [None]:
np.concatenate((a.reshape(2, 3), b.reshape(2, 3)), axis=1) # 2 rows, 6 columns

##### we can `stack`  arrays along an `axis`

$\texttt{numpy.stack}$
   - you `join` a sequence of `arrays` along an `axis`
   - $\texttt{axis = 0}$ the arrays are `rows`
   - $\texttt{axis = 1}$ the arrays are `columns`

In [None]:
a = [0, 2, 3], [5, 6, 7], [8, 9, 10]  # a tuple of three lists

In [None]:
np.stack(a, axis = 0)

In [None]:
np.stack(a, axis = 1)

   - see also $\texttt{np.dstack}$, $\texttt{np.hstack}$, $\texttt{np.vstack}$, $\texttt{numpy.column_stack}$
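The difference between `concatenating` and `stacking` is easiest to see on the shapes: $\texttt{numpy.concatenate}$ joins along an `existing` axis, while $\texttt{numpy.stack}$ creates a `new` one. A sketch with two small vectors of our own, including the helpers listed above:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

print(np.concatenate((x, y)).shape)   # (6,)   joins along the existing axis 0
print(np.stack((x, y)).shape)         # (2, 3) creates a new axis 0
print(np.stack((x, y), axis=1).shape) # (3, 2) creates a new axis 1
print(np.vstack((x, y)).shape)        # (2, 3)
print(np.hstack((x, y)).shape)        # (6,)
print(np.column_stack((x, y)).shape)  # (3, 2)
print(np.dstack((x, y)).shape)        # (1, 3, 2)
```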

### `concatenation` in higher dimensions [complement]

#### we consider two $4$-dimensional arrays
   - each array is of shape $(2, 3, 4, 5)$ i.e. $2$ times $3$ matrices of size $4 \times 5$
   - the first array contains `even` values
   - the second contains `odd` values

we use $\texttt{numpy.arange(f, t, s)}$ to generate the values of the arrays:
   - a `range` of values `starting` from $\texttt{f}$ `going to` $\texttt{t}$ (excluded)
   - with a step $\texttt{s}$  

In [None]:
a = np.array(np.arange(0, 240, 2)) # even numbers from 0 to 238
a.resize(2, 3, 4, 5)               # 2 times 3 matrix of 4 lines and 5 columns

In [None]:
a.shape

In [None]:
a[1] # the second group of three matrices

In [None]:
a[0][0][0][0]  # first value 

In [None]:
a[1][2][3][4]  # last value 

same with odd numbers

In [None]:
b = np.array(np.arange(1, 241, 2)) # odd numbers from 1 to 239
b.resize(2, 3, 4, 5)
b[0]

#### `concatenation` along axis $0$
   - the $2$ groups of $3$ matrices ($4 \times 5$) of $\texttt{b}$ are placed after the $2$ groups of $\texttt{a}$
   - i.e. we obtain $4$ groups of $3$ matrices of size $4 \times 5$

In [None]:
c = np.concatenate((a, b), axis=0)
c.shape

In [None]:
c

#### we can see that:
   - $c[0]$ has the same values as $a[0]$
   - $c[1]$ has the same values as $a[1]$
   - $c[2]$ has the same values as $b[0]$
   - $c[3]$ has the same values as $b[1]$

In [None]:
c[0] == a[0]   # comparing arrays element-wise gives an array of booleans

In [None]:
np.all(c[0] == a[0]) # we can check if all booleans are True

#### `concatenation` of $\texttt{a}$ and $\texttt{b}$ `along` axis $1$
   - it is like stacking the $3$ matrices of size $4 \times 5$ of the two arrays
   - i.e. we obtain $2$ times $6$ matrices of size $4 \times 5$
      - the first three matrices come from $\texttt{a}$ (even numbers)
      - the three other matrices come from $\texttt{b}$ (odd numbers)

In [None]:
d = np.concatenate((a, b), axis=1)
d.shape

In [None]:
d

#### `concatenation` of $\texttt{a}$ and $\texttt{b}$ `along` axis $2$
   - in our example, it is like `stacking` the rows of the matrices ($4 \times 5$)
   - i.e. we obtain $2$ times $3$ matrices of size $8 \times 5$
      - the first four rows of the matrices come from $\texttt{a}$ (even numbers)
      - the other rows come from $\texttt{b}$ (odd numbers)
      
      
   - the same logic applies to the last `axis` $3$ (we would obtain matrices of size $4 \times 10$)

In [None]:
d = np.concatenate((a, b), axis=2)
d.shape

In [None]:
d[0]

###  we can `split` $\texttt{numpy.ndarray}$ with the function $\texttt{numpy.split}$

   - you split along a dimension i.e. along an `axis`
   - for 2-dimensional arrays ($0$ is for `rows`, $1$ is for `columns`)

   - an array of $30$ elements from $1$ to $30$
   - is created with a `shape` $(30, )$ using the function $\texttt{numpy.arange}$
   - and `reshaped` to a $5 \times 6$ matrix

In [None]:
a = np.array(np.arange(1, 31)).reshape((5, 6)) # one 5 x 6 matrix of 64-bits integers
a

In [None]:
a.dtype, a.shape

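to `split`
   - you indicate how to split: either an `integer` (a number of equal sections) or a `list` of indices
   - and the `axis` along which to split

`split` giving an integer $N$:
   - the axis is divided into $N$ sub-arrays of `equal` size
   - an error is raised if such a division is not possible

`split` giving a list with a single index $[n]$:
   - it splits along the `axis` in two sub-arrays
   - the first sub-array contains the first $n$ elements (if possible)
   - the second sub-array contains what remains
   - if the index exceeds the shape, the first sub-array contains everything and the second one is empty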

In [None]:
np.split(a, [3]) # in axis 0 by default (rows)
                 # you split in two sub-arrays
                 #   - the first is a 3 x 6 sub-array
                 #   - the second is a 2 x 6 partial sub-array

In [None]:
np.split(a, [6]) # the first sub-array is the 5 x 6 array
                 # the second sub-array is an empty-array

In [None]:
np.split(a, [10]) # the first sub-array is the 5 x 6 array
                  # the second sub-array is an empty-array

In [None]:
np.split(a, [3], axis=1) # the first sub-array contains the 3 columns
                         # the second the 3 others

In [None]:
np.split(a, [6], axis=1) # the first sub-array contains the 6 columns
                         # the second is empty
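When an integer $N$ (not a list) is given, $\texttt{numpy.split}$ cuts the axis into $N$ sub-arrays of equal size and raises a $\texttt{ValueError}$ when that is not possible; $\texttt{numpy.array\_split}$ accepts unequal sizes. A sketch on the same $5 \times 6$ matrix:

```python
import numpy as np

a = np.arange(1, 31).reshape((5, 6))

parts = np.split(a, 3, axis=1)             # 6 columns -> 3 sub-arrays of 2 columns
print([p.shape for p in parts])

try:
    np.split(a, 2, axis=0)                 # 5 rows cannot be split in 2 equal parts
    equal_split_failed = False
except ValueError as e:
    equal_split_failed = True
    print(e)

uneven = np.array_split(a, 2, axis=0)      # allowed: 3 rows + 2 rows
print([p.shape for p in uneven])
```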

`split` giving a `list` of several indices:
   - you indicate along which `axis` the array must be split
   - and you indicate the sections
   - e.g. `[p, q]` in `axis 0` results in:
      - the `first` `p` elements of the axis
      - then the elements from `p` to `q` (`q` is not included)
      - then the elements from `q` to the end

In [None]:
a

In [None]:
np.split(a, [2, 4], axis=0) # the first 2 rows (indices 0 and 1)
                            # then the rows from index 2 to 4 excluded (indices 2 and 3)
                            # then the last row (index 4)

In [None]:
a1, a2, a3 = np.split(a, [2, 5], axis=1) # the two first columns
                                         # then the columns from 2 to 5 (not included)
                                         # then the last column

In [None]:
a1.shape, a2.shape, a3.shape

#### splitting in higher dimensions

In [None]:
a = np.array(np.arange(1, 61)).reshape((2, 5, 6)) # 2 arrays of 5 x 6 matrix
a

along `axis 0`
   - we have two matrices of shape $5 \times 6$

In [None]:
np.split(a, [1], axis=0) # we split axis 0 in two, one matrix in each sub-array

along `axis 1` (rows of the matrices)

In [None]:
a

In [None]:
a1, a2 = np.split(a, [2], axis=1) # we obtain two sub-arrays:
                                  #   - a1 holds the first 2 rows of each matrix: shape (2, 2, 6)
                                  #   - a2 holds the remaining 3 rows of each matrix: shape (2, 3, 6)

In [None]:
a1

In [None]:
a2

In [None]:
a1.shape, a2.shape

In [None]:
a1, a2

## 6) you can `access` and `slice` arrays
   - exactly like you `access` or `slice` `python lists`
   - but basic slicing returns a `view`, not a copy (unlike slicing a python list)
   - for the sake of `memory efficiency`
   - `copy` with the $\texttt{numpy.ndarray.copy}$ method if needed
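A minimal sketch of the difference between a view and a copy (the names `view` and `independent` are ours): modifying the view modifies the original array, modifying the copy does not.

```python
import numpy as np

a = np.arange(1, 61).reshape((2, 5, 6))
view = a[0]                 # basic slicing: a view on the data of a
independent = a[0].copy()   # an independent copy

view[0, 0] = -1             # modifies a
independent[0, 1] = -2      # does not modify a
print(a[0, 0, :2])          # [-1  2]
print(np.shares_memory(a, view), np.shares_memory(a, independent))  # True False
```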

In [None]:
a = np.array(np.arange(1, 61)).reshape((2, 5, 6)) # 2 arrays of 5 x 6 matrix
a

In [None]:
a # the whole object

you can re-assign `a`, but then you lose the original array

   - accessing sub-arrays

In [None]:
a[0] # the first element of axis 0
     # all its rows
     # all its columns

   - the same slice

In [None]:
a[0, :, :] # : means "everything" on this axis

   - you can make a variable refer to a portion of the object $\texttt{a}$

In [None]:
a0 = a[0, :, :]

In [None]:
a0

In [None]:
a0.shape

   - you can modify a slice

In [None]:
a0[0,:] = [4, 5, 6, 1, 2, 3]

to `modify` a slice
   - you must stay `consistent` with its shape
   - here $\texttt{a0[0, :]}$ is a row of $6$ elements, so you need $6$ values (a single scalar would be broadcast to the whole row)

In [None]:
a[0]

the object $\texttt{a}$ has also been `modified`
   - $\texttt{a0}$ is a `view` on the data of $\texttt{a}$: both share the same underlying memory

   - you can also create a variable referring to the object $\texttt{a}$ itself

In [None]:
b = a

   - the two variables `share` the same `object`
   - by modifying $\texttt{b}$, you modify  $\texttt{a}$

In [None]:
a[0][0] # the first element of axis 1
        # of the first element of axis 0

In [None]:
b[0][0] # the same as previously but accessed through b

you can assign new values, but you must stay `consistent` with the shape of the slice

In [None]:
b[0, 0, :] = [100, 200, 300, 400, 500, 600]

you have modified $\texttt{b}$ so you have modified $\texttt{a}$

In [None]:
a

In [None]:
a[0][0][0] # the first element of axis 2 (columns)
           # the first element of axis 1 (rows)
           # of the first element of axis 0 

   - the same

In [None]:
a[0, 0, 0]

In [None]:
a.dtype

In [None]:
a[0, 0, 0] = 999.90 # the value is truncated to 999 (a is an integer array)

   - `basic slicing` $\texttt{[from:to:step]}$ can be done (as for python lists):
   - basic slicing can be applied in `each` dimension 

In [None]:
a

In [None]:
a[:, 0::2, 3:-1]  # axis 2: elements from index 3 to the last one (excluded)
                   # axis 1: elements from 0 to the end with a step of 2 (one row in two)
                   # axis 0: all the elements (the two matrices)

-----------------

## 7) applying `vectorized` `operations`

to apply `functions` to each element of a $\texttt{numpy.ndarray}$:
   - **never** use a $\texttt{python}$ `loop` !
   - **always** use the `vectorized` versions of the `operations`

   - they are applied to  `each` element of a $\texttt{numpy.ndarray}$
   - but the `loop` is done in the `underlying` compiled `library`

### Why ?
   - for the sake of `computation time`
   - loop-based versions are much `slower`
   
   - $\texttt{numpy}$ provides `optimized functions on numeric types`

   - $\texttt{numpy}$ vectorized functions are called `UFuncs` (universal functions)
   - see https://docs.scipy.org/doc/numpy/reference/ufuncs.html

### execution time
   - we will measure the `execution time` of the programs (with `magic functions`)
   - only to `get an idea` of what is going on
   
   
   - but `never` deduce `general rules` from `execution times`
   - (too many parameters are `at play`)

   - we use the magic function $\texttt{\%timeit}$

In [None]:
import numpy as np

In [None]:
a = np.arange(1, 10000)
   # timing would not be meaningful on a small number of elements

#### we raise each element of $\texttt{a}$ to the power of $2$

   - with a python `loop`

In [None]:
l = []

%timeit for e in a: l.append(e**2)

   - using numpy arrays with a python `loop`

In [None]:
l = np.empty(a.shape)

%timeit for i in np.arange(0, a.shape[0]): l[i] = a[i]**2

   - with a python `comprehension`

In [None]:
%timeit [e**2 for e in a]

   - with a `vectorized` $\texttt{numpy}$ operation

In [None]:
%timeit a**2

   - with a `vectorized` $\texttt{numpy}$ function

In [None]:
%timeit np.power(a, 2)

#### conclusion
   - vectorized operations and functions are `much` faster!
   - never use `python loops` on arrays

### classical operators **are** `UFuncs`
   - classical `operators` applied to $\texttt{numpy.ndarray}$ are `mapped` to `ufuncs`
   - i.e. to their $\texttt{numpy}$ counterparts


| operator        | numpy function                |
|-----------------|-------------------------------|
|   $+$           | $\texttt{numpy.add}$          |
|   $-$ (binary)  | $\texttt{numpy.subtract}$     |
|   $-$ (unary)   | $\texttt{numpy.negative}$     |
|   $*$           | $\texttt{numpy.multiply}$     |
|   $/$           | $\texttt{numpy.divide}$       |
|   $//$          | $\texttt{numpy.floor_divide}$ |
|   $\%$          | $\texttt{numpy.mod}$          |
|   $**$          | $\texttt{numpy.power}$        |
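A quick sanity check of the table above, on the two arrays used just below: each operator gives exactly the same result as its ufunc.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

assert np.array_equal(a + b, np.add(a, b))
assert np.array_equal(b - a, np.subtract(b, a))
assert np.array_equal(-a, np.negative(a))
assert np.array_equal(a * b, np.multiply(a, b))
assert np.array_equal(b // 3, np.floor_divide(b, 3))
assert np.array_equal(b % 3, np.mod(b, 3))
assert np.array_equal(a ** 2, np.power(a, 2))
print("each operator gives the same result as its ufunc")
```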

In [None]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
a + b

   - **but** on `python` `list`
   - $\texttt{+}$ is `concatenation`
   - to add `element-by-element` use $\texttt{numpy.add}$

In [None]:
c = [1, 2, 3, 4, 5]
d = [10, 20, 30, 40, 50]
c + d

In [None]:
np.add(c, d)

   - the following function works for python lists and numpy ndarrays

In [None]:
def add(x, y):
    return np.add(x, y)
add(c, d)

### there are many other `UFuncs`

| function family  | numpy functions   |
|------------------|-------------------|
| comparison       | $\texttt{numpy.greater}$, $\texttt{numpy.less}$, $\texttt{numpy.equal}$, ... |
| absolute value   | $\texttt{numpy.absolute}$ or $\texttt{numpy.abs}$ |
| trigonometry     | $\texttt{numpy.sin}$, $\texttt{numpy.cos}$, ... |
| exponentiation   | $\texttt{numpy.exp}$, $\texttt{numpy.exp2}$, ... |
| logarithm        | $\texttt{numpy.log}$, $\texttt{numpy.log2}$, $\texttt{numpy.log10}$ |
| floating point   | $\texttt{numpy.isinf}$, $\texttt{numpy.isfinite}$, ... |
| not a number     | $\texttt{numpy.isnan}$, ... |
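A short sketch of the floating-point ufuncs on values of our own. Note that `NaN` is never equal to itself, which is why $\texttt{numpy.isnan}$ is needed to detect it.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 0.0])
print(np.isnan(x))      # [False  True False False False]
print(np.isinf(x))      # [False False  True  True False]
print(np.isfinite(x))   # [ True False False False  True]
print(np.nan == np.nan) # False: NaN is never equal to itself
print(x[~np.isnan(x)])  # drop the NaN values
```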



### checking if a function is a `UFunc`
   - a `UFunc` is a $\texttt{numpy.ufunc}$ object (its type is $\texttt{numpy.ufunc}$)
   - refer to the  `help`
      - $\texttt{help(np.sum)}$
      - $\texttt{numpy.sum?}$
      - $\texttt{numpy.info(numpy.sum)}$

In [None]:
type(np.sum) # numpy.sum is not a Ufunc

In [None]:
type(np.add) # numpy.add is a UFunc

In [None]:
# help(np.add)

In [None]:
# np.add?

In [None]:
# np.info(np.add)
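Rather than reading the type, you can test it with $\texttt{isinstance}$; a ufunc also exposes attributes such as its number of inputs and outputs, and methods such as $\texttt{reduce}$. A short sketch:

```python
import numpy as np

print(isinstance(np.add, np.ufunc))   # True
print(isinstance(np.sum, np.ufunc))   # False: np.sum is a regular function
print(np.add.nin, np.add.nout)        # 2 inputs, 1 output
print(np.add.reduce(np.array([1, 2, 3, 4])))  # 10: the ufunc applied as a reduction
```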

#### conclusion
   - `writing` code using `vectorization` can be `harder` than writing `loop-based` python `code`
   - but for the sake of time performance you cannot avoid it
   - it is just `another way` to `think` about your problem
   - you might need to `use` `different` algorithms or `invent` `new` ones
   
   
   - see https://www.labri.fr/perso/nrougier/from-python-to-numpy/#bibliography

#### **exercice$^{(*)}$ NNN**
   - example of chosing the wrong algorithm ...

   - given two vectors $a$ and $b$ of shape $(n,)$
   - we want to compute the sum of the product of each pairs of elements $a[i]* b[j]$
   - with two algorithms: the `direct` algorithm and a `more subtle` one
   

(\*) *the exercises must be completed in students' mandatory non-attendance time*

   1. compute the function with `two nested loops in python` (timeit)
   2. compute the function by summing the `outer` product of the two vectors
      - $a$ being reshaped to $(n, 1)$ and $b$ to $(1, n)$, so that broadcasting builds the $(n, n)$ array of all the products (timeit)
      
      
   3. notice that (here with $n = 3$)
      - $x[0] \times y[0] + x[0] \times y[1] + x[0] \times y[2] +$  
      - $x[1] \times y[0] + x[1] \times y[1] + x[1] \times y[2] +$  
      - $x[2] \times y[0] + x[2] \times y[1] + x[2] \times y[2]$
      - $=$  
      - $(x[0] + x[1] + x[2]) \times (y[0] + y[1] + y[2])$  
   
   
   - the last two functions `implement` an $O(n^2)$ algorithm
   - while a different algorithm `exists` in $O(n)$
   
   
   4. implement the new algorithm in python (using the built-in function $\texttt{sum}$)
   5. implement it in `numpy` (using $\texttt{numpy.sum}$)

   - example of code to time your functions

In [None]:
def fct_in_python (a, b):
    # your code here
    return 0 # of course another value here ...

n = 1000
a = np.arange(n) # for example
b = np.arange(n)
print(fct_in_python(a, b))
%timeit fct_in_python(a, b)

#### **correction NNN**

   - the `python` way

In [None]:
def fct_in_python (x, y):
    ''' two loops to obtain all the pairs
        of elements (x_i, y_j) '''
    s = 0
    for x_i in x:
        for y_j in y:
            s = s + x_i*y_j
    return s

In [None]:
n = 1000
a = np.arange(n)
b = np.arange(n)
print(fct_in_python(a, b))
%timeit fct_in_python(a, b)

   - the `numpy` way

In [None]:
def fct_in_numpy(x, y):
    ''' x is reshaped in a column vector (len(x), 1)
        y is reshaped in a row vector    (1, len(y))
        we do the product (both arrays are broadcast to (len(x), len(y)))
    '''
    z = x.reshape(len(x), 1) * y.reshape(1, len(y))
    return z.sum()

In [None]:
n = 1000
a = np.arange(n)
b = np.arange(n)
print(fct_in_numpy(a, b))
%timeit fct_in_numpy(a, b)

   - the last two functions `implement` an $O(n^2)$ algorithm
   - while an $O(n)$ algorithm `exists`

In [None]:
def fct_python_better (x, y):
    return sum(x) * sum(y)  # (x[0] + ... + x[n-1]) * (y[0] + ... + y[n-1])

In [None]:
n = 1000
a = np.arange(n)
b = np.arange(n)
print(fct_python_better(a, b))
%timeit fct_python_better(a, b)

In [None]:
def fct_numpy_better (x, y):
    return np.sum(x) * np.sum(y)

In [None]:
n = 1000
a = np.arange(n)
b = np.arange(n)
print(fct_numpy_better(a, b))
%timeit fct_numpy_better(a, b)
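
Since the correction uses $a = b$, a bug mixing up $x$ and $y$ would go unnoticed. A small sanity check, as a sketch: the four approaches are re-implemented under hypothetical short names (not the names used above) and run on two vectors of `different` lengths; they should all agree.

```python
import numpy as np

def pairs_loops(x, y):
    # O(n*m): visit every pair (x_i, y_j)
    s = 0
    for x_i in x:
        for y_j in y:
            s += x_i * y_j
    return s

def pairs_broadcast(x, y):
    # O(n*m) in numpy: (n, 1) * (1, m) -> (n, m) outer product, then sum
    return (x.reshape(-1, 1) * y.reshape(1, -1)).sum()

def pairs_python_sum(x, y):
    # O(n + m): factorized form with the built-in sum
    return sum(x) * sum(y)

def pairs_numpy_sum(x, y):
    # O(n + m): factorized form with numpy.sum
    return np.sum(x) * np.sum(y)

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=50)
b = rng.integers(0, 10, size=30)   # different lengths on purpose

results = [f(a, b) for f in (pairs_loops, pairs_broadcast,
                             pairs_python_sum, pairs_numpy_sum)]
print(results)  # four identical values
```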

## 4) $\texttt{na}$, $\texttt{NaN}$, ...
   - Not Available
   - Not a Number

   - $\texttt{numpy.nan}$ is a `float`
   - there is `no` equivalent for `integers`

In [None]:
type(np.nan)

  - `NaN` can be tested:

In [None]:
np.isnan([np.log(-1.), 1., np.log(0)])    # np.log(-1.) is undefined for real numbers: it returns NaN (with a RuntimeWarning)

In [None]:
np.log(0) == -np.inf

   - $\texttt{numpy.log(0)}$ is $-\infty$ and not `NaN`
   - see $\texttt{numpy}$ `constants` https://www.numpy.org/devdocs/reference/constants.html
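
Two consequences worth checking, in a short sketch: `NaN` is not equal to itself (so $\texttt{==}$ cannot detect it, use $\texttt{numpy.isnan}$), and since `NaN` is a `float`, an `integer` array cannot store it.

```python
import numpy as np

x = np.nan
print(x == x)          # False: NaN is not equal to anything, not even itself
print(np.isnan(x))     # True: the reliable test

ints = np.arange(3)    # integer dtype
failed = False
try:
    ints[0] = np.nan   # NaN is a float: an integer array cannot hold it
except ValueError as e:
    failed = True
    print("ValueError:", e)

floats = np.arange(3).astype(float)
floats[0] = np.nan     # fine once the array holds floats
print(floats)
```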

## 5) functions that aggregate array values in $\texttt{numpy}$

   - i.e. `combine` the values of the array

#### classic functions:
   - $\texttt{numpy.sum}$, $\texttt{numpy.prod}$
   - $\texttt{numpy.mean}$, $\texttt{numpy.std}$, $\texttt{numpy.var}$
   - $\texttt{numpy.min}$, $\texttt{numpy.max}$
   - $\texttt{numpy.median}$, $\texttt{numpy.percentile}$
   
example of one operation (here the sum)   

### 1) summing an `array` along `axis`

#### a) summing an `array` along `axis`

In [None]:
a = np.random.randint(0, 10, size=(3, 4))
a

   - we have an array of $3$ rows and $4$ columns

`axis = 0` 
   - the `rows` axis (the first one) is collapsed
   - i.e. we add the `rows` together and obtain one sum per `column`

In [None]:
np.sum(a, axis = 0)

`axis = 1`
   - the `columns` axis (the second one) is collapsed
   - i.e. we add the `columns` together and obtain one sum per `row`

In [None]:
np.sum(a, axis = 1)
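
With a deterministic array the rule is easier to check: the `axis` given to $\texttt{numpy.sum}$ is the axis that `disappears` from the shape. A short sketch, also showing the $\texttt{keepdims}$ parameter (not used above), which keeps the collapsed axis with length $1$:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # [[0 1 2 3], [4 5 6 7], [8 9 10 11]]

col_sums = np.sum(m, axis=0)      # axis 0 (rows) is collapsed: one sum per column
row_sums = np.sum(m, axis=1)      # axis 1 (columns) is collapsed: one sum per row
print(col_sums)   # [12 15 18 21]
print(row_sums)   # [ 6 22 38]

# keepdims=True keeps the collapsed axis, with length 1
print(np.sum(m, axis=0, keepdims=True).shape)  # (1, 4)
print(np.sum(m, axis=1, keepdims=True).shape)  # (3, 1)
```

Keeping the axis is handy when the result has to be broadcast back against `m`, e.g. `m / np.sum(m, axis=1, keepdims=True)`.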

### 2) summing a group of arrays along `axis`

   - we have $2$ arrays of $3$ rows and $4$ columns

In [None]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

   - summing along `axis` $0$ adds the $2$ arrays together (element by element)

In [None]:
np.sum(a, axis=0) # we sum the arrays

   - summing along `axis` $1$ is summing along the `rows` of each array 
   - i.e. we obtain one `row` per array
   - they form a new array

In [None]:
np.sum(a, axis=1)

   - summing along `axis` $2$ is summing along the `columns` of each array 
   - i.e. we obtain one `column` per array
   - they form a new array

In [None]:
np.sum(a, axis=2)

### 3) summing over all the elements

In [None]:
np.sum(a) # sum of all the elements

### 4) summing in presence of $\texttt{numpy.nan}$ values

#### classic functions with their `NaN-safe` counterpart:
   - $\texttt{numpy.nansum}$, $\texttt{numpy.nanprod}$
   - $\texttt{numpy.nanmean}$, $\texttt{numpy.nanstd}$, $\texttt{numpy.nanvar}$
   - $\texttt{numpy.nanmin}$, $\texttt{numpy.nanmax}$
   - $\texttt{numpy.nanmedian}$, $\texttt{numpy.nanpercentile}$

   - `NaN` needs `floats`

In [None]:
a = np.random.randint(0, 50, size=(2, 3, 4)).astype(float)
a

we insert some `NaN` values

In [None]:
a[0, 1, 0] = np.nan
a[0, 2, 2] = np.nan
a[0, 0, 3] = np.nan

a[1, 0, 0] = np.nan
a[1, 1, 3] = np.nan
a[1, 2, 2] = np.nan
a

with the `usual` functions, `NaN` is `dominant` (any operation involving a `NaN` gives `NaN`)

In [None]:
np.sum(a) # np.nan is dominant - contagious

we can `treat` `NaNs` as `zero`

In [None]:
np.nansum(a) # np.nan values are 0

   - on `axis=0` 

In [None]:
np.nansum(a, axis=0) # summing the two arrays together

In [None]:
np.nansum(a, axis=1) # summing the rows together in each array

In [None]:
np.nansum(a, axis=2) # summing the columns together in each array
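
The `NaN-safe` functions do not simply replace `NaN` by $0$: they `ignore` it. For a sum this is the same thing, but $\texttt{numpy.nanmean}$ divides by the number of `non-NaN` values. A small sketch:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

print(np.sum(x), np.nansum(x))     # nan 4.0
print(np.mean(x), np.nanmean(x))   # nan 2.0  (4.0 / 2 values, not 4.0 / 3)
print(np.max(x), np.nanmax(x))     # nan 3.0
```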

### 5) index of min and max values
   - $\texttt{numpy.argmax}$, $\texttt{numpy.argmin}$

the index refers to the `flattened` array (when no `axis` is given)

In [None]:
a = np.random.randint(0, 100, 30).reshape(5, 6)
a

In [None]:
np.min(a) # or a.min()

In [None]:
np.argmin(a) # or a.argmin()

In [None]:
a.flatten().argmin()
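
To go back from the `flat` index to a (row, column) pair, $\texttt{numpy.unravel\_index}$ can be used; with an `axis`, $\texttt{numpy.argmin}$ returns one index per column (or per row). A small sketch on a fixed array:

```python
import numpy as np

m = np.array([[7, 3, 9],
              [4, 1, 8]])

flat_idx = np.argmin(m)                          # index in the flattened array
row, col = np.unravel_index(flat_idx, m.shape)   # back to (row, column)
print(flat_idx, row, col)                        # 4 1 1
print(m[row, col] == m.min())                    # True

# with an axis: one index per column (axis=0) or per row (axis=1)
print(np.argmin(m, axis=0))   # [1 1 1]
print(np.argmin(m, axis=1))   # [1 1]
```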

## xxx) tests on all values
   - $\texttt{numpy.all}$ returns $\texttt{True}$ if `all` values are $\texttt{True}$
   - $\texttt{numpy.any}$ returns $\texttt{True}$ if `any` value is $\texttt{True}$
   - $\texttt{np.where(cond, x, y)}$ returns $\texttt{x}$ or $\texttt{y}$ depending on the condition
   - (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html)
   
   - they have no NaN-safe counterpart

In [None]:
a = np.random.randint(0, 100, 30).reshape(5, 6)
a

In [None]:
a <= 50

In [None]:
np.any(a <= 50)

In [None]:
np.all(a <= 100)

In [None]:
#np.where?

In [None]:
np.where(a<50, 2*a, 3*a) 
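
On a fixed vector the behaviour of $\texttt{numpy.where}$ is easier to read. The sketch also shows the `one-argument` form (not used above), which returns the indices where the condition is $\texttt{True}$:

```python
import numpy as np

x = np.array([3, 60, 10, 80])

# three arguments: take 2*x where the condition holds, 3*x elsewhere
print(np.where(x < 50, 2 * x, 3 * x))   # [  6 180  20 240]

# one argument: the indices where the condition is True (a tuple, one array per axis)
(idx,) = np.where(x < 50)
print(idx)                               # [0 2]
```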

## 6)  Broadcasting when working with arrays of different sizes

### reminder about the dimensions

`dimension 2`
   - $\texttt{shape=(r, c)}$ $\texttt{r}$ `rows`, $\texttt{c}$ `columns`
   
   
`dimension 3`
   - $\texttt{shape=(p, r, c)}$ $\texttt{p}$ `frames` $\texttt{r}$ `rows`, $\texttt{c}$ `columns`
   
   
`greater dimensions`
   - $\texttt{shape=(g1, ..., g, r, c)}$
   - the two last elements are always $\texttt{r}$ `rows`, and $\texttt{c}$ `columns` 

### usually in $\texttt{numpy}$
   - operations are done on `pairs of arrays`
   - on an `element-by-element` basis
   - the two arrays must have `exactly the same shape`

In [None]:
a = np.arange(0, 10)
a

In [None]:
a * a # multiplication element-by-element
      # i.e. the square of each element

In [None]:
a + a # sum element-by-element

###   $\texttt{numpy}$ `relaxes` this constraint
   - when the `arrays’ shapes` meet `certain conditions`

#### example

In [None]:
a = np.arange(0, 4) # [0, 1, 2, 3]
a

In [None]:
b = np.array([10])  # [10]
b

In [None]:
a + b # [0, 1, 2, 3] + [10, 10, 10, 10] = [10, 11, 12, 13]

   - to `add` the array $[0, 1, 2, 3]$ to the array $[10]$ 
   - the array $[10]$ is `expanded` to `match the size` $[10, 10, 10, 10]$

In [None]:
10 + a # the same with a single value

### broadcasting `rules`

   - when arrays `do not have` the `same` shape
   - $\texttt{numpy}$ `expands` the arrays (*when possible*)
   - for `element-by-element` operation to `take place`

   
   
   - dimensions are `compared` from `right` to `left`
   - dimensions are taken `pairwise`
   - if one shape is `shorter`, it is padded with $1$s on the `left`
   - broadcasting is `possible`
      - 1) when the `dimensions` are `identical`
      - 2) when `one` of them is $1$
      
      
   
   - https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
   - http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc
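
The rules above fit in a few lines of python. A sketch with a hypothetical helper $\texttt{broadcast\_shape}$ (written here for illustration, not a numpy function), compared with $\texttt{numpy.broadcast\_shapes}$, which applies the real rules (available since NumPy 1.20):

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes.

    The shorter shape is padded on the left with 1s, so that the
    dimensions are aligned from right to left. Two dimensions are
    compatible when they are equal or when one of them is 1.
    """
    n = max(len(shape_a), len(shape_b))
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_shape((2, 3), ()))                # (2, 3): array + scalar
print(broadcast_shape((1, 3), (2, 1)))            # (2, 3)
print(broadcast_shape((2, 3, 4), (3, 1)))         # (2, 3, 4)
print(np.broadcast_shapes((2, 3, 4), (3, 1)))     # (2, 3, 4)
```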

In [None]:
a = 100 * np.ones((2, 3))
a

In [None]:
b = 4
b

   - $a$ has `shape` $(2_a, 3_a)$
   - $b$ is a scalar, handled like an array of `shape` $(1_b)$
   
   
   
   - $3_a$ is `compared` to $1_b$
   - $1_b$ `becomes` one row of size $3$ i.e. $b = [4, 4, 4]$
   
   
   - the shape of $b$ is `now` $(1_b, 3_b)$
   
   
   
   - $2_a$ is `compared` to $1_b$
   - $1_b$ `becomes` two rows i.e. $b = [[4, 4, 4], [4, 4, 4]]$
   


   - the shape of $b$ is `now` $(2_b, 3_b)$
   
   
   - the two arrays can be added `element by element`
   

#### wrong example

   - sometimes shapes `cannot` be `broadcast`

In [None]:
a = 100 * np.ones((2, 3))
a

In [None]:
b = 10 * np.ones((2, 4))
b

   - $a$ has `shape` $(2_a, 3_a)$
   - $b$ has `shape` $(2_b, 4_b)$
   
   
   - broadcasting compares the pairs $(3_a, 4_b)$ then $(2_a, 2_b)$  
   
   
   - $3_a$ is `compared` to $4_b$ and it `fails`
   
   

the operation `does not follow the rules`

In [None]:
try:
    a + b
except ValueError as e:
    print(e)


#### 2D example 

In [None]:
a = 10 * np.ones((1, 3))
a

In [None]:
b = 100 * np.ones((2, 1))
b

   - $a$ has `shape` $(1_a, 3_a)$
   - $b$ has `shape` $(2_b, 1_b)$
   
   
   - broadcasting compares the pairs $(3_a, 1_b)$ then $(1_a, 2_b)$ 
   
   
   - first $b$ is broadcasted to `fit` $3$ `columns`
   - $b$ becomes $[[100., 100., 100.],[100., 100., 100.]]$
   
   
   
   - then $a$ is broadcasted to `fit` $2$ `rows`
   - $a$ becomes $[[10., 10., 10.], [10., 10., 10.]]$
   
   
   
   - the operation `does follow the rules`
  
   
   - the `element by element` operation can take `place`
   

In [None]:
a+b
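
With values that differ from one element to the next, the same mechanism builds a full table of pairwise products from a `column` and a `row` (this is what the `numpy` correction of the exercise relies on). A small sketch, compared with $\texttt{numpy.outer}$:

```python
import numpy as np

col = np.arange(1, 4).reshape(3, 1)   # shape (3, 1): [[1], [2], [3]]
row = np.arange(1, 5).reshape(1, 4)   # shape (1, 4): [[1, 2, 3, 4]]

table = col * row                     # broadcast to (3, 4): table[i, j] = col[i, 0] * row[0, j]
print(table)
# [[ 1  2  3  4]
#  [ 2  4  6  8]
#  [ 3  6  9 12]]
print(np.array_equal(table, np.outer(np.arange(1, 4), np.arange(1, 5))))  # True
```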

### greater dimensions

In [None]:
a = 100 * np.ones((2, 3, 4))
a

In [None]:
b0 = 10
a + b0

In [None]:
b1 = 10 * np.ones((3, 1))
print(b1)
a + b1

In [None]:
b2 = 10 * np.ones((1, 4))
print(b2)
a + b2

In [None]:
b3 = 2* np.ones((1, 3, 1))
print(b3)
a + b3

*and so on, ...*

### Broadcasting and vectorization
   - broadcasting is very efficient
   - (the broadcasted elements are not actually created in memory)
   - broadcasting is based on optimized C code (same efficiency as vectorized operations)
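
$\texttt{numpy.broadcast\_to}$ makes the "not created in memory" point visible: it returns a read-only `view` whose repeated axis has a stride of $0$ bytes (strides are detailed in the strides section). A sketch, assuming 8-byte integers:

```python
import numpy as np

v = np.array([1, 2, 3], dtype=np.int64)
big = np.broadcast_to(v, (1000, 3))   # read-only view, the data of v is not copied

print(big.shape)                      # (1000, 3)
print(big.strides)                    # (0, 8): stride 0 means "repeat the same row"
print(np.shares_memory(big, v))       # True
```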

## 7) Array Indexing by `Boolean Masks`

  - Boolean comparison operators are `UFuncs`
  - when applied on arrays, they return the array of the element-by-element comparisons
   
   
   - you obtain a `mask` i.e. you can `filter` your array

In [None]:
a = np.arange(6)
a

In [None]:
a < 0

In [None]:
a > 0

In [None]:
a == 1

### computing with Boolean values
   - $\texttt{False}$ is $0$, $\texttt{True}$ is $1$
   - use it to compute useful functions

   - number of values less than $0.5$ ?

In [None]:
a = np.random.random((3, 4))
a

In [None]:
np.sum(a < 0.5) # number of values less than 0.5 in the array a

In [None]:
np.count_nonzero(a < 0.5) # the same

In [None]:
np.sum(a < 0.5, axis=0) # number of values less than 0.5 in columns

In [None]:
np.sum(a < 0.5, axis=1) # number of values less than 0.5 in rows

In [None]:
np.sum(a < 1) == a.size  # True if all the values are less than 1

In [None]:
np.all(a < 1)  # the same

In [None]:
np.sum(a < 0.5) >= 1  # True if any element is less than 0.5

In [None]:
np.any(a < 0.5)  # True if any element is less than 0.5

### you can compose questions

In [None]:
a = np.arange(10)
a

number of elements less than 6 and even in the array $\texttt{a}$

In [None]:
np.count_nonzero((a < 6) & (a%2 == 0)) # use parentheses

### in $\texttt{numpy}$
   - use the `bitwise` operators $\texttt{&}$, $\texttt{|}$, $\texttt{~}$ (or respectively `np.logical_and`, `np.logical_or`, `np.logical_not`)
   - do not use the `logical` operators $\texttt{and}$, $\texttt{or}$, $\texttt{not}$
   
   
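   - only `bitwise` operators are applied `element-by-element` ($\texttt{and}$, $\texttt{or}$, $\texttt{not}$ need a single truth value and raise an error on arrays with more than one element)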
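
A short sketch of the difference: $\texttt{and}$ asks for a single $\texttt{True}$ / $\texttt{False}$ for the whole array and fails, while $\texttt{\&}$ works element by element. The parentheses are needed because $\texttt{\&}$ binds more tightly than $\texttt{<}$ and $\texttt{>}$.

```python
import numpy as np

x = np.arange(6)

failed = False
try:
    (x > 1) and (x < 4)     # `and` needs one truth value for the whole array
except ValueError as e:
    failed = True
    print("ValueError:", e)

print((x > 1) & (x < 4))              # [False False  True  True False False]
print(np.logical_and(x > 1, x < 4))   # same result
print(~(x > 1))                       # [ True  True False False False False]
```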

### Indexing arrays with `masks`
   - returns the elements for which the mask is True
   - it creates a `new array`, not a `view` on the existing one

In [None]:
a = np.random.rand(3, 4) # a 3 x 4 matrix randomly initialized by values between 0 and 1
a

array of the elements of $\texttt{a}$ that are less than $0.5$

In [None]:
a < 0.5

In [None]:
a[a < 0.5]

indices of the elements

In [None]:
np.argwhere(a < 0.5) # [[i, j], ...] where i is the row index and j the column index
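
Because $\texttt{a[mask]}$ is a `copy`, modifying the result does not touch $\texttt{a}$; to modify $\texttt{a}$ itself, `assign` through the mask. A small sketch on a fixed array:

```python
import numpy as np

m = np.array([[0.2, 0.7],
              [0.9, 0.1]])

selected = m[m < 0.5]     # a new 1-D array (a copy) holding 0.2 and 0.1
selected[:] = -1.0        # modifies the copy only: m is unchanged
print(m)

m[m < 0.5] = 0.0          # assignment through a mask modifies m in place
print(m)                  # the values below 0.5 are now 0
```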

### indexing arrays with list or arrays of indices
   - it returns a `new` array, not a `view`

In [None]:
a = np.arange(5)
a

In [None]:
l = [1, 1, 4, 4]
a[l]

In [None]:
l = np.array([1, 1, 4, 4])
a[l]

#### xxx) change elements of an array based on conditions with $\texttt{numpy.putmask}$


   - we want to `modify` the elements of an array
   - based on a `condition`

   - for example replace values between $-0.5$ and $0.5$ with $0$

In [None]:
import numpy as np

In [None]:
a = np.random.randn(20)
a

In [None]:
np.putmask(a, (-0.5<a) & (a<0.5), 0)   # you cannot use `and` here

In [None]:
a
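
$\texttt{numpy.putmask}$ is one option among others: assigning through a boolean mask also works in place, and $\texttt{numpy.where}$ builds a new array instead. A sketch checking that the three agree on the same random data:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal(20)
cond = (-0.5 < a) & (a < 0.5)

b1 = a.copy()
np.putmask(b1, cond, 0)        # in place

b2 = a.copy()
b2[cond] = 0                   # boolean-mask assignment, in place

b3 = np.where(cond, 0, a)      # a new array, a is untouched

print(np.array_equal(b1, b2), np.array_equal(b1, b3))   # True True
```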

In [None]:
#np.putmask?

## 8) deleting elements, rows and columns

   - $\texttt{numpy.delete}$ returns a new array (the original array is not modified)

In [None]:
a = np.arange(0, 12).reshape(3, 4)
a

   - deleting `rows`

In [None]:
np.delete(a, [0], axis=0)

   - deleting `columns`

In [None]:
np.delete(a, [0, 3], axis=1)

   - deleting `elements`

In [None]:
np.delete(a, [0, 6, 11])
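
A small sketch confirming that the original array keeps its shape, and that without an $\texttt{axis}$ the indices refer to the `flattened` array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

b = np.delete(a, 1, axis=0)      # remove row 1
print(b.shape, a.shape)          # (2, 4) (3, 4): a is not modified

# without axis, the array is flattened first
print(np.delete(a, [0, 6, 11]))  # [ 1  2  3  4  5  7  8  9 10]
```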

In [None]:
# np.put?

## 9) `view`, `copy` and temporary `copy`

#### `views` on arrays

   - a `view` does `not` allocate `computer memory` for the `array elements`
   - (only for its own metadata: shape, strides, ...)
   - for the `sake` of `memory efficiency`

   - you can `check` if the `result` of an `operation` is a `view` or a `copy`
   - because `views` need to know the `original array`
   - thus they must `store` it
   - it is `called` $\texttt{numpy.ndarray.base}$

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])

   - $\texttt{a}$ is an `original` array
   - it is `not` a `view` on an `existing` array
   - i.e. it does `not` have a `base` array

In [None]:
a.base is None

   - if we `take` a `view` on the array $\texttt{a}$
   - the `view` will refer to $\texttt{a}$
   - i.e. its $\texttt{base}$ is $\texttt{a}$

In [None]:
a_view = a[1:3]

In [None]:
a == a_view.base

   - the `original` array ($\texttt{a}$) and the `base` array of $\texttt{a_view}$ are not only `equal`

In [None]:
a[1:3].base is a

   - they are the same `computer objects`

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.ravel(a)
b.base is a

#### `copy` on arrays

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])
b = a.flatten()
b.base is None
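
$\texttt{numpy.shares\_memory}$ is another way to tell a `view` from a `copy`, and writing into a `view` is visible in the original array. A short sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

v = a[1:3]            # basic slicing: a view
c = a[1:3].copy()     # an explicit copy

v[0, 0] = 30          # writes into the memory of a
c[0, 0] = 99          # does not touch a

print(a[1, 0])                                          # 30
print(np.shares_memory(a, v), np.shares_memory(a, c))   # True False
```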

#### `temporary arrays`

   - `copies` can be made `implicitly` during the `operations`
   - to `store` `intermediate` values of the `array`

In [None]:
a = np.ones(5)
b = np.ones(5)

In [None]:
x = 3 * a + 5 * b
x

   - `one` `temporary` array holds $3 \times a$
   - `one` holds $5 \times b$
   - and $x$ holds the `result`

In [None]:
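a = np.ones(10_000_000, dtype=np.int64)   # 10**7 elements (80 MB); 10**9 elements would need 8 GB per array

In [None]:
b = np.ones(10_000_000, dtype=np.int64)

In [None]:
from timeit import timeit
timeit('x = 3*a + 5*b', globals=globals(), number=10)   # two temporary arrays are created

In [None]:
# in-place operations with `out=` avoid the temporary arrays (note that a and b are modified)
timeit('np.multiply(a, 3, out=a); np.multiply(b, 5, out=b); np.add(a, b, out=a)', globals=globals(), number=10)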

## XXX) sorting an array 

   - $\texttt{numpy.sort}$ returns a `copy` of the array
   - $\texttt{numpy.ndarray.sort}$ sorts the array `in place`
      
   - you sort along an `axis`

In [None]:
a = np.random.randint(0, 10, (3, 4))
a

   - we can sort along `axis` $1$ (the `columns`)
   - i.e. the values are reordered `within` each `row`
   - i.e. each `row` ends up `sorted`

In [None]:
np.sort(a, axis=1)  # a.sort(axis=1) would modify a in place

  - we can sort along `axis` $0$ (the `rows`): each `column` ends up `sorted`

In [None]:
np.sort(a, axis=0)
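
A small sketch of the `copy` / `in place` difference between the two sorts (by default both sort along the last axis, $\texttt{axis=-1}$):

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

s = np.sort(a, axis=1)   # a sorted copy, a is unchanged
print(a[0])              # [3 1 2]

a.sort(axis=1)           # sorts a itself (the method returns None)
print(a)
# [[1 2 3]
#  [7 8 9]]
```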

## XXX) `advanced` indexing
   - you can `select` elements (or rows, ...) of an array using `lists` or `arrays` of indices
   - you obtain a `copy` of the `original` array
   
   - *(like we have done for the boolean masks)*

In [None]:
a = np.array([[5, 2, 0], [9, 3, 8], [7, 0, 6]])
a

   - what if, for example, we want to `sort` the array $\texttt{a}$, along a `given column`
   - and `keep` the `rows` the same ?

   - we want to sort the array `by` the `second column`
   - $\texttt{a[:, 1]}$ is $[2_0, 3_1, 0_2]$

   - we get the `indices` of the `sorted column`
   - here $[2, 0, 1]$

In [None]:
np.argsort(a[:, 1])

   - we `index` the array by the `array of indices`
   - it is called `advanced` indexing 
   - it returns a `copy` of the array `not` a `view`

In [None]:
a[np.argsort(a[:, 1])].base is None # None because it is not a view on an existing array

In [None]:
a[0].base is a # simple indexing it returns a view on the array `a` (it has a base)

In [None]:
a[[0]].base is None # advanced indexing: it returns a copy of the slice

In [None]:
a
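
Putting the two steps together gives the array sorted `by` its second column, each row kept intact. A sketch on the same values:

```python
import numpy as np

a = np.array([[5, 2, 0], [9, 3, 8], [7, 0, 6]])

order = np.argsort(a[:, 1])   # [2 0 1]: the row order that sorts column 1
a_sorted = a[order]           # advanced indexing: a new array, rows kept intact
print(a_sorted)
# [[7 0 6]
#  [5 2 0]
#  [9 3 8]]
```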

## XX) strides

   - `strides` are the numbers of `bytes` you have to `step` in each `dimension` when `traversing` an `array`

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html#numpy.ndarray.strides

In [None]:
import numpy as np

In [None]:
x = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], dtype=np.int32)

In [None]:
x

##### memory is contiguous i.e.
   - the array is $[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]$
   - each element is $4$ bytes long (i.e. 32 bits)
   
   
   - to go from the `first element` of the `first array`
   - to the `first element` of the `second array`
   - we `step` $5 \times 4$ bytes i.e. $20$ bytes
   
   
   - thus the two `offsets` are $20$ and $4$

   - this is given by the $\texttt{strides}$ of the array

In [None]:
x.strides

   - another example

In [None]:
a = np.arange(0, 24, dtype=np.int64)  # 8-byte integers, whatever the platform default
x = a.reshape(2, 3, 4)
x

   - the underlying array is $[0, 1, ..., 23]$

In [None]:
x.base is a

   - to go `from` the `first` array ($x[0]$) to the `second` ($x[1]$)
   - we have to step $3 \times 4$ elements
   - each `element` being 8 `bytes`
   - the offset is $3 \times 4 \times 8$

In [None]:
x.itemsize * x.shape[1] * x.shape[2] # 8 x 3 x 4

   - to `step` from `one` row of the array to the `next one`
   - it is $4 \times 8$

In [None]:
x.strides

   `strides` for `views` 

In [None]:
y = x[0, 0:3:2]
y

In [None]:
y.base is x.base

   - to `step` from one `row` to the other
   - we go from element $0$ to element $8$
   - i.e. we jump $2 \times 4$ elements and each element is $8$ bytes
   - $2 \times 4 \times 8 = 64$ bytes

In [None]:
y.strides
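
For a contiguous array (row-major, the numpy default) the strides follow a simple rule: the stride of an axis is the item size times the product of the lengths of all the axes after it. A sketch with a hypothetical helper $\texttt{c\_strides}$ (not a numpy function); it also shows that transposing moves no data and only reverses the strides:

```python
import numpy as np

def c_strides(shape, itemsize):
    """Hypothetical helper: strides of a C-contiguous (row-major) array.

    The stride of axis k is itemsize times the product of the
    lengths of all the axes after k.
    """
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

x = np.zeros((2, 3, 4), dtype=np.int64)
print(c_strides(x.shape, x.itemsize), x.strides)   # (96, 32, 8) (96, 32, 8)

# transposing does not move any data: it only reverses the strides
print(x.T.strides)                                 # (8, 32, 96)
```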

## 10) linear algebra

   - with $\texttt{numpy}$ you `manipulate` `vectors` and `matrices`
   - https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

### a) vector, matrix multiplication with $\texttt{numpy.dot}$

https://stackoverflow.com/questions/3890621/how-does-multiplication-differ-for-numpy-matrix-vs-array-classes

##### we can multiply `two vectors` (`inner product` of vectors)

In [None]:
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

In [None]:
np.dot(v, w)

In [None]:
v.dot(w)

   - be careful $\texttt{*}$ is `an element by element` product

In [None]:
v*w

##### we can multiply a `matrix` by a `vector`

   - $a$ is a $(3 \times 4)$ matrix

In [None]:
a = np.arange(1, 13).reshape(3, 4)
a

   - the `vector` must have $4$ rows

In [None]:
b = np.arange(11, 15).reshape(4, 1)
b

In [None]:
print(a, '.\n', b)

In [None]:
np.dot(a, b)

   - another way to call $\texttt{numpy.dot}$

In [None]:
a.dot(b)

   - be careful $*$ is the `element-by-element` multiplication

In [None]:
a*a

##### we can multiply two `matrices`

   - $a$ is a $(3 \times 4)$ matrix

In [None]:
b = np.arange(11, 23).reshape(4, 3)
b

   - the second `matrix` must be  $(4 \times 3)$

In [None]:
np.dot(a, b)

   - $\texttt{dot}$ also exists as a `method` of $\texttt{numpy.ndarray}$ `objects`

In [None]:
a.dot(b)

$\texttt{numpy.vdot}$
   - it returns the `dot product` of two `vectors`
   - should `only` be used for `vectors`
   - handles `multidimensional` arrays by `flattening` its `input` arrays


In [None]:
a = np.arange(1, 13).reshape(4, 3)

b = np.arange(11, 23).reshape(3, 4)

print(np.vdot(a, b))

   - to obtain the same with $\texttt{numpy.dot}$
   - you must `flatten` the matrices

In [None]:
np.dot(a.flatten(), b.flatten())

$\texttt{numpy.dot}$ returns a $\texttt{numpy.ndarray}$ (a scalar for two 1-D vectors)

see also $\texttt{numpy.tensordot}$, $\texttt{numpy.inner}$, $\texttt{numpy.outer}$, ...

### b) matrix multiplication with $\texttt{numpy.matmul}$

In [None]:
np.matmul(a, b)

In [None]:
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])
a, b

$\texttt{numpy.matmul}$ returns a $\texttt{numpy.ndarray}$

   - in this example $\texttt{numpy.matmul}$ and $\texttt{numpy.dot}$ return the `same` result

In [None]:
np.matmul(a, b) == np.dot(a, b)

   - but they do not always give the same result (e.g. for arrays with more than $2$ dimensions, or with scalars)
   - the functions are `different`: see the help
   - or see https://stackoverflow.com/questions/34142485/difference-between-numpy-dot-and-python-3-5-matrix-multiplication
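
The difference shows up with `stacks` of matrices: $\texttt{numpy.matmul}$ (also available as the $\texttt{@}$ operator) multiplies the matrices pairwise, whereas $\texttt{numpy.dot}$ sums over the last axis of the first array and the second-to-last axis of the second one, combining every matrix with every other one. A sketch comparing the shapes ($\texttt{numpy.matmul}$ also refuses scalar operands, which $\texttt{numpy.dot}$ accepts):

```python
import numpy as np

A = np.ones((2, 3, 4))   # a stack of two 3x4 matrices
B = np.ones((2, 4, 5))   # a stack of two 4x5 matrices

print(np.matmul(A, B).shape)   # (2, 3, 5): one product per pair of matrices
print((A @ B).shape)           # (2, 3, 5): `@` is numpy.matmul
print(np.dot(A, B).shape)      # (2, 3, 2, 5): every matrix of A with every matrix of B
```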

### matrix transposition

In [None]:
a.T

In [None]:
np.transpose(a)

### other mathematical functions

| function        |   purpose |
|-----------------|--------|
| $\texttt{numpy.linalg.det}$ | determinant |
| $\texttt{numpy.linalg.inv}$ | inverse |
| $\texttt{numpy.linalg.eig}$ | eigenvalues and eigenvectors |
| $\texttt{numpy.linalg.solve}$ | solving a linear system |
| $\texttt{numpy.eye}$       | identity matrix |
| $\texttt{numpy.diag}$      | extract a diagonal, or build a diagonal matrix |
| `...`           | ...|


### norm

In [None]:
a = np.arange(1, 16).reshape(3, 5)
a

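   - `2-norm` of the flattened array (for a matrix, this is the Frobenius norm, the default of $\texttt{numpy.linalg.norm}$)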
   - $\displaystyle \left\|{\boldsymbol {x}}\right\|_{2}={\sqrt {x_{1}^{2}+\cdots +x_{n}^{2}}}$ 

In [None]:
np.linalg.norm(a)

In [None]:
np.sqrt(np.sum(np.power(a.ravel(), 2)))

   - `2-norm` along `axis` $0$ (one norm per `column`)

In [None]:
np.linalg.norm(a, axis=0)

In [None]:
np.sqrt(np.sum(np.power(a, 2), axis=0))

   - `2-norm` along `axis` $1$ (one norm per `row`)

In [None]:
np.linalg.norm(a, axis=1)

In [None]:
np.sqrt(np.sum(np.power(a, 2), axis=1))
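
Other norms are selected with the $\texttt{ord}$ parameter of $\texttt{numpy.linalg.norm}$. A small sketch on a vector whose norms are easy to compute by hand:

```python
import numpy as np

v = np.array([3.0, -4.0])

print(np.linalg.norm(v))              # 5.0  (2-norm, the default)
print(np.linalg.norm(v, ord=1))       # 7.0  (sum of the absolute values)
print(np.linalg.norm(v, ord=np.inf))  # 4.0  (largest absolute value)
```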

### determinant

In [None]:
b = np.random.random(size=(3, 3))
b

In [None]:
np.linalg.det(b)

### diagonal

   - return the diagonal

In [None]:
np.diag(b) 

   - create a diagonal matrix

In [None]:
np.diag([1, 2, 3])

### trace

   - the sum of the elements on the `diagonal` (the array does not need to be square)

In [None]:
b = np.arange(12).reshape(3, 4)
b

In [None]:
np.trace(b)

### inversion

In [None]:
b = np.random.random(size=(3, 3))
b

   - $b^{-1}b=I$
   - (`almost equal` for computer-numbers) 

In [None]:
np.isclose(  np.dot(np.linalg.inv(b), b),   np.identity(b.shape[0]) )

### eigenvalues

   - $f(v) = \lambda v$ 
   - $M v = \lambda v$

In [None]:
M = np.random.random(size=(3, 3))
l, v = np.linalg.eig(M)  # eigenvalues, eigenvectors (eigenvector i is the column v[:, i])
l, v

   - $M v = \lambda v$

In [None]:
np.isclose(   np.dot(M, v[:, 0]),  l[0]* v[:, 0] )

   - $(M - \lambda I) v = 0$

In [None]:
np.isclose(    np.dot(M - l[0]*np.identity(3), v[:, 0]),   np.zeros(3))
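
All the eigenpairs can be checked at once: with the eigenvectors in the columns of $v$, the relation reads $M v = v \, \mathrm{diag}(\lambda)$, and $\texttt{v * l}$ scales column $i$ of $v$ by $\lambda_i$ through broadcasting. A sketch on a small symmetric matrix (eigenvalues $1$ and $3$, returned in no guaranteed order):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
l, v = np.linalg.eig(M)

# column i of v is the eigenvector for l[i]: M v = v diag(l)
print(np.allclose(M @ v, v * l))             # True
print(np.allclose(M @ v, v @ np.diag(l)))    # True, same check
print(np.sort(l))                            # close to [1. 3.]
```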

### solve

$A x = b$

In [None]:
A = np.random.random(size=(3, 3))
b = [1, 2, 3]

In [None]:
x = np.linalg.solve(A, b) # A x = b
x

In [None]:
np.isclose( np.dot(A, x), b)
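
$x$ could also be computed as $A^{-1} b$, but calling $\texttt{numpy.linalg.solve}$ is generally recommended: it solves the system directly (through a factorization of $A$) instead of computing the full inverse. A sketch on a system with a known solution $x = (2, 3)$:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_solve = np.linalg.solve(A, b)       # solves A x = b directly
x_inv = np.linalg.inv(A) @ b          # explicit inverse: more work for the same answer

print(x_solve)                        # close to [2. 3.]
print(np.allclose(x_solve, x_inv))    # True
```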

## xxx) structured arrays

   - we have seen `homogeneous` arrays
   - i.e. `all the elements` have the `same type` (int, float, bool, str, etc.)

   - we can also define `structured types` (also called `composed-type`, `user-defined` type, ...)
   - (much like a `struct` in C)

   - the data is a list of `homogeneous` elements
   - but each element is `structured`

### xxx) without giving the `dtype`, the array will still be `homogeneous` and `non-structured`

In [None]:
names = np.array([('Carmen', 'Thompson', 27),
                  ('Billie', 'Chambers', 25),
                  ('Kris', 'Moore', 29) ])
names

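   - $\texttt{numpy}$ infers a `single data type` for the whole $3 \times 3$ array
   - here `<U8`, i.e. a `Unicode string` of at most $8$ characters (the length of the longest value, `'Thompson'`)

   - i.e. each `element` (even the age) is stored as a `string` of at most $8$ characters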

In [None]:
names[0].dtype

   - we can `lose` information
   - if we `modify` an element, it can be `truncated` if needed

In [None]:
names[0][0] = 'Jean-Sebastien' # it is more than 8 characters long

In [None]:
names[0][0] # truncated to 8 characters: 'Jean-Seb'

## specifying the `dtype`

   - if you want the `components` of your objects to be `heterogeneous`
   - you must `specify` the $\texttt{dtype}$

for example:
   - $\texttt{first name}$ is a `Unicode string` of at most $10$ characters (`<U10`)
   - $\texttt{last name}$ is a `Unicode string` of at most $8$ characters (`<U8`)
   - $\texttt{age}$ is a `1 byte (8 bits) long integer` ($\texttt{numpy.int8}$)

In [None]:
names = np.array([('Carmen', 'Thompson', 27),
                  ('Billie', 'Chambers', 25),
                  ('Kris', 'Moore', 29),
                  ('Anna', 'Young', 24) ],
                 dtype = [('first name', '<U10'), ('last name', '<U8'), ('age', np.int8)])
names

   - $\texttt{names}$ is a `flat array` of $4$ elements

In [None]:
names.shape

   - you can access `elements`

In [None]:
names[0]['first name'], names[0]['last name'], names[0]['age']
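
A `field` can also be accessed for the whole array at once, which gives a regular `homogeneous` array that can be used in computations or in masks. A sketch on the same data (renamed $\texttt{people}$ here so that it is self-contained):

```python
import numpy as np

people = np.array([('Carmen', 'Thompson', 27),
                   ('Billie', 'Chambers', 25),
                   ('Kris', 'Moore', 29),
                   ('Anna', 'Young', 24)],
                  dtype=[('first name', '<U10'), ('last name', '<U8'), ('age', np.int8)])

ages = people['age']                       # a whole field: an int8 array
print(ages.mean())                         # 26.25
print(people[ages > 25]['first name'])     # ['Carmen' 'Kris']
```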

   - you can modify elements

In [None]:
names[0][0] = 'Maria-Carmelita'

   - the $\texttt{first name}$ is `too long` for the `defined type`
   - it will be `truncated` to `fit` (here to $10$ characters: `'Maria-Carm'`)

In [None]:
names[0][0] 

   - you can `reshape` the array

In [None]:
names = names.reshape((2, 2))
names

In [None]:
names[:, 1] # all rows column 1

   - there are `shortcuts` to create `structured` arrays

In [None]:
np.array?

In [None]:
names = np.array([('Carmen', 'Thompson', 27),
                  ('Billie', 'Chambers', 25),
                  ('Kris', 'Moore', 29),
                  ('Anna', 'Young', 24) ],
                 dtype = {'names': ['first name', 'last name', 'age'],
                          'formats': ['U10', 'U8', 'i1']})   # 'i1' is a 1-byte integer, like np.int8
names