<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="media/ensmp-25-alpha.png" /></span>
</div>

In [1]:
import numpy as np

# aggregating array values

## *numpy* functions can **combine** the elements of an array along a specified axis

examples of some *classic functions*:


| function | behavior|
|------|-----|
| *numpy.sum* | sums elements over an axis|
| *numpy.prod*| multiplies elements along an axis|
| *numpy.min* | returns the smallest element|
| *numpy.max* | returns the greatest element |
| *numpy.argmin* | returns the index of the smallest element|
| *numpy.argmax* | returns the index of the greatest element |
| *numpy.mean*| computes the mean of the elements|
| *numpy.std*  | computes the standard deviation|
| *numpy.var* | .../... |
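
To make the table concrete, here is a minimal sketch on a small hand-written array. The array `m` and the variable names are assumptions chosen so the results can be checked by hand; they are not taken from the notebook.

```python
import numpy as np

# small fixed array so that every result is easy to verify by hand
m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(m)          # 1 + 2 + ... + 6 = 21
product = np.prod(m)       # 1 * 2 * ... * 6 = 720
smallest = np.min(m)       # 1
greatest = np.max(m)       # 6
where_min = np.argmin(m)   # 0, position in the flattened array
where_max = np.argmax(m)   # 5, position in the flattened array
average = np.mean(m)       # 21 / 6 = 3.5
```

Without an `axis` argument, each of these functions reduces the whole array to a single value.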

## summing all the elements

we create a **matrix** with $3$ rows and $4$ columns

In [34]:
a = np.random.randint(0, 10, size=(3, 4))
a

array([[0, 8, 1, 8],
       [8, 6, 1, 2],
       [4, 5, 7, 4]])

without an axis, summing returns the global sum of all the elements

In [35]:
a.sum()

54

## summing elements along the rows axis
   - it will be the same for the other operations (product, minimum, maximum, ...)

we create a **matrix** with *3* rows and *4* columns

In [36]:
a = np.random.randint(0, 10, size=(3, 4))
a

array([[2, 9, 9, 9],
       [3, 6, 6, 1],
       [5, 1, 8, 0]])

summing along the **rows** axis (axis 0 in our example) adds the **rows** together, giving one sum per column

In [37]:
a.sum(axis=0)

array([10, 16, 23, 10])

## summing elements along the columns axis

we create a **matrix** with *3* rows and *4* columns

In [6]:
a = np.random.randint(0, 10, size=(3, 4))
a

array([[4, 3, 5, 2],
       [7, 4, 0, 9],
       [1, 9, 4, 9]])

summing along the **columns** axis (axis 1 in our example) adds the **columns** together, giving one sum per row

In [7]:
np.sum(a, axis=1)

array([14, 20, 23])
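
A useful way to remember which axis does what is that the summed axis disappears from the shape of the result. The sketch below illustrates this on a fixed array; the array `m` is an assumption, and `keepdims` is an optional parameter of `sum` that the notebook does not use.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])          # shape (3, 4)

by_rows = m.sum(axis=0)                  # axis 0 disappears: shape (4,)
by_cols = m.sum(axis=1)                  # axis 1 disappears: shape (3,)

# keepdims=True keeps the summed axis with length 1: shape (3, 1)
kept = m.sum(axis=1, keepdims=True)

print(by_rows, by_rows.shape)
print(by_cols, by_cols.shape)
print(kept.shape)
```

Keeping the reduced axis can be convenient when the result has to be combined with the original array afterwards, for instance `m / kept` divides each row by its own sum.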

## summing along groups of array (axis 0)

we create two **groups**, each one being a **matrix** with *3* rows and *4* columns

In [38]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[21,  3,  7, 33],
        [41, 24, 10, 43],
        [42, 34, 31, 40]],

       [[37, 38, 10, 49],
        [10, 11, 31, 13],
        [39, 18, 31,  5]]])

summing along **axis** $0$ is **summing** the two arrays together 

In [39]:
np.sum(a, axis=0) # we sum the arrays

array([[58, 41, 17, 82],
       [51, 35, 41, 56],
       [81, 52, 62, 45]])

In [10]:
a[0] + a[1]

array([[58, 41, 17, 82],
       [51, 35, 41, 56],
       [81, 52, 62, 45]])

## summing along axis 1 when we have several matrices

we create two matrices of size *(3 x 4)*

In [41]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[22, 38,  2, 24],
        [44,  1, 19, 33],
        [ 7, 28, 42, 48]],

       [[ 4, 27, 40, 14],
        [44,  3, 24,  9],
        [46, 11, 40, 13]]])

summing along **axis** $1$ adds the **rows** of each matrix together
   - i.e. we obtain one **row** of sums per matrix
   - these rows form a new array

In [12]:
b = np.sum(a, axis=1)
b, b.shape

(array([[ 73,  67,  63, 105],
        [ 94,  41, 104,  36]]), (2, 4))

## summing along axis 2 when we have several matrices

we create two matrices of size *(3 x 4)*

In [13]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[11, 47, 42, 17],
        [26,  0, 37,  3],
        [14, 11, 19,  4]],

       [[45, 11, 39, 15],
        [ 8, 29,  1, 28],
        [12,  8, 30, 20]]])

summing along **axis** $2$ adds the **columns** of each matrix together
   - i.e. we obtain one sum per **row** of each matrix
   - these sums form a new array

In [14]:
b = np.sum(a, axis=2)
b, b.shape

(array([[117,  66,  48],
        [110,  66,  70]]), (2, 3))
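
The three cases can be compared side by side on one reproducible array. This is only a sketch: the array built with `np.arange` and the variable names are assumptions, and passing a tuple of axes is an option of `sum` that the notebook does not use.

```python
import numpy as np

# two matrices of shape (3, 4), filled with 0..23 so results are reproducible
a = np.arange(24).reshape(2, 3, 4)

s0 = a.sum(axis=0)        # the two matrices added together: shape (3, 4)
s1 = a.sum(axis=1)        # rows of each matrix added:       shape (2, 4)
s2 = a.sum(axis=2)        # columns of each matrix added:    shape (2, 3)

# several axes at once: one total per matrix, shape (2,)
per_matrix = a.sum(axis=(1, 2))

print(s0.shape, s1.shape, s2.shape, per_matrix)
print(np.array_equal(s0, a[0] + a[1]))   # axis 0 is the same as a[0] + a[1]
```

In each case the shape of the result is the original shape `(2, 3, 4)` with the summed axis removed.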

## summing without axis when we have several matrices

we create two matrices of size *(3 x 4)*

In [51]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[31, 43, 45,  8],
        [48, 28, 30, 49],
        [22, 35, 36, 35]],

       [[46,  8, 32, 35],
        [15, 12, 26, 39],
        [22,  8, 31,  0]]])

it sums all the elements

In [52]:
np.sum(a)

684

## summing in presence of *numpy.nan* values
   - it will be the same for the other functions

**classic functions** have their **NaN-safe** counterpart:

   - *numpy.nansum*, *numpy.nanprod*, ...
   - *numpy.nanmean*, *numpy.nanstd*, *numpy.nanvar*, ...
   - *numpy.nanmin*, *numpy.nanmax*, ...
   - *numpy.nanmedian*, *numpy.nanpercentile*, ...
   
these functions ignore *numpy.nan* values (for example, *numpy.nansum* treats them as zero)  
to avoid NaN contagion

### NaN is dominant in classic operations

   - remember that **only** **float** values can be **NaN**  
   - we create two matrices *(3 x 4)* of type **float**  
   - **where** we insert some **NaN** values
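
The reason for converting to **float** first can be checked directly. The following sketch (with hypothetical names `ints` and `floats`) shows that storing NaN in an integer array fails, while a float copy accepts it.

```python
import numpy as np

ints = np.array([1, 2, 3])        # integer dtype

try:
    ints[0] = np.nan              # NaN has no integer representation
    failed = False
except ValueError:
    failed = True                 # NumPy refuses the assignment

floats = ints.astype(float)       # float copy of the same values
floats[0] = np.nan                # works: NaN is a float value

print(failed, floats)
```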

In [53]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a = a.astype(float)
a[0]

array([[26., 30., 34.,  9.],
       [44.,  2., 24.,  7.],
       [12., 18.,  6., 40.]])

In [54]:
a[0, 1, 0] = np.nan
a[0, 2, 2] = np.nan
a[1, 0, 0] = np.nan
a[1, 1, 3] = np.nan
a[1]

array([[nan, 16., 34.,  5.],
       [45., 30., 36., nan],
       [44., 20., 18., 35.]])

you can see that *numpy.nan* is **dominant** (**contagious**): a single NaN makes the whole sum NaN

In [55]:
np.sum(a) # the result is NaN

nan

## NaN-safe function (summing several matrices without axis and on axis 0)

**NaN** is treated as **zero**

In [61]:
a

array([[[26., 30., 34.,  9.],
        [nan,  2., 24.,  7.],
        [12., 18., nan, 40.]],

       [[nan, 16., 34.,  5.],
        [45., 30., 36., nan],
        [44., 20., 18., 35.]]])

In [62]:
np.nansum(a) # np.nan values are 0

485.0

on **axis=0** the two **arrays** are summed together

In [64]:
np.nansum(a, axis=0)

array([[26., 46., 68., 14.],
       [45., 32., 60.,  7.],
       [56., 38., 18., 75.]])

## NaN-safe function (summing several matrices on axis 1)

**NaN** is treated as **zero**

on **axis=1** the **rows** of each matrix are **added** together

In [63]:
np.nansum(a, axis=1)

array([[38., 50., 58., 56.],
       [89., 66., 88., 40.]])

## NaN-safe function (summing several matrices on axis 2)

**NaN** is treated as **zero**

on **axis=2** the **columns** of each matrix are **added** together


In [49]:
np.nansum(a, axis=2)

array([[ 99.,  33.,  70.],
       [ 55., 111., 117.]])
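
"Treated as zero" describes *numpy.nansum* specifically. The other NaN-safe functions neutralize NaN in the way that suits their operation, as this small sketch suggests (the vector `v` is an assumption chosen for easy checking).

```python
import numpy as np

v = np.array([1.0, np.nan, 3.0])

plain = np.sum(v)          # nan: NaN is contagious
safe_sum = np.nansum(v)    # 4.0: NaN counts as 0
safe_prod = np.nanprod(v)  # 3.0: NaN counts as 1
safe_mean = np.nanmean(v)  # 2.0: NaN is ignored, mean of 1 and 3 only
safe_max = np.nanmax(v)    # 3.0: NaN is ignored

print(plain, safe_sum, safe_prod, safe_mean, safe_max)
```

Note that the NaN-safe mean is 2.0 and not 4/3: the NaN entry is left out of the count as well as out of the sum.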

## index of min and max values
   - *numpy.argmax*, *numpy.argmin*

In [24]:
a = np.random.randint(0, 100, 15).reshape(3, 5)
a

array([[22, 63, 30, 10, 93],
       [96, 78, 91, 45, 13],
       [74, 42, 85, 57, 74]])

without an axis, the index is given in the **flattened** array

In [25]:
np.min(a) # or a.min()

10

In [26]:
np.argmin(a) # or a.argmin()

3

In [27]:
a.flatten().argmin()

3
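
A flat index is often more useful once converted back to a (row, column) position. The sketch below reuses the array shown above and relies on *numpy.unravel_index*, a function the notebook does not mention; the variable names are assumptions.

```python
import numpy as np

a = np.array([[22, 63, 30, 10, 93],
              [96, 78, 91, 45, 13],
              [74, 42, 85, 57, 74]])

flat = np.argmin(a)                          # 3, index in the flattened array
row, col = np.unravel_index(flat, a.shape)   # back to 2D coordinates: (0, 3)

# with an axis, argmin/argmax return one index per column or per row
per_col = np.argmin(a, axis=0)               # row index of the min of each column
per_row = np.argmax(a, axis=1)               # column index of the max of each row

print(row, col, a[row, col])
print(per_col, per_row)
```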

## tests on all values
   - *numpy.all* returns *True* if **all** values are *True*
   - *numpy.any* returns *True* if **any** value is *True*
   - *np.where(cond, x, y)* picks, element by element, *x* where the condition is *True* and *y* elsewhere
   - (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html)
   
   - they have no NaN-safe counterpart

### all or any values

In [29]:
a = np.random.randint(0, 100, 15).reshape(3, 5)
a

array([[16, 70, 34, 94, 38],
       [19, 75, 39, 81, 87],
       [21, 46, 31, 19, 82]])

In [30]:
# you create a mask
a <= 50

array([[ True, False,  True, False,  True],
       [ True, False,  True, False, False],
       [ True,  True,  True,  True, False]])

In [31]:
np.any(a <= 50)

True

In [32]:
np.all(a <= 100)

True

In [33]:
np.where(a<50, 2*a, 3*a) 

array([[ 32, 210,  68, 282,  76],
       [ 38, 225,  78, 243, 261],
       [ 42,  92,  62,  38, 246]])
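
Like the aggregation functions, *numpy.all* and *numpy.any* accept an `axis` argument, and *numpy.where* called with the condition alone returns the positions where it holds. A minimal sketch under these assumptions (the small array and the names are illustrative; *numpy.count_nonzero* is not used in the notebook):

```python
import numpy as np

a = np.array([[16, 70, 34],
              [19, 75, 39]])

mask = a < 50                       # [[True, False, True], [True, False, True]]

col_all = np.all(mask, axis=0)      # is every value of each column below 50?
row_any = np.any(mask, axis=1)      # is at least one value of each row below 50?

rows, cols = np.where(mask)         # coordinates of the True entries
count = np.count_nonzero(mask)      # number of True entries

print(col_all, row_any)
print(rows, cols, count)
```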