# NumPy Aggregations

It’s very common to want to aggregate along a row or column.  
By default, however, NumPy aggregation functions such as `np.sum` and `np.max` aggregate over the entire array. The `axis` argument restricts the aggregation to a single dimension.

In [1]:
import numpy as np
np.random.seed(0)

In [2]:
def array_info(array: np.ndarray) -> None:
    print(f"ndim: {array.ndim}")
    print(f"shape: {array.shape}")
    print(f"size: {array.size}")
    print(f"dtype: {array.dtype}")
    print(f"values:\n{array}\n")

In [3]:
x = np.array([[1, 2],
              [5, 3],
              [4, 6]])

array_info(x)

ndim: 2
shape: (3, 2)
size: 6
dtype: int32
values:
[[1 2]
 [5 3]
 [4 6]]

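![Aggregating a 2-D array along an axis](../media/np_matrix_aggregation_row.png)

*The `axis` parameter specifies which axis is collapsed: `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row).*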

In [4]:
print(np.max(x, axis=0))

[5 6]


In [5]:
print(np.max(x, axis=1))

[2 5 6]


In [6]:
print(np.sum(x))

21


In [7]:
print(np.sum(x, axis=0))

[10 11]


In [8]:
print(np.sum(x, axis=1))

[ 3  8 10]
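A quick way to check the "collapsed axis" rule is to look at result shapes. The sketch below uses the same `x` as above. It also uses `keepdims=True`, a standard argument of `np.sum`, which keeps the collapsed axis with length 1 so the result still broadcasts against `x`. The row-fraction computation is only an illustrative use case.

```python
import numpy as np

x = np.array([[1, 2],
              [5, 3],
              [4, 6]])  # shape (3, 2)

col_sums = np.sum(x, axis=0)  # collapses axis 0 (the 3 rows) -> shape (2,)
row_sums = np.sum(x, axis=1)  # collapses axis 1 (the 2 columns) -> shape (3,)
print(col_sums.shape, row_sums.shape)  # (2,) (3,)

# keepdims=True keeps the collapsed axis with length 1,
# so the result can broadcast back against x.
row_sums_2d = np.sum(x, axis=1, keepdims=True)  # shape (3, 1)
print(row_sums_2d.shape)  # (3, 1)

# Illustrative use: each element's share of its row total.
row_fractions = x / row_sums_2d
print(row_fractions.sum(axis=1))  # each row should sum to 1
```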


In [9]:
print(np.prod(x))

720


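NumPy aggregations are also much faster than Python's built-in equivalents on large arrays, because the loop runs in compiled code instead of the Python interpreter:

In [10]: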
big_array = np.random.rand(100_000)

In [11]:
%timeit sum(big_array)
%timeit np.sum(big_array)

13 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
40 µs ± 606 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [23]:
13 * 1000 / 40  # Speedup

325.0
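`%timeit` is an IPython magic, so it only works inside IPython or Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. The sketch below assumes the newer `np.random.default_rng` generator, so its array differs from `big_array` above, and the timings will depend on your machine. Both calls compute the same total. The last few digits may differ because the summation order is not the same.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # assumption: Generator API instead of np.random.seed
big_array = rng.random(100_000)

py_total = sum(big_array)     # Python-level loop over NumPy scalars
np_total = np.sum(big_array)  # one call into compiled code

runs = 10
py_time = timeit.timeit(lambda: sum(big_array), number=runs) / runs
np_time = timeit.timeit(lambda: np.sum(big_array), number=runs) / runs

print(f"sum:     {py_time * 1e3:.2f} ms per call")
print(f"np.sum:  {np_time * 1e6:.2f} µs per call")
print(f"speedup: {py_time / np_time:.0f}x")
```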

## Minimum and Maximum

In [13]:
print(min(big_array))
print(max(big_array))

3.3105544573475143e-06
0.9999779517807228


In [14]:
print(np.min(big_array))
print(np.max(big_array))

3.3105544573475143e-06
0.9999779517807228
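The built-in `min` and `max` give the same answer here only because `big_array` is one-dimensional. On a 2-D array, the built-ins iterate over the rows and compare whole rows with `>`, which produces a boolean array whose truth value is ambiguous. A small sketch using the `x` from earlier:

```python
import numpy as np

x = np.array([[1, 2],
              [5, 3],
              [4, 6]])

# np.max aggregates over the whole 2-D array by default.
print(np.max(x))  # 6

# The built-in max compares rows, not elements, and fails.
builtin_error = None
try:
    max(x)
except ValueError as err:
    builtin_error = err
    print(f"ValueError: {err}")
```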


In [15]:
%timeit min(big_array)
%timeit np.min(big_array)

7.38 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
30.3 µs ± 784 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [24]:
7.38 * 1000 / 30.3  # Speedup

243.56435643564356

### Multidimensional aggregates

In [16]:
M = np.random.random(size=(3, 3))

print(M)

[[0.53525707 0.90404425 0.50239657]
 [0.10087001 0.52758198 0.71122893]
 [0.31295428 0.05032535 0.12328206]]


In [17]:
print(M.sum())

3.767940501755314


In [18]:
print(np.sum(M))

3.767940501755314


In [19]:
print(M.min(axis=0))

[0.10087001 0.05032535 0.12328206]


In [20]:
print(M.max(axis=1))

[0.90404425 0.71122893 0.31295428]

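Method calls such as `M.min(axis=0)` are equivalent to the function forms such as `np.min(M, axis=0)`. One way to confirm what each axis means is to compare against an explicit Python loop over columns or rows. This sketch assumes a freshly generated `M` from `np.random.default_rng`, so its values differ from the ones printed above.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: new random matrix, not the M above
M = rng.random((3, 3))

# Method and function forms give the same result.
print(np.isclose(M.sum(), np.sum(M)))  # True

# axis=0 collapses the rows: one minimum per column.
col_min_loop = np.array([min(M[:, j]) for j in range(M.shape[1])])
# axis=1 collapses the columns: one maximum per row.
row_max_loop = np.array([max(M[i, :]) for i in range(M.shape[0])])

print(np.array_equal(M.min(axis=0), col_min_loop))  # True
print(np.array_equal(M.max(axis=1), row_max_loop))  # True
```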

### Other aggregation functions

#### Statistics

| Name | Description |
|-|-|
| median | Compute the median along the specified axis. |
| average | Compute the weighted average along the specified axis. |
| mean | Compute the arithmetic mean along the specified axis. |
| std | Compute the standard deviation along the specified axis. |
| var | Compute the variance along the specified axis. |

In [21]:
x = np.array([1, 2, 3, 4])

print(np.mean(x))
print(np.std(x))
print(np.var(x))

2.5
1.118033988749895
1.25
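The cell above leaves out `median` and `average` from the table. The sketch below covers both and uses arbitrary illustrative weights for `np.average`. It also shows the `ddof` argument. `np.var` and `np.std` divide by N by default (population statistics), which is why the variance above is 1.25. Passing `ddof=1` divides by N - 1 instead (sample statistics).

```python
import numpy as np

x = np.array([1, 2, 3, 4])

median = np.median(x)
print(median)  # 2.5

# np.average accepts weights; here the last element counts five times,
# giving (1 + 2 + 3 + 5 * 4) / (1 + 1 + 1 + 5) = 26 / 8.
weighted = np.average(x, weights=[1, 1, 1, 5])
print(weighted)  # 3.25

# ddof=1 switches from population to sample variance / standard deviation.
sample_var = np.var(x, ddof=1)
sample_std = np.std(x, ddof=1)
print(sample_var, sample_std)
```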


#### Sums, products, differences

| Name | Description |
|-|-|
| prod | Return the product of array elements over a given axis. |
| sum | Sum of array elements over a given axis. |
| cumprod | Return the cumulative product of elements along a given axis. |
| cumsum | Return the cumulative sum of the elements along a given axis. |
| diff | Calculate the n-th discrete difference along the given axis. |

In [26]:
x = np.array([1, 2, 3, -4])

print(np.prod(x))
print(np.prod(x, axis=0))
print(np.cumprod(x))
print(np.cumsum(x))
print(np.diff(x))

-24
-24
[  1   2   6 -24]
[1 3 6 2]
[ 1  1 -7]
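The functions in this table are closely related. The last element of `cumsum` equals `sum`, and `diff` undoes `cumsum` except for the first element. The `prepend` argument of `np.diff` restores that first element. Like the other aggregations, `cumsum` also takes an `axis` argument. Without one, it works on the flattened array. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3, -4])

running = np.cumsum(x)
print(running)                       # [1 3 6 2]
# The last running total equals the plain sum.
print(running[-1] == np.sum(x))      # True
# diff undoes cumsum, apart from the first element...
print(np.diff(running))              # [ 2  3 -4]
# ...and prepend=0 recovers the original array exactly.
print(np.diff(running, prepend=0))   # [ 1  2  3 -4]

m = np.array([[1, 2],
              [5, 3],
              [4, 6]])
print(np.cumsum(m, axis=0))  # running totals down each column
print(np.cumsum(m))          # no axis: the array is flattened first
```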
