
# NUMPY

Unlike R or MATLAB, Python relies on libraries for numerical computing.

- No built-in array type designed for numeric computation
- However, packages like `numpy` are _quasi-standard_

## Basic array type

The basic type is `numpy.ndarray`, a multi-dimensional array of numbers.

In [1]:
import numpy as np # <- import a library, like include/require in other languages

v1 = np.array([3, 8, 6.0, 12, 5])
v1

array([ 3.,  8.,  6., 12.,  5.])

In [2]:
type(v1)

numpy.ndarray

Indexing and slicing work the same way as for Python lists:

In [3]:
print(v1[0])
print(v1[0:2])
print(v1[-1])
print(v1[:3])

3.0
[3. 8.]
5.0
[3. 8. 6.]
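
Slices also accept a step and negative bounds, again as for lists. A minimal sketch, reusing the same `v1` values as above:

```python
import numpy as np

v1 = np.array([3, 8, 6.0, 12, 5])

print(v1[::2])    # every second element: [3. 6. 5.]
print(v1[::-1])   # reversed:             [ 5. 12.  6.  8.  3.]
print(v1[1:-1])   # drop first and last:  [ 8.  6. 12.]
```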


In [4]:
len(v1)

5

In [5]:
v1.shape

(5,)

A numpy array can be created by:
- providing a list of elements
- calling `np.zeros` or `np.ones` for an array of zeros or ones (these contain floats by default)

In [6]:
np.array([3, 8, 6, 12])

array([ 3,  8,  6, 12])

In [7]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [8]:
np.ones(5)

array([1., 1., 1., 1., 1.])
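
The float default can be overridden with the `dtype` argument. A short sketch (the variable names are only illustrative):

```python
import numpy as np

# np.zeros and np.ones return float64 arrays unless a dtype is given
z_float = np.zeros(3)
z_int = np.zeros(3, dtype=int)          # request integers instead
o_small = np.ones(3, dtype=np.float32)  # or a smaller float type

print(z_float.dtype)  # float64
print(z_int)          # [0 0 0]
print(o_small.dtype)  # float32
```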

Matrices are created in a similar way:
- by providing a list of lists
- by calling `np.zeros` or `np.ones` with the desired shape

In [9]:
A = np.array([
    [0,1,2],
    [2,3,4],
    [4,5,6],
    [6,7,8]])
A

array([[0, 1, 2],
       [2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

In [16]:
B = np.zeros([3, 4])
print(B)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [17]:
C = np.ones([3, 4])
print(C)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [18]:
A.shape

(4, 3)

In [19]:
print(A[0,0])
print(A[0,1])
print(A[1,0])

0
1
2


## Why do we need numpy?

Couldn't we just use lists?

In [20]:
A = np.array([1,2,3])
B = [1, 2, 3]

1. numpy arrays have extra numeric methods
2. numpy arrays are more efficient
3. numpy code is more expressive


In [21]:
A

array([1, 2, 3])

In [22]:
A.mean()

2.0

In [23]:
A.std()

0.816496580927726

In [24]:
A.max()

3
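
One detail worth knowing: by default `std()` divides by the number of elements (the population standard deviation), which is why the result above is about 0.816 rather than 1. A small sketch using the `ddof` argument to get the sample version instead:

```python
import numpy as np

A = np.array([1, 2, 3])

pop_std = A.std()           # divides by N (ddof=0, the default)
sample_std = A.std(ddof=1)  # divides by N - 1

print(pop_std)     # about 0.8165
print(sample_std)  # 1.0
```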

You can also apply arithmetic operations to arrays. They work **element-wise**:

In [25]:
A

array([1, 2, 3])

In [26]:
A + 1

array([2, 3, 4])

In [27]:
A * 2

array([2, 4, 6])

Operations with two arrays also work **element-wise**:

In [28]:
B = np.array([1,1,2])
print(A)
print(B)

[1 2 3]
[1 1 2]


In [31]:
A + B

array([2, 3, 5])

In [32]:
A * B

array([1, 2, 6])
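
For comparison, here is a sketch of what the same operators do on plain lists (the names `lst`, `arr` and `doubled` are only illustrative). With lists, `+` and `*` concatenate and repeat instead of doing arithmetic:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

print(lst + lst)   # [1, 2, 3, 1, 2, 3]  (concatenation)
print(lst * 2)     # [1, 2, 3, 1, 2, 3]  (repetition)
print(arr + arr)   # [2 4 6]             (element-wise sum)
print(arr * 2)     # [2 4 6]             (element-wise product)

# The list equivalent of arr * 2 needs an explicit loop
doubled = [2 * x for x in lst]
print(doubled)     # [2, 4, 6]
```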

## Matrix/vector operations

In [33]:
A = np.array([
            [1,0,1],
            [0,2,0],
            [0,0,1]])
B = np.array([1,2,3])

print(np.dot(A, B))

[4 4 3]
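
The `@` operator gives the same matrix/vector product as `np.dot` here, while `*` stays element-wise. A short sketch with the same `A` and `B` to show the difference:

```python
import numpy as np

A = np.array([
    [1, 0, 1],
    [0, 2, 0],
    [0, 0, 1]])
B = np.array([1, 2, 3])

print(A @ B)   # matrix-vector product: [4 4 3]

# Element-wise: each row of A is multiplied by B
print(A * B)
# [[1 0 3]
#  [0 4 0]
#  [0 0 3]]
```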


## Numpy arrays can be very efficient


## Timing measurements

One simple example (using the magic command `%timeit`):

In [35]:
a = list(range(1024))
%timeit sum(a)

7.21 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [36]:
b = np.arange(1024)
%timeit b.sum()

2 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


That is not a big difference yet, but the gap grows for larger arrays and more complex operations:

In [37]:
a = list(range(1024*1024))
%timeit sum(v*v for v in a)

82.2 ms ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [38]:
b = np.arange(1024*1024)
%timeit (b**2).sum()

1.23 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Now the difference starts to matter: in this run, numpy is more than 60 times faster.
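
`%timeit` only works inside IPython or Jupyter. In a plain Python script, a similar comparison can be sketched with the standard `timeit` module. The array size and repetition count below are arbitrary choices, and the timings will vary from machine to machine:

```python
import timeit
import numpy as np

n = 100_000
a = list(range(n))
b = np.arange(n, dtype=np.int64)  # explicit 64-bit integers so the squares fit

# Total time for 20 calls of each version, in seconds
t_list = timeit.timeit(lambda: sum(v * v for v in a), number=20)
t_numpy = timeit.timeit(lambda: (b ** 2).sum(), number=20)

print(f"list:  {t_list / 20 * 1e3:.3f} ms per call")
print(f"numpy: {t_numpy / 20 * 1e3:.3f} ms per call")
```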

## Numpy arrays are *homogeneous*

- All members of an array have the same type
- Usually integer or floating point (other types, such as booleans, also exist)
- Defined **when you first create the array**

In [39]:
A = np.array([0, 1, 2]) # <- IMPLICIT TYPE
A.dtype

dtype('int64')

In [40]:
B = np.array([0.5, 1.1, 2.1])
B.dtype

dtype('float64')

In [41]:
C = np.array([0, 1, 2], dtype=np.float64) # <- EXPLICIT TYPE
C.dtype

dtype('float64')
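
Because the type is fixed at creation, assigning a value of another type converts it to the array's type. A small sketch of this behaviour, using `astype` to get a converted copy:

```python
import numpy as np

A = np.array([0, 1, 2])   # integer array
A[0] = 1.7                # the fractional part is dropped silently
print(A)                  # [1 1 2]

B = A.astype(np.float64)  # astype returns a new array with the new type
B[0] = 1.7
print(B)                  # [1.7 1.  2. ]
print(A.dtype.kind, B.dtype.kind)  # i f
```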

Besides being faster, numpy code is also more expressive.

## Numpy data types

- `np.int8`, `np.int16`, `np.int32`, `np.int64`
- `np.uint8`, `np.uint16`, `np.uint32`, `np.uint64`
- `np.float16`, `np.float32`, `np.float64` (and, on some platforms, `np.float128`)
- `np.bool_`

Note that fixed-width integers can overflow or underflow, in which case the values wrap around silently:

In [42]:
A = np.array([1,2,3], np.uint8)
A - 10

array([247, 248, 249], dtype=uint8)
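
The valid range of an integer type can be inspected with `np.iinfo`. One way to avoid the wrap-around, sketched here, is to convert to a wider signed type before doing the arithmetic:

```python
import numpy as np

A = np.array([1, 2, 3], np.uint8)

info = np.iinfo(np.uint8)
print(info.min, info.max)        # 0 255

# Converting to a signed, wider type first gives the expected result
print(A.astype(np.int16) - 10)   # [-9 -8 -7]
```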

## Reduce along axis operations

If you have a multidimensional array, you can reduce it along one of its axes:

In [43]:
A = np.array([
    [0,0,1],
    [1,2,3],
    [2,4,2],
    [1,0,1]])

In [44]:
A.max(axis=0)

array([2, 4, 3])

In [45]:
A.max(axis=1)

array([1, 3, 4, 1])

In [46]:
A.mean(axis=0)

array([1.  , 1.5 , 1.75])
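
The same `axis` argument works for other reductions such as `sum`. A short sketch with the same `A`: `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns (one result per row), and no axis reduces everything to a single number:

```python
import numpy as np

A = np.array([
    [0, 0, 1],
    [1, 2, 3],
    [2, 4, 2],
    [1, 0, 1]])

print(A.sum(axis=0))        # one value per column: [4 6 7]
print(A.sum(axis=1))        # one value per row:    [1 6 8 2]
print(A.sum())              # all elements:         17
print(A.sum(axis=0).shape)  # (3,)
```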

## Slicing

In [55]:
A = np.array([
    [0,1,2],
    [2,3,4],
    [4,5,6],
    [6,7,8]])

A.shape

(4, 3)

In [48]:
A[0]

array([0, 1, 2])

In [49]:
A[0].shape

(3,)

In [50]:
A[1]

array([2, 3, 4])

In [None]:
A[:,2]

array([2, 4, 6, 8])
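
Ranges can be combined on both axes to pick out a sub-block. A brief sketch with the same `A`:

```python
import numpy as np

A = np.array([
    [0, 1, 2],
    [2, 3, 4],
    [4, 5, 6],
    [6, 7, 8]])

print(A[1:3, 0:2])     # rows 1-2, columns 0-1
# [[2 3]
#  [4 5]]

print(A[::2])          # every second row
# [[0 1 2]
#  [4 5 6]]

print(A[:, 2].shape)   # (4,)   an integer index drops the axis
print(A[:, 2:3].shape) # (4, 1) a range keeps it
```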

## Slices share memory!

A slice is a *view* into another array:

In [56]:
A

array([[0, 1, 2],
       [2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

In [57]:
B = A[0]   # B is a view of the first row of A, not a copy
B[0] = -1  # writing through the view also changes A
A

array([[-1,  1,  2],
       [ 2,  3,  4],
       [ 4,  5,  6],
       [ 6,  7,  8]])
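
Whether two arrays share memory can be checked with `np.shares_memory`. The sketch below also shows that indexing with a list of positions, unlike basic slicing, returns a copy (the names `view` and `picked` are only illustrative):

```python
import numpy as np

A = np.array([
    [0, 1, 2],
    [2, 3, 4],
    [4, 5, 6],
    [6, 7, 8]])

view = A[0]        # basic indexing: a view
picked = A[[0]]    # indexing with a list of rows: a copy

print(np.shares_memory(A, view))    # True
print(np.shares_memory(A, picked))  # False

picked[0, 0] = -1
print(A[0, 0])     # 0, A is unchanged
```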

## Argument passing is by reference

In [58]:
def double_array(A):
    A *= 2  # in-place operation: modifies the caller's array


A = np.arange(15)
double_array(A)
print(A)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]


You need to be careful, but you can always make a copy:

In [59]:
A = np.arange(15)
B = A[0:10].copy()
double_array(B)
print(A)
print(B)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[ 0  2  4  6  8 10 12 14 16 18]
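
Another option is to write the function so that it returns a new array instead of modifying its argument. A minimal sketch, with the hypothetical name `doubled`:

```python
import numpy as np

def doubled(A):
    # A * 2 allocates a new array; the argument is left untouched
    return A * 2

A = np.arange(5)
B = doubled(A)
print(A)  # [0 1 2 3 4]
print(B)  # [0 2 4 6 8]
```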


## Logical Arrays

Arrays of booleans:

In [60]:
A = np.array([-1, 0, 1, 2, -2, 3, 4, -2])
A > 0

array([False, False,  True,  True, False,  True,  True, False])

In [61]:
(A > 0) & (A < 3)

array([False, False,  True,  True, False, False, False, False])
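
Boolean arrays can also be combined with `|` (or) and `~` (not), and counted with `sum()`. The parentheses around each comparison matter, because `&` and `|` bind more tightly than `<` and `>`. A short sketch with the same `A`:

```python
import numpy as np

A = np.array([-1, 0, 1, 2, -2, 3, 4, -2])

print((A < 0) | (A > 3))  # negative OR greater than 3
print(~(A > 0))           # NOT positive
print((A > 0).sum())      # number of positive elements: 4
print((A > 0).any())      # True
print((A > 0).all())      # False
```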

## Logical indexing

In [62]:
A[A < 0] = 0
A

array([0, 0, 1, 2, 0, 3, 4, 0])
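
The assignment above modifies `A` in place. When the original values should be kept, a boolean mask can instead be used to select elements, or `np.where` can build a new array. A sketch of both:

```python
import numpy as np

A = np.array([-1, 0, 1, 2, -2, 3, 4, -2])

positives = A[A > 0]            # select the elements where the mask is True
clipped = np.where(A < 0, 0, A) # 0 where A < 0, the original value elsewhere

print(positives)  # [1 2 3 4]
print(clipped)    # [0 0 1 2 0 3 4 0]
print(A)          # [-1  0  1  2 -2  3  4 -2]  (unchanged)
```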

## Some helper functions

In [63]:
np.zeros((4,10))

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [64]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [65]:
A = np.array([1, 2, 3, 4, 5])
B = np.zeros_like(A)
B

array([0, 0, 0, 0, 0])
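
A few more helpers in the same spirit, shown as a quick sketch: `np.ones_like`, `np.full`, `np.arange` with a step, `np.linspace` and `np.eye`.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])

print(np.ones_like(A))        # same shape and type as A: [1 1 1 1 1]
print(np.full((2, 3), 7))     # 2x3 array filled with 7
print(np.arange(0, 10, 2))    # start, stop (excluded), step: [0 2 4 6 8]
print(np.linspace(0, 1, 5))   # 5 evenly spaced points from 0 to 1, both included
print(np.eye(2))              # 2x2 identity matrix (floats)
```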