![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Numpy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, carrying all the machinery required to make an object work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes both storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [2]:
import sys
import numpy as np

## Basic Numpy Arrays

In [None]:
np.array([1, 2, 3, 4])

In [None]:
a = np.array([1, 2, 3, 4])

In [None]:
b = np.array([0, .5, 1, 1.5, 2])

In [None]:
a[0], a[1]

In [None]:
a[0:]

In [None]:
a[1:3]

In [None]:
a[1:-1]

In [None]:
a[::2]

In [None]:
b

In [None]:
b[0], b[2], b[-1]

Multi-indexing allows you to provide an array of indexes and returns a new array made of the elements at those indexes:

In [None]:
b[[0, 2, -1]]
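One detail worth knowing, shown in the minimal sketch below: a basic slice such as `b[0:3]` returns a *view* that shares memory with the original array, while multi-indexing returns an independent *copy*. `np.shares_memory` is used here only to make that visible.

```python
import numpy as np

b = np.array([0, .5, 1, 1.5, 2])

# Basic slicing returns a view: it shares its memory with b.
view = b[0:3]

# Multi-indexing (a list of indexes) returns a brand-new copy.
copy = b[[0, 1, 2]]

view[0] = 100.0  # writes through to b
copy[1] = -1.0   # does not touch b

print(b[0], b[1])                 # 100.0 0.5
print(np.shares_memory(b, view))  # True
print(np.shares_memory(b, copy))  # False
```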

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [None]:
a

In [None]:
a.dtype

In [None]:
b

In [None]:
b.dtype
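NumPy picks a single dtype for the whole array. When the input mixes kinds, it upcasts to a type that can hold every element, as this small sketch shows (the size of the default integer type is platform-dependent, typically `int64`):

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
mixed = np.array([1, 2, 3.5])  # one float is enough to make it a float array

print(ints.dtype)   # default integer type, typically int64
print(mixed.dtype)  # float64: the ints were upcast
print(mixed)        # [1.  2.  3.5]
```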

Manually create an array of floats

In [None]:
# np.float was removed in NumPy 1.24; use float or np.float64 instead
np.array([1, 2, 3, 4], dtype=np.float64)

Create an array of 8-bit integers:

In [None]:
np.array([1, 2, 3, 4], dtype=np.int8)
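The dtype fixes both the size of each element and the range of values it can hold. A short sketch using `np.iinfo` and `astype`; the last lines illustrate that arithmetic between fixed-width integer arrays wraps around on overflow instead of growing like a Python int:

```python
import numpy as np

# Range of values an 8-bit signed integer can hold
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

small = np.array([1, 2, 3, 4], dtype=np.int8)
print(small.itemsize)      # 1 byte per element

# astype() returns a converted copy; the original keeps its dtype
as_float = small.astype(np.float64)
print(as_float.dtype, small.dtype)  # float64 int8

# Array arithmetic in a fixed-width type wraps around on overflow
wrapped = np.array([127], dtype=np.int8) + np.array([1], dtype=np.int8)
print(wrapped)  # [-128]
```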

In [None]:
c = np.array(['a', 'b', 'c'])

In [None]:
c.dtype

Any other kind of data defaults to the `object` dtype:

In [None]:
d = np.array([{'a': 1}, sys])

In [None]:
d.dtype

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [None]:
A.shape

In [None]:
A.ndim

In [None]:
A.size

In [None]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [None]:
B

In [None]:
B.shape

In [None]:
B.ndim

In [None]:
B.size
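The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A quick check on both arrays:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[[12, 11, 10], [9, 8, 7]], [[6, 5, 4], [3, 2, 1]]])

for arr in (A, B):
    # ndim counts the axes; size counts the elements across all axes
    print(arr.shape, arr.ndim, arr.size)
    assert arr.ndim == len(arr.shape)
    assert arr.size == np.prod(arr.shape)
```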

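If the shape isn't consistent (a "ragged" array), NumPy can't build a regular multi-dimensional array. Since NumPy 1.24 this raises a `ValueError` unless you explicitly pass `dtype=object`, in which case you get a 1-D array whose elements are regular Python lists:

In [None]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)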

In [None]:
C.dtype

In [None]:
C.shape

In [None]:
C.size

In [None]:
type(C[0])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [None]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [None]:
A[1]

In [None]:
A[1][0]

In [None]:
# A[d1, d2, d3, d4]

A better way to drill down is NumPy's multi-dimensional indexing syntax, `A[row, column]`, which also lets you slice along each dimension:

In [None]:
A[1, 0]
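The comma syntax is more than a shortcut. Chained indexing like `A[:][0]` applies each bracket to the *result* of the previous one, so it can't select a column, while `A[:, 0]` can. A minimal sketch:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Same element, two spellings
print(A[1][0], A[1, 0])  # 4 4

# A[:] is "all rows", so indexing it with [0] just returns row 0
print(A[:][0])  # [1 2 3]

# The comma form slices rows and picks column 0 in a single step
print(A[:, 0])  # [1 4 7]
```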

In [None]:
A[0:2]

In [None]:
A[:, :2]

In [None]:
A[:2, :2]

In [None]:
A[:2, 2:]

In [None]:
A

In [None]:
A[1] = np.array([10, 10, 10])

In [None]:
A

NumPy will intelligently expand (broadcast) an assignment to fill the dimension. Here a single scalar fills the whole row:

In [None]:
A[2] = 99

In [None]:
A
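The same rule works for columns and for sequences, as long as the shapes are compatible; otherwise NumPy raises a `ValueError`. A sketch on a fresh array:

```python
import numpy as np

A = np.zeros((3, 3), dtype=int)

A[2] = 99            # a scalar fills the whole row
A[:, 0] = [1, 2, 3]  # a sequence of length 3 fills column 0

try:
    A[0] = [1, 2]    # 2 values can't fill a row of 3
except ValueError as err:
    print("ValueError:", err)

print(A)
```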

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [3]:
a = np.array([1, 2, 3, 4])

In [None]:
a.sum()

In [None]:
a.mean()

In [None]:
a.std()

In [None]:
a.var()
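By default `std` and `var` compute the *population* statistics (dividing by `n`). Passing `ddof=1` gives the sample versions (dividing by `n - 1`). For `[1, 2, 3, 4]` the squared deviations from the mean 2.5 sum to 5, so:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(a.var())        # 1.25      (5 / 4)
print(a.var(ddof=1))  # 1.666...  (5 / 3)

# std is the square root of var
print(np.isclose(a.std(), np.sqrt(a.var())))  # True
```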

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [None]:
A.sum()

In [None]:
A.mean()

In [None]:
A.std()

In [None]:
A.sum(axis=0)

In [None]:
A.sum(axis=1)

In [None]:
A.mean(axis=0)

In [None]:
A.mean(axis=1)

In [None]:
A.std(axis=0)

In [None]:
A.std(axis=1)
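One way to read `axis`: it names the axis that gets collapsed. `axis=0` runs down the rows and leaves one value per column; `axis=1` runs across the columns and leaves one value per row. The sketch below checks this against plain Python loops:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

col_sums = A.sum(axis=0)  # one value per column
row_sums = A.sum(axis=1)  # one value per row

print(col_sums)  # [12 15 18]
print(row_sums)  # [ 6 15 24]

# The same results with explicit loops
assert col_sums.tolist() == [sum(A[i, j] for i in range(3)) for j in range(3)]
assert row_sums.tolist() == [sum(A[i, j] for j in range(3)) for i in range(3)]
```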

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [22]:
# create an array with a range
a = np.arange(4) 

In [5]:
a

array([0, 1, 2, 3])

### Operations between arrays and scalars

In [6]:
a + 10

array([10, 11, 12, 13])

In [7]:
a * 10

array([ 0, 10, 20, 30])

These operations create a new array and do not mutate the original:

In [8]:
a

array([0, 1, 2, 3])

Operators like `+=` _do_ mutate the original array

In [9]:
a += 100

In [10]:
a

array([100, 101, 102, 103])
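`a += 100` modifies the existing array object, while `a = a + 100` builds a new array and rebinds the name. The difference shows up when another variable refers to the same array. A small sketch (the names `alias` and `alias_b` are just for illustration):

```python
import numpy as np

a = np.arange(4)
alias = a       # a second name for the same array object
a += 100        # in place: alias sees the change too
print(alias)    # [100 101 102 103]

b = np.arange(4)
alias_b = b
b = b + 100     # builds a new array and rebinds b; alias_b is unchanged
print(alias_b)  # [0 1 2 3]

# In-place operations keep the array's dtype, so this int array can't absorb 0.5
try:
    a += 0.5
except TypeError as err:  # NumPy raises a TypeError subclass here
    print("TypeError:", err)
```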

In [None]:
l = [0, 1, 2, 3]

In [None]:
[i * 10 for i in l]

### Operations between arrays
For element-wise operations, arrays must have the same shape (or shapes that are compatible under NumPy's broadcasting rules):

In [12]:
a = np.arange(4)

In [13]:
a

array([0, 1, 2, 3])

In [14]:
b = np.array([10, 10, 10, 10])

In [15]:
b

array([10, 10, 10, 10])

In [16]:
a + b

array([10, 11, 12, 13])

In [17]:
a * b

array([ 0, 10, 20, 30])
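"Compatible" is what broadcasting is about: dimensions are compared from the right, and a dimension of size 1 is stretched to match the other array. A sketch where a `(3, 1)` column and a `(3,)` row combine into a `(3, 3)` grid, followed by a pair of shapes that can't be reconciled:

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,)

grid = column + row                   # broadcast to shape (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes (4,) and (3,) can't be broadcast together
try:
    np.arange(4) + np.arange(3)
except ValueError as err:
    print("ValueError:", err)
```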

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [21]:
a = np.arange(4)

In [20]:
a

array([0, 1, 2, 3])

In [26]:
# select with regular python index
a[0], a[-1]

(0, 3)

In [27]:
# select with multi-index
a[[0, -1]]

array([0, 3])

In [28]:
# select with a boolean array
a[[True, False, False, True]]

array([0, 3])

In [29]:
a

array([0, 1, 2, 3])

Boolean arrays can be used to select / filter elements in an array

In [31]:
# create a boolean array 
a >= 2

array([False, False,  True,  True])

In [32]:
# use a boolean array to select elements >= 2
a[a >= 2]

array([2, 3])

In [33]:
a.mean()

1.5

In [34]:
a[a > a.mean()]

array([2, 3])

### Boolean operators
- `~` not
- `|` or
- `&` and
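Two details about these operators: each comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `>`, `==` and the other comparisons; and Python's `and`/`or` don't work element-wise, since they try to reduce a whole array to a single `True`/`False`. A sketch:

```python
import numpy as np

a = np.arange(4)

mask = (a > 0) & (a < 3)
print(mask)     # [False  True  True False]
print(a[mask])  # [1 2]

try:
    (a > 0) and (a < 3)
except ValueError as err:
    # "The truth value of an array with more than one element is ambiguous..."
    print("ValueError:", err)
```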

In [35]:
a[~(a > a.mean())]

array([0, 1])

In [36]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [37]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [3]:
A = np.random.randint(100, size=(3, 3))

In [4]:
A

array([[51, 99, 62],
       [18, 22,  5],
       [44, 54, 68]])

In [5]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([51, 62, 22, 44, 68])

In [6]:
A > 30

array([[ True,  True,  True],
       [False, False, False],
       [ True,  True,  True]])

In [7]:
A[A > 30]

array([51, 99, 62, 44, 54, 68])
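Note that `A[A > 30]` returns a flat 1-D array, because the selected elements don't form a rectangle. To keep the 2-D shape you can *assign* through the mask instead. A sketch that hard-codes the values printed above (since `np.random.randint` produces different numbers on each run):

```python
import numpy as np

A = np.array([
    [51, 99, 62],
    [18, 22,  5],
    [44, 54, 68],
])

mask = A > 30
print(mask.sum())  # 6 elements match (True counts as 1)

B = A.copy()  # copy so that A stays untouched
B[mask] = 0   # replace matching elements; the shape stays (3, 3)
print(B)
```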

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [8]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [9]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [10]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [11]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [12]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [13]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [14]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
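`@` (like `dot`) is matrix multiplication, which is different from the element-wise `*`, and it only works when the inner dimensions match: an `(n, k)` matrix times a `(k, m)` matrix gives an `(n, m)` result. A sketch with the same `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[6, 5], [4, 3], [2, 1]])

print((A @ B).shape)  # (3, 2): (3, 3) @ (3, 2)

# Element-wise product: each entry squared, not a matrix product
print(A * A)

# (3, 2) @ (3, 3): inner dimensions 2 and 3 don't match
try:
    B @ A
except ValueError as err:
    print("ValueError:", err)
```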

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [15]:
# An integer in Python takes more than 24 bytes!
sys.getsizeof(1)

28

In [16]:
# Large integers take even more space
sys.getsizeof(10**100)

72

In [17]:
# Numpy size is much smaller
np.dtype(int).itemsize

8

In [18]:
# An 8-bit integer takes a single byte
np.dtype(np.int8).itemsize

1

In [19]:
np.dtype(float).itemsize

8

### Lists are even larger

In [20]:
# A one-element list
sys.getsizeof([1])

64

In [21]:
# An array of one element in numpy
np.array([1]).nbytes

8
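The one-element comparison understates the gap: `sys.getsizeof` on a list counts only the list's array of pointers, not the int objects it points to. A rough sketch for 1,000 elements (CPython caches small ints, so the per-element total is an approximation, and exact numbers vary by Python build):

```python
import sys
import numpy as np

n = 1_000
lst = list(range(n))

# List container (pointers) plus every int object it references
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

arr64 = np.arange(n, dtype=np.int64)
arr8 = np.zeros(n, dtype=np.int8)

print(list_bytes)    # tens of kilobytes on a typical 64-bit CPython
print(arr64.nbytes)  # 8000: 1,000 elements * 8 bytes
print(arr8.nbytes)   # 1000: 1,000 elements * 1 byte
```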

### And performance is also important

Vectorized array operations are much faster than the equivalent operations on lists in plain Python.

In [22]:
l = list(range(100000))

In [23]:
a = np.arange(100000)

In [25]:
%time np.sum(a ** 2)

CPU times: user 874 µs, sys: 540 µs, total: 1.41 ms
Wall time: 967 µs


333328333350000

In [24]:
%time sum([x ** 2 for x in l])

CPU times: user 21.8 ms, sys: 5.02 ms, total: 26.8 ms
Wall time: 25.9 ms


333328333350000
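`%time` is an IPython magic, and a single run is noisy. A plain-Python sketch with the standard `timeit` module repeats each version several times. It also pins `dtype=np.int64`, assuming a platform where the default integer could be 32 bits, in which case `a ** 2` would overflow for values this large:

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)  # explicit 64-bit ints: squares reach ~1e10

# Both approaches compute the same sum of squares
numpy_result = int(np.sum(a ** 2))
python_result = sum(x ** 2 for x in l)
assert numpy_result == python_result

numpy_time = timeit.timeit(lambda: np.sum(a ** 2), number=20)
python_time = timeit.timeit(lambda: sum(x ** 2 for x in l), number=20)

print(f"NumPy:  {numpy_time:.4f} s for 20 runs")
print(f"Python: {python_time:.4f} s for 20 runs")
```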

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

### `random` 

In [26]:
np.random.random(size=2)

array([0.48613121, 0.22056969])

In [27]:
np.random.normal(size=2)

array([ 1.08118992, -0.19800074])

In [28]:
np.random.rand(2, 4)

array([[0.95441167, 0.17405151, 0.88909329, 0.7746177 ],
       [0.40621554, 0.74446572, 0.79544978, 0.94973539]])
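The calls above use NumPy's legacy global random functions, so the numbers change on every run. Newer code usually creates a `Generator` with `np.random.default_rng` (available since NumPy 1.17), and seeding it makes results reproducible. A sketch (the seed `42` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
uniform = rng.random(size=2)           # floats in [0, 1)
normal = rng.normal(size=2)            # standard normal samples
ints = rng.integers(100, size=(3, 3))  # ints in [0, 100)

# A second generator with the same seed reproduces the same stream
rng2 = np.random.default_rng(42)
print(np.array_equal(uniform, rng2.random(size=2)))  # True
```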

---
### `arange`

In [29]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [30]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [31]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
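`arange` follows the same `start, stop, step` convention as Python's `range`, with `stop` excluded. With a float step, though, rounding can make the number of elements hard to predict, and the NumPy docs suggest `linspace` for non-integer steps. A quick comparison:

```python
import numpy as np

# Integer arguments behave like range()
print(np.arange(0, 10, 2))  # [0 2 4 6 8]
print(np.arange(5, 10).tolist() == list(range(5, 10)))  # True

# Float step vs. linspace with a fixed number of points (stop excluded)
by_step = np.arange(0, 1, .1)
by_count = np.linspace(0, 1, 10, endpoint=False)
print(len(by_step), np.allclose(by_step, by_count))  # 10 True
```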

---
### `reshape`

In [32]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [33]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
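Two handy details: one dimension can be given as `-1` to let NumPy work it out, and the new shape must hold exactly the same number of elements. A sketch:

```python
import numpy as np

a = np.arange(10)

m = a.reshape(2, -1)  # -1 asks NumPy to infer this dimension (here 5)
print(m.shape)        # (2, 5)

# The total number of elements must stay the same
try:
    a.reshape(3, 4)   # 12 slots for 10 elements
except ValueError as err:
    print("ValueError:", err)

# ravel() flattens back to 1-D
print(m.ravel())      # [0 1 2 3 4 5 6 7 8 9]
```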

---
### `linspace`

In [34]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [35]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [36]:
np.linspace(0, 1, 20, False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
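The positional `False` in the last call is the `endpoint` parameter; spelling it as a keyword reads better. `retstep=True` additionally returns the spacing between samples. A sketch:

```python
import numpy as np

samples, step = np.linspace(0, 1, 20, endpoint=False, retstep=True)
print(step)          # 0.05
print(len(samples))  # 20

with_end = np.linspace(0, 1, 5)
print(with_end)      # [0.   0.25 0.5  0.75 1.  ]
```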

---
### `zeros`, `ones`, `empty`

In [37]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [38]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [39]:
np.zeros((3, 3), dtype=int)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [40]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [41]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

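`np.empty` only allocates memory and does not initialize it, so the values it shows are whatever happened to be in that memory and may differ from run to run. Don't rely on them.

In [42]: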
np.empty(5)

array([1., 1., 1., 1., 1.])

In [43]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])
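Besides these, `np.full` fills an array with any value, and the `*_like` variants (`zeros_like`, `ones_like`, `full_like`, `empty_like`) copy the shape and dtype of an existing array. A sketch:

```python
import numpy as np

sevens = np.full((2, 3), 7)  # every element set to 7
print(sevens)

template = np.array([[1, 2], [3, 4]])
z = np.zeros_like(template)  # same shape and dtype as template, all zeros
print(z.shape, z.dtype == template.dtype)  # (2, 2) True
```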

---
### `identity` and `eye`

In [44]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [45]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [46]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [47]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [48]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
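`eye(N, M, k)` puts ones on the `k`-th diagonal of an `N x M` matrix (positive `k` above the main diagonal, negative `k` below), while `identity(n)` is always square with the ones on the main diagonal. The sketch below checks this, using `np.diag` to build a square matrix from a given diagonal:

```python
import numpy as np

# identity(n) is the same as eye(n)
print(np.array_equal(np.identity(3), np.eye(3)))  # True

# k=1: ones on the diagonal just above the main one
upper = np.eye(3, k=1)
print(np.array_equal(upper, np.diag(np.ones(2), k=1)))  # True

# For non-square shapes, row i has its 1 in column i + k (when that column exists)
rect = np.eye(8, 4, k=-3)
print(np.argwhere(rect == 1).tolist())  # [[3, 0], [4, 1], [5, 2], [6, 3]]
```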

In [49]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)