![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "boxed ints". In contrast, NumPy stores values as primitive numeric types (fixed-width floats and ints), which makes both storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [1]:
import sys
import numpy as np

## Basic NumPy Arrays

In [2]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [43]:
a = np.array([1, 2, 3, 4]) # array

In [44]:
b = np.array([0, .5, 1, 1.5, 2]) # array

In [45]:
c = [9, 8, 7, 6]  # list

In [46]:
c

[9, 8, 7, 6]

### Difference between appending to a _**list**_ and to an _**np.array**_

In [47]:
c.append(5) # list append
c

[9, 8, 7, 6, 5]

In [48]:
a = np.append(a,5) # np.append
a

array([1, 2, 3, 4, 5])
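The two calls behave differently: `list.append` modifies the list in place and returns `None`, while `np.append` leaves its input untouched and returns a **new** array (NumPy arrays have a fixed size, so the data is copied). That is why the cell above reassigns `a = np.append(a, 5)`. A minimal sketch:

```python
import numpy as np

lst = [1, 2, 3]
result = lst.append(4)       # in place: lst is modified, the method returns None
print(lst, result)           # [1, 2, 3, 4] None

arr = np.array([1, 2, 3])
new_arr = np.append(arr, 4)  # returns a copy with the extra element
print(arr)                   # [1 2 3]  (original unchanged)
print(new_arr)               # [1 2 3 4]
```

Because every `np.append` call copies the whole array, appending inside a loop gets slow; a common pattern is to collect values in a list and convert once with `np.array(...)` at the end.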

### Indexing and slicing practice

In [30]:
c[0]

9

In [55]:
c[::-1]

[5, 6, 7, 8, 9]

In [5]:
a[0], a[1]

(1, 2)

In [6]:
a[0:]

array([1, 2, 3, 4, 5])

In [7]:
a[1:3]

array([2, 3])

In [8]:
a[1:-1]

array([2, 3, 4])

In [42]:
a[::-1]  # ::n takes every nth element; ::-n takes every nth element in reverse order

array([5, 4, 3, 2, 1])

In [22]:
a[::1]

array([1, 2, 3, 4, 5])

In [11]:
a[-1]

5

In [12]:
a[-2]

4

In [26]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [27]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [28]:
b[[0, 2, -1]]

array([0., 1., 2.])
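One subtlety worth knowing: basic slicing such as `b[1:3]` returns a *view* that shares memory with the original array, while indexing with a list of positions (as in `b[[0, 2, -1]]`, called fancy indexing) returns an independent *copy*. A small sketch:

```python
import numpy as np

b = np.array([0, .5, 1, 1.5, 2])

view = b[1:3]          # basic slice -> view on the same data
view[0] = 100          # writes through: b[1] is now 100.0

copy = b[[0, 2, -1]]   # fancy (list) indexing -> independent copy
copy[0] = -1           # b[0] is still 0.0

print(b)
print(np.shares_memory(b, view), np.shares_memory(b, copy))  # True False
```

If you need an independent slice, call `.copy()` on it explicitly.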

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [49]:
a

array([1, 2, 3, 4, 5])

In [50]:
a.dtype

dtype('int64')

In [51]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [52]:
b.dtype

dtype('float64')

In [53]:
c

[9, 8, 7, 6, 5]

In [56]:
# c.dtype would raise an AttributeError, because c is a plain Python list.
# To see what type of data c holds, convert it to an array first:
# c_array = np.array(c)
# c_array.dtype

In [69]:
d = np.array([1, 2, 3, 4], dtype=np.float64)  # np.float_ was removed in NumPy 2.0
d


array([1., 2., 3., 4.])

In [64]:
e = np.array([1, 2, 3, 4], dtype=np.int8)
e.dtype

dtype('int8')

In [70]:
f = np.array(['a', 'b', 'c'])
f

array(['a', 'b', 'c'], dtype='<U1')

In [66]:
f.dtype

dtype('<U1')

In [71]:
g = np.array([{'a': 1}, sys])
g

array([{'a': 1}, <module 'sys' (built-in)>], dtype=object)

In [68]:
g.dtype

dtype('O')
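The dtype also decides how values are converted and how arithmetic behaves. A short sketch: `astype` returns a converted copy (float to int truncates toward zero), and fixed-width integer types such as `int8` (range -128 to 127) silently wrap around when array arithmetic overflows.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])
ints = floats.astype(np.int64)   # new array; the fractional part is dropped (toward zero)
print(ints)                      # [ 1 -1  2]

small = np.array([127], dtype=np.int8)
print(small + 1)                 # [-128]: int8 overflow wraps around, no error raised
```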

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [74]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
A

array([[1, 2, 3],
       [4, 5, 6]])

In [73]:
A.shape

(2, 3)

In [75]:
A.ndim

2

In [76]:
A.size

6

In [82]:
B = np.array([
    [[
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]]
])

In [83]:
B

array([[[[12, 11, 10],
         [ 9,  8,  7]],

        [[ 6,  5,  4],
         [ 3,  2,  1]]]])

In [84]:
B.shape

(1, 2, 2, 3)

In [85]:
B.ndim

4

In [80]:
B.size

12
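These three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A quick check, rebuilding the same `B` as above:

```python
import math
import numpy as np

B = np.array([[[[12, 11, 10], [9, 8, 7]],
               [[6, 5, 4], [3, 2, 1]]]])

print(B.ndim == len(B.shape))        # True
print(B.size == math.prod(B.shape))  # True: 1 * 2 * 2 * 3 == 12
```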

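If the shape isn't consistent ("ragged" nested lists), NumPy can't build a regular multi-dimensional array. Older NumPy versions silently fell back to an array of regular Python objects; since NumPy 1.24 this raises a `ValueError` unless you request `dtype=object` explicitly:

In [89]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)  # dtype=object is required for ragged input in NumPy >= 1.24

In [90]:
C.dtype

dtype('O')

In [91]:
C.shape

(2,)

In [93]:
C.size

2

In [92]:
type(C[0])

list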

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [94]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [95]:
A[1]

array([4, 5, 6])

In [96]:
A[1][0]

4

In [97]:
# General form: A[d1, d2, d3, ...], one index (or slice) per dimension

In [98]:
A[1, 0]

4

In [99]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [100]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [101]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [102]:
A[:2, 2:]

array([[3],
       [6]])

In [103]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [104]:
A[1] = np.array([10, 10, 10])

In [105]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [106]:
A[2] = 99

In [107]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
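Assignment works along any axis: a slice such as `A[:, 0]` selects a whole column, and a scalar or a matching 1-D array is broadcast into it. A small sketch on a fresh copy of the matrix:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

A[:, 0] = 0             # scalar broadcast down the first column
A[:, 2] = [30, 60, 90]  # 1-D array of length 3 fills the last column
print(A)
# [[ 0  2 30]
#  [ 0  5 60]
#  [ 0  8 90]]
```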

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [108]:
a = np.array([1, 2, 3, 4])

In [109]:
a.sum()

10

In [110]:
a.mean()

2.5

In [111]:
a.std()

1.118033988749895

In [112]:
a.var()

1.25

In [113]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [114]:
A.sum()

45

In [115]:
A.mean()

5.0

In [116]:
A.std()

2.581988897471611

In [117]:
A.sum(axis=0)

array([12, 15, 18])

In [118]:
A.sum(axis=1)

array([ 6, 15, 24])

In [119]:
A.mean(axis=0)

array([4., 5., 6.])

In [120]:
A.mean(axis=1)

array([2., 5., 8.])

In [121]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [122]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...
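Note that `std` and `var` compute the *population* statistics by default (dividing by `n`). For the sample statistics (dividing by `n - 1`), pass `ddof=1`. A minimal sketch with the same `a` as above:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(a.var())        # 1.25: sum of squared deviations (5.0) / n (4)
print(a.var(ddof=1))  # 5.0 / (n - 1), about 1.667
print(a.std(ddof=1))  # square root of the sample variance
```

The same `ddof` argument can be combined with `axis`, e.g. `A.std(axis=0, ddof=1)`.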

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [125]:
a = np.arange(4)

In [127]:
a

array([0, 1, 2, 3])

In [128]:
a + 10

array([10, 11, 12, 13])

In [129]:
a * 10

array([ 0, 10, 20, 30])

In [130]:
a

array([0, 1, 2, 3])

In [131]:
a += 100

In [132]:
a

array([100, 101, 102, 103])

In [133]:
l = [0, 1, 2, 3]

In [134]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [135]:
a = np.arange(4)

In [136]:
a

array([0, 1, 2, 3])

In [137]:
b = np.array([10, 10, 10, 10])

In [138]:
a + b

array([10, 11, 12, 13])

In [139]:
a * b

array([ 0, 10, 20, 30])
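Broadcasting also works across dimensions: NumPy compares shapes starting from the trailing dimension and stretches dimensions of size 1 (or missing ones) to match. A sketch adding a row vector and a column vector to a 3×3 matrix:

```python
import numpy as np

M = np.zeros((3, 3), dtype=int)
row = np.array([1, 2, 3])                    # shape (3,): treated as (1, 3), repeated down the rows
col = np.array([10, 20, 30])[:, np.newaxis]  # shape (3, 1): repeated across the columns

print(M + row)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
print(M + col)
# [[10 10 10]
#  [20 20 20]
#  [30 30 30]]
print(row + col)  # (3,) with (3, 1) broadcasts to a (3, 3) result
```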

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [140]:
a = np.arange(4)

In [141]:
a

array([0, 1, 2, 3])

In [142]:
a[[0, -1]]

array([0, 3])

In [146]:
a[[True, False, True, False]]

array([0, 2])

In [144]:
a >= 2

array([False, False,  True,  True])

In [147]:
a[a >= 2]

array([2, 3])

In [148]:
a.mean()

1.5

In [149]:
a[a > a.mean()]

array([2, 3])

In [150]:
a[~(a > a.mean())]

array([0, 1])

In [151]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [152]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [153]:
A = np.random.randint(100, size=(3, 3))

In [154]:
A

array([[52, 81, 14],
       [52, 66,  0],
       [94, 20, 63]])

In [155]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([52, 14, 66, 94, 63])

In [156]:
A > 30

array([[ True,  True, False],
       [ True,  True, False],
       [ True, False,  True]])

In [157]:
A[A > 30]

array([52, 81, 52, 66, 94, 63])
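Because `True` counts as 1 and `False` as 0, a mask can also be summed or averaged, and `np.where` builds a new array by choosing between two values element by element. A small sketch with a fixed array (the random `A` above changes on every run):

```python
import numpy as np

a = np.arange(4)  # [0 1 2 3]
mask = a >= 2

print(mask.sum())             # 2: how many elements satisfy the condition
print(mask.mean())            # 0.5: fraction of elements that satisfy it
print(np.where(mask, a, -1))  # [-1 -1  2  3]: keep a where True, else -1
```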

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [158]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [159]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [160]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [161]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [162]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [163]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [164]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
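Matrix multiplication requires the inner dimensions to match: `(3, 3) @ (3, 2)` gives a `(3, 2)` result, which is why `B.T @ A` (shapes `(2, 3) @ (3, 3)`) works while `B @ A` (`(3, 2) @ (3, 3)`) does not. A sketch, building `A` with `np.arange(1, 10).reshape(3, 3)` (same values as above):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
B = np.array([[6, 5], [4, 3], [2, 1]])

print((A @ B).shape)    # (3, 2)
print((B.T @ A).shape)  # (2, 3)

try:
    B @ A               # inner dimensions 2 and 3 don't match
except ValueError as exc:
    print("ValueError:", exc)
```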

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [165]:
# An integer in Python takes more than 24 bytes
sys.getsizeof(1)

28

In [166]:
# Large integers take even more space (Python 3 ints grow as needed; there is no separate long type)
sys.getsizeof(10**100)

72

In [167]:
# In NumPy, each element has a fixed, much smaller size
np.dtype(int).itemsize

8

In [168]:
np.dtype(float).itemsize

8

### Lists are even larger

In [169]:
# A one-element list (this counts only the list object, not the int it points to)
sys.getsizeof([1])

64

In [170]:
# An array of one element in numpy
np.array([1]).nbytes

8
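Keep in mind that `sys.getsizeof` on a list measures only the list object itself (essentially a header plus an array of pointers), not the integer objects those pointers refer to. A rough sketch that adds both up for 1,000 ints; exact byte counts vary by Python version and platform, and since CPython reuses small integer objects, the per-int sum slightly overcounts:

```python
import sys
import numpy as np

values = list(range(1000))

list_only = sys.getsizeof(values)                   # list header + 1000 pointers
ints_total = sum(sys.getsizeof(v) for v in values)  # each boxed int object
arr = np.arange(1000, dtype=np.int64)

print("list object:       ", list_only)
print("list + int objects:", list_only + ints_total)
print("numpy data buffer: ", arr.nbytes)             # 1000 * 8 = 8000 bytes
```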

### And performance is also important

In [171]:
l = list(range(1000))

In [172]:
a = np.arange(1000)

In [173]:
%time np.sum(a ** 2)

CPU times: user 211 µs, sys: 0 ns, total: 211 µs
Wall time: 219 µs


332833500

In [174]:
%time sum([x ** 2 for x in l])

CPU times: user 519 µs, sys: 0 ns, total: 519 µs
Wall time: 525 µs


332833500
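`%time` is an IPython/Jupyter magic, so it won't run in a plain Python script. A minimal equivalent with the standard-library `timeit` module is sketched below; a single `%time` run is noisy, so repeating each expression many times gives a fairer comparison (absolute timings depend on the machine):

```python
import timeit
import numpy as np

l = list(range(1000))
a = np.arange(1000)

# Both expressions compute the same sum of squares
assert np.sum(a ** 2) == sum(x ** 2 for x in l)

numpy_t = timeit.timeit(lambda: np.sum(a ** 2), number=1000)
python_t = timeit.timeit(lambda: sum([x ** 2 for x in l]), number=1000)

print(f"NumPy:       {numpy_t:.4f} s for 1000 runs")
print(f"pure Python: {python_t:.4f} s for 1000 runs")
```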

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful NumPy functions

### `random`

In [175]:
np.random.random(size=2)

array([0.72242666, 0.35956088])

In [176]:
np.random.normal(size=2)

array([-0.42838329,  2.18343505])

In [177]:
np.random.rand(2, 4)

array([[0.41108043, 0.11476368, 0.32726027, 0.6031709 ],
       [0.25537706, 0.24082583, 0.52271313, 0.18091543]])
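The results above change on every run. For reproducible numbers, NumPy 1.17 and later also offer a seeded `Generator` via `np.random.default_rng`, which the NumPy documentation recommends for new code. A sketch; the seed value `42` is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)        # seeded generator: same numbers on each run
print(rng.random(size=2))              # uniform floats in [0, 1)
print(rng.normal(size=2))              # standard normal samples
print(rng.integers(100, size=(3, 3)))  # ints in [0, 100), similar to np.random.randint

# Re-creating the generator with the same seed repeats the sequence
x1 = np.random.default_rng(42).random(3)
x2 = np.random.default_rng(42).random(3)
print(np.array_equal(x1, x2))          # True
```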

---
### `arange`

In [178]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [183]:
np.arange(4, 10, 0.5)

array([4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [180]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
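With a floating-point step, rounding error can make the number of elements `arange` produces hard to predict, which is why the NumPy documentation suggests `linspace` for non-integer steps. A sketch producing the same ten values both ways:

```python
import numpy as np

by_step = np.arange(0, 1, 0.1)      # step size given, stop value excluded
by_count = np.linspace(0, 0.9, 10)  # number of points given, stop value included

print(len(by_step), len(by_count))     # 10 10
print(np.allclose(by_step, by_count))  # True (equal up to floating-point rounding)
```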

---
### `reshape`

In [184]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [213]:
np.arange(10).reshape(-1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [185]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
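In `reshape`, one dimension may be given as `-1` and NumPy infers it from the total size; in every case the product of the new shape must equal the number of elements. A sketch:

```python
import numpy as np

x = np.arange(10)

print(x.reshape(2, -1).shape)  # (2, 5): -1 is inferred as 10 // 2
print(x.reshape(-1, 2).shape)  # (5, 2)

try:
    x.reshape(3, -1)           # 10 elements can't be split into 3 equal rows
except ValueError as exc:
    print("ValueError:", exc)
```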

---
### `linspace`

In [186]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [187]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [188]:
np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
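`linspace` takes the number of points rather than a step; passing `retstep=True` also returns the spacing it used. With `endpoint=True` (the default) the step is `(stop - start) / (num - 1)`, otherwise `(stop - start) / num`:

```python
import numpy as np

values5, step5 = np.linspace(0, 1, 5, retstep=True)
print(step5)   # 0.25 = (1 - 0) / (5 - 1)

values20, step20 = np.linspace(0, 1, 20, endpoint=False, retstep=True)
print(step20)  # 0.05 = (1 - 0) / 20
```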

---
### `zeros`, `ones`, `empty`

In [189]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [190]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [197]:
np.zeros((3, 3), dtype=np.int8)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=int8)

In [193]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [194]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [195]:
np.empty(5)  # uninitialized: contents are whatever was already in memory

array([1., 1., 1., 1., 1.])

In [196]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])
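Two related helpers: `np.full` fills a new array with any constant, and the `*_like` functions (`np.zeros_like`, `np.ones_like`, `np.full_like`) copy the shape and dtype of an existing array. Unlike `np.empty`, all of these initialize every element. A sketch:

```python
import numpy as np

print(np.full((2, 3), 7))
# [[7 7 7]
#  [7 7 7]]

template = np.arange(6, dtype=np.int8).reshape(2, 3)
z = np.zeros_like(template)  # same shape (2, 3) and dtype int8 as template
print(z.shape, z.dtype)      # (2, 3) int8
```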

---
### `identity` and `eye`

In [198]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [199]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [200]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [201]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [202]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [208]:
np.eye(8, 4, k=-4)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
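The identity matrix leaves any matrix unchanged under multiplication, and the related `np.diag` builds a square matrix with a given main diagonal (or extracts the diagonal from a 2-D array). A quick sketch:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

print(np.array_equal(np.identity(3) @ A, A))  # True: I @ A == A
print(np.diag([1, 2, 3]))
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]
print(np.diag(A))                             # [1 5 9]: main diagonal of A
```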

In [203]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)