# Chapter 4: NumPy Basics

## What is NumPy?

NumPy is a Python library that offers structures representing multi-dimensional arrays and matrices, as well as mathematical functions that operate on these structures.

## Why use NumPy?
* Operations performed on NumPy structures are typically much faster than their pure-Python counterparts
* Most scientific and computational Python packages use NumPy structures
* Offers a lot of useful functionality for data science applications


In [218]:
import numpy as np # conventional alias for NumPy

## NumPy vs. Python list Performance example

In [219]:
list_len = 10000

np_arr = np.arange(list_len)
py_list = list(range(list_len))

# construct new array / list with each element multiplied by 2
%time np_arr_2 = 2 * np_arr
%time py_list_2 = [i * 2 for i in py_list]


CPU times: user 54 µs, sys: 39 µs, total: 93 µs
Wall time: 98.9 µs
CPU times: user 1.12 ms, sys: 212 µs, total: 1.33 ms
Wall time: 1.37 ms
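
The `%time` magic only works inside IPython/Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used. The repeat count of 100 is an arbitrary choice for illustration, and absolute timings will vary by machine.

```python
import timeit

import numpy as np

list_len = 10000
np_arr = np.arange(list_len)
py_list = list(range(list_len))

# Time 100 runs of each approach (100 is an arbitrary choice for this sketch)
runs = 100
np_time = timeit.timeit(lambda: 2 * np_arr, number=runs)
py_time = timeit.timeit(lambda: [i * 2 for i in py_list], number=runs)

print(f"NumPy:       {np_time / runs * 1e6:.1f} µs per run")
print(f"Python list: {py_time / runs * 1e6:.1f} µs per run")

# Both approaches produce the same values
np_arr_2 = 2 * np_arr
py_list_2 = [i * 2 for i in py_list]
```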


## The ndarray
* Has a `shape` property that gives the size along each dimension (N x M, etc.)
* Has an associated element type, retrieved via the `dtype` property
* Has magic methods for `+`, `-`, `/`, etc.
* Can represent datasets, matrices, and vectors
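
To make the "magic methods" point concrete, here is a small sketch showing that the `+` operator, the `__add__` method, and the `np.add` function all give the same element-wise result. The arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# These three expressions compute the same element-wise sum
via_operator = a + b
via_magic_method = a.__add__(b)
via_function = np.add(a, b)

print(via_operator)  # [11 22 33]
```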

## ndarray constructors

### build with Python lists

In [220]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [221]:
np.array([[1, 2, 3, 4], [3, 4, 5, 6], [6, 7, 8, 9]])

array([[1, 2, 3, 4],
       [3, 4, 5, 6],
       [6, 7, 8, 9]])

### build with random values

In [222]:
np.random.randn(2, 4)

array([[-0.57146215, -0.60283063, -0.4432748 , -0.11414315],
       [ 0.46841059, -0.91048961,  0.39900582, -0.23454144]])
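
The values above change on every run. If reproducible output is wanted, one option (not used in the original notebook) is NumPy's `Generator` API with a fixed seed. The seed value 42 below is arbitrary.

```python
import numpy as np

# Assumption: a fixed seed is acceptable; 42 is an arbitrary choice
rng = np.random.default_rng(42)
sample = rng.standard_normal((2, 4))  # standard normal draws, like np.random.randn(2, 4)

# Re-creating the generator with the same seed reproduces the same values
rng_again = np.random.default_rng(42)
sample_again = rng_again.standard_normal((2, 4))

print(sample.shape)                          # (2, 4)
print(np.array_equal(sample, sample_again))  # True
```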

### build with all zeros, all ones, or empty

In [223]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [224]:
np.zeros((10, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [225]:
np.ones((10, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [226]:
np.empty((3, 2)) # may be filled with zeros or garbage values

array([[1.1 , 8.43],
       [2.  , 3.  ],
       [4.  , 5.  ]])
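
Note the trailing dots in the outputs above: `zeros` and `ones` produce `float64` arrays by default. A small sketch of overriding that with the `dtype` argument:

```python
import numpy as np

float_zeros = np.zeros((2, 3))                # default dtype is float64
int_zeros = np.zeros((2, 3), dtype=np.int32)  # request 32-bit integers instead
bool_ones = np.ones(4, dtype=bool)            # every element is True

print(float_zeros.dtype)  # float64
print(int_zeros.dtype)    # int32
print(bool_ones.dtype)    # bool
```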

### Build as a range of values

In [227]:
np.arange(10) # analogous to Python's range function, but returns an array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
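
Like `range`, `np.arange` also accepts start, stop, and step arguments. A short sketch, with illustrative values:

```python
import numpy as np

evens = np.arange(2, 10, 2)    # start=2, stop=10 (exclusive), step=2
halves = np.arange(0, 2, 0.5)  # unlike range, non-integer steps are allowed

print(evens)   # [2 4 6 8]
print(halves)  # [0.  0.5 1.  1.5]
```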

## ndarray properties, indexing, and slicing


Starting with a 3 x 5 array (like a matrix) and a 1D array (like a vector):

In [228]:
matrix = np.random.rand(3, 5)
vector = np.random.rand(10)

In [229]:
matrix

array([[0.23506168, 0.55858537, 0.66394425, 0.47495296, 0.5068266 ],
       [0.22935408, 0.28224466, 0.59781506, 0.76026335, 0.51499348],
       [0.45714113, 0.36637713, 0.78483691, 0.66506872, 0.80663467]])

In [230]:
vector

array([0.0447637 , 0.00254948, 0.22828584, 0.16413279, 0.75045281,
       0.54295852, 0.6788315 , 0.54454077, 0.32341027, 0.4885766 ])

### basic properties

In [231]:
matrix.shape

(3, 5)

In [232]:
matrix.ndim

2

In [233]:
vector.shape

(10,)

In [234]:
vector.ndim

1

In [235]:
matrix.dtype # data type of matrix

dtype('float64')

### A dtype can be cast to another type with `astype`. If the cast isn't possible, a `ValueError` is raised:

In [236]:
arr = np.array([1.0,2.0,3.0,4.0,5.0])
arr.astype(np.int8)

array([1, 2, 3, 4, 5], dtype=int8)

In [237]:
arr = np.array(["foo","bar"])
arr.astype(np.int8) # fails: 'foo' cannot be parsed as an integer

ValueError: invalid literal for int() with base 10: 'foo'
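
Two details are worth seeing in isolation: `astype` returns a new array rather than modifying the original, and casting floats to integers drops the fractional part. The sketch below also shows one way to guard a cast that may fail. The variable names are illustrative.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)  # fractional part is dropped (truncation toward zero)

print(ints)          # [ 1 -1  2]
print(floats.dtype)  # float64: the original array is unchanged

# Guarding a cast that may fail
words = np.array(["foo", "bar"])
try:
    words.astype(np.int8)
    cast_ok = True
except ValueError:
    cast_ok = False

print(cast_ok)  # False
```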

### Indexing a 1D array
Returns a single element, like indexing a Python list.

In [None]:
vector[2]

### Indexing a multi-dimensional array

In [None]:
matrix[0] # this is like the first row

In [None]:
matrix[0][0] # first element of first row

In [None]:
matrix[0, 0] # can also use this shorthand
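
Since `matrix` holds random values, the same indexing forms are easier to follow on a deterministic array. The sketch below uses a hypothetical `grid` built with `np.arange` and also shows selecting a column, which the comma form makes possible.

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)  # rows: [0..4], [5..9], [10..14]

first_row = grid[0]      # array([0, 1, 2, 3, 4])
chained = grid[1][2]     # index the row, then the element: 7
combined = grid[1, 2]    # same element in one indexing step: 7
second_col = grid[:, 1]  # every row, column 1: array([ 1,  6, 11])
```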

### Slicing arrays
You can slice arrays like Python lists, except that the slices returned are views onto the original data, not new copies, so writing to a slice modifies the original array.
To get an independent array, call `copy()` on the slice.

In [None]:
vec_slice = vector[1:3] # named vec_slice to avoid shadowing the built-in slice
vec_slice

In [None]:
vec_slice[1] = 100

In [None]:
vector[1:3]

Using copy:

In [None]:
vec_slice = vector[:4].copy()
vec_slice

In [None]:
vec_slice[2] = -5
vec_slice

In [None]:
vector[:4] # original vector is not changed
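
The view-versus-copy behaviour can be checked directly with `np.shares_memory`. A minimal sketch on a small, deterministic array (the names `data`, `view`, and `independent` are illustrative):

```python
import numpy as np

data = np.arange(5)  # array([0, 1, 2, 3, 4])

view = data[1:3]     # a view: shares memory with data
view[1] = 100        # writes through to data[2]

independent = data[1:3].copy()
independent[0] = -5  # does not affect data

print(data)                                 # [  0   1 100   3   4]
print(np.shares_memory(data, view))         # True
print(np.shares_memory(data, independent))  # False
```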

### broadcasting values onto slices

In [None]:
matrix[1:] = 5 # matrix[1:] selects the rows from index 1 onward
matrix

In [None]:
matrix[:] = -1
matrix
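
Assignment to a slice is not limited to scalars: a sequence whose length matches the slice is spread across it as well. A small sketch on a hypothetical 3 x 4 `grid`:

```python
import numpy as np

grid = np.zeros((3, 4))

grid[1:] = 5            # scalar is broadcast to rows 1 and 2
grid[0] = [1, 2, 3, 4]  # a length-4 sequence fills the 4 columns of row 0
grid[:, 0] = -1         # scalar is broadcast down column 0

print(grid)
```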

### comparison operations, boolean indexing

In [None]:
names = np.array(['mike', 'caroline', 'sue'])
names == 'mike'

In [None]:
rand_matrix = np.random.randint(-100, 100, size=(5,5))
rand_matrix < 0

### Boolean arrays can be used as indexes, but must match the shape of the array being indexed (or of its leading axes):

In [246]:
rand_matrix[rand_matrix < 0]

array([-31, -74, -81,  -9, -43, -26, -31, -96, -77, -66, -41, -35, -35,
       -63])
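
Boolean masks can also be combined with `&` and `|` (with parentheses around each comparison), used to select rows, and used on the left-hand side of an assignment. The `scores` array below is made-up data for illustration.

```python
import numpy as np

names = np.array(['mike', 'caroline', 'sue'])
# Hypothetical data: one row of scores per name
scores = np.array([[-5, 10],
                   [20, -30],
                   [40, 50]])

mike_rows = scores[names == 'mike']                      # array([[-5, 10]])
selected = scores[(names == 'mike') | (names == 'sue')]  # rows 0 and 2
in_range = scores[(scores > 0) & (scores < 45)]          # array([10, 20, 40])

clipped = scores.copy()
clipped[clipped < 0] = 0                                 # assign through a mask
```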

## Operators and common functions with np arrays

Operations between arrays of the same shape, or between an array and a scalar, are performed element-wise:

In [238]:
v1 = np.array([1,2,3])
v2 = np.array([4,3,2])

In [239]:
v1 + v2

array([5, 5, 5])

In [240]:
v1 * v2

array([4, 6, 6])

In [241]:
v1 / v2

array([0.25      , 0.66666667, 1.5       ])

In [242]:
v1 * 100

array([100, 200, 300])

In [243]:
v1 - 400

array([-399, -398, -397])

In [244]:
v1 / 1

array([1., 2., 3.])

In [245]:
v1 > v2 # returns boolean array

array([False, False,  True])
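
Beyond operators, NumPy provides functions and methods that work on whole arrays. The following is a small, non-exhaustive sketch using the same `v1` and `v2` values; the choice of functions is illustrative.

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 3, 2])

roots = np.sqrt(np.array([1, 4, 9]))  # element-wise: array([1., 2., 3.])
larger = np.maximum(v1, v2)           # element-wise maximum: array([4, 3, 3])
total = v1.sum()                      # 1 + 2 + 3 = 6
average = v2.mean()                   # (4 + 3 + 2) / 3 = 3.0
dot = v1 @ v2                         # 1*4 + 2*3 + 3*2 = 16
```

Note that `v1 * v2` multiplies element by element, while `v1 @ v2` computes the dot product.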