### NUMPY TUTORIAL

In [1]:
import numpy as np

#### Lists are great!
● Powerful

● Collection of values

● Hold different types

● Change, add, remove

● Need for Analytics

● Mathematical operations over collections

● Speed

In [2]:
height = [1.81, 1.79, 1.90]

weight = [65.4, 59.2, 63.6]



In [3]:
weight / height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

### Solution is NumPy

● Numeric Python

● Alternative to Python List: NumPy Array

● Calculations over entire arrays

● Easy and Fast


In [13]:
np_height = np.array(height)

np_weight = np.array(weight)


bmi = np_weight / np_height ** 2
#Element-wise calculations
bmi

array([19.9627606 , 18.47632721, 17.61772853])

numpy is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things have changed.

First of all, numpy arrays cannot contain elements with different types. If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array. This is known as *type coercion*.

Second, the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.

In [14]:
#NumPy arrays: contain only one type
np.array([1.0, "is", True])

array(['1.0', 'is', 'True'], dtype='<U32')

In [15]:
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

In [16]:
python_list + python_list

[1, 2, 3, 1, 2, 3]

In [17]:
#Different types: different behavior!
numpy_array + numpy_array

array([2, 4, 6])

In [20]:
# Numpy Subsetting
#bmi[1]
bmi > 18
#bmi[bmi > 18]

array([ True,  True, False])
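The commented-out lines in the cell above hint at two other ways of subsetting. A minimal sketch, reusing the heights and weights from the first example, shows integer indexing next to boolean-mask indexing. The names `second_bmi`, `mask` and `above_18` are introduced here only for illustration.

```python
import numpy as np

# Same heights (m) and weights (kg) as the first example in this notebook
np_height = np.array([1.81, 1.79, 1.90])
np_weight = np.array([65.4, 59.2, 63.6])
bmi = np_weight / np_height ** 2

# Integer index: selects a single element
second_bmi = bmi[1]

# A comparison yields a boolean array with one True/False per element
mask = bmi > 18

# Indexing with the boolean array keeps only the elements where it is True
above_18 = bmi[mask]

print(second_bmi)
print(mask)
print(above_18)
```

Writing `bmi[bmi > 18]` directly does the same thing in one step; the intermediate `mask` is only there to make the two stages visible.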

In [21]:
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
#ndarray = N-dimensional array
type(np_height)

type(np_weight)


numpy.ndarray

In [23]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                    [65.4, 59.2, 63.6, 88.4, 68.7]])

np_2d

np_2d.shape  # 2 rows, 5 columns

#Single Type!
#          0      1     2      3    4
np.array([[1.73, 1.68, 1.71, 1.89, 1.79],  # 0
          [65.4, 59.2, 63.6, 88.4, "68.7"]]) # 1



array([['1.73', '1.68', '1.71', '1.89', '1.79'],
       ['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')

In [26]:
#np_2d[0]


#np_2d[0][2]

np_2d[0,2]


1.71

In [27]:
np_2d[:,1:3]

array([[ 1.68,  1.71],
       [59.2 , 63.6 ]])

In [28]:
np_2d[1,:]

array([65.4, 59.2, 63.6, 88.4, 68.7])

In [29]:
sample=np.array([[3.62, 91.78],
                 [4.25, 54.21],
                 [5.1 , 65.06],
                 [3.04, 74.11],
                 [3.67, 71.06],
                 [3.02, 83.55]])

In [34]:
np.mean(sample[:,0])

#sum(), sort(), ...
#Enforce single data type: speed!
#np.median(sample[:,0])

np.corrcoef(sample[:,0], sample[:,1])

np.std(sample[:,0])


0.7218186906850099
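A notebook cell displays only the value of its last expression, which is why the cell above shows just the standard deviation. The sketch below prints each statistic explicitly so that all of them are visible. The names `first`, `second`, `corr` and `col_means` are illustrative, not part of the original notebook.

```python
import numpy as np

sample = np.array([[3.62, 91.78],
                   [4.25, 54.21],
                   [5.1 , 65.06],
                   [3.04, 74.11],
                   [3.67, 71.06],
                   [3.02, 83.55]])

first = sample[:, 0]   # first column
second = sample[:, 1]  # second column

print("mean:  ", np.mean(first))
print("median:", np.median(first))
print("std:   ", np.std(first))      # population standard deviation (ddof=0) by default

corr = np.corrcoef(first, second)    # 2x2 correlation matrix
print("corr:\n", corr)

# axis=0 aggregates down the rows, giving one value per column
col_means = sample.mean(axis=0)
print("column means:", col_means)
```

The off-diagonal entries of `corr` hold the correlation between the two columns; the diagonal is always 1.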

In [35]:
# Generating data
height = np.round(np.random.normal(1.65, 0.20, 5), 2)  # mean, standard deviation, number of samples

In [36]:
height

array([1.57, 1.55, 1.5 , 1.15, 1.59])

In [37]:
weight=np.round(np.random.normal(65.32, 15, 5), 2)

In [38]:
weight

array([67.62, 47.81, 85.92, 62.75, 50.05])
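Because these values are random, rerunning the cells gives different numbers each time. If reproducible data is wanted, one option is a seeded generator, sketched below with `np.random.default_rng`. The seed value 42 and the `*_again` names are arbitrary choices made for this illustration.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed
height = np.round(rng.normal(1.65, 0.20, 5), 2)   # mean, sd, number of samples
weight = np.round(rng.normal(65.32, 15, 5), 2)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
height_again = np.round(rng_again.normal(1.65, 0.20, 5), 2)

print(height)
print(weight)
print(np.array_equal(height, height_again))
```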

In [39]:
newc=np.column_stack((height, weight))

In [40]:
newc

array([[ 1.57, 67.62],
       [ 1.55, 47.81],
       [ 1.5 , 85.92],
       [ 1.15, 62.75],
       [ 1.59, 50.05]])

In [41]:
newr=np.vstack((height, weight))  # np.row_stack is an alias of np.vstack, deprecated in NumPy 2.0

In [42]:
newr

array([[ 1.57,  1.55,  1.5 ,  1.15,  1.59],
       [67.62, 47.81, 85.92, 62.75, 50.05]])
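The two stacking results differ only in orientation. A small sketch using the values displayed above makes the shapes explicit. It uses `np.vstack` for the row-wise stack, which should behave like the row stacking shown in the notebook.

```python
import numpy as np

height = np.array([1.57, 1.55, 1.5, 1.15, 1.59])
weight = np.array([67.62, 47.81, 85.92, 62.75, 50.05])

# Each 1-D array becomes a column: one row per person
newc = np.column_stack((height, weight))

# Each 1-D array becomes a row: one column per person
newr = np.vstack((height, weight))

print(newc.shape)
print(newr.shape)

# One result is the transpose of the other
print(np.array_equal(newc, newr.T))
```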

In [43]:
a=[1,3,4]

In [44]:
mean(a)

NameError: name 'mean' is not defined

In [None]:
#https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

In [None]:
#https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.dtypes.html

In [45]:
np.array([True, 1, 2]) + np.array([3, 4, False])

array([4, 5, 2])

In [46]:
a=np.array([True, 1, 2])

In [47]:
b=np.array([3, 4, False])

In [48]:
b

array([3, 4, 0])

In [49]:
from statistics import mean 
mean(a)

1

In [50]:
def Average(lst): 
    return sum(lst) / len(lst) 

In [51]:
Average(a)

1.3333333333333333
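The `statistics.mean` result of `1` above differs from the hand-written `Average`, which suggests that `statistics.mean` is not a reliable choice for NumPy integer arrays (its result may also depend on the Python version). A hedged sketch of the NumPy-native alternative:

```python
import numpy as np

# True is coerced to 1, so the array holds the integers 1, 1, 2
a = np.array([True, 1, 2])
print(a.dtype)

# np.mean and the .mean() method return a floating-point average
print(np.mean(a))
print(a.mean())
```

Both calls agree with `sum(a) / len(a)` while avoiding a Python-level loop over the elements.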

### Practice

NumPy Cheat sheet for future reference: 
https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html

In [6]:
import numpy as np
np_dot = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
# multiply np_dot by 2
# add a new array which has two elements: [10,10] to np_dot

#np_dot plus np_dot

In [53]:
import numpy as np
#create a list and convert to np array
mylist = [1, 2, 3]
x = np.array(mylist)

In [54]:
y = np.array([4, 5, 6])
y

array([4, 5, 6])

In [55]:
# pass a list of lists to create a multidimensional array
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])

In [56]:
# retrieve the dimensions of m


(2, 3)

In [58]:
n = np.arange(0, 30, 2)

In [59]:
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

In [60]:
np.eye(4)  # also look at np.zeros and np.ones

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
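Following the hint in the cell above, here is a short sketch of `np.zeros` and `np.ones` next to `np.eye`. The shapes and variable names are chosen only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0
ones = np.ones((2, 3))              # 2x3 array filled with 1.0
identity = np.eye(3)                # 3x3 identity matrix

# The default dtype is float; pass dtype to get integers instead
int_ones = np.ones((2, 3), dtype=int)

print(zeros)
print(ones)
print(identity)
print(int_ones.dtype)
```

Note that `np.zeros` and `np.ones` take the shape as a tuple, while `np.eye` takes the number of rows as a plain integer.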

In [61]:
(x+y)#,(x-y),(x/y),(x*y),(x**2)

array([5, 7, 9])

In [62]:
z = np.array([3, 6, 9, 12])

print(z/4)

[0.75 1.5  2.25 3.  ]


In [63]:
y = [3, 6, 9, 12]
print (y/3)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [99]:
#z.sum() #z.max(),z.mean(),z.std(), z.argmin()

#return the index value of min in array

0

In [64]:
r = np.arange(36)
r.resize((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [67]:
# code for printing below subset


array([15, 16, 17])

In [89]:
r[r>30]

array([31, 32, 33, 34, 35])

In [90]:
r[r>30]=30
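Assigning through a boolean mask changes `r` in place. When the original array should be kept, one possible alternative is `np.minimum`, which returns a new array. The sketch below compares the two; `capped` and `clipped` are illustrative names, and `reshape` is used here instead of the in-place `resize` from the notebook.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

# In-place: work on a copy so that r itself stays untouched
capped = r.copy()
capped[capped > 30] = 30

# Out-of-place: element-wise minimum of r and 30, returned as a new array
clipped = np.minimum(r, 30)

print(capped[-1])
print(np.array_equal(capped, clipped))
print(r.max())  # r is unchanged
```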

In [7]:
test = np.random.randint(0, 10, (4,3))
test

array([[8, 0, 1],
       [9, 0, 8],
       [5, 4, 0],
       [8, 8, 6]])

In [8]:
test[2]
#iterate by row
for row in test:
    print(row)

[8 0 1]
[9 0 8]
[5 4 0]
[8 8 6]


In [9]:
#iterate by index
for i in range(len(test)):
    print(test[i])

[8 0 1]
[9 0 8]
[5 4 0]
[8 8 6]


In [10]:
#iterate by row and index
for i, row in enumerate(test):
    print('row', i, 'is', row)

row 0 is [8 0 1]
row 1 is [9 0 8]
row 2 is [5 4 0]
row 3 is [8 8 6]


In [11]:
test2 = test**2
test2

array([[64,  0,  1],
       [81,  0, 64],
       [25, 16,  0],
       [64, 64, 36]], dtype=int32)

In [13]:
for i, row in zip(test,test2):
    print('row', i, 'is', row)

row [8 0 1] is [64  0  1]
row [9 0 8] is [81  0 64]
row [5 4 0] is [25 16  0]
row [8 8 6] is [64 64 36]


In [14]:
#iterate over multiple iterables
#check out enumerate and zip in for loops
#https://docs.python.org/3.3/library/functions.html
#https://jonlabelle.com/snippets/view/markdown/python-enumerate-and-zip

for i, j in zip(test,test2):
    print(i,'+', j, '=', i+j)

[8 0 1] + [64  0  1] = [72  0  2]
[9 0 8] + [81  0 64] = [90  0 72]
[5 4 0] + [25 16  0] = [30 20  0]
[8 8 6] + [64 64 36] = [72 72 42]
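The loop above is useful for learning `zip`, but the same sums can be computed without a Python loop. A minimal sketch, using the `test` values displayed earlier, checks that the element-wise expression gives the same result as the loop. The names `looped` and `vectorized` are illustrative.

```python
import numpy as np

test = np.array([[8, 0, 1],
                 [9, 0, 8],
                 [5, 4, 0],
                 [8, 8, 6]])
test2 = test ** 2

# Row-by-row addition in a Python loop
looped = np.array([i + j for i, j in zip(test, test2)])

# The same addition as one element-wise array operation
vectorized = test + test2

print(vectorized)
print(np.array_equal(looped, vectorized))
```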


#### Broadcasting with NumPy

Source: numpy.org

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. This notebook also provides hints on when and when not to use broadcasting.

numpy operations are usually done element-by-element, which requires two arrays to have exactly the same shape:

In [16]:
from numpy import array
a = array([1.0, 2.0, 3.0])
b = array([2.0, 2.0, 2.0])
a * b


array([2., 4., 6.])

numpy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [7]:
a = array([1.0,2.0,3.0])
b = 2.0
a * b

array([2., 4., 6.])

The scalar b can be thought of as being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. The stretching analogy is only conceptual: numpy is smart enough to use the original scalar value without actually making copies, so broadcasting operations are as memory- and computationally efficient as possible. Because the second example moves less memory around during the multiplication (b is a scalar, not an array), it was measured to be about 10% faster than the first example using standard numpy on Windows 2000 with one-million-element arrays.

##### The Broadcasting Rule:
###### In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.

In [17]:
a = array([[ 0.0,  0.0,  0.0],
           [10.0, 10.0, 10.0],
           [20.0, 20.0, 20.0],
           [30.0, 30.0, 30.0]])
b = array([1.0, 2.0, 3.0])##4 added
a+b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])
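The broadcasting rule can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available, and shows what happens when the trailing axes do not match. The array `b_bad` with four elements is a hypothetical counterexample added for illustration.

```python
import numpy as np

a = np.zeros((4, 3))
b_ok = np.array([1.0, 2.0, 3.0])         # trailing axis 3 matches a's trailing axis
b_bad = np.array([1.0, 2.0, 3.0, 4.0])   # trailing axis 4 does not match 3

# Resulting shape when the rule is satisfied
print(np.broadcast_shapes(a.shape, b_ok.shape))

# When the rule is violated, NumPy raises a ValueError
try:
    a + b_bad
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```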

In [11]:
from numpy import array, newaxis  # newaxis must be imported as well
a = array([0.0, 10.0, 20.0, 30.0])
b = array([1.0, 2.0, 3.0])
a[:,newaxis] + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

Here the newaxis index operator inserts a new axis into a, making it a two-dimensional 4x1 array. Combining the 4x1 array with b, which has shape (3,), stretches both arrays to produce the desired 4x3 output array.
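To make the intermediate shapes explicit, here is a small sketch that inspects the 4x1 array before adding `b`. It also shows two equivalent spellings, `None` and `reshape(-1, 1)`. The names `col`, `col_none` and `col_reshape` are illustrative.

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

col = a[:, np.newaxis]          # shape (4, 1)
col_none = a[:, None]           # np.newaxis is an alias for None
col_reshape = a.reshape(-1, 1)  # -1 lets NumPy infer the number of rows

print(col.shape)
print(b.shape)

# (4, 1) combined with (3,) broadcasts to (4, 3)
result = col + b
print(result.shape)
print(result)
```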