## Key numpy notes
- arrays store data of a single type, unlike lists
- creating vectors and matrices
- easy operations and summary statistics using vectorized operations

In [11]:
import numpy as np

In [12]:
np.__version__

'1.15.2'

Lists vs. Arrays:
- lists are flexible and can contain items of any type
- that flexibility means each item in the list must carry its own type info
- if all items are of the same type, there is a more efficient way to store the data: a fixed-type array
- numpy arrays are less flexible but much more efficient for storing and manipulating data
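
A rough sketch of the storage difference described above, assuming CPython and an explicit 64-bit integer dtype. The names `py_list`, `np_arr`, and `list_bytes` are only for illustration, and the exact list size varies by Python version and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Each list slot points to a full Python int object that carries its own type info.
# This estimate adds the list container to the size of every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array records the dtype once and stores 8 raw bytes per value.
print(np_arr.dtype, np_arr.itemsize, np_arr.nbytes)   # int64 8 8000
print(list_bytes > np_arr.nbytes)                     # True on CPython
```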


In [1]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [2]:
type(l[0])

int

In [4]:
l2 = [str(c) for c in l]
l2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [5]:
type(l2[0])

str

In [6]:
l3 = [True, "x", 8, 9.0]
[type(item) for item in l3]

[bool, str, int, float]

Fixed-Type Arrays
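- python's built-in array module stores values of a single type; 'i' is a type code indicating the contents are integers
- the ndarray object in numpy is more useful because it also provides efficient operations on the data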

In [10]:
import array 
l = list(range(10))
a = array.array('i', l)
a

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
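
A small sketch of what "fixed type" means in practice, assuming current CPython and numpy behaviour: the built-in array rejects a value of the wrong type, while a numpy integer array casts an assigned float to its own dtype.

```python
import array
import numpy as np

a = array.array('i', range(10))
try:
    a.append(1.5)          # a float does not fit the 'i' (int) type code
    rejected = False
except TypeError:
    rejected = True
print(rejected)            # True

b = np.array([1, 2, 3])
b[0] = 7.9                 # silently truncated to fit the integer dtype
print(b)                   # [7 2 3]
```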

### Creating arrays using numpy
- can use np.array to create arrays from python lists
- all items must be of the same type
- if they do not match, numpy will upcast if possible (e.g. int upcast to float)
- arrays can be multi-dimensional

In [14]:
a = np.array([1, 2, 3, 4, 5, 6])
a

array([1, 2, 3, 4, 5, 6])

In [15]:
a = np.array([1.0, 2, 3, 4, 5, 6])
a

array([1., 2., 3., 4., 5., 6.])

In [18]:
# e.g. nested sequences (here, ranges) for multi-dimensional arrays
a = np.array([range(i, i + 3) for i in [2, 4, 6]])
a
# inner lists are treated as rows

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
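
The dtype, ndim, and shape attributes make the upcasting and dimensionality rules visible. A minimal sketch follows; the variable names are illustrative, and the default integer width depends on the platform, so only the dtype kind is printed for the integer case.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])                     # the ints are upcast to float
forced = np.array([1, 2, 3], dtype=np.float32)    # dtype set explicitly
grid = np.array([[1, 2, 3], [4, 5, 6]])           # inner lists become rows

print(ints.dtype.kind, mixed.dtype, forced.dtype)   # i float64 float32
print(grid.ndim, grid.shape)                        # 2 (2, 3)
```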

In [21]:
# create a zero-filled array of 10 ints
a = np.zeros(10, dtype = int)
a

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [23]:
# create a 3x5 array with 1.0's
a = np.ones((3, 5), dtype = float)
a

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [24]:
# create a 3x5 array with 10's
a = np.full((3, 5), 10)
a

array([[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]])

In [27]:
# create a linear sequence
# similar to range() function
a = np.arange(0, 10, 2)
a

array([0, 2, 4, 6, 8])

In [31]:
# create 5 evenly spaced values between 0 and 1 (both endpoints included)
a = np.linspace(0, 1, 5)
a

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [38]:
# generate a 3x3 array of random uniform values in [0, 1)
a = np.random.random((3, 3))
a

array([[0.49174806, 0.46866065, 0.13796951],
       [0.93175307, 0.19582351, 0.22230197],
       [0.88217025, 0.99605374, 0.32293634]])

In [43]:
# generate a 3x3 array of random standard normal values (mean 0, std 1)
a = np.random.normal(0, 1, (3, 3))
a

array([[ 0.35408389,  1.56322028, -1.4062144 ],
       [-0.62115096,  1.30622716,  0.66890425],
       [ 0.26904651, -0.36631791, -0.98870228]])

In [40]:
# generate a 3x3 array of random ints from 0 to 9 (the upper bound, 10, is excluded)
a = np.random.randint(0, 10, (3, 3))
a

array([[6, 1, 8],
       [8, 8, 8],
       [1, 6, 1]])
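
The random outputs above change on every run. A sketch of making them repeatable with np.random.seed, the same legacy seeding call used later in these notes; the seed value 0 is arbitrary.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 10, (3, 3))

np.random.seed(0)                          # reset to the same seed
second = np.random.randint(0, 10, (3, 3))

print(np.array_equal(first, second))         # True
print(first.min() >= 0, first.max() < 10)    # True True
```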

In [44]:
# create 3x3 identity matrix
a = np.eye(3)
a

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [45]:
# create an uninitialized array of 3 floats (the default dtype)
# values will be whatever happens to already exist in memory
a = np.empty(3)
a

array([1., 1., 1.])

### Universal Functions

Slowness of loops example: find the reciprocal of each value in an array

In [46]:
np.random.seed(123)

def compute_inverse(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

In [48]:
values = np.random.randint(1, 10, size = 5)
values

array([7, 2, 1, 2, 1])

In [49]:
compute_inverse(values)

array([0.14285714, 0.5       , 1.        , 0.5       , 1.        ])

This loop becomes really slow for a large volume of data (e.g. 1M values):
- takes approximately 1.4 seconds per call
- the bottleneck is the type-checking and function dispatch that must be done on each iteration of the loop
- each time the reciprocal is calculated, python inspects the object's type and looks up the correct function to use for that type

In [50]:
big_array = np.random.randint(1, 100, size = 1000000)

In [52]:
%timeit compute_inverse(big_array)

1.39 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**UFuncs**
- vectorized operations
- quick execution of repeated operations on values in a numpy array

In [54]:
# for example:
print(compute_inverse(values))
print(1.0 / values)

[0.14285714 0.5        1.         0.5        1.        ]
[0.14285714 0.5        1.         0.5        1.        ]


In [55]:
%timeit (1.0 / big_array)

1.66 ms ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
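
%timeit only works inside IPython/Jupyter. A sketch of the same comparison with the standard timeit module, assuming a smaller array (100,000 values, named `sample` here) so it finishes quickly; absolute timings depend on the machine.

```python
import timeit
import numpy as np

def compute_inverse(values):
    # explicit Python loop: one division and one assignment per element
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

sample = np.random.randint(1, 100, size=100_000)

# average seconds per call over 3 calls each
loop_time = timeit.timeit(lambda: compute_inverse(sample), number=3) / 3
ufunc_time = timeit.timeit(lambda: 1.0 / sample, number=3) / 3

print(np.allclose(compute_inverse(sample), 1.0 / sample))   # True
print(f"loop: {loop_time:.4f} s, ufunc: {ufunc_time:.6f} s")
```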


In [61]:
np.arange(5) / np.arange(1, 6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [63]:
x = np.arange(9)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [66]:
y = x.reshape((3, 3))
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [67]:
y ** 2

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]])

**Types of UFuncs**

Can handle all standard mathematical functions such as:
- array arithmetic (add, subtract, multiply, divide)
- power ( ** ) and log
- modulus (%)
- boolean operators
- trig
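
A short sketch of the categories listed above. Each operator is a wrapper around a named numpy function, so the two spellings give the same result.

```python
import numpy as np

x = np.arange(1, 5)                                 # [1 2 3 4]

print(np.array_equal(x - 1, np.subtract(x, 1)))     # True
print(np.array_equal(x * 2, np.multiply(x, 2)))     # True
print(np.array_equal(x / 2, np.divide(x, 2)))       # True
print(np.power(x, 2))                               # [ 1  4  9 16]
print(np.mod(x, 3))                                 # [1 2 0 1]
print(np.greater(x, 2))                             # [False False  True  True]

# log and exp undo each other; sin(pi / 2) is 1
print(np.allclose(np.log(np.exp(x)), x))            # True
print(np.isclose(np.sin(np.pi / 2), 1.0))           # True
```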

In [69]:
## + and np.add
x = np.arange(4)
x

array([0, 1, 2, 3])

In [70]:
x + 2

array([2, 3, 4, 5])

In [71]:
np.add(x, 2)

array([2, 3, 4, 5])

In [74]:
y = np.arange(-10, 10, 2)
y

array([-10,  -8,  -6,  -4,  -2,   0,   2,   4,   6,   8])

In [75]:
abs(y)

array([10,  8,  6,  4,  2,  0,  2,  4,  6,  8])

In [76]:
np.absolute(y)

array([10,  8,  6,  4,  2,  0,  2,  4,  6,  8])

**Specialized ufuncs**
- available in scipy's special submodule (scipy.special)
- gamma, beta, etc
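
A minimal sketch of two of these functions, assuming scipy is installed. For a positive integer n, gamma(n) equals (n - 1)!, and beta(a, b) equals gamma(a) * gamma(b) / gamma(a + b).

```python
import numpy as np
from scipy import special

x = np.array([1, 5, 10])

gamma_values = special.gamma(x)      # 0!, 4!, 9!  ->  1, 24, 362880
beta_value = special.beta(2, 3)      # 1 * 2 / 24  ->  1/12

print(np.allclose(gamma_values, [1, 24, 362880]))   # True
print(np.isclose(beta_value, 1 / 12))               # True
```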

**Other ufunc features**

Specifying output

In [84]:
# specify output with the "out" argument
# useful for very large arrays: writing into an existing array avoids allocating a temporary copy
x = np.arange(5)
y = np.empty(5)

In [85]:
np.multiply(x, 10, out = y)
print(y)

[ 0. 10. 20. 30. 40.]


In [86]:
# write to every other element
y = np.zeros(10)
np.power(2, x, out = y[::2])
print(y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


Aggregate

In [87]:
x = np.arange(1, 6)
print(x)

[1 2 3 4 5]


In [90]:
# add all values of x together
# reduce repeatedly applies the operation until a single value remains
np.add.reduce(x)

15

In [91]:
np.subtract.reduce(x)

-13

In [92]:
np.multiply.reduce(x)

120

Cumulative calculations

In [95]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15])

In [96]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120])

In [97]:
np.subtract.accumulate(x)

array([  1,  -1,  -4,  -8, -13])
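
For addition and multiplication, reduce and accumulate have dedicated numpy shortcuts. A sketch of the equivalences, plus the left-to-right fold that explains the subtract result:

```python
import numpy as np

x = np.arange(1, 6)    # [1 2 3 4 5]

print(np.add.reduce(x) == np.sum(x))                               # True
print(np.multiply.reduce(x) == np.prod(x))                         # True
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))          # True
print(np.array_equal(np.multiply.accumulate(x), np.cumprod(x)))    # True

# subtract.reduce folds from the left: (((1 - 2) - 3) - 4) - 5
print((((1 - 2) - 3) - 4) - 5)                                     # -13
```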

**Summary Statistics**
- using the built-in aggregation functions in numpy

In [98]:
# generate 100 random uniform values
l = np.random.random(100)

In [103]:
## inspect first 5 values
print(l[0:5])

[0.42546444 0.98768933 0.21602842 0.51718409 0.56645193]


In [104]:
# calculate sum of l
sum(l)

51.8639591024974

In [105]:
# calculate sum of l using the numpy version
np.sum(l)

51.8639591024974

In [109]:
# numpy version is faster!
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

56.9 ms ± 877 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
401 µs ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [110]:
np.min(big_array)

2.4560254799910197e-06

In [111]:
np.max(big_array)

0.9999976310835537

In [112]:
# can also use a different notation:
print(big_array.min(), big_array.max(), big_array.sum())

2.4560254799910197e-06 0.9999976310835537 499779.7617561451
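
Beyond sum, min, and max, numpy offers other common summary statistics. A sketch on a small fixed array (the name `data` is illustrative) so the results can be checked by hand:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

print(data.mean())               # 3.0
print(np.median(data))           # 3.0
print(data.var())                # 2.0 (population variance)
print(data.std())                # square root of the variance, about 1.414
print(np.percentile(data, 50))   # 3.0 (the 50th percentile is the median)
```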


**Aggregate along rows or columns**

In [115]:
# create a 3x4 matrix of uniform(0, 1) values
m = np.random.random((3, 4))
print(m)

[[0.50802269 0.58022173 0.05119628 0.01207565]
 [0.73544921 0.66760426 0.46049994 0.63034293]
 [0.59697001 0.0731161  0.67939069 0.91222961]]


In [116]:
# sums all values in matrix
m.sum()

5.907119094950617

In [119]:
# specify axis
print(m.sum(axis=0)) # sum each column
print(m.sum(axis=1)) # sum each row

[1.84044191 1.32094209 1.1910869  1.55464819]
[1.15151635 2.49389635 2.2617064 ]
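
The axis argument names the dimension that gets collapsed, which is easier to see on a small array with known values. A sketch using a fixed 3x4 array; keepdims is shown as an optional way to keep the collapsed axis with length 1.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m.sum(axis=0))    # [12 15 18 21]  collapse rows: one sum per column
print(m.sum(axis=1))    # [ 6 22 38]     collapse columns: one sum per row
print(m.min(axis=0))    # [0 1 2 3]
print(m.max(axis=1))    # [ 3  7 11]

print(m.sum(axis=1, keepdims=True).shape)   # (3, 1)
```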
