## Basic NumPy Functionality

### NumPy n-dim array
* Part of a data "ecosystem" that includes data science libraries, machine learning, and image processing
* NumPy is huge and very powerful --> we will just hit the highlights
* optimized array type (`ndarray`), faster and more memory-efficient than lists for numeric work
* can have "any" number of dimensions, though 1 or 2 dims is typical
* restricted to a single data type (dtype) per array
* arrays have a fixed size, so you "dimension" them when you create them
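
The single-data-type restriction can be seen directly. The following is a minimal sketch; the variable names `mixed` and `ints` are illustrative and not part of the original notebook.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts every element to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# The dtype can also be fixed explicitly when the array is created.
ints = np.array([1, 2, 3], dtype=np.int64)
print(ints.dtype)    # int64

# Assigning a float into an integer array truncates the value
# instead of changing the array's dtype.
ints[0] = 7.9
print(ints[0])       # 7
```

A list would happily hold `[1, 2, 3.5]` as two ints and a float; the array stores all three as floats.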

In [89]:
import numpy as np
import numpy.random as npr

In [17]:
l = [1, 2, 3] 
a = np.array([1,2,3])

In [19]:
l

[1, 2, 3]

In [21]:
a

array([1, 2, 3])

In [23]:
[x*5 for x in l]

[5, 10, 15]

In [25]:
a*5

array([ 5, 10, 15])

In [29]:
a + a

array([2, 4, 6])

### Performance

In [47]:
big_arr = np.arange(4.5e6)
big_list = big_arr.tolist()

In [49]:
%timeit -n5 square = [x ** 2 for x in big_list]

655 ms ± 74.3 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


In [50]:
%timeit -n5 square = big_arr ** 2 

9.72 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)


### Determining dimensions
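* size --> total number of elements ("rows" * "columns" for a 2-dim array)
* itemsize --> number of bytes used by a single value (depends on the dtype; the 4 shown below is a 32-bit int, and many platforms default to 8)
* itemsize * size = bytes of memory the array data uses (also available as nbytes)
* shape --> tuple with the length of each dimension
* ndim --> number of dimensions, as an int
* reshape --> returns the same data with new dimensions; the original array is unchanged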

In [53]:
a = np.arange(15)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [55]:
a.size

15

In [57]:
a.itemsize

4

In [59]:
a.shape

(15,)

In [63]:
a.reshape(5,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [65]:
a.reshape(5,3).shape

(5, 3)

In [67]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [69]:
a.reshape(15,)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [71]:
a.reshape(-1)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
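
The attributes listed at the top of this section that the cells above do not exercise (`ndim` and the memory calculation) can be checked the same way. This is a small sketch; the dtype is pinned to `int64` here, which is an assumption made so that `itemsize` is the same on every platform, and the names `b` and `c` are illustrative.

```python
import numpy as np

a = np.arange(15, dtype=np.int64)   # dtype pinned so itemsize is predictable
b = a.reshape(5, 3)

print(b.ndim)                # 2
print(b.size)                # 15
print(b.itemsize)            # 8 bytes per int64 value
print(b.size * b.itemsize)   # 120
print(b.nbytes)              # 120, the same product computed by NumPy

# -1 asks NumPy to infer that dimension from the total size.
c = a.reshape(3, -1)
print(c.shape)               # (3, 5)
```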

### "Slicing"

In [73]:
a[1,:]

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
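
The error above occurs because `a` still has one dimension: `reshape` returned a new array and did not change `a`. A short sketch of slicing that does work, assuming the reshaped result is kept in a (hypothetical) variable `m`:

```python
import numpy as np

a = np.arange(15)
m = a.reshape(5, 3)   # keep the 2-dim result in its own variable

row = m[1, :]         # second row
col = m[:, 0]         # first column
block = m[1:3, 0:2]   # rows 1-2, columns 0-1

print(row)            # [3 4 5]
print(col)            # [ 0  3  6  9 12]
print(block)          # [[3 4]
                      #  [6 7]]

# A 1-dim array takes a single index or slice.
print(a[1:4])         # [1 2 3]
```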

### Special arrays
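* useful for dimensioning an array before you know what values it will hold
* `np.empty` allocates memory without initializing it, so its contents are arbitrary (the ones shown below are leftover memory, not a guarantee)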

In [75]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [79]:
np.ones((5,5,))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [81]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [83]:
np.eye(5,5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [85]:
np.empty(5)

array([1., 1., 1., 1., 1.])

### Working with random numbers
* many different distribution options
* normal and uniform most common
* normal: normal, standard_normal, randn
* uniform: uniform, randint

### Standard normal
* mean 0
* std 1
* takes a single optional argument --> size (an int or a tuple giving the output shape)
* `np.random.standard_normal(size=None)`

In [91]:
np.random.standard_normal([10])

array([ 1.06940094, -0.23477769,  0.05219795,  0.03723404, -0.09787212,
        0.45729293, -0.29450153, -0.60759662, -0.09276266,  0.01699591])

In [93]:
npr.standard_normal(10)

array([ 1.38177143,  0.8872191 ,  1.56436113,  0.76756208, -0.1665115 ,
        0.69269947, -0.2219464 , -1.7127222 ,  1.60046447,  1.42020159])

### Uniform Distributions
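* `np.random.uniform(low=0.0, high=1.0, size=None)` --> floats, uniform over [low, high)
* `np.random.randint(low, high=None, size=None, dtype=int)` --> integers, discrete uniform
    * excludes high --> called a half-open interval
* note: `np.random.randn(d0, d1, ..., dn)` draws from the standard normal, not a uniform distribution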

In [97]:
# high=6 is excluded, so the result is always one of 1..5 and never 6
npr.randint(1,6)


5
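
A single draw cannot show the half-open interval, so here is a sketch that draws many values and checks the range. The sample size of 1000 and the names `rolls`, `dice`, and `u` are illustrative choices, not from the original notebook.

```python
import numpy as np
import numpy.random as npr

# 1000 integers from the half-open interval [1, 6): 6 never appears.
rolls = npr.randint(1, 6, size=1000)
print(rolls.min(), rolls.max())

# To simulate a six-sided die, pass high=7 so that 6 is included.
dice = npr.randint(1, 7, size=1000)
print(np.unique(dice))

# Continuous uniform floats over [0, 1).
u = npr.uniform(low=0.0, high=1.0, size=5)
print(u)
```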

### Normal
* loc = mean
* scale = std
* if not specified mean = 0, std = 1

In [105]:
x = npr.normal(loc=100, scale=10, size=100)

In [107]:
print(x.mean())
print(x.std())
print(x.max())
print(x.min())
# arrays have no .median() method; use np.median(x) instead

99.91909094038665
8.183647124629996
122.12023620989908
82.64834686846864


In [111]:
percentiles = [5,10,25,50,75,90,95]
np.percentile(x, percentiles)

array([ 87.74030611,  90.04933547,  93.01541343, 100.16082524,
       105.34646914, 110.14627741, 113.56985517])

### Simple Linear Modeling

In [117]:
x = npr.normal(size=50)
y = x + npr.normal(loc=10, size=50)

In [119]:
np.corrcoef(x,y)

array([[1.        , 0.71109668],
       [0.71109668, 1.        ]])

In [59]:
slope, intercept = np.polyfit(x,y, deg=1)
slope

0.9617250638432819

### More syntactically intense method to do the same thing
* Requires a "dummy" column of ones next to the actual x variable (this column produces the intercept)
* The rcond argument sets the cutoff below which extremely small singular values are treated as zero
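
The notebook does not show the code for this method. The mention of a dummy variable and `rcond` suggests `np.linalg.lstsq`, so the sketch below assumes that function; the names `A`, `coeffs`, and `sing_vals` are illustrative.

```python
import numpy as np
import numpy.random as npr

x = npr.normal(size=50)
y = x + npr.normal(loc=10, size=50)

# Design matrix: x next to a column of ones (the "dummy" variable).
# The ones column lets the fit include an intercept term.
A = np.vstack([x, np.ones(len(x))]).T

# rcond=None uses NumPy's default cutoff for small singular values.
coeffs, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

# The result should agree with np.polyfit up to floating-point error.
slope_pf, intercept_pf = np.polyfit(x, y, deg=1)
print(slope, intercept)
print(np.allclose([slope, intercept], [slope_pf, intercept_pf]))
```

For a straight-line fit `np.polyfit` is shorter; the design-matrix form becomes useful once there is more than one predictor column.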

### Fill array with uniformly spaced data points between a minimum and maximum
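
No code accompanies this heading in the notebook. One way to do this is `np.linspace`, sketched below under that assumption; the endpoints and point counts are arbitrary examples.

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, with both endpoints included.
pts = np.linspace(0, 1, 5)
print(pts)    # [0.   0.25 0.5  0.75 1.  ]

# retstep=True also returns the spacing between neighbouring points.
grid, step = np.linspace(0, 10, 11, retstep=True)
print(step)   # 1.0
```

Unlike `np.arange`, which takes a step size, `np.linspace` takes the number of points and includes the maximum by default.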