# Section 3: Introduction to NumPy

## Importing Modules
If vanilla Python seems rather lackluster, that's because it is. Fortunately, the scientific stack adds a broad and powerful array of Python packages that fill in the gaps. Once installed, packages in Python are easily loaded for use.

In [67]:
import numpy
print(numpy.__version__)

1.13.0


Functions in a package are accessed like attributes of an object, using dot notation. For convenience, we will import packages under a shorthand alias.

In [68]:
import numpy as np
print(np.__version__)

1.13.0


## NumPy Arrays
### Why arrays improve on lists
Arrays are the most basic type of the NumPy package. A one-dimensional NumPy array is a vector of N elements, similar to a Python list. In contrast to lists, however, arrays support element-wise arithmetic and come with many more attributes and methods. The examples below demonstrate the advantages of arrays over lists.

In [69]:
## Define example list.
example_list = list(range(5))

print(example_list)
print(example_list * 3)            # list * int repeats the list
print(example_list * example_list) # list * list raises a TypeError

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]


TypeError: can't multiply sequence by non-int of type 'list'

In contrast, NumPy arrays support element-wise arithmetic. We use the **arange** function to initialize an array of sequential integers.

In [70]:
arr = np.arange(5)

print(arr, type(arr))
print(arr * 3)
print(arr * arr)

[0 1 2 3 4] <class 'numpy.ndarray'>
[ 0  3  6  9 12]
[ 0  1  4  9 16]
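
To make the contrast concrete, here is a minimal sketch of what the same element-wise operations would need in plain Python. The variable names are illustrative only.

```python
import numpy as np

values = list(range(5))

# Plain Python: element-wise arithmetic needs an explicit comprehension.
scaled_list = [v * 3 for v in values]
squared_list = [a * b for a, b in zip(values, values)]

# NumPy: the operator is applied to every element at once.
arr = np.arange(5)
scaled_arr = arr * 3
squared_arr = arr * arr

print(scaled_list, scaled_arr)
print(squared_list, squared_arr)
```

Both approaches give the same numbers; the array version is simply shorter and avoids the Python-level loop.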


Every array has a data type (**dtype**) shared by all of its elements. It can be looked up and changed.

In [71]:
print(arr, arr.dtype)   # Print current datatype.
arr = arr.astype(float) # Convert to float.
print(arr, arr.dtype)   # Print new datatype.

[0 1 2 3 4] int64
[ 0.  1.  2.  3.  4.] float64
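
Two dtype behaviors are worth seeing once: NumPy picks a dtype from the input values, and converting floats to integers drops the fractional part. The sketch below uses made-up values to show both.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])     # a single float promotes the whole array to float
truncated = np.array([-1.7, 0.2, 2.9]).astype(int)  # fractional part is dropped

print(ints.dtype.kind, mixed.dtype)
print(truncated)
```

Note that **astype** truncates toward zero rather than rounding, so -1.7 becomes -1.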


NumPy arrays store metadata about their contents. These can be helpful, especially the **shape** attribute.

In [72]:
print('Array shape:', arr.shape) # Print shape of array.
print('Array bytes:', arr.nbytes) # Print bytes of array.

Array shape: (5,)
Array bytes: 40
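
For reference, a short sketch of the most commonly used metadata attributes, shown on a small example array (assumed here to be a 2x3 array of floats):

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

print('ndim:    ', arr.ndim)      # number of axes
print('shape:   ', arr.shape)     # length along each axis
print('size:    ', arr.size)      # total number of elements
print('itemsize:', arr.itemsize)  # bytes per element
print('nbytes:  ', arr.nbytes)    # size * itemsize
```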


Arrays also have a number of built-in methods not available for lists.

In [73]:
print('Round:', arr.round()) # Round array.
print('Min:', arr.min())     # Get min of array.
print('Max:', arr.max())     # Get max of array.
print('Sum:', arr.sum())     # Get sum of array.
print('Mean:',arr.mean())    # Get mean of array.

Round: [ 0.  1.  2.  3.  4.]
Min: 0.0
Max: 4.0
Sum: 10.0
Mean: 2.0


### Generating NumPy Arrays
There are many ways of generating NumPy arrays. The simplest way is to convert a Python list to a NumPy array using the **array** function.

In [74]:
## Making an array from a list using the array command.
example_list = [4, 7, 9.4]
arr = np.array(example_list)

print(example_list, type(example_list))
print(arr, type(arr)) 

[4, 7, 9.4] <class 'list'>
[ 4.   7.   9.4] <class 'numpy.ndarray'>


NumPy provides equivalents of many of the standard R/Matlab commands for generating arrays.

In [75]:
print('np.arange(5)        = %s' %np.arange(5))         # Array of 5 sequential integers.
print('np.zeros(5)         = %s' %np.zeros(5))          # Array of 5 zeros.
print('np.ones(5)          = %s' %np.ones(5))           # Array of 5 ones.
print('np.linspace(0,10,5) = %s' %np.linspace(0,10,5))  # Length-5 evenly-spaced array from 0 to 10.

np.arange(5)        = [0 1 2 3 4]
np.zeros(5)         = [ 0.  0.  0.  0.  0.]
np.ones(5)          = [ 1.  1.  1.  1.  1.]
np.linspace(0,10,5) = [  0.    2.5   5.    7.5  10. ]
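
One detail that often trips people up is how **arange** and **linspace** treat the end point. The following sketch compares them on the same interval; the **endpoint** flag is a standard keyword of np.linspace.

```python
import numpy as np

stepped = np.arange(0, 10, 2.5)                    # stop value is excluded
spaced = np.linspace(0, 10, 5)                     # stop value is included
half_open = np.linspace(0, 10, 4, endpoint=False)  # exclude stop, as arange does

print(stepped)
print(spaced)
print(half_open)
```

In short, **arange** takes a step size while **linspace** takes a number of points.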


## NumPy Matrices
### Why matrices improve on lists
Python can technically represent matrices as a list of lists, though doing so is inefficient and awkward to index. As with arrays, NumPy matrices (two-dimensional arrays) dramatically improve upon the numerical capabilities of core Python.

In [76]:
nested_lists = [[1,2,3],
                [4,5,6],
                [7,8,9]]

print(nested_lists)
print(nested_lists[1][2])   # To extract the 2nd row, 3rd column, two brackets are necessary.

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
6


NumPy matrices make this much easier!

In [77]:
mat = np.array(nested_lists)

print(mat, type(mat))
print(mat[1,2])

[[1 2 3]
 [4 5 6]
 [7 8 9]] <class 'numpy.ndarray'>
6


Indexing of NumPy matrices (and arrays, for that matter) obeys all of the slicing conventions of lists. Commas are used to demarcate which axis a slice operation is targeting.

In [78]:
print('mat[1,2]  = %s' %mat[1,2])    # Second row, third column.
print('mat[0,:]  = %s' %mat[0,:])    # All the first row.
print('mat[:,-1] = %s' %mat[:,-1])   # All of the final column.

mat[1,2]  = 6
mat[0,:]  = [1 2 3]
mat[:,-1] = [3 6 9]
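
Unlike list slices, basic slices of a NumPy array are views onto the same data rather than copies. A minimal sketch of the consequence, using an example 3x3 matrix:

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)

first_row = mat[0, :]         # basic slice: a view onto mat's data
first_row[0] = 100            # writes through to mat

safe_row = mat[1, :].copy()   # explicit copy: independent of mat
safe_row[0] = -1              # mat is unchanged by this

print(mat)
```

Call **copy** whenever a slice should be modified without touching the original.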


NumPy matrices have all the same attributes as NumPy arrays, but now functions can be applied to specific rows or columns in addition to the entire matrix.

In [79]:
## Sum across matrix.
print(mat)
print( mat.sum() )          

[[1 2 3]
 [4 5 6]
 [7 8 9]]
45


In [80]:
## Sum down each column (collapse axis 0).
print( mat.sum(axis=0) )

[12 15 18]


In [81]:
## Sum along each row (collapse axis 1).
print( mat.sum(axis=1) )

[ 6 15 24]
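
The **axis** argument names the axis that is collapsed, which is easy to get backwards. The sketch below checks the resulting shapes and uses the **keepdims** flag so the row sums can be divided back into the matrix.

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)

col_sums = mat.sum(axis=0)                     # collapses rows: one value per column
row_sums = mat.sum(axis=1)                     # collapses columns: one value per row
row_sums_2d = mat.sum(axis=1, keepdims=True)   # keeps the collapsed axis with length 1

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)
print(mat / row_sums_2d)                       # each row now sums to 1
```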


Importantly, all NumPy arrays and matrices have a **reshape** method for transforming them into different dimensions.

In [82]:
print('Original shape', mat.shape)

# Reshape to column vector
mat = mat.reshape(9,1)
print('Column vector', mat.shape)

# Reshape to row vector
mat = mat.reshape(1,9)
print('Row vector', mat.shape)

Original shape (3, 3)
Column vector (9, 1)
Row vector (1, 9)


The **order** flag of reshape controls how the elements are laid out in the new shape: row by row (order='C', the default) or column by column (order='F').

In [83]:
print('Original:', mat)

Original: [[1 2 3 4 5 6 7 8 9]]


In [84]:
## Reshape (row-major: fill each row in turn)
print(mat.reshape(3,3,order='C'))

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [85]:
## Reshape (column-major: fill each column in turn)
print(mat.reshape(3,3,order='F')) 

[[1 4 7]
 [2 5 8]
 [3 6 9]]
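
As a small sketch of two related conveniences: passing -1 lets NumPy infer one dimension, and for a square result the column-major fill is simply the transpose of the row-major fill.

```python
import numpy as np

flat = np.arange(1, 10)

row_major = flat.reshape(3, -1)            # -1 lets NumPy infer the missing dimension
col_major = flat.reshape(3, 3, order='F')  # fill columns first

print(row_major)
print(col_major)
print(np.array_equal(col_major, row_major.T))
```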


The dimensions of matrices can also be quickly changed with **flatten** and **squeeze**. 

In [86]:
## Reshape to new dimensions.
mat = mat.reshape(3,3,1)
print('Original:', mat.shape)

## Flatten matrix.
print('Flatten:', mat.flatten().shape )

## Squeeze matrix.
print('Squeeze:', mat.squeeze().shape )

Original: (3, 3, 1)
Flatten: (9,)
Squeeze: (3, 3)
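
A related point, sketched below under the assumption of a contiguous example array: **flatten** always returns a copy, whereas the similar **ravel** method returns a view when the memory layout allows it.

```python
import numpy as np

mat = np.arange(6).reshape(2, 3)

flat_copy = mat.flatten()   # always a new array
flat_view = mat.ravel()     # a view here, because mat is contiguous

flat_copy[0] = 99           # does not touch mat
flat_view[1] = 77           # writes through to mat

print(mat)
```

Prefer **flatten** when the result will be modified independently of the original.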


### Generating NumPy Matrices
Just as with arrays, there are a number of ways of generating NumPy matrices. The simplest is to use the **array** command on a list of lists. 

In [87]:
nested_lists = [[0, 1, 1],[2, 3, 5], [8, 13, 21]]
mat = np.array(nested_lists)

print(nested_lists)
print(mat)

[[0, 1, 1], [2, 3, 5], [8, 13, 21]]
[[ 0  1  1]
 [ 2  3  5]
 [ 8 13 21]]


The same commands previously introduced to generate NumPy arrays can also be used to generate matrices. Simply specify extra dimensions.

In [88]:
np.zeros( [3,3] )               # 3x3 matrix of zeros.
np.ones( [3,3] )                # 3x3 matrix of ones.
np.arange(9).reshape(3,3)       # 3x3 matrix of sequential integers.
np.linspace(0,8,9).reshape(3,3) # 3x3 matrix of evenly spaced values from 0 to 8.
np.identity(3)                  # 3x3 identity matrix.

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Matrices can also be formed by joining NumPy arrays. There are several methods for doing this, including: **r_**, **c_**, **hstack**, **vstack**, and **concatenate**. We demonstrate each below. 

In [89]:
## np.r_ = join two arrays on their first axis.
arr = np.arange(5)
print('Original', arr)

## Join on first axis.
np.r_[arr,arr]

Original [0 1 2 3 4]


array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

In [90]:
## np.c_ = join two arrays on their second axis.
## Creates a second axis if one does not exist.

np.c_[arr,arr]

array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

In [91]:
## np.hstack = join arrays along their columns.

print(np.hstack([arr,arr]))
print(np.hstack([arr.reshape(5,1), arr.reshape(5,1)]))

[0 1 2 3 4 0 1 2 3 4]
[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]]


In [92]:
## np.vstack = join arrays along their rows.
print(np.vstack([arr,arr]))
print(np.vstack([arr.reshape(5,1), arr.reshape(5,1)]))

[[0 1 2 3 4]
 [0 1 2 3 4]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [0]
 [1]
 [2]
 [3]
 [4]]


In [93]:
## np.concatenate = join arrays along specified axis.
## Default is first axis.

print(np.concatenate([arr, arr], axis=0))
print(np.concatenate([arr.reshape(5,1), arr.reshape(5,1)], axis=1))

[0 1 2 3 4 0 1 2 3 4]
[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]]
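
Since the joining functions overlap, it can help to compare the shapes they produce from the same one-dimensional input. The sketch below also uses np.column_stack and np.stack, two further NumPy functions not covered above.

```python
import numpy as np

arr = np.arange(5)

end_to_end = np.concatenate([arr, arr])       # same result as np.r_ and np.hstack here
side_by_side = np.column_stack([arr, arr])    # same result as np.c_[arr, arr]
stacked_rows = np.stack([arr, arr], axis=0)   # new leading axis; matches np.vstack here

print(end_to_end.shape, side_by_side.shape, stacked_rows.shape)
```

For one-dimensional inputs the results are a length-10 vector, a 5x2 matrix, and a 2x5 matrix, respectively.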


## Core NumPy Functions
NumPy also introduces a number of useful functions designed to operate efficiently over NumPy arrays. The following is a non-exhaustive overview of some important NumPy functions.

### Rounding Functions

In [94]:
mat = np.linspace(0,1,5)
print('Original: %s' %mat)
print('np.round: %s' %np.round(mat, 1) )
print('np.floor: %s' %np.floor(mat) ) 
print('np.ceil:  %s' %np.ceil(mat) )

Original: [ 0.    0.25  0.5   0.75  1.  ]
np.round: [ 0.   0.2  0.5  0.8  1. ]
np.floor: [ 0.  0.  0.  0.  1.]
np.ceil:  [ 0.  1.  1.  1.  1.]
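
Note that in the output above 0.25 rounds down to 0.2 while 0.75 rounds up to 0.8. NumPy rounds ties to the nearest even digit rather than always rounding half up. A minimal sketch with exact halves:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

print(np.round(halves))         # ties go to the nearest even integer
print(np.floor(halves + 0.5))   # one way to round half up for these values
```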


### Mathematical functions

NumPy includes a variety of mathematical functions. All of these can be applied across an entire matrix or across arrays.

In [95]:
np.sum;       # Sum of an array or matrix.
np.cumsum;    # Cumulative sum over an array.
np.prod;      # Product of the elements of an array.
np.divide;    # Element-wise division of two arrays.
np.diff;      # Pairwise difference of elements of an array.
np.exp;       # Exponential transform.
np.log;       # Natural logarithm.
np.log10;     # Base-10 logarithm.
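
As a quick illustration, here are several of these functions applied to a small example array (the values are arbitrary):

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

print(np.cumsum(arr))        # running total
print(np.prod(arr))          # product of all elements
print(np.diff(arr))          # differences between neighbours
print(np.divide(arr, 2))     # element-wise division
print(np.log(np.exp(arr)))   # log undoes exp, up to floating-point error
```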

### Summary Functions

NumPy includes many functions to summarize an array. With the exception of **corrcoef**, all of these can be applied to an entire matrix or along a single axis.

In [96]:
np.min;           # Return the smallest element.
np.max;           # Return the largest element.
np.argmin;        # Return the index of the smallest element.
np.argmax;        # Return the index of the largest element.
np.mean;          # Compute the mean of an array.
np.median;        # Compute the median of an array.
np.std;           # Compute the standard deviation of an array.
np.var;           # Compute the variance (sd^2) of an array.
np.percentile;    # Compute the xth percentile of an array.
np.corrcoef;      # Compute the row-/col-wise correlation of a matrix.

In [97]:
## To give a few examples.
mat = np.vstack([ np.arange(5), np.arange(5)[::-1] ])
print('Original:\n%s' %mat)

Original:
[[0 1 2 3 4]
 [4 3 2 1 0]]


In [98]:
## Compute percentile.
print( '70%% (all):  %s' %np.percentile(mat, 70) )

## Compute percentile across each row.
print('70%% (rows): %s' %np.percentile(mat, 70, axis=1) )

## Compute percentile down each column.
print('70%% (cols): %s' %np.percentile(mat, 70, axis=0) )

70% (all):  3.0
70% (rows): [ 2.8  2.8]
70% (cols): [ 2.8  2.4  2.   2.4  2.8]


In [99]:
## Compute correlation.
print('Correlation:\n', np.corrcoef(mat))

Correlation:
 [[ 1. -1.]
 [-1.  1.]]
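
Two further details of the summary functions, sketched on the same example matrix: **argmax** indexes into the flattened array unless an axis is given, and **std** computes the population standard deviation unless **ddof=1** is passed.

```python
import numpy as np

mat = np.vstack([np.arange(5), np.arange(5)[::-1]])

print(np.argmax(mat))           # index into the flattened array
print(np.argmax(mat, axis=1))   # column index of the maximum in each row
print(np.std(mat[0]))           # population standard deviation (ddof=0)
print(np.std(mat[0], ddof=1))   # sample standard deviation (ddof=1)
```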


### Set Functions
NumPy includes functions for identifying unique elements within or between arrays.

In [100]:
## Define two arrays for example.
arr1 = np.array([41, 16, 34, 0, 2, 20, 19, 14, 22, 15, 18, 9, 35, 41])
arr2 = np.array([42, 22, 40, 7, 33, 0, 12, 19, 44, 10, 31, 11, 11, 49])

In [101]:
## Sort elements (ascending order).
np.sort(arr1)

array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41, 41])

In [102]:
## Return unique elements.
np.unique(arr1)

array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41])

In [103]:
## Return unique elements, count number of appearances.
np.unique(arr1, return_counts=True)

(array([ 0,  2,  9, 14, 15, 16, 18, 19, 20, 22, 34, 35, 41]),
 array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))

In [104]:
## Test whether each element of array-1 appears in array-2.
np.in1d(arr1, arr2)

array([False, False, False,  True, False, False,  True, False,  True,
       False, False, False, False, False], dtype=bool)

In [105]:
## Return all unique elements of arrays 1 & 2.
np.union1d(arr1, arr2)

array([ 0,  2,  7,  9, 10, 11, 12, 14, 15, 16, 18, 19, 20, 22, 31, 33, 34,
       35, 40, 41, 42, 44, 49])

In [106]:
## Return all elements belonging to both arrays 1 & 2.
np.intersect1d(arr1, arr2)

array([ 0, 19, 22])
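
Two more set operations round out the picture. The sketch below uses np.isin (available from NumPy 1.13 onward, and an assumption here since the text above uses np.in1d) and np.setdiff1d on the same example arrays.

```python
import numpy as np

arr1 = np.array([41, 16, 34, 0, 2, 20, 19, 14, 22, 15, 18, 9, 35, 41])
arr2 = np.array([42, 22, 40, 7, 33, 0, 12, 19, 44, 10, 31, 11, 11, 49])

mask = np.isin(arr1, arr2)             # boolean mask, one entry per element of arr1
shared = arr1[mask]                    # shared elements, in arr1's original order
only_in_1 = np.setdiff1d(arr1, arr2)   # sorted unique elements of arr1 not in arr2

print(shared)
print(only_in_1)
```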

### Replacing List Comprehensions

NumPy includes a number of very helpful functions that can replace list comprehensions (np.where) and for loops (np.apply_along_axis, np.apply_over_axes). These are often more concise, and sometimes more efficient, than writing out a full for loop. We will demonstrate these functions with a simple example of standard-scoring (z-scoring) a matrix.

In [107]:
## Define the standard score (z-score) function.
def zscore(arr): 
    return (arr - arr.mean()) / arr.std()

## Define a simple matrix.
mat = np.arange(12).reshape(2,6)
print(mat)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]


Use **apply_along_axis** to apply our function to each row.

In [108]:
zmat = np.apply_along_axis(zscore, axis=1, arr=mat)
print(zmat.round(2))

[[-1.46 -0.88 -0.29  0.29  0.88  1.46]
 [-1.46 -0.88 -0.29  0.29  0.88  1.46]]
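
For simple functions like the z-score, the same result can usually be obtained without apply_along_axis by reducing along an axis and letting broadcasting do the rest. This is an alternative sketch, not a replacement for the approach above.

```python
import numpy as np

mat = np.arange(12).reshape(2, 6)

row_means = mat.mean(axis=1, keepdims=True)   # shape (2, 1)
row_stds = mat.std(axis=1, keepdims=True)     # shape (2, 1)
zmat = (mat - row_means) / row_stds           # broadcasting applies each row's mean and std

print(zmat.round(2))
```

Because the means and standard deviations keep a length-1 second axis, they line up with each row of the matrix automatically.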


Use the **where** function to set all negative numbers to 0 and all others to 1. With three arguments, **where** behaves like the **ifelse** command in R.

In [109]:
amat = np.where(zmat < 0, 0, 1)
print(amat)

[[0 0 0 1 1 1]
 [0 0 0 1 1 1]]


If only the condition is given, **where** returns the indices of the array where the condition is met, similar to the **which** command in R.

In [110]:
print( np.where(zmat < 0 ) )

(array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]))
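
The index arrays returned by **where** can be fed straight back into the array. The sketch below, on a small made-up matrix, shows this alongside boolean-mask indexing and np.argwhere, which returns one (row, column) pair per match.

```python
import numpy as np

zmat = np.array([[-1.5, -0.5, 0.5],
                 [ 1.5, -2.0, 0.0]])

rows, cols = np.where(zmat < 0)   # index arrays, one per axis
negatives = zmat[rows, cols]      # values at those positions
same_values = zmat[zmat < 0]      # boolean-mask indexing gives the same values
coords = np.argwhere(zmat < 0)    # one (row, col) pair per match

print(negatives)
print(coords)
```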


### Linear Algebra Functions

NumPy includes an entire submodule dedicated to efficient linear algebra functions (though it should be noted that SciPy offers an overlapping and more extensive set in scipy.linalg). See np.linalg for a full list of commands.

In [111]:
## Define a simple matrix.
mat = np.arange(16).reshape(4,4)
print(mat)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [112]:
## Transpose the matrix
print(mat.T)           

[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]


In [113]:
## Return diagonal of matrix
print(np.diag(mat))

[ 0  5 10 15]


In [114]:
## Return upper triangular matrix
print(np.triu(mat))    

[[ 0  1  2  3]
 [ 0  5  6  7]
 [ 0  0 10 11]
 [ 0  0  0 15]]


In [115]:
## Matrix-multiply the matrix by itself with np.dot.
print(np.dot(mat, mat))    

[[ 56  62  68  74]
 [152 174 196 218]
 [248 286 324 362]
 [344 398 452 506]]


In [116]:
## Can also use the array's dot method:
print(mat.dot(mat))

[[ 56  62  68  74]
 [152 174 196 218]
 [248 286 324 362]
 [344 398 452 506]]
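
On Python 3.5 or later, the **@** operator offers a third spelling of matrix multiplication. The short sketch below also contrasts it with **\***, which multiplies element by element.

```python
import numpy as np

mat = np.arange(16).reshape(4, 4)

product = mat @ mat       # matrix multiplication, same as np.dot(mat, mat)
elementwise = mat * mat   # element-wise multiplication, NOT a matrix product

print(product)
print(elementwise)
```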


In [117]:
## Linear algebra operations include:
np.linalg.norm;        # Vector or matrix norm
np.linalg.inv;         # Inverse of a square matrix
np.linalg.det;         # Determinant of a square matrix
np.linalg.eig;         # Eigenvalues and vectors of a square matrix
np.linalg.cholesky;    # Cholesky decomposition of a matrix
np.linalg.svd;         # Singular value decomposition of a matrix
np.linalg.lstsq;       # Solve linear least-squares problem
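
A brief sketch of a few of these in use. The matrix A and vector b are made-up values, and np.linalg.solve is an additional function from the same submodule that is not in the list above.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solves A x = b without forming the inverse
A_inv = np.linalg.inv(A)    # explicit inverse, for comparison

print(x)
print(np.linalg.det(A))
print(np.allclose(np.dot(A_inv, A), np.eye(2)))
```

Note that the 4x4 example matrix used earlier in this section is singular, so it has no inverse; that is why a different matrix is used here.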

### Generating Random Data
NumPy also includes many functions for generating random data. 

In [118]:
## Set the RNG seed!
np.random.seed(47404)

In [119]:
## Generate ten random integers between 0-9.
print( np.random.randint(0,10,10) )

[9 0 2 0 2 4 3 4 6 5]


In [120]:
## Generate five random samples of a normal distribution with mu=0,sd=1.
print( np.random.normal(0,1,5) )

[-1.46523567  0.72885891 -0.73496833 -0.38356834 -0.29662156]


In [121]:
## Generate 10 random coin flips.
print( np.random.binomial(1,0.5,10))

[1 1 0 0 1 0 0 1 1 0]


In [122]:
## Choose five numbers from 0-9 without replacement.
print( np.random.choice(np.arange(10), 5, replace=False) )

[8 2 9 7 6]
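
To see what setting the seed buys you, the sketch below resets it and draws again. It also shows np.random.RandomState, a self-contained generator that does not depend on the global seed. The seed value 0 is arbitrary.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 10, 5)

np.random.seed(0)                      # resetting the seed restarts the sequence
second = np.random.randint(0, 10, 5)

rng = np.random.RandomState(0)         # generator with its own independent state
local = rng.randint(0, 10, 5)

print(first, second, local)
```

Because the seed is reset before the second draw, the first two arrays are identical.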
