# Introduction to Google Colab, Jupyter Notebooks, and NumPy

This is a Google Colab Notebook which serves as an introduction to running Jupyter Notebooks in Colab. It also introduces Python's NumPy library.

*NOTE: It may be helpful to click 'View' -> 'Collapse Sections' in the bar at the top left of the page.*

## Colab and Jupyter Notebooks

The document you're looking at is a **Google Colab** notebook. Colab is a technology developed by Google, based on Jupyter Notebook, that allows you to interactively run Python through a web browser.

Each cell is either text or code. Read the text, and then run each code block by hovering over its top left corner and pressing the play button. Output from a code block will show up below its cell.

In [1]:
# <-- click the play button over there!

# Make sure you see the output below this block before moving on
print("Hello World!")

Hello World!


Variables from each code block are stored, so you'll have them when you run the next block. Re-running a code block will replace its output and overwrite any variables it assigns. Importantly, if you run blocks out of order, you may run into errors about variables not being defined.

This next cell defines a variable `x`. Once you run it, `x` will be available in any cell run _after_ this one runs. Try running it now.

In [2]:
x = 102

If you ran the above cell, `x` will now be available when you run this next cell:



In [3]:
print("x =", x)

x = 102


Try changing the value of `x` in the cell above, but don't run that cell yet. If you re-run only the last cell (the one that prints `x`), the printed value will _not_ change. This is because you have not executed the code that sets the new value of `x`.

Go back and run the cell where `x` is defined. Now the last cell will print the updated value.

Alright, that should be all you need to know for this notebook. Let's learn some NumPy!

## Introduction to NumPy


NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array (referred to as a "numpy array" or "np array") and optimized tools for working with these arrays.

To use Numpy, we first need to import the `numpy` package:

In [4]:
import numpy as np

This follows the usual Python syntax for importing libraries and lets us access objects and functions from numpy using the prefix `np.`.

## Array Basics

On the most basic level, we can think of numpy arrays like lists.

They hold a grid of values, all of the same type, which can be accessed by indexing. In fact, the simplest way to initialize a numpy array is from a Python list using `np.array(LIST)`.

In [5]:
my_list = [1, 2, 3]  # A Python list initialized in the usual way
print("my_list:", my_list, "\n")

my_nparray = np.array([1, 2, 3])  # A numpy array initialized from a list directly
print("my_nparray:", my_nparray, "\n")

my_nparray_2 = np.array(my_list)  # We can also, of course, use an existing list variable
print("my_nparray_2:", my_nparray_2)

my_list: [1, 2, 3] 

my_nparray: [1 2 3] 

my_nparray_2: [1 2 3]
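
The "all of the same type" point is easy to check for yourself. The sketch below uses illustrative variable names that are not part of this notebook. It suggests that NumPy picks one `dtype` for the whole array, converting integers to floats when the source list mixes the two.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a Python list can mix ints and floats
mixed_array = np.array(mixed_list)  # NumPy converts every value to one common type

print(mixed_array)        # every element is now a float
print(mixed_array.dtype)  # float64

int_array = np.array([1, 2, 3])
print(int_array.dtype)    # an integer type (the exact width depends on the platform)
```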


In this one dimensional form, they can be indexed in the same way as lists and appear very similar.

In [6]:
print("my_list[1]:", my_list[1])  # Indexing the middle element in the list
print("my_nparray[1]:", my_nparray[1])  # The same operation for the np array

my_list[1]: 2
my_nparray[1]: 2


The differences in use become more apparent as we move to arrays with two or more dimensions.



We can initialize higher dimensional numpy arrays using nested lists. To examine the size of an array along each dimension we can access the `.shape` property.

In [7]:
array_2D = np.array([[1, 2, 3], [4, 5, 6]])  # We get 2D arrays from lists of lists
print("array_2D:")
print(array_2D)

print("\narray_2D.shape:", array_2D.shape)
print("(Length 2 along the first dimension, length 3 along the second dimension.)")

array_2D:
[[1 2 3]
 [4 5 6]]

array_2D.shape: (2, 3)
(Length 2 along the first dimension, length 3 along the second dimension.)


In [8]:
array_3D = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                     [[9, 10, 11, 12], [13, 14, 15, 16]],
                     [[17, 18, 19, 20], [21, 22, 23, 24]]]) # We get 3D arrays from lists of lists of lists

print("array_3D:")
print(array_3D)

print("\narray_3D.shape:", array_3D.shape)
print("(Length 3 along the first dimension, length 2 along the second dimension, \
length 4 along the third dimension.)")

array_3D:
[[[ 1  2  3  4]
  [ 5  6  7  8]]

 [[ 9 10 11 12]
  [13 14 15 16]]

 [[17 18 19 20]
  [21 22 23 24]]]

array_3D.shape: (3, 2, 4)
(Length 3 along the first dimension, length 2 along the second dimension, length 4 along the third dimension.)


We can carry this on into any number of dimensions if we find it useful, although it is uncommon to see arrays above around 4 dimensions.
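
As a small illustration of going one dimension further, here is a sketch with a 4D array built from four levels of nested lists. The variable name is made up for this example, and `.ndim` is an attribute this notebook has not introduced: it reports the number of dimensions, which is the same as the length of `.shape`.

```python
import numpy as np

# Four levels of nesting give a 4D array
array_4D = np.array([[[[1, 2], [3, 4]],
                      [[5, 6], [7, 8]]]])

print(array_4D.shape)       # (1, 2, 2, 2)
print(array_4D.ndim)        # 4
print(len(array_4D.shape))  # 4, the same number
```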



Proper indexing for higher dimensions works a little differently from lists. We index using a tuple of indices rather than repeated square brackets. Indexing with multiple square brackets *does* still work for numpy arrays, but it is less readable and often considered bad form.

In [9]:
print("array_2D[1][2]:", array_2D[1][2])  # accessing the second row, third column with repeated square brackets (PYTHON IS ZERO INDEXED!)
print("array_2D[1, 2]:", array_2D[1, 2])  # the same thing, but in the more readable numpy syntax
print("array_2D.shape:", array_2D.shape)

print("\narray_3D[1, 0, 3]:", array_3D[1, 0, 3]) # we can do the same thing for higher dimensions
print("array_3D.shape:", array_3D.shape)

print("\nNotice that the indexing tuples ((1, 2) and (1, 0, 3)) have the same number of \
\nvalues as the shapes. \n\nAgain, this is because we are using one index on each dimension.")

array_2D[1][2]: 6
array_2D[1, 2]: 6
array_2D.shape: (2, 3)

array_3D[1, 0, 3]: 12
array_3D.shape: (3, 2, 4)

Notice that the indexing tuples ((1, 2) and (1, 0, 3)) have the same number of 
values as the shapes. 

Again, this is because we are using one index on each dimension.


This will allow you to access any single value from an array. In the next section, we'll look at indexing out multiple values from arrays.

Before we do this, let's quickly look at some useful ways to initialize arrays, and then do an exercise.

In [10]:
a = np.zeros((2,2))  # Create an array of all zeros with shape (2, 2)
print(a)

[[0. 0.]
 [0. 0.]]


In [11]:
b = np.ones((3,4))   # Create an array of all ones with shape (3, 4)
print(b)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [12]:
c = np.full((2,2), 7) # Create a constant array of shape (2, 2) with values set to 7
print(c)

[[7 7]
 [7 7]]


In [13]:
d = np.eye(2)        # Create a 2x2 identity matrix
print(d)

[[1. 0.]
 [0. 1.]]


In [14]:
e = np.random.random((5,2)) # Create an array of shape (5, 2) filled with random values
print(e)

[[0.96430391 0.90788667]
 [0.8381627  0.91919462]
 [0.62048048 0.59597549]
 [0.42561316 0.7284221 ]
 [0.42074813 0.25610879]]
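
You may have noticed that some of the printouts above show `0.` and `1.` while the `np.full` example shows plain `7`. The sketch below, with illustrative variable names, shows why: `np.zeros` defaults to floats, while `np.full` takes its type from the fill value. Passing the `dtype=` keyword argument is one way to choose the type yourself.

```python
import numpy as np

float_zeros = np.zeros((2, 2))           # default dtype is float64, hence the "0." in the printout
int_zeros = np.zeros((2, 2), dtype=int)  # ask for integers instead

print(float_zeros.dtype)  # float64
print(int_zeros)

int_sevens = np.full((2, 2), 7)      # integer fill value gives an integer array
float_sevens = np.full((2, 2), 7.0)  # float fill value gives a float array
print(int_sevens.dtype)
print(float_sevens.dtype)  # float64
```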


## Exercise 1

Let's make an array!

Create a four dimensional numpy array. The array should have:

* dimension 1 of length 3

* dimension 2 of length 4

* dimension 3 of length 2

* dimension 4 of length 6

* all values set to 102.

You should be able to do this in one short line (without writing out the entire equivalent list).

*If your array prints and your code is one line, you did it right!*


In [15]:
exercise_1_array = None # TODO: change this line

# HINT: take a look at the different initialization functions

assert(exercise_1_array.shape == (3,4,2,6) \
       and np.min(exercise_1_array) == 102 \
       and np.max(exercise_1_array) == 102)
print(exercise_1_array.shape)
print(exercise_1_array)

AttributeError: 'NoneType' object has no attribute 'shape'

Now change the very last value in the array to 201 using indexing.

*If you see that the bottom right value in the print out has changed to 201, you did it right!*

In [None]:
# TODO: add a line of code here



print(exercise_1_array) # You should see the value changed in this print out

## Intro to Indexing Methods

In this section we will see some of the foundational indexing methods that allow us to select more complex arrangements of the values in the arrays.

### Slicing

Similar to Python lists, numpy arrays can be sliced. As a quick reminder, we slice lists using the syntax `list[start:stop:step]`, where `start` is inclusive and `stop` is exclusive. When nothing is specified, the defaults are `start = 0`, `stop = len(list)`, and `step = 1`. If the step value is negative, the defaults become `start = -1` and `stop = -len(list) - 1`, so that something like `list[::-1]` is just the reverse of `list`.



In [16]:
my_list = [2, 4, 6, 7, 9, 6]  # a simple list to do a few slices on

print("my_list[::]:", my_list[::])
print("my_list[:]:", my_list[:])  # these simply index the whole list (there's no reason to do either in practice)

my_list[::]: [2, 4, 6, 7, 9, 6]
my_list[:]: [2, 4, 6, 7, 9, 6]


In [None]:
print("my_list[1::2]:", my_list[1::2])  # starting at index one and stepping by two until the end

In [None]:
print("my_list[2:5]:", my_list[2:5]) # getting the list of values from indexes 2, 3, and 4

In [None]:
print("my_list[::-1]:", my_list[::-1])  # reversing the list
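
To double check the default values described above, the following sketch compares slices with omitted bounds against slices with the defaults written out. The list is the same as `my_list`, but it uses its own name so that running it does not disturb the notebook's variables.

```python
values = [2, 4, 6, 7, 9, 6]
n = len(values)

# Positive step: omitted start/stop behave like 0 and len(values)
assert values[::2] == values[0:n:2]

# Negative step: omitted start/stop behave like -1 and -len(values) - 1
assert values[::-1] == values[-1:-n - 1:-1]

print(values[::2])   # [2, 6, 9]
print(values[::-1])  # [6, 9, 7, 6, 4, 2]
```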

For np arrays, we slice with a tuple of `start:stop:step` indications to specify the slice along each dimension.

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])  # example 3x4 np array for slicing
print("a:\n")
print(a)

b = a[:2, 1:3]
print("\na[:2, 1:3]:\n")  # slicing out row 0 and 1, and column 1 and 2.
print(b)

It's important to note that a slice of an array is a view of the same data, so changing the slice will change the original array and vice versa.

In [None]:
print(a[0, 1])
b[0, 0] = 77  # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])  # we will see that the value has changed
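
If you want a slice that you can modify without touching the original, one option is to call `.copy()` on it. The sketch below uses a fresh array called `grid` (an illustrative name) and `np.shares_memory`, which is not used elsewhere in this notebook, to confirm which arrays share data.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = grid[:2, 1:3]                # a slice: shares memory with grid
independent = grid[:2, 1:3].copy()  # a copy: owns its own data

independent[0, 0] = 77  # does not touch grid
print(grid[0, 1])       # 2

view[0, 0] = 77         # writes through to grid
print(grid[0, 1])       # 77

print(np.shares_memory(grid, view))         # True
print(np.shares_memory(grid, independent))  # False
```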

### Mixing Integer Indexing and Slicing

We have seen indexing with a tuple of integers or a tuple of slices. In this section we will look at indexing with a tuple containing both integers and slices.

When you index with an integer along a given dimension it will index as expected, but that dimension will not show up as a dimension in the output array's shape.

Essentially the values captured will be what you expect, but the shape will be one dimension less than that of the array you indexed from. This is *confusing*, so let's look at some examples to clarify.

In [None]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])  # resetting the previous 3x4 example array

# two ways to grab the middle row:

middle_row_sliced = a[1:2, :] # using all slices will return middle row in a 2D array
print("two slices:", middle_row_sliced)
print("shape:", middle_row_sliced.shape)

middle_row_int_slice = a[1, :] # using one integer and one slice will return middle row in a 1D array
print("\nint index and slice:", middle_row_int_slice)
print("shape:", middle_row_int_slice.shape)

Notice how the dimension we integer indexed into has been removed from the shape. (We used an integer on dimension 1 in the second method.)

Notably, if we want integer indexing to keep the dimension the way slicing does, we can wrap the integer in square brackets:

In [None]:
middle_row_2D_int_slice = a[[1], :] # same values and shape as using two slices, as in the first method above (but a copy of the data, not a view)
print("2D int slice:", middle_row_2D_int_slice)
print("shape:", middle_row_2D_int_slice.shape)
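
One difference between the two approaches is easy to miss: indexing with a list such as `[1]` produces a new array holding copied values, while a slice such as `1:2` is a view of the original. The sketch below, using an illustrative array named `grid`, shows the effect of writing into each result.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_from_slice = grid[1:2, :]  # basic slicing: a view of grid
row_from_list = grid[[1], :]   # index list: a new array with copied values

print(row_from_slice.shape, row_from_list.shape)  # (1, 4) (1, 4)

row_from_list[0, 0] = -1   # grid is unchanged
print(grid[1, 0])          # 5

row_from_slice[0, 0] = -1  # grid changes
print(grid[1, 0])          # -1
```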

If we do something similar but use an integer to index a column, we can predict that the second dimension will be dropped from the shape. This has the side effect of making the column print in a "horizontal orientation", since it just pulls out a 1D array.

The lesson here is to always think about what the shape will be when we use integer indexing -- we wouldn't want any surprises.

In [None]:
# two ways to grab the middle column

col_sliced = a[:, 1:2]  # using all slices to get the column will give a 2D array
print("two slices:")
print(col_sliced)
print("shape", col_sliced.shape)


col_int_sliced = a[:, 1]  # using an integer in the second dimension will remove dimension 2 from the shape
print("\nint index and slice:")
print(col_int_sliced)
print("shape:", col_int_sliced.shape)

### Integer Array Indexing

Integer array indexing uses an array of the indices we want for each dimension. It's a pretty intuitive method that allows us to cherry pick values from an existing array.



In [None]:
print("a:")
print(a, "\n")

# This is integer indexing with an array of row values and an array of column values.
print(a[[0, 1, 2], [0, 1, 0]])

# Thus the above is equivalent to the array constructed element-wise below
print([a[0, 0], a[1, 1], a[2, 0]])

Slicing should be used when the indices follow a regular sequential pattern, because it is faster to write. This means we should only reach for integer array indexing when the indices we want don't follow a uniform pattern.
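
Here is a short sketch of two selections that no single slice can express. The names `grid`, `rows`, and `cols` are illustrative, and `np.arange(3)` (which simply builds the array `[0, 1, 2]`) has not been introduced elsewhere in this notebook.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Pick rows 2, 0, and 1, in that order
picked_rows = grid[[2, 0, 1]]
print(picked_rows)
# [[ 9 10 11 12]
#  [ 1  2  3  4]
#  [ 5  6  7  8]]

# Pick one value from each row: columns 3, 0, 2 for rows 0, 1, 2
rows = np.arange(3)         # array([0, 1, 2])
cols = np.array([3, 0, 2])
one_per_row = grid[rows, cols]
print(one_per_row)          # [ 4  5 11]
```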

### Boolean Indexing

Boolean array indexing is a powerful way to pick out all of the elements of an array that satisfy some condition. It is done by using a comparison operator directly on an array variable:

In [None]:
bool_idx = a > 5  # '>' compares every element of a with 5 and returns an array of booleans
print("a:")
print(a)

print("\na > 5:")
print(bool_idx)

We see that the output array has the same shape, with True everywhere the condition is satisfied and False otherwise.

This output array can then be used to index back into the original arrays to get a 1D array of all the values where the condition was satisfied:

In [None]:
print(a[bool_idx])  # getting a 1D array of all values in a where bool_idx is True

print(a[a > 5]) # equivalently getting a 1D array of all values > 5 in a, with a smooth one-liner (Python is awesome!)
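
Boolean masks can also be combined, and they can be used on the left side of an assignment. The sketch below uses illustrative names (`grid`, `in_range`, `clipped`). Note that NumPy uses `&` and `|` rather than Python's `and` and `or` for element-wise logic, and each condition needs its own parentheses.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Combine two conditions element-wise
in_range = (grid > 5) & (grid < 10)
print(grid[in_range])  # [6 7 8 9]

# A boolean mask can also select the elements to overwrite
clipped = grid.copy()  # copy so that grid itself is untouched
clipped[clipped > 10] = 10
print(clipped)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 10 10]]
```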

We've covered some of the most common methods of array indexing here, and they'll be more than enough for ROB 102.

It's important to note, however, that there are a lot more indexing tricks out there than we can cover here.

## Exercise 2

Let's do an indexing exercise!

Let's start with a random 2D array with shape (12, 12):

In [None]:
mat = np.random.random((12, 12))
print(mat)
# Wow, that's a lot of output. Click the button in the top right of the output box to clear it.

Use slicing to get the lower left quarter of the 12 by 12 array as a 2D array. This should only take one line.

*If your code is one line and your matrix prints, you did it right!*

In [None]:
lower_left = None # TODO: change this line

# yes we are just checking a few values, but it should be sufficient if you are in fact using slicing
assert(lower_left[0, 0] == mat[6, 0] and lower_left[4, 3] == mat[10, 3] and lower_left[5, 4] == mat[11, 4])
print(lower_left)

## Intro to Array Math


A big part of why numpy arrays are so useful is their support for math operations on whole arrays. These operations make the math both compact to write and fast to execute at run time.

### Element-wise Math

The simplest and most common methods of array math are element-wise operations.

We can add, subtract, multiply, divide, and do many other arithmetic operations on arrays of the same shape:

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64) # The 'dtype=' argument sets the type used to store the values.
y = np.array([[5,6],[7,8]], dtype=np.float64) # In this case we use 64-bit floats because we want to do some precise arithmetic.

print("x:")
print(x, "\n")
print("y:")
print(y, "\n")

# two equivalent methods for elementwise addition
print("x + y:")
print(x + y, "\n")
print("np.add(x, y):")
print(np.add(x, y))

In [None]:
# two equivalent methods for elementwise subtraction
print(x - y)
print(np.subtract(x, y))

In [None]:
# two equivalent methods for elementwise multiplication
print(x * y)
print(np.multiply(x, y))

In [None]:
# two equivalent methods for elementwise division
print(x / y)
print(np.divide(x, y))

In [None]:
# element-wise square root
print(np.sqrt(x))

### Math Along Axes

As you might imagine, there are also functions that do math with the elements inside one array. There are tons of these that work on the same basic principle, so for now we'll just look at adding elements together using the sum function:

In [None]:
# making sure we have array_3D initialized
array_3D = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                     [[9, 10, 11, 12], [13, 14, 15, 16]],
                     [[17, 18, 19, 20], [21, 22, 23, 24]]])
print("array_3D:")
print(array_3D, "\n")
print("array_3D.shape:", array_3D.shape)

In [None]:
# summing all the elements in the array
print(np.sum(array_3D))

Summing all the elements is simple. However, the sum function also supports adding along different "axes" using the `axis=` keyword argument.

Axes correspond to dimensions but are 0 indexed, while dimensions are 1 indexed. A 3D array has dimensions 1, 2, and 3 and corresponding axes 0, 1, and 2.

Doing an operation along axis `n` combines the values across every index of the `n+1`th dimension, so that dimension is removed from the shape of the result. This is a *difficult* concept, so take your time with the examples.

In [None]:
# element-wise sum of the arrays returned by indexing into the first dimension
sum_along_dim1 = np.sum(array_3D, axis=0)

print(sum_along_dim1)
print("\nshape:", sum_along_dim1.shape) # the shape will be the shape of the dim 2 and dim 3 (the ones not added along)
print("shape of 3D array:", array_3D.shape)

In [None]:
# element-wise sum of the arrays returned by indexing into the second dimension
sum_along_dim2 = np.sum(array_3D, axis=1)

print(sum_along_dim2)
print("\nshape:", sum_along_dim2.shape) # the shape will be the shape of the dim 1 and dim 3 (the ones not added along)
print("shape of 3D array:", array_3D.shape)

In [None]:
# element-wise sum of the arrays returned by indexing into the third dimension
sum_along_dim3 = np.sum(array_3D, axis=2)

print(sum_along_dim3)
print("\nshape:", sum_along_dim3.shape) # the shape will be the shape of the dim 1 and dim 2 (the ones not added along)
print("shape of 3D array:", array_3D.shape)

Make sure you note how selecting the axis affects the shape (as described in the comments).

Again, this is a tough concept so take a second here. The `axis=` keyword argument is common to many math functions and frequently misunderstood. After understanding it, you'll be a step ahead of most python beginners.
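
If the 3D case is hard to picture, it may help to try the same idea on a small 2D array first. The sketch below uses an illustrative array named `table`. The last line uses `np.max` as one example of another function that accepts `axis=`.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])  # shape (2, 3)

col_totals = np.sum(table, axis=0)  # collapse the first dimension: one total per column
row_totals = np.sum(table, axis=1)  # collapse the second dimension: one total per row

print(col_totals, col_totals.shape)  # [5 7 9] (3,)
print(row_totals, row_totals.shape)  # [ 6 15] (2,)

# The same axis= argument works for other reductions
print(np.max(table, axis=0))         # [4 5 6]
```

In both cases the axis that was summed along disappears from the shape, and the remaining axis keeps its length.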

### Matrix Multiplication

We're aware that linear algebra is not a pre-requisite for this course, so in this section we'll briefly describe the matrix in its numpy array form, and show the numpy function for matrix multiplication.

In Python, we can just think of matrices as 2D arrays of numbers. We'll call them "n by m" matrices, where "n" is the number of rows and "m" is the number of columns.

In [None]:
matrix_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
matrix_3x2 = np.array([[1, 2], [3, 4], [5, 6]])

print("2 by 3 matrix")
print(matrix_2x3, "\n")
print("3 by 2 matrix")
print(matrix_3x2)

In this class, we'll let numpy do the matrix multiplication for us. The one important thing to note is that two matrices can only be multiplied if their inner dimensions are the same. The product then takes on the outer dimensions of the multiplied matrices.

$(n \times m) \cdot (m \times p) = (n \times p)$

In [None]:
print("2 by 3 matrix times 3 by 2 matrix:")
print(np.dot(matrix_2x3, matrix_3x2)) # The inner dimensions are both 3 so we can take the product.
                                      # The outer dimensions are both 2 so we get a 2x2 matrix output.

In [None]:
print("3 by 2 matrix times 2 by 3 matrix:")
print(np.dot(matrix_3x2, matrix_2x3)) # The inner dimensions are both 2 so we can take the product.
                                      # The outer dimensions are both 3 so we get a 3x3 matrix output.

Note that the outer dimensions don't have to be equal to take the product, only the inner dimensions.
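
To see what happens when the inner dimensions do not match, the sketch below tries to multiply a 2 by 3 matrix by itself. NumPy raises a `ValueError` in this case. The `.T` attribute, which is not covered elsewhere in this notebook, returns the transpose (rows and columns swapped) and is used here only to produce a matrix with a compatible shape.

```python
import numpy as np

matrix_2x3 = np.array([[1, 2, 3], [4, 5, 6]])

try:
    np.dot(matrix_2x3, matrix_2x3)  # (2 x 3) times (2 x 3): inner dimensions 3 and 2 differ
except ValueError as err:
    print("Cannot multiply:", err)

# Transposing the second matrix makes it 3 x 2, so the inner dimensions match
product = np.dot(matrix_2x3, matrix_2x3.T)
print(product.shape)  # (2, 2)
```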

Finally, it's good to know that numpy provides a few different ways to take this product:

In [None]:
print(np.dot(matrix_3x2, matrix_2x3), "\n") # we've already seen this method

print(matrix_3x2.dot(matrix_2x3), "\n") # a similar way of accessing the np.dot() function

print(matrix_3x2 @ matrix_2x3, "\n")  # an abbreviated (and often very handy) method to call the same function
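
A common source of confusion is that `*` on numpy arrays is the element-wise multiplication from earlier, not the matrix product. The sketch below puts the two side by side on a pair of small square matrices (the names `left` and `right` are illustrative).

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])

print(left * right)  # element-wise: each entry multiplied by its counterpart
# [[ 5 12]
#  [21 32]]

print(left @ right)  # matrix product: rows of left combined with columns of right
# [[19 22]
#  [43 50]]
```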

This is a pretty good foundation for numpy math.


As always, there is much more to learn. You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

## Exercise 3

Let's do a little array math!

Take the following matrices defined as 2D numpy arrays:

In [None]:
matrix_1 = np.full((4, 7), 1)
matrix_2 = np.full((3, 4), 2)
matrix_3 = np.full((7, 9), 3)
matrix_4 = np.full((2, 4), 4)
matrix_5 = np.full((9, 3), 5)
##print(matrix_1)

Write a valid matrix product that multiplies all five of these matrices together. (The point of this question is figuring out the right order to multiply the matrices and using the numpy matrix multiply operator.)

*If your product prints, then you did it right!*

In [None]:

matrix_product = None # TODO: change this line

# HINT: Look at the dimensions. Think about what matrix has to be all the way on the left of the product.
# note: output 2x4
#     matrix_4 * matrix_1 * matrix_3 * matrix_5 * matrix_2

correct_output = np.full((2,4), 90720)
assert np.array_equal(matrix_product, correct_output)  # checks that the shape and every element match
print(matrix_product)

Now use the sum function on `matrix_product` to get an array of length 2 (the sum of each row) in one line.

*If your 2 long array prints, then you did it right!*

In [None]:
sum_of_matrix = None  # TODO: change this line

correct_output = np.full(2, 362880)  # a 1D array of length 2
assert np.array_equal(sum_of_matrix, correct_output)  # checks that the shape and every element match
print(sum_of_matrix)