# Lab 0: Intro to Jupyter notebooks, review of Python and NumPy

Welcome to the first lab of PSTAT 100! This lab is meant to help you familiarize yourself with Jupyter Notebooks and review Python and NumPy. 

### Objectives
* Keyboard shortcuts, running cells, and viewing documentation in Jupyter Notebooks
* Review functions, lists, and loops
* Review NumPy arrays: indexing, attributes, and operations on arrays

### Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the labs, we ask that you **write your solutions individually** and do not copy them from others.

By submitting your work in this course, whether it is homework, a lab assignment, or a quiz/exam, you agree and acknowledge that **this submission is your own work and that you have read the policies regarding Academic Integrity**: https://studentconduct.sa.ucsb.edu/academic-integrity. The Office of Student Conduct has policies, tips, and resources for proper citation use, recognizing actions considered to be cheating or other forms of academic theft, and students’ responsibilities. You are required to read the policies and to abide by them.

_**If you collaborate with others, we ask that you indicate their names on your submission.**_

### Your name and collaborators

**Name:** Solution

**Collaborators:** N/A

---
## 0. Using Jupyter


If you are unfamiliar with Jupyter Notebooks, we highly recommend that you skim [this short tutorial](http://nbviewer.jupyter.org/github/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb) or select **Help --> Notebook Help** in the menu bar above. 

**Learn the notebook shortcuts**: they will make navigating the notebook and your workflow faster. _(We strongly recommend that you use a computer to work on the Jupyter notebooks; they are not well suited to phones, and some of the shortcuts will not work well on mobile devices.)_

The most important keyboard shortcuts are `Enter`, which enters **edit mode**, and `Esc`, which enters **command mode**.

In edit mode, most of the keyboard is dedicated to typing into the cell's editor. Thus, in edit mode there are relatively few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more. The **Help->Keyboard Shortcuts** dialog lists the available shortcuts.

### Keyboard Shortcuts

Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts (this will save you time in the future). To learn about keyboard shortcuts, go to **Help --> Keyboard Shortcuts** in the menu above. 

Here are a few that we like:
1. `Ctrl` + `Return` : *Evaluate the current cell*
1. `Shift` + `Return`: *Evaluate the current cell and move to the next*
1. `ESC` : *command mode* (may need to press before using any of the commands below)
1. Saving the notebook: `s`
1. Basic navigation: up one cell `k`, down one cell `j`
1. `a` : *create a cell above*
1. `b` : *create a cell below*
1. `dd` : *delete a cell*
1. `z` : *undo the last cell operation*
1. `m` : *convert a cell to markdown*
1. `y` : *convert a cell to code*


Your turn: find out what the following commands do:

* Cell editing: `x, c, v, d, z`
* Kernel operations: `i`, `0` (press twice)


In [None]:
# Practice the above commands on this cell




### Running Cells and Displaying Output


Run the following cell.  

In [None]:
print("Hello, World!")

In Jupyter notebooks, the output of every `print` call is displayed below the cell. In addition, the value of **only the last line** is displayed automatically after the cell runs (provided that line is an expression).

In [None]:
"Will this line be displayed?"

print("Hello" + ",", "world!")

5 + 3

### Viewing Documentation

To output the documentation for a function, use the `help()` function.

In [None]:
help(print)
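`help()` is not limited to built-in functions: it also displays the docstring of functions you write yourself. A small sketch with a hypothetical function `celsius_to_fahrenheit` (not part of the lab):

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# help() prints the function's signature followed by the docstring above
help(celsius_to_fahrenheit)
```

This is one reason to write a docstring for every function you define: it becomes the documentation that `help()` and `Shift` + `Tab` show.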

You can also use Jupyter to view function documentation inside your notebook. The function must already be defined in the kernel for this to work.

Below, click your mouse anywhere on `print()` and use `Shift` + `Tab` to view the function's documentation. 

In [None]:
print('Welcome to this course!')

### Importing Libraries and Magic Commands

In this course, we will be using common Python libraries to help us process data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

In [None]:
import pandas as pd
import numpy as np

A useful magic command is `%%time`, which times the execution of that cell. You can use this by writing it as the first line of a cell. (Note that `%%` is used for *cell magic commands* that apply to the entire cell, whereas `%` is used for *line magic commands* that only apply to a single line. If you are interested, you can read more about the magic commands in this [Tutorials Point article](https://www.tutorialspoint.com/jupyter/ipython_magic_commands.htm)).

In [None]:
%%time

lst = []
for i in range(100):
    lst.append(i)
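Magic commands only work in IPython/Jupyter. In a plain Python script, a rough stand-in for `%%time` is `time.perf_counter` from the standard library. This sketch measures wall-clock time only, whereas `%%time` also reports CPU time:

```python
import time

# Record a high-resolution timestamp before the work being timed
start = time.perf_counter()

lst = []
for i in range(100):
    lst.append(i)

# The difference between the two timestamps is the elapsed wall-clock time
elapsed = time.perf_counter() - start
print(f"Elapsed: {elapsed:.6f} seconds")
```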

---
## 1. Prerequisites

It's time to answer some review questions. Each question has a response cell directly below it. Most response cells are followed by a test cell that runs automated tests to check your work. **Please don't delete questions, response cells, or test cells**. You won't get credit for your work if you do.

If you have extra content in a response cell, such as an example call to a function you're implementing, that's fine.

**Important Note**: Test cells don't always confirm that your response is correct. They are meant to give you some useful feedback, but it's your responsibility to correctly answer the question. There may be other/additional tests that we run when scoring your notebooks. We **strongly recommend** that you check your solutions yourself rather than just relying on the test cells.

### Python

Python is the main programming language we'll use in the course. We expect that you are familiar with it, so we will not be covering general Python syntax. If any of the below exercises are challenging (or if you would like to refresh your Python knowledge), please review one or more of the following materials.

- **[Python Tutorial](https://docs.python.org/3.5/tutorial/)**: Introduction to Python from the creators of Python.
- **[Composing Programs Chapter 1](http://composingprograms.com/pages/11-getting-started.html)**: This is more of an introduction to programming with Python.
- **[Advanced Crash Course](http://cs231n.github.io/python-numpy-tutorial/)**: A fast crash course which assumes some programming background.


You need to make sure that you are comfortable with the following basic concepts (from CS 8):
* Basic datatypes: int, float, string, bool
* Variables and assignment statements
* Basic arithmetic, including exponentiation `**` and modulo `%` (i.e., the remainder from the division); precedence rules
* Boolean values, operators for Boolean logic (`and`, `or`, `not`)
* Nested expressions, compound expressions, built-in functions (e.g., `round`, `max`)
* Conditional statements / nested conditionals
* Defining and running functions, properly using input parameters and distinguishing them from the input arguments
* `import`ing modules and specific functions
* Documenting code through the use of comments (in-line and block comments)
* String concatenation, indexing, slicing, methods (e.g., `upper`/`lower`, `replace`, `strip`)
* Loops: `while` and `for`; how to avoid infinite loops
* Lists: indexing, slicing, methods (e.g., `append`, `sort`)
* Dictionaries: syntax, accessing keys and values, adding new elements
* Generating random numbers
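As a quick self-check, the snippet below touches several of these concepts in one cell. The dictionary contents and the seed value are arbitrary examples, not something the lab relies on:

```python
import random

# Arithmetic: ** is exponentiation, % is the remainder after division
power = 2 ** 10      # 1024
remainder = 17 % 5   # 2

# Strings: methods, indexing, and slicing
word = "  Data Science  ".strip()   # removes the surrounding spaces
print(word.upper())                 # DATA SCIENCE
print(word[0], word[:4])            # D Data

# Dictionaries: look up a value by key, then add a new key
inventory = {"apples": 3, "pears": 5}
print(inventory["apples"])          # 3
inventory["bananas"] = 0

# Random numbers: seeding makes the "random" draw reproducible
random.seed(100)
roll = random.randint(1, 6)         # an integer from 1 to 6, inclusive
print(power, remainder, roll)
```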

#### Question 1a

Write a function `summation` that evaluates the following summation for $n \geq 1$:

$$\sum_{i=1}^{n} \left(i^3 + 5 i^3\right)$$

<!--
BEGIN QUESTION
name: q1a
-->

In [None]:
def summation(n):
    """Compute the summation i^3 + 5 * i^3 for 1 <= i <= n."""
    ...

Use your function to compute the sum for...

In [None]:
# n = 2
...

In [None]:
# n = 20
...
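If you want to double-check the values you computed above, one option is a closed form. The summand simplifies to $6i^3$, and the sum of the first $n$ cubes satisfies the standard identity $\sum_{i=1}^{n} i^3 = \left(\frac{n(n+1)}{2}\right)^2$, so

$$\sum_{i=1}^{n} \left(i^3 + 5 i^3\right) = 6 \sum_{i=1}^{n} i^3 = 6 \left(\frac{n(n+1)}{2}\right)^2.$$

For example, with $n = 2$ this gives $6 \cdot 3^2 = 54$, which matches adding the terms directly: $(1 + 5) + (8 + 40) = 54$.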

### List comprehension

Normally, you would fill a Python list with elements using a for-loop, as in the example below.

In [None]:
squares = []
# Add square numbers from 1 to 10000 inclusive to the list "squares" if they end in the digit "4"
for i in range(1,101):
    if (i**2)%10 == 4:
        squares.append(i**2)
print(squares)

Alternatively, you can create the same list in a single line of code by moving the for-loop and if-statement inside the square brackets that create the list. This is called a **list comprehension**.

The syntax for a list comprehension is this: \[**value** **for-loop** **condition**\]
* **value** is the value you want to put into the list
* **for-loop** is the for-loop that iterates through a list or a range
* **condition** is the if-statement that determines if the **value** is allowed to be inserted into the list

For more information, you can read [this beginner's tutorial](https://www.pythonforbeginners.com/basics/list-comprehensions-in-python) on list comprehensions.

In [None]:
# value: i**2
# for-loop: for i in range(1,101)
# condition: if (i**2)%10 == 4
squares_list_comprehension = [i**2 for i in range(1,101) if (i**2)%10 == 4]
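To confirm that the loop and the comprehension really are interchangeable, this sketch builds the list both ways and compares them. It also shows that the **condition** part is optional (`first_five_squares` is an extra example, not from the lab):

```python
# Loop version
squares = []
for i in range(1, 101):
    if (i ** 2) % 10 == 4:
        squares.append(i ** 2)

# Comprehension version: [value for-loop condition]
squares_list_comprehension = [i ** 2 for i in range(1, 101) if (i ** 2) % 10 == 4]

# Both approaches build exactly the same list
print(squares == squares_list_comprehension)  # True

# Without a condition, every value produced by the for-loop is kept
first_five_squares = [i ** 2 for i in range(1, 6)]
print(first_five_squares)  # [1, 4, 9, 16, 25]
```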

### Aligning lists with `zip()`

If you need to line up two or more lists, Python has a built-in function called **zip()**. It pairs up the elements that share the same index in each list, so you can loop through several lists at once. This is especially handy inside for-loops, as in the example below.

In [None]:
a = [1,2,3,4]
b = [2,3,4,5]
# Print a[0]*b[0], a[1]*b[1],...
for x, y in zip(a,b):
    print(x*y)
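`zip()` accepts any number of lists, and it stops as soon as the shortest one runs out, silently dropping the leftover items. That silent truncation is one reason the functions in the questions below check that both lists have the same length. A sketch with made-up data:

```python
names = ["Ana", "Ben", "Cy"]
scores = [88, 92, 79]
sections = ["A", "B", "A"]

# Each iteration receives the items at one shared index from all three lists
for name, score, section in zip(names, scores, sections):
    print(name, score, section)

# zip stops at the shortest input: the 3 has no partner and is dropped
pairs = list(zip([1, 2, 3], ["a", "b"]))
print(pairs)  # [(1, 'a'), (2, 'b')]
```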

#### Question 1b

Write a function `list_sum` that computes **the square** of _each_ value in `list_1`, **the cube** of _each_ value in `list_2`, and returns a list containing the element-wise sum of these results. Assume that `list_1` and `list_2` have the same number of elements. Try to use a list comprehension to write it all on one line.

<!--
BEGIN QUESTION
name: q1b
-->

In [None]:
def list_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x, y in list_1, list_2. 
    
    Assume list_1 and list_2 have the same length.
    """
    assert len(list_1) == len(list_2), "both args must have the same number of elements"
    ...

## 2. NumPy

NumPy (pronounced "NUM-pie") is the numerical computing module, which we will be using a lot in this course. Here's a quick recap of NumPy. For more review, read the following materials.

- **[NumPy Quick Start Tutorial](https://docs.scipy.org/doc/numpy-1.15.4/user/quickstart.html)**
- **[Stanford CS231n NumPy Tutorial](http://cs231n.github.io/python-numpy-tutorial/#numpy)**

#### Question 2a

The core of NumPy is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

Let's use `np.array` to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets `[` and `]`). 

Below, create an array `arr` containing the values 1, 2, 3, 4, and 5 (in that order).

<!--
BEGIN QUESTION
name: q2a
-->

In [None]:
arr = ...

In addition to the values in the array, we can access attributes such as the array's shape and data type. A full list of attributes can be found [here](https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.ndarray.html#array-attributes).

### Indexing

NumPy arrays are integer-indexed by position, with the first element indexed as position 0. Elements can be retrieved by enclosing the desired positions in brackets `[]`. 

In [None]:
arr[3]

To retrieve consecutive positions, specify the starting index and the ending index separated by `:`, *e.g.*, `arr[from:to]`. This syntax includes the left endpoint but not the right endpoint; notice below that the element at the ending index is *not* included in the output.

In [None]:
arr[2:4]
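A few more slicing patterns, sketched on a separate example array `nums` (so that `arr` is left untouched). In each slice, the start is included and the stop is excluded:

```python
import numpy as np

# A hypothetical example array: 10, 20, 30, 40, 50
nums = np.arange(10, 60, 10)

print(nums[1:3])   # positions 1 and 2 (20 and 30); position 3 is excluded
print(nums[:2])    # omitted start means "from the beginning": 10 and 20
print(nums[3:])    # omitted end means "through the end": 40 and 50
print(nums[-1])    # negative positions count from the end: 50
print(nums[::2])   # a third number sets the step: 10, 30, 50
```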

### Attributes

NumPy arrays have several attributes that can be retrieved by name using syntax of the form `arr.attr`. Some useful attributes are:

* `.shape`, a tuple with the length of each array dimension
* `.size`, the total number of elements in the array (the product of the entries of `.shape`)
* `.dtype`, the data type of the entries (float, integer, etc.)

In [None]:
arr.shape

In [None]:
arr.size

In [None]:
arr.dtype
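The difference between `.shape` and `.size` is easiest to see with a two-dimensional array. A small sketch (the array `m` is an arbitrary example); note that the exact integer `dtype` can differ between platforms:

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.shape)   # (2, 3): the length of each dimension
print(m.size)    # 6: the total number of elements (2 * 3)
print(len(m))    # 2: len() gives only the length of the first dimension
print(m.ndim)    # 2: the number of dimensions
print(m.dtype)   # an integer type such as int64 (platform dependent)
```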

Arrays, unlike Python lists, **cannot store items of different data types**.

In [None]:
# A regular Python list can store items of different data types
[1, '3']

In [None]:
# Arrays will convert everything to the same data type
np.array([1, '3'])

In [None]:
# Another example of array type conversion
np.array([5, 8.3])
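The conversion happens silently, so it is worth checking `.dtype` after creating an array. Converting back has to be requested explicitly, for instance with the `.astype` method, as in this sketch:

```python
import numpy as np

mixed = np.array([1, '3'])
print(mixed.dtype.kind)        # 'U': every element became a (Unicode) string

as_ints = mixed.astype(int)    # explicitly convert the strings back to integers
print(as_ints + 1)             # arithmetic now works: 2 and 4

upcast = np.array([5, 8.3])
print(upcast.dtype)            # float64: the integer 5 is stored as 5.0
```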

### Operations on arrays

Arrays are also useful in performing *vectorized operations*. Given two or more arrays of equal length, arithmetic will perform **element-wise computations** across the arrays. 

For example, observe the following:

In [None]:
# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]

In [None]:
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])
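Element-wise behavior extends beyond addition. In the sketch below (arrays chosen arbitrarily), operators work element by element, a single number (a scalar) is combined with every element, and comparisons return Boolean arrays that can be used inside `[]` to keep only the matching elements, a technique often called Boolean masking:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

print(x * y)      # element-wise product: 10, 40, 90, 160
print(x ** 2)     # operators apply to every element: 1, 4, 9, 16
print(x + 100)    # the scalar is added to each element: 101, 102, 103, 104

# Comparisons are element-wise too and produce a Boolean array
mask = x > 2
print(mask)       # False, False, True, True

# Indexing with a Boolean array keeps only the positions where it is True
print(x[mask])    # 3, 4
```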

#### Question 2b

Given the array `random_arr`, assign `valid_values` to an array containing all values $x$ such that $2x^4 > 1$.

<!--
BEGIN QUESTION
name: q2b
-->

In [None]:
# for reproducibility - setting the seed will result in the same random draw each time
np.random.seed(42)

# draw 60 values uniformly at random from the interval [0, 1)
random_arr = np.random.rand(60)

# solution
valid_values = ...

#### Question 2c

Use NumPy to recreate your answer to Question 1b. The input parameters will both be lists, so you will need to convert the lists into arrays before performing your operations.

**Hint:** Use the [NumPy documentation](https://docs.scipy.org/doc/numpy-1.15.1/reference/index.html). If you're stuck, try a search engine! Searching the web for examples of how to use modules is very common in data science.

<!--
BEGIN QUESTION
name: q2c
-->

In [None]:
def array_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x, y in list_1, list_2. 
    
    Assume list_1 and list_2 have the same length.
    
    Return a NumPy array.
    """
    assert len(list_1) == len(list_2), "both args must have the same number of elements"
    ...

You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays.

For ten numbers, `list_sum` and `array_sum` both take a similar amount of time.

In [None]:
sample_list_1 = list(range(10))
sample_array_1 = np.arange(10)

In [None]:
%%time
list_sum(sample_list_1, sample_list_1)

In [None]:
%%time
array_sum(sample_array_1, sample_array_1)

The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that `list_sum` executes faster than `array_sum`! However, we will commonly be working with much larger datasets:

In [None]:
sample_list_2 = list(range(100000))
sample_array_2 = np.arange(100000)

In [None]:
%%time
# Assigning the result to a variable keeps the (very long) output from being displayed
result_list = list_sum(sample_list_2, sample_list_2)

In [None]:
%%time
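# Assigning the result to a variable keeps the (very long) output from being displayed
result_array = array_sum(sample_array_2, sample_array_2)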

With the larger dataset, the NumPy version should execute much faster, often by a factor of 50 or more, although the exact speedup depends on your machine! Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly.
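A single `%%time` run can be noisy, so repeating a measurement several times gives a steadier comparison. This sketch uses the standard-library `timeit` module; the doubling operation is a stand-in chosen for illustration (it is not one of the lab's functions), and the numbers you see will vary by machine:

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n)

# The same computation (double every value) written two ways
def double_list():
    return [v * 2 for v in values_list]

def double_array():
    return values_array * 2

# timeit calls each function `number` times and returns the total seconds
list_time = timeit.timeit(double_list, number=20)
array_time = timeit.timeit(double_array, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {array_time:.4f} s")
print(f"speedup:            {list_time / array_time:.1f}x")
```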

#### A note on `np.arange` and `np.linspace`

Usually we use `np.arange` to return an array that steps from `a` to `b` with a fixed step size `s`. While this is fine in some cases, we sometimes prefer to use `np.linspace(a, b, N)`, which divides the interval `[a, b]` into N equally spaced points.

`np.arange(start, stop, step)` produces an array with all the numbers starting at `start`, incremented by `step`, stopping **before** `stop` is reached. For example, the value of `np.arange(1, 6, 2)` is an array with elements 1, 3, and 5: it starts at 1 and counts up by 2, then stops before 6. `np.arange(4, 9, 1)` is an array with elements 4, 5, 6, 7, and 8. (It doesn't contain 9 because `np.arange` stops _before_ the stop value is reached.)
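The examples from the paragraph above, plus the shorter forms where `step` (and `start`) take their default values:

```python
import numpy as np

print(np.arange(1, 6, 2))   # 1, 3, 5: starts at 1, counts up by 2, stops before 6
print(np.arange(4, 9, 1))   # 4, 5, 6, 7, 8: 9 itself is excluded
print(np.arange(4, 9))      # same as above: the step defaults to 1
print(np.arange(5))         # 0, 1, 2, 3, 4: with one argument, start defaults to 0
```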

By default, `np.linspace` includes **both endpoints**, while `np.arange` does **not** include the second endpoint `b`. For this reason, we tend to prefer `np.linspace`, especially when we are plotting ranges of values.

Notice how the following two statements have different parameters but return the same result.

In [None]:
np.arange(-5, 6, 1.0)

In [None]:
np.linspace(-5, 5, 11)
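A sketch that checks the two results against each other and shows two more `np.linspace` behaviors: it chooses the spacing for you from the number of points, and its `endpoint=False` option drops the right endpoint:

```python
import numpy as np

a = np.arange(-5, 6, 1.0)    # stop is 6 so that 5 is included
b = np.linspace(-5, 5, 11)   # 11 points, both endpoints included
print(np.allclose(a, b))     # True

# 5 evenly spaced points from 0 to 1 are 0.25 apart
quarters = np.linspace(0, 1, 5)
print(quarters)              # 0, 0.25, 0.5, 0.75, 1

# endpoint=False excludes the right endpoint, much like arange's stop
no_end = np.linspace(0, 1, 4, endpoint=False)
print(no_end)                # 0, 0.25, 0.5, 0.75
```

`np.allclose` is used rather than `==` because comparing floating-point arrays for exact equality can fail on tiny rounding differences.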

---
# Submission


1. Make sure you **save the notebook** first.
2. Then go up to the `Kernel` menu and select `Restart & Clear Output` (make sure the notebook is saved first, because otherwise, you will lose all your work!). 
3. Now, go to `Cell -> Run All`. Carefully look through your notebook and verify that all computations execute correctly. You should see **no errors**; if there are any errors, make sure to correct them before you submit the notebook.
4. Then, go to `File -> Download as -> Notebook` and download the notebook to your own computer.
5. Upload the notebook to Gradescope.