# SPS 'Python for Research' Workshop Series - Lecture 3 (01/31/2021)

**Welcome to the third session of UChicago SPS' introductory Python workshop series!** The instructor for today's workshop is William Cerny (willcerny@gmail.com). This notebook's checkpoint activity was created by Jared Siegel.

# Topics for the Day

**- Quick Introduction to range() and reminder of for loops**  <br/> 
**- Recap and Further Examples of Functions** (thanks for the feedback!)  <br/> 
**- Introduction to Packages, generally** <br/>
**- Introduction to "numpy": the numerical Python package** <br/>
**- Checkpoint Coding Exercise**

# -----------------------------------------------------------------------------------------------------------

## Tiny Add-on to Last Week's Lecture: the range() function

In using for loops, sometimes you want an operation to execute a fixed number of times. Here, it is useful to introduce a function called range(). Its use is best shown by example:

In [6]:
for index in range(5,10):
    print(5 * index)

25
30
35
40
45


The loop above used range(5,10), which runs from 5 up to 9: the stop value (10) is not included. With a single argument, as in range(10), the loop runs through 10 total numbers, starting from 0 and going up to 9. While this may seem like an unintuitive scheme, it is useful for iterating through a list. See below:

In [2]:
zoo = ['cat','dog','frog','zebra','horse','elephant']

print('The zoo has',len(zoo),'animals')

for i in range(len(zoo)):
    print(i,zoo[i])

The zoo has 6 animals
0 cat
1 dog
2 frog
3 zebra
4 horse
5 elephant


In the above, we have used range(), paired with list indexing, to iterate through each element in our starting list. This is essentially why range() is designed to span 0 to (given argument - 1): range(len(zoo)) can never produce an invalid index for the list.

Unfortunately, the cell below shows a quirk of range(): printing it directly does not display the values it contains. We'll return to this issue later today.

In [3]:
print(range(10))

range(0, 10)
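As a small aside, one way to see the values inside a range without writing a loop is to convert it to a list with the built-in list() function. The sketch below also shows the three ways range() can be called; the variable names are made up for illustration.

```python
# range(stop): starts at 0 and stops BEFORE stop
first_ten = list(range(10))
print(first_ten)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# range(start, stop): starts at start and stops BEFORE stop
five_to_nine = list(range(5, 10))
print(five_to_nine)    # [5, 6, 7, 8, 9]

# range(start, stop, step): counts upward in jumps of step
evens = list(range(0, 10, 2))
print(evens)           # [0, 2, 4, 6, 8]
```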


# -----------------------------------------------------------------------------------------------------------

# A Recap of Functions in Python

I'll try to clarify some of the jargon as we go along. To start,

### A "function" in Python is just a block of code that does something when it is "called" (used).

We have encountered functions in two contexts so far. The first is the **default** (built-in) functions that come with Python. For example, 

In [7]:
mylist = [6,7,8,9]
length = len(mylist)
print(length)

4


Here, we have **called** the default Python function len() to find the length of the pre-defined list "mylist". In this case, we refer to mylist as an **argument** (input) of the function len(). 

The second kind of functions we have encountered came up last week: **user-defined** functions. An example with all the pieces labelled is shown below:

In [9]:
def functionName(argument1, argument2): ## function starts with "def", then the function's name
    '''
    Optional but highly encouraged: add text separated by lines of triple quotes to include details about 
    what your function actually does. This is called the "docstring" of your function. This needs to be indented.
    '''

    ### Now, the "meat" of your code goes here.
    ### Be sure to INDENT all lines of code within the function using a tab or 4 spaces! 
    result = argument1 - argument2 
#     print(result)
    return result ## this tells your function what to output.


Note the key point that argument1 and argument2 are effectively placeholders: you don't need to define them in advance.

We can now **call** this function either using pre-defined variables or just giving arguments directly:

In [10]:
## using pre-defined variables (recommended)

v1 = 8
v2 = 5
 
result = functionName(v1, v2) ### stores the returned value from the function in a variable
print(result)

3


The above would be fully equivalent to the following:

In [11]:
result = functionName(8, 5)  ## using numbers directly rather than pre-defined variables.
print(result)

3


**Note: The order of the arguments matters!!**

In [12]:
v1 = 8
v2 = 5

# Now with the ordering of the arguments switched!
result2 = functionName(v2, v1) 
print(result2)

-3


## A couple new nuances about function arguments

### (1) If you are worried about ordering of function arguments (which is more common than you might think), you can explicitly spell out which of your "inputs" corresponds to which argument of the function:

In [13]:
v1 = 8
v2 = 5

## now there is no ambiguity which input goes to which placeholder
result2 = functionName(argument1 = v1, argument2 = v2) 
print(result2)

3


Now, with this explicit labelling scheme, we can safely reverse the order of the arguments if desired:

In [14]:
v1 = 8
v2 = 5


result2 = functionName(argument2 = v2,argument1 = v1) 
print(result2)

3


As expected, the result above is the same as from a few cells ago.
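A related detail worth sketching: positional and labelled (keyword) arguments can be mixed in one call, as long as the positional ones come first. The function `subtract` below is a hypothetical stand-in for functionName, redefined so the cell runs on its own.

```python
def subtract(argument1, argument2):
    '''Return argument1 minus argument2.'''
    return argument1 - argument2

# positional first, then a labelled argument: allowed
mixed = subtract(8, argument2=5)
print(mixed)     # 3

# all labelled, in any order: allowed
swapped = subtract(argument2=5, argument1=8)
print(swapped)   # 3

# A labelled argument BEFORE a positional one is a SyntaxError,
# so the line below is left commented out:
# subtract(argument1=8, 5)
```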

### (2) It is often useful to have what are called "default" arguments, which stand in for the value of an argument if one is not provided. For example,

In [15]:
def calculate_rocket_force(mass, acceleration_upwards, considerGravity = False):
    '''
    Calculate the net force experienced by a rocket, either with or without gravity.
    '''
    if considerGravity != True: ### != means "not equal to". Here, this is the same as saying "== False"
        return mass * acceleration_upwards
    
    else:
        return mass * acceleration_upwards - (mass * 9.8) 

Because we specified a default value for "considerGravity" in the function definition, we need not provide it when calling the function:

In [16]:
m = 10
a = 10

## In the line below, we are NOT specifying considerGravity
net_force =  calculate_rocket_force(mass = m, acceleration_upwards =  a) 
print(net_force)

100


Of course, if we don't want this, we can just provide a value ourselves:

In [17]:
m = 10
a = 10

grav = True

## In the line below, we ARE specifying considerGravity
net_force =  calculate_rocket_force(mass = m, acceleration_upwards =  a, considerGravity = grav) 
print(net_force)

2.0


or, in short:

In [18]:
print(calculate_rocket_force(m,a))
print(calculate_rocket_force(m,a,True))

100
2.0


####  While this may seem like a totally pedantic nuance, it turns out to be very important when using scientific functions provided by packages. They could be making some default assumption that you might not be aware of, so be careful!
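To get a feel for this using only default Python, here is a small sketch with two built-in functions that quietly make default assumptions: int() assumes the text is written in base 10, and sorted() assumes you want ascending order. The variable names are just for illustration.

```python
# int() assumes base 10 unless told otherwise
as_decimal = int('101')           # default: base = 10
as_binary = int('101', base=2)    # same text, read as a binary number
print(as_decimal, as_binary)      # 101 5

# sorted() assumes ascending order unless told otherwise
scores = [3, 1, 2]
ascending = sorted(scores)                  # default: reverse = False
descending = sorted(scores, reverse=True)
print(ascending, descending)      # [1, 2, 3] [3, 2, 1]
```

The same input gives different answers depending on a value you never typed, which is exactly why it pays to check a function's defaults.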

# -----------------------------------------------------------------------------------------------------------

# Function Practice Example #1

### Task: Write a function that takes in a distance (in meters) and returns the amount of time (in seconds) it would take for light to travel that distance. Include a boolean argument "roundtrip" which, if True, finds the round-trip travel time. Have the function assume by default that the trip is one-way.

In [None]:
## Solution Goes Here.

### Test Cases:

(1) Calculate the time it takes for light to travel 3 * 10^8 meters (one-way). You can guess what the answer should be! <br/>
(2) Calculate the time it takes for light to travel 1 foot (.3048 meters) <br/>
(3) Calculate the round trip travel time for light to travel from the Sun to Earth and back (the Sun-Earth distance is ~150 billion meters, so ~300 billion meters in total)
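For reference after you have tried it yourself, here is one possible solution sketch. It is not the only way to do it; the names `light_travel_time` and `SPEED_OF_LIGHT` are my own choices, and the Sun-Earth distance is taken as roughly 1.5e11 meters one way.

```python
SPEED_OF_LIGHT = 299792458  # meters per second

def light_travel_time(distance, roundtrip=False):
    '''
    Return the time (in seconds) for light to travel `distance` meters.
    If roundtrip is True, the distance is covered twice (there and back).
    '''
    if roundtrip:
        distance = 2 * distance
    return distance / SPEED_OF_LIGHT

print(light_travel_time(3e8))                     # test case 1: just over 1 second
print(light_travel_time(0.3048))                  # test case 2: about a nanosecond
print(light_travel_time(1.5e11, roundtrip=True))  # test case 3: roughly 1000 seconds
```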

# -----------------------------------------------------------------------------------------------------------

# Function Practice Exercise (on your own time)

### Task: Write a function that calculates whether an input integer number is divisible by EITHER a single number or a list of numbers (see test cases for an example to clarify). The function should take only two arguments: the number you are testing, and a second argument for the second number/list. If the second input is a single number, then return a single True or False. If the second input is a list, return a list of True/False.

Bonus: add a safeguard of some kind in case the first number is <= 0.

In [None]:
## Solution Goes Here.

### Test Cases:

(1) Test your function with the first input as 42 and the second input as 14. The answer should be True. <br/>
(2) Test your function with the first input as 13 and the second input as a list of all the integers from 2 through 9. The resulting list should be all False because 13 is prime. (If you include 1 in the list, its entry will be True, since every integer is divisible by 1.) <br/>
(3) Test your function with the first input as 15 and the second input as [1,3,5,7]. This should return [True, True, True, False]

### Extra Bonus Problem: Repeat the above, but adapt the list-input case to return a SINGLE boolean that is True if the first input is divisible by ALL values in the list given as input. This will help you practice boolean logic.

In other words, if the first number is 20 and the second input is the list [1,2,3,5,10], you should return a single value of "False" because 20 is not divisible by 3. If the second input was the list [1,2,5,10], the result should be a single value of "True".

# -----------------------------------------------------------------------------------------------------------

# Introduction to Packages

Everything we have done thus far has used default, out-of-the-box Python. Although you can solve a huge range of problems using just default Python, there are plenty of common tasks that others have implemented for us that we can use. By doing so, we can avoid "re-inventing the wheel". **The way we access these useful utilities created by others is through "packages" (also called by the mostly equivalent name "modules").**

As noted in Lecture 1, there are more than 300,000 existing Python packages that you could use. In practice, you only need a very small number of packages to do almost any research task. **The great news is that by using Anaconda, you already have the most important packages pre-installed!** For that reason, installing additional packages is beyond the scope of this lecture.

## Importing Packages

Importing Python packages is easy. See examples:

In [19]:
import numpy # numpy = short for "numerical Python" = we'll use this a lot later today
import matplotlib # ubiquitously-used plotting library (more next week)
import pandas # a package for manipulating tabular data 
import scipy # scipy = short for "scientific Python" = we'll use this in week 6

This only needs to be done ONCE in your code/notebook. It then applies to all cells.

**This workshop series will focus exclusively on these four, because they are likely the most useful by a huge margin.**

For completeness, here's a list of packages that Anaconda comes pre-installed with. I personally only have interacted with a small fraction of these, since, as noted before, only a small number of packages are actually needed to solve a huge range of research problems. Link to list: https://docs.anaconda.com/anaconda/packages/py3.7_win-64/

## Key Point #1: In general, the reason you import a package is because it contains useful FUNCTIONS.

For example, let's say that you want to import a function from a package to solve our earlier issue with range(). In other words, let's say we want to print all numbers from 0 to 9. To do so:

In [24]:
numpy.arange(10) 

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This is **IDENTICAL** to the functions we have been working with up to now: it has a function name (here, "arange"), followed by a pair of parentheses wrapping around the argument (here, 10). **The only change is that the function name is now preceded by the name of the package from which it came.**

### The most obvious exception to this general point is that sometimes you import packages to get specific mathematical or physical constants:

In [25]:
import math # admittedly this is a pretty hilarious line of code
print('The value of pi is', math.pi)
print('The value of Euler\'s number is', math.e)

The value of pi is 3.141592653589793
The value of Euler's number is 2.718281828459045


In [27]:
math.e**6

403.428793492735

## Key Point #2: In general, we often abbreviate the names of packages when we import them in order to save time later on. 

In particular, we ALWAYS import numpy as follows:

In [28]:
import numpy as np ## the "nickname" of the package is now np.

The example from a few cells back now can be written:

In [29]:
np.arange(10) ## identical to numpy.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
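To put the import styles side by side, here is a small sketch using the built-in math package. The third form, `from ... import ...`, has not been covered in this lecture; it is included as an extra you will likely run into in other people's code.

```python
import math              # full name: functions are called as math.sqrt(...)
import math as m         # nickname: functions are called as m.sqrt(...)
from math import sqrt    # import one function only: called as sqrt(...)

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)
print(a, b, c)   # 4.0 4.0 4.0
```

All three lines call the exact same function; only the name you type changes.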

## Key Point #3: Google is your friend. There isn't a nice and easy list of all the useful functions you might need and which packages they're from, but a simple google search will tell you! This is not something to be ashamed of: it is a VITAL part of research coding!

# ---------------------------------------------------------------------------------------------------------

# More Fun with Numpy

Above, we presented an example using the numpy package. **I claim with nearly 100% confidence (but without a source) that numpy is both the most popular and most important package in all of Python.** For this reason, it's worth a bit of a tour of what numpy can do.

## Key Point 1: The main purpose of numpy is to allow for efficient data handling and computation, including for multidimensional collections of objects. Numpy achieves this through a new data type called an "array". 

This might seem like something trivial, but it actually turns out to be super relevant. Consider the following example:

In [31]:
# define a list using default Python
list1 = [1,3,5]

## numpy.array is a function that takes a list as an argument and converts it into an array
array1 = np.array([1,3,5]) 

In [32]:
print(list1)

[1, 3, 5]


In [33]:
print(array1)

[1 3 5]


Let's say we want to add 10 to each element in list1 and array1. You might think of trying:

In [34]:
list1 + 10

TypeError: can only concatenate list (not "int") to list

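but evidently that returns an error. Default Python doesn't think you can simply add a single number to a list, because they are incompatible datatypes. Contrast that with numpy, where arithmetic on an array is applied to every element (the cell below multiplies each element by 5 and then adds 2):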

In [37]:
array1  * 5 + 2

array([ 7, 17, 27])
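For the original goal of adding 10 to each element, the two approaches can be sketched as follows. With a list we have to loop ourselves; with an array, the "+ 10" is applied everywhere at once. The names `list_plus_10` and `array_plus_10` are just for illustration.

```python
import numpy as np

list1 = [1, 3, 5]
array1 = np.array(list1)

# default Python: loop over the list and build a new one
list_plus_10 = []
for value in list1:
    list_plus_10.append(value + 10)
print(list_plus_10)    # [11, 13, 15]

# numpy: "+ 10" is applied to every element at once
array_plus_10 = array1 + 10
print(array_plus_10)   # [11 13 15]
```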

The same contrast between lists and arrays shows up when you apply a function to the whole collection:

In [38]:
def cube_root(input_collection):
    return input_collection**(1/3)

In [39]:
## list version
my_list = [8,27,64]
cube_root(my_list)

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'float'

In [40]:
## array version
my_array = np.array(my_list)
cube_root(my_array)

array([2., 3., 4.])

#### We often call code that operates over the whole array "vectorized". For large arrays, vectorized code is usually much faster than an equivalent explicit loop, because loops are slow in Python.
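If you want to check the speed difference on your own machine, a rough sketch is below. It computes the same sum of squares twice, once with a loop and once with vectorized numpy, and times each with the built-in time package. The array size is an arbitrary choice, and the timings will vary from computer to computer.

```python
import time
import numpy as np

values = np.arange(100_000, dtype=np.int64)

# loop version: square and add the elements one at a time
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += int(v) ** 2
loop_seconds = time.perf_counter() - start

# vectorized version: square the whole array, then sum it
start = time.perf_counter()
vector_total = int(np.sum(values ** 2))
vector_seconds = time.perf_counter() - start

print(loop_total == vector_total)   # True: both give the same answer
print('loop:', loop_seconds, 'seconds')
print('vectorized:', vector_seconds, 'seconds')
```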

### Multidimensional Arrays

In [None]:
multi_array = np.array([[1,3,5],[2,4,6],[3,5,7]])
print(multi_array)

#### Example of linear algebra fun that this allows:

In [None]:
multi_array.T
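Since the two cells above were not run in this notebook, here is a sketch of what to expect. It builds the same array, checks its shape, picks out one element with a [row, column] index, and takes the transpose with .T (rows become columns).

```python
import numpy as np

multi_array = np.array([[1, 3, 5], [2, 4, 6], [3, 5, 7]])

print(multi_array.shape)    # (3, 3): 3 rows and 3 columns
print(multi_array[0, 2])    # row 0, column 2 -> 5

transposed = multi_array.T  # rows become columns
print(transposed)
# [[1 2 3]
#  [3 4 5]
#  [5 6 7]]
```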

## Key Point 2: The other main purpose of numpy is to provide hundreds of useful functions for numerical computing.

Examples:

In [41]:
print(np.cos(10)) ## cosine function

-0.8390715290764524


In [44]:
np.linspace(0,5,51) ## generate 51 evenly spaced numbers between 0 and 5

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5. ])

In [None]:
# use submodule "random" from numpy to generate random numbers 

In [45]:
import numpy.random as rand

for i in range(5): ## repeat 5 times; randint(0,10) returns a random integer from 0 to 9 (10 is excluded)
    print(rand.randint(0,10))

4
8
5
4
9
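Two things worth knowing here, sketched below: the upper limit is excluded (so the values run from 0 to 9), and you can ask for many random numbers in one call instead of looping. This sketch uses numpy's newer random number generator interface, np.random.default_rng, rather than the randint function above; the seed value 42 is an arbitrary choice that makes the "random" numbers repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seed chosen arbitrarily, for repeatability
draws = rng.integers(0, 10, size=5)     # 5 random integers from 0 to 9 (10 is excluded)
print(draws)

# creating a new generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
draws_again = rng_again.integers(0, 10, size=5)
print(np.array_equal(draws, draws_again))   # True
```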


In [None]:
## demonstrate ? here
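In a Jupyter notebook, typing a function name followed by ? (for example, np.median?) brings up its documentation. That syntax only works inside Jupyter/IPython, so the sketch below shows the plain-Python equivalents: the built-in help() function and the __doc__ attribute, which holds a function's docstring. The function `cube` is a made-up example.

```python
def cube(x):
    '''Return x raised to the third power.'''
    return x ** 3

# the docstring we wrote is stored on the function itself
print(cube.__doc__)   # Return x raised to the third power.

# help() prints the documentation for any function, including built-in ones
help(len)
```

This is also a good reason to write docstrings for your own functions: they show up in exactly the same way.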

In [46]:
np.median([1,3,5,7,9])

5.0

# We'll go over more next week. Now, for a checkpoint exercise!