# ULAB Physics & Astronomy: Python Module IV

In [1]:
%load_ext autoreload
%autoreload 2
import tests

#### Start this module early! It helps to thoroughly understand each part before moving forward!

## Imports and Installation

`import` tells Python to get code from somewhere else: either the directory you're already in, or a predetermined path where packages go. Importing allows you to build on code someone else wrote and avoid rewriting code that is already well established (e.g. the ```exp``` function in the built-in ```math``` module).
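As a small illustration that only needs the standard library, the sketch below imports `math` and calls its `exp` function. The variable name `e_squared` is made up for this example.

```python
import math

# exp lives inside the math module, so we reach it through the module name
e_squared = math.exp(2)
print(e_squared)

# the same quantity built from the constant math.e, for comparison
print(math.e ** 2)
```

The two printed numbers should agree to many decimal places, since both compute $e^2$.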

### Local Imports

To start, we'll import code from the file `utils.py`, which is located in the local directory (the folder in which this Jupyter notebook is located). It's also possible to specify imports from a subdirectory or from one level 'up' from where we are, but we won't discuss that at the moment.

This file contains some functions, of which we only care about two for now: `fibonacci` and `factorial`. Note that both of these functions have docstrings!

In [2]:
import utils

We now have these two functions accessible to us under the `utils` namespace. In order to reference the `fibonacci` or `factorial` functions, we have to specify that they come from `utils` using the syntax `utils.function_name`. (This way there is no ambiguity when two modules define functions with the same name!)

In [3]:
utils.factorial(3)

6

Read the docstrings of the ```fibonacci``` and ```factorial``` functions! (Hint: press shift-tab-tab with your cursor inside the parentheses, as in ```utils.fibonacci(|)```.) You can alternatively use the `help` function: `help(utils.factorial)`.

In [None]:
utils.fibonacci()

It's cumbersome to keep specifying `utils`, so we can also import functions directly and call them without referencing their namespace.

In [None]:
from utils import fibonacci, factorial

# if "x is y", then x and y refer to the same object
factorial is utils.factorial
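The same two import styles work for any module. Here is a minimal sketch using the standard library's `math` module in place of `utils`, so it runs without any local files:

```python
import math
from math import sqrt

# both names point at the very same function object
print(sqrt is math.sqrt)

# so calling either one gives the same answer
print(sqrt(16), math.sqrt(16))
```

Importing a name directly does not make a copy of the function; it only gives you a shorter way to refer to it.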

### External imports

This is when we're importing code from somewhere other than our local directory. Python has a hidden path where it stores all of its packages, but we won't interact with that: instead, we'll interact with `pip`, a tool we've been using to get the utilities to run these modules!

As long as a Python package is published on PyPI (the Python Package Index, which is where `pip` looks for packages), we can install it by running `pip install <package_name>` from a terminal, or in a Jupyter notebook if preceded by an exclamation mark:

In [None]:
!pip install pandas

This allows us to import the package `pandas` and work with it in the exact same way we were working with `utils` above. Note that sometimes we give packages shorter names while we're working with them ("aliases") so that we can make our code more brief: in this case, `pandas` is given the alias `pd`.

In [None]:
# We'll discuss the pandas package soon!
# If you have pandas installed, this line will run without error!
import pandas as pd
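Aliases work for any module, not just `pandas`. As a sketch, here the standard library module `statistics` is given the alias `stats`; the alias and the `values` list are chosen only for this example.

```python
import statistics as stats

values = [2, 4, 4, 5]

# the module is now reachable only through its alias
print(stats.mean(values))
```

After this import, the name `stats` refers to the module, while the name `statistics` is not defined unless you import it separately.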

We can also uninstall packages with `pip uninstall <package_name>`. Because `pip uninstall` asks for confirmation before removing anything, it is awkward to run from a Jupyter notebook; ideally you should be running `pip` commands from the terminal. For a full list of `pip` functionality, see [here](https://pip.pypa.io/en/stable/user_guide/).
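Before installing something, it can be handy to check whether Python can already find it. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` and the nonsense package name are assumptions made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# json ships with Python, so it should always be found
print(is_installed("json"))

# a made-up name that should not correspond to any installed package
print(is_installed("surely_not_a_real_package_name"))
```

This only tells you whether an import would find the package; it does not install or update anything.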

## Question 0 (10 pts)

Using `pip`, install the package `numpy`. Then import `numpy` and give it the alias `np`. 

In [None]:
#use this cell for installation

In [None]:
#use this cell for importing

`numpy` also has a factorial function. Read the `numpy` documentation to determine how to access this function. Finally, run both this function and `utils.factorial` for some input, and show that you get the same result!

In [None]:
f = ...
print(...)
print(...)

## Lists and list comprehensions

As we saw in lecture, we can put any kind of data into a list by placing square brackets `[]` around it, and we can *index into* it by specifying the position of the element we want.

In [None]:
letters = ['a', 'b', 'c', 'd']
letters[2]

Use the cell below to display 'd'. What does `letters[2]` display? What does this tell you about how lists are indexed?

In [None]:
#display 'd'

Lists have their own notion of "adding" and "multiplying": you can add lists to one another, and you can multiply a list by a number to repeat it.

In [None]:
more_letters = ['f', 'g', 'f', 'g', 'h']
print("Added lists:       ", letters + more_letters)
print("Multiplied lists:  ", letters * 3)

In [None]:
#use this cell to display 'hihihihihi'

Note that these operations don't change the original lists; if we want to do that, we can either assign the result to the original name like we would any other variable, or we can use certain built-in list methods. We'll cover four of these methods: `append`, `extend`, `index`, and `remove`.

1. `append` adds an item to the end of a list.
2. `extend` adds all the elements of another list to the end of a list.
3. `index` finds the position of an element we want in the list.
4. `remove` removes an item from a list.

Let's look at how to use all four!

In [None]:
letters = ['a', 'b', 'c', 'd']
print(letters)

In [None]:
# add 'e' to the existing letters list
letters.append('e')
print(letters)

In [None]:
# place all the elements of more_letters list at the end of the `letters` list
letters.extend(more_letters)
print(letters)

In [None]:
# find the first instance of the letter 'f' in letters
idx = letters.index('f')
print(idx)

In [None]:
# remove the first instance of the letter 'f' in 'letters'
letters.remove('f')
print(letters)

Another useful feature is that we can check whether a list contains a certain value with the keyword `in`:

In [None]:
print('d' in letters)
print('q' in letters)
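To tie these operations together, here is a short sketch on a separate list (the name `colors` is made up for this example) that contrasts building a new list with changing a list in place, and uses `in` to guard a call to `remove`:

```python
colors = ['red', 'green']

# + builds a brand-new list; `colors` itself is untouched
longer = colors + ['blue']
print(colors)
print(longer)

# append changes `colors` in place
colors.append('blue')
print(colors)

# index and remove raise a ValueError when the item is missing,
# so it is common to check with `in` first
if 'purple' in colors:
    colors.remove('purple')
else:
    print("'purple' is not in the list")
```

Checking with `in` before calling `remove` or `index` is one way to avoid an unexpected `ValueError`.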

#### Feel free to play around with the cells above! Change the lists and the elements and make sure your understanding of list operations is clear before you move on.

## Question 1 (10 pts)

Write a function `remove_second` that takes in a list `lst` and an item `item`, and returns a version of `lst` with the _second_ instance of `item` removed. (Hint: find the first instance, then only look after that.) If `lst` does not have two instances of `item`, raise a `ValueError`.

For example, `remove_second([1, 2, 3, 3, 4, 3, 3], 3)` returns `[1, 2, 3, 4, 3, 3]`, while `remove_second([1, 2, 3, 3, 4, 3, 3], 2)` raises a `ValueError`.

In [None]:
def remove_second(lst, item):
    ...
    return ...

In [None]:
tests.run('test_1', remove_second)

## List Comprehensions

Another useful construction is the *list comprehension*, which allows us to create a list by performing some operation on every element of an initial list or `range`. For example, this is a list comprehension that determines the square of every number from 0 to 9:

In [None]:
#note how this cell and the following cell print the same output
squares1 = []
for x in range(10):
    squares1.append(x**2)
print(squares1)

In [None]:
squares = [x ** 2 for x in range(10)]
print(squares)

We can also put a conditional statement in a list comprehension! For example, here we use it to get the squares of only the odd integers from 0 to 9:

In [None]:
odd_squares = [x ** 2 for x in range(10) if x % 2 == 1]
print(odd_squares)

Note that if you find yourself writing `[x for x in ...]` it's simpler to just write `list(...)`:

In [None]:
print([x for x in range(5)])
print(list(range(5)))
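List comprehensions are not limited to numbers. The sketch below applies the same two patterns (transform every element, and keep only some elements) to a list of strings; the list `words` and the other names are made up for this example.

```python
words = ['quark', 'photon', 'gluon', 'muon', 'neutrino']

# transform every element
upper_words = [w.upper() for w in words]
print(upper_words)

# keep only the elements that pass a condition
short_words = [w for w in words if len(w) <= 5]
print(short_words)

# len of a filtered list counts how many elements passed the condition
num_short = len(short_words)
print(num_short)
```

The last step shows a common trick: filter with a comprehension, then take `len` of the result to count matches.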

## Question 2 (5 pts)
The function `ord` converts characters to numbers with an offset such that `ord('a') == 97`, `ord('b') == 98`, etc. Write a list comprehension that uses `ord` and returns the numeric conversion of the English vowels (a, e, i, o, u) **without the offset** ('a'=1, 'b'=2, etc).

In [None]:
#use this cell to explore how ord works!

In [None]:
vowels_to_numbers = ...
print(vowels_to_numbers)

In [None]:
tests.run('test_2', vowels_to_numbers)

## Question 3 (5 pts)

We've now got the tools to do a bit of real-life data processing! The function `get_planets` in `utils` (which we already imported!) downloads the catalog of known exoplanets and exoplanet candidates from NASA's Exoplanet Archive.

In [None]:
# this cell might take 10 seconds or so, as it's getting the data from the NASA website
planets = utils.get_planets()
planets_df = pd.read_csv(planets) # remember when we did "import pandas as pd" above? it's back!

In [None]:
planets_df

This is a `pandas.DataFrame`, which is beyond our scope for now. What we care about are the column names of this table, which we can turn into a list of strings:

In [None]:
cols = list(planets_df.columns)
cols

This tells us that the data we have includes a number of parameters, and their associated errors. Suppose we're only interested in the errors on planet parameters, i.e. the column names that start with `koi` and end with `err` (there's also `err1` and `err2` in some of these, but let's ignore that.) 

In the following cell, generate a list of column names `planet_errs` that satisfy the condition above.

In [None]:
# what condition do you need to meet?
# remember that you can index into strings just like lists! 
# for example, 'hello'[:2] == 'he'
planet_errs = ...
print(planet_errs)

In [None]:
tests.run('test_3', planet_errs)

## Question 4 (5 pts)

Now let's also take a look at the actual data! The column `koi_prad` contains the planet radii, in units of Earth radii. How many planets here are smaller than or the same size as the Earth? The cell below provides you with a list of `float`s, each of which is one planet's radius. Try and do this in a list comprehension!

In [None]:
# this generates the list of radii
# you do not have to edit, but try and figure out how it works!
raw_radii = planets_df.koi_prad.values
radii = list(raw_radii[~np.isnan(raw_radii)])

In [None]:
# count the number of planets with radius <= 1 earth radius
small_planets_count = ...

In [None]:
tests.run('test_4', small_planets_count)

## Dictionaries

Dictionaries encode mappings between "keys" and "values":

In [None]:
d = {"a" : 1, "b" : 2}

We can add an item to a dictionary:

In [None]:
d["c"] = 3

We can access the value by querying the dictionary with the key:

In [None]:
d["a"] + d["b"]

The full list of functions that can be used with dictionaries can be found [here](https://docs.python.org/3/tutorial/datastructures.html#dictionaries). A noteworthy feature is that we can define a dictionary using a syntax similar to that of list comprehensions. Below, we map the key $x$ to its square $x^2$.

In [None]:
squares_lookup = {x : x ** 2 for x in range(10)}
squares_lookup
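As a further sketch of everyday dictionary operations, the example below checks for keys with `in`, loops over key/value pairs with `.items()`, and filters with a dictionary comprehension. The dictionary `moons` (number of moons of the four inner planets) is made up for this example.

```python
moons = {'Mercury': 0, 'Venus': 0, 'Earth': 1, 'Mars': 2}

# `in` checks the keys of a dictionary, not its values
print('Earth' in moons)
print(1 in moons)

# .items() gives each key together with its value
for planet, count in moons.items():
    print(planet, count)

# a dictionary comprehension with a condition
with_moons = {planet: count for planet, count in moons.items() if count > 0}
print(with_moons)
```

Note that `1 in moons` is `False` even though 1 appears as a value, because `in` only looks at the keys.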

One of the most common uses of dictionaries is to save results of previous computations that we might use again. For example, let's revisit the Fibonacci computation! The following function gets the $n$th Fibonacci number, but slowly (it's recursive!).

In [None]:
def fib_slow(n):
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

In [None]:
fib_slow(8) == utils.fibonacci(8)

Why did we call this `fib_slow`? Well, just look at how long this takes:

In [None]:
fib_slow(35)
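One way to see where the time goes is to count how many times the function gets called. The sketch below repeats the same recursion as `fib_slow` under a different name, `fib_counted`, with a global counter added; both the function name and `call_count` are made up for this example.

```python
call_count = 0

def fib_counted(n):
    """Same recursion as fib_slow, but counts how many times it is called."""
    global call_count
    call_count += 1
    if n < 2:
        return n
    return fib_counted(n - 1) + fib_counted(n - 2)

result = fib_counted(20)
print(result, call_count)
```

Even for $n = 20$, the function is called tens of thousands of times, because the same small inputs are recomputed over and over. The count grows roughly as fast as the Fibonacci numbers themselves.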

## Question 5 (15 pts)

Let's try and speed this up! One key feature of the Fibonacci computation is that it depends only on a few previous results: that is, if you know `fib(33)` and `fib(34)`, you can get `fib(35)` without recomputing all of the earlier values again and again, which is what `fib_slow` is doing right now. We'd like to add **memory** to our computation.

Let's examine an outline for such a function `fib_fast` with a dictionary of `fib_values`:

- if the dictionary `fib_values` contains the key $n$:
  - look up `fib(n)` in `fib_values`
  - return the result
- otherwise:
  - compute `fib_fast(n)` as in `fib_slow`
  - save the result to `fib_values`
  - return the result

Syntax tip: dictionaries have the same `item in dictionary` syntax as lists, to check whether a key is present in the dictionary.

Hint: each bullet point corresponds to a single line of code.
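To see the same "check the dictionary first, otherwise compute and save" pattern on a different problem, here is a minimal sketch that counts how many steps the Collatz rule (halve an even number, send an odd number $n$ to $3n + 1$) takes to reach 1. The names `steps_cache` and `collatz_steps` are made up for this example, and this is not a solution to the question.

```python
# the cache starts with the one case we know without computing: 1 takes 0 steps
steps_cache = {1: 0}

def collatz_steps(n):
    """Number of Collatz steps needed to get from n down to 1."""
    # if we have seen n before, look up the saved answer
    if n in steps_cache:
        return steps_cache[n]
    # otherwise take one step and recurse on the next number
    next_n = n // 2 if n % 2 == 0 else 3 * n + 1
    result = 1 + collatz_steps(next_n)
    # save the answer so we never have to recompute it
    steps_cache[n] = result
    return result

print(collatz_steps(6))
print(sorted(steps_cache))
```

After the first call, every number visited along the way is stored in `steps_cache`, so a later call that passes through any of those numbers can stop early.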

In [None]:
fib_values = {0: 0, 1: 1}

def fib_fast(n):
    ...
    return ...

Now, run the cells below. The first call, `fib_fast(35)`, starts with only the two base cases saved, so it has to compute each value up to `fib(35)` once (if your recursive calls go through `fib_fast`, this is already far faster than `fib_slow(35)`). Afterwards, we see that `fib_values` has been filled out because we ran `fib_fast(35)`! So in the third cell, `fib_fast(37)` only needs to do a little bit of extra work, and so it's even faster!

In [None]:
fib_fast(35)

In [None]:
fib_values

In [None]:
fib_fast(37)

## Submission

Check to make sure that you have answered all questions. Run all the cells so that all output is visible. Finally, export this notebook as a PDF (File/Download As/PDF via LaTeX (.pdf)) and submit to bCourses.

<b>References:</b> Based on work inspired by <i>Computational and Inferential Thinking</i> and the Data 8 course material: Professors Ani Adhikari, John DeNero, and the Data 8 staff. Edited and compiled by the ULAB staff.

1. https://dbader.org/blog/python-memoization
2. https://dfm.io/posts/exopop/

Last Updated: July 2021