# Tutorial 12: Lists and Design Patterns

## PHYS 2600

## T12.1 - Design patterns for lists

As we discussed in lecture, a __design pattern__ is a more abstract version of an algorithm.

Patterns are especially important to know when dealing with lists.  In human language, we can easily describe operations like "take everything in the list and square it", but for a computer we have to spell out the steps using a loop!

_(Side note: if this were a formal computer science class, all of the patterns below would probably just be called examples of the "iterator" pattern - but here I think it's important to see the variations on them explicitly.)_

### Part A: The iterator

The __iterator pattern__ is the foundation of all the other ones below: it occurs when we want to step through a list of objects, item by item, and perform some sort of operation using each item.  To implement the iterator for a list, we require an _index variable_, `i`, which will store a valid index in our list.  We then use a loop to perform an operation with the list element at `i`, and then increment `i` by one to move on to the next item.

Here's an explicit example like the one from lecture, with a `while` loop that simply prints every element of a list:


In [None]:
i = 0  # Index variable
L = ['apple', 'banana', 'carrot', 'dragon fruit']

while i < len(L):
    print("Entry %d in list L is: " % i, L[i])
    
    i += 1  # Increment index

In Python, we have a shortcut to using this pattern: the `for` loop!  We can iterate over a list without ever having to define an index, as in the following cell:

In [None]:
for thing in L:
    print(thing)

The price of looping directly over the items is that we're missing one thing: before, we were also printing out the _location_ of each entry in the list, using `i`.  When we loop over the items themselves, no index variable is available, so we lose that information.

Fortunately, there is a middle ground: we can loop over the values of `i` using a `for` loop!  In the cell below, __use a `for` loop with the `range` function__ to match the output of the `while` loop above.  (Remember, `range()` works much like `np.arange()`, except that it only accepts integer arguments and gives a `range` object rather than an array.)

In [None]:
for i in range(len(L)):
    print("Entry %d in list L is: " % i, L[i])
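For reference, Python also provides the built-in `enumerate()`, which hands back the index and the item together on each pass. Here's a minimal sketch of the same loop written that way (redefining `L` so the block stands on its own):

```python
L = ['apple', 'banana', 'carrot', 'dragon fruit']

# enumerate(L) yields the pairs (0, 'apple'), (1, 'banana'), ...
# so the index i and the item come out of the loop together
for i, thing in enumerate(L):
    print("Entry %d in list L is:" % i, thing)
```

The `range(len(L))` version keeps the index variable more visible, which is the point of this part; `enumerate()` is just a more compact way to get the same pairs.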


### Part B: The accumulator

We've seen the __accumulator pattern__ once before, in Tutorial 10, when summing up a sequence of numbers.  The basic idea is that we define an _accumulator variable_ that starts at zero before the loop begins; inside the loop, we then perform the operation

```python
accumulator = accumulator + current_term
```

For example, if we are just adding up the entries of a list `L`, here is what will be in `accumulator` after each of the first few loops:

```python
accumulator = 0
accumulator = L[0]
accumulator = L[0] + L[1]
accumulator = L[0] + L[1] + L[2]
...
```


so that at the end, the accumulator variable will contain the result `first_term + second_term + ... + last_term`.  
Note that it's crucial we define the accumulator _outside_ the loop!  (If we had `accumulator = 0` _inside_ the loop, then it would be reset to zero every time through.)
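Here's a minimal sketch of both versions side by side, summing a short list with a `for` loop. The function names are just for this illustration, and Python's built-in `sum()` is used only as a check.

```python
def sum_list(L):
    total = 0                 # Accumulator: defined once, *before* the loop
    for item in L:
        total = total + item  # accumulator = accumulator + current_term
    return total

def sum_list_buggy(L):
    for item in L:
        total = 0             # BUG: resets the accumulator on every pass
        total = total + item
    return total              # so this only ever holds the last item

data = [3, 1, 4, 1, 5]
print(sum_list(data))         # 14, same as sum(data)
print(sum_list_buggy(data))   # 5, just the last entry
```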

More concretely, here is a `while` loop that uses the accumulator pattern to add up the first `N` powers of two, __starting at two__:

```python
def sum_powers_of_two(N):
    i = 1  # Index variable (starts at 1, since the first term is 2**1)
    total = 0  # Accumulator
    
    while i <= N:
        current_term = 2**i
        total += current_term
        i += 1
        
    return total
```

(I used `total` for my accumulator variable name because it's easier to type than `accumulator`!)  In the cell below, __use a `for` loop instead of a `while` loop__ to implement this function.  

_(Hint: as above, you can use `range()` to generate the `i` values, and then compute `2**i` inside your loop.  Be careful with what is included in your range - we want to include $2^N$ in our sum!)_

In [None]:
def sum_powers_of_two(N):
    # Your code here
    pass

In [None]:
## Testing print statements - check them!

print(sum_powers_of_two(1))  # 2
print(sum_powers_of_two(3))  # 2+4+8 = 14
print(sum_powers_of_two(4))  # 2+4+8+16 = 30
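As an independent check on the expected values in the test cell above: this particular sum is a finite geometric series, so it has a closed form.  Calling the sum $S$, doubling it and subtracting the original cancels every term except two:

$$S = 2 + 4 + \cdots + 2^N, \qquad 2S - S = 2^{N+1} - 2 \quad\Longrightarrow\quad \sum_{i=1}^{N} 2^i = 2^{N+1} - 2$$

For $N = 1, 3, 4$ this gives $2$, $14$ and $30$, matching the comments in the test cell.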

### Part C: The accumulator and other objects

When working with lists, we very often want to carry out some operations on the entries and produce a _new_ list.  The accumulator pattern gives us a way to do that, as well!  The idea is the same as above: we start with an empty accumulator, and add the results one at a time in a loop.  In this case, the "empty accumulator" is the empty list `[]`, and we add by concatenating to it!  (We could do something similar with other kinds of objects, like strings for example.)
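As a quick illustration of the string case, here's a minimal sketch (the function name `reverse_string` is hypothetical) where the empty string `""` plays the role of the empty accumulator.  Note that _where_ we add each new piece matters: putting each character at the front reverses the order.

```python
def reverse_string(s):
    result = ""               # Accumulator: start from the empty string
    for ch in s:
        result = ch + result  # concatenate each new character at the front
    return result

print(reverse_string("physics"))   # scisyhp
```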

Here's an alternative implementation of the list `.copy()` method, written as a function using the accumulator pattern:

In [None]:
def copy_list(L):
    new_list = []  # Accumulator
    i = 0  # Index
    
    while i < len(L):
        new_list.append(L[i])
        i += 1
        
    return new_list


copy_list([4,5,6])

(Note that we could also have written the accumulation step as `new_list += [ L[i] ]`.  We wrap the item in a one-element list because `+=` on a list adds the _elements_ of whatever is on the right, one by one.  But the `.append()` version is easier to read, isn't it?)
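Here's a minimal sketch of that `+=` variant (the name `copy_list_concat` is hypothetical), written with a `for` loop, plus a short demonstration of why the item is wrapped in brackets:

```python
def copy_list_concat(L):
    new_list = []            # Accumulator: start from the empty list
    for item in L:
        new_list += [item]   # wrap item in a one-element list, then concatenate
    return new_list

original = [4, 5, 6]
duplicate = copy_list_concat(original)
print(duplicate)               # [4, 5, 6]
print(duplicate is original)   # False: a brand-new list object

# Without the brackets, += adds the *elements* of the right-hand side:
letters = []
letters += 'xy'
print(letters)                 # ['x', 'y'], not ['xy']
```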

Now you try it!  In the cell below, __implement the function `double_list_items(L)`__, which should return a new list containing the elements of `L`, in the same order, each multiplied by 2.  Once again, __use a `for` loop instead of a `while` loop__ in your solution.

In [None]:
def double_list_items(L):
    # Your code here
    pass

In [None]:
## Testing print statements - check them!

print(double_list_items([1,2,4]))  #  Should print [2, 4, 8]
print(double_list_items(['x','y','z'])) # Should print ['xx', 'yy', 'zz']

L3 = [1.1, 2.2, 3.3]
print(double_list_items(L3)) # Should print [2.2, 4.4, 6.6]
print(L3) # Should print [1.1, 2.2, 3.3] - the original shouldn't change!

### Part D: The sentinel

Instead of operating on an entire list in a uniform way, sometimes we want to do just the opposite: locate a particular entry inside of a list.  In this situation, we can use the __sentinel pattern__.  This is another variation on the iterator, in which we define a _sentinel value_ that we are interested in, _prior to starting the loop_.  The execution of the loop is then changed in some way if the sentinel value is encountered.

For example, here's a function that finds the position of the first occurrence of the number 3 in a list:

In [None]:
def find_index_of_first_three(L):
    sentinel = 3        
    i = 0
    
    while i < len(L):
        if L[i] == sentinel:  # Check sentinel value and branch
            return i
        
        i += 1
        
    # If we got through the whole list without finding 3, return None
    return None

print(find_index_of_first_three([1,2,3,4,5]))
print(find_index_of_first_three([1,2,4,5]))

By the way, the simplest tasks we might use the sentinel pattern for are __finding if a value occurs in a list__, and if so, __finding where a value occurs, i.e. the index of that value__.  These are actually provided by two built-in Python features: the `in` operator, and the `.index()` list method.

In [None]:
print(3 in [1,2,3,4,3])      # True if 3 is anywhere in the list, False otherwise.
print([1,2,3,4,3].index(3))  # Note: only finds *first* index from the left.
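One difference from our function above: `.index()` raises a `ValueError` when the value isn't in the list, instead of returning `None`.  Here's a minimal sketch (the name `safe_index` is hypothetical) that combines the two built-ins to get the `None` behavior back:

```python
def safe_index(values, target):
    """Index of the first `target` in `values`, or None if it's absent."""
    if target in values:             # check membership first...
        return values.index(target)  # ...so .index() can't raise ValueError
    return None

print(safe_index([1, 2, 3, 4, 3], 3))   # 2
print(safe_index([1, 2, 4, 5], 3))      # None
```

This scans the list up to twice (once for `in`, once for `.index()`), which is no problem for short lists.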

Your turn to try!  __Implement the function `first_entry_appears_twice(L)`__, which will use the _first entry_ `L[0]` of the list `L` as a sentinel value, looping through the _rest_ of the list to see if it appears a second time or not.  Your function should return `True` if `L[0]` appears again later in the list, and `False` otherwise.

In [None]:
def first_entry_appears_twice(L):
    # Your code here
    pass

In [None]:
## Testing print statements - check them!

print(first_entry_appears_twice([0,0,1,0,0,1]))  # True
print(first_entry_appears_twice([1,2,4,1]))      # True
print(first_entry_appears_twice(['x','y','z']))  # False

Note that the `break` and `continue` loop control keywords commonly appear with the sentinel pattern!  Here, because we're writing a function, we can use `return` in place of `break`: it exits the loop (and the whole function) as soon as the sentinel condition is met.
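When we want to stop the loop but still do something afterwards (here, return a running total), `break` is the tool rather than `return`.  A minimal sketch, assuming a hypothetical data convention where `-1` marks the end of the useful readings in a list:

```python
def sum_until_sentinel(values, sentinel=-1):
    total = 0                  # Accumulator
    for v in values:
        if v == sentinel:
            break              # stop looping as soon as the sentinel shows up
        total += v
    return total               # still runs after the break

print(sum_until_sentinel([3, 1, 4, -1, 9]))   # 8: the 9 after -1 is never added
print(sum_until_sentinel([3, 1, 4]))          # 8: no sentinel, so everything is added
```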

All of the above patterns can occur in different forms, and even in combination with each other!  For example, we could combine the accumulator and the sentinel to count how many vowels are in a string: the accumulator to store a count, and the sentinel values `vowels = ['a','e','i','o','u']` to check against on every iteration.  Any pattern works just as well with `for` or `while`, with some minor structural differences in the code.
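Here's a minimal sketch of that vowel counter, written both ways; the function names are just for this example, and the `.lower()` call is an extra step so that capital vowels count too.

```python
def count_vowels(s):
    vowels = ['a', 'e', 'i', 'o', 'u']   # Sentinel values to check against
    count = 0                            # Accumulator
    for ch in s.lower():
        if ch in vowels:
            count += 1
    return count

def count_vowels_while(s):
    vowels = ['a', 'e', 'i', 'o', 'u']   # Sentinel values
    count = 0                            # Accumulator
    i = 0                                # Index variable
    s = s.lower()
    while i < len(s):
        if s[i] in vowels:
            count += 1
        i += 1
    return count

print(count_vowels("Electromagnetic"))        # 6
print(count_vowels_while("Electromagnetic"))  # 6
```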

### Part E: A warning about loops and speed

Any of the functions we just implemented using explicit loop control could also have been written using NumPy functions and arrays instead.  When possible, we should usually prefer NumPy arrays to lists; as I've mentioned, NumPy is constructed to do array operations _very_ efficiently, so there's a big boost in performance we can gain!

Let's see an example.  For a list of numbers, the `double_list_items()` function you wrote above is just like multiplying a NumPy array by two:

In [None]:
import numpy as np

print(double_list_items(range(5)))
print(2*np.arange(5))
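One caution when translating between the two: `*` means something different for lists and for arrays.  A short sketch:

```python
import numpy as np

values = [1, 2, 3]
print(values * 2)                       # [1, 2, 3, 1, 2, 3]: * repeats a list
print(np.array(values) * 2)             # [2 4 6]: * multiplies each array entry
print((np.array(values) * 2).tolist())  # [2, 4, 6]: converted back to a plain list
```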

Although both versions work, we can ask about their _computational efficiency_: since our computer can only carry out a fixed number of calculations per second, how long does one version take compared to the other?

Test these with the `%timeit` magic command.  (Reminder: `%timeit` runs a statement repeatedly and reports a mean and standard deviation for how long each run takes.  Running repeatedly is good for consistency: the computer is doing lots of things at once, so there may be unpredictable slowdowns from one call to the next.)

__Run the cell below__ to see how your function stacks up against NumPy!

In [None]:
%timeit double_list_items(range(10000))
%timeit 2*np.arange(10000)

On my computer, our custom function took 785 microseconds to run, and the NumPy version took 7.6 microseconds - about a _factor of 100_ speed-up!
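`%timeit` is an IPython/Jupyter feature, so it won't work in a plain `.py` script.  There, the standard-library `timeit` module can make a similar comparison.  A minimal sketch, using a hypothetical `square_with_loop` function and an arbitrary choice of 100 repetitions:

```python
import timeit
import numpy as np

def square_with_loop(values):
    squares = []                 # Accumulator
    for v in values:
        squares.append(v * v)
    return squares

n_runs = 100  # arbitrary; more runs give a steadier average

# timeit.timeit() calls the given function n_runs times and returns the total seconds
t_loop = timeit.timeit(lambda: square_with_loop(range(10000)), number=n_runs) / n_runs
t_numpy = timeit.timeit(lambda: np.arange(10000)**2, number=n_runs) / n_runs

print("Python loop: %.1f microseconds per call" % (t_loop * 1e6))
print("NumPy:       %.1f microseconds per call" % (t_numpy * 1e6))
```

Unlike `%timeit`, this reports a single average rather than a mean and standard deviation, and the actual numbers will depend on your machine.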

__Why is there such a huge difference?__ In short, NumPy comes with a large amount of pre-built _compiled_ code, so the loop over the whole array runs directly as machine code.  When we write a `for` loop in Python, the Python interpreter has to work out what to do with each instruction _every time through the loop_ - and with 10000 loops, the cost of running the interpreter over and over adds up quickly!

We're not going to spend a huge amount of time on code optimization in this class, and this is actually a perfect example of why.  Although the code we wrote is 100x slower, 785 _microseconds_ is still not much time to wait for the answer - so _it doesn't matter!_  __Optimization is only worth tackling if your code takes too long to run to be practical.__

That being said, _when possible you should use NumPy instead of Python `for` loops_, because the difference is so substantial and we're often using NumPy anyway.

So why are we bothering to learn `for` loops at all?  There are a number of __iterative algorithms__, like the square root algorithm we saw a while ago, where we _can't_ just deal with whole NumPy arrays at once: each step depends on the outcome of the previous step.
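As a concrete sketch of such an algorithm, here is Newton's (Heron's) method for $\sqrt{a}$.  This assumes the earlier square root algorithm was of this kind; the point holds for any iteration like it.  The function name and the stopping tolerance are choices made for this example.

```python
def newton_sqrt(a, rel_tol=1e-12):
    """Approximate sqrt(a) for a > 0 using Newton's (Heron's) method."""
    x = a                                # starting guess; any positive value converges
    while abs(x * x - a) > rel_tol * a:  # stop once x*x is close enough to a
        x = 0.5 * (x + a / x)            # each new guess needs the previous one
    return x

print(newton_sqrt(2.0))   # close to 1.41421356...
```

Each pass through the `while` loop needs the `x` from the pass before, so there's no way to hand NumPy all of the steps at once.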

## T12.2 - More list practice

First, __run the cell below__ to load some pre-made lists.  You'll then be manipulating the lists in various ways.  If you accidentally mangle one of the lists below, remember you can just come back and re-run this initialization cell.

In [None]:
# Initialize some lists for below
# Re-run me to reset the lists, if needed!

fundamental_forces = ['electromagnetic', 'strong nuclear', 'weak nuclear']
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto']
planet_moons = [0, 0, 1, 2, 79, 61, 27, 14, 5]

# Each position measurement is a list of the form [x, y, z]
positions = [
    [0.0, 0.0, 2.7],
    [1.0, 2.5, 2.3],
    [2.0, 4.6, 1.9],
    [3.0, 6.3, 1.5],
]

### Part A

Add `'gravity'` to the end of the list `fundamental_forces`.  Then, use a `for` loop to print out each entry in `fundamental_forces` on a separate line.

In [None]:
#

### Part B

Remove `'Pluto'` from the list `planets`.  (Sorry, Pluto!)

_(Once you've solved this, try running the cell again a couple of times, and notice what happens...)_

In [None]:
#

### Part C

Use a `for` loop to run through the list of planets, and print out the name and the number of moons (from `planet_moons`) for each.  For example,

```python
"Mars has 2 moons."
```

(You don't need to include Pluto, since you just deleted it!)

_(Hint: the iterator pattern also provides a nice way to run through a pair of matching lists of the same length, like `planets` and_ `planet_moons` _here.)_

_(__Optional challenge:__ can you fix up your code so that it prints "moon" and not "moons" for the Earth?  Better yet, can you do it without writing more than one `print()` statement?)_

In [None]:
for i in range(len(planets)):
    # The same index i picks out matching entries from both lists
    planet = planets[i]
    moon_num = planet_moons[i]
    if moon_num == 1:
        moon_str = "moon."
    else:
        moon_str = "moons."

    print(planet, "has", moon_num, moon_str)
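Another common way to walk through two matching lists is the built-in `zip()`, which pairs up their entries.  Here's a minimal sketch using short stand-in lists (`planets_demo` and `moons_demo` are made up for this example), so it doesn't touch the real ones:

```python
planets_demo = ['Venus', 'Earth', 'Mars']
moons_demo = [0, 1, 2]

# zip() hands back one (planet, moon count) pair per pass through the loop
for planet, moon_num in zip(planets_demo, moons_demo):
    if moon_num == 1:
        moon_str = "moon."
    else:
        moon_str = "moons."
    print(planet, "has", moon_num, moon_str)
```

One caveat: `zip()` quietly stops at the end of the shorter list, so it's worth making sure the two lists really do have the same length.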

### Part D

From the 'list of lists' `positions`, which consists of smaller lists of coordinates `[x,y,z]`, produce a single list `y_pos` containing all of the y-coordinate entries (the middle entry in each sub-list).

_(Hint: The accumulator pattern will be useful here.)_
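As a reminder of how indexing works for a list of lists, here's a short sketch with a made-up two-entry example (not the `positions` data): the first index picks a sub-list, and a second index picks an entry inside it.

```python
measurements = [[0.0, 1.0, 2.0],
                [3.0, 4.0, 5.0]]      # hypothetical [x, y, z] measurements

first = measurements[0]               # a whole sub-list: [0.0, 1.0, 2.0]
print(first[2])                       # 2.0: the z-coordinate of the first sub-list
print(measurements[1][0])             # 3.0: chained indexing, sub-list 1, entry 0
```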

In [None]:
#