Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` (you should of course delete `raise NotImplementedError()`, which is there only as a reminder), while not modifying the other cells (but you should run them to check the output you obtain). Please also fill in your group number, your full names, and your VU IDs below:

In [1]:
STUDENT_FIRST_NAME = "Thu Trang" # e.g. "John" (no "J", no "J.", no "John S.M.")
STUDENT_LAST_NAME = "Luu"  # e.g. "Smith"
VU_ID_NUM = "2695303"          # e.g. "2789012"

---

# Assignment 1

The goal of this assignment is to make you familiar with Python's syntax, data types, and file I/O. This assignment is divided into 8 exercises, which are worth different numbers of points (the total is 100 points).

We assume that the folder that you work in is the one obtained by unzipping the given ``assignment1.zip`` file and thus has the following structure.

<code>
assignment01.ipynb
data/...
</code>

In particular, the subfolder data should have 16 files.

## Important remarks
- **Working together**: You are meant to work individually on the first three assignments. You can, of course, brainstorm ideas and discuss issues with your fellow students, but you are required to write your solutions individually. 
- **Plagiarism**: All your code will be automatically scanned for plagiarism. Furthermore, using the internet as a passive resource is allowed. This means that you can search for help there and partially copy code, as long as you explicitly acknowledge inside your Jupyter notebook which parts have been copied and from where. 
- **Performance**: You should optimize the computational performance of your functions. Specifically, when grading the assignments, we set a hard limit of 15 seconds for each cell execution. This should be sufficient to cover each of the test cases. Function calls that take longer than 15 seconds will not be awarded any points.
- **Code styling**: Your implementation will not be checked for style. However, we do encourage you to practice good code styling. See, for example, https://docs.python-guide.org/writing/style/.
- **Chronological run**: All outputs should be repeatable by doing one full “chronological” run of the notebook without any manual changes to code blocks, including parameters. Try it yourself by clicking ``Kernel -> Restart and run all``, which should give the same result as the one handed in.
- **Handing in**: Hand in the .ipynb file of your notebook on ``Canvas`` before the assignment deadline. 
- **Other questions**: If you have doubts/questions about the assignments, feel free to ask them in [this discussion thread](https://canvas.vu.nl/courses/60060/discussion_topics/528045) so that everyone can see them and our answers. 

In [1]:
# Remember to run this cell before any of the following ones

import random
from numpy.testing import assert_equal

## Exercise 1: Counter (5 points)

Given a list of items, return a dictionary where the keys correspond to all unique elements and the corresponding values represent the number of occurrences.

**Example**: assume that the following list is given as input:
```
[1, 'a', 2, 'b', 'a', 'c']
```
Then your function should return a dictionary
```
{1: 1, 'a': 2, 2: 1, 'b': 1, 'c': 1}
```
The ordering of the keys does not matter.

Your function will be tested with lists of up to 1000000 elements.
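
As a cross-check, the expected result for the example above can be reproduced with `collections.Counter` from the standard library. This is only a sketch for verifying your own output; the name `reference_counter` is a hypothetical helper and not part of the assignment.

```python
from collections import Counter

def reference_counter(items):
    # Counter tallies hashable items in a single pass; dict() converts
    # the result to a plain dictionary for comparison.
    return dict(Counter(items))

example = [1, 'a', 2, 'b', 'a', 'c']
print(reference_counter(example))
```

Note that the integer `2` and the string `'2'` would be counted as two different keys.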

In [3]:
def counter(items):
    # empty dict
    result = {}
    
    # check if list is empty
    if len(items) == 0:
        return result
    # https://datagy.io/python-count-occurrences-in-list/#:~:text=The%20easiest%20way%20to%20count,in%20the%20list%20is%20returned.
    else:
        for i in items:
            if i in result:
                result[i] += 1
            else:
                result[i] = 1
    return result

In [4]:
# Test your function on a few possible inputs

assert_equal(counter([]), {})
assert_equal(counter(["a", "b", "c"]), {"a": 1, "b": 1, "c": 1})
assert_equal(counter([1 for _ in range(10_000)]), {1: 10_000})
assert_equal(counter(["a", 0.42, "a"]), {"a": 2, 0.42: 1})
assert_equal(counter([(1, 2), (2, 3), (3, 4)]), {(1, 2): 1, (2, 3): 1, (3, 4): 1})


## Exercise 2: Type specific operations (5 points)

Write a function that performs operations depending on specific types of the input.
- `bool`: return the negation
- `int`: square the number
- `float`: round the number to two decimal places
- `complex`: return the sum of the real and imaginary part
- `string`: make a palindrome, e.g., 'abc' becomes 'abccba'
- `list`: move the first element to the last, e.g., `[1, 2, 3]` -> `[2, 3, 1]`
- `tuple`: replace the first element of the tuple with the last element, e.g., `(1, 'b', [3])` -> `([3], 'b', [3])`
- `set`: remove all elements that are not integers or floats, e.g., `{'1', 2, 'c', 4.222, (5, 67)}` -> `{2, 4.222}`
- `dict`: reverse all (key, value) pairs. If the value is not hashable, then remove the (key, value) pair. You may assume that all keys and values are unique.
- For any other type, return `None`.

Be aware that your function will be tested with inputs of length up to 50000 items.
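
One detail worth keeping in mind when dispatching on types: in Python, `bool` is a subclass of `int`, so an `isinstance` check against `int` also accepts `True` and `False`. The short sketch below illustrates the difference between `isinstance` and an exact `type` comparison; the helper name `describe` is hypothetical and only serves the illustration.

```python
def describe(item):
    # Exact type comparison: True is reported as 'bool', never as 'int'.
    if type(item) is bool:
        return 'bool'
    if type(item) is int:
        return 'int'
    return 'other'

print(isinstance(True, int))   # bool is a subclass of int
print(describe(True), describe(7), describe(7.0))
```

With `isinstance`, the `bool` case would therefore have to be tested before the `int` case.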

In [2]:
def hashable(v): # https://stackoverflow.com/questions/3460650/asking-is-hashable-about-a-python-value
    #Determine whether v can be hashed.
    try:
        hash(v)
        return True
    except Exception as ex:
        return False

def operate(item):
    if type(item) == bool:
        return not item
    elif type(item) == int:
        return item**2
    elif type(item) == float:
        return round(item,2)
    elif type(item) == complex:
        return item.real + item.imag
    elif type(item) == str:
        return item + item[::-1]
    elif type(item) == list:
        # slicing also handles the empty list and leaves the input unchanged
        return item[1:] + item[:1]
    elif type(item) == tuple:
        # item[-1:] is empty for an empty tuple, so no IndexError is raised
        return item[-1:] + item[1:]
    elif type(item) == set:
        # keep only ints and floats (exact type check, so bools are removed)
        return {e for e in item if type(e) in (int, float)}
    elif type(item) == dict:
        inv_dict = dict()
        for k,v in item.items(): # https://www.geeksforgeeks.org/python-accessing-key-value-in-dictionary/
            if hashable(v):
                inv_dict[v] = k
        return inv_dict
    else:
        return None


In [3]:
# Test your function on some possible inputs

assert_equal(operate(True), False)
assert_equal(operate(99999), 99999**2)
assert_equal(operate(0.987654321), 0.99)
assert_equal(operate(0.9), 0.9)
assert_equal(operate(complex(1, 999)), 1000)
assert_equal(operate('abc'), 'abccba')
assert_equal(operate([1, 2, 3]), [2, 3, 1])
assert_equal(operate((1, 'b', [3])), ([3], 'b', [3]))
assert_equal(operate({'1', 2, 'c', 4.222, (5, 67)}), {2, 4.222})
assert_equal(operate({'a': 1, 'b': [2], 'c': (3,)}), {(3,): 'c', 1: 'a'})


## Exercise 3: coin flips (10 points)

Write a Python function that takes as input a single string of any length describing a sequence of coin flips with possible outcome head (``'H'``) or tail (``'T'``) and returns only the length of the longest coin flip streak. 

The function should always return an integer and, in particular, should return ``0`` if the string is empty. You can check the correctness of your function using some instances of strings provided below with the corresponding correct answers. 

Your function will be tested with strings consisting of up to 1 million coin flips. To generate longer strings of length ``l`` yourself, with probability of head equal to ``p`` (and tail then equal to ``1-p``), you can use the command 

```coin_flips = ''.join(random.choices(['H','T'], weights = [p,1-p], k = l))```

If you do so, remember to use the command ```random.seed(number)``` to get reproducible results.
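
The two commands above can be combined in a small helper, sketched below. The name `generate_flips` and its `seed` parameter are hypothetical and introduced only for illustration.

```python
import random

def generate_flips(l, p, seed):
    # Seeding first makes the generated sequence reproducible.
    random.seed(seed)
    return ''.join(random.choices(['H', 'T'], weights=[p, 1 - p], k=l))

flips = generate_flips(20, 0.5, 42)
print(flips)
```

Calling the helper twice with the same arguments returns the same string, which makes timing experiments on long inputs repeatable.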

In [7]:
def find_longest_streak(coin_flips):
    # an empty sequence has no streak
    if len(coin_flips) == 0:
        return 0
    
    longest = 1
    temp = 1
    for i in range(len(coin_flips)-1):
        if coin_flips[i] == coin_flips[i+1]:
            temp += 1
            # update here so that a streak ending the string is counted too
            if longest < temp:
                longest = temp
        else:
            temp = 1
    return longest

In [8]:
# Test your function on some possible coin flip sequences

assert_equal(find_longest_streak(""),0)
assert_equal(find_longest_streak("T"),1)
assert_equal(find_longest_streak("HTTHT"),2)
assert_equal(find_longest_streak("TTHTTTHHTHHH"),3)
assert_equal(find_longest_streak("HTHHTTTHTHTHHHTTHHHHHHTHTTTHHTHHHTHTHTHTTTHHHHHTHTHTTH"),6)


## Exercise 4: letter occurrences (15 points)

Your task is to analyze the percentage of times a letter occurs in a given position for the unique words appearing in a given text file. 

This exercise is split into two parts: in part A, you are asked to write a function that extracts unique words from any input text file; in part B, you are asked to write a function that calculates letter occurrence statistics starting from a given input text file.

Note that you can use the Python `string` module (imported below), but *no* other packages for this exercise.

### Part A (5 points)

Write a function that takes a .txt file as input and returns the list of unique words contained in that text. Note that:
- Ignore capitalized letters, namely if a word appears both capitalized (e.g., ``'Until'``) and not capitalized (e.g., ``'until'``), it should appear only once in the final unique word list;
- The text might appear in various paragraphs separated by new lines and/or have empty lines;
- The text has punctuation, which your function should ignore. Specifically, you should ignore all and only the punctuation returned by ``string.punctuation``;
- Related to the previous point, you can assume the text has **no abbreviations** (e.g., *don't*), no **English possessives** (e.g., *John's*) and no **hyphenated words** (e.g., *full-scale*);
- Numbers should be processed according to the above rules (regardless of whether they appear in time/date/percentages/decimals). For instance, processing the text ``On June 7th, at 9:00am`` should return the unique words ``['on','june','7th','at','900am']``.
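
The rules above can be tried out on the sample sentence without reading a file. The following is a minimal sketch that uses only the `string` module; the helper name `clean_words` is hypothetical.

```python
import string

def clean_words(text):
    # Lowercase, delete every character in string.punctuation, then split
    # on any whitespace (spaces, tabs, new lines).
    table = str.maketrans('', '', string.punctuation)
    return text.lower().translate(table).split()

print(clean_words("On June 7th, at 9:00am"))
# dict.fromkeys keeps the first occurrence of each word, in order
print(list(dict.fromkeys(clean_words("Until until\n\nUNTIL now."))))
```

The first call prints `['on', 'june', '7th', 'at', '900am']`, matching the example in the last rule.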

In [9]:
import string

def unique_words_parser(filename):
    # read file
    with open(filename, 'r') as f:
        f_str = f.read()
    
    # make lowercase
    f_str = f_str.lower()
    
    # remove punctuation in a single pass
    f_str = f_str.translate(str.maketrans('', '', string.punctuation))
    
    # find all words
    word_list = f_str.split()
    
    # keep the first occurrence of each word, preserving order
    unique_word = list(dict.fromkeys(word_list))
    
    return unique_word

In [10]:
# Test your function on some possible input files. 
# Note that only the length of the returned word list is checked.
# The autograder will check the exact word list.

assert_equal(len(unique_words_parser("data/text_input0.txt")), 85)
assert_equal(len(unique_words_parser("data/text_input1.txt")), 149)


### Part B (10 points)

Write a Python function that takes as input 
- a list of words (all lowercase strings)
- a specific word length $l\geq 1$ 
- a specific position $p \in \{0,1,\dots,l-1\}$

and that returns a **dictionary** with 
- as *keys* all possible (lowercase) letters that appear in position $p$ at least once across all words of length $l$ in the given list 
- as *values* the corresponding percentages of those words of length $l$ that have that letter in position $p$. The percentages should be rounded to 2 decimal digits. The dictionary should be sorted in decreasing order of frequency. If there are no words of a given length $l$, the function should return an empty dictionary.

Your function will be tested with lists of up to 30000 words.

**Example**: Assume the list of words is ```['exam','data','study','year','big','code','raw','python','test','assignment','eye','line','column','row']```.

For $l=1$ and $p = 0$, the function should return `{}` since there are no words of length $1$.

For $l=3$ and $p = 0$, the function should return `{'r': 50.0, 'b': 25.0, 'e': 25.0}` since there are four words of length 3 and the initial letter is `r` in 50% of those words, `b` in 25%, and `e` in 25%.

For $l=4$ and $p=3$, the function should return `{'e': 33.33, 'a': 16.67, 'm': 16.67, 'r': 16.67, 't': 16.67}` since there are six words of length 4 and the possible last letters all occur once, except `e`, which occurs twice.

For $l=6$ and $p = 1$, the function should return `{'o': 50.0, 'y': 50.0}` since there are two words of length 6 and the second letter is `o` in 50% of those words and `y` in the other 50%.
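
To make the arithmetic behind the case $l=4$, $p=3$ explicit: the six words of length 4 are `exam`, `data`, `year`, `code`, `test`, and `line`, whose last letters are `m`, `a`, `r`, `e`, `t`, `e`. The rounded percentages then follow as
$$
\frac{2}{6}\cdot 100 \approx 33.33 \ \text{ for } e, \qquad \frac{1}{6}\cdot 100 \approx 16.67 \ \text{ for each of } a, m, r, t.
$$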

In [11]:
def letter_occurrences(word_list, word_length, position):
    counts = dict()
    total = 0
    
    # count the letter at the given position over the words of length l
    for word in word_list:
        if len(word) == word_length:
            letter = word[position]
            counts[letter] = counts.get(letter, 0) + 1
            total += 1
    
    # sort by decreasing frequency (ties alphabetically) and convert the
    # counts to percentages; the result is empty if no word has length l
    result = dict()
    for letter, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        result[letter] = round(count/total*100, 2)
        
    return result

In [12]:
# Check the correctness of your function with the following tests

words = ['exam','data','study','year','big','code','raw','python','test','assignment','eye','line','column','row']

assert_equal(letter_occurrences(words, 1, 0), {})
assert_equal(letter_occurrences(words, 3, 0), {'r': 50.0, 'b': 25.0, 'e': 25.0})
assert_equal(letter_occurrences(words, 4, 3), {'e': 33.33, 'a': 16.67, 'm': 16.67, 'r': 16.67, 't': 16.67})
assert_equal(letter_occurrences(words, 6, 1), {'o': 50.0, 'y': 50.0})
assert_equal(letter_occurrences(words, 7, 2), {})
assert_equal(letter_occurrences(words, 10, 9), {'t': 100.0})

# You can also test your function by leveraging what you did in part A (assuming you solved it correctly!)

assert_equal(letter_occurrences(unique_words_parser("data/text_input0.txt"), 5, 0),
{'o': 20.0,
 'd': 10.0,
 'f': 10.0,
 'l': 10.0,
 'p': 10.0,
 's': 10.0,
 't': 10.0,
 'v': 10.0,
 'w': 10.0})

assert_equal(letter_occurrences(unique_words_parser("data/text_input1.txt"), 6, 0),
{'c': 26.67,
 'p': 20.0,
 'i': 13.33,
 'd': 6.67,
 'e': 6.67,
 'r': 6.67,
 't': 6.67,
 'u': 6.67,
 'w': 6.67})


## Exercise 5: Integer sequences (15 points) 

For any integer $n \geq 1$ one can compute the sequence obtained iteratively by applying the following rule: $a_0 = n$ and for every $j \geq 0$ 
$$
a_{j+1}=
\begin{cases}
\frac{a_j}{2} & \text{ if } a_j \text{ is even,}\\
3 a_j +1 & \text{ if } a_j \text{ is odd.}
\end{cases}
$$
The iterative procedure should terminate the first time the number $1$ is reached. For instance, for $n = 12$ one obtains the sequence $12, 6, 3, 10, 5, 16, 8, 4, 2, 1$, which consists of $9$ *steps* and has *peak value* $16$.

The [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture) states that these sequences always reach $1$, no matter which positive integer is chosen to start the sequence. Although the conjecture has not been proved yet, it has been checked by computer for all starting values up to $2^{68}$.

Write a Python function that imports a file `filename` containing a single line with an unknown number of integers separated by commas and returns a tuple with two numbers $(a, b)$ to be determined as follows:
- $a$ is the number of steps of the *longest* sequence started at the integer values inside the file
- $b$ is the *largest* peak value achieved by any of the sequences started at the integer values inside the file

**Example**: assume the integers 1, 5, 16, 9, 15 are read from the file. The five sequences are:
- 1
- 5, 16, 8, 4, 2, 1
- 16, 8, 4, 2, 1
- 9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
- 15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1

and thus the required output should be (19, 160). Indeed, the sequence started at 9 takes 19 steps to reach 1, while the sequence started at 15 reaches the largest peak value at 160. 
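
The worked example above can be reproduced for in-memory start values with the sketch below. The helper name `collatz_steps_and_peak` is hypothetical, and reading the integers from a file is deliberately left out.

```python
def collatz_steps_and_peak(n):
    # Follow the rule from n until 1 is reached, tracking the number of
    # steps taken and the largest value seen (the start value included).
    steps, peak = 0, n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        peak = max(peak, n)
    return steps, peak

starts = [1, 5, 16, 9, 15]
results = [collatz_steps_and_peak(n) for n in starts]
print(max(s for s, _ in results), max(p for _, p in results))
```

Integer division (`//`) keeps every value an `int`, so large peak values are not affected by floating-point rounding.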

Your function will be tested with sets of up to 10000 integers.

In [13]:
def steps_and_peak(filename):
    longest_step = 0
    largest_peak = 0
    seq = []
    
    # read file
    with open(filename, 'r') as f:
        f_str = f.read()
    
    # get number
    numbers = f_str.split(',')
    
    # find sequence
    for i in numbers:
        seq.append(int(i))
        j=0
        while j < len(seq):
            if seq[j] == 1:
                break
            elif seq[j]%2 == 0 and seq[j] > 1:
                seq.append(seq[j]//2)  # integer division keeps the values as ints
                j+=1
            elif seq[j]%2 != 0 and seq[j] > 1 :
                seq.append(3*seq[j]+1)
                j+=1

        # find step and peak
        if longest_step < len(seq) - 1:
            longest_step = len(seq) - 1
        if largest_peak < max(seq):
            largest_peak = max(seq)

        # reset seq for next int
        seq.clear()
    
    return (longest_step, largest_peak)

In [14]:
# Test your function using the provided sample input files

assert_equal(steps_and_peak("data/collatz_input0.txt"), (19, 52))
assert_equal(steps_and_peak("data/collatz_input1.txt"), (109, 9232))
assert_equal(steps_and_peak("data/collatz_input2.txt"), (133, 13120))
assert_equal(steps_and_peak("data/collatz_input3.txt"), (201, 1276936))
assert_equal(steps_and_peak("data/collatz_input4.txt"), (246, 8153620))


## Exercise 6: Eratosthenes (15 points) 

A **prime number** (or a prime) is a natural number greater than $1$ that is not a product of two smaller natural numbers. In particular, $1$ is *not* a prime number.

Given two nonnegative integers $0 \leq l \leq u$ and a nonnegative integer $k \geq 0$, write a Python function that calculates the smallest $k$ prime numbers between $l$ and $u$ (and possibly equal to $l$ and $u$) and returns their sum. If there are fewer than $k$ prime numbers between $l$ and $u$, the function should still return the sum of all the primes in that interval.

**Example**: assume that $l=0$ and $u=13$. If $k=0$, trivially the function should return $0$. If $k=4$, then the smallest $k$ primes between $0$ and $13$ are $2, 3, 5, 7$, so the function should return $17$, since $17 = 2 + 3 + 5 + 7$. If $k=9$, then the function should return $41$, since $41 = 2 + 3 + 5 + 7 + 11 + 13$ and there are fewer than $9$ terms in the sum because there are only $6$ prime numbers between $0$ and $13$.

Your function will be tested for values of $u$ up to 1000000 and of $k$ up to 80000.

*Hint*: Your function needs to be able to cope with wide intervals and large values of $u$. Rather than checking individually for each number whether it is prime or not, it may be convenient to first find all the prime numbers less than or equal to $u$ by implementing a version of the [Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).
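
As an illustration of the hint, here is one possible sketch of the sieve that stores the flags in a `bytearray`. The function name `primes_up_to` is hypothetical, and this is just one of several ways to implement the idea.

```python
def primes_up_to(u):
    # Sieve of Eratosthenes: is_prime[i] stays 1 only for primes.
    if u < 2:
        return []
    is_prime = bytearray([1]) * (u + 1)
    is_prime[0] = is_prime[1] = 0
    i = 2
    while i * i <= u:
        if is_prime[i]:
            # Multiples below i*i were already crossed out by smaller primes.
            for j in range(i * i, u + 1, i):
                is_prime[j] = 0
        i += 1
    return [n for n in range(2, u + 1) if is_prime[n]]

# Example from the text: l = 0, u = 13, with k = 4 and k = 9
primes = [q for q in primes_up_to(13) if q >= 0]
print(sum(primes[:4]), sum(primes[:9]))
```

Slicing with `[:k]` returns all available primes when there are fewer than `k` of them, which matches the required behaviour.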

In [15]:
def eratosthenes(l, u, k):
    if k == 0:
        return 0
    
    # find all primes <= u
#     https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
    num = range(2,u+1,1)
    index = range(2,int((u)**(1/2))+1,1)
    
    prime_dict = dict.fromkeys(num, True)

    for i in index:
        if prime_dict[i] == True:
            for j in range(i**2,u+1,i):
                prime_dict[j] = False
                
    # get list of all primes between l and u
    prime_list = []
    for key in prime_dict:
        if prime_dict[key] == True and key >= l:
            prime_list.append(key)
    
    # calculate the sum of the first k primes
    return sum(prime_list[0:k])

In [16]:
# Test your function using the provided test cases

assert_equal(eratosthenes(0,13,0), 0)
assert_equal(eratosthenes(0,13,4), 17)
assert_equal(eratosthenes(0,13,9), 41)


## Exercise 7: Bill Splitting (15 points) 

A restaurant bill is given as a text file, where each line corresponds to a dish, the price, and the names of people that took part in eating that dish. For example:
```
Pizza Napolitana, 12.00, Bob, Alice
Mac and Cheese, 9.00, Bob, Charly
```
The goal of this exercise is to write a Python function that splits the bill.

**Assumptions**:
- Dishes may occur more than once on the bill.
- For each dish and corresponding names on a line, all people take equal share in eating.
- All prices are given as numbers with two decimals. Use floats to represent the numbers.

### Part A (5 points)
Write a function that reads the bill file and returns a list of lists. Each inner list contains as first element the dish name, as second element the price, and as remaining elements the names of people eating from the dish. Assuming that we are given the example bill above as text file, then your function should return
```
[['Pizza Napolitana', 12.0, 'Bob', 'Alice'], ['Mac and Cheese', 9.0, 'Bob', 'Charly']]
```

In [17]:
def parse_bill(filename):
    result = []
    
    # read file
    with open(filename, 'r') as f:
        f_list = f.readlines()
    
    for bill in f_list:
        #split by ',' and delete spaces
        temp = bill.strip().split(',')
        
        # remove empty str
#         List comprehensions in 
#         https://note.nkmk.me/en/python-list-clear-pop-remove-del/
        temp = [x for x in temp if x != '']

        # read values, remove unwanted char and store to list
        for i in range(0,len(temp)):
            k=0
            for j in range(len(temp[i])):
                if temp[i][j].isalpha() or temp[i][j].isdigit():
                    k=j
                    break
            temp[i] = temp[i][k:]
            
        temp[1] = float(temp[1])
        result.append(temp)

    return result

In [18]:
# Test your function with the provided test input files

assert_equal(parse_bill("data/billsplit_input0.txt"), 
             [["Etna DOC", 18.22, "Michael"], 
              ["Crescia", 11.6, "Michael"]])

# These tests only verify the length; feel free to modify it accordingly.

assert_equal(len(parse_bill("data/billsplit_input0.txt")), 2)
assert_equal(len(parse_bill("data/billsplit_input1.txt")), 5)
assert_equal(len(parse_bill("data/billsplit_input2.txt")), 8)
assert_equal(len(parse_bill("data/billsplit_input3.txt")), 10)
assert_equal(len(parse_bill("data/billsplit_input4.txt")), 1000)


### Part B (10 points)
Write a function that reads the bill file and returns a dictionary. The dictionary keys are names and the corresponding values indicate the amount of money that each person has to pay. Make sure that all values are rounded down to two decimal places. 

We recommend that you make use of the parsing function that you wrote in **part A**. Be aware that your function will be tested on very large test files.

**Example**: Assume that you are given the bill shown earlier:
```
Pizza Napolitana, 12.00, Bob, Alice
Mac and Cheese, 9.00, Bob, Charly
```

Your function should output the following:
```
{'Bob': 10.5, 'Alice': 6.0, 'Charly': 4.5}
```
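
The computation behind this example can be sketched on the already parsed bill, i.e. on the list of lists from part A. The helper name `split_parsed_bill` is hypothetical, and the final rounding step is left out because the shares in this example are exact.

```python
from collections import defaultdict

bill = [['Pizza Napolitana', 12.0, 'Bob', 'Alice'],
        ['Mac and Cheese', 9.0, 'Bob', 'Charly']]

def split_parsed_bill(rows):
    # Each row is [dish, price, name, name, ...]; every listed person
    # pays an equal share of that dish.
    totals = defaultdict(float)
    for _dish, price, *names in rows:
        share = price / len(names)
        for name in names:
            totals[name] += share
    return dict(totals)

print(split_parsed_bill(bill))
```

Bob pays half of each dish (6.0 + 4.5), while Alice and Charly each pay half of one dish.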


In [19]:
def split_bill(filename):
    # read bills
    bill_list = parse_bill(filename)
    
    # find all names involved
    name_list = []
    for bill in bill_list:
        for i in range(2,len(bill)):
            if bill[i] not in name_list:
                name_list.append(bill[i])
                
    # calculate amount and store to dict                
    split_dict = dict.fromkeys(name_list,0)
        
    for bill in bill_list:
        amount = bill[1]/len(range(2,len(bill)))
        for i in range(2,len(bill)):
            split_dict[bill[i]] += amount
    
    for k in split_dict:
        split_dict[k] = round(split_dict[k],2)
        
    return split_dict

In [20]:
# Test your function with the provided test input files

assert_equal(split_bill("data/billsplit_input0.txt"), 
            {'Michael': 29.82})

assert_equal(split_bill("data/billsplit_input1.txt"), 
            {'Jamie': 24.31,
             'Victoria': 10.64,
             'Allison': 10.64,
             'Amanda': 2.42,
             'Samuel': 2.42,
             'Peter': 25.88,
             'Lisa': 10.37})

# These tests only verify the length (i.e., number of people); feel free to modify it accordingly.

assert_equal(len(split_bill("data/billsplit_input0.txt")), 1)
assert_equal(len(split_bill("data/billsplit_input1.txt")), 7)
assert_equal(len(split_bill("data/billsplit_input2.txt")), 18)
assert_equal(len(split_bill("data/billsplit_input3.txt")), 45)
assert_equal(len(split_bill("data/billsplit_input4.txt")), 142)


## Exercise 8: Dice math (20 points)

In this exercise you will work with .txt files where a number of dice are drawn using characters as follows

```
+-------+ 
|       |
|   O   |
|       |
+-------+

+-------+
| O     |
|       |
|     O |
+-------+

+-------+
|     O |
|       |
| O     |
+-------+

+-------+
| O     |
|   O   |
|     O |
+-------+
        
+-------+
|     O |
|   O   |
| O     |
+-------+
        
+-------+
| O   O |
|       |
| O   O |
+-------+
       
+-------+
| O   O |
|   O   |
| O   O |
+-------+
       
+-------+
| O   O |
| O   O |
| O   O |
+-------+

+-------+
| O O O |
|       |
| O O O |
+-------+
```

In particular note that the value is drawn using the letter ``O``. Note that any such .txt input file might have no die as well as up to 25 dice. As illustrated above, some of the dice can have different orientations. Multiple dice do *not* necessarily appear in distinct rows of the file (open some of the provided .txt input files to get a feeling), but they never overlap. 

### Part A (5 points)
Write a function that reads a .txt file containing the representation of a set of dice and returns only the nonnegative integer equal to the total sum of the values of the dice.

In [21]:
def dice_sum(filename):

    with open(filename, 'r') as f:
        f_str = f.read()
        
    sum_dice = 0
    for i in f_str:
        if i == 'O':
            sum_dice += 1
        
    return sum_dice

In [22]:
# Test your function with the provided test input files

assert_equal(dice_sum("data/dicemath_input0.txt"), 39)
assert_equal(dice_sum("data/dicemath_input1.txt"), 19)
assert_equal(dice_sum("data/dicemath_input2.txt"), 0)
assert_equal(dice_sum("data/dicemath_input3.txt"), 16)


### Part B (15 points)

Write a function that reads the .txt file containing the representation of a set of dice and returns only one (possibly negative) integer equal to the difference between the sum of the odd dice minus the sum of the even dice. 

In [23]:
def dice_diff(filename):
    
    DICE_WIDTH = 9
    DICE_HEIGHT = 5
    
    with open(filename, 'r') as f:
        f_list = f.readlines()
        
    dice_face = [1,2,3,4,5,6]
    occurence_dict = dict.fromkeys(dice_face, 0)
    
    for i in range(len(f_list)-DICE_HEIGHT):
        for j in range(len(f_list[i])-DICE_WIDTH):
             if (f_list[i][j] == f_list[i+(DICE_HEIGHT-1)][j] == 
                 f_list[i][j+(DICE_WIDTH-1)] == f_list[i+(DICE_HEIGHT-1)][j+(DICE_WIDTH-1)] == '+'
                 and f_list[i][j+1:j+(DICE_WIDTH-1)] == 
                     f_list[i+(DICE_HEIGHT-1)][j+1:j+(DICE_WIDTH-1)] == "-------"
                 and f_list[i+1][j] == f_list[i+2][j] == f_list[i+3][j] == 
                     f_list[i+1][j+(DICE_WIDTH-1)] == f_list[i+2][j+(DICE_WIDTH-1)] == f_list[i+3][j+(DICE_WIDTH-1)] == '|'):
                    
                if ((f_list[i+1][j+2] == f_list[i+2][j+2] == f_list[i+1][j+6] == 
                     f_list[i+2][j+6] == f_list[i+3][j+2] == f_list[i+3][j+6] =='O' 
                    and f_list[i+1][j+4] == f_list[i+2][j+4] == f_list[i+3][j+4] != 'O') or
                    (f_list[i+1][j+2] == f_list[i+1][j+4] == f_list[i+1][j+6] == 
                     f_list[i+3][j+2] ==  f_list[i+3][j+4] == f_list[i+3][j+6] == 'O'
                    and f_list[i+2][j+2] == f_list[i+2][j+4] == f_list[i+2][j+6] != 'O')):
                        occurence_dict[6] += 1 
                if ((f_list[i+1][j+2] == f_list[i+2][j+4] == f_list[i+3][j+6] == 'O'
                    and f_list[i+1][j+4] == f_list[i+1][j+6] == f_list[i+2][j+2] == 
                        f_list[i+2][j+6] == f_list[i+3][j+2] == f_list[i+3][j+4] != 'O') or 
                    (f_list[i+1][j+2] == f_list[i+1][j+4] == f_list[i+2][j+2] == 
                     f_list[i+2][j+6] == f_list[i+3][j+4] == f_list[i+3][j+6] != 'O'
                     and f_list[i+1][j+6] == f_list[i+2][j+4] == f_list[i+3][j+2] == 'O')):
                        occurence_dict[3] += 1
                if ((f_list[i+1][j+2] == f_list[i+3][j+6] =='O' 
                    and f_list[i+1][j+4] == f_list[i+1][j+6] == f_list[i+2][j+2] == 
                        f_list[i+2][j+4] == f_list[i+2][j+6] == f_list[i+3][j+2] == f_list[i+3][j+4] != 'O') or 
                    (f_list[i+1][j+2] == f_list[i+1][j+4] == f_list[i+2][j+2] == 
                     f_list[i+2][j+4] == f_list[i+2][j+6] == f_list[i+3][j+4] == f_list[i+3][j+6] != 'O' 
                     and f_list[i+1][j+6] == f_list[i+3][j+2] == 'O')):
                        occurence_dict[2] +=1
                if (f_list[i+1][j+2] == f_list[i+1][j+6] == f_list[i+3][j+2] == f_list[i+3][j+6] == 'O' 
                    and f_list[i+1][j+4] == f_list[i+2][j+2] == f_list[i+2][j+4] == 
                        f_list[i+2][j+6] == f_list[i+3][j+4] != 'O'):
                        occurence_dict[4] +=1
                if (f_list[i+1][j+2] == f_list[i+1][j+6] == f_list[i+2][j+4] == 
                    f_list[i+3][j+2] == f_list[i+3][j+6] =='O' 
                    and f_list[i+1][j+4] == f_list[i+2][j+2] == f_list[i+2][j+6] == f_list[i+3][j+4] != 'O'):
                        occurence_dict[5] +=1
                if (f_list[i+1][j+2] == f_list[i+1][j+4] == f_list[i+1][j+6] == f_list[i+2][j+2] == 
                    f_list[i+2][j+6] == f_list[i+3][j+2] == f_list[i+3][j+4] == f_list[i+3][j+6] != 'O' 
                    and f_list[i+2][j+4] == 'O'):
                        occurence_dict[1] +=1
                        
    sum_even = 0
    sum_odd = 0
    for k in occurence_dict:
        if k%2 == 0:
            sum_even += occurence_dict[k]*k
        else:
            sum_odd += occurence_dict[k]*k
    
    return sum_even - sum_odd

In [24]:
# Test your function with the provided test input files

assert_equal(dice_diff("data/dicemath_input0.txt"), -3)
assert_equal(dice_diff("data/dicemath_input1.txt"), -15)
assert_equal(dice_diff("data/dicemath_input2.txt"), 0)
assert_equal(dice_diff("data/dicemath_input3.txt"), 8)
