# Project 1 -- Time and Global State

## Instructions

Please read carefully:

* Solve the project yourself. No teamwork.
* If you have questions, please post them in the public channel on Slack, since the answers may be relevant to others as well.
* Feel free to import and use any additional Python packages you need.
* Keep in mind that the correctness of your solution will also be verified on a *different input file*. This means that you are asked to provide an algorithm, not to hardcode the answer. If your solution for a task works only on the provided input (i.e., the `sampledb.log` file) but not on the held-back input, you will get only 50% of the points for that task.
* You are allowed to solve the project using a different programming language. In this case, please send me your full code and instructions on how to run it.
* Make sure to fill in your `student_name` in the block below.

In [66]:
student_name = 'David Mihola' # fill with your student name
assert student_name != 'your_student_name', 'Please fill in your student_name before you start.'
mattrikel_nummer = 12211951

## Setup

In this mini-project, you will use your knowledge of logical clocks to analyse a sample distributed system execution. You are given a sample log file `sampledb.log` containing an event log of five communicating processes: Alice, Bob, Carol, Dave and Eve. The log file format is as follows:
```
(<event name>)\n(<host>) (<local_clock>)
```
The code below installs the utility `gdown` and downloads `sampledb.log`.

In [67]:
# DO NOT CHANGE THESE LINES
!pip install gdown
!gdown https://drive.google.com/file/d/1s7BALY1RQyHjk06Okul7_lwUpoizZVNZ/view?usp=sharing --fuzzy



To inspect the `sampledb.log` file, click on the folder icon called `Files` on the left side of Google Colab.

Examples of events in the log file:
* Event `Making progress` finished on the host `Bob` at its local time 2.
```
Making progress 
Bob {"Bob":2}
```
* Event `Receive event` is a message receipt at the host `Alice` at its local time 3. The message was sent by the host `Bob` at its local time 2.
```
Receive event
Alice {"Alice":3, "Bob":2}
```
* Event `Checkpoint` takes place on the host `Carol` at its local time 12.
```
Checkpoint
Carol {"Carol":12}
```
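To make the entry format concrete, here is a minimal sketch of how a parsed clock could be interpreted. It assumes, as in the examples above, that the host's own entry is always present and that a receive event carries exactly one extra entry naming the sender. The helper name `describe_clock` is hypothetical and not part of the provided code.

```python
def describe_clock(host, clock):
    """Split a parsed clock into the host's local time and an optional (sender, send_time) pair."""
    local_time = clock[host]
    # Assumption: any entry other than the host's own names the sender of a received message.
    senders = [(name, time) for name, time in clock.items() if name != host]
    return local_time, (senders[0] if senders else None)


print(describe_clock("Bob", {"Bob": 2}))                # (2, None)
print(describe_clock("Alice", {"Alice": 3, "Bob": 2}))  # (3, ('Bob', 2))
```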

The code below will help you to correctly parse the input file.

In [68]:
# DO NOT CHANGE THESE LINES
import re
import ast

regex = '(.*)\n(\S*) ({.*})'
events = []

with open(f'sampledb.log') as f:
    events = [{'event': event, 'host': host, 'clock': ast.literal_eval(clock)}
               for event, host, clock in re.findall(regex, f.read())]
print('Events:', events)
print('Total number of events:', len(events))

Events: [{'event': 'Init event', 'host': 'Alice', 'clock': {'Alice': 1}}, {'event': 'Init event', 'host': 'Bob', 'clock': {'Bob': 1}}, {'event': 'Send event', 'host': 'Alice', 'clock': {'Alice': 2}}, {'event': 'Making progress', 'host': 'Bob', 'clock': {'Bob': 2}}, {'event': 'Receive event', 'host': 'Bob', 'clock': {'Bob': 3, 'Alice': 2}}, {'event': 'Computing', 'host': 'Alice', 'clock': {'Alice': 3}}, {'event': 'Checkpoint', 'host': 'Alice', 'clock': {'Alice': 4}}, {'event': 'Making progress', 'host': 'Bob', 'clock': {'Bob': 4}}, {'event': 'Init event', 'host': 'Carol', 'clock': {'Carol': 1}}, {'event': 'Init event', 'host': 'Dave', 'clock': {'Dave': 1}}, {'event': 'Init event', 'host': 'Eve', 'clock': {'Eve': 1}}, {'event': 'Send event', 'host': 'Carol', 'clock': {'Carol': 2}}, {'event': 'Checkpoint', 'host': 'Carol', 'clock': {'Carol': 3}}, {'event': 'Send event', 'host': 'Carol', 'clock': {'Carol': 4}}, {'event': 'Send event', 'host': 'Bob', 'clock': {'Bob': 5}}, {'event': 'Receive

In [84]:
# Preprocess the data into a per-host format
!pip install py-linq
from py_linq import Enumerable
events_grouped = Enumerable(events).group_by(["name"], lambda x: x["host"]) # group the events by each person
names_enumerable = events_grouped.select(lambda x: x.key.name) # get the names of each person
names = names_enumerable.to_list()

events_by_names = (events_grouped.select(lambda x: (x.key.name, x.select(lambda y: (y["event"], list(y["clock"].items())))))
                                 .to_list())
# The events appear to be ordered already, but sort each timeline by local clock to be sure.
events_by_names_ordered_dict = dict(
    events_grouped.select(lambda x: [
        x.key.name,
        x.select(lambda y: [y["event"], list(y["clock"].items())])
         .order_by(lambda y: y[1][0][1])
         .to_list(),
    ]).to_list()
)
#print(events_by_names_ordered_dict)

# Pad timelines with "Noop" events until every receive event has a local time
# strictly greater than the send time of the message it receives.
while True:
    # Clocks of all receive events whose local time is not yet later than the matching send time.
    misplaced = [event[1]
                 for timeline in events_by_names_ordered_dict.values()
                 for event in timeline
                 if event[0] == "Receive event" and len(event[1]) > 1 and event[1][0][1] <= event[1][1][1]]
    if not misplaced:
        break
    # Handle the receive event with the earliest send time first.
    misplaced_receive_event = min(misplaced, key=lambda clock: clock[1][1])

    misplaced_name = misplaced_receive_event[0][0]
    index = misplaced_receive_event[0][1] - 1
    noops = misplaced_receive_event[1][1] - misplaced_receive_event[0][1] + 1
    timeline = events_by_names_ordered_dict[misplaced_name]
    for i in range(noops):
        timeline.insert(index, ["Noop", [(misplaced_name, index + noops - i)]])

    # Shift the local times of all later events on the same host.
    for event in timeline[index + noops:]:
        event[1][0] = (event[1][0][0], event[1][0][1] + noops)

    # Update receive events on other hosts that refer to the shifted send events.
    for name in names:
        if name == misplaced_name:
            continue
        for moved in events_by_names_ordered_dict[name]:
            if len(moved[1]) > 1 and moved[1][1][0] == misplaced_name and moved[1][1][1] > index:
                moved[1][1] = (misplaced_name, moved[1][1][1] + noops)

#events_by_names_ordered_dict.values()
#length = Enumerable(events_by_names_ordered_dict.values()).max(lambda x: print(len(x)))
#length

## 1 - Visualize Execution [5+ points]

**Your task:** Visualize the execution (similarly to the visualizations in the lecture). The author of the best visualization gets 3 bonus points!

In [75]:
### START CODE HERE ###
None
### END CODE HERE ###

## 2 - Count Concurrent Events [5 points]

**Your task**: Count the *total number of unique* concurrent event pairs in the log file.
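As a reminder of what "concurrent" means here, the following minimal sketch checks concurrency on a few toy vector timestamps. It assumes that vector timestamps are already available as equal-length lists, which is not the case for the raw log. The function names and the toy data are illustrative assumptions, not part of the assignment.

```python
from itertools import combinations


def happened_before(v, w):
    """True if v is component-wise <= w and the two vectors differ."""
    return all(a <= b for a, b in zip(v, w)) and v != w


def are_concurrent(v, w):
    """Two distinct events are concurrent if neither happened before the other."""
    return not happened_before(v, w) and not happened_before(w, v)


# Toy timestamps for two processes (hypothetical, not taken from sampledb.log).
toy = {"a": [1, 0], "b": [2, 0], "c": [0, 1], "d": [2, 2]}
concurrent_pairs = [(x, y) for x, y in combinations(toy, 2)
                    if are_concurrent(toy[x], toy[y])]
print(concurrent_pairs)  # [('a', 'c'), ('b', 'c')]
```

Iterating over unordered pairs with `combinations` counts each pair once, which matches the "unique pairs" wording of the task.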

In [76]:
def count_concurrent_events(events):
  ### START CODE HERE ###
  None
  ### END CODE HERE ###

print('Number of concurrent event pairs:', count_concurrent_events(events))

Number of concurrent event pairs: None


## 3 - Assign Vector Clocks [4 points]

**Your task:** Assign vector timestamps to each event. Annotate the event captions with the corresponding vector timestamp. E.g.,
```
`Dummy event` --> `Dummy event [0,12,2,4,0]`.
```
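The sketch below illustrates the usual vector clock merge on a receive event and the caption format from the example above. It assumes that the vector components follow the host order Alice, Bob, Carol, Dave, Eve, which the task does not state explicitly. The names `HOSTS`, `on_receive` and `annotate` are hypothetical.

```python
HOSTS = ["Alice", "Bob", "Carol", "Dave", "Eve"]  # assumed component order


def on_receive(host, local_time, previous, sender_vector):
    """Merge the sender's vector into the receiver's previous vector (component-wise max)."""
    merged = {h: max(previous.get(h, 0), sender_vector.get(h, 0)) for h in HOSTS}
    merged[host] = local_time  # the log already provides the receiver's own local time
    return merged


def annotate(event_name, vector):
    """Append the vector timestamp to the caption, e.g. 'Dummy event [0,12,2,4,0]'."""
    return f"{event_name} [{','.join(str(vector.get(h, 0)) for h in HOSTS)}]"


bob = on_receive("Bob", 3, {"Bob": 2}, {"Alice": 2})
print(annotate("Receive event", bob))  # Receive event [2,3,0,0,0]
```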


In [77]:
def assign_vector_timestamps(events):
  ### START CODE HERE ###
  None
  ### END CODE HERE ###

print(assign_vector_timestamps(events))

None


## 4 - Rollback Recovery [6 points]
All events with `Checkpoint` in the title are checkpointing events. According to the provided log file `sampledb.log`, the hosts Alice, Bob, Carol, Dave and Eve are at their logical times 17, 22, 20, 18 and 17, respectively. All of a sudden, Bob fails and has to roll back at least to its latest checkpoint.

**Your task:** Write an algorithm to calculate the correct recovery line given one or multiple host failures.
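One common criterion for a recovery line is that it contains no orphan messages, i.e. no message that is recorded as received before the receiver's rollback point but sent after the sender's rollback point. The sketch below checks this on toy data. The representation of a recovery line as a mapping from host to local time, the message tuples and the function name `is_consistent` are all assumptions made for illustration.

```python
def is_consistent(line, messages):
    """Return True if the candidate recovery line contains no orphan messages.

    line:     dict mapping host -> local time the host rolls back to
    messages: iterable of (sender, send_time, receiver, receive_time)
    """
    for sender, send_time, receiver, receive_time in messages:
        received_before_line = receive_time <= line[receiver]
        sent_after_line = send_time > line[sender]
        if received_before_line and sent_after_line:
            return False  # the receive is kept while its send is undone
    return True


# Toy example: Alice sends at local time 2, Bob receives at local time 3.
messages = [("Alice", 2, "Bob", 3)]
print(is_consistent({"Alice": 1, "Bob": 4}, messages))  # False
print(is_consistent({"Alice": 4, "Bob": 4}, messages))  # True
```

If a candidate line is inconsistent, the receiver of the orphan message has to roll back further, which can in turn force other hosts to roll back as well.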

In [78]:
def recovery_line(events, failed_processes):
  ### START CODE HERE ###
  None
  ### END CODE HERE ###

print("Computed recovery line: ", recovery_line(events, ["Bob"]))

Computed recovery line:  None


## 5 - How to Submit Your Solution?
Download your notebook (File --> Download --> Download .ipynb) and send it by email to [saukh@tugraz.at](mailto:saukh@tugraz.at).