# Engineering Notebook for Design Exercise 3: Scale Models and Logical Clocks Simulation

### Start

In this project, we built a simulation of a small, asynchronous distributed system running on a single machine. The goal was to model multiple virtual machines that operate at different speeds and use Lamport logical clocks to maintain event ordering. Each machine:
- Picks a random clock rate between 1 and 6 ticks per second.
- Has its own message queue for incoming messages.
- Updates its Lamport clock based on local events, sends, and receives.
- Logs every event with details like system time, logical clock value, and current queue length.

This simulation helps us understand how causal ordering works in distributed systems, even when the machines run at different speeds.


## Implementation Details

### Virtual Machines (Processes)

- **Thread-Based Simulation:**  
  Each virtual machine is implemented as a separate Python thread. We found that using threads made it straightforward to simulate independent machines without dealing with the complexities of real network communication.

- **Clock Rate:**  
  Every machine randomly selects a tick rate (1–6 ticks per second). The tick duration (1 divided by the tick rate) controls how fast the machine runs its loop.
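
The relationship between tick rate and tick duration can be sketched as follows. This is a minimal illustration, not our exact code: the function names `pick_tick_rate`, `tick_duration`, and `run_ticks` are hypothetical, and the loop here runs for a fixed number of ticks rather than until a stop signal.

```python
import random
import time


def pick_tick_rate(low=1, high=6):
    """Return a random integer tick rate in [low, high] ticks per second."""
    return random.randint(low, high)


def tick_duration(tick_rate):
    """Seconds per tick: 1 divided by the tick rate."""
    return 1.0 / tick_rate


def run_ticks(tick_rate, num_ticks, on_tick):
    """Call on_tick(i) once per tick, sleeping one tick duration after each."""
    duration = tick_duration(tick_rate)
    for i in range(num_ticks):
        on_tick(i)
        time.sleep(duration)
```

For example, a machine that picks a rate of 4 sleeps for `tick_duration(4)`, which is 0.25 seconds, between loop iterations.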


### Logical Clocks and Message Passing

- **Message Queues:**  
  We used Python’s `queue.Queue` to create an individual message queue for each machine. This approach effectively simulates asynchronous communication between the virtual machines.

- **Lamport Clock Rules:**  
  - **Internal Events:** The machine simply increments its logical clock by 1.  
  - **Send Events:** The machine increments its clock by 1, sends a message (containing the current clock value) to the target machine, and logs the event.  
  - **Receive Events:** When a message is received, the machine updates its clock to `max(local_clock, received_timestamp) + 1` and logs the receipt.
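
The three rules above can be sketched as a small class. The class name `LamportClock` and its method names are illustrative assumptions rather than the identifiers used in our code, and the sketch assumes each clock is only touched by the thread that owns it.

```python
class LamportClock:
    """Minimal Lamport logical clock; one instance per machine."""

    def __init__(self):
        self.value = 0

    def internal(self):
        """Internal event: increment the clock by 1."""
        self.value += 1
        return self.value

    def send(self):
        """Send event: increment by 1 and return the timestamp to attach."""
        self.value += 1
        return self.value

    def receive(self, received_timestamp):
        """Receive event: jump to max(local, received) + 1."""
        self.value = max(self.value, received_timestamp) + 1
        return self.value
```

As a usage example, if machine A performs two internal events and then sends, its message carries timestamp 3. A machine B whose clock is still 0 then moves to `max(0, 3) + 1 = 4` on receipt, so the receive is ordered after the send.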


### Event Handling and Logging

- **Event Selection:**  
  During each tick, the machine first checks its message queue:
  - If a message is waiting, it processes that message.
  - If the queue is empty, it randomly decides whether to send a message (to one or both of the other machines) or to perform an internal event.

- **Logging:**  
  Every time an event occurs, the machine writes a log entry. Each log entry contains:
  - The event type (SEND, RECEIVE, INTERNAL)
  - The current system time (using `datetime.now()`)
  - The current logical clock value
  - Additional details (for example, which machine the message was sent to or where it came from)
  - The length of the message queue at that moment
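
The queue check and the log entry described above might look roughly like this. The pipe-delimited line layout, the `"IDLE"` marker, and the names `next_event` and `format_log_entry` are assumptions made for illustration; they are not the exact format of our log files.

```python
import queue
from datetime import datetime


def next_event(inbox):
    """Check the message queue first.

    Returns ("RECEIVE", message) if a message is waiting. Otherwise returns
    ("IDLE", None), meaning the caller should pick a send or internal event.
    """
    try:
        return "RECEIVE", inbox.get_nowait()
    except queue.Empty:
        return "IDLE", None


def format_log_entry(event_type, logical_clock, queue_length, detail="", now=None):
    """Build one log line with the fields listed above (hypothetical layout)."""
    now = now or datetime.now()
    return (
        f"{now.isoformat()} | {event_type} | clock={logical_clock} "
        f"| queue={queue_length} | {detail}"
    )
```

Keeping the formatting in one function means every event type is logged with the same fields in the same order, which makes the per-machine logs easier to line up afterwards.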


### Stopping the Simulation

The simulation runs for a predetermined duration (60 seconds in our runs). Once this time is up, a shared stop event is triggered, all machine threads finish their current tick, close their log files, and the program exits.
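
A minimal sketch of this shutdown pattern, assuming the shared stop event is a `threading.Event`. The function names and the tick counters are hypothetical, and the sketch waits on the event with a timeout instead of calling `time.sleep()` so that a machine wakes up promptly when the stop is signaled.

```python
import threading
import time


def machine_loop(stop_event, tick_seconds, ticks):
    """Run ticks until the shared stop event is set."""
    while not stop_event.is_set():
        ticks.append(time.monotonic())  # placeholder for one tick of work
        stop_event.wait(tick_seconds)   # sleep one tick, or wake early on stop


def run_simulation(duration_seconds, tick_seconds=0.01, num_machines=3):
    """Start the machine threads, let them run, then stop and join them."""
    stop_event = threading.Event()
    counters = [[] for _ in range(num_machines)]
    threads = [
        threading.Thread(target=machine_loop, args=(stop_event, tick_seconds, c))
        for c in counters
    ]
    for t in threads:
        t.start()
    time.sleep(duration_seconds)
    stop_event.set()   # signal every machine at once
    for t in threads:
        t.join()       # wait for each thread to finish its current tick
    return [len(c) for c in counters]
```

Joining every thread before exiting is what gives each machine the chance to flush and close its log file.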


## Design Decisions

- **Using Threads and Queues:**  
  We chose Python’s threads and in-memory queues because they are simple to work with and let us simulate a distributed system without the complexity of real network communication.

- **Implementing Lamport Clocks:**  
  We implemented the classic Lamport clock mechanism discussed in lecture because it is a fundamental concept in distributed systems. It helps us understand how to maintain event ordering without relying on physical clocks.

- **Random Event Selection:**  
  To mimic the unpredictable nature of distributed systems, we used randomness to decide if a machine will send a message or perform an internal event. This also allowed us to study how different event mixes affect the clock synchronization.

- **File-Based Logging:**  
  Each machine writes its events to a separate log file, which makes it easier to analyze the system’s behavior later by correlating the logs from different machines.


### Challenges and Considerations

1. **Simulating True Asynchrony:**  
   - **Challenge:** Real distributed systems run on separate machines. Simulating this on one machine means we need to carefully manage timing and communication between threads.
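   - **Our Approach:** We used `time.sleep()` to simulate ticks and thread-safe queues to stand in for the network, so a message waits in the receiver's queue until that machine gets to it on one of its own ticks. OS scheduling can still introduce minor timing inaccuracies, because `time.sleep()` only guarantees a minimum delay.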

2. **Handling Clock Drift:**  
   - **Challenge:** Machines running at different tick rates will naturally have different logical clock values.
   - **Our Approach:** By using the update rule `max(local, received) + 1`, we ensure that clocks adjust correctly upon receiving messages. This lets us see how the clocks drift and then re-synchronize when communication occurs.

### Observations and Reflections

After running the simulation several times (multiple one-minute runs), we noted the following:
- **Logical Clock Jumps:**  
  When a machine gets a message with a higher timestamp, its clock jumps noticeably. This behavior shows the causal dependency between machines.

- **Message Queue Behavior:**  
  The size of the message queue varies over time. Sometimes, longer queues hint at periods of high communication traffic or slower processing.

- **Clock Drift:**  
  Machines running at slower tick rates sometimes diverge more in their logical clock values than faster machines, although regular message exchanges help bring them back in sync.

- **Event Mix Impact:**  
  A higher chance of sending messages helps keep the logical clocks more aligned across machines, while too many internal events can lead to larger differences.
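
One way to quantify the jumps described above is to compare consecutive logical clock values from a single machine's log. The helpers below are a sketch under the assumption that the clock values have already been extracted from the log into a list; the names `clock_jumps` and `largest_jump` are hypothetical.

```python
def clock_jumps(clock_values):
    """Return the increase between each pair of consecutive clock values."""
    return [later - earlier for earlier, later in zip(clock_values, clock_values[1:])]


def largest_jump(clock_values):
    """Return the biggest single-step increase, or 0 if there are fewer than 2 values."""
    jumps = clock_jumps(clock_values)
    return max(jumps) if jumps else 0
```

With the made-up sequence `[1, 2, 3, 9, 10]` (illustrative values, not taken from our logs), the jumps are `[1, 1, 6, 1]`. A step of 1 is a normal tick, while the step of 6 marks a receive from a machine whose clock was further ahead.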



## Implementation Details (Variant)

After our initial experiments with three virtual machines running with random clock rates between 1 and 6 ticks per second and a 70% chance of performing an internal event, we decided to explore another scenario. In this variant, we reduced the variation in clock cycles (using only 3 or 4 ticks per second) and lowered the probability of an internal event (down to 20%). This adjustment makes message passing more dominant in the simulation, and we wanted to see how this affects the synchronization and behavior of the Lamport clocks.

### Changes Made

1. **Smaller Variation in Clock Cycles:**  
   - **Before:** Each machine randomly picks a tick rate between 1 and 6 ticks per second, which determines how frequently it processes events. The tick duration is simply 1 divided by this tick rate.
   - **Now:** Each machine chooses from a much narrower range (either 3 or 4 ticks per second).  
   - **Why?**  
     This change reduces the timing differences between machines, which helps us observe whether the logical clocks become more aligned when the machines run at similar speeds.

2. **Lower Probability of Internal Events:**  
   - **Before:** A random integer from 1 to 10 was used, with internal events occurring 70% of the time (if the number was 4–10).  
   - **Now:** We use a random number between 0 and 1 with the following probabilities:
     - 30% chance to send a message to a random other machine.
     - 30% chance to send a message to the next machine in order.
     - 20% chance to send messages to both other machines.
     - 20% chance for an internal event.
   - **Why?**  
     By reducing the chance of internal events, we force more interactions between machines. This should help keep the logical clocks more synchronized and result in fewer large jumps due to message receptions.
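
The two settings can be compared side by side in a short sketch. The function names and event labels are hypothetical, and the order in which the variant's probability bands are checked is an assumption; only the percentages and the 3-or-4 tick range come from the description above. For the original setup, the sketch only distinguishes internal events from sends.

```python
import random


def pick_tick_rate_variant():
    """Variant: choose a tick rate of either 3 or 4 ticks per second."""
    return random.choice((3, 4))


def choose_event_original(roll):
    """Original: roll is an integer from 1 to 10; 4-10 means an internal event."""
    return "INTERNAL" if roll >= 4 else "SEND"


def choose_event_variant(roll):
    """Variant: roll is a float in [0, 1), split into 30/30/20/20 bands."""
    if roll < 0.3:
        return "SEND_RANDOM"   # 30%: one random other machine
    if roll < 0.6:
        return "SEND_NEXT"     # 30%: the next machine in order
    if roll < 0.8:
        return "SEND_BOTH"     # 20%: both other machines
    return "INTERNAL"          # 20%: internal event
```

In use, the rolls would come from `random.randint(1, 10)` and `random.random()` respectively. Passing the roll in as an argument keeps the mapping from number to event easy to check by hand.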


### Observations and Reflections (Variant)

After running the variant simulation for several one-minute sessions, here’s what we observed:

- **Logical Clock Synchronization:**  
  With the narrower tick rate (only 3 or 4 ticks per second) and more frequent message passing, the logical clocks stayed much closer together. The constant exchanges kept them from drifting apart.

- **Reduced Clock Jumps:**  
  Because messages were sent more often, there were fewer instances of sudden, large jumps in the clock values. In the earlier setup, a message with a high timestamp could cause a big jump; now, the differences are more subtle.

- **Message Traffic:**  
  The logs revealed that machines sent more messages and did fewer internal events. This made the overall system more interactive, reinforcing the causal ordering.

- **System Stability:**  
  Overall, the system felt more stable, with less drift among the logical clocks. This is a useful observation for systems that need tight synchronization.
