# Engineering Notebook: Design Exercise 3, Scale Models and Logical Clocks Simulation

## Start

In this project, we built a simulation of a small, asynchronous distributed system running on a single machine. The goal was to model multiple virtual machines that operate at different speeds and use Lamport logical clocks to maintain event ordering. Each machine:
- Picks a random clock rate between 1 and 6 ticks per second.
- Has its own message queue for incoming messages.
- Updates its Lamport clock based on local events, sends, and receives.
- Logs every event with details like system time, logical clock value, and current queue length.

This simulation helps us understand how causal ordering works in distributed systems, even when the machines run at different speeds.


## Implementation Details

### Virtual Machines (Processes)

- **Thread-Based Simulation:**  
  Each virtual machine is implemented as a separate Python thread. This is an easy way to simulate independent machines without the need for actual network communication.

- **Clock Rate:**  
  Every machine randomly selects a tick rate (1–6 ticks per second). The tick duration (1 divided by the tick rate) controls how fast the machine runs its loop.
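As a minimal sketch of the clock-rate rule above, the tick duration is the reciprocal of the randomly chosen rate. The function names `pick_tick_rate` and `tick_duration` are hypothetical and used only for illustration; the seed is there only to make the example repeatable.

```python
import random


def pick_tick_rate(rng: random.Random) -> int:
    """Return a clock rate in ticks per second, chosen uniformly from 1..6."""
    return rng.randint(1, 6)


def tick_duration(rate: int) -> float:
    """Return the length of one tick in seconds (1 divided by the tick rate)."""
    return 1.0 / rate


rng = random.Random(42)  # seeded only so the example is repeatable
rate = pick_tick_rate(rng)
print(f"rate={rate} ticks/s, tick duration={tick_duration(rate):.3f} s")
```

A machine with rate 6 therefore runs its loop roughly every 0.167 seconds, while a machine with rate 1 runs it once per second.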


### Logical Clocks and Message Passing

- **Message Queues:**  
  We use Python’s `queue.Queue` to create a separate message queue for each machine. This simulates asynchronous communication between the machines.

- **Lamport Clock Rules:**  
  - **Internal Events:** The machine simply increments its logical clock by 1.  
  - **Send Events:** The machine increments its clock by 1, sends a message (containing the current clock value) to the target machine, and logs the event.  
  - **Receive Events:** When a message is received, the machine updates its clock to `max(local_clock, received_timestamp) + 1` and logs the receipt.
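The three clock rules can be sketched as a small class. This is an illustrative version under our own naming (`LamportClock`, `inbox_b`), not necessarily the exact code in the project; the queue stands in for one machine's incoming message queue.

```python
import queue


class LamportClock:
    """Logical clock following the three rules listed above."""

    def __init__(self) -> None:
        self.value = 0

    def internal(self) -> int:
        # Internal event: increment by 1.
        self.value += 1
        return self.value

    def send(self) -> int:
        # Send event: increment by 1; the returned value is the message timestamp.
        self.value += 1
        return self.value

    def receive(self, received_timestamp: int) -> int:
        # Receive event: jump past whichever clock is further ahead.
        self.value = max(self.value, received_timestamp) + 1
        return self.value


inbox_b = queue.Queue()  # incoming message queue for machine B
a, b = LamportClock(), LamportClock()

for _ in range(4):
    a.internal()          # A's clock reaches 4
inbox_b.put(a.send())     # A's clock becomes 5; the message carries 5
b.internal()              # B's clock becomes 1
b.receive(inbox_b.get())  # B's clock becomes max(1, 5) + 1 = 6
print(a.value, b.value)   # prints "5 6"
```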


### Event Handling and Logging

- **Event Selection:**  
  On each tick, a machine checks its queue first:
  - If there’s a message, it processes it.
  - If not, it randomly decides to send a message (to one or both other machines) or perform an internal event.

- **Logging:**  
  Each machine writes a log entry for every event. The log includes:
  - The event type (SEND, RECEIVE, INTERNAL)
  - The current system time (using `datetime.now()`)
  - The current logical clock value
  - Extra details (such as which machine a message was sent to or received from)
  - The length of the message queue at that time
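The event selection and logging steps above might look roughly like the sketch below. Several details are assumptions made for illustration: the roll range of 1 to 10, the mapping from rolls to events, the function names `choose_event` and `format_log_line`, and the log line layout are not specified in this notebook.

```python
import queue
import random
from datetime import datetime


def choose_event(inbox, peers, rng):
    """Pick the event for one tick; a queued message always takes priority."""
    try:
        return "RECEIVE", inbox.get_nowait()
    except queue.Empty:
        roll = rng.randint(1, 10)  # assumed range, not taken from the notebook
    if roll <= len(peers):
        return "SEND", [peers[roll - 1]]  # send to a single peer
    if roll == len(peers) + 1:
        return "SEND", list(peers)        # send to every other machine
    return "INTERNAL", []


def format_log_line(event, clock, queue_len, detail=""):
    """Build one log entry; the field layout here is hypothetical."""
    now = datetime.now().isoformat()
    return f"{now} | {event} | clock={clock} | queue={queue_len} | {detail}"


inbox = queue.Queue()
inbox.put(7)  # a message with timestamp 7 is already waiting
event, payload = choose_event(inbox, ["B", "C"], random.Random(0))
print(format_log_line(event, clock=8, queue_len=inbox.qsize(),
                      detail=f"timestamp={payload}"))
```

Because the queue is checked first, a machine with a backlog never sends or performs internal events until its queue is empty.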


### Stopping the Simulation

The simulation runs for a fixed duration (e.g., 60 seconds). After that, a shared stop event is set; each machine thread then finishes up and closes its log file, and the program exits.
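One way to sketch this shutdown pattern uses `threading.Event`. The names `run_machine` and `run_simulation` are hypothetical, and the sketch paces ticks with `stop_event.wait()` so that threads wake as soon as the stop event is set; that choice is an assumption of this example, not a description of the project code.

```python
import threading
import time


def run_machine(name, tick_seconds, stop_event, ticks):
    """Tick until the shared stop event is set, then record the tick count."""
    count = 0
    while not stop_event.is_set():
        count += 1                     # stand-in for handling one event
        stop_event.wait(tick_seconds)  # waits one tick, returns early on stop
    ticks[name] = count                # a real machine would close its log here


def run_simulation(duration_seconds, rates):
    """Run one thread per tick rate for a fixed duration and return tick counts."""
    stop_event = threading.Event()
    ticks = {}
    threads = [
        threading.Thread(target=run_machine,
                         args=(f"M{i}", 1.0 / rate, stop_event, ticks))
        for i, rate in enumerate(rates)
    ]
    for t in threads:
        t.start()
    time.sleep(duration_seconds)  # let the machines run
    stop_event.set()              # signal every thread to stop
    for t in threads:
        t.join()                  # wait until all threads have finished
    return ticks


print(run_simulation(0.5, [1, 3, 6]))
```

Joining every thread before returning ensures that no machine is still writing when the program exits.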


## Design Decisions

- **Using Threads and Queues:**  
  We chose Python’s threads and in-memory queues because they are simple to work with and let us simulate a distributed system without the complexity of real network communication.

- **Implementing Lamport Clocks:**  
  We implemented the classic Lamport clock mechanism discussed in lecture because it is a fundamental concept in distributed systems. It helps us understand how to maintain event ordering without relying on physical clocks.

- **Random Event Selection:**  
  To mimic the unpredictable nature of distributed systems, we used randomness to decide whether a machine sends a message or performs an internal event. This also allowed us to study how different event mixes affect how far the logical clocks diverge.

- **File-Based Logging:**  
  Each machine writes its events to a separate log file, which makes it easier to analyze the system’s behavior later by correlating the logs from different machines.


## Challenges and Considerations

1. **Simulating True Asynchrony:**  
   - **Challenge:** Real distributed systems run on separate machines. Simulating this on one machine means we need to carefully manage timing and communication between threads.
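   - **Our Approach:** We used `time.sleep()` to pace ticks and thread-safe queues to stand in for the network: a message waits in the queue until the receiver's next tick, which acts as the delivery delay. Because `time.sleep()` can pause for longer than requested depending on OS scheduling, actual tick rates may run slightly below their nominal values.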

2. **Handling Clock Drift:**  
   - **Challenge:** Machines running at different tick rates will naturally have different logical clock values.
   - **Our Approach:** By using the update rule `max(local, received) + 1`, we ensure that a receiver's clock always moves past the timestamp of any message it processes. This lets us see how the clocks drift apart and how a lagging machine catches up when communication occurs.
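A small worked example, with numbers chosen purely for illustration, shows the drift and the catch-up. It assumes one machine at 6 ticks per second and one at 1 tick per second, each performing only internal events for one second before the fast machine sends a message.

```python
def receive(local_clock: int, received_timestamp: int) -> int:
    """Apply the receive rule: max(local, received) + 1."""
    return max(local_clock, received_timestamp) + 1


fast_clock = 0
slow_clock = 0

for _ in range(6):  # fast machine: six internal events in one second
    fast_clock += 1
slow_clock += 1     # slow machine: one internal event in the same second

gap_before = fast_clock - slow_clock          # 6 - 1 = 5

fast_clock += 1                               # send event: clock becomes 7
slow_clock = receive(slow_clock, fast_clock)  # max(1, 7) + 1 = 8

print(gap_before, fast_clock, slow_clock)     # prints "5 7 8"
```

After the receive, the slow machine's clock is ahead of the sender's timestamp, which is exactly what the ordering guarantee requires.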

## Observations and Reflections

After several one-minute runs of the simulation, we noted the following:
- **Logical Clock Jumps:**  
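  When a machine receives a message with a higher timestamp, its clock jumps to one more than that timestamp. The size of the jump reflects how far the receiver lagged behind the sender, and it makes the causal dependency between the machines visible in the logs.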

- **Message Queue Behavior:**  
  The length of the message queue fluctuates over time. Longer queues indicate that messages are arriving faster than the machine processes them, either because of high communication traffic or because the machine has a slow tick rate.

- **Clock Drift:**  
  Machines with slower tick rates sometimes show more divergence in their logical clock values compared to faster machines. However, regular message exchanges help realign the clocks.

- **Event Mix Impact:**  
  A higher probability of message sends tends to keep the logical clocks closer together across machines, while a mix dominated by internal events can lead to larger disparities.


## Conclusion

This simulation allowed us to get hands-on experience with concepts like asynchronous communication, logical clock synchronization, and event ordering in distributed systems. The challenges we encountered, such as handling clock drift and simulating true asynchrony, provided valuable insights. Moreover, the project opens up many avenues for further exploration, from adding more nodes to integrating real network communication.
