In this project, I built a two-phase commit protocol for a distributed file service using Python and SQLite.
Part one: Durable Remote File Service
The file service supports these two methods:
1 - writeFile - given the file name and contents, the corresponding file should be written to the server.
2 - readFile - if a file with the given name exists on the server, return its contents. Otherwise, an exception should be thrown.
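As a rough illustration of the two methods, here is a minimal sketch of a durable file service backed by SQLite. The table name Info matches the query shown later in this README. The column names, the class name DurableFileService, and the exception name FileNotFound are assumptions made for the example and may differ from the project's code.

```python
import sqlite3


class FileNotFound(Exception):
    """Raised by read_file when no file with the given name is stored."""


class DurableFileService:
    def __init__(self, db_path):
        # One SQLite database per server. The column names are assumed.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Info (name TEXT PRIMARY KEY, contents TEXT)"
        )
        self.conn.commit()

    def write_file(self, name, contents):
        # The connection context manager commits on success and rolls back
        # on error, so a write is either fully stored or not stored at all.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO Info (name, contents) VALUES (?, ?)",
                (name, contents),
            )

    def read_file(self, name):
        row = self.conn.execute(
            "SELECT contents FROM Info WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFound(name)
        return row[0]
```

Because the contents live in a database file rather than in memory, a service restarted against the same path can still serve earlier writes.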
Part two: Two-Phase Commit
The architecture of the replicated file service consists of a single “coordinator” process and multiple “participant” processes:
coordinator - the “coordinator” process should expose an RPC interface to clients that contains two methods: writeFile and readFile. When the coordinator process receives a state-changing operation, i.e., writeFile, it uses two-phase commit to commit that state-changing operation to all participants. When the coordinator receives a readFile operation, it selects a participant at random to issue the request against. The coordinator should be concurrent. It should allow multiple clients to connect to it and should be able to process multiple operations concurrently.
participant - each participant implements the durable remote file service built in Part one. It exposes an RPC interface to the coordinator to participate in two-phase commit. It is up to you what other methods the participants should implement to perform two-phase commit. The participants should also be concurrent: each should be able to process multiple commands concurrently, so it needs to implement some form of concurrency control. For this project, the concurrency control scheme is very simple: if two concurrent operations manipulate different files, they can safely proceed concurrently. If two concurrent operations manipulate the same file (e.g., two writeFile operations to the same file show up at the same time), the first to arrive should be able to proceed while the second’s two-phase commit should abort.
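The concurrency rule for participants (different files proceed, the later arrival on the same file aborts) could be sketched with a small lock table. This is only an illustration: FileLockTable, vote, and the txn_id parameter are hypothetical names, and the project may implement the check differently.

```python
import threading


class FileLockTable:
    """Tracks which file names are held by an in-flight transaction."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the dictionary below
        self._owners = {}               # file name -> transaction id

    def try_acquire(self, filename, txn_id):
        """Return True if txn_id holds filename, False if another one does."""
        with self._mutex:
            owner = self._owners.get(filename)
            if owner is None:
                self._owners[filename] = txn_id
                return True
            return owner == txn_id

    def release(self, filename, txn_id):
        """Free the file once the transaction has committed or aborted."""
        with self._mutex:
            if self._owners.get(filename) == txn_id:
                del self._owners[filename]


def vote(locks, filename, txn_id):
    # The first transaction to reach a file votes to commit; a later one
    # touching the same file votes to abort instead of waiting.
    if locks.try_acquire(filename, txn_id):
        return "VOTE_COMMIT"
    return "VOTE_ABORT"
```

The lock is never waited on: a conflicting transaction is refused immediately, which matches the rule that the second writeFile should abort rather than queue.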
Both the coordinator and the participants log their actions to durable storage, so that a crash does not leave the system in an inconsistent state.
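A minimal sketch of such a log is shown below, assuming one text line per record in the form "txn_id record". The class name WriteAheadLog, the line format, and the record names used in the usage note are assumptions for illustration, not the project's actual log layout.

```python
import os


class WriteAheadLog:
    def __init__(self, path):
        self.path = path

    def append(self, txn_id, record):
        # Append one line and force it to disk before returning, so the
        # record survives a crash that happens right after this call.
        with open(self.path, "a") as f:
            f.write("%s %s\n" % (txn_id, record))
            f.flush()
            os.fsync(f.fileno())

    def last_record(self, txn_id):
        """Return the most recent record logged for txn_id, or None."""
        if not os.path.exists(self.path):
            return None
        last = None
        with open(self.path) as f:
            for line in f:
                parts = line.split(None, 1)
                if len(parts) == 2 and parts[0] == txn_id:
                    last = parts[1].strip()
        return last
```

A node would call append before acting on each step (for example, logging WRITE before starting the vote), and call last_record after a restart to find out where it stopped.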
Part three: Test cases
1 - Durability. After a successful commit of writeFile, all participants crashed. When they are restarted, a readFile request returns the correct file content.
2 - Concurrency Control. If two concurrent writeFile requests to the same file arrive at the same participant at the same time, the later one should be aborted. Note here that concurrency control is only done at the participants, not at the coordinator.
3 - Coordinator Failure Case 1. The coordinator failed before voting started, i.e., no voting messages were sent out. All participants time out and abort the writeFile request. When the coordinator recovers, it restarts voting and finds that all participants have aborted.
Coordinator Failure Case 2. The coordinator failed after voting has started, and at least one participant has replied Yes. When the coordinator recovers, it first checks whether it has logged the final decision. If yes, it sends out the final decision, and all participants apply the appropriate changes based on it. If not, the coordinator queries all participants for their decisions and makes a final decision based on them.
Participant Failure. One participant failed after replying Yes to commit but before receiving the coordinator’s final decision. When it recovers, it asks the coordinator for the final decision. If the final decision is to commit, the participant commits. Otherwise, it aborts the operation.
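The recovery rules in the failure cases above can be sketched as two small decision functions. This is an illustration under assumptions: the record names follow those mentioned elsewhere in this README (WRITE, VOTE_COMMIT, VOTE_ABORT, GLOBAL_COMMIT, GLOBAL_ABORT), while the function names and the query_participants and ask_coordinator callables are hypothetical stand-ins for the real RPC calls.

```python
def coordinator_recover(last_record, query_participants):
    """Decide what a restarted coordinator should send to the participants.

    last_record is the latest log record for the transaction, and
    query_participants is a callable returning the list of participant votes.
    """
    if last_record in ("GLOBAL_COMMIT", "GLOBAL_ABORT"):
        return last_record          # decision was logged: send it out again
    votes = query_participants()    # no decision logged: ask every participant
    if votes and all(v == "VOTE_COMMIT" for v in votes):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"


def participant_recover(last_record, ask_coordinator):
    """Decide what a restarted participant does with an unfinished write.

    Assumes it is only called for a transaction whose last logged record
    is a vote, i.e. the final decision was never received.
    """
    if last_record == "VOTE_COMMIT":
        # Voted Yes but crashed before the outcome: the coordinator decides.
        if ask_coordinator() == "GLOBAL_COMMIT":
            return "commit"
        return "abort"
    # VOTE_ABORT: nothing was promised, so the participant aborts locally.
    return "abort"
```

In Coordinator Failure Case 1, for instance, the log holds only WRITE and every participant has timed out and aborted, so coordinator_recover returns GLOBAL_ABORT.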
The 2PC protocol works in the following manner: one node is a designated coordinator, and the rest of the nodes in the network are designated the participants or cohorts. The protocol assumes that there is stable storage at each node with a write-ahead log, that no node crashes forever, that the data in the write-ahead log is never lost or corrupted in a crash, and that any two nodes can communicate with each other. The protocol is initiated by the coordinator after the last step of the transaction has been reached. The cohorts then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the cohort.
The message flow between Coordinator and Participant looks like this:
Coordinator                                          Cohort
                        QUERY TO COMMIT
              -------------------------------->
                        VOTE YES/NO                  prepare*/abort*
              <--------------------------------
commit*/abort*          COMMIT/ROLLBACK
              -------------------------------->
                        ACKNOWLEDGMENT               commit*/abort*
              <--------------------------------
end
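The same exchange can be sketched in a few lines of Python, with in-process objects standing in for the remote cohorts. The Participant class, its method names, and the will_vote_yes flag are assumptions made for this example. Logging, timeouts, and RPC are left out to keep the two phases visible.

```python
class Participant:
    """In-process stand-in for a remote participant (hypothetical interface)."""

    def __init__(self, will_vote_yes=True):
        self.will_vote_yes = will_vote_yes
        self.files = {}    # committed state: file name -> contents
        self.pending = {}  # prepared writes: txn_id -> (file name, contents)

    def can_commit(self, txn_id, filename, contents):
        # QUERY TO COMMIT: stage the write and answer with a vote.
        if not self.will_vote_yes:
            return False
        self.pending[txn_id] = (filename, contents)
        return True

    def do_commit(self, txn_id):
        # COMMIT: make the staged write visible.
        filename, contents = self.pending.pop(txn_id)
        self.files[filename] = contents

    def do_abort(self, txn_id):
        # ROLLBACK: drop the staged write, if there is one.
        self.pending.pop(txn_id, None)


def two_phase_commit(participants, txn_id, filename, contents):
    # Phase 1: ask every participant to vote.
    votes = [p.can_commit(txn_id, filename, contents) for p in participants]
    # Phase 2: commit only if every vote is Yes, otherwise roll back everywhere.
    if all(votes):
        for p in participants:
            p.do_commit(txn_id)
        return "GLOBAL_COMMIT"
    for p in participants:
        p.do_abort(txn_id)
    return "GLOBAL_ABORT"
```

A single No vote is enough to roll back the write on every participant, including those that voted Yes.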
Here, the client calls the coordinator for any operation, such as reading or writing a file, and the coordinator in turn contacts each participant. The file content is saved to the SQLite database and each activity is recorded in the write-ahead log, so that the state survives a coordinator or participant crash. After a crash, the recovery process in the participant or coordinator looks for the latest activity in the log and resumes from that point in the message flow.
Instructions to run
Install Thrift and go to the thrift-lab/python folder:
- thrift -r --gen py ../bank.thrift
  Compiles the Thrift definition and generates the Thrift files.
- make
  Compiles and creates the classes that the scripts need to run.
- python 2PhaseCommitTest.py testcasenumber portnumber coordinatortimeout participanttimeout operation filename
  Runs the 2PhaseCommitTest.py test script to check that all the test cases work properly.
1 - Durability.
python 2PhaseCommitTest.py 1 9008 50 30 write vidya.txt
python 2PhaseCommitTest.py 1 9008 50 30 read vidya.txt
output of client with write operation: write file is successful
output of client with read operation: File Contents: You are reading the content of the file
Internally, the test script follows these steps:
i. runs the coordinator
ii. runs both participants
iii. runs the client with write operation
iv. terminates one participant
v. runs the client with read operation
2 - Concurrency Control.
python 2PhaseCommitTest.py 2 9016 50 100 write vidya.txt
python 2PhaseCommitTest.py 1 9008 50 30 read vidya.txt
output: the first client to reach the coordinator writes the file, while the other client fails to write
Internally, the test script follows these steps:
i. runs the coordinator
ii. runs both participants
iii. runs both clients at the same time with write operation
Instead of issuing a read operation, you can also query the database to check which write operation succeeded.
Query DB:
sqlite3 p1.db
SQLite version 3.7.13 2012-06-11 02:05:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from Info;
vidya.txt|we are awesome
3 - Coordinator Failure Case 1.
python 2PhaseCommitTest.py 3 9008 50 30 write vidya.txt
output: the coordinator fails after writing WRITE to the log. On recovery, if the log has WRITE, it restarts canCommit and finds that all participants have aborted
Internally, the test script follows these steps:
i. runs the coordinator
ii. runs both participants
iii. runs the client with write operation
iv. kills the coordinator after it writes WRITE to the log and restarts it
- Coordinator Failure Case 2.
python 2PhaseCommitTest.py 4 9008 50 100 write vidya.txt
and
python 2PhaseCommitTest.py 5 9008 50 100 write vidya.txt
output: the coordinator fails after voting has started and, on recovery, sends out the final decision, GLOBAL_COMMIT or GLOBAL_ABORT, to all the participants
Internally, the test script follows these steps:
i. runs the coordinator
ii. runs both participants
iii. runs the client with write operation
iv. kills the coordinator after voting has started and restarts it
- Participant Failure.
python 2PhaseCommitTest.py 6 9008 50 100 write vidya.txt
output: the participant fails and, on recovery, finds that the last line in its log is VOTE_COMMIT or VOTE_ABORT. It asks the coordinator for the final decision, and the output here will be Global_Commit or Global_Abort
The work done here may only be used as reference material and must not be submitted as your own, with or without edits.
Copyright © 2017