Commit
- it might be the case that the `writing` bitset is not set, but there's a write for the slot in the queue: replies can wait on a read to complete.
- when a faulty read completes, it might clobber the faulty bit, unset by a write which was scheduled after the read.

The specific series of events here:

1. Replica receives a RequestReply and starts a reply read
2. The read completes with a failure, replica sets the faulty bit
3. Replica receives RequestReply and starts a reply read
4. Replica receives Reply and starts a reply write
   - the write unsets the faulty bit
   - the write doesn't start, because there's a read executing
5. The read completes, setting the faulty bit _again_
6. Replica receives RequestReply
   - It _doesn't_ start a reply read, because there's an in-progress write that can resolve a read.
   - But the faulty bit is set, tripping up an assertion.

The root issue here is the race between a read and a write for the same reply. Remove the race by explicitly handling the interleaving:

* When submitting a read, resolve it immediately if there's a pending write (this was already handled by `read_reply_sync`)
* When submitting a write, resolve any pending reads for the same reply.
* Remove the code to block the write while the read is in-progress, as this is no longer possible.

Note that it is still possible that a read and a write for the same slot race, if they target different replies. In this case, there won't be clobbering, as, when the read completes, we double-check freshness by consulting `client_sessions`.

SEED: 2517747396662708227

Closes: #1511
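
The interleaving rules described above can be illustrated with a small model. This is a minimal sketch in Python, not the project's actual code: `ReplyStore`, `Slot`, `submit_read`, `submit_write`, `complete_read`, and the use of an integer checksum to identify a reply are all hypothetical names and simplifications. It covers only the same-reply case; the different-reply case, which the message says is handled by consulting `client_sessions`, is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Slot:
    faulty: bool = False
    # Checksum of the reply whose write is queued or in progress, if any.
    pending_write: Optional[int] = None
    # Checksums of replies with a disk read still in flight.
    pending_reads: List[int] = field(default_factory=list)


class ReplyStore:
    """Toy model of a single reply slot (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.slot = Slot()
        # Log of (source, checksum) for every read that was answered.
        self.resolved: List[Tuple[str, int]] = []

    def submit_read(self, checksum: int) -> bool:
        """Return True if a disk read was actually started."""
        if self.slot.pending_write == checksum:
            # Rule 1: a write for this reply is pending, so answer from it.
            self.resolved.append(("write", checksum))
            return False
        self.slot.pending_reads.append(checksum)
        return True

    def submit_write(self, checksum: int) -> None:
        # Rule 2: resolve pending reads for the same reply right away, so
        # their later completion cannot touch the faulty bit.
        remaining = []
        for read in self.slot.pending_reads:
            if read == checksum:
                self.resolved.append(("write", checksum))
            else:
                remaining.append(read)
        self.slot.pending_reads = remaining
        self.slot.pending_write = checksum
        self.slot.faulty = False  # the write repairs the slot

    def complete_read(self, checksum: int, ok: bool) -> None:
        if checksum not in self.slot.pending_reads:
            # Already resolved by a write: ignore the stale completion.
            return
        self.slot.pending_reads.remove(checksum)
        if ok:
            self.resolved.append(("disk", checksum))
        else:
            self.slot.faulty = True

    def complete_write(self) -> None:
        self.slot.pending_write = None
```

Replaying the six steps from the message against this model (read, failed completion, read, write, failed completion, read) leaves the faulty bit unset after the write, because the second read is resolved by the write before its failing completion arrives.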