Reliable Fetch

mrflip edited this page Sep 12, 2010 · 1 revision

Writing something into a queue is pretty reliable. The client does a “set” operation, and if it worked, kestrel responds “STORED”. Naturally, it only sends that response after the item has been written into the queue’s journal file. The “STORED” response means kestrel has taken responsibility for the item.
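As a minimal sketch of the producer side, assuming kestrel is being spoken to over the memcache text protocol (where a `set` request is a header line followed by the payload), the helpers below build the request and treat only a `STORED` reply as a successful hand-off. The function names and the queue name `work` are hypothetical, chosen for illustration.

```python
def build_set_command(queue: str, payload: bytes, flags: int = 0, expiry: int = 0) -> bytes:
    """Encode a memcache-style `set` request that enqueues one item."""
    header = f"set {queue} {flags} {expiry} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"


def is_stored(response_line: bytes) -> bool:
    """True only if the server replied STORED, i.e. it took responsibility."""
    return response_line.strip() == b"STORED"


def enqueue(sock, queue: str, payload: bytes) -> bool:
    """Send one item over an already-connected socket and wait for the reply."""
    sock.sendall(build_set_command(queue, payload))
    reply = sock.makefile("rb").readline()
    return is_stored(reply)
```

If `enqueue` returns `False` or raises, the producer still owns the item and should retry or report the failure, since no `STORED` was seen.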

Fetching from a queue is not such a happy story. When kestrel sends an item to a client, it will never get an acknowledgement or confirmation, and has to blithely assume that the client got all the data okay and took responsibility for it. If a client loses its connection during the data transfer, or crashes right after receiving a work item, that item is gone forever.

So I added an “open” option to “get” which opens a tentative fetch on a queue. If an item is available, kestrel will remove it from the queue and send it to the client as usual. But it will also set the item aside and prepare to “un-get” it if the client disconnects without confirming it. So a tentative fetch is started with:

    GET work/open

and confirmed with:

    GET work/close

which returns nothing. For efficiency, you can also confirm a previous fetch and get the next item in one operation (avoiding an extra round-trip):

    GET work/close/open

Each client connection may only have one outstanding tentative fetch, and if a connection is dropped, any tentatively-fetched item will be put back on the head of the queue and given to the next available consumer.
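The rules above can be illustrated with a small in-memory model. This is only a sketch of the behavior as described here, not kestrel's implementation: the class and method names are hypothetical, there is no journal, and raising an error on a second outstanding fetch is a modeling choice (the text only says one is allowed).

```python
from collections import deque


class TentativeQueue:
    """Toy model of a queue that supports tentative (reliable) fetches."""

    def __init__(self):
        self._items = deque()
        self._open = {}  # client id -> item fetched but not yet confirmed

    def put(self, item):
        self._items.append(item)

    def get_open(self, client):
        """Start a tentative fetch: hand out the head item but set it aside."""
        if client in self._open:
            raise RuntimeError("client already has an outstanding tentative fetch")
        if not self._items:
            return None
        item = self._items.popleft()
        self._open[client] = item
        return item

    def close(self, client):
        """Confirm the outstanding fetch; the item is now gone for good."""
        self._open.pop(client, None)

    def close_open(self, client):
        """Confirm the previous fetch and start the next one in a single step."""
        self.close(client)
        return self.get_open(client)

    def disconnect(self, client):
        """Connection dropped: 'un-get' any unconfirmed item to the head."""
        if client in self._open:
            self._items.appendleft(self._open.pop(client))
```

In this model, an item fetched by one client that then disconnects is the very next item handed to the following consumer, which is the "put back on the head of the queue" behavior.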

I want to briefly make a distinction here between confirming that a client receives an enqueued item and confirming that some useful work was done on it. Kestrel can really only concern itself with the former. As a good queue server, it would like confirmation that a client has accepted responsibility for an item before that item is erased from the queue and journal. But it has no way of confirming that “something useful” was done with that item. You still need to write careful client code to ensure that an item isn’t lost after it’s received.
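One way to write that careful client code is to confirm an item only after the work on it has finished. The sketch below assumes a hypothetical `fetch(key)` callable that performs a memcache-style `get` and returns the item or `None`, and assumes the `/open` and `/close/open` key suffixes for tentative fetches. The queue name `work` is a placeholder.

```python
def consume(fetch, handle, queue="work"):
    """Drain the queue, confirming each item only after handle() returns.

    `fetch` is a hypothetical callable wrapping a memcache-style `get`.
    If handle() raises, the current item is never confirmed, so dropping
    the connection lets the server hand it to another consumer.
    """
    processed = 0
    item = fetch(f"{queue}/open")            # start the first tentative fetch
    while item is not None:
        handle(item)                         # do the useful work first
        processed += 1
        item = fetch(f"{queue}/close/open")  # confirm, then fetch the next item
    return processed
```

The design choice is the ordering: work first, confirmation second. Reversing the two would bring back the original problem of losing an item when the client crashes mid-task.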

Using reliable fetch means you are protected from losing items, at the expense of potentially receiving duplicates — that’s the trade-off. A client may successfully handle a fetched item but crash before confirming it to kestrel, and the item may then be given to another client. I think this is a good trade-off, though. If you know you may handle some items twice, you can design your system so that duplicate work is harmless — versus the case where you may lose items and don’t have any recourse.
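As a rough sketch of making duplicate work harmless, a consumer can skip items whose ID it has already handled. This assumes the producer embeds a unique ID in each item, which kestrel does not do for you. The class name is hypothetical, and the in-memory set would need to be a durable store shared by all consumers in a real system.

```python
class IdempotentHandler:
    """Apply each item at most once, keyed by a producer-supplied ID."""

    def __init__(self, apply):
        self._apply = apply
        self._seen = set()  # illustration only: not durable, not shared

    def __call__(self, item_id, payload):
        if item_id in self._seen:
            return False             # duplicate delivery, nothing to do
        self._apply(payload)
        self._seen.add(item_id)      # record only after the work succeeded
        return True
```

Recording the ID after the work (rather than before) keeps the at-least-once guarantee: a crash between the two steps leads to a repeat, never to a skipped item.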