Blocking Fetch

ryanking edited this page Sep 12, 2010 · 2 revisions
Clone this wiki locally

Blocking fetches

Something that’s bothered me about using the memcache protocol is that there’s no way for a consumer to do a blocking fetch from a queue. If an item is immediately available, kestrel will give it to you. If not, you’ll immediately get a “nothing” response. Since, like I just said above, you always want to have more consumers/workers than work items, these consumers swarm all over the cluster, asking for work and immediately being sent away empty-handed. Just to keep them from going crazy, we have ruby client code that looks something like this:

    while !(response = QUEUE.get(queue_name))
      sleep 0.25

Good grief. If we’re going to let the workers take a nap on the job, we could at least make it happen while blocking on a queue fetch.

So I did a little sneaky thing with queue names in the memcache “get” command by letting clients add options to the end, separated by slashes. Slashes aren’t allowed in filenames anyway so they were never valid in queue names. Then I made a timeout option, so a client can ask to block for work for some amount of time:

    while !(response = QUEUE.get("#{queue_name}/t=250")); end

The “t=250” option means “if there’s nothing in the queue right now, I’m willing to wait up to 250 milliseconds for something to arrive”. After that timeout, if there’s still nothing, kestrel will answer with the usual empty-response. It’s important here to make sure that your memcache client is set to have a read-timeout larger than the timeout you send in the “get” request.

This was the easiest thing to implement after I worked out how. Each queue just has a kernel-style wait-list of clients attached to it. If a client makes a timeout-style “get” request, and the queue is empty, we just put the client on the wait-list and the client’s actor does a receiveWithin(timeout) to wait for a message saying something new has arrived. When items are put on the queue, the first wait-list client is removed from the wait-list and notified.

The ManyClients load test exercises this by having 100 (or 500) clients pile on to a queue with blocking fetches while a single producer slowly trickles out data. It seems to work like a charm.