
Locking of cache item per key #107

Closed
CMCDragonkai opened this issue Oct 10, 2013 · 13 comments

@CMCDragonkai
Contributor

I have a situation where there may be race conditions in the cache. That is, when multiple requests hit the server, each one may manipulate the cache at the same time. Is there a way to lock the cache item per process, so that processes evaluate one at a time, each proceeding after the previous lock is released?

@tedivm
Member

tedivm commented Oct 10, 2013

Have you looked at the stampede protection features? That may be what you're looking for:

http://stash.tedivm.com/Invalidation.html

@CMCDragonkai
Contributor Author

I did have a look at it. I wish it were explained a bit more clearly. I don't know what these mean:

Stash\Item::SP_OLD
Stash\Item::SP_NONE
Stash\Item::SP_PRECOMPUTE
Stash\Item::SP_VALUE

It's basically changing the way get() works, right?

@tedivm
Member

tedivm commented Oct 10, 2013

Yeah, specifically how it deals with a locked item. If another process has locked the item, then the get function can respond to that lock in a number of ways.

For NONE it simply reports a miss. This is basically the same as not using a lock at all.

For PRECOMPUTE it actually returns a cache miss before the item expires, but only does so once; this way a single process will regenerate the cached item and put it in the cache before it expires. That means only one process gets a miss, and you don't have the stampede problem. This is probably the best option to use.

VALUE has the get function return a specific predefined value when the item has been locked by another process.

Finally, OLD returns the old stale value (if it's present).
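
For concreteness, here is a minimal toy sketch (plain PHP, not Stash's actual implementation) of how get() could react under each mode when another process holds the lock. The constants mirror the names above; everything else is illustrative only:

```php
<?php
// Toy sketch of the four invalidation modes. NOT Stash internals;
// the behavior per mode follows the descriptions in this thread.

const SP_NONE = 1;
const SP_OLD = 2;
const SP_VALUE = 3;
const SP_PRECOMPUTE = 4;

function toyGet(int $mode, bool $lockedByOther, $staleValue, $fallback = null)
{
    if (!$lockedByOther) {
        return $staleValue; // no lock held: behave normally
    }
    switch ($mode) {
        case SP_NONE:       // act as if there were no lock: report a miss
            return null;
        case SP_OLD:        // hand back the stale value while it is rebuilt
            return $staleValue;
        case SP_VALUE:      // hand back a caller-supplied placeholder
            return $fallback;
        case SP_PRECOMPUTE: // the lock holder already got its early miss;
            return $staleValue; // everyone else keeps the current value
    }
    return null;
}
```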

@CMCDragonkai
Contributor Author

For NONE: if the item has already expired (that is, run past its expiration time), what does get() actually return? What do you mean by a "miss"? Is it a falsy value?

Same question for the PRECOMPUTE.

And can you please give me an overview of the workflow? I'm kind of confused by what you mean when another process locks it. A diagram would be good in the docs, something like: if one process calls lock(), what does that mean for other processes running concurrently, and what does it mean for the process that called lock()? Also, I bet this doesn't do anything for the Ephemeral driver, right? What would a lock mean for an in-memory cache...?

One more thing, the lock() has a ttl option. What does the ttl option do for the lock?

In the documentation it says that lock() is generally called after isMiss() returns true. I suspect the intention of this function is to lock the item so it can be modified/regenerated after it has expired, thereby avoiding the stampede effect where multiple processes try to modify/regenerate the expired data.

However, in my situation I may modify the cached item regardless of whether it has expired, and this may happen concurrently. Can I therefore call lock() by itself without checking whether the item has expired? Furthermore, is there a way to call unlock() in the same process, so I can release the lock without waiting for the process to end? (I'm attempting a session storage locking mechanism, but it's a custom implementation.) Actually, how would I unlock the cache at all?

Are lock/unlock affected by the storage mechanism? For example, how do they work on the Redis end?

@tedivm
Member

tedivm commented Oct 10, 2013

A miss is a value that has expired or was not in the cache.

get() will return "null" for NONE.

get() is more complicated for PRECOMPUTE. If the value qualifies as a miss (it expired or doesn't exist) then it will return null. If the value exists and is valid, then it makes a calculation: if the value is going to expire soon, it picks a random process and tells it the value is a miss. This triggers that one process to make a new value; that process acts like it got a miss, so get() returns null for it. During that time the rest of the processes will get the current value, unless of course the value really is a miss.
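
The PRECOMPUTE idea can be sketched in a few lines. This is an illustration, not Stash's real heuristic; the 30-second window and the coin flip are assumptions for demonstration:

```php
<?php
// Illustrative sketch of precompute-style early misses: as an item
// nears expiration, occasionally report a miss so one caller
// regenerates the value before everyone sees a real miss.
// The window size and probability here are made up for illustration.

function precomputeIsMiss(int $expiresAt, int $now): bool
{
    if ($now >= $expiresAt) {
        return true;  // genuinely expired: a real miss for everyone
    }
    $remaining = $expiresAt - $now;
    if ($remaining > 30) {
        return false; // far from expiry: always a hit
    }
    // Inside the window, randomly elect a caller to regenerate early.
    return mt_rand(0, 99) === 0;
}
```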

I'll try to come up with a diagram, but I still want to try explaining now. lock() should only get called in one process, because performing a lock changes get's behavior: in most of those modes, get returns a value or does something other than cause the process to save a new value. So the workflow for that one process is essentially cache miss -> cache block -> lock -> regeneration code -> set -> resume normal code. For other processes it's cache hit -> get value -> continue normal code. You are right that this is meaningless for the Ephemeral driver, unless you happen to create multiple Items with the same key that use it as a driver.
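
The single-regenerator workflow described above can be demonstrated with a toy in-memory item. This is a hypothetical class, not Stash's API, though the method names match the ones discussed in this thread:

```php
<?php
// Toy in-memory item demonstrating the workflow:
// miss -> lock -> regenerate -> set (which also releases the lock).
class ToyItem
{
    private $value = null;
    private bool $locked = false;

    public function isMiss(): bool
    {
        // A lock held elsewhere suppresses further misses, so only
        // one process ends up running the regeneration code.
        return $this->value === null && !$this->locked;
    }

    public function lock(): void { $this->locked = true; }
    public function get()        { return $this->value; }

    public function set($value): void
    {
        $this->value = $value;
        $this->locked = false; // set clears the lock, as described above
    }
}

$item = new ToyItem();
if ($item->isMiss()) {   // cache miss
    $item->lock();       // announce: "I'm regenerating this"
    $item->set('fresh'); // expensive regeneration, then store
}
// From here on, every caller gets a hit.
```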

The ttl option for lock prevents a failed or crashed process from keeping an item from ever being regenerated. The argument is there in case the default is shorter than what a specific long-running process needs.
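
Why a lock needs a ttl can be shown with a simple timestamp-based lock (an illustration, not Stash internals): a lock older than its ttl is treated as abandoned, so a crashed process can't wedge the item forever.

```php
<?php
// Timestamp-based lock with a ttl. Stale locks left behind by
// crashed processes age out and can be re-acquired.
function acquireLock(array &$locks, string $key, int $ttl, int $now): bool
{
    if (isset($locks[$key]) && ($now - $locks[$key]) < $ttl) {
        return false; // a live lock is already held
    }
    $locks[$key] = $now; // free, or the previous lock aged out
    return true;
}

$locks = [];
$first  = acquireLock($locks, 'user.5', 60, 1000); // acquired
$second = acquireLock($locks, 'user.5', 60, 1010); // refused: still live
$third  = acquireLock($locks, 'user.5', 60, 1100); // ok: first lock aged out
```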

I'm not sure I follow what you mean by "I suspect the intention of this function is to make sure that we lock this item to modified/regenerated after it has been expired. ".

If you call lock without running the isMiss check then it's pretty much useless. Calling set also clears the lock. Did you know there's a session storage system built into Stash?

The lock/unlock stuff should work consistently across all drivers.

@CMCDragonkai
Contributor Author

Yes I noticed a session storage system, however it doesn't quite fit my use case.

Regarding this:
"If you call lock without running the isMiss check then it's pretty much useless. Calling set also clears the lock. Did you know there's a session storage system built into Stash?"

Are you saying that calling lock() by itself will do absolutely nothing? It's great that set() clears the lock; that means whenever I modify the cache (by setting items), the lock is released automatically. However, I would like to be able to take an exclusive write lock just by calling lock(), regardless of whether the data is stale, so that concurrent processes writing to the cache don't conflict with each other and their writes get queued up instead. Furthermore, an unlock() would be useful to release the lock when you don't have any data to set.

Basically something similar to how PHP does:

session_start(); //session gets locked (this process can write)
//do whatever to the session
session_write_close(); //release lock, other processes can now write to the session (but this process can't)

session_start(); //reopen the lock again (this process can write)
//do whatever to the session
session_write_close(); //release lock, other processes can now write to the session (but this process can't)

Multiple session_start() calls will cause errors, but I have something that fixes those errors, so you can assume that multiple session_start() calls work.
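
The session_start()/session_write_close() pattern above can be sketched with flock() on a lock file. The helper names here are hypothetical, and this is independent of Stash:

```php
<?php
// Exclusive write lock in the spirit of session_start() /
// session_write_close(), built on flock(). Hypothetical helpers.
function lockOpen(string $path)
{
    $fh = fopen($path, 'c'); // create if missing, don't truncate
    flock($fh, LOCK_EX);     // blocks until we hold the write lock
    return $fh;
}

function lockClose($fh): void
{
    flock($fh, LOCK_UN);     // other processes may now write
    fclose($fh);
}

$path = sys_get_temp_dir() . '/demo.lock';
$fh = lockOpen($path);       // like session_start(): we can write
// ... mutate the shared state here ...
lockClose($fh);              // like session_write_close()
```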

@tedivm
Member

tedivm commented Oct 10, 2013

Stash can't currently handle that use case. The lock function really is there so one process can regenerate the cache; there's nothing to handle queuing of writes. I'm going to need to think a bit on this one.

@CMCDragonkai
Contributor Author

PHP's native session implementation uses files, so I suggest perhaps flock()? Some backends won't require locking at all, such as Redis. MySQL doesn't really require locking since it handles that by itself, though there is table locking; the main thing there would be to use transactions. I'm not sure about the other drivers (except Ephemeral, which won't require locks either).

The main thing isn't concurrent reads, it's mostly a problem of concurrent writes.

However I suppose I can work with Stash at the moment, since there's not that much writing going on.

@tedivm
Member

tedivm commented Oct 11, 2013

Well, the filesystem-based cache uses flock behind the scenes. There isn't a driver that's going to have internal issues with concurrent writes; they'll all handle them just fine (though possibly not in the order you'd like, which really should not matter if you're caching properly).

@CMCDragonkai
Contributor Author

Hmm, well, I shall try it then. The reason I was concerned was basically all the issues around session locking: since I'm using this as my session persistence backend, I wondered whether that would be an issue with AJAX.

@CMCDragonkai
Contributor Author

It would still be good for there to be an independent lock() and unlock(), so there's more fine-grained control in the future.

@CMCDragonkai
Contributor Author

Hello again, just a question: does clear() also clear the lock? You said before that set() will automatically clear the lock, but sometimes I don't want to reset the data; I want to clear it completely.

Also, if I call lock() in one process and a concurrent process then calls isMiss(), will that return true or false? You said that a lock affects how other processes' get() behaves, which must mean their isMiss() changes too. Should isMiss() return false because another process holds the lock? Is that how you prevent multiple processes from regenerating the cache?

@gggeek
Contributor

gggeek commented Dec 2, 2013

@CMCDragonkai stampede protection is a clear use case: "while one process is generating a cache item, prevent other processes from doing the same, as it is a waste of resources".
The "lock" you want is instead a bit fuzzy; maybe you should start with a more detailed explanation of what you want to do?
I would be against introducing very high-level semantics in Stash, and would focus on low-level semantics instead, allowing users of the library to decide exactly what to do when.
As an example: did you take a look at the cas() function from memcache? It is a sound building block for concurrency-protected code.
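
The cas() idea can be sketched as a toy version-checked store (plain PHP, not the memcache extension's API): a write only succeeds if nobody else changed the value since you read it, so conflicting writers detect the race and retry instead of clobbering each other.

```php
<?php
// Toy compare-and-swap store: each value carries a version token.
// A write succeeds only if the caller's token still matches, i.e.
// no other process modified the value in between.
class ToyCasStore
{
    private array $values = [];
    private array $versions = [];

    public function gets(string $key): array
    {
        return [$this->values[$key] ?? null, $this->versions[$key] ?? 0];
    }

    public function cas(string $key, int $token, $value): bool
    {
        if (($this->versions[$key] ?? 0) !== $token) {
            return false; // someone raced us; caller should re-read and retry
        }
        $this->values[$key] = $value;
        $this->versions[$key] = $token + 1;
        return true;
    }
}

$store = new ToyCasStore();
[$v, $tok] = $store->gets('counter');
$ok    = $store->cas('counter', $tok, 1); // succeeds: token is current
$stale = $store->cas('counter', $tok, 2); // fails: token is now stale
```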
