Locking of cache item per key #107
Have you looked at the stampede protection features? That may be what you're looking for.
I did have a look at it. I wish it was explained a bit more clearly. I don't know what these mean:
It basically changes the way get() works, right?
Yeah, specifically how it deals with a locked item. If another process has locked the item, then get() can respond to that lock in a number of ways:

- **NONE** just reports a miss. This is basically the same as not using a lock at all.
- **PRECOMPUTE** actually returns a cache miss *before* the item expires, but only does so once. That way a single process regenerates the cached item and puts it back in the cache before it expires. Only one process ever sees a miss, so you don't have the stampede problem. This is probably the best option to use.
- **VALUE** has get() return a specific predefined value while the item is locked by another process.
- **OLD** returns the old stale value (if it's present).
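To make the PRECOMPUTE behavior concrete, here is a small self-contained sketch of the *idea* of an early-miss election. This is not Stash's actual implementation; the function name `reportsEarlyMiss` and its parameters are invented for illustration, and the probability curve is an assumption, but it captures the description above: well before expiration everyone gets a hit, and inside a window near expiration roughly one caller is elected to see a miss and regenerate early.

```php
<?php
// Illustrative sketch only -- NOT Stash's real algorithm. It shows how a
// cache can report an early miss to roughly one caller as expiration nears.
function reportsEarlyMiss(int $expiresAt, int $now, int $window, ?callable $rand = null): bool
{
    // $rand is injectable for testing; defaults to a uniform [0, 1) draw.
    $rand = $rand ?? fn() => mt_rand() / mt_getrandmax();

    if ($now >= $expiresAt) {
        return true;              // truly expired: every caller sees a miss
    }
    $remaining = $expiresAt - $now;
    if ($remaining > $window) {
        return false;             // nowhere near expiring: normal hit
    }
    // Inside the precompute window: the closer to expiration, the more
    // likely this particular caller is elected to regenerate early.
    return $rand() < (1 - $remaining / $window);
}
```

Once a caller is elected, it would lock the item and regenerate it while everyone else continues to read the still-valid value.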
For NONE: if the item has already expired (i.e. run past its expiration time), what does get() actually return? What do you mean by a "miss"? Is it a falsey value? Same question for PRECOMPUTE.

Can you also give an overview of the workflow? I'm confused about what happens when another process locks an item; a diagram would be good in the docs. If one process calls lock(), what does that mean for other processes running concurrently, and what does it mean for the process that called lock()?

Also, I bet this doesn't do anything for the Ephemeral driver, right? What would a lock even mean for an in-memory cache?

One more thing: lock() has a ttl option. What does the ttl do for the lock?

The documentation says lock() is generally called after isMiss() returns true. I suspect the intention of this function is to lock the item so it can be modified/regenerated after it has expired, thereby avoiding the stampede effect where multiple processes try to regenerate the expired data at once. However, in my situation I may modify the cached item regardless of whether it has expired, and this may happen concurrently. Can I therefore call lock() by itself, without checking whether the item has expired? Furthermore, is there a way to call unlock() in the same process, so I can release the lock without waiting for the process to end? (I'm attempting a session storage locking mechanism, but it's a custom implementation.) Actually, how would I unlock the cache at all? Are lock/unlock affected by the storage mechanism, e.g. how do they work on the Redis end?
A miss is a value that has expired or was not in the cache. For NONE, get() will return null.

get() is more complicated for PRECOMPUTE. If the value qualifies as a miss (it expired or doesn't exist), then it returns null. If the value exists and is valid, it makes a calculation: if the value is going to expire soon, it picks a random process to tell that the value is a miss. This triggers that one process to make a new value. That process acts like it's a miss, so get() returns null for it. During that time the rest of the processes get the value, unless of course the value really is a miss.

I'll try to come up with a diagram, but let me try explaining now. lock() should only get called in one process, because performing a lock changes get()'s behavior in the other processes; in most of those cases get() returns a value, or does something other than cause the process to save a new value. So the workflow for that one process is essentially: cache miss -> cache block -> lock -> regeneration code -> set -> resume normal code. For the other processes it's: cache hit -> get value -> continue normal code.

You're right that this is meaningless for the Ephemeral driver, unless you happen to create multiple Items with the same key that use it as a driver.

The ttl option for lock() is there to prevent a failed or crashed process from keeping an item from ever being regenerated. The argument exists in case the default is shorter than a specific long-running process.

I'm not sure I follow what you mean by "I suspect the intention of this function is to make sure that we lock this item to modified/regenerated after it has been expired." If you call lock() without running the isMiss() check then it's pretty much useless. Calling set() also clears the lock.

Did you know there's a session storage system built into Stash? The lock/unlock stuff should work consistently across all drivers.
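The two workflows above can be sketched with a minimal in-memory stand-in. This `SketchItem` class is invented for illustration and is not Stash's code; its method names (isMiss, lock, set, get) merely mirror the Item API being discussed, and the miss logic is deliberately simplified:

```php
<?php
// Minimal in-memory sketch of the miss -> lock -> regenerate -> set
// workflow. NOT Stash's implementation; names mirror its Item API.
class SketchItem
{
    private $value = null;
    private bool $exists = false;
    private bool $locked = false;

    public function get()
    {
        // Simplified: a real cache's answer here depends on the
        // invalidation method (NONE returns null, OLD the stale value, ...).
        return $this->exists ? $this->value : null;
    }

    public function isMiss(): bool
    {
        // Simplified: "no usable value, and nobody else is regenerating".
        return !$this->exists && !$this->locked;
    }

    public function lock(): void
    {
        $this->locked = true;    // signal other processes we're regenerating
    }

    public function set($value): void
    {
        $this->value  = $value;
        $this->exists = true;
        $this->locked = false;   // set() clears the lock, as noted above
    }
}

// The one process that sees the miss:
$item = new SketchItem();
if ($item->isMiss()) {
    $item->lock();                // cache miss -> lock
    $data = 'expensive result';   // regeneration code
    $item->set($data);            // set -> lock released
}
// Every other process: cache hit -> get value -> continue.
$value = $item->get();
```

Note how calling lock() without the isMiss() check buys nothing here: the lock only matters because it changes what concurrent callers observe between the miss and the set().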
Yes, I noticed the session storage system, but it doesn't quite fit my use case.

Regarding this: are you saying that calling lock() by itself will do absolutely nothing? It's great that set() clears the lock; that means whenever I modify the cache (by setting items), the lock is released automatically. However, I would like to be able to take an exclusive write lock by just calling lock(), regardless of whether the data is stale or not. That way, concurrent processes writing to the cache won't conflict with each other; the writes would be queued up instead. An unlock() would also be useful for releasing the lock when you don't have any data to set. Basically something similar to how PHP handles sessions:

```php
session_start();         // session gets locked (this process can write)
// do whatever to the session
session_write_close();   // release lock, other processes can now write (but this one can't)
session_start();         // reacquire the lock (this process can write again)
// do whatever to the session
session_write_close();   // release lock, other processes can now write (but this one can't)
```

Multiple session_start() calls will cause errors, but I have something that fixes those, so you can assume that multiple session_start() calls work.
Stash can't currently handle that use case; the lock() function is really there so one process can regenerate the cache, and there's nothing to handle queuing of writes. I'm going to need to think a bit on this one.
PHP's native session implementation uses files. The main thing isn't concurrent reads; it's mostly a problem of concurrent writes. However, I suppose I can work with Stash for the moment, since there isn't that much writing going on.
Well, the filesystem-based cache uses flock() behind the scenes. There isn't a driver that's going to have internal issues with concurrent writes; they'll all handle them just fine (though possibly not in the order you'd like, which really shouldn't matter if you're caching properly).
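For reference, an exclusive write lock of the kind being asked for can be built directly on flock(), the same primitive the filesystem driver relies on. This is a standalone sketch, not part of Stash; the function name `withExclusiveLock` and the lock-file path are hypothetical:

```php
<?php
// Sketch: serialize concurrent writers with an advisory flock() lock.
// Not Stash API -- a hypothetical helper to illustrate queued writes.
function withExclusiveLock(string $lockFile, callable $write)
{
    $fp = fopen($lockFile, 'c');   // 'c': create if missing, don't truncate
    if ($fp === false) {
        throw new RuntimeException("Cannot open lock file: $lockFile");
    }
    try {
        flock($fp, LOCK_EX);       // blocks until other writers release
        return $write();           // critical section: safe to write
    } finally {
        flock($fp, LOCK_UN);       // the explicit "unlock()"
        fclose($fp);
    }
}

// Usage: concurrent writers queue up instead of clobbering each other.
$result = withExclusiveLock(sys_get_temp_dir() . '/cache-key.lock', function () {
    return 'written safely';
});
```

Note that flock() is advisory: it only serializes processes that all go through the same lock file, which is exactly the per-key queuing behavior discussed above.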
Hmm, well, I shall try it then. The reason I was concerned was basically all the known issues around session locking; since I'm using this as my session persistence backend, I was concerned whether that would be an issue with AJAX.
It would still be good for there to be independent lock() and unlock() methods, so there's more fine-grained control in the future.
Hello again, just a question. Does Also if I call
@CMCDragonkai stampede protection has a clear use case: "while one process is generating a cache item, prevent other processes from also doing the same, as it is a waste of resources."
I have a situation where there may be race conditions in the cache. When multiple requests are sent to the server, each one may manipulate the cache at the same time. Is there a way to lock a cache item per process and make the others wait, evaluating one at a time after the previous lock is released?