Description
After switching from the in-memory cache to Redis (Valkey), we started seeing errors when the RedLock context manager releases the lock:
KeyError: '0506da2cf155316dd99dc58e1e705dc4d588ee0b7e28b9ee49ca2d1a51c88fe6a39681521397a5ee5fd577fc8e13e077-lock'
Traceback (most recent call last):
[…]
async with RedLock(cache, key, lease=settings.aiohttp_timeout):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/aiocache/lock.py", line 91, in __aexit__
await self._release()
File "/app/.venv/lib/python3.12/site-packages/aiocache/lock.py", line 96, in _release
RedLock._EVENTS.pop(self.key).set()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I looked at the code and found the following race condition.
Assume there are two parallel calls A and B, where A blocks B. A third call C is then triggered just as A and B resolve. The following can then happen:
- A acquires the lock
Lines 71 to 72 in 1e55135
- A adds the lock key to Valkey
Line 77 in 1e55135
- A adds the event to _EVENTS
Line 78 in 1e55135
- B tries to acquire the lock, but waits for A to finish
Lines 79 to 80 in 1e55135
- A finishes and releases the lock:
Lines 90 to 91 in 1e55135
- A removes the lock key from Valkey
Line 94 in 1e55135
- A removes the event from _EVENTS
Lines 95 to 96 in 1e55135
Up to this point everything is fine, but now a new call C comes in before B has finished.
- C acquires the lock and adds the lock key back to Valkey, but not yet to _EVENTS
Line 77 in 1e55135
- B removes the lock key from Valkey
Line 94 in 1e55135
This is already wrong, because the lock key now belongs to C's run, not to A's or B's.
On the other hand, C will only retrieve the value cached by A, so it is not important for locking to work at this point.
- B tries to also remove the event from _EVENTS, but it was not yet created by C:
Lines 95 to 96 in 1e55135
This (probably) leads to the exception we are seeing.
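The sequence above can be replayed step by step against toy in-memory state. This is only a sketch of the interleaving as described in this report, not aiocache's actual code path: `store`, `events` and `replay_interleaving` are hypothetical stand-ins for the Valkey keyspace, `RedLock._EVENTS` and the three calls.

```python
import asyncio


def replay_interleaving():
    """Replay the described steps, in order, against toy in-memory state."""
    store = {}    # hypothetical stand-in for the Valkey keyspace
    events = {}   # hypothetical stand-in for RedLock._EVENTS
    key = "demo-lock"

    store[key] = "A"                # A adds the lock key
    events[key] = asyncio.Event()   # A registers its event
    waiter = events[key]            # B finds the key taken and waits on this event

    del store[key]                  # A releases: removes the lock key ...
    events.pop(key).set()           # ... then pops and sets the event, waking B

    store[key] = "C"                # C re-adds the lock key, event not registered yet

    store.pop(key, None)            # B releases: removes the key (as described above)
    try:
        events.pop(key).set()       # B pops an event that C has not created yet
    except KeyError as exc:
        return waiter.is_set(), exc.args[0]
    return waiter.is_set(), None


woken, missing_key = replay_interleaving()
print(woken, missing_key)
```

In this toy replay, B has been woken by A's event, and B's own pop then fails with a `KeyError` for the lock key, which matches the shape of the traceback above.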
We are not using the lock as a real synchronization mechanism, only to reduce redundant calculations, so it is fine if the locking does not work perfectly. It just should not throw this kind of exception.
So I would suggest always running RedLock._EVENTS.pop(self.key) on release (regardless of whether the lock key was found in Valkey), but without failing on errors:
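from contextlib import suppress  # required import at the top of lock.py

async def _release(self):
    await self.client._redlock_release(self.key, self._value)
    # Tolerate a missing event instead of raising KeyError
    with suppress(KeyError):
        RedLock._EVENTS.pop(self.key).set()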
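To illustrate the effect of the suggested change in isolation, here is a minimal sketch that uses a plain dict in place of `RedLock._EVENTS`. `EVENTS` and `release_event` are hypothetical names used only for this example. Popping a missing key becomes a no-op, while an existing event is still removed and set.

```python
import asyncio
from contextlib import suppress

EVENTS = {}  # hypothetical stand-in for RedLock._EVENTS


def release_event(key):
    """Pop and set the event for `key` if present; report whether one was found."""
    with suppress(KeyError):
        EVENTS.pop(key).set()
        return True
    # Reached only when pop() raised KeyError and it was suppressed
    return False


# No event registered: nothing happens and no exception escapes
print(release_event("missing-lock"))

# Event registered: it is removed from the dict and set
EVENTS["demo-lock"] = asyncio.Event()
event = EVENTS["demo-lock"]
print(release_event("demo-lock"), event.is_set())
```

An equivalent alternative would be `EVENTS.pop(key, None)` followed by a `None` check. `suppress(KeyError)` keeps the change closer to the existing line.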