Race-condition in creation+release of RedLock #979

Open
@bodograumann

After switching from memory cache to Redis (Valkey) we got some errors in the RedLock context-manager release:

KeyError: '0506da2cf155316dd99dc58e1e705dc4d588ee0b7e28b9ee49ca2d1a51c88fe6a39681521397a5ee5fd577fc8e13e077-lock'
Traceback (most recent call last):
[…]
    async with RedLock(cache, key, lease=settings.aiohttp_timeout):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/aiocache/lock.py", line 91, in __aexit__
    await self._release()
  File "/app/.venv/lib/python3.12/site-packages/aiocache/lock.py", line 96, in _release
    RedLock._EVENTS.pop(self.key).set()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I have looked at the code and found the following race condition.
Assume there are two parallel calls A and B, where A blocks B. Then a third call C is triggered just as A and B resolve. Then the following can happen:

Up to this point everything is fine, but then a new call C comes in before B has finished.

  • C acquires the lock and adds the lock key back to Valkey, but not yet to _EVENTS
    await self.client._add(self.key, self._value, ttl=self.lease)
  • B removes the lock key from Valkey
    removed = await self.client._redlock_release(self.key, self._value)

This is already wrong, because the lock key belongs to the run of C and not to that of A and B.
On the other hand, C will only retrieve the cached value from A, so it is not important for the locking to work at this point.

  • B tries to also remove the event from _EVENTS, but it was not yet created by C:
    if removed:
        RedLock._EVENTS.pop(self.key).set()

This (probably) leads to the exception we are seeing.
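To make the suspected ordering concrete, here is a minimal sketch that replays the three steps above on plain dictionaries. It does not use aiocache or Valkey: `valkey`, `events`, `KEY` and `replay` are hypothetical stand-ins, and the sketch simply assumes, as described above, that B's release reports the key as removed.

```python
# Plain-dict stand-ins (hypothetical, not aiocache objects).
valkey = {}  # plays the role of the Valkey keyspace
events = {}  # plays the role of RedLock._EVENTS
KEY = "example-lock"


def replay():
    """Replay the interleaving from the list above, step by step."""
    # C acquires: the lock key is written, but no event is registered yet.
    valkey[KEY] = "value-of-C"
    # B releases: assumed here to delete the key successfully.
    removed = valkey.pop(KEY, None) is not None
    # B then pops the event that C has not created yet.
    if removed:
        events.pop(KEY).set()


if __name__ == "__main__":
    try:
        replay()
    except KeyError as exc:
        print("KeyError:", exc)
```

If the real code can interleave in this order, the last step is the one that raises, which would match the traceback.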

We are not using the lock as a real synchronization mechanism, but to reduce redundant calculations, so it is fine if the locking does not work perfectly. It just should not throw this kind of exception.
So I would suggest always running RedLock._EVENTS.pop(self.key) on release (no matter whether the lock key was found in Valkey or not), without failing if the key is missing:

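    from contextlib import suppress

    async def _release(self):
        # Release the key in Valkey; the result is intentionally ignored.
        await self.client._redlock_release(self.key, self._value)
        # Wake any waiters, tolerating an event that is already gone.
        with suppress(KeyError):
            RedLock._EVENTS.pop(self.key).set()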
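The tolerant pop can be tried in isolation. The following is only a sketch of the pattern with a module-level dictionary standing in for RedLock._EVENTS; `events`, `release_event` and `main` are hypothetical names and not part of aiocache.

```python
import asyncio
from contextlib import suppress

events = {}  # hypothetical stand-in for RedLock._EVENTS


def release_event(key):
    """Pop and set the event for key; return False if none was registered."""
    with suppress(KeyError):
        events.pop(key).set()
        return True
    # Reached only when pop() raised KeyError and suppress() swallowed it.
    return False


async def main():
    waiter = asyncio.Event()
    events["k-lock"] = waiter
    first = release_event("k-lock")   # event present: popped and set
    second = release_event("k-lock")  # event already gone: no exception
    return first, second, waiter.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The second call finds nothing to pop and returns quietly, which is the behaviour the suggested `_release` relies on.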
