Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synchronize entity dynamic imports #1794

Conversation

Izzette
Copy link
Contributor

@Izzette Izzette commented May 16, 2024

Proposed change

Perform dynamic imports in registry in dedicated worker thread to avoid deadlocks from concurrent calls to importlib.import_module for the same module from different threads. This may not be sufficient if other code outside of python-holdays is importing these moduels directly. I believe this cpython issue is related https://bugs.python.org/issue38884. This should resolve (through a different implementation) #1514.

Here is a flamegraph of the deadlock in action on one thread:
image

My main concern with this PR is that it may introduce a slight performance degradation. I'm not sure adding a test will be feasible, as the deadlock is inherently racy. Making a reliable production in the python-holidays code would require adding a lot of synchronization points within python-holidays to instrument such a test and hammering the importlib.import_module with many threads and than calling importlib.invalidate_caches in a loop until the issue is reproduced.

Type of change

  • New country/market holidays support (thank you!)
  • Supported country/market holidays update (calendar discrepancy fix, localization)
  • Existing code/documentation/test/process quality improvement (best practice, cleanup, refactoring, optimization)
  • Dependency update (version deprecation/upgrade)
  • Bugfix (non-breaking change which fixes an issue)
  • Breaking change (a code change causing existing functionality to break)
  • New feature (new python-holidays functionality in general)

Checklist

  • I've followed the contributing guidelines
  • I've run make pre-commit, it didn't generate any changes
  • I've run make test, all tests passed locally

holidays/registry.py Outdated Show resolved Hide resolved
holidays/registry.py Outdated Show resolved Hide resolved
@Izzette
Copy link
Contributor Author

Izzette commented May 16, 2024

Ok, I've managed to get what I believe is a minimal reproduction, but on my M2 with Python 3.12.3 the deadlock detection in importlib seems to be working correctly, so I don't reproduce a "true" deadlock, but I suspect that this is the same deadlock that is affecting us in production. My code also does some benchmarking, but because the code that loads the module caches it, it doesn't really seem like it's likely to affect anyone ... and on top of it, it's only a degradation of about 8 µs (Average duration 4276.57 ns to Average duration 12868.65 ns).

Here's the deadlock detection in action:

Traceback (most recent call last):
  File "/Users/izzi/src/python-holidays/perf.py", line 125, in main
    t.join(30.0)
  File "/Users/izzi/src/python-holidays/perf.py", line 86, in join
    raise self.error
  File "/Users/izzi/src/python-holidays/perf.py", line 66, in run
    self._run_inner()
  File "/Users/izzi/src/python-holidays/perf.py", line 81, in _run_inner
    country_holidays(vals[1])
  File "/Users/izzi/src/python-holidays/holidays/utils.py", line 188, in country_holidays
    return getattr(holidays, country)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 207, in __call__
    cls = self.get_entity()
          ^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 225, in get_entity
    self.entity = getattr(importlib.import_module(self.module_name), self.entity_name)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/Users/izzi/src/python-holidays/holidays/countries/isle_of_man.py", line 16, in <module>
    from holidays.countries.united_kingdom import UnitedKingdom, UnitedKingdomStaticHolidays
  File "<frozen importlib._bootstrap>", line 1357, in _find_and_load
  File "<frozen importlib._bootstrap>", line 418, in __enter__
  File "<frozen importlib._bootstrap>", line 346, in acquire
_frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('holidays.countries.united_kingdom') at 4397028400

Alarmingly I see two other bugs that seem to be race conditions due to the dynamic import:

Traceback (most recent call last):
  File "/Users/izzi/src/python-holidays/perf.py", line 125, in main
    t.join(30.0)
  File "/Users/izzi/src/python-holidays/perf.py", line 86, in join
    raise self.error
  File "/Users/izzi/src/python-holidays/perf.py", line 66, in run
    self._run_inner()
  File "/Users/izzi/src/python-holidays/perf.py", line 81, in _run_inner
    country_holidays(vals[1])
  File "/Users/izzi/src/python-holidays/holidays/utils.py", line 188, in country_holidays
    return getattr(holidays, country)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 207, in __call__
    cls = self.get_entity()
          ^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 225, in get_entity
    self.entity = getattr(importlib.import_module(self.module_name), self.entity_name)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/Users/izzi/src/python-holidays/holidays/countries/__init__.py", line 80, in <module>
    from .isle_of_man import IsleOfMan, IM, IMN
ImportError: cannot import name 'IsleOfMan' from partially initialized module 'holidays.countries.isle_of_man' (most likely due to a circular import) (/Users/izzi/src/python-holidays/holidays/countries/isle_of_man.py)

And:

Traceback (most recent call last):
  File "/Users/izzi/src/python-holidays/perf.py", line 125, in main
    t.join(30.0)
  File "/Users/izzi/src/python-holidays/perf.py", line 86, in join
    raise self.error
  File "/Users/izzi/src/python-holidays/perf.py", line 66, in run
    self._run_inner()
  File "/Users/izzi/src/python-holidays/perf.py", line 81, in _run_inner
    country_holidays(vals[1])
  File "/Users/izzi/src/python-holidays/holidays/utils.py", line 188, in country_holidays
    return getattr(holidays, country)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 207, in __call__
    cls = self.get_entity()
          ^^^^^^^^^^^^^^^^^
  File "/Users/izzi/src/python-holidays/holidays/registry.py", line 225, in get_entity
    self.entity = getattr(importlib.import_module(self.module_name), self.entity_name)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1337, in _find_and_load_unlocked
KeyError: 'holidays.countries'

These errors seem to happen more frequently on the countries that import other countries (Isle of Man, American Samoa, etc.). I'm not surprised to see these errors, as while one thread is importing one module, another may be trying to access the partially initialized module and end up with either this KeyError or an ImportError due to using directly a partially initialized module.

We've also been able to confirm that this PR is working for us in our app when running locally, and doesn't degrade performance to any measurable extent.

Here's the code for the minimal-ish reproduction. I've done some extra tuning with conditions to try to get the reproduction to be more reliable, but of course it's inherently probabilistic. It's kind of messy, but it was super helpful for me to confirm the theory of our bug, and test this fix.

#!/usr/bin/env python

from typing import Optional
from types import TracebackType, FrameType
import time
import sys
import traceback
from threading import Thread, Condition, Lock
import importlib
import random


def bench():
    from holidays.registry import EntityLoader

    iters = 10_000
    start = time.monotonic_ns()
    for _ in range(iters):
        # Raw entity loader performance, but because the `HolidayBase` get cached in `holidays`
        # module, this ultimately should get called only once per country, pre-process, unless this
        # API is used directly.
        EntityLoader("holidays.countries.australia").get_entity()

    end = time.monotonic_ns()
    avg_duration = (end - start) / iters

    print(f"Average duration {round(avg_duration, 2):.2f} ns")


class Counter:
    cond: Condition
    count: int

    def __init__(self):
        self.cond = Condition(Lock())
        self.count = 0

    def inc(self):
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait(self, expected_count: int):
        while True:
            with self.cond:
                if self.count >= expected_count:
                    return

                self.cond.wait()


class Loader(Thread):
    counter: Counter
    cond: Condition
    error: Optional[Exception]

    def __init__(self, counter: Counter, cond: Condition, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.counter = counter
        self.cond = cond
        self.error = None

    def run(self):
        try:
            self._run_inner()
        except Exception as error:
            self.error = error

    def _run_inner(self):
        # Reload the things we need to test for a deadlock
        from holidays.utils import country_holidays
        from holidays.registry import COUNTRIES

        # Sync all threads to get them to execute the entity loader at the same time
        with self.cond:
            self.counter.inc()
            self.cond.wait()

        for vals in random.choices(list(COUNTRIES.values()), k=100):
            country_holidays(vals[1])

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        if not self.is_alive() and self.error is not None:
            raise self.error


# Fake exception class used to format the traceback for a "true" deadlock condition.
class FakeException(BaseException):
    pass


def main():
    bench()

    print("Reproducing deadlock / race conditions (this will take awhile) ...")
    reprod_attempts = 1_000
    reprod_threads = 100

    error_types_seen = set()

    for _ in range(reprod_attempts):
        # Clear the cached modules between each reproduction attempt
        for k in list(sys.modules.keys()):
            if k.split(".")[0] == "holidays":
                del sys.modules[k]

        importlib.invalidate_caches()

        counter = Counter()
        cond = Condition()
        ts: list[Thread] = []
        for _ in range(reprod_threads):
            t = Loader(counter, cond)
            t.start()
            ts.append(t)

        counter.wait(reprod_threads)
        with cond:
            cond.notify_all()

        for t in ts:
            try:
                t.join(30.0)
            except Exception as error:
                error_type = type(error)
                if error_type not in error_types_seen:
                    traceback.print_exception(error)
                    error_types_seen |= {error_type}

            if t.is_alive():
                frame: FrameType = sys._current_frames()[t.ident]
                tb = None
                while frame is not None:
                    tb = TracebackType(
                        tb, frame, frame.f_lasti, frame.f_lineno
                    )
                    frame = frame.f_back

                error = FakeException().with_traceback(tb)
                raise TimeoutError("Deadlock occurred") from error


if __name__ == "__main__":
    main()

Based on the above reproduction code, it seems that this PR fixes all 3 bugs caused by racy-ness in dynamic imports when using multithreading.

@Izzette Izzette marked this pull request as ready for review May 16, 2024 13:40
@coveralls
Copy link

coveralls commented May 16, 2024

Pull Request Test Coverage Report for Build 9187711103

Details

  • 4 of 4 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 9164625446: 0.0%
Covered Lines: 11274
Relevant Lines: 11274

💛 - Coveralls

@Izzette
Copy link
Contributor Author

Izzette commented May 16, 2024

I've updated the code to use a re-entrant lock instead of a thread pool. It has one main advantage over the thread pool strategy: it's re-entry safe. With a thread pool, if the code were ever to be updated so that importing a module using the lazy loader could lazy load another module (I'm not sure why we'd do this), it could deadlock depending on import order when waiting on the future for the work to be executed on the same thread. I don't think this will add any other deadlock potential, and it still passes the minimal reproduction check I shared. For bonus points it reduces the already negligible performance impact of the thread pool strategy to undetectably negilgable (I got Average duration 4621.12 ns with RLock) and it doesn't spawn a new thread, consuming RLIMIT_NPROC limit or the kernel task limit. I think the gopher in me just go carried away ...

@arkid15r
Copy link
Collaborator

Hi @Izzette,

Thanks for going above and beyond in order to identify and fix the issue. I have to admit this isn't a kind of PRs we normally receive. I'm going to look into it this weekend and do my best to squeeze the changes into our May 20th release.

Again, I really appreciate the detailed description.

@Izzette Izzette changed the title Solves #1514 by synchronizing dynamic imports in ThreadPoolExecutor Solves #1514 by synchronizing dynamic imports May 17, 2024
Copy link
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks stable, let's see how it goes in the wild.
I had just a minor addition to the PR -- a comment describing RLock.

@Izzette sorry I couldn't process this earlier and include into v0.49

holidays/registry.py Show resolved Hide resolved
@arkid15r arkid15r changed the title Solves #1514 by synchronizing dynamic imports Synchronize entity dynamic imports May 21, 2024
@Izzette Izzette force-pushed the 1514-synchronize-dynamic-import-to-avoid-deadlocks branch from 231ebb1 to b747c69 Compare May 22, 2024 08:11
@Izzette Izzette requested a review from arkid15r May 22, 2024 08:12
Copy link

sonarcloud bot commented May 22, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@KJhellico @PPsyrius any comments/concerns on this?

@KJhellico
Copy link
Collaborator

LGTM.

@PPsyrius
Copy link
Contributor

LGTM

@arkid15r
Copy link
Collaborator

Alright, merging it then.

@Izzette thanks for your contribution!

@arkid15r arkid15r enabled auto-merge May 22, 2024 18:59
@arkid15r arkid15r added this pull request to the merge queue May 22, 2024
Merged via the queue into vacanza:dev with commit 94371ef May 22, 2024
29 checks passed
@KJhellico KJhellico mentioned this pull request Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants