Prevent race condition in guarded_import #124

viktordick · 2022-03-26T14:43:37Z

No description provided.

viktordick · 2022-03-26T15:10:24Z

I accidentally included an rst syntax error in CHANGES.rst, which is why the build failed - but I am unable to push the fix until all checks are finished...
But the change itself could already be reviewed.

dataflake · 2022-03-26T15:19:33Z

I stopped the tests, pushed the fix and now they are running again.

viktordick · 2022-03-26T15:20:06Z

Thanks!

dataflake · 2022-03-26T15:24:40Z

Does your test actually provoke the problem? It does not in my testing. When I remove the fix the test still succeeds.

viktordick · 2022-03-26T15:33:09Z

Hm it does on my system (ArchLinux, 8 cores, Python 3.10.3)

> tox -e py3
py3 develop-inst-noop: /home/viktor/git/AccessControl
py3 installed: -e git+ssh://git@github.com/zopefoundation/AccessControl@b2d26111e746d97731536b4633a6ef2ab4b9fec2#egg=AccessControl,Acquisition==4.10,AuthEncoding==4.3,BTrees==4.10.0,cffi==1.15.0,DateTime==4.4,ExtensionClass==4.6,multipart==0.2.4,Persistence==3.3,persistent==4.9.0,pycparser==2.21,python-gettext==4.0,pytz==2022.1,RestrictedPython==5.2,six==1.16.0,transaction==3.0.1,zExceptions==4.2,zope.browser==2.3,zope.component==5.0.1,zope.configuration==4.4.0,zope.contenttype==4.5.0,zope.deferredimport==4.4,zope.deprecation==4.4.0,zope.event==4.5.0,zope.exceptions==4.5,zope.hookable==5.1.0,zope.i18n==4.9.0,zope.i18nmessageid==5.0.1,zope.interface==5.4.0,zope.location==4.2,zope.proxy==4.5.0,zope.publisher==6.1.0,zope.schema==6.2.0,zope.security==5.2,zope.testing==4.10,zope.testrunner==5.4.0
py3 run-test-pre: PYTHONHASHSEED='3487369832'
py3 run-test: commands[0] | zope-testrunner --test-path=src -vc
Running tests at level 1
Running zope.testrunner.layer.UnitTests tests:
  Set up zope.testrunner.layer.UnitTests in 0.000 seconds.
  Running:
.......Exception in thread Thread-2 (threaded_run):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/viktor/git/AccessControl/src/AccessControl/tests/testModuleSecurity.py", line 82, in threaded_run
    self.assertAuth('AccessControl.tests.public_module', ())
  File "/home/viktor/git/AccessControl/src/AccessControl/tests/testModuleSecurity.py", line 48, in assertAuth
    guarded_import(module, fromlist=fromlist, level=level)
  File "/home/viktor/git/AccessControl/src/AccessControl/ZopeGuards.py", line 423, in guarded_import
    module = load_module(None, None, mnameparts, validate, globals, locals)
  File "/home/viktor/git/AccessControl/src/AccessControl/ZopeGuards.py", line 484, in load_module
    nextmodule = secureModule(mname, globals, locals)
  File "/home/viktor/git/AccessControl/src/AccessControl/SecurityInfo.py", line 273, in secureModule
    del _moduleSecurity[mname]
KeyError: 'AccessControl.tests.public_module'


Failure in test testPublicModuleThreaded (AccessControl.tests.testModuleSecurity.ModuleSecurityTests)
Traceback (most recent call last):
  File "/usr/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/home/viktor/git/AccessControl/src/AccessControl/tests/testModuleSecurity.py", line 92, in testPublicModuleThreaded
    self.assertEqual(len(finished), 2)
  File "/usr/lib/python3.10/unittest/case.py", line 845, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib/python3.10/unittest/case.py", line 838, in _baseAssertEqual
    raise self.failureException(msg)
AssertionError: 1 != 2

............................................................................................................................................................................................................................................................................................
  Ran 291 tests with 1 failures, 0 errors, 21 skipped in 0.375 seconds.
Tearing down left over layers:
  Tear down zope.testrunner.layer.UnitTests in 0.000 seconds.

Tests with failures:
   testPublicModuleThreaded (AccessControl.tests.testModuleSecurity.ModuleSecurityTests)
ERROR: InvocationError for command /home/viktor/git/AccessControl/.tox/py3/bin/zope-testrunner --test-path=src -vc (exited with code 1)
______________________________________________________ summary ______________________________________________________
ERROR:   py3: commands failed

Not sure how to provoke a race condition consistently. I was actually surprised that it worked on the first try here.

dataflake · 2022-03-26T15:44:31Z

I tried it a few more times and managed to see the error. It seems to be a matter of luck with machine speed etc.

viktordick · 2022-03-26T15:51:20Z

Searched a bit and found this, which might provide a framework for somewhat more reliably provoking a race condition in a testing framework.

But if I understand it correctly, unless I introduce additional control flow into the target function secureModule just for this test, the best I can hope for is to start all threads, have each of them wait for the others right before calling the function, and then release them all at once. I guess this will provoke the problem somewhat more reliably, but still not completely consistently.
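The "start everything, then release together" idea can also be sketched with `threading.Barrier` from the standard library. This is only an illustration: the helper name `run_simultaneously` is hypothetical and not part of AccessControl or its test suite, and like any such approach it only raises the odds of hitting the race without guaranteeing it.

```python
import threading


def run_simultaneously(target, num_threads=2):
    """Run ``target`` in ``num_threads`` threads released by a shared barrier.

    Hypothetical helper. Returns the list of exceptions raised inside the
    threads (empty if every call succeeded).
    """
    barrier = threading.Barrier(num_threads)
    errors = []
    errors_lock = threading.Lock()

    def worker():
        # Block until all threads have arrived, then proceed together.
        barrier.wait()
        try:
            target()
        except Exception as exc:
            # Collected so the caller can assert on it in the main thread.
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

In a test, the callable passed as `target` would wrap the import under test, and the test would assert that the returned list is empty. Collecting exceptions this way also keeps a failure inside a thread from only showing up as a shorter `finished` list.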

viktordick · 2022-03-26T16:03:18Z

Thanks for the approval, but could you check if the following test more reliably provokes the problem on your machine?

    def testPublicModuleThreaded(self):
        """
        Import the same module from two threads simultaneously, checking that
        this does not result in a race condition.
        """
        import threading
        lock = threading.Lock()
        num_threads = 2
        all_threads_started = threading.Event()
        threads_may_continue = threading.Event()
        started = []
        finished = []

        def threaded_run():
            with lock:
                started.append(True)
                if len(started) == num_threads:
                    all_threads_started.set()
            threads_may_continue.wait()

            self.assertAuth('AccessControl.tests.public_module', ())
            finished.append(True)

        threads = [
            threading.Thread(target=threaded_run)
            for _ in range(num_threads)
        ]
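        for t in threads:
            t.start()
        # Wait until every thread has reached the waiting point, then release
        # them together to maximize the chance of concurrent imports.
        all_threads_started.wait()
        threads_may_continue.set()
        for t in threads:
            t.join()

        self.assertEqual(len(finished), num_threads)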

dataflake · 2022-03-26T16:08:30Z

The new code fails maybe 1 out of 4 times in the sandbox here (Python 3.7 on macOS 12.3, MacBook Pro Apple M1 Max, 64GB RAM)

dataflake · 2022-03-26T16:10:19Z

The old test fails more frequently, maybe 1 in 3 times

dataflake · 2022-03-26T16:34:58Z

@viktordick you're free to merge

d-maurer · 2022-03-26T18:15:10Z

Sorry! I just realized that a `try ... except KeyError: pass` is not sufficient. We must also move it after the `_appliedModuleSecurity[mname] = modsec`. Otherwise, a concurrent thread could run the code after the `del` (and therefore see `modsec = None`) but before `_applied...` is updated; `secureModule` will (wrongly) return `None` (rather than the module) in this case.

Let's take an abstract look at the code and why the changed code is correct. We have two mappings: "to_do" (module -> sec_info) (named `_moduleSecurity`), with module security (likely) still to be applied, and "done" (module -> sec_info) (named `_appliedModuleSecurity`), with module security already applied. The fixed code (essentially) looks like:

```
modsec = to_do.get(module)
if modsec is None:
    # either no security declarations known or already applied
    return module if module in done else None
else:
    # security declarations known; probably not yet applied
    ... apply modsec to module ...
    done[module] = modsec
    # it is now safe to remove from `to_do`
    try: del to_do[module]
    except KeyError: pass
    return module
```

Why is this (almost) safe? If we do not have security information for *module*, it will never be in `to_do` and therefore never be in `done`. The module will get rejected. If we have security information for *module*, it will initially be in `to_do`. If the module is actually used, then it will get added to `done` and only then removed from `to_do`. Therefore, it will in this case always be either in `to_do` or in `done`, and so it will be allowed. This is safe if applying *modsec* to a module is idempotent (which is very likely true).

Why is it only almost safe? The reasoning above assumes that `secureModule` is the only place related to module security prone to potential race conditions. Likely, this is not the case: new modules are added to `to_do` dynamically (when `allow_module` is executed). It is possible that those additions depend on the executed requests. Therefore, it is possible that the allowedness of a module depends on the request order. This can be avoided if all `allow_module` calls are executed during startup.
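For readers who want to run the ordering argument, here is a minimal self-contained sketch. It is not the AccessControl implementation: `to_do`, `done`, `apply_security`, `secure_module_sketch`, and the use of plain dicts to represent modules are all assumptions made for illustration.

```python
# Hypothetical stand-ins for _moduleSecurity and _appliedModuleSecurity.
to_do = {}
done = {}


def apply_security(module, modsec):
    """Placeholder for applying declarations; assumed to be idempotent."""
    module["security"] = modsec


def secure_module_sketch(name, modules):
    """Return the module for ``name`` if it is allowed, else None."""
    modsec = to_do.get(name)
    if modsec is None:
        # Either no declarations are known or they were already applied.
        return modules[name] if name in done else None
    module = modules[name]
    apply_security(module, modsec)
    # Publish to `done` first, so the name is always in at least one mapping.
    done[name] = modsec
    # pop() with a default tolerates a concurrent thread having removed it.
    to_do.pop(name, None)
    return module
```

Using `dict.pop(name, None)` is one way to express the `try: del ... except KeyError: pass` step. The important part is the order: the entry is added to `done` before it is removed from `to_do`.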

viktordick · 2022-03-27T11:49:24Z

OK, if I understand correctly, the info about a module being allowed to be imported is added to _moduleSecurity if allow_module is executed, and once a module is actually used (import ...), this info is removed from _moduleSecurity and the actually imported module is added to _appliedModuleSecurity, correct?

I guess a clean way without race conditions would be to keep the entry in _moduleSecurity and simply add an actually imported module to _appliedModuleSecurity. But this would require a larger rewrite.

The next best fix would be to change the order - first add to _appliedModuleSecurity, then remove from _moduleSecurity - which is your suggestion, correct?

Regarding the "almost safe":
If allow_module is only executed in thread A and not directly at startup before threads are spawned, I would not expect the import to reliably work in another thread B that did not itself execute allow_module and I guess it is unreasonable to expect it. One might expect the constellation where both threads execute allow_module, followed by import to work, but even this might not be a realistic scenario - all examples I know of execute allow_module at startup, before the worker threads are started. Securing this more common scenario should probably still be a win for now.

viktordick · 2022-03-27T11:59:09Z

Somehow I am unable to push another commit to this PR:

> git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 618 bytes | 618.00 KiB/s, done.
Total 7 (delta 5), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
remote: error: GH006: Protected branch update failed for refs/heads/123-prevent-race-condition.
remote: error: 4 of 4 required status checks are expected.
To github.com:zopefoundation/AccessControl
 ! [remote rejected] 123-prevent-race-condition -> 123-prevent-race-condition (protected branch hook declined)
error: failed to push some refs to 'github.com:zopefoundation/AccessControl'

Maybe this is because the PR is already approved? I will try to push into a separate branch and maybe start a new PR.

d-maurer · 2022-03-27T12:16:25Z

Viktor Dick wrote at 2022-3-27 04:49 -0700:

> OK, if I understand correctly, the info about a module being allowed to be imported is added to `_moduleSecurity` if `allow_module` is executed, and once a module is actually used (`import ...`), this info is removed from `_moduleSecurity` and the actually imported module is added to `_appliedModuleSecurity`, correct?

Yes.

> I guess a clean way without race conditions would be to keep the entry in `_moduleSecurity` and simply add an actually imported module to `_appliedModuleSecurity`. But this would require a larger rewrite.

You can safely delete *module* from `_moduleSecurity` (with `try: del ... except KeyError: pass`) provided that you do this *after* the addition to `_applied...`. Remaining race conditions do not come from `secureModule` (alone) but from concurrent additions to `_moduleSecurity` (by `allow_module`) and `secureModule` calls. Not changing `_moduleSecurity` in `secureModule` will not help in this case.

> The next best fix would be to change the order - first add to `_appliedModuleSecurity`, then remove from `_moduleSecurity` - which is your suggestion, correct?

Yes.

viktordick · 2022-03-27T13:48:14Z

Closing this as the discussed change has been implemented in #125

Prevent race condition in guarded_import

b2d2611

viktordick linked an issue Mar 26, 2022 that may be closed by this pull request

Race condition in secureModule #123

Closed

viktordick enabled auto-merge (squash) March 26, 2022 14:55

viktordick assigned dataflake Mar 26, 2022

viktordick requested a review from dataflake March 26, 2022 15:14

- fix ReST

98ad1a3

dataflake approved these changes Mar 26, 2022

viktordick disabled auto-merge March 26, 2022 16:04

viktordick mentioned this pull request Mar 27, 2022

Prevent race condition in guarded_import #125

Merged

viktordick closed this Mar 27, 2022

viktordick deleted the 123-prevent-race-condition branch March 27, 2022 13:48
