
Fix race condition in AsyncResult.wait #492

Closed
wants to merge 2 commits

Conversation

notEvil

@notEvil notEvil commented May 15, 2022

Fixes #354 (and maybe more)

  • The receive lock is extended in both directions to cover AsyncResult.wait (see the sketch below)
  • data handling is split so the lock can be released as soon as possible

I can consistently reproduce issue 354 and provide an example if requested.
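
To illustrate the pattern behind the two points above, here is a minimal self-contained sketch (plain Python stand-ins such as recv_lock, incoming and results, not the actual rpyc diff): the readiness check and the receive happen under the same lock so no other thread can grab the reply in between, and the received data is handled only after the lock has been released.

import queue
import threading

recv_lock = threading.Lock()   # stand-in for the connection's receive lock
incoming = queue.Queue()       # stand-in for the underlying channel
results = {}                   # seq -> reply, filled by whichever thread receives

def wait_for(seq):
    # keep the readiness check and the receive under the same lock, so another
    # thread cannot pick up "our" reply between the check and the recv call
    while seq not in results:
        with recv_lock:
            if seq in results:         # re-check once the lock is held
                break
            data = incoming.get()      # "receive" while holding the lock
        results[data[0]] = data[1]     # handle the data after releasing the lock

incoming.put((1, "reply"))
wait_for(1)
print(results[1])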

@comrumino
Collaborator

I think this PR undoes the fix for deadlocks: Issue #449 related to merge request #455

@comrumino comrumino self-assigned this May 17, 2022
@comrumino comrumino added the Triage Investigation by a maintainer has started label May 17, 2022
@notEvil
Author

notEvil commented May 17, 2022

True. However, #463 removed the .wait() that was the core of #455, so the event was effectively equivalent to a plain bool. #455, on the other hand, only solved the race condition halfway and could even lock the thread indefinitely: serve returning True doesn't mean the result is ready, so the thread shouldn't simply wait for the result but keep serving as long as it isn't there (the reason for #463). That said, I strongly believe this PR covers their concerns.
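
To make the serve-versus-wait point concrete, here is a tiny self-contained illustration (FakeConnection and the sequence numbers are made-up stand-ins, not rpyc's real API): one successful serve round may well have handled an unrelated message, so the caller has to keep serving until its own reply has actually arrived.

import queue

class FakeConnection:
    """Toy stand-in that dispatches exactly one message per serve() call."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.results = {}

    def serve(self):
        seq, value = self.inbox.get()   # handle one incoming message
        self.results[seq] = value
        return True                     # True only means *something* was processed

conn = FakeConnection()
conn.inbox.put((7, "reply to someone else's request"))
conn.inbox.put((1, "the reply we are waiting for"))

# Wrong: treat a single True from serve() as "our result is ready".
# Right: keep serving until our own sequence number shows up.
while 1 not in conn.results:
    conn.serve()
print(conn.results[1])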

@comrumino
Collaborator

I reviewed your changes, and they weren't quite there: the rlock count was not zero when expected. I pushed a fix to master that I arrived at after reviewing your code. Let me know your thoughts.

@notEvil
Author

notEvil commented May 18, 2022

Thanks, I checked the commit and think it doesn't cover the issue entirely. The race is still there when a client sends an async request and waits for it later on. Also, the core change in AsyncResult introduces a busy loop (234f253#r73909244).
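
For reference, the busy-loop concern boils down to the difference below (a generic threading example, not the exact code in the commit): polling a flag in a tight loop burns CPU, whereas waiting on an Event (or Condition) blocks until the receiving thread actually signals.

import threading
import time

ready = threading.Event()

def busy_wait():                # busy polling: burns a core until the flag flips
    while not ready.is_set():
        pass

def blocking_wait():            # blocks cheaply until the receiver signals
    ready.wait()

threading.Thread(target=lambda: (time.sleep(0.1), ready.set())).start()
blocking_wait()                 # returns shortly after set(), without spinning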

@comrumino
Collaborator

> I can consistently reproduce issue 354 and provide an example if requested.

Could you add the example as a comment here or in #354?

@notEvil
Author

notEvil commented May 19, 2022

Run the following with two arguments. The first, set to "True", enables SSL, which is vital (I couldn't reproduce without it). The second, set to "True", adds a delay that triggers the condition every single time on my setup.

"""
arguments: use_ssl wait
use_ssl == 'True' or != 'True'
wait    == 'True' or != 'True'
"""
KEY = "./localhost.key"
CERT = "./localhost.pem"
VERIFY_CLIENT = (
    False  # False for master because arguments are incorrectly passed to .wrap_socket
)

import rpyc
import rpyc.utils.authenticators as ru_authenticators
import logging
import signal
import ssl
import subprocess
import sys
import time


PORT = 18812

logging.basicConfig(
    level="DEBUG",
    format="{asctime} | {levelname:8} | {message} ({name}, {threadName}, {process})",
    style="{",
)


class Service(rpyc.Service):
    pass


if len(sys.argv) == 3:  # start server and client subprocesses
    server_process = subprocess.Popen(
        [sys.executable, __file__, "False"] + sys.argv[1:]
    )
    logging.info("waiting for server to start")
    time.sleep(1)
    client_process = subprocess.Popen([sys.executable, __file__, "True"] + sys.argv[1:])

    client_process.wait()
    server_process.send_signal(signal.SIGINT)
    server_process.wait()

else:  # launched with three arguments (as_client use_ssl wait), as done for the subprocesses above
    _, as_client, use_ssl, wait = sys.argv
    as_client = as_client == "True"
    use_ssl = use_ssl == "True"
    wait = wait == "True"

    if as_client:
        connection = (
            rpyc.ssl_connect(
                "localhost",
                PORT,
                keyfile=KEY,
                certfile=CERT,
                ca_certs=CERT if VERIFY_CLIENT else None,
                cert_reqs=ssl.CERT_REQUIRED if VERIFY_CLIENT else None,
            )
            if use_ssl
            else rpyc.connect("localhost", PORT)
        )
        rpyc.BgServingThread(connection)  # a second thread receiving on the same connection
        if wait:
            logging.info("wait")
            time.sleep(1)
        logging.debug("get root")
        connection.root  # sync request; this is where the client hangs when the race hits

    else:
        authenticator = ru_authenticators.SSLAuthenticator(
            keyfile=KEY,
            certfile=CERT,
            ca_certs=CERT if VERIFY_CLIENT else None,
            cert_reqs=ssl.CERT_REQUIRED if VERIFY_CLIENT else None,
        )
        server = rpyc.ThreadedServer(
            Service(), port=PORT, authenticator=authenticator if use_ssl else None
        )
        try:
            server.start()

        except KeyboardInterrupt:
            pass
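
For anyone trying this: assuming the script is saved as, say, repro.py (the filename is arbitrary) and run from a directory containing localhost.key and localhost.pem, the failing case is

python repro.py True True

i.e. SSL enabled and the extra delay enabled; on my setup this makes the client hang on connection.root every time.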

@comrumino
Collaborator

The functional differences have been committed to the master branch. Since this PR does not address the underlying issue of RPyC not respecting threading context/frames in the remote address space, I am closing the PR. Feel free to make threading a first-class concept in a new PR. Thanks for all your time and effort!

@comrumino comrumino closed this Jun 15, 2022
@notEvil
Author

notEvil commented Jun 16, 2022

Did you try the script I posted? It still waits indefinitely on connection.root. Sure, it wouldn't if there weren't another thread receiving (e.g. without rpyc.BgServingThread(connection)), and a timeout would ease the pain, but essentially multithreading (first class or not) is still broken by this race.

I understand that my fix lacks simplicity/readability and that extending the lock is not without cost, but I still think there is no way around it (except changing the approach entirely, as in notEvil@46b49e3).

comrumino added a commit that referenced this pull request Jul 2, 2022
…t significantly more likely that the sync request response goes to the correct thread, calling acquire/release here seems to cause a dead lock in some cases like that provided in #492; ultimately RPyC requires the design around threading and locks to be fleshed out... quite a number of users get confused w/ RPyC threading... RPyC offers a number of functions/classes that are temperamental w/ threading
comrumino added a commit that referenced this pull request Jul 2, 2022
…e fixing perilous corner cases like a thread never calling notify_all #492
@comrumino
Collaborator

@notEvil, I was able to reproduce the issue with your script. Over the last eight commits, which test case passed and which failed kept toggling.

Here are my findings over the last eight commits:

I greatly appreciate your time and energy!

@notEvil
Author

notEvil commented Jul 2, 2022

Great to hear that, and thank you for the effort!

Successfully merging this pull request may close these issues.

Race condition when sending messages from another thread