multiprocessing.get_logger() logger deadlock on first call by subprocess to logger.info("...") due to internal logger.debug(...) call by multiprocessing.Queue._start_thread #91555
I reproduced this on macOS with the latest 3.11: both the lockup, and success when setting a higher logging level.

See also:
minimal-mpq-repro.py:

import logging.handlers
import multiprocessing
queue = multiprocessing.Queue()
handler = logging.handlers.QueueHandler(queue)
logger = multiprocessing.get_logger()
logger.addHandler(handler)
logger.setLevel("DEBUG")
logger.debug("deadlock")

Queues have a lock which is held while an item is being put onto them; the feeder thread is started lazily, under that lock, on the first put, and starting it logs a debug message. If a handler attached to the same logger then tries to put that message onto the same queue, the thread deadlocks attempting to re-acquire a lock it already holds.
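The relevant code, paraphrased and trimmed from Lib/multiprocessing/queues.py (not an exact copy):

def put(self, obj, block=True, timeout=None):
    ...
    with self._notempty:              # condition backed by a plain, non-reentrant lock
        if self._thread is None:
            self._start_thread()      # lazily start the feeder thread
        self._buffer.append(obj)
        self._notempty.notify()

def _start_thread(self):
    debug('Queue._start_thread()')    # logs to the 'multiprocessing' logger
    ...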
We could make the lock recursive, however that would just turn the deadlock into a stack overflow.

At present, while a handler is processing a log record for a logger, it must not log anything to that same logger. Arguably this is fair enough: that seems like an inherently bad idea, and perhaps this issue (and #90321) should just be closed as a user error. In fact there is a warning about exactly this sort of possible deadlock in the documentation.

As an alternative, we could relatively easily eliminate this whole class of bug by temporarily disabling logging for a given logger while its handlers are being called. I'll push a PR for consideration shortly.

Either way, I don't think this is a multiprocessing bug per se.
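For illustration, a minimal sketch of that idea, using a thread-local flag around handler dispatch (the names are made up; the actual change would live inside the logging core rather than in a Logger subclass):

import logging
import threading

_tls = threading.local()

class GuardedLogger(logging.Logger):
    # Sketch only: drop, per thread, any record logged while this logger's
    # handlers are already running, instead of deadlocking or recursing.
    def handle(self, record):
        if getattr(_tls, "handling", False):
            return
        _tls.handling = True
        try:
            super().handle(record)
        finally:
            _tls.handling = False

Installed with logging.setLoggerClass(GuardedLogger) before any loggers are created, this drops the re-entrant record instead of blocking on the queue's lock.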
Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging) by disabling logging while the logger is handling log messages.
What value does that internal logging provide?
I can't comment on the value of that logging, but removing it would indeed fix this issue and #90321, at least in this case.

I've done a quick code inspection of the other standard logging handlers and I don't think any of them have the same issue, although some do fairly involved things like making HTTP requests. More widely, custom handlers calling into third-party code may run into this issue and either have latent bugs or need tricky work-arounds. E.g. see the Sentry logging integration code:

If folks think removing the logging from Queue._start_thread is the better approach, that would also resolve these issues.
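As a sketch of the kind of workaround a custom handler can use today, it can drop records emitted re-entrantly from its own emit(); this is illustrative only, not what the Sentry integration actually does:

import logging.handlers
import threading

class NonReentrantQueueHandler(logging.handlers.QueueHandler):
    # Drop any record emitted while this handler is already emitting one on the
    # same thread (e.g. the debug message logged by Queue._start_thread during
    # our own enqueue).
    _tls = threading.local()

    def emit(self, record):
        if getattr(self._tls, "emitting", False):
            return
        self._tls.emitting = True
        try:
            super().emit(record)
        finally:
            self._tls.emitting = False

Using this in place of the plain QueueHandler in the reproducer above avoids the deadlock on current releases.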
Well, I only had a quick look at the Sentry code, but it seems to just sidestep their own internal logging because it would cause recursion errors.
I would suggest opening a thread on discuss.python.org to get wider input.
OK, will do, thanks!
As it happens I was looking at a different bug in multiprocessing yesterday, and with this fresh in my mind I used the built-in logging: it was very useful. It is also a publicly documented part of the module's API. Removing it doesn't feel appropriate to me, but let's see what other people think.
If not completely removing it, then perhaps consider keeping the fix in multiprocessing itself?
I disagree that the fix belongs in multiprocessing: to my mind the problem is with a lower-level data structure being used by an unrelated module. Moreover, this is just one example of a whole class of potential bugs, which I think we can and should prevent entirely by making the logging framework more robust, as per the PR.
Sorry, which lower-level data structure do you mean, and which unrelated module? Sorry if I'm being dense.
Not at all! "Lower-level" was inaccurate and a poor choice of words on my part, sorry. Let me see if I can explain myself better.
So, I don't think we should treat it as a user error, but I don't think it is a multiprocessing bug either.

As another example, an external logging handler which sends log messages to an HTTP endpoint will probably be implemented using one of the existing, mature external HTTP modules, and those modules do their own internal logging.

In my opinion, the best and most practical way forward is to deal with this in the logging core itself. My preference is to just drop any such recursive logging (this potentially loses useful log messages). Another option would be to store and send them separately, later (this doesn't work if sending a log message produces another log message, ad infinitum).
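For illustration, a contrived sketch of that failure mode with no multiprocessing involved at all (the class and logger names are invented; no real HTTP library is used):

import logging

class ChattyTransport:
    # Stand-in for a third-party client library that logs its own activity.
    log = logging.getLogger("chatty.transport")

    def send(self, payload):
        self.log.debug("sending %d bytes", len(payload))

class HttpishHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.transport = ChattyTransport()

    def emit(self, record):
        # With this handler attached to the root logger at DEBUG, the transport's
        # debug() call above re-enters emit(); on versions without a re-entrancy
        # guard the recursion never terminates (RecursionError).
        self.transport.send(self.format(record).encode())

logging.basicConfig(level=logging.DEBUG, handlers=[HttpishHandler()])
logging.getLogger("app").debug("boom")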
With these changes:

diff --git a/Lib/multiprocessing/queues.py b/Lib/multiprocessing/queues.py
index 925f0439000..73f1535581b 100644
--- a/Lib/multiprocessing/queues.py
+++ b/Lib/multiprocessing/queues.py
@@ -70,7 +70,7 @@ def _reset(self, after_fork=False):
if after_fork:
self._notempty._at_fork_reinit()
else:
- self._notempty = threading.Condition(threading.Lock())
+ self._notempty = threading.Condition(threading.RLock())
self._buffer = collections.deque()
self._thread = None
self._jointhread = None
diff --git a/Lib/multiprocessing/util.py b/Lib/multiprocessing/util.py
index b7192042b9c..5d54a3bcb79 100644
--- a/Lib/multiprocessing/util.py
+++ b/Lib/multiprocessing/util.py
@@ -35,33 +35,35 @@
INFO = 20
SUBWARNING = 25
+META_LOGGER_NAME = 'multiprocessing.meta'
LOGGER_NAME = 'multiprocessing'
DEFAULT_LOGGING_FORMAT = '[%(levelname)s/%(processName)s] %(message)s'
_logger = None
+_meta_logger = None
_log_to_stderr = False
def sub_debug(msg, *args):
- if _logger:
- _logger.log(SUBDEBUG, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(SUBDEBUG, msg, *args, stacklevel=2)
def debug(msg, *args):
- if _logger:
- _logger.log(DEBUG, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(DEBUG, msg, *args, stacklevel=2)
def info(msg, *args):
- if _logger:
- _logger.log(INFO, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(INFO, msg, *args, stacklevel=2)
def sub_warning(msg, *args):
- if _logger:
- _logger.log(SUBWARNING, msg, *args, stacklevel=2)
+ if _meta_logger:
+ _meta_logger.log(SUBWARNING, msg, *args, stacklevel=2)
def get_logger():
'''
Returns logger used by multiprocessing
'''
- global _logger
+ global _logger, _meta_logger
import logging
with logging._lock:
@@ -70,6 +72,9 @@ def get_logger():
_logger = logging.getLogger(LOGGER_NAME)
_logger.propagate = 0
+ _meta_logger = logging.getLogger(META_LOGGER_NAME)
+ _meta_logger.propagate = 0
+
# XXX multiprocessing should cleanup before logging
if hasattr(atexit, 'unregister'):
atexit.unregister(_exit_function)
the hang or stack overflow does not occur in the example reproducers here and in #90321.
What problems do you see with implementing my suggested changes in multiprocessing?

Update: tweaked the diff above to add a missing line.
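For context, a sketch of what configuration would look like under that (hypothetical) diff: multiprocessing's internal messages would go to the new 'multiprocessing.meta' logger, so a program that wants to see them would have to configure that logger explicitly rather than the one returned by get_logger():

import logging

meta = logging.getLogger("multiprocessing.meta")   # name exists only in the diff above
meta.addHandler(logging.StreamHandler())
meta.setLevel(logging.DEBUG)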
Also, please add a link here to the discussion thread. Never mind, found it: https://discuss.python.org/t/dealing-with-unbounded-recursion-in-logging-handlers/86365
Once again, as I see it, there are a few ways we could resolve this.

If we decide the user is responsible for configuring logging correctly in order to avoid these issues (i.e. we treat this issue as a "user error"), we don't need to do anything, except maybe add a note to the docs.

As for the suggested multiprocessing changes: they will potentially break existing code that configures the current logger and wants those logs, and they don't prevent the bug, they just alter the configuration required to trigger it.
Then we can leave things as they are. Your proposed solution (disabling the logger while handlers are active) seems to potentially leave the logger disabled for a long period of time (indeterminate, depending on what other handlers are configured). Certain handlers, such as those that perform network I/O, can take a relatively long time to run.
Well, in this case, ultimately the reason for this problem seems to be that the Queue logs a debug message while lazily starting its feeder thread, with its internal lock still held.
But surely it prevents the bug in the simple configurations described in the example reproducers here and in #90321? As things are, they wouldn't get the logs anyway, because of the deadlock problem, right? A number of people who currently run into the bug would no longer do so, so couldn't that be considered an improvement? If the configuration required to trigger it is more complicated, then someone would have to go out of their way to trigger it, right? What would be the use case for such a configuration?

I'm not actually proposing to make the changes in the above diff - it's only to show that other approaches are possible and to spark discussion.
For sure we can. I think we can do better, but I could be wrong, and it isn't a big deal. Treating it as user error is a perfectly reasonable way to resolve the issue.
It disables logging on a per-thread basis and only while the thread runs the message handlers (+filters). Which is to say, it is disabled only while running code that would trigger this bug if it logged a message. Or at least, that is my intent: if I've overlooked anything or there is a bug, please let me know! It doesn't stop other threads from logging to the same logger and/or handlers.
If we cannot find an acceptable solution to prevent this in the code then I will indeed prepare a documentation patch. I remain optimistic we can prevent it, however :-)
The way I've been thinking of it is: if some code is sending logs to a handler, then that handler cannot safely log back to the same logger while doing so.
Understood, and thanks. I find the feedback very valuable, from you and others. For now, however, I still think suppressing the logging while running the log handling code is the best solution.
Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages.
…GH-133898) Co-authored-by: Duane Griffin <duaneg@dghda.com>
…GH-133899) Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages. (cherry picked from commit 2561e14) Co-authored-by: Duane Griffin <duaneg@dghda.com>
A Python bug (python/cpython#91555) was preventing logs from the embeddings maintainer process from printing. The bug is fixed in Python 3.14, but a viable workaround is to use the multiprocessing Manager, which better manages mp queues and causes the logging to work correctly.
It used to be that when a logger encountered an error trying to handle / emit a message, it would attempt to log about the error via the same logger. Long ago, this could lead to infinite recursion in Swift's LoggerFileObject, so we fixed it and added a test that included showing the bad stdlib behavior. Recently, stdlib did a similar fix so the recursion doesn't happen. See - python/cpython#91555 and - python/cpython#131812 Of course, now our test breaks (at least, on 3.13.4). Relax the assertion a little. Related-Change: Ia9ecffc88ce43616977e141498e5ee404f2c29c4 Change-Id: Id2f490d4204b8eaf07857cb84ed783bec19b8511
Python: v3.10.2
Platform: Windows 10
Description: A deadlock occurs when a subprocess uses a logging.Logger returned by multiprocessing.get_logger() and the logger's effective level at the time of the first call to logger.info is DEBUG or lower. That first call to logger.info then triggers an internal call to logger.debug(...) (at the top of multiprocessing.Queue._start_thread), which causes the same thread to attempt to re-acquire a lock it already holds.
Workaround: Set the logging level to INFO or higher (anything above DEBUG) to prevent the logging.debug() statement at the top of Queue._start_thread from attempting to log, and thereby re-lock the queue, while the initial put is still in progress.
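A minimal sketch of that workaround, based on the reproducer earlier in this thread (not the full SHOW_THE_DEADLOCK example referenced below):

import logging
import logging.handlers
import multiprocessing

queue = multiprocessing.Queue()
logger = multiprocessing.get_logger()
logger.addHandler(logging.handlers.QueueHandler(queue))
logger.setLevel(logging.INFO)   # anything above DEBUG filters out the internal debug() call
logger.info("no deadlock")      # Queue._start_thread's debug message is dropped before the handlers run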
The following example exhibits the issue when SHOW_THE_DEADLOCK==True. Set SHOW_THE_DEADLOCK==False to observe the workaround.
The following is an annotated stack from the above example when SHOW_THE_DEADLOCK==True.