Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiProcessQueue crashes on KeyboardInterrupt #72

Open
tsillus opened this issue Jan 24, 2023 · 5 comments
Open

MultiProcessQueue crashes on KeyboardInterrupt #72

tsillus opened this issue Jan 24, 2023 · 5 comments

Comments

@tsillus
Copy link

tsillus commented Jan 24, 2023

try:
# Unfortunately, the Queue object is not signal-safe,
# so a frequent wakeup is needed to check
# _checkChildren and _shutdownSignalled.
rcvd = self._myInputQ.get(True,
min(run_time_f().view(ct).remainingSeconds() or
QUEUE_CHECK_PERIOD,
QUEUE_CHECK_PERIOD))
except Q.Empty:
if not self._checkChildren and not self._shutdownSignalled:
# Probably a timeout, but let the while loop decide for sure
continue
rcvd = 'BuMP'
if rcvd == 'BuMP':

After a KeyboardInterrupt it's not possible to get Actors from the actor system or send messages to it, so the system can't be shut down properly any more.

Stacktrace

Process SpawnProcess-1:1:
Process ThespianAdmin:
Traceback (most recent call last):
  File "D:\python\python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "D:\python\python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "D:\~\thespian\system\multiprocCommon.py", line 210, in startAdmin
    admin.run()
  File "D:\~\thespian\system\admin\convention.py", line 758, in run
    delay.remaining())
  File "D:\~\thespian\system\transport\wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "D:\~\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
    lambda s=self: s.run_time)
  File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
    return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
  File "D:\~\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
    QUEUE_CHECK_PERIOD))
  File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
    if not self._poll(timeout):
  File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
KeyboardInterrupt

Possible solution:

add

except KeyboardInterrupt:
    continue

to ignore KeyboardInterrupts and keep the system running, so the thespian user can shut down the system properly. This change worked for me. Not sure if thespian would have to do any other stuff to do in this case or if there is a better place to catch/avoid the Exception.

@kquick
Copy link
Member

kquick commented Jan 24, 2023

For the multiprocQueueBase that you are using, I would prefer to leave process-global issues like the handling of KeyboardInterrupt up to the application, since this is a policy decision that could be addressed in multiple different ways depending on the application's preference.

For the typical case, I'd recommend a try/finally block in your application:

  asys = ActorSystem.ActorSystem(base='multiprocQueueBase', ...)
  try:
      [... main code here ...]
  finally:
    asys.shutdown()

I believe this should be sufficient to allow a normal Thespian shutdown on a keyboard interrupt.

If this is satisfactory, please go ahead and close this issue, but if I've misunderstood or there are additional concerns, please feel free to follow up here.

@tsillus
Copy link
Author

tsillus commented Jan 25, 2023

I 100% agree that the application should handle the KeyboardInterrupt. That's why I opened this issue.

I use thespian behind FastAPI, and my code for startup and shutdown looks something like this:

app = FastAPI()

@app.on_event('startup')
def on_startup():
    asys = ActorSystem(...)
    logger = asys.createActor(..., globalName='logger')
    asys.tell(logger, LoggerInitializeMessage(...))


@app.on_event('shutdown')
def on_shutdown():
    asys = ActorSystem()
    logger = asys.createActor(..., globalName='logger')  # never returns.
    asys.shutdown() # times out after 10 seconds. ActorExitRequest() never reach the Actors

A KeyboardInterrupt triggers FastAPIs shutdown event, but before on_shutdown() is executed, MultiProcessQueue crashes because of the very same KeyboardInterrupt. This has the consequence that I can't shut down the ActorSystem properly, because messages within the ActorSystem don't seem to reach their targets any more.

With the suggested fix on_shutdown() works just as I would expect it to.

@kquick
Copy link
Member

kquick commented Jan 26, 2023

Sorry, I'm not clear on "With the suggested fix ...". Did you mean the fix I suggested, or the fix you suggested in your original message, and if the latter, where was it that you inserted that fix?

Also, can you provide a thespian.log (see Section 13.3 in https://thespianpy.com/doc/using.html#hH-ce55494c-dd7a-4258-a1e8-b090c3bbb1e6) from a short run where you Ctrl-C immediately after startup but without processing any traffic?

@tsillus
Copy link
Author

tsillus commented Jan 27, 2023

I was referring to the fix I suggested.

Here is the thespian.log. I only snipped some file paths:

2023-01-27 10:26:15.873589 p27796 Warn Actor my_project.actors.LoggerAgent @ ActorAddr-Q.ThespianQ.b got signal: 2
2023-01-27 10:26:15.931655 p27796 ERR  Actor my_project.actors.LoggerAgent @ ActorAddr-Q.ThespianQ.b transport run exception: Traceback (most recent call last):
  File "D:\~\.venv37\lib\site-packages\thespian\system\actorManager.py", line 88, in run
    r = self.transport.run(self.handleMessages)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
    lambda s=self: s.run_time)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
    return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
    QUEUE_CHECK_PERIOD))
  File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
    if not self._poll(timeout):
  File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
InterruptedError: [Errno 4] Interrupted function call

2023-01-27 10:26:25.959071 p39380 ERR  No response to Admin shutdown request; Actor system not completely shutdown

@tsillus
Copy link
Author

tsillus commented Jan 27, 2023

Now with log level DEBUG:

2023-01-27 12:07:21.788259 p42676 dbg  ++++ Starting Admin from D:\~\.venv37\lib\site-packages\thespian\__init__.py
2023-01-27 12:07:21.789777 p42676 I    ++++ Admin started @ ActorAddr-Q.ThespianQ / gen (3, 10)
2023-01-27 12:07:22.674982 p42676 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.ThespianQ.a, <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x000002202D084948>)
2023-01-27 12:07:22.676050 p42676 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999896-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x000002202D0849C8>-quit_0:04:59.999871)
2023-01-27 12:07:22.679150 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999918-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x0000018DB00C2E08>-quit_0:04:59.999898)
2023-01-27 12:07:22.680112 p42676 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.~, <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x000002202D084D48>)
2023-01-27 12:07:22.680112 p42676 dbg  Attempting intent TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999978-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x000002202D0848C8>-quit_0:04:59.999960)
2023-01-27 12:07:22.681064 p42676 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999542-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x000002202D0848C8>-quit_0:04:59.999529)
2023-01-27 12:07:22.682124 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999921-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#1_of_Noneis"process_logger"-quit_0:04:59.999901)
2023-01-27 12:07:22.683116 p42676 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.~1, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#1_of_Noneis"process_logger")
2023-01-27 12:07:22.683116 p42676 I    Pending Actor request received for my_project.actors.logger.actor.LoggerAgent reqs None from ActorAddr-Q.~1
2023-01-27 12:07:23.474723 p45660 I    Starting Actor my_project.actors.logger.actor.LoggerAgent at ActorAddr-Q.ThespianQ.b (parent ActorAddr-Q.ThespianQ, admin ActorAddr-Q.ThespianQ, srcHash None)
2023-01-27 12:07:23.475666 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999909-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x000001A71D7CE808>-quit_0:04:59.999885)
2023-01-27 12:07:23.476686 p42676 dbg  Attempting intent TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999977-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999959)
2023-01-27 12:07:23.477749 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999903-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 10, D:\~\my_project\actors\logger\actor.py, 55, "Logg...-quit_0:04:59.999887)
2023-01-27 12:07:23.477749 p42676 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999017-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.998997)
2023-01-27 12:07:23.478738 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.b-pending-ExpiresIn_0:04:59.999756-<class 'my_project.actors.logger.messages.ConfigMessage'>-ConfigMessage(config=<my_project.core.config.Config object at 0x0000018DADB5B508>)-quit_0:04:59.999742)
2023-01-27 12:07:23.479708 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999951-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#2_of_Noneis"process_logger"-quit_0:04:59.999935)
2023-01-27 12:07:23.480759 p42676 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.~2, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#2_of_Noneis"process_logger")
2023-01-27 12:07:23.481766 p42676 dbg  Attempting intent TransportIntent(ActorAddr-Q.~2-pending-ExpiresIn_0:04:59.999987-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 2) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999974)
2023-01-27 12:07:23.481766 p45660 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-Q.~, <class 'my_project.actors.logger.messages.ConfigMessage'> msg: ConfigMessage(config=<my_project.core.config.Config object at 0x000001A7220FC2C8>))
2023-01-27 12:07:23.482695 p42676 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~2-pending-ExpiresIn_0:04:59.999428-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 2) errCode None actual ActorAddr-Q.ThespianQ.b-quit_0:04:59.999418)
2023-01-27 12:07:23.482695 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999929-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 30, D:\~\my_project\actors\logger\actor.py, 152, "Log...-quit_0:04:59.999908)
2023-01-27 12:07:23.483696 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999963-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 10, D:\~\my_project\actors\logger\actor.py, 62, "Logg...-quit_0:04:59.999952)
2023-01-27 12:07:23.483696 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.b-pending-ExpiresIn_0:04:59.999753-<class 'my_project.actors.logger.messages.ConfigMessage'>-ConfigMessage(config=<my_project.core.config.Config object at 0x0000018DADB5B508>)-quit_0:04:59.999734)
2023-01-27 12:07:23.485704 p45660 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-Q.~, <class 'my_project.actors.logger.messages.ConfigMessage'> msg: ConfigMessage(config=<my_project.core.config.Config object at 0x000001A7220FC288>))
2023-01-27 12:07:23.485704 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999951-<class 'logging.LogRecord'>-<LogRecord: my_project.actors.logger.actor, 30, D:\~\my_project\actors\logger\actor.py, 152, "Log...-quit_0:04:59.999938)
2023-01-27 12:07:28.832438 p45660 Warn Actor my_project.actors.logger.actor.LoggerAgent @ ActorAddr-Q.ThespianQ.b got signal: 2
2023-01-27 12:07:28.897457 p45660 ERR  Actor my_project.actors.logger.actor.LoggerAgent @ ActorAddr-Q.ThespianQ.b transport run exception: Traceback (most recent call last):
  File "D:\~\.venv37\lib\site-packages\thespian\system\actorManager.py", line 88, in run
    r = self.transport.run(self.handleMessages)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 697, in _runWithExpiry
    lambda s=self: s.run_time)
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 402, in core_receive
    return self.core_common_receive(incoming_handler, my_address, run_time_f)[1]
  File "D:\~\.venv37\lib\site-packages\thespian\system\transport\MultiprocessQueueTransport.py", line 218, in core_common_receive
    QUEUE_CHECK_PERIOD))
  File "D:\python\python37\lib\multiprocessing\queues.py", line 104, in get
    if not self._poll(timeout):
  File "D:\python\python37\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "D:\python\python37\lib\multiprocessing\connection.py", line 869, in wait
    ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
  File "D:\python\python37\lib\multiprocessing\connection.py", line 801, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
InterruptedError: [Errno 4] Interrupted function call

2023-01-27 12:07:28.898457 p45660 dbg  Attempting intent TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999974-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-Q.ThespianQ.b-quit_0:04:59.999950)
2023-01-27 12:07:28.899457 p45660 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.998867-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-Q.ThespianQ.b-quit_0:04:59.998849)
2023-01-27 12:07:28.899457 p45660 dbg  Run my_project.actors.logger.actor.LoggerAgent done
2023-01-27 12:07:29.541332 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-/ActorSys-pending-ExpiresIn_0:04:59.999935-<class 'thespian.actors.ActorExitRequest'>-ActorExitRequest-quit_0:04:59.999918)
2023-01-27 12:07:29.541332 p31436 I    ActorSystem shutdown requested.
2023-01-27 12:07:29.542286 p31436 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999943-<class 'thespian.system.messages.admin.SystemShutdown'>-<thespian.system.messages.admin.SystemShutdown object at 0x0000018DB03DBDC8>-quit_0:04:59.999929)
2023-01-27 12:07:39.544817 p31436 ERR  No response to Admin shutdown request; Actor system not completely shutdown
2023-01-27 12:07:39.546833 p31436 I    ActorSystem shutdown complete.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants