bridge-v0.3.0

github-actions released this 17 Apr 13:43

Highlights

Bridge now self-recovers from agent-submitted deadloops — both execute_code REPL loops and pure-Python deadloops inside yade_execute_task can be force-aborted without a bridge restart.

What's new

  • execute_code timeout splits into two statuses:
    • terminated — async-exc abort succeeded, pump is free, YADE state may be partially modified by code that ran before the abort fired
    • timeout — abort failed (stuck in C extension or nested boost::python frame); the bridge may still be blocked
  • interrupt_task gains an async-exc path that kills pure-Python task deadloops that the flag/PyRunner-tick path cannot reach (e.g. a while True: x += 1 task with no O.run on the stack).
  • New response fields: method (flag_only / flag_and_async_exc / async_exc / stuck_in_c), async_exc_skipped_reason, namespace_preserved, continuation_hint.
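
To make the async-exc abort concrete, here is a minimal sketch of the general CPython technique, not the bridge's actual code. It uses the real `PyThreadState_SetAsyncExc` C API through `ctypes`; the helper bodies, the `grace` parameter, and the mapping onto the two statuses are assumptions for illustration. `TaskInterrupt` and `fire_async_exception` reuse names from these notes, but their implementations here are stand-ins.

```python
import ctypes
import threading
import time


class TaskInterrupt(Exception):
    """Stand-in for the bridge's interrupt exception (name taken from these notes)."""


def fire_async_exception(tid, exc_type):
    """Schedule exc_type in the thread with id tid; True if exactly one thread was hit."""
    set_async_exc = ctypes.pythonapi.PyThreadState_SetAsyncExc
    hit = set_async_exc(ctypes.c_ulong(tid), ctypes.py_object(exc_type))
    if hit > 1:
        # Defensive: more than one thread state matched, so clear the pending request.
        set_async_exc(ctypes.c_ulong(tid), None)
        return False
    return hit == 1


def deadloop(state):
    """Pure-Python busy loop with no cooperative interrupt check."""
    x = 0
    try:
        while True:
            x += 1
    except TaskInterrupt:
        # The exception is delivered between bytecodes, so the loop can be broken.
        state["interrupted_at"] = x


def force_abort(thread, grace=2.0):
    """Inject TaskInterrupt, then classify the outcome (assumed status mapping)."""
    fired = fire_async_exception(thread.ident, TaskInterrupt)
    thread.join(grace)
    if fired and not thread.is_alive():
        return "terminated"  # abort succeeded, the thread unwound
    return "timeout"         # abort did not take effect within the grace period


def run_demo():
    state = {}
    worker = threading.Thread(target=deadloop, args=(state,), daemon=True)
    worker.start()
    time.sleep(0.1)  # let the loop spin before aborting it
    return force_abort(worker), state
```

The injected exception is only raised while the target thread is executing Python bytecode, which is consistent with the notes' distinction between a pure-Python loop (abortable) and code stuck inside a C extension (not abortable this way). This sketch is CPython-specific.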

Architecture changes

  • Task scripts now run on dedicated script-<task_id> daemon threads instead of the shared main_executor pump queue. An O.run(wait=True) no longer starves execute_code for the task's lifetime.
  • PyRunner tick no longer pumps main_executor. execute_code is confined to mcp-task-pump, which is_safe_to_async_raise accepts. Dummy-N boost::python frames are refused (injection there escapes to C++ → YADE FATAL).
  • MainThread is now accepted as an injection target. In Qt mode the pump tick runs on MainThread, and the stuck body is always deep inside our own QTimer → _process_tick → process_tasks → _execute_code stack — never in the Qt event loop itself.
  • except TaskInterrupt cleanup calls O.pause() + O.wait() so the next task inherits a clean sim state.
  • unregister_exec_thread now runs atomically before fire_async_exception: a repeat interrupt_task call sees tid=None and becomes a no-op, so the script thread's cleanup runs without being re-interrupted.
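
The safety guard and the unregister-before-fire ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the bridge's implementation: the function names `is_safe_to_async_raise`, `unregister_exec_thread`, and `interrupt_task` come from these notes, but the registry layout, the `fired` key, the reason strings, and the injected `fire_async_exception` callable are hypothetical.

```python
import threading

# Hypothetical registry: task_id -> threading.Thread running that task's script.
_exec_threads = {}
_registry_lock = threading.Lock()


def register_exec_thread(task_id, thread):
    with _registry_lock:
        _exec_threads[task_id] = thread


def unregister_exec_thread(task_id):
    """Atomically remove and return the task's thread; None if already taken."""
    with _registry_lock:
        return _exec_threads.pop(task_id, None)


def is_safe_to_async_raise(thread):
    """Return (ok, reason). The reason strings are illustrative only."""
    if not thread.is_alive():
        return False, "thread_dead"
    if thread.name.startswith("Dummy-"):
        # Threads not started through the threading module are named Dummy-N.
        return False, "dummy_thread"
    # Named worker threads and MainThread are both accepted.
    return True, None


def interrupt_task(task_id, fire_async_exception):
    """fire_async_exception(tid) is passed in so the ordering can be tested."""
    thread = unregister_exec_thread(task_id)  # unregister BEFORE firing
    if thread is None:
        # A repeat call lands here and cannot re-interrupt the cleanup.
        return {"fired": False, "async_exc_skipped_reason": "not_registered"}
    ok, reason = is_safe_to_async_raise(thread)
    if not ok:
        return {"fired": False, "async_exc_skipped_reason": reason}
    fire_async_exception(thread.ident)
    return {"fired": True, "async_exc_skipped_reason": None}
```

Because the pop happens under a lock and before the injection, at most one caller can ever obtain the thread for a given task, which is the property the second-interrupt regression test in the Tests section is described as pinning.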

Bug fix

  • _execute_code no longer overwrites _current_task_id. Previously a timed-out REPL's own interrupt flag would be misread by PyRunner tick's no-arg is_interrupt_requested() as a task interrupt → O.pause() → the hooked O.run raised InterruptedError → the enclosing script task was spuriously marked interrupted.
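
A toy model of this bug, assuming the no-arg `is_interrupt_requested()` falls back to `_current_task_id` as the description implies. The flag set, `request_interrupt`, `start_task`, and the two `execute_code_*` variants are hypothetical names used only to contrast the old and new behaviour.

```python
_interrupt_flags = set()   # ids (task or REPL) with a pending interrupt
_current_task_id = None    # id of the script task currently running


def request_interrupt(ident):
    _interrupt_flags.add(ident)


def is_interrupt_requested(ident=None):
    """With no argument, check the flag of the current task."""
    ident = _current_task_id if ident is None else ident
    return ident in _interrupt_flags


def start_task(task_id):
    global _current_task_id
    _current_task_id = task_id


def execute_code_old(repl_id):
    """Old behaviour: the REPL call clobbers the current task id."""
    global _current_task_id
    _current_task_id = repl_id


def execute_code_new(repl_id):
    """New behaviour: the REPL call leaves _current_task_id untouched."""
    return repl_id
```

In this model, a timed-out REPL that flags its own id is invisible to a no-arg check as long as `_current_task_id` still names the script task; once the REPL overwrites it, the same check reports an interrupt for the wrong owner.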

Tests

37 new tests (137 → 174 passing). Highlights:

  • test_main_thread_is_accepted — regression guard: prevents a future reader from reinstating the Qt guard that would strand the pump forever.
  • test_execute_code_does_not_clobber_current_task_id — regression guard: pins the _current_task_id invariant above.
  • test_second_interrupt_is_noop_on_async_exc_path — regression guard: prevents double-inject from re-interrupting cleanup.
  • Full SetAsyncExc safety guard coverage (Dummy-N refusal, dead-thread detection, MainThread acceptance).
  • Real-thread end-to-end injection via TestHandleInterruptTask::test_interrupt_running_task_with_registered_thread_fires_async_exc.

Recommended pattern for long-running tasks

Agents now have a recoverable path for force-aborting a task whose script went into a deadloop:

```
# 1. Try graceful first
yade_interrupt_task(task_id)

# 2. Inspect the namespace: variables survive
yade_execute_code("O.iter, len(O.bodies), locals().keys()")

# 3. Continue from where the task left off via a fresh execute_task with remaining logic,
#    or via execute_code directly against the preserved main state.
```
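
An agent-side wrapper for this pattern might look like the sketch below. Everything about the client is assumed: `call_tool(name, **arguments)` is a hypothetical helper that returns the tool response as a dict, the argument names `task_id` and `code` are guesses, and the way `method` and `namespace_preserved` are interpreted is an inference from the field names rather than documented behaviour.

```python
def recover_from_deadloop(call_tool, task_id):
    """Interrupt a stuck task, then decide how to proceed (hypothetical client API)."""
    result = call_tool("yade_interrupt_task", task_id=task_id)

    if result.get("method") == "stuck_in_c":
        # Assumed meaning: the abort could not reach the thread; the bridge may be blocked.
        return {"next": "bridge_may_be_blocked", "interrupt": result}

    if not result.get("namespace_preserved", False):
        # Assumed meaning: task variables did not survive, so start over.
        return {"next": "rerun_from_scratch", "interrupt": result}

    # Variables survived: look at the simulation state before continuing.
    snapshot = call_tool("yade_execute_code", code="O.iter, len(O.bodies)")
    return {"next": "continue", "interrupt": result, "snapshot": snapshot}
```

Passing `call_tool` in as a parameter keeps the decision logic independent of any particular MCP client library and makes it easy to exercise with a stub.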

Full Changelog: bridge-v0.2.4...bridge-v0.3.0