notebook: don’t filter polled instances by PID #4407

wchargin · 2020-11-30T21:17:18Z

Summary:
When the %tensorboard cell magic is invoked, we compute a cache key
for the “hermetic environment”, primarily args to %tensorboard and the
working directory. We first check whether any running TensorBoard
instances match that cache key, and launch a new instance if none do.
But then, while polling for the new instance to have launched, we had a
different matching criterion, checking for a process ID match instead of
a cache key match.

The idea was that “is this TensorBoard instance’s PID equal to the PID
of the subprocess that we just spawned?” would be a more reliable check.
But on Windows ((╯°□°）╯︵ ┻━┻) this is not the case, presumably because
the tensorboard console script has some kind of wrapper process in
certain versions of Python. This manifested as “%tensorboard always
times out on the first invocation, but works immediately when I invoke
it again”, since invoking it again triggers the cache key check rather
than the PID check. So we now just check by cache key in all cases, and
the logic is consistent, if a bit less precise overall.
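
To make the matching change concrete, here is a minimal sketch of polling by cache key rather than by PID. It is not the actual `tensorboard` code: `InstanceInfo`, `find_match`, `poll_for_instance`, and the `list_instances` callback are hypothetical names chosen for illustration, and the timeout values are arbitrary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical record describing one running TensorBoard instance."""

    pid: int
    cache_key: str
    port: int


def find_match(
    instances: Iterable[InstanceInfo], cache_key: str
) -> Optional[InstanceInfo]:
    """Return the first instance whose cache key matches, ignoring its PID."""
    for info in instances:
        if info.cache_key == cache_key:
            return info
    return None


def poll_for_instance(
    list_instances: Callable[[], Iterable[InstanceInfo]],
    cache_key: str,
    timeout_seconds: float = 1.0,
    poll_interval_seconds: float = 0.01,
) -> Optional[InstanceInfo]:
    """Poll until an instance with the given cache key appears, or time out.

    The same criterion (cache key equality) is used here as in the initial
    "is one already running?" check, so both code paths agree.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        match = find_match(list_instances(), cache_key)
        if match is not None:
            return match
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_seconds)


# Illustration of the Windows case: the spawned subprocess is a wrapper
# (PID 100, assumed), while the instance that registers itself has a
# different PID (200, assumed) but the same cache key.
spawned_pid = 100
registered = [InstanceInfo(pid=200, cache_key="abc", port=6006)]

# A PID filter would never see a match, so polling would time out.
pid_matches = [info for info in registered if info.pid == spawned_pid]

# Matching by cache key finds the instance regardless of its PID.
found = poll_for_instance(lambda: registered, "abc", timeout_seconds=0.1)
```

The trade-off is the one noted above: a cache key match could in principle pick up an instance started by someone else with the same arguments and working directory, which is why this is slightly less precise than a PID match.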

Fixes #4300.

Test Plan:
Still works for me on Linux, with both new and existing TensorBoard
processes across multiple (concurrent) cache keys. @stephanwlee can
repro the bug and fix on Windows with Python 3.8.

wchargin-branch: notebook-poll-no-pid-filter

stephanwlee

Was able to confirm the fix on Windows (Python 3.8) with the patch. Thanks for the fix.

wchargin · 2020-12-01T18:45:46Z

Awesome; thanks very much for testing!

wchargin added type:bug core:backend os:windows labels Nov 30, 2020

wchargin requested a review from stephanwlee November 30, 2020 21:17

google-cla bot added the cla: yes label Nov 30, 2020

stephanwlee approved these changes Dec 1, 2020

wchargin merged commit 546d1b6 into master Dec 1, 2020

wchargin deleted the wchargin-notebook-poll-no-pid-filter branch December 1, 2020 18:47

wchargin mentioned this pull request Dec 1, 2020

Starting tensorboard inline within a Jupyter notebook consistently times out #4300

Closed

wchargin mentioned this pull request Dec 1, 2020

black: reformat tensorboard/manager.py #4414

Merged

wchargin added a commit that referenced this pull request Dec 2, 2020

black: reformat tensorboard/manager.py (#4414)

81fc828

Summary: CI skew between #4409 and #4407 that I didn’t realize. Test Plan: GitHub CI suffices. wchargin-branch: black-reformat-manager
