detox worked, tox -p auto hangs with trial -j tests #1183

Closed
glyph opened this issue Mar 10, 2019 · 21 comments
Labels
bug:normal (affects many people or has quite an impact)

Comments

@glyph

glyph commented Mar 10, 2019

I've been happily running my test suite in detox for the past year or so. Today I upgraded to tox with the --parallel option, and suddenly my unit-tests environment hangs, despite mypy, lint, and integration-tests (which uses the same test runner and similar options) all passing and exiting nicely.

Here are some hopefully relevant details:

  • The test runner in question is trial -j 8, so the subprocess has subprocesses of its own, which may be confounding things.

  • unit-tests is the longest running environment.

  • When the tests hang, I see a --installpkg process as well as all of my trial worker processes hanging. As such, I tried --parallel--safe-build just in case, but it didn't change the behavior at all.

If submitting a BUG please provide:

  • Minimal reproducible example or detailed description, assign "bug"

Sorry to say that I can't produce a minimal reproducer; thus far I've only managed to produce this on a proprietary test suite.

  • OS and pip list output

macOS 10.14.3

Package    Version
---------- -------
filelock   3.0.10
pip        18.1
pluggy     0.9.0
py         1.8.0
setuptools 40.5.0
six        1.12.0
toml       0.10.0
tox        3.7.0
virtualenv 16.4.3
wheel      0.32.2
@glyph changed the title from "detox worked, tox -p auto hangs" to "detox worked, tox -p auto hangs with trial -j tests" on Mar 10, 2019
@glyph
Author

glyph commented Mar 10, 2019

Also, when I hit Control-C after the tests hang, I see this error:

Traceback (most recent call last):
  File "/Users/glyph/.local/bin/tox", line 11, in <module>
    sys.exit(cmdline())
  File "/Users/glyph/.local/venvs/tox/lib/python3.6/site-packages/tox/session.py", line 47, in cmdline
    main(args)
  File "/Users/glyph/.local/venvs/tox/lib/python3.6/site-packages/tox/session.py", line 54, in main
    retcode = build_session(config).runcommand()
  File "/Users/glyph/.local/venvs/tox/lib/python3.6/site-packages/tox/session.py", line 467, in runcommand
    return self.subcommand_test()
  File "/Users/glyph/.local/venvs/tox/lib/python3.6/site-packages/tox/session.py", line 591, in subcommand_test
    retcode = self._summary()
  File "/Users/glyph/.local/venvs/tox/lib/python3.6/site-packages/tox/session.py", line 739, in _summary
    status = venv.status
AttributeError: 'VirtualEnv' object has no attribute 'status'
^CException ignored in: <module 'threading' from '/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

@glyph
Author

glyph commented Mar 10, 2019

Happy to provide additional details if I can.

@gaborbernat
Member

Do a run with the live output enabled and triple verbosity and submit the output like that too. Thanks!

@glyph
Author

glyph commented Mar 10, 2019

@gaborbernat I just finished my first run with live output (I assume you mean the -o option?) The problem doesn't occur in that case -- so I guess I've found a workaround at least!

@glyph
Author

glyph commented Mar 10, 2019

I'm trying a run with -vvv now, but without -o, to see if I get any interesting information that way.

@gaborbernat
Member

gaborbernat commented Mar 10, 2019

Yes, -o. That doesn't get us any closer to reproducing the issue, though ☹️

@glyph
Author

glyph commented Mar 10, 2019

@gaborbernat I suspect that this is an infelicitous interaction between whatever trial is doing to manage its subprocesses and whatever tox is doing to manage its own. Is there anything you can think of that I could look for in the implementation of trial, or things we might be doing in our test suite, which might trigger this behavior?

@gaborbernat
Member

I'm not familiar with trial at all at the moment, so can't think of anything now.

@glyph
Author

glyph commented Mar 10, 2019

I'm asking more from the perspective of things that might tickle bugs in tox - I wouldn't expect you to dive too deeply into trial to understand it before we have at least a vague idea of what's going on here :).

@gaborbernat
Member

I can't think of anything.

@glyph
Author

glyph commented Mar 10, 2019

After repeated testing, two additional facts to report:

  • this time integration-tests hung as well
  • passing -vvv resulted in only this one additional interesting line of output when I hit control-C:
cleanup /Users/glyph/Projects/${NAME_OF_PROJECT}/.tox/.tmp/package/1/${NAME_OF_PACKAGE}-${VERSION}.zip

I'll do a few more runs with -o to see whether I can reproduce it there, in case the hang is intermittent...

@gaborbernat added the "needs:reproducer" label (ideally a failing test marked as xfail; if that is not possible, exact instructions to reproduce) on Mar 11, 2019
@asottile
Contributor

@glyph is the project in question open source so we could poke at it?


Also (unrelated)

(copying from code-quality mailing list)

@sigmavirus24

Hi all,

I noticed #1183 and I suspect the
problem is in how tox and trial are using sub-processes for
parallel work. I know that Flake8 uses sub-processes as well (via
multiprocessing) so I'd be unsurprised if this eventually shows up in
Flake8's issue tracker. I don't recall how pylint does parallel
processing, but if it's anything like Flake8, I'm guessing they might
see it soon too for large enough code-bases.

I wanted to give y'all a heads up in case you notice or get reports about this.

Cheers,
Ian

@gaborbernat
Member

#1186 might fix this as a side effect. Let's see once it gets merged 👍

@sigmavirus24

@asottile Glyph mentioned this was a proprietary codebase in the description of the bug

@asottile
Contributor

oh dangit, I missed that 🤦‍♂️

@chrisrink10

I've also been experiencing a similar situation in an (unfortunately!) proprietary codebase, except the offending testrunner is pytest (with pytest-xdist running parallel child workers). I have had success with -o as suggested above, however.

@asottile
Contributor

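The problem is the use of subprocess.PIPE together with proc.wait().

Nothing reads from the pipe while tox waits, so once the child has written enough output to fill the pipe buffer it blocks on its next write, and the wait() call then hangs indefinitely: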

res = process.wait()
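
Editor's sketch, not tox code: the hang can be shown outside tox with the standard library alone. The helper name `wait_without_draining`, the 1 MiB payload, and the timeout are all assumptions chosen for illustration. The timeout exists only so the demonstration terminates; a bare `wait()` would block forever.

```python
import subprocess
import sys

# Child that writes about 1 MiB to stdout, far more than a typical OS pipe
# buffer holds (commonly 64 KiB or less).
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]


def wait_without_draining(timeout=2.0):
    """Return True if the child is still running after `timeout` seconds.

    Hypothetical helper: it starts the child with stdout=PIPE and then calls
    wait() without ever reading from the pipe.
    """
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    try:
        proc.wait(timeout=timeout)  # nobody reads proc.stdout in the meantime
        return False
    except subprocess.TimeoutExpired:
        # The child is blocked writing into the full pipe.
        return True
    finally:
        proc.kill()          # no-op if the child has already exited
        proc.stdout.close()
        proc.wait()          # reap the child


if __name__ == "__main__":
    print("child stuck on a full pipe:", wait_without_draining())
```

Calling `proc.communicate()` instead of `proc.wait()` would drain the pipe while waiting, which is why the Python `subprocess` documentation warns against combining `wait()` with `PIPE`.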

Here's a minimal reproduction:

[tox]
envlist = e1,e2
skipsdist = true

[testenv:e1]
commands =
    python -c '[print("hello world") for _ in range(5000)]'

[testenv:e2]
commands =
    python -c '[print("hello world") for _ in range(5000)]'

The usual fix is to not use PIPE but to write to a temporary file and then read the file when completed
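
Editor's sketch of the temporary-file approach described above, assuming nothing about how the actual tox fix is implemented. The function name `run_capturing_to_file` is hypothetical. The child command mirrors the one in the reproduction.

```python
import subprocess
import sys
import tempfile


def run_capturing_to_file(cmd):
    """Run `cmd` with stdout and stderr sent to a temporary file.

    Returns (returncode, captured_bytes). A regular file does not have a
    small fixed-size buffer that blocks the writer the way a pipe does, so
    calling wait() before reading is safe here.
    """
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        returncode = proc.wait()
        out.seek(0)  # rewind to read what the child wrote
        return returncode, out.read()


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "[print('hello world') for _ in range(5000)]"]
    code, data = run_capturing_to_file(cmd)
    print("exit code:", code, "lines captured:", len(data.splitlines()))
```

The trade-off is that output only becomes available after the process exits, which suits a mode where output is replayed once an environment finishes.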

@asottile
Contributor

Please try out the branch in #1202 and see if it fixes your issues!

@asottile added the "bug:normal" label (affects many people or has quite an impact) and removed the "needs:reproducer" label (ideally a failing test marked as xfail; if that is not possible, exact instructions to reproduce) on Mar 22, 2019
@glyph
Author

glyph commented Mar 22, 2019

Hooray for open source!

@glyph
Author

glyph commented Mar 24, 2019

Since this is now on master, I am trying 7a084a1 with tox -p auto.

@glyph
Author

glyph commented Mar 24, 2019

It worked!

@tox-dev tox-dev locked and limited conversation to collaborators Jan 14, 2021

5 participants