Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treon chokes on notebooks that refer to relative paths #12

Closed
natanlao opened this issue Aug 8, 2019 · 4 comments
Closed

Treon chokes on notebooks that refer to relative paths #12

natanlao opened this issue Aug 8, 2019 · 4 comments

Comments

@natanlao
Copy link

natanlao commented Aug 8, 2019

We're using treon for HumanCellAtlas/data-consumer-vignettes, and ran into a problem where notebooks that refer to relative paths can unexpectedly fail when tested with treon.

These notebooks expect the current working directory to be the directory that the notebook resides in. When using treon, this is not likely to be the case, since treon searches recursively for notebooks to test. (In this case, the current working directory is the directory where treon was invoked.)

I've been able to solve this locally by changing the working directory to the directory in which the notebook resides before testing the notebook. That said, this solution only works if treon is limited to a single thread (or only testing one notebook), since the current working directory is shared across all threads.

There are a few approaches to this that I can think of:

  1. Refactor treon to use multiprocessing instead of multithreading. Skimming the source code, it seems like this change would be more or less trivial. Using multiprocessing would have the benefit of working directory isolation, in addition to potential performance improvements for testing CPU-bound notebooks by circumventing the GIL, at the cost of some performance overhead.

  2. Add an option to perform the same directory-switching that I used above that limits the number of parallel threads to one.

  3. Ignore this problem - our workaround is to handle the directory-switching ourselves, running treon with one notebook at a time, while still achieving parallelism with xargs.

I'm not sure if this is a widespread use case, and I'm happy to take a shot at either of these myself.

@amit1rrr
Copy link
Member

amit1rrr commented Aug 9, 2019

Thanks for reporting @natanlao

Given your repository is open source, can you please tell us a way to reproduce this problem with smallest possible set of notebooks maybe? There are a few things I'd like to try out.

Refactor treon to use multiprocessing instead of multithreading

Please pardon my lack of knowledge here but what is it about processes that would make it possible instead of threads. Is it that the processes can have different working directory but threads cannot (as threads derive working directory from the parent process)?

@f4lco
Copy link
Contributor

f4lco commented Aug 9, 2019

If I'm understanding correctly, it is not so much about the treon internals (if it uses threads or processes), but rather about adding the correct working directory parameter when starting the Jupyter kernel the notebook will run in.

I think it is a good change to set the current working directory to the folder containing the notebook. My primary use-case is importing a Python module containing common setup-code, which lives in the same directory as the notebooks.

While tinkering with treon, I already made this change once; it is rather straightforward, see test_execution.py#execute_notebook. If desired, filing a PR would be no problem. We need to decide if this behavior just changed, or if it must be configurable. Maybe some folks already rely on the current behavior to keep the working directory of treon's invocation.

@amit1rrr
Copy link
Member

amit1rrr commented Aug 9, 2019

adding the correct working directory parameter when starting the Jupyter kernel the notebook will run in

ah, absolutely right.

We need to decide if this behavior just changed, or if it must be configurable. Maybe some folks already rely on the current behavior to keep the working directory of treon's invocation.

Our default should match the default of Jupyter UI. From this comment, The kernel started for a notebook has a working directory of where the notebook file is so we should also use this as default.

amit1rrr added a commit that referenced this issue Aug 9, 2019
Set working directory to the folder containing the notebook (fixes #12)
@natanlao
Copy link
Author

natanlao commented Aug 9, 2019

Thanks! I didn't know that that solution was possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants