Add classes for fetching a BIDS-compliant study on S3 #290
Conversation
Hello @richford! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-08-09 19:59:14 UTC
Looks great! Regarding this
Once you have these keys in hand, it's a call to
Agreed. So we probably don't need to include it in AFQ.data. They already have the list of S3 keys. So maybe I'll check that off the todo list and assume that we put the rest of that responsibility on the user.
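If the user already holds the list of S3 keys, the "rest of that responsibility" amounts to a small download loop. A minimal sketch of what that could look like; the function name `download_keys` and its signature are mine, not part of the PR, and `client` is assumed to be a boto3 S3 client (or anything with a compatible `download_file(bucket, key, filename)` method):

```python
import os


def download_keys(client, bucket, keys, dest_dir):
    """Download each S3 key into dest_dir, preserving the key's relative path.

    Hypothetical helper: assumes `client` behaves like a boto3 S3 client.
    Returns the list of local paths written.
    """
    paths = []
    for key in keys:
        # Mirror the S3 key's directory layout under dest_dir
        local_path = os.path.join(dest_dir, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        client.download_file(bucket, key, local_path)
        paths.append(local_path)
    return paths
```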
Still a WIP. I did some significant restructuring after realizing that HBN is not BIDS compliant. We now have an
We could also consider creating an
@arokem Is the documentation build failing because the PR is coming from my fork and not from yeatmanlab?
That looks like the return of numpy/numpydoc#268. Is this branch rebased on current master?
Nope. Just rebased. Thanks for the tip on the 🧟 bug!
Hmm, no such luck. Anyway, low priority since this is still a WIP. We can come back to it once I implement the other things.
I tried updating
Thanks @36000 for fixing the docbuild issue!
Added tests using moto. They pass locally. Let's see if they pass here. Then I think this is ready for review.
Overall looks good! I had some small comments/suggestions.
You might also want to rebase this, as there has been some restructuring of the library in the recently merged #322
AFQ/data.py (Outdated):

    study : AFQ.data.S3BIDSStudy
        The S3BIDSStudy for which this subject was a participant
    site : str, optional
For generality, you might want to point out that this idiosyncrasy is designed to support use of HBN.
I don't think this particular part is an HBN idiosyncrasy. Rather, I'm just trying to record some notion of the parent study object. On the other hand, the parameter _subject_class
described on line 814 is an idiosyncrasy designed to support HBN, but I think I document that reasonably well there. What do you think?
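To make the distinction concrete, here is a stripped-down sketch of the two patterns being discussed: a subject keeping a back-reference to its parent study (general), and a `_subject_class` hook that lets an HBN-specific study swap in its own subject type (the HBN idiosyncrasy). The class bodies and signatures below are illustrative assumptions, not the PR's actual code:

```python
class S3BIDSSubject:
    def __init__(self, subject_id, study):
        self.subject_id = subject_id
        self.study = study  # back-reference to the parent study object


class HBNSubject(S3BIDSSubject):
    def __init__(self, subject_id, study, site=None):
        super().__init__(subject_id, study)
        self.site = site  # HBN-specific: subjects are grouped by site


class S3BIDSStudy:
    # `_subject_class` lets a subclass (e.g. an HBN study) substitute its
    # own subject type without overriding the subject-creation logic
    def __init__(self, subjects, _subject_class=S3BIDSSubject):
        self.subjects = [_subject_class(s, study=self) for s in subjects]
```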
            pbar_idx=idx,
        ) for idx, sub in enumerate(self.subjects)]

    compute(*results, scheduler='threads')
Wouldn't you rather use the dask progressbar construct here as well, instead of the pbar input to sub.download?
I had trouble getting the nesting to work with the Dask progress bar since we download multiple subjects all at once. So I went with this instead. I admit it's weird to mix the dask progress bar with the tqdm progress bar but I didn't want to spend too much time on that part. But by all means, commits to this PR are welcome.
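The mixing being described, dask's threaded scheduler driving tasks that each own a tqdm bar, can be sketched as follows. The `download_subject` function is a hypothetical stand-in for `sub.download()`; the key detail is tqdm's `position` argument, which pins each bar to its own terminal row so concurrent bars don't overwrite one another:

```python
from dask import compute, delayed
from tqdm import tqdm


def download_subject(sub_id, pbar_idx, n_steps=3):
    # Hypothetical stand-in for sub.download(); `position=pbar_idx`
    # keeps each concurrently running task's bar on its own row
    with tqdm(total=n_steps, desc=sub_id, position=pbar_idx,
              leave=False) as pbar:
        for _ in range(n_steps):
            pbar.update(1)
    return sub_id


subjects = ["sub-01", "sub-02", "sub-03"]
results = [delayed(download_subject)(sub, pbar_idx=idx)
           for idx, sub in enumerate(subjects)]
# compute(...) returns the task results in submission order
downloaded = compute(*results, scheduler="threads")
```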
    @pytest.fixture
    def temp_data_dir():
Is this better in some way than https://github.com/yeatmanlab/pyAFQ/blob/master/AFQ/tests/test_api.py#L217 ?
Oh, cool. I didn't know about nibabel.tmpdirs. The main difference I see is that nbtmp.InTemporaryDirectory() also changes the current working directory. Also, using it as a pytest.fixture with a yield statement will make sure that we delete the directory even if the test fails. I'm not sure what happens to the temporary directory if the test you linked in test_AFQ_data fails in the middle, but maybe it also deletes the directory.
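For reference, the yield-fixture pattern being described looks roughly like this (a sketch, not necessarily the PR's exact fixture body): everything after the `yield` acts as teardown, and wrapping it in `try/finally` guarantees cleanup even when the test body raises.

```python
import shutil
import tempfile

import pytest


@pytest.fixture
def temp_data_dir():
    """Yield a fresh temporary directory; remove it after the test."""
    tmpdir = tempfile.mkdtemp()
    try:
        yield tmpdir
    finally:
        # Teardown after `yield` runs whether the test passed or
        # failed, so the directory never leaks
        shutil.rmtree(tmpdir)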
There could also be an upload function similar to the download function, which uploads derivatives generated by AFQ back into the same folder. So the cloudknot workflow would look like:
Thereby pyAFQ would achieve its peak API potential
@36000, regarding the upload function. This would presuppose that the S3 bucket where we got the input data is the same one where we want to put the results. But for public datasets, this might not be true. Two options: (1) we could assume that it will work and then throw an error if it doesn't (or rather, let
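One way to surface the write-permission problem early, rather than after a long pipeline run, would be a small probe before uploading. This is a sketch under my own assumptions (the function name `can_write` and the probe-object approach are mine, not the PR's); `client` is assumed to be a boto3 S3 client, for which a denied write raises a botocore ClientError:

```python
def can_write(client, bucket, probe_key=".pyafq-write-probe"):
    """Probe write access to `bucket` by putting and then deleting a
    tiny sentinel object, returning False instead of raising.

    Hypothetical helper: `client` is assumed to behave like a boto3 S3
    client, where put_object on a read-only public bucket raises
    (e.g. botocore.exceptions.ClientError with AccessDenied).
    """
    try:
        client.put_object(Bucket=bucket, Key=probe_key, Body=b"")
        client.delete_object(Bucket=bucket, Key=probe_key)
        return True
    except Exception:
        return False
```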
Indeed. This has moved into setup.cfg.
Could you please rebase onto master? You can sort the dependencies
alphabetically in there, if you want :-)
In requirements.txt <#290 (comment)>:

     templateflow==0.6.2
    -pybids==0.11.1
    +toml==0.10.0
    +toolz==0.9.0
    +tqdm==4.32.2

But it appears that this is all happening in setup.cfg now anyway.
… more general S3BIDSStudy
Co-authored-by: Ariel Rokem <arokem@gmail.com>
Well. This is annoying. Not as elegant, perhaps, but you can still add a
I went with the
Yay! Tests finally pass.
Great! I believe this is ready to go? Maybe we can merge this and follow up with issues/PRs on things that came up but were left unresolved?
Agreed. I think it's ready to go. I believe the only outstanding issue is the upload functionality that @36000 mentioned. However, I may have missed something else in the back-and-forth.
This PR adds classes to AFQ.data that can query and download subjects from a BIDS-compliant study hosted on Amazon S3. It is related to #255 but is much narrower in scope and does not resolve that issue. The main use case is to create a subset BIDS study for use on distributed computing elements.
Here's some example usage using the HBN class, which inherits from S3BIDSStudy:

Remaining questions or todo items:

- With multisite=True in the S3BIDSStudy parameters, we assume that all of the top-level directories are Site IDs. But maybe there's some other top-level data in there. It'd be nice to be able to differentiate between "Site-RU" and a top-level "participants.tsv," for example.
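One possible approach to that last todo item, partitioning top-level S3 keys into site directories versus loose files, could rely on HBN's naming convention. This is only a sketch of one option the PR leaves open; the "Site-" prefix heuristic and the function name are assumptions of mine:

```python
def split_top_level(keys):
    """Partition top-level S3 keys into site directories and loose files.

    Assumes site directories follow HBN's "Site-<ID>/" naming convention
    (an assumption; the PR leaves this question open). Anything that is
    not under such a directory, e.g. "participants.tsv", is treated as a
    top-level file.
    """
    sites, files = set(), []
    for key in keys:
        head, sep, _ = key.partition("/")
        if sep and head.startswith("Site-"):
            sites.add(head)
        else:
            files.append(key)
    return sorted(sites), files
```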