Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply some heuristics to make certain datasets look more like a binder #376

Merged
merged 1 commit into from Jan 7, 2020

Conversation

Xarthisius
Copy link
Collaborator

The World is full of data. Some of that data can actually be binder compatible. The way we currently register the data, often prevents us from easily creating a Tale from a binder/tale-like dataset. This PR introduces some magic to make WT a better place!

As reminder POST /tale/import can take a url to an external dataset. If combined with asTale=True this dataset ends up in a workspace directory. This PR performs some modifications to the content of the workspace, namely:

  1. If the workspace contains a single directory, move all content of that directory a level up, then remove the directory
  2. If the workspace contains a singe zip/tar file, unpack it, then remove the zip/tar file

Modifications are performed recursively, so if you watched too much Inception and try to throw at me a dataset containing a single directory, that contains a single zipfile, that contains a single directory, that contains a binder... (https://zenodo.org/record/3523527) I've got you covered!

How to test?

  1. Using swagger API and POST /tale/import register https://dev2.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/C074IQ as Tale
  2. Using swagger API and POST /tale/import register https://zenodo.org/record/3523527
  3. Using swagger API and POST /tale/import be creative!

@codecov
Copy link

codecov bot commented Dec 13, 2019

Codecov Report

Merging #376 into master will increase coverage by 0.12%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #376      +/-   ##
=========================================
+ Coverage   90.67%   90.8%   +0.12%     
=========================================
  Files          45      45              
  Lines        3250    3274      +24     
=========================================
+ Hits         2947    2973      +26     
+ Misses        303     301       -2
Impacted Files Coverage Δ
server/tasks/import_binder.py 92.76% <100%> (+1.35%) ⬆️
server/rest/tale.py 95.72% <0%> (+0.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0cc56c...a8e2cc1. Read the comment docs.

Copy link

@craig-willis craig-willis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor points, but otherwise works for me:

  • The dev2.dataverse.org example highlighted that we don't have Jupyter + R support today, so I filed Support composability of Jupyter with R repo2docker_wholetale#6. Not a problem with this PR, but this PR made the problem evident.
  • The swagger entry for POST /tale/import has a default={} for taleKwargs, which causes the UI to display [Object object]. Also not caused by this PR.
  • While the heuristics do mean that published Binders are more likely to be correctly organized in the workspace for image builds, it doesn't actually detect binders -- so really any Git repo published to Zenodo will be treated the same way. I'm not sure this is a bad thing, but do wonder if we should really only do this with real "binders".

@ThomasThelen
Copy link
Member

When trying with

imageId: 5dddaa07c979bbcc32f049b6
url: https://dev.nceas.ucsb.edu/view/doi:10.5072/FK2F76FS7M
lookupKwargs: {"base_url":"https://dev.nceas.ucsb.edu/knb/d1/mn/v2"}

I ran across the following. This may be a mis-configuration on my end with webdav

Copying files to workspaceTypeError: TypeError('invalid file: None',)
[<FrameSummary file /girder/plugins/wholetale/server/tasks/import_binder.py, line 155 in run>, 
<FrameSummary file /usr/local/lib/python3.5/dist-packages/fs/copy.py, line 45 in copy_fs>, 
<FrameSummary file /usr/local/lib/python3.5/dist-packages/fs/copy.py, line 300 in copy_dir>, 
<FrameSummary file /usr/local/lib/python3.5/dist-packages/fs/_bulk.py, line 132 in copy>, 
<FrameSummary file /usr/local/lib/python3.5/dist-packages/fs/copy.py, line 175 in copy_file_internal>, 
<FrameSummary file /girder/plugins/wholetale/server/tasks/import_binder.py, line 300 in openbin>]

@Xarthisius
Copy link
Collaborator Author

It fails in code not touched by this PR. My guess is that it has problem with fetch.txt that has 0 size. I'll take a look tomorrow.

Copy link
Member

@bodom0015 bodom0015 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Aside: I tried to get creative, but it didn't end well. Not clear if this is even a supported case, but I attempted to import a zip file (containing single directories nested to a depth of 10) as a Tale: https://filebin.net/ymjppkdgesc3vd81/d1.zip?t=4jpl8ceh

The link is ephemeral, but I can provide the zip if it would be helpful.

[2020-01-07 17:22:56,918: ERROR/ForkPoolWorker-3] Task gwvolman.tasks.build_tale_image[419f3414-2768-4805-ac32-615cb563d2c0] raised unexpected: ValueError('Cannot build image for a Tale in error state.',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/girder_worker/task.py", line 148, in __call__
    results = super(Task, self).__call__(*_t_args, **_t_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/gwvolman/gwvolman/tasks.py", line 369, in build_tale_image
    raise ValueError("Cannot build image for a Tale in error state.")
ValueError: Cannot build image for a Tale in error state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants