Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GCSFS walk() method #561

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

megaserg
Copy link
Contributor

Resolves #558

@codecov
Copy link

codecov bot commented Jun 14, 2020

Codecov Report

Patch coverage has no change and project coverage change: +0.05 🎉

Comparison is base (589d40c) 86.08% compared to head (b87919b) 86.13%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
+ Coverage   86.08%   86.13%   +0.05%     
==========================================
  Files          87       87              
  Lines        4972     4969       -3     
  Branches      793      792       -1     
==========================================
  Hits         4280     4280              
+ Misses        563      560       -3     
  Partials      129      129              
Impacted Files Coverage Δ
petastorm/gcsfs_helpers/gcsfs_wrapper.py 17.94% <0.00%> (+1.28%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@megaserg
Copy link
Contributor Author

@selitvin @aleks-djuric Not sure what does the complaint from codecov/patch mean. Would you like to review?


yield path, directories, files
obj_path = obj['name']
if obj_path == path:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check still fails to catch the desired directory. obj_path has an extra '/' on the end which path does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was able to reproduce, but for some reason only when I provided a directory that didn't contain files, only other directories.

@selitvin
Copy link
Collaborator

@aleks-djuric ,If you are familiar with gcs nuances, perhaps you could help review this PR?

@alekswithakayy
Copy link

I fixed this by just doing obj_path.strip('/'). Not familiar with how this process works. Should I create a new pull request?

@selitvin
Copy link
Collaborator

Either @megaserg incorporates the fix you propose (I think it's preferable since we'll keep only one PR for the issue), or create a separate PR and we can review and land that.

@megaserg : we don't have tests for GCSFS and not sure if we can set them up easily. We will be able to land the PR regardless.

@selitvin
Copy link
Collaborator

@megaserg, as a last ask, since we don't have a proper unittest setup, can you please paste an example of how the walk function output looks like to the comment of this PR for a sanity check?

Copy link
Collaborator

@selitvin selitvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mocking up self.fs's ls to emulate some small directory structure appears not too hard. Would you be able to add couple of unit-tests to test the function? Let me know if you need help with that.

files.add(obj_path)

rel_files = sorted([posixpath.split(f)[1] for f in files
if f not in directories])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of checking if f not in directories? Could there be an entry in files that is also in directories?


yield path, directories, files
obj_path = obj['name']
if obj_path.strip('/') == path.strip('/'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use more conservative rstrip.


rel_files = sorted([posixpath.split(f)[1] for f in files
if f not in directories])
rel_directories = sorted([posixpath.split(x[:-1])[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More self explanatory, IMO: x[:-1] -> x.rstrip('/').

rel_directories = sorted([posixpath.split(x[:-1])[1]
for x in directories])

yield path, rel_directories, rel_files
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When path is invalid, the function still yields once. This does not match os.walk semantics which would not yield at all in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

walk method in GCSFSWrapper returns empty string as one of filenames
3 participants