_default_delete_dir_handler throws error when using default handler #716

Closed
manjuransari-zz opened this issue Aug 19, 2021 · 8 comments

Comments

@manjuransari-zz
Contributor

We have the option to override the _default_delete_dir_handler method. But when we don't override it and the default handler is used, it throws an error. Since fsspec is already integrated, we can rely on fsspec to delete the files.

@selitvin
Collaborator

Not sure I understand. What is the ask here?

@manjuransari-zz
Contributor Author

manjuransari-zz commented Aug 20, 2021

image
We can register our own delete handler. But when we use the default handler, as shown above, the code in the else branch fails. We are using an fsspec filesystem but not passing it a URL in a form that fsspec accepts.

For example, if we pass dataset_url = 'abfs://synapsemlfs/temp/file.csv',
parsed becomes ParseResult(scheme='abfs', netloc='synapsemlfs', path='/temp/file.csv', params='', query='', fragment='')

When we call fs.exists(parsed.path), we are actually checking the path '/temp/file.csv', which carries no container/bucket information. To work with fsspec, the path should be (parsed.netloc + parsed.path). fsspec also supports URLs that embed storage_options, and that case is not handled either.
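
A minimal sketch of the difference, using only the standard library and the example URL from this comment. The names path_only and container_and_path are made up for illustration and do not come from petastorm:

```python
from urllib.parse import urlparse

dataset_url = 'abfs://synapsemlfs/temp/file.csv'
parsed = urlparse(dataset_url)

# parsed.path alone drops the container name, which urlparse puts in netloc.
path_only = parsed.path

# Joining netloc and path keeps the container as the first path component.
container_and_path = parsed.netloc + parsed.path

print(path_only)            # /temp/file.csv
print(container_and_path)   # synapsemlfs/temp/file.csv
```

Only the second form tells the filesystem which container to look in.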

fsspec also supports the local filesystem, so a single implementation would cover both local and remote paths.

A working implementation could look like this:

from fsspec.core import strip_protocol
from petastorm.fs_utils import FilesystemResolver


def _default_delete_dir_handler(dataset_url):
    resolver = FilesystemResolver(dataset_url)
    fs = resolver.filesystem()
    # Let fsspec strip the protocol so the path is in the form the backend expects.
    _dataset_url = strip_protocol(dataset_url)
    if fs.exists(_dataset_url):
        fs.delete(_dataset_url, recursive=True)

@selitvin
Collaborator

Would you like to create a PR for this?

@manjuransari-zz
Contributor Author

Yes, I am working on it and will raise a PR soon.

@manjuransari-zz
Contributor Author

@selitvin are there any plans for a release anytime soon?

@selitvin
Collaborator

selitvin commented Sep 3, 2021 via email

@selitvin
Collaborator

selitvin commented Sep 3, 2021

petastorm==0.11.3rc0 is out

@selitvin
Collaborator

selitvin commented Sep 5, 2021

0.11.3 released
