Support for uploading datasets #1409
Comments
@ethanwhite @henrykironde let me know your thoughts on this.
I think this is a good idea and I really like it. We could start by integrating with Kaggle and see where we go from there. As for versioning, we do have a tool to track that; we shall see how we advance to support this feature further.
I think the idea of cloud storage backends is a good one. Storing into a cloud database should already be supported through the existing ability to pass remote database locations, so this issue would be about adding flat file storage, which I agree would be useful. I think this could be combined with https://github.com/weecology/retriever/wiki/GSoC-2020-Project-Ideas#data-retriever-add-support-for-more-raw-data-formats to make a project that is basically about adding new types of data as sources and new backends for data to be converted into.
I believe combining
Dataset sizes are increasing, and so is the computational power required to process them. Most data scientists and other users prefer cloud-based solutions for storing datasets. This issue is opened to discuss a feature that does the following:
The motivation for this feature can be explained by the following:
git lfs for better versioning/history of each upload.
This would require some research into the configuration of the respective cloud solutions, as well as a good design for proper versioning of datasets.
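To make the versioning point a bit more concrete, here is a minimal sketch of a flat-file backend that gives each upload a content-derived version id. This is only an illustration under stated assumptions: `LocalDirectoryBackend`, `upload`, and `versions` are hypothetical names and not part of the retriever API, and a local directory stands in for a real cloud storage service, whose configuration would still need the research mentioned above.

```python
import hashlib
import shutil
from pathlib import Path


class LocalDirectoryBackend:
    """Hypothetical flat-file backend (not part of retriever).

    A local directory stands in for a cloud bucket so the sketch
    runs without credentials or network access.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload(self, dataset_name, source_path):
        """Copy a file into the store and return its version id.

        The version id is the first 12 hex characters of the file's
        SHA-256 digest, so identical content maps to the same version.
        """
        source_path = Path(source_path)
        digest = hashlib.sha256(source_path.read_bytes()).hexdigest()[:12]
        target_dir = self.root / dataset_name / digest
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_path, target_dir / source_path.name)
        return digest

    def versions(self, dataset_name):
        """Return the stored version ids for a dataset, sorted by name."""
        dataset_dir = self.root / dataset_name
        if not dataset_dir.exists():
            return []
        return sorted(p.name for p in dataset_dir.iterdir() if p.is_dir())
```

With this layout, uploading the same file twice yields the same version id, so unchanged data is not stored again, while a changed file gets a new id and both versions stay available side by side.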