This ticket tracks further improvements required for the Google Drive remote implementation after #2551 is merged into master:
-
Process auth exceptions and print a more meaningful error message on failure. GDrive remote support #2551 (comment)
-
Implement `gc` command support for `gdrive` (by overriding `def remove(self, path_info)`).
-
Validate that the resolved remote file object has the expected title (`def resolve_remote_item_from_path(self, path_parts, create):`).
-
Simplify a few things using the new and shiny `@wrap_prop` and the `filter_errors` param of `@retry` from funcy 1.14 :)
-
Make name -> id resolution deterministic by choosing the minimum id. GDrive remote support #2551 (comment)
-
Reimplement caching to have 2 entry points: `.path_info_to_ids()` and `.id_to_path_info()`. The idea is to remove the cache dictionaries completely, but this will require introducing a helper method that accepts `parent_id` and `title` and returns the actual id (this method introduces a 1 -> 1 relation between the input and the resulting remote id). GDrive remote support #2551 (comment)
-
Simplify `resolve_remote_item_from_path`. GDrive remote support #2551 (comment)
-
Protect `create_remote_dir` with a lock. GDrive remote support #2551 (comment)
-
Organize methods and params in an ordered and unified way. GDrive remote support #2551 (comment)
-
Improve retrieval of the HTTP error from GDrive API exceptions: maxhora@783d8b6#r36184990
-
Create an Iterative Google Drive account and Project to share the client id and secret with end users of DVC. The highest possible API usage quotas should be requested. We will probably need a separate Google Project for CI. GDrive remote support #2551 (comment)
-
Stop creating a path to the DVC remote root if it does not exist. Since Google Drive allows multiple folders with the same name (at least one in My Drive and one in Shared With Me), when a path like `gdrive://root/storage` is used for access, collaborators see an empty `storage` folder after the first `dvc pull` attempt. Instead we should just throw a "path does not exist" error.
-
Hide `/Users/ivan/Projects/test-gdrive/.env/lib/python3.7/site-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access /Users/ivan/Projects/test-gdrive/.dvc/tmp/gdrive-user-credentials.json: No such file or directory warnings.warn(_MISSING_FILE_MESSAGE.format(filename))` on the first auth.
-
Reconsider `self.no_traverse = False`. Now even to pull a single file we run the full traversal. We can at least start listing only the prefixes we need (e.g. `remote/0c/*` if we need to check whether file `0c1234...ef` exists), we can use a parallel `exists` if we need to check fewer than 256 files, etc.
-
Put notes in docs that a path like `gdrive://root` is not accessible by other people, and that the ID must be used to actually share data with other team members.
-
We should store one credentials file per remote
-
Do not allow an empty root DVC remote path (except shared?). I think it can prevent a lot of strange issues. More on this here: remote add: should gdrive://root be an error? #3586
-
Add `_download` progress. Google Drive support further enhancements #2865 (comment) -> gdrive: download: stream & add progress #3722
-
Support file streaming in the `dvc.api.open()` function (more details in api: support streaming from Google Drive remotes #3408) and update docs.
-
Add `import-url` support.
-
Check that `close()` is handled in the `dvc.api.open()` context manager.
-
Support show URL in `dvc get` for GDrive. We can generate an HTTPS link that can be used in a browser if the user is authenticated. This is the same link you would get when downloading a file from the UI.
-
Fix credentials management for external repos (when we do `dvc get`, etc). We should have a way to cache them.
-
Support external dependencies and outputs
-
Check that `dvc get-url` functions properly. Need to come up with some credentials management.
-
Review and add more tests if needed. Starting from API tests.
-
Add an explanatory comment about how paths vs ids work, including all the non 1-to-1 stuff. Someone running into GDrive for the first time will appreciate it: GDrive remote support #2551 (review). Explain how we deal with it and how caching is involved too.
-
Notify the user on retries; keep the message on the screen if they are happening at a certain rate.
-
Consider raising an exception if there are multiple remote root directories with the same name
-
Move `_gdrive_*` helpers into a separate module `api` to simplify testing and reading the code.
-
When it's more or less stable, make it trusted by default.
-
Check if there is a way to generate URLs for publicly available GDrive files to download them w/o auth. Examples: https://github.com/NVlabs/stylegan/blob/master/pretrained_example.py and https://github.com/NVlabs/stylegan/blob/master/dnnlib/util.py . Ideally, `dvc get` should work w/o asking for auth for public objects.
-
Improve performance: pass `fields`, consider using batch for `exists` calls - https://developers.google.com/drive/api/v2/performance , see example here: docs: How to get specific fields when listing files. iterative/PyDrive2#42
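
To make two of the items above more concrete (deterministic name -> id resolution by minimum id, and guarding directory creation with a lock), here is a rough sketch. It does not use the real remote code or the Drive API: `FakeDrive`, `resolve_id` and `get_or_create_dir` are hypothetical names, and the in-memory store only mimics the fact that Drive allows several folders with the same title under one parent.

```python
import threading
from typing import List, Optional, Tuple


class FakeDrive:
    """Hypothetical in-memory stand-in for a Drive-like store (not a real client)."""

    def __init__(self) -> None:
        # Each item is (item_id, parent_id, title); duplicate titles are allowed.
        self._items: List[Tuple[str, str, str]] = []
        self._counter = 0

    def list_ids(self, parent_id: str, title: str) -> List[str]:
        """Return every id matching (parent_id, title); there may be several."""
        return [i for i, p, t in self._items if p == parent_id and t == title]

    def create_dir(self, parent_id: str, title: str) -> str:
        """Always creates a new item, even if the title already exists."""
        self._counter += 1
        item_id = f"id{self._counter:04d}"
        self._items.append((item_id, parent_id, title))
        return item_id


def resolve_id(drive: FakeDrive, parent_id: str, title: str) -> Optional[str]:
    """Map (parent_id, title) to a single id by always picking the minimum one."""
    ids = drive.list_ids(parent_id, title)
    return min(ids) if ids else None


# One process-wide lock; an assumption made here to keep the sketch short.
_create_lock = threading.Lock()


def get_or_create_dir(drive: FakeDrive, parent_id: str, title: str) -> str:
    """Check-then-create under a lock so concurrent callers do not create duplicates."""
    with _create_lock:
        existing = resolve_id(drive, parent_id, title)
        if existing is not None:
            return existing
        return drive.create_dir(parent_id, title)
```

With this shape, two pre-existing `storage` folders always resolve to the same id regardless of listing order, and several threads calling `get_or_create_dir` for the same title end up with a single folder. A lock only helps within one process; duplicates created by other clients would still need the minimum-id rule.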