Better Drive files download failure #1482

Open
Conchylicultor opened this issue Feb 19, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@Conchylicultor
Member

Conchylicultor commented Feb 19, 2020

Downloads of Drive URLs sometimes fail with NonMatchingChecksumError: Artifact https://drive.google.com/... has wrong checksum.

Explanation: Drive sometimes rejects the download attempt, and the rejection page is downloaded instead of the data. This can happen:

  • If the user is based in China (a VPN should be used).
  • If there are too many downloads of the same file.
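Because the rejection page is saved in place of the data, one quick check is whether the downloaded file starts like an HTML document. This is a minimal sketch, not TFDS code: `looks_like_html` and the byte prefixes it checks are illustrative assumptions, and a real archive could in principle start with other bytes that this heuristic misreads.

```python
from __future__ import annotations

import os

# Byte prefixes that usually open an HTML document. This list is an
# assumption for illustration, not an exhaustive HTML detector.
HTML_PREFIXES = (b"<!doctype html", b"<html")
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_html(path: str | os.PathLike, probe_bytes: int = 512) -> bool:
    """Return True if the file at `path` starts like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    # Ignore an optional UTF-8 byte-order mark and leading whitespace,
    # then compare case-insensitively.
    head = head.removeprefix(UTF8_BOM).lstrip().lower()
    return head.startswith(HTML_PREFIXES)
```

If this returns True for the file that failed the checksum, Drive most likely served its rejection page rather than the dataset.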

The best solution currently is to manually download the data (https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails) rather than relying on the automated download that Drive rejected.

Otherwise:

  • Try the download again later.
  • Try on a different computer.
  • Rather than downloading the file in each Colab session, load the dataset from a GCS bucket. See instructions.
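The "try again later" advice can be automated with a retry loop whose waits grow between attempts. This is a generic sketch, not part of TFDS: `retry_with_backoff` and its defaults (3 attempts, a 10-minute first wait, doubling each time) are hypothetical choices. Drive's own rejection message, quoted later in this thread, mentions waits of up to 24 hours, so a short loop may still not be enough.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    initial_delay: float = 600.0,
    factor: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` errors with exponentially growing waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error unchanged.
            sleep(delay)  # Wait before the next try.
            delay *= factor
    raise RuntimeError("unreachable")
```

For example, `fn` could be `lambda: tfds.load('celeb_a')`. The `sleep` parameter exists only so the waits can be replaced in tests.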

I'm not sure a solution is possible on the Google Drive side while still preventing abuse.
On the TFDS side, we could make the error message more explicit when we detect a Drive URL.
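A more explicit message could be produced by checking the artifact's host and appending Drive-specific advice. This is a hypothetical sketch, not the actual TFDS implementation: `is_drive_url`, `checksum_error_message`, and the set of hosts treated as Drive are assumptions; only the manual-download URL comes from this issue.

```python
from urllib.parse import urlparse

# Hosts treated as Google Drive for this sketch (an assumption).
DRIVE_HOSTS = frozenset({"drive.google.com", "docs.google.com"})
MANUAL_DOWNLOAD_DOC = (
    "https://www.tensorflow.org/datasets/overview#manual_download_if_download_fails"
)


def is_drive_url(url: str) -> bool:
    """Return True if `url` points at a Google Drive host."""
    # urlparse lowercases the hostname; it is None for URLs without one.
    return (urlparse(url).hostname or "") in DRIVE_HOSTS


def checksum_error_message(url: str, base_msg: str) -> str:
    """Append Drive-specific advice to a checksum error message."""
    if not is_drive_url(url):
        return base_msg
    return (
        f"{base_msg}\n\n"
        "This file is hosted on Google Drive, which sometimes rejects "
        "automated downloads (for example, after too many downloads of the "
        "same file) and serves an error page instead of the data. Consider "
        f"downloading the file manually: {MANUAL_DOWNLOAD_DOC}"
    )
```

Non-Drive URLs keep their original message, so the extra text only appears where it is relevant.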

@dhirensr
Contributor

@Conchylicultor: do we just need to change the error message in this ticket? Could you guide me a little so that I can work on it?

@jpgard

jpgard commented Mar 4, 2020

Is there a way for users to, e.g., copy a dataset's files into our own Google Drive, manually download them to the correct location, and proceed from there? Or is there any other manual workaround using the publicly available celeba data?

@ChanchalKumarMaji
Contributor

For Drive links, downloads can be done by extracting the file ID and building the download link as follows:
https://drive.google.com/uc?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM
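The ID extraction described above can be sketched with the standard library. The helper names and the two URL shapes handled (an `id=` query parameter, or a `/d/<ID>` path segment as in `/file/d/<ID>/view`) are assumptions for illustration; the sample ID is the one from this comment. Building the link does not bypass Drive's quota, as the message quoted below shows.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the ID in share links such as /file/d/<ID>/view (assumed layout).
_PATH_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_drive_file_id(url: str) -> str:
    """Return the Drive file ID from an `id=` query parameter or a /d/<ID> path."""
    parsed = urlparse(url)
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    raise ValueError(f"no Drive file ID found in {url!r}")


def drive_download_url(url: str) -> str:
    """Build the https://drive.google.com/uc?id=<ID> form of a Drive link."""
    return "https://drive.google.com/uc?id=" + extract_drive_file_id(url)
```

Folder links (`/drive/folders/...`) contain no file ID, so they raise `ValueError` instead of producing a bogus download link.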

For now, the celeb_a download link shows:

Sorry, you can't view or download this file at this time.

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.
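When a download fails, its contents can be checked for the phrases in the message above to tell a quota rejection apart from other failures. This is a heuristic sketch: `is_drive_quota_page` is a hypothetical helper, and the marker strings are copied from the quoted message, which Drive may reword or localize, so a miss should be read as "unknown" rather than "not rejected".

```python
import html

# Phrases copied from the Drive rejection message quoted above. Drive may
# reword or localize this page, so treat a miss as "unknown", not "fine".
QUOTA_MARKERS = (
    "Too many users have viewed or downloaded this file recently",
    "Sorry, you can't view or download this file at this time",
)


def is_drive_quota_page(content: bytes) -> bool:
    """Heuristically check whether downloaded bytes are Drive's quota page."""
    # Decode leniently and turn HTML entities such as &#39; back into text.
    text = html.unescape(content.decode("utf-8", errors="replace"))
    # Collapse whitespace so phrases split across HTML lines still match.
    text = " ".join(text.split())
    return any(marker in text for marker in QUOTA_MARKERS)
```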

@liqinglin54951

celeb_a tfrecord files:
https://drive.google.com/drive/folders/1MKQ9sRwr5OOFk3OBzLz91SsgF3MBqvtP?usp=sharing
Or
you can follow "Create tfrecord files for 'test', 'train', 'validation'" in cp13_Parallelizing NN Training w TF_printoptions(precision)_squeeze_shuffle_batch_repeat_image process_map_tfrecords (https://blog.csdn.net/Linli522362242/article/details/112386820)

@ghost

ghost commented Apr 24, 2021

Is there an easier way to install the celeb_a dataset? I am trying the manual download method, but it is not helping at all.

@johnny-brav0

Has anyone tried executing the code cell twice?
I'm getting the same error for the "paws_wiki" dataset using tfds.load('paws_wiki'). Despite the error, the data does get downloaded, and as soon as I execute the cell again, it works and imports the data into my environment.

@aryan-f

aryan-f commented Jan 1, 2022

While looking for a workaround for this issue, I found a routine in the library that checks for the files on your own machine before attempting to download them. The directory is ~/tensorflow_datasets/downloads/manual (the manual directory under the download_dir used by download_and_prepare). You can manually download the file from Drive and place it in one of the directories it checks. Once you are logged in to Google, the high-traffic limit is no longer an issue.

The dataset I intended to use was CaltechBirds2010, and I found its Drive link here.
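Copying a browser-downloaded file into that directory can be scripted. This is a hypothetical helper, not a TFDS API: `place_in_manual_dir` assumes the default path named in the comment above, and it does not work out which file name a given dataset expects (that is dataset-specific), so `target_name` lets you rename the file if needed.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

# Default manual directory named in the comment above; your TFDS setup
# may use a different download or manual directory.
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def place_in_manual_dir(
    downloaded_file: str | os.PathLike,
    manual_dir: str | os.PathLike = DEFAULT_MANUAL_DIR,
    target_name: str | None = None,
) -> Path:
    """Copy a manually downloaded file into the manual directory."""
    src = Path(downloaded_file)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest_dir = Path(manual_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original file name unless the dataset expects another one.
    dest = dest_dir / (target_name or src.name)
    shutil.copy2(src, dest)
    return dest
```

After copying, rerun the original tfds.load call so the library can pick up the local file.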

@Abdulrasheedar

I get the error below when trying to use the deep_weeds dataset with this code: data_train, info = tfds.load("deep_weeds", with_info=True, split='train[:60%]',as_supervised=True)

NonMatchingChecksumError Traceback (most recent call last)
in <cell line: 1>()
----> 1 data_train, info = tfds.load("deep_weeds", with_info=True, split='train[:60%]',as_supervised=True)
2 data_valid = tfds.load("deep_weeds",split='train[60%:80%]',as_supervised=True)
3 data_test = tfds.load("deep_weeds", split='train[80%:]',as_supervised=True)
4 # file_path ="/content/images/"
5 # dataset = tfds.load(name='deep_weeds', data_dir=file_path)

20 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/download/download_manager.py in _validate_checksums(url, path, computed_url_info, expected_url_info, force_checksums_validation)
769 'https://www.tensorflow.org/datasets/overview#fixing_nonmatchingchecksumerror'
770 )
--> 771 raise NonMatchingChecksumError(msg)
772
773

NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=1xnK3B6K6KekDI55vwJ0vnc2IGoDga9cj, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_1xnK3B6K6KekDI55vwJ0vnc2ITDlCjLc2rcwnx4HX2m4DkEyLfA722UJqaLRkfNhB6ec.tmp.68dd982dd0fd4809b12f3ef885ebe32f/download, has wrong checksum:
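To see what was actually saved at the path in the error, you can compute the file's size and SHA-256 digest yourself and compare them with the expected values. This sketch assumes, to our understanding, that TFDS records an expected byte size and SHA-256 checksum per URL; `size_and_sha256` is a hypothetical helper, not a TFDS function.

```python
from __future__ import annotations

import hashlib
import os


def size_and_sha256(path: str | os.PathLike, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, hex SHA-256) of a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Stream the file so large dataset archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

A size of only a few kilobytes for a dataset archive is a strong hint that Drive served its rejection page instead of the data.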
