Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Quickdraw Sketch RNN Dataset #361

Closed
wants to merge 13 commits into from
Closed

Add Quickdraw Sketch RNN Dataset #361

wants to merge 13 commits into from

Conversation

mr-ubik
Copy link

@mr-ubik mr-ubik commented Mar 27, 2019

dataset_info.json

Add the Quickdraw Dataset used to train Sketch RNN.

Close one of the TODOs of #337

Caveats:

  • I took the liberty and created a sequence directory to hold this data
  • I added the test but it is still red
  • Manually testing the dataset from TF worked as intended

@googlebot googlebot added the cla: yes Author has signed CLA label Mar 27, 2019
tensorflow_datasets/sequence/quickdraw.py Outdated Show resolved Hide resolved
tensorflow_datasets/__init__.py Outdated Show resolved Hide resolved
@us
Copy link
Contributor

us commented Mar 27, 2019

Thanks for dataset.

You need to add fake_examples.

@mr-ubik
Copy link
Author

mr-ubik commented Mar 27, 2019

I am also changing the number of shards per split going from 1-1-1 to 20-5-5, I will update the gist accordingly.

@mr-ubik
Copy link
Author

mr-ubik commented Mar 27, 2019

@us How should fake examples work? Are they simply placeholders or should I add 3 complete .npz files?

@us
Copy link
Contributor

us commented Mar 27, 2019

You can add .npz files to default. You should think like a that's your extracted dir. Also you can give the _split_generators outputs details

tensorflow_datasets/testing/quickdraw_test.py Outdated Show resolved Hide resolved
@mr-ubik
Copy link
Author

mr-ubik commented Mar 28, 2019

Updated the dataset_info.json with the correct BibTeX citation.

- Fix a bug due to rstrip
- Add builtin conversion to from stroke-3 and stroke-5
- Update testing
@mr-ubik
Copy link
Author

mr-ubik commented Apr 2, 2019

One thing I have seen during testing is that if we want to be able to reproduce the technique of the original paper we should probably invert the notation for the end of the drawing. In the original paper, the authors pad each sketch with (0,0,0,0,1), if we want to leverage the power of the tf.data.Dataset "pipeline" it could be more useful to use (0,0,0,0,0) as the end stroke, doing this allows using tf.padded_batch and potentially tf.data.experimental.bucket_by_sequence_length

In following the content of the paper, the padding step
aimed at making each label sequences of same length is
added as part of the examples generation.

- Fixed the python2 syntax error
- Format respecting the code-style using yapf --style=chromium
@mr-ubik
Copy link
Author

mr-ubik commented Apr 9, 2019

One thing I have seen during testing is that if we want to be able to reproduce the technique of the original paper we should probably invert the notation for the end of the drawing. In the original paper, the authors pad each sketch with (0,0,0,0,1), if we want to leverage the power of the tf.data.Dataset "pipeline" it could be more useful to use (0,0,0,0,0) as the end stroke, doing this allows using tf.padded_batch and potentially tf.data.experimental.bucket_by_sequence_length

Contrary to what said above I have moved the padding step to the examples generation, the final user will thus just need to use tf.data.Dataset.filter() to filter for the various labels.

@mr-ubik
Copy link
Author

mr-ubik commented Apr 18, 2019

I have modified the preprocess step adding the stroke signaling the start of a sketch as they do in here.

EDIT: 🤔 apparently the py2-tf2 test is failing. Also updated the dataset info since I have increased the training set shards from 20 to 30.

@Conchylicultor Conchylicultor added the dataset request Request for a new dataset to be added label Apr 25, 2019
@mr-ubik
Copy link
Author

mr-ubik commented May 7, 2019

Fixed an error in the padding function. The TF 2 Py2.7 Test will fail due to a known Keras issue.
https://stackoverflow.com/a/55903975/8050556

@us
Copy link
Contributor

us commented Jul 17, 2019

@mr-ubik still continue?

@mr-ubik
Copy link
Author

mr-ubik commented Jul 21, 2019

I had stopped due to the issue with NumPy and Keras I had referenced earlier while I have been working on making sure the data format and pre-processing were accurately reflecting the one done in the paper.

@us
Copy link
Contributor

us commented Jul 21, 2019

Did you open an issue by tagging this pr?

@mr-ubik
Copy link
Author

mr-ubik commented Jul 24, 2019

The issue should be fixed now. I will look at the code next week and start pushing new updates again. ❤️

@us
Copy link
Contributor

us commented Jul 24, 2019

Okay! It'll be awesome :)

@us
Copy link
Contributor

us commented Jul 29, 2019

@mr-ubik hey don't forget !

@mr-ubik
Copy link
Author

mr-ubik commented Jul 31, 2019

@us Just sorting through issues at work, contributions will resume ASAP

@cyfra cyfra added the community:is_reviewing This PR is being reviewed by community member. label Feb 8, 2020
@ageron
Copy link
Contributor

ageron commented Mar 21, 2020

Hi there! This PR looks great, is it still active?

@tfds-bot tfds-bot added author:please_respond Author - please respond to the recent comments. and removed community:is_reviewing This PR is being reviewed by community member. labels Mar 21, 2020
@mr-ubik
Copy link
Author

mr-ubik commented Jul 17, 2020

Hi @ageron! I actually had to stop working on it due to other priorities, but I'd like to resume it. It should probably be updated with the new API of tfds if I am not mistaken.
There's also a discussion to be had on whether to embed the preprocessing done in Sketch-RNN into tfds pipeline or leaving it up to the user.

In an internal fork of tfds we are working on at @zurutech/ml we are going to try and see if we can implement this kind of behavior via the use of several BUILDER_CONFIG; if this work out it could be used for this dataset as well.

@tfds-bot tfds-bot added community:is_reviewing This PR is being reviewed by community member. and removed author:please_respond Author - please respond to the recent comments. labels Jul 17, 2020
@osbm
Copy link

osbm commented Jul 16, 2022

is this PR active

@mr-ubik mr-ubik closed this by deleting the head repository Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Author has signed CLA community:is_reviewing This PR is being reviewed by community member. dataset request Request for a new dataset to be added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants