Added support for Cartoon Set #436
base: master
Conversation
Your checksum file is empty. Please add the --register_checksums parameter to the download_and_prepare script and fill it.
@us Since a Google account is needed to download the dataset, it has to be added manually. So, no checksum.
Why not a "Heavy Implementation" using BuilderConfigs? I see similar features being exposed in both classes.
Where is the fake_examples = 3+30?
Since the directory structure is slightly different, I am unsure if I can use BuilderConfig to handle it. If it can be done, I would require some pointers.
I forgot to commit them. Added them now.
Sorry, I missed it.
For the 100k version the files are divided into subfolders from 0-9, with each subfolder containing 10,000 images.
use
@ChanchalKumarMaji
I figured out a way to do that without an if-else statement.
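The actual commit isn't shown in the thread, but one way to handle both a flat layout and the 0-9 subfolders of the 100k variant without branching on the layout is an os.walk sweep, which flattens both cases. This is only a sketch; the directory and file names below are invented for illustration:

```python
import os
import tempfile

def iter_image_files(root):
  """Yield every .png under root, whether the images sit at the top
  level or inside numbered subfolders (0-9). os.walk visits both
  layouts the same way, so no if/else on the structure is needed."""
  for dirpath, _, filenames in os.walk(root):
    for fname in sorted(filenames):
      if fname.endswith('.png'):
        yield os.path.join(dirpath, fname)

# Tiny demo with a fake 100k-style layout: two subfolders, one image each.
root = tempfile.mkdtemp()
for sub in ('0', '1'):
  os.makedirs(os.path.join(root, sub))
  open(os.path.join(root, sub, 'cs_%s.png' % sub), 'wb').close()

print(len(list(iter_image_files(root))))  # 2
```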
Thanks @sklan.
@rsepassi @cyfra @Conchylicultor, I think this dataset is ready, please take a look. Thanks.
@sklan can you resolve the conflicts? I think this is a forgotten PR. Please check @Conchylicultor @cyfra @pierrot0
Yeah, sure, I'll look into fixing the merge conflicts.
Seems that kokoro is failing with: E ImportError: dlopen: cannot load any more object with static TLS
I'll look into it.
name, dtype = file.split('.')
if dtype == 'png':
  image = tfds.core.lazy_imports.skimage.io.imread(
      path + '/' + name + '.png')
You should use os.path.join(path, name) here, because a hard-coded '/' is not portable.
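The reviewer's point, sketched as a minimal runnable example (the directory and file names here are made up for illustration): os.path.join inserts the platform's separator, so no '/' is hard-coded.

```python
import os

# Hypothetical manual_dir layout, just for demonstration.
path = os.path.join('manual_dir', 'cartoonset10k')
name = 'cs_example'  # hypothetical file stem parsed from file.split('.')

# Portable replacement for: path + '/' + name + '.png'
image_path = os.path.join(path, name + '.png')

print(image_path.endswith('cs_example.png'))  # True
```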
DATASET_CLASS = cartoonset.Cartoonset
BUILDER_CONFIG_NAMES_TO_TEST = ["cartoonset100k"]
SPLITS = {
    "train": 30,  # Fake training examples
30 test images is too many; 3-4 is enough for testing.
Yes - there are far too many test files in this PR.
import tensorflow as tf

import tensorflow_datasets as tfds
import tensorflow.compat.v2 as tf
# There is no predefined train/val/test split for this dataset.
path = dl_manager.manual_dir
if not tf.io.gfile.exists(path):
  msg = 'You must download the dataset files manually and place them in: '
You must also set MANUAL_DOWNLOAD_INSTRUCTIONS field (see other datasets like c4)
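A rough sketch of what setting that field could look like; the class body is abbreviated, and the wording of the instructions and the download URL are assumptions for illustration, not the PR's actual code:

```python
class Cartoonset(tfds.core.GeneratorBasedBuilder):
  """Cartoon Set: collections of 2D cartoon avatar images."""

  # Shown to users when manual_dir is missing or empty.
  MANUAL_DOWNLOAD_INSTRUCTIONS = """\
  Download cartoonset10k.tgz or cartoonset100k.tgz from the Cartoon Set
  site (a Google account is required), extract the archive, and place
  the resulting folder inside manual_dir.
  """
```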
return [
    tfds.core.SplitGenerator(
        name=tfds.Split.TRAIN,
        num_shards=2,
remove num_shards
features_dict = dict()
name, dtype = file.split('.')
if dtype == 'png':
  image = tfds.core.lazy_imports.skimage.io.imread(
You have to read using tf.gfile (to be compatible with non-local filesystems):

with tf.io.gfile.GFile(os.path.join(root, fname), "rb") as png_f:
  mask = tfds.core.lazy_imports.cv2.imdecode(
      np.fromstring(png_f.read(), dtype=np.uint8), flags=0)
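One side note on that suggestion: np.fromstring is deprecated for binary input in modern NumPy; np.frombuffer does the same bytes-to-uint8-array conversion (the zero-copy step that cv2.imdecode expects). A standalone sketch of just that step, with a stand-in byte buffer instead of real PNG data:

```python
import numpy as np

# Stand-in for the raw bytes that png_f.read() would return.
raw = bytes(range(8))

# Reinterpret the bytes as a uint8 array without copying; this array
# is what would be passed on to cv2.imdecode.
arr = np.frombuffer(raw, dtype=np.uint8)

print(arr.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```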
I added support for the Cartoon Set datasets. I also fixed a spelling mistake in download_and_prepare.py.
Gist: https://gist.github.com/sklan/3aa0a3cc9036224b0e93ca01dc6d66f5