Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for Cartoon Set #436

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

sklan
Copy link
Contributor

@sklan sklan commented Apr 6, 2019

I added support for the Cartoon Set datasets.
I also fixed a spelling mistake in download_and_prepare.py.
Gist: https://gist.github.com/sklan/3aa0a3cc9036224b0e93ca01dc6d66f5

@googlebot googlebot added the cla: yes Author has signed CLA label Apr 6, 2019
Copy link
Contributor

@us us left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your checksum file is empty. Please add --register_checksums parameter to download_and_prepare script and create fill it. Check this [link](Your checksum file is empty. Please add --register_checksums parameter to download_and_prepare script.).

@sklan
Copy link
Contributor Author

sklan commented Apr 6, 2019

@us Since, a google account is needed to download the dataset. The dataset needs to be added manually. So, no checksum.

Copy link
Contributor

@ChanchalKumarMaji ChanchalKumarMaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not "Heavy Implementation" using BuilderConfigs ? I see similar features being exposed in both the classes.

Copy link
Contributor

@ChanchalKumarMaji ChanchalKumarMaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the fake_examples = 3+30 ?

@sklan
Copy link
Contributor Author

sklan commented Apr 6, 2019

@ChanchalKumarMaji

Why not "Heavy Implementation" using BuilderConfigs ? I see similar features being exposed in both the classes.

Since, the directory structure is slightly different I am unsure if I can use BuilderConfig to handle it. If it can be done I would require some pointers.

Where is the fake_examples = 3+30 ?

I forgot to commit them. Added them now.

@us
Copy link
Contributor

us commented Apr 6, 2019

@us Since, a google account is needed to download the dataset. The dataset needs to be added manually. So, no checksum.

sorry i missed it.

Copy link
Contributor

@ChanchalKumarMaji ChanchalKumarMaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sklan
I can see the codes in both the classes are same. So, just merge them into one and create a BuilderConfig class. You can see this, this, this, etc..

Just describe your directory structure here if it is possible, I mean how they differ or which folders each contains ?

@sklan
Copy link
Contributor Author

sklan commented Apr 6, 2019

@ChanchalKumarMaji
screenshot

For the 100k version the files are divided into subfolders from 0-9 with each subfolder containing 10,000 images.
So I had to add an extra for loop to _generate_examples in the CartoonSet100k version.
Now, that I look at it again I realise it might actually be trivial. I'll switch to BuilderConfig.

@ChanchalKumarMaji
Copy link
Contributor

@ChanchalKumarMaji
screenshot

For the 100k version the files are divided into subfolders from 0-9 with each subfolder containing 10,000 images.
So I had to add an extra for loop to _generate_examples in the CartoonSet100k version.
Now, that I look at it again I realise it might actually be trivial. I'll switch to BuilderConfig.

use self.builder_config.name to distinguish by using a simple if-else statements.

@sklan
Copy link
Contributor Author

sklan commented Apr 7, 2019

@ChanchalKumarMaji
I have updated to BuilderConfig.

use self.builder_config.name to distinguish by using a simple if-else statements.

I figured out a way to do that without an if-else statement.
I am however unsure about the tests. Could you have a look?

tensorflow_datasets/image/cartoonset.py Outdated Show resolved Hide resolved
tensorflow_datasets/image/cartoonset.py Outdated Show resolved Hide resolved
tensorflow_datasets/image/cartoonset_test.py Show resolved Hide resolved
Copy link
Contributor

@ChanchalKumarMaji ChanchalKumarMaji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sklan .

@ChanchalKumarMaji
Copy link
Contributor

@rsepassi @cyfra @Conchylicultor , I think this dataset is ready, please see. Thanks.

@Conchylicultor Conchylicultor added the dataset request Request for a new dataset to be added label Apr 25, 2019
@us
Copy link
Contributor

us commented Jul 3, 2019

@sklan can you solve the conflicts,

I think it's forgotten pr. Check please @Conchylicultor @cyfra @pierrot0

@sklan
Copy link
Contributor Author

sklan commented Jul 3, 2019

@sklan can you solve the conflicts,

I think it's forgotten pr. Check please @Conchylicultor @cyfra @pierrot0

Yea sure I'll look into fixing the merge conflicts

@cyfra
Copy link
Contributor

cyfra commented Jul 29, 2019

Seems that kokoro is failing wiith:

E ImportError: dlopen: cannot load any more object with static TLS
E It seems that scikit-image has not been built correctly.
E
E Your install of scikit-image appears to be broken.
E Try re-installing the package following the instructions at:
E https://scikit-image.org/docs/stable/install.html
E Tried importing %s but failed. See setup.py extras_require. The dataset you are trying to use may have additional dependencies.

@sklan
Copy link
Contributor Author

sklan commented Jul 30, 2019

Seems that kokoro is failing wiith:

E ImportError: dlopen: cannot load any more object with static TLS
E It seems that scikit-image has not been built correctly.
E
E Your install of scikit-image appears to be broken.
E Try re-installing the package following the instructions at:
E https://scikit-image.org/docs/stable/install.html
E Tried importing %s but failed. See setup.py extras_require. The dataset you are trying to use may have additional dependencies.

Seems that kokoro is failing wiith:

E ImportError: dlopen: cannot load any more object with static TLS
E It seems that scikit-image has not been built correctly.
E
E Your install of scikit-image appears to be broken.
E Try re-installing the package following the instructions at:
E https://scikit-image.org/docs/stable/install.html
E Tried importing %s but failed. See setup.py extras_require. The dataset you are trying to use may have additional dependencies.

I'll look into it

name, dtype = file.split('.')
if dtype == 'png':
image = tfds.core.lazy_imports.skimage.io.imread(
path + '/' + name + '.png')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should need to os.path.join(path, name), because / is not dynamic type.

DATASET_CLASS = cartoonset.Cartoonset
BUILDER_CONFIG_NAMES_TO_TEST = ["cartoonset100k"]
SPLITS = {
"train": 30, # Fake training examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30 test images is too much 3-4 is enough for testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - there are far too many test files in this PR.

@cyfra cyfra added the tfds:please_review TFDS team: please review this PR. label Feb 8, 2020
DATASET_CLASS = cartoonset.Cartoonset
BUILDER_CONFIG_NAMES_TO_TEST = ["cartoonset100k"]
SPLITS = {
"train": 30, # Fake training examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - there are far too many test files in this PR.


import tensorflow as tf

import tensorflow_datasets as tfds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import tensorflow.compat.v2 as tf

# There is no predefined train/val/test split for this dataset.
path = dl_manager.manual_dir
if not tf.io.gfile.exists(path):
msg = 'You must download the dataset files manually and place them in: '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must also set MANUAL_DOWNLOAD_INSTRUCTIONS field (see other datasets like c4)

return [
tfds.core.SplitGenerator(
name=tfds.Split.TRAIN,
num_shards=2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove num_Shards

features_dict = dict()
name, dtype = file.split('.')
if dtype == 'png':
image = tfds.core.lazy_imports.skimage.io.imread(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to read using tf.gfile (to be compatible with non-local filesystems):

      with tf.io.gfile.GFile(os.path.join(root, fname), "rb") as png_f:
        mask = tfds.core.lazy_imports.cv2.imdecode(
            np.fromstring(png_f.read(), dtype=np.uint8), flags=0)

@cyfra cyfra removed the tfds:please_review TFDS team: please review this PR. label Feb 8, 2020
@cyfra cyfra added the author:please_respond Author - please respond to the recent comments. label Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author:please_respond Author - please respond to the recent comments. cla: yes Author has signed CLA dataset request Request for a new dataset to be added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants