Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Caltech10k Web Faces dataset #324

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

mandroid6
Copy link

@googlebot googlebot added the cla: yes Author has signed CLA label Mar 24, 2019
"landmarks": {name: tf.int64 for name in LANDMARK_HEADINGS}
}),
urls=['http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/'],
citation=""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datasets must be have citation. Please add citation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

"""
class Caltech10K_WebFaces(tfds.core.GeneratorBasedBuilder):

VERSION = tfds.core.Version("0.1.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation should be 2 as in other files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"landmarks": {name: tf.int64 for name in LANDMARK_HEADINGS}
}),
urls=['http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/'],
citation=""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed now

name=tfds.Split.TRAIN,
num_shards=10,
gen_kwargs={
"file_id": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't use file_id why returning that ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove it after updating logic

name=tfds.Split.VALIDATION,
num_shards=4,
gen_kwargs={
"extracted_dirs": extracted_dirs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. It seems that you are passing the same gen_kwargs to your validations and test set. So all train/valid/test sets will contains the same examples no ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Conchylicultor some incorrect replication of logic on my end. This dataset doesn't have any original splits given, so will return a single tfds.Split.TRAIN split.

Fixed now

Copy link
Member

@Conchylicultor Conchylicultor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, unfortunately your tests are not working (ex: you don't have landmarks inside fake data,...). Please fix. Make sure the tests are running correctly.

@Conchylicultor Conchylicultor added the kokoro:run Run Kokoro tests label Mar 25, 2019
@kokoro-team kokoro-team removed the kokoro:run Run Kokoro tests label Mar 25, 2019
@Conchylicultor Conchylicultor added the dataset request Request for a new dataset to be added label Apr 25, 2019
@cyfra cyfra added the tfds:is_reviewing TFDS team: PTAL label Feb 8, 2020
from __future__ import print_function

import os
import tensorflow as tf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import tensorflow.compat.v2 as tf

return [
tfds.core.SplitGenerator(
name=tfds.Split.TRAIN,
num_shards=5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove - num_shards is no longer needed.

for file_name in sorted(files):
path = os.path.join(filedir, file_name)

yield {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the S3 is the default now - so you have to yield a tuple: of (file_name and everything else)


landmarks = self._process_caltech10k_config_file(landmarks_path)

for file_name in sorted(files):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesn't have to be sorted.

row_values = line.strip().split()
# Each row start with the 'file_name' and then space-separated values.
values[row_values[0]] = [v for v in row_values[1:]]
return keys, values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain more what are the "keys" and "values" here..

keys should be the same as "LANDMARK_HEADINGS" right ?

SPLITS = {
"train": 10,
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the test pass ? the CSV file is not present in the fake_examples dir.

@cyfra cyfra added author:please_respond Author - please respond to the recent comments. and removed tfds:is_reviewing TFDS team: PTAL labels Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author:please_respond Author - please respond to the recent comments. cla: yes Author has signed CLA dataset request Request for a new dataset to be added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants