Added Caltech10k Web Faces dataset #324
base: master
Conversation
```python
        "landmarks": {name: tf.int64 for name in LANDMARK_HEADINGS}
    }),
    urls=['http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/'],
    citation=""
```
Datasets must have a citation. Please add one.
Added
```python
"""

class Caltech10K_WebFaces(tfds.core.GeneratorBasedBuilder):

  VERSION = tfds.core.Version("0.1.0")
```
Indentation should be 2 spaces, as in the other files.
Done
```python
        "landmarks": {name: tf.int64 for name in LANDMARK_HEADINGS}
    }),
    urls=['http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/'],
    citation=""
```
I think you forgot that
Yes, fixed now
```python
    name=tfds.Split.TRAIN,
    num_shards=10,
    gen_kwargs={
        "file_id": 0,
```
You didn't use `file_id`, so why pass it here?
Forgot to remove it after updating the logic.
```python
    name=tfds.Split.VALIDATION,
    num_shards=4,
    gen_kwargs={
        "extracted_dirs": extracted_dirs,
```
I don't understand. It seems that you are passing the same `gen_kwargs` to your validation and test sets, so all train/valid/test splits will contain the same examples, no?
@Conchylicultor: some incorrect replication of logic on my end. This dataset doesn't come with any original splits, so it will return a single `tfds.Split.TRAIN` split.
Fixed now.
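For reference, a minimal plain-Python sketch of that fix (a stand-in for the tfds API, with assumed names, not the PR's actual code): when a dataset ships no official splits, `_split_generators` should emit just one TRAIN entry.

```python
def split_generators(extracted_dirs):
    """Stand-in for _split_generators (names assumed for illustration).

    Each entry mimics a tfds.core.SplitGenerator as a (name, gen_kwargs)
    pair; with no official splits, only a single TRAIN split is returned.
    """
    return [("train", {"extracted_dirs": extracted_dirs})]
```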
Thanks. Unfortunately your tests are not working (e.g. you don't have landmarks inside the fake data, ...). Please fix, and make sure the tests run correctly.
```python
from __future__ import print_function

import os
import tensorflow as tf
```
`import tensorflow.compat.v2 as tf`
```python
return [
    tfds.core.SplitGenerator(
        name=tfds.Split.TRAIN,
        num_shards=5,
```
Remove this: `num_shards` is no longer needed.
```python
for file_name in sorted(files):
  path = os.path.join(filedir, file_name)

  yield {
```
S3 is the default now, so you have to yield a tuple of (file_name, everything else).
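A minimal sketch of what that looks like, assuming `files` is a list of image file names and `landmarks` maps file name to coordinates (both names are hypothetical, not from the PR):

```python
def generate_examples(files, landmarks):
    # With S3, the builder must yield (key, example) pairs; the file name
    # serves as a unique key for each example.
    for file_name in files:
        yield file_name, {
            "image": file_name,
            "landmarks": landmarks.get(file_name, []),
        }
```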
```python
landmarks = self._process_caltech10k_config_file(landmarks_path)

for file_name in sorted(files):
```
nit: doesn't have to be sorted.
```python
row_values = line.strip().split()
# Each row starts with the 'file_name' and then space-separated values.
values[row_values[0]] = [v for v in row_values[1:]]
return keys, values
```
Please explain more what the "keys" and "values" are here.
The keys should be the same as `LANDMARK_HEADINGS`, right?
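For illustration, a hedged sketch of the parsing being discussed, assuming each row of the ground-truth file looks like `file_name x1 y1 x2 y2 ...` (the heading names and float parsing are assumptions, not taken from the PR):

```python
LANDMARK_HEADINGS = ("lefteye_x", "lefteye_y", "righteye_x", "righteye_y",
                     "nose_x", "nose_y", "mouth_x", "mouth_y")

def parse_landmarks(lines):
    values = {}
    for line in lines:
        row_values = line.strip().split()
        # Each row starts with the file name, then one value per heading.
        values[row_values[0]] = [float(v) for v in row_values[1:]]
    # The keys are the landmark headings; the values map each file name
    # to its list of coordinates, in heading order.
    return LANDMARK_HEADINGS, values
```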
```python
SPLITS = {
    "train": 10,
}
```
Does the test pass? The CSV file is not present in the fake_examples dir.
Dataset Request #318
@rsepassi @cyfra @Conchylicultor