oxford_flowers102 bad splits #3022

TeaPearce · 2021-02-06T22:12:03Z

The train/val/test splits in the tfds oxford_flowers102 don't match up with established splits.

Training on the train and val splits, I only achieve around 91% accuracy with fine-tuning. It should reach about 98%, e.g. table 6 here. If one reads in the entire dataset and creates a random split, this is achievable. This has also been noted on Stack Overflow here.
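A minimal sketch of the random re-split mentioned above, assuming the three official splits are pooled (1,020 + 1,020 + 6,149 = 8,189 images) and that a seeded shuffle of example indices is acceptable. The helper name `random_split_indices` and the 80/10/10 proportions are illustrative assumptions, not taken from the paper or from TFDS.

```python
import random

# Official split sizes for oxford_flowers102 as listed in the TFDS catalog.
SPLIT_SIZES = {"train": 1020, "validation": 1020, "test": 6149}


def random_split_indices(num_examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..num_examples-1 and cut them into train/val/test."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_train = int(fractions[0] * num_examples)
    n_val = int(fractions[1] * num_examples)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],  # the remainder becomes the test set
    )


total = sum(SPLIT_SIZES.values())  # size of the pooled dataset
train_idx, val_idx, test_idx = random_split_indices(total)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index lists would then be used to select examples from the pooled dataset, for instance one loaded with `split='train+validation+test'`.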

jatin-code777 · 2021-02-07T06:42:56Z

This is not an issue within TFDS itself but perhaps a bug in the dataset itself.
I redownloaded the original dataset and confirmed that the splits given in the dataset match those in TFDS.

These also match table 6 in the paper.

In my opinion, the fix for this should come from the dataset itself.

vijayphoenix · 2021-02-07T06:57:25Z

A workaround would be to swap the train and test splits when loading, something like this:

>>> import tensorflow_datasets as tfds
>>> test, train, validation = tfds.load('oxford_flowers102', split=['train', 'test', 'validation'])

>>> sum(1 for _ in train)
6149
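The snippet above swaps the splits only through the order of the variable names, which is easy to miss. A more explicit sketch of the same idea follows; the names `SWAPPED_SPLITS` and `load_swapped` are hypothetical, and the function assumes `tensorflow_datasets` is installed.

```python
# Map the role we want (key) to the official TFDS split name (value).
# The large official 'test' split (6,149 images) is used for training here.
SWAPPED_SPLITS = {"train": "test", "validation": "validation", "test": "train"}


def load_swapped(name="oxford_flowers102"):
    """Load the dataset with the official train and test roles exchanged."""
    import tensorflow_datasets as tfds  # imported lazily; requires TFDS

    roles = list(SWAPPED_SPLITS)
    # tfds.load returns one dataset per requested split, in the same order.
    datasets = tfds.load(name, split=[SWAPPED_SPLITS[role] for role in roles])
    return dict(zip(roles, datasets))
```

Calling `load_swapped()["train"]` would then return the 6,149-image split.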

Perhaps we can add a warning to the dataset description.

For an example of such a warning, see:

datasets/tensorflow_datasets/image_classification/imagenet_resized.py

Lines 41 to 46 in d9b91c5

    WARNING: The integer labels used are defined by the authors and do not match
    those from the other ImageNet datasets provided by Tensorflow datasets.
    See the original [label list](https://github.com/PatrykChrabaszcz/Imagenet32_Scripts/blob/master/map_clsloc.txt),
    and the [labels used by this dataset](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image_classification/imagenet_resized_labels.txt).
    Additionally, the original authors 1 index there labels which we convert to
    0 indexed by subtracting one.

Conchylicultor · 2021-02-08T09:14:23Z

TFDS provides datasets as close as possible to how the original dataset authors released them. As pointed out above, the TFDS splits match the splits defined by the Oxford authors, so I'm marking this bug as working as intended.

Note: Our documentation already provides the number of examples: https://www.tensorflow.org/datasets/catalog/oxford_flowers102

Or programmatically:

import tensorflow_datasets as tfds
builder = tfds.builder('oxford_flowers102')
builder.info.splits['test'].num_examples  # split metadata lives on builder.info

Or

test, train, validation = tfds.load('oxford_flowers102', split=['train', 'test', 'validation'])
print(len(train))

TeaPearce · 2021-02-08T10:39:29Z

Thanks all, sounds sensible. I don't know the history of when, how, or why the dataset splits evolved, but I wanted to document it somewhere.

vijayphoenix · 2021-02-08T12:06:26Z

You can find the splitting and slicing documentation here:
https://www.tensorflow.org/datasets/splits
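
As a rough illustration of that slicing syntax, the sketch below builds split strings that pool the small official splits with most of the large `test` split and hold out the rest. The helper names `resplit_spec` and `load_resplit` and the 20% hold-out are assumptions for illustration. Percent slices are positional rather than random, so this assumes the stored example order is not grouped by class, which is worth checking before relying on it.

```python
def resplit_spec(holdout_percent=20):
    """Return TFDS split strings: a pooled training set and a held-out slice."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be strictly between 0 and 100")
    keep = 100 - holdout_percent
    train_spec = f"train+validation+test[:{keep}%]"  # '+' concatenates splits
    holdout_spec = f"test[{keep}%:]"  # remaining tail of the official test split
    return [train_spec, holdout_spec]


def load_resplit(holdout_percent=20):
    """Load the two custom splits; requires tensorflow_datasets."""
    import tensorflow_datasets as tfds  # imported lazily

    return tfds.load("oxford_flowers102", split=resplit_spec(holdout_percent))


print(resplit_spec(20))  # ['train+validation+test[:80%]', 'test[80%:]']
```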

TeaPearce added the bug label Feb 6, 2021

vijayphoenix added the contributions welcome label Feb 7, 2021

jatin-code777 mentioned this issue Feb 7, 2021

Add warning to oxford_flowers102 dataset #3023

Merged

Conchylicultor closed this as completed Feb 8, 2021
