Merged
22 changes: 14 additions & 8 deletions research/deeplab/datasets/build_cityscapes_data.py
@@ -113,17 +113,23 @@ def _get_files(data, dataset_split):
 
   Args:
     data: String, desired data ('image' or 'label').
-    dataset_split: String, dataset split ('train', 'val', 'test')
+    dataset_split: String, dataset split ('train_fine', 'val_fine', 'test_fine')
 
   Returns:
     A list of sorted file names or None when getting label for
       test set.
   """
-  if data == 'label' and dataset_split == 'test':

Contributor

We still need this, since test_fine does not have labels.

Contributor Author

"We still need this, since test_fine does not have labels."

The files themselves are there, but they don't contain useful information:

> find gtFine/train/ -type f -name '*.json' | wc -l
2975
> find gtFine/test/ -type f -name '*.json' | wc -l
1525

Example of a list of labels from the train set:

> grep label gtFine/train/zurich/zurich_000121_000019_gtFine_polygons.json | sort -u
            "label": "bicycle", 
            "label": "building", 
            "label": "car", 
            "label": "ego vehicle", 
            "label": "license plate", 
            "label": "motorcycle", 
            "label": "out of roi", 
            "label": "person", 
            "label": "pole", 
            "label": "rider", 
            "label": "road", 
            "label": "sidewalk", 
            "label": "sky", 
            "label": "static", 
            "label": "terrain", 
            "label": "traffic light", 
            "label": "traffic sign", 
            "label": "vegetation", 
            "label": "wall", 

Example of a list of labels from the test set:

> grep label gtFine/test/munich/munich_000152_000019_gtFine_polygons.json | sort -u
            "label": "ego vehicle", 
            "label": "out of roi", 

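The same comparison can be scripted instead of grepped. This is a minimal sketch, assuming each `*_gtFine_polygons.json` file stores its annotations as a list of dicts under an `objects` key, each with a `label` field. The grep output above is consistent with that layout, but it is an assumption here. `unique_labels` is a hypothetical helper, not part of build_cityscapes_data.py, and the sample is a hand-written stand-in rather than real dataset content.

```python
import json


def unique_labels(polygons_json_text):
    """Returns the sorted, de-duplicated object labels of one annotation file."""
    annotation = json.loads(polygons_json_text)
    # Assumption: objects live in a list of dicts under the 'objects' key.
    return sorted({obj['label'] for obj in annotation.get('objects', [])})


# Hand-written stand-in for a test-set annotation (not real dataset content).
sample = json.dumps({
    'imgHeight': 1024,
    'imgWidth': 2048,
    'objects': [
        {'label': 'out of roi', 'polygon': []},
        {'label': 'ego vehicle', 'polygon': []},
        {'label': 'out of roi', 'polygon': []},
    ],
})

print(unique_labels(sample))
```

For real files, read the text with `open(path).read()` and pass it to `unique_labels`.
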
Basically, they are all the same in the test set (as far as I understand):

> openssl sha1 gtFine/test/munich/munich_00000*_000019_gtFine_polygons.json
SHA1(gtFine/test/munich/munich_000000_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000001_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000002_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000003_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000004_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000005_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000006_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000007_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000008_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
SHA1(gtFine/test/munich/munich_000009_000019_gtFine_polygons.json)= 349dd446df6a4066cd47bc898b5d72360dad4946
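
To check this across the whole test set rather than ten files, the hashing can be done in Python. This is a rough sketch: `group_by_sha1` is a hypothetical helper, and the `gtFine/test` path is assumed to be relative to wherever the archive was extracted. If all test annotations really are identical, the result should contain a single digest.

```python
import collections
import glob
import hashlib
import os


def group_by_sha1(paths):
    """Maps each SHA-1 hex digest to the sorted list of files with that content."""
    groups = collections.defaultdict(list)
    for path in sorted(paths):
        with open(path, 'rb') as f:
            groups[hashlib.sha1(f.read()).hexdigest()].append(path)
    return dict(groups)


if __name__ == '__main__':
    # Assumed location of the extracted annotations; adjust as needed.
    pattern = os.path.join('gtFine', 'test', '*', '*_gtFine_polygons.json')
    for digest, files in group_by_sha1(glob.glob(pattern)).items():
        print(digest, len(files))
```
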

If you add this line to build_cityscapes_data.py:

print('split: {}, images: {}, labels: {}'.format(dataset_split, num_images, num_labels))

Output will be:

split: train_fine, images: 2975, labels: 2975
split: val_fine, images: 500, labels: 500
split: test_fine, images: 1525, labels: 1525

Contributor

Thanks for double-checking it, ruslo!

-    return None
+  if dataset_split == 'train_fine':
+    split_dir = 'train'
+  elif dataset_split == 'val_fine':
+    split_dir = 'val'
+  elif dataset_split == 'test_fine':
+    split_dir = 'test'
+  else:
+    raise RuntimeError("Split {} is not supported".format(dataset_split))
   pattern = '*%s.%s' % (_POSTFIX_MAP[data], _DATA_FORMAT_MAP[data])
   search_files = os.path.join(
-      FLAGS.cityscapes_root, _FOLDERS_MAP[data], dataset_split, '*', pattern)
+      FLAGS.cityscapes_root, _FOLDERS_MAP[data], split_dir, '*', pattern)
   filenames = glob.glob(search_files)
   return sorted(filenames)
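
The if/elif chain in this hunk maps a split name such as `train_fine` to the on-disk directory `train`. For illustration only, the same mapping can be written as a dictionary lookup. This is an alternative sketch, not what the pull request does: `_SPLIT_DIR_MAP` and `split_to_dir` are hypothetical names chosen here.

```python
# Hypothetical table; the pull request spells this out as an if/elif chain.
_SPLIT_DIR_MAP = {
    'train_fine': 'train',
    'val_fine': 'val',
    'test_fine': 'test',
}


def split_to_dir(dataset_split):
    """Returns the dataset sub-directory name for a '<split>_fine' split name."""
    try:
        return _SPLIT_DIR_MAP[dataset_split]
    except KeyError:
        # Same error type and message as the branch in the diff.
        raise RuntimeError('Split {} is not supported'.format(dataset_split))


print(split_to_dir('val_fine'))
```

An unknown split still raises `RuntimeError`, so callers would see the same failure as with the if/elif version.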

@@ -132,7 +138,7 @@ def _convert_dataset(dataset_split):
   """Converts the specified dataset split to TFRecord format.
 
   Args:
-    dataset_split: The dataset split (e.g., train, val).
+    dataset_split: The dataset split (e.g., train_fine, val_fine).
 
   Raises:
     RuntimeError: If loaded image and label have different shape, or if the
@@ -152,7 +158,7 @@ def _convert_dataset(dataset_split):
   label_reader = build_data.ImageReader('png', channels=1)
 
   for shard_id in range(_NUM_SHARDS):
-    shard_filename = '%s_fine-%05d-of-%05d.tfrecord' % (
+    shard_filename = '%s-%05d-of-%05d.tfrecord' % (
         dataset_split, shard_id, _NUM_SHARDS)
     output_filename = os.path.join(FLAGS.output_dir, shard_filename)
     with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
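
The format-string change in this hunk appears intended to keep the output file names stable: the `_fine` suffix moves from the format string into the split name itself. A small sketch of both variants follows. The shard count of 10 is an assumption for illustration, since the value of `_NUM_SHARDS` is not shown in this diff, and both function names are hypothetical.

```python
def old_shard_filename(dataset_split, shard_id, num_shards):
    """Pre-change naming: '_fine' is appended by the format string."""
    return '%s_fine-%05d-of-%05d.tfrecord' % (dataset_split, shard_id, num_shards)


def new_shard_filename(dataset_split, shard_id, num_shards):
    """Post-change naming: the split name already carries '_fine'."""
    return '%s-%05d-of-%05d.tfrecord' % (dataset_split, shard_id, num_shards)


# Assumed shard count of 10, for illustration only.
print(old_shard_filename('train', 0, 10))
print(new_shard_filename('train_fine', 0, 10))
```

Both calls produce the same name, `train_fine-00000-of-00010.tfrecord`, so anything reading the existing TFRecord file pattern should not need to change.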
@@ -183,8 +189,8 @@ def _convert_dataset(dataset_split):
 
 
 def main(unused_argv):
-  # Only support converting 'train' and 'val' sets for now.
-  for dataset_split in ['train', 'val']:
+  # Only support converting 'train_fine', 'val_fine' and 'test_fine' sets for now.
+  for dataset_split in ['train_fine', 'val_fine', 'test_fine']:

Contributor

Let's change it back to

for dataset_split in ['train_fine', 'val_fine']:

It seems that we need to make more changes in order to support test_fine (e.g., skip reading groundtruth).
Let's change it back, since this should be orthogonal to this pull request.
Sorry about going back and forth.

Contributor Author

"Let's change it back"

I can change it back, but as I said, vis.py works fine. That's the only place I can see where test_fine can be used.

Contributor

Oh, yes, you are right. I totally forgot that Cityscapes has default groundtruth (all ignore labels) for the test set.
Thanks for the clarification, ruslo!

Contributor

You are right. Thanks for double-checking it, ruslo!

     _convert_dataset(dataset_split)
 
 