deeplab eval.py fails with assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] #8138

@jazberna1

Hello Tensorflow team,

This is my system information for the issue I have explained below:

System information

  • What is the top-level directory of the model you are using:
    models-master/research/deeplab
    I have the latest commit (6fb5646) of the default branch (master).
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Red Hat Enterprise Linux Server 7.5 (Maipo)
  • TensorFlow installed from (source or binary):
    binary
  • TensorFlow version (use command below):
    1.15.2
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: None
  • GPU model and memory: None
  • Exact command to reproduce:

The exact command that fails:
python eval.py \
  --eval_crop_size='513,513' \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --dataset="cells" \
  --checkpoint_dir=/mnt/lustre/LOGDIR \
  --eval_logdir=/mnt/lustre/LOGDIREVAL \
  --dataset_dir=/mnt/lustre/tfrecord

Describe the problem

I cannot run the eval.py script with the command above. The error I get is:

tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
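
Reading the message: the element-wise condition x < y did not hold, where x holds label values (printed as 3 3 3...) and y is 3, the same number as num_classes in the dataset descriptor shown further down. The following is a minimal plain-Python sketch of that condition only, not the TensorFlow implementation; the function names are hypothetical.

```python
def find_out_of_bound_labels(labels, num_classes):
    """Return the sorted distinct label values that violate label < num_classes."""
    return sorted({value for value in labels if value >= num_classes})


def check_labels(labels, num_classes):
    """Raise ValueError if any label is >= num_classes."""
    bad = find_out_of_bound_labels(labels, num_classes)
    if bad:
        raise ValueError(
            "labels out of bound: values %s are not < num_classes=%d"
            % (bad, num_classes))


# Labels 0, 1 and 2 pass for num_classes=3.
check_labels([0, 1, 2, 2, 1], num_classes=3)

# A label of 3 does not, which is the situation the error reports.
print(find_out_of_bound_labels([3, 3, 3, 0, 1], num_classes=3))  # prints [3]
```

Under this reading, the evaluation input contains at least one label equal to 3, while only values below 3 are accepted.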

Prior to that I have:
1. Created the files in the tfrecord folder using build_voc2012_data.py, for the --dataset_dir argument of both train.py and eval.py.
My original images are 500x333 PNG files. The corresponding masks are 500x333 indexed PNG files with three indices (0, 1, 2), where 0 is the background. For testing purposes I have two images, one for training and one for validation. I have uploaded an example. In the datasets/data_generator.py script I have therefore added:
_CELLS_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1,
        'trainval': 2,
        'val': 1,
    },
    num_classes=3,
    ignore_label=0
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'cells': _CELLS_INFORMATION,
}
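
One way to cross-check the masks against this descriptor is to list the label values a mask actually contains and compare them with num_classes and ignore_label. The sketch below is only an illustration under assumptions: the helper names are hypothetical, the example pixel values are made up, and read_indexed_png assumes Pillow is installed and that the mask is an 8-bit indexed or grayscale PNG.

```python
from collections import Counter


def label_counts(pixels):
    """Count how often each label value occurs in a flat sequence of pixels."""
    return dict(sorted(Counter(pixels).items()))


def labels_outside_descriptor(pixels, num_classes, ignore_label):
    """Return label values that are neither < num_classes nor the ignore label."""
    return sorted(
        value for value in set(pixels)
        if value >= num_classes and value != ignore_label)


def read_indexed_png(path):
    """Read per-pixel values from an indexed PNG (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so the helpers above need no extras
    with Image.open(path) as image:
        if image.mode not in ("P", "L"):
            raise ValueError(
                "expected an indexed or grayscale PNG, got mode %r" % image.mode)
        return list(image.tobytes())  # one byte per pixel for modes P and L


# Made-up pixel values standing in for a real mask.
example_pixels = [0, 0, 1, 2, 2, 3]
print(label_counts(example_pixels))  # prints {0: 2, 1: 1, 2: 2, 3: 1}
print(labels_outside_descriptor(example_pixels, num_classes=3, ignore_label=0))  # prints [3]
```

For a real mask, the pixels would come from read_indexed_png called with the mask's path. This inspects the PNG as Pillow reads it; the values that end up in the TFRecord depend on how build_voc2012_data.py decodes the file, which this sketch does not cover.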

2. Successfully ran the train.py script like this:
python train.py \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=False \
  --logtostderr \
  --dataset="cells" \
  --training_number_of_steps=1 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="513,513" \
  --train_batch_size=1 \
  --tf_initial_checkpoint=/mnt/lustre/xception/model.ckpt \
  --train_logdir=/mnt/lustre/LOGDIR \
  --dataset_dir=/mnt/lustre/tfrecord

3. Ran the eval.py script like this, which produces the error:
python eval.py \
  --eval_crop_size='513,513' \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --dataset="cells" \
  --checkpoint_dir=/mnt/lustre/LOGDIR \
  --eval_logdir=/mnt/lustre/LOGDIREVAL \
  --dataset_dir=/mnt/lustre/tfrecord

The error again is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Many Thanks
Jorge
[Attached image: Screenshot 2020-02-14 at 10 25 42]
