Description
Hello TensorFlow team,
This is the system information for the issue I explain below:
System information
- What is the top-level directory of the model you are using: models-master/research/deeplab. I have the latest commit (6fb5646) of the default branch (master).
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Red Hat Enterprise Linux Server 7.5 (Maipo)
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 1.15.2
- Bazel version (if compiling from source): No
- CUDA/cuDNN version: No
- GPU model and memory: No
- Exact command to reproduce:
The exact command that fails:
python eval.py \
  --eval_crop_size='513,513' \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --dataset="cells" \
  --checkpoint_dir=/mnt/lustre/LOGDIR \
  --eval_logdir=/mnt/lustre/LOGDIREVAL \
  --dataset_dir=/mnt/lustre/tfrecord
Describe the problem
Running the eval.py script as shown above fails. The error I get is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
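As far as the message itself shows, the failed condition is x < y, where x holds the label values ([3 3 3...]) and y is 3, which matches the num_classes I configured. The following is a minimal plain-Python sketch of that condition as I read it, not TensorFlow's implementation; the function name find_out_of_bound_labels is hypothetical.

```python
def find_out_of_bound_labels(labels, num_classes):
    """Return the sorted distinct label values outside the range [0, num_classes)."""
    return sorted({v for v in labels if v < 0 or v >= num_classes})


# Values copied from the error message: the labels contain 3 while y is 3.
print(find_out_of_bound_labels([3, 3, 3], num_classes=3))  # prints [3]

# Labels 0, 1 and 2 would satisfy the condition for num_classes=3.
print(find_out_of_bound_labels([0, 1, 2], num_classes=3))  # prints []
```

Under this reading, a label value of 3 is rejected because valid values for three classes are 0, 1 and 2.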
Prior to that I have:
1. Created the files in the tfrecord folder using build_voc2012_data.py. This folder is passed as the --dataset_dir argument of both train.py and eval.py.
My original images are 500x333 PNG files. The corresponding masks are 500x333 indexed PNG files. There are three indices, 0, 1 and 2, where 0 is the background. For testing purposes I have two images, one for training and one for validation. I have uploaded an example. Accordingly, in the datasets/data_generator.py script I have added:
_CELLS_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1,
        'trainval': 2,
        'val': 1,
    },
    num_classes=3,
    ignore_label=0
)
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'cells': _CELLS_INFORMATION,
}
2. Successfully ran the train.py script like this:
python train.py \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=False \
  --logtostderr \
  --dataset="cells" \
  --training_number_of_steps=1 \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="513,513" \
  --train_batch_size=1 \
  --tf_initial_checkpoint=/mnt/lustre/xception/model.ckpt \
  --train_logdir=/mnt/lustre/LOGDIR \
  --dataset_dir=/mnt/lustre/tfrecord
3. Ran the eval.py script like this, which produces the error:
python eval.py \
  --eval_crop_size='513,513' \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --dataset="cells" \
  --checkpoint_dir=/mnt/lustre/LOGDIR \
  --eval_logdir=/mnt/lustre/LOGDIREVAL \
  --dataset_dir=/mnt/lustre/tfrecord
The error again is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [3 3 3...] [y (mean_iou/Cast_1:0) = ] [3] [[node mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert (defined at /jorgeenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
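Since the error reports a label value of 3 while my masks are supposed to contain only the indices 0, 1 and 2, one way to double-check the mask files is to count the pixel values they actually hold. The snippet below is only a sketch under assumptions: summarize_mask_labels is a hypothetical helper, and the pixel list is made-up toy data rather than values read from my masks.

```python
from collections import Counter


def summarize_mask_labels(pixels, num_classes):
    """Count each label value in a flattened mask and list values outside [0, num_classes)."""
    counts = Counter(pixels)
    out_of_bound = sorted(v for v in counts if not 0 <= v < num_classes)
    return counts, out_of_bound


# Toy data (an assumption, not taken from the real masks).
# With Pillow installed, the palette indices of an indexed (mode "P") PNG
# could be obtained with list(Image.open(path).getdata()).
toy_pixels = [0, 0, 1, 2, 3, 3]

counts, out_of_bound = summarize_mask_labels(toy_pixels, num_classes=3)
print(sorted(counts.items()))  # prints [(0, 2), (1, 1), (2, 1), (3, 2)]
print(out_of_bound)            # prints [3]
```

If a real mask produced a non-empty second list, the stored label values would not match the num_classes=3 set in _CELLS_INFORMATION.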
