[deeplab] Training deeplab model with ADE20K dataset #3730

Open
walkerlala opened this Issue Mar 24, 2018 · 78 comments

@walkerlala
Contributor

walkerlala commented Mar 24, 2018

System information

  • What is the top-level directory of the model you are using: deeplab
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.6.0
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version: 9.0/7.0.4
  • GPU model and memory: 1080Ti * 2 , 10Gb * 2
  • Exact command to reproduce:

Describe the problem

This is a feature request. I am trying to train the deeplab model with the ADE20K dataset (see this presentation). I have finished the data format conversion and "successfully" trained the model on a small subset of ADE20K. Below is the modification to research/deeplab/datasets/segmentation_dataset.py, which is used to extract segmentation data.

diff --git a/research/deeplab/datasets/segmentation_dataset.py b/research/deeplab/datasets/segmentation_dataset.py
index a777252..8648fb2 100644
--- a/research/deeplab/datasets/segmentation_dataset.py
+++ b/research/deeplab/datasets/segmentation_dataset.py
@@ -85,10 +85,22 @@ _PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
     ignore_label=255,
 )
 
+_ADE20K_INFORMATION = DatasetDescriptor(
+    splits_to_sizes = {
+        'train': 40,
+        'val': 5,
+    },
+    # TODO temporarily change it to 21 otherwise dimension mismatch
+    num_classes=21,
+    ignore_label=255,
+)
+
 
 _DATASETS_INFORMATION = {
     'cityscapes': _CITYSCAPES_INFORMATION,
     'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
+    'ade20k': _ADE20K_INFORMATION,
 }
 
 # Default file pattern of TFRecord of TensorFlow Example.

The problem is that the ADE20K dataset has 150 classes, which differs from the VOC and Cityscapes datasets. That causes a problem with the checkpoint file: currently, pretrained models are provided only for the VOC and Cityscapes datasets. So we have two choices here:

  1. Do not use the checkpoint file. In this case, there is an error:
absl.flags._exceptions.IllegalFlagValueError: flag --tf_initial_checkpoint=None: Flag --tf_initial_checkpoint must be specified.
  2. Set num_classes=21 to use the two provided checkpoint files.

Are there any alternatives to these?

If anyone has a workable solution for the ADE20K dataset, it would be really appreciated.

@aquariusjay

Contributor

aquariusjay commented Mar 24, 2018

  1. You could modify the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME` and also set the flag initialize_last_layer = False. (Note that you still want to restore the variables in ASPP, decoder and so on.) By doing so, only the weights in the last classification layer are not initialized from the checkpoint (so you can then use a classification layer with 150 classes).

  2. You need to explore min_resize_value and max_resize_value (set resize_factor = output_stride) for ADE20K, which contains images at widely varying scales (e.g., dimensions range from 50 to 2000). By setting min_resize_value and max_resize_value, you can resize the images on the fly to a similar range (or you could do that yourself while pre-processing the dataset). Note, however, that these hyper-parameters may affect performance, and we have not yet explored that carefully.
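
One way to picture the on-the-fly resizing described above is the sketch below. It is not the DeepLab implementation: the function name `target_size` and its rounding are assumptions made for illustration, and it deliberately ignores resize_factor.

```python
def target_size(height, width, min_resize_value, max_resize_value):
    """Return (new_height, new_width), preserving the aspect ratio.

    Illustrative only: scale the shorter side to min_resize_value, unless
    that would push the longer side past max_resize_value, in which case
    scale the longer side to max_resize_value instead.
    """
    shorter, longer = min(height, width), max(height, width)
    scale = min_resize_value / shorter
    if longer * scale > max_resize_value:
        scale = max_resize_value / longer
    return round(height * scale), round(width * scale)


# A tiny image is scaled up, a huge one is scaled down.
print(target_size(50, 100, 256, 512))    # (256, 512)
print(target_size(2000, 500, 256, 512))  # (512, 128)
```

With bounds like these, images whose sides range from 50 to 2000 pixels end up in a comparable size range before cropping.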

@walkerlala

Contributor Author

walkerlala commented Mar 25, 2018

@aquariusjay Thanks for the hints. I have now started training, using the provided VOC model checkpoint, setting fine_tune_batch_norm to False, using the mobilenet_v2 variant and a batch size of 8. Hopefully the loss will drop after several hours...

There are still two things confusing me:

  1. The segmentation annotation images in the ADE20K dataset have three channels, but I am reading them with label_reader = build_data.ImageReader('png', channels=1), as is done for the VOC dataset (in datasets/build_voc2012_data.py). Will that be a problem?

  2. Why do we have the resize_factor parameter?

@walkerlala

Contributor Author

walkerlala commented Mar 25, 2018

Oh, will it be OK to prepare a pull request for the ADE20K dataset?

@aquariusjay

Contributor

aquariusjay commented Mar 26, 2018

Regarding your previous questions:

  1. The groundtruth images should contain only 1 channel with values = semantic labels.
  2. You could check the code for details.
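
A quick sanity check on a converted label image can catch values that are neither a class id nor the ignore label. The sketch below is only an illustration: the helper name `invalid_label_values` is hypothetical, and it works on a plain 2-D list of pixel values rather than on any DeepLab data structure.

```python
def invalid_label_values(label_rows, num_classes, ignore_label=255):
    """Return the sorted pixel values that are neither a class id nor ignore_label.

    label_rows is a 2-D list (rows of integer pixel values), i.e. a
    single-channel label image.
    """
    allowed = set(range(num_classes)) | {ignore_label}
    seen = {value for row in label_rows for value in row}
    return sorted(seen - allowed)


# Toy 2x3 label image for a 3-class dataset; the value 9 is out of range.
labels = [
    [0, 1, 1],
    [2, 255, 9],
]
print(invalid_label_values(labels, num_classes=3))  # [9]
```

An empty result means every pixel is either a valid semantic label or the ignore label.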

We currently do not have any plan to prepare that.
However, note that one should be able to do that by using the provided code/model/script.
Also, any contributions of extra datasets to the codebase are welcome.

Cheers,

@brett-whitford

brett-whitford commented Mar 30, 2018

@aquariusjay,

I'm currently having similar issues attempting to train with a custom dataset and was hoping you could offer some insight.

You could modify the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME' and also set the flag initialize_last_layer = False.

The link you included "here" appears to need a Google SSO to login. I am assuming that was a link to the train_util.py script. Here are the changes I have currently made to implement your architecture on my custom dataset:

  1. segmentation_dataset.py
  • I added the information for my "toy_dataset"
_TOY_DATASET_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 800,
        'trainval': 1000,
        'val': 200,
    },
    num_classes=10,
    ignore_label=255,
)

_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'toy_dataset': _TOY_DATASET_INFORMATION,
}
  2. train.py
  • I do not initialize the final layer of the network.
  • I point training to the directory containing my custom "toy_dataset"
flags.DEFINE_boolean('initialize_last_layer', False,
                     'Initialize the last layer.')

flags.DEFINE_string('dataset', 'toy_dataset',
                    'Name of the segmentation dataset.')
  3. train_utils.py
  • I modify the code here so that the exclude_list only includes the `_LOGITS_SCOPE_NAME', as you stated above.
  exclude_list = ['_LOGITS_SCOPE_NAME']
  if not initialize_last_layer:
    exclude_list.extend(last_layers)
  4. eval.py
  • I point evaluation to my custom "toy_dataset".
flags.DEFINE_string('dataset', 'toy_dataset',
                    'Name of the segmentation dataset.')

However, when I run this, my code appears to train successfully but then runs into an issue with the confusion matrix during evaluation (I include the traceback below for reference). Any tips/suggestions on how to fix this?

Thanks for your help!
Brett

Error Traceback:

~/brett/wss-python/models/research/deeplab$ sh local_test_custom.sh 
Converting toy dataset...
>> Converting image 50/200 shard 0
>> Converting image 100/200 shard 1
>> Converting image 150/200 shard 2
>> Converting image 200/200 shard 3
>> Converting image 250/1000 shard 0
>> Converting image 500/1000 shard 1
>> Converting image 750/1000 shard 2
>> Converting image 1000/1000 shard 3
>> Converting image 200/800 shard 0
>> Converting image 400/800 shard 1
>> Converting image 600/800 shard 2
>> Converting image 800/800 shard 3
--2018-03-30 12:33:03--  http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.217.8.176, 2607:f8b0:4009:80d::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|172.217.8.176|:80... connected.
HTTP request sent, awaiting response... 416 Requested range not satisfiable

    The file is already fully retrieved; nothing to do.

toy_dataset
INFO:tensorflow:Training on trainval set
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/losses/losses_impl.py:731: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
INFO:tensorflow:Ignoring initialization; other checkpoint exists
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py:736: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Restoring parameters from /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-11
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 11.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
toy_dataset
INFO:tensorflow:Evaluating on val set
INFO:tensorflow:Performing single-scale test.
INFO:tensorflow:Eval num images 200
INFO:tensorflow:Eval batch size 1 and num batch 200
INFO:tensorflow:Waiting for new checkpoint at /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train
INFO:tensorflow:Found new checkpoint at /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-12
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/evaluation.py:303: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /home/makbar/brett/wss-python/models/research/deeplab/datasets/toy_dataset/exp/train_on_trainval_set/train/model.ckpt-12
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting evaluation at 2018-03-30-16:35:58
Traceback (most recent call last):
  File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 175, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 168, in main
    eval_interval_secs=FLAGS.eval_interval_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 301, in evaluation_loop
    timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/evaluation.py", line 452, in evaluate_repeatedly
    session.run(eval_ops, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 546, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1022, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1170, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 950, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [255 255 255...] [y (mean_iou/ToInt64_2:0) = ] [10]
	 [[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2)]]

Caused by op u'mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert', defined at:
  File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 175, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/makbar/brett/wss-python/models/research/deeplab/eval.py", line 142, in main
    predictions, labels, dataset.num_classes, weights=weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/metrics_impl.py", line 1009, in mean_iou
    num_classes, weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/metrics_impl.py", line 263, in _streaming_confusion_matrix
    labels, predictions, num_classes, weights=weights, dtype=dtypes.float64)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/confusion_matrix.py", line 183, in confusion_matrix
    message='`predictions` out of bound')],
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/check_ops.py", line 579, in assert_less
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 118, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 177, in Assert
    guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2027, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
    original_result = fn()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 175, in true_assert
    condition, data, summarize, name="Assert")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 48, in _assert
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [`predictions` out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [255 255 255...] [y (mean_iou/ToInt64_2:0) = ] [10]
	 [[Node: mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_2, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_1, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/data_4, mean_iou/confusion_matrix/assert_less_1/Assert/AssertGuard/Assert/Switch_2)]]
@walkerlala

Contributor Author

walkerlala commented Mar 31, 2018

@wonderit

Contributor

wonderit commented Apr 3, 2018

@walkerlala

I am trying to train the deeplab model with the ADE20k datasets.
I'm having some problem with data format conversion.
Would you mind sharing the code for ADE20k datasets? It would be really appreciated.

@shipengai

shipengai commented Apr 3, 2018

@brett-whitford When I use my own data, I get the same error as you. Can you share your solution?
Thank you very much. I'm looking forward to your reply.

@walkerlala

Contributor Author

walkerlala commented Apr 3, 2018

@wonderit Of course. Please wait for a while until I have access to my GPU server.

@walkerlala

Contributor Author

walkerlala commented Apr 3, 2018

@wonderit Here is the patch for converting training data and training deeplabv3 on ADE20K.

https://gist.github.com/walkerlala/82d978e68407e65158e8825cd470d7e1

(it can also be found at http://fastdrivers.org/misc/patch-for-ade20k.patch )

You can apply this patch on top of commit 1d38a22 or 5281c9a without conflict.

Note:

  1. You need to manually adjust the path in train_ade20k.py for training, and supply the correct path to the training data when converting the data, as documented in the doc.

  2. training data can be found at: http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip

I am also going to submit a PR to get these into the repo. However, I don't have enough GPUs to get a good pretrained model (I only have two Nvidia 1080s...). If you can obtain a decent pretrained model, please share!

@walkerlala

Contributor Author

walkerlala commented Apr 3, 2018

Also, anyone interested in adding ADE20K to deeplabv3 can take a look at this PR I just created: #3853

@shipengai

shipengai commented Apr 4, 2018

@walkerlala When you use val.py, do you get the error 'predictions' out of bound? It is the same as @brett-whitford's question.
Thank you.

@shipengai

shipengai commented Apr 8, 2018

@walkerlala Can you share your eval script?

@hhwxxx

hhwxxx commented Apr 8, 2018

@walkerlala @aquariusjay
Hi, I am confused about the exclude_list and initialize_last_layer.

I am not sure whether I understand it correctly:
If one wants to fine-tune deeplab-v3+ on another dataset, only _LOGITS_SCOPE_NAME needs to be excluded?

If so, following @aquariusjay 's suggestion, in "train_utils.py":

exclude_list = [_LOGITS_SCOPE_NAME]
if not initialize_last_layer:
    exclude_list.extend(last_layers)

If initialize_last_layer=false is set, then exclude_list will include the last_layers. In "train.py", last_layers is the list [_LOGITS_SCOPE_NAME, _IMAGE_POOLING_SCOPE, _ASPP_SCOPE, _CONCAT_PROJECTION_SCOPE, _DECODER_SCOPE].
So all variables in the list will be excluded. This seems inconsistent.

Shouldn't it be the following?
initialize_last_layer=true and exclude_list = [_LOGITS_SCOPE_NAME]

@lydialixia

lydialixia commented Apr 9, 2018

Hi, I'm training on my own dataset as well (only two classes).

When I set initialize_last_layer=false and

exclude_list = ['logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

Then when I run vis.py, it gives me all black images (not binary).

When I only set initialize_last_layer=false, I get binary images (the result is not good, but at least it shows some learning). But it gives me this when I run train.py:

INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 6390723.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.

when training_number_of_steps=100000

Does anyone know why this happens? Thanks!

@hhwxxx

hhwxxx commented Apr 10, 2018

@lydialixia
Hello.
You should add 'global_step' in exclude_list:

exclude_list = ['global_step']

But I am still confused about whether one should set initialize_last_layer=false when fine-tuning deeplab-v3+ on another task.

@aquariusjay

Contributor

aquariusjay commented Apr 10, 2018

When you want to fine-tune DeepLab on other datasets, there are a few cases:

  1. You want to re-use ALL the trained weights: set initialize_last_layer = True (last_layers_contain_logits_only does not matter in this case).

  2. You want to re-use ONLY the network backbone (i.e., exclude ASPP, decoder and so on): set initialize_last_layer = False and last_layers_contain_logits_only = False.

  3. You want to re-use ALL the trained weights EXCEPT the logits (since the num_classes may be different): set initialize_last_layer = False and last_layers_contain_logits_only = True.
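
The three cases can be summarised as a small decision function. This is only a sketch of the flag logic as described above, not the code in train.py or train_utils.py: the function name `scopes_not_restored` is hypothetical, and of the scope strings only 'logits' appears literally in this thread, so the other values are assumed.

```python
# Scope names as listed earlier in this thread. Only 'logits' appears as a
# literal string in the thread; the other string values are assumed here.
LOGITS_SCOPE = 'logits'
ALL_LAST_LAYER_SCOPES = [
    LOGITS_SCOPE,
    'image_pooling',
    'aspp',
    'concat_projection',
    'decoder',
]


def scopes_not_restored(initialize_last_layer, last_layers_contain_logits_only):
    """Return the scopes that would be skipped when loading the checkpoint."""
    if initialize_last_layer:
        # Case 1: re-use all trained weights; the second flag is irrelevant.
        return []
    if last_layers_contain_logits_only:
        # Case 3: re-use everything except the classification layer.
        return [LOGITS_SCOPE]
    # Case 2: re-use only the network backbone.
    return list(ALL_LAST_LAYER_SCOPES)


for flags in [(True, False), (False, False), (False, True)]:
    print(flags, scopes_not_restored(*flags))
```

For a dataset with a different num_classes, case 3 is the one that keeps ASPP and decoder weights while leaving the classification layer to be trained from scratch.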

@georgosgeorgos

georgosgeorgos commented Apr 10, 2018

Hi @walkerlala: did you manage to finetune the ADE20K dataset?
I'm trying to fine-tune on a dataset of the same size, but without success: after the first ~2K iterations the loss stops decreasing and starts to oscillate (~20K iterations).
I tried different learning rates, removed the regularization, but for the moment no improvement.

@walkerlala

Contributor Author

walkerlala commented Apr 12, 2018

@georgosgeorgos No, in the end I couldn't fine-tune the model on the ADE20K dataset. I don't have enough GPUs. Every time I try to fine-tune the batch normalization parameters, the model blows up with an out-of-memory error. So I froze the batch normalization layers when training. In the end I only got a model with "modest" performance:

Here is the original image (too large to display here): http://www.fastdrivers.org/misc/stuffseg-origin.jpg

Here is the segmentation result:
[image: result]

However I can get a satisfying result with PSPNet:

[image: mmexport_1_473_seg]

According to the slides from the 2017 COCO + Places Workshop, deeplabv3 should also be able to do that, but I haven't had any luck fine-tuning it. Hopefully Google can provide a fine-tuned pre-trained model in the future @aquariusjay.

@cfosco

cfosco commented Apr 15, 2018

@brett-whitford - Hi Brett, I am having the exact same problem as you. How did you end up solving it?

@cfosco

cfosco commented Apr 15, 2018

@shipeng-uestc - Hi shipeng, did you manage to solve the issue? I am currently using exclude_list=[_LOGITS_SCOPE_NAME] with _LOGITS_SCOPE_NAME imported from deeplab.model as @walkerlala suggested but I am still having the same error as Brett.

@jiyongma

jiyongma commented Apr 16, 2018

When I run

python deeplab/eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size=513 \
  --eval_crop_size=513 \
  --dataset="ade20k" \
  --checkpoint_dir="./deeplab/datasets/ADE20K/exp/train_on_train_set/train" \
  --eval_logdir="./deeplab/datasets/ADE20K/exp/train_on_train_set/eval" \
  --dataset_dir="./deeplab/datasets/ADE20K/tfrecord"

NotFoundError (see above for traceback): Key aspp1_depthwise/BatchNorm/beta not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_299 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Please help me! Thanks.

@qmy612

qmy612 commented Apr 19, 2018

@hhwxxx Hello, in your answer to lydialixia, do you mean in train_util.py, exclude_list should be like this:
exclude_list = ['global_step']
exclude_list = ['logits']

But I still can't start training; the output is:
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 30000.
INFO:tensorflow:Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.

I have also tried exclude_list = ['_LOGITS_SCOPE_NAME'], but this doesn't work.
When I just set exclude_list = ['global_step'], the model achieves mean IoU = 0.93 after 10000 iterations; I don't know whether this is wrong.
Waiting online, thank you!

@hhwxxx

hhwxxx commented Apr 19, 2018

@qmy612

Hello. Maybe you can try this:
exclude_list = ['global_step', 'logits']

As to the _LOGITS_SCOPE_NAME, it is defined in "model.py", so you should use like this: model._LOGITS_SCOPE_NAME.
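
The difference between the quoted name and the constant can be seen with a simple prefix filter standing in for the real restore logic. This is a sketch under assumptions: the value 'logits' for the constant is taken from the exclude_list examples in this thread, and the variable names and the helper `variables_to_restore` are hypothetical.

```python
# Assumed value of the constant, based on the exclude_list examples above.
_LOGITS_SCOPE_NAME = 'logits'

# Hypothetical variable names, for illustration only.
checkpoint_variables = [
    'global_step',
    'xception_65/entry_flow/conv1_1/weights',
    'logits/semantic/weights',
]


def variables_to_restore(names, exclude_list):
    """Keep the names that do not start with any excluded scope."""
    return [
        name for name in names
        if not any(name.startswith(scope) for scope in exclude_list)
    ]


# The quoted form is just the literal text '_LOGITS_SCOPE_NAME'; no variable
# starts with it, so nothing is excluded.
quoted = variables_to_restore(checkpoint_variables, ['_LOGITS_SCOPE_NAME'])

# The unquoted constant evaluates to 'logits', which does match.
unquoted = variables_to_restore(
    checkpoint_variables, ['global_step', _LOGITS_SCOPE_NAME])

print(quoted)    # all three names are still restored
print(unquoted)  # only the backbone variable is restored
```

In this toy setup, the quoted form silently restores the old classification layer and global_step along with everything else.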

And I have no idea about miou=0.93.

@BeSlower

BeSlower commented May 1, 2018

Just setting initialize_last_layer = False and last_layers_contain_logits_only = True works for me, if you want to train on your own dataset with a different number of classes.

@holyprince

holyprince commented May 3, 2018

@BeSlower, yes, the solution works for me, but there is another problem: the result is all black with no other label, even though the loss decreases during training. Can anyone help me?

@xianshunw

xianshunw commented May 5, 2018

@qmy612 Did you get the problem solved? I am having the exact same problem as you.

@qmy612

qmy612 commented May 6, 2018

@xiangjinwu Yes, hhwxxx's answer works.
exclude_list = ['global_step', 'logits']

@bleedingfight

bleedingfight commented Jun 10, 2018

@Soulempty
Hi Soulempty, I am trying to train deeplabv3 on my dataset, but something is wrong and I don't know why. Please help me.
My environment:
CUDA V9.0.176, cuDNN 7.0, tensorflow-gpu 1.8, Titan X x8
My data: 20000 pictures, 2 labels; the object is lane (4 pixels wide), the rest is background. Every picture is of size [512, 512]. The segmentation annotation picture is a grayscale picture: lane pixels are 1, other (background) pixels are 0.
My changes include:

_LANE_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 18000,
        'trainval': 20000,
        'val': 2000,
    },
    num_classes=2,
    ignore_label=255,
)
and add:
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'lane_seg': _LANE_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
}

train:

NUM_ITERATIONS=10000
python "${WORK_DIR}"/train.py \
  --logtostderr \
  --initialize_last_layer=False \
  --num_clones=1 \
  --last_layers_contain_logits_only=False \
  --dataset='lane_seg' \
  --train_split="trainval" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${LANE_DATASET}"

python "${WORK_DIR}"/eval.py \
  --logtostderr \
  --eval_split="val" \
  --dataset="lane_seg" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size=513 \
  --eval_crop_size=513 \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --eval_logdir="${EVAL_LOGDIR}" \
  --dataset_dir="${LANE_DATASET}" \
  --max_number_of_evaluations=1

python "${WORK_DIR}"/vis_lane.py \
  --logtostderr \
  --vis_split="val" \
  --dataset="lane_seg" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size=513 \
  --vis_crop_size=513 \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --vis_logdir="${VIS_LOGDIR}" \
  --dataset_dir="${LANE_DATASET}" \
  --max_number_of_iterations=1
python "${WORK_DIR}"/export_model.py \
  --logtostderr \
  --checkpoint_path="${CKPT_PATH}" \
  --export_path="${EXPORT_PATH}" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --num_classes=2 \
  --crop_size=513 \
  --crop_size=513 \
  --inference_scales=1.0

I altered train_utils.py's exclude_list=['global_step',logits].
When I train on the dataset, I get this error:

Input checkpoint '/home/mc12/models/research/deeplab/datasets/lane_seg/exp/train_on_trainval_set/train/model.ckpt-10000' doesn't exist!

My questions are:

  • Why, when I set NUM_ITERATIONS=10000 and run the train script, does it not train, and instead use the weights the script downloaded from the internet, model.3000?
INFO:tensorflow:Restoring parameters from /home/mc12/models/research/deeplab/datasets/lane_seg/exp/train_on_trainval_set/train/model.ckpt-30005

  • When I visualize the pictures, the result is like:
    I think it only uses the VOC pretrained weights for inference, so the result is so bad. Is my opinion right?
  • My dataset may be strongly biased towards background. How can I adjust the loss weights?
  • @parachutel

Your num_classes should be greater than the max pixel value in the images
I don't understand it,my label(0:background,1:lane),This means i should set num_class is 1,i shoud set 2,3,4,...,(In my case,i set it to 2 for background and lane)

Thanks for your help!

wenouyang commented Jun 10, 2018

@hhwxxx,

Could you explain your previous answer in this thread in more detail? What does logits stand for here? Thanks.

@qmy612

Hello. Maybe you can try this:
exclude_list = ['global_step', 'logits']

Soulempty commented Jun 11, 2018

The line tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" decides which weights are used.
You can set the number of classes to 2, with lane labeled 0 and background labeled 1.

hhwxxx commented Jun 11, 2018

@wenouyang Hello.
The logits are the last feature maps before the softmax.

Maybe this can help you.

The vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class.
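To make the definition above concrete, here is a minimal sketch in plain Python of how the logits for a single pixel become class probabilities. The numbers are invented for illustration and the helper name `softmax` is my own; DeepLab performs this step inside TensorFlow.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability and does not
    # change the result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one pixel of a 2-class problem (background, lane).
pixel_logits = [2.0, 0.5]
probs = softmax(pixel_logits)
predicted_class = probs.index(max(probs))  # argmax over the classes
print(probs, predicted_class)
```

This is also why 'logits' shows up in exclude_list when fine-tuning on a dataset with a different number of classes: the logits layer has one output per class, so the checkpointed weights for that layer no longer fit.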

Blackpassat commented Jun 12, 2018

@Soulempty Hi, have you solved the issue of assigning different weights for different classes? I have tried the advice from the contributor, but there seems to be something wrong as my loss keeps oscillating around 0.11.

weehe91325 commented Jun 19, 2018

Hi

Currently I’m struggling with improving the results using deeplab trained on my own dataset.
I’ve trained deeplab successfully a few times using different pretrained models from the model zoo, all based on xception_65, but my results keep staying in the same miou range, somewhere around this interval [10, 11].
I have only one GPU at my disposal with 11GB GPU memory.
My dataset has 8 classes with various object sizes, from little to big, and is quite unbalanced.
Here are the label weights: [1, 4, 4, 17, 42, 36, 19, 20].
In my dataset I have 757 instances for training and 100 for validation.
I’ve tried to adjust parameters like learning rate, last_layer_gradient_multiplier, weight decay.
I’ve also tried some kind of weighting using the above weights in this formula

weights = (tf.to_float(tf.equal(scaled_labels, 0)) * 1 +
           tf.to_float(tf.equal(scaled_labels, 1)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 2)) * 4 +
           tf.to_float(tf.equal(scaled_labels, 3)) * 17 +
           tf.to_float(tf.equal(scaled_labels, 4)) * 42 +
           tf.to_float(tf.equal(scaled_labels, 5)) * 36 +
           tf.to_float(tf.equal(scaled_labels, 6)) * 19 +
           tf.to_float(tf.equal(scaled_labels, 7)) * 20 +
           tf.to_float(tf.equal(scaled_labels, ignore_label)) * 0.0)

but it turns out the algorithm won’t converge.
I’ve trained without fine-tuning the batch normalization parameters, although I also tried training those parameters with a crop size of 321 so that a batch size of 12 fits on my GPU.
I’ve tried training on various crop sizes: 321, 513, 769.
The point is that I need some tips on what I can do to improve those results.
What do you guys think? Do I need more data, or more hardware, to increase my mIoU?
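The weighting formula in this comment amounts to a per-pixel lookup: each label is replaced by its class weight, and pixels equal to ignore_label get weight 0. Here is a minimal sketch in plain Python using the weights listed above. `pixel_weights` is a hypothetical helper, not part of DeepLab, and the value 255 for ignore_label is an assumption.

```python
LABEL_WEIGHTS = [1, 4, 4, 17, 42, 36, 19, 20]  # weights listed in the comment
IGNORE_LABEL = 255  # assumed; use the value from your DatasetDescriptor

def pixel_weights(labels, label_weights=LABEL_WEIGHTS, ignore_label=IGNORE_LABEL):
    """Map a flat list of label ids to a flat list of loss weights."""
    weights = []
    for label in labels:
        if label == ignore_label:
            weights.append(0.0)  # ignored pixels contribute nothing to the loss
        else:
            weights.append(float(label_weights[label]))
    return weights

print(pixel_weights([0, 1, 4, 255, 7]))  # [1.0, 4.0, 42.0, 0.0, 20.0]
```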

shanyucha commented Jun 19, 2018

@weehe91325 I'm afraid you are doing it wrong. If the ratio between classes is 1:4, for example, then the weights should be 4:1 instead of 1:4.
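A small sketch of the point above: one common choice is to make class weights inversely proportional to how often each class occurs. The pixel counts below are invented, and `inverse_frequency_weights` is a hypothetical helper.

```python
def inverse_frequency_weights(pixel_counts):
    """Return weights proportional to 1/frequency, scaled so that the most
    frequent class gets weight 1.0. Assumes every count is greater than 0."""
    largest = max(pixel_counts)
    return [largest / count for count in pixel_counts]

# Two classes whose pixel counts are in a 1:4 ratio.
counts = [1000, 4000]
print(inverse_frequency_weights(counts))  # [4.0, 1.0]
```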

weehe91325 commented Jun 19, 2018

@shanyucha my bad, I wanted to say label weights not ratios. I updated the comment.

xm1112 commented Sep 20, 2018

Hello, when I input a 513x513 picture, I get the following error. How can I solve it?
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [320] rhs shape= [2048] [[Node: save/Assign_8 = Assign[T=DT_FLOAT, _class=["loc:@aspp1_depthwise/BatchNorm/beta"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](aspp1_depthwise/BatchNorm/beta, save/RestoreV2:8)]]

apolo74 commented Oct 25, 2018

Hi guys,
I've now spent a couple of weeks trying to adapt DeepLabV3+ to a TestSet with two classes, circles and squares. Below you can see one example of my images and the respective annotation.

I've tried all instructions and suggestions from this thread but all I get are black images as output... I really don't understand where and what I'm doing wrong so I hope you guys can give me a hand. Below I'll summarize all my info:

1. TestSet

I created 1000 images and split 800 for training and 200 for validation. Images are 512x512 RGB (3 channels) JPGs and annotations are 512x512 gray (single channel) PNGs where circles have an intensity of 250, squares 150 and background 0. The dataset folder is organized in the same fashion as ADE20K:

2. datasets > segmentation_dataset.py

When setting ignore_label = 0 predictions come back as a red or green image; and when setting ignore_label = 255 predictions come back as a black image

_TESTSET_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 800,  # num of samples in images/training
        'val': 200,  # num of samples in images/validation
    },
    num_classes=3,
    ignore_label=255,
)
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'testset': _TESTSET_INFORMATION,
}

3. utils > train_utils.py

  # Variables that will not be restored.
  exclude_list = ['logits']

I've also tried exclude_list = ['global_step'] and exclude_list = ['global_step', 'logits'], but the results are the same.
I've also played a little bit with line 72 and the loss_weight as suggested by @aquariusjay, but that doesn't seem to help either.

weights = (tf.to_float(tf.equal(scaled_labels, 0)) * 1.0 +
           tf.to_float(tf.equal(scaled_labels, 1)) * 2.0 +
           tf.to_float(tf.equal(scaled_labels, ignore_label)) * 0.0)
not_ignore_mask = tf.to_float(tf.not_equal(scaled_labels, ignore_label)) * weights
# === original lines ===
# not_ignore_mask = tf.to_float(tf.not_equal(scaled_labels, ignore_label)) * loss_weight

4. eval.py

I added the lines suggested HERE to solve the ['predictions' out of bound] error in the evaluation stage:

# Define the evaluation metric.
metric_map = {}
# ============ Added by B.A.D. =====================
indices = tf.squeeze( tf.where( tf.less_equal(labels, dataset.num_classes-1) ), 1 )
labels = tf.cast( tf.gather( labels, indices ), tf.int32 )
predictions = tf.gather( predictions, indices )
# ==============================================
metric_map[predictions_tag] = tf.metrics.mean_iou(
        predictions, labels, dataset.num_classes, weights=weights)
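What the added lines do can be sketched in plain Python: pixels whose label is larger than num_classes - 1 (for example an ignore label of 255) are dropped before the metric is computed. `mean_iou` below is a hypothetical stand-in written for illustration, not the TensorFlow implementation, and the label and prediction lists are invented.

```python
def mean_iou(labels, predictions, num_classes):
    """Mean intersection-over-union over flat lists of class ids.

    Pixels whose label exceeds num_classes - 1 are dropped first,
    mirroring the tf.less_equal / tf.gather lines above.
    """
    pairs = [(l, p) for l, p in zip(labels, predictions) if l <= num_classes - 1]
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for l, p in pairs if l == c and p == c)
        union = sum(1 for l, p in pairs if l == c or p == c)
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

labels = [0, 0, 1, 1, 2, 255]       # the last pixel carries the ignore label
predictions = [0, 1, 1, 1, 2, 0]
print(mean_iou(labels, predictions, num_classes=3))
```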

5. main_TestSet.sh

The relevant sections of this shell script are shown below. Training starts with a loss around 1.32 and reaches 0.31 at the end of 5000 iterations. The evaluation stage runs without errors or warnings and at the end prints: miou_1.0[1]. Finally, the visualization script runs on the 200 images, but all predictions are just black images. Please give me a hand!

NUM_ITERATIONS=5000
CKPT_NAME="deeplabv3_mnv2_pascal_train_aug"
CKPT_PATH="${TRAIN_LOGDIR}/model.ckpt-${NUM_ITERATIONS}"
EXPORT_PATH="${EXPORT_DIR}/frozen_inference_graph.pb"

# === Train the network ===
python "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="mobilenet_v2" \
  --output_stride=16 \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --dataset="testset" \
  --tf_initial_checkpoint="${INIT_FOLDER}/${CKPT_NAME}/model.ckpt-30000.index"\
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${FINAL_DATASET}" \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=False \
  --fine_tune_batch_norm=False

# === Run evaluation ===
python "${WORK_DIR}"/eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="mobilenet_v2" \
  --eval_crop_size=513 \
  --eval_crop_size=513 \
  --dataset="testset" \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --eval_logdir="${EVAL_LOGDIR}" \
  --dataset_dir="${FINAL_DATASET}" \
  --max_number_of_evaluations=1

# === Visualize the results ===
python "${WORK_DIR}"/vis.py \
  --logtostderr \
  --vis_split="val" \
  --model_variant="mobilenet_v2" \
  --vis_crop_size=513 \
  --vis_crop_size=513 \
  --dataset="testset" \
  --checkpoint_dir="${TRAIN_LOGDIR}" \
  --vis_logdir="${VIS_LOGDIR}" \
  --dataset_dir="${FINAL_DATASET}" \
  --max_number_of_iterations=1

# === Export the trained checkpoint ===
python "${WORK_DIR}"/export_model.py \
  --logtostderr \
  --checkpoint_path="${CKPT_PATH}" \
  --export_path="${EXPORT_PATH}" \
  --model_variant="mobilenet_v2" \
  --num_classes=3 \
  --crop_size=513 \
  --crop_size=513 \
  --inference_scales=1.0

Kedron007 commented Oct 29, 2018

Hello, I have a question.
1. I changed exclude_list to include only _LOGITS_SCOPE_NAME, set the flags initialize_last_layer=False and last_layers_contain_logits_only=True, and trained with xception_65,
with min_resize_value=50 and max_resize_value=2000. The loss is high and never goes down:
INFO:tensorflow:global step 570: loss = 8.3546 (0.559 sec/step)
INFO:tensorflow:global step 580: loss = 8.5703 (0.548 sec/step)
INFO:tensorflow:global step 590: loss = 8.9560 (0.543 sec/step)
INFO:tensorflow:global step 600: loss = 8.2486 (0.512 sec/step)
INFO:tensorflow:global step 610: loss = 8.1094 (0.508 sec/step)
INFO:tensorflow:global step 620: loss = 8.2317 (0.520 sec/step)
INFO:tensorflow:global step 630: loss = 7.9649 (0.511 sec/step)
INFO:tensorflow:global step 640: loss = 8.2240 (0.517 sec/step)
INFO:tensorflow:global step 650: loss = 7.9889 (0.517 sec/step)
INFO:tensorflow:global step 660: loss = 8.0038 (0.507 sec/step)
INFO:tensorflow:global step 670: loss = 8.0465 (0.529 sec/step)
Could anyone help me with this? The result is not good.

I want to train on ADE20K.

Kedron007 commented Oct 29, 2018

> I retrained deeplab with Ade20K dataset in my Google Colab notebook, below results with MobileNet-v2 and Xception_65 as initial checkpoint, anyway I couldn't fine tune because of OOM error. May be others can share parameters for training to get better results?
>
> MobileNet-v2
> ade20k-mobile-2000iter-2batch
>
> Xception_65
> ade20k-xception-2000iter-2batch

Could you tell me your training_number_of_steps and your other parameters, and whether you used xception65_ade20k_train to train on your images?
My results are very bad. Thank you, and have a good day!

Work-jk-l commented Jan 9, 2019

@apolo74 Have you solved the problem? I have run into the same issue you mentioned above.

heiheiya commented Mar 13, 2019

> @BeSlower , yes, the solution is work for me but there is another problem that the result is all black and no other label , but during the training process , the loss is decrease. Can anyone help me ?

I have the same problem. Have you solved it?

apolo74 commented Mar 14, 2019

Hi guys, sorry I've been disconnected from this thread... the black output is related to 2 very important settings:

  1. Assume that you are re-training on your own data that has, for example, 2 classes (in the toy case I mentioned, I created a dataset with circles and squares). Then I have 2 classes, BUT the parameter called "--num_classes" should be 4, because: 2 (own classes) + 1 (background) + 1 (ignore_label).
  2. The pixel values of your "background" class are supposed to be 0, the pixel values of your first class should be 1, those of your second class 2, and so on. DON'T save your classes with other values like 100 or 224; you have to save your class images following the order from 1 to N.

Hope this helps
/B
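Point 2 can be sketched as a simple remapping step. The gray levels come from the TestSet description earlier in this thread (background 0, squares 150, circles 250). The helper `remap_mask` and the tiny example mask are hypothetical, and plain lists stand in for a real PNG loaded with an image library.

```python
# Gray levels used in the original annotations -> consecutive class ids.
INTENSITY_TO_CLASS = {0: 0, 150: 1, 250: 2}  # background, squares, circles

def remap_mask(mask, mapping=INTENSITY_TO_CLASS):
    """Return a copy of a 2-D mask (list of rows) with intensities replaced
    by class ids. Raises KeyError on an unexpected intensity."""
    return [[mapping[value] for value in row] for row in mask]

old_mask = [
    [0,   0,   0,   0],
    [0, 150, 150,   0],
    [0,   0, 250, 250],
]
print(remap_mask(old_mask))  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 2, 2]]
```

Raising on an unknown intensity is deliberate in this sketch: anti-aliased edges or JPEG artifacts in a mask would otherwise slip through as extra classes.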

ajinkya933 commented Apr 10, 2019

@apolo74 How do I set the pixel values of the "background" class to 0? I don't see this option in train.py or in segmentation_dataset.py.

apolo74 commented Apr 10, 2019

> @apolo74 How do I set the pixel values of the "background" class to 0? I don't see this option in train.py or in segmentation_dataset.py.

When you are creating your training dataset you have to create the "annotations" in grayscale, like the example I shared before:

The image on the left is a normal input to the system, but the image on the right is manually created (with Photoshop, Paint, or any image processing tool; in my case I generated these test images with a Python script). The main idea is that you mark everything you are not interested in detecting as background with value 0 (all the black area inside the right image), and the different classes with values starting from 1.
The image on the right was created before I realized this very important point; that is why the rectangle has a light gray color and the circle an even lighter gray. With all pixels of the rectangle set to 1 and all pixels of the circle set to 2, they wouldn't show clearly in this example, but that's how they are supposed to be: background=0, rectangle=1 and circle=2... and ignore_label=255
I hope it's clear now

CarlosL96 commented Apr 10, 2019

@apolo74 Hi, hope you could help me, as you said the class number should be 0, 1 , 2 and so on... where do I specify that? Thank you.

ajinkya933 commented Apr 11, 2019

@apolo74 thank you for getting me one step closer. Please take a look at my image:

Screenshot 2019-04-11 at 11 05 47 AM

The image on the right is the segmentation mask. It has background=0, and the object to be detected is a very dark shade of grey (I am not sure what the value of this shade of grey is).
The image on the left is the original image.

Would you recommend converting the image on the right so that the dark shade of grey becomes white?

Or is there any other way I can use the existing right-hand image by modifying my code?

apolo74 commented Apr 11, 2019

> @apolo74 Hi, hope you could help me, as you said the class number should be 0, 1 , 2 and so on... where do I specify that? Thank you.

Hola Carlos, the idea is that you CREATE your segmentation images with these values as pixels... for example let's say that I have a color image of 10x8 pixels, this means your input is going to be a 10x8x3 (where 3 represents color channels R, G, B). Let's say you want to detect squares and in this example there is a small square in the bottom right, then your segmentation mask will be a single 10x8 matrix with VALUES:
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0
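The same mask can be built programmatically. This is only a sketch with plain Python lists. With real data the values would be written to a single-channel PNG, which is not shown here.

```python
WIDTH, HEIGHT = 10, 8

# Start with an all-background mask: 8 rows of 10 zeros.
mask = [[0] * WIDTH for _ in range(HEIGHT)]

# Mark the 3x3 square (rows 4-6, columns 6-8, zero-based) as class 1.
for row in range(4, 7):
    for col in range(6, 9):
        mask[row][col] = 1

# Print the mask in the same layout as the matrix above.
for row in mask:
    print(" ".join(str(value) for value in row))
```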

apolo74 commented Apr 11, 2019

> @apolo74 thank you for getting me one step closer. Please take a look at my image:
>
> The image on the right is the segmentation mask. It has background=0, and the object to be detected is a very dark shade of grey (I am not sure what the value of this shade of grey is).
> The image on the left is the original image.
>
> Would you recommend converting the image on the right so that the dark shade of grey becomes white?
>
> Or is there any other way I can use the existing right-hand image by modifying my code?

Hi again @ajinkya933, I'm glad to hear you are making some progress. About your questions:

  1. No, don't convert your object mask to white, because that would give your class pixels a value of 255. From what I see they have a very low value, but you need to be sure that it's 1, assuming that's the first class you want to detect.
  2. I used Photoshop to create my segmentation masks, but I'm sure you can do this in any other program: Matlab, Paint, even Excel.
     You could even write a very simple Python script that opens the image and prints the pixel values at specific positions, so you can be sure what values they have. Don't believe what your eyes see :)
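A sketch of such a check, assuming the mask has already been loaded into a list of rows (loading the PNG itself is left out). `check_mask` is a hypothetical helper, and the value 38 is invented to stand in for "a very dark shade of grey".

```python
from collections import Counter

def check_mask(mask, num_classes, ignore_label=255):
    """Count the pixel values in a 2-D mask and list the values that are
    neither a valid class id (0 .. num_classes - 1) nor the ignore label."""
    counts = Counter(value for row in mask for value in row)
    invalid = sorted(v for v in counts if v >= num_classes and v != ignore_label)
    return counts, invalid

# Looks almost black to the eye, but the object is stored as 38, not 1.
mask = [
    [0,  0,  0,   0],
    [0, 38, 38,   0],
    [0, 38, 38, 255],
]
counts, invalid = check_mask(mask, num_classes=2)
print(dict(counts))  # {0: 7, 38: 4, 255: 1}
print(invalid)       # [38]
```

An empty `invalid` list is what you want to see before converting the dataset. This is the same rule quoted earlier in the thread, that num_classes should be greater than the largest pixel value in the label images.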

CarlosL96 commented Apr 12, 2019

> > @apolo74 Hi, hope you could help me, as you said the class number should be 0, 1 , 2 and so on... where do I specify that? Thank you.
>
> Hola Carlos, the idea is that you CREATE your segmentation images with these values as pixels... for example let's say that I have a color image of 10x8 pixels, this means your input is going to be a 10x8x3 (where 3 represents color channels R, G, B). Let's say you want to detect squares and in this example there is a small square in the bottom right, then your segmentation mask will be a single 10x8 matrix with VALUES:
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 1 1 0
> 0 0 0 0 0 0 1 1 1 0
> 0 0 0 0 0 0 1 1 1 0
> 0 0 0 0 0 0 0 0 0 0

Thank you so much, I just trained DeepLab with satisfactory results so far :D!

ajinkya933 commented Apr 16, 2019

@apolo74 Thanks I got the output now

apolo74 commented Apr 16, 2019

> @apolo74 Thanks I got the output now

Happy to hear that!
