I have few questions. #5

Closed
GodOfSmallThings opened this issue May 2, 2018 · 38 comments

Comments

@GodOfSmallThings

I have a few questions, noted below.

  1. Have you tried training with the SynthText dataset? If so, did the benchmark improve?
  2. Your model uses an aligned rectangle rather than an 8-coordinate bounding box. Would performance improve if we used a tight bounding box?

Thank you!

@BowieHsu

BowieHsu commented May 3, 2018

@GodOfSmallThings
1. On the VGG backbone, I trained on SynthText for about 60k iterations with a learning rate of 0.01, then fine-tuned on IC15 for about 30k iterations with a learning rate of 0.001; the F-measure is about 77%. But when I train on IC15 for about 100k iterations, the F-measure is nearly 82%.
2. I tried replacing the backbone network with ResNet-50; with the same SGD policy, after 100k iterations of training the F-measure is about 75%.
3. In my experiments, a pretrained model does not make the final score higher on this network architecture.
Really interesting.

@GodOfSmallThings
Author

@BowieHsu
Thank you for the answer to the first question!
I am also curious about the way you use the ground truth, which is my second question. As I mentioned above, this model converts the 8-coordinate bounding box information into an aligned bounding box. I guess that box would contain a certain amount of background. Would it perform better if we used the 8-coordinate tight bounding box rather than the 4-coordinate aligned box?

@BowieHsu

BowieHsu commented May 3, 2018

@GodOfSmallThings It may help; you could run some experiments to check.

@dengdan
Member

dengdan commented May 3, 2018

Pretraining on SynthText will help. @BowieHsu: when SynthText is added for pretraining, an fmean of 85% can be achieved. http://rrc.cvc.uab.es/?ch=4&com=evaluation&task=1&e=1&f=1&d=0&p=0&s=1

However, quite a lot more iterations are needed.

@GodOfSmallThings
Author

Thank you very much !

@lizzyYL

lizzyYL commented May 6, 2018

@BowieHsu @dengdan
I trained on ICDAR15 for about 100k iterations, with a learning rate of 0.001 and a batch size of 12.
But the F-measure is 78.9% (R=77.7%, P=80.2%), which is clearly lower than your scores.
Are there any parameters that need to be adjusted?

Thank you!
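
For reference, the F-measure quoted in this thread is the harmonic mean of precision and recall. A minimal sketch of that arithmetic, using the R and P values reported above (the function name is just for illustration):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (also called F1 or hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall reported in the comment above.
score = f_measure(precision=0.802, recall=0.777)
print("F-measure: %.1f%%" % (score * 100))
```

This agrees with the 78.9% figure reported for those precision and recall values.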

@BowieHsu

BowieHsu commented May 6, 2018

Dude, a learning rate of 1e-2 will improve the F-score.

@lizzyYL

lizzyYL commented May 6, 2018

A learning rate of 1e-2 caused loss = NaN, so I changed it to 1e-3.

@BowieHsu

BowieHsu commented May 6, 2018

@lizzyYL So you need to train with a learning rate of 1e-3 for 100 iterations, and then switch to 1e-2.
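
The warm-up described above can be sketched as a simple step-dependent schedule. This is only an illustration of the idea in plain Python, not the repository's training code; the function and parameter names are hypothetical, while the values (1e-3 for the first 100 iterations, then 1e-2) come from the comment.

```python
def warmup_learning_rate(step, warmup_steps=100, warmup_lr=1e-3, base_lr=1e-2):
    """Return a small learning rate during warm-up, then the full rate."""
    if step < warmup_steps:
        return warmup_lr
    return base_lr


# Print the rate on both sides of the switch.
for step in (0, 99, 100, 1000):
    print(step, warmup_learning_rate(step))
```

Starting with the smaller rate gives the randomly initialized layers a few updates before the larger steps begin, which is the usual motivation for this kind of warm-up.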

@lizzyYL

lizzyYL commented May 6, 2018

@BowieHsu Yes, that's what I did the first time, running train.sh, but I got loss = NaN at 30k iterations.
Is it because of the batch size? I set it to 8 the first time.

@GodOfSmallThings
Author

GodOfSmallThings commented May 7, 2018

@lizzyYL I also tried batch size 8 the first time and got NaN. With batch size 24, NaN did not occur and performance went up to 82.4%.

@lizzyYL

lizzyYL commented May 7, 2018

@GodOfSmallThings Understood! Thank you for your reply~

@tsing-cv

@lizzyYL @dengdan @BowieHsu @GodOfSmallThings @comzyh
I just want to test on my own images and ICDAR2015 using the two pretrained models.
But something strange occurred: all the predicted images only changed color, without any bboxes, and all the predicted txt files contain no coordinates.
Could you help me solve this problem?

@BowieHsu

@tsing-cv So, have you printed the PixelLink results in test_any_image.py?

@GodOfSmallThings
Author

@tsing-cv
I've encountered the same problem as you. It seems the network has reached a non-optimal local minimum. (It probably has not diverged, because the loss does not go to infinity or NaN.) You can try a few checkpoints, working back from the latest to earlier ones; there may be a checkpoint that is not trapped at the non-optimal point. Then you can delete the checkpoints after it, edit the checkpoint (txt) file to match, and resume training or testing.
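
The "modify checkpoint (txt)" step can be sketched as follows. TensorFlow 1.x keeps a small text file named `checkpoint` in the training directory that records which checkpoint to restore. The sketch below rewrites that file to point at an earlier checkpoint; the helper name, directory, and checkpoint name are hypothetical, and it assumes the plain-text format shown in the comments.

```python
import os


def point_checkpoint_at(train_dir, ckpt_name):
    """Rewrite the `checkpoint` index file so it names one earlier checkpoint.

    Assumes the TF 1.x text format:
        model_checkpoint_path: "model.ckpt-30000"
        all_model_checkpoint_paths: "model.ckpt-30000"
    """
    index_path = os.path.join(train_dir, "checkpoint")
    with open(index_path, "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % ckpt_name)
        f.write('all_model_checkpoint_paths: "%s"\n' % ckpt_name)
    return index_path


# Hypothetical usage: resume from the checkpoint saved at iteration 30000.
# point_checkpoint_at("/path/to/train_dir", "model.ckpt-30000")
```

Back up the original `checkpoint` file before overwriting it, so that the later checkpoints can still be found if needed.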

@GodOfSmallThings
Author

GodOfSmallThings commented May 14, 2018

@dengdan
Can you share details on how you achieved 85% hmean? (learning rate, number of iterations, batch size, etc.)
I used the SynthText DB for pretraining, but it only reached ~82.x%.
Thanks.

@BowieHsu

@GodOfSmallThings I hit the same problem. After training on SynthText for about 100k iterations with a learning rate of 1e-2 and then fine-tuning on ICDAR2015, the F-score stays at 0 forever.

@BowieHsu

@dengdan @GodOfSmallThings Another thing: I tried adding BN ops after every conv layer, which made the F-score drop by about 5~6%.

@GodOfSmallThings
Author

GodOfSmallThings commented May 15, 2018

@tsing-cv
I think sharing the weights won't help you much, so I'll just describe the method.

  1. Just train for 30,000~40,000 iterations.
  2. Check the result and resume training.

You'll probably see that training is working at some point. Find the point where training fails by checking the result every 10,000 iterations.

@BowieHsu
I think it's a problem with the optimizer. I've tried Adam; it performed badly, but it did not fail. It seems SGD has a high probability of getting trapped in a local minimum. What I did was find the point right before training fails and resume training from that checkpoint, and it worked.
As for batch norm, let's figure out why.

@BowieHsu

@GodOfSmallThings Have you tried using a staircase learning rate instead of fixing it at 1e-2? I also tried Adam, and it performed badly. I think the reason is the upscale 1×1 convolutions: those 1×1 convolutions have too few channels, which makes the network unstable.
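
A staircase learning rate, as suggested above, drops the rate by a fixed factor at regular step intervals instead of holding it constant. A minimal sketch in plain Python; the decay factor and interval are hypothetical values chosen for illustration, and only the 1e-2 starting rate comes from the thread.

```python
def staircase_learning_rate(step, base_lr=1e-2, decay_rate=0.1, decay_steps=30000):
    """Multiply the rate by `decay_rate` once every `decay_steps` iterations."""
    # Integer division makes the exponent jump in discrete steps (the "staircase").
    return base_lr * decay_rate ** (step // decay_steps)


for step in (0, 29999, 30000, 60000):
    print(step, staircase_learning_rate(step))
```

In TensorFlow 1.x the same shape of schedule is available through `tf.train.exponential_decay` with `staircase=True`.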

@GodOfSmallThings
Author

@BowieHsu
I thought similarly. My guess was that Adam performs well when the network is deep enough. Anyway, I don't think a staircase schedule will solve the problem. I think the problem is that this model inherently generates a large loss, since the input bbox is an aligned rectangle, which contains a lot of background. I think modifying the data augmentation to make the bbox tight around the words will help.

@tsing-cv

tsing-cv commented May 16, 2018

Has anybody encountered this problem?

Traceback (most recent call last):
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 383, in make_tensor_proto
    _AssertCompatible(values, dtype)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 303, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got 8.0 of type 'float' instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_pixel_link.py", line 294, in <module>
    tf.app.run()
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_pixel_link.py", line 287, in main
    batch_queue = create_dataset_batch_queue(dataset)
  File "train_pixel_link.py", line 153, in create_dataset_batch_queue
    capacity = 500)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 927, in batch
    name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 722, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/home/a??/anaconda2/envs/caffe_tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 519, in _apply_op_helper
    repr(values), type(values).__name__))
TypeError: Expected int32 passed to parameter 'n' of op 'QueueDequeueManyV2', got 8.0 of type 'float' instead.

@GodOfSmallThings @dengdan @lizzyYL

@GodOfSmallThings
Author

@tsing-cv
I didn't encounter the problem above. It looks like a type mismatch.
However, the thing is that you are using Python 3, and the code is based on Python 2. Note that the two Python versions can behave differently. Check your TensorFlow version, too.
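
One Python 2 versus Python 3 difference that would produce a float such as `8.0` is the `/` operator: dividing two integers gives an integer in Python 2 but a float in Python 3. A minimal sketch, assuming the per-GPU batch size is obtained by dividing a total batch size by the number of GPUs (the function and argument names are hypothetical, not taken from the repository):

```python
def batch_size_per_gpu(batch_size, num_gpus):
    """Split the total batch size across GPUs, keeping the result an int."""
    # `//` is floor division and returns an int for int operands in both
    # Python 2 and Python 3, unlike `/` in Python 3.
    return batch_size // num_gpus


true_division = 8 / 1                 # a float (8.0) under Python 3
floor_division = batch_size_per_gpu(8, 1)  # an int (8)
print(type(true_division).__name__, type(floor_division).__name__)
```

An explicit `int(...)` cast on the already-computed value has the same effect when changing the division itself is not convenient.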

@tsing-cv

tsing-cv commented May 18, 2018

Why does it always give these messages during training? Can none of the predicted bboxes be drawn on the picture? @GodOfSmallThings, @dengdan, @BowieHsu
Bounding box (-19,538,78,592) is completely outside the image and will not be drawn.

@GodOfSmallThings
Author

@tsing-cv Sorry, I don't know. I haven't dug into that problem yet.

@small-wong

@tsing-cv #6

@tsing-cv

tsing-cv commented May 18, 2018

@small-wong thank you

@Jyouhou

Jyouhou commented May 22, 2018

Hi, has anyone been able to obtain the 83.7% result?

@small-wong

@tsing-cv I have trained the model to detect Chinese text; you can give it a try~

@cjt222

cjt222 commented May 24, 2018

I am training the model to detect English text in documents. How can I make sure the model has converged? I am trying to fine-tune the model the author provided on my own images; what loss value can it reach? It converges so slowly... @GodOfSmallThings @BowieHsu @dengdan @lizzyYL @tsing-cv

@BowieHsu

@cjt222 You can download the ICDAR accuracy evaluation script to test whether your model has converged.

@tsing-cv

@small-wong Thank you !

@cjt222

cjt222 commented May 25, 2018

How can I train the model from a pretrained VGG model? I downloaded the VGG model from
https://github.com/tensorflow/models/tree/master/research/slim, but I cannot load it; some layers cannot be found. @tsing-cv

@tsing-cv

@cjt222 If you want to use a pretrained model, you need to write your own code to load it.
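
One common reason for "layers cannot be found" is that variable names in the checkpoint do not match the names in the model. The sketch below shows only the name-matching part of such loading code, under the assumption that the two sets of names differ solely by their top-level scope prefix. The helper and both scope names are illustrative placeholders, not taken from this repository.

```python
def build_name_map(checkpoint_names, model_names,
                   ckpt_scope="vgg_16/", model_scope="my_model/"):
    """Map checkpoint variable names to model variable names by swapping
    the scope prefix. Returns (mapping, unmatched checkpoint names)."""
    model_set = set(model_names)
    mapping, unmatched = {}, []
    for name in checkpoint_names:
        if not name.startswith(ckpt_scope):
            continue  # ignore variables outside the scope of interest
        candidate = model_scope + name[len(ckpt_scope):]
        if candidate in model_set:
            mapping[name] = candidate
        else:
            unmatched.append(name)
    return mapping, unmatched


# Hypothetical variable names, for illustration only.
ckpt_vars = ["vgg_16/conv1/conv1_1/weights", "vgg_16/fc8/weights"]
model_vars = ["my_model/conv1/conv1_1/weights"]
mapping, unmatched = build_name_map(ckpt_vars, model_vars)
print(mapping)
print(unmatched)
```

In TensorFlow 1.x, `tf.train.Saver` accepts a `var_list` dictionary that maps checkpoint names to variables, so a mapping like this can be turned into a saver that restores only the matching layers.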

@af258963

@small-wong can you share the model again? the model to detect Chinese text

@aravinthmuthu

@tsing-cv The "Bounding box (-19,538,78,592) is completely outside the image and will not be drawn." message
is caused by data augmentation. The authors randomly crop at scales from 0.1 to 1, so naturally a few GT boxes end up only partially inside the image.
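
To see why a box like the one in the message is reported, here is a minimal sketch of an "entirely outside the image" check. It assumes the tuple is (ymin, xmin, ymax, xmax) in pixels and that the training crop is 512x512; both are assumptions made for illustration, and the function name is hypothetical.

```python
def is_completely_outside(box, height, width):
    """True if the box does not overlap the image at all.

    `box` is assumed to be (ymin, xmin, ymax, xmax) in pixel coordinates.
    """
    ymin, xmin, ymax, xmax = box
    return ymin > height - 1 or ymax < 0 or xmin > width - 1 or xmax < 0


# The box from the message, tested against an assumed 512x512 crop.
print(is_completely_outside((-19, 538, 78, 592), height=512, width=512))
# A box that only sticks out past the top edge still overlaps the image.
print(is_completely_outside((-19, 100, 78, 200), height=512, width=512))
```

Under these assumptions the reported box starts at column 538, beyond the right edge of the crop, so it has no overlap with the image even though part of its row range is valid.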

@jisheng047

@GodOfSmallThings Can you share what the loss was when you reached the 83.7% result? Mine is still around 0.4-0.5, and when I use the model to predict, it predicts empty boxes.

@Lanme

Lanme commented Dec 4, 2019

For the error @tsing-cv posted above:

TypeError: Expected int32 passed to parameter 'n' of op 'QueueDequeueManyV2', got 8.0 of type 'float' instead.

cast the batch size to an integer:

config.batch_size_per_gpu = int(config.batch_size_per_gpu)
