Performance on tilt and perspective texts #8

Closed · Xiangyu-CAS opened this issue Nov 28, 2016 · 23 comments

@Xiangyu-CAS commented Nov 28, 2016

Dear Tianzhi:
I tried your demo and obtained exactly the same result on ICDAR 2013 Challenge 2 as you submitted. It works perfectly! BTW, OpenCV 3 and CUDA 7.5 are compatible with this project.
Now I am trying to test the performance on ICDAR 2015 Challenge 4, which consists of many tilted and perspective texts, but the bounding box returned by your method is a rectangle around the whole text line, instead of separate words represented by 8 coordinates.
Did you submit the rectangle (4 coordinates) of the whole text line in Challenge 4 as you did in Challenge 2? If not, what kind of adjustment was applied? The publication does not mention anything about tilted and perspective texts, so I got a little confused.

Best Regards

@Xiangyu-CAS changed the title from "Performance on dataset (ICDAR 2013 Challenge 2.1)" to "Performance on tilt and perspective texts" on Nov 28, 2016
@junedgar commented

@398766201
Hi, I had trouble when I compiled Caffe with cuDNN 5.0; the problem is described in #9.
Did you have the same problem when you compiled Caffe with cuDNN 5.0?
Thank you!

@Xiangyu-CAS (Author) commented Nov 28, 2016

Yes, I encountered the same problem. As the author said, cuDNN 5.0 is not compatible; this project is based on cuDNN 3.0. So I did not use cuDNN at all; just comment out the USE_CUDNN line in Makefile.config (shown below).
Running without cuDNN will not reduce performance or processing speed; the only cost is considerably more GPU memory.
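
Concretely, the switch in Caffe's Makefile.config looks like this; leaving it commented out builds without cuDNN:

```makefile
# Makefile.config — cuDNN acceleration switch; keep this line commented out
# USE_CUDNN := 1
```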

@junedgar commented

@Xiangyu-CAS
Thank you very much for your answer! 😁

@tianzhi0549 (Owner) commented

@Xiangyu-CAS For ICDAR15, we used a simple sampling strategy that allows the network to output word-level bounding boxes directly: we increased the number of negative samples collected from the spaces between words. For example, we set the ratio of positive samples, negatives from the background, and negatives from between words to (0.5, 0.4, 0.1) in each batch. This encourages the model to output word-level bboxes directly, without further post-processing. In this work we aim to provide a fundamental solution for text detection. We believe the performance on ICDAR15 could be improved considerably by using a more powerful approach for word splitting and by enabling our method to handle multi-oriented texts. Thank you:-).
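
A minimal sketch of that batch composition, assuming precomputed per-anchor labels; the label encoding and all names here are assumptions, not the authors' code:

```python
import numpy as np

def sample_batch(labels, batch_size=128, ratios=(0.5, 0.4, 0.1)):
    """Pick anchor indices so that roughly 50% are positives, 40% are
    background negatives, and 10% are between-word negatives."""
    rng = np.random.default_rng()
    picked = []
    # assumed encoding: 1 = positive (text), 0 = background negative,
    # 2 = negative sampled from the space between words
    for label, ratio in zip((1, 0, 2), ratios):
        idx = np.flatnonzero(labels == label)
        n = min(len(idx), int(round(batch_size * ratio)))
        picked.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(picked)
```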

@Xiangyu-CAS (Author) commented

@tianzhi0549 Thank you very much for your reply, it's so kind of you!

@crazylyf commented Feb 8, 2017

@tianzhi0549 Thanks a lot for your work, and your kind reply!
If I read this right, when two or more tilted lines are close to each other, the word-splitting solution may still produce bboxes containing nearby characters, which will hurt recognition accuracy. Also, Chinese text lines are usually quite long and have no spaces in between, so the bbox will be full of background noise or nearby characters. Is it possible to get tightly bounded bboxes in those cases? Thank you!

@tianzhi0549 (Owner) commented

@crazylyf It is still an open problem to handle these complicated cases perfectly. The method cannot produce bounding polygons, and therefore it cannot fit a text line well if the line is too inclined. If your goal is to detect multi-oriented text, I suggest trying methods that were originally designed for multi-oriented text. Thank you:-).

@crazylyf commented Feb 9, 2017

Got it. Thank you~

@LearnerInGithub commented Mar 29, 2017

@tianzhi0549 Hello! I also faced this problem and am wondering how to sample the space regions between words. Do you collect them by hand-cropping or with some algorithm?

@Xiangyu-CAS (Author) commented

@LearnerInGithub In my implementation, I obtained the space regions algorithmically. If two ground-truth boxes are approximately on the same line (judged by their vertical IoU) and there are no words in the region between them, that region is selected as a space region. I implemented this with two naive nested loops that traverse all the GT boxes.
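
A minimal sketch of that two-loop search, assuming boxes are (xmin, ymin, xmax, ymax) tuples; the vertical-overlap threshold is an assumption, not Xiangyu-CAS's code:

```python
def vertical_iou(a, b):
    """Intersection over union of the vertical (y) extents of two boxes."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0

def space_regions(gt_boxes, v_thresh=0.7):
    """Two naive loops over all GT box pairs, as described above.
    gt_boxes: list of (xmin, ymin, xmax, ymax) tuples.
    Returns the horizontal gaps between boxes that sit on the same line
    and have no other GT box in between."""
    spaces = []
    for i in range(len(gt_boxes)):
        for j in range(i + 1, len(gt_boxes)):
            a, b = gt_boxes[i], gt_boxes[j]
            left, right = (a, b) if a[0] <= b[0] else (b, a)
            if right[0] <= left[2]:            # boxes touch or overlap in x
                continue
            if vertical_iou(a, b) < v_thresh:  # not on the same line
                continue
            gap = (left[2], min(a[1], b[1]), right[0], max(a[3], b[3]))
            # reject the gap if any other GT box intrudes into it
            intruded = any(
                k != i and k != j
                and gt_boxes[k][2] > gap[0] and gt_boxes[k][0] < gap[2]
                and vertical_iou(gt_boxes[k], gap) > 0.0
                for k in range(len(gt_boxes)))
            if not intruded:
                spaces.append(gap)
    return spaces
```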

@LearnerInGithub commented

@Xiangyu-CAS Thank you for sharing your algorithm; it looks very intuitive. I will try to add it to my CTPN code.

@LearnerInGithub commented

@Xiangyu-CAS A new question: how do I feed the picked space regions into the mini-batches? In the original Faster R-CNN implementation, only non-background bboxes are stored in gt_boxes, so how can I add my picked space regions to the input mini-batches?

@Xiangyu-CAS (Author) commented

@LearnerInGithub The same way as the non-background bboxes. First, get the GT boxes of the space regions. Second, label anchors whose overlap with a space GT box is > 0.5 as negative anchors. Third, make space negatives 10% of each batch and background negatives 40%.
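
A hedged sketch of that labeling step, reusing the label encoding from the earlier sketch; `iou_fn` stands in for an overlap routine such as py-faster-rcnn's `bbox_overlaps`, and all names here are assumptions:

```python
def label_space_negatives(labels, anchors, space_boxes, iou_fn, thresh=0.5):
    """Give anchors that overlap any space-region GT box by more than
    `thresh` the distinct negative label 2 (matching the earlier sketch),
    so the batch sampler can enforce the 10% / 40% negative split.
    Anchors already labeled positive (1) are left untouched."""
    if len(space_boxes) == 0:
        return labels
    overlaps = iou_fn(anchors, space_boxes)  # shape: (num_anchors, num_spaces)
    hits = overlaps.max(axis=1) > thresh
    labels[hits & (labels != 1)] = 2
    return labels
```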

@LearnerInGithub commented

@Xiangyu-CAS A question about testing: I tested the CTPN pre-trained model on ICDAR 2013, but it only gives 0.002 AP, so I am very confused. Have you tested your model on ICDAR 2013? Could you give me some advice?

@Xiangyu-CAS (Author) commented Mar 30, 2017

@LearnerInGithub The test code provided by CTPN resizes the image to a fixed size, and the bbox coordinates along with it. Revise demo.py as follows and you will obtain the right bboxes at the original image size:

```python
# resize_im returns the resized image together with the scale factor f
im, f = resize_im(im, cfg.SCALE, cfg.MAX_SCALE)
# ... run detection on the resized image ...
# divide the detected text lines by f to map them back to original coordinates
write_result(RESULT_DIR, im_name, text_lines / f)
```

@LearnerInGithub commented

@Xiangyu-CAS Yes, now the testing result has improved to about 20%, but that is still far from the 88% reported in the paper. What do I need to do to get an 80%+ testing result with the pre-trained model?

@Xiangyu-CAS (Author) commented

@LearnerInGithub That's the only modification I made to the test code, and I got 87.5% directly. I suppose your problem is caused by a bbox coordinate mismatch; maybe you should check that carefully.

@LearnerInGithub commented

@Xiangyu-CAS I switched to testing with CTPN's own test module and now it works fine, but I still have problems with the test module of py-faster-rcnn; maybe they use different evaluation standards.

@LearnerInGithub commented

@Xiangyu-CAS Now I want to train the model on ICDAR 2015. I converted the ICDAR 2015 GT from (x1, y1, x2, y2, x3, y3, x4, y4) to (xmin, ymin, W, H), but the test result of the trained model on ICDAR 2015 is abnormally low, only about 10%. Visualizing the detections, I found a lot of redundant space between the text and the detected bbox. How did you handle the ICDAR 2015 GT so that CTPN could train on it? (My conversion is sketched below.)
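
For reference, the straightforward version of that conversion is the axis-aligned bounding rectangle of the quadrilateral, which for tilted text is exactly what produces loose boxes; this sketch is my reading of the question, not the poster's code:

```python
def quad_to_rect(quad):
    """Convert an ICDAR 2015 quadrilateral (x1, y1, ..., x4, y4) into the
    axis-aligned (xmin, ymin, W, H) that encloses it. For tilted text this
    rectangle is necessarily looser than the quad, which is one source of
    the 'redundant space' described above."""
    xs, ys = quad[0::2], quad[1::2]
    xmin, ymin = min(xs), min(ys)
    return xmin, ymin, max(xs) - xmin, max(ys) - ymin
```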

@Xiangyu-CAS (Author) commented Apr 10, 2017

@LearnerInGithub To be honest, my model failed on ICDAR 2015 too, at only 50%. I got confused by your description "redundant space between the text and detected bbox". Do you mean the bbox detects the target text correctly but fails on accurate localization? You can try dividing each GT quadrilateral into a sequence of small tilted bboxes (see the sketch at the end of this comment). Space samples are going to help too. However, the performance is still far from 60%. I think tianzhi owes us a lot of details.
Faster R-CNN is much more promising than CTPN on ICDAR 2015. A few papers have been released that deal with ICDAR 2015:

- "Arbitrary-Oriented Scene Text Detection via Rotation Proposals"
- "Detecting Oriented Text in Natural Images by Linking Segments"
- "Deep Direct Regression for Multi-Oriented Scene Text Detection" (strongly recommended): it achieved 83% on ICDAR 2015 and 91% on ICDAR 2013, which is state-of-the-art, ranked first on the competition website.
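
A minimal sketch of that GT-division idea, splitting each quadrilateral into narrow upright strips in the spirit of CTPN's 16-pixel-wide proposals; the strip width, corner ordering, and linear edge interpolation are my assumptions, not Xiangyu-CAS's code:

```python
import numpy as np

def quad_to_strips(quad, stride=16):
    """quad: (x1, y1, x2, y2, x3, y3, x4, y4), corners clockwise from
    top-left, with the text running left to right. Returns a list of
    (xmin, ymin, xmax, ymax) strips of width `stride` whose tops and
    bottoms follow the tilted edges, so the sequence traces the inclined
    line far more tightly than one big rectangle."""
    x1, y1, x2, y2, x3, y3, x4, y4 = quad
    xmin, xmax = min(x1, x4), max(x2, x3)
    strips = []
    for x0 in np.arange(xmin, xmax, stride):
        xa, xb = x0, min(x0 + stride, xmax)
        # linearly interpolate the top edge (p1 -> p2) and the bottom
        # edge (p4 -> p3) at the strip borders
        top = np.interp([xa, xb], [x1, x2], [y1, y2])
        bot = np.interp([xa, xb], [x4, x3], [y4, y3])
        strips.append((xa, float(top.min()), xb, float(bot.max())))
    return strips
```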

@LearnerInGithub commented

@Xiangyu-CAS Yes, that's what I mean. I looked through the detection results one by one; the detected bboxes are too large, even though they do contain the text region.

@LearnerInGithub commented

@Xiangyu-CAS I have downloaded the paper and roughly looked through it; the results really do seem good! But I also noticed that a team from CASIA called NLPR_CASIA got 82.76% / 84.76% / 83.75%, which is now No. 1. I am not sure whether the paper "Deep Direct Regression for Multi-Oriented Scene Text Detection" is their work...

@Xiangyu-CAS (Author) commented

@LearnerInGithub As I mentioned, you probably trained your CTPN model on horizontal bbox sequences, so you obtained detection results as horizontal bbox sequences. BTW, the proposal-connection function should be revised to output tilted rectangles.

That paper is the publication of NLPR_CASIA; you can check the authors' organization.
