
Implementation detail of training code #10

Closed
Xiangyu-CAS opened this issue Dec 5, 2016 · 38 comments

@Xiangyu-CAS

Hi, tianzhi,
I tried to implement the CTPN training code on the py-faster-rcnn framework (by RBG), but my results were different from yours (and of course worse).

  1. Loss function. Did you modify the loss functions (e.g. SmoothL1Loss) in the training code?
  2. Heights of vertical proposals in a text line. A complete text line consists of several vertical anchors in sequence. In your implementation their heights vary only slightly, but in my implementation they vary enormously. Sometimes a proposal fits tightly to the boundary of a single character, so if the heights of the characters within a text line differ greatly, the heights of the proposals differ too. So my question is: how did you make the heights and y coordinates of a proposal sequence uniform? Via the LSTM, or some other change in a Python layer? If the answer is the LSTM, does that mean the LSTM is not working in my implementation? FTPN (CTPN with no RNN) seems to have the same problem.
@tianzhi0549
Owner

tianzhi0549 commented Dec 5, 2016

@Xiangyu-CAS Thank you for your interest.

  1. We didn't modify the loss functions.
  2. Yes, the LSTM can use context information to align these text proposals. Maybe your network simply isn't trained sufficiently.

Thank you:-).

@Xiangyu-CAS
Author

Xiangyu-CAS commented Dec 7, 2016

Thank you! I figured it out! It's not a problem with the LSTM; the LSTM works well after 40k iterations on a training set of 20k images. It's the RCNN stage that is responsible. It seems that RCNN weakened the sequential information (the input of RCNN is conv5_3 instead of the LSTM output). After discarding RCNN, I got results that look reasonable. Compared with Faster R-CNN, CTPN discards the RCNN stage; did you ever test the performance of CTPN + RCNN?

BTW, with your help, I trained my own CTPN model! From the results, it looks better than Faster R-CNN, but still worse than yours. The evaluation result on ICDAR2013 is (R = 81.08%, P = 81.61%, F = 81.34%), compared with my own Faster R-CNN model, which ended up with (R = 75.21%, P = 85.85%, F = 80.18%).
Of course, some problems still exist. First, the bounding boxes are larger than the GT, especially for small text lines. Second, the bounding boxes tend to extend past the beginning and end of the text line most of the time. I think this might be a fault in the training set; maybe the GT boxes were not labeled carefully enough.

I have never seen anyone as patient as you in teaching a stranger! ^_^ Thank you again for your kindness.

@tianzhi0549
Owner

@Xiangyu-CAS Thank you. We didn't try adding Fast RCNN behind CTPN because of the time and memory costs. Also, I'm very glad that you got these good results.

@Jayhello

@Xiangyu-CAS
Can you give an example of the images you train on, and the box config file?
Thank you very much.

@Xiangyu-CAS
Author

Xiangyu-CAS commented Feb 9, 2017

@Jayhello The implementation is based on the original py-faster-rcnn by Ross Girshick, and the training set is ICDAR2013 plus 20,000 labeled pictures, transformed into VOC format. The only difference is that I divided each text-line bounding-box GT into fixed-width (16-pixel) segments, just like Figure 2 (right) in the CTPN paper.
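For reference, a minimal sketch of how that split could be done; the function name `split_gt_box` and the snapping of segment borders to the 16-px anchor grid are my own assumptions, not the actual training code:

```python
import numpy as np

def split_gt_box(box, stride=16):
    """Split one text-line GT box (x1, y1, x2, y2) into fixed-width
    vertical segments, as in Figure 2 (right) of the CTPN paper."""
    x1, y1, x2, y2 = box
    # Align the split points with the 16-px grid the anchors live on.
    start = int(np.floor(x1 / stride)) * stride
    end = int(np.ceil((x2 + 1) / stride)) * stride
    segments = []
    for left in range(start, end, stride):
        seg_x1 = max(x1, left)                 # clip first segment to the GT
        seg_x2 = min(x2, left + stride - 1)    # clip last segment to the GT
        segments.append((seg_x1, y1, seg_x2, y2))
    return segments
```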

@melody-rain

Hi @Xiangyu-CAS

Do you do the offset regression in your code?

@melody-rain

@Xiangyu-CAS
Also, in the ICDAR datasets the GT is labeled word by word, but with CTPN the words in a row are grouped together. How do you separate the words in a group?

@Xiangyu-CAS
Author

Xiangyu-CAS commented Mar 1, 2017

@melody-rain Offset regression? Do you mean side refinement? No, I did not. In CTPN, the LSTM tends to group words together, but you can overcome this by adding the spaces between words as negative GT.
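In case it helps, one possible reading of "space as negative GT", assuming word-level boxes as (x1, y1, x2, y2) tuples; `gap_boxes` is a hypothetical helper, not code from this repo:

```python
def gap_boxes(word_boxes):
    """Build the inter-word gap boxes on one text line so they can be
    fed to the sampler as explicit negatives."""
    word_boxes = sorted(word_boxes, key=lambda b: b[0])  # left to right
    gaps = []
    for a, b in zip(word_boxes, word_boxes[1:]):
        if b[0] > a[2] + 1:  # a visible gap between consecutive words
            gaps.append((a[2] + 1, min(a[1], b[1]),
                         b[0] - 1, max(a[3], b[3])))
    return gaps
```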

@leftstone2015

hi, @Xiangyu-CAS
Thank you for sharing your implementation in detail. I want to ask two questions:

  1. How many images are used for training in one batch? RPN uses 1 image per batch, am I right?
  2. Negative samples are randomly selected so that the sum of num_bg and num_fg equals 128.
    I think the strategy for selecting negative and positive samples in SSD may be better. Are you following the CTPN paper or something else?

Thank you in advance.

@Xiangyu-CAS
Author

@leftstone2015 The parameters, including images per batch and the bg/fg counts, are not the principal things that matter, I suppose. All of these parameters follow the default values in py-faster-rcnn.
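For reference, these are the relevant sampling defaults from py-faster-rcnn's lib/fast_rcnn/config.py as I remember them; values may differ slightly between forks, so treat this as a sketch:

```python
from easydict import EasyDict as edict

__C = edict(); __C.TRAIN = edict()
__C.TRAIN.IMS_PER_BATCH = 2       # end-to-end RPN configs override this to 1
__C.TRAIN.BATCH_SIZE = 128        # ROIs per image for the Fast R-CNN head
__C.TRAIN.FG_FRACTION = 0.25      # at most 25% of those ROIs are foreground
__C.TRAIN.RPN_BATCHSIZE = 256     # anchors sampled per image for the RPN loss
__C.TRAIN.RPN_FG_FRACTION = 0.5   # at most half of the sampled anchors positive
```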

@thisismohitgupta

thisismohitgupta commented Mar 17, 2017

@leftstone2015 I too am unable to get this to converge. If I train only the class scores, they tend to converge, but when the coordinate regression is added, the network just won't converge at any learning rate.

@LearnerInGithub

@Xiangyu-CAS Hi, Xiangyu! I have a question: I added a regression layer for the vertical coordinates (cter_y, height), but the deploy file outputs normal 4-element bbox targets, so I am confused about how to handle this inconsistency... Thank you in advance!

@Xiangyu-CAS
Author

@LearnerInGithub The regression layer applies the offsets (cter_y, height) to the generated anchors (x, y, h, w) instead of outputting bbox coordinates directly. Even if you don't use the regression layer, you can use the anchors directly as bbox targets.
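To make this concrete, a minimal decoding sketch following the relative vertical targets defined in the CTPN paper (v_c = (c_y − c_y^a)/h^a, v_h = log(h/h^a)); the function name and array layout are my assumptions:

```python
import numpy as np

def apply_vertical_deltas(anchors, deltas):
    """anchors: (N, 4) boxes (x1, y1, x2, y2); deltas: (N, 2) as (v_c, v_h).
    Only the y-center and height are regressed; anchor width stays fixed."""
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_y = anchors[:, 1] + 0.5 * (heights - 1.0)

    pred_ctr_y = deltas[:, 0] * heights + ctr_y   # c_y = v_c * h_a + c_y^a
    pred_h = np.exp(deltas[:, 1]) * heights       # h   = exp(v_h) * h_a

    boxes = anchors.astype(np.float64)
    boxes[:, 1] = pred_ctr_y - 0.5 * (pred_h - 1.0)
    boxes[:, 3] = pred_ctr_y + 0.5 * (pred_h - 1.0)
    return boxes
```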

@LearnerInGithub

@Xiangyu-CAS Thanks for your explanation. I looked at the network architecture file again, and yes, this is not the problem; I think I need to change something else. Much appreciated!

@willyangyang

@Xiangyu-CAS Did you change the rpn-data layers? Because of the difference in anchor generation and the new output size, I changed the rpn-data layers from py-faster-rcnn. Unfortunately, I find it hard to converge. I'd appreciate your reply!

@LearnerInGithub

@Xiangyu-CAS Another question: how can we test our implemented CTPN on the test set of a dataset like ICDAR2013 or ICDAR2015? The test.py in the faster_rcnn module seems designed to test a trained fast_rcnn model, so how should I modify faster_rcnn_test.pt to make testing run normally?

@Xiangyu-CAS
Author

@willyangyang rpn-data layer? Perhaps you mean the roi-data layer? I don't know what's wrong with your code, but I did change the roi-data layer. Two kinds of modifications are applied: (1) divide the GT into vertical GT segments; (2) revise the anchors into vertical anchors. I guess you did not divide the GT into fixed-width vertical GT.
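For modification (2), a minimal sketch of fixed-width vertical anchors, following the CTPN paper's k = 10 anchors whose heights grow from 11 to roughly 273 px (divided by 0.7 each step); the function name and rounding are my assumptions:

```python
import numpy as np

def generate_vertical_anchors(base_size=16, k=10, min_height=11.0):
    """k anchors per location, all base_size px wide, heights growing
    geometrically; returned as (x1, y1, x2, y2) centered on a
    base_size x base_size cell."""
    heights, h = [], min_height
    for _ in range(k):
        heights.append(round(h))
        h /= 0.7                         # 11, 16, 23, ..., ~273
    ctr = (base_size - 1) / 2.0
    return np.array([[0, ctr - (hh - 1) / 2.0,
                      base_size - 1, ctr + (hh - 1) / 2.0]
                     for hh in heights])
```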

@Xiangyu-CAS
Author

Xiangyu-CAS commented Mar 28, 2017

@LearnerInGithub That's easy. In test.py, faster-rcnn obtains the ROI scores from the blob "cls_prob" and applies BBOX_REG to the boxes. You can get the RPN output directly from the blobs "rois" and "roi_scores", without BBOX_REG. CTPN is essentially a novel RPN, so you can get the CTPN result this way. Additionally, the result you get is a sequence of vertical proposals without connection; you need to connect them into complete text proposals. The connection function is provided in Tianzhi's test code.
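A minimal sketch of that change, assuming py-faster-rcnn's standard blob names and that the proposal_layer has been edited to expose 'roi_scores' (see the following comments); `ctpn_detect` is a hypothetical helper, not the actual test.py:

```python
def ctpn_detect(net, blobs):
    """Run a forward pass and return the raw RPN/CTPN proposals,
    skipping the Fast R-CNN head and BBOX_REG entirely."""
    blobs_out = net.forward(**blobs)
    rois = net.blobs['rois'].data.copy()   # (N, 5): batch_idx, x1, y1, x2, y2
    scores = blobs_out['roi_scores']       # objectness of each vertical proposal
    proposals = rois[:, 1:5]               # drop the batch index
    # These are unconnected vertical proposals; they still need to be
    # linked into full text lines by the connection step.
    return proposals, scores
```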

@LearnerInGithub

@Xiangyu-CAS I can't access the blob named 'roi_scores' in test.py; the program stops at the line `scores = blobs_out['roi_scores']` and gives me:
KeyError: 'roi_scores'

@LearnerInGithub

@Xiangyu-CAS I also can't find the blob named 'cls_prob'; `scores = blobs_out['cls_prob']` raises:
KeyError: 'cls_prob'

@LearnerInGithub

@Xiangyu-CAS @tianzhi0549 I want to ask how to test the pre-trained model on ICDAR2013. CTPN seems to always output one bbox per line of text, but ICDAR2013 marks only the word regions. Could we change the detection level to words instead of text lines?

@Xiangyu-CAS
Author

@LearnerInGithub It is an innate defect of CTPN, I suppose. The RNN tends to connect words together. You can add spaces as negative samples to improve it, as tianzhi said.

@Xiangyu-CAS
Author

@LearnerInGithub On the blobs that can't be found: refer to the model file to find those blobs. 'roi_scores' is in the proposal_layer and is commented out by default.

@willyangyang

@Xiangyu-CAS Have you ever changed SmoothL1Loss or SoftmaxWithLoss? I find it hard to converge.

@LearnerInGithub

@Xiangyu-CAS Thank you very much for your suggestions and answers!

@Xiangyu-CAS
Author

@willyangyang You don't need to change any loss function. SmoothL1Loss and SoftmaxWithLoss are fundamental loss functions. Some works change SmoothL1Loss to SmoothL2Loss, or the so-called 'SmoothLn', to converge in a better way. However, when you change your learning goal, you don't need to change these functions. For example, you don't need to change the SGD algorithm when you train a recognition task or a segmentation task.
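For reference, the elementwise SmoothL1 term as implemented in py-faster-rcnn's SmoothL1Loss layer, transcribed to NumPy for illustration:

```python
import numpy as np

def smooth_l1(x, sigma=1.0):
    """0.5 * (sigma * x)^2    if |x| < 1 / sigma^2
       |x| - 0.5 / sigma^2    otherwise"""
    s2 = sigma ** 2
    return np.where(np.abs(x) < 1.0 / s2,
                    0.5 * s2 * x ** 2,
                    np.abs(x) - 0.5 / s2)
```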

@willyangyang

@Xiangyu-CAS My rpn-score loss fluctuates around 0.3 after 30k iterations. I chose COCO-Text as my training data; is there a better choice?

@thisismohitgupta

@willyangyang Indeed, my score loss also hovers around 0.3, and the SmoothL1 loss around ~0.05, on MS-COCO. I even tried to overfit a very small dataset just to test it. @Xiangyu-CAS any help or suggestions?

@Xiangyu-CAS
Author

Xiangyu-CAS commented Mar 30, 2017

@willyangyang @thisismohitgupta We trained our model on a private dataset consisting of focused scene-text images similar to ICDAR2013. I suggest you train your model on the ICDAR2017 MLT dataset; MLT contains focused Latin-script languages including Italian, French, and German, and we provide 1,000 images for each of them. Some of them have already been published, and the rest will come soon. You can find it on the competition website.

@thisismohitgupta

@willyangyang I just got it working last night. I changed the weight-initialization distribution originally used in the paper, used a higher learning rate of 0.1, and added per-channel mean subtraction. I didn't test it on the whole dataset, just a small subset, so there's still work to be done.

@willyangyang

@thisismohitgupta How many iterations did you train before it converged?

@thisismohitgupta

@Xiangyu-CAS About the GT calculation:
I start by defining a numpy array of zeros with shape (number of proposals,), so every score is 0 for now.

Then I iterate over the GT annotations, which I've already sliced into 16-px segments, and calculate the IoU of each sliced GT and proposal box.

If it is greater than 0.7, I mark the score as 1 for that particular proposal.

The loop goes on, and the proposals that never exceed 0.7 stay at zero.
Am I doing it right?
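Roughly like this minimal sketch (the names and the IoU helper are illustrative, not my actual code):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

def label_proposals(proposals, gt_slices, pos_thresh=0.7):
    """Score 1 for any proposal overlapping some 16-px GT slice with
    IoU > pos_thresh, else 0. (py-faster-rcnn additionally keeps the
    highest-IoU anchor per GT and marks low-IoU anchors as negatives.)"""
    scores = np.zeros(len(proposals))
    for gt in gt_slices:
        overlaps = np.array([iou(p, gt) for p in proposals])
        scores[overlaps > pos_thresh] = 1
    return scores
```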

@Xiangyu-CAS
Author

@thisismohitgupta From your description it seems right, and that's what py-faster-rcnn does, except for dividing the variable-width GT into fixed-width (16-px) GT.

@thisismohitgupta

@Xiangyu-CAS You were right, MS-COCO Text won't work here because more refined word-level annotations are needed. That's why I couldn't get past 50% accuracy in the score predictions, which is even worse than random, so I am trying your suggested MLT dataset.

@athmey

athmey commented Jun 19, 2017

Hi, tianzhi,

May I have your email address?

I have something I'd like to discuss with you in detail.

Sincerely waiting for your response.

Thank you

@AnshulBasia

AnshulBasia commented Jul 27, 2017

@thisismohitgupta @Xiangyu-CAS For training purposes, can the proposals produced by the AnchorText class (locate_anchors) be used? If not, am I supposed to take the image, feed it to py-faster-rcnn, get those RPN proposals, and label them by IoU with the GT? Would that be my ground truth for training?

@tonmoyborah

Hi @Xiangyu-CAS. What if I want to group the words of a sentence together? If there is extra space, this CTPN implementation tends to separate the sections. Even when I trained it on a custom dataset, CTPN predicts separate boxes for words that are a few spaces apart. Is there a parameter I can tune to avoid this, i.e. to group words on a straight line together into one box? Thanks a lot for the help!

@Xiangyu-CAS
Author

Xiangyu-CAS commented Mar 31, 2018

@tonmoyborah The best way is to re-train this model with annotations labeled at the sentence level rather than the word level. What you get from the NN depends heavily on what the GT looks like. If your GT is labeled at the sentence level, then you get sentence-level predictions.
