Implementation detail of training code #10
@Xiangyu-CAS Thank you for your interest.
Thank you :-).
Thank you! I figured it out! It isn't a problem with the LSTM; the LSTM works well after 40k iterations with a training set of 20k images. It's the RCNN that is responsible. It seems the RCNN weakens the sequential information (the input of the RCNN is conv5_3 instead of the LSTM output). After discarding the RCNN, I got a result that looks reasonable. Compared to Faster R-CNN, CTPN discards the RCNN stage; did you ever test the performance of (CTPN + RCNN)? BTW, with your help I trained my own CTPN model! From the results it looks better than Faster R-CNN, but still worse than yours. The evaluation result on ICDAR2013 is (R = 81.08%, P = 81.61%, F = 81.34%), compared to my own Faster R-CNN model, which ended up with (R = 75.21%, P = 85.85%, F = 80.18%). I have never seen anyone as patient as you in teaching a stranger! ^_^ Thank you again for your kindness.
@Xiangyu-CAS Thank you. We didn't try adding Fast R-CNN behind CTPN because of the time and memory cost. Also, I'm very glad you got such good results.
@Xiangyu-CAS
@Jayhello The implementation is based on the original py-faster-rcnn by Ross Girshick, and the training set is ICDAR2013 plus 20,000 labeled pictures, transformed into VOC format. The only difference is that I divided each text-line bounding-box GT into fixed-width (16-pixel) pieces, just like Figure 2 (right) in the CTPN paper.
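A minimal sketch of that fixed-width division, assuming `(x1, y1, x2, y2)` boxes and the 16-px anchor stride (the helper name is hypothetical, not from the actual training code):

```python
import numpy as np

def split_textline_gt(box, stride=16):
    """Split one text-line GT box (x1, y1, x2, y2) into fixed-width
    vertical slices, as in Figure 2 (right) of the CTPN paper.
    Slice edges are snapped to the anchor-stride grid."""
    x1, y1, x2, y2 = box
    start = int(np.floor(x1 / stride)) * stride  # grid-aligned left edge
    end = int(np.ceil(x2 / stride)) * stride     # grid-aligned right edge
    return [(sx, y1, sx + stride - 1, y2) for sx in range(start, end, stride)]

# a text line spanning x = 10..60 covers four 16-px grid cells
print(split_textline_gt((10, 20, 60, 40)))
```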
Hi @Xiangyu-CAS, do you do the offset regression in your code?
@Xiangyu-CAS
@melody-rain Offset regression? Do you mean side refinement? No, I did not. In CTPN, the LSTM tends to group words together, but you can overcome this by adding the spaces between words as negative GT.
Hi @Xiangyu-CAS,
Thank you in advance.
@leftstone2015 The parameters, including images per batch and the bg/fg numbers, are not the principal thing that matters, I suppose. All these parameters follow the default values in py-faster-rcnn.
@leftstone2015 I am also unable to get this to converge. If I train only the class scores, they tend to converge, but once the coordinates are added the network just won't converge at any learning rate.
@Xiangyu-CAS Hi, Xiangyu! I have a question: I added a regression layer for the vertical coordinates (ctr_y, height), but the deploy file outputs normal 4-element bbox targets, so I am confused about how to handle this inconsistency... Thank you in advance!
@LearnerInGithub The regression layer applies the offsets (ctr_y, height) to the generated anchors (x, y, h, w) instead of outputting bbox coordinates directly. Even if you don't use the regression layer, you can take the anchors as bbox targets directly.
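In other words, the two predicted offsets only shift and rescale each anchor vertically; the x-coordinates are left alone because every anchor already has the fixed 16-px width. A sketch, assuming the standard py-faster-rcnn delta parameterization restricted to the vertical axis (function name hypothetical):

```python
import numpy as np

def apply_vertical_deltas(anchors, deltas):
    """Apply CTPN-style vertical offsets (d_cy, d_h) to anchors.
    anchors: (N, 4) array of (x1, y1, x2, y2); x1/x2 are untouched.
    deltas:  (N, 2) where d_cy = (gt_cy - a_cy) / a_h, d_h = log(gt_h / a_h)."""
    heights = anchors[:, 3] - anchors[:, 1] + 1.0
    ctr_y = anchors[:, 1] + 0.5 * (heights - 1.0)
    pred_cy = deltas[:, 0] * heights + ctr_y   # shift the vertical center
    pred_h = np.exp(deltas[:, 1]) * heights    # rescale the height
    out = anchors.astype(np.float64).copy()
    out[:, 1] = pred_cy - 0.5 * (pred_h - 1.0)
    out[:, 3] = pred_cy + 0.5 * (pred_h - 1.0)
    return out

anchor = np.array([[0.0, 0.0, 15.0, 15.0]])
print(apply_vertical_deltas(anchor, np.array([[0.0, 0.0]])))  # zero deltas: unchanged
```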
@Xiangyu-CAS Thanks for your explanation. I looked at the network architecture file again, and yes, this is not the problem; I think I need to change something elsewhere. Much appreciated!
@Xiangyu-CAS Did you change the rpn-data layer? Because of the difference in anchor generation and the new output size, I changed the rpn-data layer from py-faster-rcnn. Unfortunately, I find it hard to converge. I'd appreciate your reply!
@Xiangyu-CAS Another question: how can we test our implemented CTPN on the test set of a dataset like ICDAR2013 or ICDAR2015? The test.py in the faster_rcnn module seems geared toward testing a trained fast_rcnn model, so how should I modify faster_rcnn_test.pt to make testing run normally?
@willyangyang rpn-data layer? Do you mean the roi-data layer, perhaps? I don't know what's wrong with your code, but I did change the roi-data layer. Two kinds of modification are applied: (1) divide the GT into vertical GT; (2) revise the anchors to vertical anchors. I guess you did not divide the GT into fixed-width vertical GT.
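For modification (2), the CTPN paper uses 10 vertical anchors per location, all with the fixed 16-px width and heights growing from 11 to roughly 273 px (each height is the previous one divided by 0.7). A sketch of that anchor set; exact rounding may differ from any particular implementation:

```python
import numpy as np

def generate_vertical_anchors(base_size=16, n=10):
    """Generate CTPN-style vertical anchors at one location: fixed
    16-px width, 10 heights from 11 px up, each 1/0.7x the previous
    (per the CTPN paper)."""
    heights = [11.0]
    for _ in range(n - 1):
        heights.append(heights[-1] / 0.7)
    ctr = (base_size - 1) / 2.0  # vertical center of the base cell
    anchors = [[0.0, ctr - (h - 1) / 2.0, base_size - 1.0, ctr + (h - 1) / 2.0]
               for h in heights]
    return np.round(np.array(anchors))

a = generate_vertical_anchors()
print(a.shape)  # 10 anchors, all exactly 16 px wide
```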
@LearnerInGithub That's easy. In test.py, faster-rcnn obtains the ROI scores from the blob "cls_prob" and applies BBOX_REG to the boxes. You can get the RPN output directly from the blobs "roi_scores" and "rois" without BBOX_REG. CTPN is essentially a novel RPN, so you can get the CTPN result this way. Additionally, what you get is a sequence of unconnected vertical proposals; you need to connect these into complete text proposals. The connection function is provided in Tianzhi's test code.
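A sketch of that idea: read the raw proposals and scores straight from the network instead of going through the Fast R-CNN head. The blob names follow py-faster-rcnn's proposal layer (and, as noted below, 'roi_scores' is commented out by default in the prototxt); the helper name and threshold are illustrative:

```python
import numpy as np

def get_ctpn_proposals(net, score_thresh=0.7):
    """Pull raw RPN/CTPN proposals out of a trained net at test time,
    skipping 'cls_prob' and BBOX_REG entirely. The result is a set of
    unconnected vertical proposals that still needs the text-line
    connection step from Tianzhi's test code."""
    rois = net.blobs['rois'].data          # (N, 5): batch_idx, x1, y1, x2, y2
    scores = net.blobs['roi_scores'].data  # (N, 1) objectness scores
    keep = scores.ravel() >= score_thresh
    # strip the leading batch index before returning the boxes
    return rois[keep, 1:5], scores.ravel()[keep]
```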
@Xiangyu-CAS I can't access the blob named 'roi_scores' in test.py; the program stops at the line scores = blobs_out['roi_scores'] and gives me an error on scores = blobs_out['roi_scores'].
@Xiangyu-CAS And I also can't find the blob named "cls_prob"; the error is on scores = blobs_out['cls_prob'].
@Xiangyu-CAS @tianzhi0549 I want to ask how to test the pre-trained model on ICDAR2013. CTPN always seems to output text-line bboxes, but ICDAR2013 only marks word regions. Could we improve the detection level to words instead of text lines?
@LearnerInGithub It is an innate defect of CTPN, I suppose. The RNN tends to connect words together. You can add spaces as negative samples to improve it, as Tianzhi said.
@LearnerInGithub Blob can't be found: you can refer to the model file to find that blob. 'roi_scores' is in the proposal_layer, commented out by default.
@Xiangyu-CAS Have you ever changed SmoothL1Loss or SoftmaxWithLoss? I find it hard to converge.
@Xiangyu-CAS Thank you very much for your suggestions and answers!
@willyangyang You don't need to change any loss function. SmoothL1Loss and SoftmaxWithLoss are fundamental loss functions. Some works change SmoothL1Loss to SmoothL2Loss or the so-called 'SmoothLn' to converge in a better way. However, when you change your learning goal, you don't need to change these functions. For example, you don't need to change the SGD algorithm when you train a recognition task or a segmentation task.
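For reference, the elementwise computation behind py-faster-rcnn's SmoothL1Loss layer (with its sigma weighting) can be sketched like this; the function name is illustrative:

```python
import numpy as np

def smooth_l1(x, sigma=1.0):
    """Smooth L1 as in py-faster-rcnn's SmoothL1Loss layer:
    0.5 * (sigma * x)^2   if |x| < 1 / sigma^2
    |x| - 0.5 / sigma^2   otherwise
    Quadratic near zero, linear for large residuals (robust to outliers)."""
    x = np.asarray(x, dtype=np.float64)
    s2 = sigma ** 2
    in_quad = np.abs(x) < 1.0 / s2
    return np.where(in_quad, 0.5 * s2 * x * x, np.abs(x) - 0.5 / s2)

print(smooth_l1([0.5, 2.0]))  # one value in the quadratic region, one in the linear region
```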
@Xiangyu-CAS My rpn-score loss fluctuates around 0.3 after 30k iterations. I chose COCO-Text for my training data; is there a better choice?
@willyangyang Indeed, my score loss also hovers around 0.3 and the SmoothL1 loss around ~0.05 on MS-COCO, even on a very small dataset I was trying to overfit just to test it. @Xiangyu-CAS any help or suggestions?
@willyangyang @thisismohitgupta We trained our model on a private dataset consisting of focused scene-text images similar to ICDAR2013. I suggest you train your model on the ICDAR2017 MLT dataset; MLT consists of focused Latin-script languages including Italian, French, and German, and we provide 1,000 images for each of them. Some of them have been published, and the rest will come soon. You can find it on the competition website.
@willyangyang I just got it working last night. I changed the weight-initialization method originally used in the paper, used a higher learning rate of 0.1, and added per-channel mean subtraction. I didn't test it on the whole dataset, just on a small subset, so there's still work to be done.
@thisismohitgupta How many iterations did you train before convergence was reached?
@Xiangyu-CAS About the GT calculation: I iterate over each GT annotation, which I've already sliced to 16-px width, and then calculate the IoU of the sliced GT and the proposal box. If it's greater than 0.7, I mark the score as 1 for that particular proposal. The loop goes on, and the proposals that do not exceed the 0.7 threshold remain zero.
@thisismohitgupta From your description it seems right, and that's what py-faster-rcnn does, except for dividing the flexible-width GT into fixed-width (16-px) GT.
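The labeling loop described above can be sketched as follows, assuming `(x1, y1, x2, y2)` boxes; the helper names are hypothetical:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def label_proposals(proposals, gt_slices, pos_thresh=0.7):
    """Mark a proposal's objectness label as 1 if its IoU with any
    16-px GT slice exceeds pos_thresh; all others stay 0."""
    labels = np.zeros(len(proposals), dtype=np.int32)
    for i, p in enumerate(proposals):
        if any(iou(p, g) > pos_thresh for g in gt_slices):
            labels[i] = 1
    return labels
```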
@Xiangyu-CAS You were right, MS-COCO Text won't work here because more refined word-level annotations are needed. That's the reason I couldn't get past 50% accuracy in the score predictions, which is even worse than random, so I am trying your suggested MLT dataset.
Hi, Tianzhi, may I have your email address? I have some details to discuss with you. Sincerely waiting for your response. Thank you.
@thisismohitgupta @Xiangyu-CAS For training purposes, can the proposals produced by the class AnchorText (locate_anchors) be used? If not, am I supposed to take the image, feed it to py-faster-rcnn, get the RPN proposals, and label them by IoU with the GT? Would that be my ground truth for training?
Hi @Xiangyu-CAS. What if I want to group the words of a sentence together? When there is extra space, this CTPN implementation tends to separate the sections. Even when I trained it on a custom dataset, CTPN predicts separate boxes for words some spaces apart. Is there a parameter I can tune to avoid this, i.e. to group words on a straight line into one box? Thanks a lot for the help!
@tonmoyborah The best way is to re-train the model with annotations labeled at the sentence level rather than the word level. What you get from the NN depends heavily on what the GT looks like. If your GT is labeled at the sentence level, then you get sentence-level predictions.
Hi, Tianzhi,
I tried to implement the CTPN training code on the framework of py-faster-rcnn (by RBG), but the results were different from yours (worse, of course).