issues about gen_gts_layer #42

Open
chunhui999 opened this issue Mar 21, 2019 · 13 comments

Comments

@chunhui999

Q1: In train.pt, "gt_bbox" is annotated as "N * 8 ### grounding truth boxes for text (for computing loss)",
but in the class gen_gts_layer in tool_layers.py it is annotated as "bottom[0]: gt_label [N,1,sz,sz]".
What does gt_bbox mean?
Q2: Could you please give an intuitive explanation of the following variables?
'sample_gt_cont'
'sample_gt_label_input'
'sample_gt_label_output'

@chunhui999
Author

@tonghe90 I have another question. If I use ICDAR2015, how do I generate the data for "mask_gt" and "mask_iou_angle"? Looking forward to your reply.

@chunhui999
Author

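@tonghe90 After reading the code, I found that the text labels for the recognition branch are also stored in gt_bbox. For a given gt_bbox, the first 8 elements are the bbox coordinates, the 9th element is the length of the text label, and the label_len elements starting from the 10th are the text label itself. The complete label text is split into individual elements here: what is their type, and how are they converted?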
layer {
  name: 'iou_maps_angles'
  type: 'Python'
  bottom: 'gt_bbox'
  top: 'rois'
  top: 'sample_gt_cont'
  top: 'sample_gt_label_input'
  top: "sample_gt_label_output"
  ......
}

@crazysal

mask_gt is generated only for datasets that have character-level annotation, i.e. SynthText. See section 2.3 of the paper for the training strategy.

mask_iou_angle is generated from the output of the EAST proposals in the RBOX case (rotated rectangle bounding box). The EAST output is the distances from each pixel to the sides of the quadrilateral, plus the angle, in 5 channels.

sample_gt_cont is a vector with the same shape as the gt labels, filled with zeros and ones. It controls the continuity of the LSTM hidden state: the hidden state is multiplied by 0 when prediction starts for a new box, and the remaining values are 1.

sample_gt_label_input: one-hot encoding or character embedding of each label from the ground truth. Its shape is also used to pad the sequence to the maximum length when the label is shorter than 25.

sample_gt_label_output: similar to the above, but used at inference time. It keeps track of how many decoder samples to predict, as fed into the previous input.

Please correct me if I'm wrong.
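
For readers trying to picture these three blobs, here is a minimal sketch of one common way such decoder inputs are built for a Caffe-style LSTM. It is not taken from the repository: the function name `build_decoder_blobs`, the start/end/padding ids, and the choice to set the continuation flag to 0 on padded steps are all assumptions made for illustration. Only the maximum length of 25 and the "0 at the start of a new box, 1 afterwards" rule come from the comment above.

```python
def build_decoder_blobs(labels, max_len=25, start_id=0, end_id=0,
                        pad_in_id=0, ignore_id=-1):
    """Build per-box decoder sequences (hypothetical helper, not the repo's code).

    labels: list of lists of integer character ids, one list per text box.
    Returns (cont, label_in, label_out), each a list of rows of length max_len.
    """
    cont, label_in, label_out = [], [], []
    for ids in labels:
        ids = list(ids)[: max_len - 1]      # leave room for the end token
        steps = len(ids) + 1                # real time steps, end token included
        pad = max_len - steps
        # 0 resets the hidden state at the first step of a box, 1 carries it on.
        # Assumption: padded steps are also marked 0.
        cont.append([0] + [1] * (steps - 1) + [0] * pad)
        # Decoder input: start token followed by the characters.
        label_in.append([start_id] + ids + [pad_in_id] * pad)
        # Decoder target: the characters followed by the end token.
        # Assumption: padded targets use an id that the loss ignores.
        label_out.append(ids + [end_id] + [ignore_id] * pad)
    return cont, label_in, label_out


cont, label_in, label_out = build_decoder_blobs([[5, 6, 7]], max_len=6)
print(cont[0])       # [0, 1, 1, 1, 0, 0]
print(label_in[0])   # [0, 5, 6, 7, 0, 0]
print(label_out[0])  # [5, 6, 7, 0, -1, -1]
```

In this convention the input and output sequences are the same label shifted by one step, which is what lets the decoder be trained to predict the next character from the previous one.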

@chunhui999
Author

@crazysal Thanks for your reply. I think you are right, and it helps me a lot.

@chunhui999
Author

@crazysal Could you tell me how to handle the text labels, and what the format of the text label in gt_bbox is?

@wenston2006

@crazysal Have you managed to reproduce the training code? I tried to reproduce the training part based on @tonghe's code, but ran into a segmentation fault.

@wenston2006

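@chunhui999 @crazysal Looking closely at the code, I found that the first 8 elements are the coordinates, the tenth is the label length, and the ninth is not used. I am not sure whether I have this wrong. Note that element indices in Python start from 0.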

@wenston2006

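@crazysal For the data layer, I modified @argman's EAST Python data layer. I commented out both loss_4s and iou_loss and trained only the softmax loss for text recognition, but for some reason I ran into a memory overflow problem. What code did you use for your data layer, and how did you write it? Would adding my own data layer and an iou_loss layer on top of the code provided by @tonghe be enough to train successfully?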

@chunhui999
Author

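@wenston2006 You are right about the indexing; I overlooked that earlier. So, assuming the 9th element is ignored and the others are shifted forward, is your gt_label format like this? (x1, y1, x2, y2, x3, y3, x4, y4, len, 't', 'e', 'x', 't')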
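
The simplified row layout proposed in this comment can be sketched as a pack/unpack pair. This is only an illustration: the thread never confirms how characters are stored, so mapping each character to an integer class id through a charset is an assumption, and `CHARSET`, `pack_gt_row`, and `unpack_gt_row` are hypothetical names. The sketch also puts the length at index 8, following the simplified layout; as noted earlier in the thread, the actual code appears to leave index 8 unused and read the length from index 9.

```python
# Hypothetical charset; id 0 is reserved so that it never collides with a character.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARSET)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def pack_gt_row(quad, text):
    """Flatten one box into (x1, y1, ..., x4, y4, len, id_1, ..., id_len)."""
    if len(quad) != 8:
        raise ValueError("quad must hold 8 values: x1, y1, ..., x4, y4")
    ids = [CHAR_TO_ID[ch] for ch in text.lower()]
    return [float(v) for v in quad] + [float(len(ids))] + [float(i) for i in ids]


def unpack_gt_row(row):
    """Recover the quadrilateral and the text from a packed row."""
    quad = list(row[:8])
    label_len = int(row[8])
    ids = [int(v) for v in row[9:9 + label_len]]
    return quad, "".join(ID_TO_CHAR[i] for i in ids)


row = pack_gt_row([0, 0, 10, 0, 10, 5, 0, 5], "text")
print(len(row))            # 13
print(unpack_gt_row(row))  # ([0.0, 0.0, 10.0, 0.0, 10.0, 5.0, 0.0, 5.0], 'text')
```

Storing everything as floats keeps the row compatible with a single numeric blob, which is why the label has to be converted to ids rather than kept as characters.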

@wenston2006

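@chunhui999 That is my understanding too, but I am currently hitting a memory overflow (segmentation fault) during training, and it is not yet clear whether the problem lies in the data layer or in another layer.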

@chunhui999
Author

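@wenston2006 I also ran into the memory overflow problem. It is probably caused by the input image size: I halved the resize size (following the memory overflow I had hit earlier during testing), and after that training worked.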
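
As a rough illustration of the workaround described here, the sketch below halves a target resize size. The helper name `halved_size` is hypothetical, and snapping each side down to a multiple of 32 is an assumption (a common requirement for networks with several downsampling stages), not something stated in the thread.

```python
def halved_size(height, width, multiple=32):
    """Return (height, width) halved and snapped down to a multiple of `multiple`."""
    def snap(value):
        # Halve, round down to the nearest multiple, and never go below one multiple.
        return max(multiple, (value // 2) // multiple * multiple)
    return snap(height), snap(width)


print(halved_size(1280, 2240))  # (640, 1120)
print(halved_size(720, 1280))   # (352, 640)
```

Halving both sides cuts the number of pixels, and therefore most activation memory, to roughly a quarter.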

@ustczhouyu

@wenston2006 Did you manage to train successfully? How were the results?

@ZDDEAN

ZDDEAN commented Sep 20, 2019

Could you share a script for converting the SynthText format to the ICDAR format? Thanks!
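
No such script was posted in this thread. As a starting point, here is a minimal sketch of the core conversion for one image, under stated assumptions: word boxes are laid out as in SynthText's `wordBB` (indexed `[axis][corner][word]`, with axis 0 = x and axis 1 = y), `txt` entries are strings whose whitespace-separated tokens line up with the word boxes, and the target is the ICDAR 2015 style line `x1,y1,x2,y2,x3,y3,x4,y4,transcription`. The function name `synthtext_to_icdar_lines` is hypothetical, and loading the `.mat` file is left out.

```python
def synthtext_to_icdar_lines(word_bb, txt):
    """Convert one image's word boxes and text into ICDAR-style annotation lines.

    word_bb: nested sequence indexed [axis][corner][word] (assumed SynthText layout).
    txt: list of strings; each may hold several words separated by whitespace.
    """
    words = [word for entry in txt for word in entry.split()]
    num_boxes = len(word_bb[0][0])
    if num_boxes != len(words):
        raise ValueError("number of boxes and number of words differ")
    lines = []
    for k in range(num_boxes):
        coords = []
        for corner in range(4):
            coords.append(str(int(round(word_bb[0][corner][k]))))  # x
            coords.append(str(int(round(word_bb[1][corner][k]))))  # y
        lines.append(",".join(coords + [words[k]]))
    return lines


word_bb = [
    [[10.2, 50.0], [40.7, 90.0], [40.7, 90.0], [10.2, 50.0]],  # x per corner
    [[5.0, 20.0], [5.0, 20.0], [15.4, 30.0], [15.4, 30.0]],    # y per corner
]
for line in synthtext_to_icdar_lines(word_bb, ["hello\nworld"]):
    print(line)
# 10,5,41,5,41,15,10,15,hello
# 50,20,90,20,90,30,50,30,world
```

When only one word is present, the loaded array may lose its last dimension, so it would need reshaping to the three-level layout before calling this function.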
