how to train it on my own dataset #3
Hi, when I train the models such as
@makefile If you don't want to remove the redundant gt boxes, you can simply set gt_iou_thr=1.0 or higher. But a more important problem is that you might not have enough proposals. In your error case there are only gt boxes and no negative boxes. You can try lowering the proposal threshold in the "BoxGroupOutput" layer to get more proposals. Alternatively, your training may be diverging and crashing, in which case you can also try a lower learning rate.
@zhaoweicai Thanks! Following your advice, I lowered fg_thr in the BoxGroupOutput layer and the problem disappeared.
@zhaoweicai @makefile I am trying to train Cascade R-CNN on my own dataset and ran into this problem. I tried lowering iou_thr in the "BoxGroupOutput" layer, but the problem is still there. Can you give me any suggestions?
The error seems related to using multiple GPUs. With a single GPU, training proceeds, though not for every GPU id: GPU id 1 is fine, but GPU id 2 hits the same error as above. With 2 GPUs, I also hit the same error.
@Peng-wei-Yu Try lowering the score of
FYI, the COCO models seem to work fine (e.g. coco/res50-15s-800-fpn-cascade is fine, while res101 runs out of GPU memory on a 1080 Ti). I suggest switching from the VOC flavor to the COCO one.
@Peng-wei-Yu When you change the number of GPUs, you should change the learning rate at the same time, as described in the paper.
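One common way to adjust the learning rate for a different GPU count is the linear scaling rule, which keeps the rate proportional to the total batch size. The sketch below assumes that rule and a fixed per-GPU batch size; the comment above only points to the paper, so treat the rule itself and the names `base_lr` and `base_gpus` as assumptions, and take the reference values from the solver you start from.

```python
def scale_learning_rate(base_lr, base_gpus, new_gpus):
    """Scale a reference learning rate linearly with the number of GPUs.

    Assumes the per-GPU batch size stays fixed, so the total batch size
    (and, under the linear scaling rule, the learning rate) changes in
    proportion to the GPU count.
    """
    if base_gpus <= 0 or new_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * new_gpus / base_gpus


if __name__ == "__main__":
    # Hypothetical reference setting: 0.02 on 8 GPUs, retrained on 2 GPUs.
    print(scale_learning_rate(0.02, base_gpus=8, new_gpus=2))
```

If training still diverges after scaling, lowering the rate further, as suggested elsewhere in this thread, is a reasonable next step.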
@jwnsu The code should have no problem with multi-GPU training or the VOC dataset. Try running the script a couple of times to see whether the problem still happens. If it does, try lowering the learning rate a little. If that still doesn't fix it, something else may be wrong.
@makefile @zhaoweicai When you trained Cascade R-CNN on your own data, which caffemodel did you use: your own caffemodel or ResNet-50-model-merge.caffemodel? The pictures in my data are 1600*1200. Should I change short_size and long_size in train.prototxt?
@Peng-wei-Yu If you use the author's prototxt, you should use the corresponding ResNet-50-model-merge.caffemodel, since it merges the BN layer into the scale layer to reduce memory use and speed things up. You can increase the input image size if you have enough memory, but the results may not improve much.
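Merging a BN layer into a scale layer usually means folding the frozen BN statistics into a single per-channel multiply-add. The sketch below shows that arithmetic in plain Python. It illustrates the general technique only and is not the script used to produce ResNet-50-model-merge.caffemodel; the function names and the `eps` default are assumptions.

```python
import math


def fold_bn_into_scale(mean, var, gamma, beta, eps=1e-5):
    """Fold frozen batch-norm statistics into per-channel scale and shift.

    BN followed by scale computes  y = gamma * (x - mean) / sqrt(var + eps) + beta.
    That is the same as            y = scale * x + shift
    with scale = gamma / sqrt(var + eps) and shift = beta - mean * scale.
    """
    scales, shifts = [], []
    for m, v, g, b in zip(mean, var, gamma, beta):
        s = g / math.sqrt(v + eps)
        scales.append(s)
        shifts.append(b - m * s)
    return scales, shifts


def bn_then_scale(x, m, v, g, b, eps=1e-5):
    """Reference (unmerged) computation for one channel value."""
    return g * (x - m) / math.sqrt(v + eps) + b


if __name__ == "__main__":
    # Hypothetical statistics for two channels.
    scales, shifts = fold_bn_into_scale([0.5, -1.0], [4.0, 0.25], [1.5, 2.0], [0.1, -0.3])
    x = 3.0
    print(scales[0] * x + shifts[0], bn_then_scale(x, 0.5, 4.0, 1.5, 0.1))
```

Because the merged layer stores one scale and one shift per channel instead of separate mean, variance, gamma, and beta blobs, a prototxt written for the merged model will not match weights that still contain separate BN layers.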
@makefile Thank you very much. I'll try using ResNet-50-model-merge.caffemodel.
@makefile @Peng-wei-Yu In the BoxGroupOutput layer the original setting is 0.001. What did you finally set it to?
@makefile @Peng-wei-Yu
@GuoxingYan I set fg_thr: 0.01 or 0 in all BoxGroupOutput layers. If your number of positive RoIs is always 0, your dataset may have a problem.
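Changing fg_thr in every BoxGroupOutput layer by hand is easy to get wrong, so a small text rewrite can help. The sketch below is a minimal example that only touches layers whose type is BoxGroupOutput. The sample prototxt, including the parameter block names `box_group_output_param` and `proposal_target_param` as used here, is a hypothetical layout for illustration; check it against your actual train.prototxt. The brace matching is deliberately simple and does not handle braces inside strings or comments.

```python
import re


def set_fg_thr(prototxt_text, new_value):
    """Return prototxt_text with fg_thr replaced in BoxGroupOutput layers only."""
    out = []
    depth = 0
    block_start = 0
    copied_up_to = 0
    for i, ch in enumerate(prototxt_text):
        if ch == "{":
            if depth == 0:
                block_start = i  # start of a top-level block
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                block = prototxt_text[block_start:i + 1]
                if re.search(r'type:\s*"BoxGroupOutput"', block):
                    block = re.sub(
                        r"(fg_thr:\s*)[-+0-9.eE]+",
                        lambda m: m.group(1) + str(new_value),
                        block,
                    )
                out.append(prototxt_text[copied_up_to:block_start])
                out.append(block)
                copied_up_to = i + 1
    out.append(prototxt_text[copied_up_to:])
    return "".join(out)


# Hypothetical two-layer fragment; real layer and parameter names may differ.
SAMPLE = '''layer {
  name: "boxes"
  type: "BoxGroupOutput"
  box_group_output_param { fg_thr: 0.001 }
}
layer {
  name: "targets"
  type: "ProposalTarget"
  proposal_target_param { fg_thr: 0.5 }
}
'''

if __name__ == "__main__":
    print(set_fg_thr(SAMPLE, 0.01))
```

Restricting the rewrite to one layer type avoids silently changing a parameter with the same name in other layers.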
@makefile Did you try changing short_size and long_size in train.prototxt? When I changed only short_size or long_size, I got an error.
@GuoxingYan I did not try changing that. Since a Deconvolution layer is used for upsampling, the size may need to be a multiple of 32, 64, or larger.
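If the input size does need to be a multiple of the network stride, a candidate size can be rounded up before it goes into the prototxt. This is a small sketch of that arithmetic; the stride of 32 is taken from the comment above as a guess, and the function name is made up for illustration.

```python
def round_up_to_multiple(size, stride=32):
    """Round size up to the nearest multiple of stride."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return ((size + stride - 1) // stride) * stride


if __name__ == "__main__":
    for candidate in (320, 800, 1200, 1600):
        print(candidate, "->", round_up_to_multiple(candidate, 32))
```

Sizes such as 320 and 800 are already multiples of 32, while 1200 is not and rounds up to 1216.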
@makefile Thank you very much!!
@makefile Did you run into the following problem when training FPN?
@GuoxingYan I didn't run into that. The integer seems abnormally large.
@Peng-wei-Yu @zhaoweicai My images are 960*1280. I tried using ResNet-50-model-merge.caffemodel, but I get this problem too.
@makefile @zhaoweicai @Peng-wei-Yu While training, I noticed that short_size in detection_data_param in train.prototxt is 800, which is exactly equal to img_width and img_height in proposal_target_param. So my question is: when I change short_size to 320, do img_width and img_height also need to be changed to 320?
@GuoxingYan I think they do.
@makefile I used it to train on my own dataset. How can I get the output for every picture?
@licy5152 I wrote a Python script, CascadeRCNN-demo.py, that imitates the MATLAB code. You can modify it for your use.
@makefile The link to your demo.py shows as invalid.
@GuoxingYan It's probably a problem with your network.
@makefile @zhaoweicai
@PacteraKun The situation you encountered is unusual; check carefully.
@makefile
@PacteraKun I once trained several models but failed to visualize the demo results. Later I ported it to a framework I am more familiar with.
@makefile May I ask what the labelmap_file in your test Python file is?
@GuoxingYan @zhaoweicai I0806 23:44:24.048591 20123 solver.cpp:219] Iteration 9900 (2.14913 iter/s, 46.5305s/100 iters), loss = 0.440841
@Emmra https://blog.csdn.net/e01528/article/details/80913443 Hope this helps. If you can, please give it a like.
@Emmra I haven't run into the problem of the caffemodel failing to save.
@lzh19961031 Can you open that detection Python script?
@makefile @GuoxingYan @licy5152 May I ask what long_size and short_size in train.prototxt do? Some images in my dataset are 6000 by 4000. Do I need to set these two parameters to 6000 and 4000? Thanks!
@makefile @zhaoweicai Hi, I tried lowering fg_thr in the BoxGroupOutput layer, but I still get the problem of
@huinsysu Those sizes control how the input is resized. 6000x4000 may be too large to fit on one GPU.
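To make the resize idea concrete, here is a sketch of the convention many Faster R-CNN-style pipelines use: scale the image so its shorter side equals short_size, unless that would push the longer side past long_size, in which case the longer side is capped instead. Whether this repository's data layer follows exactly this rule is an assumption, and the long_size values in the example are made up.

```python
def resized_shape(height, width, short_size, long_size):
    """Return (new_height, new_width, scale) under a short-side / long-side cap rule."""
    if min(height, width, short_size, long_size) <= 0:
        raise ValueError("all sizes must be positive")
    scale = short_size / min(height, width)
    if scale * max(height, width) > long_size:
        # The longer side would exceed the cap, so scale by the cap instead.
        scale = long_size / max(height, width)
    return round(height * scale), round(width * scale), scale


if __name__ == "__main__":
    # A 6000x4000 photo with hypothetical short_size=800, long_size=1333.
    print(resized_shape(4000, 6000, 800, 1333))
    # A 1600x1200 photo where the long-side cap takes effect.
    print(resized_shape(1200, 1600, 800, 1000))
```

Under this rule the two parameters describe the network input rather than the raw image size, so large photos are scaled down instead of being fed in at full resolution.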
@licy5152 Hi, when I trained the model on my own dataset, I hit the same error you did. Could you please tell me how you solved it? Thanks!
@zhaoweicai May I know the intuition behind using fg_thr (or the case where cls_score is 0.99 or higher) to filter the bboxes? It seems that you drop all of those bboxes; they don't even reach the nms_by_cls_score or proposal stage. So why drop bboxes whose cls_score is higher than 0.99 by default?
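The network now trains normally, but every time I terminate the program with Ctrl+C, a root process named "irq/132-nvidia" appears, using 100% CPU and 0 memory. Restarting training then hangs at the very beginning, and nvidia-smi hangs too: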
@makefile In the end, are you satisfied with the results on your dataset? I am preparing to train on my own dataset.
@hu5tao Not bad.
@makefile Hello, I have tried several times and cannot open the CascadeRCNN-demo.py link. Would it be convenient for you to send me a copy? 422246019@qq.com Thanks!
@lininglouis
When I train on my own data, I get an error, but I don't know why. Could you give me some ideas? Thanks a lot.
I0604 13:28:15.270220 87804 detection_data_layer.cpp:142] num: 0 /home/zhulei/data/VOCdevkit/VOC2007/JPEGImages/IMG_0_112.jpg 3 1080 1920 windows to process: 36, RONI windows: 0
Hi! I want to train Cascade R-CNN on my own dataset (three classes). I don't know how to modify the files (e.g. examples/voc/). Can you give me some instructions? Thank you!