
Add DCN #73

Closed
ywaner opened this issue Oct 12, 2022 · 5 comments

ywaner commented Oct 12, 2022

Hello, thank you for your insightful work! I have a question: when I add a DCN structure to the teacher or student, e.g. faster_rcnn_r50_fpn_dconv_c3-c5 or cascade_mask_rcnn_x101_32x4d_fpn_dconv_c3-c5, loss_fgd_fpn and the total loss become too large to train. I am wondering about the reason behind it.
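
For reference, I enable DCN in the standard mmdet way; a sketch of the relevant config fragment (key names such as deform_groups differ across mmdet versions):

```python
# Sketch of the standard dconv_c3-c5 setup (mmdet 2.x style config).
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'
model = dict(
    backbone=dict(
        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),
        # apply DCN to ResNet stages c3-c5, leave c2 as plain conv
        stage_with_dcn=(False, True, True, True)))
```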

Here is the log output:

2022-10-10 15:07:23,967 - mmdet - INFO - Epoch [1][50/16178]	lr: 9.890e-04, eta: 3 days, 7:50:51, time: 1.481, data_time: 0.072, memory: 11330, loss_cls: 0.6365, loss_bbox: 0.6870, loss_dfl: 0.3843, loss_fgd_fpn_4: 0.0002, loss_fgd_fpn_3: 0.0181, loss_fgd_fpn_2: 0.1217, loss_fgd_fpn_1: 1.3931, loss_fgd_fpn_0: 6.3135, loss: 9.5544, grad_norm: 68.6224
2022-10-10 15:08:34,256 - mmdet - INFO - Epoch [1][100/16178]	lr: 1.988e-03, eta: 3 days, 5:47:53, time: 1.406, data_time: 0.010, memory: 11331, loss_cls: 0.5288, loss_bbox: 0.5811, loss_dfl: 0.3424, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0081, loss_fgd_fpn_2: 0.0561, loss_fgd_fpn_1: 0.9112, loss_fgd_fpn_0: 4.2769, loss: 6.7047, grad_norm: 23.8445
2022-10-10 15:09:45,544 - mmdet - INFO - Epoch [1][150/16178]	lr: 2.987e-03, eta: 3 days, 5:27:40, time: 1.426, data_time: 0.010, memory: 11332, loss_cls: 0.5084, loss_bbox: 0.5301, loss_dfl: 0.3167, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0086, loss_fgd_fpn_2: 0.0718, loss_fgd_fpn_1: 1.0272, loss_fgd_fpn_0: 4.8596, loss: 7.3225, grad_norm: 37.7240
2022-10-10 15:10:57,655 - mmdet - INFO - Epoch [1][200/16178]	lr: 3.986e-03, eta: 3 days, 5:30:15, time: 1.442, data_time: 0.010, memory: 11332, loss_cls: 0.5128, loss_bbox: 0.5244, loss_dfl: 0.3146, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0079, loss_fgd_fpn_2: 0.0572, loss_fgd_fpn_1: 1.0461, loss_fgd_fpn_0: 4.8641, loss: 7.3272, grad_norm: 30.0396
2022-10-10 15:12:08,429 - mmdet - INFO - Epoch [1][250/16178]	lr: 4.985e-03, eta: 3 days, 5:14:02, time: 1.415, data_time: 0.010, memory: 11332, loss_cls: 0.5229, loss_bbox: 0.5141, loss_dfl: 0.3086, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0068, loss_fgd_fpn_2: 0.0506, loss_fgd_fpn_1: 0.8142, loss_fgd_fpn_0: 3.9057, loss: 6.1230, grad_norm: 12.6570
2022-10-10 15:13:19,710 - mmdet - INFO - Epoch [1][300/16178]	lr: 5.984e-03, eta: 3 days, 5:08:18, time: 1.426, data_time: 0.010, memory: 11332, loss_cls: 0.4991, loss_bbox: 0.5199, loss_dfl: 0.3110, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0072, loss_fgd_fpn_2: 0.0523, loss_fgd_fpn_1: 0.8164, loss_fgd_fpn_0: 3.7634, loss: 5.9694, grad_norm: 11.5621
2022-10-10 15:14:30,729 - mmdet - INFO - Epoch [1][350/16178]	lr: 6.983e-03, eta: 3 days, 5:01:27, time: 1.420, data_time: 0.010, memory: 11332, loss_cls: 0.5207, loss_bbox: 0.5134, loss_dfl: 0.3113, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0073, loss_fgd_fpn_2: 0.0541, loss_fgd_fpn_1: 0.8442, loss_fgd_fpn_0: 4.1035, loss: 6.3546, grad_norm: 13.2545
2022-10-10 15:15:41,433 - mmdet - INFO - Epoch [1][400/16178]	lr: 7.982e-03, eta: 3 days, 4:53:28, time: 1.414, data_time: 0.010, memory: 11332, loss_cls: 0.4979, loss_bbox: 0.5297, loss_dfl: 0.3136, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0070, loss_fgd_fpn_2: 0.0515, loss_fgd_fpn_1: 0.7945, loss_fgd_fpn_0: 3.7389, loss: 5.9332, grad_norm: 8.9079
2022-10-10 15:16:52,334 - mmdet - INFO - Epoch [1][450/16178]	lr: 8.981e-03, eta: 3 days, 4:48:25, time: 1.418, data_time: 0.010, memory: 11332, loss_cls: 0.5142, loss_bbox: 0.5102, loss_dfl: 0.3119, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0068, loss_fgd_fpn_2: 0.0489, loss_fgd_fpn_1: 0.7655, loss_fgd_fpn_0: 3.7806, loss: 5.9382, grad_norm: 9.4913
2022-10-10 15:18:03,443 - mmdet - INFO - Epoch [1][500/16178]	lr: 9.980e-03, eta: 3 days, 4:45:28, time: 1.422, data_time: 0.010, memory: 11332, loss_cls: 0.5051, loss_bbox: 0.5061, loss_dfl: 0.3060, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0071, loss_fgd_fpn_2: 0.0511, loss_fgd_fpn_1: 0.7981, loss_fgd_fpn_0: 3.7618, loss: 5.9355, grad_norm: 9.1807
2022-10-10 15:19:14,859 - mmdet - INFO - Epoch [1][550/16178]	lr: 1.000e-02, eta: 3 days, 4:44:39, time: 1.428, data_time: 0.010, memory: 11332, loss_cls: 0.5116, loss_bbox: 0.4758, loss_dfl: 0.2929, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0071, loss_fgd_fpn_2: 0.0524, loss_fgd_fpn_1: 0.7696, loss_fgd_fpn_0: 3.6697, loss: 5.7791, grad_norm: 10.2593
2022-10-10 15:20:26,284 - mmdet - INFO - Epoch [1][600/16178]	lr: 1.000e-02, eta: 3 days, 4:43:49, time: 1.428, data_time: 0.010, memory: 11332, loss_cls: 0.5084, loss_bbox: 0.5035, loss_dfl: 0.3048, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0074, loss_fgd_fpn_2: 0.0538, loss_fgd_fpn_1: 0.8274, loss_fgd_fpn_0: 3.9499, loss: 6.1554, grad_norm: 11.5289
2022-10-10 15:21:38,146 - mmdet - INFO - Epoch [1][650/16178]	lr: 1.000e-02, eta: 3 days, 4:45:06, time: 1.437, data_time: 0.010, memory: 11332, loss_cls: 0.5201, loss_bbox: 0.5220, loss_dfl: 0.3139, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0105, loss_fgd_fpn_2: 0.0730, loss_fgd_fpn_1: 0.8779, loss_fgd_fpn_0: 4.0663, loss: 6.3838, grad_norm: 12.7775
2022-10-10 15:22:49,456 - mmdet - INFO - Epoch [1][700/16178]	lr: 1.000e-02, eta: 3 days, 4:43:29, time: 1.426, data_time: 0.010, memory: 11332, loss_cls: 0.5132, loss_bbox: 0.5461, loss_dfl: 0.3226, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0079, loss_fgd_fpn_2: 0.0564, loss_fgd_fpn_1: 0.8152, loss_fgd_fpn_0: 3.8834, loss: 6.1450, grad_norm: 11.4293
2022-10-10 15:24:00,110 - mmdet - INFO - Epoch [1][750/16178]	lr: 1.000e-02, eta: 3 days, 4:39:06, time: 1.413, data_time: 0.010, memory: 11332, loss_cls: 0.5316, loss_bbox: 0.5172, loss_dfl: 0.3141, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0076, loss_fgd_fpn_2: 0.0548, loss_fgd_fpn_1: 0.8379, loss_fgd_fpn_0: 3.9216, loss: 6.1849, grad_norm: 8.1157
2022-10-10 15:25:11,329 - mmdet - INFO - Epoch [1][800/16178]	lr: 1.000e-02, eta: 3 days, 4:37:24, time: 1.424, data_time: 0.010, memory: 11332, loss_cls: 0.4892, loss_bbox: 0.4858, loss_dfl: 0.2968, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0069, loss_fgd_fpn_2: 0.0519, loss_fgd_fpn_1: 0.7800, loss_fgd_fpn_0: 3.5984, loss: 5.7091, grad_norm: 7.2720
2022-10-10 15:26:21,786 - mmdet - INFO - Epoch [1][850/16178]	lr: 1.000e-02, eta: 3 days, 4:32:53, time: 1.409, data_time: 0.010, memory: 11332, loss_cls: 0.5169, loss_bbox: 0.5149, loss_dfl: 0.3076, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0076, loss_fgd_fpn_2: 0.0551, loss_fgd_fpn_1: 0.8749, loss_fgd_fpn_0: 3.9996, loss: 6.2768, grad_norm: 11.8264
2022-10-10 15:27:32,728 - mmdet - INFO - Epoch [1][900/16178]	lr: 1.000e-02, eta: 3 days, 4:30:27, time: 1.419, data_time: 0.010, memory: 11332, loss_cls: 0.5167, loss_bbox: 0.4732, loss_dfl: 0.2936, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0070, loss_fgd_fpn_2: 0.0519, loss_fgd_fpn_1: 0.7775, loss_fgd_fpn_0: 3.6971, loss: 5.8172, grad_norm: 8.8075
2022-10-10 15:28:44,486 - mmdet - INFO - Epoch [1][950/16178]	lr: 1.000e-02, eta: 3 days, 4:30:56, time: 1.435, data_time: 0.010, memory: 11332, loss_cls: 0.4985, loss_bbox: 0.4969, loss_dfl: 0.3020, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0066, loss_fgd_fpn_2: 0.0498, loss_fgd_fpn_1: 0.7534, loss_fgd_fpn_0: 3.6059, loss: 5.7131, grad_norm: 7.6654
2022-10-10 15:29:55,254 - mmdet - INFO - Epoch [1][1000/16178]	lr: 1.000e-02, eta: 3 days, 4:28:03, time: 1.415, data_time: 0.010, memory: 11332, loss_cls: 0.4810, loss_bbox: 0.5045, loss_dfl: 0.3053, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0063, loss_fgd_fpn_2: 0.0447, loss_fgd_fpn_1: 0.6467, loss_fgd_fpn_0: 3.1656, loss: 5.1542, grad_norm: 6.4960
2022-10-10 15:31:05,415 - mmdet - INFO - Epoch [1][1050/16178]	lr: 1.000e-02, eta: 3 days, 4:23:28, time: 1.403, data_time: 0.010, memory: 11332, loss_cls: 0.5539, loss_bbox: 0.5281, loss_dfl: 0.3202, loss_fgd_fpn_4: 0.0002, loss_fgd_fpn_3: 0.0451, loss_fgd_fpn_2: 0.2862, loss_fgd_fpn_1: 10.4438, loss_fgd_fpn_0: 30.4854, loss: 42.6630, grad_norm: 3034.1699
2022-10-10 15:32:16,896 - mmdet - INFO - Epoch [1][1100/16178]	lr: 1.000e-02, eta: 3 days, 4:23:04, time: 1.430, data_time: 0.010, memory: 11332, loss_cls: 0.5293, loss_bbox: 0.5076, loss_dfl: 0.3075, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0076, loss_fgd_fpn_2: 0.0534, loss_fgd_fpn_1: 0.8656, loss_fgd_fpn_0: 3.8698, loss: 6.1410, grad_norm: 15.3256
2022-10-10 15:33:27,140 - mmdet - INFO - Epoch [1][1150/16178]	lr: 1.000e-02, eta: 3 days, 4:19:08, time: 1.405, data_time: 0.010, memory: 11332, loss_cls: 0.5029, loss_bbox: 0.4510, loss_dfl: 0.2841, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0064, loss_fgd_fpn_2: 0.0465, loss_fgd_fpn_1: 0.6350, loss_fgd_fpn_0: 3.1598, loss: 5.0856, grad_norm: 6.5924
2022-10-10 15:34:37,744 - mmdet - INFO - Epoch [1][1200/16178]	lr: 1.000e-02, eta: 3 days, 4:16:23, time: 1.412, data_time: 0.010, memory: 11332, loss_cls: 0.4919, loss_bbox: 0.4949, loss_dfl: 0.3020, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0075, loss_fgd_fpn_2: 0.0529, loss_fgd_fpn_1: 0.7247, loss_fgd_fpn_0: 3.5468, loss: 5.6208, grad_norm: 6.6782
2022-10-10 15:35:48,395 - mmdet - INFO - Epoch [1][1250/16178]	lr: 1.000e-02, eta: 3 days, 4:13:54, time: 1.413, data_time: 0.010, memory: 11332, loss_cls: 0.5052, loss_bbox: 0.4780, loss_dfl: 0.2934, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0071, loss_fgd_fpn_2: 0.0507, loss_fgd_fpn_1: 0.6617, loss_fgd_fpn_0: 3.1991, loss: 5.1952, grad_norm: 7.1381
2022-10-10 15:36:59,383 - mmdet - INFO - Epoch [1][1300/16178]	lr: 1.000e-02, eta: 3 days, 4:12:20, time: 1.420, data_time: 0.010, memory: 11332, loss_cls: 0.4702, loss_bbox: 0.4689, loss_dfl: 0.2900, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0067, loss_fgd_fpn_2: 0.0483, loss_fgd_fpn_1: 0.6287, loss_fgd_fpn_0: 3.1048, loss: 5.0178, grad_norm: 6.6730
2022-10-10 15:38:09,195 - mmdet - INFO - Epoch [1][1350/16178]	lr: 1.000e-02, eta: 3 days, 4:08:00, time: 1.396, data_time: 0.010, memory: 11332, loss_cls: 0.5176, loss_bbox: 0.4569, loss_dfl: 0.2849, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0062, loss_fgd_fpn_2: 0.0443, loss_fgd_fpn_1: 0.5653, loss_fgd_fpn_0: 2.8550, loss: 4.7303, grad_norm: 6.2432
2022-10-10 15:39:19,317 - mmdet - INFO - Epoch [1][1400/16178]	lr: 1.000e-02, eta: 3 days, 4:04:37, time: 1.402, data_time: 0.010, memory: 11332, loss_cls: 0.4808, loss_bbox: 0.4700, loss_dfl: 0.2906, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0062, loss_fgd_fpn_2: 0.0451, loss_fgd_fpn_1: 0.5628, loss_fgd_fpn_0: 2.8688, loss: 4.7243, grad_norm: 5.8212
2022-10-10 15:40:29,784 - mmdet - INFO - Epoch [1][1450/16178]	lr: 1.000e-02, eta: 3 days, 4:02:08, time: 1.409, data_time: 0.010, memory: 11332, loss_cls: 0.4922, loss_bbox: 0.4429, loss_dfl: 0.2802, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0061, loss_fgd_fpn_2: 0.0443, loss_fgd_fpn_1: 0.5911, loss_fgd_fpn_0: 2.9405, loss: 4.7974, grad_norm: 7.1132
2022-10-10 15:41:40,482 - mmdet - INFO - Epoch [1][1500/16178]	lr: 1.000e-02, eta: 3 days, 4:00:15, time: 1.414, data_time: 0.010, memory: 11332, loss_cls: 0.4567, loss_bbox: 0.4931, loss_dfl: 0.3016, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0066, loss_fgd_fpn_2: 0.0472, loss_fgd_fpn_1: 0.6008, loss_fgd_fpn_0: 2.9786, loss: 4.8846, grad_norm: 6.0787
2022-10-10 15:42:51,679 - mmdet - INFO - Epoch [1][1550/16178]	lr: 1.000e-02, eta: 3 days, 3:59:26, time: 1.424, data_time: 0.010, memory: 11332, loss_cls: 0.4667, loss_bbox: 0.4836, loss_dfl: 0.2966, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0064, loss_fgd_fpn_2: 0.0461, loss_fgd_fpn_1: 0.6019, loss_fgd_fpn_0: 2.9062, loss: 4.8075, grad_norm: 7.3193
2022-10-10 15:44:02,559 - mmdet - INFO - Epoch [1][1600/16178]	lr: 1.000e-02, eta: 3 days, 3:57:57, time: 1.418, data_time: 0.010, memory: 11332, loss_cls: 0.5035, loss_bbox: 0.4874, loss_dfl: 0.2984, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0208, loss_fgd_fpn_2: 0.1415, loss_fgd_fpn_1: 12.0289, loss_fgd_fpn_0: 22.1169, loss: 35.5976, grad_norm: 9077.5504
2022-10-10 15:45:13,235 - mmdet - INFO - Epoch [1][1650/16178]	lr: 1.000e-02, eta: 3 days, 3:56:06, time: 1.414, data_time: 0.010, memory: 11332, loss_cls: 0.4893, loss_bbox: 0.4638, loss_dfl: 0.2870, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0065, loss_fgd_fpn_2: 0.0473, loss_fgd_fpn_1: 0.6394, loss_fgd_fpn_0: 3.2284, loss: 5.1618, grad_norm: 46.0868
2022-10-10 15:46:23,125 - mmdet - INFO - Epoch [1][1700/16178]	lr: 1.000e-02, eta: 3 days, 3:52:48, time: 1.398, data_time: 0.010, memory: 11332, loss_cls: 0.4880, loss_bbox: 0.4438, loss_dfl: 0.2836, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0061, loss_fgd_fpn_2: 0.0430, loss_fgd_fpn_1: 0.5120, loss_fgd_fpn_0: 2.5668, loss: 4.3435, grad_norm: 5.6961
2022-10-10 15:47:34,943 - mmdet - INFO - Epoch [1][1750/16178]	lr: 1.000e-02, eta: 3 days, 3:53:10, time: 1.436, data_time: 0.010, memory: 11332, loss_cls: 0.4759, loss_bbox: 0.4977, loss_dfl: 0.3028, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0070, loss_fgd_fpn_2: 0.0493, loss_fgd_fpn_1: 0.6108, loss_fgd_fpn_0: 3.0078, loss: 4.9514, grad_norm: 6.7499
2022-10-10 15:48:45,031 - mmdet - INFO - Epoch [1][1800/16178]	lr: 1.000e-02, eta: 3 days, 3:50:21, time: 1.402, data_time: 0.010, memory: 11332, loss_cls: 0.5020, loss_bbox: 0.4571, loss_dfl: 0.2838, loss_fgd_fpn_4: 0.0001, loss_fgd_fpn_3: 0.0067, loss_fgd_fpn_2: 0.0485, loss_fgd_fpn_1: 0.5821, loss_fgd_fpn_0: 2.9682, loss: 4.8486, grad_norm: 8.6155
2022-10-10 15:49:55,865 - mmdet - INFO - Epoch [1][1850/16178]	lr: 1.000e-02, eta: 3 days, 3:48:56, time: 1.417, data_time: 0.010, memory: 11332, loss_cls: 0.5524, loss_bbox: 0.5681, loss_dfl: 0.3350, loss_fgd_fpn_4: 0.0234, loss_fgd_fpn_3: 20.7488, loss_fgd_fpn_2: 135.5364, loss_fgd_fpn_1: 4514.5350, loss_fgd_fpn_0: 5980.8744, loss: 10653.1733, grad_norm: 693299.5836
yzd-v (Owner) commented Oct 12, 2022

It seems the gradient explodes. Maybe the gap between the teacher and student is too large. If the loss stays too large, you can try to decrease the gradient-clipping threshold (grad_norm) to avoid the explosion.
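
A minimal sketch, assuming the stock mmdet 2.x optimizer hook (the max_norm value here is illustrative, tune it for your setup):

```python
# Tighten gradient clipping in the mmdet config to damp the spikes.
optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))
```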

ywaner (Author) commented Oct 13, 2022

> It seems the gradient explodes. Maybe the gap between the teacher and student is too large. If the loss stays too large, you can try to decrease the gradient-clipping threshold (grad_norm) to avoid the explosion.

Thanks for your reply! But the gradient still explodes when I make the teacher and the student the same model, i.e. have the model distill itself, so I think this case may not be due to the gap between teacher and student. When I distill the model itself without adding DCN, it trains well.
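
To find the iterations that blow up, I watch the total loss with a small custom hook; this is a debugging sketch of my own (the hook name and threshold are hypothetical, not part of FGD):

```python
from mmcv.runner import HOOKS, Hook

@HOOKS.register_module()
class DistillLossGuardHook(Hook):
    """Hypothetical debugging hook: warn when the total loss spikes,
    so the offending iterations/batches can be inspected."""

    def __init__(self, max_loss=50.0):
        self.max_loss = max_loss

    def after_train_iter(self, runner):
        # runner.outputs holds the dict returned by model.train_step()
        loss = runner.outputs.get('log_vars', {}).get('loss', 0.0)
        if loss > self.max_loss:
            runner.logger.warning(
                'loss %.1f exceeds %.1f at iter %d: likely a gradient '
                'explosion on this batch', loss, self.max_loss, runner.iter)
```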

ywaner (Author) commented Oct 13, 2022

I also tried this on your further excellent work, MGD; the same thing happens there when I add DCN.

yzd-v (Owner) commented Oct 13, 2022

Fine, I don't know the reason either. However, RepPoints can be trained with DCN in the config.

ywaner (Author) commented Oct 14, 2022

OK, thanks a lot! I will do more experiments later.

ywaner closed this as completed Oct 14, 2022