
Exploding loss after few iterations of training of Faster RCNN ResNet50 #8423

Closed
Boltuzamaki opened this issue Apr 22, 2020 · 3 comments
Labels
models:research models that come under research directory type:support


@Boltuzamaki

Boltuzamaki commented Apr 22, 2020

System information
What is the top-level directory of the model you are using:
object_detection

Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
I use the train.py script on my own dataset

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Using Google Colab

TensorFlow installed from (source or binary):
changed to %tensorflow_version 1.x (in Colab)

TensorFlow version (use command below):
1.15.2

CUDA/cuDNN version:
Using Google Colab

GPU model and memory:
12 GB NVIDIA Tesla K80 GPU (I guess, as it's Google Colab)

I am using faster_rcnn_resnet50_coco and facing an exploding-loss problem after a few iterations of training.

  • I relabelled my dataset from scratch to rule out annotation errors.

  • I rechecked labelmap.pbtxt many times.

  • I rechecked the classes, and I also tried different learning rates and gradient-clipping values.

  • My TFRecord looks good too.

  • I rechecked the CSV files that were generated.

I am training on only one class, but the loss explodes exponentially after a few iterations; please help. I am using the Penn-Fudan Database for Pedestrian Detection dataset.
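For reference, the kind of TFRecord sanity check I mean is sketched below (a minimal sketch, assuming the records use the standard Object Detection API feature keys and normalized box coordinates; the path is the one from my config):

import tensorflow as tf  # TF 1.x

RECORD_PATH = "/content/drive/My Drive/Tensorflow/models/train.record"

# Minimal sketch: confirm every ground-truth box is normalized to [0, 1] with min < max.
num_examples = 0
bad_boxes = 0
for rec in tf.python_io.tf_record_iterator(RECORD_PATH):
    ex = tf.train.Example.FromString(rec)
    feat = ex.features.feature
    xmins = feat["image/object/bbox/xmin"].float_list.value
    xmaxs = feat["image/object/bbox/xmax"].float_list.value
    ymins = feat["image/object/bbox/ymin"].float_list.value
    ymaxs = feat["image/object/bbox/ymax"].float_list.value
    for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
        if not (0.0 <= xmin < xmax <= 1.0 and 0.0 <= ymin < ymax <= 1.0):
            bad_boxes += 1
    num_examples += 1

print("examples:", num_examples, "suspicious boxes:", bad_boxes)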

The following is my config file:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 400
        max_dimension: 600
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SIGMOID
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0001
          schedule {
            step: 900000
            learning_rate: .000001
          }
          schedule {
            step: 1200000
            learning_rate: .000001
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 5.0
  fine_tune_checkpoint: "/content/drive/My Drive/Tensorflow/models/faster_rcnn_resnet50_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/Tensorflow/models/train.record"
  }
  label_map_path: "/content/drive/My Drive/Tensorflow/models/training/labelmap.pbtxt"
}

eval_config: {
  num_examples: 36
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/Tensorflow/models/test.record"
  }
  label_map_path: "/content/drive/My Drive/Tensorflow/models/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
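One thing worth noting about the schedule above: num_steps is 200000, while the manual_step_learning_rate boundaries are at 900000 and 1200000, so the decay steps are never reached and the whole run trains at the initial rate of 0.0001. A minimal sketch (hypothetical helper, not part of the Object Detection API) of how that schedule resolves:

def manual_step_lr(step, initial_lr=0.0001,
                   schedule=((900000, 1e-6), (1200000, 1e-6))):
    # Hypothetical helper: mimic the manual_step_learning_rate from the config above.
    lr = initial_lr
    for boundary, rate in schedule:
        if step >= boundary:
            lr = rate
    return lr

for step in (0, 100000, 199999, 900000, 1200000):
    print(step, manual_step_lr(step))
# With num_steps: 200000, every training step uses 0.0001.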

My training log:

I0422 07:09:55.320835 140486560917376 learning.py:507] global step 374: loss = 0.2636 (0.294 sec/step)
INFO:tensorflow:global step 375: loss = 0.6618 (0.306 sec/step)
I0422 07:09:55.628338 140486560917376 learning.py:507] global step 375: loss = 0.6618 (0.306 sec/step)
INFO:tensorflow:global step 376: loss = 0.3663 (0.289 sec/step)
I0422 07:09:55.919772 140486560917376 learning.py:507] global step 376: loss = 0.3663 (0.289 sec/step)
INFO:tensorflow:global step 377: loss = 0.2678 (0.292 sec/step)
I0422 07:09:56.213927 140486560917376 learning.py:507] global step 377: loss = 0.2678 (0.292 sec/step)
INFO:tensorflow:global step 378: loss = 0.3992 (0.293 sec/step)
I0422 07:09:56.509216 140486560917376 learning.py:507] global step 378: loss = 0.3992 (0.293 sec/step)
INFO:tensorflow:global step 379: loss = 0.4918 (0.264 sec/step)
I0422 07:09:56.775810 140486560917376 learning.py:507] global step 379: loss = 0.4918 (0.264 sec/step)
INFO:tensorflow:global step 380: loss = 0.2143 (0.269 sec/step)
I0422 07:09:57.046996 140486560917376 learning.py:507] global step 380: loss = 0.2143 (0.269 sec/step)
INFO:tensorflow:global step 381: loss = 1.0149 (0.273 sec/step)
I0422 07:09:57.321473 140486560917376 learning.py:507] global step 381: loss = 1.0149 (0.273 sec/step)
INFO:tensorflow:global step 382: loss = 0.2884 (0.262 sec/step)
I0422 07:09:57.585358 140486560917376 learning.py:507] global step 382: loss = 0.2884 (0.262 sec/step)
INFO:tensorflow:global step 383: loss = 0.3797 (0.289 sec/step)
I0422 07:09:57.876454 140486560917376 learning.py:507] global step 383: loss = 0.3797 (0.289 sec/step)
INFO:tensorflow:global step 384: loss = 0.6466 (0.306 sec/step)
I0422 07:09:58.183916 140486560917376 learning.py:507] global step 384: loss = 0.6466 (0.306 sec/step)
INFO:tensorflow:global step 385: loss = 0.0887 (0.271 sec/step)
I0422 07:09:58.457142 140486560917376 learning.py:507] global step 385: loss = 0.0887 (0.271 sec/step)
INFO:tensorflow:global step 386: loss = 0.7238 (0.297 sec/step)
I0422 07:09:58.756018 140486560917376 learning.py:507] global step 386: loss = 0.7238 (0.297 sec/step)
INFO:tensorflow:global step 387: loss = 0.8665 (0.314 sec/step)
I0422 07:09:59.072811 140486560917376 learning.py:507] global step 387: loss = 0.8665 (0.314 sec/step)
INFO:tensorflow:global step 388: loss = 1.0504 (0.292 sec/step)
I0422 07:09:59.366814 140486560917376 learning.py:507] global step 388: loss = 1.0504 (0.292 sec/step)
INFO:tensorflow:global step 389: loss = 1.0150 (0.290 sec/step)
I0422 07:09:59.661545 140486560917376 learning.py:507] global step 389: loss = 1.0150 (0.290 sec/step)
INFO:tensorflow:global step 390: loss = 0.6820 (0.284 sec/step)
I0422 07:09:59.947800 140486560917376 learning.py:507] global step 390: loss = 0.6820 (0.284 sec/step)
INFO:tensorflow:global step 391: loss = 1.7842 (0.266 sec/step)
I0422 07:10:00.215992 140486560917376 learning.py:507] global step 391: loss = 1.7842 (0.266 sec/step)
INFO:tensorflow:global step 392: loss = 5.3590 (0.257 sec/step)
I0422 07:10:00.474348 140486560917376 learning.py:507] global step 392: loss = 5.3590 (0.257 sec/step)
INFO:tensorflow:global step 393: loss = 5.0548 (0.304 sec/step)
I0422 07:10:00.780560 140486560917376 learning.py:507] global step 393: loss = 5.0548 (0.304 sec/step)
INFO:tensorflow:global step 394: loss = 7.1372 (0.302 sec/step)
I0422 07:10:01.084518 140486560917376 learning.py:507] global step 394: loss = 7.1372 (0.302 sec/step)
INFO:tensorflow:global step 395: loss = 0.3071 (0.259 sec/step)
I0422 07:10:01.345200 140486560917376 learning.py:507] global step 395: loss = 0.3071 (0.259 sec/step)
INFO:tensorflow:global step 396: loss = 33.3896 (0.267 sec/step)
I0422 07:10:01.613836 140486560917376 learning.py:507] global step 396: loss = 33.3896 (0.267 sec/step)
INFO:tensorflow:global step 397: loss = 0.1341 (0.285 sec/step)
I0422 07:10:01.900630 140486560917376 learning.py:507] global step 397: loss = 0.1341 (0.285 sec/step)
INFO:tensorflow:global step 398: loss = 20.5861 (0.252 sec/step)
I0422 07:10:02.154454 140486560917376 learning.py:507] global step 398: loss = 20.5861 (0.252 sec/step)
INFO:tensorflow:global step 399: loss = 0.2996 (0.260 sec/step)
I0422 07:10:02.416572 140486560917376 learning.py:507] global step 399: loss = 0.2996 (0.260 sec/step)
INFO:tensorflow:global step 400: loss = 53.6560 (0.276 sec/step)
I0422 07:10:02.693947 140486560917376 learning.py:507] global step 400: loss = 53.6560 (0.276 sec/step)
INFO:tensorflow:global step 401: loss = 354.2955 (0.303 sec/step)
I0422 07:10:02.998691 140486560917376 learning.py:507] global step 401: loss = 354.2955 (0.303 sec/step)
INFO:tensorflow:global step 402: loss = 138.3654 (0.270 sec/step)
I0422 07:10:03.271044 140486560917376 learning.py:507] global step 402: loss = 138.3654 (0.270 sec/step)
INFO:tensorflow:global step 403: loss = 81.3835 (0.292 sec/step)
I0422 07:10:03.565308 140486560917376 learning.py:507] global step 403: loss = 81.3835 (0.292 sec/step)
INFO:tensorflow:global step 404: loss = 2043.4795 (0.282 sec/step)
I0422 07:10:03.848872 140486560917376 learning.py:507] global step 404: loss = 2043.4795 (0.282 sec/step)
INFO:tensorflow:global step 405: loss = 0.0435 (0.297 sec/step)
I0422 07:10:04.148034 140486560917376 learning.py:507] global step 405: loss = 0.0435 (0.297 sec/step)
INFO:tensorflow:global step 406: loss = 7290.5103 (0.267 sec/step)
I0422 07:10:04.417101 140486560917376 learning.py:507] global step 406: loss = 7290.5103 (0.267 sec/step)
INFO:tensorflow:global step 407: loss = 7957.7422 (0.261 sec/step)
I0422 07:10:04.680258 140486560917376 learning.py:507] global step 407: loss = 7957.7422 (0.261 sec/step)
INFO:tensorflow:global step 408: loss = 0.1442 (0.302 sec/step)
I0422 07:10:04.984018 140486560917376 learning.py:507] global step 408: loss = 0.1442 (0.302 sec/step)
INFO:tensorflow:global step 409: loss = 25237.8984 (0.273 sec/step)
I0422 07:10:05.258542 140486560917376 learning.py:507] global step 409: loss = 25237.8984 (0.273 sec/step)
INFO:tensorflow:global step 410: loss = 75835.2812 (0.319 sec/step)
I0422 07:10:05.579621 140486560917376 learning.py:507] global step 410: loss = 75835.2812 (0.319 sec/step)
INFO:tensorflow:global step 411: loss = 28575.1914 (0.250 sec/step)
I0422 07:10:05.832293 140486560917376 learning.py:507] global step 411: loss = 28575.1914 (0.250 sec/step)
INFO:tensorflow:global step 412: loss = 134869.8906 (0.293 sec/step)
I0422 07:10:06.129227 140486560917376 learning.py:507] global step 412: loss = 134869.8906 (0.293 sec/step)
INFO:tensorflow:global step 413: loss = 437442.4062 (0.296 sec/step)
I0422 07:10:06.427104 140486560917376 learning.py:507] global step 413: loss = 437442.4062 (0.296 sec/step)
INFO:tensorflow:global step 414: loss = 212268.4531 (0.255 sec/step)
I0422 07:10:06.684252 140486560917376 learning.py:507] global step 414: loss = 212268.4531 (0.255 sec/step)
INFO:tensorflow:global step 415: loss = 1216893.1250 (0.276 sec/step)
I0422 07:10:06.961721 140486560917376 learning.py:507] global step 415: loss = 1216893.1250 (0.276 sec/step)
INFO:tensorflow:global step 416: loss = 0.1749 (0.262 sec/step)
I0422 07:10:07.225651 140486560917376 learning.py:507] global step 416: loss = 0.1749 (0.262 sec/step)
INFO:tensorflow:global step 417: loss = 2736256.2500 (0.312 sec/step)
I0422 07:10:07.539854 140486560917376 learning.py:507] global step 417: loss = 2736256.2500 (0.312 sec/step)
INFO:tensorflow:global step 418: loss = 4241052.0000 (0.263 sec/step)
I0422 07:10:07.805094 140486560917376 learning.py:507] global step 418: loss = 4241052.0000 (0.263 sec/step)
INFO:tensorflow:global step 419: loss = 4462876.0000 (0.266 sec/step)
I0422 07:10:08.073152 140486560917376 learning.py:507] global step 419: loss = 4462876.0000 (0.266 sec/step)
INFO:tensorflow:global step 420: loss = 18808836.0000 (0.295 sec/step)
I0422 07:10:08.370062 140486560917376 learning.py:507] global step 420: loss = 18808836.0000 (0.295 sec/step)
INFO:tensorflow:global step 421: loss = 96460304.0000 (0.288 sec/step)
I0422 07:10:08.660426 140486560917376 learning.py:507] global step 421: loss = 96460304.0000 (0.288 sec/step)
INFO:tensorflow:global step 422: loss = 85134320.0000 (0.320 sec/step)
I0422 07:10:08.982865 140486560917376 learning.py:507] global step 422: loss = 85134320.0000 (0.320 sec/step)
INFO:tensorflow:global step 423: loss = 364593632.0000 (0.257 sec/step)
I0422 07:10:09.241693 140486560917376 learning.py:507] global step 423: loss = 364593632.0000 (0.257 sec/step)
INFO:tensorflow:global step 424: loss = 159115248.0000 (0.267 sec/step)
I0422 07:10:09.510233 140486560917376 learning.py:507] global step 424: loss = 159115248.0000 (0.267 sec/step)
INFO:tensorflow:global step 425: loss = 854715264.0000 (0.312 sec/step)
I0422 07:10:09.823988 140486560917376 learning.py:507] global step 425: loss = 854715264.0000 (0.312 sec/step)
INFO:tensorflow:global step 426: loss = 3067453952.0000 (0.296 sec/step)
I0422 07:10:10.121925 140486560917376 learning.py:507] global step 426: loss = 3067453952.0000 (0.296 sec/step)
INFO:tensorflow:global step 427: loss = 3518234624.0000 (0.291 sec/step)
I0422 07:10:10.414811 140486560917376 learning.py:507] global step 427: loss = 3518234624.0000 (0.291 sec/step)
INFO:tensorflow:global step 428: loss = 17210691584.0000 (0.327 sec/step)
I0422 07:10:10.743706 140486560917376 learning.py:507] global step 428: loss = 17210691584.0000 (0.327 sec/step)
INFO:tensorflow:global step 429: loss = 22827235328.0000 (0.298 sec/step)
I0422 07:10:11.043578 140486560917376 learning.py:507] global step 429: loss = 22827235328.0000 (0.298 sec/step)
INFO:tensorflow:global step 430: loss = 99799859200.0000 (0.263 sec/step)
I0422 07:10:11.308007 140486560917376 learning.py:507] global step 430: loss = 99799859200.0000 (0.263 sec/step)
INFO:tensorflow:global step 431: loss = 0.7569 (0.287 sec/step)
I0422 07:10:11.596587 140486560917376 learning.py:507] global step 431: loss = 0.7569 (0.287 sec/step)
INFO:tensorflow:global step 432: loss = 164616962048.0000 (0.323 sec/step)
I0422 07:10:11.922135 140486560917376 learning.py:507] global step 432: loss = 164616962048.0000 (0.323 sec/step)
INFO:tensorflow:global step 433: loss = 598838804480.0000 (0.267 sec/step)
I0422 07:10:12.191077 140486560917376 learning.py:507] global step 433: loss = 598838804480.0000 (0.267 sec/step)
INFO:tensorflow:global step 434: loss = 171039686656.0000 (0.285 sec/step)
I0422 07:10:12.478295 140486560917376 learning.py:507] global step 434: loss = 171039686656.0000 (0.285 sec/step)
INFO:tensorflow:global step 435: loss = 0.1586 (0.294 sec/step)
I0422 07:10:12.774455 140486560917376 learning.py:507] global step 435: loss = 0.1586 (0.294 sec/step)
INFO:tensorflow:global step 436: loss = 11961404227584.0000 (0.264 sec/step)
I0422 07:10:13.040502 140486560917376 learning.py:507] global step 436: loss = 11961404227584.0000 (0.264 sec/step)
INFO:tensorflow:global step 437: loss = 10615577903104.0000 (0.297 sec/step)
I0422 07:10:13.339689 140486560917376 learning.py:507] global step 437: loss = 10615577903104.0000 (0.297 sec/step)
INFO:tensorflow:global step 438: loss = 6634327769088.0000 (0.262 sec/step)
I0422 07:10:13.603152 140486560917376 learning.py:507] global step 438: loss = 6634327769088.0000 (0.262 sec/step)
INFO:tensorflow:global step 439: loss = 0.0360 (0.265 sec/step)
I0422 07:10:13.870558 140486560917376 learning.py:507] global step 439: loss = 0.0360 (0.265 sec/step)
INFO:tensorflow:global step 440: loss = 15168851410944.0000 (0.312 sec/step)
I0422 07:10:14.184696 140486560917376 learning.py:507] global step 440: loss = 15168851410944.0000 (0.312 sec/step)
INFO:tensorflow:global step 441: loss = 0.3786 (0.265 sec/step)
I0422 07:10:14.451148 140486560917376 learning.py:507] global step 441: loss = 0.3786 (0.265 sec/step)
INFO:tensorflow:global step 442: loss = 204700758573056.0000 (0.258 sec/step)
I0422 07:10:14.711573 140486560917376 learning.py:507] global step 442: loss = 204700758573056.0000 (0.258 sec/step)
INFO:tensorflow:global step 443: loss = 0.0319 (0.262 sec/step)
I0422 07:10:14.974974 140486560917376 learning.py:507] global step 443: loss = 0.0319 (0.262 sec/step)
INFO:tensorflow:global step 444: loss = 1549614536720384.0000 (0.273 sec/step)
I0422 07:10:15.250102 140486560917376 learning.py:507] global step 444: loss = 1549614536720384.0000 (0.273 sec/step)
INFO:tensorflow:global step 445: loss = 706502994165760.0000 (0.280 sec/step)
I0422 07:10:15.532128 140486560917376 learning.py:507] global step 445: loss = 706502994165760.0000 (0.280 sec/step)
INFO:tensorflow:global step 446: loss = 1583030992896000.0000 (0.292 sec/step)
I0422 07:10:15.825623 140486560917376 learning.py:507] global step 446: loss = 1583030992896000.0000 (0.292 sec/step)
INFO:tensorflow:global step 447: loss = 11534830458109952.0000 (0.286 sec/step)
I0422 07:10:16.113319 140486560917376 learning.py:507] global step 447: loss = 11534830458109952.0000 (0.286 sec/step)
INFO:tensorflow:global step 448: loss = 28171772826222592.0000 (0.310 sec/step)
I0422 07:10:16.424631 140486560917376 learning.py:507] global step 448: loss = 28171772826222592.0000 (0.310 sec/step)
INFO:tensorflow:global step 449: loss = 33334265533956096.0000 (0.271 sec/step)
I0422 07:10:16.697575 140486560917376 learning.py:507] global step 449: loss = 33334265533956096.0000 (0.271 sec/step)
INFO:tensorflow:global step 450: loss = 0.0328 (0.276 sec/step)
I0422 07:10:16.975500 140486560917376 learning.py:507] global step 450: loss = 0.0328 (0.276 sec/step)
INFO:tensorflow:global step 451: loss = 0.0162 (0.274 sec/step)
I0422 07:10:17.251272 140486560917376 learning.py:507] global step 451: loss = 0.0162 (0.274 sec/step)
INFO:tensorflow:global step 452: loss = 67449892993236992.0000 (0.313 sec/step)
I0422 07:10:17.565719 140486560917376 learning.py:507] global step 452: loss = 67449892993236992.0000 (0.313 sec/step)
INFO:tensorflow:global step 453: loss = 95736882612142080.0000 (0.263 sec/step)
I0422 07:10:17.831138 140486560917376 learning.py:507] global step 453: loss = 95736882612142080.0000 (0.263 sec/step)
INFO:tensorflow:global step 454: loss = 266017148894183424.0000 (0.280 sec/step)
I0422 07:10:18.112651 140486560917376 learning.py:507] global step 454: loss = 266017148894183424.0000 (0.280 sec/step)
INFO:tensorflow:global step 455: loss = 903144486052298752.0000 (0.298 sec/step)
I0422 07:10:18.412338 140486560917376 learning.py:507] global step 455: loss = 903144486052298752.0000 (0.298 sec/step)
INFO:tensorflow:global step 456: loss = 2083570548905869312.0000 (0.286 sec/step)
I0422 07:10:18.700075 140486560917376 learning.py:507] global step 456: loss = 2083570548905869312.0000 (0.286 sec/step)
INFO:tensorflow:global step 457: loss = 845515095910907904.0000 (0.269 sec/step)
I0422 07:10:18.971438 140486560917376 learning.py:507] global step 457: loss = 845515095910907904.0000 (0.269 sec/step)
INFO:tensorflow:global step 458: loss = 1061061568713719808.0000 (0.272 sec/step)
I0422 07:10:19.245239 140486560917376 learning.py:507] global step 458: loss = 1061061568713719808.0000 (0.272 sec/step)

I am using Google Colab with the following libraries:
absl-py==0.9.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
asgiref==3.2.7
astor==0.8.1
astropy==4.0.1.post1
astunparse==1.6.3
atari-py==0.2.6
atomicwrites==1.3.0
attrs==19.3.0
audioread==2.1.8
autograd==1.3
Babel==2.8.0
backcall==0.1.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.6.3
bleach==3.1.4
blis==0.4.1
bokeh==1.4.0
boto==2.49.0
boto3==1.12.40
botocore==1.15.40
Bottleneck==1.3.2
branca==0.4.0
bs4==0.0.1
bz2file==0.98
CacheControl==0.12.6
cachetools==3.1.1
catalogue==1.0.0
certifi==2020.4.5.1
cffi==1.14.0
chainer==6.5.0
chardet==3.0.4
click==7.1.1
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.4.0
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.2.0
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cupy-cuda101==6.5.0
cvxopt==1.2.5
cvxpy==1.0.31
cycler==0.10.0
cymem==2.0.3
Cython==0.29.16
daft==0.0.4
dask==2.12.0
dataclasses==0.7
datascience==0.10.6
decorator==4.4.2
defusedxml==0.6.0
descartes==1.1.0
dill==0.3.1.1
distributed==1.25.3
Django==3.0.5
dlib==19.18.0
dm-sonnet==1.35
docopt==0.6.2
docutils==0.15.2
dopamine-rl==1.0.5
earthengine-api==0.1.218
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm==2.2.5
entrypoints==0.3
ephem==3.7.7.1
et-xmlfile==1.0.1
fa2==0.3.5
fancyimpute==0.4.3
fastai==1.0.60
fastdtw==0.3.4
fastprogress==0.2.3
fastrlock==0.4
fbprophet==0.6
feather-format==0.4.0
featuretools==0.4.1
filelock==3.0.12
firebase-admin==4.0.1
fix-yahoo-finance==0.0.22
Flask==1.1.2
folium==0.8.3
fsspec==0.7.2
future==0.16.0
gast==0.3.3
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.50
geopy==1.17.0
gevent==1.4.0
gin-config==0.3.0
glob2==0.7
google==2.0.3
google-api-core==1.16.0
google-api-python-client==1.7.12
google-auth==1.7.2
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.21.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.6.2
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab==1.0.0
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.51.0
googledrivedownloader==0.4
graph-nets==1.0.5
graphviz==0.10.1
greenlet==0.4.15
grpcio==1.28.1
gspread==3.0.1
gspread-dataframe==3.0.5
gunicorn==20.0.4
gym==0.17.1
h5py==2.10.0
HeapDict==1.0.1
holidays==0.9.12
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.2
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.8
image==1.5.30
imageio==2.4.1
imagesize==1.2.0
imbalanced-learn==0.4.3
imblearn==0.0
imgaug==0.2.9
importlib-metadata==1.6.0
imutils==0.5.3
inflect==2.1.0
intel-openmp==2020.0.133
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.5.1
itsdangerous==1.1.0
jax==0.1.62
jaxlib==0.1.42
jdcal==1.4.1
jedi==0.17.0
jieba==0.42.1
Jinja2==2.11.2
jmespath==0.9.5
joblib==0.14.1
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==5.2.0
jupyter-core==4.6.3
kaggle==1.5.6
kapre==0.1.3.1
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keras-vis==0.4.1
kfac==0.2.0
kiwisolver==1.2.0
knnimpute==0.1.0
librosa==0.6.3
lightgbm==2.2.3
llvmlite==0.31.0
lmdb==0.98
lucid==0.3.8
LunarCalendar==0.0.9
lxml==4.2.6
magenta==0.3.19
Markdown==3.2.1
MarkupSafe==1.1.1
matplotlib==3.2.1
matplotlib-venn==0.11.5
mesh-tensorflow==0.1.12
mido==1.2.6
mir-eval==0.5
missingno==0.4.2
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.2.0
moviepy==0.2.3.5
mpi4py==3.0.3
mpmath==1.1.0
msgpack==1.0.0
multiprocess==0.70.9
multitasking==0.0.9
murmurhash==1.0.2
music21==5.5.0
natsort==5.5.0
nbconvert==5.6.1
nbformat==5.0.5
networkx==2.4
nibabel==3.0.2
nltk==3.2.5
notebook==5.2.2
np-utils==0.5.12.1
numba==0.48.0
numexpr==2.7.1
numpy==1.18.2
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
object-detection==0.1
okgrade==0.4.3
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.2.1
osqp==0.6.1
packaging==20.3
palettable==3.3.0
pandas==1.0.3
pandas-datareader==0.8.1
pandas-gbq==0.11.0
pandas-profiling==1.4.1
pandocfilters==1.4.2
parso==0.7.0
pathlib==1.0.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.0.0
pip-tools==4.5.1
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
portpicker==1.3.1
prefetch-generator==1.0.1
preshed==3.0.2
pretty-midi==0.2.8
prettytable==0.7.2
progressbar2==3.38.0
prometheus-client==0.7.1
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.10.0
psutil==5.4.8
psycopg2==2.7.6.1
ptvsd==5.0.0a12
ptyprocess==0.6.0
py==1.8.1
pyarrow==0.14.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.0
pycparser==2.20
pydata-google-auth==0.3.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyglet==1.5.0
Pygments==2.1.3
pygobject==3.26.1
pymc3==3.7
PyMeeus==0.3.7
pymongo==3.10.1
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==2.4.7
pypng==0.0.20
pyrsistent==0.16.0
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==1.6.5+ubuntu0.2
python-chess==0.23.11
python-dateutil==2.8.1
python-louvain==0.14
python-rtmidi==1.4.0
python-slugify==4.0.0
python-utils==2.4.0
pytz==2018.9
PyWavelets==1.1.1
PyYAML==3.13
pyzmq==19.0.0
qtconsole==4.7.3
QtPy==1.9.0
regex==2019.12.20
requests==2.21.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.2.7
rsa==4.0
s3fs==0.4.2
s3transfer==0.3.3
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
screen-resolution-extra==0.0.0
scs==2.1.2
seaborn==0.10.0
semantic-version==2.8.4
Send2Trash==1.5.0
setuptools-git==1.2
Shapely==1.7.0
simplegeneric==0.8.1
six==1.12.0
sklearn==0.0
sklearn-pandas==1.8.0
smart-open==1.11.1
snowballstemmer==2.0.0
sortedcontainers==2.1.0
spacy==2.2.4
Sphinx==1.8.5
sphinxcontrib-websupport==1.2.1
SQLAlchemy==1.3.16
sqlparse==0.3.1
srsly==1.0.2
stable-baselines==2.2.1
statsmodels==0.10.2
sympy==1.1.1
tables==3.4.4
tabulate==0.8.7
tbb==2020.0.133
tblib==1.6.0
tensor2tensor==1.14.1
tensorboard==1.15.0
tensorboard-plugin-wit==1.6.0.post3
tensorboardcolab==0.0.22
tensorflow==1.15.2
tensorflow-addons==0.8.3
tensorflow-datasets==2.1.0
tensorflow-estimator==1.15.1
tensorflow-gan==2.0.0
tensorflow-gcs-config==2.1.8
tensorflow-hub==0.8.0
tensorflow-metadata==0.21.2
tensorflow-privacy==0.2.2
tensorflow-probability==0.7.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
text-unidecode==1.3
textblob==0.15.3
textgenrnn==1.4.1
tflearn==0.3.2
Theano==1.0.4
thinc==7.4.0
toolz==0.10.0
torch==1.4.0
torchsummary==1.5.1
torchtext==0.3.1
torchvision==0.5.0
tornado==4.5.3
tqdm==4.38.0
traitlets==4.3.3
tweepy==3.6.0
typeguard==2.7.1
typing==3.6.6
typing-extensions==3.6.6
tzlocal==1.5.1
umap-learn==0.4.1
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.8.0
wasabi==0.6.0
wcwidth==0.1.9
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wordcloud==1.5.0
wrapt==1.12.1
xarray==0.15.1
xgboost==0.90
xkit==0.0.0
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==0.9.1
zict==2.0.0
zipp==3.1.0
zmq==0.0.0

@Boltuzamaki Boltuzamaki changed the title Exploding loss after few iterations of training Exploding loss after few iterations of training of Faster RCNN ResNet50 Apr 24, 2020
@jaeyounkim jaeyounkim added the models:research models that come under research directory label Apr 26, 2020
@rggs

rggs commented Apr 27, 2020

I am having a similar issue: one class, and around step 350 the loss explodes, despite my label map etc. looking fine.

@rggs

rggs commented Apr 27, 2020

Ok I feel like an idiot but I'm putting this here: there WAS an issue with my label map. In my label map, the class name was capitalized, whereas in the .record files it was all lower case. Changing the name in the label map file to all lower case (so that it was EXACTLY as it appeared in the .record and .csv files) seems to have fixed the issue.
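For anyone hitting the same thing, a minimal sketch of a case-sensitive comparison between the label map names and the class text stored in a .record file (assumes the standard Object Detection API feature keys; adjust the paths):

import tensorflow as tf  # TF 1.x
from object_detection.utils import label_map_util

record_path = "train.record"       # adjust to your .record file
label_map_path = "labelmap.pbtxt"  # adjust to your label map

# Names defined in the label map (case-sensitive).
map_names = set(label_map_util.get_label_map_dict(label_map_path))

# Class names actually stored in the record.
record_names = set()
for rec in tf.python_io.tf_record_iterator(record_path):
    ex = tf.train.Example.FromString(rec)
    vals = ex.features.feature["image/object/class/text"].bytes_list.value
    record_names.update(v.decode("utf-8") for v in vals)

for name in record_names - map_names:
    matches = [m for m in map_names if m.lower() == name.lower()]
    if matches:
        print("case mismatch: record has %r, label map has %r" % (name, matches[0]))
    else:
        print("no label map entry for %r" % name)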

@Boltuzamaki

Boltuzamaki commented Apr 29, 2020

@rsbball11 Thanks for answering. I don't know whether it was a capitalization issue or not, but I made a new environment on my PC, redid everything from scratch, and now it is training fine.

One thing I noticed: even though I have only one class, I had written 2 in the config file (I was just experimenting). The training still went fine and the loss kept decreasing. I waited for around 1000 steps and it was still decreasing; I don't know why.

My problem is solved, hence I am closing my issue :)
