Training: invalid value encountered in less + nan's #26

Closed · ghost opened this issue Jun 20, 2019 · 7 comments

@ghost commented Jun 20, 2019

python train.py --batch_size 8 --dataset=C:\...\platt.record --val_dataset=C:\...\platt_val.record --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/yolov3-tiny.tf --tiny

results in this output:

Epoch 1/10
2019-06-20 02:13:00.680170: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profile Session started.
2019-06-20 02:13:00.685371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library cupti64_100.dll
      1/Unknown - 4s 4s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanW0620 02:13:01.387073  9828 callbacks.py:236] Method (on_train_batch_end) is slow compared to the batch update (0.256449). Check your callbacks.
      7/Unknown - 6s 807ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanC:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:1467: RuntimeWarning: invalid value encountered in less
  self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
C:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:979: RuntimeWarning: invalid value encountered in less
  if self.monitor_op(current - self.min_delta, self.best):

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
7/7 [==============================] - 7s 1s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 2/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
7/7 [==============================] - 3s 394ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 3/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00003: saving model to checkpoints/yolov3_train_3.tf
7/7 [==============================] - 3s 396ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 00003: early stopping

What might be the cause of this? Also, there are other open issues regarding training, and I'm wondering if anyone has been successful.
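For reference, the RuntimeWarning in the log comes from Keras' checkpoint/early-stopping callbacks comparing the monitored loss against the best value seen so far with np.less(); once the loss is NaN, that comparison raises the floating-point "invalid" flag and always returns False, so no "improvement" is ever detected. A minimal reproduction (whether the warning itself appears can depend on the NumPy version, but the False result is the important part):

```python
import numpy as np

# np.less(nan, x) sets the floating-point "invalid" flag and returns False,
# so ModelCheckpoint/EarlyStopping never see an improvement once loss is NaN.
with np.errstate(invalid="warn"):
    print(np.less(np.nan, 1.0))  # False (plus the RuntimeWarning on older NumPy)
```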

@dgarkov commented Jun 30, 2019

I can confirm the issue. I have tested it on two different setups, one with the tiny model and one without. Although I didn't get NaNs right from the start, both ended up with the above outcome.

@zzh8829 (Owner) commented Aug 2, 2019

Can you paste some of your sample data here? It's hard to tell without the training data.

@ghost (Author) commented Aug 19, 2019

> Can you paste some of your sample data here? It's hard to tell without the training data.

Unfortunately it was a long time ago and I have since switched; I'm not using this project anymore. Thank you for the reply :)

@ghost ghost closed this as completed Aug 19, 2019
@samratkokula

I am seeing a similar error. My sample data looks like the below; this is before converting it to a TFRecord:

img1.jpeg 0.2901965,0.492121,0.4980395,0.576363,0 0.500981,0.495151,0.701961,0.573333,0 0.6464709999999999,0.5275755,0.696079,0.5809085,1
img2.jpeg 0.259094,0.4052765,0.416548,0.49675549999999996,0 0.417618,0.403979,0.5686519999999999,0.500649,0

Can you please help?
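In case it helps anyone arriving here: the annotation lines above are space-separated boxes, each with comma-separated normalized coordinates and a class id, and a NaN or out-of-range box at this stage will propagate straight into the loss. A minimal sketch for scanning such a file before the TFRecord conversion; the file name "annotations.txt" and the assumption that coordinates lie in [0, 1] are illustrative, not taken from this repo:

```python
import math

# Sanity-check annotation lines of the form:
#   image.jpeg x1,y1,x2,y2,class x1,y1,x2,y2,class ...
def check_annotations(path="annotations.txt"):
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue
            image, boxes = parts[0], parts[1:]
            for box in boxes:
                x1, y1, x2, y2, cls = (float(v) for v in box.split(","))
                if any(math.isnan(v) for v in (x1, y1, x2, y2)):
                    print(f"line {line_no} ({image}): NaN coordinate in {box}")
                elif not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
                    print(f"line {line_no} ({image}): suspicious box {box}")

check_annotations()
```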

@AnaRhisT94

Hi @samratkokula,
have you managed to fix the issue?

@IlkayW commented Feb 6, 2020

I'm encountering the same issue.
It seems to occur randomly ...
Is there a quick way to fix it?
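There is no quick fix for the root cause (usually bad labels or an unstable learning rate), but here is a sketch of how one might fail fast instead of training through NaNs, using only standard Keras pieces rather than this repo's train.py; the learning-rate and clipnorm values are arbitrary examples:

```python
import tensorflow as tf

# Stop training at the first NaN loss instead of saving NaN checkpoints.
callbacks = [
    tf.keras.callbacks.TerminateOnNaN(),
    tf.keras.callbacks.ModelCheckpoint(
        "checkpoints/yolov3_train_{epoch}.tf",
        verbose=1, save_weights_only=True),
]

# A smaller learning rate and gradient clipping are common mitigations when
# the data itself is clean; both values here are illustrative only.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)

# model.compile(optimizer=optimizer, loss=loss)
# model.fit(train_dataset, epochs=10, callbacks=callbacks,
#           validation_data=val_dataset)
```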

@julio-ruepp

I had the same failure; the problem was that there were some NaNs in my data ;)
