Questions about transfer learning and training loss = nan #185
Thanks for the code. I am doing transfer learning with the yolov3-tf2 model using my own dataset (only one custom class, outside COCO). Does the transfer learning function work in my case?

When I put everything in and trained a new model, I got loss = nan. Below is the log. Could you point me to the problem? Thanks!

2020-02-21 09:01:43.873761: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
[[loss/yolo_output_0_loss/Shape_1/_12]]
2020-02-21 09:01:51.458659: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
2020-02-21 09:01:51.459038: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
[[loss/yolo_output_1_loss/Shape_1/_14]]
D:\software\conda_envs\tf2\lib\site-packages\tensorflow_core\python\keras\callbacks.py:1806: RuntimeWarning: invalid value encountered in less
self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
D:\software\conda_envs\tf2\lib\site-packages\tensorflow_core\python\keras\callbacks.py:1225: RuntimeWarning: invalid value encountered in less
if self.monitor_op(current - self.min_delta, self.best):
Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
57/57 [==============================] - 64s 1s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan - val_yolo_output_2_loss: nan
Epoch 2/10
6/7 [========================>.....] - ETA: 1s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2020-02-21 09:02:25.001690: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
2020-02-21 09:02:25.001806: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
[[loss/yolo_output_0_loss/Shape_1/_12]]
2020-02-21 09:02:29.828009: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
2020-02-21 09:02:29.828422: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range: End of sequence
[[{{node IteratorGetNext}}]]
[[loss/yolo_output_1_loss/Shape_1/_14]]
Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
57/57 [==============================] - 38s 673ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan - val_yolo_output_2_loss: nan
Epoch 3/10
6/7 [========================>.....] - ETA: 1s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
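For context, a single-class transfer-learning run with this codebase would look roughly like the sketch below. This assumes the zzh8829/yolov3-tf2 layout that the checkpoint paths in the log suggest; the dataset paths and the glasses naming are placeholders, and the exact flags should be checked against the repo's README.

python train.py \
  --dataset ./data/glasses_train.tfrecord \
  --val_dataset ./data/glasses_val.tfrecord \
  --classes ./data/glasses.names \
  --num_classes 1 \
  --mode fit \
  --transfer darknet \
  --batch_size 16 \
  --epochs 10 \
  --weights ./checkpoints/yolov3.tf \
  --weights_num_classes 80

The --transfer darknet option reuses only the pretrained backbone weights, which is what allows 80-class COCO weights (--weights_num_classes 80) to initialize a 1-class model (--num_classes 1).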
I encountered this issue as well. It was resolved by providing more training data. You can use some data augmentation to increase your dataset size.
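Augmentation in TF2 can be as simple as the sketch below. This is an illustrative assumption rather than code from this repo, and it sticks to photometric transforms (brightness, contrast, saturation, hue) because those leave pixel positions, and therefore the bounding-box labels, untouched; geometric transforms such as flips would also require transforming the boxes.

import tensorflow as tf

def augment_image(image):
    # Photometric-only augmentation: pixel values change, positions do not,
    # so detection labels remain valid without any box adjustment.
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    return image

# Hypothetical usage on a tf.data pipeline of (image, labels) pairs:
# train_dataset = train_dataset.map(lambda x, y: (augment_image(x), y))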
Hi! Finally, the fine-tune mode of training uses 80 classes, but if you provide just one, there will be no problem. If neither of these cases applies to you, reply to this issue so I can help.
Thanks for the answers, both! I used 1000 images for transfer learning. Do I still need to increase the number of images? @chenminni, how many images did you use to resolve the problem? @PieroCV, sorry, I don't get your last suggestion, "the fine-tune mode of training uses 80 classes, but if you provide just one, there will be no problem." I am training for 1 class. What should I do? Thanks!
Hi, @jackyvr.
1. Are you using VoTT, LabelImg, or something else to generate your tfrecord files?
2. Did you modify the repo (I saw binary cross-entropy modifications, but they are not necessary)?
3. What is the content of your .names file?
4. Did you pass the parameters correctly when training?
5. Could you verify the content of one tfrecord file? For point 5, use this:

import tensorflow as tf

filenames = ["<filename>"]  # Replace with the path to one of your tfrecord files
raw_dataset = tf.data.TFRecordDataset(filenames)
for raw_record in raw_dataset.take(1):  # Inspect just the first record
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())  # Decode the serialized proto
    print(example)  # Dumps every feature, including the stored class-name text

I hope you can answer as soon as possible so that I can help you.

Thanks a lot, @PieroCV!
The first thing that I can see is the upper case in the tfrecord file. Change the .names file to "Glasses". I'm kind of busy right now, but I will check the other answers later.
Thanks, @PieroCV. Will do.
@PieroCV, you are amazing! With "glasses" changed to "Glasses" in .names, I am getting a non-nan loss now! Why does lower case not work?
Oh, I see. My folder name has the upper case "Glasses". Thanks!
@jackyvr no problem!
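Why the case mattered: a loader in this kind of pipeline typically builds a class-name to index lookup from the .names file, and the label text stored inside each tfrecord must match one of those lines exactly, case included; an unmatched name can map to an invalid class index, which is one way the loss ends up nan. Below is a quick consistency check, assuming the conventional VOC-style feature key image/object/class/text and a hypothetical data/glasses.names path; verify both against your own print(example) dump.

import tensorflow as tf

# Hypothetical paths; replace both with your own files.
class_names = [line.strip() for line in open("data/glasses.names")]

raw_dataset = tf.data.TFRecordDataset(["<filename>"])
for raw_record in raw_dataset.take(5):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    # 'image/object/class/text' is the conventional key in VOC-style records;
    # confirm it by inspecting the full dump first.
    labels = example.features.feature["image/object/class/text"].bytes_list.value
    for label in labels:
        if label.decode() not in class_names:
            print("Mismatch:", label.decode(), "is not listed in the .names file")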
Thank you, @PieroCV. Although I can now train with my own data, the generated model does not pick up any of the objects I want, even when I feed in a training image. Probably I need to tune the hyperparameters? What loss value is a good point to stop at? Sorry, I am new to deep learning.
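There is no universal loss value at which to stop; the usual practice is to track validation loss and stop once it plateaus. A minimal Keras sketch of that idea follows; the patience and factor values are illustrative assumptions, and the checkpoint pattern merely mirrors the paths already visible in the log above.

import tensorflow as tf

callbacks = [
    # Stop when val_loss has not improved for 3 consecutive epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, verbose=1),
    # Halve the learning rate when val_loss plateaus for 2 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, verbose=1),
    # Save weights each epoch, matching the checkpoint paths in the log.
    tf.keras.callbacks.ModelCheckpoint(
        "checkpoints/yolov3_train_{epoch}.tf", save_weights_only=True, verbose=1),
]
# Hypothetical usage:
# model.fit(train_dataset, epochs=10, validation_data=val_dataset, callbacks=callbacks)

As for the missing detections, it is also worth lowering the score threshold at inference time before concluding the model learned nothing.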