BaseCollectiveExecuter::StartAbort Out of range: End of Sequence #104
Comments
I think it's logical to add the |
Hi @AnaRhisT94, so when loading TFRecords, will using dataset.map(....).repeat() resolve this issue? Please explain in detail, as I am new to TensorFlow and want to understand what is going on. Thanks |
Best thing is to try and see. Should work though :) |
Another option is switching from "fit" to "eager_tf". Ah just saw you said you did that first. |
@muhammad-maaz-confiz Did adding the .repeat() resolve this for you? And have you been able to train a model that detects something (even inaccurately)? I am having this exact same issue and have yet to train a functioning model. |
Hi @NiklasWilson, adding .repeat() and passing steps_per_epoch in model.fit() solves the issue for me. Because of a busy schedule I have not been able to test the model yet. I will let you know about detection once I test it. |
@muhammad-maaz-confiz Awesome! I will add that to my code tonight and test myself too. While eagerly awaiting the results of your tests :) |
Side note this
can happen with or without these errors. It is related to this line of code. You can increase the patience value to make premature stopping less likely. Its purpose is to make sure that your training quality does not decrease. |
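To make the effect of the patience value concrete, here is a minimal pure-Python sketch of the usual early-stopping rule. It assumes the line in question is a Keras `tf.keras.callbacks.EarlyStopping` callback monitoring validation loss, which the thread does not show, so treat that as an assumption. The function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified model of patience-based early stopping: training stops once
    the monitored loss has failed to improve for `patience` epochs in a row.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: a plateau followed by a late improvement.
losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 2.9]
print(stopping_epoch(losses, patience=3))  # stops at epoch 6, before the improvement
print(stopping_epoch(losses, patience=5))  # None: training runs to the end
```

With a small patience the run ends during the plateau; a larger value lets training continue long enough to reach the later improvement.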
Thanks @NiklasWilson and @AnaRhisT94. Repeating the dataset and adding steps_per_epoch in model.fit() seems to resolve the issue for the initial epochs, but it reappeared after a couple of epochs (3 epochs for me). What could have gone wrong? Also note that training automatically stops after 9 epochs with no message regarding early stopping. Thanks |
@muhammad-maaz-confiz I didn't put it here, but it makes sense that it would also need to go here (adding that now actually)
Notice in your picture, the very last line says "Killed" |
By repeating the training dataset and specifying steps_per_epoch and validation_steps in model.fit(), I am able to get rid of this error/warning. |
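For anyone landing here, the fix described above can be sketched roughly as follows. This is only an illustration, not the repository's training script: `fit_with_repeat` and `steps_for` are hypothetical helper names, `model` is assumed to be a compiled `tf.keras.Model`, and `train_ds` / `val_ds` are assumed to be batched `tf.data.Dataset` objects built with `batch()` without `drop_remainder`.

```python
import math


def steps_for(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def fit_with_repeat(model, train_ds, val_ds, n_train, n_val, batch_size, epochs):
    """Train on a repeated dataset with explicit per-epoch step counts.

    `model` is assumed to be a compiled tf.keras.Model; `train_ds` and
    `val_ds` are assumed to be batched tf.data.Dataset objects.
    """
    return model.fit(
        # A repeated dataset never ends, so Keras has to be told
        # how many batches make up one epoch.
        train_ds.repeat(),
        epochs=epochs,
        steps_per_epoch=steps_for(n_train, batch_size),
        validation_data=val_ds,
        validation_steps=steps_for(n_val, batch_size),
    )
```

The step counts are derived from the dataset size, so each epoch still covers the data about once even though the iterator itself never runs out.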
I believe this error doesn't actually affect training; it's likely a bug in TensorFlow |
Hi,
While training Tiny Yolo on the VOC dataset, at the end of each epoch I am getting the error "BaseCollectiveExecuter::StartAbort Out of range: End of Sequence". Training also stops early after 4 epochs. The terminal outputs are attached. Note that I am using Ubuntu 18.04 with TF2.0.
Also, when training in eager mode, everything worked fine.