"UnavailableError: OS Error" when running training on Google Cloud with TensorFlow 1.8 #4314
Comments
Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks. |
I have the same issue when I train the faster_rcnn_resnet101 model with ml-engine. It's my trial account, and I have a Tesla K80. Here's my cloud.yml config file:
|
Did anyone solve this? If yes, mind sharing how? |
@gamcoh did you solve it? |
@prameshbajra nop I'm waiting too |
@devAdvc have you tried reducing the number of |
Anyone solved this issue? |
@moussas1 I'm stuck too. Let me know if you happen to solve it |
As I understand it, this is a bug the TensorFlow developers know about but have not fixed (I don't understand why). P.S. I was able to find learning.py thanks to tensorflow/tensorflow#15793 (comment). As a workaround, you can lower the runtime version and hope that works. After a few hours I found a solution and tested it.
|
This should be gone if you sync to HEAD and use tf 1.12. |
Ok I will update to 1.12, but what do you mean sync to HEAD? Thank you! |
I mean have you synced your fork? |
@pkulzc, is synced fork the same as |
@prameshbajra Still nothing for me. :/ #6220 I also posted a full log of things in the referenced issue. @pkulzc Any ideas? Thanks |
For future viewers, please switch to another thread to track the progress of this issue since this one is duplicated. |
Hi There, |
My model trains fine in 20 minutes with TensorFlow 1.2. I changed my cloud.yml file's runtimeVersion to TensorFlow 1.8, my setup.py file's REQUIRED PACKAGES to require 'Tensorflow>=1.8.0', and my submit training command's runtime-version to 1.8. Now training took about 80 minutes before crashing at 4940 steps (60 short of the 5000 steps I'd set for training) with the error in the title, "UnavailableError: OS Error".
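For anyone changing the same three settings (the runtimeVersion in cloud.yml, the requirement in setup.py, and the runtime-version passed to the submit command), it can help to confirm locally that they are consistent before paying for a long run. The following is only a minimal sketch, assuming the requirement uses the simple `name>=X.Y.Z` form quoted in the post; `parse_version` and `runtime_satisfies` are hypothetical helpers, not part of TensorFlow or the gcloud tooling.

```python
# Hypothetical sanity check; none of these helpers come from TensorFlow or gcloud.

def parse_version(text):
    """Turn '1.8' or '1.8.0' into a tuple of three ints, e.g. (1, 8, 0)."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad '1.8' to (1, 8, 0)
    return tuple(parts[:3])


def runtime_satisfies(runtime_version, requirement):
    """True if runtime_version meets the lower bound in 'name>=X.Y.Z'."""
    name, sep, minimum = requirement.partition(">=")
    if not sep:
        raise ValueError("expected a 'name>=version' requirement: %r" % requirement)
    # Compare as integer tuples so that 1.12 correctly sorts after 1.8.
    return parse_version(runtime_version) >= parse_version(minimum)


# Values quoted in the post above.
REQUIRED_PACKAGES = ["Tensorflow>=1.8.0"]
RUNTIME_VERSION = "1.8"

if __name__ == "__main__":
    for requirement in REQUIRED_PACKAGES:
        ok = runtime_satisfies(RUNTIME_VERSION, requirement)
        print(requirement, "satisfied by runtime", RUNTIME_VERSION, ":", ok)
```

Comparing integer tuples rather than raw strings matters here because a plain string comparison would rank "1.12" below "1.8".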