I tried to run models/tutorials/image/cifar10/cifar10_train.py and let it train for about a day on my PC (Windows 10, tensorflow-gpu 1.2). After

`2017-07-20 13:58:20.441224: step 941580, loss = 0.14 (3076.2 examples/sec; 0.042 sec/batch)`

I got this error:

```
2017-07-20 13:58:20.791379: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[2304,384]
Traceback (most recent call last):
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1139, in _do_call
return fn(*args)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1121, in _run_fn
status, run_metadata)
File "D:\Anaconda3\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2304,384]
[[Node: ExponentialMovingAverage/AssignMovingAvg_4/sub_1 = Sub[T=DT_FLOAT, _class=["loc:@local3/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"](local3/weights/ExponentialMovingAverage/read, local3/weights/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 127, in <module>
tf.app.run()
File "D:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 123, in main
train()
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 115, in train
mon_sess.run(train_op)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2304,384]
[[Node: ExponentialMovingAverage/AssignMovingAvg_4/sub_1 = Sub[T=DT_FLOAT, _class=["loc:@local3/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"](local3/weights/ExponentialMovingAverage/read, local3/weights/read)]]
Caused by op 'ExponentialMovingAverage/AssignMovingAvg_4/sub_1', defined at:
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 127, in <module>
tf.app.run()
File "D:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 123, in main
train()
File "C:/Users/Hoda/Documents/GitHub/models/tutorials/image/cifar10/cifar10_train.py", line 79, in train
train_op = cifar10.train(loss, global_step)
File "C:\Users\Hoda\Documents\GitHub\models\tutorials\image\cifar10\cifar10.py", line 373, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\moving_averages.py", line 392, in apply
self._averages[var], var, decay, zero_debias=zero_debias))
File "D:\Anaconda3\lib\site-packages\tensorflow\python\training\moving_averages.py", line 72, in assign_moving_average
update_delta = (variable - value) * decay
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 694, in _run_op
return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 838, in binary_op_wrapper
return func(x, y, name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2501, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2510, in create_op
original_op=self._default_original_op, op_def=op_def)
File "D:\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1273, in __init__
self._traceback = _extract_stack()
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2304,384]
[[Node: ExponentialMovingAverage/AssignMovingAvg_4/sub_1 = Sub[T=DT_FLOAT, _class=["loc:@local3/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"](local3/weights/ExponentialMovingAverage/read, local3/weights/read)]]
```

How can I fix this? And if I run the script again, do I have to start training from scratch, or is the previous progress saved?
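For context, here is a minimal sketch of how I understand the training loop in cifar10_train.py (simplified from the stock tutorial code; the only line that is my own is the `allow_growth` setting, which is a workaround I am considering, not something the tutorial sets):

```python
# Minimal sketch of the relevant part of cifar10_train.py as I understand it,
# not the full tutorial script. FLAGS.train_dir comes from the tutorial's flag
# definitions; the allow_growth line is my own addition.
import tensorflow as tf
import cifar10  # tutorial module that builds the model and defines the flags

FLAGS = tf.app.flags.FLAGS


def train():
  with tf.Graph().as_default():
    global_step = tf.contrib.framework.get_or_create_global_step()
    images, labels = cifar10.distorted_inputs()
    logits = cifar10.inference(images)
    loss = cifar10.loss(logits, labels)
    # cifar10.train() is where the ExponentialMovingAverage op from the
    # traceback gets added (cifar10.py line 373).
    train_op = cifar10.train(loss, global_step)

    # MonitoredTrainingSession restores from the latest checkpoint in
    # checkpoint_dir, so my understanding is that rerunning the script with
    # the same --train_dir should resume from the saved step, not restart.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # my attempted OOM workaround
    with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_dir, config=config) as mon_sess:
      while not mon_sess.should_stop():
        mon_sess.run(train_op)
```

If resuming works that way, I could also try a smaller `--batch_size` (the flag defined by the tutorial, default 128 as far as I can tell) to reduce memory pressure, but I am not sure that is the right fix for an OOM that only shows up after ~940k steps.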