TensorFlow keras callback using tensorboard, "ProfilerNotRunningError: Cannot stop profiling. No profiler is running." #2279

Closed
duancaohui opened this issue May 26, 2019 · 5 comments

@duancaohui duancaohui commented May 26, 2019

Environment information (required)

System: Windows 10, RAM 64 GB
Environment: Python 3.6, TensorFlow-gpu==2.0.0-alpha0
GPU: GeForce RTX 2080 Ti, CUDA 10.0, cudnn-10.0-windows10-x64-v7.4.2.24

Issue description

When I follow the guide at https://tensorflow.google.cn/tensorboard/r2/graphs, I often run into an error. A simple example:

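import os
from datetime import datetime

import tensorflow as tf

@tf.function
def my_func(x, y):
    return tf.nn.relu(tf.matmul(x, y))

# PATH is the base log directory; its value was not given in the report.
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = os.path.join(PATH, 'func', stamp)
writer = tf.summary.create_file_writer(logdir)

x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

# Start tracing the graph and the profiler before calling the function.
tf.summary.trace_on(graph=True, profiler=True)

z = my_func(x, y)

# Export the trace; this is the call that raises the error below.
with writer.as_default():
    tf.summary.trace_export(
        name="my_func_trace",
        step=0,
        profiler_outdir=logdir)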

Error description

ProfilerNotRunningError                   Traceback (most recent call last)
<ipython-input-11-c67128fece27> in <module>
     19         name="my_func_trace",
     20         step=0,
---> 21         profiler_outdir=logdir)
     22 

c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py in trace_export(name, step, profiler_outdir)
   1142 
   1143   if profiler:
-> 1144     _profiler.save(profiler_outdir, _profiler.stop())
   1145 
   1146   trace_off()

c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\tensorflow\python\eager\profiler.py in stop()
     97     if _profiler is None:
     98       raise ProfilerNotRunningError(
---> 99           'Cannot stop profiling. No profiler is running.')
    100     with c_api_util.tf_buffer() as buffer_:
    101       pywrap_tensorflow.TFE_ProfilerSerializeToString(

ProfilerNotRunningError: Cannot stop profiling. No profiler is running.

I don't know what this error means or how to fix it.

Collaborator

@stephanwlee stephanwlee commented May 29, 2019

Hi @duancaohui, I was unable to reproduce this on either Colab (with GPU) or my local environment (CUDA 10.0, Pascal, tensorflow-gpu@pre and tf-gpu-nightly-2.0-preview@latest).

Would you be able to share, perhaps, a Colab that reproduces this issue? Thanks!

@Jakkiejinn Jakkiejinn commented Jul 17, 2019

Hi all,
I think this is caused by the profiler_outdir or log_dir path. The relevant code in TensorFlow is:

plugin_dir = os.path.join(
    logdir, 'plugins', 'profile',
    datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
gfile.MakeDirs(plugin_dir)

If you name your directory './log/fit/', the code above raises the error; if you write it as '.\log\fig\', it does not.

Member

@wchargin wchargin commented Jul 17, 2019

That sounds like you’re running into tensorflow/tensorflow#26021, then:

TL;DR: Due to a current bug in TensorFlow, on Windows, if your path
contains any forward slashes then it should not contain any backslashes.
Thus, os.path.join(logdir, "foo") will hit this bug if logdir
contains any forward slashes.

In the meantime until this bug is fixed, please ensure that your paths
use consistent delimiters.
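
One way to keep delimiters consistent is to normalise the path after joining it. The sketch below is only an illustration: the helper name `consistent_windows_path` and the example directories are hypothetical, and it uses the standard library's `ntpath` module (the Windows implementation of `os.path`) so that it behaves the same on any operating system. Whether this sidesteps the TensorFlow bug in every case is an assumption, not something confirmed in this thread.

```python
import ntpath  # Windows flavour of os.path, importable on every platform


def consistent_windows_path(*parts):
    """Join path parts with Windows rules, then normalise the result.

    ntpath.normpath converts forward slashes to backslashes, so the
    returned path uses a single delimiter throughout.
    """
    return ntpath.normpath(ntpath.join(*parts))


# Joining onto a forward-slash path leaves the delimiters mixed.
mixed = ntpath.join("C:/logs", "func")

# Normalising after the join leaves only backslashes.
fixed = consistent_windows_path("C:/logs", "func")

print(mixed)
print(fixed)
```

On an actual Windows machine, `os.path.normpath(os.path.join(...))` should behave the same way, since `os.path` is `ntpath` there.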

@wchargin wchargin closed this Jul 17, 2019
TensorBoard primary automation moved this from Needs triage to Closed Jul 17, 2019

@grewe grewe commented Oct 21, 2019

I am still experiencing this error and cannot resolve it:
#BASE_DATA_PATH = 'C:/Grewe/Classes/CS663/Mat/LSTM/data'
BASE_DATA_PATH = r'C:\Grewe\Classes\CS663\Mat\LSTM\data'  # raw string: backslashes are kept literally
mylog_dir = os.path.join( BASE_DATA_PATH, "train_log")
print("Mylog directory = " + mylog_dir)
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=mylog_dir, update_freq=1000)
print(os.path.join(mylog_dir, 'train'))
tensorboard_callback = tf.keras.callbacks.TensorBoard(os.path.join('tmp'), update_freq=1000)
model.fit(train_dataset, epochs=17, callbacks=[tensorboard_callback], validation_data=valid_dataset)

I get this output:

Mylog directory = C:\Grewe\Classes\CS663\Mat\LSTM\data\train_log
C:\Grewe\Classes\CS663\Mat\LSTM\data\train_log\train
Epoch 1/17
WARNING:tensorflow:Trace already enabled
1/Unknown - 2s 2s/step - loss: 4.7203 - accuracy: 0.0625 - top_k_categorical_accuracy: 0.0625

UnknownError Traceback (most recent call last)
~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\keras\engine\training_v2.py in on_batch(self, step, mode, size)
695 try:
--> 696 yield batch_logs
697 finally:

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
122 try:
--> 123 batch_outs = execution_function(iterator)
124 except (StopIteration, errors.OutOfRangeError):

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in execution_function(input_fn)
85 return nest.map_structure(_non_none_constant_value,
---> 86 distributed_function(input_fn))
87

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\def_function.py in __call__(self, *args, **kwds)
456 tracing_count = self._get_tracing_count()
--> 457 result = self._call(*args, **kwds)
458 if tracing_count == self._get_tracing_count():

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds)
486 # defunned version which is guaranteed to never create variables.
--> 487 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
488 elif self._stateful_fn is not None:

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
1822 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
1824

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs)
1140 resource_variable_ops.BaseResourceVariable))),
-> 1141 self.captured_inputs)
1142

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1223 flat_outputs = forward_function.call(
-> 1224 ctx, args, cancellation_manager=cancellation_manager)
1225 else:

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
510 attrs=("executor_type", executor_type, "config_proto", config),
--> 511 ctx=ctx)
512 else:

~\AppData\Roaming\Python\Python36\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:

c:\python36\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError: FileNotFoundError: [Errno 2] No such file or directory: 'C:/Grewe/Classes/CS663/Mat/LSTM/data/UCF-101\CricketBowling/v_CricketBowling_g25_c06.npy'
Traceback (most recent call last):

@jdc17d jdc17d commented Oct 26, 2019

I was having the same issue. I am using anaconda and in jupyter notebook when I tried to set up the tensorboard callback I was getting the same error. My import tensorboard command was running without complaint but I don't think it was really there.

I ran conda install tensorboard and have been able to run without issues since. I also just specify my directory as 'logs' so I don't have to think about forward slashes versus backslashes, as I am on a Windows machine.

If you aren't using conda, maybe just try pip install tensorboard and see if that fixes your issue!
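
To check whether a package is really installed in the active environment before reinstalling, something like the following may help. It is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_version` is hypothetical. It looks up the installed distribution's metadata rather than just trying `import tensorboard`, which, as described above, can succeed without a complete installation.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tensorboard")
if version is None:
    print("tensorboard is not installed in this environment")
else:
    print("tensorboard version:", version)
```

Run this from the same Jupyter kernel that shows the error, since a notebook kernel can use a different environment from the shell where the package was installed.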
