"Failed to call mkldnn_sgemm. Error code: 3" Error when training a simple CNN model #33608

@ophiry

Description

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.1.0-dev20191021, 2.0.0
  • Python version: 3.6.8
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

I am trying to build a model that applies a CNN over text.
Minimal code to reproduce the error:

import tensorflow
from tensorflow.keras.layers import Dense, LSTM, Bidirectional, Embedding, Lambda, Conv1D, GlobalMaxPool1D, concatenate
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras import Input
import string

CHARS = string.punctuation + string.digits + string.ascii_letters + string.whitespace
CHAR_MAP = tensorflow.lookup.StaticHashTable(tensorflow.lookup.KeyValueTensorInitializer(list(CHARS), list(range(1, len(CHARS)+1)), 'string', 'int32'), 0)


def string_to_vec(s):
    return tensorflow.ragged.map_flat_values(CHAR_MAP.lookup, tensorflow.strings.unicode_split(s, 'UTF-8')).to_tensor()


input_layer = Input(shape=tuple(), ragged=False, dtype='string')
feature_layer = Lambda(string_to_vec)(input_layer)
embedding_layer = Embedding(input_dim=len(CHARS)+1, output_dim=10, name='char_embeddings')(feature_layer)
cnn_layer = concatenate([Conv1D(10, kernel_size, activation='relu', padding='same')(embedding_layer) for kernel_size in range(5)])
max_pool = GlobalMaxPool1D()(cnn_layer)
dense_layer = Dense(1, activation='relu')(max_pool)
model = Model(inputs=[input_layer], outputs=[dense_layer])
model.compile(optimizer='adam', loss='binary_crossentropy')

print(tensorflow.__version__)

x = ['hello world', 'this is a test']
y = [0] * len(x)
print(model.predict(x))
model.fit(x,y)

Output:

2.1.0-dev20191021
[[0.]
 [0.]]
Train on 2 samples
2019-10-22 15:29:13.322033: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Failed to call mkldnn_sgemm. Error code: 3
	 [[{{node Conv2DBackpropInput}}]]
2/2 [==============================] - 1s 285ms/sample
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-64494552324e>", line 4, in <module>
    runfile('/Users/ophir/dev/ophir/cnn_error.py', wdir='/Users/ophir/dev/ophir')
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/ophir/dev/ophir/cnn_error.py", line 29, in <module>
    model.fit(x,y)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 777, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 337, in fit
    total_epochs=epochs)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 127, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2339, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1589, in _filtered_call
    self.captured_inputs)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1670, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 521, in call
    ctx=ctx)
  File "/usr/local/miniconda3/envs/zr/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError:  Failed to call mkldnn_sgemm. Error code: 3
	 [[node Conv2DBackpropInput (defined at /Users/ophir/dev/ophir/cnn_error.py:29) ]] [Op:__inference_distributed_function_1725]
Function call stack:
distributed_function
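
One thing worth checking in the reproduction script (a guess, not something the log above confirms): `range(5)` starts at 0, so the first `Conv1D` branch is built with `kernel_size=0`. The sketch below uses plain Python, without TensorFlow, to list the kernel sizes the script generates. The helper `invalid_kernel_sizes` and the `range(1, 6)` alternative are hypothetical, introduced only for illustration.

```python
# Kernel sizes exactly as generated in the report's list comprehension.
reported_kernel_sizes = list(range(5))

# Hypothetical alternative (an assumption, not from the report): start at 1.
candidate_kernel_sizes = list(range(1, 6))


def invalid_kernel_sizes(sizes):
    """Return the sizes that are not strictly positive."""
    return [k for k in sizes if k <= 0]


print("reported: ", reported_kernel_sizes)
print("invalid in reported: ", invalid_kernel_sizes(reported_kernel_sizes))
print("invalid in candidate:", invalid_kernel_sizes(candidate_kernel_sizes))
```

If the zero-width kernel turns out to be the trigger, using `range(1, 6)` in the comprehension would keep five convolution branches while avoiding it. Whether that removes the `mkldnn_sgemm` error has not been verified here.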

Labels

  • TF 2.1 (for tracking issues in 2.1 release)
  • comp:keras (Keras related issues)
  • comp:mkl (MKL related issues)
  • stat:awaiting response (Status - Awaiting response from author)
  • type:bug (Bug)
