
Typo in getting started guide #9

Closed

maccam912 opened this issue Nov 9, 2015 · 2 comments

Comments

maccam912 commented Nov 9, 2015

http://tensorflow.org/get_started/basic_usage.md#variables

In the code example, 1 is being added to "counter", which is initially named "var" but later referred to as "state". My guess is that these are meant to be the same variable, and a typo in one of the names made it into the example code.
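
For reference, the consistent version of that example presumably looks like the following (reconstructed from memory of the TF 0.x-era Python API, not quoted from the guide):

```python
import tensorflow as tf

# Create a variable that will be incremented; "state" is used consistently.
state = tf.Variable(0, name="counter")
one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)

# Variables must be initialized before use.
init_op = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(state))
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))
```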

mdda commented Nov 9, 2015

Apparently the fix has already been put into their pipeline.

vrv commented Nov 9, 2015

I just pushed the fix in db0b5da

Thanks!

vrv closed this as completed Nov 9, 2015
ilblackdragon added a commit to ilblackdragon/tensorflow that referenced this issue Mar 9, 2016
…d vocabulary and vocabulary processor to transform documents.
ilblackdragon added a commit to ilblackdragon/tensorflow that referenced this issue Mar 9, 2016
Continue working on neural translation model tensorflow#9. Added language model example tensorflow#64
benoitsteiner added a commit to benoitsteiner/tensorflow that referenced this issue Nov 15, 2016
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Jan 27, 2017
* Upgraded to libxsmm 1.6.5

* Enable the use of libxsmm for matrix multiplications

* Enable the use of libxsmm to speed up 1x1 convolutions (which are computed using matrix multiplications)

* integrated LIBXSMM 1.7
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Jan 27, 2017
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Feb 1, 2017
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Feb 1, 2017
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Mar 30, 2017
* Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arose from the assumption "threshold: fall back to BLAS if n*m*k is above this", which is wrong (the threshold populates an upper bound up to which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold.
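
In other words, the threshold only gates implicit dispatch, not explicit JIT requests. A minimal sketch of the pattern the threshold governs, using LIBXSMM's C++ wrapper (the exact constructor overload is an assumption):

```cpp
#include <libxsmm.h>

// Hedged sketch: small GEMMs go through a JIT'ted kernel when implicit
// dispatch succeeds (i.e., m*n*k is within the configured threshold);
// larger problems fall back to a regular BLAS sgemm.
void small_gemm(int m, int n, int k, const float* a, const float* b, float* c) {
  libxsmm_mmfunction<float> kernel(m, n, k);  // implicit dispatch
  if (kernel) {
    kernel(a, b, c);  // JIT-generated small-matrix multiply
  } else {
    // Above the threshold: call a vendor BLAS sgemm here instead.
  }
}
```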

* Upgraded to libxsmm 1.6.5

* Enable the use of libxsmm for matrix multiplications

* Enable the use of libxsmm to speed up 1x1 convolutions (which are computed using matrix multiplications)

* Fixed libxsmm_config_arguments in libxsmm.BUILD (tensorflow#7)

* Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator().

For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) set up a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas (2) the one-argument form adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size).
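
A usage sketch of that scoped allocator (the kernel skeleton and header paths are assumptions; the type names come from the description above):

```cpp
#include "include/libxsmm_malloc.h"               // assumed header location
#include "tensorflow/core/framework/op_kernel.h"

class MyXsmmOp : public tensorflow::OpKernel {
 public:
  using tensorflow::OpKernel::OpKernel;

  void Compute(tensorflow::OpKernelContext* ctx) override {
    // While this guard is alive, LIBXSMM scratch allocations are routed
    // through the allocator of the given OpKernelContext (one-argument
    // c'tor); the previous allocator is restored when it goes out of scope.
    // Whether the c'tor takes a pointer or a reference is an assumption.
    libxsmm_tf_allocator<libxsmm_scratch_allocator> scratch_guard(*ctx);
    // ... invoke LIBXSMM services (e.g., libxsmm_spmdm_*) here ...
  }
};
```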

* Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed.

* Account for removed non-check API.

* Include libxsmm_malloc.h now that libxsmm_tf_allocator is used.

* Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer.

* Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*.

* Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer.

* Include missing header file (libxsmm_malloc.h).

* Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*.

* Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure).

* Updated to new libxsmm_dnn_link_* functions.

* Updated to use new libxsmm_dnn_bind_* functions.

* Fixed calling libxsmm_dnn_transpose_filter.

* Updates in preparation of LIBXSMM 1.7 (tensorflow#8)

* integrated LIBXSMM 1.7

* support for LIBXSMM 1.7 (tensorflow#9)

* updated LIBXSMM to 1.7.1

* updated to LIBXSMM 1.7.1 (tensorflow#10)

* merge alheinecke master (tensorflow#11)

* Take new translation units into account (LIBXSMM 1.8).

* Account for adjusted header dependency in LIBXSMM (TODO: API to avoid incl. header from LIBXSMM's src).

* Trigger rebuild if template changed (LIBXSMM).
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Mar 30, 2017
MykolaMoshak pushed a commit to tum-ei-eda/tensorflow that referenced this issue Nov 6, 2020
keithm-xmos referenced this issue in xmos/tensorflow Feb 1, 2021
added xcore implementation of micro_time timer functions
copybara-service bot pushed a commit that referenced this issue Dec 23, 2021
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away.

Related JIRA ticket: https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit test only in the CPU unit-tests CI job (for the same reason). That commit cannot be reverted, because this commit disables the MKLDNN based Eigen contraction kernels *only* for the ROCm build.
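
For context, TensorFlow exposes a build-time switch for the MKLDNN based Eigen contraction kernels; if the usual Bazel define applies here (an assumption, not quoted from this commit), the affected test can be exercised without them like so:

```
bazel test --define=tensorflow_mkldnn_contraction_kernel=0 \
  //tensorflow/c/eager:c_api_distributed_test_gpu
```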

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
fsx950223 pushed a commit to fsx950223/tensorflow that referenced this issue Nov 28, 2023
copybara-service bot pushed a commit that referenced this issue May 2, 2024
copybara-service bot pushed a commit that referenced this issue May 2, 2024
Reverts changelist 525613555

PiperOrigin-RevId: 630160501
copybara-service bot pushed a commit that referenced this issue May 2, 2024
Reverts changelist 443785169

PiperOrigin-RevId: 630160501
copybara-service bot pushed a commit that referenced this issue May 2, 2024