New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typo in getting started guide #9
Comments
Apparently the fix has already been put into their pipeline. |
I just pushed the fix in db0b5da Thanks! |
ilblackdragon
added a commit
to ilblackdragon/tensorflow
that referenced
this issue
Mar 9, 2016
…d vocabulary and vocabulary processor to transform documents.
ilblackdragon
added a commit
to ilblackdragon/tensorflow
that referenced
this issue
Mar 9, 2016
Fixes tensorflow#9: Neural translation
ilblackdragon
added a commit
to ilblackdragon/tensorflow
that referenced
this issue
Mar 9, 2016
Continue working on neural translation model tensorflow#9. Added language model example tensorflow#64
This was referenced Jul 10, 2016
benoitsteiner
added a commit
to benoitsteiner/tensorflow
that referenced
this issue
Nov 15, 2016
Small update
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Jan 27, 2017
* Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Jan 27, 2017
* Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Feb 1, 2017
* Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Feb 1, 2017
* Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Mar 30, 2017
* Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * Fixed libxsmm_config_arguments in libxsmm.BUILD (tensorflow#7) * Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator(). For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) setup a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas the one-argument form (2) adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size). * Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed. * Account for removed non-check API. * Include libxsmm_malloc.h now that libxsmm_tf_allocator is used. * Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer. * Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*. * Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer. * Include missing header file (libxsmm_malloc.h). * Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*. * Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure). * Updated to new libxsmm_dnn_link_* functions. * Updated to use new libxsmm_dnn_bind_* functions. * Fixed calling libxsmm_dnn_transpose_filter. * Updates in preparation of LIBXSMM 1.7 (tensorflow#8) * Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator(). For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) setup a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas the one-argument form (2) adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size). * Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed. * Account for removed non-check API. * Include libxsmm_malloc.h now that libxsmm_tf_allocator is used. * Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer. * Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*. * Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer. * Include missing header file (libxsmm_malloc.h). * Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*. * Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure). * Updated to new libxsmm_dnn_link_* functions. * Updated to use new libxsmm_dnn_bind_* functions. * Fixed calling libxsmm_dnn_transpose_filter. * integrated LIBXSMM 1.7 * support for LIBXSMM 1.7 (tensorflow#9) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * updated to LIBXSMM 1.7.1 (tensorflow#10) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * merge alheinecke master (tensorflow#11) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * Take new translation units into account (LIBXSMM 1.8). * Account for adjusted header dependency in LIBXSMM (TODO: API to avoid incl. header from LIBXSMM's src). * Trigger rebuild if template changed (LIBXSMM).
benoitsteiner
pushed a commit
to benoitsteiner/tensorflow
that referenced
this issue
Mar 30, 2017
* Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * Fixed libxsmm_config_arguments in libxsmm.BUILD (tensorflow#7) * Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator(). For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) setup a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas the one-argument form (2) adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size). * Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed. * Account for removed non-check API. * Include libxsmm_malloc.h now that libxsmm_tf_allocator is used. * Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer. * Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*. * Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer. * Include missing header file (libxsmm_malloc.h). * Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*. * Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure). * Updated to new libxsmm_dnn_link_* functions. * Updated to use new libxsmm_dnn_bind_* functions. * Fixed calling libxsmm_dnn_transpose_filter. * Updates in preparation of LIBXSMM 1.7 (tensorflow#8) * Fixed libxsmm_config_arguments: Fixed the incorrect value supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arised from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold. * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator(). For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) setup a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas the one-argument form (2) adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size). * Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed. * Account for removed non-check API. * Include libxsmm_malloc.h now that libxsmm_tf_allocator is used. * Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer. * Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*. * Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer. * Include missing header file (libxsmm_malloc.h). * Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*. * Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure). * Updated to new libxsmm_dnn_link_* functions. * Updated to use new libxsmm_dnn_bind_* functions. * Fixed calling libxsmm_dnn_transpose_filter. * integrated LIBXSMM 1.7 * support for LIBXSMM 1.7 (tensorflow#9) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * updated to LIBXSMM 1.7.1 (tensorflow#10) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * merge alheinecke master (tensorflow#11) * Upgraded to libxsmm 1.6.5 * Enable the use of libxsmm for matrix multiplications * Enable the use of libxsmm to speedup 1x1 convolutions (which are computed using matrix multiplications) * integrated LIBXSMM 1.7 * updated LIBXSMM to 1.7.1 * Take new translation units into account (LIBXSMM 1.8). * Account for adjusted header dependency in LIBXSMM (TODO: API to avoid incl. header from LIBXSMM's src). * Trigger rebuild if template changed (LIBXSMM).
This was referenced Nov 2, 2020
Undefined symbols for architecture arm64 when loading TensorFlowLiteSelectTfOps on iOS device
#41948
Closed
MykolaMoshak
pushed a commit
to tum-ei-eda/tensorflow
that referenced
this issue
Nov 6, 2020
Update master from upstream
keithm-xmos
referenced
this issue
in xmos/tensorflow
Feb 1, 2021
added xcore implementation of micro_time timer functions
copybara-service bot
pushed a commit
that referenced
this issue
Dec 23, 2021
On some CI nodes (typically those with higher CPU core counts 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermitent basis. When it does fail, the failures manifests as segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away. related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684 A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That comit cannot be reverted, because this commit disables MKLDNN based Eigen contraction kernels *only* for the ROCm build. ``` Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault. [Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))] 0x00007fff54530000 in ?? () (gdb) where #0 0x00007fff54530000 in ?? () #1 0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #2 0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #3 0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #4 0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #5 0x00007fffd52fe983 in dnnl_sgemm () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so #6 0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) () #7 0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) () #8 0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) () #9 0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) () #10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () #11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 #14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
Closed
fsx950223
pushed a commit
to fsx950223/tensorflow
that referenced
this issue
Nov 28, 2023
On some CI nodes (typically those with higher CPU core counts 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermitent basis. When it does fail, the failures manifests as segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points the finger to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN based Eigen contraction kernels (for ROCm) seems to make the crash go away. related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684 A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit-test only in the CPU unit-tests CI job (for the same reason). That comit cannot be reverted, because this commit disables MKLDNN based Eigen contraction kernels *only* for the ROCm build. ``` Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault. [Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))] 0x00007fff54530000 in ?? () (gdb) where #0 0x00007fff54530000 in ?? () #1 0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#2 0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#3 0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#4 0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#5 0x00007fffd52fe983 in dnnl_sgemm () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so tensorflow#6 0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) () tensorflow#7 0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) () tensorflow#8 0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) () tensorflow#9 0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) () tensorflow#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) () tensorflow#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) () from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2 tensorflow#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 tensorflow#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6 ```
copybara-service bot
pushed a commit
that referenced
this issue
May 2, 2024
PiperOrigin-RevId: 630160501
copybara-service bot
pushed a commit
that referenced
this issue
May 2, 2024
Reverts changelist 525613555 PiperOrigin-RevId: 630160501
copybara-service bot
pushed a commit
that referenced
this issue
May 2, 2024
Reverts changelist 443785169 PiperOrigin-RevId: 630160501
copybara-service bot
pushed a commit
that referenced
this issue
May 2, 2024
PiperOrigin-RevId: 630200031
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
http://tensorflow.org/get_started/basic_usage.md#variables
In the code example, 1 is being added to "counter", which is named "var" initially, but referred to as "state" later. My guess is that these are meant to refer to the same thing, and a typo in one of the variable names made it into the example code.
The text was updated successfully, but these errors were encountered: