
Setting lower gcc version for cuda #8

Closed

kmatzen opened this issue Nov 9, 2015 · 4 comments

kmatzen commented Nov 9, 2015

gcc 4.10 and up are not supported by the required CUDA toolkit. How can I point Bazel at a different version of gcc? I tried setting a new version with --compiler, but it reported that no toolchain was found. Do I need to provide a CROSSTOOL file, or is there an easier way?

zheng-xq (Contributor) commented Nov 9, 2015

Could you try the docker-based installation for now? It should have the correct gcc version.

http://tensorflow.org/get_started/os_setup.md#docker-based_installation

It is possible to modify the lower-level scripts to use a different gcc, but you will need to change a few places in the source code to make that happen.

zheng-xq (Contributor) commented Nov 9, 2015

If you have any trouble with the Docker flow, this is the place to modify in order to use a different compiler with CUDA, although we don't currently have a more user-friendly way of doing that.

https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc

CPU_COMPILER = ('/usr/bin/gcc')
NVCC_PATH = CURRENT_DIR + '/../../../cuda/bin/nvcc'
GCC_HOST_COMPILER_PATH = ('/usr/bin/gcc')
LLVM_HOST_COMPILER_PATH = ('/usr/bin/gcc')
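
For example, a minimal sketch of that edit, assuming an older host compiler such as gcc 4.8 is installed at /usr/bin/gcc-4.8 (that path is an assumption; adjust it to whatever compiler your CUDA toolkit supports):

```
# Hypothetical edit to crosstool_wrapper_driver_is_not_gcc: the lines above,
# with only the host-compiler paths changed. The gcc-4.8 path is an assumption.
CPU_COMPILER = ('/usr/bin/gcc-4.8')
NVCC_PATH = CURRENT_DIR + '/../../../cuda/bin/nvcc'
GCC_HOST_COMPILER_PATH = ('/usr/bin/gcc-4.8')
LLVM_HOST_COMPILER_PATH = ('/usr/bin/gcc-4.8')
```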

jontis commented Dec 1, 2015

I changed the required gcc version check in CUDA before installing, so I could use my system version.
It seems to have worked. There are rumors about possible problems, but I've seen none so far.

https://www.udacity.com/wiki/cs344/troubleshoot-gcc47
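
For reference, the edit that wiki describes amounts to relaxing the compiler-version guard in CUDA's host_config.h. A rough sketch of what that guard looks like, assuming a CUDA 7.x-era header (the exact path, version bounds, and error message vary by CUDA release, so treat this as illustrative only):

```
/* /usr/local/cuda/include/host_config.h -- illustrative excerpt, not verbatim */
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)
/* Commenting out the #error below (or raising the version bound above) is the
   "change the required gcc version" edit; it lets a newer system gcc pass
   CUDA's compiler check, at your own risk. */
#error -- unsupported GNU version! gcc 4.10 and up are not supported!
#endif
```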

jontis commented Dec 1, 2015

ilblackdragon added a commit to ilblackdragon/tensorflow that referenced this issue Mar 9, 2016
…r and added tests for both data feeders. Start adding support in DataFeeder for multi-dimensional targets
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Jan 27, 2017
* Fixed libxsmm_config_arguments: Fixed the incorrect value that was supposed to trigger auto-prefetch. Fixed the 0-threshold, which is now accounted for in LIBXSMM (by just populating the default threshold). The problem arose from the assumption "threshold: fallback to BLAS if n*m*k above this", which is wrong (the threshold populates an upper bound until which JIT code is generated). The previous configuration perhaps caused all sorts of issues due to other values derived from the 0-threshold. Note, explicitly JIT'ting code is/was never subject to a threshold.

* Upgraded to libxsmm 1.6.5

* Enable the use of libxsmm for matrix multiplications

* Enable the use of libxsmm to speed up 1x1 convolutions (which are computed using matrix multiplications)

* Make use of TensorFlow's allocation infrastructure even when using LIBXSMM allocation functions. In particular, the (cached) libxsmm_spmdm_init now relies on TF's cpu_allocator().

For C++ code, one can use a libxsmm_scoped_allocator<kind> in order to (temporarily) set up a different allocation mechanism. For instance, using libxsmm_tf_allocator<libxsmm_scratch_allocator> changes LIBXSMM's scratch allocator to rely on TensorFlow. The libxsmm_tf_allocator provides two kinds of c'tors: (1) the no-argument variant adopts TF's cpu_allocator(), whereas the one-argument form (2) adopts the allocator from the given OpKernelContext. Changing the allocator in LIBXSMM with pending buffers (from different allocators) is valid, and all other services in LIBXSMM's "malloc domain" work regardless of the allocation mechanism (e.g., libxsmm_malloc_size).

* Simply renamed API items in order to follow changes in LIBXSMM 1.7. This is incomplete as more changes/adjustments are needed.

* Account for removed non-check API.

* Include libxsmm_malloc.h now that libxsmm_tf_allocator is used.

* Renamed libxsmm_dnn_create_conv_handle to libxsmm_dnn_create_conv_layer.

* Renamed LIBXSMM_DNN_CONV_FORMAT_* to LIBXSMM_DNN_TENSOR_FORMAT_*.

* Renamed libxsmm_dnn_destroy_conv_handle to libxsmm_dnn_destroy_conv_layer.

* Include missing header file (libxsmm_malloc.h).

* Renamed LIBXSMM_DNN_CONV_KIND_* to LIBXSMM_DNN_COMPUTE_KIND_*.

* Account for the fact that datatype_in/out is now only datatype (libxsmm_dnn_conv_desc structure).

* Updated to new libxsmm_dnn_link_* functions.

* Updated to use new libxsmm_dnn_bind_* functions.

* Fixed calling libxsmm_dnn_transpose_filter.
benoitsteiner pushed a commit to benoitsteiner/tensorflow that referenced this issue Feb 1, 2017
denise-k referenced this issue in plaidml/tensorflow Aug 28, 2020
Update TF, add ResNext Stage1 example
MykolaMoshak pushed a commit to tum-ei-eda/tensorflow that referenced this issue Nov 6, 2020
keithm-xmos referenced this issue in xmos/tensorflow Feb 1, 2021
Sync develop with upstream/master
copybara-service bot pushed a commit that referenced this issue Dec 23, 2021
On some CI nodes (typically those with higher CPU core counts, 128/256), the `//tensorflow/c/eager:c_api_distributed_test_gpu` test fails on an intermittent basis.

When it does fail, the failure manifests as a segfault at the end of the test, with the stack dump shown at the end of this commit message. The stack dump points to a routine within the MKLDNN implementation. This is further confirmed by the observation that disabling the MKLDNN-based Eigen contraction kernels (for ROCm) seems to make the crash go away.

related JIRA ticket - https://ontrack-internal.amd.com/browse/SWDEV-313684

A previous commit disabled the `//tensorflow/c/eager:c_api_distributed_test` unit test only in the CPU unit-test CI job (for the same reason). That commit cannot be reverted, because this commit disables MKLDNN-based Eigen contraction kernels *only* for the ROCm build.

```
Thread 191 "c_api_distribut" received signal SIGSEGV, Segmentation fault.
[Switching to thread 191 (Thread 0x7ffc777fe700 (LWP 159004))]
0x00007fff54530000 in ?? ()
(gdb) where
#0  0x00007fff54530000 in ?? ()
#1  0x00007fffd5d15ae4 in dnnl::impl::cpu::x64::avx_gemm_f32::sgemm_nocopy_driver(char const*, char const*, long, long, long, float const*, float const*, long, float const*, long, float const*, float*, long, float const*, float*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#2  0x00007fffd5d166e1 in dnnl::impl::cpu::x64::jit_avx_gemm_f32(int, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#3  0x00007fffd5e277ed in dnnl_status_t dnnl::impl::cpu::x64::gemm_driver<float, float, float>(char const*, char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, float const*, long const*, float const*, float const*, float*, long const*, float const*, bool, dnnl::impl::cpu::x64::pack_type, dnnl::impl::cpu::x64::gemm_pack_storage_t*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#4  0x00007fffd5665056 in dnnl::impl::cpu::extended_sgemm(char const*, char const*, long const*, long const*, long const*, float const*, float const*, long const*, float const*, long const*, float const*, float*, long const*, float const*, bool) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#5  0x00007fffd52fe983 in dnnl_sgemm ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/libexternal_Smkl_Udnn_Uv1_Slibmkl_Udnn.so
#6  0x0000555557187b0b in Eigen::internal::TensorContractionKernel<float, float, float, long, Eigen::internal::blas_data_mapper<float, long, 0, 0, 1>, Eigen::internal::TensorContractionInputMapper<float, long, 1, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer>, Eigen::internal::TensorContractionInputMapper<float, long, 0, Eigen::TensorEvaluator<Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::ThreadPoolDevice>, Eigen::array<long, 1ul>, Eigen::array<long, 1ul>, 4, true, false, 0, Eigen::MakePointer> >::invoke(Eigen::internal::blas_data_mapper<float, long, 0, 0, 1> const&, Eigen::internal::ColMajorBlock<float, long> const&, Eigen::internal::ColMajorBlock<float, long> const&, long, long, long, float, float) ()
#7  0x000055555718dc76 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::kernel(long, long, long, bool) ()
#8  0x000055555718f327 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::signal_kernel(long, long, long, bool, bool) ()
#9  0x00005555571904cb in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::pack_rhs(long, long) ()
#10 0x000055555718fd69 in Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::EvalParallelContext<Eigen::TensorEvaluator<Eigen::TensorContractionOp<Eigen::array<Eigen::IndexPair<long>, 1ul> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const, Eigen::NoOpOutputKernel const> const, Eigen::ThreadPoolDevice>::NoCallback, true, true, false, 0>::enqueue_packing_helper(long, long, long, bool) ()
#11 0x00007ffff6b607a1 in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#12 0x00007ffff6b5de93 in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#13 0x00007ffff6b40107 in tensorflow::(anonymous namespace)::PThread::ThreadFn(void*) ()
   from /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/c/eager/../../../_solib_local/_U_S_Stensorflow_Sc_Seager_Cc_Uapi_Udistributed_Utest_Ugpu___Utensorflow/libtensorflow_framework.so.2
#14 0x00007fffd1ca86db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fffd00b471f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```
anandj91 pushed a commit to anandj91/tensorflow that referenced this issue Sep 28, 2022
* add error hint to check if the separately compiled cus module file exists

* support `leakyrelu` with `cus` type

* allow cus type for dropout

* add literal for cus

* add `PhiloxBitGenerator` for random op of cus

* split the fp16 and cus implementations, so that it is easier to implement the emulation of other data types.

* fix the weird behaviors brought by modifications in model dataloader and clean code

* add cgan benchmark

* add vae benchmark

* add nv standard lib function calls to replace previous manually implemented
math functions

* add exp8 header

* add the new Resnet model

* add general implementation for width16

* rename and modify the conversion between the general float type and float32

* change cus default to be general float

* add cus for more ops like square_difference in math_ops

* add the timeseries_classification_transformer model

* support for keras models

* support cus type in einsum_op

* fix the issue that `cus` `constant` op always has value `0`

* remove the redundant casting as the cus constant has been fixed; change the default type to cus

* remove unused code

* make sure that layers called in multiheadattention use the correct policy

* make sure gamma and beta in normalization are using cus type to compute

* use float to determine dropout mask

* add cutlass strided batched gemm

* fixed a typo

* add seed for timeseries model for deterministic results

* remove early stopping and set epoch to be 120 for consistency

* fix a typo

* add exp8 into general_float definition for consistency

* use round up for exp for consistency

* make sure loss scale is not used for consistency

* resolve the issue on XLA assertion
fsx950223 pushed a commit to fsx950223/tensorflow that referenced this issue Nov 28, 2023