TF_SessionRun() from C API crashes when not enough RAM #53413

Open
ozavalistyi opened this issue Dec 14, 2021 · 2 comments
Assignees: sanatmpa1
Labels: comp:runtime (C++ runtime, performance issues on CPU), stat:awaiting tensorflower (status: awaiting response from a TensorFlower), TF 2.7 (issues related to TF 2.7.0), type:bug (bug)

Comments

@ozavalistyi

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Microsoft Windows 10 Enterprise, version: 10.0.18363 N/A Build 18363
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No.
  • TensorFlow installed from (source or binary): https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-2.4.0.zip
  • TensorFlow version (use command below): the bug is tested and reproducible with the TensorFlow C API, versions 2.4.0 and 2.7.0
  • Python version: 3.8
  • CUDA/cuDNN version: used CPU
  • GPU model and memory: used CPU

Describe the current behavior
I'm using the TensorFlow C API in my C++ code, built with Microsoft Visual Studio 2019 (compiler version MSVC 14.26).
When there isn't enough memory, TF_SessionRun() crashes. The Visual Studio debugger shows that TF_SessionRun() throws std::bad_alloc. If I set breakpoints immediately before and after the call to TF_SessionRun(), then after executing TF_SessionRun() I get the following exception message in the debugger:
"Exception thrown at 0x00007FFB9F7EA859 in MyProject.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x000000F789CFD580"
And here's the callstack from where the exception is thrown.
[External Code]

vcruntime140.dll!00007ffb85ba6480() Unknown
tensorflow-2.4.0.dll!00007ffb108e335f() Unknown
tensorflow-2.4.0.dll!00007ffb10863eaf() Unknown
tensorflow-2.4.0.dll!00007ffb108641b3() Unknown
tensorflow-2.4.0.dll!00007ffb1086c119() Unknown
tensorflow-2.4.0.dll!00007ffb10866624() Unknown
tensorflow-2.4.0.dll!00007ffb1432f562() Unknown
tensorflow-2.4.0.dll!00007ffb1433d9fa() Unknown
tensorflow-2.4.0.dll!00007ffb1433c248() Unknown
tensorflow-2.4.0.dll!00007ffb14343589() Unknown
tensorflow-2.4.0.dll!00007ffb16b2860b() Unknown
tensorflow-2.4.0.dll!00007ffb16b26531() Unknown
tensorflow-2.4.0.dll!00007ffb17117799() Unknown
tensorflow-2.4.0.dll!00007ffb17117c31() Unknown
tensorflow-2.4.0.dll!00007ffb17111128() Unknown
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

And the worst thing is that I cannot wrap this call to TF_SessionRun() in a try-catch block, because the compiler optimizes the handler away: we are calling C code, and C code is not allowed to throw exceptions. Even if the exception could be caught, doing so is technically undefined behavior, and I cannot rely on catching exceptions from tensorflow.dll, because 1) it was built with a different compiler, and 2) it is a C API and should not throw anything.
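To illustrate, this is the kind of wrapper that does not help (a sketch only; RunSessionGuarded is not from the attached main.cpp):

```cpp
// Minimal sketch (not from the attached main.cpp) of the try-catch workaround
// that does not reliably work. With MSVC's default /EHsc, the compiler assumes
// extern "C" functions never throw, so this handler may be optimized away; and
// even when it fires, catching an exception that escaped a DLL built with a
// different toolchain through a C API boundary is undefined behavior.
#include <cstdio>
#include <exception>
#include <tensorflow/c/c_api.h>

bool RunSessionGuarded(TF_Session* session, TF_Status* status,
                       const TF_Output* inputs, TF_Tensor* const* input_values, int ninputs,
                       const TF_Output* outputs, TF_Tensor** output_values, int noutputs) {
  try {
    TF_SessionRun(session, /*run_options=*/nullptr,
                  inputs, input_values, ninputs,
                  outputs, output_values, noutputs,
                  /*target_opers=*/nullptr, /*ntargets=*/0,
                  /*run_metadata=*/nullptr, status);
    return TF_GetCode(status) == TF_OK;
  } catch (const std::exception& e) {
    // Not guaranteed to be reached when TF_SessionRun() throws std::bad_alloc.
    std::fprintf(stderr, "Exception escaped TF_SessionRun(): %s\n", e.what());
    return false;
  }
}
```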

I attached files.zip, which contains a main.cpp file that reproduces the problem. It reads a neural network protobuf (.pb) file and tries to run it. The script that generated the neural network, and the neural network itself, are also inside files.zip.

On line 11 of main.cpp there is a variable SIZE (const size_t SIZE = 500;). On my machine TF_SessionRun() crashes when SIZE is around 500. This can vary from machine to machine; try different values to reproduce the problem on yours.

So the problem is that if SIZE is sufficiently big, the call to TF_SessionRun() on line 63 crashes, and we never reach the code below TF_SessionRun() that would print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
(But if SIZE is sufficiently small, it finishes successfully and prints "FINISHED SUCCESSFULLY".)
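Roughly, the structure of main.cpp looks like this (an approximate, self-contained sketch, not the attached file: the operation names "x" and "Identity", the 4-D input shape, and the ReadFile helper are placeholders):

```cpp
// Rough approximation of the attached main.cpp (illustrative only; see
// files.zip for the real file). Operation names, input shape, and the
// ReadFile helper are assumptions, not taken from the attached graph.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <tensorflow/c/c_api.h>

const size_t SIZE = 500;  // increase until the crash reproduces on your machine

static TF_Buffer* ReadFile(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  if (!f) return nullptr;
  std::fseek(f, 0, SEEK_END);
  long len = std::ftell(f);
  std::fseek(f, 0, SEEK_SET);
  void* data = std::malloc(len);
  std::fread(data, 1, len, f);
  std::fclose(f);
  TF_Buffer* buf = TF_NewBufferFromString(data, len);  // copies the bytes
  std::free(data);
  return buf;
}

int main() {
  TF_Status* status = TF_NewStatus();

  // Import frozen_graph.pb into a graph and create a session for it.
  TF_Graph* graph = TF_NewGraph();
  TF_Buffer* graph_def = ReadFile("frozen_graph.pb");
  if (!graph_def) { std::printf("cannot read frozen_graph.pb\n"); return 1; }
  TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
  TF_DeleteImportGraphDefOptions(import_opts);
  TF_DeleteBuffer(graph_def);
  if (TF_GetCode(status) != TF_OK) { std::printf("%s\n", TF_Message(status)); return 1; }

  TF_SessionOptions* session_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, session_opts, status);
  TF_DeleteSessionOptions(session_opts);

  // Build a float input tensor whose memory footprint scales with SIZE
  // (tensor data left uninitialized for brevity).
  int64_t dims[4] = {1, (int64_t)SIZE, (int64_t)SIZE, 3};
  TF_Tensor* input_tensor =
      TF_AllocateTensor(TF_FLOAT, dims, 4, sizeof(float) * SIZE * SIZE * 3);

  TF_Output input_op = {TF_GraphOperationByName(graph, "x"), 0};         // assumed name
  TF_Output output_op = {TF_GraphOperationByName(graph, "Identity"), 0}; // assumed name
  TF_Tensor* output_tensor = nullptr;

  // With a sufficiently large SIZE this call throws std::bad_alloc inside
  // tensorflow.dll and the process dies; the lines below are never reached.
  TF_SessionRun(session, nullptr,
                &input_op, &input_tensor, 1,
                &output_op, &output_tensor, 1,
                nullptr, 0, nullptr, status);

  if (TF_GetCode(status) == TF_OK)
    std::printf("FINISHED SUCCESSFULLY\n");
  else
    std::printf("FINISHED WITH ERROR: %s\n", TF_Message(status));

  TF_DeleteStatus(status);
  return 0;
}
```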

Describe the expected behavior
If there isn't enough memory, TF_SessionRun() should return an error code via TF_Status* instead of crashing.
If we reach the call to TF_SessionRun() in main.cpp, then after the call the program should print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
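In other words, the expected flow is the usual TF_Status check (sketch below; ReportRunResult is only an illustrative helper, and TF_RESOURCE_EXHAUSTED is just one plausible error code):

```cpp
#include <cstdio>
#include <tensorflow/c/c_api.h>

// Expected behavior sketch: on allocation failure, TF_SessionRun() should set
// `status` (perhaps to TF_RESOURCE_EXHAUSTED) and return normally, so a check
// like this is actually reached instead of the process crashing.
void ReportRunResult(const TF_Status* status) {
  if (TF_GetCode(status) == TF_OK)
    std::printf("FINISHED SUCCESSFULLY\n");
  else
    std::printf("FINISHED WITH ERROR: %s (code %d)\n",
                TF_Message(status), static_cast<int>(TF_GetCode(status)));
}
```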

Standalone code to reproduce the issue
I attached an archive, files.zip. It contains 3 files:
main.cpp - C++ code that reproduces the problem
frozen_graph.pb - the neural network protobuf file used inside main.cpp
generate_graph.py - the script that generates frozen_graph.pb

I think you can reproduce this bug (where TF_SessionRun() crashes because there isn't enough RAM) with any sufficiently complex neural network that consumes lots of RAM. I actually encountered this bug with a different neural network, which I cannot share because it is not my intellectual property.

files.zip

@ozavalistyi ozavalistyi added the type:bug Bug label Dec 14, 2021
@tilakrayal tilakrayal added TF 2.7 Issues related to TF 2.7.0 comp:runtime c++ runtime, performance issues (cpu) labels Dec 14, 2021
@tilakrayal tilakrayal assigned sanatmpa1 and unassigned tilakrayal Dec 14, 2021
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 15, 2021
@kasyap1234

Is anyone working on this issue?


Mainadol commented Jan 17, 2022

I see the same crash exception in my project. I used the C API of TensorFlow 2.4.0.
The following is the callstack from the dump file:

    Idemera.exe!crashpad::CrashpadClient::DumpAndCrashTargetProcess(void *,void *,unsigned long)	
ucrtbase.dll!raise()	
ucrtbase.dll!abort()	
ucrtbase.dll!terminate()	
VCRUNTIME140_1.dll!FindHandler<__FrameHandler4>(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 * pRN=0x000000ea62cfc490, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50, FH4::FuncInfo4 * pFuncInfo=0x000000ea62cfc460, unsigned char recursive='\0', int CatchDepth=0, unsigned __int64 * pMarkerRN=0x0000000000000000) line 682	C++
VCRUNTIME140_1.dll!__InternalCxxFrameHandler<__FrameHandler4>(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 * pRN=0x000000ea62cfc490, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50, FH4::FuncInfo4 * pFuncInfo=0x000000ea62cfc460, int CatchDepth=0, unsigned __int64 * pMarkerRN=0x0000000000000000, unsigned char recursive='\0') line 352	C++
VCRUNTIME140_1.dll!__CxxFrameHandler4(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 RN, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50) line 290	C++
ntdll.dll!RtlpExecuteHandlerForException()	
ntdll.dll!RtlDispatchException()	
ntdll.dll!RtlRaiseException()	
KERNELBASE.dll!RaiseException()	
VCRUNTIME140.dll!_CxxThrowException(void * pExceptionObject=0x000000ea62cfd5b0, const _s__ThrowInfo * pThrowInfo) line 133	C++
tensorflow.dll!Eigen::internal::throw_std_bad_alloc(void)	C++
tensorflow.dll!Eigen::internal::TensorContractionBlockMemAllocator<int,int>::allocateSlices<struct Eigen::ThreadPoolDevice const >(struct Eigen::ThreadPoolDevice const &,__int64,__int64,__int64,__int64,__int64,__int64,class std::vector<int *,class std::allocator<int *> > *,class std::vector<int *,class std::allocator<int *> > *)	C++
tensorflow.dll!Eigen::internal::TensorContractionKernel<float,float,float,__int64,class Eigen::internal::blas_data_mapper<float,__int64,0,0,1>,class Eigen::internal::TensorContractionInputMapper<float,__int64,1,struct Eigen::TensorEvaluator<class Eigen::Tensor<float,2,1,__int64> const ,struct Eigen::ThreadPoolDevice>,class Eigen::array<__int64,1>,class Eigen::array<__int64,1>,8,1,0,0,struct Eigen::MakePointer>,class Eigen::internal::TensorContractionInputMapper<float,__int64,0,struct Eigen::TensorEvaluator<class Eigen::TensorCwiseUnaryOp<struct Eigen::internal::scalar_square_op<float const >,class Eigen::TensorMap<class Eigen::Tensor<float const ,2,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct Eigen::ThreadPoolDevice>,class Eigen::array<__int64,1>,class Eigen::array<__int64,1>,8,1,1,0,struct Eigen::MakePointer> >::allocateSlices<struct Eigen::ThreadPoolDevice const >(struct Eigen::ThreadPoolDevice const &,int,int,int,class std::vector<struct Eigen::internal::ColMajorBlock<float,__int64>,class std::allocato	C++
tensorflow.dll!Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const ,struct Eigen::ThreadPoolDevice>::EvalParallelContext<struct Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > co()	C++
tensorflow.dll!Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const ,struct Eigen::ThreadPoolDevice>::evalProductImpl<struct Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ()	C++
tensorflow.dll!Eigen::internal::TensorExecutor<class Eigen::TensorAssignOp<class Eigen::TensorMap<class Eigen::Tensor<float,4,1,__int64>,16,struct Eigen::MakePointer>,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,4> const ,class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const > const > const ,struct Eigen::ThreadPoolDevice,1,0>::run(class Eigen::TensorAssignOp<class Eigen::TensorMap<class Eigen::Tensor<float,4,1,__int64>,16,struct Eigen::MakePointer>,class Eigen::Tensor	C++
tensorflow.dll!tensorflow::LaunchFusedConv2DWithOutputKernel<float>::operator()<struct tensorflow::BiasAddOutputKernel<float,struct tensorflow::LeakyRelu> >(struct tensorflow::BiasAddOutputKernel<float,struct tensorflow::LeakyRelu> const &,class tensorflow::OpKernelContext *,class tensorflow::Tensor const &,class tensorflow::Tensor const &,class tensorflow::Tensor *)	C++
tensorflow.dll!tensorflow::LaunchFusedConv2DOp<struct Eigen::ThreadPoolDevice,float>::operator()(class tensorflow::OpKernelContext *,bool,bool,class tensorflow::Tensor const &,class tensorflow::Tensor const &,enum tensorflow::FusedComputationType,struct tensorflow::FusedComputationArgs const &,struct tensorflow::Conv2DParameters const &,struct tensorflow::Conv2DDimensions const &,class tensorflow::Tensor *)	C++
tensorflow.dll!tensorflow::FusedConv2DOp<struct Eigen::ThreadPoolDevice,float>::Compute(class tensorflow::OpKernelContext *)	C++
tensorflow.dll!tensorflow::NewLocalExecutor(struct tensorflow::LocalExecutorParams const &,class tensorflow::Graph const &,class tensorflow::Executor * *)	C++
tensorflow.dll!tensorflow::NewLocalExecutor(struct tensorflow::LocalExecutorParams const &,class tensorflow::Graph const &,class tensorflow::Executor * *)	C++
tensorflow.dll!Eigen::ThreadPoolTempl<struct tensorflow::thread::EigenEnvironment>::WorkerLoop(int)	C++
tensorflow.dll!std::_Func_impl_no_alloc<class <lambda_fe7aa395b13fe170862dcdb4d85eb030>,void>::_Do_call(void)	C++
tensorflow.dll!std::thread::_Invoke<class std::tuple<class std::function<void > >,0>(void *)	C++
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>()	
KERNEL32.DLL!BaseThreadInitThunk()	
ntdll.dll!RtlUserThreadStart()	

The next callstack is from the calling method in the same dump file:

ntdll.dll!NtWaitForAlertByThreadId()	
ntdll.dll!RtlSleepConditionVariableSRW()	
KERNELBASE.dll!SleepConditionVariableSRW()	
MSVCP140.dll!__crtSleepConditionVariableSRW(_RTL_CONDITION_VARIABLE * pCond, _RTL_SRWLOCK * pLock, unsigned long dwMs, unsigned long flags) line 659	C++
[Inline Frame] MSVCP140.dll!Concurrency::details::stl_condition_variable_win7::wait_for(Concurrency::details::stl_critical_section_interface *) line 216	C++
MSVCP140.dll!Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface * lock) line 210	C++
MSVCP140.dll!do_wait(_Cnd_internal_imp_t * cond=0x00000253c8db76c8, _Mtx_internal_imp_t * mtx=0x00000253c8db7678, const xtime * target=0x0000000000000000) line 77	C++
tensorflow.dll!nsync::nsync_mu_semaphore_p_with_deadline(struct nsync::nsync_semaphore_s_ *,struct timespec)	C++
tensorflow.dll!nsync::nsync_sem_wait_with_cancel_(struct nsync::waiter *,struct timespec,struct nsync::nsync_note_s_ *)	C++
tensorflow.dll!nsync::nsync_cv_wait_with_deadline_generic(struct nsync::nsync_cv_s_ *,void *,void (*)(void *),void (*)(void *),struct timespec,struct nsync::nsync_note_s_ *)	C++
tensorflow.dll!nsync::nsync_cv_wait(struct nsync::nsync_cv_s_ *,struct nsync::nsync_mu_s_ *)	C++
tensorflow.dll!tensorflow::Executor::Run(struct tensorflow::Executor::Args const &)	C++
tensorflow.dll!tensorflow::DirectSession::RunInternal(__int64,class tensorflow::RunOptions const &,class tensorflow::CallFrameInterface *,struct tensorflow::DirectSession::ExecutorsAndKeys *,class tensorflow::RunMetadata *,struct tensorflow::thread::ThreadPoolOptions const &)	C++
tensorflow.dll!tensorflow::DirectSession::Run(class tensorflow::RunOptions const &,class std::vector<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor>,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor> > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class tensorflow::Tensor,class std::allocator<class tensorflow::Tensor> > *,class tensorflow::RunMetadata *,struct tensorflow::thread::ThreadPoolOptions const &)	C++
tensorflow.dll!tensorflow::DirectSession::Run(class tensorflow::RunOptions const &,class std::vector<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor>,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor> > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class tensorflow::Tensor,class std::allocator<class tensorflow::Tensor> > *,class tensorflow::RunMetadata *)	C++
tensorflow.dll!absl::lts_2020_02_25::StartsWith(class absl::lts_2020_02_25::string_view,class absl::lts_2020_02_25::string_view)	C++
tensorflow.dll!TF_SessionRun()	C++
TextureProcessLibrary.dll!TFUtils::RunSession(TF_Session * sess=0x000002553c6d15e0, const TF_Output * inputs=0x000002553b7dd530, TF_Tensor * const * input_tensors=0x00000254c74348a0, unsigned __int64 ninputs=1, const TF_Output * outputs=0x0000025539d60e10, TF_Tensor * * output_tensors=0x000002553c6d1160, unsigned __int64 noutputs=4) line 301	C++
[Inline Frame] TextureProcessLibrary.dll!TFUtils::RunSession(TF_Session *) line 334	C++
TextureProcessLibrary.dll!TFUtils::RunSession(const std::vector<TF_Output,std::allocator<TF_Output> > & inputs, const std::vector<TF_Tensor *,std::allocator<TF_Tensor *> > & input_tensors={...}, const std::vector<TF_Output,std::allocator<TF_Output> > & outputs, std::vector<TF_Tensor *,std::allocator<TF_Tensor *> > & output_tensors={...}) line 145	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generateMap(Base::Path modelPackagePath={...}, QString modelSubPath={...}, cv::Mat & inputMat, std::vector<cv::Mat,std::allocator<cv::Mat> > & outMaps={...}, const char * inputName=0x00007ffa747847b0) line 543	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generateMetallicTexturesFromS11() line 327	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generate() line 209	C++
MaterialLibrary.dll!XScan::ScanTaskThread::generateDesPngs() line 504	C++
MaterialLibrary.dll!XScan::ScanTaskThread::generateTextures() line 367	C++
MaterialLibrary.dll!XScan::ScanTaskThread::startScan() line 271	C++
MaterialLibrary.dll!XScan::ScanTaskThread::run() line 38	C++
Qt5Core.dll!QThreadPrivate::start(void * arg=0x00000253c8d84880) line 405	C++
KERNEL32.DLL!BaseThreadInitThunk()	
ntdll.dll!RtlUserThreadStart()	
