TF_SessionRun() from C API crashes when not enough RAM #53413

Open
ozavalistyi opened this issue Dec 14, 2021 · 2 comments
Assignees: sanatmpa1
Labels: comp:runtime (C++ runtime, performance issues on CPU), stat:awaiting tensorflower (status: awaiting response from a TensorFlower), TF 2.7 (issues related to TF 2.7.0), type:bug (bug)

Comments

@ozavalistyi

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Microsoft Windows 10 Enterprise, version: 10.0.18363 N/A Build 18363
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No.
  • TensorFlow installed from (source or binary): https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-windows-x86_64-2.4.0.zip
  • TensorFlow version (use command below): the bug is tested and reproducible with the TensorFlow C API, versions 2.4.0 and 2.7.0
  • Python version: 3.8
  • CUDA/cuDNN version: used CPU
  • GPU model and memory: used CPU

Describe the current behavior
I'm using the TensorFlow C API in my C++ code, built with Microsoft Visual Studio 2019 (compiler version MSVC 14.26).
When there isn't enough memory, TF_SessionRun() crashes. The Visual Studio debugger shows that TF_SessionRun() throws std::bad_alloc. If I set breakpoints immediately before and after the call to TF_SessionRun(), then after executing TF_SessionRun() I get the following exception message in the debugger:
"Exception thrown at 0x00007FFB9F7EA859 in MyProject.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x000000F789CFD580"
And here's the callstack from where the exception is thrown.
[External Code]

vcruntime140.dll!00007ffb85ba6480() Unknown
tensorflow-2.4.0.dll!00007ffb108e335f() Unknown
tensorflow-2.4.0.dll!00007ffb10863eaf() Unknown
tensorflow-2.4.0.dll!00007ffb108641b3() Unknown
tensorflow-2.4.0.dll!00007ffb1086c119() Unknown
tensorflow-2.4.0.dll!00007ffb10866624() Unknown
tensorflow-2.4.0.dll!00007ffb1432f562() Unknown
tensorflow-2.4.0.dll!00007ffb1433d9fa() Unknown
tensorflow-2.4.0.dll!00007ffb1433c248() Unknown
tensorflow-2.4.0.dll!00007ffb14343589() Unknown
tensorflow-2.4.0.dll!00007ffb16b2860b() Unknown
tensorflow-2.4.0.dll!00007ffb16b26531() Unknown
tensorflow-2.4.0.dll!00007ffb17117799() Unknown
tensorflow-2.4.0.dll!00007ffb17117c31() Unknown
tensorflow-2.4.0.dll!00007ffb17111128() Unknown
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>() Unknown
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown

And the worst thing is that I cannot wrap this call to TF_SessionRun() in a try-catch block, because the compiler optimizes the handler away: we are calling C code, and C code is not allowed to throw exceptions. Even if the exception could be caught, doing so is technically undefined behavior, and I cannot rely on catching exceptions from tensorflow.dll, because 1) it was built with a different compiler, and 2) it is a C API and should not throw anything.
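To illustrate, this is the kind of wrapper that does not help (a sketch only; RunSessionGuarded is not from the attached main.cpp):

```cpp
// Minimal sketch (not from the attached main.cpp) of the try-catch workaround
// that does not reliably work. With MSVC's default /EHsc, the compiler assumes
// extern "C" functions never throw, so this handler may be optimized away; and
// even when it fires, catching an exception that escaped a DLL built with a
// different toolchain through a C API boundary is undefined behavior.
#include <cstdio>
#include <exception>
#include <tensorflow/c/c_api.h>

bool RunSessionGuarded(TF_Session* session, TF_Status* status,
                       const TF_Output* inputs, TF_Tensor* const* input_values, int ninputs,
                       const TF_Output* outputs, TF_Tensor** output_values, int noutputs) {
  try {
    TF_SessionRun(session, /*run_options=*/nullptr,
                  inputs, input_values, ninputs,
                  outputs, output_values, noutputs,
                  /*target_opers=*/nullptr, /*ntargets=*/0,
                  /*run_metadata=*/nullptr, status);
    return TF_GetCode(status) == TF_OK;
  } catch (const std::exception& e) {
    // Not guaranteed to be reached when TF_SessionRun() throws std::bad_alloc.
    std::fprintf(stderr, "Exception escaped TF_SessionRun(): %s\n", e.what());
    return false;
  }
}
```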

I attached files.zip, which contains a main.cpp file that reproduces the problem. It reads a neural network protobuf (.pb) file and tries to run it. The script that generated the neural network, and the neural network itself, are also inside files.zip.

On line 11 of main.cpp there is a variable SIZE (const size_t SIZE = 500;). On my machine TF_SessionRun() crashes when SIZE is around 500. This can vary from machine to machine; try different values to reproduce the problem on yours.

So the problem is that if SIZE is sufficiently big, the call to TF_SessionRun() on line 63 crashes, and we never reach the code below TF_SessionRun() that would print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
(But if SIZE is sufficiently small, it finishes successfully and prints "FINISHED SUCCESSFULLY".)
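Roughly, the structure of main.cpp looks like this (an approximate, self-contained sketch, not the attached file: the operation names "x" and "Identity", the 4-D input shape, and the ReadFile helper are placeholders):

```cpp
// Rough approximation of the attached main.cpp (illustrative only; see
// files.zip for the real file). Operation names, input shape, and the
// ReadFile helper are assumptions, not taken from the attached graph.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <tensorflow/c/c_api.h>

const size_t SIZE = 500;  // increase until the crash reproduces on your machine

static TF_Buffer* ReadFile(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  if (!f) return nullptr;
  std::fseek(f, 0, SEEK_END);
  long len = std::ftell(f);
  std::fseek(f, 0, SEEK_SET);
  void* data = std::malloc(len);
  std::fread(data, 1, len, f);
  std::fclose(f);
  TF_Buffer* buf = TF_NewBufferFromString(data, len);  // copies the bytes
  std::free(data);
  return buf;
}

int main() {
  TF_Status* status = TF_NewStatus();

  // Import frozen_graph.pb into a graph and create a session for it.
  TF_Graph* graph = TF_NewGraph();
  TF_Buffer* graph_def = ReadFile("frozen_graph.pb");
  if (!graph_def) { std::printf("cannot read frozen_graph.pb\n"); return 1; }
  TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
  TF_DeleteImportGraphDefOptions(import_opts);
  TF_DeleteBuffer(graph_def);
  if (TF_GetCode(status) != TF_OK) { std::printf("%s\n", TF_Message(status)); return 1; }

  TF_SessionOptions* session_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, session_opts, status);
  TF_DeleteSessionOptions(session_opts);

  // Build a float input tensor whose memory footprint scales with SIZE
  // (tensor data left uninitialized for brevity).
  int64_t dims[4] = {1, (int64_t)SIZE, (int64_t)SIZE, 3};
  TF_Tensor* input_tensor =
      TF_AllocateTensor(TF_FLOAT, dims, 4, sizeof(float) * SIZE * SIZE * 3);

  TF_Output input_op = {TF_GraphOperationByName(graph, "x"), 0};         // assumed name
  TF_Output output_op = {TF_GraphOperationByName(graph, "Identity"), 0}; // assumed name
  TF_Tensor* output_tensor = nullptr;

  // With a sufficiently large SIZE this call throws std::bad_alloc inside
  // tensorflow.dll and the process dies; the lines below are never reached.
  TF_SessionRun(session, nullptr,
                &input_op, &input_tensor, 1,
                &output_op, &output_tensor, 1,
                nullptr, 0, nullptr, status);

  if (TF_GetCode(status) == TF_OK)
    std::printf("FINISHED SUCCESSFULLY\n");
  else
    std::printf("FINISHED WITH ERROR: %s\n", TF_Message(status));

  TF_DeleteStatus(status);
  return 0;
}
```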

Describe the expected behavior
If there isn't enough memory, TF_SessionRun() should return an error code via TF_Status* instead of crashing.
If we reach the call to TF_SessionRun() in main.cpp, then after the call the program should print either "FINISHED SUCCESSFULLY" or "FINISHED WITH ERROR".
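In other words, the expected flow is the usual TF_Status check (sketch below; ReportRunResult is only an illustrative helper, and TF_RESOURCE_EXHAUSTED is just one plausible error code):

```cpp
#include <cstdio>
#include <tensorflow/c/c_api.h>

// Expected behavior sketch: on allocation failure, TF_SessionRun() should set
// `status` (perhaps to TF_RESOURCE_EXHAUSTED) and return normally, so a check
// like this is actually reached instead of the process crashing.
void ReportRunResult(const TF_Status* status) {
  if (TF_GetCode(status) == TF_OK)
    std::printf("FINISHED SUCCESSFULLY\n");
  else
    std::printf("FINISHED WITH ERROR: %s (code %d)\n",
                TF_Message(status), static_cast<int>(TF_GetCode(status)));
}
```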

Standalone code to reproduce the issue
I attached an archive, files.zip. It contains 3 files:
main.cpp - C++ code that reproduces the problem
frozen_graph.pb - the neural network protobuf file used inside main.cpp
generate_graph.py - the script that generates frozen_graph.pb

I think you can reproduce this bug (where TF_SessionRun() crashes because there isn't enough RAM) with any sufficiently complex neural network that consumes lots of RAM. I actually encountered this bug with a different neural network, which I cannot share because it is not my intellectual property.

files.zip

@ozavalistyi ozavalistyi added the type:bug Bug label Dec 14, 2021
@tilakrayal tilakrayal added TF 2.7 Issues related to TF 2.7.0 comp:runtime c++ runtime, performance issues (cpu) labels Dec 14, 2021
@tilakrayal tilakrayal assigned sanatmpa1 and unassigned tilakrayal Dec 14, 2021
@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 15, 2021
@kasyap1234

Is anyone working on this issue?


Mainadol commented Jan 17, 2022

I see the same crash exception in my project. I used the C API of TensorFlow 2.4.0.
The following is the callstack from the dump file:

    Idemera.exe!crashpad::CrashpadClient::DumpAndCrashTargetProcess(void *,void *,unsigned long)	
ucrtbase.dll!raise()	
ucrtbase.dll!abort()	
ucrtbase.dll!terminate()	
VCRUNTIME140_1.dll!FindHandler<__FrameHandler4>(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 * pRN=0x000000ea62cfc490, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50, FH4::FuncInfo4 * pFuncInfo=0x000000ea62cfc460, unsigned char recursive='\0', int CatchDepth=0, unsigned __int64 * pMarkerRN=0x0000000000000000) line 682	C++
VCRUNTIME140_1.dll!__InternalCxxFrameHandler<__FrameHandler4>(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 * pRN=0x000000ea62cfc490, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50, FH4::FuncInfo4 * pFuncInfo=0x000000ea62cfc460, int CatchDepth=0, unsigned __int64 * pMarkerRN=0x0000000000000000, unsigned char recursive='\0') line 352	C++
VCRUNTIME140_1.dll!__CxxFrameHandler4(EHExceptionRecord * pExcept=0x000000ea62cfd460, unsigned __int64 RN, _CONTEXT * pContext=0x000000ea62cfcc10, _xDISPATCHER_CONTEXT * pDC=0x000000ea62cfca50) line 290	C++
ntdll.dll!RtlpExecuteHandlerForException()	
ntdll.dll!RtlDispatchException()	
ntdll.dll!RtlRaiseException()	
KERNELBASE.dll!RaiseException()	
VCRUNTIME140.dll!_CxxThrowException(void * pExceptionObject=0x000000ea62cfd5b0, const _s__ThrowInfo * pThrowInfo) line 133	C++
tensorflow.dll!Eigen::internal::throw_std_bad_alloc(void)	C++
tensorflow.dll!Eigen::internal::TensorContractionBlockMemAllocator<int,int>::allocateSlices<struct Eigen::ThreadPoolDevice const >(struct Eigen::ThreadPoolDevice const &,__int64,__int64,__int64,__int64,__int64,__int64,class std::vector<int *,class std::allocator<int *> > *,class std::vector<int *,class std::allocator<int *> > *)	C++
tensorflow.dll!Eigen::internal::TensorContractionKernel<float,float,float,__int64,class Eigen::internal::blas_data_mapper<float,__int64,0,0,1>,class Eigen::internal::TensorContractionInputMapper<float,__int64,1,struct Eigen::TensorEvaluator<class Eigen::Tensor<float,2,1,__int64> const ,struct Eigen::ThreadPoolDevice>,class Eigen::array<__int64,1>,class Eigen::array<__int64,1>,8,1,0,0,struct Eigen::MakePointer>,class Eigen::internal::TensorContractionInputMapper<float,__int64,0,struct Eigen::TensorEvaluator<class Eigen::TensorCwiseUnaryOp<struct Eigen::internal::scalar_square_op<float const >,class Eigen::TensorMap<class Eigen::Tensor<float const ,2,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct Eigen::ThreadPoolDevice>,class Eigen::array<__int64,1>,class Eigen::array<__int64,1>,8,1,1,0,struct Eigen::MakePointer> >::allocateSlices<struct Eigen::ThreadPoolDevice const >(struct Eigen::ThreadPoolDevice const &,int,int,int,class std::vector<struct Eigen::internal::ColMajorBlock<float,__int64>,class std::allocato	C++
tensorflow.dll!Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const ,struct Eigen::ThreadPoolDevice>::EvalParallelContext<struct Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > co()	C++
tensorflow.dll!Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const ,struct Eigen::ThreadPoolDevice>::evalProductImpl<struct Eigen::TensorEvaluator<class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ()	C++
tensorflow.dll!Eigen::internal::TensorExecutor<class Eigen::TensorAssignOp<class Eigen::TensorMap<class Eigen::Tensor<float,4,1,__int64>,16,struct Eigen::MakePointer>,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,4> const ,class Eigen::TensorContractionOp<class Eigen::array<struct Eigen::IndexPair<__int64>,1> const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorImagePatchOp<-1,-1,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const > const ,class Eigen::TensorReshapingOp<struct Eigen::DSizes<__int64,2> const ,class Eigen::TensorMap<class Eigen::Tensor<float const ,4,1,__int64>,16,struct Eigen::MakePointer> const > const ,struct tensorflow::LaunchFusedConv2DWithOutputKernel<float>::OutputKernelWrapper const > const > const > const ,struct Eigen::ThreadPoolDevice,1,0>::run(class Eigen::TensorAssignOp<class Eigen::TensorMap<class Eigen::Tensor<float,4,1,__int64>,16,struct Eigen::MakePointer>,class Eigen::Tensor	C++
tensorflow.dll!tensorflow::LaunchFusedConv2DWithOutputKernel<float>::operator()<struct tensorflow::BiasAddOutputKernel<float,struct tensorflow::LeakyRelu> >(struct tensorflow::BiasAddOutputKernel<float,struct tensorflow::LeakyRelu> const &,class tensorflow::OpKernelContext *,class tensorflow::Tensor const &,class tensorflow::Tensor const &,class tensorflow::Tensor *)	C++
tensorflow.dll!tensorflow::LaunchFusedConv2DOp<struct Eigen::ThreadPoolDevice,float>::operator()(class tensorflow::OpKernelContext *,bool,bool,class tensorflow::Tensor const &,class tensorflow::Tensor const &,enum tensorflow::FusedComputationType,struct tensorflow::FusedComputationArgs const &,struct tensorflow::Conv2DParameters const &,struct tensorflow::Conv2DDimensions const &,class tensorflow::Tensor *)	C++
tensorflow.dll!tensorflow::FusedConv2DOp<struct Eigen::ThreadPoolDevice,float>::Compute(class tensorflow::OpKernelContext *)	C++
tensorflow.dll!tensorflow::NewLocalExecutor(struct tensorflow::LocalExecutorParams const &,class tensorflow::Graph const &,class tensorflow::Executor * *)	C++
tensorflow.dll!tensorflow::NewLocalExecutor(struct tensorflow::LocalExecutorParams const &,class tensorflow::Graph const &,class tensorflow::Executor * *)	C++
tensorflow.dll!Eigen::ThreadPoolTempl<struct tensorflow::thread::EigenEnvironment>::WorkerLoop(int)	C++
tensorflow.dll!std::_Func_impl_no_alloc<class <lambda_fe7aa395b13fe170862dcdb4d85eb030>,void>::_Do_call(void)	C++
tensorflow.dll!std::thread::_Invoke<class std::tuple<class std::function<void > >,0>(void *)	C++
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>()	
KERNEL32.DLL!BaseThreadInitThunk()	
ntdll.dll!RtlUserThreadStart()	

The next callstack is from the calling method in the same dump file:

ntdll.dll!NtWaitForAlertByThreadId()	
ntdll.dll!RtlSleepConditionVariableSRW()	
KERNELBASE.dll!SleepConditionVariableSRW()	
MSVCP140.dll!__crtSleepConditionVariableSRW(_RTL_CONDITION_VARIABLE * pCond, _RTL_SRWLOCK * pLock, unsigned long dwMs, unsigned long flags) line 659	C++
[Inline Frame] MSVCP140.dll!Concurrency::details::stl_condition_variable_win7::wait_for(Concurrency::details::stl_critical_section_interface *) line 216	C++
MSVCP140.dll!Concurrency::details::stl_condition_variable_win7::wait(Concurrency::details::stl_critical_section_interface * lock) line 210	C++
MSVCP140.dll!do_wait(_Cnd_internal_imp_t * cond=0x00000253c8db76c8, _Mtx_internal_imp_t * mtx=0x00000253c8db7678, const xtime * target=0x0000000000000000) line 77	C++
tensorflow.dll!nsync::nsync_mu_semaphore_p_with_deadline(struct nsync::nsync_semaphore_s_ *,struct timespec)	C++
tensorflow.dll!nsync::nsync_sem_wait_with_cancel_(struct nsync::waiter *,struct timespec,struct nsync::nsync_note_s_ *)	C++
tensorflow.dll!nsync::nsync_cv_wait_with_deadline_generic(struct nsync::nsync_cv_s_ *,void *,void (*)(void *),void (*)(void *),struct timespec,struct nsync::nsync_note_s_ *)	C++
tensorflow.dll!nsync::nsync_cv_wait(struct nsync::nsync_cv_s_ *,struct nsync::nsync_mu_s_ *)	C++
tensorflow.dll!tensorflow::Executor::Run(struct tensorflow::Executor::Args const &)	C++
tensorflow.dll!tensorflow::DirectSession::RunInternal(__int64,class tensorflow::RunOptions const &,class tensorflow::CallFrameInterface *,struct tensorflow::DirectSession::ExecutorsAndKeys *,class tensorflow::RunMetadata *,struct tensorflow::thread::ThreadPoolOptions const &)	C++
tensorflow.dll!tensorflow::DirectSession::Run(class tensorflow::RunOptions const &,class std::vector<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor>,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor> > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class tensorflow::Tensor,class std::allocator<class tensorflow::Tensor> > *,class tensorflow::RunMetadata *,struct tensorflow::thread::ThreadPoolOptions const &)	C++
tensorflow.dll!tensorflow::DirectSession::Run(class tensorflow::RunOptions const &,class std::vector<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor>,class std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class tensorflow::Tensor> > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,class std::vector<class tensorflow::Tensor,class std::allocator<class tensorflow::Tensor> > *,class tensorflow::RunMetadata *)	C++
tensorflow.dll!absl::lts_2020_02_25::StartsWith(class absl::lts_2020_02_25::string_view,class absl::lts_2020_02_25::string_view)	C++
tensorflow.dll!TF_SessionRun()	C++
TextureProcessLibrary.dll!TFUtils::RunSession(TF_Session * sess=0x000002553c6d15e0, const TF_Output * inputs=0x000002553b7dd530, TF_Tensor * const * input_tensors=0x00000254c74348a0, unsigned __int64 ninputs=1, const TF_Output * outputs=0x0000025539d60e10, TF_Tensor * * output_tensors=0x000002553c6d1160, unsigned __int64 noutputs=4) line 301	C++
[Inline Frame] TextureProcessLibrary.dll!TFUtils::RunSession(TF_Session *) line 334	C++
TextureProcessLibrary.dll!TFUtils::RunSession(const std::vector<TF_Output,std::allocator<TF_Output> > & inputs, const std::vector<TF_Tensor *,std::allocator<TF_Tensor *> > & input_tensors={...}, const std::vector<TF_Output,std::allocator<TF_Output> > & outputs, std::vector<TF_Tensor *,std::allocator<TF_Tensor *> > & output_tensors={...}) line 145	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generateMap(Base::Path modelPackagePath={...}, QString modelSubPath={...}, cv::Mat & inputMat, std::vector<cv::Mat,std::allocator<cv::Mat> > & outMaps={...}, const char * inputName=0x00007ffa747847b0) line 543	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generateMetallicTexturesFromS11() line 327	C++
TextureProcessLibrary.dll!TexEdit::DeepTextureModelGenerator::generate() line 209	C++
MaterialLibrary.dll!XScan::ScanTaskThread::generateDesPngs() line 504	C++
MaterialLibrary.dll!XScan::ScanTaskThread::generateTextures() line 367	C++
MaterialLibrary.dll!XScan::ScanTaskThread::startScan() line 271	C++
MaterialLibrary.dll!XScan::ScanTaskThread::run() line 38	C++
Qt5Core.dll!QThreadPrivate::start(void * arg=0x00000253c8d84880) line 405	C++
KERNEL32.DLL!BaseThreadInitThunk()	
ntdll.dll!RtlUserThreadStart()	
