[firtool] Statically-linked firtool crashes on large design #8595

Open
@tymcauley

Description

When running firtool-1.119.0 on a large design (many CPU cores that all dedup against each other), firtool crashes with an out-of-memory error. The design runs through firtool successfully when the number of CPUs is small, but increasing the count produces repeatable crashes. The crash occurs only with the statically-linked firtool binary, not with one that links against GNU libc. The machine itself is not running out of memory: it has about 128 GiB of available RAM, and the firtool process appears to peak at about 5 GiB. Additionally, passing the --mlir-disable-threading option to firtool avoids the crash, and the run completes normally. My hypothesis is that the problem is related to how threads interact with memory allocation.

OS: RHEL 8.10

Running the statically-linked firtool under GDB yields this backtrace after the crash:

[New LWP 3187095]
LLVM ERROR: out of memory
Allocation failed

Thread 17 "firtool" received signal SIGABRT, Aborted.
[Switching to LWP 3186614]
__restore_sigs (set=set@entry=0x7fffaf9d0c60) at /home/buildozer/aports/main/musl/src/musl-1.2.5/block.c:40
40      /home/buildozer/aports/main/musl/src/musl-1.2.5/block.c: No such file or directory.
(gdb) bt
#0  __restore_sigs (set=set@entry=0x7fffaf9d0c60) at /home/buildozer/aports/main/musl/src/musl-1.2.5/block.c:40
#1  0x0000000001496fae in raise (sig=sig@entry=6) at /home/buildozer/aports/main/musl/src/musl-1.2.5/raise.c:11
#2  0x0000000001493497 in abort () at /home/buildozer/aports/main/musl/src/musl-1.2.5/abort.c:11
#3  0x0000000000437d74 in llvm::report_bad_alloc_error(char const*, bool) ()
#4  0x0000000000437db2 in out_of_memory_new_handler() ()
#5  0x0000000001485901 in operator new(unsigned long, std::align_val_t) ()
#6  0x000000000146e79a in operator new(unsigned long, std::align_val_t, std::nothrow_t const&) ()
#7  0x0000000000442abd in llvm::allocate_buffer(unsigned long, unsigned long) ()
#8  0x00000000008793a6 in llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int> >::grow(unsigned int) ()
#9  0x00000000012acb8c in std::pair<llvm::DenseMapIterator<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int>, false>, bool> llvm::DenseMapBase<llvm::DenseMap<mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int> >, mlir::Operation*, unsigned int, llvm::DenseMapInfo<mlir::Operation*, void>, llvm::detail::DenseMapPair<mlir::Operation*, unsigned int> >::try_emplace<unsigned int>(mlir::Operation*&&, unsigned int&&) ()
#10 0x00000000012ac91a in (anonymous namespace)::GreedyPatternRewriteDriver::addSingleOpToWorklist(mlir::Operation*) ()
#11 0x00000000012ac82c in (anonymous namespace)::GreedyPatternRewriteDriver::addToWorklist(mlir::Operation*) ()
#12 0x00000000012acf3a in mlir::WalkResult llvm::function_ref<mlir::WalkResult (mlir::Operation*)>::callback_fn<(anonymous namespace)::RegionPatternRewriteDriver::simplify(bool*) &&::$_1>(long, mlir::Operation*) ()
#13 0x000000000050bfc8 in mlir::WalkResult mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<mlir::WalkResult (mlir::Operation*)>, mlir::WalkOrder) ()
#14 0x00000000012ab1f2 in mlir::applyPatternsGreedily(mlir::Region&, mlir::FrozenRewritePatternSet const&, mlir::GreedyRewriteConfig, bool*) ()
#15 0x00000000009df8c8 in (anonymous namespace)::Canonicalizer::runOnOperation() ()
#16 0x00000000013fba00 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) ()
#17 0x00000000013fc158 in mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) ()
#18 0x0000000001403bf8 in std::_Function_handler<void (), mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::parallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&&)::{lambda(auto:1&&)#1}>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#19 0x0000000000516a98 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()> > >, void> >::_M_invoke(std::_Any_data const&) ()
#20 0x00000000005169fa in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
#21 0x000000000149cc5b in __pthread_once_full (control=0x7fffb91cddd8, init=0x1470460 <__once_proxy>) at /home/buildozer/aports/main/musl/src/musl-1.2.5/pthread_once.c:22
#22 __pthread_once_full (control=0x7fffb91cddd8, init=0x1470460 <__once_proxy>) at /home/buildozer/aports/main/musl/src/musl-1.2.5/pthread_once.c:11
#23 0x000000000149ccf8 in __pthread_once (control=<optimized out>, init=<optimized out>) at /home/buildozer/aports/main/musl/src/musl-1.2.5/pthread_once.c:47
#24 0x0000000000516db4 in std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<std::function<void ()> > >, void>::_M_complete_async() ()
#25 0x0000000000516e55 in std::_Function_handler<void (), llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#26 0x000000000145b692 in llvm::StdThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) ()
#27 0x000000000145cd0b in void* llvm::thread::ThreadProxy<std::tuple<llvm::StdThreadPool::grow(int)::$_0> >(void*) ()
#28 0x000000000149bb2a in start (p=0x7fffaf9d17f8) at /home/buildozer/aports/main/musl/src/musl-1.2.5/pthread_create.c:207
#29 0x000000000149d2ae in __clone () at /home/buildozer/aports/main/musl/src/musl-1.2.5/clone.s:22
Backtrace stopped: frame did not save the PC
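
In frames #5 through #8 of the backtrace, the failing call is the aligned, non-throwing `operator new` reached from `llvm::allocate_buffer` while a `DenseMap` grows on a thread-pool worker. A standalone stress test along the following lines might help separate an allocator problem from a firtool problem when built both statically against musl and dynamically against GNU libc. This is only a sketch under assumptions: `count_failed_allocations` is a hypothetical helper that is not part of firtool or LLVM, the alignment of 8 and the doubling buffer sizes are guesses at what a growing `DenseMap<Operation*, unsigned>` requests, and it is not known whether it reproduces the crash.

```cpp
#include <atomic>
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

// Hypothetical helper (not from firtool or LLVM): each worker thread
// repeatedly allocates a buffer twice as large as the previous one, using
// the same aligned, non-throwing operator new overload seen in frame #6,
// and frees the old buffer afterwards, loosely imitating a growing table.
// Returns how many allocations returned nullptr across all threads.
std::size_t count_failed_allocations(unsigned num_threads, unsigned max_shift) {
  constexpr std::align_val_t kAlign{8};  // assumed bucket alignment
  std::atomic<std::size_t> failures{0};
  std::vector<std::thread> workers;

  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&failures, max_shift, kAlign] {
      void *prev = nullptr;
      for (unsigned shift = 6; shift <= max_shift; ++shift) {
        const std::size_t bytes = std::size_t{1} << shift;
        void *next = ::operator new(bytes, kAlign, std::nothrow);
        if (next == nullptr) {
          ++failures;  // the allocator refused the request
          break;
        }
        // Touch both ends so the pages are actually committed.
        static_cast<unsigned char *>(next)[0] = 1;
        static_cast<unsigned char *>(next)[bytes - 1] = 1;
        ::operator delete(prev, kAlign);  // deleting nullptr is a no-op
        prev = next;
      }
      ::operator delete(prev, kAlign);
    });
  }

  for (auto &worker : workers)
    worker.join();
  return failures.load();
}
```

Calling, for example, `count_failed_allocations(32, 30)` from a small `main` (C++17, built with thread support) and comparing the result between the two libc builds would show whether the allocator alone starts refusing requests under concurrent growth. A nonzero count well below the machine's available RAM would point at the allocator or process limits, not at the canonicalizer.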

Is there any other information I can provide to help debug this?
