Exiting due to signal 11 with siginfo 0x... and payload 0x... #265

Open
jeffhammond opened this issue Jan 6, 2016 · 2 comments

@jeffhammond

Is this to be expected? This is a dual-socket Intel Xeon 2699v3 (Haswell) workstation, if it matters.

$ uname -a
Linux esgmonster 2.6.32-573.12.1.el6.centos.plus.x86_64 #1 SMP Wed Dec 16 16:48:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ mpicxx -show
g++ -I/opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64/include 
-L/opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64/lib/release_mt 
-L/opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64/lib -Xlinker 
--enable-new-dtags 
-Xlinker -rpath -Xlinker /opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64/lib/release_mt 
-Xlinker -rpath -Xlinker /opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64/lib 
-Xlinker -rpath -Xlinker /opt/intel/mpi-rt/5.1/intel64/lib/release_mt 
-Xlinker -rpath -Xlinker /opt/intel/mpi-rt/5.1/intel64/lib 
-lmpicxx -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread
rm -rf *
MPI_ROOT=/opt/intel/compilers_and_libraries_2016.0.109/linux/mpi/intel64
cmake .. -DGRAPPA_INSTALL_PREFIX=/opt/grappa/$COMPILER \
                     -DCMAKE_C_COMPILER="$MPI_ROOT/bin/mpicc" \
                     -DCMAKE_CXX_COMPILER="$MPI_ROOT/bin/mpicxx" \
                     -DMPI_C_COMPILER="$MPI_ROOT/bin/mpicc" \
                     -DMPI_CXX_COMPILER="$MPI_ROOT/bin/mpicxx"
[jrhammon@esgmonster github-official]$ mpirun GRAPPA/Synch_p2p/p2p 10 $((32*36)) 32
. . . 
Parallel Research Kernels version 2.16
Grappa pipeline execution on 2D grid
Number of processes            = 36
Grid sizes                     = 1152x32
Number of iterations           = 10
Solution validates
Rate (MFlops/s): 18.1586  Avg time (s): 0.00392993
Exiting due to signal 11 with siginfo 0x400340f066f0 and payload 0x400340f065c0
[jrhammon@esgmonster github-official]$ mpirun GRAPPA/Stencil/stencil 10 $((32*36))
. . .
Parallel Research Kernels version 2.16
Grappa stencil execution on 2D grid
Number of cores        = 36
Grid size              = 1152
Radius of stencil      = 2
Tiles in x/y-direction = 6/6
Type of stencil        = star
Data type              = double precision
Compact representation of stencil loop body
Number of iterations   = 10
Solution validates
Rate (MFlops/s): 10048.7  Avg time (s): 0.00249188
Exiting due to signal 11 with siginfo 0x400340e406f0 and payload 0x400340e405c0
[jrhammon@esgmonster github-official]$ mpirun GRAPPA/Transpose/transpose 10 $((32*36)) 32
. . .
Parallel Research Kernels version 2.16
Grappa matrix transpose: B = A^T
Number of cores         = 36
Matrix order            = 1152
Number of iterations    = 10
Tile size               = 32
Implementation DEPRECATED: result accumulation not yet implemented
Solution validates
Rate (MB/s): -8210.7 Avg time (s): 0.0025861
Exiting due to signal 11 with siginfo 0x40034116e6b0 and payload 0x40034116e580
@nelsonje
Member

nelsonje commented Jan 6, 2016

(summarizing off-github discussion)

This shows up on some systems, and comes from the way we clean up after the Boost shared memory gadget we use. I'm working on replacing it.
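For readers puzzling over the message itself: "signal 11" is SIGSEGV on Linux, and the two hex values in each log line sit 0x130 bytes apart. That is consistent with a handler installed via `sigaction` with `SA_SIGINFO`, which receives a `siginfo_t*` and a context pointer. The sketch below shows how such a handler could produce a line of that shape. It is an assumption for illustration, not Grappa's actual code; the names `format_signal_message`, `on_fatal_signal`, and `install_fatal_signal_handler` are hypothetical.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper: writes a line shaped like the log output into buf.
// Returns the value reported by snprintf.
int format_signal_message(char* buf, std::size_t n, int signum,
                          const void* info, const void* payload) {
    assert(buf != nullptr && n > 0);
    return std::snprintf(
        buf, n,
        "Exiting due to signal %d with siginfo 0x%" PRIxPTR
        " and payload 0x%" PRIxPTR,
        signum,
        reinterpret_cast<std::uintptr_t>(info),
        reinterpret_cast<std::uintptr_t>(payload));
}

// Hypothetical handler (assumed, not Grappa's): report the signal and exit.
// Note: snprintf is not formally async-signal-safe; a production handler
// would format the message by hand.
extern "C" void on_fatal_signal(int signum, siginfo_t* info, void* payload) {
    char buf[128];
    format_signal_message(buf, sizeof(buf), signum, info, payload);
    ssize_t ignored = write(STDERR_FILENO, buf, std::strlen(buf));
    ignored = write(STDERR_FILENO, "\n", 1);
    (void)ignored;
    _exit(1);  // exit code 1 is an assumption for this sketch
}

// Installs the handler for SIGSEGV; returns true on success.
bool install_fatal_signal_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fatal_signal;
    sa.sa_flags = SA_SIGINFO;  // request the three-argument handler form
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

If the handler really is of this kind, the message only says that a segfault happened, not where. Since each kernel prints "Solution validates" before the message, the crash occurs after the computation finishes, which fits the teardown explanation above.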

@caiwanli

caiwanli commented Sep 26, 2023

I have the same problem, so how can I solve it?
[root@glusterfs-02 demos]# mpirun --allow-run-as-root -n 2 ./hello_world.exe

WARNING: No preset parameters were found for the device that Open MPI
detected:

Local host: glusterfs-02
Device name: i40iw0
Device vendor ID: 0x8086
Device vendor part ID: 14291

Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.

Exiting due to signal 11 with siginfo 0x7ffc14f5ddf0 and payload 0x7ffc14f5dcc0

Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[54189,1],0]
Exit code: 1

[glusterfs-02:26194] 5 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[glusterfs-02:26194] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@glusterfs-02 demos]# uname -a
Linux glusterfs-02 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
