
MPI message loss issue in HPC Cluster #147

Closed

kihangyoun opened this issue Apr 2, 2020 · 3 comments

Comments

@kihangyoun

Hi all,

I am having an issue with lost MPI messages on an HPC cluster.
Here is my problem:
(I do not know the exact version of CMAQ that I am using.)
The y_ppm and y_yamo subroutines call SWAP2D and SWAP3D (in swap_sandia_routines.f),
but some of the messages passed between processes are lost.
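For context, the exchange in question is a ghost-row swap along the domain decomposition. The following is only a minimal, hypothetical sketch of that kind of pattern (the subroutine name and the use of MPI_Sendrecv are illustrative, not taken from swap_sandia_routines.f):

```fortran
! Hypothetical sketch of a north-south ghost-row swap; not the actual
! SWAP2D/SWAP3D implementation.
subroutine swap_ns_sketch( sendbuf, recvbuf, n, north, south )
  use mpi
  implicit none
  integer, intent(in)  :: n              ! number of reals in one ghost row
  integer, intent(in)  :: north, south   ! neighbor ranks (or MPI_PROC_NULL at a boundary)
  real,    intent(in)  :: sendbuf( n )
  real,    intent(out) :: recvbuf( n )
  integer :: ierr
  integer :: status( MPI_STATUS_SIZE )

  ! Send my boundary row to the northern neighbor while receiving the
  ! southern neighbor's boundary row in a single combined call.
  call MPI_Sendrecv( sendbuf, n, MPI_REAL, north, 100, &
                     recvbuf, n, MPI_REAL, south, 100, &
                     MPI_COMM_WORLD, status, ierr )
end subroutine swap_ns_sketch
```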

I have checked that the CMAQ version currently being distributed no longer includes swap_sandia or y_yamo. It would be nice to get a new version and test it, but I would like to solve the problem in the code that I have.
Could you tell me whether there has been a known issue related to this?

At first I thought it was an MPI problem, so I asked Intel; there are more details in that posting.
If you want, check the address below, or I can copy the contents here.

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/851369#comment-1955679

@kihangyoun
Author

kihangyoun commented Apr 2, 2020

[Attached image: 20200402_mpi_error]
Q#1

I am writing to ask some questions about CFD model results that differ depending on the order of hosts in mpi_hosts.
I would like to hear your theoretical opinion, because the code is long and complex and it would be difficult to reproduce the problem with a small sample code.

The current situation is that when nodes on different InfiniBand switches run the parallel computation, case #1 works well but case #2 does not.
"Does not work well" means that the computed values differ.

Background: host01-host04 are on IB switch #1 and host99 is on IB switch #2.
Case #1: host01, host02, host03, host04, host99 (i.e. the head node is host01)
Case #2: host99, host01, host02, host03, host04 (i.e. the head node is host99)

My guesses so far (hypothetical scenarios with no theoretical basis):

  1. Communication goes wrong when the head node is on a different switch.
  2. The ranks returned by MPI_COMM_RANK get mixed up when it is called several times.
  3. Something is broken or mismatched in MPI_COMM_WORLD.
  4. Synchronization excludes the head node.

First of all, for debugging, I am putting print statements in several places to see which subroutine or function changes the values (a sketch of the tagging I use follows below).
(I will post more when the situation is updated.)
However, no matter which function I finally find, I am not sure the fix belongs at the code level, so I am posting to the forum to hear about similar experiences.
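For reference, this is the kind of rank- and host-tagged print I am inserting (the subroutine and argument names are illustrative, not from the CMAQ source):

```fortran
! Minimal sketch of a rank/host-tagged debug print (illustrative names).
subroutine debug_print_sketch( label, value )
  use mpi
  implicit none
  character(*), intent(in) :: label
  real,         intent(in) :: value
  integer :: myrank, ierr, namelen
  character( MPI_MAX_PROCESSOR_NAME ) :: hostname

  call MPI_Comm_rank( MPI_COMM_WORLD, myrank, ierr )
  call MPI_Get_processor_name( hostname, namelen, ierr )
  ! Tag every printed value with both the rank and the physical host, so it
  ! is easy to see whether the diverging values come from the node that sits
  ! on the other IB switch.
  write( *, '(A,I4,1X,A,1X,A,1X,ES16.8)' ) 'rank ', myrank, &
        hostname(1:namelen), label, value
end subroutine debug_print_sketch
```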

Additional #1

Additional information:

  1. This program uses only the MPI library, not OpenMP.
     I have not tried the constructs you recommend (IREQ, THREADPRIVATE, NOVECTOR).

  2. Test results
     As I said before, I tried two techniques (ISEND and BARRIER), but neither works (see the sketch after this list).
     a. ISEND: It runs, but the same message loss occurs.
     b. BARRIER: This one is a little strange. I am sure all the procs enter the subroutine, but then they go into an infinite wait (hang).
     c. SBUF, RBUF (Jim): I reduced the deallocations as much as possible, but the same message loss occurs.

  3. I do not think the subroutine itself is the problem, for two reasons:
     a. When the hosts are assigned within the same IB switch, no message has ever been lost in 20 repeats.
        ex)
        host004(IB1): 16 17 18 19 20
        host003(IB1): 11 12 13 14 15
        host002(IB1): 06 07 08 09 10
        host001(IB1): 01 02 03 04 05 : always fine
     b. When the East-West communication was kept on the same node by adjusting the domain decomposition (NROW, NCOL), there was no problem.
        (E-W communication uses the same subroutine.)
        It is always the South-North communication between different IB switches that causes the problem.
        ex)
        host037(IB2): 16 17 18 19 20 <- message loss occurs in S-N communication
        host003(IB1): 11 12 13 14 15
        host002(IB1): 06 07 08 09 10
        host001(IB1): 01 02 03 04 05

  4. Could the barrier hang have a similar cause? That is, are the nodes on one IB switch (host001-host004) waiting at the barrier for the node on the other IB switch (host037), while host037 has already passed through it?
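To be concrete about the ISEND test above, this is the shape of the pattern I tried (a minimal sketch with illustrative names, not the actual SWAP code). My understanding is that SBUF must not be reused or deallocated until the corresponding wait completes, which is why I also tried reducing the deallocations:

```fortran
! Minimal sketch of the non-blocking exchange pattern (illustrative names).
subroutine isend_sketch( sbuf, rbuf, n, north, south )
  use mpi
  implicit none
  integer, intent(in)  :: n, north, south
  real,    intent(in)  :: sbuf( n )
  real,    intent(out) :: rbuf( n )
  integer :: ierr, sreq, rreq
  integer :: status( MPI_STATUS_SIZE )

  ! Post the receive first, then the send, so neither side depends on
  ! buffering inside the MPI library.
  call MPI_Irecv( rbuf, n, MPI_REAL, south, 200, MPI_COMM_WORLD, rreq, ierr )
  call MPI_Isend( sbuf, n, MPI_REAL, north, 200, MPI_COMM_WORLD, sreq, ierr )

  ! Only after both waits return is it safe to read RBUF or to reuse /
  ! deallocate SBUF.
  call MPI_Wait( rreq, status, ierr )
  call MPI_Wait( sreq, status, ierr )
end subroutine isend_sketch
```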

Are there any MPI options that could be tuned to improve or work around this?
I am also going to check whether other MPI libraries (Open MPI, MVAPICH, MPICH) show the same error.
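As a first step in that check, a small stand-alone program like the one below (it assumes an MPI-3 library, which Intel MPI, Open MPI, MVAPICH2 and MPICH all provide) can confirm at runtime which MPI library the executable is actually linked against:

```fortran
! Small sketch (not CMAQ code) that prints the MPI library version string.
program which_mpi
  use mpi
  implicit none
  integer :: ierr, reslen, myrank
  character( MPI_MAX_LIBRARY_VERSION_STRING ) :: libver

  call MPI_Init( ierr )
  call MPI_Comm_rank( MPI_COMM_WORLD, myrank, ierr )
  call MPI_Get_library_version( libver, reslen, ierr )
  if ( myrank == 0 ) write( *, '(A)' ) libver(1:reslen)
  call MPI_Finalize( ierr )
end program which_mpi
```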

@kmfoley
Collaborator

kmfoley commented Apr 3, 2020

Thank you for your question and your interest in the CMAQ system. We ask that you please post your question to the CMAS Center Forum: https://forum.cmascenter.org/

We would like this question to be documented on the forum to help other users that may run into similar issues.

Please start a 'New Topic' with an informative title and choose 'CMAQ' as the category. This will ensure you are connected to the appropriate developer and user base. I will also pass your question on to the member of our team most familiar with these types of issues so that he can respond to your Forum post if he has any insight.

kmfoley closed this as completed Apr 3, 2020
@dwongepa
Contributor

dwongepa commented Apr 3, 2020

Hi kihangyoun,

Could you please contact me directly at wong.david-c@epa.gov? I would like to ask you a few more questions to determine the cause of the problem.

Cheers,
David
