I have a server with a single network card that has been virtualized into more than 200 network interfaces. This causes significant delays when using mpirun: the process hangs for a long time. With UCX debug logging enabled, I found that the delays occur mainly around the bridged network interfaces.
Is there a solution for this issue? Any recommendations on how to optimize or configure the network interfaces to improve the performance of mpirun? Thank you!
UCX log:
[1742871925.548610] [pod-hpc-02:1702645:0] tcp_iface.c:945 UCX DEBUG filtered out bridge device virbr0
[1742872077.918760] [pod-hpc-02:1702645:0] tcp_iface.c:945 UCX DEBUG filtered out bridge device wlan
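The leading bracketed field in each UCX debug line is a timestamp in seconds, so the size of the stall can be read directly from the log: the two lines above are roughly 152 seconds apart. Below is a minimal sketch for computing such gaps over a longer log. The helper name `timestamp_gaps` and the regular expression are illustrative assumptions, not part of UCX, and the sketch assumes the lines are in the order they were logged.

```python
import re

# Matches "[<seconds>.<micros>] [<host:pid:tid>] <message>" as seen in the log above.
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+\[[^\]]+\]\s+(.*)$")

def timestamp_gaps(lines):
    """Return (gap_seconds, message) for every parsed line after the first."""
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a UCX log line
        ts = float(m.group(1))
        if prev is not None:
            gaps.append((ts - prev, m.group(2)))
        prev = ts
    return gaps

log = [
    "[1742871925.548610] [pod-hpc-02:1702645:0] tcp_iface.c:945 UCX DEBUG filtered out bridge device virbr0",
    "[1742872077.918760] [pod-hpc-02:1702645:0] tcp_iface.c:945 UCX DEBUG filtered out bridge device wlan",
]

# Print the largest gaps first to see where the time went.
for gap, msg in sorted(timestamp_gaps(log), reverse=True):
    print(f"{gap:10.3f}s before: {msg}")
```

Sorting by gap makes the slowest step stand out even when the full debug log has thousands of lines.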
I think I see the problem: uct_tcp_query_devices scans through all the interfaces, builds a list of active, non-bridged interfaces, and only then trims it to the devices the user requested. On a system with hundreds of virtual interfaces this is a very costly process, as it involves many syscalls for each interface.
This is not something we can fix in OMPI; it should be reported and addressed directly in UCX. @janjust and @yosefe should be able to help.
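To illustrate why the order of operations described above matters, here is a minimal Python sketch of a scan-then-trim device query. It is not the UCX implementation: the real uct_tcp_query_devices is C code inside UCX. The function names `scan_devices` and `make_fake_sysfs`, the use of the sysfs `operstate` file as the activity check, and the `bridge` subdirectory as the bridge test are all assumptions made for illustration. The point is only that every interface is probed before the requested-device filter is applied, so the cost grows with the total number of interfaces rather than with the number requested.

```python
import os
import tempfile

def scan_devices(sysfs_net, requested=None):
    """Return (selected, probes): usable device names and per-interface probe count."""
    candidates = []
    probes = 0
    for name in sorted(os.listdir(sysfs_net)):
        dev_dir = os.path.join(sysfs_net, name)
        # Probe 1 (assumed stand-in): is the interface up?
        probes += 1
        try:
            with open(os.path.join(dev_dir, "operstate")) as f:
                active = f.read().strip() == "up"
        except OSError:
            active = False
        if not active:
            continue
        # Probe 2 (assumed stand-in): bridges expose a "bridge" subdirectory.
        probes += 1
        if os.path.isdir(os.path.join(dev_dir, "bridge")):
            continue
        candidates.append(name)
    # Trimming to the requested devices happens only after the full scan.
    if requested is not None:
        candidates = [n for n in candidates if n in requested]
    return candidates, probes

def make_fake_sysfs(root, devices):
    """Create a throwaway sysfs-like tree; devices maps name -> (operstate, is_bridge)."""
    for name, (state, is_bridge) in devices.items():
        dev_dir = os.path.join(root, name)
        os.makedirs(dev_dir)
        with open(os.path.join(dev_dir, "operstate"), "w") as f:
            f.write(state + "\n")
        if is_bridge:
            os.mkdir(os.path.join(dev_dir, "bridge"))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # 200 hypothetical virtual interfaces plus one real one.
        devices = {f"veth{i}": ("up", False) for i in range(200)}
        devices["eth0"] = ("up", False)
        devices["virbr0"] = ("up", True)
        make_fake_sysfs(root, devices)
        selected, probes = scan_devices(root, requested={"eth0"})
        print(selected, probes)
```

In this sketch, requesting a single device does not reduce the number of probes. Applying the requested-device filter before probing would be one way to avoid that cost, but whether that is appropriate for UCX is a question for its maintainers.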
Background information
What version of Open MPI are you using?
v4.1.7rc1
Describe how Open MPI was installed
installed by MLNX_OFED
Please describe the system on which you are running