Support Dask
Dask is a flexible library for parallel computing in Python, and it is gaining popularity in the Python ecosystem. Because libyt does in-situ analysis by running Python scripts, it is important to support Dask as well.
Current libyt structure
Each MPI rank initializes a Python interpreter, and they work together through mpi4py.
| MPI 0 ~ (N-1) |
| --- |
| Python |
| libyt Python Module |
| libyt C/C++ library |
| Simulation |
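For comparison, here is a minimal sketch of what the in-situ script on each rank can already do under this structure, assuming only plain mpi4py (no libyt-specific API is shown): every rank drives its own embedded interpreter, and the ranks coordinate through MPI collectives.

```python
# Minimal sketch (assumption: plain mpi4py only, no libyt-specific calls).
# Each MPI rank runs this inside its own embedded Python interpreter.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# e.g. every rank computes something from its local data,
# then rank 0 gathers the per-rank results.
local_result = {"rank": rank}
all_results = comm.gather(local_result, root=0)
if rank == 0:
    print(f"gathered results from {len(all_results)} ranks")
```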
How should dask be set up inside the embedded Python?
We can dedicate two additional ranks specifically to the scheduler and the client (not necessarily MPI 0 and 1), and use the rest of the MPI ranks as workers. The simulation also runs inside each worker rank. By following how dask-mpi's initialize() sets up the scheduler, client, and workers, it should be possible to wrap this inside libyt (see the sketch after the table below).
| MPI 0 | MPI 1 | MPI 2 | ... | MPI (N-1) |
| --- | --- | --- | --- | --- |
| Scheduler | Client | Worker | Worker | Worker |
| libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module | libyt Python Module |
| libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library | libyt C/C++ library |
| Empty | Empty | Simulation | Simulation | Simulation |
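Below is a minimal sketch of that bootstrapping, assuming the documented behavior of dask-mpi's initialize() (scheduler on one rank, client on another, workers on the rest). In libyt the worker ranks must also keep running the simulation, so the way initialize() blocks on worker ranks would have to be adapted rather than reused as-is.

```python
# Sketch only: this follows dask-mpi's documented pattern, not libyt's
# actual implementation. Run with e.g. `mpirun -n N python script.py`.
from dask_mpi import initialize
from dask.distributed import Client

# Rank 0 becomes the scheduler, rank 1 continues past this call as the
# client, and the remaining ranks block here and serve as Dask workers.
initialize(nthreads=1, nanny=False)

# Only the client rank reaches this point; it connects to the scheduler
# that initialize() started and can drive the in-situ analysis from here.
client = Client()
print(client)
```

dask-mpi pins the scheduler and client to MPI 0 and 1 by default; as noted above, libyt would not have to make the same choice.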
Solve the data exchange problem
Because we use Remote Memory Access (one-sided MPI) with settings that require every rank to participate in the procedure (#26), libyt suffers during the data exchange process between MPI ranks: every time yt reads data, all ranks have to wait for each other and synchronize.
However, if we move this data exchange process from the C/C++ side to the Python side, we can exchange data more flexibly and asynchronously using dask. By encoding what each MPI rank should get into a Dask graph, asking each worker to prepare its local grid data, and letting the workers exchange data among themselves, the whole process becomes much easier. (At least much easier than doing it in C/C++. 😅) A rough sketch of this idea follows.
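The sketch below assumes the scheduler/client/worker layout proposed above; the grid ids, the round-robin ownership map, and the libyt.get_local_grid() accessor are placeholders rather than libyt's actual API.

```python
# Hedged sketch: grid ids, the ownership map, and libyt.get_local_grid()
# are all placeholders, not libyt's real interface.
from dask.distributed import Client, wait

client = Client()  # reuse the client connected to the scheduler set up above


def load_local_grid(grid_id):
    # Runs on the worker that owns the grid; `get_local_grid` is a
    # hypothetical stand-in for whatever the libyt Python module exposes.
    import libyt
    return libyt.get_local_grid(grid_id)


# Which worker holds which grid would come from libyt's hierarchy info;
# here we simply spread some example grid ids round-robin over the workers.
workers = list(client.scheduler_info()["workers"])
grid_ids = [0, 1, 2, 3]
grid_to_worker = {gid: workers[gid % len(workers)] for gid in grid_ids}

# Encode the reads as a graph of tasks pinned to the owning workers;
# Dask then moves the results to whoever needs them, asynchronously.
futures = {
    gid: client.submit(load_local_grid, gid, workers=[addr])
    for gid, addr in grid_to_worker.items()
}
wait(list(futures.values()))
data = client.gather(futures)  # gather only the grids this rank asked for
```

The point of the sketch is the shape of the graph: each read becomes a task pinned to the worker that owns the data, and Dask's scheduler handles the asynchronous movement instead of a collective RMA epoch on the C/C++ side.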