This is alarming...Engine and mdserver linked with DB plugin libs! #19565
If I disable mfem, I get the same problem with conduit. The engine includes and links with conduit so any dependencies for conduit get linked into the engine. |
@iulian787 and @vijaysm if I disable conduit, mfem and fsm in my build, I can get the engine to run and use MOAB in parallel correctly... |
That is excellent news! Thanks for checking this thoroughly, @markcmiller86. When you say the MOAB parallel engine is working correctly, I assume this means there are none of the HDF5 property table errors that we were seeing before? I am still puzzled as to why the Ubuntu 22 version of |
Yes, correct.
I am too, but I believe Linux is more lenient about shared lib dependencies, and so I think the engine is still loading the serial hdf5 library. An strace would confirm that. |
Was chatting with @brugger1 about this. One idea he has...build But, it would mean we continue to link the engine (and mdserver) against If VisIt was not on such an ancient version of So, we would need to upgrade That said, I still think it would be best to get |
@markcmiller86 conduit and mfem were added as dependencies when avt/Blueprint and avt/MFEM were added. |
Yes, that is right. So, it may mean we have to build those libs two different ways...one way for VisIt components and another way for database plugins...only the latter of which can depend on things like hdf5, netcdf, etc. |
@markcmiller86 Both Conduit and MFEM are used outside of the database plugins. While Conduit's I/O that uses HDF5 does not need to link HDF5, I think that MFEM actually links Conduit relay, which does link HDF5. In general unless we have a fully name mangled serial hdf5 and mpi hdf5, I think we have an issue regardless if it's DB only vs in the engine. Yes we can disable plugins, but that approach won't work for an install that can be widely used |
Sure...but I guess my question is...what do conduit and MFEM need to do in the way of I/O with HDF5 in the engine and mdserver? I don't think the engine or mdserver need to do any I/O with or without HDF5 and so the question remains...why do business this way? It can't be for the convenience of a 3rd party lib dependency that isn't designed to build without HDF5? As an aside, as things are designed now, no one building a custom HDF5 plugin would be able to build against an hdf5 version other than 1.8.XX (1.8.14 is over 10 years old now). So, someone wishing to use newer features in HDF5 would simply not be able to use a newer HDF5. That particular issue is solved, of course, by updating to newer HDF5 in VisIt. But, that only minimizes that particular issue. It doesn't fix it. |
The answer is simple: MFEM does not have multiple libraries that partition features based on dependencies. Those features are either on or off for a build of mfem. Conduit has multiple libraries - (relay is the one with all the i/o deps), but if MFEM is using Conduit, it will also link those I/O libs. |
We are going to explore whether building an MPI-enabled HDF5 will work for all cases (engine_ser and engine_par). |
Ok, I tried a simple test with my build of VisIt 3.4.1. on macOS where I have disabled MFEM and conduit. But, I do have things like Silo (which is using HDF5 in serial) and MOAB (using HDF5 in serial in a serial engine and in parallel in a parallel engine).
So, this works. And, that is because we DO NOT load plugin shared libraries using Below, I use macOS
|
@iulian787 and @vijaysm one option we're considering here is to do away with serial/parallel builds of HDF5. We would build only parallel HDF5 and everything in VisIt that depended on HDF5 would be linked to that one, single parallel HDF5. The "serial" tools would have to be linked with What would you think of this? |
@cyrush if we can do that with HDF5, why can't we do that with all of VisIt and get away from building all of VisIt with |
Excellent! This was my suggestion a long time back. I do this all the time in my workflows, using MPI wrappers to build every library in my systems, whether it is serial code or MPI-aware. Then we would just build MOAB+HDF5 without worrying about serial builds, with a guarantee that only MPI-aware HDF5 will ever be loaded by VisIt. It would also simplify builds in general and reduce distribution size :-) |
Hi Mark, This is convenient for many filters that share logic and then add extra communication for the MPI case. MPI support is controlled by compiler defines, which yield the serial and parallel libs. It's possible to do, but would be a major change. |
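For readers unfamiliar with the pattern described here, the following is a minimal sketch of a routine that shares its serial logic and adds communication only when a compiler define is set. The macro name `PARALLEL` and the function `GlobalSum` are assumptions made for illustration, not code taken from VisIt.

```cpp
#include <numeric>
#include <vector>

#ifdef PARALLEL            // assumed macro name; set only for the parallel build
#include <mpi.h>
#endif

// Hypothetical helper: sums the values held locally and, in a parallel
// build, combines the per-rank sums across all ranks.
double GlobalSum(const std::vector<double> &local)
{
    // Shared logic: identical in the serial and parallel builds.
    double sum = std::accumulate(local.begin(), local.end(), 0.0);

#ifdef PARALLEL
    // Extra communication compiled in only for the MPI case.
    double total = 0.0;
    MPI_Allreduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    sum = total;
#endif

    return sum;
}
```

Compiling the same source once without and once with `-DPARALLEL` (and an MPI compiler wrapper) yields the two library variants; the serial variant contains no MPI symbols at all, which is why switching to a single build would be a larger change than adjusting link lines.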
@cyrush...right...we simply adjust those |
Summarizing... Ok, so inputs from @cyrush, @qkoziol, and @vijaysm all suggest the right way to proceed is to do away with building dependencies in different ways (e.g. with and without MPI) and just know that running a serial VisIt will never reference any MPI enabled code blocks in VisIt itself or any dependencies. It means a serial VisIt engine is still linked with Above, @cyrush mentioned another issue I hadn't really appreciated before digging into this in detail. In VisIt, we have a lot of code blocks of the form...
This really does mean you have to compile two different ways to get two different behaviors. To do the same thing in VisIt proper with various We're kinda forced into this situation because libraries we want to use in VisIt proper such as MFEM have an indirect dependency on HDF5. So, the engine is going to wind up getting linked with an HDF5 regardless. But, because VisIt's HDF5 is ancient (1.8.14), we really must upgrade that asap to latest HDF5 on Here is the work to complete for this ticket then...
|
I suggest moving all the way up to HDF5 1.14.4 🙂 |
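@cyrush, @qkoziol, and @vijaysm, one issue I am encountering is all the libraries VisIt builds which depend on HDF5 (conduit, mfem, cgns, netcdf, xdmf, silo, h5part). They themselves may not use MPI. But, when I compile them against a parallel-enabled HDF5, I get failures for `#include <mpi.h>` from `#include <hdf5.h>`. Maybe this means that now, those libs will require `-I<path-to-mpi-headers>` when they are being compiled AND using hdf5. I guess they will also need a `-L<path-to-mpi-library> -lmpi` too anyways in order to link a dynamic library. @cyrush ... in particular, will this new approach defeat our ability to ever build VisIt static-only? |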
I think this should get naturally resolved when you pass the MPI wrappers
as the default compiler. For example through `export CXX=mpicxx`. We follow
this workflow when we automatically download and build TPLs for MOAB in
many systems.
…On Wed, Aug 7, 2024, 02:29 Mark C. Miller ***@***.***> wrote:
@cyrush <https://github.com/cyrush>, @qkoziol <https://github.com/qkoziol>,
and @vijaysm <https://github.com/vijaysm>, one issue I am encountering is
all the libraries VisIt builds which depend on HDF5 (conduit, mfem, cgns,
netcdf, xdmf, silo, h5part). They themselves may not use MPI. But, when I
compile them against a parallel-enabled HDF5, I get failures for #include
<mpi.h> from #include <hdf5.h>. Maybe this means that now, those libs
will require -I<path-to-mpi-headers> when they are being compiled AND
using hdf5. I guess they will also need a -L<path-to-mpi-library> -lmpi
too anyways in order to link a dynamic library.
@cyrush <https://github.com/cyrush> ... in particular, will this new
approach defeat our ability to ever build VisIt static-only?
|
@vijaysm 's suggestion will address the immediate concern, but I think we can also do better than that. @markcmiller86 @cyrush - When building a serial application (or library) against a parallel-enabled build of HDF5, would a compile-time macro (i.e. a Thoughts? If that sounds like it would be a solution (it should be easy to hack the |
@vijaysm and @qkoziol thanks for your inputs on this 💪. @qkoziol I like some of the ideas you are floating here. But, MPI stub functions scare me just a tad due to the possibility of confusion/collision with the real MPI. I was burned by this kind of thing years ago in another project and never forgot it...maybe because of the number of days I spent failing to understand what was happening. How realistic is it for the HDF5 library to be structured such that a single installation of HDF5 that supports both serial and parallel involves linking to one lib for serial applications and two libs for parallel applications? Likewise, maybe there is one header for serial applications and two header files to be included for parallel applications? Your But, the library linking issue is still a concern and while I see why HDF5 developers might wanna go the MPI stub route, I just wanna ask about the "refactor MPI stuff to a separate lib" route. |
@qkoziol I think for HDF5 at least, we haven't had an issue where we were required to use MPI includes due to HDF5 headers. @markcmiller86 I think conduit should be ok, can you share if my assumption is wrong? It does look like Usually, the problem is a linking issue -- not a compile header interface issue. |
@cyrush I can answer some of the details of what is happening with Most of the libs that depend on only serial HDF5 are compiled using Only those libs with parallel variants (e.g. conduit, hdf5, adios) are compiled with So, when we're using a parallel-enabled HDF5, we build Silo with gcc, not mpicc and the |
I would much prefer to live in a world where I am not required to specify a path to a header file (or for that matter a library file) my package does not itself use 🤣. @cyrush another detail here that may be relevant, I think a lot of the packages in play here use Autotools, not CMake. As part of the update to HDF5-1.14.4, I have adjusted |
thanks @markcmiller86 , I think I have tried this case outside of build_visit (w/ spack). I can try a simpler case and see if there is an issue. I just don't recall getting tripped up. You can't avoid the link dependency, but with the right setup (defines and generated headers) I am confident that libraries can avoid the implicit header problem, that is to avoid it being an issue in downstream consumers. |
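As an illustration of the "defines and generated headers" idea, here is a rough sketch of a public header that exposes MPI types only to consumers that opt in. Every name in it (`MYLIB_USE_MPI`, `MyLibFile`, `mylib_open`, `mylib_open_parallel`) is hypothetical; it does not reflect how HDF5 or Conduit actually lay out their headers.

```cpp
#include <string>

// Hypothetical opt-in macro: only consumers that want the parallel API
// define it, so serial consumers never need the MPI include path.
#ifdef MYLIB_USE_MPI
#include <mpi.h>
#endif

// Hypothetical file handle; carries no MPI types in its public layout.
struct MyLibFile
{
    std::string name;
    bool parallel;
};

// Serial entry point: always declared, requires no MPI headers.
inline MyLibFile mylib_open(const std::string &name)
{
    return {name, false};
}

#ifdef MYLIB_USE_MPI
// Parallel entry point: visible only when the consumer opted in.
inline MyLibFile mylib_open_parallel(const std::string &name, MPI_Comm /*comm*/)
{
    return {name, true};
}
#endif
```

A serial consumer compiled with plain `gcc` or `g++` sees only `mylib_open`. As noted above, this addresses the implicit header problem only; the link-time dependency on the MPI library remains.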
Go to the build dir for the engine and do a
make clean; make VERBOSE=1 >& junk.out
and then grep `junk.out` for `hdf5`. You will get hits. But, you should NOT get hits for hdf5. hdf5 is used only in a database plugin lib. If I look at a link of the `libengine_ser.dylib`, I get all the items listed below.
We are also getting hits for conduit, conduit_relay and blueprint. See below. Again, those are only used in DB plugins and should not be being linked into the engine.
I believe the reason this is happening is the use of MFEM in the engine. That is fine. But, we cannot use an MFEM in the engine that depends on I/O libs needed only in the plugins. We need to build MFEM differently for use in the engine.
Tagging @iulian787 and @vijaysm because this is impacting the MOAB plugin, which uses HDF5 in either serial or parallel, and the fact that the engine is loading the serial hdf5 prevents the parallel MOAB plugin from operating correctly.