Tpetra::Distributor: Inheriting from VerboseObject<Distributor> causes crash in Import ctor w/ various Intel compiler versions #63

Closed
mhoemmen opened this issue Dec 24, 2015 · 1 comment
@mhoemmen
@trilinos/tpetra @amklinv

With Intel 15.0.4 and Intel 16, using either Intel MPI or OpenMPI 1.8, the Tpetra::Import constructor crashes. Micah chased this down and found that the Tpetra::Distributor instance that Import creates inherits from Teuchos::VerboseObject<Tpetra::Distributor> (looks weird, I know, but this is how one is supposed to use VerboseObject), and that VerboseObject's constructor was getting into an infinite loop of repeated segfaults when calling initializeVerboseObjectBase().

Some versions of the Intel compiler have trouble with name resolution in certain cases. On a hunch, I made Distributor no longer inherit from VerboseObject, and that fixed the problem. Yay!

Thus, the fix is to remove Distributor's inheritance from VerboseObject. This was pretty easy to do but I want to make sure that Distributor's ParameterList still accepts a "VerboseObject" sublist (see e.g., line 430 of Tpetra_Distributor.cpp).
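
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the "class inherits from a base templated on itself" arrangement described above, and of the shape of the fix. It does not use Teuchos: `VerboseBase`, `ChannelWithBase`, and `Channel` are hypothetical stand-ins for `Teuchos::VerboseObject`, the old `Distributor`, and the new `Distributor`. It only illustrates the structure and makes no attempt to reproduce the Intel crash.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a base class that takes its own derived class
// as a template parameter. All names here are illustrative assumptions.
template <class Derived>
class VerboseBase {
public:
  VerboseBase() : label_(defaultLabel()) {}

  const std::string& verbosityLabel() const { return label_; }

  // The base can reach the derived type through its template parameter.
  const Derived& self() const { return static_cast<const Derived&>(*this); }

private:
  static std::string defaultLabel() { return "default"; }
  std::string label_;
};

// "Before": the class names itself as the base's template argument.
class ChannelWithBase : public VerboseBase<ChannelWithBase> {
public:
  int id() const { return 1; }
};

// "After": the same class with the inheritance simply removed.
class Channel {
public:
  int id() const { return 1; }
};
```

Both classes offer the same `id()` interface; only the first one drags in the templated base constructor, which is where the crash was observed.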

@mhoemmen mhoemmen self-assigned this Dec 24, 2015
mhoemmen pushed a commit that referenced this issue Dec 24, 2015
@trilinos/tpetra A user reported odd crashes with Intel 15.0.4 + OpenMPI
1.8 and Intel 16 + Intel MPI.  The crashes showed up as an infinite loop
of SIGSEGV in the constructor of Teuchos::VerboseObject, as invoked by
Tpetra::Distributor's three-argument constructor, as invoked by
Tpetra::Import's usual two-argument (two Maps) constructor.  It looked
like the initializeVerboseObjectBase method was calling itself recursively,
though it wasn't actually doing that in the code.

I knew that the Intel compiler had some issues with name resolution in
nested scopes involving templated classes, and guessed that this could be
related.  This is because VerboseObject has a template parameter which
is supposed to be its child class, in an odd but apparently standard C++
design pattern.  Distributor thus inherits from VerboseObject<Distributor>.
If I find this confusing, surely the compiler could too, right?

Distributor really didn't need to inherit from VerboseObject.  This was
something I had added a few years ago, to help me debug MPI communication
issues.  When I removed the inheritance, that fixed the issue.  Thus,
Distributor no longer inherits from Teuchos::VerboseObject.  For backwards
compatibility, it retains the "VerboseObject" sublist in its list of
valid parameters (for setParameterList and the constructors that take a
ParameterList), but it now ignores that sublist.

This fixes Issue #63.
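
As a rough illustration of the backwards-compatibility point in the commit message above (keep accepting a "VerboseObject" sublist, but ignore it), here is a small standalone sketch. It is not the Tpetra code and does not use `Teuchos::ParameterList`: the `ParamList` struct, the `Channel` class, and the "Send type" parameter name are assumptions made up for this example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified parameter list: flat string parameters plus
// named sublists of string parameters.
struct ParamList {
  std::map<std::string, std::string> params;
  std::map<std::string, std::map<std::string, std::string>> sublists;
};

// Hypothetical class that used to configure a verbose base class from the
// "VerboseObject" sublist and no longer does.
class Channel {
public:
  void setParameterList(const ParamList& plist) {
    for (const auto& kv : plist.params) {
      if (kv.first != "Send type") {
        throw std::invalid_argument("unknown parameter: " + kv.first);
      }
      sendType_ = kv.second;
    }
    for (const auto& kv : plist.sublists) {
      if (kv.first != "VerboseObject") {
        throw std::invalid_argument("unknown sublist: " + kv.first);
      }
      // Still a valid sublist name, so old input does not fail validation;
      // its contents are deliberately ignored.
    }
  }

  const std::string& sendType() const { return sendType_; }

private:
  std::string sendType_ = "Send";
};
```

The design choice is that validation stays as permissive as before, so callers with existing parameter lists see no new errors, while the sublist's contents no longer have any effect.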
@mhoemmen

GitHub treated my commit as merely referencing this issue, rather than actually closing it.

bartlettroscoe added a commit that referenced this issue May 26, 2021
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'

At commit:

commit 74f61da9a7a97742e941585166025e849c6dcfaf
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Wed May 26 16:56:22 2021 -0600
Summary: Add link back to function call graph (#63)
bartlettroscoe added a commit that referenced this issue Jul 15, 2022
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1202-g24463542

At commit:

commit e121d76729c0ebe67a6e09f9e7a6f2cb7c61b5ae
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Jul 14 19:46:24 2022 -0600
Summary: Add entry for <TPLNAME>_LIB_ENABLED_DEPENDENCIES to build ref (#63, #299, #494)
bartlettroscoe added a commit that referenced this issue Aug 11, 2022
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1241-gd807b172

At commit:

commit b00ab335494761e6ac48d50971444daa5a502927
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Aug 11 07:35:38 2022 -0600
Summary: Fix test dependencies for subpackages and subpackage tests/examples enables (#63, #268, #299, #510)
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Feb 9, 2023
When ParMETIS is found by an upstream CMake project (e.g. KokkosKernels), the
value of HAVE_PARMETIS_VERSION_4_0_3l does not get written into the
ParMETISConfig.cmake file, so Zoltan2 was configuring with an error.  But this
is a check for Zoltan2, not other Trilinos packages, so this check
should be in Zoltan2, not in FindTPLParMETIS.cmake.

NOTE: If we want to support exporting various variables into the generated
<tplName>Config.cmake files for TriBITS TPLs generated using
tribits_tpl_find_include_dirs_and_libraries(), then we will need to do some
refactoring in TriBITS to make that possible. But that should not be too hard.
This was just not needed in this case.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Mar 29, 2023
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1488-gd9a337d8

At commit:

commit 014a1538939b783602d2af6033b2175edfb51a96
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Wed Mar 29 10:41:41 2023 -0600
Summary: Remove unused legacy RELATIVE_PATH code (trilinos#63, trilinos#560)
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Apr 1, 2023
…s:develop' (28a7b37).

* trilinos-develop:
  Have Kokkos TriBITS build set compiler options as target properties (trilinos#11545)
  Update logic for TPL_ENABLE_Kokkos=ON (trilinos#11545)
  TrilinosInstallTests_find_package_Trilinos: Run in own subdir
  Move check for ParMETIS version for Zoltan2 to Zoltan2 (trilinos#63)
  Have Kokkos TriBITS build properly export options to package config files (trilinos#11545)
bartlettroscoe added a commit that referenced this issue May 13, 2023
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: vera-release-3.5-start-1789-gb669da78

At commit:

commit 9e6f4b999cfc38fcc40475491dc962ae514c08f2
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu May 11 13:36:23 2023 -0600
Summary: Fix some old misspellings caught by codespell tests (#63)
jwillenbring referenced this issue in jwillenbring/Trilinos Jun 12, 2023
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1488-gd9a337d8

At commit:

commit 014a1538939b783602d2af6033b2175edfb51a96
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Wed Mar 29 10:41:41 2023 -0600
Summary: Remove unused legacy RELATIVE_PATH code (#63, trilinos#560)
jwillenbring referenced this issue in jwillenbring/Trilinos Jun 12, 2023
When ParMETIS is found by an upstream CMake project (e.g. KokkosKernels), the
value of HAVE_PARMETIS_VERSION_4_0_3l does not get written into the
ParMETISConfig.cmake file, so Zoltan2 was configuring with an error.  But this
is a check for Zoltan2, not other Trilinos packages, so this check
should be in Zoltan2, not in FindTPLParMETIS.cmake.

NOTE: If we want to support exporting various variables into the generated
<tplName>Config.cmake files for TriBITS TPLs generated using
tribits_tpl_find_include_dirs_and_libraries(), then we will need to do some
refactoring in TriBITS to make that possible. But that should not be too hard.
This was just not needed in this case.
bartlettroscoe added a commit that referenced this issue Jun 27, 2023
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: vera-release-3.5-start-1802-g6d9eaef0

At commit:

commit 24e96fe74bfca6c42b8aedb96468ee4ed22a8062
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Tue Jun 27 08:14:20 2023 -0600
Summary: Print out <Package>_DIR and error out if not set (#63)