Address failures from merge of updated TriBITS (#10614, TriBITSPub/TriBITS#299) #10774

Closed
14 tasks done
bartlettroscoe opened this issue Jul 19, 2022 · 31 comments
Labels
ATDM Config: Issues that are specific to the ATDM configuration settings
type: bug: The primary issue is a bug in Trilinos code or tests

Comments

@bartlettroscoe
Member

bartlettroscoe commented Jul 19, 2022

Description

With yesterday's merge of the updated TriBITS in PR #10614, we are seeing many ATDM Trilinos configurations with new build failures on ATDM Trilinos testing day 2022-07-19 (after the merge) compared to the previous ATDM Trilinos testing day 2022-07-18 (before the merge).

In particular, we see new build failures for 20 builds:

  • Trilinos-atdm-ats1-hsw_intel-19.0.4_mpich-7.7.15_openmp_static_dbg
  • Trilinos-atdm-ats1-hsw_intel-19.0.4_mpich-7.7.15_openmp_static_opt
  • Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_dbg
  • Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt
  • Trilinos-atdm-ats2-gnu-7.3.1-spmpi-rolling_serial_static_dbg
  • Trilinos-atdm-ats2-gnu-7.3.1-spmpi-rolling_serial_static_opt
  • Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_dbg
  • Trilinos-atdm-cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_opt
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_intelmpi-2018.4_serial_static_opt
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_mpich2-3.2_openmp_static_opt
  • Trilinos-atdm-cee-rhel7_mini-no-mpi_intel-19.0.3_static_opt
  • Trilinos-atdm-cee-rhel7_mini_intel-19.0.3_mpich2-3.2_static_opt
  • Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg
  • Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt
  • Trilinos-atdm-cts1empire-intel-18.0.2_openmpi-4.0.1_openmp_static_dbg
  • Trilinos-atdm-cts1empire-intel-18.0.2_openmpi-4.0.1_openmp_static_opt
  • Trilinos-atdm-sems-rhel7-cuda-10.1-Volta70-complex-shared-release-debug

All of the build errors I have looked at so far in the above list are link errors due to missing symbols. My guess is that these are due to missing TPL dependencies.

I will have to do a reference build for each of these, then a build with the updated TriBITS (carefully merged from the branch in PR #10614), and compare the link lines of the two.

NOTE: The build errors for the builds:

  • Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_dbg
  • Trilinos-atdm-van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt

were due to a failed compiler license check like:

clang-9: error: Failed to check out a license. See below for more details.
clang-9: note: If you need further help, provide this complete error report to your supplier or support-hpc-sw@arm.com.
 - Product information location: /opt/arm/arm-linux-compiler-20.1_Generic-AArch64_RHEL-7_aarch64-linux/sw-mappings
 - Toolchain location: /opt/arm/arm-linux-compiler-20.1_Generic-AArch64_RHEL-7_aarch64-linux/llvm-bin
clang-9: note:
 - Checkout feature: compiler
 - Feature version: 15.20200409
 - ALMS error code: -114
 - ALMS error message: Timed out while contacting server

NOTE: the build errors for the builds:

  • Trilinos-atdm-sems-rhel7-clang-7.0.1-openmp-shared-release-debug
  • Trilinos-atdm-sems-rhel7-clang-7.0.1-openmp-shared-release

were due to unrelated problems with the Krino package (see #10524).

NOTE: The build errors for the build:

  • Trilinos-atdm-tlcc2-intel-opt-openmp

are source compile errors that seem to have nothing to do with the TriBITS changes (see below).

Tasks

NOTE: Above, a task is marked as complete if the PR that fixes it is ready to merge and is just being held up by a final approval or PR testing.

Deferred Scope

  • Look into link errors in SEACAS due to setting HDF5_ALLOW_PACKAGE_PREFIND:BOOL=TRUE with current FindTPLHDF5.cmake module (see below) ... Going to wait to upgrade the min version of CMake to 3.22 to use modern targets from find_package(HDF5)
@bartlettroscoe bartlettroscoe self-assigned this Jul 19, 2022
@bartlettroscoe bartlettroscoe added type: bug The primary issue is a bug in Trilinos code or tests ATDM Config Issues that are specific to the ATDM configuration settings labels Jul 19, 2022
@ikalash
Contributor

ikalash commented Jul 19, 2022

Just to document, I saw issues when building CISM-Albany (a land-ice modeling code) on top of Trilinos and Albany. The nature of the problem was that Trilinos_LIBRARY_DIRS, previously populated with the right path, has become blank. I fixed it by setting this in the CMakeLists.txt file of CISM-Albany (see attachment). I think this fix is OK for our needs, but please let me know if there is a better one.

Error_Fix_CALI.pdf

@bartlettroscoe
Member Author

but please let me know if there is a better one.

@ikalash, what is missing is linking to actual targets. I bet if you use CMake 3.23+ and set:

-D CMAKE_LINK_LIBRARIES_ONLY_TARGETS=ON

then the configure will fail and show you where it is trying to link against raw string names <libname> instead of a defined target. The way to fix that is to link against actual targets like <Trilinos>::all_selected_libs.
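As a rough illustration of linking against targets instead of raw library names, a downstream CMakeLists.txt might look something like the sketch below. The project name `DownstreamApp`, the executable `app`, and the source file `main.cpp` are placeholders for illustration only and are not taken from CISM-Albany; `Trilinos::all_selected_libs` is the concrete form of `<Trilinos>::all_selected_libs` for the Trilinos project.

```cmake
cmake_minimum_required(VERSION 3.23)
project(DownstreamApp CXX)   # hypothetical downstream project name

# Available in CMake 3.23+: report an error when a link item that could be a
# target name is not actually a defined target (same effect as passing
# -D CMAKE_LINK_LIBRARIES_ONLY_TARGETS=ON on the configure line).
set(CMAKE_LINK_LIBRARIES_ONLY_TARGETS ON)

find_package(Trilinos REQUIRED)

add_executable(app main.cpp)  # placeholder executable and source file

# Link against the exported target rather than ${Trilinos_LIBRARIES} plus
# ${Trilinos_LIBRARY_DIRS}; the target carries its own link dependencies.
target_link_libraries(app PRIVATE Trilinos::all_selected_libs)
```

With this form there is no need to set Trilinos_LIBRARY_DIRS by hand, since the imported targets are expected to carry the full library paths themselves.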

See:

@bartlettroscoe
Member Author

bartlettroscoe commented Jul 20, 2022

There is a bug in the installed TriBITS-generated <tplName>Config.cmake files for TPLs with upstream TPL dependencies, and it is breaking the SPARC Trilinos Integration builds. The problem is that these files use <tplName>_DIR from the build tree instead of from the install tree. I will create a TriBITS GitHub issue for this and discuss how to address it (see TriBITSPub/TriBITS#500).

NOTE: My local testing of SPARC + Trilinos would not have caught this because the Trilinos build dir does not get removed before configuring SPARC. So this was only going to be caught after the merge to 'develop'.

Details for find_package(Trilinos) errors due to pointing into the build dir

Note that we are also seeing SPARC Trilinos integration failures starting on SPARC testing day 2022-07-19. These are configure-time errors inside of find_package(Trilinos) that look like:

REMARK: Trilinos directory
'/projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_serial_static_opt'
does not exist; using legacy-style directory:
'/projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt'
-- Searching for Trilinos in Trilinos_ROOT: /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt
-- Enabled Kokkos devices: SERIAL
CMake Error at /projects/sparc/tools/cmake/3.23.2/x86_64/share/cmake-3.23/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
  Could not find a package configuration file provided by "METIS" with any of
  the following names:

    METISConfig.cmake
    metis-config.cmake

  Add the installation prefix of "METIS" to CMAKE_PREFIX_PATH or set
  "METIS_DIR" to a directory containing one of the above files.  If "METIS"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/lib/external_packages/ParMETIS/ParMETISConfig.cmake:27 (find_dependency)
  /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/lib/cmake/Zoltan/ZoltanConfig.cmake:151 (include)
  /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/lib/cmake/SEACASIoss/SEACASIossConfig.cmake:158 (include)
  /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/lib/cmake/SEACAS/SEACASConfig.cmake:188 (include)
  /projects/atdm_devops/trilinos_installs/2022-07-19/cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/lib/cmake/Trilinos/TrilinosConfig.cmake:114 (include)
  CMakeLists.txt:246 (find_package)
  CMakeLists.txt:323 (find_package_Trilinos)

The code that is failing in ParMETISConfig.cmake is:

set(METIS_DIR "/scratch/trilinos/atdm-trilinos-nightly-builds/Trilinos-atdm-cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/SRC_AND_BUILD/BUILD/external_packages/METIS")
find_dependency(METIS REQUIRED CONFIG ${ParMETIS_SearchNoOtherPathsArgs})
unset(METIS_DIR)

Hmm, that is the wrong directory location:

$ ls /scratch/trilinos/atdm-trilinos-nightly-builds/Trilinos-atdm-cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/SRC_AND_BUILD/BUILD/external_packages/METIS
ls: cannot access /scratch/trilinos/atdm-trilinos-nightly-builds/Trilinos-atdm-cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt/SRC_AND_BUILD/BUILD/external_packages/METIS: No such file or directory

Oops, that is a bug! How did that slip through TriBITS testing?
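For comparison, a relocatable version of that snippet could locate METIS relative to the installed ParMETISConfig.cmake file itself. This is only a sketch of the general technique, not necessarily what the actual TriBITS fix does: it assumes METISConfig.cmake is installed in a sibling directory (lib/external_packages/METIS next to lib/external_packages/ParMETIS, as the call stack above suggests), and it leaves out the ${ParMETIS_SearchNoOtherPathsArgs} argument for brevity.

```cmake
# Sketch of a relocatable dependency lookup inside an installed
# <tplName>Config.cmake file (here ParMETISConfig.cmake).
include(CMakeFindDependencyMacro)

# CMAKE_CURRENT_LIST_DIR is the directory containing this config file, so the
# path below resolves inside whatever tree (build or install) the file lives in.
# Assumption: METISConfig.cmake sits in the sibling directory ../METIS.
set(METIS_DIR "${CMAKE_CURRENT_LIST_DIR}/../METIS")
find_dependency(METIS REQUIRED CONFIG)
unset(METIS_DIR)
```

Because nothing in this version hard-codes an absolute build-tree path, removing the Trilinos build directory after installation should not affect it.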

@bartlettroscoe
Member Author

The top builds to prioritize fixing are the ones used in SPARC Trilinos Integration testing listed here and are:

  • ats1-hsw_intel-19.0.4_mpich-7.7.15_openmp_static_opt
  • ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt
  • ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt
  • cee-rhel7_clang-9.0.1_openmpi-4.0.3_serial_static_opt
  • cee-rhel7_gnu-7.2.0_openmpi-4.0.3_serial_shared_opt
  • cee-rhel7_intel-19.0.3_intelmpi-2018.4_serial_static_opt
  • cee-rhel7_intel-19.0.3_mpich2-3.2_openmp_static_opt
  • cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt
  • van1-tx2_arm-20.1_openmpi-4.0.5_openmp_static_opt

@bartlettroscoe
Member Author

FYI: I created the branch 10774-pre-tribits-update-ref off of the commit bdcac66, which corresponds to the version of Trilinos 'develop' used for the 'atdm-nightly' branch on testing day 2022-07-18, just before the merge of PR #10614. Details are below.

Details for creating the branch '10774-pre-tribits-update-ref' for ATDM Trilinos testing day 2022-07-18:

The version of Trilinos used by the ATDM Trilinos builds for testing day 2022-07-18, just before PR #10614 was synced, was f15068c. That comes from:

$ git log-short --name-status --graph f15068c6c536560f3f30e62d5dd84c025edeb601

*   f15068c6c53 "Merge remote-tracking branch 'origin/develop' into atdm-nightly"
|\  Author: trilinos-autotester <trilinos@sandia.gov>
| | Date:   Sun Jul 17 21:05:07 2022 -0600 (2 days ago)
| |   
| *   bdcac66d1be "Merge Pull Request #10736 from iyamazaki/Trilinos/FastILU-metis"
| |\  Author: trilinos-autotester <trilinos@sandia.gov>
| | | Date:   Sun Jul 17 07:19:08 2022 -0600 (2 days ago)
| | | 
| | * d087ab4891f "FastILU : sort only not Metis, and sorted Tpetra CrsMatrix"
| | | Author: iyamazaki <ic.yamazaki@gmail.com>
| | | Date:   Wed Jul 13 21:59:22 2022 -0600 (6 days ago)
| | | 
| | | M packages/ifpack2/src/Ifpack2_Details_Filu_def.hpp

So the Trilinos version of 'develop' for the ATDM Trilinos testing day 2022-07-18 is bdcac66.

So I created the Trilinos branch '10774-pre-tribits-update-ref' off of 'develop' at the commit bdcac66:

$ git checkout -b 10774-pre-tribits-update-ref bdcac66d1be
Switched to a new branch '10774-pre-tribits-update-ref'

$ git push -u rab-github HEAD
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote: 
remote: Create a pull request for '10774-pre-tribits-update-ref' on GitHub by visiting:
remote:      https://github.com/bartlettroscoe/Trilinos/pull/new/10774-pre-tribits-update-ref
remote: 
To github.com:bartlettroscoe/Trilinos
 * [new branch]              HEAD -> 10774-pre-tribits-update-ref
Branch '10774-pre-tribits-update-ref' set up to track remote branch '10774-pre-tribits-update-ref' from 'rab-github'.

@bartlettroscoe
Member Author

FYI: The fix for the bug with installed <tplName>Config.cmake files pointing into the build tree is in PR #10784. Just need someone to approve that PR and then PR builds need to pass. In the meantime, I am going to manually merge into the 'atdm-nightly' branch.

@bartlettroscoe
Member Author

I manually merged the PR branch for #10784 into the 'atdm-nightly-manual-updates' branch so this will get run in the ATDM Trilinos builds tomorrow and we should see (some of) the SPARC Trilinos Integration builds clear up the day after that.

Update of 'atdm-nightly-manual-updates' branch:

$ git checkout atdm-nightly-manual-updates 

$ git fetch pull

$ git merge --ff github/atdm-nightly

$ git merge 10774-fix-tpl-config-install-tree 
Merge made by the 'recursive' strategy.
 cmake/tribits/CHANGELOG.md                                                  | 8 ++++++++
 cmake/tribits/core/package_arch/TribitsExternalPackageWriteConfigFile.cmake | 2 +-
 cmake/tribits/core/package_arch/TribitsProcessTplsLists.cmake               | 2 +-
 cmake/tribits/ctest_driver/README                                           | 8 ++++----
 cmake/tribits/ctest_driver/TribitsGetCDashUrlsInsideCTestS.cmake            | 2 +-
 cmake/tribits/examples/MixedSharedStaticLibs/README                         | 2 +-
 6 files changed, 16 insertions(+), 8 deletions(-)

$ git log-short --name-status HEAD --not github/develop github/atdm-nightly

35735a8b7c9 "Merge branch '10774-fix-tpl-config-install-tree' into atdm-nightly-manual-updates (#10774)"
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date:   Wed Jul 20 13:29:52 2022 -0600 (45 seconds ago)

4038856c026 "Merge branch 'tribits_github_snapshot' into 10774-fix-tpl-config-install-tree (#10774)"
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date:   Wed Jul 20 13:12:54 2022 -0600 (18 minutes ago)

f7f6ed3495f "Automatic snapshot commit from tribits at 5d4696b2"
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date:   Wed Jul 20 13:10:37 2022 -0600 (20 minutes ago)

M       cmake/tribits/CHANGELOG.md
M       cmake/tribits/core/package_arch/TribitsExternalPackageWriteConfigFile.cmake
M       cmake/tribits/core/package_arch/TribitsProcessTplsLists.cmake
M       cmake/tribits/ctest_driver/README
M       cmake/tribits/ctest_driver/TribitsGetCDashUrlsInsideCTestS.cmake
M       cmake/tribits/examples/MixedSharedStaticLibs/README

$ git push
Enumerating objects: 20, done.
Counting objects: 100% (18/18), done.
Delta compression using up to 32 threads
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.03 KiB | 350.00 KiB/s, done.
Total 8 (delta 6), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (6/6), completed with 4 local objects.
To github.com:trilinos/Trilinos.git
   a0cabff009f..35735a8b7c9  atdm-nightly-manual-updates -> atdm-nightly-manual-updates

@bartlettroscoe
Member Author

bartlettroscoe commented Jul 21, 2022

Looking over the builds listed above for ATDM Trilinos testing day 2022-07-19, the errors fall into a few different categories:

1) Link errors for libcgns missing hdf5 symbols

These errors look like:

/projects/sparc/tpls/cee-rhel7/cgns-c09a5cd/a80e7a6e298e722aef76764ff37c4fc0aec5fabc/cee-cpu_intel-19.0.3_intelmpi-2018.4/lib/libcgns.a(ADFH.c.o): In function `get_str_att':
ADFH.c:(.text+0x259): undefined reference to `H5Aopen_name'
/projects/sparc/tpls/cee-rhel7/cgns-c09a5cd/a80e7a6e298e722aef76764ff37c4fc0aec5fabc/cee-cpu_intel-19.0.3_intelmpi-2018.4/lib/libcgns.a(ADFH.c.o): In function `delete_children':
ADFH.c:(.text+0x58a): undefined reference to `H5Aopen_name'
ADFH.c:(.text+0x8af): undefined reference to `H5Aopen_name'

This includes the builds:

  • Trilinos-atdm-ats2-gnu-7.3.1-spmpi-rolling_serial_static_dbg
  • Trilinos-atdm-ats2-gnu-7.3.1-spmpi-rolling_serial_static_opt
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_intelmpi-2018.4_serial_static_opt
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_mpich2-3.2_openmp_static_opt
  • Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_opt
  • Trilinos-atdm-cts1empire-intel-18.0.2_openmpi-4.0.1_openmp_static_opt
  • Trilinos-atdm-cts1-intel-19.0.4_openmpi-4.0.3_openmp_static_dbg
  • Trilinos-atdm-cts1empire-intel-18.0.2_openmpi-4.0.1_openmp_static_dbg
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_intelmpi-2018.4_serial_static_opt
  • Trilinos-atdm-cee-rhel7_intel-19.0.3_mpich2-3.2_openmp_static_opt
  • Trilinos-atdm-cee-rhel7_mini-no-mpi_intel-19.0.3_static_opt
  • Trilinos-atdm-cee-rhel7_mini_intel-19.0.3_mpich2-3.2_static_opt

I think these can all be fixed by adding FindTPLCGNSDependencies.cmake and adding a dependency of CGNS on HDF5.
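A minimal sketch of what such a FindTPLCGNSDependencies.cmake file might contain is shown below. It assumes the TriBITS function tribits_extpkg_define_dependencies() is the mechanism used to declare upstream TPL dependencies; the actual file added for this issue may differ in detail.

```cmake
# FindTPLCGNSDependencies.cmake (sketch)
#
# Declare that the external package (TPL) CGNS depends on the TPL HDF5, so
# that anything linking against CGNS also gets HDF5 on its link line after
# libcgns. This is what resolves undefined HDF5 symbols such as H5Aopen_name
# when libcgns.a is linked statically.
tribits_extpkg_define_dependencies( CGNS
  DEPENDENCIES HDF5
  )
```

The ordering matters for static builds in particular, because a static libcgns.a records no information about its own dependencies, so the build system has to supply them.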

ToDo: Characterize the other failures!

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jul 21, 2022
This should fix the build errors for the SEACAS executable
SEACASIoss_Utst_structured_decomp.exe on a bunch of ATDM Trilinos builds (see
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jul 21, 2022
bartlettroscoe added a commit to bartlettroscoe/TriBITS that referenced this issue Jul 21, 2022
…rilinos/Trilinos#10774)

The logic for constructing the full path for the
FindTPL<tplName>Dependencies.cmake file is not correct when <tplName>_FINDMOD
is an absolute path instead of a relative path.  This causes all of the
'TribitsExampleProject2_find_tpl_parts' tests to fail.
bartlettroscoe added a commit that referenced this issue Jul 21, 2022
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1219-g8b3872ed

At commit:

commit 4b26997a2b19c29cbc6deaba5ad303b2336b63e6
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Jul 21 10:35:22 2022 -0600
Summary: Add dependency of CGNS on HDF5 (#10774)
@bartlettroscoe
Member Author

bartlettroscoe commented Jul 21, 2022

Continuing from above, the next failure seen in many different builds is:

2) CUDA build failures due to duplicate CUDA library function definitions when creating libIntercept.so.13.5

These errors look like:

/projects/sierra/linux_rh7/SDK/compilers/nvidia/cuda_10.1.243/bin/..//lib64/libcudart_static.a(libcudart_static.a.o): In function `cudaMemcpy2DFromArray':
(.text+0x2c300): multiple definition of `cudaMemcpy2DFromArray'
CMakeFiles/Intercept.dir/Intercept.cpp.o:tmpxft_000073c0_00000000-5_Intercept.cudafe1.cpp:(.text+0x830): first defined here
/projects/sierra/linux_rh7/SDK/compilers/nvidia/cuda_10.1.243/bin/..//lib64/libcudart_static.a(libcudart_static.a.o): In function `cudaMemcpy3D':
(.text+0x2bc20): multiple definition of `cudaMemcpy3D'
CMakeFiles/Intercept.dir/Intercept.cpp.o:tmpxft_000073c0_00000000-5_Intercept.cudafe1.cpp:(.text+0xcd0): first defined here
/projects/sierra/linux_rh7/SDK/compilers/nvidia/cuda_10.1.243/bin/..//lib64/libcudart_static.a(libcudart_static.a.o): In function `cudaMemcpy2DArrayToArray':
(.text+0x2bdc0): multiple definition of `cudaMemcpy2DArrayToArray'
CMakeFiles/Intercept.dir/Intercept.cpp.o:tmpxft_000073c0_00000000-5_Intercept.cudafe1.cpp:(.text+0x700): first defined here
...
<and many more>
...

We see these in the CUDA builds:

ToDo: Characterize the rest of the failures!

jwillenbring added a commit that referenced this issue Jul 22, 2022
…stall-tree

Fix <tplName>Config.cmake files to not point into build dir (#10774)

We are force merging this PR because all of the PR tests passed except the new CUDA 11 build, which had 2 failing tests. There were 2 Tpetra tests that failed that have been randomly failing for other PRs and the failures are not related to these changes.
@bartlettroscoe
Member Author

Continuing from above, the next failure seen in many different builds is:

3) Link errors missing '__dlopen'

These errors look like:

/usr/lib/../x86_64-suse-linux/bin/ld: packages/kokkos/core/src/libkokkoscore.a(Kokkos_Profiling.cpp.o): in function `Kokkos::Tools::initialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
[CTest: warning matched] Kokkos_Profiling.cpp:(.text+0x2994): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/../x86_64-suse-linux/bin/ld: /usr/lib/../lib64/libdl.a(dlopen.o): in function `dlopen':
/home/abuild/rpmbuild/BUILD/glibc-2.26/dlfcn/dlopen.c:30: undefined reference to `__dlopen'
/usr/lib/../x86_64-suse-linux/bin/ld: /usr/lib/../lib64/libdl.a(dlsym.o): in function `dlsym':
/home/abuild/rpmbuild/BUILD/glibc-2.26/dlfcn/dlsym.c:29: undefined reference to `__dlsym'
/usr/lib/../x86_64-suse-linux/bin/ld: /usr/lib/../lib64/libdl.a(dlerror.o): in function `dlerror':
/home/abuild/rpmbuild/BUILD/glibc-2.26/dlfcn/dlerror.c:33: undefined reference to `__dlerror'

These errors start in Sacado for the 'ats1' opt builds on 'mutrino':

and these errors start in SEACAS for the 'ats1' dbg builds on 'mutrino'

That is all of the systematic errors that impact all of the builds.

@bartlettroscoe
Member Author

4) Compile errors in Kokkos coming from gtest.h

The last new set of build errors are just for the build:

and they look like:

In file included from /gpfs/jenkins/skybridge-slave/workspace/Trilinos-atdm-tlcc2-intel-opt-openmp/SRC_AND_BUILD/Trilinos/packages/kokkos/core/unit_test/category_files/TestOpenMP_Category.hpp(48),
                 from packages/kokkos/core/unit_test/openmp/TestOpenMP_AtomicOperations_unsignedlongint.cpp(1):
/gpfs/jenkins/skybridge-slave/workspace/Trilinos-atdm-tlcc2-intel-opt-openmp/SRC_AND_BUILD/Trilinos/packages/kokkos/tpls/gtest/gtest/gtest.h(4588): error: no instance of function template "testing::internal::ElemFromListImpl<testing::internal::IndexSequence<I...>>::Apply [with I=<0UL>]" matches the argument list
            argument types are: (bool (*)(), bool (*)())
        decltype(ElemFromListImpl<typename MakeIndexSequence<N>::type>::Apply(
                 ^
[CTest: warning matched] /gpfs/jenkins/skybridge-slave/workspace/Trilinos-atdm-tlcc2-intel-opt-openmp/SRC_AND_BUILD/Trilinos/packages/kokkos/tpls/gtest/gtest/gtest.h(4582): note: this candidate was rejected because arguments do not match
    static R Apply(Ignore<0 * I>..., R (*)(), ...);
             ^
          detected during:
            instantiation of class "testing::internal::ElemFromList<N, T...> [with N=1UL, T=<bool, bool>]" at line 4602
            instantiation of class "testing::internal::FlatTupleElemBase<testing::internal::FlatTuple<T...>, I> [with T=<bool, bool>, I=1UL]" at line 4615
            instantiation of class "testing::internal::FlatTupleBase<testing::internal::FlatTuple<T...>, testing::internal::IndexSequence<Idx...>> [with Idx=<0UL, 1UL>, T=<bool, bool>]" at line 4655
            instantiation of class "testing::internal::FlatTuple<T...> [with T=<bool, bool>]" at line 8741
            instantiation of class "testing::internal::ValueArray<Ts...> [with Ts=<bool, bool>]" at line 9059

compilation aborted for packages/kokkos/core/unit_test/openmp/TestOpenMP_AtomicOperations_unsignedlongint.cpp (code 2)

and they are all in Kokkos. I am very curious to see what is causing this.

But from this query, it seems these build errors started on testing day 2022-05-21, which was the first build since testing day 2022-05-16, as shown below:

image

So these errors are unrelated to the TriBITS update (which is good because I have no idea how the TriBITS updates could be causing this).

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Aug 24, 2022
…os#10774, TriBITSPub/TriBITS#516)

A TriBITS update is exporting package cache vars to the <Package>Config.cmake
file and you can't have a local var with the same name as a cache var with
different values.

In this case, it was just lucky that no downstream package was reading this
var (through the cache var) because they would have gotten the wrong value.
It seems that only code in CMakeLists.txt files under packages/pliris/ were
reading this var.
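The shadowing problem described in this commit message can be reproduced with a few lines of CMake. The sketch below uses a made-up project and variable name (ShadowDemo and Demo_ENABLE_DREAL) rather than the real Pliris code, and only illustrates how a local variable hides a cache variable of the same name.

```cmake
cmake_minimum_required(VERSION 3.17)
project(ShadowDemo NONE)   # hypothetical project, no languages needed

# A cache var, standing in for a package option that gets exported
set(Demo_ENABLE_DREAL ON CACHE BOOL "Hypothetical package option")

# A local (normal) var with the same name but a different value; in this
# scope it hides the cache var for plain ${...} lookups.
set(Demo_ENABLE_DREAL OFF)

# ${VAR} looks up the normal var first, so this reads the local value (OFF)
message(STATUS "Plain lookup: ${Demo_ENABLE_DREAL}")

# $CACHE{VAR} reads the cache entry directly, which is still ON
message(STATUS "Cache lookup: $CACHE{Demo_ENABLE_DREAL}")
```

Code in other directory scopes, or anything that reads the exported cache value, sees ON while this scope sees OFF, which is the kind of mismatch the commit removes.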
@bartlettroscoe
Member Author

FYI: The failure to tentatively enable the BinUtils TPL shown above for the build Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt was occurring before the merge of PR #10614 on 2022-07-18. If you go back to the build Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt on testing day 2022-07-17 you will see:

Processing enabled TPL: BinUtils (enabled explicitly, disable with -DTPL_ENABLE_BinUtils=OFF)
-- BinUtils_LIBRARY_NAMES='bfd;iberty'
-- TPL_BinUtils_LIBRARIES='/usr/lib64/libbfd.a;-lz;/usr/lib64/libiberty.a'
-- Must find at least one header in each of the header sets "link.h;bfd.h"
-- Searching for headers in BinUtils_INCLUDE_DIRS=''
-- Searching for a header file in the set "link.h":
--   Searching for header 'link.h' ...
--     Found header '/usr/include/link.h'
-- Searching for a header file in the set "bfd.h":
--   Searching for header 'bfd.h' ...
--     Found header '/usr/include/bfd.h'
-- Found TPL 'BinUtils' include dirs '/usr/include'
-- TPL_BinUtils_INCLUDE_DIRS='/usr/include'
-- Performing Test HAS_TPL_BINUNTILS_STACKTRACE
-- Performing Test HAS_TPL_BINUNTILS_STACKTRACE - Failed
-- Extended attempt to enable tentatively enabled TPL 'BinUtils' failed!  Setting TPL_ENABLE_BinUtils=OFF
-- TPL_ENABLE_BinUtils='OFF'

And looking at one of the recent builds today for Trilinos-atdm-ats1-knl_intel-19.0.4_mpich-7.7.15_openmp_static_opt you can see:

...

-- Tentatively enabling TPL 'DLlib'

...

Processing enabled TPL: DLlib (enabled explicitly, disable with -DTPL_ENABLE_DLlib=OFF)
-- Attempting to tentatively enable TPL 'DLlib' ...
-- DLlib_LIBRARY_NAMES='dl'
-- TPL_DLlib_LIBRARIES='-ldl'
-- Attempt to tentatively enable TPL 'DLlib' passed!

...

Therefore, the failure to tentatively enable the BinUtils TPL has nothing to do with the TriBITS upgrade merged in PR #10614.

@bartlettroscoe
Member Author

With the post of PR #10930, all of these issues should be addressed. Putting this issue In Review.

trilinos-autotester added a commit that referenced this issue Aug 25, 2022
Automatically Merged using Trilinos Pull Request AutoTester
PR Title: Final fixes for TriBITS upgrade to modern CMake targets (#10614, #10774) 
PR Author: bartlettroscoe
@bartlettroscoe
Member Author

With the merge of PR #10930, this is (finally) complete!

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 26, 2022
…s:develop' (8906842).

* trilinos-develop: (128 commits)
  Intrepid2: update TensorData.setFirstComponentExtentInDimension0 to modify extents_[0] (trilinos#10929)
  Tpetra: Adding configure option to disable Kokkos integration test
  Automatic snapshot commit from tribits at 142e5362
  Disable Pliris tests in ATS2 GenConfig builds (trilinos#10931)
  Force disable Pliris in ATS2 builds (trilinos#10931)
  Automatic snapshot commit from tribits at ab419429
  Change cmake_minimum_required() from 3.17.1 to 3.0 (TriBITSPub/TriBITS#522)
  Pliris: Remove local var hiding cache var Pliris_ENABLE_DREAL (trilinos#10774, TriBITSPub/TriBITS#516)
  Remove printing of vars that are now empty (TriBITSPub/TriBITS#299)
  Panzer: move periodic helper typedefs into namespace
  Revert incorrect fix in previous commit
  Fix typos in some docs
  fix scratch typos
  STK: Snapshot 08-22-22 12:44
  Phalanx: remove cuda compiler warnings and add test for new use case for vov
  changed a double to a scalar_type to compile for complex arith
  MueLu: Fix signed vs unsigned comparison in Aggregates_kokkos.cpp
  Amesos2 : trying to fix MKL header including issues
  MueLu: Add Aggregates_kokkos.ComputeNodesInAggregate
  Testing on Geminga: Do not disable Kokkos in Epetra build
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 27, 2022
…s:develop' (8906842).

* trilinos-develop: (130 commits)
  Intrepid2: update TensorData.setFirstComponentExtentInDimension0 to modify extents_[0] (trilinos#10929)
  Tpetra: Adding configure option to disable Kokkos integration test
  MueLu: Allow to print Kokkos config when default node type is used
  Automatic snapshot commit from tribits at 142e5362
  Disable Pliris tests in ATS2 GenConfig builds (trilinos#10931)
  Force disable Pliris in ATS2 builds (trilinos#10931)
  Automatic snapshot commit from tribits at ab419429
  Change cmake_minimum_required() from 3.17.1 to 3.0 (TriBITSPub/TriBITS#522)
  Pliris: Remove local var hiding cache var Pliris_ENABLE_DREAL (trilinos#10774, TriBITSPub/TriBITS#516)
  Remove printing of vars that are now empty (TriBITSPub/TriBITS#299)
  Panzer: move periodic helper typedefs into namespace
  Revert incorrect fix in previous commit
  Fix typos in some docs
  fix scratch typos
  STK: Snapshot 08-22-22 12:44
  Phalanx: remove cuda compiler warnings and add test for new use case for vov
  changed a double to a scalar_type to compile for complex arith
  MueLu: Fix signed vs unsigned comparison in Aggregates_kokkos.cpp
  Amesos2 : trying to fix MKL header including issues
  MueLu: Add Aggregates_kokkos.ComputeNodesInAggregate
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 28, 2022
…s:develop' (8906842).

* trilinos-develop: (130 commits)
  Intrepid2: update TensorData.setFirstComponentExtentInDimension0 to modify extents_[0] (trilinos#10929)
  Tpetra: Adding configure option to disable Kokkos integration test
  MueLu: Allow to print Kokkos config when default node type is used
  Automatic snapshot commit from tribits at 142e5362
  Disable Pliris tests in ATS2 GenConfig builds (trilinos#10931)
  Force disable Pliris in ATS2 builds (trilinos#10931)
  Automatic snapshot commit from tribits at ab419429
  Change cmake_minimum_required() from 3.17.1 to 3.0 (TriBITSPub/TriBITS#522)
  Pliris: Remove local var hiding cache var Pliris_ENABLE_DREAL (trilinos#10774, TriBITSPub/TriBITS#516)
  Remove printing of vars that are now empty (TriBITSPub/TriBITS#299)
  Panzer: move periodic helper typedefs into namespace
  Revert incorrect fix in previous commit
  Fix typos in some docs
  fix scratch typos
  STK: Snapshot 08-22-22 12:44
  Phalanx: remove cuda compiler warnings and add test for new use case for vov
  changed a double to a scalar_type to compile for complex arith
  MueLu: Fix signed vs unsigned comparison in Aggregates_kokkos.cpp
  Amesos2 : trying to fix MKL header including issues
  MueLu: Add Aggregates_kokkos.ComputeNodesInAggregate
  ...
jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Aug 29, 2022
…s:develop' (8906842).

* trilinos-develop: (130 commits)
  Intrepid2: update TensorData.setFirstComponentExtentInDimension0 to modify extents_[0] (trilinos#10929)
  Tpetra: Adding configure option to disable Kokkos integration test
  MueLu: Allow to print Kokkos config when default node type is used
  Automatic snapshot commit from tribits at 142e5362
  Disable Pliris tests in ATS2 GenConfig builds (trilinos#10931)
  Force disable Pliris in ATS2 builds (trilinos#10931)
  Automatic snapshot commit from tribits at ab419429
  Change cmake_minimum_required() from 3.17.1 to 3.0 (TriBITSPub/TriBITS#522)
  Pliris: Remove local var hiding cache var Pliris_ENABLE_DREAL (trilinos#10774, TriBITSPub/TriBITS#516)
  Remove printing of vars that are now empty (TriBITSPub/TriBITS#299)
  Panzer: move periodic helper typedefs into namespace
  Revert incorrect fix in previous commit
  Fix typos in some docs
  fix scratch typos
  STK: Snapshot 08-22-22 12:44
  Phalanx: remove cuda compiler warnings and add test for new use case for vov
  changed a double to a scalar_type to compile for complex arith
  MueLu: Fix signed vs unsigned comparison in Aggregates_kokkos.cpp
  Amesos2 : trying to fix MKL header including issues
  MueLu: Add Aggregates_kokkos.ComputeNodesInAggregate
  ...
@bartlettroscoe
Member Author

Shoot, reopening. It seems there was some delayed testing with Nalu-Wind, and a problem just got flagged in #10954. Reopening this story and adding #10954 to the list of tasks.

@bartlettroscoe bartlettroscoe reopened this Sep 6, 2022
cgcgcg pushed a commit to cgcgcg/Trilinos that referenced this issue Sep 12, 2022
…os#10774, TriBITSPub/TriBITS#516)

A TriBITS update is exporting package cache vars to the <Package>Config.cmake
file, and you can't have a local var with the same name as a cache var but a
different value.

In this case, it was just lucky that no downstream package was reading this
var (through the cache var) because it would have gotten the wrong value.
It seems that only code in CMakeLists.txt files under packages/pliris/ was
reading this var.
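
The shadowing described in this commit message can be reproduced with a small standalone CMakeLists.txt. This is only a sketch of the general CMake behavior, not the actual Pliris or TriBITS code: the project name `ShadowDemo` and the variable `Demo_ENABLE_FEATURE` are hypothetical, and it assumes CMake 3.13 or later for the `$CACHE{}` syntax.

```cmake
cmake_minimum_required(VERSION 3.17)
project(ShadowDemo NONE)

# Hypothetical package option stored in the CMake cache.
set(Demo_ENABLE_FEATURE ON CACHE BOOL "Hypothetical package option")

# A normal (local) variable with the same name hides the cache entry
# for every plain ${...} reference in this directory scope.
set(Demo_ENABLE_FEATURE OFF)

# Code in this scope sees the local value ...
message(STATUS "local value: ${Demo_ENABLE_FEATURE}")       # prints OFF

# ... while anything that reads the cache entry sees the other value.
message(STATUS "cache value: $CACHE{Demo_ENABLE_FEATURE}")  # prints ON
```

If the cache value is what gets exported for downstream consumers, as the commit message describes, then the package's own CMakeLists.txt files and its consumers would disagree about the setting. Removing the local variable, as this commit does for Pliris, leaves a single source of truth.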
@bartlettroscoe
Member Author

With PRs #11093 and #11099 posted, all of the known issues for the TriBITS upgrade should be resolved. Moving to "In Review".

@bartlettroscoe
Member Author

With the merges of PRs #11093 and #11099, the last of these issues should be resolved.

Closing again!
