IMPLICITLY MERGED: Fix ATDM Trilinos builds broken by TriBITS update (#10774) #10791
The ATDM customers don't need Krino and it has been failing the build for all recorded time on CDash since the package was first added (see trilinos#10524).
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1219-g8b3872ed
At commit:
commit 4b26997a2b19c29cbc6deaba5ad303b2336b63e6
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Thu Jul 21 10:35:22 2022 -0600
Summary: Add dependency of CGNS on HDF5 (trilinos#10774)

Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1221-g0d1da434
At commit:
commit 6d15ef8ea26694f154f89efe3a609a2a4a7e7f30
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Thu Jul 21 20:55:22 2022 -0600
Summary: Fix FindTPLCUDA.cmake (trilinos#299)
FYI: With the merge commit bd772e6, we should see the CUDA link errors described above fixed in the ATDM Trilinos builds tomorrow. (I had to manually merge directly to the 'atdm-nightly' branch because the auto-update completed about 25 minutes ago. None of the ATDM Trilinos builds should have fired off yet, so we should still get a consistent set of ATDM Trilinos builds tomorrow.) The details on how I did a reference build, reproduced the link errors, and examined the link lines to find the problem are given below.

Reproducing and examining the build errors for the build 'cee-rhel7_cuda-10.1.243_gnu-7.2.0_openmpi-4.0.3_shared_opt':

Now to try to reproduce the CUDA link errors. But first, I want to do a reference build so I can compare. The CUDA multiple-definition errors described in #10774 (comment) occur in the creation of the shared lib
So I should just have to build Tpetra to figure this out. Also, this is a TESTONLY lib so it can't (directly) impact installed versions of Trilinos. Let's do the reference build on the branch '10774-pre-tribits-update-ref' at version:
and we do the reference build on 'ascicgpu17' with:
Now, reproducing the updated TriBITS build error on the branch '10774-fix-atdm-builds' at the repo state:
reproducing the updated TriBITS build error with:
and it showed the build error:
Now, let's get the link line for the reference build:
which produced the reference build link command:
Comparing that to the updated TriBITS build link line:
which produced the updated link command:
Now, let's list out the link line arguments, one option per line. First, here is the flattened-out link line for the reference build:
Now, here is the flattened out link line for the updated build:
Comparing these two link lines, they are very different. The order of some of the libraries is different and even what libraries are listed and how they are listed on the link line is different. So it turns out that having:
on the link line is critical. In the reference build, that comes in through
But the problem is that when I updated the
and the end. The fix was simple, just allow those libs to be used by changing this to:
And in fact, I significantly simplified that file in TriBITSPub/TriBITS#503 taking out the found check since |
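The "one option per line" flattening and comparison of the two link lines described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the commands actually used for this PR: the function names `flatten_link_line`, `compare_link_lines`, and `missing_args` are hypothetical, and the shortened link lines in the example are made up rather than taken from the real builds.

```python
import difflib
import shlex


def flatten_link_line(link_line: str) -> list[str]:
    """Split a link command into its arguments, one list entry per option."""
    # shlex.split honors shell quoting, so quoted paths stay in one piece.
    return shlex.split(link_line)


def compare_link_lines(reference: str, updated: str) -> list[str]:
    """Return a unified diff of the flattened reference and updated link lines."""
    return list(
        difflib.unified_diff(
            flatten_link_line(reference),
            flatten_link_line(updated),
            fromfile="reference",
            tofile="updated",
            lineterm="",
        )
    )


def missing_args(reference: str, updated: str) -> list[str]:
    """List arguments on the reference link line that are absent from the updated one."""
    updated_args = set(flatten_link_line(updated))
    return [arg for arg in flatten_link_line(reference) if arg not in updated_args]


if __name__ == "__main__":
    # Hypothetical, heavily shortened link lines (NOT the real build output).
    reference = "g++ -shared -o libexample.so example.o -lfoo /opt/lib/libbar.a"
    updated = "g++ -shared -o libexample.so example.o /opt/lib/libbar.a"

    for line in compare_link_lines(reference, updated):
        print(line)
    print("Missing from updated link line:", missing_args(reference, updated))
```

A diff of the flattened lists shows reordering as well as dropped libraries, while `missing_args` ignores order and only reports arguments that disappeared entirely, which is the kind of difference that mattered here.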
I was hasty and put Krino in the wrong disable list. I had put it in the list for extra disables for complex builds.
Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'
Git describe: Vera4.0-RC1-start-1224-g46b634b9
At commit:
commit 1ecbb99ecf5615b8d06fd1cc16c8d8ee02d65888
Author: Roscoe A. Bartlett <rabartl@sandia.gov>
Date: Mon Jul 25 13:07:08 2022 -0600
Summary: Change type to IMPORTED STATIC for *.a file (trilinos#10774)
dbc3946 to 042bdd7
FYI: With the update to the 'atdm-nightly' branch yesterday, all of the 'ats1' configurations on 'mutrino' now build successfully, as shown for the ATDM Trilinos builds today 2022-07-26. There are 6 new test failures for the build
This looks like failures we have seen before on 'mutrino' as reported in #3942 and #3499. Not sure if this is related to the TriBITS updates or some other changes or if this is just a fluke. As for the 4 tests failing in the build |
All of the new ATDM Trilinos build errors due to the TriBITS upgrade from #10614 now appear to be resolved as of today, ATDM Trilinos testing day 2022-07-26. The only remaining failures are those that already existed which include configure errors for three SPARC 'mini' builds and build errors for the 'tlcc2' builds as reported at the top of #10774. So this PR is ready to merge and it fixes all of the ATDM Trilinos builds. |
Here are the notes for my local builds reproducing and fixing the CGNS/HDF5 link errors I had above. I was able to reproduce the CGNS/HDF5 link errors described in https://github.com//issues/10774#issuecomment-1191702159 and then verify the fix on the machine 'ceerws1113' with the build below.

'cee-rhel7_intel-19.0.3_intelmpi-2018.4_serial_static_opt' build on 'ceerws1113':

For an earlier version of this branch (before putting this into TriBITS proper and snapshotting TriBITS 'master' back in):
On 'ceerws1113' I ran:
Note that the build failed the first time due to a compiler crash and I finished running the build and tests with:
I tested the other changes in this PR in: |
Can one of you please approve this PR to allow it to merge to the Trilinos 'develop' branch? I have verified it fixes all of the ATDM Trilinos build errors caused by the initial merge of #10614. (That was verified by manually merging this topic branch to the 'atdm-nightly' branch and watching the ATDM Trilinos builds as described above). This fixes the SPARC Trilinos Integration builds and merging this will allow SPARC to upgrade to a new version of Trilinos 'develop'. This will also likely fix many other customer builds of Trilinos on various platforms, especially CUDA builds on some platforms. |
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING

Build Information
Test Name: Trilinos_PR_gcc-8.3.0
Test Name: Trilinos_PR_gcc-7.2.0-serial
Test Name: Trilinos_PR_gcc-7.2.0-debug
Test Name: Trilinos_PR_intel-17.0.1
Test Name: Trilinos_PR_clang-10.0.0
Test Name: Trilinos_PR_cuda-10.1.243
Test Name: Trilinos_PR_python3
Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

Pull Request Author: bartlettroscoe
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED
…lds (trilinos#10791, trilinos#10840) I am manually merging the tip of 'develop' into this topic branch so that I can see if the PR testing results change (see trilinos#10840).
FYI: I created the issue TRILINOSHD-156 requesting:

Hello Trilinos Framework team,

Can someone please manually merge the PR:

It fixes issues for several customers. It is approved to be merged and it passed both of its PR testing iterations except for a randomly failing Tpetra test that is impacting several PRs (and therefore the failure has nothing to do with the changes in PR #10791, see: #10847).

With the logjam going on with PR testing over the last several weeks, and given that it takes 3 days or more to get a PR build to run, one can argue that merging PRs that are clearly okay to merge, and thereby removing them from PR testing, is a good thing to do.

Thanks,

-Ross
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
NOTICE: The AutoTester has encountered an internal error (usually a Communications Timeout), testing will be restarted, previous tests may still be running but will be ignored by the AutoTester... |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
NOTE: The branch 10774-install-run-demo for PR #10813 was created from this PR branch bartlettroscoe:10774-fix-atdm-builds. All of the unique commits on this PR (shown here, with the top unique commit 042bdd7) appear as the initial commits in the branch bartlettroscoe:10774-install-run-demo for PR #10813 (shown here; again, the same commit 042bdd7 is shown in PR #10813). Therefore, when PR #10813 was merged, it effectively merged this PR branch.

SIDE NOTE: I have no idea why GitHub is showing a conflict with the file

When I merge this topic branch with 'develop' locally, it merges fine as shown by:
So, in other words, GitHub seems to have a defect so we can ignore this. Therefore, because all of the unique commits from this PR were merged along with PR #10813, we can close this PR and mark it as merged. |
Internal issues
Description
This PR contains updates to TriBITS and Trilinos to address failures in the ATDM Trilinos builds described in #10774. As I fix these builds and update this topic branch, I will manually merge it to the 'atdm-nightly-manual-updates' branch so that it gets run in the ATDM Trilinos builds. This way, we can see how this works in the ATDM Trilinos builds without getting held up by issues with PR testing (e.g. #10782). (See the motivation and description for this workflow in Restoring productivity through the advanced usage of Git.)
Issues addressed in the PR:
__dlopen for 'ats1' builds (pulls in TriBITSPub/TriBITS#504, "Change type to IMPORTED STATIC for *.a file (trilinos/Trilinos#10774)")

NOTE: I also disabled Krino in the ATDM builds since the ATDM customers are not using it and it is failing (see #10524).
NOTE: This also pulls in TriBITS changes for the PRs:
Testing
All of the new ATDM Trilinos build errors triggered by the merge of #10614 have been resolved as of 2022-07-26. See below.