Update the libraries and gocart #1745

Merged on Aug 7, 2023 (40 commits)

Conversation

junwang-noaa
Collaborator

@junwang-noaa junwang-noaa commented May 9, 2023

Description

In this PR, the following libraries will be updated:

  1. hdf5: 1.14.0
  2. netcdf: 4.9.2
  3. fms: 2023.01
  4. esmf: 8.4.2
  5. mapl: 2.35.2

GOCART is updated to the latest develop branch, which includes the feature of not running Nitrates.

Top of commit queue on: TBD

Input data additions/changes

  • No changes are expected to input data.
  • There will be new input data.
  • Input data will be updated.

Anticipated changes to regression tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:
    All tests with GOCART will have changed GOCART output files.
    All tests will have an attribute data type change in the atm history files.

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PR's (If Applicable):

Commit Queue Checklist:

  • Link PR's from all sub-components involved
  • Confirm reviews completed in sub-component PR's
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne with both Intel/GNU compilers
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PR's and Issues:

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Intel
      • Hera
      • Orion
      • Jet
      • Gaea
      • Cheyenne
    • GNU
      • Hera
      • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

natalie-perlin previously approved these changes May 10, 2023
@ulmononian
Collaborator

ulmononian commented Jun 1, 2023

hi @junwang-noaa. we are testing spack-stack/1.4.0 (hdf5/1.14.0, fms/2023.01, mapl/2.35.2, esmf/8.4.2, pio/2.5.9) on orion w/ intel. the cpld_control_p8 test fails with what appears to be a mapl/gocart issue. i was wondering if, in your testing for this PR and the associated issues, you encountered any library version, submodule hash, or component configuration changes that needed to be made? cpld_control_p8 passes fine w/ spack-stack/1.3.0, which uses esmf/8.3.0b09 and mapl/2.22.0, so my hunch is that it relates to the library version updates.

in case you encountered this in your testing, the err output for the coupled run shows:

107: pe=00107 FAIL at line=03053    Base_Base_implementation.F90             <status=57>
107: pe=00107 FAIL at line=00685    SU2G_GridCompMod.F90                     <status=57>
107: pe=00107 FAIL at line=01818    MAPL_Generic.F90                         <status=57>
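For triage, the three FAIL lines above can be reduced to a list of (PE, source file, line, status) frames. The following is only a minimal sketch: the regular expression is inferred from the three lines quoted in this comment, not from MAPL documentation, and the function name `parse_fail_lines` is hypothetical.

```python
import re

# Pattern inferred from the quoted lines, e.g.
# "107: pe=00107 FAIL at line=03053    Base_Base_implementation.F90   <status=57>"
# This is an assumption about the log format, not a documented MAPL contract.
FAIL_RE = re.compile(
    r"pe=(?P<pe>\d+)\s+FAIL at line=(?P<line>\d+)\s+(?P<file>\S+)\s+<status=(?P<status>-?\d+)>"
)


def parse_fail_lines(text):
    """Return a list of (pe, source_file, line_number, status) tuples."""
    frames = []
    for raw in text.splitlines():
        m = FAIL_RE.search(raw)
        if m:
            frames.append(
                (int(m["pe"]), m["file"], int(m["line"]), int(m["status"]))
            )
    return frames


# The three lines quoted in the comment above.
sample = """\
107: pe=00107 FAIL at line=03053    Base_Base_implementation.F90             <status=57>
107: pe=00107 FAIL at line=00685    SU2G_GridCompMod.F90                     <status=57>
107: pe=00107 FAIL at line=01818    MAPL_Generic.F90                         <status=57>
"""

frames = parse_fail_lines(sample)
for pe, src, line, status in frames:
    print(f"pe {pe}: {src}:{line} (status {status})")
```

Run against a full err file, this would show which source files and PEs report failures; here every frame carries status 57 on PE 107.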

the model fails here:

  0:  in radiation_clouds_prop=           8 F           4 F F           2           1
  0: PASS: fcstRUN phase 1, n_atmsteps =                0 time is         9.509046
  0: UFS Aerosols: Advancing from 2021-03-22T06:00:00 to 2021-03-22T06:12:00

the environment at runtime was:

Currently Loaded Modules:
  1) intel/2022.1.2                   20) esmf/8.4.2
  2) stack-intel/2022.0.2             21) fms/2023.01
  3) impi/2022.1.2                    22) bacio/2.4.1
  4) stack-intel-oneapi-mpi/2021.5.1  23) crtm-fix/2.4.0_emc
  5) miniconda/3.9.7                  24) git-lfs/2.12.0
  6) stack-python/3.9.7               25) crtm/2.4.0
  7) cmake/3.23.1                     26) g2/3.4.5
  8) libjpeg/2.1.0                    27) g2tmpl/1.10.2
  9) jasper/2.0.32                    28) ip/3.3.3
 10) zlib/1.2.13                      29) sp/2.3.3
 11) libpng/1.6.37                    30) w3emc/2.9.2
 12) pkg-config/0.27.1                31) gftl/1.8.3
 13) hdf5/1.14.0                      32) gftl-shared/1.5.0
 14) curl/8.0.1                       33) ecbuild/3.7.2
 15) zstd/1.5.2                       34) yafyaml/0.5.1
 16) netcdf-c/4.9.2                   35) mapl/2.35.2-esmf-8.4.2
 17) netcdf-fortran/4.6.0             36) scotch/7.0.3
 18) parallel-netcdf/1.12.2           37) ufs_common
 19) parallelio/2.5.9                 38) modules.fv3
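A quick way to confirm that a runtime environment matches the versions targeted by this PR is to parse the module listing and compare it with the expected versions. This is a rough sketch: the helper names `loaded_modules` and `version_mismatches` are hypothetical, the parser assumes the two-column `N) name/version` layout shown above, and mapping the PR's "netcdf" entry to the `netcdf-c` module is an assumption.

```python
import re


def loaded_modules(module_list_text):
    """Map module name -> version string from `module list` style output."""
    modules = {}
    for entry in re.findall(r"\d+\)\s+(\S+)", module_list_text):
        name, _, version = entry.partition("/")
        modules[name] = version  # "" for entries without a slash
    return modules


def version_mismatches(modules, expected):
    """Return {name: loaded_version_or_None} for modules that do not match."""
    bad = {}
    for name, want in expected.items():
        have = modules.get(name)
        # Prefix match so that "2.35.2-esmf-8.4.2" satisfies "2.35.2".
        # Note this would also accept e.g. "2.35.20"; tighten if that matters.
        if have is None or not have.startswith(want):
            bad[name] = have
    return bad


# A few rows copied from the listing above.
sample = """\
  1) intel/2022.1.2                   20) esmf/8.4.2
  2) stack-intel/2022.0.2             21) fms/2023.01
 13) hdf5/1.14.0                      32) gftl-shared/1.5.0
 16) netcdf-c/4.9.2                   35) mapl/2.35.2-esmf-8.4.2
 18) parallel-netcdf/1.12.2           37) ufs_common
"""

# Versions listed in the PR description ("netcdf" taken to mean netcdf-c).
expected = {
    "hdf5": "1.14.0",
    "netcdf-c": "4.9.2",
    "fms": "2023.01",
    "esmf": "8.4.2",
    "mapl": "2.35.2",
}

mismatches = version_mismatches(loaded_modules(sample), expected)
print(mismatches or "all expected versions are loaded")
```

An empty result only shows that the library versions line up, which is consistent with the failure here being traced to the configuration files rather than to the modules.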

thanks!!!

@mathomp4

mathomp4 commented Jun 1, 2023

@ulmononian That Base_base error looks familiar. I'll ask around the GMAO.

@ulmononian
Collaborator

> @ulmononian That Base_base error looks familiar. I'll ask around the GMAO.

thanks @mathomp4 📡

@junwang-noaa
Collaborator Author

@ulmononian If I remember correctly, you need to update the CAP.rc file, you can take a look at my branch.

@ulmononian
Collaborator

> @ulmononian If I remember correctly, you need to update the CAP.rc file, you can take a look at my branch.

thanks for that suggestion, @junwang-noaa! i'll test with those changes.

@ulmononian
Collaborator

ulmononian commented Jun 1, 2023

@junwang-noaa @mathomp4 @Hang-Lei-NOAA: cpld_control_p8 ran successfully by making the changes to tests/parm/gocart/CAP.rc and /tests/parm/gocart/GOCART2G_GridComp.rc files as done in @junwang-noaa's PR branch (note that i also updated the gocart hash to the same used in jun's branch).

@zach1221
Collaborator

zach1221 commented Aug 7, 2023

Ok, so updating the TPN to 16 for regional_atmaq and regional_atmaq_faster, for cheyenne, under the tests/tests/ directory allowed those two cases to pass. I'll keep you posted on the compile_s2sw_pdlib_intel scotch issue.

@zach1221
Collaborator

zach1221 commented Aug 7, 2023

I still can't get compile_s2sw_pdlib_intel to find the scotch libraries (scotch_lib, scotch_inc). I re-loaded the scotch module and manually exported the library/installation paths displayed under "module show scotch", but no success. FYI @natalie-perlin. I can create an issue for this, and we can skip Cheyenne for this PR. Thoughts?

@AlexanderRichert-NOAA
Collaborator

@zach1221 this is just a shot in the dark-- are the installed libraries shared or static? Currently WW3 only recognizes the static versions.
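One way to answer the shared-versus-static question is to list what is actually present in the scotch library directory. The sketch below makes several assumptions: the helper name `classify_libs` is hypothetical, the `libscotch*` file naming is assumed rather than taken from this thread, and the demo uses a temporary directory in place of the real path reported by "module show scotch".

```python
from pathlib import Path
import tempfile


def classify_libs(lib_dir, stem="scotch"):
    """Split lib<stem>* files in lib_dir into static and shared libraries."""
    lib_dir = Path(lib_dir)
    # Static archives end in ".a".
    static = sorted(p.name for p in lib_dir.glob(f"lib{stem}*.a"))
    # Shared objects may carry a version suffix (e.g. ".so.7"), so look for
    # ".so" anywhere in the name; ".dylib" covers macOS.
    shared = sorted(
        p.name
        for p in lib_dir.iterdir()
        if p.name.startswith(f"lib{stem}")
        and (".so" in p.name or p.suffix == ".dylib")
    )
    return {"static": static, "shared": shared}


# Demo with a throwaway directory; the file names are illustrative only.
# In practice, pass the library path shown by "module show scotch".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("libscotch.a", "libscotcherr.a"):
        (Path(tmp) / name).touch()
    demo = classify_libs(tmp)

print(demo)
```

If the "static" list is empty while "shared" is not, that would match the WW3 limitation described in this comment.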

@natalie-perlin
Collaborator

@zach1221 - what are the steps you are taking to build it? How could I reproduce your issue of not finding scotch libraries? (And yes, they are static)

@zach1221
Collaborator

zach1221 commented Aug 7, 2023

> @zach1221 - what are the steps you are taking to build it? How could I reproduce your issue of not finding scotch libraries? (And yes, they are static)

@natalie-perlin I'll reach out to you and provide the build/run steps.

@DeniseWorthen
Collaborator

I've been able to build Jun's branch for PDLIB+GNU using

./compile.sh cheyenne '-DAPP=S2SW -DCCPP_SUITES=FV3_GFS_v17_coupled_p8 -DPDLIB=ON -DDEBUG=ON' s2sw_pdlib gnu YES NO 2>&1 | tee s2s.pdlib.gnu.log 

@zach1221
Collaborator

zach1221 commented Aug 7, 2023

@DeniseWorthen It seems this command will work for me as well. I wonder why ./rt.sh does not allow me to build the suite.

@DeniseWorthen
Collaborator

@zach1221 I also cannot build using rt.sh

@jkbk2004
Collaborator

jkbk2004 commented Aug 7, 2023

@zach1221 can you push the code change for the TPN case on cheyenne? Regarding the scotch/hpc-stack issue, I think we can skip cheyenne for this PR since you are pre-checking #1707 on cheyenne.

@BrianCurtis-NOAA
Collaborator

The Acorn system issues are confirmed to be due to a bug in the python 3.8.6 library loaded in rt.sh. When ecflow is imported into python, a shared object file is missing, which causes a failure.

I've tested using the spack-stack PR commit where nccmp is now used on acorn/WCOSS2, and confirmed that the previous issue does not exist on Acorn.

Acorn can be skipped. The baselines are completed for this PR on Acorn.

@jkbk2004
Collaborator

jkbk2004 commented Aug 7, 2023

Ok! Cheyenne can be skipped. We can start merging in. To get all code managers on the same page, please go ahead with final reviews and approvals.

Labels
  • Baseline Updates: Current baselines will be updated.
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.