Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combined PR: hpc-stack locations, Disable Fused Multiply Add, and Enable usage of shared pio #1645

Merged

Conversation

SamuelTrahanNOAA
Copy link
Collaborator

@SamuelTrahanNOAA SamuelTrahanNOAA commented Mar 7, 2023

Description

This is #1617, #1618, and #1640 combined into a single PR solely for the purposes of final testing and merge.

Top of commit queue on: TBD

Input data additions/changes

  • No changes are expected to input data.
  • There will be new input data.
  • Input data will be updated.

Anticipated changes to regression tests:

  • Changes are expected to the following tests: hafs_regional_storm_following_1nest_atm_ocn_wav on WCOSS2 and Acorn only.

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PR's (If Applicable):

This is the combination of #1617, #1618, and #1640

Commit Queue Checklist:

  • Link PR's from all sub-components involved
  • Confirm reviews completed in sub-component PR's
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne with both Intel/GNU compilers
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PR's and Issues:

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Intel
      • Hera
      • Orion
      • Jet
      • Gaea
      • Cheyenne
    • GNU
      • Hera
      • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

@SamuelTrahanNOAA We would like to combine @natalie-perlin's PR#1617 with this PR, would you be able to pull it in?

@natalie-perlin note, on this, if possible can create a direct PR to Sam's branch?

@SamuelTrahanNOAA
Copy link
Collaborator Author

I want to make sure #1617 didn't break Rocoto support, so I'll test that now in this branch.

@SamuelTrahanNOAA
Copy link
Collaborator Author

I've combined all three PRs, and I'm testing now to ensure Rocoto support still works.

@SamuelTrahanNOAA SamuelTrahanNOAA changed the title Combined PR: Disable Fused Multiply Add AND Enable usage of shared pio Combined PR: hpc-stack locations, Disable Fused Multiply Add, and Enable usage of shared pio Mar 8, 2023
@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

Thank you, Sam!

@BrianCurtis-NOAA, PRs, #1617 #1640 and #1618 have all been combined here.

@zach1221 zach1221 added the jenkins-ci Jenkins CI: ORT build/test on docker container label Mar 8, 2023
@SamuelTrahanNOAA
Copy link
Collaborator Author

I can confirm that the rt.sh Rocoto mode still works on Jet and Hera, which are the only two platforms where it is supported.

@SamuelTrahanNOAA
Copy link
Collaborator Author

@ulmononian @natalie-perlin - Could you please review this PR and make sure your changes are in here correctly?

@ulmononian
Copy link
Collaborator

@ulmononian @natalie-perlin - Could you please review this PR and make sure your changes are in here correctly?

pio/cmakemodules updates look correct.

@natalie-perlin
Copy link
Collaborator

@jkbk2004 -
why are totally totally different issues that have been already submitted as separate PRs are now combined into yet another PR that is very misleading - it has neither connection to original contributors and testers on various platforms, not has any basis for such additional work? This looks like a very obscured approach of what specific issues has the PR actually addresses, and it does not seem to follow accepted GitHub Community Guidelines.

@jkbk2004
Copy link
Collaborator

jkbk2004 commented Mar 8, 2023

@jkbk2004 - why are totally totally different issues that have been already submitted as separate PRs are now combined into yet another PR that is very misleading - it has neither connection to original contributors and testers on various platforms, not has any basis for such additional work? This looks like a very obscured approach of what specific issues has the PR actually addresses, and it does not seem to follow accepted GitHub Community Guidelines.

@natalie-perlin I don't get your point. The article you uploaded is only about communication in PR. Yes, we need speedy communication in PR. PRs #1618 #1640 are compile related. Your is hpc stack modulefule update. Can you tell me why we should run full regression tests repeatedly for each PR? We have limited resource and time. Decision was made with CMs discussion. I agree new PR is not needed. But sometime mis-communication happens. @zach1221 made a note about that. I am sorry I still see no issue. I like to keep processing with this PR. BTW, thanks for your comment!

@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

Please see jenkins-ci logs attached. ORTs have passed.
ufs-weather-model » ort-docker-pipeline » PR-1645 #1 Console [Jenkins].pdf

@SamuelTrahanNOAA
Copy link
Collaborator Author

@BrianCurtis-NOAA - Try again; it should work now. When I was adding the $PWD, I put it on the wrong line in fv3_qsub.IN_wcoss2. That mistake has been corrected, and I double-checked the rest of the module logic. It looks right now.

SamuelTrahanNOAA and others added 2 commits March 8, 2023 19:17
…NOAA/ufs-weather-model into pio-cmake-disable-fma
on-behalf-of @ufs-community <jong.kim@noaa.gov>
@jkbk2004
Copy link
Collaborator

jkbk2004 commented Mar 8, 2023

Automated RT Failure Notification
Machine: hera
Compiler: gnu
Job: RT
[RT] Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/1267030309/20230308184516/ufs-weather-model
[RT] Error: Test cpld_control_p8 046 failed in run_test failed
Please make changes and add the following label back: hera-gnu-RT

@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

System timeout issue on hera gnu, regarding cpld_control_p8. Will try to run manually.

@SamuelTrahanNOAA
Copy link
Collaborator Author

I reported the cpld_control_p8 problem in September of last year, and it has been going on for much longer than that:

#1432

Maybe someone will fix it someday?

@DeniseWorthen
Copy link
Collaborator

and this one is also related #1440

@zach1221
Copy link
Collaborator

zach1221 commented Mar 8, 2023

@SamuelTrahanNOAA could you provide me with permission to push logs to this PR, please?

@SamuelTrahanNOAA
Copy link
Collaborator Author

Could you provide me with permission to push logs to this PR, please?

@zach1221 - you should get an invitation any second now

@jkbk2004
Copy link
Collaborator

jkbk2004 commented Mar 9, 2023

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: RT
[RT] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/1267030309/20230308190009/ufs-weather-model
[RT] Error: Test cpld_control_p8_faster 021 failed in run_test failed
Please make changes and add the following label back: gaea-intel-RT

@BrianCurtis-NOAA
Copy link
Collaborator

@SadeghTabas-NOAA Will the new faster RT's also need new baselines with the changes from this PR?

@SadeghTabas-NOAA
Copy link
Collaborator

SadeghTabas-NOAA commented Mar 9, 2023

@BrianCurtis-NOAA I don't think so, I don't see any baseline changes needed to be applied for the faster tests.

@BrianCurtis-NOAA
Copy link
Collaborator

@BrianCurtis-NOAA I don't think so, I don't see any baseline changes needed to be applied for the faster tests.

I think they will for WCOSS2, the -no-fma option is added to all WCOSS2 tests that use the -DFASTER compile option.

@jkbk2004 I'll need to generate new baselines for the new faster RT's on the WCOSS2 machines and run RT's again.

@jkbk2004
Copy link
Collaborator

jkbk2004 commented Mar 9, 2023

@BrianCurtis-NOAA I don't think so, I don't see any baseline changes needed to be applied for the faster tests.

I think they will for WCOSS2, the -no-fma option is added to all WCOSS2 tests that use the -DFASTER compile option.

@jkbk2004 I'll need to generate new baselines for the new faster RT's on the WCOSS2 machines and run RT's again.

Sure!

@SamuelTrahanNOAA
Copy link
Collaborator Author

The -DFASTER=ON is only enabled for one test.

@BrianCurtis-NOAA
Copy link
Collaborator

The -DFASTER=ON is only enabled for one test.

Right, we added 5 (i think) new RT's to test -DFASTER on other parts of the UFS in a previous PR. Look in rt.conf for _faster tests i.e.: https://github.com/SamuelTrahanNOAA/ufs-weather-model/blob/867add1732aeecb6b68a2580c9f37f94ba746f7d/tests/rt.conf#L126-L127

@zach1221
Copy link
Collaborator

zach1221 commented Mar 9, 2023

Hi, @SamuelTrahanNOAA . All the testing is now complete, so we'll go ahead to merge this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet