[develop] Changes for Derecho, a new platform #894
Looks good to me.
@natalie-perlin - Thanks for opening this PR to allow the SRW App to build and run on Derecho!
Since Cheyenne will be decommissioned at the end of the year and given that the NRAL0032 account is out of resources on Cheyenne, should we keep Cheyenne in the various files still, or would it be best to fully transition to Derecho?
If we fully cut support for Cheyenne and fully transition to Derecho, then the modification made in ush/get_crontab_contents.py can be changed so that line 61 would read:
if MACHINE == "DERECHO"
which should allow the Python unittests to pass (currently, the Python unittests are failing in test_get_crontab_contents because the crontab_cmd is being set as usr/bin/crontab rather than crontab).
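The machine-specific selection discussed above can be sketched roughly as follows. This is only an illustration, not the actual contents of ush/get_crontab_contents.py: the function name select_crontab_cmd, its special_machine parameter, and the absolute path /usr/bin/crontab are assumptions made for the example.

```python
def select_crontab_cmd(machine: str, special_machine: str = "DERECHO") -> str:
    """Return the crontab executable to call on the given machine.

    Hypothetical helper: only the machine named by `special_machine`
    gets an absolute path; every other platform uses the bare command,
    which is what a platform-agnostic unit test would expect.
    """
    if machine.upper() == special_machine:
        # Assumed absolute path for the special-cased platform.
        return "/usr/bin/crontab"
    return "crontab"


if __name__ == "__main__":
    for name in ("DERECHO", "HERA"):
        print(name, "->", select_crontab_cmd(name))
```

With a single special-cased machine, a unit test run on any other platform sees the bare crontab command, which matches the expectation described in the comment.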
Has an EPIC Platform ticket been created to create a new Derecho pipeline so that we can add Derecho to the .cicd/Jenkinsfile to run the automated tests on the new platform? If not, please let me know and I can open a ticket for this work.
In addition to my other suggested change, we should remove all the nodesize: lines from all the files in parm/wflow/. It turns out this <nodesize> tag in the Rocoto XML actually does nothing without a corresponding <cores> tag, which we do not have. And the newer Rocoto build on Derecho gives a bunch of deprecation warnings for this tag each time you run rocotorun, so we should just get rid of it.
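One way to do that cleanup is sketched below, assuming each nodesize: entry sits on its own line in the YAML files. The helper name strip_nodesize_lines and the sample task block are made up for illustration and are not taken from the repository.

```python
import re

# Matches a line whose YAML key is exactly `nodesize`, at any indentation.
NODESIZE_LINE = re.compile(r"^\s*nodesize\s*:")


def strip_nodesize_lines(yaml_text: str) -> str:
    """Return yaml_text with every `nodesize:` line removed."""
    kept = [
        line
        for line in yaml_text.splitlines(keepends=True)
        if not NODESIZE_LINE.match(line)
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Hypothetical task block; the real files in parm/wflow/ differ.
    sample = (
        "task_example:\n"
        "  nnodes: 1\n"
        "  nodesize: 36\n"
        "  walltime: 00:20:00\n"
    )
    print(strip_nodesize_lines(sample), end="")
```

Working on text lines rather than loading and re-dumping the YAML leaves comments, anchors, and key order in the files untouched.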
Negative news aside, I did confirm I was able to run tests successfully on Derecho! So hopefully once these changes are addressed and the latest development merged in this will be good to go.
Thank you, @mkavulich! Co-authored-by: Michael Kavulich <kavulich@ucar.edu>
@mkavulich - addressed your comments on yaml files in wflow/ directory
Sorry about those late comments, thanks for addressing them!
Merged changes from develop and tested without an additional cmake options file for the UFS WM. After fixing a default for EXTRN_MDL_DATA_STORES: aws in ./ush/machine/derecho.yaml, all the fundamental tests have passed.
Before correcting derecho.yaml:
After correcting derecho.yaml:
Running comprehensive tests now on Derecho.
@MichaelLueken - are there any additional tests needed for Derecho? As to CI/CD we may not have the account yet.
Comprehensive tests:
@natalie-perlin - With the decommissioning of Cheyenne, using the Are there plans to add GNU to Derecho at a later time? If there are plans, then we can bring in the I'm wrapping up my testing of the Jenkins build and run scripts to ensure that the SRW will build and run using these on Derecho. Additionally, this will also test the coverage suite for the machine. Once they pass, I will give my approval and test the rest of the systems using Jenkins.
@natalie-perlin - The SRW App successfully builds on Derecho using the Jenkins .cicd/scripts/srw_build.sh script. Additionally, the coverage.derecho tests were successfully run using .cicd/scripts/srw_test.sh and all tests successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km COMPLETE 21.29
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 35.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 42.15
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 26.62
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 16.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 38.69
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 23.09
pregen_grid_orog_sfc_climo COMPLETE 12.96
specify_template_filenames COMPLETE 14.32
----------------------------------------------------------------------------------------------------
Total COMPLETE 231.09
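As a quick cross-check of the summary above, the per-experiment core hours can be summed and compared with the reported total. The parsing below is a sketch that assumes each row has exactly three whitespace-separated fields; the function name total_core_hours is illustrative and not part of the SRW App tooling.

```python
SUMMARY = """\
custom_ESGgrid_IndianOcean_6km COMPLETE 21.29
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot COMPLETE 35.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 COMPLETE 42.15
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR COMPLETE 26.62
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta COMPLETE 16.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR COMPLETE 38.69
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ COMPLETE 23.09
pregen_grid_orog_sfc_climo COMPLETE 12.96
specify_template_filenames COMPLETE 14.32
Total COMPLETE 231.09
"""


def total_core_hours(summary: str) -> float:
    """Sum the core-hours column over experiment rows, skipping the Total row."""
    total = 0.0
    for line in summary.splitlines():
        fields = line.split()
        # Experiment rows look like: <name> <status> <core hours>
        if len(fields) == 3 and fields[0] != "Total":
            total += float(fields[2])
    return round(total, 2)


if __name__ == "__main__":
    print(f"{total_core_hours(SUMMARY):.2f}")
```

The nine experiment rows add up to 231.09 core hours, which agrees with the Total line.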
Approving this PR now and running the Jenkins tests for the rest of the platforms (since there is no Jenkins runner for Derecho at this time).
The Jenkins Hera Intel WE2E coverage tests failed for
It failed with a strange NetCDF failure:
A rerun of the test was successful:
The Orion and Gaea Jenkins tests have successfully passed. Awaiting completion of Hera GNU and Jet tests now.
Both the Hera GNU and Jet WE2E coverage tests successfully passed on Jenkins. Now moving forward with merging this work.
Modulefile and other configuration files to adapt the SRW to Derecho system.
Software stacks used for testing are hdf5/1.14.0, netcdf/4.9.2-based, similar to those used in #889.
DESCRIPTION OF CHANGES:
Adding the Derecho system at UCAR/NCAR as a Tier-1 machine.
Type of change
TESTS CONDUCTED:
All fundamental tests pass.
DEPENDENCIES:
This PR will resolve issue #884.
This PR depends on #889 - MERGED
DOCUMENTATION:
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@mark-a-potts
Fundamental tests are successful.
#894 (comment)
WE2E_summary_20230823001411.txt
WE2E_summary_20230823013603.txt