Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Modify get_lbcs and make_lbcs. #666

Merged

Conversation

danielabdi-noaa
Copy link
Collaborator

DESCRIPTION OF CHANGES:

Addresses issue #665. Get_lbcs need to download 0th hour LBCS and make_lbcs need to be parallelized.
Note that there is no workflow so this PR can not be run independently until then.
However, it did successfully run for both spinup/production cycles in PR #540

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@panll
Copy link
Collaborator

panll commented Apr 12, 2023

Tested on Hera. Running time error:
.../ufs-srweather-app/jobs/JREGIONAL_MAKE_ICS: line 63: NWGES_DIR: unbound variable
End JREGIONAL_MAKE_ICS at Tue Apr 11 17:42:24 UTC 2023 with error code 1 (time elapsed: 00:00:00)

@danielabdi-noaa
Copy link
Collaborator Author

danielabdi-noaa commented Apr 13, 2023

@panll I have added the necessary NWGES variables but this PR requires a workflow update for the specific way RRFS downloads LBCS, so it can not be tested without that.

jobs/JREGIONAL_MAKE_ICS Show resolved Hide resolved
parm/input.nml.FV3 Outdated Show resolved Hide resolved
scripts/exregional_get_extrn_mdl_files.sh Outdated Show resolved Hide resolved
scripts/exregional_make_ics.sh Outdated Show resolved Hide resolved
scripts/exregional_make_ics.sh Show resolved Hide resolved
scripts/exregional_make_lbcs.sh Show resolved Hide resolved
scripts/exregional_make_lbcs.sh Show resolved Hide resolved
Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daniel, thanks for preparing this PR. I've left some comments below that focus on making sure that we don't change existing behavior while capturing RRFS configuration settings, and that the mods stay in scope for the intent of this PR.

scripts/exregional_get_extrn_mdl_files.sh Outdated Show resolved Hide resolved
scripts/exregional_get_extrn_mdl_files.sh Show resolved Hide resolved
scripts/exregional_make_lbcs.sh Show resolved Hide resolved
scripts/exregional_make_lbcs.sh Show resolved Hide resolved
scripts/exregional_make_lbcs.sh Outdated Show resolved Hide resolved
ush/config_defaults.yaml Outdated Show resolved Hide resolved
ush/config_defaults.yaml Show resolved Hide resolved
@danielabdi-noaa danielabdi-noaa added the ci-hera-intel-WE Kicks off automated workflow test on hera with intel label Apr 25, 2023
@venitahagerty venitahagerty removed the ci-hera-intel-WE Kicks off automated workflow test on hera with intel label Apr 25, 2023
@venitahagerty
Copy link
Collaborator

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1271094432/20230425193511/ufs-srweather-app
Build was Successful
Rocoto jobs started
Experiment done: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2023-04-25 20:13:24 +0000 :: hfe07 :: This cycle is complete: Success
Experiment failed: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2023-04-25 20:27:14 +0000 :: hfe07 :: Task run_MET_PcpCombine_fcst_APCP01h_mem000, jobid=44194882, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=2 (of 2)
Experiment done: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2023-04-25 20:11:40 +0000 :: hfe07 :: This cycle is complete: Success
Experiment done: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2023-04-25 20:07:58 +0000 :: hfe07 :: This cycle is complete: Success
Experiment done: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
2023-04-25 20:09:30 +0000 :: hfe07 :: This cycle is complete: Success
Experiment done: pregen_grid_orog_sfc_climo
2023-04-25 20:07:17 +0000 :: hfe07 :: This cycle is complete: Success
Experiment done: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2023-04-25 20:07:31 +0000 :: hfe07 :: This cycle is complete: Success
If test failed, please make changes and add the following label back:
ci-hera-intel-WE

@danielabdi-noaa danielabdi-noaa self-assigned this Apr 26, 2023
Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielabdi-noaa Overall, these changes look good to me. I did want to point out a potential issue between NCO implementation and symlinks. NCO created a Bugzilla ticket to remove symlinks from the global workflow. From Steven Earle:

The issue described in the ticket goes much further than just the recent wave issue; I can update the bugzilla if that's desired.
DATA needs to be self contained for several reasons....
-- Portability
-- Contained IO, making management of the system/storage possible
-- Pristine place to save after failures for debug/troubleshooting later

While the use of symlinks are fine in develop, it is possible that NCO will require that they are replaced with copy/move/rsync in operations. While this work can be done in the release branch for RRFS, it is still something to keep in mind moving forward.

jobs/JREGIONAL_MAKE_ICS Show resolved Hide resolved
@danielabdi-noaa
Copy link
Collaborator Author

@MichaelLueken Thanks, I agree with your assessment of use of symlinks. The NCO implementation here that uses NWGES does not conform with latest NCO standards, however, RRFS_dev1 uses it throughout. It was decided a couple of months ago to use it as a temporary solution, but the plan was indeed to get rid of the NWGES directory and thus associated symlinks in the future.

Copy link
Collaborator

@panll panll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Hera, it works fine.

@danielabdi-noaa
Copy link
Collaborator Author

@christinaholtNOAA Is there something I need to address other than the potential AQM issue? I did run the make_lbcs test with an AQM test case a while ago and it seems to work.

Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the delay in re-reviewing. I've left a few more comments below. It seems we're still waiting on @chan-hoo to weigh in, as well.

ush/config_defaults.yaml Show resolved Hide resolved
ush/config_defaults.yaml Show resolved Hide resolved
ush/config_defaults.yaml Show resolved Hide resolved
ush/config_defaults.yaml Show resolved Hide resolved
@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label May 12, 2023
@MichaelLueken
Copy link
Collaborator

@danielabdi-noaa There were a few failures in the Jenkins tests, but they were expected failures:

On Cheyenne GNU - the nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson_mynn_lam3km test failed intermittently in the run_post tasks. This has been an issue since the test has been introduced.

Following the PrgEnv update on Gaea, it is currently impossible to build and run the SRW and weather model on the machine. Working with Natalie to get this corrected as soon as possible.

On Hera Intel - the grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta test failed in run_fcst (issue #731).

Since the failures are expected, I will now move forward with merging this PR to develop.

@MichaelLueken MichaelLueken merged commit 188a4f8 into ufs-community:develop May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Download 0th hour LBCS and parallelize make_lbcs for RRFS
6 participants