[develop] Update weather model hash and remove "_vrfy" from bash commands #1074

MichaelLueken · 2024-04-19T13:14:28Z

DESCRIPTION OF CHANGES:

The weather model hash has been updated to 4f32a4b (April 15).

Additionally, _vrfy has been removed from the cd, cp, ln, mkdir, mv, and rm bash commands in jobs, scripts, ush, and ush/bash_utils. The modified commands don't function as intended (issue #861) and aren't accepted by NCO (issue #1021).

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

TESTS CONDUCTED:

The fundamental tests were run on all tier-1 platforms. Comprehensive tests were run on Hera Intel, Orion, and Hercules. The AQM WE2E test was run on Hera Intel and the sample warm start AQM configuration was run on Hercules.

DEPENDENCIES:

None

DOCUMENTATION:

None

ISSUE:

Fixes #861
Fixes #1021

CHECKLIST

My code follows the style guidelines in the Contributor's Guide
I have performed a self-review of my own code using the Code Reviewer's Guide
My changes do not require updates to the documentation (explain).
My changes generate no new warnings
New and existing tests pass with my changes

…and remove _vrfy from bash cp, mv, rm, ln, cd, and mkdir commands

…nd removed ush/bash_utils/filesys_cmds_vrfy.sh reference from ush/source_util_funcs.sh

…_files.sh and ush/job_preamble.sh

RatkoVasic-NOAA

Looks good!

chan-hoo · 2024-04-19T17:06:48Z

@MichaelLueken, thanks for this change:

Calculating core-hour usage and printing final summary
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240419100352                   COMPLETE            4829.55
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            4829.55

Approving.

gsketefian · 2024-04-19T17:31:38Z

@MichaelLueken It would probably be a good idea to keep what these were meant to do, which is, in bash scripts, check whether the cp, mv, etc functions were successful. These functions originally returned an error message and stopped execution, so that every time one needs to use say cp, one doesn't have to put in a line like this:

cp file1 file2 || {print "Copy failed"; exit 1}

or something longer, and instead use

cp_vrfy file1 file2

I saw in issue #861 that:

For example, if you want to verify a file was copied with cp_vrfy, but that file does not exist, a message will be printed out, but the exit code will be set to 0, and the program will happily hum along with no failure.

Obviously, this wasn't the original intent or behavior of these functions, but this erroneous behavior was introduced somewhere along the line due to insufficient testing. So instead of just removing these, what should be done is restore the original functionality of detecting errors and exiting.

chan-hoo · 2024-04-19T17:55:03Z

@gsketefian, as far as I know (if my memory serves me correctly), the bash commands like cp exit with an error message when they fail (while the current _vrfy function does not exit; this was a big issue in my previous runs). For delivery to NCO, the -p flag should be added to the copy command like (cp -p) though. So, I think cp file1 file2 is good enough.

gsketefian · 2024-04-19T18:43:31Z

@***@***.*** I think I put in the “_vrfy” commands a long time ago because I wanted to have clearer and standardized error messages for these commands. The fact that they don’t actually exit is a bug that was introduced later, probably during the partial conversion to python. It would be great to restore that functionality to get more comprehensive error messages as before. You can still add the -p to “cp_vrfy”.

…

On Friday, April 19, 2024, Chan-Hoo.Jeon-NOAA ***@***.***> wrote: @gsketefian <https://github.com/gsketefian>, as far as I know (if my memory serves me correctly), the bash commands like cp exit with an error message when they fail (while the current _vrfy function does not exit; this was a big issue in my previous runs). For delivery to NCO, the -p flag should be added to the copy command like (cp -p) though. So, I think cp file1 file2 is good enough. — Reply to this email directly, view it on GitHub <#1074 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AHM3ZYTNU3RHYBTMGJOCWHTY6FLA3AVCNFSM6AAAAABGPFNOFSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRXGA2DEOJQGA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Gerard Ketefian Research Scientist NOAA/OAR/ESRL/GSD/EMB, R/GSD1 325 Broadway Boulder, CO 80305 phone: 970-703-3048

MichaelLueken · 2024-04-19T19:02:32Z

@gsketefian -

Thanks for expressing your concern on the removal of _vrfy from the bash commands. With this PR, I was simply following through with @mkavulich's and @chan-hoo's recommendation of fully removing the _vrfy. As @chan-hoo noted, The standard bash commands should exit automatically on a failure.

Would it be worthwhile to surround cd, cp, ln, mkdir, mv, and rm commands with set -x and set +x, so that they should error out if an issue is encountered? I'm open to more opinions and suggestions, but I don't think that we want to go back to using _vrfy once again.

gsketefian · 2024-04-19T20:10:09Z

After thinking about this a bit more, it seems to me surrounding (wrapping) those commands with set -e would be the most important thing to do. I’m away from my computer and can’t check, but probably the reason an error with the cp command causes scripts to exit (as Chan-Hoo mentioned) is that those scripts have set -e declared at the top (I think default bash behavior is not to exit a script if a command fails). So if you use the wrapped versions of these commands that include set -e, then you’ll be sure they will exit on error regardless of whether the calling script uses set -e. Just my two cents. Please proceed as you see fit Thanks.

…

On Friday, April 19, 2024, Michael Lueken ***@***.***> wrote: @gsketefian <https://github.com/gsketefian> - Thanks for expressing your concern on the removal of _vrfy from the bash commands. With this PR, I was simply following through with @mkavulich <https://github.com/mkavulich>'s and @chan-hoo <https://github.com/chan-hoo>'s recommendation of fully removing the _vrfy. As @chan-hoo <https://github.com/chan-hoo> noted, The standard bash commands should exit automatically on a failure. Would it be worthwhile to surround cd, cp, ln, mkdir, mv, and rm commands with set -x and set +x, so that they should error out if an issue is encountered? I'm open to more opinions and suggestions, but I don't think that we want to go back to using _vrfy once again. — Reply to this email directly, view it on GitHub <#1074 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AHM3ZYVBRLPZRL6IORANMO3Y6FS55AVCNFSM6AAAAABGPFNOFSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRXGEZTIMZZG4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

-- Gerard Ketefian Research Scientist NOAA/OAR/ESRL/GSD/EMB, R/GSD1 325 Broadway Boulder, CO 80305 phone: 970-703-3048

mkavulich · 2024-04-25T14:47:31Z

@gsketefian Since it was my issue originally advocating for removal of these scripts I'll provide a defense: I don't see a reason to carry this burden and overhead of custom wrappers around simple shell builtins like this, and as I say in the original issue, they introduce additional problems! Unless I'm missing something, they provide no additional value over just using the -e flag within exscripts (which is something we should think about doing!), and just add further things into the workflow that need maintenance, can cause issues, and (as they are doing now) can mask other issues.

For the record, here is an example of a failure using the shell builtin mv:


mv: cannot stat 'fire.nml': No such file or directory
FATAL ERROR:
ERROR:
  From script:  "JREGIONAL_RUN_FCST"
  Full path to script:  "/glade/derecho/scratch/kavulich/FIRE/2024/april_updates/ufs-srweather-app/jobs/JREGIONAL_RUN_FCST"
Call to ex-script corresponding to J-job "JREGIONAL_RUN_FCST" failed.
Exiting with nonzero status.

And here is with the custom mv_vrfy, after it is "fixed" by adding set -e:



"mv_vrfy" operation returned with a message.  This command was
issued from the script in file:

  ""

Message from "mv_vrfy" function's "mv" operation:
  mv: cannot stat 'fire.nml': No such file or directory

========================================================================
Exiting script:  "JREGIONAL_RUN_FCST"
In directory:    "/glade/derecho/scratch/kavulich/FIRE/2024/april_updates/ufs-srweather-app/jobs"
========================================================================

That is nothing but a whole bunch of unhelpful extra text around the same error message. It seems as if it was meant to echo the script it was called from, but again, it does not do this! And that information would be mostly redundant anyway, since exscripts all print a message when they start.

I did not realize until this PR that set -e is not turned on in all scripts; regardless of the outcome of this discussion we should see if we need to do this as well. I did notice that certain commands (mv as one example) will fail even with the -e flag not set.

Is this a good time to mention that converting everything to python would make this problem moot? 😉

MichaelLueken · 2024-04-26T18:28:28Z

All scripts load the ush/preamble.sh script. Inside the ush/preamble.sh script, the default behavior is to use set -euo pipefail. No other changes should be required, as set -euo is already on by default for all scripts.

Addressing issue related to failures with the grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 metric test, then will push updates and resubmit Jenkins tests.

…h script to allow the test to run successfully while ran as part of the coverage and comprehensive test suite

MichaelLueken · 2024-04-29T15:04:16Z

After correcting the grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0 test, I have resubmitted the Jenkins tests.

Gaea, Jet, and Orion are currently not available via Jenkins, so I will go ahead and manually run the WE2E coverage tests on those machines.

MichaelLueken · 2024-04-29T15:06:09Z

The WE2E tests were manually run on Orion and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240429084427                             COMPLETE             437.36
deactivate_tasks_20240429084428                                    COMPLETE               0.93
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE            1878.41
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_  COMPLETE            1006.63
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240  COMPLETE             374.77
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202404290  COMPLETE              21.40
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240429084  COMPLETE             971.98
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              64.03
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2  COMPLETE             728.04
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202404  COMPLETE              46.58
2020_CAD_20240429084435                                            COMPLETE              71.58
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5601.71

Additionally, the metric test was also ran and successfully completed:

COL_NAME:                           FCST_MODEL                REF_MODEL N_INIT N_TERM N_VLD SS_INDEX
 GO_INDEX FV3_WoFS_v0_SUBCONUS_3km_test_mem000 FV3_GFS_v16_SUBCONUS_3km      1     17    17  0.99807

MichaelLueken · 2024-04-29T17:48:02Z

The Jet WE2E coverage tests were manually run and all passed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240429155928                                           COMPLETE              17.51
custom_ESGgrid_20240429155929                                      COMPLETE              25.79
custom_ESGgrid_Great_Lakes_snow_8km_20240429155930                 COMPLETE              20.44
custom_GFDLgrid_20240429155932                                     COMPLETE              10.79
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202404  COMPLETE               8.00
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              84.78
get_from_HPSS_ics_RAP_lbcs_RAP_20240429155935                      COMPLETE              15.93
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240429155936  COMPLETE             617.36
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              64.04
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               7.36
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             927.52
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1799.52

Awaiting results from Gaea and Hera GNU now.

MichaelLueken · 2024-04-30T12:51:58Z

All Jenkins tests have successfully passed and the Gaea WE2E coverage tests were manually run and all successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240429144427                                           COMPLETE              46.66
custom_ESGgrid_NewZealand_3km_20240429144429                       COMPLETE             113.58
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              45.98
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240429144  COMPLETE              50.24
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024042914  COMPLETE              49.07
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson  COMPLETE             589.88
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024042  COMPLETE              38.81
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20  COMPLETE             810.44
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202  COMPLETE              21.65
2020_CAPE_20240429144455                                           COMPLETE              50.25
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1816.56

Merging this work now.

MichaelLueken added 3 commits April 16, 2024 10:01

feature/hash_update] Update weather model hash to 4f32a4b (April 15) …

efbd49f

…and remove _vrfy from bash cp, mv, rm, ln, cd, and mkdir commands

[feature/hash_update] Removed _vfry from ush/launch_FV3LAM_wflow.sh a…

65f3db6

…nd removed ush/bash_utils/filesys_cmds_vrfy.sh reference from ush/source_util_funcs.sh

[feature/hash_update] Remove _vrfy from bash commands in ush/get_mrms…

e3b7156

…_files.sh and ush/job_preamble.sh

RatkoVasic-NOAA approved these changes Apr 19, 2024

View reviewed changes

MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Apr 19, 2024

chan-hoo approved these changes Apr 19, 2024

View reviewed changes

Merge remote-tracking branch 'upstream/develop' into feature/hash_update

05a48ed

Merge remote-tracking branch 'upstream/develop' into feature/hash_update

0a16dd0

[feature/hash_update] Correct test naming scheme for the srw_metric.s…

4e9192b

…h script to allow the test to run successfully while ran as part of the coverage and comprehensive test suite

MichaelLueken merged commit eea4c29 into ufs-community:develop Apr 30, 2024
4 of 5 checks passed

MichaelLueken deleted the feature/hash_update branch April 30, 2024 12:54

MichaelLueken mentioned this pull request Apr 30, 2024

Updates to the authoritative develop branch since the SRW v2.2 release (10/31/2023) #981

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[develop] Update weather model hash and remove "_vrfy" from bash commands #1074

[develop] Update weather model hash and remove "_vrfy" from bash commands #1074

MichaelLueken commented Apr 19, 2024

RatkoVasic-NOAA left a comment

chan-hoo commented Apr 19, 2024

gsketefian commented Apr 19, 2024

chan-hoo commented Apr 19, 2024

gsketefian commented Apr 19, 2024 via email

MichaelLueken commented Apr 19, 2024

gsketefian commented Apr 19, 2024 via email

mkavulich commented Apr 25, 2024 •

edited

Loading

MichaelLueken commented Apr 26, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 30, 2024

[develop] Update weather model hash and remove "_vrfy" from bash commands #1074

[develop] Update weather model hash and remove "_vrfy" from bash commands #1074

Conversation

MichaelLueken commented Apr 19, 2024

DESCRIPTION OF CHANGES:

Type of change

TESTS CONDUCTED:

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

RatkoVasic-NOAA left a comment

Choose a reason for hiding this comment

chan-hoo commented Apr 19, 2024

gsketefian commented Apr 19, 2024

chan-hoo commented Apr 19, 2024

gsketefian commented Apr 19, 2024 via email

MichaelLueken commented Apr 19, 2024

gsketefian commented Apr 19, 2024 via email

mkavulich commented Apr 25, 2024 • edited Loading

MichaelLueken commented Apr 26, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 29, 2024

MichaelLueken commented Apr 30, 2024

mkavulich commented Apr 25, 2024 •

edited

Loading