
[develop] Add new RRFS variables such as NWGES and workflow control variables #647

Conversation

@danielabdi-noaa (Collaborator) commented Mar 7, 2023

DESCRIPTION OF CHANGES:

Addresses issue #654.

  • General (non-task-specific) variables are added to config_defaults.yaml: new directory names such as NWGES, RRFSE_NWGES, ENSCTRL_COMIN, etc.
  • New variables such as TAG.
  • Workflow control flags such as DO_SPINUP, DO_DACYCLE, and DO_SOIL_ADJUST.
  • RRFS_dev1 also defines a number of new cycle types besides forecast, initial, etc., so those are added too.
  • Modifications in generate_FV3LAM_wflow.py needed to create input namelist files for DA cycles.
  • Modifications to machine files, mostly workflow-related; on Jet/Hera, paths to data are also added.

I think this PR is priority number 1, since all tasks will depend on it, and it can serve as a central place to coordinate workflow variables, common task variables, etc.
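For illustration, boolean control flags like DO_SPINUP and DO_DACYCLE typically gate which extra cycle tasks end up in the generated workflow. A minimal sketch of that pattern follows; the flag names mirror those added in this PR, but the gating logic and task names are hypothetical stand-ins, not the actual generate_FV3LAM_wflow.py implementation:

```python
# Sketch: boolean workflow control flags gating which tasks are generated.
# Flag names follow the PR; task names and logic are illustrative only.

DEFAULTS = {
    "DO_SPINUP": False,
    "DO_DACYCLE": False,
    "DO_SOIL_ADJUST": False,
}

def select_tasks(user_config):
    """Return the task list enabled by the user's control flags."""
    cfg = {**DEFAULTS, **user_config}   # user settings override defaults
    tasks = ["make_grid", "run_fcst"]   # always-on baseline tasks
    if cfg["DO_SPINUP"]:
        tasks.append("spinup_cycle")
    if cfg["DO_DACYCLE"]:
        tasks.append("da_cycle")
    if cfg["DO_SOIL_ADJUST"]:
        tasks.append("soil_adjust")
    return tasks
```

With all flags left at their defaults, only the baseline tasks are emitted, which is why the workflow still runs without any of the RRFS tasks enabled.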

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@danielabdi-noaa danielabdi-noaa changed the title Feature/rrfs params [develop] Add new RRFS variables such as NWGES and controls Mar 7, 2023
@danielabdi-noaa danielabdi-noaa marked this pull request as ready for review March 7, 2023 14:08
@danielabdi-noaa danielabdi-noaa linked an issue Mar 8, 2023 that may be closed by this pull request
danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Mar 9, 2023
@danielabdi-noaa danielabdi-noaa force-pushed the feature/rrfs_params branch 2 times, most recently from e9f8a61 to 7881c7f on March 10, 2023 02:38
@MichaelLueken (Collaborator) left a comment

@danielabdi-noaa Overall, these changes look good to me. I agree that I'm not entirely sure what the purpose of the SERVICE_ACCOUNT is in the ush/config_defaults.yaml file. I have reached out to @hu5970 to see if he can shed some light on it (or whether it even needs to be included). Also, the newly added RRFS_CONUS_3km_HRRRIC grid appears to be identical to the RRFS_CONUScompact_3km grid; the only difference is that the DT_ATMOS value was decreased recently to use less wasteful values. So, I would suggest using RRFS_CONUScompact_3km rather than RRFS_CONUS_3km_HRRRIC.

#-----------------------------------------------------------------------
MACHINE: "BIG_COMPUTER"
ACCOUNT: ""
SERVICE_ACCOUNT: '{{ user.ACCOUNT }}'
Collaborator

It looks like the NOAA-GSL ush/config_defaults.yaml file has both a SERVICE_ACCOUNT and HPSS_ACCOUNT, where HPSS_ACCOUNT defaults to SERVICE_ACCOUNT if not set. @hu5970, are these entries required? I'm seeing that they appeared to be added for Jet. Any additional information would be greatly appreciated. Thanks!
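The defaulting chain described here (HPSS_ACCOUNT falls back to SERVICE_ACCOUNT, which falls back to ACCOUNT) can be sketched in plain Python. This is a stand-in for the `'{{ user.ACCOUNT }}'`-style Jinja defaults in config_defaults.yaml, where the actual resolution happens at template render time:

```python
# Sketch of the account fallback chain: ACCOUNT -> SERVICE_ACCOUNT -> HPSS_ACCOUNT.
# A plain-Python stand-in for the Jinja-template defaulting in config_defaults.yaml.

def resolve_accounts(user):
    account = user.get("ACCOUNT", "")
    service = user.get("SERVICE_ACCOUNT") or account  # default to ACCOUNT
    hpss = user.get("HPSS_ACCOUNT") or service        # default to SERVICE_ACCOUNT
    return {"ACCOUNT": account, "SERVICE_ACCOUNT": service, "HPSS_ACCOUNT": hpss}
```

Setting only ACCOUNT thus propagates to both derived entries, matching the behavior described for the NOAA-GSL file.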

WRTCMP_lon_lwr_left: -122.17364391
WRTCMP_lat_lwr_left: 21.88588562
WRTCMP_dx: 3000.0
WRTCMP_dy: 3000.0
Collaborator

It looks like the RRFS_CONUS_3km_HRRRIC grid is identical to the RRFS_CONUScompact_3km grid, with the exception of DT_ATMOS being set to 36. This modification was merged at fd3c778 and was to use less wasteful values of DT_ATMOS. It would probably be best to have the scripts use the RRFS_CONUScompact_3km grid, rather than add the new RRFS_CONUS_3km_HRRRIC grid with a different DT_ATMOS value.

@danielabdi-noaa danielabdi-noaa changed the title [develop] Add new RRFS variables such as NWGES and controls [develop] Add new RRFS variables such as NWGES and workflow control variables Apr 3, 2023
@danielabdi-noaa (Collaborator Author)

@MichaelLueken Thanks for catching the predefined grid duplicates. It looks like there are no new predefined grids to add, so I have removed all changes to the predefined grid file.

@MichaelLueken (Collaborator)

@danielabdi-noaa Going through the latest changes in the NOAA-GSL/regional_workflow/RRFS_dev1 branch associated with ush/config_defaults.sh, I'm noting the following missing or changed options:

Until this PR has been merged, I will keep an eye out on further changes to the ush/config_defaults.sh file.

@danielabdi-noaa (Collaborator Author)

@MichaelLueken Thanks a lot for going through the commits and summarizing the changes! I have updated this PR with most of the new variable changes, except for two: OUTPUT_FH goes under task_run_fcst and PREP_FVCOM goes under task_make_ics, so those are deferred to future PRs for the respective tasks.

@christinaholtNOAA (Collaborator) left a comment

@danielabdi-noaa I appreciate your contribution here.

I am very concerned about this approach to adding a bunch of variables that aren't used yet. We have no way to test them, and no way to know whether they will be relevant or needed when the tasks that currently use them in RRFS_dev1 are contributed.

I am also concerned that it will be difficult to communicate what does and does not work to the rest of the community. What I mean is, if someone sees a flag that says "do_something" they expect "something" to work. Given that is not true, I do not think it's fair for us to include those sorts of flags until the feature is enabled.

Collaborator

This file should not be added yet. We don't have the capabilities in place to run or test any of it, and many of the settings here will not ever be supported in SRW.

@danielabdi-noaa (Collaborator Author) commented Apr 7, 2023

Actually, you can run it even now; it will generate a workflow file that doesn't have any of the RRFS tasks.

@@ -5,7 +5,7 @@ metadata:
description: >-
Default configuration for an experiment. The valid values for most of the
parameters are specified in valid_param_vals.yaml
version: !!str '1.0'
version: !!str '2.0'
Collaborator

It's not clear to me how we will want to bump versions, but it definitely doesn't seem like the time to do that now.

@danielabdi-noaa (Collaborator Author) commented Apr 7, 2023

Why is this not the right time? In fact, every time a variable is added is a good time for a minor version update.

Collaborator

It is generally unclear to me whether this should be the SRW version, the release version, etc.

#
# RESERVATION_POST:
# The reservation for post tasks.
#
Collaborator

Partitions and queues are now all set in the parm/wflow files. None of these are needed here. Reservations, when they are added, should also live in the parm/wflow files.

@danielabdi-noaa (Collaborator Author) commented Apr 7, 2023

But the parm/wflow/default_workflow.yaml file gets these settings from the config files; in fact, we already have partitions/queues in it, so I am just extending that for the jobs RRFS adds:

    PARTITION_DEFAULT: '{{ platform.get("PARTITION_DEFAULT") }}'
    PARTITION_FCST: '{{ platform.get("PARTITION_FCST") }}'
    PARTITION_HPSS: '{{ platform.get("PARTITION_HPSS") }}'
    QUEUE_DEFAULT: '{{ platform.get("QUEUE_DEFAULT") }}'
    QUEUE_FCST: '{{ platform.get("QUEUE_FCST") }}'
    QUEUE_HPSS: '{{ platform.get("QUEUE_HPSS") }}'

Collaborator

I'm suggesting that we don't necessarily have to set them explicitly here in config defaults. Here, the default queue is the main queue on a platform. FCST is special because it might need bigger nodes with more memory, and HPSS has internet access -- essentially it's the front-end node on platforms where that makes sense. Because each of these three functions in a separate way for the workflow, we can choose from those three which queue a task runs under in the task definition.

There is no need for a one-to-one correspondence between available config defaults variables and what values are set as entities.

So, in the definition of the post jobs, we can modify the entry to specifically be debug if we want to change it for a given user/platform.
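The three-queue scheme described above could be wired up roughly like this. Queue names and the task-to-class mapping are hypothetical illustrations of the argument, not the actual parm/wflow definitions:

```python
# Sketch of the reviewer's point: tasks reference one of three queue
# classes (DEFAULT, FCST, HPSS) in their definitions, making per-task
# variables like QUEUE_POST unnecessary. All names are illustrative.

PLATFORM = {
    "QUEUE_DEFAULT": "batch",
    "QUEUE_FCST": "batch_big",   # bigger nodes / more memory for the forecast
    "QUEUE_HPSS": "service",     # front-end node with internet access
}

TASK_QUEUE_CLASS = {
    "run_post": "QUEUE_DEFAULT",  # post runs fine on the main queue
    "run_fcst": "QUEUE_FCST",
    "get_obs": "QUEUE_HPSS",
}

def queue_for(task):
    """Resolve a task to its platform queue via its queue class."""
    return PLATFORM[TASK_QUEUE_CLASS[task]]
```

Changing which queue post jobs use is then a one-line edit to the task definition (e.g. pointing run_post at a debug queue), with no new config_defaults variable required.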

PARTITION_POST: ""
QUEUE_POST: ""
RESERVATION: ""
RESERVATION_POST: ""
Collaborator

None of these will be needed here as they can be set explicitly in the parm/wflow files for the tasks when they are added to the workflow.

Collaborator Author

See above! We are not setting them directly, but getting them from the platform section of the config file.

Collaborator

As I argue above, we do not need them; they generally don't serve any function on our platforms beyond what the current entities already allow in the XML generation.

@@ -360,6 +359,8 @@ def generate_FV3LAM_wflow(ushdir, logfile: str = "log.generate_FV3LAM_wflow", de
"target_lon": LON_CTR,
"target_lat": LAT_CTR,
"nrows_blend": HALO_BLEND,
"regional_bcs_from_gsi": False,
"write_restart_with_bcs": False,
Collaborator

Will this impact AQM?

Collaborator Author

No idea; @chan-hoo probably knows.

RAPHRR_SOIL_ROOT: ""
FIRE_RAVE_DIR: ""
AIRCRAFT_REJECT: "/scratch2/BMC/zrtrr/rli/data/amdar_reject_lists"
SFCOBS_USELIST: "/scratch2/BMC/zrtrr/rli/data/mesonet_uselists"
Collaborator

I don't think this is the appropriate structure for retrieve_data.py to gather these. I believe there are other efforts to get the data paths incorporated into the workflow.

@danielabdi-noaa (Collaborator Author) commented Apr 7, 2023

Aren't these "fixed files" that do not change from test case to test case? retrieve_data.py has so far been used to get ICS/LBCS that change from one test case to another. @EdwardSnyder-NOAA has put some of these files under the EPIC role account, and these paths can be changed as new files are added to locations that SRW can use.

Collaborator

These paths are a combination of fixed and observation files. We were trying to replicate the functionality of the set_rrfs_config.sh script found in the dev1 branch. Essentially, this script sets paths based on machine and model run type (retro/current). It made sense to add the logic here, even though the retrieve_data.py script isn't used to grab these files. We are definitely open to suggestions on placing this logic elsewhere.

In addition, how often should we utilize the retrieve_data.py script? The process-lightning task grabs lightning data in its job script without the retrieve_data.py script. Would we need to integrate all the RRFS obs data into the retrieve_data.py script?

Collaborator

I believe we should be using the retrieve_data script any time we need to pull data from disk and/or external resources.

One assumption made in RRFS_dev was that we'd always have data staged on disk for real-time or retro runs. That is not an appropriate assumption for the community who likely have different goals. SRW users working outside the real-time paradigm, or on platforms where real-time data streams aren't available, or on retrospective case studies, will need a way to pull data from archives like HPSS or AWS.

The retrieve_data tool is meant to allow a user to define whether they have staged data on disk via their own doing, or a supported data stream, or whether the workflow should go get it for them from known external locations.

@@ -58,7 +58,7 @@ def set_FV3nml_ens_stoch_seeds(cdate):
#
# -----------------------------------------------------------------------
#
fv3_nml_ensmem_fp = f"{os.getcwd()}{os.sep}{FV3_NML_FN}"
fv3_nml_ensmem_fp = f"{os.getcwd()}{os.sep}{FV3_NML_FN}_base"
Collaborator

If the name of the file changes, it should change in config_defaults so that it's clear what it is from the beginning. One of the features of having configuration settings is that there is rarely a need to change these types of things in the scripts.

@@ -174,7 +174,7 @@ def setUp(self):
mkdir_vrfy("-p", EXPTDIR)
cp_vrfy(
os.path.join(PARMdir, "input.nml.FV3"),
os.path.join(EXPTDIR, "input.nml"),
os.path.join(EXPTDIR, "input.nml_base"),
Collaborator

Here, it seems like we should be referencing the configuration setting instead of repeating this information, especially if we'd like to change it.

Collaborator Author

I am not sure what your concern is here, but the way RRFS handles input namelist files differs from how SRW does it. RRFS generates four different input namelist files and symlinks input.nml to one of them depending on the configuration, as opposed to the in-place modification of a single namelist file that SRW uses. The same applies to the stochastic physics options modification, and I tested that this modification works for the WE2E test community_ensemble_2mems_stoch.


# Setup default locations for FIRE_RRFS files and update time
# FIRE_RAVE_DIR
# FIRE_RRFS_ROOT
# FIRE_RRFS_update_hour
Collaborator

The data: section of the machine files in SRW is currently meant to work with the retrieve_data.py script to set where to get the data on disk, and it is a structured section whose layout is strictly defined by that tool. The section should remain consistent with the tool's existing behavior.

Comment on lines +1050 to +1055
CYCL_HRS: []
CYCL_HRS_SPINSTART: []
CYCL_HRS_PRODSTART: []
CYCL_HRS_PRODSTART_ENS: []
CYCL_HRS_RECENTER: []
CYCL_HRS_STOCH: []
Collaborator

This is why I'm suggesting that the necessary variables be added with the tool that requires them. They should be added when they are needed, not separately, where they cannot be tested on their own or recognized as to why they are useful.

They are included in PR #638, and I suggest that be the only place they are added.

POSTPROC_LEN_HRS: 1
POSTPROC_LONG_LEN_HRS: 1
FCST_LEN_HRS_SPINUP: 1
FCST_LEN_HRS_CYCLES: []
Collaborator

As you suggest, this one should not be included. A proposed change for the current solution for AQM is being discussed in Issue #709

RRFSE_NWGES: '{{ OPSROOT }}/nwges'
RRFSE_NWGES_BASEDIR: '{{ RRFSE_NWGES }}'
ENSCTRL_NWGES: '{{ OPSROOT }}/nwges'
ENSCTRL_NWGES_BASEDIR: '{{ ENSCTRL_NWGES}}'
Collaborator

Putting in a clean implementation now is preferred. A main goal of the RRFS_dev1 merge is to take care of the refactoring needed to address some of the accidental complexities that have snuck in over time.

START_TIME_CONVENTIONAL: 00:40:00
START_TIME_NSSLMOSIAC: 00:45:00
START_TIME_LIGHTNINGNC: 00:45:00
START_TIME_PROCSMOKE: 00:45:00
Collaborator

The rocoto section of the config files are 100% configurable. I am definitely not suggesting that the users edit any of the parm/wflow files. The sections defined in those parm/wflow files are completely configurable by the user in their own user config. There is no need to add the sections to the config_defaults.yaml file.
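The override mechanism described above (user config taking precedence over the defaults from the parm/wflow files) is essentially a recursive dictionary merge. A minimal sketch, with illustrative key names rather than the real section layout:

```python
# Sketch of user-config-over-defaults merging for nested workflow sections.
# Key names are illustrative, not the actual parm/wflow section layout.

def deep_merge(defaults, overrides):
    """Return defaults updated by overrides, recursing into nested dicts."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge subsections
        else:
            merged[key] = value                           # leaf: user wins
    return merged
```

A user who wants post jobs on a debug queue overrides only that one key in their own config; every other default survives, and the parm/wflow files are never edited.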

#-----------------------------------------------------------------------
#
LBCS_SEARCH_HRS: 6
EXTRN_MDL_LBCS_SEARCH_OFFSET_HRS: 0
Collaborator

Please see my YAGNI argument above. It is a standard, industry-wide best practice, and it does apply in this case.


Development

Successfully merging this pull request may close these issues.

Define shared variables and predefined grids for all tasks