
We need a plan for connecting CIME to the public input data sources #15

Closed

mvertens opened this issue Dec 6, 2019 · 61 comments

@mvertens
Collaborator

mvertens commented Dec 6, 2019

In order for CIME to be extended to support different forecast initial conditions flexibly, we need to understand the directory structure of the FTP site that will be made available. We need to understand the requirements for CIME to obtain these data and the list of files that will be needed.
A major concern is the potentially large size of some of the files - we need to determine whether they must be downloaded manually or whether CIME can download them automatically as part of its workflow.

@arunchawla-NOAA
Collaborator

We have an ftp site that was opened up for staging data. Documentation is still weak; I suggest a meeting with Jun Wang and Kate Friedman to bolster this. We should also explore adding GIT-LFS to mr-weather-app or to the weather-model to see if the data can be downloaded automatically.

@junwang-noaa

@mvertens, could we have a meeting to discuss this issue? Thanks.

@rsdunlapiv
Collaborator

I will try to put something on the calendar to discuss. I am interested in @arunchawla-NOAA's suggestion of GIT-LFS and how that repo could be connected to CIME to download data as needed. Another question is whether there is a standard directory structure expected on supported platforms (e.g., fixed files, ICs, etc.). Is there a standard structure already used on NOAA systems such as Hera that should be generalized and adopted across all platforms? CESM uses the idea of a shared root data directory - how well does that apply here?
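For context, the shared-root idea referenced above is exposed in CIME through the DIN_LOC_ROOT case variable. A minimal sketch of pointing a case at a machine-wide input data directory is below; the path is purely illustrative, not a real NOAA location.

# Sketch only: point a CIME case at a shared, machine-wide input data root.
# DIN_LOC_ROOT is a standard CIME variable; the path below is illustrative.
cd $CASEROOT
./xmlchange DIN_LOC_ROOT=/scratch/shared/ufs_inputdata
./xmlquery DIN_LOC_ROOT    # confirm the setting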

@rsdunlapiv
Collaborator

Could @KateFriedman-NOAA please provide a list of current input data sources expected to be used by the release? e.g., the FTP site and any others, GIT-LFS, etc.

@uturuncoglu
Collaborator

On the current ftp site the input files (except the static ones) are in tar format, and it is hard to access individual files without extracting the archive. We need a hierarchical, standardized folder structure rather than a single tar file for input. It would also be nice to have input files for other resolutions (e.g., C384) and support for CCPP.

@rsdunlapiv
Collaborator

Given the large size of many of these files - should we support a compression mechanism?

@KateFriedman-NOAA
Collaborator

KateFriedman-NOAA commented Dec 9, 2019

I know of the following on the EMC ftp site:

  1. FIX files: https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix

...which are broken into sub-groups:

  • fix_am
  • fix_chem
  • fix_fv3
  • fix_fv3_gmted2010
  • fix_gldas
  • fix_orog
  • fix_sfc_climo
  • fix_verif

This is the full collection of FV3GFS fix files as of November 12th. Most likely overkill. Do we need a pared-down collection?

  2. C96 canned case tarball (originally made available for NASA folks)

https://ftp.emc.ncep.noaa.gov/EIB/UFS/RT/fv3_gfdlmprad.tar

@rsdunlapiv
Collaborator

@KateFriedman-NOAA this is great and already very helpful! Can you provide (or is there already) a brief description of what's in each of those subdirectories? Are these subdirs expected to be reproduced as is on all supported platforms?

@rsdunlapiv
Collaborator

rsdunlapiv commented Dec 9, 2019

@KateFriedman-NOAA The global/fix directory has the fixed files. What is the public source for users to retrieve initial conditions and boundary conditions? Is the plan to host those separately?

@KateFriedman-NOAA
Collaborator

> @KateFriedman-NOAA this is great and already very helpful! Can you provide (or is there already) a brief description of what's in each of those subdirectories? Are these subdirs expected to be reproduced as is on all supported platforms?

I'm mostly a facilitator of copying these files to our supported platforms so I can't give a detailed description. Is one needed? Here is a very brief description:

  • fix_am - atmospheric
  • fix_chem - chemistry
  • fix_fv3 - fv3
  • fix_fv3_gmted2010 - another fv3
  • fix_gldas - GLDAS
  • fix_orog - orography
  • fix_sfc_climo - surface climo
  • fix_verif - verification

When we get updated fix files, I or another developer copy them into one of the main collections (FIX_DIR) on WCOSS-Dell, and then I copy them to the FIX_DIRs on the other WCOSS-Dell, both WCOSS-Crays, Hera, and Jet. I also save the whole collection in a new tarball on HPSS.

If needed I can copy whatever final set of fix files folks land on for the release to the supported platforms. We hold them under a group account.
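For readers outside EMC, the distribution step described above is essentially a sync of the fix collection to each platform's FIX_DIR plus an HPSS archive. A rough sketch under assumed paths follows; FIX_SRC, FIX_DIR, and the HPSS path are placeholders, not the operational locations.

# Sketch only: mirror an updated fix collection to a platform FIX_DIR and archive it on HPSS.
FIX_SRC=/path/to/staged/fix                 # where the updated fix files were staged
FIX_DIR=/scratch/shared/global/fix          # platform-wide fix directory (placeholder)
tag=fix.v$(date +%Y%m%d)
rsync -av "$FIX_SRC"/ "$FIX_DIR/$tag"/
# htar is the NOAA HPSS archive utility; this writes the whole collection as one tarball.
htar -cvf /hpss/placeholder/path/$tag.tar "$FIX_DIR/$tag"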

> @KateFriedman-NOAA The global/fix directory has the fixed files. What is the public source for users to retrieve initial conditions and boundary conditions? Is the plan to host those separately?

There is no public source currently. Our archival server (HPSS) is not accessible by the public, only from NOAA machines. The release team would need to post a sample set online somewhere (our ftp server maybe). NCEI is a public source for model output but I don't see the restart files there, just some post-processed grib output. Access to model output, especially initial conditions, has become a real issue for anyone who doesn't have access to a NOAA machine. :(

@ligiabernardet
Collaborator

ligiabernardet commented Dec 10, 2019 via email

@mvertens
Collaborator Author

@arunchawla-NOAA @KateFriedman-NOAA @ligiabernardet -

@rsdunlapiv @jedwards4b and @mvertens propose the following structure for managing and hosting input data for the release:

  • fixed files:
    • The FTP directories for fixed files should include a date string in the directory name, along with a README documenting that date. That way, when these files are downloaded, they are accompanied by a README recording the date of the directory.
    • New or updated fixed files should be placed in a new directory with a new timestamp in the directory name and a new README. This is essential for establishing experimental provenance.
  • initial files:
    • We need to provide a mechanism for the community to access these files.
    • For the release, the simplest approach is to use the FTP site.
    • There should be no tar files for this data. The CIME input data mechanism is built to automatically retrieve the individual files required for a given forecast that are not already on local disk.
    • Directories containing the input data should include the forecast start date in the directory name.
    • We understand that at the time of the release this may be a limited set of data, but a convention should be established that allows this retrieval mechanism to be extended as more initial data is made available.
    • Note that the CIME input data mechanism allows the user community to define their own FTP site and populate it with their own data (see the sketch after this list).
    • We are not sure whether boundary condition files are needed and, if so, how they should be provided.
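As an illustration of the convention above, a user (or CIME itself) would fetch individual files from a dated directory only when they are missing locally. A minimal sketch with wget follows, using a dated fix path that appears later in this thread; the local cache root is an assumption chosen for illustration.

# Sketch of per-file retrieval from a dated directory; no tarballs involved.
DIN_LOC_ROOT=${DIN_LOC_ROOT:-$HOME/ufs_inputdata}   # illustrative local cache
FTP_ROOT=https://ftp.emc.ncep.noaa.gov/EIB/UFS
file=global/fix/fix_am.v20191213/global_hyblev.l65.txt
if [ ! -f "$DIN_LOC_ROOT/$file" ]; then
    mkdir -p "$DIN_LOC_ROOT/$(dirname "$file")"
    wget -q -O "$DIN_LOC_ROOT/$file" "$FTP_ROOT/$file"
fi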

@ligiabernardet
Collaborator

The proposal about fix files seems good to me.

The proposal for initial files is fine as long as it is extensible. It is my understanding that the Data Prep release team is working on the capability for the MR Weather App to run from GFS GRIB2 files available through public archives (such as NOAA NOMADS and NCAR). This is very important for the community to use this app for research. It would be good if CIME could start the app from those public files. Users may have to download and stage the data on disk by hand - that is fine.

Lateral boundary conditions are not needed for the MR Weather App configuration because it uses a global domain.

@jedwards4b
Collaborator

@ligiabernardet Can you provide an example of files from these public archives that would allow us to run the model? Thanks

@ligiabernardet
Collaborator

No, I cannot. The current model I have access to does not work from GFS GRIB2 files available in public archives. The UFS release data prep team has been working on enhancements to add this capability. @LarissaReames-NOAA Do you have any update on this?

@LarissaReames-NOAA
Collaborator

LarissaReames-NOAA commented Dec 11, 2019

@ligiabernardet @jedwards4b The files we have been testing against are on NCEP's http server and have names of the form gfs.tCCz.pgrb2.0p25.fFFF or gfs.tCCz.pgrb2.0p50.fFFF. We're also looking to support older archived files on NCDC's nomads server. This is just one example of those files; functionally, they should be very similar.

There is also an NCEI server (per Arun).
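For reference, here is a rough sketch of pulling one of those 0.25-degree GRIB2 files from the NOMADS HTTP server; the date, cycle, and directory layout are illustrative assumptions and should be checked against the server before use.

# Sketch: download one GFS 0.25-degree pressure-level GRIB2 file (f000) from NOMADS.
# The directory layout on NOMADS changes over time; verify the path before relying on it.
DATE=20191211    # illustrative forecast cycle date (YYYYMMDD)
CYC=00           # cycle hour (the CC in gfs.tCCz...)
wget -q "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.${DATE}/${CYC}/gfs.t${CYC}z.pgrb2.0p25.f000"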

@jedwards4b
Collaborator

@LarissaReames-NOAA Do you have a timeline of when I could expect to be able to run the ufs_mrweather_app using files from nomads?

@LarissaReames-NOAA
Collaborator

@jedwards4b We fixed one last bug in implementing the surface parameter processing code yesterday, so we'll be able to start testing that soon. @arunchawla-NOAA might have more information on how long he thinks that might take.

@uturuncoglu
Collaborator

@LarissaReames-NOAA is it possible to extract fv3_gfdlmprad.tar under the /EIB/UFS/RT directory on the ftp site? That way, CIME could access the individual required files in it.

@KateFriedman-NOAA
Collaborator

> @LarissaReames-NOAA is it possible to extract fv3_gfdlmprad.tar under the /EIB/UFS/RT directory on the ftp site? That way, CIME could access the individual required files in it.

@uturuncoglu I put that tarball up on our ftp server and have access to it. @arunchawla-NOAA @junwang-noaa Any objections if I unpack the tarball on our ftp server?

@uturuncoglu
Collaborator

@KateFriedman-NOAA @yangfanglin @arunchawla-NOAA Has the FTP directory structure changed? I cannot see the RT directory anymore. There is a simple-test-case/ directory, but it is a tar file. We were using the RT directory to get some files such as the tables.

@arunchawla-NOAA
Collaborator

Why are you using RT? That is a snapshot of one of the regression test cases. What files do you need? I am adding @junwang-noaa and @DusanJovic-NOAA.

@uturuncoglu
Collaborator

uturuncoglu commented Dec 20, 2019

@arunchawla-NOAA @DusanJovic-NOAA @junwang-noaa The list of files retrieved from RT is:

data_table
diag_table
field_table
nems.configure -> I could create this with a script for mrweather (see the sketch below)
gfs_ctrl.nc
sfc_data.tile*.nc
gfs_data.tile*.nc

The NetCDF files are required for the example test case that will be the default for the application. The table files can probably be retrieved from the source directory, but I am not sure.
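Since nems.configure for a standalone atmosphere run is tiny (76 bytes in the listing later in this thread), generating it from the CIME interface should be straightforward. A hedged sketch of what such a script might write follows; the exact contents are an assumption and should be confirmed against the regression-test version.

# Sketch only: write a minimal nems.configure for a standalone FV3 (ATM-only) run.
# Contents below are assumed, not taken from the RT case; confirm before use.
cat > nems.configure << 'EOF'
EARTH_component_list: ATM
ATM_model:            fv3
runSeq::
  ATM
::
EOF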

@arunchawla-NOAA
Collaborator

arunchawla-NOAA commented Dec 20, 2019 via email

@uturuncoglu
Collaborator

I think it would be nice to have a sample set of ICs that could be used to run the model without chgres. I also have not implemented the restart capability yet; we might need some files for it. BTW, is there any documentation about restarting the standalone FV3?

@uturuncoglu
Collaborator

@arunchawla-NOAA Then we need to extract it, just as we did before for fv3_gfdlmprad.

@uturuncoglu
Collaborator

uturuncoglu commented Dec 20, 2019

@arunchawla-NOAA @GeorgeGayno-NOAA @KateFriedman-NOAA @yangfanglin How do we handle the required resolution-dependent fixed input files for CHGRES? The following files are needed for each supported resolution and need to be placed on the FTP site (a sketch for checking them locally follows this list):

C96.facsf.*.nc
C96.maximum_snow_albedo.*.nc
C96.slope_type.*.nc
C96.snowfree_albedo.*.nc
C96.soil_type.*.nc
C96.substrate_temperature.*.nc
C96.vegetation_greenness.*.nc
C96.vegetation_type.*.nc
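A quick way to confirm whether a given resolution's surface-climatology files are staged locally is to loop over the field names. A sketch follows; FIX_SFC_DIR is a placeholder for wherever the fix_sfc files end up being downloaded.

# Sketch: check that the resolution-dependent CHGRES surface files are present locally.
RES=C96
FIX_SFC_DIR=${FIX_SFC_DIR:-$HOME/ufs_inputdata/fix_sfc}    # illustrative location
for field in facsf maximum_snow_albedo slope_type snowfree_albedo \
             soil_type substrate_temperature vegetation_greenness vegetation_type; do
    ls "$FIX_SFC_DIR/${RES}.${field}."*.nc > /dev/null 2>&1 \
        || echo "missing: ${RES}.${field}.*.nc"
done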

@uturuncoglu
Collaborator

Currently, I am getting them from the directory that I used for the prototype version of the workflow.

@uturuncoglu
Collaborator

CHGRES also requires

global_hyblev.l65.txt

So maybe it would be good to create POST and PRE directories and place the required fixed input files there. Any ideas?

@arunchawla-NOAA
Collaborator

@GeorgeGayno-NOAA and @LarissaReames-NOAA

Kate set up an ftp server with all the fix files needed for the model. Can you check whether all the files needed for chgres are also there? If not, can you let Kate know what needs to be added, or alternatively place the files in a directory there? The location of the ftp server is below:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/

@GeorgeGayno-NOAA
Collaborator

> @GeorgeGayno-NOAA and @LarissaReames-NOAA
>
> Kate set up an ftp server with all the fix files needed for the model. Can you check whether all the files needed for chgres are also there? If not, can you let Kate know what needs to be added, or alternatively place the files in a directory there? The location of the ftp server is below:
>
> https://ftp.emc.ncep.noaa.gov/EIB/UFS/

The files needed for chgres are there.

@uturuncoglu
Collaborator

I think we need to reopen this. The following issues are not solved yet.

1 - We need to extract the tar file under
https://ftp.emc.ncep.noaa.gov/EIB/UFS/simple-test-case/

2 - global_hyblev.l65.txt is in
https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_am.v20191213/
that is fine, but we don't have the following files, which are resolution dependent:

C96.facsf.*.nc
C96.maximum_snow_albedo.*.nc
C96.slope_type.*.nc
C96.snowfree_albedo.*.nc
C96.soil_type.*.nc
C96.substrate_temperature.*.nc
C96.vegetation_greenness.*.nc
C96.vegetation_type.*.nc

There are some files under this directory but those are in grib format. I think it would be better to put those netcdf files under

https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_fv3_gmted2010.v20191213/

based on their resolution.

uturuncoglu reopened this Dec 23, 2019
@GeorgeGayno-NOAA
Collaborator

> I think we need to reopen this. The following issues are not solved yet.
>
> 1 - We need to extract the tar file under
> https://ftp.emc.ncep.noaa.gov/EIB/UFS/simple-test-case/
>
> 2 - global_hyblev.l65.txt is in
> https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_am.v20191213/
> that is fine, but we don't have the following files, which are resolution dependent:
>
> C96.facsf.*.nc C96.maximum_snow_albedo.*.nc
> C96.slope_type.*.nc C96.snowfree_albedo.*.nc
> C96.soil_type.*.nc C96.substrate_temperature.*.nc
> C96.vegetation_greenness.*.nc C96.vegetation_type.*.nc

The above files are under the ./fix_sfc subdirectory.

> There are some files under this directory but those are in grib format. I think it would be better to put those netcdf files under
>
> https://ftp.emc.ncep.noaa.gov/EIB/UFS/global/fix/fix_fv3_gmted2010.v20191213/
>
> based on their resolution.

@uturuncoglu
Collaborator

Okay, I can see them now. Thanks @GeorgeGayno-NOAA. We still need to extract the tar file.

@KateFriedman-NOAA
Collaborator

KateFriedman-NOAA commented Dec 30, 2019

@uturuncoglu I have unpacked the tarball:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/simple-test-case/

I also moved the gzipped tarball up one level so it wasn't within the unpacked folder. Let me know if I should put it back down within the simple-test-case folder. The path to that tarball is now:

https://ftp.emc.ncep.noaa.gov/EIB/UFS/simple-test-case.tar.gz

[emc.glopara@vm-lnx-emcrzdm01 UFS]$ pwd
/home/ftp/emc/EIB/UFS
[emc.glopara@vm-lnx-emcrzdm01 UFS]$ ll
total 128464
drwxr-xr-x. 3 emc.glopara emc      4096 Dec 18 14:01 global
drwxr-xr-x. 4 emc.glopara emc      4096 Dec 30 15:08 simple-test-case
-rw-r--r--. 1 emc.glopara emc 131018461 Dec 20 19:15 simple-test-case.tar.gz
[emc.glopara@vm-lnx-emcrzdm01 UFS]$ ll simple-test-case
total 66816
-rw-r--r--. 1 emc.glopara emc  3135393 Dec 20 19:13 aerosol.dat
-rw-r--r--. 1 emc.glopara emc  3111408 Dec 20 19:13 CFSR.SEAICE.1982.2012.monthly.clim.grb
-rw-r--r--. 1 emc.glopara emc    24484 Dec 20 19:13 co2historicaldata_2016.txt
-rw-r--r--. 1 emc.glopara emc        0 Dec 20 19:13 data_table
-rw-r--r--. 1 emc.glopara emc    22919 Dec 20 19:13 diag_table
-rw-r--r--. 1 emc.glopara emc      678 Dec 20 19:13 field_table
-rw-r--r--. 1 emc.glopara emc  1394712 Dec 20 19:13 global_albedo4.1x1.grb
-rw-r--r--. 1 emc.glopara emc     8274 Dec 20 19:13 global_glacier.2x2.grb
-rw-r--r--. 1 emc.glopara emc     8274 Dec 20 19:13 global_maxice.2x2.grb
-rw-r--r--. 1 emc.glopara emc    39994 Dec 20 19:13 global_mxsnoalb.uariz.t126.384.190.rg.grb
-rw-r--r--. 1 emc.glopara emc   568856 Dec 20 19:13 global_o3prdlos.f77
-rw-r--r--. 1 emc.glopara emc  5468834 Dec 20 19:13 global_shdmax.0.144x0.144.grb
-rw-r--r--. 1 emc.glopara emc  5468834 Dec 20 19:13 global_shdmin.0.144x0.144.grb
-rw-r--r--. 1 emc.glopara emc    89184 Dec 20 19:13 global_slope.1x1.grb
-rw-r--r--. 1 emc.glopara emc   438768 Dec 20 19:13 global_snoclim.1.875.grb
-rw-r--r--. 1 emc.glopara emc  1919712 Dec 20 19:13 global_snowfree_albedo.bosu.t126.384.190.rg.grb
-rw-r--r--. 1 emc.glopara emc  1743072 Dec 20 19:13 global_soilmgldas.t126.384.190.grb
-rw-r--r--. 1 emc.glopara emc    21524 Dec 20 19:13 global_soiltype.statsgo.t126.384.190.rg.grb
-rw-r--r--. 1 emc.glopara emc    20094 Dec 20 19:13 global_tg3clim.2.6x1.5.grb
-rw-r--r--. 1 emc.glopara emc 15713952 Dec 20 19:13 global_vegfrac.0.144.decpercent.grb
-rw-r--r--. 1 emc.glopara emc    24602 Dec 20 19:13 global_vegtype.igbp.t126.384.190.rg.grb
-rw-r--r--. 1 emc.glopara emc   681408 Dec 20 19:13 global_zorclim.1x1.grb
drwxr-xr-x. 2 emc.glopara emc     4096 Dec 30 15:05 INPUT
-rw-r--r--. 1 emc.glopara emc     7315 Dec 20 19:13 input.nml
-rw-r--r--. 1 emc.glopara emc     1149 Dec 20 19:13 model_configure
-rw-r--r--. 1 emc.glopara emc       76 Dec 20 19:13 nems.configure
drwxr-xr-x. 2 emc.glopara emc     4096 Dec 30 15:05 RESTART
-rw-r--r--. 1 emc.glopara emc 27993768 Dec 20 19:13 RTGSST.monthly.clim.grb
-rw-r--r--. 1 emc.glopara emc    32484 Dec 20 19:13 seaice_newland.grb
-rw-r--r--. 1 emc.glopara emc    65679 Dec 20 19:13 sfc_emissivity_idx.txt
-rw-r--r--. 1 emc.glopara emc     3873 Dec 20 19:13 solarconstant_noaa_an.txt

@DusanJovic-NOAA
Collaborator

Why do we need this tar file unpacked? Its only purpose is to serve as a canned case for testing the ufs-weather-model executable. Nothing else. It should not be used as a source of any configuration or input data.

@uturuncoglu
Collaborator

@KateFriedman-NOAA thanks. @DusanJovic-NOAA Actually, we are copying the *_table files, nems.configure, and the default initial conditions, but I'll include the *_table files and nems.configure in the FV3 CIME interface. The initial conditions can also be generated by chgres, and if there are no objections we could always use chgres to produce data from GFS at the desired resolution. Then we will remove the dependency on the simple-test-case/ directory.
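If chgres becomes the standard way to produce the default ICs, the run step itself is a single executable driven by a namelist. A very rough sketch follows; the namelist contents are omitted because they depend on the input data type and target resolution, and the variable names and output file names should be checked against the UFS_UTILS documentation.

# Sketch only: produce cold-start initial conditions with chgres_cube.
# fort.41 is the namelist chgres_cube reads; building it is out of scope here.
cd "$RUNDIR"
cp "$CHGRES_NAMELIST" fort.41   # hypothetical pre-built namelist for the target resolution
srun ./chgres_cube              # writes the cold-start files that get linked into INPUT/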

@arunchawla-NOAA
Collaborator

I am closing this issue as most things seem to be resolved here
