simple-test-case run fails on ubuntu #114
Comments
Is there a machine where simple-test-case is known to work? Are there any tests for simple-test-case which could help reveal what is going wrong here? |
I just tested this on my Ubuntu 19.10 laptop using GCC 9.2.1 and mpich 3.3.1. It works.
Make sure you have cmake (>= 3.15), python2 (>= 2.7), gcc, g++, gfortran, mpicc, mpicxx, mpif90 and mpiexec installed and available in $PATH, as well as other standard development tools. For the exact packages needed on Ubuntu, see: https://github.com/DusanJovic-NOAA/simple-ufs/blob/master/docker/Dockerfile.ubuntu If you are on another Linux distribution, see the other Dockerfiles in the docker directory. I have Dockerfiles for Debian and for CentOS (aka RHEL) 7 and 8. |
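A quick way to check that list before starting a build is sketched below. It is only an illustration: the helper names `missing_tools` and `version_ge` are made up for this example, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
# Print every tool from the argument list that is not found in $PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Succeed when dotted version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Tool list copied from the comment above.
missing_tools cmake python2 gcc g++ gfortran mpicc mpicxx mpif90 mpiexec \
    || echo "the tools printed above are missing from PATH" >&2

if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    cmake_version=$(cmake --version | awk 'NR==1 {print $3}')
    version_ge "$cmake_version" 3.15 \
        || echo "cmake $cmake_version is older than 3.15" >&2
fi
```

This only confirms that the commands exist; it does not check that the compilers and MPI wrappers actually work together.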
OK, I'll try that. It worked! ;-) Thanks for all the help. Now I will dig into the difference between your scripts and what I did to see if I can figure out what I did wrong. |
OK, so one big difference is that the scripts by @DusanJovic-NOAA do not use the NCEPLIBS-external project. Instead, the external tools are downloaded and installed individually. Also your script seems to install both OpenMPI and MPICH. Which one is used in the build? Also the values for these environment vars are different in your build. You have:
I had other version numbers in some cases. Is there some reason these libraries cannot be found in the usual cmake manner? This is the only cmake build I've worked with which requires that I specify these numbers. I see that part of the problem is the non-standard naming used with the NCEPLIB libraries. The library name should not include the version. For example, netcdf-c current version is 4.7.4, and the library is called libnetcdf.a. When the library is updated, programs that use it can be recompiled without changing their build files. |
Neither. Those two MPI libraries are not installed and used by the build system (top-level build.sh is not running libs/mpilibs/build.sh). The version of MPI library used by the build is whatever mpicc, mpicxx and mpif90 wrappers point to (or on Crays cc, CC and ftn). On many HPCs this is determined by whatever modules you have loaded. Or in your case on Ubuntu, whatever mpi package you installed. I usually install
Different than what? These are the library versions I tested my system with.
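On the MPI question: since the build uses whatever the mpicc, mpicxx and mpif90 wrappers point to, it can help to ask the wrappers directly. The sketch below assumes an MPICH or Open MPI installation; MPICH wrappers accept `-show` and Open MPI wrappers accept `--showme`, both of which print the underlying compiler command without compiling anything. The function name `describe_mpi_wrapper` is made up for this example.

```shell
# Print where an MPI wrapper lives and which compiler command it would run.
describe_mpi_wrapper() {
    wrapper=$1
    path=$(command -v "$wrapper") || {
        echo "$wrapper: not found in PATH"
        return 1
    }
    echo "$wrapper -> $path"
    # Try the MPICH flag first, then the Open MPI flag.
    "$wrapper" -show 2>/dev/null \
        || "$wrapper" --showme 2>/dev/null \
        || echo "  (could not query the underlying compiler)"
}

for w in mpicc mpicxx mpif90; do
    describe_mpi_wrapper "$w" || true
done
```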
Yes, there is. On our production systems, the installed NCEP libraries do not provide cmake packages, so the model build system cannot use cmake's find_package functionality. The version numbers and locations of the libraries are provided by modules via environment variables. This is an example on Hera:
The cmake build does not require that you specify any numbers. It requires that these environment variables are set. How you set those variables is a different question. On systems that provide modules, you just load the required modules.
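On a machine without modules, the same variables have to be exported by hand before running the build. A minimal sketch of a pre-build check follows. The variable names `EXAMPLE_LIB_INC` and `EXAMPLE_LIB_PATH` and the paths are hypothetical placeholders, not the names the model build actually reads; substitute the variables your build reports as missing.

```shell
# Report every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Hypothetical placeholders standing in for what a module would set.
export EXAMPLE_LIB_INC=/opt/example/1.0.0/include
export EXAMPLE_LIB_PATH=/opt/example/1.0.0/lib/libexample_v1.0.0.a

require_env EXAMPLE_LIB_INC EXAMPLE_LIB_PATH && echo "environment looks complete"
```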
How do you support multiple netcdf library versions (let's say 4.7.4 and 4.8.0_develop), and how do you tell the program which one to use? You can recompile the model without changing any of the model's build files. No library version is specified in any CMakeLists.txt file. Switching to a new library version is just a matter of unloading one module and loading another. The reasons why NCO appends version numbers to library archive (.a) files are probably historical. They've been doing that for at least the last 20 or so years, if not longer. |
When handling various versions of netCDF (and other libraries, like HDF5, etc.), install different versions in different directories. Then provide the directory to the build system as the place to find netCDF. Since the library is always called libnetcdf.a, there is no need to set a special environment variable. This is the standard approach. Many builders of this code will not have modules to set all the environment variables. I do not on my system, hence all my questions. ;-) I have entered an issue in NCEPLIBS about the library names. |
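The directory-per-version approach described above can be sketched as follows. The `/opt/<name>/<version>` layout, the helper names and the HDF5 version are assumptions made for illustration. `CMAKE_PREFIX_PATH` is the standard CMake variable for listing install prefixes to search, but this only helps a build that locates its libraries through CMake's find mechanisms, which, per the discussion above, the model build does not currently do for the NCEP libraries.

```shell
# Hypothetical layout: every library version has its own install prefix.
prefix_for() {
    echo "/opt/$1/$2"
}

# Join prefixes with ';', the list separator CMake uses in CMAKE_PREFIX_PATH.
cmake_prefix_arg() {
    joined=""
    for p in "$@"; do
        joined="${joined:+$joined;}$p"
    done
    echo "-DCMAKE_PREFIX_PATH=$joined"
}

arg=$(cmake_prefix_arg "$(prefix_for netcdf 4.7.4)" "$(prefix_for hdf5 1.10.6)")

# Only print the command; quote the argument when actually running cmake,
# because an unquoted ';' would be treated by the shell as a command separator.
echo "cmake \"$arg\" <path-to-source>"
```

Switching netCDF versions then means changing one directory in the argument, with the library itself still named libnetcdf.a.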
OK, sadly the method that worked previously is no longer working. Sigh. Every time I try to build this software it fails in a new and interesting way. I will try rolling back to the v1 release to see if that works... and it does not. It fails in the same way.
|
@DusanJovic-NOAA is your simple build working for you with current develop branch? |
It does. On Ubuntu 20.04. What's the error message in log_model? |
OK, I started again, completely from scratch. It is now failing in a new and interesting way. What has actually happened is that it apparently hung while building the 3rd party libraries. Right now I am trying to figure out how to turn off whatever bash settings you have activated that make bash -x not work. I would like to see what each line of each of these scripts is attempting to do; perhaps that will help get it building... |
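For reference, tracing a script with `bash -x` can be sketched like this. The two-line script created here is a stand-in, not one of the real build scripts. `PS4` is the prefix bash prints before each traced command, and setting it to include `${LINENO}` adds line numbers to the trace (bash may ignore a `PS4` inherited from the environment when running as root).

```shell
# Create a small stand-in script; substitute the real build script.
demo=$(mktemp)
printf '%s\n' 'echo "configuring"' 'echo "building"' > "$demo"

# Trace output goes to stderr; capture it and discard the script's own output.
trace=$(PS4='+ line ${LINENO}: ' bash -x "$demo" 2>&1 >/dev/null)
echo "$trace"

rm -f "$demo"
```

If a script turns tracing off itself (for example with `set +x`), or starts other scripts as separate bash processes, the trace stops at that point, so this alone may not explain the behaviour described above.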
OK, seems like the problem is esmf. I am investigating... |
OK, I set this all up on Jenkins so I don't have to keep typing everything by hand, and so that once I find the formula for a working build, I can have it handy as a reference. My build is failing in the model, and here's how it fails:
|
@edwardhartnett see here NCAR#31 - essentially the UFS and CCPP explicitly allow using the Fortran 2008 standard, and it turned out that GNU < 9 does not support all those features. |
OK, so then the CMake build system must be updated to check this. Then, instead of a Fortran build error in a log file, I will get a nice error message from CMake. I will add an issue for that... |
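A similar check can be done by hand before building. The sketch below is a shell pre-check, not the CMake change proposed above; the threshold of 9 comes from the comment about GNU < 9, and the function names are made up for this example.

```shell
# Extract the major version from a string such as "9.2.1" or "9".
gcc_major() {
    echo "${1%%.*}"
}

# Succeed when the given GCC version has a major version of at least 9.
supports_required_f2008() {
    [ "$(gcc_major "$1")" -ge 9 ]
}

if command -v gfortran >/dev/null 2>&1; then
    # -dumpversion prints the compiler version (major only on some builds).
    version=$(gfortran -dumpversion)
    if supports_required_f2008 "$version"; then
        echo "gfortran $version: new enough"
    else
        echo "gfortran $version is older than 9; expect Fortran 2008 build errors" >&2
    fi
fi
```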
@DusanJovic-NOAA you say that the develop branch is working for you currently? How are you getting around the problem of affinity.c as described in #109? |
I'm getting the exact same error when attempting to run V1.1.0 of ufs_weather_model on a Raspberry Pi cluster. It seems that the error occurs when attempting to read the "seaice_newland.grb" GRIB file. I can read this file using the wgrib utility: Unfortunately, the conversation on this thread doesn't address this failure directly. Has there been any more progress in resolving this issue? |
@edwardhartnett so FMS has cmake capability now, can we close this issue? Thanks |
* Removing use of mpp_io_mod and fms_io_mod from the dycore code. Replacing the necessary functions with fms2_io_mod functions
* Adding a call to set_filename_appendix so that nest is added to filename when needed and removing unnecessary code in fv_io_mod
* FV3 Documentation - formatted PDF and source files for FV3 documentation.
* Documentation and defaults changes - Updated defaults for hord options to use 8 and 10, and removal of mention of hord = 9 (experimental, unsupported) scheme.
* Initialize {sw,se,nw,ne}_corner to .false in model/fv_arrays.F90 (cherry picked from commit bf0630f)
* merge of latest dev work from GFDL Weather and Climate Dynamics Division (ufs-community#114)
* read ak/bk from user specified files (ufs-community#115)
* add input.nml parameter fv_eta_file for user specified ak/bk; change ks calculation when npz_type=input; use newunit to replace fixed file unit for npz_type=input (cherry picked from commit 3a0d35a)
* FV3 Example Notebooks and cleanup of docs directory (ufs-community#117)
* removed module use of INPUT_STR_LENGTH in fv_control.F90 (ufs-community#122)
* add check on eta levels to ensure their monotonicity
* update Jili Dong's ak/bk external input to
  - use the FMS ascii_read (single read/broadcast)
  - error check input to ensure the proper number of levels present
* added a format description for the external eta file and ensured a correct the file length check for FMS 2021.03 and greater
* merge of minor updates from GFDL Weather and Climate Dynamics Division (20210804) (ufs-community#127)

Co-authored-by: Lauren Chilutti <Lauren.Chilutti@noaa.gov>
Co-authored-by: laurenchilutti <60401591+laurenchilutti@users.noreply.github.com>
Co-authored-by: lharris4 <53020884+lharris4@users.noreply.github.com>
Co-authored-by: Dusan Jovic <dusan.jovic@noaa.gov>
Co-authored-by: Jili Dong <jili.dong@noaa.gov>
Sync with authoritative repository
Unfortunately, the run failed for me. ;-(
This is for the v1.0.0 build of NCEPLIBS-external, NCEPLIBS, and ufs_weather_model.
I followed the instructions here:
https://github.com/ufs-community/ufs-weather-model/wiki/Getting-Started
I downloaded the simple-test-case and built v1.0.0 of the ufs-weather-model. I ran it like this:
mpiexec -np 8 ./ufs_weather_model
It ran for a while, then failed. Output below.
I am going to try to get the unit tests mentioned in the documentation working, to see if that helps.
If you have any suggestions or advice, that would be helpful.