rt.sh is not working on Orion #2365

Open
uturuncoglu opened this issue Jul 12, 2024 · 20 comments
Assignees
Labels
bug Something isn't working

Comments

@uturuncoglu
Collaborator

Description

rt.sh complains about old modules and does not run on Orion. I wonder whether or not Orion is supported by the UFS Weather Model at this point.

To Reproduce:

What compilers/machines are you seeing this with? Intel, but GNU will probably have the same issue.
Give explicit steps to reproduce the behavior.

  1. Check out the head of develop.
  2. Try to run one of the tests.

Additional context

N/A

Output

output logs

rt.sh: Setting up orion...
Lmod has detected the following error:  The following module(s) are unknown: "git/2.28.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "git/2.28.0"

Also make sure that all modulefiles written in TCL start with the string #%Module
@uturuncoglu uturuncoglu added the bug Something isn't working label Jul 12, 2024
@uturuncoglu uturuncoglu changed the title RT is not working on Orion rt.sh is not working on Orion Jul 12, 2024
@BrianCurtis-NOAA
Collaborator

@jkbk2004 looks like Orion bumped a few versions of files we use in UFS. Can you have your team look through what version bumps we'll need to make?

@jkbk2004
Collaborator

@zach1221 @FernandoAndrade-NOAA can you check and run rt.sh on Orion?

@zach1221
Collaborator

zach1221 commented Jul 12, 2024

I cloned the develop repo and ran an RT successfully on Orion. I did not see any complaints about git modules. I'm not sure if the specific test case would matter but I ran against some atm_dyn32 cases using intel from work2. It is strange that git/2.28.0 looks like it's gone, and I don't see a later version available. I can follow up on that.

Edit: git version is now git/2.31.1

@FernandoAndrade-NOAA are you able to replicate?

@FernandoAndrade-NOAA
Collaborator

I had actually run on Orion earlier this morning to pretest Sam's UPP PR, I can try again now and see if any git errors are showing up.

@FernandoAndrade-NOAA
Collaborator

There is no issue on my side running RTs on Orion, I have a couple tests set up and queued. @zach1221 @jkbk2004 FYI

@FernandoAndrade-NOAA
Collaborator

Apologies, I spoke too soon: I'm running into CMake errors in the atm_dyn32_intel compile err about not being able to find netcdf 4.7.4, which caused the tests to abort.

@zach1221
Collaborator

Ok. I will keep trying. May be inconsistent.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jul 12, 2024

@uturuncoglu I had previously needed to add a module load git/2.28.0 to my bashrc on Orion, but if I remove that (it complains that the module isn't found) and try to run the RT, the compile for datm_cdeps_debug fails with:

CMake Error at CMakeLists.txt:149 (find_package):
  By not providing "FindNetCDF.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "NetCDF", but
  CMake did not find one.

  Could not find a package configuration file provided by "NetCDF" (requested
  version 4.7.4) with any of the following names:

    NetCDFConfig.cmake
    netcdf-config.cmake

  Add the installation prefix of "NetCDF" to CMAKE_PREFIX_PATH or set
  "NetCDF_DIR" to a directory containing one of the above files.  If "NetCDF"
  provides a separate development package or SDK, be sure it has been
  installed.
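
For anyone hitting the same find_package failure, a minimal sketch of checking whether a NetCDF CMake package config file is reachable from the current environment. It assumes the loaded modules export CMAKE_PREFIX_PATH as a colon-separated list; the function name find_netcdf_config and the search depth of 5 are illustrative choices, not part of rt.sh or the UFS build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rt.sh): list NetCDF CMake package config
# files found under each prefix of a colon-separated, CMAKE_PREFIX_PATH-style
# list. Defaults to the CMAKE_PREFIX_PATH environment variable.
find_netcdf_config() {
  local prefixes="${1:-${CMAKE_PREFIX_PATH:-}}"
  local prefix
  local IFS=':'            # split the list on colons for the loop below
  for prefix in $prefixes; do
    [ -d "$prefix" ] || continue   # skip entries that do not exist
    # The two file names are the ones listed in the CMake error message.
    find "$prefix" -maxdepth 5 \
      \( -name 'NetCDFConfig.cmake' -o -name 'netcdf-config.cmake' \) -print
  done
}
```

Running find_netcdf_config with no arguments after the modules are loaded prints any matching config files. Empty output would be consistent with the error above, and suggests the NetCDF module was never loaded or does not add its prefix to CMAKE_PREFIX_PATH.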

@uturuncoglu
Collaborator Author

Thanks all. I am not sure, but maybe this is some kind of environment issue at the user level. I could not see a module purge in rt.sh, but there is one somewhere else. If everybody except me can run it, then maybe the issue is in my environment. Any ideas?
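
One way to look for a user-level cause is to scan the shell startup files for leftover module commands, such as the module load git/2.28.0 line mentioned earlier in this thread. The sketch below is only an illustration: the function name find_module_lines is hypothetical, and the list of startup files to check is an assumption that depends on the login shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every uncommented "module load/use/swap/purge"
# line, prefixed with file name and line number, from the given files.
find_module_lines() {
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue   # silently skip files that do not exist
    grep -nHE '^[[:space:]]*module[[:space:]]+(load|use|swap|purge)' "$f"
  done
  return 0                    # no match is not an error
}
```

For example, find_module_lines ~/.bashrc ~/.bash_profile ~/.profile would list candidate lines to review. Lines that load a module version no longer installed on the machine are the first thing to remove or update.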

@zach1221
Collaborator

@uturuncoglu what happens if you start with a fresh login, re-clone ufs-community:develop, and try to re-run the test? May not work, just curious to see what happens.

@uturuncoglu
Collaborator Author

@zach1221 Let me try again.

@uturuncoglu
Collaborator Author

@zach1221 Okay. I checked again using the following commands:

git clone --recursive https://github.com/ufs-community/ufs-weather-model.git ufs-weather-model_dev
cd ufs-weather-model_dev/tests
./rt.sh -a nems -k -n "control_p8 intel"

and it is still failing for me. I also checked the loaded modules before running the test, and no modules were loaded.

@zach1221
Collaborator

Running the same series of commands, the regression test is successful for me. I'll dig some more and look through previous issues to see if there are any similar examples.

@uturuncoglu
Collaborator Author

@zach1221 Thanks again for your help, and sorry for taking up your time. If it works for others, then the only explanation is something in my environment. So maybe I should look more carefully at what is different on my side.
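
A rough sketch of how two environments could be compared, for example one snapshot from an account where rt.sh works and one from an account where it fails. The function names snapshot_env and diff_env are hypothetical, and the approach assumes one variable per line, so multi-line values (such as exported shell functions) will show up as noise in the output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing two shell environments.

# Write the current environment, sorted, to the file named in $1.
snapshot_env() {
  env | LC_ALL=C sort > "$1"
}

# Print only the lines that differ between two snapshots:
# '<' marks lines unique to the first file, '>' lines unique to the second.
diff_env() {
  diff "$1" "$2" | grep -E '^[<>]'
  return 0   # differences (or none) are not treated as an error
}
```

Each user would run snapshot_env on their own account just before launching rt.sh, and then diff_env on the two files. Differences in PATH, MODULEPATH, or LD_LIBRARY_PATH are the most likely to matter. User-specific paths such as HOME will differ and can be ignored.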

@zach1221
Collaborator

zach1221 commented Jul 15, 2024

@uturuncoglu Yes, seems to be something account specific. I reached out to Orion rdhpcs support as well, to see if they have any pointers.

@zach1221
Collaborator

@uturuncoglu rdhpcs has requested that you reach out to them to provide additional information. Could you send an email to rdhpcs.orion.help@noaa.gov?

@uturuncoglu
Collaborator Author

uturuncoglu commented Jul 17, 2024

@zach1221 Sure. I'll do it. Thanks for your help.

@zach1221
Collaborator

zach1221 commented Aug 1, 2024

Hi, @uturuncoglu are you still experiencing this issue on Orion?

@zach1221
Collaborator

Hi, @uturuncoglu following up again here. Are we ok to close this issue?

@uturuncoglu
Collaborator Author

@zach1221 Thanks for checking. No, it is fine now. Thanks for your help.

Development

No branches or pull requests

6 participants