Example simulation 201 fails using Intel compiler #82
Comments
@bss116 this has no assignee -- are you looking into this already, or do you want me to have a look? If so, could you please assign me to it? I may have some time at the end of next week. |
I'm not going to look into this, at least for a while now. Sure, if you have some time, please have a look! But don't spend all your time and nerves on it, it will probably involve going over lots of array-bound warnings, for which we will need to allocate some time, sometime... |
OK, I must have misunderstood the issue then -- I thought this was something to do with the HPC specifically, but I think this is a general issue with Intel (flag) that catches a general problem that is not caught when running with GNU -- I will un-assign myself then 🤭... |
Yeah I'm afraid so... 😄 |
@bss116 cool, but in this case I would probably change the title to something a bit more specific, as it is not due to the HPC per se. This only happens because you use Intel on the HPC rather than GNU, so this issue is about Intel. |
@bss116 @samoliverowens I have checked this again, running 201 for 1000 s with both Intel and GNU on the ICL cluster. The first thing I noticed is that in both cases the simulation runs without errors when using 1 core. Simulations do fail consistently across compilers, but the failures seem to be related to the number of cores used: e.g. with 32 cores the simulation runs, but with 48 it fails with an error. So I think we should close this issue, as it no longer seems to apply -- at least for 201 -- and instead (1) clarify/add a check for the number of cores to use and (2) improve the error logs. For (1) we could simply clarify this in the docs for now, with the option to implement a check in the code (future milestone), and for (2) target it in a future milestone. What do you guys think? |
The number of cores has to divide |
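The pre-flight check discussed here could be as simple as a divisibility test before launching the run. A minimal sketch, assuming the constraint is that the MPI core count must evenly divide the grid size in the decomposed direction (the function name, parameter names, and the grid size of 64 are all illustrative assumptions, not taken from the uDALES code):

```python
def core_count_is_valid(ncores: int, ncells: int) -> bool:
    """Hypothetical pre-flight check: the number of MPI cores must
    divide the number of grid cells in the decomposed direction."""
    return ncells % ncores == 0

# With an assumed grid of 64 cells, 32 cores pass but 48 do not,
# matching the observed behaviour in the comment above.
print(core_count_is_valid(32, 64))  # True
print(core_count_is_valid(48, 64))  # False
```

A check like this in the run scripts would turn the hard-to-diagnose solver crash into an immediate, readable error message.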
This is good then -- has this been documented? If so, I would close this and open an issue about adding a check that can be targeted in a future release. |
I thought we already mentioned it somewhere, but I cannot find it in the docs now. Where do you think would be a good place? In the getting started guide under Run, or in the simulation setup notes? Line 505 in 9c5e8e3
I'm happy that 201 runs now also on the ICL cluster. Let's close this issue. |
@bss116 cool -- how about under https://github.com/uDALES/u-dales/blob/master/docs/udales-getting-started.md#run? And sure -- I will close this and open an issue for fixing that check. I will actually put it under 0.1 and I can take care of it. |
The energybalance example simulation 201, which runs without issues locally, fails when running on the ICL cluster. Strangely, running in debug mode does not produce an error stack showing where the code terminates, but running in release mode does, pointing to the Poisson solver (modpois.f90, line 1045). This makes me think that the error may be at a different spot than the one indicated in the error stack. A similar issue arises when running the example simulation 501 with an extended vertical domain size, where debug mode again does not produce an error stack. However, the error (in release) comes from a different line, this time in the subgrid model (modsubgrid.f90, line 391). Again, this simulation runs well on my local machine. I am not sure where to start looking for the error. Log files are attached.
output.201-debug.txt
output.201-release.txt
output.501-debug.txt
output.501-release.txt
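One way to chase a crash whose release-mode stack trace may be misleading is to rebuild with runtime bounds checking and backtraces enabled, so out-of-bounds accesses are reported where they happen rather than where they later corrupt memory. The flags below are standard Intel and GNU Fortran options; treating modpois.f90 as the file to recompile is just an illustration based on the stack trace above, and the exact build invocation for uDALES will differ:

```shell
# Intel Fortran: runtime array-bounds checks plus symbolic backtraces
ifort -g -O0 -check bounds -traceback -c modpois.f90

# GNU Fortran equivalent, useful for comparing behaviour across compilers
gfortran -g -O0 -fcheck=bounds -fbacktrace -c modpois.f90
```

With these flags, an out-of-bounds array access aborts immediately with the offending array, index, and source line, which should help decide whether the Poisson solver is really at fault.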