Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Organize output files by cycle/node. #952

Merged
merged 20 commits into from
Nov 11, 2022

Conversation

mgjarrett
Copy link
Contributor

@mgjarrett mgjarrett commented Oct 27, 2022

Description

ARMI can output a large number of input files at each cycle and node. Most of these file names are re-used at different cycle/node points, so they get overwritten. To improve organization and retain all important inputs and outputs, the directory changer is being updated to copy these files to a permanent location in the working directly, under folders named c0n0, c0n1, etc, using a new setting savePhysicsFiles.

Note

This PR does not change the functionality of dumpSnapshot, which is now used to run detailed physics calculations at selected time points.


Checklist

  • This PR has only one purpose or idea.
  • Tests have been added/updated to verify that the new/changed code works.
  • The release notes (location doc/release/0.X.rst) are up-to-date with any bug fixes or new features.
  • The documentation is still up-to-date in the doc folder.
  • The dependencies are still up-to-date in setup.py.

ARMI can a large number of input files at each cycle and node.
Most of these file names are re-used at different cycle/node points,
so they get overwritten. To improve organization and retain all
important inputs and outputs, the directory changer is being updated to
copy these files to a permanent location in the working directly, under
folders named c0n0, c0n1, etc.
@mgjarrett mgjarrett added the enhancement New feature or request label Oct 27, 2022
@mgjarrett mgjarrett marked this pull request as ready for review October 27, 2022 22:25
@keckler
Copy link
Member

keckler commented Oct 27, 2022

Yes, IT is going to love this! 😆

@mgjarrett
Copy link
Contributor Author

Yes, IT is going to love this! 😆

Ideally it will be easier to delete old files you don't want because they'll be grouped into a handful of folders, rather than just having 1000+ loose files in the working directory. For now we still are copying to the working directory so the files are there for codes to interface with each other, but hopefully we can get to a point where the physics codes can find the files they need within these directories.

@john-science john-science self-requested a review October 28, 2022 02:03
@john-science john-science added feature request Smaller user request and removed enhancement New feature or request labels Oct 28, 2022
@john-science
Copy link
Member

Why did you change the TemporaryDirectoryChanger? The goal of that feature is to make sure that a test can create as many files as it needs, willy nilly, and be easily disposable. We do not want unit tests to be sharing a scratch space.

@john-science
Copy link
Member

@jakehader Please do not merge this yet.

@mgjarrett Have you tested this downstream in our other important repos and projects? It appears to me that most of the projects that use ARMI to run simulations would have to change to meet your new change. I would like you to propose this change at the next Monday meeting, and spearhead the work in what I am guessing is about 12 other pull requests in other repositories. Thanks so much!

@mgjarrett
Copy link
Contributor Author

Why did you change the TemporaryDirectoryChanger? The goal of that feature is to make sure that a test can create as many files as it needs, willy nilly, and be easily disposable. We do not want unit tests to be sharing a scratch space.

There are downstream repos that use the TemporaryDirectoryChanger for things other than testing. The temporary directory is still cleaned up, but this provides the option to provide an outputPath where the files will be transferred to. If no outputPath is passed, then there should be no additional permanent files created that weren't created before.

@mgjarrett mgjarrett marked this pull request as draft October 28, 2022 15:17
@mgjarrett
Copy link
Contributor Author

mgjarrett commented Oct 28, 2022

@jakehader Please do not merge this yet.

@mgjarrett Have you tested this downstream in our other important repos and projects? It appears to me that most of the projects that use ARMI to run simulations would have to change to meet your new change. I would like you to propose this change at the next Monday meeting, and spearhead the work in what I am guessing is about 12 other pull requests in other repositories. Thanks so much!

The goal of this PR is to retain all physics kernel outputs from a simulation in a location where they will not be overwritten at the next time step.

I would like to get input from others at the next meeting. I am not in any rush for this to be merged, and it will be important to get collective agreement about how this should be implemented. The hope is that we can improve the organization of the many, many physics kernel outputs that are typically generated during an ARMI run. Right now, you might just have 1000+ files sitting loose in a directory with no organization.

No downstream projects need to change in order to work with this proposed PR, in its current form. The intent is to add the option for anyone using a DirectoryChanger to have the files retrieved from that DirectoryChanger to be stored in a folder with a specific name based on cycle, time step, and the interface or Executer that created it, in addition to them being stored in the main working directory. The new outputPath parameter is optional, and if it is not provided it should not change the behavior of the DirectoryChanger at all. It may be better to not provide the outputPath on the DefaultExecuter but instead leave it up to the subclasses to decide whether or not to provide an outputPath.

@john-science
Copy link
Member

@mgjarrett Can you verify something for me? Something that would be Bad would be if running 2 simulations in a row could now result in over-writing the first results because the second results are in the same folder.

That's not the case with your re-design here, right?

Have you tested that?
Thanks!

Return `dumpSnapshot` closer to its original purpose of dumping the
physics kernel inputs and outputs without performing any detailed
analysis. Moved the existing functionality of `dumpSnapshot` to a new
setting called `runSnapshot`, which is a clearer description of what the
setting does: run detailed analysis at the specified "snapshots" (i.e.,
state points).
@mgjarrett
Copy link
Contributor Author

@mgjarrett Can you verify something for me? Something that would be Bad would be if running 2 simulations in a row could now result in over-writing the first results because the second results are in the same folder.

This is the current behavior of ARMI, and this PR does not change that. If you re-run ARMI in the same folder where you have previously run it, it will overwrite any existing files as new ones are generated.

The benefit of this PR is that it allows you to store inputs/outputs from a given time step in the simulation. Previously, any auxiliary I/O from a physics kernel would be overwritten unless it was given a name with a unique cycle/step identifier.

@mgjarrett
Copy link
Contributor Author

After our internal user meeting, @ntouran brought up that what we are implementing here is basically the old behavior of the dumpSnapshot setting. dumpSnapshot is now used to run detailed physics calculations at selected time points.

I am proposing with this PR to change the name of the current dumpSnapshot setting to runDetailedSnapshot. This makes it more clear to the user that the setting is running detailed physics calculations at the specified "snapshots" (i.e., cycle/node time points). This setting is used during a runType: Snapshots run.

With this change, we can reclaim the dumpSnapshot setting for the functionality that has been re-introduced here: dumping all of the physics kernel I/O at selected "snapshot" statepoints during a runType: Standard run.

To summarize:

  • Old dumpSnapshot + runType: Snapshots will become runDetailedSnapshot + runType: Snapshots
  • New dumpSnapshot + runType: Standard will dump the I/O files into a folder at the specified cycle/node steps.

For example:

dumpSnapshot:
  - 000000
  - 000001
  - 001000
runType: Standard

will dump I/O at c0n0, c0n1, and c1n0, but no other points. If no dumpSnapshot setting is provided, none of the additional I/O will be dumped. Without dumpSnapshot, the behavior of ARMI should be identical to before this PR.

@jakehader
Copy link
Member

This looks great to me! Holding on any approvals and landing until we get the greenlight from @john-science

@john-science john-science marked this pull request as ready for review November 3, 2022 15:39
@john-science
Copy link
Member

Sorry, I can't tell, did we get rid of the "dumpSnapshots" setting in this PR?

It looks like we did, but I thought we were keeping that and adding a new level of snapshot detail?

(I just mention it because I quickly found 9 downstream repos that use that setting. So, even just changing the spelling of that name would generate a lot of work that feels unnecessary if we can leave the name unchanged.)

@mgjarrett
Copy link
Contributor Author

I am looking at the PR now.

I'm not 100% convinced about the naming. Will a user understand the difference between "dumpSnapshot" and "runDetailedSnapshot" from the names? Maybe "dumpSnapshot" and "dumpDetailedSnapshot" makes it more clear they are related and have only a difference in level of detail?

Naming is hard, I'm just thinking. I'll be willing to expect the name if you think it's clear to the user.

(Also, I know changing the name now is 100 times easier than changing it a year from now, after everyone is already using it.)

We can change the names to whatever we want. I am suggesting these names because runDetailedSnapshot actually runs additional calculations, while dumpSnapshot only dumps I/O but doesn't perform any additional calculation. dumpDetailedSnapshot is fine if we think that's better, but I was trying to highlight that it runs something, it doesn't just dump files.

@mgjarrett
Copy link
Contributor Author

Sorry, I can't tell, did we get rid of the "dumpSnapshots" setting in this PR?

I am proposing renaming it to runDetailedSnapshot. We can keep it the same, or make it anything else we want; I'm putting that forth as a reasonable representation of what the setting does.

It looks like we did, but I thought we were keeping that and adding a new level of snapshot detail?

We're not adding the detailed calculation; the current setting dumpSnapshot does the detailed calculation. We're trying to add a setting that doesn't do the detailed stuff, which is what dumpSnapshot used to do before it was commandeered for its current purpose.

(I just mention it because I quickly found 9 downstream repos that use that setting. So, even just changing the spelling of that name would generate a lot of work that feels unnecessary if we can leave the name unchanged.)

Yes, we'd have to update the name of the setting in downstream repos. They're trivial PRs, but it is unnecessary work. I am arguing that the current setting name is confusing because it doesn't just dump data, it also runs additional calculations. But that might not outweigh the burden of changing a setting name. If that's the case, we can come up with another name for what this PR is doing. Maybe savePhysicsIO?

@john-science
Copy link
Member

@mgjarrett Okay, this PR has gone on too long, sorry about that.

@ntouran Agrees though that we should either:

a. Leave the dumpSnapshot setting 100% alone and just add a new setting, or
b. Entirely remove the dumpSnapthot, and use SettingRenamer.

Nick thinks you're probably right, that we should just rename the setting.

@mgjarrett
Copy link
Contributor Author

Okay, I am happy to change the settings names to whatever we all agree on.

For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.

I'll leave dumpSnapshot alone in this PR; if we want to change it we could open another PR for that.

@john-science
Copy link
Member

For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.

It's up to you. It's your baby. I'm not sure IO means anything to me. But I don't have like a great alternative all lined up:

  • savePhysicsAtNodes
  • dumpNodeSnapshots
  • nodeSnapshots

The only thing we need is for the name (and the description field in globalSettings.py) to be as descriptive and helpful as possible for users.

@mgjarrett
Copy link
Contributor Author

For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.

It's up to you. It's your baby. I'm not sure IO means anything to me. But I don't have like a great alternative all lined up:

  • savePhysicsAtNodes
  • dumpNodeSnapshots
  • nodeSnapshots

The only thing we need is for the name (and the description field in globalSettings.py) to be as descriptive and helpful as possible for users.

Okay, I'm hesitant to use snapShot in the name now because there could be confusion between the functionality of dumpSnapshot vs. dumpNodeSnapshots or nodeSnapshots.

IO is supposed to mean Inputs and Outputs, but savePhysicsFiles might be more descriptive. I'll try that.

@john-science john-science merged commit d33b7f9 into terrapower:main Nov 11, 2022
@mgjarrett mgjarrett mentioned this pull request Nov 14, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Smaller user request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants