Organize output files by cycle/node. #952

mgjarrett · 2022-10-27T21:53:42Z (edited by john-science)

Description

ARMI can output a large number of input and output files at each cycle and node. Most of these file names are reused at different cycle/node points, so the files get overwritten. To improve organization and retain all important inputs and outputs, the directory changer is being updated to copy these files to a permanent location in the working directory, under folders named c0n0, c0n1, etc., controlled by a new setting, savePhysicsFiles.
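A minimal sketch of the copy step described above, in plain Python rather than ARMI's actual DirectoryChanger code. `nodeFolderName` and `copyToNodeFolder` are hypothetical helpers; only the c0n0/c0n1 folder naming comes from the description.

```python
import os
import shutil
import tempfile


def nodeFolderName(cycle: int, node: int) -> str:
    """Return a per-time-node folder name such as 'c0n1'."""
    return f"c{cycle}n{node}"


def copyToNodeFolder(fileNames, workingDir, cycle, node):
    """Copy files from workingDir into workingDir/c<cycle>n<node>.

    The originals are left in place so other codes can still find them in the
    working directory; the copies survive the next time node.
    """
    dest = os.path.join(workingDir, nodeFolderName(cycle, node))
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in fileNames:
        src = os.path.join(workingDir, name)
        if os.path.isfile(src):  # skip files the kernel did not produce
            copied.append(shutil.copy2(src, dest))
    return copied


# Usage: a throwaway directory stands in for the ARMI working directory.
with tempfile.TemporaryDirectory() as workDir:
    with open(os.path.join(workDir, "kernel.out"), "w") as f:
        f.write("results")
    saved = copyToNodeFolder(["kernel.out", "notMade.inp"], workDir, 0, 1)
    savedRel = [os.path.relpath(p, workDir) for p in saved]
    originalKept = os.path.isfile(os.path.join(workDir, "kernel.out"))
```

Copying rather than moving keeps the working directory unchanged for codes that still look there.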

Note

This PR does not change the functionality of dumpSnapshot, which is now used to run detailed physics calculations at selected time points.

Checklist

This PR has only one purpose or idea.
Tests have been added/updated to verify that the new/changed code works.

The code style follows good practices.
The commit message(s) follow good practices.

The release notes (location doc/release/0.X.rst) are up-to-date with any bug fixes or new features.
The documentation is still up-to-date in the doc folder.
The dependencies are still up-to-date in setup.py.

keckler · 2022-10-27T22:33:01Z

Yes, IT is going to love this! 😆

mgjarrett · 2022-10-27T22:46:35Z

> Yes, IT is going to love this! 😆

Ideally it will be easier to delete old files you don't want because they'll be grouped into a handful of folders, rather than having 1000+ loose files in the working directory. For now we are still copying files to the working directory so they are there for codes to interface with each other, but hopefully we can get to a point where the physics codes can find the files they need within these directories.

john-science · 2022-10-28T02:05:21Z

Why did you change the TemporaryDirectoryChanger? The goal of that feature is to make sure that a test can create as many files as it needs, willy nilly, and be easily disposable. We do not want unit tests to be sharing a scratch space.

armi/physics/executers.py

john-science · 2022-10-28T02:18:26Z

@jakehader Please do not merge this yet.

@mgjarrett Have you tested this downstream in our other important repos and projects? It appears to me that most of the projects that use ARMI to run simulations would have to change to meet your new change. I would like you to propose this change at the next Monday meeting, and spearhead the work in what I am guessing is about 12 other pull requests in other repositories. Thanks so much!

mgjarrett · 2022-10-28T15:16:44Z

> Why did you change the TemporaryDirectoryChanger? The goal of that feature is to make sure that a test can create as many files as it needs, willy nilly, and be easily disposable. We do not want unit tests to be sharing a scratch space.

There are downstream repos that use the TemporaryDirectoryChanger for things other than testing. The temporary directory is still cleaned up, but this adds the option to pass an outputPath to which the files will be transferred. If no outputPath is passed, no permanent files are created that weren't created before.
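To illustrate the optional-outputPath behavior, here is a standalone sketch of a temporary directory changer. `TempDirWithOutputPath` is a hypothetical class, not ARMI's TemporaryDirectoryChanger, and it omits the copy back to the main working directory that the real DirectoryChanger performs; it only shows that the scratch directory is always removed and that files are kept only when an outputPath is given.

```python
import os
import shutil
import tempfile


class TempDirWithOutputPath:
    """Hypothetical stand-in for a temporary directory changer.

    Work happens in a scratch directory that is always deleted on exit.
    When outputPath is given, the requested files are copied there first;
    when it is None, nothing permanent is created.
    """

    def __init__(self, filesToRetrieve=(), outputPath=None):
        self.filesToRetrieve = list(filesToRetrieve)
        self.outputPath = outputPath
        self.startDir = None
        self.tempDir = None

    def __enter__(self):
        self.startDir = os.getcwd()
        self.tempDir = tempfile.mkdtemp()
        os.chdir(self.tempDir)
        return self

    def __exit__(self, excType, excValue, traceback):
        os.chdir(self.startDir)  # leave the scratch dir before deleting it
        try:
            if self.outputPath is not None:
                os.makedirs(self.outputPath, exist_ok=True)
                for name in self.filesToRetrieve:
                    src = os.path.join(self.tempDir, name)
                    if os.path.isfile(src):
                        shutil.copy2(src, self.outputPath)
        finally:
            shutil.rmtree(self.tempDir, ignore_errors=True)
        return False  # never swallow exceptions from the body


# Usage: with an outputPath the file survives; the scratch dir never does.
with tempfile.TemporaryDirectory() as keepDir:
    with TempDirWithOutputPath(["scratch.out"], outputPath=keepDir) as dc:
        with open("scratch.out", "w") as f:
            f.write("data")
    keptFile = os.path.isfile(os.path.join(keepDir, "scratch.out"))
scratchGone = not os.path.exists(dc.tempDir)

# Without an outputPath, nothing outside the scratch dir is written.
with TempDirWithOutputPath(["scratch.out"]) as dc2:
    open("scratch.out", "w").close()
scratch2Gone = not os.path.exists(dc2.tempDir)
```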

mgjarrett · 2022-10-28T15:25:39Z (edited)

> @jakehader Please do not merge this yet.
>
> @mgjarrett Have you tested this downstream in our other important repos and projects? It appears to me that most of the projects that use ARMI to run simulations would have to change to meet your new change. I would like you to propose this change at the next Monday meeting, and spearhead the work in what I am guessing is about 12 other pull requests in other repositories. Thanks so much!

The goal of this PR is to retain all physics kernel outputs from a simulation in a location where they will not be overwritten at the next time step.

I would like to get input from others at the next meeting. I am not in any rush for this to be merged, and it will be important to get collective agreement about how this should be implemented. The hope is that we can improve the organization of the many, many physics kernel outputs that are typically generated during an ARMI run. Right now, you might just have 1000+ files sitting loose in a directory with no organization.

No downstream projects need to change to work with this PR in its current form. The intent is to give anyone using a DirectoryChanger the option to store the files it retrieves in a folder named for the cycle, time step, and the interface or Executer that created them, in addition to storing them in the main working directory. The new outputPath parameter is optional; if it is not provided, the behavior of the DirectoryChanger does not change at all. It may be better not to provide an outputPath on the DefaultExecuter and instead leave it to the subclasses to decide whether to provide one.
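The last suggestion (let subclasses decide) can be sketched as a base class whose output path defaults to None plus an opt-in subclass. The class names, the "neutronics" label, and the `c<cycle>n<node>/<label>` layout are assumptions for illustration; the comment only says the folder name is based on cycle, time step, and the interface or Executer.

```python
import os
from typing import Optional


class BaseExecuterSketch:
    """Hypothetical base executer: no outputPath by default, so no extra files."""

    def __init__(self, label: str, workingDir: str):
        self.label = label  # e.g. the name of the interface that owns it
        self.workingDir = workingDir

    def outputPath(self, cycle: int, node: int) -> Optional[str]:
        return None


class SavingExecuterSketch(BaseExecuterSketch):
    """Hypothetical subclass that opts in to per-node, per-executer folders."""

    def outputPath(self, cycle: int, node: int) -> Optional[str]:
        return os.path.join(self.workingDir, f"c{cycle}n{node}", self.label)


base = BaseExecuterSketch("neutronics", "run")
saving = SavingExecuterSketch("neutronics", "run")
print(base.outputPath(0, 1))    # None
print(saving.outputPath(0, 1))  # run/c0n1/neutronics with POSIX separators
```

Returning None from the base class keeps the old behavior for every executer that does not override it.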

john-science · 2022-10-30T17:33:25Z

@mgjarrett Can you verify something for me? Something that would be Bad would be if running 2 simulations in a row could now result in over-writing the first results because the second results are in the same folder.

That's not the case with your re-design here, right?

Have you tested that?
Thanks!

Return `dumpSnapshot` closer to its original purpose of dumping the physics kernel inputs and outputs without performing any detailed analysis. Moved the existing functionality of `dumpSnapshot` to a new setting called `runSnapshot`, which is a clearer description of what the setting does: run detailed analysis at the specified "snapshots" (i.e., state points).

mgjarrett · 2022-11-02T22:22:34Z

> @mgjarrett Can you verify something for me? Something that would be Bad would be if running 2 simulations in a row could now result in over-writing the first results because the second results are in the same folder.

This is the current behavior of ARMI, and this PR does not change that. If you re-run ARMI in the same folder where you have previously run it, it will overwrite any existing files as new ones are generated.

The benefit of this PR is that it allows you to store inputs/outputs from a given time step in the simulation. Previously, any auxiliary I/O from a physics kernel would be overwritten unless it was given a name with a unique cycle/step identifier.

mgjarrett · 2022-11-02T22:29:18Z

After our internal user meeting, @ntouran brought up that what we are implementing here is basically the old behavior of the dumpSnapshot setting. dumpSnapshot is now used to run detailed physics calculations at selected time points.

I am proposing with this PR to change the name of the current dumpSnapshot setting to runDetailedSnapshot. This makes it clearer to the user that the setting runs detailed physics calculations at the specified "snapshots" (i.e., cycle/node time points). This setting is used during a runType: Snapshots run.

With this change, we can reclaim the dumpSnapshot setting for the functionality that has been re-introduced here: dumping all of the physics kernel I/O at selected "snapshot" statepoints during a runType: Standard run.

To summarize:

Old dumpSnapshot + runType: Snapshots will become runDetailedSnapshot + runType: Snapshots
New dumpSnapshot + runType: Standard will dump the I/O files into a folder at the specified cycle/node steps.

For example:

```yaml
dumpSnapshot:  # CCCNNN codes, quoted so YAML keeps the leading zeros
  - "000000"
  - "000001"
  - "001000"
runType: Standard
```

will dump I/O at c0n0, c0n1, and c1n0, but no other points. If no dumpSnapshot setting is provided, none of the additional I/O will be dumped. Without dumpSnapshot, the behavior of ARMI should be identical to before this PR.
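A sketch of how the snapshot list in this example maps to time nodes, assuming the six-digit CCCNNN layout implied by 001000 → c1n0. `parseSnapshot` and `shouldDump` are hypothetical helpers, not ARMI functions; the codes are handled as strings so the leading zeros are kept.

```python
from typing import Iterable, Optional, Tuple


def parseSnapshot(code: str) -> Tuple[int, int]:
    """Split a six-digit CCCNNN code such as '001000' into (cycle, node)."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"Expected a six-digit CCCNNN string, got {code!r}")
    return int(code[:3]), int(code[3:])


def shouldDump(cycle: int, node: int, snapshotCodes: Optional[Iterable[str]]) -> bool:
    """True only at listed time nodes; no setting means no extra dumping."""
    if not snapshotCodes:
        return False
    return (cycle, node) in {parseSnapshot(c) for c in snapshotCodes}


snapshots = ["000000", "000001", "001000"]
dumpPoints = [
    f"c{c}n{n}" for c in range(2) for n in range(3) if shouldDump(c, n, snapshots)
]
print(dumpPoints)  # ['c0n0', 'c0n1', 'c1n0']
```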

jakehader · 2022-11-03T01:18:48Z

This looks great to me! Holding off on any approvals and landing until we get the green light from @john-science

john-science · 2022-11-03T15:50:19Z

Sorry, I can't tell, did we get rid of the "dumpSnapshots" setting in this PR?

It looks like we did, but I thought we were keeping that and adding a new level of snapshot detail?

(I just mention it because I quickly found 9 downstream repos that use that setting. So, even just changing the spelling of that name would generate a lot of work that feels unnecessary if we can leave the name unchanged.)

mgjarrett · 2022-11-03T16:10:37Z

I am looking at the PR now.

I'm not 100% convinced about the naming. Will a user understand the difference between "dumpSnapshot" and "runDetailedSnapshot" from the names? Maybe "dumpSnapshot" and "dumpDetailedSnapshot" makes it more clear they are related and have only a difference in level of detail?

Naming is hard; I'm just thinking. I'll be willing to accept the name if you think it's clear to the user.

(Also, I know changing the name now is 100 times easier than changing it a year from now, after everyone is already using it.)

We can change the names to whatever we want. I am suggesting these names because runDetailedSnapshot actually runs additional calculations, while dumpSnapshot only dumps I/O but doesn't perform any additional calculation. dumpDetailedSnapshot is fine if we think that's better, but I was trying to highlight that it runs something, it doesn't just dump files.

mgjarrett · 2022-11-03T16:19:17Z

> Sorry, I can't tell, did we get rid of the "dumpSnapshots" setting in this PR?

I am proposing renaming it to runDetailedSnapshot. We can keep it the same, or make it anything else we want; I'm putting that forth as a reasonable representation of what the setting does.

> It looks like we did, but I thought we were keeping that and adding a new level of snapshot detail?

We're not adding the detailed calculation; the current setting dumpSnapshot does the detailed calculation. We're trying to add a setting that doesn't do the detailed stuff, which is what dumpSnapshot used to do before it was commandeered for its current purpose.

> (I just mention it because I quickly found 9 downstream repos that use that setting. So, even just changing the spelling of that name would generate a lot of work that feels unnecessary if we can leave the name unchanged.)

Yes, we'd have to update the name of the setting in downstream repos. They're trivial PRs, but it is unnecessary work. I am arguing that the current setting name is confusing because it doesn't just dump data, it also runs additional calculations. But that might not outweigh the burden of changing a setting name. If that's the case, we can come up with another name for what this PR is doing. Maybe savePhysicsIO?

john-science · 2022-11-07T19:15:26Z

@mgjarrett Okay, this PR has gone on too long, sorry about that.

@ntouran agrees, though, that we should either:

a. Leave the dumpSnapshot setting 100% alone and just add a new setting, or
b. Entirely remove dumpSnapshot and use SettingRenamer.

Nick thinks you're probably right, that we should just rename the setting.
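Option (b) means old input files that still use the old name must keep working. The sketch below shows that idea generically; it is not ARMI's SettingRenamer API, and `renameSettings` and `RENAMED_SETTINGS` are hypothetical names. The mapping uses the dumpSnapshot → runDetailedSnapshot rename proposed earlier in the thread.

```python
import warnings

# Hypothetical old-name -> new-name table for the rename proposed in (b).
RENAMED_SETTINGS = {"dumpSnapshot": "runDetailedSnapshot"}


def renameSettings(userSettings: dict, renames: dict = RENAMED_SETTINGS) -> dict:
    """Return a copy of userSettings with deprecated names mapped to new ones."""
    updated = {}
    for name, value in userSettings.items():
        newName = renames.get(name, name)
        if newName != name:
            if newName in userSettings:
                raise ValueError(f"Both '{name}' and its new name '{newName}' are set")
            warnings.warn(
                f"Setting '{name}' is deprecated; use '{newName}' instead",
                DeprecationWarning,
            )
        updated[newName] = value
    return updated


# Usage: an old input file keeps working, with a warning.
oldInput = {"runType": "Snapshots", "dumpSnapshot": ["000000", "001000"]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    newInput = renameSettings(oldInput)
print(newInput)  # {'runType': 'Snapshots', 'runDetailedSnapshot': ['000000', '001000']}
```

Raising when both names are present avoids silently picking one value over the other.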

mgjarrett · 2022-11-07T22:51:44Z

Okay, I am happy to change the settings names to whatever we all agree on.

For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.

I'll leave dumpSnapshot alone in this PR; if we want to change it we could open another PR for that.

john-science · 2022-11-08T00:15:06Z

> For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.

It's up to you. It's your baby. I'm not sure IO means anything to me. But I don't have like a great alternative all lined up:

savePhysicsAtNodes
dumpNodeSnapshots
nodeSnapshots

The only thing we need is for the name (and the description field in globalSettings.py) to be as descriptive and helpful as possible for users.

mgjarrett · 2022-11-08T00:26:18Z

> For the new setting being implemented here, what is the favored name? I threw out savePhysicsIO but I am happy to hear other suggestions.
>
> It's up to you. It's your baby. I'm not sure IO means anything to me. But I don't have like a great alternative all lined up:
>
> savePhysicsAtNodes
> dumpNodeSnapshots
> nodeSnapshots
>
> The only thing we need is for the name (and the description field in globalSettings.py) to be as descriptive and helpful as possible for users.

Okay, I'm hesitant to use snapshot in the name now because users could confuse the functionality of dumpSnapshot with that of dumpNodeSnapshots or nodeSnapshots.

IO is supposed to mean Inputs and Outputs, but savePhysicsFiles might be more descriptive. I'll try that.

mgjarrett added 3 commits October 27, 2022 14:47

Update documentation.

0c47cdb

Test new directoryChangers copy feature.

fa07f3b

mgjarrett requested a review from jakehader October 27, 2022 22:23

mgjarrett added the enhancement New feature or request label Oct 27, 2022

mgjarrett marked this pull request as ready for review October 27, 2022 22:25

john-science self-requested a review October 28, 2022 02:03

john-science added feature request Smaller user request and removed enhancement New feature or request labels Oct 28, 2022

john-science reviewed Oct 28, 2022

armi/physics/executers.py (outdated, resolved)

mgjarrett marked this pull request as draft October 28, 2022 15:17

mgjarrett added 2 commits October 28, 2022 12:01

Use interfaceName for Executer output directory.

2002f10

Only copy files on DC exit if necessary.

46f80bb

mgjarrett added 4 commits October 31, 2022 10:40

Copy input files to directory also.

a2810e7

More changes to dumpSnapshot.

f76a843

Change runSnapshot to runDetailedSnapshot

93c6081

mgjarrett added 4 commits November 2, 2022 15:37

Update documentation.

32e4cbb

Merge branch 'main' into organizeOutputs

79db86e

Update release notes.

442dd39

Fix bug.

4a46b8c

Initialize snapshotList to None.

13fbef3

john-science marked this pull request as ready for review November 3, 2022 15:39

mgjarrett and others added 6 commits November 7, 2022 16:30

Change dumpSnapshot to new name savePhysicsIO

14ebc10

Change runDetailedSnapshot to dumpSnapshot

9ed26c1

Update docs

5889f52

Merge branch 'main' into organizeOutputs

63f6aeb

Fix typo

62e5ced

Merge branch 'main' into organizeOutputs

c602443

john-science approved these changes Nov 11, 2022

john-science merged commit d33b7f9 into terrapower:main Nov 11, 2022

mgjarrett mentioned this pull request Nov 14, 2022

Cycle node bug fix #963 (merged)

jakehader mentioned this pull request Dec 20, 2022

Extension of the framework nuclides #998 (merged)
