
Create a separate *.log file for the worker node *.stdouts. #528

Merged · 13 commits · Feb 3, 2022

Conversation

mgjarrett
Contributor

@mgjarrett commented Jan 10, 2022

Description

At the end of a run, all of the logs from the individual MPI processes get concatenated together and appended to the end of the *.stdout. If there are a lot of MPI processes, this can make the *.stdout obnoxiously bloated, with a bunch of information that the user would rarely ever want to look at.

Instead, this PR puts all of those worker node logs into a separate file, .\logs\armi-mpi-workers.log
It would probably be best if the file had the name of the case (i.e., the same as the *.stdout and *.stderr), but I don't know if that name is available to the logger at this level. It's just writing those to the streams sys.stdout and sys.stderr. If somebody has a better idea for naming I'm all ears; this was just kind of a temporary name to make sure the code worked.
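The routing described above can be sketched with the standard-library `logging` module. This is a hypothetical illustration of the idea, not ARMI's actual runLog implementation; the function name `setup_worker_log` and the handler layout are assumptions:

```python
import logging
import os

def setup_worker_log(mpi_rank, log_dir="logs"):
    """Route log output from MPI worker processes to a separate file.

    Hypothetical sketch: the master node (rank 0) keeps writing to
    stdout, while workers write to logs/armi-mpi-workers.log instead.
    """
    logger = logging.getLogger("armi")
    logger.setLevel(logging.INFO)
    if mpi_rank == 0:
        # Master node: keep the normal stdout stream so the *.stdout
        # only contains what the user usually wants to read.
        handler = logging.StreamHandler()
    else:
        # Worker nodes: divert to a shared worker log file.
        os.makedirs(log_dir, exist_ok=True)
        handler = logging.FileHandler(
            os.path.join(log_dir, "armi-mpi-workers.log")
        )
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger
```

In a real MPI run, `mpi_rank` would come from the communicator (e.g. `COMM_WORLD.Get_rank()` with mpi4py), so each process decides its own destination at startup.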


Checklist

  • Tests have been added/updated to verify that the new or changed code works.
  • Docstrings were included and updated as necessary
  • The code is understandable and maintainable to people beyond the author
  • There is no commented out code in this PR.

If user exposed functionality was added/changed:

  • Documentation added/updated in the doc folder.
  • New or updated dependencies have been added to setup.py.

@john-science
Member

@ntouran Michael has explained his reasoning, but I am curious what you think of this change to our logging.

@ntouran
Member

ntouran commented Jan 17, 2022

This is reasonable in concept for sure. I think it is worth trying to get the name synced up with the caseTitle though, because otherwise it will be easy to miss when archiving files, and multiple runs in the same dir (e.g. param sweeping) will overwrite one another. (Though honestly, running concurrently in the same dir is probably a bad idea regardless.)

@john-science
Member

I think it is worth trying to get the name synced up with the caseTitle...

@mgjarrett What do you think of Nick's idea?

@mgjarrett
Contributor Author

Yeah I think it would be best to have the caseTitle as the name, but I'm not sure exactly how it would be done and I haven't had time to look at it.

@john-science
Member

I think it is worth trying to get the name synced up with the caseTitle though....

I like this idea. In retrospect, it seems like the caseTitle should be in the output file name for all log files. That would have been a good design.

However... the caseTitle is not readily accessible in runLog.py right now. We could call getMasterCS(), but I'm fairly sure that will break a bunch of plugins and our docs, because people want to be able to use our runLogger in all sorts of very general situations.

Maybe that could be called inside concatLogs() since that is probably only ever called during a parallel/cluster-type run. That might not break things. But certainly, let's not call getMasterCS() anywhere else.

One question I have is WHY isn't this already set to caseTitle? I notice these two lines exist:

armi/cases/case.py:        runLog.LOG.startLog(self.cs.caseTitle)
armi/operators/operator.py:        runLog.LOG.startLog(self.cs.caseTitle)
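The naming concern could be addressed by threading the case title into the worker-log path. A minimal hypothetical helper to illustrate (the name `workerLogPath` and the layout are assumptions, not ARMI's actual API):

```python
import os

def workerLogPath(caseTitle, logDir="logs"):
    """Build a worker-log filename keyed to the case title.

    Hypothetical sketch: ties the worker log to the same naming as the
    *.stdout so that multiple runs in one directory (e.g. a param
    sweep) don't overwrite each other's worker logs.
    """
    return os.path.join(logDir, "{}-mpi-workers.log".format(caseTitle))
```

With something like this, concatLogs() could build the destination path from the case title it already has in scope, rather than hard-coding armi-mpi-workers.log.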

@onufer
Member

onufer commented Jan 20, 2022

I think startLog is the log in the execution directory. The logs that Mike is reading are in the fast path.

@ntouran
Member

ntouran commented Jan 24, 2022

Sorry for the late questions on this, but on the discussion level, can we talk a little bit more about the motivation of this? Ideally we'd answer these questions in implementation docstrings in the code itself so it lives on.

  • Having ~hundreds of files would be harder to manage and archive for qualified runs. Concatenating them into one provides a single record of the run's output and its pedigree (e.g. its hash), which is fairly convenient from a QA point of view. That was the motivation for concatenating them in the first place.
  • Can you explain in more detail how having lots of info in one big stdout is inconvenient? I typically just use find/filtering in my text editor to filter down to the node of interest.
  • Could we make whether or not to concatenate a user-facing option and leave it up to the users?

@mgjarrett
Contributor Author

mgjarrett commented Jan 24, 2022

The goal is not to have ~hundreds of files, but to separate the current stdout into:
1. The master node stdout, which has everything somebody might usually want to read from stdout, like CR worth, k-eff, assembly designs, etc.
2. A separate file that has all of the stdouts from the workers, which are usually very repetitive and don't have much new useful information in most cases.

The unit tests are passing, but I haven't yet verified that it works with MPI with the latest commit.

@mgjarrett
Contributor Author

The branchVerbosity setting might already be a satisfactory solution to the root problem, which is that the stdout tends to be too big when branchVerbosity is extra. warning or error are probably better options when the user doesn't care about these worker logs.

@onufer
Member

onufer commented Jan 24, 2022

There are times when the worker logs are useful... so I don't want to get rid of the data entirely. It's just that appending all the worker output to the main stdout doesn't really make the data usable, since a lot of the time you are just watching the stdout to see how your run is going as it updates, and the file gets like 50x larger.

Setting workers to "error" sounds like an okay solution, but a lot of the time you may want the data and just don't want to view it alongside the normal stdout. There should be an option to store the worker data without having to view it with the main stdout data.

@ntouran
Member

ntouran commented Jan 25, 2022

I say we go ahead with this PR. Since it's just 2 files, that's not a huge QA burden, and even with branchVerbosity options it can be a bit verbose. If stdout gets too big and you're trying to refresh it over a slow VPN, there's a good case to be made for keeping them separate.

I would like to see a follow-up ticket, maybe for our internal plugins, that says "get rid of all these totally useless xtra messages in the worker nodes" :)

@mgjarrett
Contributor Author

Just a heads up, we're still working on getting the caseTitle into the logger filenames, so it's not fully baked yet.

@john-science
Member

john-science commented Jan 25, 2022

I would like to see a follow-up ticket, maybe for our internal plugins, that says "get rid of all these totally useless xtra messages in the worker nodes" :)

Hear, hear!

As any ecosystem ages, it's important to clean out unused log messages, and push the rarely-used ones to debug-level.

@ntouran
Member

ntouran commented Jan 31, 2022

Ok so is it fully baked now after those last two commits? Ready to be merged?

@mgjarrett
Contributor Author

Needed one more commit because the __init__ for RunLogger was throwing away the caseTitle in MPI cases and just initializing a logger with the first piece of the name, which is just "ARMI". Running a case now to verify that everything works correctly when the files are concatenated at the end of the run.

@john-science
Member

black is complaining again.

@mgjarrett
Contributor Author

Is it something I can fix? It looks like a bunch of files unrelated to this PR.

@mgjarrett
Contributor Author

mgjarrett commented Feb 2, 2022

I have already run black.

(nala-dev) C:\Users\mjarrett\codes\nala\framework>black armi
All done! ✨ 🍰 ✨
406 files left unchanged.

If you look at the output from this black run, the changes are all to files that are not in this PR. That's why I don't know what I'm supposed to do here. Sometimes there is a problem with the black version changing, but I'm not sure whether that is what is happening here.

@john-science
Member

I have fixed the black issue in the ARMI master branch.

It was unrelated to this PR.

@john-science merged commit e8609c9 into terrapower:master on Feb 3, 2022