
Create a separate *.log file for the worker node *.stdouts. #528

Merged · 13 commits · Feb 3, 2022

Conversation

mgjarrett
Contributor

@mgjarrett commented Jan 10, 2022

Description

At the end of a run, all of the logs from the individual MPI processes get concatenated together and appended to the end of the *.stdout. If there are a lot of MPI processes, this can make the *.stdout obnoxiously bloated, with a bunch of information that the user would rarely ever want to look at.

Instead, this PR puts all of those worker node logs into a separate file, .\logs\armi-mpi-workers.log
It would probably be best if the file had the name of the case (i.e., the same as the *.stdout and *.stderr), but I don't know if that name is available to the logger at this level. It's just writing those to the streams sys.stdout and sys.stderr. If somebody has a better idea for naming I'm all ears; this was just kind of a temporary name to make sure the code worked.
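The routing described above can be sketched with the standard-library `logging` module. This is a hypothetical illustration of the idea, not ARMI's actual runLog implementation; the function name `setup_worker_log` and the handler layout are assumptions:

```python
import logging
import os

def setup_worker_log(mpi_rank, log_dir="logs"):
    """Route log output from MPI worker processes to a separate file.

    Hypothetical sketch: the master node (rank 0) keeps writing to
    stdout, while workers write to logs/armi-mpi-workers.log instead.
    """
    logger = logging.getLogger("armi")
    logger.setLevel(logging.INFO)
    if mpi_rank == 0:
        # Master node: keep the normal stdout stream so the *.stdout
        # only contains what the user usually wants to read.
        handler = logging.StreamHandler()
    else:
        # Worker nodes: divert to a shared worker log file.
        os.makedirs(log_dir, exist_ok=True)
        handler = logging.FileHandler(
            os.path.join(log_dir, "armi-mpi-workers.log")
        )
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger
```

In a real MPI run, `mpi_rank` would come from the communicator (e.g. `COMM_WORLD.Get_rank()` with mpi4py), so each process decides its own destination at startup.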


Checklist

  • Tests have been added/updated to verify that the new or changed code works.
  • Docstrings were included and updated as necessary
  • The code is understandable and maintainable to people beyond the author
  • There is no commented out code in this PR.

If user exposed functionality was added/changed:

  • Documentation added/updated in the doc folder.
  • New or updated dependencies have been added to setup.py.

@john-science
Member

@ntouran Michael has explained his reasoning, but I am curious what you think of this change to our logging.

@ntouran
Member

ntouran commented Jan 17, 2022

This is reasonable in concept for sure. I think it is worth trying to get the name synced up with the caseTitle though, because otherwise it will be easy to miss when archiving files, and multiple runs in the same dir (e.g. param sweeping) will overwrite one another. (Though honestly, running concurrently in the same dir is probably a bad idea regardless.)

@john-science
Member

I think it is worth trying to get the name synced up with the caseTitle...

@mgjarrett What do you think of Nick's idea?

@mgjarrett
Contributor Author

Yeah I think it would be best to have the caseTitle as the name, but I'm not sure exactly how it would be done and I haven't had time to look at it.

@john-science
Member

I think it is worth trying to get the name synced up with the caseTitle though....

I like this idea. In retrospect, it seems like the caseTitle should be in the output file name for all log files. That would have been a good design.

However... the caseTitle is not readily accessible in runLog.py right now. We could call getMasterCS(), but I'm fairly sure that will break a bunch of plugins and our docs, because people want to be able to use our runLogger in all sorts of very general situations.

Maybe that could be called inside concatLogs() since that is probably only ever called during a parallel/cluster-type run. That might not break things. But certainly, let's not call getMasterCS() anywhere else.

One question I have is WHY isn't this already set to caseTitle? I notice these two lines exist:

armi/cases/case.py:        runLog.LOG.startLog(self.cs.caseTitle)
armi/operators/operator.py:        runLog.LOG.startLog(self.cs.caseTitle)
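The naming concern could be addressed by threading the case title into the worker-log path. A minimal hypothetical helper to illustrate (the name `workerLogPath` and the layout are assumptions, not ARMI's actual API):

```python
import os

def workerLogPath(caseTitle, logDir="logs"):
    """Build a worker-log filename keyed to the case title.

    Hypothetical sketch: ties the worker log to the same naming as the
    *.stdout so that multiple runs in one directory (e.g. a param
    sweep) don't overwrite each other's worker logs.
    """
    return os.path.join(logDir, "{}-mpi-workers.log".format(caseTitle))
```

With something like this, concatLogs() could build the destination path from the case title it already has in scope, rather than hard-coding armi-mpi-workers.log.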

@onufer
Member

onufer commented Jan 20, 2022

I think startLog is the log in the execution directory. The logs that Mike is reading are in the fast path.

@ntouran
Member

ntouran commented Jan 24, 2022

Sorry for the late questions on this, but on the discussion level, can we talk a little bit more about the motivation of this? Ideally we'd answer these questions in implementation docstrings in the code itself so it lives on.

  • Having ~hundreds of files would be harder to manage and archive for qualified runs. Concatenating them into one provides a single record of the run's output and its pedigree (e.g. its hash), which is fairly convenient from a QA point of view. That was the motivation for concatenating them in the first place.
  • Can you explain in more detail how having lots of info in one big stdout is inconvenient? I typically just use find/filtering in my text editor to filter down to the node of interest.
  • Could we make whether or not to concatenate a user-facing option and leave it up to the users?

@mgjarrett
Contributor Author

mgjarrett commented Jan 24, 2022

The goal is not to have ~hundreds of files, but to separate the current stdout into:
1. The master node stdout, which has everything somebody might usually want to read from stdout, like CR worth, k-eff, assembly designs, etc.
2. A separate file that has all of the stdouts from the workers, which are usually very repetitive and don't have much new useful information in most cases.

The unit tests are passing, but I haven't yet verified that it works with MPI with the latest commit.

@mgjarrett
Contributor Author

The branchVerbosity setting might already be a satisfactory solution to the root problem, which is that the stdout tends to be too big when branchVerbosity is extra. warning or error are probably better options when the user doesn't care about these worker logs.

@onufer
Member

onufer commented Jan 24, 2022

There are times when the worker logs are useful... so I don't want to get rid of the data entirely. It's just that appending all the worker output to the main stdout doesn't really make the data usable, since a lot of the time you are just watching the stdout to see how your run is going as it updates, and the file gets like 50x larger.

Setting workers to "error" sounds like an okay solution, but a lot of the time you may want the data and just don't want to view it alongside the normal stdout. There should be an option to store the worker data without having to view it with the main stdout data.

@ntouran
Member

ntouran commented Jan 25, 2022

I say we go ahead with this PR. Since it's just 2 files, that's not a huge QA burden, and even with branchVerbosity options it can be a bit verbose. If stdout gets too big and you're trying to refresh it over a slow VPN, there's a good case to be made for keeping them separate.

I would like to see a follow-up ticket, maybe for our internal plugins, that says "get rid of all these totally useless xtra messages in the worker nodes" :)

@mgjarrett
Contributor Author

Just a heads up, we're still working on getting the caseTitle into the logger filenames, so it's not fully baked yet.

@john-science
Member

john-science commented Jan 25, 2022

I would like to see a follow-up ticket, maybe for our internal plugins, that says "get rid of all these totally useless xtra messages in the worker nodes" :)

Hear, hear!

As any ecosystem ages, it's important to clean out unused log messages, and push the rarely-used ones to debug-level.

@ntouran
Member

ntouran commented Jan 31, 2022

Ok so is it fully baked now after those last two commits? Ready to be merged?

@mgjarrett
Contributor Author

Needed one more commit because the __init__ for RunLogger was throwing away the caseTitle in MPI cases and just initializing a logger with the first piece of the name, which is just "ARMI". Running a case now to verify that everything works correctly when the files are concatenated at the end of the run.

@john-science
Member

black is complaining again.

@mgjarrett
Contributor Author

Is it something I can fix? It looks like a bunch of files unrelated to this PR.

@mgjarrett
Contributor Author

mgjarrett commented Feb 2, 2022

I have already run black.

(nala-dev) C:\Users\mjarrett\codes\nala\framework>black armi
All done! ✨ 🍰 ✨
406 files left unchanged.

If you look at the output from this black run, the changes are all to files that are not in this PR. That's why I don't know what I'm supposed to do here. Sometimes there is a problem with the black version changing, but I'm not sure whether that is what is happening here.

@john-science
Member

I have fixed the black issue in the ARMI master branch.

It was unrelated to this PR.

@john-science merged commit e8609c9 into terrapower:master on Feb 3, 2022