Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode help-*.txt file contents into C source code #13144

Merged
merged 2 commits into from
Mar 25, 2025

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Mar 15, 2025

Per discussion from https://github.com/open-mpi/ompi/wiki/Meeting-2025-03-14, we would like to make relocating an Open MPI installation easier. Specifically, text help files are something that Open MPI currently has to find in the install tree filesystem at run-time. If, instead, we can slurp these text files into C code, then there's nothing to find at run time.

That being said, even with this change, developers will still maintain help messages in the various help_*.txt files around the code base. Maintaining descriptive, user-friendly help messages in text files (instead of manually hand-coding long strings in C) has proven to be quite useful. We do not want to lose this capability.

This commit adds a step during "make" (in opal/util), those help_*.txt files are encoded into an indexed array of C strings in opal/util/show_help_content.c.

opal_show_help*() then can look up the appropriate help strings via filename / topic tuples, just like it used to -- these strings now just happen to be in C variables instead of text files.

This work is intended for main / v6.0.x -- not for v5.0.x.

@jsquyres jsquyres added this to the v6.0.0 milestone Mar 15, 2025
@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 30cbe0c to 1cf2f47 Compare March 15, 2025 18:09
@rhc54
Copy link
Contributor

rhc54 commented Mar 15, 2025

Suggestion: have you looked at the PMIx dictionary files (src/include/pmix_dictionary.[h,c])? I generate those (using contrib/construct_dictionary.py) from the include file from the standard, generating an array of structs using the string key as the index.

I was thinking of doing the same thing here. Let the name of the help file be the first field in the struct, then the topic be the second field, and then the text be the final field. This leaves the "show_help" signature the same so we don't have to edit and change all those usages. Now show_help itself just scans thru the array, checking the first field in each struct for the file name and the second for the topic - and then extracts the corresponding block.

Python isn't much different - just stored the results a little differently.

@jsquyres
Copy link
Member Author

Suggestion: ...

Good idea; that's far better than what I did the first time. Let me update the PR to do that...

@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 1cf2f47 to 79d68d7 Compare March 16, 2025 20:07
@jsquyres
Copy link
Member Author

Let me update the PR to do that...

Done. Updated the description in the PR to reflect the new scheme.

@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 79d68d7 to 0d40934 Compare March 16, 2025 20:14
@rhc54
Copy link
Contributor

rhc54 commented Mar 16, 2025

This looks good to me - many thanks for doing it!!

@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 9bd226b to 7e4828f Compare March 18, 2025 14:58
@jsquyres
Copy link
Member Author

Force pushed to fix an OPAL_DECLSPEC problem and some other minor touch-ups.

@rhc54
Copy link
Contributor

rhc54 commented Mar 21, 2025

we'll be adding about ~350KB worth of strings to each process.

Ick - probably too large, IMO. Here's what I'm doing in PMIx/PRRTE as a compromise:

  1. running "check-help-strings.pl" (from OMPI's contrib) to identify problems in the help filesystem. Just a quick scan on PMIx revealed lots of unused topics, some unused files(!), and a number of missing (but referenced) topics. Modifying it to identify duplicate topics as well.
  2. Clean all those up
  3. Identify a handful of topics that might relate to being unable to find the install location or initialize the library. Primary complaint is when we hit those errors and the user can't get a useful error output. Put those handful into memory
  4. Leave the rest as a file lookup. Probably consolidate all current files into one pmix-helpfile.txt to reduce the search complexity.
  5. Add running "check-help-strings.pl" to the CI with an error-out if we find unused topics/files, missing referenced topics, or duplicate topics

Will do the same to PRRTE. Little more complicated there as it uses the PMIx show_help subsystem, so I'll have to add a new PMIx function to allow PRRTE to load help topics/strings into the PMIx in-memory array for those few we need there.

If you feel like/want to convert check-help-strings.pl to Python, that's fine with me! I already extended your code to identify duplicate topics - haven't added the source code scan for the other functional pieces.

@bwbarrett
Copy link
Member

For static linked applications, the 300 KiB is painful. For shared library installations, I'd argue that 300 KiB doesn't make a difference. We should definitely clean up the text files, but in the end, not sure it matters to customers. From a filesystem standpoint, we've just moved 300 KiB from a bunch of text files to the shared library. The total size to install a functioning libmpi.so doesn't really change. And since it's read-only text section in the shared library, it will only exist in memory once. Even on a Raspberry Pi, 300 KiB more memory is noise.

I guess I'm saying we should clean things up, but not go overboard. The estimate is also a little high, because I didn't filter out the copyright headers, and they wouldn't be copied. There's also some things we could theoretically do to claw back 50 KiB or so, like instead of slurping all the text files into the primary shared libraries, slurping the application help files (which are 3 of the 8 largest files) into the applications (prun, palloc, pctrl), which would move 10 KiB out of the various shared libraries.

There are also text files like opal/mca/accelerator/cuda/help-accelerator-cuda.txt that are very duplicative. I think we could pass in the function call name that failed and save a bunch of entries there.

@ggouaillardet
Copy link
Contributor

My two cents: an other option to mitigate the memory overhead is to compress the data decompress it in memory only if needed.

@bwbarrett
Copy link
Member

Compression seems like a lot of work, but also would help. tmp.tar contains all the .txt files for PMIx, PRRTE, OPAL, and OMPI.

% ls -lh tmp.tar 
-rw-r--r-- 1 bbarrett Domain Users 420K Mar 21 15:32 tmp.tar
[bbarrett@ip-10-0-1-96:~/projects/open-mpi]
% gzip tmp.tar 
[bbarrett@ip-10-0-1-96:~/projects/open-mpi]
% ls -l tmp.tar*
-rw-r--r-- 1 bbarrett Domain Users 77903 Mar 21 15:32 tmp.tar.gz

@ggouaillardet
Copy link
Contributor

Agree, we should first assess if we really need it.
And if we do, use the compress framework, I suspect xz or even bzip2 with max compression can outperform default gzip.

@rhc54
Copy link
Contributor

rhc54 commented Mar 21, 2025

I have no issue with putting these into memory. In truth, you should not be counting the PRRTE text as that is only in the PRRTE tools - the apps do not link against libprrte. I've spent some time cleaning up the PMIx help text and will finish that in the upcoming week. It's the lesser footprint of the two projects.

I'll still need to modify PMIx to allow PRRTE to load its help text into the PMIx array. I also need to do something to resolve the rst vs txt issue - need to stop having to do everything twice, but the prior way of trying to generate the txt from the rst wasn't working well enough. Might have a solution - remains to be tried.

I also need to do something to simplify the rst stuff. Really isn't working in terms of managing things right now - the OMPI schizo rst is out-of-date because it is unmaintained, I have to split everything into tiny individual rst files, mpirun's man page is out-of-date, etc. Feels like a lot of extra work on my side for no real benefit as nobody maintains the other end. Need to ponder it a bit, come up with a better strategy.

Bottom line is that this is turning into quite the job, probably going to take another 2-3 weeks to complete. I'm aiming it at PMIx v6 and PRRTE v4 as it will not be backward compatible on either side. We have a break anyway due to the LTO work, so this should be good timing.

@jsquyres
Copy link
Member Author

FWIW, I did my own check of the main branch and I came up with a smaller number:

$ wc -c `find . -name help\*.txt | grep -v 3rd-party | grep -v contrib/` 
...
  87252 total

~87K, not 350K.

So where do we want to go with this? I'm not sure that 87K is worth using compression.

@rhc54
Copy link
Contributor

rhc54 commented Mar 23, 2025

I would avoid compression as it creates dependencies (remember, lack of zlib is our highest show-help incidence) and really isn't worth it. Like I said, you can remove all of the PRRTE text from your count as that only applies to the daemons - and it is the largest amount as it includes all the help for map/rank/bind and other major topics. PMIx really doesn't have very much itself, and that's the portion you'd have in your library.

@jsquyres
Copy link
Member Author

87k is just the opal, ompi, and oshmem directories.

@rhc54
Copy link
Contributor

rhc54 commented Mar 23, 2025

Ah, I see - didn't grok the "-v" option behavior.

FWIW: PMIx has about 30k if you exclude the PMIx tools themselves.
PRRTE has about 130k is you exclude the tool cmd line help text - most of that is in text explaining proc placement.

@jsquyres jsquyres marked this pull request as ready for review March 24, 2025 22:35
@bwbarrett
Copy link
Member

Other than a couple nits around the build system, this all looks good to me.

@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 7e4828f to 88a449a Compare March 25, 2025 00:25
@jsquyres
Copy link
Member Author

  • Rebased
  • Updated per @bwbarrett's comments
  • Made all the remaining Makefile.am changes

Per discussion from
https://github.com/open-mpi/ompi/wiki/Meeting-2025-03-14, we would
like to make relocating an Open MPI installation easier.
Specifically, text help files are something that Open MPI currently
has to find in the install tree filesystem at run-time.
If, instead, we can slurp these text files into C code, then there's
nothing to find at run time.

That being said, even with this change, developers will still maintain
help messages in the various help_*.txt files around the code base.
Maintaining descriptive, user-friendly help messages in text files
(instead of manually hand-coding long strings in C) has proven to be
quite useful.  We do not want to lose this capability.

This commit adds a step during "make" (in opal/util), those help_*.txt
files are encoded into an indexed array of C strings in
opal/util/show_help_content.c.

opal_show_help*() then can look up the appropriate help strings via
filename / topic tuples, just like it used to -- these strings now
just happen to be in C variables instead of text files.

This work is intended for main / v6.0.x -- not for v5.0.x.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
Commit a525e70 converted opal_show_[v]help into global function
pointer variables that were re-assigned a few times during the
lifecycle of an application.  This allows calls top
opal_show_[v]help() to emit error messages different ways at different
times -- e.g., before various subsystems were initialized, after they
were initialized, and after they were shutdown.

Long ago, however, the show_help infrastructure was converted to use
PMIx under the covers, and this switcharoo was no longer necessary.
Specifically: show_help always just calls PMIx and PMIx does the Right
Things during at different stages of the process lifecycle.  This
commit changes opal_show_[v]help back to be a regular function -- not
a function pointer.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
@jsquyres jsquyres force-pushed the pr/help-files-as-c-code branch from 88a449a to d1bf286 Compare March 25, 2025 01:11
@jsquyres jsquyres merged commit 674f46c into open-mpi:main Mar 25, 2025
15 checks passed
@jsquyres jsquyres deleted the pr/help-files-as-c-code branch March 25, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants