Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preload libsuseconnect.so if available (bsc#1194996) #1236

Merged
merged 1 commit into from
Jan 25, 2022

Conversation

skazi0
Copy link
Contributor

@skazi0 skazi0 commented Jan 24, 2022

On aarch64 installer/YaST sometimes failed to load libsuseconnect.so
with "cannot allocate memory in static TLS block" error.
Loading the library before others solves the problem until a better
solution is found.

Copy link
Member

@dgdavid dgdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @skazi0!

I have asked review from some other members in the team with better knowledge than me in that area. Meanwhile, do you mind bumping version and updating the changelog? It is just a matter of following instructions given in the 4th step at https://yastgithubio.readthedocs.io/en/latest/contributing/#code-changes 😀

Of course, I can help if needed after it got reviewed and approved, but it would be nice if you at least give it a try 😉

On aarch64 installer/YaST sometimes failed to load libsuseconnect.so
with "cannot allocate memory in static TLS block" error.
Loading the library before others solves the problem until a better
solution is found.
@skazi0 skazi0 force-pushed the libsuseconnect-aarch64-preload branch from 3cbd9b6 to 6a14acd Compare January 24, 2022 21:06
@skazi0
Copy link
Contributor Author

skazi0 commented Jan 24, 2022

@dgdavid done (and sorry for not reading the contributing guide before).

@coveralls
Copy link

Coverage Status

Coverage remained the same at 41.594% when pulling 6a14acd on skazi0:libsuseconnect-aarch64-preload into ad8d240 on yast:master.

@dgdavid
Copy link
Member

dgdavid commented Jan 24, 2022

@dgdavid done

Thanks!

(and sorry for not reading the contributing guide before).

No problem at all 😃

Copy link
Member

@lslezak lslezak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! (workaround)

@lslezak lslezak merged commit a0491ba into yast:master Jan 25, 2022
@yast-bot
Copy link
Contributor

✔️ Public Jenkins job #353 successfully finished
✔️ Created OBS submit request #948941

@yast-bot
Copy link
Contributor

✔️ Internal Jenkins job #190 successfully finished
✔️ Created IBS submit request #263042

@mvidner
Copy link
Member

mvidner commented Jan 25, 2022

BTW from pytorch/builder#758 (comment) I guess that a possible solution is to not preallocate memory within libsuseconnect. But I don't know its implementation details.

@skazi0
Copy link
Contributor Author

skazi0 commented Jan 25, 2022

@mvidner see my comments on https://bugzilla.suse.com/show_bug.cgi?id=1194996
The TLS variable is from Go runtime. I don't think we can change this part.

@kobliha
Copy link
Member

kobliha commented Mar 2, 2022

It seems this PR is causing quite some issues, e.g., https://bugzilla.suse.com/show_bug.cgi?id=1196326

@shundhammer shundhammer changed the title Preload libsuseconnect.so if avaiable (bsc#1194996) Preload libsuseconnect.so if available (bsc#1194996) Mar 8, 2022
@shundhammer
Copy link
Contributor

shundhammer commented Mar 8, 2022

Since this causes follow-up problems, we decided to revert this.

https://bugzilla.suse.com/show_bug.cgi?id=1196326

https://bugzilla.suse.com/show_bug.cgi?id=1194996#c11

The workaround for this was to set LD_PRELOAD in the YaST start script if libsuseconnect.so was found, but that causes new problems: This is inherited by child processes, and now every binary started from YaST also gets this LD_PRELOAD, so they are now also force-linked to libsuseconnect.so.

YaST uses external programs extensively; among lots of other things, for storage probing. That means that in many cases, storage probing now fails because of this; mostly due to out of memory because libsuseconnect.so (and the Go runtime libs that it uses?) consumes a lot more RAM.

Not only are we always on the verge of OOM anyway (and we are at the end of the food chain after the kernel, kernel modules, kernel firmware, glibc, libQt and dozens of others), but who knows what other problems this causes. This needs to stop.

It has been suggested that we should clean up LD_PRELOAD again after the YaST start. But after an internal discussion, we decided against that: It would be adding a hack on top of a hack; a workaround for a workaround. And that might create even more problems.

The original problem appears in a low-memory scenario only on aarch64; and now the first hack spread the problem to other architectures as well.

This clearly needs a real solution, not adding hacks on top of hacks. The original problem has been known since early 2019; it even made it to Stack Overflow, where they suggest do set LD_PRELOAD as a workaround.

After almost 3 years (maybe longer), there must be a real fix for this. It really can't be an acceptable solution to spread those workarounds all over all kinds of software; other projects will have the same problem as well.

LD_PRELOAD is a debugging tool for desperate situations, a last-ditch effort; it can't be the regular way to go. But recommendations on the web seem to indicate just that for the aarch64 platform.

It doesn't help the Open Source world if we keep fighting the symptoms and disguise the real problem; this basically forces everyone who ever uses any Golang-based lib to do the same hacky approach, and most of them won't even be aware of follow-up problems for child processes.

That is why we are going to revert the LD_PRELOAD change. This needs a true fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants