Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wesnoth uses 100% CPU unless OMP_WAIT_POLICY is manually set #4130

Open
marmistrz opened this issue Jun 21, 2019 · 12 comments

Comments

Projects
None yet
6 participants
@marmistrz
Copy link

commented Jun 21, 2019

TL;DR: The workaround is:

export OMP_WAIT_POLICY=passive
wesnoth

This is different from #1474, because that issue is not OpenMP-related.
This issue was already mentioned at the wesnoth forums, but is missing from the bugtracker. It looks like some of the Windows scripts (e.g. cwesnoth.cmd) already take this variable into account, but this is not the case on Linux.

The gcc documentation mentions that

Specifies whether waiting threads should be active or passive. If the value is PASSIVE, waiting threads should not consume CPU power while waiting; while the value is ACTIVE specifies that they should. If undefined, threads wait actively for a short time before waiting passively.

In other words, it looks like wesnoth has a lot of frequent but light computation, which makes the OpenMP threads spinlock all the time. On the other hand, why would idling in a hot-seat match require that much computation?

@marmistrz marmistrz changed the title Wesnoth uses 100% CPU unless OMP_WAIT_POLICY is changed Wesnoth uses 100% CPU unless OMP_WAIT_POLICY is manually set Jun 21, 2019

@gfgtdf

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2019

Iirc we removed the omp code in master so this should not longer happen in 1.15

@sevu

This comment has been minimized.

Copy link
Member

commented Jun 21, 2019

It probably was reverted with the new master.

@soliton-

This comment has been minimized.

Copy link
Member

commented Jun 21, 2019

Indeed there are still parts left in src/display.cpp in master which should presumably just be removed.

1.14 already sets OMP_WAIT_POLICY to PASSIVE and restarts itself if it isn't set.

wesnoth/src/wesnoth.cpp

Lines 1057 to 1081 in 3771ac0

#ifdef _OPENMP
// Wesnoth is a special case for OMP
// OMP wait strategy is to have threads busy-loop for 100ms
// if there is nothing to do, they then go to sleep.
// this avoids the scheduler putting the thread to sleep when work
// is about to be available
//
// However Wesnoth has a lot of very small jobs that need to be done
// at each redraw => 50fps every 2ms.
// All the threads are thus busy-waiting all the time, hogging the CPU
// To avoid that problem, we need to set the OMP_WAIT_POLICY env var
// but that var is read by OMP at library loading time (before main)
// thus the relaunching of ourselves after setting the variable.
#if !defined(_WIN32) && !defined(__APPLE__)
if(!getenv("OMP_WAIT_POLICY")) {
setenv("OMP_WAIT_POLICY", "PASSIVE", 1);
execv(argv[0], argv);
}
#elif _MSC_VER >= 1600
if(!getenv("OMP_WAIT_POLICY")) {
_putenv_s("OMP_WAIT_POLICY", "PASSIVE");
restart_process();
}
#endif
#endif //_OPENMP

Not sure if that happens in every situation that openmp could be enabled though.

@marmistrz

This comment has been minimized.

Copy link
Author

commented Jun 22, 2019

@soliton- I'm experiencing the issue on v1.14.7, on Arch Linux.

@soliton-

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

Well, perhaps you can figure out what's going on in more detail. The above code means that for Arch Linux the part between line 1070 and 1075 is run which sets OMP_WAIT_POLICY to PASSIVE and restarts itself unless OMP_WAIT_POLICY is already set to something.

Is OMP_WAIT_POLICY set to something before starting wesnoth? If echo ${OMP_WAIT_POLICY+set} outputs set then it is.

You could also run wesnoth under strace: strace -ff -e process wesnoth and see if you can verify that it does or doesn't restart itself.

@marmistrz

This comment has been minimized.

Copy link
Author

commented Jun 24, 2019

Notice that wesnoth doesn't check if execv syscall error code. strace shows that it fails:

execve("wesnoth", ["wesnoth"], 0x564cea84fc30 /* 52 vars */) = -1 ENOENT (No such file or directory)

OTOH, the execve syscall succeeds if the game is launched using the full path, i.e. as /usr/bin/wesnoth.

@soliton-

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

So wesnoth is not able to restart itself. We should perhaps abort at that point or at least show an error message. Can you show more of the strace output? What does the first execve look like?

@jostephd

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

execve doesn't do a path search. execve("/usr/games/wesnoth", ...) (or whatever the path to the wesnoth binary is) would be valid; execvpe("wesnoth", ...) would be valid; execve("wesnoth") is not. (But execvpe is a GNU extension, it's not portable.)

@soliton-

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

That is why I want to see the first execve to see how wesnoth was started initially. Specifically what's in argv[0]. Though it's likely going to be just "wesnoth" so it seems this restart code is not particularly useful to begin with. Probably was only tested with the wesnoth binary in the current working directory.

@soliton-

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

A trivial fix is probably to just use execvp instead of execv.

@GregoryLundberg

This comment has been minimized.

Copy link
Contributor

commented Jun 25, 2019

Seems to me the better fix would be to backport the removal of OpenMP and make the problem go away completely.

@marmistrz

This comment has been minimized.

Copy link
Author

commented Jul 7, 2019

It's how the calls look like from strace:

execve("/usr/bin/wesnoth", ["wesnoth"], 0x7ffdedeefb08 /* 51 vars */) = 0
execve("wesnoth", ["wesnoth"], 0x55741da1bc30 /* 52 vars */) = -1 ENOENT (No such file or directory)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.