Synchronization hazards on multicore systems #6241

Closed
carli2 opened this issue Nov 24, 2023 · 13 comments

@carli2

carli2 commented Nov 24, 2023

The problem occurs on the following system:

OS: linux
Widelands versions: All since rendering and game logic have been separated into different threads.

CPU:

vendor_id	: AuthenticAMD
cpu family	: 23
model		: 113
model name	: AMD Ryzen 9 3900X 12-Core Processor
stepping	: 0
microcode	: 0x8701013
cpu MHz		: 4010.400
cache size	: 512 KB
physical id	: 0
siblings	: 24
core id		: 14
cpu cores	: 12
apicid		: 29
initial apicid	: 29
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
bogomips	: 7600.07
TLB size	: 3072 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

Now to the problem:
In normal operation, Widelands' game speed drops to 0.1 and below, even on small and medium maps. My CPU has 12 cores, 24 threads and runs at 4 GHz, so resources aren't the problem.

However, when I run Widelands with taskset -c 0 ./widelands, the performance is great.

What does taskset -c 0 do? It pins the process to a single CPU core (here core 0), so all of its threads take turns on that core under preemptive scheduling instead of running in parallel.
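For anyone who wants to reproduce the pinning from inside a test program rather than through taskset, here is a minimal Linux-only sketch. It uses the glibc calls sched_getaffinity and sched_setaffinity; the helper names (first_allowed_cpu, pin_to_cpu, allowed_cpu_count) are made up for this example and are not part of Widelands.

```cpp
// Linux-only sketch: restrict the calling thread to one CPU, similar in
// effect to starting the program with `taskset -c N`.
// All helper names below are hypothetical and exist only for this example.
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc)

// Returns the lowest-numbered CPU the caller may run on, or -1 on error.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Pins the calling thread to `cpu` (pid 0 means "the caller").
// Threads created afterwards inherit the mask, so calling this early in
// main() confines the whole program to that CPU.
bool pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Returns how many CPUs the caller may currently run on, or -1 on error.
int allowed_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu(first_allowed_cpu()) before any other thread is started should give roughly the same setup as the taskset run described above.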

What is the cause of Widelands' performance drops on multicore systems?

Widelands suffers from so-called synchronization hazards: two threads operate on the same piece of memory at the same time. Here, for example, the GUI thread renders a warehouse statistic while the game logic thread updates the numbers in that statistic.

When that happens, the cores keep passing the affected cache lines back and forth: each write invalidates the other core's copy, which must then be fetched again. This can take on the order of 100 times as long as in single-core mode, where no cache lines have to be exchanged.
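The cost of cache lines moving between cores can be measured with a small experiment. The sketch below is not Widelands code and does not prove that this effect causes the slowdown; it only times two writer threads whose counters either share one 64-byte cache line (the cache_alignment value from the cpuinfo above) or sit on separate lines. The type and function names are invented for the example.

```cpp
// Sketch: compare two threads writing to counters in the same cache line
// with two threads writing to counters in separate cache lines.
// SharedLine, SeparateLines and time_two_writers are hypothetical names.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr std::size_t kCacheLine = 64;  // assumption: taken from the cpuinfo output

// Both counters are guaranteed to live in one cache line.
struct alignas(kCacheLine) SharedLine {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Each counter starts its own cache line.
struct SeparateLines {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};

// Runs one thread incrementing c.a and another incrementing c.b, each
// `iterations` times, and returns the elapsed wall-clock time in seconds.
template <typename Counters>
double time_two_writers(Counters& c, std::uint64_t iterations) {
    auto bump = [iterations](std::atomic<std::uint64_t>& counter) {
        for (std::uint64_t i = 0; i < iterations; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    const auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { bump(c.a); });
    std::thread t2([&] { bump(c.b); });
    t1.join();
    t2.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Running time_two_writers on a SharedLine and on a SeparateLines instance, once normally and once pinned to a single core, gives a rough idea of how much the exchange of cache lines costs on a given machine.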

There are multiple ways to fix this issue:
a) remove the multicore code and go back to single-core processing
b) separate the memory accesses either in space or in time
b.1) in space -> after the game logic has updated the map, the whole map is copied into a second instance on which the GUI thread then operates (so both threads work on completely disjoint parts of memory)
b.2) in time -> the multicore code is kept, but logic and GUI are synced so that the logic thread waits for the GUI thread to finish a render cycle before starting the next logic cycle. A logic cycle can span multiple GUI cycles, but at least during "normal" game operation with enough CPU resources, the sync should avoid most synchronization hazards.

@carli2 carli2 added the bug Something isn't working label Nov 24, 2023
@tothxa tothxa added memory & performance Fix bottlenecks, memory leaks, ASan, ... multithreading Deadlocks, race conditions, mutex trouble, … labels Nov 24, 2023
@hessenfarmer
Contributor

hessenfarmer commented Nov 24, 2023

Thanks for the explanations. This might help a lot in fixing the current performance problems.
However, reverting the multithreading will not be an option IMHO, as the reason for introducing it was the massive performance problems in our earlier single-threaded design.
Edit:
As far as I understand it, the implementation of our mutexes aims at achieving exactly b.2).

@bunnybot

frankystone (mirrored from Codeberg)
On Fri Nov 24 17:13:19 CET 2023, frankystone wrote:


Thanks carli2.

On Fri Nov 24 13:24:27 CET 2023, hessenfarmer wrote:

However reverting the multithreading will not be an option imho as the reason for introducing it were massive performance problems in our single threaded design before.

Are you really sure about that? As far as I remember, there were no performance issues, except with debug builds. But even with multithreading, debug builds tend to slow down after a while. IMHO, the reason for implementing multithreading was just this bug report (originally from 2016). On my old laptop with a Celeron 900 processor, Widelands always ran fine. I will test Widelands on this laptop again.

Please don't get me wrong, I'm not against multithreading, but I have the feeling it has more drawbacks than benefits.

Anyway: isn't this a duplicate of #6230? Or is this another issue? At least @hessenfarmer should check his processor specs, because he had such an issue (lags) as well. Here are mine (no issue):

vendor_id       : AuthenticAMD
cpu family      : 25
model           : 33
model name      : AMD Ryzen 5 5600 6-Core Processor
stepping        : 2
microcode       : 0xa20120a
cpu MHz         : 2200.000
cache size      : 512 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
[...]

@carli2
Author

carli2 commented Nov 24, 2023

I quote from #2056:

c) Redesign major games structures for Multithreading

Well, this is a huge task of its own

This should have been done before someone just added multithreading to such a codebase.

@hessenfarmer
Contributor

I tested limiting the process to one core on Windows as well, and I could still provoke the lags, even in single player, with the inventory and two warehouse windows open.
So I think it is a mutex race or something like that and, at least on Windows, unrelated to whether the threads run on one or multiple cores.

@hessenfarmer
Contributor

Furthermore, even 1.1 had multithreading enabled and did not suffer from such lagging, so this is an issue in 1.2, perhaps related to multithreading but not originally caused by it. So I believe this issue is misleading here.

@bunnybot

frankystone (mirrored from Codeberg)
On Sun Nov 26 11:05:45 CET 2023, frankystone wrote:


@carli2, can you provoke the issue with v1.1?

@carli2
Author

carli2 commented Nov 27, 2023

I took the time to compile good old release 1.1. Here are the results:

image

I tried both single-core and multicore. This time, in both cases, the game ran fine for the first few seconds (speed 22.0) but then became stuck and laggy. Returning to game speed 1.0 didn't fix the lag. The lag stayed until I saved, quit and reloaded the game.

@carli2
Author

carli2 commented Nov 27, 2023

Another interesting fact:

If I just save the game, leave it and then re-enter it by loading the savegame, the lag stays.

However, if I close Widelands, the lag disappears for the first few minutes.

@frankystone
Contributor

Thanks for the test with 1.1.

Playing at speed 22 will probably always slow down the game. You may have had many warnings in the console like "Warning: Gamespeed too high for the AI, it is past x sec now" (or similar).

Did you test the lags caused by opening inventory and warehouse windows?

@carli2
Author

carli2 commented Nov 28, 2023

I did not test the difference between with/without warehouse. Do you need the test on 1.1?

@frankystone
Contributor

I did not test the difference between with/without warehouse. Do you need the test on 1.1?

Yes. Since hessenfarmer can't provoke the lags with open inventory and warehouse windows in v1.1, it would be interesting to know whether you can. If not, then the issue is somewhere in v1.2; if the lags do happen on your machine with v1.1, then this is really a different problem, IMHO.

@hessenfarmer
Contributor

As we found a solution for #6230, which might have been related and which got into 1.2, could you, @carli2, confirm whether this issue is still relevant or whether we may close it?

@carli2
Author

carli2 commented May 3, 2024

I tested it extensively and it seems to run without lag now.

@carli2 carli2 closed this as completed May 3, 2024
@bunnybot bunnybot added this to the v1.2 milestone May 3, 2024