Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Game is stuttering #4963

Open
frankystone opened this issue Jun 20, 2021 · 19 comments
Open

Game is stuttering #4963

frankystone opened this issue Jun 20, 2021 · 19 comments
Labels
memory & performance Fix bottlenecks, memory leaks, ASan, ... multithreading Deadlocks, race conditions, mutex trouble, …
Milestone

Comments

@frankystone
Copy link
Contributor

After long time not playing i started a new game today. But i am a bit disappointed about the speed... The game noticeable stutters some time. Animals and carriers are walking not with the same speed during a single walk. Placing a building can last a while (click a building to place, but after some noticeable milliseconds the construction appears on the map) and it seems to me that even the mouse does not react smooth.

This is most noticeable when setting Max-FPS to 99. Setting this to a value of 30 the game reacts a little bit smoother, but playing at 2x speed it feels like the game speed switches between 1x and 2x speed in irregular intervals.

To Reproduce
What i have done:

  1. Chosen the map Astoria 2.0
  2. Set myself as Player 4/Empire
  3. Changed Playerslot 1 to AI and Barbarians
  4. Set HQ-Hunter wincondition
  5. Started the game

Right after start one can see the Animals stumbling around. But i remember this was also in versions before releasing 1.0 (the versions i played before this time where all before version 1.0, but always current master).

Expected behavior
A smooth game play.

Version:

  • OS: archlinux
  • Widelands Version: [d94928c@master] (Debug)

Maybe because i use a debug build? This wasn't a problem so far, except on really big maps. Will also test a release build soon.

@frankystone frankystone added the bug Something isn't working label Jun 20, 2021
@Noordfrees
Copy link
Member

The speed is the same as it used to be (debug builds on large maps tend to be very slow), but it is more noticeable now than before due to multithreading. In 1.0, Widelands simply hangs for a brief (more or less) interval every time the game state is progressing by a certain number of milliseconds; whereas now the game is redrawn all the time, which has the side-effect that the continuous alterations in the maximum game speed your CPU can actually handle (and which may be lower than the desired speed setting), as well as the delays between a click and its effect, are more noticeable.

The largest portion of those slow-speed periods are filled by the AI – which can not be reasonably multithreaded – and the economy routing code – which I intend to make multithreading-compatible in the near future which will improve performance.

@hessenfarmer
Copy link
Contributor

I simply do not understand enough to know what has to be done to make the Ai thread safe. Discussing such things would be one big benefit of the proposal of @klaus-halfmann to have a devcon. So if somebody could teach me some principles I might help with my understanding of the AI code

@Noordfrees
Copy link
Member

Noordfrees commented Jul 16, 2021

The AI could be run pretty much independent from everything else already except for one thing: The state of the game might change while the AI is analysing it. For example look at any piece of AI code:

widelands/src/ai/defaultai.cc

Lines 4038 to 4062 in f305643

bool DefaultAI::create_shortcut_road(const Widelands::Flag& flag,
uint16_t checkradius,
const Time& gametime) {
// Increasing the failed_connection_tries counter
// At the same time it indicates a time an economy is without a warehouse
// TODO(Nordfriese): Someone should update the code since the big economy splitting for the
// ferries
EconomyObserver* eco = get_economy_observer(flag.economy(Widelands::wwWORKER));
// if we passed grace time this will be last attempt and if it fails
// building is destroyed
bool last_attempt_ = false;
// this should not happen, but if the economy has a warehouse and a dismantle
// grace time set, we must 'zero' the dismantle grace time
if (!flag.get_economy(Widelands::wwWORKER)->warehouses().empty() &&
eco->dismantle_grace_time.is_valid()) {
eco->dismantle_grace_time = Time();
}
// first we deal with situations when this is economy with no warehouses
// and this is a flag belonging to a building/constructionsite
// such economy must get dismantle grace time (if not set yet)
// end sometimes extended checkradius
if (flag.get_economy(Widelands::wwWORKER)->warehouses().empty() && flag.get_building()) {

With the current implementation, the game logic freezes while the AI's code is running, guaranteeing that the game state is the same at all lines. But if we put the AI into its own thread, then it may change at any point while the AI is running. For example, in this code snippet flag is initially valid, but if this flag happens to be deleted e.g. because another player conquers some land then it may become a dangling reference at any moment so it may e.g. be still valid in line 4046 and suddenly segfault in line 4053. The gametime reference will also keep changing its value.

The solution is to add an appropriate MutexLock in every section of code that needs to assume that a certain object will not change while the code is executing, like so:

void InputQueueDisplay::set_priority(const Widelands::WarePriority& priority) {
MutexLock m(MutexLock::ID::kObjects);
Widelands::Building* b = building_.get(ibase_.egbase());
if (!b) {
return;
}
if (!can_act_ || !has_priority_ ||
priority == (queue_ ? b->get_priority(type_, index_) : get_setting()->priority)) {
return;
}
if (Widelands::Game* game = ibase_.get_game()) {
game->send_player_set_ware_priority(*b, type_, index_, priority, settings_ != nullptr);
} else {
if (queue_) {
b->set_priority(type_, index_, priority);
} else {
get_setting()->priority = priority;
}
}
}

This lock ensures that no changes related to map objects will happen while the function is running, so we can guarantee that the building pointer does not suddenly turn invalid at some point.

The problem with the current AI code is that is uses object pointers that stay in existence for a very long time. It would be necessary to create large-scoped mutex locks, which would make the thread so inefficient that having a separate thread would offer virtually no benefits in the first place. So the code would need to be restructured to use OPtr instead of real pointers/references a lot, and create a MutexLock whenever an actual pointer/reference needs to be obtained, and drop all assumptions that a pointer that is valid at a point is still valid a few lines later unless it's guarded with a MutexLock.

If you like I can create a branch to give the AI its separate thread and you can then look at the crashes and fix each one symptomatically until there are none left… which is pretty much what I did to make the rest of the codebase threadsafe :D

@Noordfrees Noordfrees added memory & performance Fix bottlenecks, memory leaks, ASan, ... multithreading Deadlocks, race conditions, mutex trouble, … and removed bug Something isn't working labels Jul 16, 2021
@Noordfrees Noordfrees added this to the v1.1 milestone Jul 16, 2021
@hessenfarmer
Copy link
Contributor

I would appreciate If you could create such a branch for me. Although I don't know how far I will get with it. At least I might learn something. Furthermore I wanted to try to restructure the AI code anyway to make it more readable as it is currently to unstructured imho.

@Noordfrees
Copy link
Member

Noordfrees commented Jul 16, 2021

Branch is up: https://github.com/widelands/widelands/tree/multithread-ai
And the first crash a few seconds after start in an AI-only LAN game is:

=================================================================
==13641==ERROR: AddressSanitizer: heap-use-after-free on address 0x60400423f228 at pc 0x55863c6d46d0 bp 0x7fdc3c852090 sp 0x7fdc3c852080
READ of size 8 at 0x60400423f228 thread T25
    #0 0x55863c6d46cf in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_S_right(std::_Rb_tree_node_base*) /usr/include/c++/10/bits/stl_tree.h:798
    #1 0x55863c6c1c96 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #2 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #3 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #4 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #5 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #6 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #7 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #8 0x55863c98d247 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::clear() /usr/include/c++/10/bits/stl_tree.h:1261
    #9 0x55863c989fab in std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> >::clear() /usr/include/c++/10/bits/stl_set.h:734
    #10 0x55863d8621e2 in AI::DefaultAI::update_buildable_field(AI::BuildableField&) ../src/ai/defaultai.cc:1783
    #11 0x55863d85ca5a in AI::DefaultAI::update_all_not_buildable_fields() ../src/ai/defaultai.cc:1447
    #12 0x55863d84b9c8 in AI::DefaultAI::think() ../src/ai/defaultai.cc:370
    #13 0x55863d83fe77 in AI::ComputerPlayer::thread_function() ../src/ai/computer_player.cc:52
    #14 0x55863d847895 in void std::__invoke_impl<void, void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(std::__invoke_memfun_deref, void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:73
    #15 0x55863d8476fa in std::__invoke_result<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>::type std::__invoke<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:95
    #16 0x55863d84765a in void std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/10/thread:264
    #17 0x55863d84760f in std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::operator()() /usr/include/c++/10/thread:271
    #18 0x55863d8475ef in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> > >::_M_run() /usr/include/c++/10/thread:215
    #19 0x7fdc7b40e693  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xda693)
    #20 0x7fdc7b55844f in start_thread nptl/pthread_create.c:473
    #21 0x7fdc7b0f6d52 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x117d52)

0x60400423f228 is located 24 bytes inside of 40-byte region [0x60400423f210,0x60400423f238)
freed by thread T27 here:
    #0 0x7fdc7d8550c7 in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:160
    #1 0x55863c6f3d07 in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned int> >::deallocate(std::_Rb_tree_node<unsigned int>*, unsigned long) /usr/include/c++/10/ext/new_allocator.h:133
    #2 0x55863c6ee0a0 in std::allocator_traits<std::allocator<std::_Rb_tree_node<unsigned int> > >::deallocate(std::allocator<std::_Rb_tree_node<unsigned int> >&, std::_Rb_tree_node<unsigned int>*, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:492
    #3 0x55863c6e531c in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_put_node(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:588
    #4 0x55863c6d4753 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_drop_node(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:655
    #5 0x55863c6c1ccb in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1921
    #6 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #7 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #8 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #9 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #10 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #11 0x55863c6c1ca8 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*) /usr/include/c++/10/bits/stl_tree.h:1919
    #12 0x55863c98d247 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::clear() /usr/include/c++/10/bits/stl_tree.h:1261
    #13 0x55863c989fab in std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> >::clear() /usr/include/c++/10/bits/stl_set.h:734
    #14 0x55863d8621e2 in AI::DefaultAI::update_buildable_field(AI::BuildableField&) ../src/ai/defaultai.cc:1783
    #15 0x55863d85ca5a in AI::DefaultAI::update_all_not_buildable_fields() ../src/ai/defaultai.cc:1447
    #16 0x55863d84b9c8 in AI::DefaultAI::think() ../src/ai/defaultai.cc:370
    #17 0x55863d83fe77 in AI::ComputerPlayer::thread_function() ../src/ai/computer_player.cc:52
    #18 0x55863d847895 in void std::__invoke_impl<void, void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(std::__invoke_memfun_deref, void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:73
    #19 0x55863d8476fa in std::__invoke_result<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>::type std::__invoke<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:95
    #20 0x55863d84765a in void std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/10/thread:264
    #21 0x55863d84760f in std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::operator()() /usr/include/c++/10/thread:271
    #22 0x55863d8475ef in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> > >::_M_run() /usr/include/c++/10/thread:215
    #23 0x7fdc7b40e693  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xda693)

previously allocated by thread T25 here:
    #0 0x7fdc7d8545a7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x55863c6fcf1c in __gnu_cxx::new_allocator<std::_Rb_tree_node<unsigned int> >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115
    #2 0x55863c6f9373 in std::allocator_traits<std::allocator<std::_Rb_tree_node<unsigned int> > >::allocate(std::allocator<std::_Rb_tree_node<unsigned int> >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #3 0x55863c6f42da in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584
    #4 0x55863ca51881 in std::_Rb_tree_node<unsigned int>* std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_create_node<unsigned int>(unsigned int&&) /usr/include/c++/10/bits/stl_tree.h:634
    #5 0x55863ca50eaf in std::_Rb_tree_node<unsigned int>* std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_Alloc_node::operator()<unsigned int>(unsigned int&&) const /usr/include/c++/10/bits/stl_tree.h:552
    #6 0x55863ca504af in std::_Rb_tree_iterator<unsigned int> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_<unsigned int, std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_Alloc_node>(std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, unsigned int&&, std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_Alloc_node&) /usr/include/c++/10/bits/stl_tree.h:1816
    #7 0x55863ca4f5ca in std::pair<std::_Rb_tree_iterator<unsigned int>, bool> std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_insert_unique<unsigned int>(unsigned int&&) /usr/include/c++/10/bits/stl_tree.h:2159
    #8 0x55863ca4decf in std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> >::insert(unsigned int&&) /usr/include/c++/10/bits/stl_set.h:521
    #9 0x55863d8628ec in AI::DefaultAI::update_buildable_field(AI::BuildableField&) ../src/ai/defaultai.cc:1829
    #10 0x55863d85ca5a in AI::DefaultAI::update_all_not_buildable_fields() ../src/ai/defaultai.cc:1447
    #11 0x55863d84b9c8 in AI::DefaultAI::think() ../src/ai/defaultai.cc:370
    #12 0x55863d83fe77 in AI::ComputerPlayer::thread_function() ../src/ai/computer_player.cc:52
    #13 0x55863d847895 in void std::__invoke_impl<void, void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(std::__invoke_memfun_deref, void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:73
    #14 0x55863d8476fa in std::__invoke_result<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>::type std::__invoke<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*>(void (AI::ComputerPlayer::*&&)(), AI::ComputerPlayer*&&) /usr/include/c++/10/bits/invoke.h:95
    #15 0x55863d84765a in void std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/10/thread:264
    #16 0x55863d84760f in std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::operator()() /usr/include/c++/10/thread:271
    #17 0x55863d8475ef in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> > >::_M_run() /usr/include/c++/10/thread:215
    #18 0x7fdc7b40e693  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xda693)

Thread T25 created by T0 here:
    #0 0x7fdc7d7f66d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x7fdc7b40e969 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xda969)
    #2 0x55863d83fd05 in AI::ComputerPlayer::start_thread() ../src/ai/computer_player.cc:47
    #3 0x55863c7cac9c in GameHost::init_computer_player(unsigned char) ../src/network/gamehost.cc:454
    #4 0x55863c7cb6ca in GameHost::init_computer_players() ../src/network/gamehost.cc:489
    #5 0x55863c7cd6b3 in GameHost::run_callback() ../src/network/gamehost.cc:594
    #6 0x55863c7cb719 in operator() ../src/network/gamehost.cc:499
    #7 0x55863c7ebf43 in __invoke_impl<void, GameHost::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #8 0x55863c7eb834 in __invoke_r<void, GameHost::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:153
    #9 0x55863c7eb301 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #10 0x55863c296571 in std::function<void ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    #11 0x55863ca0a312 in FsMenu::LaunchMPG::clicked_ok() ../src/ui_fsmenu/launch_mpg.cc:230
    #12 0x55863ca857c3 in FsMenu::TwoColumnsFullNavigationMenu::handle_key(bool, SDL_Keysym) ../src/ui_fsmenu/menu.cc:165
    #13 0x55863c984237 in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1401
    #14 0x55863c98415b in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1393
    #15 0x55863c98415b in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1393
    #16 0x55863c9858fd in UI::Panel::ui_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1633
    #17 0x55863c1b81c9 in WLApplication::handle_input(InputCallback const*) ../src/wlapplication.cc:914
    #18 0x55863c97b140 in UI::Panel::do_run() ../src/ui_basic/panel.cc:399
    #19 0x55863c1ddcc9 in int UI::Panel::run<int>() ../src/ui_basic/panel.h:120
    #20 0x55863c1b4ce9 in WLApplication::run() ../src/wlapplication.cc:786
    #21 0x55863c1a8118 in main ../src/main.cc:47
    #22 0x7fdc7b007564 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x28564)

Thread T27 created by T0 here:
    #0 0x7fdc7d7f66d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x7fdc7b40e969 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xda969)
    #2 0x55863d83fd05 in AI::ComputerPlayer::start_thread() ../src/ai/computer_player.cc:47
    #3 0x55863c7cac9c in GameHost::init_computer_player(unsigned char) ../src/network/gamehost.cc:454
    #4 0x55863c7cb6ca in GameHost::init_computer_players() ../src/network/gamehost.cc:489
    #5 0x55863c7cd6b3 in GameHost::run_callback() ../src/network/gamehost.cc:594
    #6 0x55863c7cb719 in operator() ../src/network/gamehost.cc:499
    #7 0x55863c7ebf43 in __invoke_impl<void, GameHost::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:60
    #8 0x55863c7eb834 in __invoke_r<void, GameHost::run()::<lambda()>&> /usr/include/c++/10/bits/invoke.h:153
    #9 0x55863c7eb301 in _M_invoke /usr/include/c++/10/bits/std_function.h:291
    #10 0x55863c296571 in std::function<void ()>::operator()() const /usr/include/c++/10/bits/std_function.h:622
    #11 0x55863ca0a312 in FsMenu::LaunchMPG::clicked_ok() ../src/ui_fsmenu/launch_mpg.cc:230
    #12 0x55863ca857c3 in FsMenu::TwoColumnsFullNavigationMenu::handle_key(bool, SDL_Keysym) ../src/ui_fsmenu/menu.cc:165
    #13 0x55863c984237 in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1401
    #14 0x55863c98415b in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1393
    #15 0x55863c98415b in UI::Panel::do_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1393
    #16 0x55863c9858fd in UI::Panel::ui_key(bool, SDL_Keysym) ../src/ui_basic/panel.cc:1633
    #17 0x55863c1b81c9 in WLApplication::handle_input(InputCallback const*) ../src/wlapplication.cc:914
    #18 0x55863c97b140 in UI::Panel::do_run() ../src/ui_basic/panel.cc:399
    #19 0x55863c1ddcc9 in int UI::Panel::run<int>() ../src/ui_basic/panel.h:120
    #20 0x55863c1b4ce9 in WLApplication::run() ../src/wlapplication.cc:786
    #21 0x55863c1a8118 in main ../src/main.cc:47
    #22 0x7fdc7b007564 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x28564)

SUMMARY: AddressSanitizer: heap-use-after-free /usr/include/c++/10/bits/stl_tree.h:798 in std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_S_right(std::_Rb_tree_node_base*)
Shadow bytes around the buggy address:
  0x0c088083fdf0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe00: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
  0x0c088083fe10: fa fa 00 00 00 00 00 fa fa fa fd fd fd fd fd fa
  0x0c088083fe20: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe30: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
=>0x0c088083fe40: fa fa fd fd fd[fd]fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe50: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe60: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe70: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x0c088083fe90: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==13641==ABORTING

Edit: The AI also uses some local variables declared static for performance. All those will have to have the static keyword removed of course.

@hessenfarmer
Copy link
Contributor

ok thanks I will have a look.
What is promising is that a singleplayer game with only one AI runs without a crash for quite a while

@hessenfarmer
Copy link
Contributor

Ok I am in Progress in understanding the concept.
In a first run I will clear of the static variables then as the above crash seems to be exactly an issue from this problem.
I still have difficulties to determine which type of Mutex to use when I need one. (i.e. what is the definition of each enum entry in the mutex Id enum?)
I don't know what a OPtr is and what the difference to a normal pointer is.

@Noordfrees
Copy link
Member

The only mutex you should need for the AI is the kObjects mutex. The other ones are all used by various parts of the back-end, but the AI code uses front-end functions which already ought to take care of thread safety. E.g. the kCommands mutex guards against concurrent sending and processing of player commands, and the AI sends its commands via the send_... functions which use this mutex internally.
The kObjects mutex guards creation and deletion of map objects. Any new map object is first created and then initialized, and this is guarded by a kObjects lock to ensure that other threads will not attempt to access a map object before its initialization is complete. Whenever an object is removed, all the time from its cleanup until the actual deletion is also guarded by a lock to ensure other threads won't access an object that is in the process of being deleted. A few other major changes are guarded similarly.


A normal pointer just points to an address in memory at which an object is located. Such a pointer does not "know" if this memory is freed by another thread, so a pointer that is valid at some time may point to invalid memory later, resulting in a heap-use-after-free. OPtr is a custom pointer-like struct which only stores the serial number of a map object. It has a get() function which checks whether the pointed-to object still exists; if so, it returns a normal pointer to it, otherwise it returns nullptr.
An example:

void do_something(Building* b) {
  b->some_function();
  do_something_else_for_a_long_time();
  b->some_function();
}

This code above is problematic, because building b might be deleted by another thread while do_something_else_for_a_long_time is running. There are two methods to secure it:

You can use a mutex lock, if you need to be absolutely certain that b stays valid all the time:

void do_something(Building* b) {
  MutexLock m(MutexLock::ID::kObjects);
  b->some_function();
  do_something_else_for_a_long_time();
  b->some_function();
  // `m` goes out of scope here
}

This has the disadvantage that the lock stays in existance for a long time. The better solution is to use OPtr and get a fresh pointer whenever you need one. You still need to create a lock every time to ensure the building won't vanish while you're working with it, but the lock's scope only needs to be as large as the scope of the actual pointer:

void do_something(OPtr<Building> b) {
  {
    MutexLock m(MutexLock::ID::kObjects);
    if (Building* bld = b.get(game())) {
      bld->some_function();
    }
    // `m` goes out of scope here
  }

  // There is no active lock any more now
  do_something_else_for_a_long_time();

  {
    MutexLock m(MutexLock::ID::kObjects);
    if (Building* bld = b.get(game())) {
      bld->some_function();
    }
    // `m` goes out of scope here
  }
}

@hessenfarmer
Copy link
Contributor

thanks for the explanation.
short update is that with the static definitions of inline variable gone 2 AI run pretty well until they get in contact.
Now I need to find out how to use gdb properly on Windows Msys to debug the crashes. But I am making progress

@hessenfarmer
Copy link
Contributor

hessenfarmer commented Jul 30, 2021

ok i found out how to use gdb and get the following crash although it should be safeguarded by a Mutex. I don't understand this

Thread 15 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 10548.0x1610]
0x00007ff6dfd28ed0 in __gnu_cxx::__normal_iterator<Widelands::Warehouse* const*, std::vector<Widelands::Warehouse*, std::allocator<Widelands::Warehouse*> > >::__normal_iterator (this=0xf87e9feff8, __i=<error reading variable>)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/stl_iterator.h:976
976           : _M_current(__i) { }
(gdb) backtrace
#0  0x00007ff6dfd28ed0 in __gnu_cxx::__normal_iterator<Widelands::Warehouse* const*, std::vector<Widelands::Warehouse*, std::allocator<Widelands::Warehouse*> > >::__normal_iterator (this=0xf87e9feff8, __i=<error reading variable>)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/stl_iterator.h:976
#1  0x00007ff6dfdf5500 in std::vector<Widelands::Warehouse*, std::allocator<Widelands::Warehouse*> >::end (this=0x40)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/stl_vector.h:839
#2  0x00007ff6dfdf5585 in std::vector<Widelands::Warehouse*, std::allocator<Widelands::Warehouse*> >::empty (this=0x40)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/stl_vector.h:1008
#3  0x00007ff6df26f3e4 in AI::DefaultAI::update_buildable_field_military_aspects (this=0x25450b26f60, field=...)
    at D:/development/Widelands_Git/fork_europe/src/ai/defaultai.cc:1894
#4  0x00007ff6df29c884 in AI::DefaultAI::check_militarysites (
    this=0x25450b26f60, gametime=...)
    at D:/development/Widelands_Git/fork_europe/src/ai/defaultai_warfare.cc:956
#5  0x00007ff6df266088 in AI::DefaultAI::think (this=0x25450b26f60)
    at D:/development/Widelands_Git/fork_europe/src/ai/defaultai.cc:447
#6  0x00007ff6df26247e in AI::ComputerPlayer::thread_function (
    this=0x25450b26f60)
    at D:/development/Widelands_Git/fork_europe/src/ai/computer_player.cc:52
#7  0x00007ff6e015ad17 in std::__invoke_impl<void, void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> (
    __f=@0x254475feac0: (void (AI::ComputerPlayer::*)(AI::ComputerPlayer * const)) 0x7ff6df262452 <AI::ComputerPlayer::thread_function()>,
    __t=@0x254475feab8: 0x25450b26f60)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/invoke.h:73
#8  0x00007ff6e01baeb9 in std::__invoke<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> (
    __fn=@0x254475feac0: (void (AI::ComputerPlayer::*)(AI::ComputerPlayer * const)) 0x7ff6df262452 <AI::ComputerPlayer::thread_function()>)
    at C:/msys64/mingw64/include/c++/10.3.0/bits/invoke.h:95
#9  0x00007ff6dffd1785 in std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::_M_invoke<0ull, 1ull> (this=0x254475feab8)
    at C:/msys64/mingw64/include/c++/10.3.0/thread:264
#10 0x00007ff6dffd17a7 in std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> >::operator() (this=0x254475feab8)
    at C:/msys64/mingw64/include/c++/10.3.0/thread:271
#11 0x00007ff6dffd113c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (AI::ComputerPlayer::*)(), AI::ComputerPlayer*> > >::_M_run (
    this=0x254475feab0) at C:/msys64/mingw64/include/c++/10.3.0/thread:215
#12 0x00007ff6e01dc821 in std::execute_native_thread_routine (
    __p=0x254475feab0)
    at ../../../../../gcc-10.3.0/libstdc++-v3/src/c++11/thread.cc:80
#13 0x00007ff6dfbc10a3 in pthread_create_wrapper ()
#14 0x00007ffb83e6af5a in msvcrt!_beginthreadex ()
   from C:\WINDOWS\System32\msvcrt.dll
#15 0x00007ffb83e6b02c in msvcrt!_endthreadex ()
   from C:\WINDOWS\System32\msvcrt.dll
#16 0x00007ffb82707034 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\System32\kernel32.dll
#17 0x00007ffb83fc2651 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#18 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@Noordfrees
Copy link
Member

At a quick glance you seem to create the mutex lock after upcasting base_immovable to Building, but there is no lock that ensures that the base_immovable stays valid during the time from the creation of the pointer (in map.find_immovables()) until this cast. Does moving the lock up before line 1834 help?

@hessenfarmer
Copy link
Contributor

Well Moving the mutex did work. I even realized this was more performant then having it further down. So sometimes a bigger focus of a mutex seems to be better.
Good news: I had an AI Battle on crater running for over 2 game hours without a crash. (needed to close it as both amazon AI were deadlocked.) So for the moment only the fields check seems to be not thread safe.
Bad news: creating this one Mutex seemed to be at most equally performant then the Implementation without threading the AI.
I will try if I can safeguard the one line with an if statement, to gain better performance.

@hessenfarmer
Copy link
Contributor

Well I got a full match on crater running until one AI won, without using any mutex and quite nice performance

@attet
Copy link
Contributor

attet commented May 11, 2022

I'm curious about the progress on this.

FWIW, I got a huge gamespeed performance boost (in a situation where max_fps was set to 99 and actual rendering was happening at around ~40fps; previously the game would barely keep up at just about 1-2x speed, now 10x was okay) by setting

constexpr uint32_t kMutexNormalLockInterval = 30;
to 1.

Quick look at profiling seems to indicate that the logic thread often takes 2 or 3 30ms attempts to obtain the contentious kObjects mutex, now it typically gets it in at most ~20ms.

@hessenfarmer
Copy link
Contributor

I'm curious about the progress on this.

FWIW, I got a huge gamespeed performance boost (in a situation where max_fps was set to 99 and actual rendering was happening at around ~40fps; previously the game would barely keep up at just about 1-2x speed, now 10x was okay) by setting

constexpr uint32_t kMutexNormalLockInterval = 30;

to 1.

Quick look at profiling seems to indicate that the logic thread often takes 2 or 3 30ms attempts to obtain the contentious kObjects mutex, now it typically gets it in at most ~20ms.

well, I gave up on the Ai multithreading for the moment being, I could not prevent it to crash, and mutexing the relevant code was slower then the current implementation.
But if you get such performance boost the mutex might work with such configuration. However, I don't understand why your change would speed up the logic thread, as the logic thread has a different locking intervall which is much longer.

@attet
Copy link
Contributor

attet commented May 11, 2022

well, I gave up on the Ai multithreading for the moment being, I could not prevent it to crash, and mutexing the relevant code was slower then the current implementation. But if you get such performance boost the mutex might work with such configuration. However, I don't understand why your change would speed up the logic thread, as the logic thread has a different locking intervall which is much longer.

The locking interval is defined per mutex, not per thread. i.e.

constexpr uint32_t kMutexLogicFrameLockInterval = 400;
gets used only when locking kLogicFrame, not for kObjects
kLogicFrame,
kObjects,
- most contention I'm seeing is from kObjects locked here:
MutexLock m(MutexLock::ID::kObjects);

@hillierdani
Copy link

Hi,
Thanks for all who work on this great game project. I tried to find a thread where the same error happened as it did to me (maybe wrong place to post). When "start expedition" is clicked, I get segfault. Log and screencast attached. Thanks for looking into this!
widelands start campaign bug.txt

widelands.campaign.ship.launch.bug.mp4

@Noordfrees
Copy link
Member

Noordfrees commented Jan 23, 2023

Hi, this is indeed the wrong place, the best place is to open a new issue with the bug report template at https://github.com/widelands/widelands/issues/new/choose
But never mind :) Which version of Widelands are you using? A similar crash was fixed earlier this month in #5720

@hillierdani
Copy link

Wow, thanks for hinting to #5720 . My version was end december 2022 (1.2-git daily build). I downloaded the daily build and thanks to the merged #5720 starting expedition does not cause segfault anymore. Thanks very much!

@Noordfrees Noordfrees removed this from the v1.2 milestone Jun 20, 2023
@Noordfrees Noordfrees added this to the v1.3 milestone Jun 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
memory & performance Fix bottlenecks, memory leaks, ASan, ... multithreading Deadlocks, race conditions, mutex trouble, …
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants