Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using unit IDs that are no longer valid #6616

Merged
merged 1 commit into from Apr 11, 2022

Conversation

Wedge009
Copy link
Member

@Wedge009 Wedge009 commented Apr 8, 2022

Patch originally from @stevecotton. Aiming to merge before 1.16.3 is tagged (supposedly scheduled for 17th April 2022).

Resolves #6603 (crash with musl implementation of libc)

@Wedge009 Wedge009 added Backport A reminder of a bugfix that was added to master that needs to be duplicated on the stable branch. Engine General game engine issues that do not fit in any other category. labels Apr 8, 2022
@github-actions github-actions bot added the Lua API Issues with the Lua engine and API. label Apr 8, 2022
@Wedge009
Copy link
Member Author

Wedge009 commented Apr 8, 2022

Just trying to understand the issue - this is a nasty bug. Is it because erase_if_matches_id is accepting a reference to the ID, and so when that unit is removed the ID is no longer valid? Is it the case that musl's implementation is more 'correct' or strict such that the bug is exposed using that implementation, whereas GNU's is more forgiving?

I checked that these are the only two uses of erase_if_matches_id in the file and the only cases where the parameter is from a unit object. Other uses of the method use a std::string instance as the parameter.

Regarding the existing comment 'Should it use underlying ID instead?' - I'm not sure what it means by that. If it's referring to the ID string instead of the ID from the unit object, maybe that comment should be deleted as well?

@CelticMinstrel
Copy link
Member

I mentioned this on the issue as well, but I would personally prefer to instead alter the erase_if_matches_id function to take the ID string by value (probably with a comment to explain why, so someone doesn't change it back later without realizing).

Is it because erase_if_matches_id is accepting a reference to the ID, and so when that unit is removed the ID is no longer valid?

That appears to be the situation, yes.

@Wedge009
Copy link
Member Author

Wedge009 commented Apr 8, 2022

Oh, I overlooked that as I'm only just now starting to understand the issue.

void recall_list_manager::erase_if_matches_id(const std::string &unit_id)
{
recall_list_.erase(std::remove_if(recall_list_.begin(), recall_list_.end(),
[&unit_id](const unit_ptr & ptr) { return ptr->id() == unit_id; }),
recall_list_.end());
}

So just remove the & from the parameter (with explanatory comment)?

Edit: Oh wait, I see your comment from the issue report that you removed it from [&unit_id]. As it stands, it's hard for me to read and understand this syntax.

unit_ptr v = u->clone();
t.recall_list().erase_if_matches_id(u->id());
u = v;

What about here, where it looks like the intention was to use v->id() as the search parameter. If the function is changed to accept string as value, would it make the cloning and reassigning between u and v redundant?

Edit: The reason I went with steve's solution was because the reporter seemed to say that one worked for them. Since none of us appear to be using musl-based distribution, we must rely on them for testing.

@stevecotton
Copy link
Contributor

I mentioned this on the issue as well, but I would personally prefer to instead alter the erase_if_matches_id function to take the ID string by value (probably with a comment to explain why, so someone doesn't change it back later without realizing).

I'm wondering if that would still be undefined behavior - the copy in the callee is fine, but is the one in the caller still meant to be valid?

@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 8, 2022

So just remove the & from the parameter (with explanatory comment)?

Edit: Oh wait, I see your comment from the issue report that you removed it from [&unit_id]. As it stands, it's hard for me to read and understand this syntax.

I believe removing it from either place should have the same effect. If you remove it from the parameter, I'd also remove const.

Oh, I overlooked that as I'm only just now starting to understand the issue.

void recall_list_manager::erase_if_matches_id(const std::string &unit_id)
{
recall_list_.erase(std::remove_if(recall_list_.begin(), recall_list_.end(),
[&unit_id](const unit_ptr & ptr) { return ptr->id() == unit_id; }),
recall_list_.end());
}

So just remove the & from the parameter (with explanatory comment)?

Edit: Oh wait, I see your comment from the issue report that you removed it from [&unit_id]. As it stands, it's hard for me to read and understand this syntax.

unit_ptr v = u->clone();
t.recall_list().erase_if_matches_id(u->id());
u = v;

What about here, where it looks like the intention was to use v->id() as the search parameter. If the function is changed to accept string as value, would it make the cloning and reassigning between u and v redundant?

Edit: The reason I went with steve's solution was because the reporter seemed to say that one worked for them. Since none of us appear to be using musl-based distribution, we must rely on them for testing.

This is true – ideally we'd have them confirm that whatever fix we choose still works (perhaps ping the two of them here once we've settled on something).

What about here, where it looks like the intention was to use v->id() as the search parameter. If the function is changed to accept string as value, would it make the cloning and reassigning between u and v redundant?

It was already cloning the unit so I assume there is a reason to do so beyond the issue of the ID.

I'm wondering if that would still be undefined behavior - the copy in the callee is fine, but is the one in the caller still meant to be valid?

I'm not quite sure what you mean?

@Wedge009
Copy link
Member Author

Wedge009 commented Apr 8, 2022

This is true – ideally we'd have them confirm that whatever fix we choose still works (perhaps ping the two of them here once we've settled on something).

I made a request to test your option in the original issue report, I assume there hasn't been time to test yet as there's been no response thus far.

It was already cloning the unit so I assume there is a reason to do so beyond the issue of the ID.

That's my point - it isn't clear to me what the reason for cloning the unit is, other than possibly to avoid the pass-by-reference issue we're now aware of. But I suppose it is indeed better to leave things as they are lest we break something else. Perhaps I'll leave a comment if we end up choosing the option to change the erase_if_matches_id function rather than uses of the function.

@CelticMinstrel
Copy link
Member

it isn't clear to me what the reason for cloning the unit is, other than possibly to avoid the pass-by-reference issue we're now aware of.

Well, it's in extract_unit, the purpose of which is to remove a unit from the map while keeping a Lua-private reference to it. So if you assume removing it from the map destroys it, I think it makes sense that it's being cloned.

@Wedge009
Copy link
Member Author

Wedge009 commented Apr 9, 2022

Oh okay, when you explain it like that it makes more sense. I don't know if there's an established convention with u and v, but the single-letter variable names didn't really help me to see that and I didn't know that u was somehow still used after the function.

Still no word from the original reporter - maybe we'll have to miss the 1.16.3 tag if they don't respond in time?

@Wedge009
Copy link
Member Author

Wedge009 commented Apr 9, 2022

I'll update the PR either later today or tomorrow.

Resolves wesnoth#6603 (crash with musl implementation of libc)
Copy link
Contributor

@stevecotton stevecotton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learnt a bit of detail about lambdas - this syntax makes a single copy of unit_id when the lambda is created.

(The delay in reviewing was because I was worried it would make a copy each time the lambda is called, and thus still potentially access the invalid value.)

@stevecotton stevecotton merged commit aa6b5b4 into wesnoth:master Apr 11, 2022
@Wedge009 Wedge009 deleted the musl_patch branch April 12, 2022 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backport A reminder of a bugfix that was added to master that needs to be duplicated on the stable branch. Engine General game engine issues that do not fit in any other category. Lua API Issues with the Lua engine and API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wesnoth 1.16.1 crashes when loading campaign SotBE-Backhome
3 participants