Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WoF: fix Lua type error with cast to number #8773

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Jonathan-Kelly
Copy link
Contributor

Lua was getting this variable without being told it is a number rather than a string. On some systems it is interpreted as a string which breaks the monster spawner code and throws Lua error messages onto the player's screen.

This is a critical fix which needs a back port for 1.18.1.

Lua was getting this variable without being told it is a number rather than a string. On some systems it is interpreted as a string which breaks this monster spawner code and throws Lua error messages onto the player's screen.
@github-actions github-actions bot added the Campaign WoF The mainline Winds of Fate campaign label Apr 17, 2024
@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 17, 2024

That's weird. Why would whether it's recognized as a number depend on the system? Does tonumber really fix it or is that going to return nil? (ie, was this tested on a system where the previous code breaks?)

@Jonathan-Kelly
Copy link
Contributor Author

Yeah, this was tested on the system that was having trouble with the previous code and it works fine with this change. I really have no idea why the specific system hardware or operating system would affect something like this. It is very strange, which is why I only now noticed it after upgrading to a new machine.

@Wedge009 Wedge009 added the Backport A reminder of a bugfix that was added to master that needs to be duplicated on the stable branch. label Apr 17, 2024
@CelticMinstrel
Copy link
Member

So, does that mean the error occurred in the official released build on one machine but not on the other? And the same platform? In other words, it can't be caused by a difference in how it was built.

@CelticMinstrel
Copy link
Member

So, does that mean the error occurred in the official released build on one machine but not on the other? And the same platform? In other words, it can't be caused by a difference in how it was built.

What exactly were the systems where it work and didn't work?

@Jonathan-Kelly
Copy link
Contributor Author

Works:
Windows 10 Pro Dell Optiplex 3010
Windows 10 Home Dell Precision M6700

Fails:
Windows 10 Home Dell Optiplex 3050

(so actually one of the working machines has the same OS as the not working one)


The full error and traceback message is:

error scripting/lua: [string " local t = ......"]:2: attempt to compare number with string
stack traceback:
	[string " local t = ......"]:2: in main chunk
	(...tail calls...)
	[C]: in function 'wml.eval_conditional'
	lua/wml-flow.lua:17: in local 'cmd'
	lua/wml-utils.lua:145: in field 'handle_event_commands'
	lua/wml-flow.lua:19: in local 'cmd'
	lua/wml-utils.lua:145: in field 'handle_event_commands'
	lua/wml-flow.lua:5: in function <lua/wml-flow.lua:4>
	[C]: in ?

@Jonathan-Kelly
Copy link
Contributor Author

In other words, it can't be caused by a difference in how it was built.

It occurs consistently on the official windows release of 1.18.0 and all the official windows releases of 1.17 back to 1.17.21 when this bit of Lua code was first added to the campaign.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 18, 2024

Can you tell what the content of rate (before it's converted to a number) is when it's not a number? Would be interesting to know

@CelticMinstrel
Copy link
Member

Works: Windows 10 Pro Dell Optiplex 3010 Windows 10 Home Dell Precision M6700

Fails: Windows 10 Home Dell Optiplex 3050

I think that info is omitting the CPU model? I don't really know what Optiplex or Precision or those number mean.

@Jonathan-Kelly
Copy link
Contributor Author

Can you tell what the content of rate (before it's converted to a number) is when it's not a number? Would be interesting to know

Using std_print(type(t.rate)) it says the type is string and the values (before and after the type cast) are all the correct decimal number values - the same that were input.

I think that info is omitting the CPU model?

Works:
Intel Core i3-3240
Intel Core i7-3740QM

Fails:
Intel Core i5-7500T

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 18, 2024

wesnith behaving differently on different machines when it comes to basic wml stuff like this is a problem since it will not only lead to campaign not working on some machines but can also lead to OOS errors in multiplayer.

So we should understand and at least try to fix the underlying cause (which seems to be different string representations of numbers on different machines) , ideally before we merge this in my opinion (we can also merge this first and open a new issue, but it's be nice if you could do more tests if we need them since we probably cannot do them on our own)

Using std_print(type(t.rate)) it says the type is string and the values (before and after the type cast) are all the correct decimal number values - the same that were input.

My question was more about for which particular number does this issue happen.

@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 18, 2024

Hmm, it was a slim chance, but I thought maybe one of the computers was using a non-Intel processor. Guess it's not likely that the processor is the cause then…

Is the computer that fails set to English locale?

@Jonathan-Kelly
Copy link
Contributor Author

Agreed, just so long as some fix goes into 1.18.1 to avoid exposing players to this bug.

My question was more about for which particular number does this issue happen.

Wow, looks like only the floating point numbers are coming up as strings in Lua.

Tested floats:
0.05
0.1
0.125
0.25
0.5
1.1
(all string)

Tested integers:
-1
0
1
2
(all number)

@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 19, 2024

Okay, now do these come out as string too, or as number?

0,05
0,1
0,125
0,25
0,5
1,1

@Jonathan-Kelly
Copy link
Contributor Author

Those all come out as strings.

@CelticMinstrel
Copy link
Member

Alright, that probably rules out locale-dependent parsing, at least.

I'm not 100% certain, but I think this is probably the section of code that governs this behaviour.

char* eptr;
double d = strtod(v.c_str(), &eptr);
if(*eptr == '\0') {
// Possibly a number. See what type it should be stored in.
// (All conversions will be from the string since the largest integer
// type could have more precision than a double.)
if(d > 0.0) {
// The largest type for positive integers is unsigned long long.
unsigned long long ull = 0;
if(from_string_verify<unsigned long long>(v, ull)) {
return *this = ull;
}
} else {
// The largest (variant) type for negative integers is int.
int i = 0;
if(from_string_verify<int>(v, i)) {
return *this = i;
}
}
// This does not look like an integer, so it should be a double.
// However, make sure it can convert back to the same string (in
// case this is a string that just looks like a numeric value).
std::ostringstream tester;
tester << d;
if(tester.str() == v) {
value_ = d;
return *this;
}
}

There are a few ways it could fail, but my first guess would be the roundtrip-safety test at the bottom. For example, maybe the failing computer is for some reason padding it to 5 decimal places or something. (I don't have any idea why it might do such a thing without being asked to though.)

@Jonathan-Kelly
Copy link
Contributor Author

Is the computer that fails set to English locale?

Yes, it is set to English.

@CelticMinstrel
Copy link
Member

The round-trip safety check looks a bit primitive even if you do assume predictable formatting modes. For example, does 0.50 fail on the computer that works with all your other examples?

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

It would be interesting to see to what strings the numbers would be converted, maybe wml.tostring { a = 0.1 } could work to test it out.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

The round-trip safety check looks a bit primitive even if you do assume predictable formatting modes. For example, does 0.50 fail on the computer that works with all your other examples?

The reason for these is that wml was supposed to guarantee that the string value of a attibute is not changed by some conversion. From the Lua perspective it might seem like wml is a format that supports different data types of attribute values, but actually it isn't, in wml all attributes are strings/tstrings and the fact config:: attribute_value stores some attributes as numbers,bools etc is from the wml perspective just an optimisation since they will most likely be used as such.

@Jonathan-Kelly
Copy link
Contributor Author

For example, does 0.50 fail on the computer that works with all your other examples?

0.5 works on it too. The value of 0.5 is properly stored as a number on the working machine.

It would be interesting to see to what strings the numbers would be converted, maybe wml.tostring { a = 0.1 } could work to test it out.

Do you mean something like this:

wml.variables["float"] = 0.1
wml.variables["integer"] = 3
local vconfig = wml.tovconfig({ float = "$float", integer = "$integer" })
std_print(wml.tostring(vconfig))

Which outputs:

float="0.1"
integer=3

@CelticMinstrel
Copy link
Member

0.5 works on it too.

I said 0.50, not 0.5.

Do you mean something like this:

I'm really not sure how you got something that complicated out of wml.to_string { a = 0.1 } when you could've just directly run the code example gfgtdf provided.

@Jonathan-Kelly
Copy link
Contributor Author

I said 0.50, not 0.5.

Ah, that does in fact break it on the working machine. It interprets a value of 0.50 as a string.

I'm really not sure how you got something that complicated out of wml.to_string { a = 0.1 } when you could've just directly run the code example gfgtdf provided.

Directly running it gives the error attempt to call a nil value (field 'to_string'). This code snippet seems to be missing the parentheses of a function call but even when I add those it still throws the same error.

I tried to use the example provided in the Lua API documentation for wml.tostring to create something similar.

@soliton-
Copy link
Member

It's wml.tostring. Just remove the underscore.

@Jonathan-Kelly
Copy link
Contributor Author

Ah, good catch. So then std_print( wml.tostring { a = 0.1 } ) prints out a=0.1

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

OK, thx for testing this. Just to be sure you tested this on the machine that breaks the code (that converts rate to a string) right ?

This is really odd because it means that "0.1" is the default string representation, this means again we don't know why that it

@Jonathan-Kelly
Copy link
Contributor Author

Just to be sure you tested this on the machine that breaks the code (that converts rate to a string) right ?

That is correct, yeah.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

hmm i'm a bit out of ideas right now, usually they way toproceed now would probaböy to use a c++ debugger, i thinkthe config get converted to a vconfig multiple times dueing the wml_conditionals handler, maybe that changes something but its really hard to tell from here

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

Which outputs:

float="0.1"
integer=3

This is interesting tho, it seems like the float value gets converted to a string in this example but not in the std_print( wml.tostring { a = 0.1 } ) example.

Maybe we could try to investigate further here:

  1. what does std_print( wml.tostring( wml.tovconfig({ a = 0.1 }))) print?
  2. what does std_print( wml.tostring( wml.tovconfig({ a = 0.1 }).__literal)) print?

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

actualyl nevermind they will probably just pring the expected numbers, i was confused for a bit

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

wml.variables["float"] = 0.1
wml.variables["integer"] = 3
local vconfig = wml.tovconfig({ float = "$float", integer = "$integer" })
std_print(wml.tostring(vconfig))

Which outputs:

float="0.1"
integer=3

I mean this should print float=0.1 (without the ") on normal systems i thin, so at least we now have a simpler to reproduce example.

What is expected to happen is that:

  1. tostring converts it to a pased config, calling
    config vconfig::get_parsed_config() const
  2. Which in particular does the variable interpolation in
    result = utils::interpolate_variables_into_string(s, vars);
  3. Which calls
    config_attribute_value& config_attribute_value::operator=(const std::string& v)
    whihc is the code that celmin mentioned. Which should set the config attribute value to a double.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

maybe strtod behaves differently from lexical_cast_default here? Again locales might be involved, maybe they only apply to one but not the other

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 19, 2024

Note, that we do have different reports based on that already, and it iirc was alwas y little coomplicated to reproduce #3945

@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 19, 2024

maybe strtod behaves differently from lexical_cast_default here?

Wait… if strtod is expecting a decimal comma and finds a decimal point instead, what will it do then? Most likely, either stop (returning 0 instead of 0.1, 1 instead of 1.5, etc) or take it as a thousands separator (and I'm not even sure what that would return).

…but if that's what happening it would imply input and output are using a different locale, which would be very weird.

@Jonathan-Kelly
Copy link
Contributor Author

Is the computer that fails set to English locale?

Found the problem - it was set to English (Sweden). As a result of being set to a continental locale (albeit still an "English" one) Wesnoth was interpreting its own data with comma decimals. It was also experiencing the one pixel health and experience bars bug described here. Also after deleting the user date directory, maps (and scenarios) would no longer load (presumably because of how the map files were being interpreted).

Switching to any non-european English locale fixes all the issues. Switching back to Europe reproduces them.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 25, 2024

hmm it seems like on windows the code doesntset the locale to "C" but to english here

setlocale(LC_ALL, "English");

i don't really know why it does that tho, also thecode seems to be really old.

EDIT: nevermind, didnt see that this is inthe tests/ folder

@stevecotton
Copy link
Contributor

Am I correct to assume that this PR should be rejected, and a bugfix for the engine used instead?

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 26, 2024

Am I correct to assume that this PR should be rejected, and a bugfix for the engine used instead?

Hmm no, the commit still makes sense. because even we fix the engine part and so that it works the same way on all machines, it would be nice it if supports all numbers not only those whose strign prepresentation survive the roundtrip. In any case i think the commit should also be in the next 1.18 release.

For example "+4" instead of "4" or "0.000001" instead of "1e-6".

We just didn't merge it yet becaue we wanted to investigate, and because having this bug in woudl make it easer to test the engine part once its implemented.

imo we can also merge it right now, but it would also be nice if we had a (wml) test for this, something like

-- test that the string "0.1" get stored as a float.
assert(wml.tostring({a = "0.1"}) == "a=0.1")

@CelticMinstrel
Copy link
Member

CelticMinstrel commented Apr 26, 2024

tostring always formats the output in complete lines, so it would actually be:

wml.tostring{a='0.1'} == 'a=0.1\n'

(Note: There's no meaning behind using single quotes instead of double, I just tend to prefer it because I don't need to press shift.)

Also, if it's a unit test, it would probably be using the unit_test module:

unit_test.assert_equal(wml.to_string{a='0.1'}, 'a=0.1\n')

@Jonathan-Kelly
Copy link
Contributor Author

the commit still makes sense. because even we fix the engine part and so that it works the same way on all machines, it would be nice it if supports all numbers not only those whose strign prepresentation survive the roundtrip.

So then (for future reference) should all numbers being sent from WML into Lua get the tonumber() type cast?

For example "+4" instead of "4" or "0.000001" instead of "1e-6".

Note that in this PR's context only a decimal number in the range [0.0, 1.0] is meaningful. This value is the chance that a particular creature will spawn on the map each turn. So supporting more complex numerical representations may be unnecessary here?

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 27, 2024

Note that in this PR's context only a decimal number in the range [0.0, 1.0] is meaningful. This value is the chance that a particular creature will spawn on the map each turn. So supporting more complex numerical representations may be unnecessary here?

i don't have a strong opinion on this particular line for 1.19 since its only used in this campaign. But at least in 1.18 it definitely makes sense, even if we backport the engine fix (which its not guaranteed to happen) it'd be nice to support older wesnoth versions.

So then (for future reference) should all numbers being sent from WML into Lua get the tonumber() type cast?

Not sure whether it makes sense for all, but espcially for wml tags that accept non-integer number parameters, its probably better. One should at least be aware that for example "0.000001" might not convert into a number.

@CelticMinstrel
Copy link
Member

One should at least be aware that for example "0.000001" might not convert into a number.

I think it should be considered a bug if this doesn't convert into a number though. Rather than using the roundtrip test to decide whether a value is a number, I think it would be better to convert anything that looks like a number but store metadata along with it that makes the roundtrip possible. For example, "+4" would convert to 4 and set the "force sign" bit, "0.000001" would convert to a number with the "engineering" bit unset, and "1.1e3" would convert to a number with the "engineering" bit set. Maybe "0x12ab" even converts to a number with the "hex" bit set. You'd also need to track number of decimal points and/or significant figures in order to handle non-conventional cases like "121.23e9" or "13.2e-12".

Mind you, while I think this would be better overall, it is also a lot of work and I'm not sure if it's truly worth that effort. If there exists a library that'll do all that work for us though, I'd say to use it without hesitation.

Another alternative would be to discard the round-trip requirement but keep track of whether the value had been quoted in the WML source. So, key="0.1" would be stored as a string but key=0.1 would be stored as a number, and in Lua {key='0.1'} would not automatically convert it to a number either.

@gfgtdf
Copy link
Contributor

gfgtdf commented Apr 27, 2024

I think it would be better to convert anything that looks like a number ...

A problem then is that the string value is still lost when converting the thing to a lua table and back.

it is also a lot of work

And easer implementation with the same result would be not to change attribute_value and instead add an extra check&conversion (whether the string is a number) when converting a wml object to a lua table (or rather attribute_value to a lua value).

Another alternative would be to discard the round-trip requirement but keep track of whether the value had been quoted in the WML source

I had rhe same idea, but that's probably a huge change the would break a lot of code

@CelticMinstrel
Copy link
Member

A problem then is that the string value is still lost when converting the thing to a lua table and back.

It's a solvable problem, probably, but yes, that's another point against the idea.

but that's probably a huge change the would break a lot of code

Right, if we were to do that we would need a major deprecation cycle.

@Jonathan-Kelly
Copy link
Contributor Author

After some more testing I can clarify something I reported earlier. Quoting myself:

after deleting the user date directory, maps (and scenarios) would no longer load

This particular bug only happens on 1.16 and not on 1.18. What specifically causes it is just the deletion of the line tile_size=72 in the preferences file from the user date directory (and only while Windows 10 is set to a decimal comma using region). Not sure if these details are at all helpful but at least we do not need to worry about this one particular issue happening in 1.18.

To be clear, both the "one pixel health and experience bars" bug and the campaign specific bug which this PR addresses are still problems in 1.18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backport A reminder of a bugfix that was added to master that needs to be duplicated on the stable branch. Campaign WoF The mainline Winds of Fate campaign
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants