Add custom number parsing #984
From a quick glance, there seems to be some duplicated logic. With this kind of nuance it would be ideal to share it. Have you tried consolidating it into a shared function, or was that not feasible? |
One thing I wasn't sure was whether |
A quick look at the source seems to suggest it is... https://github.com/wren-lang/wren/blob/main/src/vm/wren_value.c#L684 I haven't checked if every path goes via this one, but seems to be the case. Edit: It should be noted though that ObjString* can contain null bytes within it, so it's context sensitive whether it matters or not that it's always null terminated. If it's dealing with user strings then it's not safe to assume that. |
The other thing is where
I'll leave in the check for now at least until we can say for certain that it is null terminated. I guess I'm just a little too paranoid 😅 |
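To illustrate the null-byte concern above, here is a minimal C sketch of length-driven scanning. The helpers `hasEmbeddedNul` and `parseDecimalDigits` are hypothetical names invented for this example and are not part of Wren; the point is only that a loop bounded by an explicit length never depends on where a `'\0'` happens to sit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not Wren API): true if any of the first `length`
// bytes is a NUL. C-string routines such as strlen() or strtod() would stop
// at that byte even when the buffer also has a trailing terminator.
static bool hasEmbeddedNul(const char* text, size_t length)
{
  return memchr(text, '\0', length) != NULL;
}

// Hypothetical helper (not Wren API): accumulates ASCII decimal digits using
// the explicit length only. Any non-digit byte, including an embedded NUL,
// makes the parse fail instead of silently truncating the input.
static bool parseDecimalDigits(const char* text, size_t length, double* out)
{
  if (length == 0) return false;

  double value = 0.0;
  for (size_t i = 0; i < length; i++)
  {
    char c = text[i];
    if (c < '0' || c > '9') return false;
    value = value * 10.0 + (double)(c - '0');
  }

  *out = value;
  return true;
}
```

With this shape, a user string such as the five bytes `1 2 \0 3 4` is rejected as a whole rather than being read as `12`.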
Unfortunately, parsing doubles is one of the hardest things in the universe 😄 Doing it correctly requires bigfloats, or at least bigints. Although there are shortcuts, they never cover all of the cases. As an example, the following number will throw an error (incorrectly) in your parser:
I do support not using C's |
We will be forced to do it to some extent. Considering all the features we want, it will be hard to find an implementation that has the correct licence, is correct, and follows (or at minimum is adaptable to) all our requirements... |
Really fast float parsing: https://github.com/fastfloat/fast_float |
Well, if custom number parsing is so hard, why not just do #945 which everyone seems to think is a good idea and can be done with what we already have. Personally, I'd be inclined to just put #946 on the back-burner for now. This feature was introduced recently into Go (along with binary and octal literals) but I've rarely used it. Most of the time where I'd want to use separators, it's for decimal numbers which are exact powers of ten so it's easier to just write them in the form |
hmm my build seems to parse that number successfully. System.print(Num.fromString("100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"))
// 1e+269
System.print(100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
// 1e+269
The results seem equivalent to a browser's console. If I were to put 309 zeroes or more, it starts spitting errors, which is equivalent to how the compiler behaved when it was using |
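The boundary described here matches the range of an IEEE 754 double, whose largest finite value is roughly 1.797e308: a 1 followed by 308 zeroes still fits, while a 1 followed by 309 zeroes does not. A minimal sketch of detecting that overflow, assuming a `strtod`-based path (the function name `parseFinite` is hypothetical and not taken from the PR):

```c
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical helper (not from the PR): parses `text` with strtod and
// rejects results outside the finite double range. On overflow, strtod
// returns +/-HUGE_VAL, which is infinity on IEEE 754 platforms and therefore
// compares greater than DBL_MAX in magnitude.
static bool parseFinite(const char* text, double* out)
{
  char* end;
  double value = strtod(text, &end);

  // Require that at least one character was consumed and nothing trails.
  if (end == text || *end != '\0') return false;

  // Reject overflow (and literal "inf" input) without needing <math.h>.
  if (value > DBL_MAX || value < -DBL_MAX) return false;

  *out = value;
  return true;
}
```

A hand-written parser that accumulates into a double can apply the same final range check after the last digit.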
I do have a new idea to make one function that both |
|
I missed what you do with the exponent. But you still have rounding problems. Try with
For a deeper dive into this, I suggest you read How strtod() Works (and Sometimes Doesn't). |
Again, this behaves just like the old parsing method did using
One potential issue I've noticed is that I don't yet check whether the exponent will overflow when adding or incrementing, but I am planning to consolidate the two places where I parse numbers into one function, so I can add that in the next commit.
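One common way to guard the exponent accumulation mentioned above is to saturate at a limit far beyond anything a double can represent, so that extra digits are still validated but can no longer overflow the integer. This is only a sketch under that assumption; `EXPONENT_LIMIT` and `parseExponent` are hypothetical names, not code from this PR.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical cap: any decimal exponent this large already means
// "overflows to infinity" or "underflows to zero" for a double.
#define EXPONENT_LIMIT 100000L

// Hypothetical helper: parses an optionally signed run of decimal digits
// (the part after 'e') into `out`, clamping the magnitude to EXPONENT_LIMIT.
static bool parseExponent(const char* text, size_t length, long* out)
{
  size_t i = 0;
  bool negative = false;

  if (i < length && (text[i] == '+' || text[i] == '-'))
  {
    negative = text[i] == '-';
    i++;
  }
  if (i == length) return false;  // a sign alone, or nothing at all

  long exponent = 0;
  for (; i < length; i++)
  {
    char c = text[i];
    if (c < '0' || c > '9') return false;

    // Only accumulate while below the cap. The largest intermediate value
    // is 99999 * 10 + 9, which always fits in a long.
    if (exponent < EXPONENT_LIMIT) exponent = exponent * 10 + (c - '0');
  }
  if (exponent > EXPONENT_LIMIT) exponent = EXPONENT_LIMIT;

  *out = negative ? -exponent : exponent;
  return true;
}
```

The caller can then treat a clamped exponent the same way as any other out-of-range literal.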
For my previous PR I had to do some research as to how doubles work since I wasn't completely sure why my implementations were off. I did try creating an alternative to |
If you ignore overflows, try |
I'm not sure I follow what you're trying to tell me. This does print
The goal of this PR is not to surpass the original parsing method in accuracy but in convenience, although I do try to be as accurate as the original method by using long doubles where data could potentially be lost. |
You need to print all of the digits (the problem is usually with the last binary one): try with |
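To make the "print all of the digits" advice concrete: two doubles that differ only in their last binary digit look identical at 15 significant digits but differ at 17. The sketch below assumes IEEE 754 doubles; `sameBits`, `nextUp`, and `formatAllDigits` are hypothetical helpers written for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: compares two doubles by their exact bit patterns.
static bool sameBits(double a, double b)
{
  uint64_t x, y;
  memcpy(&x, &a, sizeof x);
  memcpy(&y, &b, sizeof y);
  return x == y;
}

// Hypothetical helper: returns the next representable double above a
// positive finite value by incrementing its bit pattern.
static double nextUp(double value)
{
  uint64_t bits;
  memcpy(&bits, &value, sizeof bits);
  bits += 1;
  memcpy(&value, &bits, sizeof value);
  return value;
}

// Hypothetical helper: formats with 17 significant digits, which is enough
// to distinguish any two different doubles.
static void formatAllDigits(double value, char* buffer, size_t size)
{
  snprintf(buffer, size, "%.17g", value);
}
```

Comparing a custom parser against `strtod` with `sameBits`, or with `formatAllDigits` output, exposes last-bit rounding differences that a shorter print format would hide.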
Some changed behavior:
|
Added a custom number parser so we don't have to rely on strtoll or strtod.
This PR will add support for parsing octal, binary, and hexadecimal number systems, as well as separating digits with underscores for better organization. Because of this I will probably close #945 and #946.
I think parsing the numbers inside the VM is better than re-implementing strtod and strtoll because we already validate the number, so why not parse it at the same time? There are also some slight complexities in strtoll and strtod that Wren just doesn't use, so this way is a tad simpler as well. Lastly, the parser parses values directly to a double, unlike strtoll, which parses to a 64-bit integer and then converts to a double. This allows for more (albeit slightly less accurate) digits in a hex, oct, or bin literal.
Now a couple of questions I have.
0x is valid in Wren when compiling but is not valid when calling Num.fromString. I decided to patch it so that the literal 0x is a compile-time error, to be more consistent.
In Num.fromString, an invalid parse returns null instead of throwing an error or printing what it has so far. I get the argument that it is simpler to handle null than to handle an error, but I also noticed that a literal that is too large throws an error (see this test). I decided to copy this behavior, but in my opinion we should either return null on all invalid parses, if we do it for simplicity, or return errors explaining why the parse went wrong, so as to be helpful.
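The approach described in this PR, validating and accumulating in a single pass straight into a double, could look roughly like the following. This is a sketch, not the PR's code: the `0x`/`0o`/`0b` prefixes, the rule that an underscore must sit between two digits, and the names `digitValue` and `parsePrefixedInteger` are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: maps a hex digit character to its value, or -1.
static int digitValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical helper: parses "0x...", "0o..." or "0b..." into a double.
// Digits are accumulated directly as a double, so literals wider than 64
// bits still parse, with rounding once they exceed 53 significant bits.
static bool parsePrefixedInteger(const char* text, size_t length, double* out)
{
  if (length < 2 || text[0] != '0') return false;

  int base;
  switch (text[1])
  {
    case 'x': base = 16; break;
    case 'o': base = 8; break;
    case 'b': base = 2; break;
    default: return false;
  }

  double value = 0.0;
  bool lastWasDigit = false;
  for (size_t i = 2; i < length; i++)
  {
    char c = text[i];
    if (c == '_')
    {
      // Assumed rule: a separator may only follow a digit.
      if (!lastWasDigit) return false;
      lastWasDigit = false;
      continue;
    }

    int digit = digitValue(c);
    if (digit < 0 || digit >= base) return false;
    value = value * base + digit;
    lastWasDigit = true;
  }

  // Rejects a bare prefix such as "0x" and a trailing underscore.
  if (!lastWasDigit) return false;

  *out = value;
  return true;
}
```

Returning `false` for a bare `0x` mirrors the consistency point raised in the first question; whether the caller turns that into null or an error is the open design choice from the second.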