Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Always serialize NaN without sign #640

Closed
wants to merge 1 commit into from

Conversation

dtolnay
Copy link
Contributor

@dtolnay dtolnay commented Oct 26, 2023

In all likelihood, the sign of NaNs is not meaningful in the user's program. rust-lang/rfcs#3514 (comment): "by and large nobody cares about the sign of a NaN". It would usually just be surprising and undesirable for -nan to appear in output serialized by this crate, when the sign of the NaN was not intentionally controlled by the caller, or may even be nondeterministic if it comes from arithmetic operations or a cast.

In toml-lang/toml#506 (comment) it sounds like the motivation for TOML supporting -nan was a Robustness Principle "be liberal in what you accept from others" without the intention that TOML emitters would ever produce it.

This crate will continue to deserialize -nan as a negative NaN as implemented by #637.

@epage
Copy link
Member

epage commented Oct 27, 2023

There are two layers here

  • toml_edit: a format preserving toml parser/generator
  • serde support for toml, via toml_edit::ser and toml::ser

I think I'd like to start by exercising caution and only doing this in the ser layers by changing toml_edit::ser::value. This will avoid generating toml files with non-deterministic signs when using serde while allowing someone using toml_edit to control exactly how the toml file looks.

Thoughts?

@sunfishcode
Copy link

It's already the case that users don't control exactly how toml files look, as toml_edit also doesn't appear to let users control things like 1.0 vs. 1.00 vs. 1.0e0 vs. +1.0. I propose also not providing control over nan vs. -nan, since TOML itself doesn't guarantee the bit is preserved.

@epage
Copy link
Member

epage commented Oct 27, 2023

TOML also doesn't guarantee table order but we want to preserve that an allow control over it. I'd like to, over time, offer more options for users to control how things are rendered. -nan is an easier case. Those are harder and will take more API design to figure out.

@sunfishcode
Copy link

serialize_f64 getting passed a NaN with the sign bit set doesn't indicate that the user wanted it to appear as "-nan" in the output. Negative-sign-bit NaNs can come from many places, and it differs between x86 and ARM, and between constexpr and runtime:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bfbfc2ec293c30135d87e8505c6d17b9

So even if you want to give users the ability to control how a NaN is printed, I'd encourage you to do it in some other way than inferring it from the sign bit on serialize_f64. The sign bits in arbitrary f64 values will often come from nondeterministic sources, as in the example above.

And in that case, then the ability to serialize "-nan" wants a new API, and it seems prudent to wait until someone comes up with a real-world use case for it.

@epage
Copy link
Member

epage commented Oct 27, 2023

@sunfishcode my suggestion was that serialize_f64 would force all nans to be positive. dtolnay has already implemetned that and its released.

The area we still respect the sign bit is if you are directly using toml_edit::Document to create or edit a document. This API is mean to be lower level, offering more control over formatting.

@sunfishcode
Copy link

Currently to_f64_repr decides whether to print "nan" or "-nan" by examining the sign bit of the f64 passed in. It appears that's the ultimate place where Document::insert and similar things end up to convert user-supplied f64 values into strings.

The sign bits of user-supplied f64 values, however they're passed in, will often not reflect a conscious intent to print "-nan" instead of "nan". They will often be the result of nondeterminism, differing between x86 and ARM, as well as debug and release, so it isn't desirable to implicitly propagate this information.

@epage
Copy link
Member

epage commented Oct 27, 2023

Yes, to_f64_repr preserves the sign bit. That does not contradict or expand on what I said earlier. #643 made it so users of the serde API will have the sign bit coerced to positive.

The remaining question is what is correct for a low level API that is meant to control as much about the formatted output as is reasonable (weighted against lack of API designs, time availability, and performance constraints).

To put this into context, most developers using these packages will be relying on #643. toml has 3524 direct dependents while toml_edit has 259. I know at least some of toml_edits direct dependents use it like toml with very limited interactions with Document which would not be affected by this policy, either way.

@sunfishcode
Copy link

The sign bit of an f64 value passed in by a user doesn't convey intent to print "-nan". As in my example above, that bit can be spuriously set by the target ISA and/or compiler optimizations.

If you wish to give users control over how NaNs are formatted, I suggest adding an explicit API mechanism, rather than implicitly inferring it. That avoids unintentionally propagating nondeterminism. Such a mechanism would also be needed to support "+nan".

@epage
Copy link
Member

epage commented Oct 27, 2023

As nothing new is being said, I'm going to go ahead and close this for now. We can revisit this in the future if needed.

@epage epage closed this Oct 27, 2023
@dtolnay dtolnay deleted the unsignednan branch October 27, 2023 20:05
@dtolnay dtolnay restored the unsignednan branch October 27, 2023 20:05
@dtolnay dtolnay deleted the unsignednan branch October 27, 2023 20:05
@dtolnay dtolnay restored the unsignednan branch October 27, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants