
[cssom] Serialize numbers using scientific notation? #8538

Open
andruud opened this issue Mar 7, 2023 · 11 comments

@andruud
Member

andruud commented Mar 7, 2023

Created from 1796eb4#r102754719.

The spec currently says that scientific notation is not used, but relevant people seem to think that we should in fact use that notation now (see link).

If so, we need to specify when sci-not is and isn't used, and how.

@zcorpan @tabatkins @emilio

@andruud andruud added the cssom-1 label Mar 7, 2023
@tabatkins
Member

tabatkins commented Mar 7, 2023

I think we should just do what JS does.

Edit: If I'm understanding the algo correctly, the n in JS's Number::toString is the number of integer digits the number contains. (If the number's magnitude is less than 1, n is zero or negative, and its absolute value gives the number of zeros between the decimal point and the first non-zero digit.)

So JS prints without scinot as long as the number has 21 or fewer digits in its integer part and, if its magnitude is below 1, fewer than 6 leading zeros in its decimal part. If it's larger or smaller than that, it uses scinot. Testing in the console confirms this.
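A quick console check of those thresholds, using only `String()` and `toExponential()`. The helper `decimalPointPosition` is hypothetical: it recovers n by reading the exponent that `toExponential()` reports (for x = d.ddd × 10^e, n = e + 1), so treat it as a sketch rather than the spec's own definition.

```javascript
// Hypothetical helper: recover Number::toString's n (decimal point position)
// from the exponent that toExponential() reports: x = d.ddd...e(n-1).
function decimalPointPosition(x) {
  const exponent = Number(Math.abs(x).toExponential().split("e")[1]);
  return exponent + 1;
}

// Values on either side of JS's scinot thresholds.
for (const x of [1e20, 1e21, 1e-6, 1e-7]) {
  console.log(`n = ${decimalPointPosition(x)}: ${String(x)}`);
}
// n = 21: 100000000000000000000
// n = 22: 1e+21
// n = -5: 0.000001
// n = -6: 1e-7
```

Plain notation is used exactly when -6 < n ≤ 21, which is the same as 1e-6 ≤ abs(x) < 1e21.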

By default JS prints as many digits as it needs, while we intentionally cap the decimal precision of our string representations; the thresholds still match.

So I suggest we use scinot when either:

  • the number is large enough to require 22 or more digits in the integer part. (abs(x) >= 1e21)
  • the number is small enough to require 6 or more leading zeros in the decimal part. (abs(x) < 1e-6)

In either of these cases, we format the number with a single non-zero digit in the integer part and up to 6 digits in the decimal part (fewer than 6 if trailing zeros are dropped), followed by the exponent part. (Omitting the decimal part entirely if it's all zeros, obvs.)

@zcorpan
Member

zcorpan commented Mar 8, 2023

While at it, we should serialize Infinity and NaN: https://drafts.csswg.org/css-values/#calc-error-constants

The precision loss to 6 digits helps with hiding implementation details of the precision, but OTOH it might cause roundtrip degradation for e.g. pi and e: https://drafts.csswg.org/css-values/#calc-constants - should those values be serialized as keywords?
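To make the roundtrip concern concrete, here's a small sketch assuming the draft's truncation to 6 decimal places. `truncateTo6Decimals` is a hypothetical helper that only handles values JS prints without scinot.

```javascript
// Hypothetical helper: keep at most 6 decimal places by truncating the
// shortest JS string (assumes the value prints without scinot).
function truncateTo6Decimals(x) {
  const [intPart, fracPart = ""] = String(x).split(".");
  const kept = fracPart.slice(0, 6).replace(/0+$/, "");
  return kept ? `${intPart}.${kept}` : intPart;
}

for (const [name, value] of [["pi", Math.PI], ["e", Math.E]]) {
  const serialized = truncateTo6Decimals(value);
  // Re-parsing the serialized text does not give back the original double.
  console.log(name, serialized, Number(serialized) === value);
}
// pi 3.141592 false
// e 2.718281 false
```

Rounding instead of truncating (3.141593, 2.718282) would not round-trip either; only a keyword form would.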

Since the keywords are only allowed in calc(), that would need to be supported in serialization also. From what I understand of css-values, <number> does not include calc(). I don't see "calc" mentioned in cssom. (Maybe this should be its own issue?)

@emilio
Collaborator

emilio commented Mar 8, 2023

calc() serialization and simplification is specified in css-values, afaict.

@tabatkins
Member

The <number> production includes anything that's a number, including all math functions whose type is <number>.

@tabatkins
Member

For the Agenda+, here's a first draft of the text to replace the <number> part of the serialization rules:

The return value of the following algorithm:

<div algorithm="serialize a number">
	1. Let |s| initially be the empty [=string=].
	
	2. If the absolute value of the component is less than 10<sup>21</sup> and greater than or equal to 10<sup>-6</sup>, 
			or equal to zero:
	
		* Serialize the integer part of the component as a base-10 number
			(omitting leading zeros)
			and append the result to |s|.
	
		* If the decimal part of the component,
			when truncated to 6 digits,
			is non-zero,
			append "." (U+002E FULL STOP) to |s|,
			then serialize the decimal part of the component as a base-10 decimal,
			truncating to 6 digits
			and omitting trailing zeros,
			and append the result to |s|.
		
		* Return |s|.
	
	3. Otherwise, serialize the component in scientific notation:
		
		* Let |power| be the integer such that
			multiplying the component by 10<sup>-|power|</sup>
			produces a number with a single non-zero integer digit.
			
			Let |shifted component| be the result of multiplying the component by 10<sup>-|power|</sup>.
		* Serialize |shifted component| as a <number>,
			and append the result to |s|.
		* Append "e" (U+0065 LATIN SMALL LETTER E) to |s|,
			then serialize |power| as a base-10 integer,
			preceded by "-" (U+002D HYPHEN-MINUS) if it is negative
			and omitting leading zeros,
			and append the result to |s|.
		* Return |s|.
</div>
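A rough JavaScript sketch of the draft algorithm, for finite numbers only. Two assumptions go beyond the draft text: negative values get a leading "-", and the digits are taken from JS's shortest round-tripping string (`String()` / `toExponential()`) instead of multiplying by a power of 10, which would add floating-point error of its own. Values below 1 keep their single leading "0".

```javascript
// Sketch of the draft "serialize a number" algorithm for finite JS numbers.
// Assumptions not in the draft: a leading "-" for negative values, and the
// digits come from JS's shortest round-tripping string for the value.

// Keep at most 6 digits after "." and drop trailing zeros (and a bare ".").
function truncateDecimal(text) {
  const [intPart, fracPart = ""] = text.split(".");
  const kept = fracPart.slice(0, 6).replace(/0+$/, "");
  return kept ? `${intPart}.${kept}` : intPart;
}

function serializeNumber(x) {
  if (!Number.isFinite(x)) {
    throw new RangeError("NaN and infinities are out of scope for this sketch");
  }
  const abs = Math.abs(x);
  if (abs === 0) return "0";
  const sign = x < 0 ? "-" : "";

  // Step 2: plain notation for 1e-6 <= |x| < 1e21. JS prints exactly this
  // range without scinot, so String() yields plain digits here.
  if (abs >= 1e-6 && abs < 1e21) {
    return sign + truncateDecimal(String(abs));
  }

  // Step 3: toExponential() gives "d.ddd...e±power", i.e. the shifted
  // component (one non-zero integer digit) and |power| as strings.
  const [shifted, power] = abs.toExponential().split("e");
  return `${sign}${truncateDecimal(shifted)}e${Number(power)}`;
}

for (const x of [0.1234567891, 123456.5, 1.23456789e-7, -2.5e22, 1e21]) {
  console.log(serializeNumber(x));
}
// 0.123456
// 123456.5
// 1.234567e-7
// -2.5e22
// 1e21
```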

Note: This algorithm matches the behavior of JavaScript in serializing numbers,
except that we additionally truncate the decimal portion to a maximum of 6 digits.
This somewhat avoids exposing the exact representation precision of numeric values,
as that can change between properties and between implementations.
It also avoids exposing minor differences in ordering of internal arithmetic operations,
which might produce very slightly different floating point values
which would serialize differently
despite acting identically in practice.
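A small illustration of the floating-point point in the note, assuming truncation to 6 decimal places as drafted (`truncateTo6Decimals` is a hypothetical helper for plain-notation values):

```javascript
// Hypothetical helper: keep at most 6 decimal places by truncating the
// shortest JS string (assumes the value prints without scinot).
function truncateTo6Decimals(x) {
  const [intPart, fracPart = ""] = String(x).split(".");
  const kept = fracPart.slice(0, 6).replace(/0+$/, "");
  return kept ? `${intPart}.${kept}` : intPart;
}

const viaSum = 0.1 + 0.2; // 0.30000000000000004 in IEEE 754 doubles
const literal = 0.3;

console.log(String(viaSum), String(literal));                           // 0.30000000000000004 0.3
console.log(truncateTo6Decimals(viaSum), truncateTo6Decimals(literal)); // 0.3 0.3
```

Truncation only hides a difference when both values share the same first six decimal digits: 0.1 + 0.7 evaluates to 0.7999999999999999, which truncates to 0.799999, while 0.8 stays 0.8.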

@cdoublev
Collaborator

I am not sure whether the plan is to later check if this change can also be applied to <integer>, but I am referencing #6471 just in case.

@dbaron
Member

dbaron commented Apr 26, 2023

For what it's worth, one principle of serialization that I think we documented somewhere (but I can't find it anywhere that's general, although it's documented in a more specific case for serializing CSS values) is that serializing should generally prefer serializing to the more backwards-compatible / older form when there are different serialization possibilities that have been part of CSS for different amounts of time. This preference is both (a) because older web content might expect that and (b) because serialized content might be sent to a different user agent with different capabilities. (Though (b) is probably less of an issue these days because of faster browser release cycles and faster uptake of those releases.)

Following this principle here would mean not using scientific notation, or at least mean limiting its use to cases where non-use is problematic (for example, because it breaks other serialization principles such as lack of dataloss during a round-trip through parsing and serialization... though I'm not sure that's the case here).

That said, if we agree that (b) is less important these days, then I think this principle just degrades to the caution that we have whenever making non-backwards-compatible changes.

@css-meeting-bot
Member

The CSS Working Group just discussed [cssom] Serialize numbers using scientific notation?, and agreed to the following:

  • RESOLVED: Accept proposal to match JS scinot serialization triggers, other than 6-digit decimal truncation rule
The full IRC log of that discussion:
<fantasai> TabAtkins: our rules for serializing numbers are reasonably well-defined
<fantasai> TabAtkins: But we have scientific notation now, which is widely implemented
<fantasai> TabAtkins: Browsers use it for serialization *sometimes*, inconsistent, depends on the property...
<fantasai> TabAtkins: So the proposal here is to formalize when we serialize as scinot
<fantasai> TabAtkins: My proposal is to match JS exactly, which means that you use scinot whenever either the number has 22 or more digits of integer value, or has 6 or more leading zeroes in its decimal portion and is zero integer
<fantasai> TabAtkins: only change from JS is that we continue to truncate to only 6 digits after decimal point, maximum
<fantasai> TabAtkins: this is required for compat
<fantasai> TabAtkins: and also it hides some differences between browsers/properties
<fantasai> TabAtkins: and it also hides some floating-point variances
<fantasai> TabAtkins: so there's spec text in the issue
<fantasai> TabAtkins: and that's it
<bkardell_> do we have numbers that big?
<fantasai> TabAtkins: Wrt interop, we're all over the place
<fantasai> TabAtkins: every property and every browser does an effectively random thing
<fantasai> TabAtkins: partially due to different levels of precision, e.g. width supports subpixel, but scale property supports ...
<fantasai> TabAtkins: e.g. Chrome start scinot at 0.0001
<fantasai> TabAtkins: So no interop, so match JS with wrinkle about 6 digits seems reasonable to do with minimal impact on authors
<fantasai> Rossen_: Any additional comments?
<fantasai> TabAtkins: dbaron's point was to bias towards older formats during serialization
<fantasai> ... that's part of why the bounds are so wide
<fantasai> ... Most numbers you will ever encounter in a stylesheet do not trigger scinot
<fantasai> TabAtkins: but outside those bounds, e.g. when serializing a transform matrix, need to serialize somehow
<fantasai> Rossen_: Pretty clear proposal, not seeing anyone rushing to the queue ...
<fantasai> ... any objections to the proposal?
<fantasai> RESOLVED: Accept proposal to match JS scinot serialization triggers, other than 6-digit decimal truncation rule

@bernhardf-ro

While it is good to see that small numbers will be serialized with great precision (7 digits), there is a noticeable 'jump' in precision between "e-6" and "e-7".
For example 1.987654321e-7 or 0.0000001987654321 would be serialized as 1.987654e-7 (aka 0.0000001987654), but 1.987654321e-6 or 0.000001987654321 as 0.000001 (aka 1e-6), dropping the precision from 7 digits to 1.

Allowing "6 significant digits" instead of "truncating to 6 digits", for numbers between 0 and 1, would eliminate that 'jump'.
Besides making the whole behavior more consistent, and understandable for authors, it would make working with very small (but not excessively small, i.e. e-1 to e-6) numbers a lot safer.
This is the current behavior of Firefox and PDFreactor (except for scinot) and we cannot easily change that in PDFreactor, as we had use-cases where a strict 6 decimals limit would interfere with the results.

(A similar 'jump' happens when large numbers switch to scinot and limiting significant digits would also solve that. Again this is already the behavior of Firefox and PDFreactor. However this isn't really a significant issue.)
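A sketch comparing the two rules on the values from the example above, for positive values below 1e21 only. `truncateRule` and `sigFigRule` are hypothetical names; `sigFigRule` assumes the 6 significant digits are rounded (via `toPrecision(6)`), which the comment does not specify.

```javascript
// Keep at most 6 digits after "." and drop trailing zeros (and a bare ".").
function truncateFraction(text) {
  const [intPart, fracPart = ""] = text.split(".");
  const kept = fracPart.slice(0, 6).replace(/0+$/, "");
  return kept ? `${intPart}.${kept}` : intPart;
}

// Rule A, per the draft: truncate to 6 decimal places, scinot below 1e-6.
function truncateRule(x) {
  if (x < 1e-6) {
    const [shifted, power] = x.toExponential().split("e");
    return `${truncateFraction(shifted)}e${Number(power)}`;
  }
  return truncateFraction(String(x));
}

// Rule B, as suggested above: 6 significant digits (rounded, an assumption),
// same scinot thresholds. String() strips the trailing zeros toPrecision keeps.
function sigFigRule(x) {
  return String(Number(x.toPrecision(6))).replace("e+", "e");
}

for (const x of [1.987654321e-6, 1.987654321e-7]) {
  console.log(truncateRule(x), sigFigRule(x));
}
// 0.000001 0.00000198765
// 1.987654e-7 1.98765e-7
```

For 0.000123456789, Rule A gives 0.000123 (the digits reported for WebKit below) and Rule B gives 0.000123457 (the digits reported for Blink).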

@tabatkins
Member

Just to re-document the current behavior across browsers, here's a testcase:

```html
<!DOCTYPE html>
<div id=test></div>
<table id=results>
<thead><tr><th>JS Value<th>Specified<th>Computed</thead>
</table>
<script>
var test = document.querySelector("#test");
// Append one row of text cells to the results table.
function addRow(...cells) {
 var tr = document.createElement("tr");
 for(var cell of cells) {
  var td = document.createElement("td");
  td.textContent = cell;
  tr.appendChild(td);
 }
 document.querySelector("#results").appendChild(tr);
}

// Try opacity values .123456789, .0123456789, ... (one more leading zero each
// step) and record how the specified and computed values serialize.
for(var i = 0; i < 10; i++) {
 var value = .123456789 / Math.pow(10, i);
 test.style.opacity = value;
 addRow(value, test.style.opacity, getComputedStyle(test).opacity);
}
</script>
<style>
table {
 border-collapse: collapse;
}
td {
 border: thin solid silver;
 padding: 1px 5px;
}
</style>
```

The results differ between all three major engines:

  1. Blink: retains six significant figures (aka .000123457), switches to scinot at e-5.
  2. Gecko: retains six significant figures, switches to scinot at e-7.
  3. WebKit: retains six figures after the decimal point (aka .000123), never switches to scinot. (At e-7 it just prints 0.)

@tabatkins
Member

Conclusion: yeah, we should indeed mandate six significant figures (matching Firefox, and mostly matching Blink), rather than six figures after the decimal point (kinda matching WebKit, but it never actually switches to scinot so it doesn't count).


9 participants