Suggestion: Add support for SI and IEC binary number suffixes #427

Closed
jdfergason opened this issue Aug 20, 2016 · 12 comments

Comments

@jdfergason
Contributor

jdfergason commented Aug 20, 2016

I'm breaking this out from ticket #292 as I think it's a very useful feature that would be highly valued in scientific applications.

Proposal

For floating point and integer numbers allow SI and binary suffix modifiers. This suffix acts as a multiplier on the base value. The following table lists the supported suffixes.

Decimal              Binary
Suffix  Value        Suffix  Value
k       1000         Ki      1024
M       1000^2       Mi      1024^2
G       1000^3       Gi      1024^3
T       1000^4       Ti      1024^4
P       1000^5       Pi      1024^5
E       1000^6       Ei      1024^6
Z       1000^7       Zi      1024^7
Y       1000^8       Yi      1024^8

Examples

5k = 5000
10.3Mi = 10,800,332.8
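
As a rough illustration of the proposed semantics, here is a minimal Python sketch that treats the suffix as a multiplier on the base value. The names `parse_scaled` and `MULTIPLIERS` are hypothetical and not part of any TOML library, and the sketch ignores exponents, underscores, and the other number forms TOML already accepts.

```python
import re
from typing import Union

# Multipliers taken from the table in this proposal.
# Decimal suffixes use powers of 1000, binary suffixes use powers of 1024.
_LETTERS = "kMGTPEZY"
MULTIPLIERS = {s: 1000 ** (i + 1) for i, s in enumerate(_LETTERS)}
MULTIPLIERS.update({s.upper() + "i": 1024 ** (i + 1) for i, s in enumerate(_LETTERS)})

# A plain decimal number followed by an optional suffix (simplified grammar).
_NUMBER = re.compile(
    r"^([+-]?\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|Pi|Ei|Zi|Yi|[kMGTPEZY])?$"
)


def parse_scaled(text: str) -> Union[int, float]:
    """Parse a number with an optional SI or IEC suffix (illustrative only)."""
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"not a recognised number: {text!r}")
    digits, suffix = match.groups()
    factor = MULTIPLIERS.get(suffix, 1)  # no suffix means multiply by 1
    if "." in digits:
        return float(digits) * factor  # float base stays a float
    return int(digits) * factor        # integer base stays an integer


print(parse_scaled("5k"))      # 5000
print(parse_scaled("10.3Mi"))  # 10800332.8
```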

Motivation

This would be very useful in scientific applications where this notation is common. However, it is also useful outside of the scientific community, for example when specifying the maximum disk space to allow.

@rmunn

rmunn commented Oct 5, 2016

If this suggestion is adopted, I would also suggest allowing uppercase K (in addition to lowercase k) for 1000. Although uppercase K is not an official SI prefix, if uppercase K is not allowed then it will cause confusion. All the other suffixes follow the pattern "remove the lowercase i from the binary suffix and you get the decimal suffix". If Ki -> K is allowed, then that pattern holds at all times and there will be less user confusion in the long run.

@rmunn

rmunn commented Oct 5, 2016

Also, the 10.3Mi = 10,800,332.8 example surprises me. I would expect 10.3Mi to produce an int, not a float. More generally, I have never come across a situation where I wanted to express a floating-point value using binary suffixes. The rule in my head is "If there's a binary suffix, it's an int". Therefore, I would expect 10.3Mi to be "the integer value closest to 10.3 * 1024^2", or possibly "10.3 * 1024^2 rounded down to an int". (I don't know if round semantics or floor semantics would be least surprising to others.)

@JeppeKlitgaard

I really like this as an addition to TOML, particularly since it would be very useful in memory/disk configuration examples.

I don't think these are made irrelevant by the scientific notation addition to TOML, particularly the IEC Binary Numbers.

The SI/IEC prefixes are useful in some circumstances where it is conventional to use SI/IEC labeling. Scientific notation is useful in cases where it is not conventional to use SI notation. This is particularly relevant for large numbers. In academic applications something like 7.3Y would be a very non-obvious way to describe a quantity, whereas 7.3e24 is much more appropriate.

For example:

disk_size = 512Mi
distance_to_datacentre = 10K
number_of_stars_in_the_milky_way = 2.5e11

One very big advantage of implementing these would be that configuration files would no longer have to choose an appropriate scaling to their values. For example, it is common to see configuration files using keys like mem_size_in_mib, or worse mem_size where the number is assumed to be in MiB.

In short, prefix notation allows numbers in config files to be actual numbers, not some scaled version of them in order to achieve reasonable human-writable values.

I would also suggest using K instead of (not in addition to) k. This might bother some SI-purists, but in my opinion would be a far more obvious implementation for users.

I think anyone messing around with configuration files would intuitively be able to understand the syntax without having to refer to TOML documentation.

@eksortso
Contributor

What we're doing with these units is not applying dimensions to numbers, but rather keeping them dimensionless and multiplying them. So @JeppeKlitgaard I'm inclined to agree with you that uppercase Ks should be valid but lowercase ks should not. It wouldn't be painful to allow both cases, but the choice of indicators emphasizes that these are just numbers, with no greater significance to them during parsing.

@eksortso
Contributor

eksortso commented Apr 28, 2021

With all due respect to @rmunn, there's a problem with choosing how to turn floats with binary units into integers. What would be most useful? Using trunc, floor, ceiling (which is what I'd personally expect if I was using 10.3Mi), or some rounding variant? Should we be deciding which of these to use?

Let's keep it simple. Integers with units will be integers, and floats with units will be floats. Let the application figure out how to turn 10.3Mi into an integer. It ought to be doing that anyway.
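
To make the choice concrete, here is a small Python sketch of what an application might do with the float it receives for 10.3Mi. It only uses the standard `math` module; the variable names are illustrative, and the point is simply that the candidate conversions disagree, so the application has to pick one.

```python
import math

# 10.3Mi kept as a float, following the "floats with units stay floats" rule.
value = 10.3 * 1024 ** 2

# The conversions mentioned in this thread, applied by the application
# rather than by the TOML parser.
candidates = {
    "trunc": math.trunc(value),
    "floor": math.floor(value),
    "ceil": math.ceil(value),
    "round": round(value),
}

for name, result in candidates.items():
    print(f"{name:5s} -> {result}")
```

For this positive value, trunc and floor agree on 10800332 while ceil and round give 10800333; for negative values trunc and floor would differ as well.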

@eksortso
Contributor

There could be a minor conflict using E or Ei with some existing parsers. The letter E already indicates the exponent portion of a float.

For the sake of faster adoption, could we just start with everything from K/Ki up to P/Pi?

We could add the E, Z, and Y units later on if there continues to be a demand. But for now at least, floats with exponents would be preferable in real-life scenarios past the peta level, wouldn't they?

@JeppeKlitgaard

JeppeKlitgaard commented Apr 28, 2021

I agree that the use-cases for exa- and above are limited/non-existent at the moment, though I think it would be better to do this addition in just one version of TOML. Parsers are anyway going to need to support the other suffixes, so ensuring that exa works as expected likely wouldn't be much extra work. The pattern-matching for the exa suffix and the exponent shouldn't be overlapping either way, I would imagine.

A good TOML parser would currently also fail to parse something like some_key=1.2E.

It might make sense to treat using E as the exponent as bad form, or even deprecate it in favour of just e. This would not break existing configurations and would make it even more obvious whether it is the exa-suffix or exponent. In my opinion, e is anyway preferable to E since it has a different height to the decimal numbers, making the scientific notation number easier on the eyes. Something like this could be added to the docs:

key1 = 5e12  # Good
key2 = 5E12  # Bad since it is harder to read and also might be confused with the exa- suffix
key3 = 5E    # 5 exa = 5*10^18

While exa and above might not see much use immediately, it feels as though they should be there. Doing this over two iterations would just add more pain.
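
The claim that the exponent and the exa suffix need not overlap can be sketched with two regular expressions in Python. Both patterns are simplified assumptions rather than TOML's actual grammar (no underscores, no special float values), and `classify` is a hypothetical helper; the key observation is that an exponent requires at least one digit after the e or E, while the suffix form ends right after the letter.

```python
import re

# Exponent form: a number, then e or E, an optional sign, and at least one digit.
EXPONENT = re.compile(r"^[+-]?\d+(?:\.\d+)?[eE][+-]?\d+$")

# Hypothetical exa form: a number followed by a bare "E" or "Ei" and nothing else.
EXA = re.compile(r"^[+-]?\d+(?:\.\d+)?Ei?$")


def classify(token: str) -> str:
    """Say which of the two forms a token matches (illustrative only)."""
    if EXPONENT.match(token):
        return "exponent"
    if EXA.match(token):
        return "exa suffix"
    return "neither"


for token in ["5e12", "5E12", "5E", "1.2E", "5Ei"]:
    print(token, "->", classify(token))
```

Under these assumptions no token can match both patterns, although a human reader may still find 5E12 and 5E easy to confuse, which is the readability concern raised above.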

@pradyunsg
Member

pradyunsg commented Apr 28, 2021

Beyond the ambiguity pointed out above, I'm not sure it is immediately obvious to me what 5M means. Or what 7Yi means.

I don't think I can write down the values for these unless I scroll up to the table in OP, which makes me think that it needs way more context than I'd want a reader to have in mind when working on a TOML document.

Yes, it'd be nice to have a good way to write 1048576 (that's 1024*1024) but that isn't something that can't be clarified today with a comment.

Overall, I'm not convinced that the additional context and nuance needed to understand the proposed syntax adds enough actual value to the authoring/reading experience to justify adding it.

@eksortso
Contributor

Only speaking personally, I'm familiar with K through T when used with a number in context, and see them used in news stories a lot. 7.9B people, $1.9T budget, 5K crowd capacity expandable to 18K, and so on. Any well-thought-out key name could provide the necessary context. The binary suffixes take a little getting used to if you've not seen them before, but as soon as you can make out the i, they're immediately clear, and more useful than multiplying powers of 1024, which is an ugliness that we could certainly afford to remove for administrators.

Could we have more comments from the scientific community about the utility of these suffixes, as opposed to using exponents with floats and using trains of _000s and such with integers?

And more comments from the tech community, who'd ostensibly benefit from Mis and Kis being adopted?

@JeppeKlitgaard

I would agree with @eksortso that most people would be familiar with K and M certainly, but I would also expect people in general will know G and T. Notably, billion is not B, but G and differs from the prefixes commonly found in English news articles. People from either tech or science backgrounds could be expected to be able to deduce G and T though.

The ones above T should be included not because they are commonly used, but for completeness and future-proofing.

I personally don't like trains like _000 and I know that their use is generally discouraged within the scientific community, where scientific notation is used along with the conventional use of significant figures. It is likely that anyone within science would have a preference for the exponent notation in general, and SI suffix notation for certain use-cases (for example resistance, where 6.7M would be more conventional than 6.7e6).

I think this suggestion would mainly be targeted at tech, where the IEC suffixes are clearly more readable than the alternative.

@eksortso
Contributor

eksortso commented May 3, 2021

> I would agree with @eksortso that most people would be familiar with K and M certainly, but I would also expect people in general will know G and T. Notably, billion is not B, but G and differs from the prefixes commonly found in English news articles.

Thanks, @JeppeKlitgaard. That B slipped through my SI radar. Pity, because I could have totally used watts = 1.21G as another example!

> People from either tech or science backgrounds could be expected to be able to deduce G and T though.

TOML is and ought to be language-agnostic. G makes more sense than B, which could be confused with an 8.

@pradyunsg
Member

> Overall, I'm not convinced that the additional context and nuance needed to understand the proposed syntax adds enough actual value to the authoring/reading experience to justify adding it.

I'm gonna lean further into this and say that I don't think this is going to be beneficial overall.

That isn't to say that this would not be useful in some cases, I'm sure it would be. I also think that this can be confusing in certain other contexts, and that outweighs the usefulness here IMO.

Thanks for the discussion here folks, and for the patience! ^.^


5 participants