Suggestion: Add support for SI and IEC binary number suffixes #427

jdfergason · 2016-08-20T02:00:21Z

I'm breaking this out from ticket #292 as I think it's a very useful feature that would be highly valued in scientific applications.

Proposal

For floating point and integer numbers allow SI and binary suffix modifiers. This suffix acts as a multiplier on the base value. The following table lists the supported suffixes.

Decimal suffix	Value	Binary suffix	Value
k	1000	Ki	1024
M	1000²	Mi	1024²
G	1000³	Gi	1024³
T	1000⁴	Ti	1024⁴
P	1000⁵	Pi	1024⁵
E	1000⁶	Ei	1024⁶
Z	1000⁷	Zi	1024⁷
Y	1000⁸	Yi	1024⁸

Examples

5k = 5000
10.3Mi = 10,800,332.8
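
As a rough illustration of the table above, the suffix lookup could be sketched in Python as follows. The names `MULTIPLIERS` and `apply_suffix` and the regular expression are assumptions made for this sketch; they are not part of TOML or of any existing parser. `Decimal` is used only so that the two examples come out exact.

```python
import re
from decimal import Decimal

# Hypothetical multiplier table built from the proposal: k, M, G, ... are
# powers of 1000 and Ki, Mi, Gi, ... are powers of 1024.
MULTIPLIERS = {}
for power, letter in enumerate("kMGTPEZY", start=1):
    MULTIPLIERS[letter] = 1000 ** power
    MULTIPLIERS[letter.upper() + "i"] = 1024 ** power

# A plain decimal number followed by an optional suffix (assumed grammar).
_NUMBER = re.compile(r"([+-]?\d+(?:\.\d+)?)([KMGTPEZY]i|[kMGTPEZY])?")


def apply_suffix(text: str) -> Decimal:
    """Return the numeric value of text with its suffix multiplier applied."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a suffixed number: {text!r}")
    digits, suffix = match.groups()
    # A missing suffix means a multiplier of 1.
    return Decimal(digits) * MULTIPLIERS.get(suffix, 1)


print(apply_suffix("5k"))      # 5000
print(apply_suffix("10.3Mi"))  # 10800332.8
```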

Motivation

This would be very useful in scientific applications where this notation is common. However, it is also useful outside of the scientific community, for example when specifying the maximum disk space to allow.

rmunn · 2016-10-05T02:28:34Z

If this suggestion is adopted, I would also suggest allowing uppercase K (in addition to lowercase k) for 1000. Although uppercase K is not an official SI prefix, disallowing it will cause confusion. All the other suffixes follow the pattern "remove the lowercase i from the binary suffix and you get the decimal suffix". If Ki -> K is allowed, then that pattern holds at all times and there will be less user confusion in the long run.

rmunn · 2016-10-05T02:36:20Z

Also, the 10.3Mi = 10,800,332.8 example surprises me. I would expect 10.3Mi to produce an int, not a float. More generally, I have never come across a situation where I wanted to express a floating-point value using binary suffixes. The rule in my head is "If there's a binary suffix, it's an int". Therefore, I would expect 10.3Mi to be "the integer value closest to 10.3 * 1024²", or possibly "10.3 * 1024² rounded down to an int". (I don't know if round semantics or floor semantics would be least surprising to others.)
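
To make the round-versus-floor question concrete, here is a small Python sketch that computes each candidate integer for a fractional binary quantity. The helper name `candidates` is hypothetical, and `Fraction` is used only to keep the arithmetic exact; nothing here reflects a decision made for TOML.

```python
import math
from fractions import Fraction


def candidates(mantissa: str, power: int) -> dict:
    """Return the floor, nearest, and ceiling integers for mantissa * 1024**power."""
    exact = Fraction(mantissa) * 1024 ** power
    return {
        "floor": math.floor(exact),
        "round": round(exact),
        "ceil": math.ceil(exact),
    }


# 10.3Mi is exactly 10800332.8, so round and ceil agree but floor differs.
print(candidates("10.3", 2))  # {'floor': 10800332, 'round': 10800333, 'ceil': 10800333}

# 10.2Mi is exactly 10695475.2, so floor and round agree but ceil differs.
print(candidates("10.2", 2))  # {'floor': 10695475, 'round': 10695475, 'ceil': 10695476}
```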

JeppeKlitgaard · 2021-04-22T19:55:11Z

I really like this as an addition to TOML, particularly since it would be very useful in memory/disk configuration examples.

I don't think these are made irrelevant by the scientific notation addition to TOML, particularly the IEC Binary Numbers.

The SI/IEC prefixes are useful in some circumstances where it is conventional to use SI/IEC labeling. Scientific notation is useful in cases where it is not conventional to use SI notation. This is particularly relevant for large numbers. In academic applications something like 7.3Y would be a very non-obvious way to describe a quantity, whereas 7.3e24 is much more appropriate.

For example:

disk_size = 512Mi
distance_to_datacentre = 10K
number_of_stars_in_the_milky_way = 2.5e11

One very big advantage of implementing these would be that configuration files would no longer have to choose an appropriate scaling to their values. For example, it is common to see configuration files using keys like mem_size_in_mib, or worse mem_size where the number is assumed to be in MiB.

In short, prefix notation allows numbers in config files to be actual numbers, not some scaled version of them in order to achieve reasonable human-writable values.

I would also suggest using K instead of (not in addition to) k. This might bother some SI-purists, but in my opinion would be a far more obvious implementation for users.

I think anyone messing around with configuration files would intuitively be able to understand the syntax without having to refer to TOML documentation.

eksortso · 2021-04-28T06:06:50Z

What we're doing with these units is not applying dimensions to numbers, but rather keeping them dimensionless and multiplying them. So @JeppeKlitgaard I'm inclined to agree with you that uppercase Ks should be valid but lowercase ks should not. It wouldn't be painful to allow both cases, but the choice of indicators emphasizes that these are just numbers, with no greater significance to them during parsing.

eksortso · 2021-04-28T14:20:30Z

With all due respect to @rmunn, there's a problem with choosing how to turn floats with binary units into integers. What would be most useful? Using trunc, floor, ceiling (which is what I'd personally expect if I was using 10.3Mi), or some rounding variant? Should we be deciding which of these to use?

Let's keep it simple. Integers with units will be integers, and floats with units will be floats. Let the application figure out how to turn 10.3Mi into an integer. It ought to be doing that anyway.
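
The "integers stay integers, floats stay floats" rule could look roughly like the Python sketch below. The `UNITS` table (limited to K through T for brevity), the regular expression, and the function name `parse_with_unit` are all assumptions for illustration, not an existing API. The final conversion to an integer is left to the application, as suggested above.

```python
import math
import re

# Hypothetical unit table; uppercase K follows the preference voiced in this thread.
UNITS = {
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
}

# Whole part, optional fractional part, then a required unit (assumed grammar).
_TOKEN = re.compile(r"([+-]?\d+)(\.\d+)?([KMGT]i?)")


def parse_with_unit(text: str):
    """Return an int for integer literals and a float for float literals."""
    match = _TOKEN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number with a unit: {text!r}")
    whole, fraction, unit = match.groups()
    if fraction is None:
        return int(whole) * UNITS[unit]          # integer in, integer out
    return float(whole + fraction) * UNITS[unit]  # float in, float out


disk_size = parse_with_unit("512Mi")  # int: 536870912
cache = parse_with_unit("10.3Mi")     # float, roughly 10800332.8

# The application, not the parser, picks the rounding rule it needs.
cache_bytes = math.ceil(cache)        # 10800333
```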

eksortso · 2021-04-28T19:11:25Z

There could be a minor conflict using E or Ei with some existing parsers. The letter E already indicates the exponent portion of a float.

For the sake of faster adoption, could we just start with everything from K/Ki up to P/Pi?

We could add the E, Z, and Y units later on if there continues to be a demand. But for now at least, floats with exponents would be preferable in real-life scenarios past the peta level, wouldn't they?

JeppeKlitgaard · 2021-04-28T21:27:51Z

I agree that the use-cases for exa- and above are limited/non-existent at the moment, though I think it would be better to do this addition in just one version of TOML. Parsers are going to need to support the other suffixes anyway, so ensuring that exa works as expected likely wouldn't be much extra work. The pattern-matching for the exa suffix and the exponent shouldn't be overlapping either way, I would imagine.

A good TOML parser would currently also fail to parse something like some_key=1.2E.

It might make sense to make using E as the exponent considered bad form or even deprecated in favour of just e. This would not break existing configurations and would make it even more obvious whether it is the exa-suffix or exponent. In my opinion, e is anyway preferable to E since it has a different height to the decimal numbers, making the scientific notation number easier on the eyes. Something like this could be added to the docs:

key1 = 5e12  # Good
key2 = 5E12  # Bad since it is harder to read and also might be confused with the exa- suffix
key3 = 5E    # 5 exa = 5*10^18
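
A minimal Python sketch of why the two readings of E need not overlap: an exponent always has digits after the letter, while the proposed exa suffix would end the token. The patterns and the function name `classify` are assumptions for illustration only; they simplify the real TOML float grammar and ignore the binary Ei suffix.

```python
import re

# An exponent marker (e or E) must be followed by an optionally signed integer.
_EXPONENT = re.compile(r"[+-]?\d+(?:\.\d+)?[eE][+-]?\d+")
# The proposed exa suffix would be an uppercase E with nothing after it.
_EXA = re.compile(r"[+-]?\d+(?:\.\d+)?E")


def classify(text: str) -> str:
    """Classify a token as 'exponent', 'exa suffix', or 'other'."""
    if _EXPONENT.fullmatch(text):
        return "exponent"
    if _EXA.fullmatch(text):
        return "exa suffix"
    return "other"


print(classify("5e12"))  # exponent
print(classify("5E12"))  # exponent
print(classify("5E"))    # exa suffix
print(classify("1.2E"))  # exa suffix
```

Under these assumptions no token matches both patterns, although a human reader may still find 5E12 and 5E easy to mix up.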

While exa and above might not see much use immediately, it feels as though they should be there. Doing this over two iterations would just add more pain.

pradyunsg · 2021-04-28T23:14:31Z

Beyond the ambiguity pointed out above, I'm not sure it is immediately obvious to me what 5M means. Or what 7Yi means.

I don't think I can write down the values for these unless I scroll up to the table in OP, which makes me think that it needs way more context than I'd want a reader to have in mind when working on a TOML document.

Yes, it'd be nice to have a good way to write 1048576 (that's 1024*1024) but that isn't something that can't be clarified today with a comment.

Overall, I'm not convinced that the additional context and nuance needed to understand the proposed syntax adds enough actual value to the authoring/reading experience to justify adding it.

eksortso · 2021-04-29T17:18:21Z

Only speaking personally, I'm familiar with K through T when used with a number in context, and see them used in news stories a lot. 7.9B people, $1.9T budget, 5K crowd capacity expandable to 18K, and so on. Any well-thought-out key name could provide the necessary context. The binary suffixes take a little getting used to if you've not seen them before, but as soon as you can make out the i, they're immediately clear, and more useful than multiplying powers of 1024, which is an ugliness that we could certainly afford to remove for administrators.

Could we have more comments from the scientific community about the utility of these suffixes, as opposed to using exponents with floats and using trains of _000s and such with integers?

And more comments from the tech community, who'd ostensibly benefit from Mis and Kis being adopted?

JeppeKlitgaard · 2021-05-03T12:16:31Z

I would agree with @eksortso that most people would be familiar with K and M certainly, but I would also expect people in general will know G and T. Notably, billion is not B, but G and differs from the prefixes commonly found in English news articles. People from either tech or science backgrounds could be expected to be able to deduce G and T though.

The ones above T should be included not because they are commonly used, but for completeness and future-proofing.

I personally don't like trains like _000 and I know that their use is generally discouraged within the scientific community, where scientific notation is used along with the conventional use of significant figures. It is likely that anyone within science would have a preference for the exponent notation in general, and SI suffix notation for certain use-cases (for example resistance, where 6.7M would be more conventional than 6.7e6).

I think this suggestion would mainly be targeted at tech, where the IEC suffixes are clearly more readable than the alternative.

eksortso · 2021-05-03T15:24:08Z

> I would agree with @eksortso that most people would be familiar with K and M certainly, but I would also expect people in general will know G and T. Notably, billion is not B, but G and differs from the prefixes commonly found in English news articles.

Thanks, @JeppeKlitgaard. That B slipped through my SI radar. Pity, because I could have totally used watts = 1.21G as another example!

> People from either tech or science backgrounds could be expected to be able to deduce G and T though.

TOML is and ought to be language-agnostic. G makes more sense than B, which could be confused with an 8.

pradyunsg · 2022-03-04T17:44:03Z

> Overall, I'm not convinced that the additional context and nuance needed to understand the proposed syntax adds enough actual value to the authoring/reading experience to justify adding it.

I'm gonna lean further into this and say that I don't think this is going to be beneficial overall.

That isn't to say that this would not be useful in some cases, I'm sure it would be. I also think that this can be confusing in certain other contexts and that outweighs the usefulness here IMO.

Thanks for the discussion here folks, and for the patience! ^.^

a-teammate mentioned this issue Oct 14, 2017: Add milestone 1.0 #482 (Closed)

pradyunsg mentioned this issue Nov 23, 2017: Few ideas #292 (Closed)

forember mentioned this issue Jan 20, 2018: Feature request: Add a duration/timedelta type #514 (Open)

pradyunsg added the new-syntax label May 24, 2018

pradyunsg mentioned this issue Dec 2, 2018: Artihmetic expression as values #582 (Closed)

pradyunsg added the post-1.0 label May 25, 2019

yyny mentioned this issue Feb 17, 2020: Optionally supported implementation-defined values #707 (Closed)

pradyunsg removed the post-1.0 label Apr 17, 2021

pradyunsg closed this as completed Mar 4, 2022

eksortso mentioned this issue Jul 7, 2022: Add nicer syntax for file sizes #912 (Open)
