Are leading zeros allowed in the exponent part of a float? #356

Closed
avakar opened this issue Oct 6, 2015 · 20 comments

Comments

@avakar

avakar commented Oct 6, 2015

An issue has arisen in my parser (avakar/pytoml#9), and I find the spec ambiguous on this point. In particular, it says:

An exponent part is an E (upper or lower case) followed by an integer part (which may be prefixed with a plus or minus sign).

Does the phrase "integer part" mean an integer as specified in the Integer section of the spec, which would disallow leading zeros? In other words, is the following a valid TOML file?

maximum_error = 4.85e-06

Note the leading zero in the exponent.

@ziotom78
Contributor

ziotom78 commented Oct 6, 2015

I am the initial reporter of the bug avakar/pytoml#9. I would argue that leading zeroes should be allowed, as they are common practice in a number of languages. (They allow nicely aligned numbers.)

Here are a few examples:

Python 2.7/3.4:

print(4.86e-6)
# Prints "4.86e-06"

Ruby 2.1:

puts(4.86e-6)
# Prints 4.86e-06

avakar pushed a commit to avakar/toml-test that referenced this issue Oct 8, 2015
@mojombo
Member

mojombo commented Oct 31, 2015

Can someone provide a concrete example where allowing leading zeros is useful?

@ziotom78
Contributor

Here is my case. I found out that @avakar's TOML parser rejects leading zeros because it failed to parse some of the TOML files that a Python script of mine was producing automatically. It turned out that these were the files in which a parameter became so small that scientific notation was used for it. (As I said above, Python's print automatically adds a leading zero to the exponent if it has just one digit.)
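
For anyone generating TOML from Python in the meantime, one possible workaround is to post-process each float before writing it. This is only a minimal sketch: `strip_exponent_zeros` and `format_toml_float` are hypothetical helper names, not part of pytoml or any other TOML library, and the regex assumes it is applied to a single float value, not to a whole TOML document (where it could also alter string contents).

```python
import re

# Matches an exponent marker, an optional sign, one or more leading zeros,
# and at least one remaining digit, e.g. "e-06" or "E+007".
_EXPONENT_ZEROS = re.compile(r"([eE][+-]?)0+(\d)")


def strip_exponent_zeros(text: str) -> str:
    """Remove leading zeros from the exponent part of a float string."""
    return _EXPONENT_ZEROS.sub(r"\1\2", text)


def format_toml_float(value: float) -> str:
    """Format a float via repr() and drop any zero padding in the exponent."""
    return strip_exponent_zeros(repr(value))


print("maximum_error =", format_toml_float(4.85e-06))  # maximum_error = 4.85e-6
```

Values that `repr` prints without an exponent (such as `0.5`) pass through unchanged.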

@mojombo
Member

mojombo commented Oct 31, 2015

@ziotom78 Oh, I see, I didn't realize that's what your example was pointing out. That's quite curious that Ruby and Python do that. Do you know what the rationale is behind that behavior?

@Hrxn

Hrxn commented Nov 1, 2015

I'd call that bad design...

Reminds me of that old thing, the C strcmp function and the implications it had for sorting...

@ziotom78
Contributor

ziotom78 commented Nov 2, 2015

I imagine that this way of formatting numbers might allow nicer alignment when printing numbers in columns, though I am not really sure. Python says nothing about this: https://docs.python.org/2/library/string.html#formatspec. (See this question on StackOverflow for some more interesting information: http://stackoverflow.com/questions/9910972/python-number-of-digits-in-exponent.)

Interestingly enough, there are cases where this behaviour is intentional and documented; see how .NET does floating-point formatting (https://msdn.microsoft.com/en-us/library/dwhawy9k.aspx#EFormatString):

The exponent always consists of a plus or minus sign and a minimum of three digits. The exponent is padded with zeros to meet this minimum, if required.

@Hrxn

Hrxn commented Nov 4, 2015

It's a mess...

https://en.wikipedia.org/wiki/Leading_zero#0_as_a_prefix

Decimal vs. octal, etc.

I think C#/VB.net etc. use three digits in the exponent, forcing the user to use custom string formats for 'correct' output..

AFAIK, strictly mathematically, using leading zeros is generally discouraged.

Programming languages have different conventions, obviously, so I don't know what an easy and elegant answer to the question at hand would be..

@ziotom78
Contributor

ziotom78 commented Nov 5, 2015

I agree that C's way of indicating octal numbers by prefixing them with a zero is really confusing. However, in this case we are discussing whether to allow leading zeroes in exponents. AFAIK, in this case there is no ambiguity at all, as every language I know always interprets exponents as decimal numbers, regardless of any leading zeros.
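
The difference can be checked quickly in Python. This is a sketch of Python's behaviour only (other languages may differ in the details), and the variable names are just for illustration:

```python
# Leading zeros in an exponent do not change how the text is read:
# the exponent is decimal either way.
same_value = float("1e-010") == float("1e-10")
same_as_issue_example = float("4.85e-06") == float("4.85e-6")

# The octal confusion belongs to zero-prefixed *integers*. With base 0,
# int() follows Python's literal rules and rejects "010" outright.
try:
    int("010", 0)
except ValueError:
    integer_rejected = True
else:
    integer_rejected = False

print(same_value, same_as_issue_example, integer_rejected)  # True True True
```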

@mojombo
Member

mojombo commented Jan 23, 2016

In this case I think internal consistency between "integer parts" of the spec outweighs the benefits of accepting the generated output of certain languages. You could argue that integer values should allow leading zeros, which would then propagate to all integer parts, but I just don't agree with allowing such cruft. TOML is designed for readability as a primary goal, and leading zeros are counter to that. I'll submit a clarification to the float language.

@D-Alex

D-Alex commented May 9, 2018

I would argue that C99 is a pretty strong standard and going against it will produce a lot of headaches for a lot of people. One reason is that there is no easy way to control the number of digits in the exponent for output functions like cout / printf, which makes TOML unsuitable as an output format for many languages based on C.

"The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent."

A reconsideration would be welcome.
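
To illustrate the rule quoted above without a C compiler at hand: Python's `%e` formatting follows the same "at least two digits" convention. This is a sketch in Python, not C, and the sample values are arbitrary:

```python
# Arbitrary sample values covering one-, two- and three-digit exponents.
samples = [4.85e-6, 1.0, 1.99e30, 1e-100]

# "%.2e" pads the exponent to two digits and only grows beyond that
# when the exponent itself needs more digits.
formatted = ["%.2e" % x for x in samples]

for text in formatted:
    print(text)
# 4.85e-06
# 1.00e+00
# 1.99e+30
# 1.00e-100
```

There is no format flag here that produces `4.85e-6` directly, so a program that needs the unpadded form has to rewrite the string itself.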

@gentlesystems

"outweighs the benefits of accepting the generated output of certain languages." This is misleading, as in fact we're talking about most common languages. Exponents with leading zeroes have always been a standard in the computing world, and restricting them here reduces usability to no purpose. TOML is a terrible place to create new notation standards.

@eksortso
Contributor

I'm sure this comes up with naive computer-generated TOML configurations, and not as much with human-written configurations. My knee-jerk reaction, valid or not, is that the programs generating TOML ought to write human-readable exponents. But leading zeros in exponents, valid or not, are human-readable, especially the ones that printf and cout make.

This is a case in which consistency for the sake of the spec isn't worth the cost to users writing configs with the tools they've got on hand. Integer values shouldn't have leading zeros, but exponent values could.

The spec could be changed to read "An exponent part is an E (upper or lower case) followed by an integer part (which follows the same rules as integer values but may include up to two leading zeros)."

I chose the "up to two" part because printf may use one leading 0, and printf on Windows may use two. We could drop the "up to two" part, though allowing an arbitrary number of leading zeros could lead to abuse. But a C99-based program writing seventy million leading zeros on exponents is about as hard to craft as a C99-based program writing no leading zeros on exponents.

The ABNF could use the following in place of the current definition of exp, which I've tested on Instaparse with the current version of toml.abnf as so modified:

exp = "e" float-exp-part
float-exp-part = [ minus / plus ] zero-prefixable-int

And if we really don't want any more than two leading zeros, we could instead do this:

exp = "e" float-exp-part
float-exp-part = [ minus / plus ] float-exp-int
float-exp-int  = [ %x30 [ underscore ] [ %x30 [ underscore ] ] ] unsigned-dec-int

There's probably a better way to write this, but it works. The underscores are ugly but consistent with unsigned-dec-int.
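
For readers who find regular expressions easier to scan than ABNF, here is a rough Python translation of the two variants. It is a sketch under assumptions: it takes `zero-prefixable-int` to be `DIGIT *( DIGIT / underscore DIGIT )` and `unsigned-dec-int` to be `DIGIT / digit1-9 1*( DIGIT / underscore DIGIT )`, as I read toml.abnf, and the function names are made up for illustration.

```python
import re

# Variant 1: exponent is [sign] zero-prefixable-int, so any number of zeros.
# ABNF string literals are case-insensitive, so "e" also covers "E".
ANY_ZEROS = re.compile(r"[eE][+-]?[0-9](?:_?[0-9])*")

# Variant 2: at most two leading zeros (each optionally followed by an
# underscore) in front of an unsigned-dec-int.
UP_TO_TWO = re.compile(
    r"[eE][+-]?(?:0_?(?:0_?)?)?(?:[1-9](?:_?[0-9])+|[0-9])"
)


def accepts_any_zeros(exp: str) -> bool:
    """True if `exp` is a valid exponent part under variant 1."""
    return ANY_ZEROS.fullmatch(exp) is not None


def accepts_up_to_two(exp: str) -> bool:
    """True if `exp` is a valid exponent part under variant 2."""
    return UP_TO_TWO.fullmatch(exp) is not None


for exp in ["e-06", "E+005", "e-0006"]:
    print(exp, accepts_any_zeros(exp), accepts_up_to_two(exp))
# e-06 True True
# E+005 True True
# e-0006 True False
```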

And so I also ask @mojombo to reconsider.

@D-Alex

D-Alex commented May 10, 2018

Actually, Microsoft had a function called _set_output_format to print the exponent as two digits which is now obsolete because they also follow C99 starting with Visual Studio 2015.

https://msdn.microsoft.com/en-us/library/bb531344(v=vs.140).aspx

"Exponent formatting. The %e and %E format specifiers format a floating point number as a decimal mantissa and exponent. The %g and %G format specifiers also format numbers in this form in some cases. In previous versions, the CRT would always generate strings with three-digit exponents. For example, printf("%e\n", 1.0) would print 1.000000e+000. This was incorrect: C requires that if the exponent is representable using only one or two digits, then only two digits are to be printed.

In Visual Studio 2005 a global conformance switch was added: _set_output_format. A program could call this function with the argument _TWO_DIGIT_EXPONENT, to enable conforming exponent printing. The default behavior has been changed to the standards-conforming exponent printing mode."

@uvtc

uvtc commented Aug 20, 2019

Two observations:

  1. Since before computers, real physical constants used in computation generally had exponents with 2 digits max. For example, consider the tiny mass of an electron (9.11e-31 kg) and the huge mass of the Sun (1.99e30 kg). So, if your program displays or prints out a list of floats, they typically line up most nicely and are easiest to read when you have 3 spots for the exponent: a +/- sign, and two digits. (Note, most programming languages print out the plus sign too, for positive exponents.)

  2. With at least Lua, Python, and Haxe (probably many others, I haven't tried them), small numbers from 1e-5 to 1e-9 get printed "1e-05", "1e-06", ... "1e-09" --- with that extra zero. Note, numbers larger than 1e-5 (and up to around 1e13) most often are printed in decimal, since that's presumably considered more human-readable. So there's really not a huge range of numbers we're talking about that get that extra zero. These languages also accept as input the extra zero, and the plus sign too.

Anyhow, my point is, there is effectively zero ambiguity over what 1e-05 means; and for that matter, what 1e+05 means. They've been written that way since time immemorial, programming languages typically print them that way by default and read them that way as well, and even most non-graphing calculators display 2-digit zero-padded exponents. I'd be surprised if TOML didn't accept them that way. If I were using a toml file for a scientific program with floating point config values that I was getting from elsewhere (maybe output from another program), it would be a nuisance to have to edit them and remove the zeros and plus signs from the exponents.
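
A small Python sketch of the printing and reading behaviour described above. The thresholds shown are those of Python's `repr`; other languages switch to exponent notation at different points.

```python
# repr() switches to exponent notation for small and very large magnitudes,
# and pads the exponent to two digits when it does.
small = [repr(x) for x in (1e-4, 1e-5, 1e-9)]
large = [repr(x) for x in (1e15, 1e16)]

print(small)  # ['0.0001', '1e-05', '1e-09']
print(large)  # ['1000000000000000.0', '1e+16']

# The padded, signed forms are accepted on input too, and round-trip exactly.
round_trips = all(float(repr(x)) == x for x in (1e-5, 4.85e-6, 1.99e30))
reads_plus_sign = float("1e+05") == 100000.0

print(round_trips, reads_plus_sign)  # True True
```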

@ziotom78
Contributor

If I were using a toml file for a scientific program with floating point config values that I was getting from elsewhere (maybe output from another program), it would be a nuisance to have to edit them and remove the zeros and plus signs from the exponents.

This is indeed the main reason why I stopped using TOML in my scientific codes.

@ChristianSi
Contributor

@uvtc is right, and considering their observations and @ziotom78's corroboration I would plead to re-open this issue and allow leading zeros. Even those who don't find them useful will supposedly find them harmless, and considering that others find them useful – or even essential – that's a clear case in favor of allowing them.

@pradyunsg
Member

@mojombo ^

@mojombo
Member

mojombo commented Aug 21, 2019

Ok, you've all made a compelling argument. I would love to see a PR adding this capability.

@mojombo mojombo reopened this Aug 21, 2019
@eksortso
Contributor

Here goes nothing! I stuck to what I spelled out in my previous comment for any number of leading zeroes, and updated the changelog.

(And trimmed an unnecessary trailing whitespace elsewhere, whoops. Blame my trusty editor.)

@mojombo
Member

mojombo commented Aug 22, 2019

Closed by #656.

@mojombo mojombo closed this as completed Aug 22, 2019
TOML 1.0 automation moved this from Critical Path to Done Aug 22, 2019