[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

anthrotype · 2016-03-18T18:26:00Z

miguelsousa · 2016-03-18T19:05:31Z

Seems like a good change to me, but I'd like to hear from @typesupply.
@anthrotype there are two tests failing. If I understood correctly, those need to be updated, yes?

anthrotype · 2016-03-18T19:06:46Z

ops, sorry yes I'll do that.

anthrotype · 2016-03-18T20:06:38Z

I'm no longer sure this is a good idea... Let's think about it a bit it more.

coveralls · 2016-03-19T16:49:01Z

Coverage increased (+0.008%) to 91.032% when pulling 463d4cc on daltonmaag:repr-float into cf0aeea on unified-font-object:master.

anthrotype · 2016-03-19T16:51:49Z

ok, I just fixed the tests -- and convinced myself again that using repr(float) is a good idea.

coveralls · 2016-03-19T17:09:35Z

Coverage increased (+0.008%) to 91.032% when pulling 82cbf18 on daltonmaag:repr-float into cf0aeea on unified-font-object:master.

typesupply · 2016-03-20T15:41:49Z

Hm. I'm torn on this. The use of repr just to ensure that Py2 and Py3 output an edge case looks a bit like a hack to me. What I mean is, when we're all on Py3 XXX years from now, it's not going to matter and someone updating the validator is going to have to pause to figure out why rep is in there. I'm not objecting, I'm just wondering if the added complexity is worth it.

As far as the 10 digit limitation goes, I did that on purpose. Not every environment is going to have the same level of precision. Besides, we're making fonts on a pretty coarse grid. The difference between 0.0000000001 and 0.00000000005 is extremely small in practice. You can crank it up to 17 if you want, but it's going to make UFO file sizes bigger for a tiny gain in precision.

adrientetar · 2016-03-20T20:41:30Z

^ I thought the same as the second paragraph here, FWIW.

adrientetar · 2016-03-20T20:43:05Z

Now, I don't disagree that the discrepancy between versions is embarrasing, too.

anthrotype · 2016-03-21T10:00:40Z

I may have contributed to create a bit of confusion around this float strings issue. Let me try to clarify it again.

The problem I see in the ufonormalizer has not to do with the difference between python 2 and 3's str vs repr of floats. The xmlConvertFloat currently uses explicit string formatting with a fixed number of decimal digits ("%.10f" % value). This of course always produces identical results in both python 2 and 3.

That was indeed an issue in ufoLib, since the latter was using the str built-in to stringify floats, and on Python2 str only displays 12 significant digits, whereas repr works the same way across Python 2 and 3 (i.e. uses up to 17 significant digits, but prefers the shortest representation among equivalent ones).

The issue I'm trying to solve here, by replacing a fixed number of decimal digits with repr, is that of roundtripping float => str => float. I do agree with you that being fonts designed on a coarse grid the difference between 0.0000000001 and 0.00000000005 is negligible from a design point of view; and that UFO file size would get bigger if we increased the float strings precision.

However, I think it should be left to the user to decide whether and how much to round floats. I believe that there are situations when you don't want to have any rounding, e.g. when you want to ensure that conditions that hold before floats have been serialized to a .glif file will still hold by the time these float strings are converted back to float numbers upon reading the file.

The built-in repr uses the maximum number of "significant" digits (which are not the same thing as the number of digits after the decimal point) that are required to ensure that eval(repr(f)) == f -- while also choosing the shortest representation among the possible ways of representing a binary double-precision floating point as a decimal number.

The UFO spec (rightly) leaves the door open to float numbers of arbitrary precision. The choice of 10 decimal positions, while I agree it's sensible and reasonable, it's still arbitrary nonetheless.
I think it'd be preferable if the ufonormalizer did not do any rounding of floats, and simply ensured that they maintain their original value as they are written out as strings and read back in. Alternatively, the rounding could be made into an option of the ufonormalizer script, if desired.

anthrotype · 2016-04-06T08:32:08Z

I think it'd be preferable if the ufonormalizer did not do any rounding of floats, and simply ensured that they maintain their original value as they are written out as strings and read back in.

I'd like to reiterate my point above, especially considering that a similar PR was merged in the reference ufoLib implementation, so there's now a discrepancy between the way the latter and the ufoNormalizer treat float strings: unified-font-object/ufoLib#20

typesupply · 2016-04-06T13:56:03Z

I'd like to reiterate my point above, especially considering that a similar PR was merged in the reference ufoLib implementation, so there's now a discrepancy between the way the latter and the ufoNormalizer treat float strings: unified-font-object/ufoLib#20

ufoLib is not a normalizer. The normalization guidelines are not requirements. Editors can do whatever they want within the parameters of the UFO specification.

The point of the normalizer is to take output from two environments and ensure that they are consistent to a certain level of precision. Here's an example:

Designer A uses Editor 1 to create a component in glyph "A" with a transformation with an x scale of 0.1.
Designer A commits UFO to a repository.
Designer B pulls the UFO from the repository.
Designer B edits glyph "B" in Editor 2.
Designer B commits the change to the repository.

Unbeknownst to Designers A and B, Editor 2 writes over all GLIF files, regardless of if the represented glyph has changed or not, and the language that Editor 2 is written in outputs floats differently that the language used by Editor 1. So, instead of "0.1" the transformation is now "0.10000000000000001". The designers then have a noisy, pointless diff. Mutliply this by N number of components in N number of fonts and it's a lot of unnecessary noise. Insert normalization between the editors and the repository and these differences are abstracted away.

I completely understand your desire for roundtripping unlimited precision in the abstract. I'm a precision-to-the-max guy myself. However, I'm not sure that the increased precision for some edge cases outweighs the effect of increasing the float length for all UFOs. I know that repr will reduce to only significant digits, but that's still potentially a lot of digits to represent precision that won't ultimately be relevant. I could see the benefit of arbitrary precision being an argument when calling the normalizer, but I don't think it's a good idea to force this onto everyone all the time.

Perhaps precision could be specified like this:

normalizeUFO("path/to/my.ufo", floatPrecision=10) # the default
normalizeUFO("path/to/my.ufo", floatPrecision=1000) # mega precise!
normalizeUFO("path/to/my.ufo", floatPrecision=None) # use repr

This would need to be worked into the guts of the normalizer, but that shouldn't be too hard.

Just to reiterate my position on the overall outlook for the UFO: I'm significantly more concerned about keeping the main UFO workflow working smoothly for daily common use. If an edge case can be handled without causing unnecessary effects in common use cases, that's great. If not, I prefer opt-out arguments than can be used by designers seeking specific handling of unusual situations. To be honest, I'm seeing more and more edge case handling and invalid data ignoring being pushed into the workflow at the potential expense of most users and it's making me very nervous. I'd like to try to make these slopes a lot less slippery rather than more.

anthrotype · 2016-04-06T14:45:23Z

Thank you, Tal. I understand your point. I'll add a floatPrecision option as you suggested and keep the current default of 10 digits.

About the "ufoLib is not a normalizer', I wonder.. wouldn't it be desirable if ufoLib and the ufoNormalizer produced the same output, so that when one uses ufoLib to write a UFO, one would not need to do any normalization?

also, convert scientific-notation floats to fixed-point notation See unified-font-object/ufoLib#19

the default is the same as the current (10 decimal digits). If value is None, then no rounding occurs and repr() is used I had to use a global variable as I couldn't find a nice way to propagate the option down to xmlConvertFloat, as it's used all over the places...

coveralls · 2016-04-06T15:35:49Z

Coverage decreased (-0.04%) to 91.601% when pulling b5798d1 on daltonmaag:repr-float into 58ab51f on unified-font-object:master.

anthrotype · 2016-04-06T15:37:46Z

I just added a floatPrecision=10 option to normalizeUFO, but did not expose it to the command line script. I guess this is fine for now.

typesupply · 2016-04-07T17:47:51Z

This looks good. Thanks!

About the "ufoLib is not a normalizer', I wonder.. wouldn't it be desirable if ufoLib and the ufoNormalizer produced the same output, so that when one uses ufoLib to write a UFO, one would not need to do any normalization?

Ideally, but we don't have control over the entire toolchain that ufoLib is built on. There are probably lots of things deep in the standard library's plistlib that vary from our desired normalization. We should try to keep as much as we can in ufoLib in sync, though.

anthrotype force-pushed the repr-float branch from 463d4cc to 82cbf18 Compare March 19, 2016 17:08

Cosimo Lupo added 3 commits April 6, 2016 15:49

[ufonormalizer] use repr() to stringify float; don't round to 10 digits

7e2393a

also, convert scientific-notation floats to fixed-point notation See unified-font-object/ufoLib#19

[test_ufonormalizer] test xmlConvertFloat with custom 'floatPrecision'

b5798d1

anthrotype force-pushed the repr-float branch from 82cbf18 to b5798d1 Compare April 6, 2016 15:34

typesupply merged commit 1ac5c43 into unified-font-object:master Apr 7, 2016

anthrotype deleted the repr-float branch April 8, 2016 08:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

anthrotype commented Mar 18, 2016

miguelsousa commented Mar 18, 2016

anthrotype commented Mar 18, 2016

anthrotype commented Mar 18, 2016

coveralls commented Mar 19, 2016

anthrotype commented Mar 19, 2016

coveralls commented Mar 19, 2016

typesupply commented Mar 20, 2016

adrientetar commented Mar 20, 2016

adrientetar commented Mar 20, 2016

anthrotype commented Mar 21, 2016

anthrotype commented Apr 6, 2016

typesupply commented Apr 6, 2016

anthrotype commented Apr 6, 2016

coveralls commented Apr 6, 2016

anthrotype commented Apr 6, 2016

typesupply commented Apr 7, 2016

[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

Conversation

anthrotype commented Mar 18, 2016

miguelsousa commented Mar 18, 2016

anthrotype commented Mar 18, 2016

anthrotype commented Mar 18, 2016

coveralls commented Mar 19, 2016

anthrotype commented Mar 19, 2016

coveralls commented Mar 19, 2016

typesupply commented Mar 20, 2016

adrientetar commented Mar 20, 2016

adrientetar commented Mar 20, 2016

anthrotype commented Mar 21, 2016

anthrotype commented Apr 6, 2016

typesupply commented Apr 6, 2016

anthrotype commented Apr 6, 2016

coveralls commented Apr 6, 2016

anthrotype commented Apr 6, 2016

typesupply commented Apr 7, 2016