Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ufonormalizer] use repr() to stringify float; don't round to 10 digits #26

Merged
merged 3 commits into from
Apr 7, 2016

Conversation

anthrotype
Copy link
Member

@miguelsousa
Copy link
Member

Seems like a good change to me, but I'd like to hear from @typesupply.
@anthrotype there are two tests failing. If I understood correctly, those need to be updated, yes?

@anthrotype
Copy link
Member Author

ops, sorry yes I'll do that.

@anthrotype
Copy link
Member Author

I'm no longer sure this is a good idea... Let's think about it a bit it more.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.008%) to 91.032% when pulling 463d4cc on daltonmaag:repr-float into cf0aeea on unified-font-object:master.

@anthrotype
Copy link
Member Author

ok, I just fixed the tests -- and convinced myself again that using repr(float) is a good idea.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.008%) to 91.032% when pulling 82cbf18 on daltonmaag:repr-float into cf0aeea on unified-font-object:master.

@typesupply
Copy link
Contributor

Hm. I'm torn on this. The use of repr just to ensure that Py2 and Py3 output an edge case looks a bit like a hack to me. What I mean is, when we're all on Py3 XXX years from now, it's not going to matter and someone updating the validator is going to have to pause to figure out why rep is in there. I'm not objecting, I'm just wondering if the added complexity is worth it.

As far as the 10 digit limitation goes, I did that on purpose. Not every environment is going to have the same level of precision. Besides, we're making fonts on a pretty coarse grid. The difference between 0.0000000001 and 0.00000000005 is extremely small in practice. You can crank it up to 17 if you want, but it's going to make UFO file sizes bigger for a tiny gain in precision.

@adrientetar
Copy link

^ I thought the same as the second paragraph here, FWIW.

@adrientetar
Copy link

Now, I don't disagree that the discrepancy between versions is embarrasing, too.

@anthrotype
Copy link
Member Author

I may have contributed to create a bit of confusion around this float strings issue. Let me try to clarify it again.

The problem I see in the ufonormalizer has not to do with the difference between python 2 and 3's str vs repr of floats. The xmlConvertFloat currently uses explicit string formatting with a fixed number of decimal digits ("%.10f" % value). This of course always produces identical results in both python 2 and 3.

That was indeed an issue in ufoLib, since the latter was using the str built-in to stringify floats, and on Python2 str only displays 12 significant digits, whereas repr works the same way across Python 2 and 3 (i.e. uses up to 17 significant digits, but prefers the shortest representation among equivalent ones).

The issue I'm trying to solve here, by replacing a fixed number of decimal digits with repr, is that of roundtripping float => str => float. I do agree with you that being fonts designed on a coarse grid the difference between 0.0000000001 and 0.00000000005 is negligible from a design point of view; and that UFO file size would get bigger if we increased the float strings precision.

However, I think it should be left to the user to decide whether and how much to round floats. I believe that there are situations when you don't want to have any rounding, e.g. when you want to ensure that conditions that hold before floats have been serialized to a .glif file will still hold by the time these float strings are converted back to float numbers upon reading the file.

The built-in repr uses the maximum number of "significant" digits (which are not the same thing as the number of digits after the decimal point) that are required to ensure that eval(repr(f)) == f -- while also choosing the shortest representation among the possible ways of representing a binary double-precision floating point as a decimal number.

The UFO spec (rightly) leaves the door open to float numbers of arbitrary precision. The choice of 10 decimal positions, while I agree it's sensible and reasonable, it's still arbitrary nonetheless.
I think it'd be preferable if the ufonormalizer did not do any rounding of floats, and simply ensured that they maintain their original value as they are written out as strings and read back in. Alternatively, the rounding could be made into an option of the ufonormalizer script, if desired.

@anthrotype
Copy link
Member Author

I think it'd be preferable if the ufonormalizer did not do any rounding of floats, and simply ensured that they maintain their original value as they are written out as strings and read back in.

I'd like to reiterate my point above, especially considering that a similar PR was merged in the reference ufoLib implementation, so there's now a discrepancy between the way the latter and the ufoNormalizer treat float strings: unified-font-object/ufoLib#20

@typesupply
Copy link
Contributor

I'd like to reiterate my point above, especially considering that a similar PR was merged in the reference ufoLib implementation, so there's now a discrepancy between the way the latter and the ufoNormalizer treat float strings: unified-font-object/ufoLib#20

ufoLib is not a normalizer. The normalization guidelines are not requirements. Editors can do whatever they want within the parameters of the UFO specification.

The point of the normalizer is to take output from two environments and ensure that they are consistent to a certain level of precision. Here's an example:

  • Designer A uses Editor 1 to create a component in glyph "A" with a transformation with an x scale of 0.1.
  • Designer A commits UFO to a repository.
  • Designer B pulls the UFO from the repository.
  • Designer B edits glyph "B" in Editor 2.
  • Designer B commits the change to the repository.

Unbeknownst to Designers A and B, Editor 2 writes over all GLIF files, regardless of if the represented glyph has changed or not, and the language that Editor 2 is written in outputs floats differently that the language used by Editor 1. So, instead of "0.1" the transformation is now "0.10000000000000001". The designers then have a noisy, pointless diff. Mutliply this by N number of components in N number of fonts and it's a lot of unnecessary noise. Insert normalization between the editors and the repository and these differences are abstracted away.

I completely understand your desire for roundtripping unlimited precision in the abstract. I'm a precision-to-the-max guy myself. However, I'm not sure that the increased precision for some edge cases outweighs the effect of increasing the float length for all UFOs. I know that repr will reduce to only significant digits, but that's still potentially a lot of digits to represent precision that won't ultimately be relevant. I could see the benefit of arbitrary precision being an argument when calling the normalizer, but I don't think it's a good idea to force this onto everyone all the time.

Perhaps precision could be specified like this:

normalizeUFO("path/to/my.ufo", floatPrecision=10) # the default
normalizeUFO("path/to/my.ufo", floatPrecision=1000) # mega precise!
normalizeUFO("path/to/my.ufo", floatPrecision=None) # use repr

This would need to be worked into the guts of the normalizer, but that shouldn't be too hard.

Just to reiterate my position on the overall outlook for the UFO: I'm significantly more concerned about keeping the main UFO workflow working smoothly for daily common use. If an edge case can be handled without causing unnecessary effects in common use cases, that's great. If not, I prefer opt-out arguments than can be used by designers seeking specific handling of unusual situations. To be honest, I'm seeing more and more edge case handling and invalid data ignoring being pushed into the workflow at the potential expense of most users and it's making me very nervous. I'd like to try to make these slopes a lot less slippery rather than more.

@anthrotype
Copy link
Member Author

Thank you, Tal. I understand your point. I'll add a floatPrecision option as you suggested and keep the current default of 10 digits.

About the "ufoLib is not a normalizer', I wonder.. wouldn't it be desirable if ufoLib and the ufoNormalizer produced the same output, so that when one uses ufoLib to write a UFO, one would not need to do any normalization?

Cosimo Lupo added 3 commits April 6, 2016 15:49
also, convert scientific-notation floats to fixed-point notation

See unified-font-object/ufoLib#19
the default is the same as the current (10 decimal digits).
If value is None, then no rounding occurs and repr() is used

I had to use a global variable as I couldn't find a nice way to propagate
the option down to xmlConvertFloat, as it's used all over the places...
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.04%) to 91.601% when pulling b5798d1 on daltonmaag:repr-float into 58ab51f on unified-font-object:master.

@anthrotype
Copy link
Member Author

I just added a floatPrecision=10 option to normalizeUFO, but did not expose it to the command line script. I guess this is fine for now.

@typesupply
Copy link
Contributor

This looks good. Thanks!

About the "ufoLib is not a normalizer', I wonder.. wouldn't it be desirable if ufoLib and the ufoNormalizer produced the same output, so that when one uses ufoLib to write a UFO, one would not need to do any normalization?

Ideally, but we don't have control over the entire toolchain that ufoLib is built on. There are probably lots of things deep in the standard library's plistlib that vary from our desired normalization. We should try to keep as much as we can in ufoLib in sync, though.

@typesupply typesupply merged commit 1ac5c43 into unified-font-object:master Apr 7, 2016
@anthrotype anthrotype deleted the repr-float branch April 8, 2016 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants