Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document recommendation for normalization #144

Open
benkiel opened this issue Aug 26, 2020 · 21 comments
Open

Document recommendation for normalization #144

benkiel opened this issue Aug 26, 2020 · 21 comments
Assignees
Labels
enhancement infrastructure Administrative and website issues. ufo3 UFO 3 issues.

Comments

@benkiel
Copy link
Contributor

benkiel commented Aug 26, 2020

As discussed, add a recommendation (not requirement) for XML formatting to the specification.

@benkiel benkiel added enhancement ufo3 UFO 3 issues. infrastructure Administrative and website issues. labels Aug 26, 2020
@typesupply
Copy link
Contributor

Did anyone volunteer to document what ufoNormalizer does?

@benkiel
Copy link
Contributor Author

benkiel commented Aug 31, 2020

Not to my knowledge. I'll take it on in a couple of weeks if no one else wants to.

@typesupply
Copy link
Contributor

I took a quick look at ufoNormalizer and I sort of remember how it works, so I'll take this on.

@typesupply typesupply self-assigned this Aug 31, 2020
@madig
Copy link
Contributor

madig commented Dec 14, 2020

Random finding: ufoLib and ufoNormalizer both indent the first <dict> element, macOS plist files don't.

Another: ufonormalizer likes the form <array/> instead of <array />.

Also: ufonormalizer does not contract <array></array> into <array/>.

@xorgy
Copy link

xorgy commented Apr 4, 2021

The indentation thing alone is worth a lot. A formatting recommendation would save us all from 90% whitespace diffs replacing almost every line in your font. In the wild I saw this today: https://github.com/eliheuer/baby-shark/pull/3/files

@belluzj
Copy link

belluzj commented Apr 12, 2021

There's this discussion going on over at the norad project (UFO library in Rust) linebender/norad#101

Having an official recommendation would help settle this sort of question.

@xorgy
Copy link

xorgy commented Apr 12, 2021

My 0.02USD: it's better for the recommendation to reflect the norms for authoring tools, than the norms for normalizers. plists in the wild have two-space indents starting at the second level of nesting; and the norm for XML formatting is two-space indents starting at the first level of nesting (default for libxml2 and probably some others). Far as I can tell, most mature authoring tools respect these norms.

@benkiel
Copy link
Contributor Author

benkiel commented Apr 12, 2021

Ignoring the last six years of the usage of ufonormalizer in workflows isn't the best idea either. A recommendation that breaks reasonable expectations for normalization based on history isn't great.

I looked into why tabs were chosen over spaces for the ufonormalizer: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.

Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.

@typesupply
Copy link
Contributor

typesupply commented Apr 12, 2021

I looked into why tabs were chosen over spaces for the ufonormalizer: it was like that from the start six years ago, I'm not sure why that choice was made, but if I had to venture a guess it would be that ElementTree at that point used tabs and not spaces by default.

I made that decision and it was because… I don't know. Maybe it was ElementTree, maybe I was thinking about file size, maybe I had some other reason. It doesn't matter. It's the decision that was made and it is what is out there in real world use.

Personal opinion: it would be a great waste to spend effort on a recommendation if it is going to devolve into a tabs vs spaces war.

Yes x 100000000000000000000.

@belluzj
Copy link

belluzj commented Apr 12, 2021

I think that's why we need a recommendation, to avoid the space vs tabs war (or more accurately: to have it once and for all). As to which is the correct choice now, I would also vote for whatever ufonormalizer does because at Dalton Maag we're normalizing everything with ufonormalizer. However, if the community finds that spaces are better for some reason, then we'll do spaces, as long as all the tools align with the new decision.

@xorgy
Copy link

xorgy commented Apr 12, 2021

FDBP https://silnrsi.github.io/FDBP/en-US/UFO.html, for example, still recommends the psfnormalize format, which uses the default formats for plists and XML that I mentioned in my comment above.

@benkiel
Copy link
Contributor Author

benkiel commented Apr 12, 2021

Yes, as far as I can tell, psfnormalize came a bit after ufonormalizer. They have different ways of normalizing, neither is bad or wrong, just different. A recommendation is just that, a recommendation. A project is free to choose a normalizer that isn't the recommendation.

The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).

There can't be a recommendation that makes 100% of everyone happy, and that is ok.

@moyogo
Copy link
Collaborator

moyogo commented Apr 13, 2021

The UFO isn't going to ever mandate a certain style, as sometimes there are limits placed by different packages, programming languages, etc. on what is possible for XML formatting (again, we faced this when adding lxml as an option for ufoLib).

Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.

@xorgy
Copy link

xorgy commented Apr 13, 2021

A project is free to choose a normalizer that isn't the recommendation.

The ideal would be having normalized output coming directly out of tools. That seems to be one goal of Norad. Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.

@typesupply
Copy link
Contributor

The ideal would be having normalized output coming directly out of tools.

Not all XML writers allow specifying the details of the output. We can write a recommendation, but we can't make it a requirement. As far as writing that recommendation goes, UFO is a self-funded project and we implement things when our time allows and/or our needs overlap. If someone wants to volunteer to write a recommendation as we've detailed, we'll be more than happy to review a PR.

Running external normalizers is not that inconvenient, and if they're installed on the target machine, you can include them in repository hooks... but it's not what happens by default when you save a UFO.

That's a workflow problem, right? If Project X is putting UFOs in a repository and they need the UFOs to share the same format to minimize their diffs, Project X should specify that at the project level and all contributors to Project X are required to use the same format. That's not the file format spec's problem to solve.

It also should be noted that noisy diffs when committing to a Git repository is not an issue faced by all UFO users.

@xorgy
Copy link

xorgy commented Apr 13, 2021

Note that lxml can use XSLT 1.0 to format XML whichever way one wants, it's just painful.

I wonder how tough it would be to control the whitespace in the XSLT, along with the ordering. If it works well, it only has to be painful once.

@benkiel
Copy link
Contributor Author

benkiel commented Apr 13, 2021

If anyone wants to take on making a XSLT to format the XML, we would love that.

@belluzj
Copy link

belluzj commented May 31, 2021

We open-sourced the XSLT that we've been using at Dalton Maag, here: https://github.com/daltonmaag/ufo-normalizer-xslt

It produces the same output as ufoNormalizer, at least when it comes to tabs and spaces and trivial things like that.

The main developer was @moyogo when he was working at Dalton Maag, credits to him!

@benkiel
Copy link
Contributor Author

benkiel commented Jun 3, 2021

@belluzj Thank you for doing this, will look into it in the coming weeks

@chrissimpkins
Copy link
Contributor

While at it, single vs. double quotes would be nice to resolve too. :)

@madig
Copy link
Contributor

madig commented Jul 23, 2021

I think the single quotes are an unfortunate historical accident in lxml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement infrastructure Administrative and website issues. ufo3 UFO 3 issues.
Projects
None yet
Development

No branches or pull requests

7 participants