Preserve empty strings #48

madig · 2018-05-17T21:40:58Z

Currently,

	<lib>
		<dict>
			<key>abc</key>
			<string></string>
		</dict>
	</lib>

is turned into

	<lib>
		<dict>
			<key>abc</key>
		</dict>
	</lib>

Add a test first.

anthrotype · 2018-05-17T22:01:49Z

Add a test first

🥇

madig · 2018-05-17T22:13:04Z

Hm. Maybe return xmlEscapeText(element.text) or "" instead?

anthrotype · 2018-05-17T22:16:00Z

src/ufonormalizer.py

@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element):
            else:
                obj[key] = _convertPlistElementToObject(subElement)
    elif tag == "string":
+        if not element.text:
+            return ""
        return xmlEscapeText(element.text)


Or maybe you could fix it in xmlEscapeText directly, ensuring that it never returns None but always at least an empty string. Looking at the other places that function is used, the expectation seems to be that it takes a string and returns a modified string, not None.

A case for mypy!

Hm. Making xmlEscapeText always return a string makes another testcase fail

	def xmlEscapeText(text):
	    # type: (Optional[str]) -> str
	    if text:
	        text = text.replace("&", "&amp;")
	        text = text.replace("<", "&lt;")
	        text = text.replace(">", "&gt;")
	        return text
	    return ""

	    writer = XMLWriter(declaration=None)
	    writer.propertyListObject({None: ""})
	    self.assertEqual(
	        writer.getText(),
	>       '<dict>\n\t<key/>\n\t<string></string>\n</dict>')
	E   AssertionError: '<dict>\n\t<key></key>\n\t<string></string>\n</dict>' != '<dict>\n\t<key/>\n\t<string></string>\n</dict>'
	E     <dict>
	E   -     <key></key>
	E   ?         -- ---
	E   +     <key/>
	E         <string></string>
	E     </dict>

Relevant?

anthrotype · 2018-05-17T22:21:12Z

Maybe return xmlEscapeText(element.text) or ""

It depends on where you want to handle the case of element.text being None.
If that function's I/O is only meant to be strings, then it'd be the responsibility of the caller to do return xmlEscapeText(element.text or "")
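
The two options discussed here can be sketched side by side. This is a minimal illustration and not the ufonormalizer code: `escape_strict` and `escape_lenient` are hypothetical names, and the escaping is assumed to be the same three replacements that xmlEscapeText performs.

```python
from typing import Optional


def escape_strict(text: str) -> str:
    """Hypothetical str-in, str-out escaper; the caller must handle None."""
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


def escape_lenient(text: Optional[str]) -> str:
    """Hypothetical escaper that absorbs None itself and always returns a str."""
    return escape_strict(text or "")


# Stand-in for element.text of an empty <string> element.
element_text = None

# Option 1: the caller normalizes None before calling the strict escaper.
caller_side = escape_strict(element_text or "")

# Option 2: the escaper normalizes None internally.
callee_side = escape_lenient(element_text)
```

Both variants yield an empty string for an empty element. The difference is only in which function's contract mentions None, which matters for other callers that rely on None passing through unchanged.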

madig · 2018-05-17T22:37:41Z

Hm, makes sense. My current fix can stand then?

madig · 2018-05-22T09:48:29Z

Anything missing or can this be merged?

anthrotype

lgtm

anthrotype · 2018-05-22T13:13:55Z

src/ufonormalizer.py

@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element):
            else:
                obj[key] = _convertPlistElementToObject(subElement)
    elif tag == "string":
+        if not element.text:


actually, wait... shouldn't you check

	if element.text == "":
	    return ""

instead of if not element.text?

The latter will be True also when element.text == None.
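
To make the difference between the two checks concrete, here is a small sketch that evaluates both conditions for the three relevant values. It uses plain Python values as stand-ins for element.text and does not touch any XML library.

```python
# Stand-ins for the possible values of element.text.
candidates = [None, "", "abc"]

# `not text` is True for both None and the empty string.
falsy_check = {repr(text): (not text) for text in candidates}

# `text == ""` is True only for the empty string.
equality_check = {repr(text): (text == "") for text in candidates}

for text in candidates:
    print(repr(text), falsy_check[repr(text)], equality_check[repr(text)])
```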

I just checked what lxml does (to have a reference), and when serializing it seems to distinguish between an element whose text is None and one whose text is an empty string:

	In [1]: from lxml import etree
	In [2]: e = etree.Element("a")
	In [3]: e.text
	In [4]: etree.tostring(e)
	Out[4]: b'<a/>'
	In [5]: e.text = ""
	In [6]: etree.tostring(e)
	Out[6]: b'<a></a>'

In this case, element.text == None.

Reading the file

	<?xml version="1.0" encoding="UTF-8"?>
	<glyph name="A" format="2">
		<advance width="753.0"/>
		<unicode hex="0041"/>
		<anchor x="377.0" y="0.0" name="bottom"/>
		<anchor x="678.0" y="10.0" name="ogonek"/>
		<anchor x="377.0" y="700.0" name="top"/>
		<outline>
			<contour>
				<point x="733.0" y="0.0" type="line"/>
				<point x="555.0" y="700.0" type="line"/>
				<point x="205.0" y="700.0" type="line"/>
				<point x="20.0" y="0.0" type="line"/>
				<point x="253.0" y="0.0" type="line"/>
				<point x="356.0" y="470.0" type="line"/>
				<point x="385.0" y="470.0" type="line"/>
				<point x="491.0" y="0.0" type="line"/>
			</contour>
			<contour>
				<point x="600.0" y="268.0" type="line"/>
				<point x="162.0" y="268.0" type="line"/>
				<point x="154.0" y="103.0" type="line"/>
				<point x="596.0" y="103.0" type="line"/>
			</contour>
		</outline>
		<lib>
			<dict>
				<key>com.schriftgestaltung.Glyphs.category</key>
				<string></string>
				<key>com.schriftgestaltung.Glyphs.lastChange</key>
				<string>2017/07/17 13:57:06</string>
				<key>com.schriftgestaltung.Glyphs.script</key>
				<string></string>
				<key>com.schriftgestaltung.Glyphs.subCategory</key>
				<string></string>
			</dict>
		</lib>
	</glyph>

with lxml.etree and writing it back out yields:

	In [3]: e=etree.parse("..\..\Downloads\GlyphsUnitTestSans-Bold.ufo\glyphs\A_.glif")
	In [10]: print(etree.tostring(e.getroot()).decode())
	<glyph name="A" format="2">
		<advance width="753.0"/>
		<unicode hex="0041"/>
		<anchor x="377.0" y="0.0" name="bottom"/>
		<anchor x="678.0" y="10.0" name="ogonek"/>
		<anchor x="377.0" y="700.0" name="top"/>
		<outline>
			<contour>
				<point x="733.0" y="0.0" type="line"/>
				<point x="555.0" y="700.0" type="line"/>
				<point x="205.0" y="700.0" type="line"/>
				<point x="20.0" y="0.0" type="line"/>
				<point x="253.0" y="0.0" type="line"/>
				<point x="356.0" y="470.0" type="line"/>
				<point x="385.0" y="470.0" type="line"/>
				<point x="491.0" y="0.0" type="line"/>
			</contour>
			<contour>
				<point x="600.0" y="268.0" type="line"/>
				<point x="162.0" y="268.0" type="line"/>
				<point x="154.0" y="103.0" type="line"/>
				<point x="596.0" y="103.0" type="line"/>
			</contour>
		</outline>
		<lib>
			<dict>
				<key>com.schriftgestaltung.Glyphs.category</key>
				<string/>
				<key>com.schriftgestaltung.Glyphs.lastChange</key>
				<string>2017/07/17 13:57:06</string>
				<key>com.schriftgestaltung.Glyphs.script</key>
				<string/>
				<key>com.schriftgestaltung.Glyphs.subCategory</key>
				<string/>
			</dict>
		</lib>
	</glyph>

Which seems to match your finding. Hm.

	if element.text == "":
	    return ""

Wouldn't change anything because

	In [12]: ufonormalizer.xmlEscapeText("")
	Out[12]: ''

OK, it appears that neither ElementTree nor lxml.etree distinguishes between these two cases when parsing; both forms are always parsed as Element.text == None. Compare:

	>>> import xml.etree.ElementTree as ET
	>>> assert ET.fromstring('<a></a>').text is None
	>>> assert ET.fromstring('<a/>').text is None
	>>> from lxml import etree
	>>> assert etree.fromstring('<a/>').text is None
	>>> assert etree.fromstring('<a></a>').text is None

In ufonormalizer's XMLWriter, however, a value of None means the element is dropped completely, which produces an invalid plist, as Nikolaus noted.

So I think we are fine if we treat an element's text being None as an empty string, so we can write it back, like this PR is doing.
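
As a rough sketch of that conclusion, the snippet below reads a flat plist-style dict with the standard library's ElementTree and maps a missing text to an empty string. The helper names `string_value` and `dict_to_object` are hypothetical and much simpler than ufonormalizer's `_convertPlistElementToObject`; escaping is left out on purpose, and only string values are assumed.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    # Both <string></string> and <string/> parse with text None,
    # so None is mapped to "" instead of being passed on.
    return element.text or ""


def dict_to_object(dict_element):
    """Hypothetical reader for a flat <dict> whose values are all <string>."""
    children = list(dict_element)
    obj = {}
    # Plist dicts alternate <key> elements and value elements.
    for key_element, value_element in zip(children[0::2], children[1::2]):
        obj[key_element.text] = string_value(value_element)
    return obj


source = "<dict><key>abc</key><string></string><key>def</key><string/></dict>"
parsed = dict_to_object(ET.fromstring(source))
print(parsed)
```

With the value kept as an empty string rather than None, a writer that drops None values has something to write back, so the key keeps its value element.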

madig added 2 commits May 17, 2018 22:38

Add test data for preserving empty strings

9fb216f

Add empty string to test_normalizeGLIF_lib_defined data

a4487e4

Return empty string instead of None if lib key value is empty string.

bdd4cf7

madig changed the title ~~WIP: Preserve empty strings~~ Preserve empty strings May 17, 2018

anthrotype reviewed May 17, 2018


anthrotype approved these changes May 22, 2018


anthrotype reviewed May 22, 2018


anthrotype merged commit dd81e46 into unified-font-object:master May 22, 2018

madig mentioned this pull request May 22, 2018

lib value can be an empty string #45

Closed
