Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve empty strings #48

Merged
merged 3 commits into from
May 22, 2018
Merged

Preserve empty strings #48

merged 3 commits into from
May 22, 2018

Conversation

madig
Copy link
Contributor

@madig madig commented May 17, 2018

Currently,

	<lib>
		<dict>
			<key>abc</key>
			<string></string>
		</dict>
	</lib>

is turned into

	<lib>
		<dict>
			<key>abc</key>
		</dict>
	</lib>

Add a test first.

@anthrotype
Copy link
Member

Add a test first

🥇

@madig madig changed the title WIP: Preserve empty strings Preserve empty strings May 17, 2018
@madig
Copy link
Contributor Author

madig commented May 17, 2018

Hm. Maybe return xmlEscapeText(element.text) or "" instead?

@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element):
else:
obj[key] = _convertPlistElementToObject(subElement)
elif tag == "string":
if not element.text:
return ""
return xmlEscapeText(element.text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe you could fix it in xmlEscapeText directly, ensuring that the latter never returns None, but always at least an empty string. Looking at the other places that function is used, it seems like the expectation is that it takes a string and returns some other modified string, not None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A case for mypy!

Copy link
Contributor Author

@madig madig May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Making xmlEscapeText always return a string makes another testcase fail

def xmlEscapeText(text):
    # type: (Optional[str]) -> str
    if text:
        text = text.replace("&", "&amp;")
        text = text.replace("<", "&lt;")
        text = text.replace(">", "&gt;")
        return text

    return ""
        writer = XMLWriter(declaration=None)
        writer.propertyListObject({None: ""})
        self.assertEqual(
            writer.getText(),
>           '<dict>\n\t<key/>\n\t<string></string>\n</dict>')
E       AssertionError: '<dict>\n\t<key></key>\n\t<string></string>\n</dict>' != '<dict>\n\t<key/>\n\t<string></string>\n</dict>'
E         <dict>
E       - 	<key></key>
E       ? 	    -- ---
E       + 	<key/>
E         	<string></string>
E         </dict>

Relevant?

@anthrotype
Copy link
Member

Maybe return xmlEscapeText(element.text) or ""

it depends where you want handle the case of element.text being None.
If that function's I/O is only meant to be strings, then it'd be the responsibility of the caller to do return xmlEscapeText(element.text or "")

@madig
Copy link
Contributor Author

madig commented May 17, 2018

Hm, makes sense. My current fix can stand then?

@madig
Copy link
Contributor Author

madig commented May 22, 2018

Anything missing or can this be merged?

Copy link
Member

@anthrotype anthrotype left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element):
else:
obj[key] = _convertPlistElementToObject(subElement)
elif tag == "string":
if not element.text:
Copy link
Member

@anthrotype anthrotype May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, wait... shouldn't you check

if element.txt == "":
    return ""

instead of if not element.text?

The latter will be True also when element.text == None.

I just checked what lmxl does (to have a reference) and it seems to distinguish between an element that has text None or one that has an empty text string:

In [1]: from lxml import etree

In [2]: e = etree.Element("a")

In [3]: e.text

In [4]: etree.tostring(e)
Out[4]: b'<a/>'

In [5]: e.text = ""

In [6]: etree.tostring(e)
Out[6]: b'<a></a>'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, element.text == None.

Copy link
Contributor Author

@madig madig May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the file

<?xml version="1.0" encoding="UTF-8"?>
<glyph name="A" format="2">
  <advance width="753.0"/>
  <unicode hex="0041"/>
  <anchor x="377.0" y="0.0" name="bottom"/>
  <anchor x="678.0" y="10.0" name="ogonek"/>
  <anchor x="377.0" y="700.0" name="top"/>
  <outline>
    <contour>
      <point x="733.0" y="0.0" type="line"/>
      <point x="555.0" y="700.0" type="line"/>
      <point x="205.0" y="700.0" type="line"/>
      <point x="20.0" y="0.0" type="line"/>
      <point x="253.0" y="0.0" type="line"/>
      <point x="356.0" y="470.0" type="line"/>
      <point x="385.0" y="470.0" type="line"/>
      <point x="491.0" y="0.0" type="line"/>
    </contour>
    <contour>
      <point x="600.0" y="268.0" type="line"/>
      <point x="162.0" y="268.0" type="line"/>
      <point x="154.0" y="103.0" type="line"/>
      <point x="596.0" y="103.0" type="line"/>
    </contour>
  </outline>
  <lib>
    <dict>
      <key>com.schriftgestaltung.Glyphs.category</key>
      <string></string>
      <key>com.schriftgestaltung.Glyphs.lastChange</key>
      <string>2017/07/17 13:57:06</string>
      <key>com.schriftgestaltung.Glyphs.script</key>
      <string></string>
      <key>com.schriftgestaltung.Glyphs.subCategory</key>
      <string></string>
    </dict>
  </lib>
</glyph>

with lxml.etree and writing it back out yields:

In [3]: e=etree.parse("..\..\Downloads\GlyphsUnitTestSans-Bold.ufo\glyphs\A_.glif")
In [10]: print(etree.tostring(e.getroot()).decode())
<glyph name="A" format="2">
  <advance width="753.0"/>
  <unicode hex="0041"/>
  <anchor x="377.0" y="0.0" name="bottom"/>
  <anchor x="678.0" y="10.0" name="ogonek"/>
  <anchor x="377.0" y="700.0" name="top"/>
  <outline>
    <contour>
      <point x="733.0" y="0.0" type="line"/>
      <point x="555.0" y="700.0" type="line"/>
      <point x="205.0" y="700.0" type="line"/>
      <point x="20.0" y="0.0" type="line"/>
      <point x="253.0" y="0.0" type="line"/>
      <point x="356.0" y="470.0" type="line"/>
      <point x="385.0" y="470.0" type="line"/>
      <point x="491.0" y="0.0" type="line"/>
    </contour>
    <contour>
      <point x="600.0" y="268.0" type="line"/>
      <point x="162.0" y="268.0" type="line"/>
      <point x="154.0" y="103.0" type="line"/>
      <point x="596.0" y="103.0" type="line"/>
    </contour>
  </outline>
  <lib>
    <dict>
      <key>com.schriftgestaltung.Glyphs.category</key>
      <string/>
      <key>com.schriftgestaltung.Glyphs.lastChange</key>
      <string>2017/07/17 13:57:06</string>
      <key>com.schriftgestaltung.Glyphs.script</key>
      <string/>
      <key>com.schriftgestaltung.Glyphs.subCategory</key>
      <string/>
    </dict>
  </lib>
</glyph>

Which seems to match your finding. Hm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if element.txt == "":
    return ""

Wouldn't change anything because

In [12]: ufonormalizer.xmlEscapeText("")
Out[12]: ''

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it appeas that neither ElementTree nor lxml.etree distinguish between these two cases when parsing, they are always parsed as Element.text == None. Compare:

>>> import xml.etree.ElementTree as ET
>>> assert ET.fromstring('<a></a>').text is None
>>> assert ET.fromstring('<a/>').text is None
>>> from lxml import etree
>>> assert etree.fromstring('<a/>').text is None
>>> assert etree.fromstring('<a></a>').text is None

In the case of ufonormalizer's XMLWriter, having a value being None however, means completely dropping the element, which produces an invalid plist, as Nikolaus noted.

So I think we are fine if we treat an element's text being None as an empty string, so we can write it back, like this PR is doing.

@anthrotype anthrotype merged commit dd81e46 into unified-font-object:master May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants