-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve empty strings #48
Preserve empty strings #48
Conversation
🥇 |
Hm. Maybe |
@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element): | |||
else: | |||
obj[key] = _convertPlistElementToObject(subElement) | |||
elif tag == "string": | |||
if not element.text: | |||
return "" | |||
return xmlEscapeText(element.text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe you could fix it in xmlEscapeText
directly, ensuring that the latter never returns None, but always at least an empty string. Looking at the other places that function is used, it seems like the expectation is that it takes a string and returns some other modified string, not None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case for mypy!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Making xmlEscapeText always return a string makes another testcase fail
def xmlEscapeText(text):
# type: (Optional[str]) -> str
if text:
text = text.replace("&", "&")
text = text.replace("<", "<")
text = text.replace(">", ">")
return text
return ""
writer = XMLWriter(declaration=None)
writer.propertyListObject({None: ""})
self.assertEqual(
writer.getText(),
> '<dict>\n\t<key/>\n\t<string></string>\n</dict>')
E AssertionError: '<dict>\n\t<key></key>\n\t<string></string>\n</dict>' != '<dict>\n\t<key/>\n\t<string></string>\n</dict>'
E <dict>
E - <key></key>
E ? -- ---
E + <key/>
E <string></string>
E </dict>
Relevant?
it depends where you want handle the case of element.text being None. |
Hm, makes sense. My current fix can stand then? |
Anything missing or can this be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -1090,6 +1090,8 @@ def _convertPlistElementToObject(element): | |||
else: | |||
obj[key] = _convertPlistElementToObject(subElement) | |||
elif tag == "string": | |||
if not element.text: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, wait... shouldn't you check
if element.txt == "":
return ""
instead of if not element.text
?
The latter will be True also when element.text == None
.
I just checked what lmxl does (to have a reference) and it seems to distinguish between an element that has text None
or one that has an empty text string:
In [1]: from lxml import etree
In [2]: e = etree.Element("a")
In [3]: e.text
In [4]: etree.tostring(e)
Out[4]: b'<a/>'
In [5]: e.text = ""
In [6]: etree.tostring(e)
Out[6]: b'<a></a>'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, element.text == None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the file
<?xml version="1.0" encoding="UTF-8"?>
<glyph name="A" format="2">
<advance width="753.0"/>
<unicode hex="0041"/>
<anchor x="377.0" y="0.0" name="bottom"/>
<anchor x="678.0" y="10.0" name="ogonek"/>
<anchor x="377.0" y="700.0" name="top"/>
<outline>
<contour>
<point x="733.0" y="0.0" type="line"/>
<point x="555.0" y="700.0" type="line"/>
<point x="205.0" y="700.0" type="line"/>
<point x="20.0" y="0.0" type="line"/>
<point x="253.0" y="0.0" type="line"/>
<point x="356.0" y="470.0" type="line"/>
<point x="385.0" y="470.0" type="line"/>
<point x="491.0" y="0.0" type="line"/>
</contour>
<contour>
<point x="600.0" y="268.0" type="line"/>
<point x="162.0" y="268.0" type="line"/>
<point x="154.0" y="103.0" type="line"/>
<point x="596.0" y="103.0" type="line"/>
</contour>
</outline>
<lib>
<dict>
<key>com.schriftgestaltung.Glyphs.category</key>
<string></string>
<key>com.schriftgestaltung.Glyphs.lastChange</key>
<string>2017/07/17 13:57:06</string>
<key>com.schriftgestaltung.Glyphs.script</key>
<string></string>
<key>com.schriftgestaltung.Glyphs.subCategory</key>
<string></string>
</dict>
</lib>
</glyph>
with lxml.etree
and writing it back out yields:
In [3]: e=etree.parse("..\..\Downloads\GlyphsUnitTestSans-Bold.ufo\glyphs\A_.glif")
In [10]: print(etree.tostring(e.getroot()).decode())
<glyph name="A" format="2">
<advance width="753.0"/>
<unicode hex="0041"/>
<anchor x="377.0" y="0.0" name="bottom"/>
<anchor x="678.0" y="10.0" name="ogonek"/>
<anchor x="377.0" y="700.0" name="top"/>
<outline>
<contour>
<point x="733.0" y="0.0" type="line"/>
<point x="555.0" y="700.0" type="line"/>
<point x="205.0" y="700.0" type="line"/>
<point x="20.0" y="0.0" type="line"/>
<point x="253.0" y="0.0" type="line"/>
<point x="356.0" y="470.0" type="line"/>
<point x="385.0" y="470.0" type="line"/>
<point x="491.0" y="0.0" type="line"/>
</contour>
<contour>
<point x="600.0" y="268.0" type="line"/>
<point x="162.0" y="268.0" type="line"/>
<point x="154.0" y="103.0" type="line"/>
<point x="596.0" y="103.0" type="line"/>
</contour>
</outline>
<lib>
<dict>
<key>com.schriftgestaltung.Glyphs.category</key>
<string/>
<key>com.schriftgestaltung.Glyphs.lastChange</key>
<string>2017/07/17 13:57:06</string>
<key>com.schriftgestaltung.Glyphs.script</key>
<string/>
<key>com.schriftgestaltung.Glyphs.subCategory</key>
<string/>
</dict>
</lib>
</glyph>
Which seems to match your finding. Hm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if element.txt == "":
return ""
Wouldn't change anything because
In [12]: ufonormalizer.xmlEscapeText("")
Out[12]: ''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it appeas that neither ElementTree nor lxml.etree distinguish between these two cases when parsing, they are always parsed as Element.text == None
. Compare:
>>> import xml.etree.ElementTree as ET
>>> assert ET.fromstring('<a></a>').text is None
>>> assert ET.fromstring('<a/>').text is None
>>> from lxml import etree
>>> assert etree.fromstring('<a/>').text is None
>>> assert etree.fromstring('<a></a>').text is None
In the case of ufonormalizer's XMLWriter, having a value being None however, means completely dropping the element, which produces an invalid plist, as Nikolaus noted.
So I think we are fine if we treat an element's text being None as an empty string, so we can write it back, like this PR is doing.
Currently,
is turned into
Add a test first.