Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempting to prevent multiple indentions of notes #72

Merged
merged 6 commits into from
Aug 31, 2020

Conversation

colinmford
Copy link
Contributor

@colinmford colinmford commented Jul 8, 2020

This should fix #71 if approved.

It uses regex to remove the first tab in each line of a multi-paragraph text block.

Test added to make sure

<note>
	Line1
	Line2
	Line3
</note>

does not change when it goes through _normalizeGlifNote into something like

<note>
	Line1
		Line2
		Line3
</note>

etc.

@colinmford
Copy link
Contributor Author

Update:

Instead of using regex to remove the first tab, instead the text is split on this regex:

r"[\n|\r|\r\n]\t*"

Should match cross-platform newlines, plus a tab character if it's present. Results in a nice clean list:

NEWLINE_RE = re.compile(r"[\n|\r|\r\n]\t*")
text = "List1\n\tList2\n\tList3"
NEWLINE_RE.split(text)
>>> ("List1", "List2", "List3")

It replaces a .splitlines() function, however, which technically matches all of these below. I thought it was unlikely, but if we need to support any or all of the additional characters we could add them to the regex.

Representation Description
\n Line Feed
\r Carriage Return
\r\n Carriage Return + Line Feed
\v or \x0b Line Tabulation
\f or \x0c Form Feed
\x1c File Separator
\x1d Group Separator
\x1e Record Separator
\x85 Next Line (C1 Control Code)
\u2028 Line Separator
\u2029 Paragraph Separator

@colinmford
Copy link
Contributor Author

colinmford commented Jul 8, 2020

Update based on @alerque suggestion:

Removes Regex and uses textwrap.dedent() to fix the multiple indention issue, but breaks this test

element = ET.fromstring(
"<note> Line1 \t\n\n Line3\t </note>")
writer = XMLWriter(declaration=None)
_normalizeGlifNote(element, writer)
self.assertEqual(
writer.getText(),
"<note>\n\tLine1\n\t\n\t Line3\n</note>")

@colinmford
Copy link
Contributor Author

colinmford commented Aug 6, 2020

I made some revisions, following the advice that @benkiel made here. Now all tests are passing.

I added a dedent_tabs method that works like textwrap.dedent, but it will only dedent tabs and 4-space indentions.
https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1367-L1416

This and a few extra lines of stripping allow all the tests to pass:
https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/src/ufonormalizer.py#L1179-L1182

Finally, added additional tests to test for the case @benkiel brought up and other related cases:
https://github.com/colinmford/ufoNormalizer/blob/440f6a53f1e30be597393f76967f8ac1e528da27/tests/test_ufonormalizer.py#L848-L884

@benkiel benkiel merged commit ece58bc into unified-font-object:master Aug 31, 2020
@moyogo
Copy link
Collaborator

moyogo commented Aug 31, 2020

Thanks @colinmford!

@colinmford
Copy link
Contributor Author

Thanks @benkiel and @moyogo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trouble normalizing the indention of new lines in Glyph Notes
3 participants