InDesign finalizations #87

rigaspapas · 2018-02-15T15:36:51Z

Checklist (for the reviewer)

Problem

Empty strings appeared in the editor (when only non-printable characters were included)
Spaces around the text were not preserved

Steps to reproduce

Use the ifood IDML file from this doc
Use the TrustYouBranding file from the same doc and the recent research results string

Solution

Ignore strings that don't contain letters, symbols or punctuation (as defined here)
Extend the regular expression to ignore wrapping spaces
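
A minimal sketch of the second point, for illustration only. The pattern and the name `split_wrapping_spaces` are assumptions, not the expression used in `openformats/formats/indesign.py`. The idea is to capture leading and trailing whitespace separately, so that only the inner text is exposed for translation and the spaces can be restored afterwards.

```python
import re

# Hypothetical pattern: leading whitespace, inner text (non-greedy),
# trailing whitespace. DOTALL lets the inner text span line breaks.
_WRAPPING_SPACES = re.compile(r'^(\s*)(.*?)(\s*)$', re.DOTALL)


def split_wrapping_spaces(string):
    """Return a (leading, text, trailing) tuple for the given string."""
    return _WRAPPING_SPACES.match(string).groups()


leading, text, trailing = split_wrapping_spaces(u"  Hello world ")
# Only `text` would be offered for translation; the wrapping spaces
# are kept aside and re-attached when the file is compiled.
rebuilt = leading + text + trailing
```

Because the three groups always concatenate back to the input, the original spacing is recoverable regardless of what the translation looks like.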

kouk · 2018-02-16T10:41:51Z

openformats/formats/indesign.py

@@ -104,6 +104,9 @@ def _can_skip_content(self, string):
            return True
        except ValueError:
            pass
+        # Special content in BackingStory.xml
+        if u'\ufeff' == string:
+            return True


@rigaspapas why not just check for content after stripping all non-printable characters?
ref: https://mayart.de/download/Indesign-IDML/special-idml-chars.pdf

one idea would be to use the unicodedata module to ignore all characters not in printable categories:
https://www.unicode.org/reports/tr44/#General_Category_Values

The reason I'm saying this is who's to say we don't find another file that e.g. has a u'\u200c' character?
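
A rough sketch of that suggestion, assuming "non-printable" means every character whose general category falls in the "Other" group (`Cc`, `Cf`, `Cs`, `Co`, `Cn`). The function name `strip_non_printable` is hypothetical and does not appear in the PR.

```python
import unicodedata


def strip_non_printable(string):
    """Drop characters whose Unicode general category starts with 'C'.

    Assumption: the whole "Other" group counts as non-printable here.
    """
    return u"".join(
        char for char in string
        if not unicodedata.category(char).startswith("C")
    )


# Both characters mentioned in this thread are format characters (Cf),
# so a category-based filter covers them without listing them one by one.
categories = [unicodedata.category(c) for c in (u"\ufeff", u"\u200c")]
```

Note that tabs and newlines are control characters (`Cc`), so this filter removes them as well; whether that is acceptable depends on how the result is used.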

@kouk what we want to achieve is to avoid exporting text that is not translatable. There is no standard way in Unicode characters to distinguish letters from symbols. Is there any?

You are right that any non-printable character can cause the same unwanted result. I considered ignoring one specific character, because I guess that's what InDesign inserts in every document automatically (not a user input).

coveralls · 2018-02-16T14:27:21Z

Coverage increased (+0.02%) to 95.63% when pulling 6f70b68 on TX-9144-skip-special-content into 1374e95 on devel.

kouk

there's a bug here that needs fixing (see my comment below about the missing test case). Ignore the other comment (unless you like oneliners)

kouk · 2018-02-16T15:00:04Z

openformats/tests/formats/indesign/test_indesign.py

+            u'<?ACE 8?> <Br/>;',
+            u'\ufeff',
+            u' \ufeff ',
+            u' \ufeff 5',


here's another test case:

u'\ufeff<Br/>;'

;-)

You are right. We should first strip the special characters and then check for the translatable ones. Thanks!

kouk · 2018-02-16T15:18:49Z

openformats/formats/indesign.py

+        for letter in string:
+            char_type = unicodedata.category(letter)
+            if char_type[0] in acceptable:
+                return True
        return False


just for fun:

from six.moves import map
return any(c[0] in ["L", "P", "S"] for c in map(unicodedata.category, string))

or even

from operator import itemgetter
from six.moves import map
return any(map(["L", "P", "S"].__contains__, map(itemgetter(0), map(unicodedata.category, string))))

@kouk I really like the first approach, which is very functional, but it would require a new dependency. So, should we leave it as is?

sure, it's fine like it is. But these examples don't really require the new dependency, you could do from itertools import imap. It's just that this way it's compatible with both python2 and python3.
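
For reference, on Python 3 alone the built-in `map` is already lazy, so neither `six` nor `itertools.imap` is needed. A minimal sketch of the same check follows; the names `has_translatable_content` and `ACCEPTABLE` are assumptions for illustration, and the category set (L, P, S) is taken from the one-liners above.

```python
import unicodedata

# Major general categories treated as translatable in this sketch:
# Letter, Punctuation, Symbol.
ACCEPTABLE = ("L", "P", "S")


def has_translatable_content(string):
    """Return True if any character is a letter, punctuation mark or symbol."""
    return any(
        category[0] in ACCEPTABLE
        for category in map(unicodedata.category, string)
    )


# Inputs taken from the test cases discussed in this review, plus one
# ordinary string for contrast.
results = [
    has_translatable_content(s)
    for s in (u"\ufeff", u" \ufeff ", u" \ufeff 5", u"Hello")
]
```

Digits fall in the Number group (`Nd`), which is why a string such as `u' \ufeff 5'` is treated as having nothing to translate under this category set.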

Use python's unicodedata library to identify printable characters and ignore strings that don't contain any.

kouk

LGTM

rigaspapas added the bug label Feb 15, 2018

rigaspapas requested a review from SofiaMargariti February 15, 2018 15:36

kouk reviewed Feb 16, 2018

rigaspapas force-pushed the TX-9144-skip-special-content branch from 9cf6aca to 9770ffb Compare February 16, 2018 14:24

rigaspapas changed the title ~~Ignore special character in BackingStory.xml~~ InDesign finalizations Feb 16, 2018

rigaspapas requested a review from kouk February 16, 2018 14:31

kouk suggested changes Feb 16, 2018

Rigas Papathanasopoulos added 2 commits February 16, 2018 18:12

Ignore non-printable strings

374e591

Use python's unicodedata library to identify printable characters and ignore strings that don't contain any.

Don't include spaces arround the text for InDesign

6f70b68

rigaspapas force-pushed the TX-9144-skip-special-content branch from 9770ffb to 6f70b68 Compare February 16, 2018 16:14

kouk approved these changes Feb 16, 2018

rigaspapas merged commit 7eb0e10 into devel Feb 16, 2018

rigaspapas deleted the TX-9144-skip-special-content branch February 20, 2018 09:06
