Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InDesign finalizations #87

Merged
merged 2 commits into from Feb 16, 2018
Merged

InDesign finalizations #87

merged 2 commits into from Feb 16, 2018

Conversation

rigaspapas
Copy link
Contributor

@rigaspapas rigaspapas commented Feb 15, 2018

Checklist (for the reviewer)

  • Problem and solution are well-explained in the PR
  • Change is covered by unit-tests
  • Code is well documented
  • Code is styled well and is following best practices
  • Performs well when applied to big organizations/resources/etc from our production database
  • Errors are handled properly
  • All (conceivable) edge-cases are handled
  • Code is instrumented to report unexpected failures or increases in the failure rate
  • Commits have been squashed so that each one has a clear purpose (and green tests)
  • Proper labels have been applied

Problem

  1. Empty strings appeared in the editor (when only non-printable characters were included)
  2. Spaces around the text where not preserved

Steps to reproduce

  1. Use the ifood IDML file from this doc
  2. Use the TrustYouBranding file from the same doc and the recent research results string

Solution

  1. Ignore strings that don't contain letters, symbols or punctuation (as defined here)
  2. Extend the regular expression to ignore wrapping spaces

@@ -104,6 +104,9 @@ def _can_skip_content(self, string):
return True
except ValueError:
pass
# Special content in BackingStory.xml
if u'\ufeff' == string:
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rigaspapas why not just check for content after stripping all non-printable characters?
ref: https://mayart.de/download/Indesign-IDML/special-idml-chars.pdf

one idea would be to use the unicodedata module to ignore all character not in printable categories:
https://www.unicode.org/reports/tr44/#General_Category_Values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm saying this is who's to say we don't find another file that e.g. has a u'\u200c' character?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kouk what we want to achieve is to avoid exporting text that is not translatable. There is no standard way in Unicode characters to distinguish letters from symbols. Is there any?

You are right that any non-printable character can cause the same unwanted result. I considered ignoring one specific character, because I guess that's what InDesign insert in every document automatically (not a user input).

@rigaspapas rigaspapas changed the title Ignore special character in BackingStory.xml InDesign finalizations Feb 16, 2018
@coveralls
Copy link

coveralls commented Feb 16, 2018

Coverage Status

Coverage increased (+0.02%) to 95.63% when pulling 6f70b68 on TX-9144-skip-special-content into 1374e95 on devel.

@rigaspapas rigaspapas requested a review from kouk February 16, 2018 14:31
Copy link
Contributor

@kouk kouk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a bug here that needs fixing (see my comment below about the missing test case). Ignore the other comment (unless you like oneliners)

u'<?ACE 8?> <Br/>;',
u'\ufeff',
u' \ufeff ',
u' \ufeff 5',
Copy link
Contributor

@kouk kouk Feb 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's another test case:

u'\ufeff<Br/>;'

;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We should first strip the special characters and then check for the translatable ones. Thanks!

for letter in string:
char_type = unicodedata.category(letter)
if char_type[0] in acceptable:
return True
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for fun:

from six.moves import map
return any(c[0] in ["L", "P", "S"] for c in map(unicodedata.category, string))

or even

from six.moves import map
return any(map(["L", "P", "S"].__contains__,
               map(itemgetter(0), map(unicodedata.category, string))) 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kouk I really like the first approach, which is very functional, but it would require a new dependency. So, should we leave it as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, it's fine like it is. But these examples don't really require the new dependency, you could do from itertools import imap. It's just that this way it's compatible with both python2 and python3.

Rigas Papathanasopoulos added 2 commits February 16, 2018 18:12
Use python's unicodedata library to identify printable characters and
ignore strings that don't contain any.
Copy link
Contributor

@kouk kouk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rigaspapas rigaspapas merged commit 7eb0e10 into devel Feb 16, 2018
@rigaspapas rigaspapas deleted the TX-9144-skip-special-content branch February 20, 2018 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants