po2sub gets encoding wrong and fails #3827
Comments
Edit: my bad, this is an old bug, #3601.
I also have the same problem with v3.2.0:
xliff2po: WARNING: Error processing: input src/assets/i18n/messages.en.xlf, output None, template None: 'ascii' codec can't encode character '\xe9' in position 70: ordinal not in range(128)
My file is UTF-8 encoded, not ASCII. Everything is fine with v2.2.5, so this is a regression.
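For readers unfamiliar with this message: it is the text Python produces when a string containing a non-ASCII character is encoded with the `ascii` codec. The sketch below reproduces the same class of error outside translate-toolkit. The sample string and the helper name `encode_error_message` are assumptions made for illustration, and the snippet does not show where in xliff2po the `ascii` codec gets selected.

```python
# Illustration only: reproduces the same class of error outside translate-toolkit.
# The sample text is hypothetical; any string containing "é" (U+00E9) behaves alike.
text = "Résumé"


def encode_error_message(s, codec):
    """Return the UnicodeEncodeError message raised when encoding s, or None on success."""
    try:
        s.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


ascii_error = encode_error_message(text, "ascii")   # fails: "é" is outside ASCII
utf8_error = encode_error_message(text, "utf-8")    # succeeds: UTF-8 covers "é"

print(ascii_error)
print(utf8_error)
```

The failure depends only on the codec used for output, not on how the input file is encoded, which is consistent with a UTF-8 file triggering it.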
Looking into this more, it seems that @afranke's and @Toub's bugs are separate.
@afranke, this seems to be an issue with
For @Toub's problem, it's an old bug that affects more converters than just xliff2po; web2py2po had the same issue.
I've been debugging (not for 2 years, btw) and I have 2 notes on this issue:
1- While chardet fails to detect very short inputs such as single characters as UTF-8, it does identify a longer sentence correctly:
>>> import chardet
>>> chardet.detect('Ce cycle, GNOME Shell a reçu une attention particulière sur l’optimisation des performances.'.encode('utf-8'))
{'encoding': 'utf-8', 'confidence': 0.87625, 'language': ''}
2- An ugly workaround for this issue is to force the desired encoding instead of relying on translate-toolkit to detect it. For example, take the following patch:
--- translate/storage/subtitles.py-orig 2020-09-18 09:54:47.875337951 -0300
+++ translate/storage/subtitles.py 2020-09-18 09:55:55.799517988 -0300
@@ -105,7 +105,7 @@
     def _parse(self):
         try:
-            self.encoding = detect(self.filename)
+            self.encoding = "utf-8"
             self._format = determine(self.filename, self.encoding)
             self._subtitlefile = new(self._format, self.filename, self.encoding)
             for subtitle in self._subtitlefile.read():
and run it with a command like:
$ patch -p3 venv/lib/python3.8/site-packages/translate/storage/subtitles.py < force-utf-8.patch
With the patch applied, po2sub's conversion works, but one might want to check that the po file really is UTF-8 first.
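One possible way to do such a check before forcing the encoding is a strict decode of the whole file. This is only a sketch and not necessarily what the comment had in mind: the helper name `is_valid_utf8` and the temporary sample files are hypothetical, and the snippet uses only the Python standard library.

```python
import tempfile
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the entire file decodes as strict UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo with two throwaway files holding the same text in different encodings.
with tempfile.TemporaryDirectory() as tmp:
    sample = 'msgstr "a reçu une attention particulière"\n'
    utf8_file = Path(tmp) / "fr-utf8.po"
    latin1_file = Path(tmp) / "fr-latin1.po"
    utf8_file.write_bytes(sample.encode("utf-8"))
    latin1_file.write_bytes(sample.encode("latin-1"))

    utf8_ok = is_valid_utf8(utf8_file)
    latin1_ok = is_valid_utf8(latin1_file)

print(utf8_ok, latin1_ok)
```

A strict decode is a reasonable guard here because accented Latin-1 text almost never forms valid UTF-8 byte sequences, though a pure-ASCII file passes the check under either encoding.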
Yes, chardet is not always reliable at detecting UTF-8; see for example chardet/chardet#148.
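Given that unreliability, one common mitigation is to trust a successful strict UTF-8 decode and consult a statistical detector only when that fails. The sketch below shows the idea; it is not translate-toolkit's actual detection code. The function `guess_encoding`, its `detector` parameter, and the `latin-1` default are assumptions for illustration. The detector is expected to return a dict with an `encoding` key, as `chardet.detect` does in the session quoted above.

```python
def guess_encoding(data, detector=None, default="latin-1"):
    """Prefer strict UTF-8; fall back to an optional detector, then to a default."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if detector is not None:
        # Hypothetical hook: any callable returning {"encoding": ...}, e.g. chardet.detect
        result = detector(data)
        if result and result.get("encoding"):
            return result["encoding"]
    return default


short_utf8 = "é".encode("utf-8")        # the kind of short input detectors struggle with
latin1_bytes = "reçu".encode("latin-1")  # not valid UTF-8, so the fallback applies

print(guess_encoding(short_utf8))
print(guess_encoding(latin1_bytes))
```

With chardet installed, it could be plugged in as `guess_encoding(data, detector=chardet.detect)`.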
I installed translate-toolkit 2.3.0 via pip, but I see the same problem with 2.0.0b5 and with the Fedora 28 package as well.
po2sub -t gnome330.srt po/fr.po fr.srt
fails. For some reason it thinks it’s latin-1 when it should be UTF-8.
I’m attaching the files so you can try it yourself.