PO file with utf-8 BOM signature at beginning of file will not be recognized as translatable #1640
Originally posted by langtechie:
(Tested on Firefox 3.6.8)
Uploading a PO file with a UTF-8 BOM signature at the beginning of the file causes Pootle to say there are no words to translate, even though the file has actual strings to translate.
Remove the UTF-8 BOM and re-upload, and Pootle will recognize the words to translate.
Expected: UTF-8 files with a BOM signature should be supported in Pootle, as it is a common practice.
Thank you for mentioning this idea. To the best of my knowledge, none of the official gettext tools support BOMs, and no conforming PO editor should create a file with a BOM. Therefore I don't really consider this common practice. So this seems to be about handling broken files.
If we do handle the BOM, how do we handle a mismatch between the BOM and the encoding specified in the file header? Also, if the BOM indicates UTF-16 (which is not a valid PO encoding as far as I know), what should we do?
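To make the mismatch question concrete, here is a minimal Python sketch that only detects which BOM (if any) a file starts with and which charset its header declares. It does not decide what to do about a conflict. The function names `detect_bom` and `declared_charset` are hypothetical and not part of Pootle or the Translate Toolkit, and the header regex is a simplification that assumes an ASCII-compatible `charset=` declaration.

```python
import codecs
import re

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Simplified match for the "Content-Type: text/plain; charset=..." header line.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def detect_bom(data):
    """Return the encoding family implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def declared_charset(data):
    """Return the normalized charset named in the PO header, or None."""
    match = _CHARSET_RE.search(data)
    if match is None:
        return None
    try:
        return codecs.lookup(match.group(1).decode("ascii")).name
    except LookupError:
        # Unknown name, e.g. the "CHARSET" placeholder in an unfilled template.
        return None
```

A caller could compare the two results and, for example, reject the upload when the BOM says UTF-16 or when the BOM and the header disagree; which policy is right is exactly the open question above.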
We plan to rely entirely on the parser in the gettext package in future, so it might become even harder to do anything different from the gettext package at that stage. So I'm not sure if this is necessarily a good idea. We'll have to think about it a bit more.
(In reply to
[dwayne@db storage]$ msgcat thirdwheel_django_sample.po
The BOM causes Gettext tools to fail. I think it is a common Windows practice, but quoting Wikipedia: "While Unicode standard allows BOM in UTF-8, it does not require or recommend it. Byte order has no meaning in UTF-8"
(In reply to
Yes, it's a broken file. I tested with poedit. It opens the file without complaining, but removes the BOM on saving.
Yes, UTF-16 is invalid (checked with msgconv). I would say that simply removing the byte sequence 0xEF, 0xBB, 0xBF if it appears at the start of a PO file would be a good workaround for broken PO files.
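As a rough illustration of that workaround, a minimal Python sketch might look like the following. The helper name `strip_utf8_bom` is hypothetical, and the sketch assumes the file contents are available as raw bytes before being handed to the PO parser. It is not how Pootle actually handles the file.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (0xEF, 0xBB, 0xBF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only a BOM at the very start is removed; the same three bytes appearing later in the file are left alone. When the bytes are decoded directly instead, Python's standard `utf-8-sig` codec has the same effect of skipping a leading BOM.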
I think we should approach it like poedit. Although it's broken, there is a very real possibility that someone unwittingly edits a PO file on a platform that adds BOMs.
No amount of hand waving will detract from a bad user experience.