poterminology should suggest options when thresholding eliminates all terms #582
Poterminology with the default settings gets zero terms from the attached PO file of over half a million words.
The PO file:
The PO file seems to be seriously broken. What tool created this?
There are also duplicates, which our tools can usually handle, but gettext can’t.
The bigger issue here is the absence of #: comments. It seems poterminology can’t currently operate without #: comments (I verified with another file created with msgunfmt).
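To check quickly whether a PO file carries any location information, something like the following rough sketch can help. It is not part of poterminology or the Translate Toolkit; the function name `count_location_comments` and the sample text are made up for illustration, and it only does a naive line scan rather than a full PO parse.

```python
def count_location_comments(po_text):
    """Return (units, located): how many msgid entries exist, and how many
    of them are preceded by at least one '#:' location comment.

    Naive line scan: the header entry (empty msgid) is counted as a unit,
    and obsolete '#~' entries are ignored because they start with '#'.
    """
    units = 0
    located = 0
    has_location = False
    for line in po_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#:"):
            has_location = True
        elif stripped.startswith("msgid "):  # trailing space skips msgid_plural
            units += 1
            if has_location:
                located += 1
            has_location = False
    return units, located


# Hypothetical sample: a header, one located unit, one unit without '#:'.
sample = '''msgid ""
msgstr ""

#: src/main.c:10
msgid "Open file"
msgstr ""

msgid "Close file"
msgstr ""
'''

units, located = count_location_comments(sample)
print("%d of %d units have location comments" % (located, units))
```

If the second number is zero, the file has no `#:` comments at all, which is the situation described here.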
You need to specify --locs-needed=1 if you have an input file that lacks any location information, but poterminology can work with such a file. Note that this is already explained in the (excellent) wiki documentation http://translate.sourceforge.net/wiki/toolkit/poterminology:
Rather than requiring that a term appear in multiple input PO or POT files, this requires that it have been present in multiple source code files, as evidenced by location comments in the PO/POT sources.
[There is also a very relevant comment a bit further down:]
These two thresholds specify the number of different translation units (messages) in which a term must appear; they both work in the same way, but the first one applies to terms which appear as complete translation units in one or more of the source files (full message terms), and the second one to all other terms (substring terms). Note that translations are extracted only for full message terms; poterminology cannot identify the corresponding substring in a translation.
If you are working with a single input file without useful location comments, increasing these thresholds may be the only way to effectively reduce the output terminology. Generally, you should increase the --substr-needed threshold first, as the full message terms are more likely to be useful terminology.
Given that you are trying to get useful terminology from a single file with no location information, I suspect you will need to use the above thresholds.
Rather than reject this bug, I will take it as a suggestion for improvement, which is that poterminology should suggest option settings (based on the maximum observed values of all threshold quantities) if thresholding removes all terms. This would provide output something like the following:
C:>poterminology ansi.po ansi_terms.po
The suggestion text would include all options where the maximum threshold
:poterminology --locs-needed=1 newansi.po newansi_terms.po
:poterminology --locs-needed=0 newansi.po -o newansi_terms.po
Well, that works at least.
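For what it's worth, the suggestion logic proposed above could look roughly like the sketch below. This is only one reading of the idea, not poterminology's actual code: the function `suggest_options`, the dictionary layout, and the threshold values are all assumptions made for illustration. It assumes that an option is worth suggesting when the largest value observed for that quantity is still below the configured minimum.

```python
def suggest_options(thresholds, observed):
    """Suggest option settings when thresholding removes every term.

    thresholds: {option_name: minimum count required}   (hypothetical layout)
    observed:   list of {option_name: count}, one dict per candidate term
    Returns a list of option strings, empty if at least one term survives.
    """
    def passes(counts):
        return all(counts.get(name, 0) >= minimum
                   for name, minimum in thresholds.items())

    if any(passes(counts) for counts in observed):
        return []  # some terms survive, nothing to suggest

    suggestions = []
    for name, minimum in thresholds.items():
        # Largest value seen for this quantity across all candidate terms.
        best = max((counts.get(name, 0) for counts in observed), default=0)
        if best < minimum:
            # No term can ever meet this threshold; suggest the observed maximum.
            suggestions.append("--%s=%d" % (name, best))
    # Note: if every threshold is met by some term but no single term meets
    # all of them, this simple version returns no suggestions.
    return suggestions


# Illustrative values only: a file with no location comments at all.
thresholds = {"locs-needed": 2, "substr-needed": 5}
observed = [
    {"locs-needed": 0, "substr-needed": 7},
    {"locs-needed": 0, "substr-needed": 3},
]
print(suggest_options(thresholds, observed))
```

With these made-up numbers only the location threshold is unreachable, so only that option is suggested.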