Add localization support with po4a. #2793
Hi @mebeim, can you explain to me what the current translation workflow is? If any of the English strings are changed, how can translators be informed about this?
@urbalazs that's what I was not understanding, thanks for clarifying. I thought you were just proposing a different translation mechanism for newly translated pages. Sorry for the misunderstanding. The issue you describe is currently unhandled, you're right. What you're proposing is interesting, but I'm not familiar with the tool and would like to have some real examples. Take for example the following scenario (points in chronological order):
What should be done before or after any of these points, using the tool you mentioned?
@mebeim First of all, po4a has to be installed. Check the po4a.conf file, which I added with this PR.
Scenario 1: someone creates a new page.
Scenario 2: someone translates a page.
Scenario 3: someone edits a translated page.
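For context on the question in the first comment (how translators learn that English strings changed): with PO files, gettext tooling merges the old translations into the new template and flags changed strings as *fuzzy* instead of silently keeping stale text. The Python below is a simplified model of that merge, not the actual `msgmerge` or po4a algorithm; the `difflib` similarity measure and the 0.6 cutoff are assumptions chosen for illustration.

```python
import difflib


def merge(old_po: dict[str, str], new_msgids: list[str]) -> dict[str, tuple[str, str]]:
    """Map each new English string to (translation, status).

    status is "translated" for an exact match, "fuzzy" when only a similar
    old string exists (its translation is reused but must be reviewed), and
    "untranslated" otherwise.
    """
    # Only entries that actually have a translation can be reused.
    translated = {msgid: msgstr for msgid, msgstr in old_po.items() if msgstr}
    merged = {}
    for msgid in new_msgids:
        if msgid in translated:
            merged[msgid] = (translated[msgid], "translated")
            continue
        # Look for the most similar previously translated English string.
        close = difflib.get_close_matches(msgid, list(translated), n=1, cutoff=0.6)
        if close:
            merged[msgid] = (translated[close[0]], "fuzzy")
        else:
            merged[msgid] = ("", "untranslated")
    return merged
```

A translator, or a CI job, could then list every entry whose status is not `translated` to see exactly which strings need attention after an English page changes.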
@urbalazs shouldn't the
@urbalazs so if a page is already translated and I want to edit it, I should only touch the PO file... I see. Thanks for the thorough explanation; it looks cool, and I will give it a try on my fork when I have some time. @waldyrious the POT file can certainly be updated automatically by Travis.
@mebeim @waldyrious I added a bash script to this PR. Now you only have to add your language to the LANGS array (separated with spaces, e.g.: de hu it pt-BR ta zh ...) and execute the script. It does everything we need:
I really like it! Please note that if you add currently translated languages (it, pt-BR, ta, zh) to LANGS in the bash script, the script will overwrite the existing translations! Translations now come from PO files exclusively! Existing translations need to be migrated to PO files.
Shouldn't the PO files be checked in the repo as well? This seems like the way to go, but it needs some effort to integrate it with our flow. @mebeim, do you want to own this?
@agnivade I would like to, but I'm currently very busy, so I've only managed to comment and review here and there lately. As I said earlier, I'd like to check this out on my fork to see how it works, since I'm not familiar with the tool. It looks simple enough, but it would indeed require some work to be built into the current flow. I'll probably be able to take a look at this in two weeks.
Great! Please take your time; there is absolutely no rush.
Cool, thanks @urbalazs! Looks like there's still some work to do, but great work so far! Does it support a wildcard, like this?
Also, our pages aren't actually asciidoc; they're Markdown. Not sure if that affects things at all. By the way, it looks like some commits here use an email address different from the one attached to your GitHub account; I suspect that's why @CLAassistant is complaining.
@agnivade PO files will be stored in the repo. I haven't pushed the PO files yet, but my bash script will generate them in the i18n folder.
No, as far as I know it doesn't. But this is not important for now, as my bash script will generate the po4a.conf file based on the current tldr pages.
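The script itself is not shown in this thread. As a rough Python sketch of the idea of generating `po4a.conf` from the current pages (instead of relying on a wildcard), the code below emits one `[type: ...]` entry per English page. The `[po4a_langs]` and `[po4a_paths]` headers follow po4a's documented config syntax, but the `i18n/tldr.pot` and `i18n/$lang.po` paths, the `asciidoc` type and the `pages.$lang/...` output layout are assumptions for illustration.

```python
from pathlib import Path


def find_pages(pages_root: Path) -> list[str]:
    """List English pages as 'platform/command.md' (assumed tldr layout)."""
    return sorted(p.relative_to(pages_root).as_posix() for p in pages_root.glob("*/*.md"))


def build_po4a_conf(pages: list[str], langs: list[str], doc_type: str = "asciidoc") -> str:
    """Build po4a.conf text with one entry per page (hypothetical paths)."""
    lines = [
        "[po4a_langs] " + " ".join(langs),
        # One shared POT/PO pair for all pages, stored in an assumed i18n/ folder.
        "[po4a_paths] i18n/tldr.pot $lang:i18n/$lang.po",
        "",
    ]
    for rel in pages:
        # Master page in pages/, translated output in pages.<lang>/.
        lines.append(f"[type: {doc_type}] pages/{rel} $lang:pages.$lang/{rel}")
    return "\n".join(lines) + "\n"
```

For example, `build_po4a_conf(find_pages(Path("pages")), ["de", "hu"])` would produce one entry per page currently in the repository, so newly added pages are picked up simply by rerunning it.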
I know, but po4a has no Markdown support. I tried all supported formats, and asciidoc works best. It is not perfect, because asciidoc and Markdown are different. That's why I added a beautifier section to the bash script: I have to fix the generated output with
Yes, sorry about this. I forgot to change it in my local git config, and the first commit was made with my Gmail address. But now I don't want to use Gmail anymore, because I don't want to be the "product".
Cool, thanks for the clarification!
Ah, I see. I've got a similar reason for moving away from Gmail myself. IIRC, an interactive rebase can help you fix the commit author.
Hey @urbalazs, I was looking at po4a, and I noticed there's a "text" format. Would that have any advantage over asciidoc? For example, longer lines?
Using the "text" format results in these entries in the POT and PO files:
The same with "asciidoc":
This is better for us, because the leading `>` and `-` characters are not part of the translatable text, so translators cannot accidentally break the structure.
@urbalazs oh, I see. So the line width limit is still 80 chars even with "text"?
Yes, lines are wrapped when po4a generates the localized output. I tried to unwrap all lines before the operation, but without success. It seems this is a built-in "feature" of po4a.
Hi again @urbalazs, sorry it took so long, but I finally had the chance to take a look and test the proposed approach. Here's what I think: in short, this method looks much cooler, more maintainable, and easier, but there are some problems that I think cannot be overlooked. Here's a list:
So, given the above points, most importantly numbers 1 and 2, while I think that an upgrade of the current translation workflow would be nice, to me it doesn't look like

Let me know what you guys think about it, and of course do let me know if I got anything wrong above, since I am not familiar with the tool. The more opinions, the better!
Thanks for working on this, @mebeim!
I have more of a general question. If the po4a workflow is adopted, would we have two sets of files then? One raw set, which is to be edited or added to for changes, and another generated set, which will be consumed by the clients? That sounds like a complicated workflow for a newcomer to contribute to. Right now, the md files are what is consumed, so anyone can see them and contribute a PR; it's simple. But with po4a, folks would have to go through the raw set, edit the correct translation msg, and send a PR.
Hello @mebeim
No, this is not true. Why do you think that all translators have git knowledge? We should make translation easy! Forking, editing, and creating and sending a PR is not a convenient way.
No. The expected workflow should be as follows (let's assume you have already integrated Transifex or Weblate; let's call them the "translation service"):
Then you repeat steps 1-4 on a regular basis. No more pull requests are needed for translations.

To all: please have a look at this commit: 948147d

Please let me know if something is not clear or if you need help with anything!
Examples: Transifex integration and Weblate integration
Hello,
Sorry for the state of the documentation in po4a. I think it's rather instructive, but very verbose. I'd need to apply the TL;DR approach :)
On Tue, Jun 25, 2019 at 04:52:13PM -0700, Marco Bonelli wrote:
1. Can pages be treated as independent translation units? As in: for every page, once it's 100% translated (and only then), it will be generated by `po4a`. You're saying that for this to happen we would have to set the threshold to 100% and also create a separate `.po` file for each page and `.pot` file for each page and language, is this correct?
(a) Every page is an independent translation unit when it comes to the generation. Each and every page that passes the limit is generated, while the other ones are just omitted. By default, that limit is at 80%, which is useful in practice, but you are free to set it at 100% if you want.
(b) But all pages share the translation database, so a given paragraph appearing in more than one page only needs to be translated once.
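To make (a) and (b) concrete, here is a small Python model: one shared translation table for all pages, and a per-page check against a threshold (80% by default, as described above). It only mimics the behaviour described in this reply; po4a's real percentage computation may differ in details (for example, how fuzzy entries are counted), and the function names are hypothetical.

```python
def coverage(page_msgids: list[str], translations: dict[str, str]) -> float:
    """Percentage of a page's strings that have a non-empty translation."""
    if not page_msgids:
        return 100.0
    done = sum(1 for msgid in page_msgids if translations.get(msgid))
    return 100.0 * done / len(page_msgids)


def pages_to_generate(
    pages: dict[str, list[str]],
    translations: dict[str, str],
    threshold: float = 80.0,
) -> list[str]:
    """Pages whose coverage reaches the threshold; the others are skipped."""
    return sorted(
        name for name, msgids in pages.items()
        if coverage(msgids, translations) >= threshold
    )
```

Because `translations` is shared, a string such as "Extract an archive:" that appears in two pages is translated once and counts toward both pages, which is point (b); point (a) is the per-page filter.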
I was speaking of splitting the PO files into many parts because that's a possibility. But you would not gain (a), since you already get it with a single big PO file, while you would lose (b), so I would not advise it, actually. But again, you are free to use the tool you see fit if you have a strong opinion here.
2. You're saying Markdown is supported, but I don't see it explicitly listed in the repository's README. Would that just be `[type: text]`?
Actually, that's an option of the text module. That would be something like:
```
[type:text] opt:"-o markdown" src/file.md $lang:doc/l10n/$lang/file.md
```
We could split it into a real module, just like we did a while ago for other textual formats. That would make them easier to find.
3. Is there any way to get around the line length limit? Don't know if this was updated or something.
I'm not sure I understand this question, sorry. Could you rephrase it?
4. So my understanding is that (for every translation unit) Weblate would manage the main `.pot` file and the other `.po` files and push them to the repository programmatically, am I right?
That would be easy to manage indeed, by updating the CI build to automatically generate translated pages from those.
That's also my understanding.
Thanks, Mt.
Thank you again @mquinson for the explanation. About the third point: I was talking about the following, quoting @urbalazs's comment [1] (emphasis mine):
And a follow-up on that [2] [3]:
Is this 80-column limit unavoidable or compulsory? If not, how could it be disabled?
I think that the 'neverwrap' option of the Text module should be what you are looking for.
It is in the documentation; I hope it does what you want :) If not, we shall fix it...
@mquinson got it, thank you again! I will start experimenting with this when I have time; it looks promising so far. Keep up the good work with
I'm glad. Please do not hesitate to ask any questions that may arise.
Mt
I actually think this could be a great opportunity to introduce a line length limit in tldr-pages. Long lines typically mean we're trying to explain complex concepts, and the line limit could help identify and tackle those cases. I see it as similar to our limit on the number of examples per page, and perfectly in line with our mission of TL;DR-ing manpages and complex documentation. But if anyone disagrees, let's drop the idea for now, so as not to derail this discussion :)
I need to mull it over for some time. In any case, let us discuss this in a separate issue.
Makes sense. We can start with

Opened #3145.
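If such a limit were adopted, checking it would be straightforward. A minimal Python sketch, assuming an 80-character limit (the width discussed above for po4a's wrapping; the actual value would be decided in the separate issue) and assuming every line of every `.md` file under a directory is checked:

```python
from pathlib import Path

MAX_LEN = 80  # assumed limit; the real value would be agreed on separately


def long_lines(text: str, max_len: int = MAX_LEN) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than max_len."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > max_len
    ]


def check_tree(root: Path, max_len: int = MAX_LEN) -> dict[str, list[tuple[int, int]]]:
    """Map each page path under root to its overlong lines (only pages with hits)."""
    report = {}
    for page in sorted(root.rglob("*.md")):
        hits = long_lines(page.read_text(encoding="utf-8"), max_len)
        if hits:
            report[page.as_posix()] = hits
    return report
```

Run over the pages directory, the report would point directly at the examples whose descriptions are probably trying to explain too much.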
Hi all! This thread has not had any recent activity. Are there any updates? Thanks!
I second stale-bot's words 😄 Is anyone able to point out the state of this PR, and what's blocking progress?
@waldyrious to sum it up: this seems like it could definitely be done, but it would take a lot of work and some big changes to the build process and translation contribution workflow, plus integration with Weblate, which would require either asking for free hosting or self-hosting (I also don't know how the two options would differ in practice). A substantial part of the work would also consist of adapting existing translations. It does look promising, and nothing is really blocking progress; it's just an overwhelming amount of work for anybody to just step in and say "ok, I'll handle this".
Hmm. Sounds like it might be worth breaking it down a bit into multiple sub-components to make it easier to manage. It's a complicated problem though, so I'm not sure how best to do this.
I don't think I agree with @mebeim here. I think that we are quite close to what's needed. I was still waiting for a follow-up from you after you started playing and experimenting with po4a. This is where we are:

Someone confident in tldr should try to apply this PR locally and play with the po4a command. Add the `-k0` parameter in `po4a.conf` and ask for a translation into a language that is not provided yet. The "translated" file will only contain English text, so you can check whether the result is OK with you. (Without `-k0`, such a page would not be generated, as po4a wants at least 80% of a page to be translated to generate it.)

If the output does not look nice, don't fiddle with the translate script and all its nasty `sed` commands. Ask me and the po4a dudes to fix the handling of Markdown instead. It's much more productive to fix the bugs at their roots than to mask their effects with `sed`.

Someone in the TLDR admin staff should contact the Weblate administrator to ask for hosting. Since you are a free software project, it may be easy. If he (or his server) is too busy, you should ask Crowdin, which will certainly accept. You may also consider http://zanata.org/ for free hosting of your translations.

Once you have the PO files generated with po4a and a working hosting solution, setting up everything is easy. Dealing with the existing translations is somewhat more difficult, but po4a comes with a tool to convert a master file and its translation into a PO file that can be integrated in the regular po4a workflow. This tool must be used by someone who understands the translation (to adapt the translated file if the structure was somewhat modified), but that's not very difficult. It's straightforward if the structures of the master and translated files match exactly, and it takes a bit longer if not (but that's still not complex). See https://po4a.org/man/man1/po4a-gettextize.1.php

The difficult part is adapting the current translation workflow, as humans are involved. If the teams are already working and happy with their workflow, it's maybe not a good idea to force them to adopt a new methodology. You can still set up the po4a thing, and tell the other teams (and the ones that don't exist yet) to go for it, as it's easier in the long term. Note that the longer you wait, the harder that part of the conversion will be.

Note also that if you prefer not to use po4a, I'm perfectly fine with it. I don't sell po4a, but I'm a potential user of the French version of TLDR. As long as the material gets translated and the translations are maintained whenever the original text changes, that's cool. If your translation workflow takes this into account, please forget about po4a!
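`po4a-gettextize` does the real conversion. As a rough Python model of why matching structure matters, the sketch below splits a tldr-style page into its structural lines (`#` title, `>` description, `-` example description, backtick-wrapped command), pairs them with the translated page line by line, and refuses when the structures differ. The parsing rules are simplified assumptions about the tldr page format, not what po4a does internally, and the function names are hypothetical.

```python
MARKERS = {"#": "title", ">": "description", "-": "example"}


def segments(page: str) -> list[tuple[str, str]]:
    """Split a tldr page into (kind, text) pairs; blank lines are ignored."""
    result = []
    for raw in page.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in MARKERS:
            # The structural marker is not part of the translatable text.
            result.append((MARKERS[line[0]], line[1:].strip()))
        elif line.startswith("`") and line.endswith("`"):
            result.append(("command", line.strip("`")))
        else:
            result.append(("other", line))
    return result


def gettextize(master: str, translated: str) -> dict[str, str]:
    """Pair master and translated segments into a msgid -> msgstr table."""
    src, dst = segments(master), segments(translated)
    if [kind for kind, _ in src] != [kind for kind, _ in dst]:
        raise ValueError("structure differs: adapt the translated page first")
    return {s: d for (_, s), (_, d) in zip(src, dst)}
```

When the check fails, the fix is the manual step described above: someone who understands the translation reorders or completes the translated page until its structure matches the English master, and then reruns the conversion.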
Yeah, the current issue with the existing translation workflow is twofold:
I could reach out to Weblate etc. on behalf of tldr-pages, if that's helpful. The structure of translations and the original master should be identical... unless the master has been updated and the translation hasn't.
Hi, @mquinson. There's a problem that I didn't think of previously. Assume the following:
Do you know how something like this could be handled? Ideally, we would like to keep the "old" version of the Italian page until the new English strings are also translated and we have a "new" fully translated page again. I hope I was clear enough.
But again, consider carefully whether it's really better to keep an old complete translation rather than provide a recent version with one or two new English sentences in the middle of the translation. The good news is that each team can choose a different policy, if you want, as both approaches are really easy to achieve with po4a.
I would like to stress again that the support for Markdown is somewhat experimental in po4a. If/when you see bugs, we'll fix them.
@mquinson thank you for the quick response. I guess generating files into a new folder would indeed work. The scenario still applies even with an 80% threshold. Since, as of now, translations are basically all manual, it's like working with a 100% threshold, and I'm assuming this threshold just to make it simpler, but of course once the whole thing is set up we will discuss what's best to set as the threshold. As for the Markdown support, I'll let you know for sure if there's something that looks like a bug. From what I could see until now, it seems to work fine.
Hi all! This thread has not had any recent activity. Are there any updates? Thanks!
Hi everyone. This thread is being closed as there was no response to the previous prompt. However, please leave a comment whenever you're ready to resume, so the thread can be reopened. Thanks again!
Hello @sbrl
The pull request has been created as you asked. I also generated an initial POT file for translators.
More information in #2339 (comment).