i18n translation workflow migration #21

Closed
rexwangcc opened this issue Nov 11, 2020 · 12 comments

Comments

@rexwangcc
Contributor

The existing zh-cn documentation source repo has a well-designed, streamlined translation workflow. It was designed to work with documentation written in .rst files. With the new docsite, we need to migrate the translation workflow first, before migrating the zh-cn docs themselves.

Related discussion: https://github.com/taichi-dev/taichi.graphics/pull/17#discussion_r511641423

@rexwangcc
Contributor Author

rexwangcc commented Nov 11, 2020

After doing some research about i18n today, I have a second, and perhaps reckless, thought.

It turns out some projects like Minecraft, Electron, and PostgreSQL use professional collaboration platforms such as Crowdin or Transifex, which have streamlined workflows and tools for collaborative content localization. I quickly tried Crowdin, and its online editor seems pretty friendly:

[Screenshots (2020-11-11): Crowdin's online editor]

I tried to think through the pros and cons of using them instead:

Pros

  • Using these platforms lowers the bar for docs translation: contributors probably will not need to set up git, make, and Poedit.
  • Maintainers do not have to spend extra effort pulling the source, resolving conflicts, and updating content, as we have had to do a few times in the past.
  • These platforms provide fancy views of translation progress, so coordination could be easier and does not have to happen on GitHub as it did before. They also provide tools to escape special syntax, so translators can worry less about formatting and style.
  • More languages can be easily integrated.

Cons

  • My biggest concern is that to use these platforms without paying monthly, we need to apply for an open-source tier (for example, via the Open-source project setup request form), which has a long list of requirements (No. 4 is the one I'm not sure about). I'm not sure about @yuanming-hu's long-term plan for the project, so we might not qualify for the open-source free tier now or in the future.
  • I haven't done thorough research into the platforms, so I'm not sure whether the locale files can be exported (for example, as .po files) once the translation is done. This could be risky, since we probably want a platform-agnostic approach in the very long term. (Since Crowdin has been around for a long time and many much larger projects use it, I'm less concerned about this.)

The above is just a random idea that came to mind; if folks don't think it's feasible, we can start migrating the existing translation workflow ASAP.

@yuanming-hu
Member

Thank you so much for the investigations! Crowdin does sound like a good cloud-based tool to use.

> ...we need to apply for an open-source tier, for example, Open-source project setup request form which has a long list of requirements (no.4 is what I'm not sure about).

No. 4 is no problem. I don't think we will have any commercial products in the foreseeable future. I can go apply for a license if we decide to use Crowdin. (Even if we don't get the free license, I can pay for it somehow, so no worries about this one :-))

> I'm not sure if the locale files can be ported (such as into .po files) once the translation is done. This can be risky since we probably want a platform agnostic approach for the super long-term.

Yeah, I am also a little concerned about this, so I talked to Crowdin:

[Screenshots (2020-11-11): conversation with Crowdin about PO file support]

It does seem promising: we can upload, download, and modify PO files online :-)
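Part of why PO export is reassuring for the platform-agnostic concern is that PO is a plain-text gettext format any tool can read. As a rough illustration only (not Crowdin's tooling or our existing scripts), here is a minimal Python sketch that parses simple PO entries and computes the translated fraction, similar to the progress views mentioned earlier. It assumes single-form entries; plural forms, msgctxt, and escape sequences are deliberately not handled, and the sample strings are made up.

```python
from dataclasses import dataclass


@dataclass
class PoEntry:
    msgid: str
    msgstr: str


def _unquote(s: str) -> str:
    # Drop the surrounding double quotes of a PO string literal.
    # Escape sequences such as \" or \n are left untouched in this sketch.
    s = s.strip()
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return s[1:-1]
    return s


def parse_po(text: str) -> list[PoEntry]:
    """Parse simple, single-form PO entries (msgid/msgstr plus continuation lines)."""
    entries: list[PoEntry] = []
    msgid = msgstr = None
    current = None  # field that a bare "..." continuation line extends

    def flush() -> None:
        if msgid is not None and msgstr is not None:
            entries.append(PoEntry(msgid, msgstr))

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (#, #:, #. and so on)
        if line.startswith("msgid "):
            flush()  # a new msgid starts a new entry
            msgid, msgstr, current = _unquote(line[6:]), None, "msgid"
        elif line.startswith("msgstr "):
            msgstr, current = _unquote(line[7:]), "msgstr"
        elif line.startswith('"'):
            if current == "msgid":
                msgid += _unquote(line)
            elif current == "msgstr":
                msgstr += _unquote(line)
        else:
            current = None  # msgctxt, msgid_plural, msgstr[n]: ignored here
    flush()
    return entries


def translation_progress(entries: list[PoEntry]) -> float:
    # The entry with an empty msgid is the PO header, not a translatable string.
    units = [e for e in entries if e.msgid]
    if not units:
        return 0.0
    return sum(1 for e in units if e.msgstr) / len(units)


sample = '''msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "Hello, Taichi"
msgstr "你好，太极"

msgid "Fields"
msgstr ""
'''
print(translation_progress(parse_po(sample)))  # 0.5
```

Nothing here depends on Crowdin, which is the point: if we ever leave the platform, the exported PO files remain usable with ordinary gettext tooling.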

@rexwangcc
Contributor Author

WOW, thank you so much for reaching out to them so promptly!

I think we could try applying for the open-source tier plan and evaluate it with our docer team; in the worst case, we can fall back to the existing workflow. (It's a relatively big commitment for now to pay $ monthly out of anyone's pocket, including yours, for this kind of localization tooling, since we primarily focus on only 2 locales.)

The PO import/export feature sounds really nice, and the online editor is, AFAIK, pretty intuitive for collaboration. I can work on the initial iteration of migrating and uploading the existing PO files!

It'd be helpful if others from the docer team could provide some insights here!

@isdanni
Collaborator

isdanni commented Nov 16, 2020

Thanks for the idea! This sounds good to me. I just signed up on Crowdin as a free member (@isdanni). (Also, I'm watching the taichi.graphics repo now so I don't miss any discussions in the future ;))

As for localization, I just have a few questions:

  1. Are we creating two separate projects, one for EN and one for CN, or just one with the same file hierarchy as the website?
  2. If I'm not mistaken, the workflow would be: someone uploads the entire /doc folder => the documentation team works on the docs together (so they don't need to use version control, etc., which is much more accessible for contributors) => for each periodic release, someone downloads the results, replaces the previous version in /doc and /zh, and builds?

Completely agree with the pros here! Each doc build is getting heavier as the website grows. Once the migration is done and the workflow is well maintained, we can definitely welcome more language translations here.

@rexwangcc
Contributor Author

rexwangcc commented Nov 18, 2020

Thank you so much for providing input on this, and sorry for the delayed reply! Re your questions, I did some investigation into Crowdin using a personal project:

> 1. Are we creating two projects separately, one for EN and one for CN, or just one with the same file hierarchy as on website?

Assuming we are talking about a "Crowdin project" here, I believe that once @yuanming-hu submits an application for us, this will result in one project called "Taichi", which can have a number of different translations. Crowdin seems to support templating the file structure, such as docs/%locale%/%original_file_name%, which will produce docs/zh-CN/index.md for docs/en/index.md; this can help us keep the website's file hierarchy. I believe the PO files will be stored on Crowdin, and we only need to version-control and resolve the translated files. Crowdin will use the GitHub integration API to automatically and periodically (the interval is configurable) open PRs from a translation branch to master. I also prefer this approach, since following the website hierarchy is easier for both sides, and we should exclude all files except .md from translation.
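To make the templating idea concrete, here is a minimal Python sketch of the placeholder substitution plus the ".md only" filter. The helper names are hypothetical, and this is a simplified model of the behavior described above, not Crowdin's actual implementation; the template string is the example from the paragraph.

```python
from pathlib import PurePosixPath

# Assumed template, taken from the example discussed above.
TEMPLATE = "docs/%locale%/%original_file_name%"


def expand_translation_path(template: str, locale: str, source: str) -> str:
    """Hypothetical helper: substitute the %locale% and %original_file_name% placeholders."""
    name = PurePosixPath(source).name  # e.g. "index.md"
    return template.replace("%locale%", locale).replace("%original_file_name%", name)


def translatable(paths: list[str]) -> list[str]:
    # Only Markdown sources go to translation; images, configs, etc. are excluded.
    return [p for p in paths if PurePosixPath(p).suffix == ".md"]


for src in translatable(["docs/en/index.md", "docs/en/api/fields.md", "docs/en/logo.png"]):
    print(src, "->", expand_translation_path(TEMPLATE, "zh-CN", src))
# docs/en/index.md -> docs/zh-CN/index.md
# docs/en/api/fields.md -> docs/zh-CN/fields.md
```

The second line shows why the template choice matters: in this simplified model, a template built only from %original_file_name% flattens nested pages (the api/ directory disappears). For nested docs we would want one of the path-preserving placeholders listed in Crowdin's configuration documentation rather than the flat template.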

> 2. If I'm not mistaken the workflow should be: one uploads the entire /doc folder => ... => for each period release, one download, replace the previous version in /doc and /zh and build?

I thought so too until I tried Crowdin myself. While it would definitely take our docer team some time to set it up before we get a smooth workflow, I guess the workflow would be more like:

  1. We upload/modify/reuse the existing PO files in Crowdin and set up the GitHub integration on the Crowdin side. This happens only once.
  2. Crowdin creates a translation branch, which its bots use to version-control and back up the translations for us. Crowdin periodically syncs/pulls our repo and resolves updated/untranslated files automatically.
  3. Someone either looks at the target locale (e.g. zh-CN) website or periodically visits Crowdin to monitor translation progress and find content that needs a new or updated translation, then uses the online editor to do the translation work. Review also happens there.
  4. Crowdin periodically pushes the translations back to the GitHub repo and opens PRs to merge from the translation branch into master; we review each PR again as a double check and merge it.
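Step 3 mostly happens in Crowdin's UI, but the file-level part of "what still needs translating" is easy to reason about offline. A minimal Python sketch, assuming the docs/en and docs/zh-CN layout described above; the function names are hypothetical. It only spots pages with no translated counterpart and cannot detect stale translations of changed strings, which is exactly the part we would rely on Crowdin for.

```python
from pathlib import PurePosixPath


def _relative(paths: list[str], root: str) -> set[str]:
    # Express each path relative to its locale root, e.g. "docs/en/api/x.md" -> "api/x.md".
    return {PurePosixPath(p).relative_to(root).as_posix() for p in paths}


def missing_translations(source_files: list[str], translated_files: list[str],
                         src_root: str = "docs/en", dst_root: str = "docs/zh-CN") -> list[str]:
    """Source pages that have no translated counterpart yet (file-level check only)."""
    return sorted(_relative(source_files, src_root) - _relative(translated_files, dst_root))


src = ["docs/en/index.md", "docs/en/install.md", "docs/en/api/fields.md"]
dst = ["docs/zh-CN/index.md"]
print(missing_translations(src, dst))  # ['api/fields.md', 'install.md']
```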

Basically, Crowdin's GitHub integration bot will handle syncing between this GitHub repo and their system, and we point docers to the Crowdin project so they can focus on translation without the source code/build overhead. Crowdin seems smart enough to handle changes to the source docs. Since this is just my superficial understanding of a Crowdin-based workflow, please let me know if there are missing pieces!

@isdanni
Collaborator

isdanni commented Nov 18, 2020

I see 👍 Thanks so much for such a detailed explanation! I guess I'll now wait for the application on Crowdin's end to go through and then start the work. I'll look into the details to get more familiar with the workflow.

@yuanming-hu
Member

Thank you so much for all the discussions here! I just created the project on Crowdin: https://crowdin.com/project/taichi-programming-language. @rexwangcc and @isdanni should have received email invitations as project managers.

I also sent out an open-source project request so that we can use it for free. Crowdin has a 14-day free trial (open source or not); hopefully our request gets approved within 14 days.

(Sorry about my delayed reply - it's been a tough year in the US and I'll likely have more time during Thanksgiving!)

@isdanni
Collaborator

isdanni commented Nov 18, 2020

Cool! Sorry, my email wasn't verified before, so I missed the last invite. Would you mind sending it again? 😂

@rexwangcc
Contributor Author

Thanks! I'm now a member of that project and will try to configure the GitHub integration today after hours!

(The numbers are climbing unsettlingly again, take care!)

@yuanming-hu
Member

@isdanni I was able to change your role from translator to manager just now :-) No need for another email. Thanks!

@rexwangcc Thank you so much! (Hopefully, the COVID situation in Cambridge will come under control again soon.)

@yuanming-hu
Member

yuanming-hu commented Nov 19, 2020

Update: the license is approved :-) I believe we can use it as long as we want in the foreseeable future.

[Screenshot (2020-11-19): Crowdin open-source license approval]

@rexwangcc
Contributor Author

rexwangcc commented Nov 20, 2020

Update: the Crowdin integration is now in a relatively stable state. Contributors can now go to https://crowdin.com/project/taichi-programming-language/zh-CN# to claim untranslated files and translate them online. Translations are reviewed on Crowdin and automatically opened as PRs from the l10n_master branch to master every 6 hours. Since the file matchers in crowdin.yml are wildcards, any new docs or updated doc strings will be detected by Crowdin automatically.
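Why wildcard matchers pick up new docs without any config change: a pattern such as docs/en/**/*.md matches Markdown files at any depth, so a newly added page is covered the moment it lands. The pattern below is an assumed example, not copied from the project's actual crowdin.yml, and the matcher is a simplified stand-in written for illustration, not Crowdin's implementation.

```python
import re


def glob_to_regex(pattern):
    """Simplified glob: '**/' = zero or more directories, '*' = within one segment, '?' = one char."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matched_sources(pattern, paths):
    # Keep the paths that the wildcard pattern selects for translation.
    rx = glob_to_regex(pattern)
    return [p for p in paths if rx.match(p)]


# Assumed example pattern; not taken from the project's crowdin.yml.
repo = ["docs/en/index.md", "docs/en/api/fields.md", "docs/en/logo.png", "docs/zh-CN/index.md"]
print(matched_sources("docs/en/**/*.md", repo))
# ['docs/en/index.md', 'docs/en/api/fields.md']
```

Adding, say, docs/en/guide/new-page.md to the list would make it match as well, which is the behavior the update above relies on.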

Thank you @yuanming-hu @isdanni both for providing insights into this and helping set up the workflow!! I'm going to close this issue and rely on #22 to track the work of migrating the existing PO files, hopefully by Thanksgiving.

I noticed there are stale PRs in the main repo that update the docs, which we should either merge, or close and mirror in this repo; there are also docs in this repo that are outdated compared to docs/ in the main repo. Once we finish the migration, we may consider moving the docs folder out of the main repo ASAP to avoid inconsistent states.
