Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support MessageFormatter for translation including plural logic #3286

Merged
merged 10 commits into from Jan 10, 2024

Conversation

maccabeelevine
Copy link
Member

@maccabeelevine maccabeelevine commented Jan 3, 2024

POC for https://openlibraryfoundation.atlassian.net/browse/VUFIND-1644

TODO

  • Update all language files to match; remove obsolete strings
  • Update changelog (note new mechanism, note removal of "Yesterday" string)

languages/en.ini Outdated Show resolved Hide resolved
@demiankatz
Copy link
Member

@maccabeelevine, thanks for getting the ball rolling on this!

I don't like the idea of creating a new hacky mechanism if we can achieve the same effect with something more standards-based, so I decided to take half an hour and see if I could adjust your code to use the MessageFormatter to achieve the same effect. Turns out, I could! See commit 522c3f8 and let me know what you think about this...

The main potential issue is that the MessageFormatter is not compatible with our %%token%% convention. In this instance, I just stripped the %% off the token name in the calling code, and everything is fine. This is probably not a big problem, though in theory we could keep the percent markers and strip them off in the translator logic if we really wanted to (but I think doing extra pointless work seems undesirable).

If you think the original approach is better, we can of course revert my changes. I just wanted to share my demonstration for further consideration.

If we want to go forward with this approach, we should update all the language files to use the new format. I think I can probably do this pretty easily with the existing language template tool and a bit of search-and-replace post-processing. But I'll hold off on that work until we have consensus on the path forward.

Thanks again!

@demiankatz
Copy link
Member

Another option to consider: always use the MessageFormatter when there are tokens. This would require us to eliminate all percent signs from all token names and reformat all tokens in all language files, so it's not a small project... but I suspect it's feasible, and it might make things more straightforward in the long run.

@demiankatz demiankatz added improvement architecture pull requests that involve significant refactoring / architectural changes labels Jan 3, 2024
@demiankatz demiankatz added this to the 10.0 milestone Jan 3, 2024
@maccabeelevine
Copy link
Member Author

I don't like the idea of creating a new hacky mechanism if we can achieve the same effect with something more standards-based, so I decided to take half an hour and see if I could adjust your code to use the MessageFormatter to achieve the same effect. Turns out, I could! See commit 522c3f8 and let me know what you think about this...

Nice. This is much cleaner IMHO. I wonder though how it would work with translations that are longer than "Yesterday", i.e. a full sentence or so for each variation. Can you do a multi-line in the context of the .ini file? If not, is it readable?

The main potential issue is that the MessageFormatter is not compatible with our %%token%% convention. In this instance, I just stripped the %% off the token name in the calling code, and everything is fine. This is probably not a big problem, though in theory we could keep the percent markers and strip them off in the translator logic if we really wanted to (but I think doing extra pointless work seems undesirable).

I don't like having no special characters in the substitution key, although in practice I agree it's probably not a concern. I don't think we have to mandate using MessageFormatter for token substitution, but if we do use MF, it's probably worth it to suggest that going forward.

@demiankatz
Copy link
Member

Nice. This is much cleaner IMHO. I wonder though how it would work with translations that are longer than "Yesterday", i.e. a full sentence or so for each variation. Can you do a multi-line in the context of the .ini file? If not, is it readable?

Unfortunately, the .ini file format does not support multi-line text, so very complex strings could become hard to read. But I don't think that's a deal-breaker here, since we still gain a lot of power and flexibility. If this becomes a significant problem, we could always investigate using a different storage format for our translations.

I don't like having no special characters in the substitution key, although in practice I agree it's probably not a concern. I don't think we have to mandate using MessageFormatter for token substitution, but if we do use MF, it's probably worth it to suggest that going forward.

I mean, with MessageFormatter, the special characters are { ... } instead of %% ... %%. So it's not like the keys in the language strings are undelimited or anything. Indeed, when providing tokens, using the key name without the delimiters really makes more sense than what we have done historically anyway. But it's a big change, so I agree that it probably makes sense to transition gently and just suggest using MF for new token strings. In the future, we can make the translator start to throw deprecation warnings if it encounters tokens using the legacy method, and we can gradually phase the old way out.

@demiankatz
Copy link
Member

Update: I went ahead and updated all of the existing language files. I tested that this works correctly in every existing locale. I went ahead and deleted the "Yesterday" translation since I don't think it's likely to be used in any other context. I'll add a changelog note about it when this is merged.

@demiankatz demiankatz marked this pull request as ready for review January 3, 2024 14:24
@demiankatz
Copy link
Member

I also changed the updated en.ini example to use "7" as "Past Week" instead of "30" as "Past Month." Since month lengths vary, that language is potentially misleading... but a week is always a week. :-)

Copy link
Member

@demiankatz demiankatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test case to cover the new functionality and ensured that the full existing test suite is passing. I think this may be ready to merge now, but I'd like another opinion or two first. I'll share this on Slack and give it a couple of days to see if anyone has anything to add.

@maccabeelevine
Copy link
Member Author

Update: I went ahead and updated all of the existing language files.

I get that we can't leave the translations as-is, but I assume several of them are now incorrect given how plurals work in those languages. Assume these will be passed through Lokalize during the v10 cycle.

@demiankatz
Copy link
Member

@maccabeelevine, none of these should be incorrect. They use the existing "Yesterday" translation for a value of 1, and the existing translation for everything else. Nothing will behave differently than before.

@demiankatz
Copy link
Member

There was just one language (Galician) that lacked a Yesterday translation, so I did a little research and fixed it manually.

@maccabeelevine maccabeelevine changed the title Support token match variation on translate Support MessageFormatter for translation including plural logic Jan 3, 2024
@demiankatz demiankatz merged commit 2ff49d4 into vufind-org:dev Jan 10, 2024
7 checks passed
@demiankatz
Copy link
Member

Thanks again, @maccabeelevine. I'm merging this now so we can check out how it behaves in Lokalise and experiment a bit. We'll likely want to do some followup work to apply the mechanism more broadly once we are confident in it. I'll make sure to capture that as a to-do in an appropriate open JIRA ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architecture pull requests that involve significant refactoring / architectural changes improvement
Projects
None yet
2 participants