Support MessageFormatter for translation including plural logic #3286

maccabeelevine · 2024-01-03T12:28:04Z

POC for https://openlibraryfoundation.atlassian.net/browse/VUFIND-1644

TODO

Update all language files to match; remove obsolete strings
Update changelog (note new mechanism, note removal of "Yesterday" string)

languages/en.ini

module/VuFind/src/VuFind/I18n/Translator/TranslatorAwareTrait.php

demiankatz · 2024-01-03T13:32:37Z

@maccabeelevine, thanks for getting the ball rolling on this!

I don't like the idea of creating a new hacky mechanism if we can achieve the same effect with something more standards-based, so I decided to take half an hour and see if I could adjust your code to use the MessageFormatter to achieve the same effect. Turns out, I could! See commit 522c3f8 and let me know what you think about this...

The main potential issue is that the MessageFormatter is not compatible with our %%token%% convention. In this instance, I just stripped the %% off the token name in the calling code, and everything is fine. This is probably not a big problem, though in theory we could keep the percent markers and strip them off in the translator logic if we really wanted to (but I think doing extra pointless work seems undesirable).
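As a rough illustration of the approach described above, the sketch below calls PHP's `MessageFormatter` (from the intl extension) with a plural pattern and a bare token name. The pattern text and the token name `range` are assumptions made for this example; the actual string committed to en.ini may differ.

```php
<?php
// Requires PHP's intl extension, which provides MessageFormatter.

// Hypothetical ICU pattern; the real string in en.ini may differ.
$pattern = '{range, plural, =1 {Yesterday} other {Past # Days}}';

// Token names are passed bare, without the legacy %% delimiters.
$one = MessageFormatter::formatMessage('en', $pattern, ['range' => 1]);
$many = MessageFormatter::formatMessage('en', $pattern, ['range' => 30]);

echo $one, "\n";  // Yesterday
echo $many, "\n"; // Past 30 Days
```

The `=1` branch matches the exact value 1, and `#` inside the `other` branch is replaced with the formatted number.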

If you think the original approach is better, we can of course revert my changes. I just wanted to share my demonstration for further consideration.

If we want to go forward with this approach, we should update all the language files to use the new format. I think I can probably do this pretty easily with the existing language template tool and a bit of search-and-replace post-processing. But I'll hold off on that work until we have consensus on the path forward.

Thanks again!

demiankatz · 2024-01-03T13:36:05Z

Another option to consider: always use the MessageFormatter when there are tokens. This would require us to eliminate all percent signs from all token names and reformat all tokens in all language files, so it's not a small project... but I suspect it's feasible, and it might make things more straightforward in the long run.

maccabeelevine · 2024-01-03T13:53:14Z

> I don't like the idea of creating a new hacky mechanism if we can achieve the same effect with something more standards-based, so I decided to take half an hour and see if I could adjust your code to use the MessageFormatter to achieve the same effect. Turns out, I could! See commit 522c3f8 and let me know what you think about this...

Nice. This is much cleaner IMHO. I wonder though how it would work with translations that are longer than "Yesterday", i.e. a full sentence or so for each variation. Can you do a multi-line in the context of the .ini file? If not, is it readable?

> The main potential issue is that the MessageFormatter is not compatible with our %%token%% convention. In this instance, I just stripped the %% off the token name in the calling code, and everything is fine. This is probably not a big problem, though in theory we could keep the percent markers and strip them off in the translator logic if we really wanted to (but I think doing extra pointless work seems undesirable).

I don't like having no special characters in the substitution key, although in practice I agree it's probably not a concern. I don't think we have to mandate using MessageFormatter for token substitution, but if we do use MF, it's probably worth it to suggest that going forward.

module/VuFind/src/VuFind/I18n/Translator/TranslatorAwareTrait.php

demiankatz · 2024-01-03T14:03:53Z

> Nice. This is much cleaner IMHO. I wonder though how it would work with translations that are longer than "Yesterday", i.e. a full sentence or so for each variation. Can you do a multi-line in the context of the .ini file? If not, is it readable?

Unfortunately, the .ini file format does not support multi-line text, so very complex strings could become hard to read. But I don't think that's a deal-breaker here, since we still gain a lot of power and flexibility. If this becomes a significant problem, we could always investigate using a different storage format for our translations.

> I don't like having no special characters in the substitution key, although in practice I agree it's probably not a concern. I don't think we have to mandate using MessageFormatter for token substitution, but if we do use MF, it's probably worth it to suggest that going forward.

I mean, with MessageFormatter, the special characters are { ... } instead of %% ... %%. So it's not like the keys in the language strings are undelimited or anything. Indeed, when providing tokens, using the key name without the delimiters really makes more sense than what we have done historically anyway. But it's a big change, so I agree that it probably makes sense to transition gently and just suggest using MF for new token strings. In the future, we can make the translator start to throw deprecation warnings if it encounters tokens using the legacy method, and we can gradually phase the old way out.
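To make the delimiter difference concrete, here is a minimal sketch contrasting the two styles. The helper `stripPercentMarkers()` is hypothetical (it is not part of VuFind); it shows one way the translator logic could accept legacy `%%token%%` keys and hand bare names to `MessageFormatter`, as floated earlier in the thread.

```php
<?php
// Requires PHP's intl extension, which provides MessageFormatter.

/**
 * Hypothetical helper (not VuFind code): convert legacy %%token%% keys
 * to the bare names that MessageFormatter expects.
 */
function stripPercentMarkers(array $tokens): array
{
    $clean = [];
    foreach ($tokens as $key => $value) {
        // Remove leading and trailing percent signs from each key.
        $clean[trim((string)$key, '%')] = $value;
    }
    return $clean;
}

// Legacy style: the delimiters are part of the key and are swapped in
// with a plain string replacement.
$legacy = str_replace(['%%name%%'], ['Alice'], 'Hello, %%name%%!');

// MessageFormatter style: the string uses {name}; the key is just "name".
$icu = MessageFormatter::formatMessage(
    'en',
    'Hello, {name}!',
    stripPercentMarkers(['%%name%%' => 'Alice'])
);

echo $legacy, "\n"; // Hello, Alice!
echo $icu, "\n";    // Hello, Alice!
```

Passing bare keys from the calling code avoids the extra stripping step entirely, which is the simpler option chosen in this PR.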

demiankatz · 2024-01-03T14:21:17Z

Update: I went ahead and updated all of the existing language files. I tested that this works correctly in every existing locale. I also deleted the "Yesterday" translation, since it is unlikely to be used in any other context. I'll add a changelog note about it when this is merged.

demiankatz · 2024-01-03T14:24:53Z

I also changed the updated en.ini example to use "7" as "Past Week" instead of "30" as "Past Month." Since month lengths vary, that language is potentially misleading... but a week is always a week. :-)

demiankatz · 2024-01-03 (review comment)

I've added a test case to cover the new functionality and ensured that the full existing test suite is passing. I think this may be ready to merge now, but I'd like another opinion or two first. I'll share this on Slack and give it a couple of days to see if anyone has anything to add.

maccabeelevine · 2024-01-03T14:48:44Z

> Update: I went ahead and updated all of the existing language files.

I get that we can't leave the translations as-is, but I assume several of them are now incorrect given how plurals work in those languages. I assume these will be passed through Lokalise during the v10 cycle.

demiankatz · 2024-01-03T14:58:12Z

@maccabeelevine, none of these should be incorrect. They use the existing "Yesterday" translation for a value of 1, and the existing translation for everything else. Nothing will behave differently than before.
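The following sketch shows why folding the two legacy strings into one pattern should preserve behavior. The helper `mergeLegacyStrings()` and the German sample strings are invented for illustration; they are not taken from the VuFind language files or from the tooling actually used for the conversion.

```php
<?php
// Requires PHP's intl extension, which provides MessageFormatter.

/**
 * Hypothetical migration helper: fold a legacy "yesterday" string and a
 * legacy %%token%% string into a single ICU plural pattern.
 * Note: it does not escape apostrophes or braces, which ICU treats
 * specially, so real translations might need extra handling.
 */
function mergeLegacyStrings(string $yesterday, string $other, string $token = 'range'): string
{
    // Inside a plural branch, "#" stands for the formatted number.
    $otherIcu = str_replace("%%{$token}%%", '#', $other);
    return sprintf('{%s, plural, =1 {%s} other {%s}}', $token, $yesterday, $otherIcu);
}

// Invented sample strings, not copied from any VuFind language file.
$pattern = mergeLegacyStrings('Gestern', 'Letzte %%range%% Tage');

echo $pattern, "\n";
echo MessageFormatter::formatMessage('de', $pattern, ['range' => 1]), "\n"; // Gestern
echo MessageFormatter::formatMessage('de', $pattern, ['range' => 7]), "\n"; // Letzte 7 Tage
```

Because only the `=1` and `other` branches exist, every value except 1 falls through to the previously used translation, regardless of the locale's own plural categories.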

demiankatz · 2024-01-03T14:58:43Z

There was just one language (Galician) that lacked a Yesterday translation, so I did a little research and fixed it manually.

demiankatz · 2024-01-10T13:59:12Z

Thanks again, @maccabeelevine. I'm merging this now so we can check out how it behaves in Lokalise and experiment a bit. We'll likely want to do some follow-up work to apply the mechanism more broadly once we are confident in it. I'll make sure to capture that as a to-do in an appropriate open JIRA ticket.

Support token match variation on translate

8410a1d

maccabeelevine commented Jan 3, 2024

languages/en.ini (outdated)

maccabeelevine commented Jan 3, 2024

module/VuFind/src/VuFind/I18n/Translator/TranslatorAwareTrait.php (outdated)

Switch to using MessageFormatter.

522c3f8

Uncomment line to fix build failure.

72fa5cd

Fix comment.

018c1c8

demiankatz added the improvement and architecture labels Jan 3, 2024

demiankatz added this to the 10.0 milestone Jan 3, 2024

demiankatz added 3 commits January 3, 2024 08:41

php-cs-fixer

d5f44e5

phpstan fix.

f1f2c4a

Remove unnecessary comment.

d3b144a

maccabeelevine commented Jan 3, 2024

module/VuFind/src/VuFind/I18n/Translator/TranslatorAwareTrait.php (outdated)

Rename variable.

4a247c7

Adjust all language files.

1a2e001

demiankatz marked this pull request as ready for review January 3, 2024 14:24

Add test coverage.

01f0056

demiankatz approved these changes Jan 3, 2024

maccabeelevine changed the title ~~Support token match variation on translate~~ Support MessageFormatter for translation including plural logic Jan 3, 2024

demiankatz merged commit 2ff49d4 into vufind-org:dev Jan 10, 2024
7 checks passed
