Pluralization #14
Hey @KL-7, thanks for taking this on! My initial concept for this feature was to override the % operator for string interpolation so you could do something like this:

    replacements = { :horse_count => 3,
                     :horses => { :one => "is 1 horse",
                                  :other => "are %{horse_count} horses" } }
    "there %{horse_count:horses} in the barn" % replacements

Here's an alternate way to do it:

    replacements = { :horse_count => 3,
                     :horses => { :one => "1 horse",
                                  :other => "%{horse_count} horses" },
                     :to_be => { :one => "is",
                                 :other => "are" } }
    "there %{horse_count:to_be} %{horse_count:horses} in the barn" % replacements

Just like we're already doing with most of TwitterCLDR's functionality, we should support this native, Ruby-ish way as well as provide a formatter like you've already described that the % function delegates to:

    f = TwitterCldr::Formatters::Plurals::PluralFormatter.new("there %{horse_count:horses} in the barn", :es)
    f.to_s(replacements) # or f.format(replacements)

One last suggestion: What would you think about also accepting a lambda?

    replacements = { :horse_count => 3,
                     :horses => { :one => "is 1 horse",
                                  :other => lambda { |context| context[:horse_count] <= 3 ? "are many beautiful horses" : "are many horses" } } }
    "there %{horse_count:horses} in the barn" % replacements

    def format(number, patterns)
      rule = Rules.rule_for(number, locale)
      pattern = patterns.fetch(rule) { raise ArgumentError.new("Missing pattern for #{rule.inspect}.") }

Ah, I see what you mean. No, I don't think an error should be thrown here. Instead I think we should leave the original text in the string, so instead of "5 houses" you get "%{houses_count:houses}" without any replacements.
However! I've been considering for a while whether to include a global option to raise errors instead of just letting things slide. It really depends on the use case. At Twitter, we would most likely not want TwitterCLDR to raise errors because we occasionally launch features that aren't 100% translated anyway, but other projects (or other companies) might feel differently.
Totally agree on that: it depends on the project. I can easily imagine people who won't be very happy if some untranslated text slips unnoticed into production because of a mistake or typo that was silently skipped by the formatter.
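To make the trade-off above concrete, here is a minimal sketch of a lookup that is lenient by default and strict on request. The helper name `resolve_pattern` and the `:raise_on_missing` option are hypothetical names chosen for illustration; they are not part of TwitterCLDR.

```ruby
# Hypothetical helper, not TwitterCLDR API: returns the pattern for a plural
# rule, or falls back to the original placeholder text when it is missing.
def resolve_pattern(placeholder, patterns, rule, options = {})
  patterns.fetch(rule) do
    # Strict mode (assumed option name): surface the missing translation.
    if options[:raise_on_missing]
      raise ArgumentError, "Missing pattern for #{rule.inspect}."
    end
    # Lenient mode: hand back the untouched placeholder, e.g. "%{houses_count:houses}".
    placeholder
  end
end
```

With the default options a missing `:other` pattern leaves the placeholder in the string; passing `:raise_on_missing => true` turns the same situation into an `ArgumentError`.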
@camertron, thanks for the code review and such a detailed response. Can we formalize the pluralization mechanism a bit to make sure I clearly understand your vision? I'll explain how I see it, and you let me know if I get something wrong. When we receive a string for processing we do the following:
As you said, throwing an exception by default is not desirable. If some required element is not found in the interpolation hash during the process, we ignore the current interpolation pattern and move on to the next one. I definitely like the idea of accepting lambdas as it brings more flexibility, but I think it's not essential for the initial implementation and can be added later. Does that sound good to you?
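The mechanism discussed so far can be sketched roughly as follows. This is only an illustration under stated assumptions: `sketch_rule_for` is a hypothetical English-only stand-in for the locale-aware rule lookup, `pluralize_sketch` is an invented name, and the optional lambda handling follows the suggestion earlier in the thread rather than any merged code.

```ruby
# Same shape as the "%{count_key:patterns_key}" placeholders used in this thread.
PLURAL_RE = /%\{(.+?):(.+?)\}/

# Hypothetical stand-in for a locale-aware rule lookup: English only.
def sketch_rule_for(number)
  number == 1 ? :one : :other
end

def pluralize_sketch(string, replacements)
  string.gsub(PLURAL_RE) do |match|
    count_key    = $1.to_sym
    patterns_key = $2.to_sym
    number   = replacements[count_key]
    patterns = replacements[patterns_key]

    # Missing number or patterns hash: skip this placeholder, leave it as-is.
    next match unless number && patterns.is_a?(Hash)

    pattern = patterns[sketch_rule_for(number)]
    # Missing pattern for the selected rule: also leave the placeholder as-is.
    next match unless pattern

    # Optional: a lambda pattern receives the whole replacements hash.
    pattern = pattern.call(replacements) if pattern.respond_to?(:call)

    # Substitute the number into the chosen pattern.
    pattern.gsub("%{#{count_key}}", number.to_s)
  end
end
```

Calling `pluralize_sketch("there %{horse_count:horses} in the barn", replacements)` with the first hash from this thread picks the `:other` pattern for a count of 3; a placeholder whose keys are absent from the hash stays in the output unchanged.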
@KL-7 yes, that looks great. As a side note that may be helpful, the additional |
@camertron, right, interpolation like |
Oh man, bitten once again by the differences between 1.8 and 1.9! Fortunately we can get around the issue. At the moment, the magic method As a side note, it would be great if, for these localized objects, we could inherit from the original object so callers can perform all the same operations they can on the original object. That's a bit difficult because Ruby doesn't support multiple inheritance, but we should be able to accomplish the same thing by turning |
@camertron, I'm afraid I don't understand how that applies to interpolation with I think the best solution would be to include Unfortunately, I don't see a good place for |
I'd like to try to avoid monkey patching

    "There %{horse_count:horses} in the barn".localize % { :horse_count => 3 ... }

This still doesn't explain how we ourselves are going to do To recap, here's my opinion:
What do you think? |
I felt that 'overriding String#%' won't be that easy =) You're right, monkey-patching is great, but sometimes it gets messy. I mostly agree with your plan, but have a couple of comments:
|
@camertron, now that things have turned out this way, I'm starting to wonder why we need to care about other kinds of interpolation at all. If we don't override
@camertron, I updated |
@KL-7, your points are well taken, here are a few additional things to think about:

    "there %{horse_count:horses} in the barn, %{user}!".pluralize(:horse_count => count,
                                                                  :horses => "are %{horse_count}") % { :user => current_user }

you can combine the hashes and do this instead:

    "there %{horse_count:horses} in the barn, %{user}!".pluralize(:horse_count => count,
                                                                  :horses => "are %{horse_count}",
                                                                  :user => current_user)

We aren't obligated to provide this functionality, but it's certainly nice, and wouldn't take much effort to implement. Besides, we're going to need an interpolation function anyway - why not provide it to everyone? I'm still on the fence with numbers, but if we support everything else, we might as well do them too. After all, it's just a single call to

    PLURAL_INTERPOLATION_RE = /%\{(.+?):(.+?)\}/

    def initialize(locale)

Better to accept a hash of options here and use TwitterCldr.get_locale if no locale is specified.
I see most of the formatters use Formatters::Base#extract_locale, while TwitterCldr.get_locale (which also takes the FastGettext locale into consideration) is mostly used in localize methods. Is that on purpose? Would it be more consistent to use extract_locale here as in other formatters?
Ah, right you are. extract_locale is better.
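As a rough illustration of the constructor change requested here, the sketch below accepts an options hash and falls back to a default locale. Everything in it is an assumption made for the example: the class name, the `DEFAULT_LOCALE` constant, and the body of `extract_locale`, which only imitates the role the real helper plays and does not reproduce its implementation.

```ruby
class PluralFormatterSketch
  # Assumed fallback for the example; the real library decides this elsewhere.
  DEFAULT_LOCALE = :en

  attr_reader :locale

  # Accepts a hash of options instead of a bare locale argument.
  def initialize(options = {})
    @locale = extract_locale(options)
  end

  private

  # Hypothetical stand-in: take :locale from the options or use the default.
  def extract_locale(options)
    (options[:locale] || DEFAULT_LOCALE).to_sym
  end
end
```

`PluralFormatterSketch.new(:locale => :es)` and `PluralFormatterSketch.new` both work, which keeps call sites short when the default locale is acceptable.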
This is looking awesome! Just a few more changes and I'll merge it in. Looks like your most recent commits don't handle the regular interpolation case for |

    end

    def interpolate_pattern(pattern, placeholder, number)
      pattern.gsub("%{#{placeholder}}", number.to_s)

Once I add the interpolation utility function from gettext, this line will be replaced with a call to that function.
@camertron, ah, sorry, I forgot to mention that your last arguments did convince me =) I'm going to extract the interpolation function from gettext and use it for handling everything that is left after our pluralization process. And then I'll add Btw, gettext is offered under the Ruby license or the LGPL. Licensing is not my strongest skill, but I can read the agreements and learn what is expected from us if we're going to take some code from this project.
@camertron, two more things. Is it better to delegate to that general interpolation function right inside |
@KL-7 Ok cool, glad we agree! I sincerely appreciate having these thoughtful discussions. OSS licenses generally let anyone use the code, provided you include a copy of the license in your derived work. We can include a copy of the LGPL license in our LICENSE file and specify what parts of the gem it applies to. I don't think there's any difference in calling the general interpolation function before or after |
Hey @camertron, I added Another question arose related once again to the

    1.9.3-p125 :001 > '%{name}' % {}
    KeyError: key{name} not found

Should we mimic that behavior as it's done, e.g., in i18n gem, or should we silently ignore that situation and leave the string unchanged?
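For reference, the "mimic Ruby 1.9" option could look like the sketch below. It is a simplified stand-in written for this discussion, not the function adapted from gettext or i18n; the name `interpolate_sketch` and the restriction to `%{word}` placeholders are assumptions.

```ruby
# Only plain "%{name}" placeholders; "%{count:patterns}" is deliberately not matched.
SIMPLE_INTERPOLATION_RE = /%\{(\w+)\}/

def interpolate_sketch(string, values)
  string.gsub(SIMPLE_INTERPOLATION_RE) do
    key = $1.to_sym
    # Mirror String#% on 1.9: a missing key is an error, not a silent skip.
    raise KeyError, "key{#{key}} not found" unless values.key?(key)
    values[key].to_s
  end
end
```

The alternative raised in the question, silently leaving the string unchanged, would replace the `raise` with returning the matched placeholder text.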
Hmm that's a tough question. In this case, I think we should throw an error to maintain consistency with what's already in place. Anyone who's using this type of interpolation already in their code should be expecting |
@camertron, I agree with you. It's slightly inconsistent with the way we're treating missing translations, but since it's a somewhat different situation, throwing an exception here and not throwing one for missing pluralization rules does make sense to me. What do you think about tests for
Yes, you're right, there's no need to test all the functionality |
@camertron, I added interpolation function and updated
@@ -0,0 +1,52 @@
This file should probably not be hanging out right inside lib - consider moving it into a child directory.
If I restructure directories a bit as I suggested here, this file will live under lib/twitter_cldr.
Hey @KL-7 just one comment on your code, but otherwise it's looking darn good.
And finally, to the other topic at hand. When my coworkers and I initially talked about what pluralization implementation would be best for TwitterCLDR, we agreed that the implementation you have created was the right answer. Just yesterday, we had another discussion that has augmented the implementation a bit. The good news is we can keep all of your changes, and simply offer the augmentations as an additional way to write plurals. Here's an example of this second way:

    'there %{horse_count("one": "is one horse", "other": "are %{horse_count} horses")} in the barn' % { :horse_count => 3 }

This technique unifies the whole sentence together, meaning the translators of this phrase don't have to translate first the whole sentence, then each individual plural rule, which might be confusing. Imagine if you were asked to translate just the string "is one horse" without any context at all. It simply doesn't make sense without the whole sentence. In other languages like Japanese, you might even want to put the plural in an entirely different place. Finally, it's easier on the programmer who won't have to build a hash with the correct options for the current language.
In some projects, however, it might be easier for the programmer to specify a hash. Imagine, for example, a project that isn't translated. The programmer would have no objections to supplying a hash with
Lastly, notice how I've used JSON to represent the plural data. I wanted to make it as easy as possible to parse using a standard format. It still might be tricky, and we can definitely talk about it. What are your thoughts?
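One way the inline form proposed above might be parsed is sketched here. It rests on several assumptions: the regexp, the method names, and the English-only rule lookup are all invented for the example, and the sketch relies on the plural data never containing a `)` character, which a real parser could not assume.

```ruby
require 'json'

# Matches %{key(...)} and captures the key and the JSON-like body in parentheses.
# Assumption: the body itself contains no ")" character.
INLINE_PLURAL_RE = /%\{(\w+)\((.*?)\)\}/

# Hypothetical English-only rule lookup; returns string keys to match the JSON.
def inline_rule_for(number)
  number == 1 ? 'one' : 'other'
end

def inline_pluralize(string, values)
  string.gsub(INLINE_PLURAL_RE) do |match|
    key       = $1
    json_body = $2
    number = values[key.to_sym]
    next match if number.nil?

    # Wrap the captured body in braces so it parses as a JSON object.
    patterns = JSON.parse("{#{json_body}}")
    pattern  = patterns[inline_rule_for(number)]
    next match unless pattern

    # Fill the nested %{key} placeholder with the number.
    pattern.gsub("%{#{key}}", number.to_s)
  end
end
```

Because the patterns travel inside the string, the caller only supplies `{ :horse_count => 3 }`, which is the ergonomic benefit described in the comment.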
@camertron, I added license information and some notes regarding the code adapted from the i18n and gettext gems. Please check it out and let me know if there's something I should change. Here are some comments from me:
@KL-7 NOTICE looks good, thanks for adding the additional licensing text.

    'there %|{ "horse_count": { "one": "is one horse", "other": "are %{horse_count} horses" } }| in the barn'

Finally, we should make this a separate pull request instead of trying to fit too much into this one. Go ahead and put whatever polish you'd like on this PR and I'll happily merge it in. Also, do you really need to rebase? Might be nice to keep your commit history intact.
@camertron, I rebased the pluralization branch against master, cleaned up the commit history a bit, and added comments for the interpolation and pluralization methods. I'm pretty much satisfied with this PR now. If you feel the same, you can merge it.
Pluralization support, phase 1.
I tried to lay the groundwork for a pluralization formatter. It's just a beginning and not yet ready for merging. This PR is more for discussion, as I already have a couple of questions:
Currently the number and the noun in the correct plural form are simply joined, but I'm going to make the format method accept real patterns where the number replaces a placeholder, so the word order and phrasing can be adjusted by passing a language-specific patterns hash. After that is done, instead of calling one would do something like
I assume that for this purpose existing tokenizers can be used, but I need to dig into that a bit more. What I want to know now is whether the overall idea is right.
Is raising an exception when a required pattern is not found a good idea? I'm asking because I don't see many exceptions across the project. Should some default value or nil be returned in that case instead?
An interface where the format method accepts a number and a hash of patterns is a good start, but if someone needs to format phrases with the same word (and therefore the same patterns hash) over and over, it might be annoying to pass this hash every time or to store it somewhere outside of the formatter object.
If, e.g., someone needs to format different amounts of hours across the project, it might be easier to set up the formatter with the proper patterns in the constructor and then pass only a number to the format method. The drawback of this solution is that it requires one formatter for every word that needs to be formatted, which doesn't sound good if someone wants to pluralize a lot of different words simultaneously.
What if we provide an interface for setting up a formatter with a specific locale and a dictionary of words, where the user provides a patterns hash for each word? Then they can use it like f.format(1, 'hour') and the formatter will find the proper set of patterns for this word and the pluralization rule for the number, choose the pattern for this rule, and format the final phrase.
Maybe the interface won't be exactly like that, but I'd be happy to hear your thoughts on it.
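The dictionary-based interface floated in the last question could be sketched like this. The class name, the `%{number}` placeholder, and the English-only `rule_for` are assumptions made purely for illustration; the locale is stored but not used by the simplified rule lookup.

```ruby
class DictionaryPluralFormatter
  # dictionary maps each word to its patterns hash, e.g.
  # { 'hour' => { :one => '%{number} hour', :other => '%{number} hours' } }
  def initialize(locale, dictionary)
    @locale = locale
    @dictionary = dictionary
  end

  def format(number, word)
    patterns = @dictionary.fetch(word) do
      raise ArgumentError, "Unknown word #{word.inspect}."
    end
    rule = rule_for(number)
    pattern = patterns.fetch(rule) do
      raise ArgumentError, "Missing pattern for #{rule.inspect}."
    end
    # Assumed placeholder name: the number replaces "%{number}" in the pattern.
    pattern.gsub('%{number}', number.to_s)
  end

  private

  # Hypothetical stand-in for a locale-aware lookup: English only.
  def rule_for(number)
    number == 1 ? :one : :other
  end
end

f = DictionaryPluralFormatter.new(:en,
  'hour' => { :one => '%{number} hour', :other => '%{number} hours' })
f.format(1, 'hour')
```

One formatter instance then serves every word in the dictionary, which addresses the one-formatter-per-word drawback mentioned above.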