Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Localization #124

Merged
merged 6 commits into from Aug 24, 2019

Conversation

@cmyr
Copy link
Member

commented Aug 20, 2019

This adds basic localization, based on Fluent.

tldr;

We have added a LocalizedString type. To convert this to an actual string, we pass in the Env and the Data; this will lazily compute the concrete value for this string, for the current locale.

As a demo, LocalizedString can now be used in Label.

This is fairly rough, but is also not cutting corners; the strategy employed here is a reasonable long-term approach to localization.

There are a few things that could be refined:

  • lenses a localized string can accept arguments, such as when you want to format a number. Currently you provide arguments indirectly, by providing a closure that can generate the required value from the current data and environment. I think this is an okay initial solution, but this is another place where lenses could really shine; in particular it would let us invalidate strings more efficiently.

  • Env: I've jammed a bunch of new stuff in here, via a new L10nManager struct. I'm not sure if this should live in Env or somewhere else.

  • resource bundles: The main limitation of this right now is that there is no system for packaging a druid application along with other necessary resources, so this will only work correctly for examples that are run from the druid root directory. We use include_str! to include the en_US localization with the app, so that we have a reasonable fallback.

@cmyr cmyr force-pushed the l10n branch from 07308fe to 7d32b8a Aug 20, 2019

@raphlinus

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

Before getting to the code, the bloat regression:

Before After
Windows debug build time 25s 40s
Windows release build time 29s 85s
Windows release exe size 372k 640k
macOS debug build time 16s 49s
macOS release build time 21s 126s
macOS release exe size 600k 912k

Methodology is building the caclulator example starting with an empty target subdir, but with the cargo caches warmed.

The blog post on Rust bloat is coming. I'm not going to block this PR on that though.

@raphlinus
Copy link
Member

left a comment

Generally this looks pretty good to me, a number of specific suggestions inline.

///
/// [Unicode language identifier]: https://unicode.org/reports/tr35/#Unicode_language_identifier
pub fn get_locale() -> String {
//TODO ahem

This comment has been minimized.

Copy link
@raphlinus

raphlinus Aug 20, 2019

Member

I believe the correct API here is GetUserPreferredUILanguages function, but am not 100% sure. This function returns a list of languages, but in my testing I was only able to get one. For what we have now, I think just grabbing the first language is good.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

This would be annoying for me to do without access to a windows machine, so I'm going to leave this stubbed out for now.

(I'll open an issue when this goes in)

druid-shell/src/windows/util.rs Outdated Show resolved Hide resolved
src/env.rs Outdated Show resolved Hide resolved
src/localization.rs Outdated Show resolved Hide resolved
src/localization.rs Outdated Show resolved Hide resolved
src/widget/button.rs Outdated Show resolved Hide resolved
Localized(LocalizedString<T>),
Literal(String),
}

/// A label with static text.

This comment has been minimized.

Copy link
@raphlinus

raphlinus Aug 20, 2019

Member

This description feels a little stale, it's not necessarily static. And I think the distinction between this an DynLabel is getting fuzzier, not sure we still need both.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

Yea, it's interesting. DynLabel is definitely more flexible, but how useful that flexibility is in practice is hard to say.

src/widget/button.rs Outdated Show resolved Hide resolved
/// environment and data.
pub fn resolve<'a>(&'a mut self, env: &Env, data: &T) -> &'a str {
//TODO: this recomputes the string if either the language has changed,
//or *anytime* we have arguments. Ideally we would be using a lens

This comment has been minimized.

Copy link
@raphlinus

raphlinus Aug 20, 2019

Member

I don't think we have to do anything with lenses here, this logic can piggy back on just the changes to T. The way to do that is to have paint call a method that resolves the string iff the resolution hasn't been done yet, otherwise just fetches the resolution, then have update recompute the string. As I said below, even better is if the "recompute" method lets us decide whether to propagate an invalidation. Just redoing the string every time T changes seems cheap enough to me.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

My thinking here is that with something like lenses, you could indicate per-arg where in the data tree it comes from, and then we could stash that and do better invalidation automatically by comparing whether those items have changed. I'm not sure how much of a saving this this, but would definitely be a handful fewer allocations per localized string, per update.

key: &'static str,
placeholder: Option<&'static str>,
args: Option<HashMap<&'static str, ArgSource<T>>>,
resolved: Option<String>,

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

Please, note that Fluent supports compound messages which are value + attributes. This is particularly useful for UI toolkits which may use multiple related messages to localize a single UI widget.

See for example main Firefox menubar - https://searchfox.org/mozilla-central/source/browser/locales/en-US/browser/menubar.ftl or Preferences file - https://searchfox.org/mozilla-central/source/browser/locales/en-US/browser/preferences/preferences.ftl

And DOM - https://searchfox.org/mozilla-central/source/browser/base/content/browser-menubar.inc and https://searchfox.org/mozilla-central/source/browser/components/preferences/in-content/preferences.xul

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

Right: attributes were something that I didn't really understand, and so had initially ignored.

Is the relationship between value and attributes an 'either' relationship? Can a message have no value and no attributes? Can a message have both?

In terms of the design of this API, I could imagine us translating attributes into something like a LocalizationGroup that contains all the strings for a given resource; I'll look into this when I do the work on window menus, since they want both a title and a tooltip string.

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

Can a message have no value and no attributes? Can a message have both?

Yes, Yes.

In terms of the design of this API, I could imagine us translating attributes into something like a LocalizationGroup that contains all the strings for a given resource;

Why not just use FluentMessage returned by get_message? https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/bundle.rs#L28

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

Thank you very much for your review and attention here, @zbraniecki, it's very appreciated.

Tried finding this in the docs, without luck: under what circumstance would a message not have a value? What are the implications of this?

And for attributes, after I've called get_message I call format_pattern for each attribute, passing in the same FluentArgs each time?

I don't have a very clear vision of this LocalizationGroup type in my head, I'm just imagining ways to let application developers indicate that some widget references known attributes of some message.

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

Tried finding this in the docs, without luck: under what circumstance would a message not have a value? What are the implications of this?

That depends on bindings. For DOM scenario, we use value to indicate the content of the element, and `attributes for attributes.

So:

<p data-l10n-id="key1"/>
<menuitem data-l10n-id="key2"/>
key1 = Hello World
key2 =
    .label = My Menu Item
    .accesskey = M
    .tooltip = Open { key2.label }

will result in:

<p data-l10n-id="key1">Hello World</p>
<menuitem data-l10n-id="key2"
                    label="My Menu Item"
                    accesskey="M"
                    tooltip="Open My Menu Item"/>

key1 has only a value, key2 has only attributes. There are cases where the message will have both, but it has to have at least one or the other for the parser to accept it.

And for attributes, after I've called get_message I call format_pattern for each attribute, passing in the same FluentArgs each time?

Yep! That's how you'd format a full message :)
In my POC of a patch to swap JS for Rust in Gecko - https://phabricator.services.mozilla.com/D36286

there's (sth similar to, but with ugly C FFI bindings):

pub fn fluent_bundle_compound(bundle: &FluentBundleRc, id: &nsAString, args: &FluentArgs) -> LocalizationMessage {
    if let Some(message) = bundle.get_message(id.to_string().as_str()) {
        let mut errors = vec![];
        let value = message.value.map(|v| bundle.format_pattern(v, args, &mut errors);
        let mut attributes = vec![];
        for (name, value) in message.attributes {
            let val = bundle.format_pattern(value, args.as_ref(), &mut errors);
            attributes.push((name, val));
        }
        return LocalizationMessage { value, attributes };
    }
}

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

okay that's very helpful, thanks!

My take away from this is that we need to also define the expected structure of our user's ftl files; we could say for the time being that all messages must have a value, and that attributes are not supported, and then we could revisit this down the road.

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

My take away from this is that we need to also define the expected structure of our user's ftl files;

I don't understand why would you do that, but your call :)

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

It's very possible that my thoughts here will change as my understanding improves. :)

if let Some(res) = self.resources.get(&path) {
res.clone()
} else {
let string = fs::read_to_string(&path).unwrap_or_else(|_| FALLBACK_STRINGS.to_string());

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

The way you wrote the fallback here will only use the fallback in case of a missing file. Much more common scenario where we lazily iterate to fallback is when a string is missing or there are runtime errors during resolution of a string.

That's why we use Localization API with cachediterator and return an iterator out of the L10nManager.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

Okay, thanks for pointing that out, I think I was confused. My L10nManager is intended to be at about the level of fluent_fallback::Localization; it keeps a stack of bundles for the current locale, localizes messages with that stack, and rebuilds that stack when the locale changes. I'll flesh this out a bit more and see how well I'm understanding things.


let labeltext =
LocalizedString::new("hello-counter").with_arg("count", |_env, data: &u32| (*data).into());
let label = Label::new(labeltext);

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

What I would recommend here is to design the fluent-druid binding that allows for L10nId/L10nArgs to be passed to the label, rather than eagerly resolved into a string that is passed to the label.

The way it is done now resembles the pre-fluent paradigm in Firefox and it eventually led to developers resolving strings very deep in their code and then carrying around data with resolved strings up to the UI.
This is counter-productive (if the widget doesn't get displayed, why localize it?), and it locks a lot of language selection logic deep in the imperative code without any means to "retranslate" on fly.

What I'd suggest here is sth like:

let label = Label::new();
label.set_l10n_attributes("hello-counter", fluent_args!["count" => counter);

...

fn update_counter() {
  counter += 1;
  label.set_l10n_attributes("hello-counter", fluent_args!["count" => counter);
}

This will bring this API much closer to how Fluent paradigm operates and in the future will allow you to bring a lot of features like runtime pseudolocales, l10n cycle indepenent of release cycle and runtime locale switching.

It also allows your layout/drawing part handle the actual localization in batches, right before the next paint, rather than synchronously string-by-string.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

I think what this is doing is close to what you describe? It probably isn't clear from this example how druid works, but this is just describing how to get the arguments required from the application state as needed.

Basically the label knows the fluent 'message' it is supposed to display, and for each arg required by that message, the label has a closure that can extract the value from the application state.

When the application state changes, we will automatically update the localization with the new data, and then invalidate and paint only if it has changed. Does that sound close to your suggestion?

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

When the application state changes, we will automatically update the localization with the new data, and then invalidate and paint only if it has changed.

Ahhh! That makes sense!

Does that sound close to your suggestion?

Yes!

So, to stay closer to Fluent terminology, I'd name your LocalizationString a LocalizationMessage (for it may have more than one string in compound case), and the rest makes sense now. You pass it to Label and then before painting the Label resolve its localization. That looks good!

The one interesting piece you seem to be aiming for is something we played with as "responsive localization" where a localization is hooked to potential changes in the data state and react with retranslation.

Our main focus at the time was mobile UI l10n and we wanted to react to functions, so we had sth like:

preferences = { SCREEN_WIDTH() ->
    [narrow] Security
   *[medium] Privacy & Security
    [wide] Privacy & Security Settings
}

here's a video: https://www.youtube.com/watch?v=V5K6YN7MMOA

We ended up finding this model very complicated on the Fluent bindings side and we couldn't justify the complexity at the time.
I still am intrigued by it, so if you think you can get it working, I'd say go for it!

I would still recommend having a "non-automagic" approach available. In other words, it should be possible to do:

label.set_l10n(l10n_id, l10n_args);

and have the label retranslate.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

So, to stay closer to Fluent terminology, I'd name your LocalizationString a LocalizationMessage (for it may have more than one string in compound case), and the rest makes sense now.

Yea, this is much closer to a fluent 'message' than anything. I'm not sure how much of fluent's naming we want to surface through our API, but at the very least this relationship should be clarified in our docs.

This also relates to the LocalizationGroup I was mentioning earlier; my thinking was that it might make sense to separate the compound from the non-compound case, because it could let us provide simpler APIs to our users.

The one interesting piece you seem to be aiming for is something we played with as "responsive localization" where a localization is hooked to potential changes in the data state and react with retranslation.

Our main focus at the time was mobile UI l10n and we wanted to react to functions, so we had sth like:

...

here's a video: https://www.youtube.com/watch?v=V5K6YN7MMOA

Ah, very cool. We may definitely want to explore something like this, given our cross-platform aspirations. For the time being, the use case is just things like the following, where a button increments a counter and that automatically updates a label:

2019-08-21 13 17 07

but there's obviously room for more sophistication here down the road.

I would still recommend having a "non-automagic" approach available. In other words, it should be possible to do:

label.set_l10n(l10n_id, l10n_args);

and have the label retranslate.

We do expose an API that let's you pass a msg + args directly to the 'current bundle stack' and just get a string back; this will need some tweaking in light of attributes, but is at least a fallback option, but we're going to strongly encourage using abstractions like this LocalizedString; we'll see how far that gets us, I guess, and if it just isn't flexible enough we can revisit.

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

We do expose an API that let's you pass a msg + args directly to the 'current bundle stack' and just get a string back;

That's not what I suggested. This is an anti-pattern in the Fluent world.

I suggest an ability to update the id/args combo for a given UI widget.

A litmus test for when it works is:

let msg = LocalizedMessage::new("hello-counter");
let label = Label::new(msg);

fn update_label(label: &Label) {
  let l10n_id = if (User::is_logged_in()) {
     "welcome-back"
  } else {
     "welcome-new-user"
  };
  let msg = LocalizedMessage::new(l10n_id);
  label.set_l10n(msg);
}

This makes a clear distinction between what the engineer does (maintain the id/args status of all of the UI) from what the bindings do (they can iterate over the whole UI and retranslate at any point, or localize only right before layout etc.)

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

Our label type is parameterized over its state, so for the time being I'd like to see if we can enforce the idea that you don't set args directly, you just describe how to get them. I think I understand your point, though, which is that depending on state you might want to use completely different messages. One possible solution that fits this model is that you could set the message conditionally based on some attribute in the state, but have this be declarative.

In any case thanks for the clarification, I definitely have some more thinking to do here!

This comment has been minimized.

Copy link
@zbraniecki

zbraniecki Aug 21, 2019

One possible solution that fits this model is that you could set the message conditionally based on some attribute in the state, but have this be declarative.

Hmm, that may work, but it feels very complicated. As I said, our experience with trying to do stuff reactively on the state, led us to sticking to simple one-way assignments, and postponing the reactive magic for later.

Based on that experience I think it would be reasonable to provide the same basic functionality for setting/updating the l10n annotation of UI widgets independently of the state-reacting flow.

In @fluent/dom we provide setAttributes method to do so - https://github.com/projectfluent/fluent.js/blob/master/fluent-dom/src/dom_localization.js#L87 and you can see how it's used all around Firefox UI - https://searchfox.org/mozilla-central/search?q=setAttributes&path=.js

I think you can build the state-reactive model on top of the simple model that we provide, but not the other way around.

This comment has been minimized.

Copy link
@cmyr

cmyr Aug 21, 2019

Author Member

I think I see what you're saying, and I can totally imagine having an additional API that just takes args directly; this would be helpful in circumstances where for some reason there is some state that isn't in the explicit state object, which is possible.

cmyr added a commit that referenced this pull request Aug 21, 2019
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.
@zbraniecki

This comment has been minimized.

Copy link

commented Aug 21, 2019

@raphlinus I opened projectfluent/fluent-rs#139 to investigate reducing the size of fluent-bundle crate. I completely agree with your blog post and consider fluent-bundle to be such foundational crate.

@sparecycles

This comment has been minimized.

Copy link

commented Aug 21, 2019

This PR introduces second versions of proc-macro2, quote, syn, and unicode-xid. Relevant to the concerns in the blog?

I sometimes hear that it’s ok to depend on commonly-used crates, because their cost is amortized among the various users that share them. I’m not convinced, for a variety of reasons. For one, it’s common that you get different versions anyway.

@Eijebong

This comment has been minimized.

Copy link

commented Aug 22, 2019

You can probably avoid the syn/quote/proc-macro2/unicode-xid dupe by running cargo update -p proc-macro-hack --precise 0.5.8. That'll help with compile times ;)

cmyr added a commit that referenced this pull request Aug 22, 2019
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.

@cmyr cmyr force-pushed the l10n branch from 465065e to 1bbca8a Aug 22, 2019

cmyr added a commit that referenced this pull request Aug 22, 2019
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.

@cmyr cmyr force-pushed the l10n branch from 1bbca8a to f808182 Aug 22, 2019

@raphlinus
Copy link
Member

left a comment

Looks good. I think the invalidation is basically correct, but we might want to fine-tune it when we wire up finer grain invalidation logic, to make sure it's both minimal and correct. Thanks!

cmyr added a commit that referenced this pull request Aug 22, 2019
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.

@cmyr cmyr force-pushed the l10n branch from f808182 to 050c6a2 Aug 22, 2019

cmyr added a commit that referenced this pull request Aug 23, 2019
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.
Fallback, better invalidation, and fixups
This is in response to the discussion at #124.

@cmyr cmyr force-pushed the l10n branch from 050c6a2 to f68558f Aug 24, 2019

@cmyr

This comment has been minimized.

Copy link
Member Author

commented Aug 24, 2019

Okay: I think there's more work to do here at some point but I'm happy to get this in as a checkpoint, we can always iterate. @Eijebong thanks for the find, is saving me 10s on a clean debug build. 😁

@cmyr cmyr merged commit 3112076 into master Aug 24, 2019

3 checks passed

xi-editor.druid Build #20190824.3 succeeded
Details
xi-editor.druid (mac) mac succeeded
Details
xi-editor.druid (windows) windows succeeded
Details

@cmyr cmyr deleted the l10n branch Aug 24, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.