Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure localized strings (RFC) #1048

Merged
merged 34 commits into from Mar 2, 2023

Conversation

richardolsson
Copy link
Member

@richardolsson richardolsson commented Feb 23, 2023

Description

This PR is a proof-of-concept and request-for-comments proposal for a new way to structure and handle internationalization.

Some terminology

These terms are used in the description of this PR (and in the code).

  • Internationalization, or i18n, refers to the process of making a codebase "translatable", in this case using messages(), <Msg/> and useMessages() is the process of internationalizing
  • Localization, or l10n, refers to the process of translating an internationalized codebase. This PR does not really change the localization at this point, but uses the same YAML files as before for that.
  • A message is an internationalized string with some ID to identify it
  • A localized message is a translated string, to replace some message in the code
  • String interpolation is the mechanism by which a string is created from variables, e.g. using template ${strings}, using 'concatenated' + ' ' + 'strings' or any other mechanism
  • The ICU MessageFormat is a standard for how to write localized messages. It's the syntax that we use for interpolating localized messages using simple variables and also more complex things like pluralization (e.g. {orgCount, plural, =0 {No organizations} =1 {One organization} other {# organizations}})
  • The Intl API is a browser and Node API used for localization. It uses the ICU MessageFormat standard, and is used behind the scenes by the react-intl library we used for internationalization

Benefits, in short

Here is a short list of benefits (compared to the old system). See details in the sections below.

  • Message IDs are type-safe – no more misspelling 'long.id.strings'
  • Values for interpolated messages are type-safe – no more forgetting values={{ foo: 'bar' }}
  • Default (English) strings are compiled into the program – no more ugly.effing.id.string rendered to the user
  • Because the messages are code, the (default) strings hot-reload when changed
  • Because the messages are code, we can analyze it to gain insights

Defining messages (and default strings)

With the old format there was no reliable source of truth for what messages (should) exist. You could refer to a message in code by the string 'some.nice.id' and the program would expect it to exist. But you could also define a localized string in a YAML file in a way that translates to the ID 'some.other.id'. In this case, do you have two different messages (one missing, one unused) or a single one which was misspelled?

With this PR, the source of truth for which messages exist is the messages.ts file in each feature. Messages are defined using the messages() function, along with a default (English) text. Here's an example from the migrated search feature:

import { m, messages } from 'core/i18n';

export default messages('feat.search', {
  /* … */
  placeholder: m('Type to search'),
  results: {
    campaign: m('Campaign'),
    /* … */
  },
});

The messages can then be referenced, not by their string ID, but by their actual typescript identifier/name, i.e. messages.results.campaign. This means that they can never be misspelled, and also that we get some extra type safety.

Type-safe interpolation values

The ICU MessageFormat has support for variables, that are (in their simplest form) referenced in localized messages using a {placeholder}. A property with the correct name (in this case placeholder) must be passed to the relevant Intl method (generally Intl.formatMessage()) or an error is thrown.

In the old solution, accidentally forgetting to pass values, or passing values by the wrong names was a very simple mistake to make. Especially when refactoring.

With this PR, the shape of the values passed to a message is made type-safe, by defining the shape at the time of defining the message, using im() (for "Interpolated Message") instead of the simpler m() function. Below is an example from the migrated tags feature:

import { im, messages } from 'core/i18n';

export default messages('feat.tags', {
  dialog: {
    /* … */
    groupCreatePrompt: im<{ groupName: string }>('Add "{groupName}"'),
    /* … */
});

Trying to use the dialog.groupCreatePrompt message anywhere in the code will only work if the groupName property is supplied with a string value. This is enforced by the <Msg/> component and the useMessages() hook.

The <Msg/> component

The new <Msg/> component is inspired by the <FormattedMessage/> component from react-intl (and how we've traditionally sometimes abbreviate it as <Msg/>).

The <Msg/> component can be used in two ways, for plain messages and interpolated messages respectively:

import { im, m, messages, Msg  } from 'core/i18n';

const definedMessages = messages({
  hello: im<{ name: string }>('Hello {name}!'),
  goodbye: m('Goodbye!')
});

// Will work (no values required):
<Msg id={definedMessages.goodbye}/>

// TypeScript will complain (no values allowed)
<Msg id={definedMessages.goodbye} values={{ foo: 'bar' }}/>

// TypeScript will complain (missing values)
<Msg id={definedMessages.hello}/>

// TypeScript will complain (values have wrong shape)
<Msg id={definedMessages.hello} values={{ foo: 'bar' }}/>

// Will work (values provided with correct shape)
<Msg id={definedMessages.hello} values={{ name: 'Clara' }}/>

// TypeScript will complain (unknown, misspelled message)
<Msg id={definedMessages.helo}/>

The <Msg/> component will render a fragment with the formatted string in it. No extra DOM elements will be created.

The useMessages() hook

A very common scenario is wanting to use a message (for internationalization) but not wanting to render it as a React element, e.g. to use it as the value of a string property or send it to the API. For those scenarios we've traditionally used useIntl(), which returns a reference to Intl, and then done intl.formatMessage({ id: 'some.id' }).

This has the obvious issues of using string IDs, but has the added annoyance of requiring an object as it's first parameter (not just an ID string) and another object for it's values, often resulting in expressions that look like this to simply format a string with some value:

intl.formatMessage(
  { id: 'misc.tags.tagManager.addValue' },
  { tag: selectedValueTag.title }
)

The new useMessages() hook is an alternative to this, which takes a map of messages (from messages()) and creates a new map where every message is a function that can be called to get the localized string. It employs the same measures of type-safety as <Msg/>, and can be used like this:

import { im, m, messages, useMessages  } from 'core/i18n';

const definedMessages = messages({
  hello: im<{ name: string }>('Hello {name}!'),
  goodbye: m('Goodbye!')
});

const funcs = useMessages(definedMessages);

// Will work (no values required)
console.log( funcs.goodbye() );

// Will work (values correct)
console.log( funcs.hello({ name: 'Clara' }) );

// Typescript will complain (same reasons as <Msg/>)
console.log( funcs.goodbye({ foo: 'bar' }) );
console.log( funcs.hello() );
console.log( funcs.hello({ foo: 'bar' }) );
console.log( funcs.helo() );

Let's compare that old intl.formatMessage() with the new expression:

// Old:
const intl = useIntl();
intl.formatMessage(
  { id: 'misc.tags.tagManager.addValue' },
  { tag: selectedValueTag.title }
)

// New:
const msgs = useMessages(messages);
msgs.manager.addValue({ tag: selectedValueTag.title })

Localizing (defining translations)

Messages are still localized in YAML files. I have not at this point made any changes to where they are defined as that's a separate (but related) matter from the type-safe internationalization framework.

A message defined with messages('feat.tags', { title: 'Tags' }) could be translated to Swedish in a YAML file at src/locale/feat/tags/sv.yml as title: Etiketter, or the file could be at src/features/tags/locale/sv.yml, or according to some other pattern.

That's something we need to decide next.

Loading translations

In the current codebase, localized messages are read from the YAML files by the server, and then served alongside other data as part of the page load. This means that every page needs to define what messages are needed, which is done using the localeScope option for scaffold().

This PR doesn't really change this, except in one way. Default (English) translations are now included as part of the message definitions (m('This is the default')) which means that if there is something preventing the localized (e.g. Swedish) messages from loading, usually programmer mistake, there will absolutely always be an English fallback, because if the ID was referenced in the code, it will have been compiled into the javascript.

This means that the "Missing message" errors will no longer happen, and the message.id.strings that would previously render in the UI will instead always fall back to English.

Because all messages pass through a single point at runtime (messages()) it's possible to aggregate while rendering a page what subsets of messages are likely needed. This could (I'm not sure) allow us to flag them as needed and load them, without having to specify localeScope.

The type safety and single source of truth also opens up for some creative ways of solving this and other issues using tooling.

Tooling potential

Now that messages are code, and there is a single source of truth for what messages should exist, that opens up for some interesting tooling potential. Here are some ideas.

NOTE: I have made some proof of concept experiments (not in this PR) using ts-morph to verify that the following ideas are viable. ts-morph is a TypeScript AST API, which basically lets us inspect typescript code programmatically, allowing us to build tools like these.

  • A linting tool that could run in CI and prevent any merges that includes messages that are not used anywhere (made possible by the fact that messages are code, along with the ts-morph equivalent of right-clicking a message in VSCode and selecting "Find references")
  • A linting tool that could run in CI and flag if there are messages referenced under a page, that aren't specified in the localeScope for that page (and hence will not be loaded). This could be made obsolete by an even smarter runtime loader, which may or may not be possible.
  • A tool to find any messages in YAML that don't correspond to any messages in code, i.e. stale old translations that should be deleted
  • A tool to generate en.yml files from the messages, which we could use to generate the files that are needed for our translate.zetkin.org interface to work.

Re: Dates, numbers etc

Nothing changes for numbers, dates, etc. This PR only concerns "messages", i.e. predefined strings – how to define them, reference them etc. That's where we've been having problems.

For dates, numbers etc, the react-intl components (e.g. <FormattedDate/>) work fine. In some cases we've wrapped them (e.g. <ZUIRelativeTime/>). In other cases where dates and numbers are concerned, they are actually inside messages and can be localized using MessageFormat (e.g. {orgCount, number}).

Screenshots

None

Changes

  • Adds a new set of modules under core/i18n
    • Helper functions for defining messages, messages(), m() and im()
    • A <Msg/> component inspired by <FormattedMessage/> but type-safe
    • A useMessages() component, designed as a type-safe replacement for most of our use-cases for useIntl()
  • Specifies messages and migrates all components in the search feature
  • Specifies messages and migrates all components in the tags feature
  • Deletes unused YAML files
  • Changes tests so that they use the new (per-feature) IDs

Notes to reviewer

I would like to discuss the following questions:

  • Is this a meaningful improvement? Please let me know what you think in general!
  • Is YAML still the way to go for localized messages?
  • Where should we put the localized messages? In the feature directories (e.g. src/features/tags/locale/sv.yml), or still globally (e.g. src/locale/feat/tags/sv.yml)?
  • What patterns should we use for naming the return value of useMessages()?
    • Now I usually do const msg = useMessages(messages) but I don't love it, because msg implies a single message when it's actually a map of all messages.
    • Maybe import messages.ts as messageIds instead, and then do const messages = useMessages(messageIds)?
  • Should we keep the old pattern for unit tests, where strings are rendered as IDs, and referenced as such in the test code? This was because we didn't have the messages loaded and had no fallbacks, but now we could if we wanted use the fallbacks text in tests.

Related issues

Related to #998

@richardolsson
Copy link
Member Author

richardolsson commented Feb 23, 2023

Actions from discussion with team today:

  • Rename messages() to makeMessages()
  • Try to merge m() and im() as a single overloaded m() (if possible)
  • Establish a pattern for useMessages() that looks like this: const messages = useMessages(messageIds)
  • Add new getByMsgId() test utility, to retrieve from render() by type safe message ID instead of string

@richardolsson
Copy link
Member Author

I have taken the actions decided at yesterday's meeting. It is unfortunately not possible to use m() both for interpolated messages and plain messages, because there is nothing differentiating them (at the time of calling the function) except the expected return type, so there is nothing for the implementation to use when deciding what type to return.

I will proceed next to migrate the entire codebase to use the new pattern.

@richardolsson
Copy link
Member Author

Some updates:

  • I have replaced all uses of <FormattedMessage> and useIntl() with the new <Msg> and useMessages() solutions.
  • I have removed all obsolete YAML files
  • I have NOT significantly restructured the internal structure of message maps but mostly just transferred the structure of the YAML files to the new object maps. The type safety means this can be safely done later.

News:

  • I modified makeMessages() to always add a global map to contain some translations that are used often enough that it feels fair to always include them, e.g. personFields. I will comment on a line where this is used (see below).
  • I created a getServerMessages() function that imitates useMessages() but allows messages to be used server-side, e.g. in download/export endpoints or RPC calls. Will comment on a line where it's used (see below).

action={
<FormattedMessage id={`zui.accessList.roles.${item.role}`} />
}
action={<Msg id={messageIds.global.roles[item.role]} />}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of the new global map that is always included in the map returned by makeMessages()


const newView = await apiClient.post<ZetkinView, ZetkinViewPostBody>(
`/api/orgs/${orgId}/people/views`,
{
folder_id: folderId || undefined,
title: messages['misc.views.newViewFields.title'],
title: messages.newViewFields.title(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New getServerMessages() in action here.

@richardolsson richardolsson linked an issue Feb 27, 2023 that may be closed by this pull request
2 tasks
@richardolsson
Copy link
Member Author

I actually managed to solve the issue of overloading m() to work as im()! See commit 4f61365 for how I did it.

Copy link
Contributor

@rebecarubio rebecarubio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You´re making the line skyrocketing! Wohoooo! 🚀

@richardolsson richardolsson merged commit 11b600c into main Mar 2, 2023
@richardolsson richardolsson deleted the issue-998/restructure-localized-strings branch March 2, 2023 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restructure localized strings
2 participants