Restructure localized strings (RFC) #1048

richardolsson · 2023-02-23T09:01:16Z

Description

This PR is a proof-of-concept and request-for-comments proposal for a new way to structure and handle internationalization.

Some terminology

These terms are used in the description of this PR (and in the code).

Internationalization, or i18n, refers to the process of making a codebase "translatable", in this case using messages(), <Msg/> and useMessages() is the process of internationalizing
Localization, or l10n, refers to the process of translating an internationalized codebase. This PR does not really change the localization at this point, but uses the same YAML files as before for that.
A message is an internationalized string with some ID to identify it
A localized message is a translated string, to replace some message in the code
String interpolation is the mechanism by which a string is created from variables, e.g. using template ${strings}, using 'concatenated' + ' ' + 'strings' or any other mechanism
The ICU MessageFormat is a standard for how to write localized messages. It's the syntax that we use for interpolating localized messages using simple variables and also more complex things like pluralization (e.g. {orgCount, plural, =0 {No organizations} =1 {One organization} other {# organizations}})
The Intl API is a browser and Node API used for localization. It uses the ICU MessageFormat standard, and is used behind the scenes by the react-intl library we used for internationalization

Benefits, in short

Here is a short list of benefits (compared to the old system). See details in the sections below.

Message IDs are type-safe – no more misspelling 'long.id.strings'
Values for interpolated messages are type-safe – no more forgetting values={{ foo: 'bar' }}
Default (English) strings are compiled into the program – no more ugly.effing.id.string rendered to the user
Because the messages are code, the (default) strings hot-reload when changed
Because the messages are code, we can analyze it to gain insights

Defining messages (and default strings)

With the old format there was no reliable source of truth for what messages (should) exist. You could refer to a message in code by the string 'some.nice.id' and the program would expect it to exist. But you could also define a localized string in a YAML file in a way that translates to the ID 'some.other.id'. In this case, do you have two different messages (one missing, one unused) or a single one which was misspelled?

With this PR, the source of truth for which messages exist is the messages.ts file in each feature. Messages are defined using the messages() function, along with a default (English) text. Here's an example from the migrated search feature:

import { m, messages } from 'core/i18n';

export default messages('feat.search', {
  /* … */
  placeholder: m('Type to search'),
  results: {
    campaign: m('Campaign'),
    /* … */
  },
});

The messages can then be referenced, not by their string ID, but by their actual typescript identifier/name, i.e. messages.results.campaign. This means that they can never be misspelled, and also that we get some extra type safety.

Type-safe interpolation values

The ICU MessageFormat has support for variables, that are (in their simplest form) referenced in localized messages using a {placeholder}. A property with the correct name (in this case placeholder) must be passed to the relevant Intl method (generally Intl.formatMessage()) or an error is thrown.

In the old solution, accidentally forgetting to pass values, or passing values by the wrong names was a very simple mistake to make. Especially when refactoring.

With this PR, the shape of the values passed to a message is made type-safe, by defining the shape at the time of defining the message, using im() (for "Interpolated Message") instead of the simpler m() function. Below is an example from the migrated tags feature:

import { im, messages } from 'core/i18n';

export default messages('feat.tags', {
  dialog: {
    /* … */
    groupCreatePrompt: im<{ groupName: string }>('Add "{groupName}"'),
    /* … */
});

Trying to use the dialog.groupCreatePrompt message anywhere in the code will only work if the groupName property is supplied with a string value. This is enforced by the <Msg/> component and the useMessages() hook.

The `<Msg/>` component

The new <Msg/> component is inspired by the <FormattedMessage/> component from react-intl (and how we've traditionally sometimes abbreviate it as <Msg/>).

The <Msg/> component can be used in two ways, for plain messages and interpolated messages respectively:

import { im, m, messages, Msg  } from 'core/i18n';

const definedMessages = messages({
  hello: im<{ name: string }>('Hello {name}!'),
  goodbye: m('Goodbye!')
});

// Will work (no values required):
<Msg id={definedMessages.goodbye}/>

// TypeScript will complain (no values allowed)
<Msg id={definedMessages.goodbye} values={{ foo: 'bar' }}/>

// TypeScript will complain (missing values)
<Msg id={definedMessages.hello}/>

// TypeScript will complain (values have wrong shape)
<Msg id={definedMessages.hello} values={{ foo: 'bar' }}/>

// Will work (values provided with correct shape)
<Msg id={definedMessages.hello} values={{ name: 'Clara' }}/>

// TypeScript will complain (unknown, misspelled message)
<Msg id={definedMessages.helo}/>

The <Msg/> component will render a fragment with the formatted string in it. No extra DOM elements will be created.

The `useMessages()` hook

A very common scenario is wanting to use a message (for internationalization) but not wanting to render it as a React element, e.g. to use it as the value of a string property or send it to the API. For those scenarios we've traditionally used useIntl(), which returns a reference to Intl, and then done intl.formatMessage({ id: 'some.id' }).

This has the obvious issues of using string IDs, but has the added annoyance of requiring an object as it's first parameter (not just an ID string) and another object for it's values, often resulting in expressions that look like this to simply format a string with some value:

intl.formatMessage(
  { id: 'misc.tags.tagManager.addValue' },
  { tag: selectedValueTag.title }
)

The new useMessages() hook is an alternative to this, which takes a map of messages (from messages()) and creates a new map where every message is a function that can be called to get the localized string. It employs the same measures of type-safety as <Msg/>, and can be used like this:

import { im, m, messages, useMessages  } from 'core/i18n';

const definedMessages = messages({
  hello: im<{ name: string }>('Hello {name}!'),
  goodbye: m('Goodbye!')
});

const funcs = useMessages(definedMessages);

// Will work (no values required)
console.log( funcs.goodbye() );

// Will work (values correct)
console.log( funcs.hello({ name: 'Clara' }) );

// Typescript will complain (same reasons as <Msg/>)
console.log( funcs.goodbye({ foo: 'bar' }) );
console.log( funcs.hello() );
console.log( funcs.hello({ foo: 'bar' }) );
console.log( funcs.helo() );

Let's compare that old intl.formatMessage() with the new expression:

// Old:
const intl = useIntl();
intl.formatMessage(
  { id: 'misc.tags.tagManager.addValue' },
  { tag: selectedValueTag.title }
)

// New:
const msgs = useMessages(messages);
msgs.manager.addValue({ tag: selectedValueTag.title })

Localizing (defining translations)

Messages are still localized in YAML files. I have not at this point made any changes to where they are defined as that's a separate (but related) matter from the type-safe internationalization framework.

A message defined with messages('feat.tags', { title: 'Tags' }) could be translated to Swedish in a YAML file at src/locale/feat/tags/sv.yml as title: Etiketter, or the file could be at src/features/tags/locale/sv.yml, or according to some other pattern.

That's something we need to decide next.

Loading translations

In the current codebase, localized messages are read from the YAML files by the server, and then served alongside other data as part of the page load. This means that every page needs to define what messages are needed, which is done using the localeScope option for scaffold().

This PR doesn't really change this, except in one way. Default (English) translations are now included as part of the message definitions (m('This is the default')) which means that if there is something preventing the localized (e.g. Swedish) messages from loading, usually programmer mistake, there will absolutely always be an English fallback, because if the ID was referenced in the code, it will have been compiled into the javascript.

This means that the "Missing message" errors will no longer happen, and the message.id.strings that would previously render in the UI will instead always fall back to English.

Because all messages pass through a single point at runtime (messages()) it's possible to aggregate while rendering a page what subsets of messages are likely needed. This could (I'm not sure) allow us to flag them as needed and load them, without having to specify localeScope.

The type safety and single source of truth also opens up for some creative ways of solving this and other issues using tooling.

Tooling potential

Now that messages are code, and there is a single source of truth for what messages should exist, that opens up for some interesting tooling potential. Here are some ideas.

NOTE: I have made some proof of concept experiments (not in this PR) using ts-morph to verify that the following ideas are viable. ts-morph is a TypeScript AST API, which basically lets us inspect typescript code programmatically, allowing us to build tools like these.

A linting tool that could run in CI and prevent any merges that includes messages that are not used anywhere (made possible by the fact that messages are code, along with the ts-morph equivalent of right-clicking a message in VSCode and selecting "Find references")
A linting tool that could run in CI and flag if there are messages referenced under a page, that aren't specified in the localeScope for that page (and hence will not be loaded). This could be made obsolete by an even smarter runtime loader, which may or may not be possible.
A tool to find any messages in YAML that don't correspond to any messages in code, i.e. stale old translations that should be deleted
A tool to generate en.yml files from the messages, which we could use to generate the files that are needed for our translate.zetkin.org interface to work.

Re: Dates, numbers etc

Nothing changes for numbers, dates, etc. This PR only concerns "messages", i.e. predefined strings – how to define them, reference them etc. That's where we've been having problems.

For dates, numbers etc, the react-intl components (e.g. <FormattedDate/>) work fine. In some cases we've wrapped them (e.g. <ZUIRelativeTime/>). In other cases where dates and numbers are concerned, they are actually inside messages and can be localized using MessageFormat (e.g. {orgCount, number}).

Screenshots

None

Changes

Adds a new set of modules under core/i18n
- Helper functions for defining messages, messages(), m() and im()
- A <Msg/> component inspired by <FormattedMessage/> but type-safe
- A useMessages() component, designed as a type-safe replacement for most of our use-cases for useIntl()
Specifies messages and migrates all components in the search feature
Specifies messages and migrates all components in the tags feature
Deletes unused YAML files
Changes tests so that they use the new (per-feature) IDs

Notes to reviewer

I would like to discuss the following questions:

Is this a meaningful improvement? Please let me know what you think in general!
Is YAML still the way to go for localized messages?
Where should we put the localized messages? In the feature directories (e.g. src/features/tags/locale/sv.yml), or still globally (e.g. src/locale/feat/tags/sv.yml)?
What patterns should we use for naming the return value of useMessages()?
- Now I usually do const msg = useMessages(messages) but I don't love it, because msg implies a single message when it's actually a map of all messages.
- Maybe import messages.ts as messageIds instead, and then do const messages = useMessages(messageIds)?
Should we keep the old pattern for unit tests, where strings are rendered as IDs, and referenced as such in the test code? This was because we didn't have the messages loaded and had no fallbacks, but now we could if we wanted use the fallbacks text in tests.

Related issues

Related to #998

richardolsson · 2023-02-23T16:23:16Z

Actions from discussion with team today:

Rename messages() to makeMessages()
Try to merge m() and im() as a single overloaded m() (if possible)
Establish a pattern for useMessages() that looks like this: const messages = useMessages(messageIds)
Add new getByMsgId() test utility, to retrieve from render() by type safe message ID instead of string

richardolsson · 2023-02-24T05:53:44Z

I have taken the actions decided at yesterday's meeting. It is unfortunately not possible to use m() both for interpolated messages and plain messages, because there is nothing differentiating them (at the time of calling the function) except the expected return type, so there is nothing for the implementation to use when deciding what type to return.

I will proceed next to migrate the entire codebase to use the new pattern.

richardolsson · 2023-02-27T21:37:09Z

Some updates:

I have replaced all uses of <FormattedMessage> and useIntl() with the new <Msg> and useMessages() solutions.
I have removed all obsolete YAML files
I have NOT significantly restructured the internal structure of message maps but mostly just transferred the structure of the YAML files to the new object maps. The type safety means this can be safely done later.

News:

I modified makeMessages() to always add a global map to contain some translations that are used often enough that it feels fair to always include them, e.g. personFields. I will comment on a line where this is used (see below).
I created a getServerMessages() function that imitates useMessages() but allows messages to be used server-side, e.g. in download/export endpoints or RPC calls. Will comment on a line where it's used (see below).

richardolsson · 2023-02-27T21:37:14Z

src/zui/ZUIAccessList/index.tsx

-              action={
-                <FormattedMessage id={`zui.accessList.roles.${item.role}`} />
-              }
+              action={<Msg id={messageIds.global.roles[item.role]} />}


Example of the new global map that is always included in the map returned by makeMessages()

richardolsson · 2023-02-27T21:37:54Z

src/features/views/rpc/createNew/server.ts


  const newView = await apiClient.post<ZetkinView, ZetkinViewPostBody>(
    `/api/orgs/${orgId}/people/views`,
    {
      folder_id: folderId || undefined,
-      title: messages['misc.views.newViewFields.title'],
+      title: messages.newViewFields.title(),


New getServerMessages() in action here.

richardolsson · 2023-03-01T05:57:44Z

I actually managed to solve the issue of overloading m() to work as im()! See commit 4f61365 for how I did it.

rebecarubio

You´re making the line skyrocketing! Wohoooo! 🚀

richardolsson added 7 commits February 23, 2023 07:22

Create framework with component and hook for localized messages

6b2adf8

Migrate search feature to new localization framework

7c4a60d

Migrate tags feature to new localization framework

90d4619

Improve documentation for messages()

cf6aafb

Use message IDs as fallback when running in test suite

30def2d

Update tests to use new per-feature IDs

bd8f1d9

Add comment about _typeFunc()

2d7fe1a

richardolsson requested review from Bagera, djbusstop, kaulfield23, niklasva82, olleolleolle, rebecarubio and ziggabyte February 23, 2023 11:03

richardolsson added 3 commits February 23, 2023 17:51

Rename makeMessage() and change pattern for messageIds

1d255ba

Add getByMessageId() getter for component tests

f931b3b

Replace getByText() with getByMessageId() when applicable

188fa03

richardolsson added 11 commits February 25, 2023 13:36

Improve types for i18n framework

5812314

Improve testing utils for new message format

3210b52

Migrate ZUI components to new message format

456e192

Migrate tasks feature to new message format

d9a0868

Migrate calendar feature to new message format

00146a3

Move QueryStatusAlert from smartSearch feature to tasks

9a1e9a1

Migrate Smart Search feature to new message format

976b519

Migrate profile feature to new message format

dc8c7b3

Migrate call assignment feature to new message format

c2b6a8d

Migrate campaign feature to new message format

c42b11a

Migrate journeys feature to new message format

d09d40f

richardolsson added 11 commits February 27, 2023 08:20

Migrate events feature to new message format

c932fb2

Create global messages that are always included with makeMessages()

caf25b8

Migrate some ZUI and profile strings to global scope

53da7ba

Migrate views feature to new message format

9bbcaf3

Migrate pages and layouts to new message format

b41f4a6

Merge branch 'main' into issue-998/restructure-localized-strings

1c0232c

Migrate work-in-progress surveys feature to new message format

ed01e37

Migrate remaining misc parts to new message format

6aa893e

Remove obsolete YAML files

b6a5f4f

Add support for server-side localized messages inspired by useMessages()

fc59579

Migrate server-side logic to new message format

9aea5d1

richardolsson commented Feb 27, 2023

View reviewed changes

richardolsson linked an issue Feb 27, 2023 that may be closed by this pull request

Restructure localized strings #998

Closed

2 tasks

richardolsson added 2 commits March 1, 2023 06:43

Create component for rendering smart search time frames

2867a6d

Refactor m() and im<Values>() as a single overloaded m()

4f61365

rebecarubio approved these changes Mar 2, 2023

View reviewed changes

richardolsson merged commit 11b600c into main Mar 2, 2023

richardolsson deleted the issue-998/restructure-localized-strings branch March 2, 2023 12:50

richardolsson mentioned this pull request Mar 21, 2023

Fix incorrect message ID in ConversationSettings #1114

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure localized strings (RFC) #1048

Restructure localized strings (RFC) #1048

richardolsson commented Feb 23, 2023 •

edited

richardolsson commented Feb 23, 2023 •

edited

richardolsson commented Feb 24, 2023

richardolsson commented Feb 27, 2023

richardolsson Feb 27, 2023

richardolsson Feb 27, 2023

richardolsson commented Mar 1, 2023

rebecarubio left a comment

Restructure localized strings (RFC) #1048

Restructure localized strings (RFC) #1048

Conversation

richardolsson commented Feb 23, 2023 • edited

Description

Some terminology

Benefits, in short

Defining messages (and default strings)

Type-safe interpolation values

The <Msg/> component

The useMessages() hook

Localizing (defining translations)

Loading translations

Tooling potential

Re: Dates, numbers etc

Screenshots

Changes

Notes to reviewer

Related issues

richardolsson commented Feb 23, 2023 • edited

richardolsson commented Feb 24, 2023

richardolsson commented Feb 27, 2023

richardolsson Feb 27, 2023

Choose a reason for hiding this comment

richardolsson Feb 27, 2023

Choose a reason for hiding this comment

richardolsson commented Mar 1, 2023

rebecarubio left a comment

Choose a reason for hiding this comment

richardolsson commented Feb 23, 2023 •

edited

The `<Msg/>` component

The `useMessages()` hook

richardolsson commented Feb 23, 2023 •

edited