Add message parse mode (code vs text) design doc #474
Nice start, thanks! Could you add examples of the current syntax so I can share it with a couple people? There are currently only examples of the alternative syntax.
@LeaVerou The Start in code, encapsulate text proposal is effectively our current syntax, amended by adding I have presented it here initially as an "alternative", so that its selection as our choice going forward may be made based on its merits, rather than pre-existing conditions.
@LeaVerou I would actually give us some time before pushing into this too far. A lot has happened in our F2F this week. I suspect that there will be a significant modification of both the syntax (for reasons unrelated to text mode as well as the discussion about authoring considerations)... say give it a week (i.e. by 2023-09-19)? I do thank you for sparking what was a simmering issue in our group. I think we'd like to queue up our thinking and some backing material so that it's accessible. We super value fresh eyes, since it is easy to get into groupthink.
exploration/0474-text-vs-code.md
Outdated
Limiting the range of characters that need to be escaped in plain text is important.
Following past precedent,
this design doc will only consider encapsulation styles which
start with `{` and end with `}`.
The "characters that need to be escaped in plain text" refers to the text portion of the syntax. We should be able to consider whatever natural-to-use characters make sense in the non-textual parts of the syntax, if they would represent a significant improvement in usability. That is, this constraint applies mainly to the pattern portion of the syntax and might not apply outside that.
Gotcha. FWIW our discussion sparked some ideas about new TAG design principles to guide the design of text-based syntaxes, which would hopefully be more broadly useful. I can ping you once there's more on that front, in case they also help with redoing this syntax.
I think we’ve identified that there are messages that ought to have a maximally simple representation (e.g. “Hello world”) and that beyond some increase in complexity a more complex structure is needed (e.g. “You have 3 messages”). I think we’re mostly agreed on the simple representation (e.g. As I see it, there are two relevant viewpoints from which to look at the complex message syntax:
Regarding the first viewpoint, I think that’s tantamount to asking if we’d like users to see MF2 as having one or more internal layers. Our current syntax has two such layers, where to start we’re explicitly in “code” and then we may enter “text”. This question is exacerbated by adopting “text” for simple messages; by dropping the
My preference would be to work towards making the syntax feel more like a template format, and to have “simple” and “text” conceptually similar, as we have with our current syntax. Are there some explicit benefits from having a simple vs. text separation that I’m not aware of, other than whitespace representation?
Regarding the second viewpoint, I don’t think I have sufficient data to understand how messages with intentional and necessary leading or trailing whitespace are formed. My presumption and our discussion yesterday suggests that in a majority of cases this is a bug that’s easy for developers to make, so trimming all of it could prevent a decent amount of surprisal. But when is that not the case? What do these messages look like, and most importantly, how are translators made aware that the spaces are required and should be retained? I think we need to answer these questions to be able to explain whatever format we end up with for complex messages.
@eemeli I think these are some key insights. Thanks for this. I think we made a choice (we can reconsider it if necessary, although I don't think we need to) that MF2 is not a resource format. I'm not sure if "templating language" is the right characterization, but let's go with it for now. The reason it isn't a resource format is the same reason that we have all-on-a-line as something we support. The resource format wraps around the MF syntax and, necessarily, embeds us. This is where many of the considerations surrounding escaping come from. We want users to be able to author and edit our format when it is hosted in a variety of tools/formats... and with the explicit recognition that they will edit messages in place using whatever passes for a text editor--or a resource editor that understands the wrapper format (and is trying to syntax highlight that, rather than trying to highlight our format nested inside it). This puts some extra pressure on us, witness
The thing I would want to convey here is that the world is a big place. There are many applications, runtime environments, UI frameworks, etc. Developers are trying to meet various different needs/demands at the same time and most have only marginal familiarity with I18N. We need to enable these folks to get work done because they are our primary customers--they will vote with their feet if we don't make their lives better and that is the main thing that will ensure the success or failure of MF2.
I am not saying we should bend over backwards to let developers write bad strings, notably "starts with space so I can concatenate". But there exist a variety of places where developers want to control whitespace and each is a special snowflake of developer need--some want tabs; some want some newlines; some want to mimic horizontal spacing; some are used for emphasis or to provide space for a visual element inserted as an overlay. But, frankly, these are all corner cases and the primary case is: I want to write an I18N bug. When evaluating an application, one of the first things I look for are resource strings that start with space, contain only a space, or contain only a period (or other punctuation)--these signal "string math".
So with regard to spacing, my line of questioning last night was intended to sound out how to enable those folks to get their job done with a minimum of fuss without over-encouraging the use of "the bug factory". My current thinking is that there are three proposals in play:
All textual space is meaningful
Any whitespace that appears outside "code mode" has meaning and must be preserved. That means, in our demo messages, that this message:
Produces the message:
This is unsurprising to a very rigorous developer, but probably a surprise to newcomers and translators. Note that most developers see this in the resource format, btw, as some flavor of:
"myHelloString" = "#local $foo = {42 :number}\nHello {$foo}"; // newline is text is slightly less surprising here
Exterior whitespace has no meaning
This means that space around the pattern string must be escaped to be part of the pattern. Using the same example as above produces:
To get a space or newline, it must be quoted onto the string.
Option A: Character quoting
Quote the individual characters you want:
Option B: Quote the pattern
I think I currently prefer Option 2B. This makes any leading whitespace explicitly part of the message when quoted and authors and translators don't have to do "special things" with the spaces. It keeps those spaces from being a "special literal thing" and maybe being a "part" when formatted to parts. This is actually the intention of the developer and translator--a "languageX" translator can remove or change the number of spaces or remove them all without doing anything outside of normal localization. I think that's a feature that's worth more than the implicit spanking for our customer the developer. (Tools should be encouraged to emit warnings about spaces inside the quotes, or in the case of Option 1, about all whitespace outside the quotes.)
Note that Option 2A or B works perfectly fine as a leading space free message:
Your second question was:
This is misleading as phrased? Leading/trailing whitespace that is part of the message is not special. They are just characters. The problem is identifying which characters are inside the pattern. In 2B, without quotes, there are never any whitespace characters in that position 🙈😈
I'd go further: we’re neither resource format nor templating language. Instead, I tend to think we’re a storage format for variants. That’s smaller scope than an entire resource and smaller scope than an entire template. We made that choice back when we decided to only allow top-level selection. Our logic happens "outside", while variants are "inside". The outside syntax should be friendly to developers and machines. The inside syntax should be friendly to translators. These are the two layers @eemeli mentions above; I think they are a feature. Furthermore, we don't really even have logic. Messages are not imperative templates with
This is why I'm hesitant to draw too much inspiration from existing templating languages. When I see things like the following Jinja:
{% for item in navigation %}
<li><a href="{{ item.href }}">{{ item.caption }}</a></li>
{% endfor %}
...I can see some parallels to what we're doing, but I'm also cautious of the differences. I'm rather happy with the current syntax in that it doesn't make code statements look like placeholders, because placeholders can be moved around, and text can move around them too, and we specifically don't want that for
Templating languages also typically either don't worry too much about whitespace (and delegate the problem to HTML), or need to introduce extra syntax to control it. For example, Jinja uses
I'm not opposed to introducing modality and starting in text, especially for single-variant messages without declarations. However, the more explicit we are about whitespace, the easier it will be, in my mind, to embed messages in code or container formats, because code and container formats most likely will already have opinions about whitespace. This is indeed similar to escaping.
... not expecting us to adopt it, but we need to make progress in deciding the specific issues here.
... which is perhaps indicative of an answer to one of the questions about double-bracketing `match`...
exploration/0474-text-vs-code.md
Outdated
{match {$count :number integer=true}}
{when 0} Hello {$user}. Today is {$now} and you have no geese.
{when one} Hello {$user}. Today is {$now} and you have {$count} goose.
{when few} { Hello {$user}, this message has spaces on the front and end. }
How would this work, parsing-wise?
{when few} { Hello {$user}, this message has spaces on the front and end. }
^
how does a parser know this isn't a placeholder's open brace?
I think it may be a good idea to consider using double braces for certain features (e.g. for placeholders, or as pattern delimiters). Alternatively, we may want to revisit the idea of using double sigils for different meanings, e.g. `{%`, `{[`, and `{{`. See #269.
Doubling will make the syntax harder to use/write?
In the above, the whitespace is consumed until you see the `{` (or any text). The parser knows this isn't a placeholder's opening brace only by scanning ahead. In this message, the embedded `{$user` is what resolves it. It is possible that one could reach the closing bracket in some messages and that would be the resolution.
The question here is whether we favor this syntax for its usability more than efficiency in parsing.
In my opinion, the real problem here is the `when` clause. Not only is it visually hard to distinguish but it makes `{`/`}` ambiguous (the brackets can be any of three different things). Consider instead:
#match {$count :number integer=true}
[when 0] This has no spaces.
[when one] This has no spaces.
[when few] { This has spaces quoted }
[when *] {| |} This has spaces quoted on the front only as placeholder literal.
And it writes single line as:
#match {$count :number integer=true}[when 0]This has no spaces.[when one] No sp...
...aces [when few]{ Quoted spaces }[when *]{| |} Quoted spaces
The above still has some forward looking ambiguity:
{ $user :foo} // not resolved till you see $
{ foo :foo } // not resolved till you see the `:`
{ foo {$bar}} // not resolved till you see the 2nd `{`
{ foo foo } // not resolved till you see the closing `}`
So you're right: the parser is more of an adventure once the pattern can be unquoted.
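To make the look-ahead cost concrete, here is a minimal Python sketch of how a parser might classify an opening `{` under the unquoted-pattern idea above. The function name `classify_brace` and the decision rules are assumptions made for illustration only; they are not taken from any MF2 grammar or implementation, escapes are ignored, and a bare word followed by nothing else is arbitrarily treated as pattern text.

```python
def classify_brace(src: str, start: int = 0) -> str:
    """Guess whether the '{' at src[start] opens a placeholder or a quoted pattern.

    Hypothetical rules, for illustration only:
      - '$', ':' or '|' right after optional whitespace -> placeholder
      - a bare word followed by ':'                     -> placeholder
      - anything else                                   -> quoted pattern
    """
    if src[start] != "{":
        raise ValueError("expected '{' at the given position")
    i = start + 1
    # Skip whitespace after the opening brace.
    while i < len(src) and src[i].isspace():
        i += 1
    if i < len(src) and src[i] in "$:|":
        return "placeholder"
    # Skip one bare word; it could be a literal operand or ordinary text.
    while i < len(src) and not src[i].isspace() and src[i] not in "{}:":
        i += 1
    # Skip whitespace after the word, then look at what follows it.
    while i < len(src) and src[i].isspace():
        i += 1
    if i < len(src) and src[i] == ":":
        return "placeholder"
    return "pattern"


for sample in ["{ $user :foo}", "{ foo :foo }", "{ foo {$bar}}", "{ foo foo }"]:
    print(sample, "->", classify_brace(sample))
```

Even this toy version has to scan past whitespace and a whole word before it can decide, which is the unbounded look-ahead being discussed.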
One option we discussed is that the pattern must still be quoted once we are in code mode:
Hello world
Hello {$user}
{input $user}
{Hello {$user}} // because code mode
exploration/0474-text-vs-code.md
Outdated
{match {$count :number integer=true}}
{when 0} Hello {$user}. Today is {$now} and you have no geese.
{when one} Hello {$user}. Today is {$now} and you have {$count} goose.
The `{when one} Hello {$user}` part makes me really nervous: it looks like two placeholders on both sides of `Hello`. Furthermore, I think the fact that the space left of `Hello` will be trimmed but the one on the right will not is a footgun. I think this is the main reason why I've been opposed to trimming (although I understand which problems it addresses).
Could we consider using a different set of brackets for statements? For example, if we made `#` special in patterns, too, we could consider something like the following:
#[one] Hello {$user}.
Does this look like less of a footgun now when it comes to the rules about which space will be trimmed and which one won't?
Interestingly, there's something about square brackets that makes me not actually mind the following:
#[one]Hello {$user}.
See above 😸
I kind of like this proposal.
#match {$expr} {$expr}
#[key key] Hello {$user}
#[key * ] {$user} hello
#[* * ] { Quoted pattern }
Or:
#match {$expr} {$expr}#[key key] Hello {$user}#[key * ] {$user} hello#[* * ] { Quoted pattern }
Past feedback from CLDR in particular has been very critical about reserving characters other than `\{}` in text mode. However, a two-character sequence like `#[` would probably be much more acceptable, given how rare it is in actual message contents.
If we do have a real concern about an all-code-delimited solution looking like the "statement" parts may look too "placeholder", that should be added to the doc.
In the continuing absence of any exemplars of non-i18n-buggy leading or trailing whitespace, that may end up as a sole reason to prefer quoting entire patterns.
One quotes the pattern because one is doing something weird inside the pattern. I'm way way way more cautious about quoting sub-patterns, because those feel more like I18N bugs to me:
This is{| |}{$foo}{| |}an I18N bug waiting to happen.
{This is }{$bar}{ another I18N bug waiting to happen.}
{This} {is} {just} {silly} {😸}
Interestingly, this past week I was consulting with some folks and saw a string like "\n\n{$placeholder}", which was being used to make an on-screen list (in a `for` loop). That's not really an I18N bug, although it's not normal either. (And actually what it was was a hardcoded string "\n\n" followed by a `+ someVar` that I made them move to externalized with a formatter...)
As I noted in Seville, there are many special-snowflake cases for exterior whitespace. They are not "compelling" use cases and yes one could code around them. If the proposal is to remove quoted patterns, I suppose I could get behind that...
However, a two-character sequence like `#[` would probably be much more acceptable, given how rare it is in actual message contents.
With the warning that `#` is a comment in some of the container formats (for example Java properties and gettext .po files)
Overall, I see potential in the `#[...]` syntax for code (statements). A lot of languages allow these extra directives for code, often called attributes: C++ uses `[[foo]]`, C# uses `[foo]`, Rust uses `#[foo]`.
Interestingly, this past week I was consulting with some folks and saw a string like "\n\n{$placeholder}", which was being used to make an on-screen list (in a `for` loop). That's not really an I18N bug, although it's not normal either. (And actually what it was was a hardcoded string "\n\n" followed by a `+ someVar` that I made them move to externalized with a formatter...)
This appears to be a non-locale-dependent use case of leading whitespace, where the `\n\n` is effectively used as markup, yes? So if (for whatever reason) this message needed to be expressed entirely within MF2, would it make sense to expect this to be represented as `{|\n\n|}`, where the `\n` represent actual newline characters?
One comparable and valid locale-dependent case that I can imagine existing is sentence concatenation in a context that needs to account for both CJK and non-CJK scripts. Similarly to the `\n\n`, I could imagine a space after a period to be included as a leading or trailing space in a single-sentence message for the non-CJK scripts, rather than being handled in code depending on the locale's script.
Are there any other locale-dependent uses of leading or trailing whitespace that we ought to consider? And is the case I represent above actual, or purely hypothetical? As in, could someone here state that they do have messages like this in their corpus? And if they do, how do they communicate to their translators that in these particular cases, the whitespace should be removed in CJK locales, while something like the `\n\n` above should be left alone?
Are there any other locale-dependent uses of leading or trailing whitespace that we ought to consider? And is the case I represent above actual, or purely hypothetical?
There are lots of locale-dependent cases too. I cited a random example that came my way in the past week because I don't have the luxury of grepping an employer's vast collection of strings at the moment. I'm hopeful someone else can do that, but I don't expect to learn anything from it.
how do they communicate to their translators that in these particular cases, the whitespace should be removed in CJK locales, while something like the \n\n above should be left alone?
"Carefully." I have seen comments in resource files, comments in translation kits, comments in tooling. I also note that most localization engineering shops maintain tools for checking that target language strings match source strings in terms of start/end spacing, punctuation, and placeables. Some languages (such as CJK) produce a lot of noise or the need for message suppression or tuning in these cases.
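As a rough illustration of the kind of source/target check described above, the following Python sketch compares leading and trailing whitespace between a source string and its translation. The helper names and sample strings are hypothetical, and real localization QA tools are considerably more elaborate.

```python
def exterior_whitespace(text: str) -> tuple[str, str]:
    """Return the (leading, trailing) whitespace runs of a string."""
    leading = text[: len(text) - len(text.lstrip())]
    trailing = text[len(text.rstrip()):]
    return leading, trailing


def whitespace_matches(source: str, target: str) -> bool:
    """True when the translation keeps the source's exterior whitespace."""
    return exterior_whitespace(source) == exterior_whitespace(target)


# A kept leading space passes the check.
print(whitespace_matches(" Hello", " Bonjour"))
# A CJK translation that correctly drops a trailing space is still flagged,
# which is the sort of noise that then needs suppression or tuning.
print(whitespace_matches("Done. ", "完了。"))
```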
I don't have a problem with users putting `{| \n \t |}` onto the front of a pattern as a quoted blob to preserve across translations. But you appear to be building towards the suggestion of only allowing that case. I think that format is inconvenient to write and runs up against languages that want other behavior--and also that this isn't really a problem for most translation processes (where the whitespace is already treated as meaningful)
Either way we have to describe how exterior whitespace is handled in patterns. So it doesn't really help me, individually, decide how to choose between auto-trimmed vs. non-trimmed unquoted patterns. I think the thing I'm trying to puzzle out for myself is: which is the most natural and least surprising representation of a pattern?
when [*]No whitespace is no whitespace.
when [*] Whitespace is trimmed.
when [*]
All whitespace is trimmed.
when [*] Whitespace is meaningful, so there is a space before this string.
when [*]\n All whitespace is meaningful. // read \n as newline
when [*]
{
Unquoted whitespace is trimmed, but this message has newline and spaces around it
}
when [*]{\n The same message normalized to a single line with\n }
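The two readings of these examples can be compared with a small Python sketch. It assumes, purely for illustration, that a pattern counts as "quoted" when its first `{` is closed by the very last character and the content does not begin with `$`, `:` or `|`; the function names are hypothetical, escapes are ignored, and none of this reflects a settled MF2 rule.

```python
def matching_close(text: str, open_idx: int) -> int:
    """Index of the '}' that closes the '{' at open_idx, or -1 if unbalanced."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return -1


def pattern_text(raw: str, trim_unquoted: bool = True) -> str:
    """Resolve the text following a variant key into the pattern's content."""
    if not trim_unquoted:
        # "All whitespace is meaningful": keep the raw text untouched.
        return raw
    body = raw.strip()
    quoted = (
        body.startswith("{")
        and matching_close(body, 0) == len(body) - 1
        and body[1:2] not in ("$", ":", "|")
    )
    # A quoted pattern keeps whatever whitespace sits inside its braces.
    return body[1:-1] if quoted else body


print(repr(pattern_text(" Whitespace is trimmed.")))
print(repr(pattern_text(" Whitespace is meaningful.", trim_unquoted=False)))
print(repr(pattern_text("\n{\n  Quoted, so the newlines stay\n}\n")))
```

Note that the heuristic for "quoted" is itself the ambiguity under discussion: a pattern consisting of a single placeholder looks the same as a quoted pattern until its content is inspected.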
There are lots of locale-dependent cases too. [...] I don't have the luxury of grepping an employer's vast collection of strings at the moment. I'm hopeful someone else can do that, but I don't expect to learn anything from it.
I really hope someone can do that. I have heard now on numerous occasions that there are many locale-dependent uses for leading or trailing whitespace, but this specific one -- how CJK scripts do not use spaces between clauses -- is literally the only one I am aware of.
If we end up making a specific accommodation for this in MF2, this argument needs to be really well made, and presented in this design doc. I am not the right person for doing that, and so I continue to ask others to help here.
[...] Some languages (such as CJK) produce a lot of noise or the need for message suppression or tuning in these cases.
Could we improve that experience? Rather than perpetuating a sub-optimal CJK translation experience, could we somehow explicitly differentiate localizable leading/trailing whitespace from markup whitespace? If those spaces needed to be explicit, and we did adopt expression attributes, that could be done with
{| | @translate}
I don't have a problem with users putting `{| \n \t |}` onto the front of a pattern as a quoted blob to preserve across translations. But you appear to be building towards the suggestion of only allowing that case. I think that format is inconvenient to write and runs up against languages that want other behavior--and also that this isn't really a problem for most translation processes (where the whitespace is already treated as meaningful)
My claim is that almost all leading & trailing whitespace is not really localizable content, and by default should not be in messages. I do still want to allow for the possibility of including such whitespace, and having a way of making it clear to both humans and tooling when such whitespace is markup, and when it is localizable.
We need to identify and enumerate the explicit use cases for leading/trailing whitespace, and to make our syntax choices based on that. Thus far, we have not done so; we've just accepted the assertion that leading/trailing whitespace must be accommodated for ergonomically. Once we do have such a list written down somewhere, we may use that to direct our choice, e.g. within the context of the #485 beauty contest.
And yes, until convinced otherwise, my current position is indeed that we should require leading/trailing whitespace to be explicitly quoted, because that's the only way to differentiate localizable and non-localizable whitespace. To make me change my mind, I can imagine at least the following categories of arguments that could be made:
- Locale-dependence. Show me that there is a great range of different types of localizable dynamic message strings where leading/trailing whitespace needs to be handled differently in different locales, such that my base assumption about most such cases being "markup" is invalid, and that either both are as common, or that it's more common for there to be a locale dependency.
- Better syntax. Show me syntax which will lead developers to leaving more non-localizable whitespace out of messages and/or communicating better to translators the localizability of whitespace.
- Numbers. Look into the data, i.e. the corpus of localizable dynamic message strings that you have access to, and tell me that such a large fraction of them include valid, localizable leading/trailing whitespace, that special accommodation must be made for them in the syntax. Extra credit if you can further say how common this is in multi-variant messages. I don't need to see your messages, I'm interested in the frequency, and a data-driven argument.
My claim is that almost all leading & trailing whitespace is not really localizable content, and by default should not be in messages. I do still want to allow for the possibility of including such whitespace, and having a way of making it clear to both humans and tooling when such whitespace is markup, and when it is localizable.
I think your claim is reasonable. But I would also observe that this is less true for desktop and CLI applications. I would also note that trailing whitespace is probably as important as leading whitespace.
Overall, I think my reaction on this thread is to look to overall syntax first and whitespace handling as a downstream consideration. A number of syntax options end up with pattern quoting that may make this discussion moot. And, as noted above, each of the whitespace handling options represents a different compromise, depending on who is looking and what the use case is.
Following yesterday's call, I've updated this PR. It's no longer considering any explicit syntax, but more precisely the "parse mode" question that this is nominally about, approaching this from two related axes:
On the trimming, I've split out three choices from the previous design and propose one as a part of the solution here. @aphillips With the specific syntax questions taken out, I think the remainder here could be used as a basis for our next-step discussions. In addition, of course, to whatever changes @echeran and @mihnita may be proposing to our pattern-exterior whitespace consensus.
I think a key insight that this version currently hides is this:
The whitespace handling is not about the message as a whole. It is about identifying the boundary between the pattern and code.
In a couple of places the options given here talk about, e.g., whitespace between declarations--which no one in this group would expect to be part of a pattern (I think?).
By recasting this as pattern/code boundary handling I think we could make it clearer. That would make options look more like:
(assumption: simple patterns are not trimmed or are a separate debate; quoted variant patterns are not trimmed)
- (old syntax) Start in code, all patterns are quoted.
- (Implement text-mode-first syntax #500 syntax) Start in text, code is quoted, all variant patterns are quoted.
- Start in text, code is quoted, unquoted variant patterns are trimmed.
- Same as 2 except uses @eemeli's "minimal trimming" of unquoted variant patterns
- Start in text, code is quoted, unquoted variant patterns are not trimmed.
## Objective

Decide whether text patterns or code statements should be enclosed in MF2.
I don't think this is that clear.
-Decide whether text patterns or code statements should be enclosed in MF2.
+Decide how to segregate and identify between _pattern_ text and code statements in MF2.
+This includes whether parsing a message expects to start with _pattern_ text or with code statements.
The "how" part would be an extension of what the design doc is currently doing. Is that intentional?
You're right, but this isn't about "whether text patterns or code statements should be enclosed" but rather about which should be enclosed and when. And the design decision is really about the general syntax (text-vs-code and trimmed-vs-untrimmed-vs-quoted)
Expressing the trimming on patterns rather than statements
means that leading and trailing spaces are also trimmed from simple messages.
This option is not chosen due to this being somewhat surprising,
especially when messages are embedded in host formats that have predefined means
of escaping and/or trimming leading and trailing spaces from a value.
Trimming simple patterns like this is a bridge too far for me. It should be a separate decision for the "trim XXX" options whether they are trimmed. I can make a plausible argument for why simple patterns should behave differently when trimmed than variant patterns do.
This option is not chosen due to adding an excessive
quoting burden on all messages.
I think we should not include these "This option is not chosen due..." paragraphs. I think it is okay to call out objective or subjective reasons for why we might not choose a given alternative, e.g.
-This option is not chosen due to adding an excessive
-quoting burden on all messages.
+- This option makes plain text strings invalid as messages.
+- This option requires additional quoting for simple messages.
Our choice section should deal with the logic of why a given option was chosen.
Could you clarify what you mean by "choice section"? I'm not sure that I understand what that is.
Co-authored-by: Addison Phillips <addison@unicode.org>
Trim whitespace between and around statements such as `input` and `when`,
but do not otherwise trim any leading or trailing whitespace from a message.
This allows for whitespace such as spaces and newlines to be used outside patterns
to make a message more readable.
I'm concerned about this resulting in surprising misbehavior... consider a developer making the following change, in which removing all inputs from a message that still starts and ends with a line feed would not be expected to affect whitespace:
logOutMessage = ```
-{%input username}
-
-Log out {$username}?
+Log out?
```;
Ah, fair point. The issue here is that the above edit would add a new line to the message's start, yes? One way to avoid this would be to require a message with statements to not have leading whitespace. That might be a reasonable restriction.
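The surprise can be reproduced with a short Python sketch of the "trim only around statements" rule. The regular expression and the function name are assumptions made for this illustration: only `{%input ...}` statements at the start of a message are recognized, which is far narrower than any real grammar.

```python
import re

# Hypothetical statement shape: optional whitespace, '{%input name}', optional whitespace.
STATEMENT = re.compile(r"\s*\{%input\s+[^{}]*\}\s*")


def pattern_after_minimal_trim(message: str) -> str:
    """Drop leading statements plus the whitespace around them; keep everything else."""
    pos = 0
    while (match := STATEMENT.match(message, pos)) is not None:
        pos = match.end()
    return message[pos:]


before = "\n{%input username}\n\nLog out {$username}?\n"
after = "\nLog out?\n"

print(repr(pattern_after_minimal_trim(before)))
print(repr(pattern_after_minimal_trim(after)))
```

With the statement present, the whitespace around it is consumed and the pattern starts at `Log out`. Once the statement is removed, nothing triggers trimming, so the edited message begins with a newline it did not have before.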
I agree with @gibson042's illustration being informative. I would have guessed (if I didn't know anything about MF2) that the "before" of `logOutMessage` included a newline. With some of the options here, the newline after `{%input $username}` is consumed, but not the blank line after that. In other options both newlines are consumed unless the user specifically quoted the pattern or whitespace.
Also, note that many formats would encode the example as:
// using our current syntax but without quoting the pattern
var logOutMessage = "{{{#input $username}\n\nLog out {$username}?}}"
The way I see it, it's about enabling the same sort of thing we do elsewhere in the syntax with e.g.
The spaces there are not required by the syntax, but they make that line much more readable.
That's needed in the "trim minimally" option to clarify its corner cases. I do not think we should choose such an approach.
I rather hope that we could establish that here, or at least explicitly put it down in writing.
I agree that there are multiple ways of looking at what we're doing. I think different choices on trimming lead to different points of view being more or less appropriate, such that no one viewpoint is optimal for all possible solutions. If you've specific suggestions for the alternatives presented here, those might be easier to assess individually. Note also that I've tried to not say here how code ought to be "encapsulated"; that I think can be discussed as a separate dimension: Do we use wrapping {quotes} or a starting %sigil with an implicit end? Is the encapsulation around one or multiple statements? These questions hopefully don't need to be mingled into this discussion.
Submitting this initially as a draft, as it does not yet propose a solution, only alternatives.
The intent here is to document the choice we make, and to provide the basis for explaining it to others. This is also in response to comments made by @LeaVerou in discussions at W3C TPAC.
The use of the terms "most", "many", "sometimes", "some", and "rarely" in the use cases is intentional, and draws on my experience with localizable messages. If it's useful, I can dig up actual statistics from the corpus of Fluent messages at Mozilla, which as a format is relatively close to MF2. Are there other sets of comparable existing messages from which we could get any such data?