Merged
6 changes: 3 additions & 3 deletions spec/formatting.md
@@ -345,7 +345,7 @@ These are divided into the following categories:
does not provide for the function `:func` to be successfully resolved:

```
{The value is {|horse| :func}.}
{The value is {horse :func}.}
```

```
@@ -390,7 +390,7 @@ These are divided into the following categories:
an option `field` to be provided with a string value,

```
{Hello, {|horse| :get field=name}!}
{Hello, {horse :get field=name}!}
```

```
@@ -448,7 +448,7 @@ Between the brackets, the following contents are used:
followed by the value of the Literal,
and then by U+007C VERTICAL LINE `|`

Examples: `{|horse|}`, `{|42|}`
Examples: `{|your horse|}`, `{|42|}`

- Expression with Variable Operand: U+0024 DOLLAR SIGN `$`
followed by the Variable Name of the Operand
35 changes: 21 additions & 14 deletions spec/message.abnf
@@ -7,12 +7,15 @@ body = pattern
pattern = "{" *(text / expression) "}"
selectors = match 1*([s] expression)
variant = when 1*(s key) [s] pattern
key = nmtoken / literal / "*"
key = literal / "*"

expression = "{" [s] (((literal / variable) [s annotation]) / annotation) [s] "}"
annotation = (function *(s option)) / reserved

option = name [s] "=" [s] (literal / nmtoken / variable)
literal = quoted / unquoted
variable = "$" name
function = (":" / "+" / "-") name
option = name [s] "=" [s] (literal / variable)

; reserved keywords are always lowercase
let = %x6C.65.74 ; "let"
@@ -26,14 +29,17 @@ text-char = %x0-5B ; omit \
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF

literal = "|" *(literal-char / literal-escape) "|"
literal-char = %x0-5B ; omit \
/ %x5D-7B ; omit |
/ %x7D-D7FF ; omit surrogates
/ %xE000-10FFFF
quoted = "|" *(quoted-char / quoted-escape) "|"
quoted-char = %x0-5B ; omit \
/ %x5D-7B ; omit |
/ %x7D-D7FF ; omit surrogates
/ %xE000-10FFFF

variable = "$" name
function = (":" / "+" / "-") name
; based on https://www.w3.org/TR/xml/#NT-Nmtoken,
; but cannot start with U+002D HYPHEN-MINUS or U+003A COLON ":"
unquoted = unquoted-start *name-char
unquoted-start = name-start / DIGIT / "."
/ %xB7 / %x300-36F / %x203F-2040

; reserve additional sigils for future use
reserved = reserved-start reserved-body
@@ -47,18 +53,19 @@ reserved-char = %x00-08 ; omit HTAB and LF
/ %x7E-D7FF ; omit surrogates
/ %xE000-10FFFF

name = name-start *name-char ; based on https://www.w3.org/TR/xml/#NT-Name, but cannot start with U+003A COLON ":"
nmtoken = 1*name-char ; equal to https://www.w3.org/TR/xml/#NT-Nmtoken
; based on https://www.w3.org/TR/xml/#NT-Name,
; but cannot start with U+003A COLON ":"
name = name-start *name-char
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
/ %x370-37D / %x37F-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "." / ":"
/ %xB7 / %x0300-036F / %x203F-2040
name-char = name-start / DIGIT / "-" / "." / ":"
Review comment (Collaborator):

I think that names should be more like variable names in programming languages.
So we should not allow `-` and `:`, maybe even `.`.
If we allow `.` then we can/should say what it means (if it means something).
Maybe something like a "namespace"?

/ %xB7 / %x300-36F / %x203F-2040

text-escape = backslash ( backslash / "{" / "}" )
literal-escape = backslash ( backslash / "|" )
quoted-escape = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"

90 changes: 54 additions & 36 deletions spec/syntax.md
@@ -22,7 +22,8 @@
1. [Reserved Sequences](#reserved)
1. [Tokens](#tokens)
1. [Keywords](#keywords)
1. [Text and Literals](#text-and-literals)
1. [Text](#text)
1. [Literals](#literals)
1. [Names](#names)
1. [Escape Sequences](#escape-sequences)
1. [Whitespace](#whitespace)
@@ -307,7 +308,7 @@ The key `*` is a "catch-all" key, matching all selector values.

```abnf
variant = when 1*(s key) [s] pattern
key = nmtoken / literal / "*"
key = literal / "*"
```

A _well-formed_ message is considered _valid_ if the following requirements are satisfied:
@@ -357,23 +358,31 @@ _Functions_ do not accept any positional arguments other than the _literal_ or _
```abnf
expression = "{" [s] (((literal / variable) [s annotation]) / annotation) [s] "}"
annotation = (function *(s option)) / reserved
option = name [s] "=" [s] (literal / nmtoken / variable)
option = name [s] "=" [s] (literal / variable)
```

Expression examples:

```
{|1.23|}
{1.23}
```

```
{|1.23| :number maxFractionDigits=1}
{|-1.23|}
```

```
{1.23 :number maxFractionDigits=1}
```

```
{|Thu Jan 01 1970 14:37:00 GMT+0100 (CET)| :datetime weekday=long}
```

```
{|My Brand Name| :linkify href=|foobar.com|}
Review comment (Collaborator):

Maybe not a good example?
This makes it non-localizable.

```

```
{$when :datetime month=2-digit}
```
@@ -410,22 +419,20 @@ The grammar defines the following tokens for the purpose of the lexical analysis
### Keywords

The following three keywords are reserved: `let`, `match`, and `when`.
Reserved keywords are always lowercase.

```abnf
; reserved keywords are always lowercase
let = %x6C.65.74 ; "let"
match = %x6D.61.74.63.68 ; "match"
when = %x77.68.65.6E ; "when"
```

### Text and Literals
### Text

_Text_ is the translatable content of a _pattern_, and _Literal_ is used for matching
variants and providing input to expressions.
Any Unicode code point is allowed in either, with the exception of
the relevant delimiters (`{` and `}` for Text, `|` for Literal),
`\` (which starts an escape sequence), and
surrogate code points U+D800 through U+DFFF (which cannot be encoded into UTF-8).
_Text_ is the translatable content of a _pattern_.
Any Unicode code point is allowed,
except for surrogate code points U+D800 through U+DFFF.
The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}`.

All code points are preserved.
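
The escaping rule for _Text_ stated above can be illustrated with a minimal Python sketch. The helper name `escape_text` is hypothetical and not part of the spec or of any implementation; the sketch only assumes the rule as written here: `\`, `{`, and `}` are escaped, and surrogate code points are rejected.

```python
def escape_text(value: str) -> str:
    """Escape a raw string for use as Text inside a pattern (illustrative only)."""
    # Surrogate code points U+D800 through U+DFFF are not allowed in Text.
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in value):
        raise ValueError("surrogate code points are not allowed in text")
    # Escape the backslash first, so that the backslashes introduced
    # for the braces are not escaped a second time.
    return (
        value.replace("\\", "\\\\")
        .replace("{", "\\{")
        .replace("}", "\\}")
    )


if __name__ == "__main__":
    # Wrap the escaped text in braces to form a simple pattern.
    print("{" + escape_text("Use {braces} and \\ freely") + "}")
```

All other code points pass through unchanged, which matches the statement that all code points are preserved.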

@@ -438,55 +445,66 @@ text-char = %x0-5B ; omit \
/ %xE000-10FFFF
```

### Literals

_Literal_ is used for matching variants and providing input to _expressions_.
_Quoted_ literals may include content with any Unicode code point,
except for surrogate code points U+D800 through U+DFFF.
The characters `\` and `|` MUST be escaped as `\\` and `\|`.
_Unquoted_ literals have a much more restricted range that
is intentionally close to XML's [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken),
with the restriction that it MUST NOT start with `-` or `:`,
as those would conflict with _function_ start characters.
Comment on lines +456 to +457
Review comment (Collaborator):

These two lines don't feel normative the way the rest of this passage does. Perhaps remove them?

Review comment (Collaborator, PR author):

I find the comparison of unquoted and name to their XML counterparts potentially useful, and this seems like a decent way of expressing that. My opinions here are not too strong though, so happy to take input from others.

Review comment (Member):

This isn't quite how I would approach this. I think it needs more explanation about literals in general and then about the unquoted ones. The relationship to Nmtoken (which we broke on purpose) isn't that relevant any more. Perhaps:

Suggested change. Original lines:

with the restriction that it MUST NOT start with `-` or `:`,
as those would conflict with _function_ start characters.

Proposed replacement:

_Literal_ values are used to pass data to various parts of a `message`:
* As the value of a `key` in a `when` statement
* As the `argument` in an `expression`
* As the `value` in an `option`

A `Literal` is a sequence of _Unicode code points_ and can include any Unicode character. Surrogate code points are not allowed.
The characters `\` U+005C REVERSE SOLIDUS and `|` U+007C VERTICAL LINE **_must_** be escaped (as `\\` and `\|` respectively) when they appear in the value of a `Literal`.
Spaces are significant in a `Literal`.
A `Quoted` literal is surrounded by `|` characters.
A `Literal` can be `Unquoted` when its content matches that production. The content restrictions for `Unquoted` follow best practices for the use of Unicode in formal grammars and are intentionally similar to, for example, XML's [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken).

Review comment (Member):

> A Literal is a sequence of Unicode code points and can include any Unicode character. Surrogate code points are not allowed.

Should be:

> A Literal is a sequence of Unicode code points, and can contain any Unicode code points except for surrogate code points and non-character code points.

Reason: "Unicode character" would mean "assigned Unicode character", which is unnecessarily fragile across versions.

Review comment (Member):

Actually, non-characters (U+FFFF for example) are not excluded. Only surrogate code points are. This is consistent with e.g. DOMString and USVString.

Review comment (Member):

That's fine


All code points are preserved.

```abnf
literal = "|" *(literal-char / literal-escape) "|"
literal-char = %x0-5B ; omit \
/ %x5D-7B ; omit |
/ %x7D-D7FF ; omit surrogates
/ %xE000-10FFFF
literal = quoted / unquoted

quoted = "|" *(quoted-char / quoted-escape) "|"
quoted-char = %x0-5B ; omit \
/ %x5D-7B ; omit |
/ %x7D-D7FF ; omit surrogates
/ %xE000-10FFFF

unquoted = unquoted-start *name-char
unquoted-start = name-start / DIGIT / "."
/ %xB7 / %x300-36F / %x203F-2040
```
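
As an illustration of the `literal = quoted / unquoted` split in the grammar above, here is a minimal Python sketch that classifies a source string as a quoted literal, an unquoted literal, or neither. The function name `classify_literal` and the regular expressions are assumptions made for this example: they transcribe the ABNF ranges by hand and are not taken from the spec or from any implementation.

```python
import re

# name-start ranges, transcribed by hand from the ABNF (assumption: no typos).
NAME_START = (
    r"A-Za-z_"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Additional characters allowed at the start of an unquoted literal.
UNQUOTED_EXTRA = r"0-9.\u00B7\u0300-\u036F\u203F-\u2040"
UNQUOTED_START = NAME_START + UNQUOTED_EXTRA
# name-char additionally allows ":" and "-" after the first character.
NAME_CHAR = UNQUOTED_START + r":\-"

UNQUOTED_RE = re.compile(f"[{UNQUOTED_START}][{NAME_CHAR}]*")
# quoted = "|" *(quoted-char / quoted-escape) "|"
QUOTED_RE = re.compile(
    r"\|(?:[\x00-\x5B\x5D-\x7B\x7D-\uD7FF\uE000-\U0010FFFF]|\\[\\|])*\|"
)


def classify_literal(source: str):
    """Return "unquoted", "quoted", or None if source is not a literal."""
    if UNQUOTED_RE.fullmatch(source):
        return "unquoted"
    if QUOTED_RE.fullmatch(source):
        return "quoted"
    return None


if __name__ == "__main__":
    for sample in ["1.23", "-1.23", "|-1.23|", "2-digit", "|your horse|"]:
        print(repr(sample), classify_literal(sample))
```

Under these assumptions `1.23` and `2-digit` are unquoted literals, while `-1.23` and `your horse` need the `|...|` form, which is consistent with the updated expression examples in this diff.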

### Names

The _name_ token is used for variable names (prefixed with `$`),
function names (prefixed with `:`, `+` or `-`),
as well as option names.
A name MUST NOT start with an ASCII digit or with certain basic combining characters.
It is based on XML's [Name](https://www.w3.org/TR/xml/#NT-Name),
with the restriction that it MUST NOT start with `:`,
as that would conflict with _function_ start characters.
Otherwise, the set of characters allowed in names is large.

The _nmtoken_ token doesn't have _name_'s restriction on the first character
and is used as variant keys and option values.

_Note:_ _nmtoken_ is intentionally defined to be the same as XML's [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken)
in order to increase the interoperability with data defined in XML.
In particular, the grammatical data [specified in LDML](https://unicode.org/reports/tr35/tr35-general.html#Grammatical_Features)
and [defined in CLDR](https://unicode-org.github.io/cldr-staging/charts/latest/grammar/index.html)
uses Nmtokens.

```abnf
variable = "$" name
function = (":" / "+" / "-") name
```

```abnf
name = name-start *name-char ; based on https://www.w3.org/TR/xml/#NT-Name, but cannot start with U+003A COLON ":"
nmtoken = 1*name-char ; equal to https://www.w3.org/TR/xml/#NT-Nmtoken
name = name-start *name-char
name-start = ALPHA / "_"
/ %xC0-D6 / %xD8-F6 / %xF8-2FF
/ %x370-37D / %x37F-1FFF / %x200C-200D
/ %x2070-218F / %x2C00-2FEF / %x3001-D7FF
/ %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
name-char = name-start / DIGIT / "-" / "." / ":"
/ %xB7 / %x0300-036F / %x203F-2040
name-char = name-start / DIGIT / "-" / "." / ":"
/ %xB7 / %x300-36F / %x203F-2040
```

### Escape Sequences

Escape sequences are introduced by the backslash character (`\`) and allow the appearance of lexically meaningful characters in the body of `text`, `literal`, or `reserved` sequences respectively:
Escape sequences are introduced by U+005C REVERSE SOLIDUS `\`
and allow the appearance of lexically meaningful characters
in the body of `text`, `quoted`, or `reserved` sequences respectively:

```abnf
text-escape = backslash ( backslash / "{" / "}" )
literal-escape = backslash ( backslash / "|" )
quoted-escape = backslash ( backslash / "|" )
reserved-escape = backslash ( backslash / "{" / "|" / "}" )
backslash = %x5C ; U+005C REVERSE SOLIDUS "\"
```
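
To make the `quoted-escape` rule concrete, the following Python sketch round-trips a value through the quoted form. The helper names `escape_quoted` and `unescape_quoted` are hypothetical; the sketch assumes only what the grammar above states, namely that `\\` and `\|` are the sole escapes inside a quoted literal and that a bare `|` ends it.

```python
def escape_quoted(value: str) -> str:
    """Serialize a value as a quoted literal, e.g. a|b -> |a\\|b| (illustrative)."""
    # Escape the backslash first so the escapes added for "|" stay intact.
    return "|" + value.replace("\\", "\\\\").replace("|", "\\|") + "|"


def unescape_quoted(source: str) -> str:
    """Recover the value of a quoted literal, rejecting malformed input."""
    if len(source) < 2 or source[0] != "|" or source[-1] != "|":
        raise ValueError("quoted literal must be delimited by '|'")
    body = source[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Only backslash and "|" may follow a backslash.
            if i + 1 >= len(body) or body[i + 1] not in "\\|":
                raise ValueError("invalid escape sequence in quoted literal")
            out.append(body[i + 1])
            i += 2
        elif ch == "|":
            raise ValueError("unescaped '|' inside quoted literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    quoted = escape_quoted("a|b\\c")
    print(quoted, "->", unescape_quoted(quoted))
```

The same shape would apply to `text-escape` and `reserved-escape`, with the set of escapable characters swapped for the one listed in the corresponding rule.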