DRAFT MessageFormat 2.0 Data Model

This section defines a data model representation of MessageFormat 2 messages.

Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.

The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as MessageFormat 1) to a MessageFormat 2 implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.

This data model provides these capabilities:

  • any MessageFormat 2 message (including future versions) can be parsed into this representation
  • this data model representation can be serialized as a well-formed MessageFormat 2 message
  • parsing a MessageFormat 2 message into a data model representation and then serializing it results in an equivalently functional message

This data model might also be used to:

  • parse a non-MessageFormat 2 message into a data model (and therefore re-serialize it as MessageFormat 2). Note that this depends on compatibility between the two syntaxes.
  • re-serialize a MessageFormat 2 message into some other format including (but not limited to) other formatting syntaxes or translation formats.

To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. Two equivalent definitions of the data model are also provided:

  • message.json is a JSON Schema definition, for use with message data encoded as JSON or compatible formats, such as YAML.
  • message.dtd is a document type definition (DTD), for use with message data encoded as XML.

Note that while the data model description below is the canonical one, the JSON and DTD definitions are intended for interchange between systems and processors. To that end, they relax some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.
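Because the interchange definitions may omit declarations, options, and attributes, a consumer might want to restore the required arrays before treating the data as the canonical model. The following is a minimal sketch for the message level only; the names JsonPatternMessage and normalizePatternMessage are hypothetical and not part of this specification.

```typescript
// Relaxed interchange shape (hypothetical name): declarations may be absent.
interface JsonPatternMessage {
  type: "message";
  declarations?: unknown[];
  pattern: unknown[];
}

// Canonical shape from the data model: declarations is always present.
interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: unknown[];
}

// Fill in the missing array; every other field is copied unchanged.
function normalizePatternMessage(msg: JsonPatternMessage): PatternMessage {
  return { ...msg, declarations: msg.declarations ?? [] };
}

// Example input as it might arrive from a JSON file.
const parsed: JsonPatternMessage = JSON.parse(
  '{"type":"message","pattern":["Hello"]}'
);
const normalized = normalizePatternMessage(parsed);
```

The same approach would apply to options and attributes on nested nodes.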

Note

Users relying on XML representations of messages should note that XML 1.0 does not allow for the representation of all C0 control characters (U+0000-U+001F). Except for U+0000 NULL, these characters are allowed in MessageFormat 2 messages, so systems and users relying on this XML representation for interchange might need to supply an alternate escape mechanism to support messages that contain these characters.

Important

The data model uses the field name name to denote various interface identifiers. In the MessageFormat 2 syntax, the source for these name fields sometimes uses the production identifier. This happens when the named item, such as a function, supports namespacing.

In the Tech Preview, feedback is requested on whether to separate the namespace from the name and represent the two separately, or to keep a single opaque name field, as is done here.

Messages

A SelectMessage corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a PatternMessage.

In the syntax, a PatternMessage may be represented either as a simple message or as a complex message, depending on whether it has declarations and whether its pattern is allowed in a simple message.

type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: Expression[];
  variants: Variant[];
}
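As an illustration, the simple message `Hello, {$name}!` could be represented as a PatternMessage as sketched below. The types are trimmed local copies so that the example stands alone, and allPatterns is a hypothetical helper, not something this specification defines.

```typescript
// Trimmed local copies of the interfaces used in this sketch.
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Expression {
  type: "expression";
  arg?: Literal | VariableRef;
  attributes: unknown[];
}
type Pattern = Array<string | Expression>;
interface Variant {
  keys: Array<Literal | { type: "*"; value?: string }>;
  value: Pattern;
}
interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: unknown[];
  selectors: Expression[];
  variants: Variant[];
}
type Message = PatternMessage | SelectMessage;

// Data model for the simple message: Hello, {$name}!
const hello: Message = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    { type: "expression", arg: { type: "variable", name: "name" }, attributes: [] },
    "!",
  ],
};

// Hypothetical helper: list every pattern a message could format to.
// The "type" field discriminates between the two message shapes.
function allPatterns(msg: Message): Pattern[] {
  return msg.type === "select" ? msg.variants.map((v) => v.value) : [msg.pattern];
}
```

Here `allPatterns(hello)` yields a single pattern with three elements; for a SelectMessage it would yield one pattern per variant.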

Each message declaration is represented by a Declaration, which connects the name of a variable with its expression value. The name does not include the initial $ of the variable.

The name of an InputDeclaration MUST be the same as the name in the VariableRef of its VariableExpression value.

An UnsupportedStatement represents a statement not supported by the implementation. Its keyword is a non-empty string name (i.e. not including the initial .). If not empty, the body is the "raw" value (i.e. escape sequences are not processed) starting after the keyword and up to the first expression, not including leading or trailing whitespace. The non-empty expressions correspond to the trailing expressions of the reserved statement.

Note

Be aware that future versions of this specification might assign meaning to reserved statement values. This would result in new interfaces being added to this data model.

type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}

interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string;
  body?: string;
  expressions: Expression[];
}
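The requirement that an InputDeclaration's name match the name of its VariableRef can be checked mechanically. A minimal sketch, assuming trimmed local copies of the interfaces (annotations omitted) and a hypothetical helper named mismatchedInputs:

```typescript
// Trimmed local copies of the declaration interfaces.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: unknown[];
}
interface InputDeclaration { type: "input"; name: string; value: VariableExpression }
interface LocalDeclaration { type: "local"; name: string; value: unknown }
interface UnsupportedStatement {
  type: "unsupported-statement";
  keyword: string;
  body?: string;
  expressions: unknown[];
}
type Declaration = InputDeclaration | LocalDeclaration | UnsupportedStatement;

// Hypothetical helper: return the names of input declarations whose
// name differs from the variable they reference.
function mismatchedInputs(declarations: Declaration[]): string[] {
  const bad: string[] = [];
  for (const decl of declarations) {
    if (decl.type === "input" && decl.name !== decl.value.arg.name) {
      bad.push(decl.name);
    }
  }
  return bad;
}

// Consistent: both names are "count" (no leading "$" in either).
const consistent: Declaration[] = [
  {
    type: "input",
    name: "count",
    value: { type: "expression", arg: { type: "variable", name: "count" }, attributes: [] },
  },
];

// Inconsistent: the declaration says "total" but references "count".
const inconsistent: Declaration[] = [
  {
    type: "input",
    name: "total",
    value: { type: "expression", arg: { type: "variable", name: "count" }, attributes: [] },
  },
];
```

With these inputs, `mismatchedInputs(consistent)` returns an empty array and `mismatchedInputs(inconsistent)` returns `["total"]`.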

In a SelectMessage, each variant is represented by a Variant, which holds the variant's keys and its pattern value. For the CatchallKey, a string value may be provided to retain an identifier. This is always '*' in MessageFormat 2 syntax, but may vary in other formats.

interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}
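The sketch below shows two variants of a single-selector message and how the optional value of a CatchallKey might be used when displaying keys. The helpers keyLabel and findFallback are hypothetical, and the Pattern type is simplified for brevity.

```typescript
interface Literal { type: "literal"; value: string }
interface CatchallKey { type: "*"; value?: string }
type Pattern = Array<string | object>; // simplified for this sketch
interface Variant { keys: Array<Literal | CatchallKey>; value: Pattern }

// Variants for a message that selects on one value.
const variants: Variant[] = [
  { keys: [{ type: "literal", value: "one" }], value: ["You have one item."] },
  { keys: [{ type: "*" }], value: ["You have many items."] },
];

// Hypothetical helper: a display label for a key. A catch-all key falls
// back to "*" unless a source format supplied its own identifier.
function keyLabel(key: Literal | CatchallKey): string {
  return key.type === "*" ? (key.value ?? "*") : key.value;
}

// Hypothetical helper: find the variant whose keys are all catch-alls.
function findFallback(list: Variant[]): Variant | undefined {
  return list.find((v) => v.keys.length > 0 && v.keys.every((k) => k.type === "*"));
}
```

A catch-all converted from another format, such as `{ type: "*", value: "other" }`, would be labeled "other" by keyLabel while still being treated as a catch-all by findFallback.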

Patterns

Each Pattern contains a linear sequence of text and placeholders corresponding to potential output of a message.

Each element of the Pattern MUST be a non-empty string, an Expression, or a Markup object. String values represent literal text. They hold the text after all processing of the underlying text values, including escape sequence processing. Expression wraps each of the potential expression shapes. Markup wraps each of the potential markup shapes.

Implementations MUST NOT rely on the set of Expression and Markup interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.

type Pattern = Array<string | Expression | Markup>;

type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression
  | UnsupportedExpression;

interface LiteralExpression {
  type: "expression";
  arg: Literal;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  annotation?: FunctionAnnotation | UnsupportedAnnotation;
  attributes: Attribute[];
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  annotation: FunctionAnnotation;
  attributes: Attribute[];
}

interface UnsupportedExpression {
  type: "expression";
  arg?: never;
  annotation: UnsupportedAnnotation;
  attributes: Attribute[];
}

interface Attribute {
  name: string;
  value?: Literal | VariableRef;
}
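All four expression interfaces share the type value "expression", so a consumer has to inspect arg and annotation to tell them apart. A minimal sketch of one way to do this, using a deliberately loose node type and an "unknown" fallback so that shapes added by future versions are not rejected. TypedNode, AnyExpression, and classifyExpression are hypothetical names.

```typescript
// Loose node type: only "type" is assumed; other fields are tolerated.
interface TypedNode { type: string; [key: string]: unknown }

interface AnyExpression {
  type: "expression";
  arg?: TypedNode;
  annotation?: TypedNode;
  attributes: unknown[];
}

type ExpressionShape = "literal" | "variable" | "function" | "unsupported" | "unknown";

// Classify an expression by its operand first, then by its annotation.
function classifyExpression(expr: AnyExpression): ExpressionShape {
  if (expr.arg?.type === "literal") return "literal";
  if (expr.arg?.type === "variable") return "variable";
  if (expr.arg === undefined) {
    if (expr.annotation?.type === "function") return "function";
    if (expr.annotation?.type === "unsupported-annotation") return "unsupported";
  }
  // Anything else may come from a later version or an extension.
  return "unknown";
}

// An expression with a variable operand and a function annotation.
const countExpr: AnyExpression = {
  type: "expression",
  arg: { type: "variable", name: "count" },
  annotation: { type: "function", name: "number", options: [] },
  attributes: [],
};

// An expression with no operand, only a function annotation.
const nowExpr: AnyExpression = {
  type: "expression",
  annotation: { type: "function", name: "now", options: [] },
  attributes: [],
};
```

Here `classifyExpression(countExpr)` returns "variable" and `classifyExpression(nowExpr)` returns "function".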

Expressions

The Literal and VariableRef correspond to the literal and variable syntax rules. When they are used as the body of an Expression, they represent expression values with no annotation.

Literal represents all literal values, both quoted and unquoted. The presence or absence of quotes is not preserved by the data model. The value of Literal is the "cooked" value (i.e. escape sequences are processed).

In a VariableRef, the name does not include the initial $ of the variable.

interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}
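Because quoting is not preserved and the `$` sigil is not stored, a serializer has to add them back. The sketch below takes the simplest route and always emits quoted literals, escaping `\` and `|`. This is an assumption made for brevity: a real serializer would likely emit unquoted literals where the syntax allows them. The function names are hypothetical.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }

// Always quote: the data model does not record whether quotes were present.
// Backslash and "|" are escaped so the cooked value round-trips.
function literalToSyntax(lit: Literal): string {
  const escaped = lit.value.replace(/[\\|]/g, (ch) => "\\" + ch);
  return "|" + escaped + "|";
}

// Restore the "$" sigil that the data model leaves out.
function variableToSyntax(ref: VariableRef): string {
  return "$" + ref.name;
}

const quoted = literalToSyntax({ type: "literal", value: "a|b" });
const variable = variableToSyntax({ type: "variable", name: "count" });
```

After running this, `quoted` holds the six characters `|a\|b|` and `variable` holds `$count`.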

A FunctionAnnotation represents a function annotation. The name does not include the : starting sigil.

Each option is represented by an Option.

interface FunctionAnnotation {
  type: "function";
  name: string;
  options: Option[];
}

interface Option {
  name: string;
  value: Literal | VariableRef;
}
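To illustrate how the stored name and options relate to the syntax, the sketch below rebuilds annotation source text from a FunctionAnnotation. It assumes that literal option values are always emitted in quoted form, which is a simplification, and the function names are hypothetical.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
interface Option { name: string; value: Literal | VariableRef }
interface FunctionAnnotation { type: "function"; name: string; options: Option[] }

// Option values are either variables ("$name") or literals (quoted here).
function valueToSyntax(value: Literal | VariableRef): string {
  if (value.type === "variable") return "$" + value.name;
  return "|" + value.value.replace(/[\\|]/g, (ch) => "\\" + ch) + "|";
}

// Restore the ":" sigil and append each option as name=value.
function annotationToSyntax(fn: FunctionAnnotation): string {
  const options = fn.options.map((opt) => ` ${opt.name}=${valueToSyntax(opt.value)}`);
  return ":" + fn.name + options.join("");
}

const annotation: FunctionAnnotation = {
  type: "function",
  name: "number",
  options: [
    { name: "minimumFractionDigits", value: { type: "literal", value: "2" } },
    { name: "style", value: { type: "variable", name: "style" } },
  ],
};

const source = annotationToSyntax(annotation);
// source === ":number minimumFractionDigits=|2| style=$style"
```

Options are kept as an array rather than a map, so their original order survives the round trip.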

An UnsupportedAnnotation represents a private-use annotation not supported by the implementation or a reserved annotation. The source is the "raw" value (i.e. escape sequences are not processed), including the starting sigil.

When parsing the syntax of a message that includes a private-use annotation supported by the implementation, the implementation SHOULD represent it in the data model using an interface appropriate for the semantics and meaning that the implementation attaches to that annotation.

interface UnsupportedAnnotation {
  type: "unsupported-annotation";
  source: string;
}

Markup

A Markup object has a kind of "open", "standalone", or "close", corresponding to open, standalone, and close markup, respectively. The name in these does not include the starting sigils # and / or the ending sigil /. The optional options for markup use the same Option as FunctionAnnotation.

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Option[];
  attributes: Attribute[];
}
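The mapping between kind and the sigils can be sketched as follows. For brevity this hypothetical markupToSyntax helper ignores options and attributes, so it is not a complete serializer.

```typescript
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: unknown[];    // ignored in this sketch
  attributes: unknown[]; // ignored in this sketch
}

// Restore the sigils that the data model strips from the name:
// "#" starts open and standalone markup, "/" starts close markup,
// and a trailing "/" marks standalone markup.
function markupToSyntax(m: Markup): string {
  const prefix = m.kind === "close" ? "/" : "#";
  const suffix = m.kind === "standalone" ? " /" : "";
  return "{" + prefix + m.name + suffix + "}";
}

const open = markupToSyntax({ type: "markup", kind: "open", name: "b", options: [], attributes: [] });
const standalone = markupToSyntax({ type: "markup", kind: "standalone", name: "img", options: [], attributes: [] });
const close = markupToSyntax({ type: "markup", kind: "close", name: "b", options: [], attributes: [] });
// open === "{#b}", standalone === "{#img /}", close === "{/b}"
```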

Extensions

Implementations MAY extend this data model with additional interfaces, and MAY add new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a span field on all interfaces encoding the corresponding start and end positions in its source syntax.

In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.
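The span example mentioned above might look like the following sketch. The Span shape, LiteralWithSpan, and both helper functions are hypothetical; the point is only that a consumer written against the base model keeps working when extra fields are present.

```typescript
interface Literal { type: "literal"; value: string }

// Hypothetical extension: source offsets for a node.
interface Span { start: number; end: number }
interface LiteralWithSpan extends Literal { span: Span }

// A consumer that knows only the base model reads the fields it
// understands and ignores anything else on the object.
function readLiteral(node: Literal): string {
  return node.value;
}

// Copy only the fields defined by the base model, dropping extensions.
function toBaseLiteral(node: Literal): Literal {
  return { type: node.type, value: node.value };
}

const extended: LiteralWithSpan = {
  type: "literal",
  value: "one",
  span: { start: 7, end: 10 },
};

const value = readLiteral(extended);  // the extra span field is ignored
const base = toBaseLiteral(extended); // contains only type and value
```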