Collaborator

@eemeli eemeli commented Jul 29, 2023

At the moment, the data model includes a boolean `quoted` property on the `Literal` construct. This should be dropped, as it's not meant to affect anything at runtime.

As the data model is extensible by implementations, this does allow for an implementation to add the field back in as a private extension, should it have a need to track this information.
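
As a rough illustration of the two paragraphs above, here is a minimal TypeScript sketch. It is not the spec's definition: the `type` discriminator, the `LiteralWithQuoting` name, and the `literalText` helper are assumptions made for the example. Only `Literal`, `value`, and `quoted` come from this discussion.

```typescript
// Hypothetical shape of Literal once the `quoted` flag is dropped.
// The `type` discriminator is an assumption for this sketch.
interface Literal {
  type: "literal";
  value: string;
}

// Hypothetical private extension: an implementation that wants to remember
// how the literal was written can add the field back on its own type.
interface LiteralWithQuoting extends Literal {
  quoted: boolean;
}

// Runtime code reads only `value`, so it accepts either shape.
function literalText(lit: Literal): string {
  return lit.value;
}

const plain: Literal = { type: "literal", value: "hello" };
const tracked: LiteralWithQuoting = {
  type: "literal",
  value: "hello",
  quoted: true,
};

// Both produce the same text; the extra field is ignored at runtime.
const sameText = literalText(plain) === literalText(tracked);
console.log(sameText);
```

Because the extension only adds a field, any consumer written against the plain `Literal` shape keeps working unchanged.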

@eemeli eemeli added the data model Issues related to the Interchange Data Model label Jul 29, 2023
Member

@aphillips aphillips left a comment

Good start. Some wording tweaks suggested.

Both _quoted_ and _unquoted_ values are represented by `Literal`,
as the use or lack of quotation is a presentation detail
which has no effect on the meaning of the _literal_.
The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).
Member

I would avoid the jargon-ish "cooked" here, and I would note that the quotes are not included (where they exist), as you did for the sigils of reserved elsewhere.

Suggested change
- The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).
+ The `value` of `Literal` does not include surrounding quotes (where present)
+ and replaces `quoted-escape` sequences with the unescaped character.

Collaborator Author

I'd like for this to be considered in a separate issue or PR, for two reasons:

  1. The data model uses the "raw" and "cooked" terms also with respect to Text and Reserved, which ought to be updated simultaneously. I would rather keep that outside the scope of this rather focused PR.
  2. At the moment, the data model is described and explained via equivalences with the MF2 syntax. If there is a desire to describe it as an explicit result of parsing the syntax, as suggested by the term "replaces" here, that's a much bigger change that ought to be accompanied by some additional documentation.

Member

If "raw" and "cooked" are terms, define them as terms (in a terminology section that needs to be added) and link them.

A different way of saying this would be to go more Unicode-like:

Suggested change
- The `value` of `Literal` is the "cooked" value (i.e. escape sequences are processed).
+ The `value` of `Literal` is the code point sequence contained by the _literal_,
+ with external syntax (such as quotes) removed
+ and escape sequences resolved to the characters that they represent.

This would apply to any representation, not just MF2. For example, a JS string would replace the `\u20ac` notation with `€` in the `Literal`.
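
To make the suggested wording concrete, here is a small TypeScript sketch of "external syntax removed, escape sequences resolved". The delimiters are assumptions for illustration only (`|` as the quote character and `\` as the escape introducer), as are the `cookLiteral` name and the `type` field. It is not taken from the syntax spec.

```typescript
interface Literal {
  type: "literal";
  value: string;
}

// Assumed delimiters for this sketch only.
const QUOTE = "|";
const ESCAPE = "\\";

// Turn the source text of a literal into its data model value:
// surrounding quotes are removed and each escape sequence is
// replaced by the character it stands for.
function cookLiteral(source: string): Literal {
  const quoted =
    source.length >= 2 && source.startsWith(QUOTE) && source.endsWith(QUOTE);
  if (!quoted) {
    // In this sketch, unquoted literals contain no escapes.
    return { type: "literal", value: source };
  }
  const body = source.slice(1, -1);
  let value = "";
  for (let i = 0; i < body.length; i++) {
    if (body[i] === ESCAPE && i + 1 < body.length) {
      i++; // skip the escape introducer, keep the escaped character
    }
    value += body[i];
  }
  return { type: "literal", value };
}

const unquoted = cookLiteral("hello");
const quotedSame = cookLiteral("|hello|");
const withEscape = cookLiteral("|a\\|b|"); // source text: |a\|b|
console.log(unquoted.value, quotedSame.value, withEscape.value);
```

Under these assumptions, `hello` and `|hello|` yield the same `value`, which is the point of not recording the quotes in the data model.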

> If there is a desire to describe it as an explicit result of parsing the syntax, as suggested by the term "replaces" here, that's a much bigger change that ought to be accompanied by some additional documentation.

I think we should stipulate that the data model representation can round-trip any MF2 string without loss of information. Doing so would canonicalize presentation details, such as syntactic (non-literal) whitespace and the presence or absence of quotes around literals, so the round-tripped string might not match the original character for character.
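
The round-trip property described here can be sketched in TypeScript as follows. Everything in it is an assumption for illustration: the `|` quote and `\` escape characters, the rule for when a value can be written without quotes, and the function names. The sketch only shows that the value survives the round trip while the quoting may be canonicalized.

```typescript
interface Literal {
  type: "literal";
  value: string;
}

// Assumption: a value can be written unquoted only if it is a
// non-empty run of ASCII letters, digits, `_` or `-`.
const UNQUOTED = /^[A-Za-z0-9_-]+$/;

// Parse literal source text, dropping quotes and resolving escapes.
function parseLiteral(source: string): Literal {
  const quoted =
    source.length >= 2 && source.startsWith("|") && source.endsWith("|");
  if (!quoted) return { type: "literal", value: source };
  // `\x` becomes `x` for any escaped character x.
  const value = source.slice(1, -1).replace(/\\(.)/gs, (_m, ch: string) => ch);
  return { type: "literal", value };
}

// Serialize a Literal, adding quotes only when the value requires them.
function serializeLiteral(lit: Literal): string {
  if (UNQUOTED.test(lit.value)) return lit.value;
  const escaped = lit.value.replace(/[\\|]/g, (ch) => "\\" + ch);
  return `|${escaped}|`;
}

function roundTrip(source: string): string {
  return serializeLiteral(parseLiteral(source));
}

const original = "|hello|";
const canonical = roundTrip(original);
// The text differs from the original, but the value is preserved.
console.log(canonical !== original);
console.log(parseLiteral(canonical).value === parseLiteral(original).value);
```

A serializer could equally choose to always quote; either choice is a canonical form, and neither requires a `quoted` flag in the data model.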

Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
@eemeli eemeli requested a review from aphillips August 1, 2023 15:13
Contributor

@ryzokuken ryzokuken left a comment

In the absence of this flag in the data model, do we still need to make this explicit distinction between "quoted" and "unquoted" values?

Literal represents all literal values, both quoted and unquoted.
The presence or absence of quotes is not preserved by the data model.

If this is implied by the absence of the flag, then perhaps it's more confusing to leave it in as opposed to just dropping these lines.

Collaborator Author

eemeli commented Aug 4, 2023

> In the absence of this flag in the data model, do we still need to make this explicit distinction between "quoted" and "unquoted" values?

I think it's good to include, to clarify that the mapping of these potentially separately representable syntax rules into a single data model interface is wholly intentional.

@aphillips aphillips merged commit 6ae5373 into unicode-org:main Aug 7, 2023
@eemeli eemeli deleted the no-quoted branch August 7, 2023 20:00
eemeli added a commit to messageformat/messageformat that referenced this pull request Aug 13, 2023
XM5jDcsHTyGJtQqlCi added a commit to XM5jDcsHTyGJtQqlCi/messageformat that referenced this pull request Oct 12, 2025