Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flow mapping enhancement, omitting the currently required space after colon #267

Closed
revintec opened this issue Jan 6, 2022 · 12 comments
Closed

Comments

@revintec
Copy link

revintec commented Jan 6, 2022

currently {a:b} parses to {"a:b":null}
this key -> null mapping should be an uncommon case
instead I propose pasring {a:b} to {"a":"b"}

from here https://yaml.org/spec/1.2.2/#742-flow-mappings it states that

Normally, YAML insists the “:” mapping value indicator be separated from the value by white space. A benefit of this restriction is that the “:” character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps. It is also a potential source for confusion as “a:1” is a plain scalar and not a key/value pair.

I think a:b should still parse to "a:b", but {a:b} should have no ambiguity here, to {"a":"b"}
also, {a:b:c} should be parsed to {"a":"b:c"} instead of {"a:b":"c"}
(may be only apply this inside FLOW-KEY context?

@perlpunk
Copy link
Member

perlpunk commented Jan 7, 2022

It might be uncommon, but having a mapping with urls as keys might be a use case, e.g.
{ http://foo.example, http://example.org }

I believe your suggested change is very parser-unfriendly, and I personally don't want to implement it. The parser has to go back and reparse the whole string.
It also seems not very user-friendly, because right now users know that there has to be a space, and with your suggestion it depends.

edit: What about [a:b], by the way?

@UnePierre
Copy link

Maybe (but I'm probably not aware of further consequences), it would help to parse like this?

  1. a:b --> "a:b" -- no whitespace (no change)
  2. a: b --> "a" : "b" -- whitespace behind colon (no change)
  3. a :b --> "a" : "b" -- whitespace before colon (behavioral change)
  4. a : b --> "a" : "b" -- whitespace both sides (no change)

Trying with https://codebeautify.org/yaml-parser-online gives me:
a:b:c --> "a:b:c"
a : b : c --> "a" : "b : c" -- I'm not sure whether I find this intuitive. I expected --> "a" : { "b" : "c" }.

@perlpunk
Copy link
Member

perlpunk commented Jan 7, 2022

@UnePierre a : b : c is invalid YAML.
that yaml parser you linked to does't even say which parser it uses.
Try https://play.yaml.io/main/parser?input=YSA6IGIgOiBj instead.

@perlpunk
Copy link
Member

perlpunk commented Jan 7, 2022

@UnePierre what does your suggestion have to do with @revintec 's suggestion?

@UnePierre
Copy link

@UnePierre a : b : c is invalid YAML. that yaml parser you linked to does't even say which parser it uses. Try https://play.yaml.io/main/parser?input=YSA6IGIgOiBj instead.

"Invalid" is also fine and understandable. If one-liners are necessary, flow-style with curly brackets can be used.

(I sent a suggestion for improvement to codebeautifier.org linking back here.)

@UnePierre
Copy link

@UnePierre what does your suggestion have to do with @revintec 's suggestion?

If you think, investigating the other possible cases isn't worth: never mind.
I can understand, that parsing with only one look-ahead is way simpler, and probably more efficient.

@revintec
Copy link
Author

revintec commented Jan 8, 2022

It might be uncommon, but having a mapping with urls as keys might be a use case, e.g. { http://foo.example, http://example.org }

I believe your suggested change is very parser-unfriendly, and I personally don't want to implement it. The parser has to go back and reparse the whole string. It also seems not very user-friendly, because right now users know that there has to be a space, and with your suggestion it depends.

edit: What about [a:b], by the way?

{ http://foo.example, http://example.org } should be rarely need I think?
if we want a hashSet, we should be using hashSet directly, or a list(like [http://foo.example,http://example.org]?
if we need the old behavior(and yes here comes the breaking change in grammar...sad){ "http://..." }

image

image

I sometimes want very compact lines, without any blank/space, because blank/space mess with line wrap and/or have special treatment in shells and may not be that noticeable(from screenshot or whatever, we can't see if there is a blank/space or not). I use them like this

parama: xxx
paramb: yyy
defaults:
 - {a:b,c:d}
 - {e:f,g:h}
 - ...

I've written some parser/laxer before, it should be parser friendly/fairly easy to implement, just parse differently in FLOW_KEY context should be fine. I've done a POC here(manually tested, not sure it's fully spec compliant

reference implementation as mentioned here(https://yaml.org#JavaScript
https://github.com/nodeca/js-yaml

From d568d424469757ad38911967eb0a7b57c1c98932 Mon Sep 17 00:00:00 2001
From: Revin <caoyunbin001@126.com>
Date: Thu, 6 Jan 2022 18:29:45 +0800
Subject: [PATCH] js-yaml.js: https://github.com/yaml/yaml-spec/issues/267

---
 front_end/concorde/js-yaml.js | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/front_end/concorde/js-yaml.js b/front_end/concorde/js-yaml.js
index 3b3ad9e..8727220 100644
--- a/front_end/concorde/js-yaml.js
+++ b/front_end/concorde/js-yaml.js
@@ -1456,6 +1456,7 @@ function readPlainScalar(state, nodeIndent, withinFlowCollection) {
 
   while (ch !== 0) {
     if (ch === 0x3A/* : */) {
+      if(state.readingKey)break;
       following = state.input.charCodeAt(state.position + 1);
 
       if (is_WS_OR_EOL(following) ||
@@ -1709,7 +1710,9 @@ function readFlowCollection(state, nodeIndent) {
     }
 
     _line = state.line;
+    state.readingKey=isMapping;
     composeNode(state, nodeIndent, CONTEXT_FLOW_IN, false, true);
+    delete state.readingKey;
     keyTag = state.tag;
     keyNode = state.result;
     skipSeparationSpace(state, true, nodeIndent);
-- 
2.31.0

@perlpunk
Copy link
Member

perlpunk commented Jan 8, 2022

Please don't post screenshots of text.

You wrote:

currently {a:b} parses to {"a:b":null}
this key -> null mapping should be an uncommon case

so I thought you were explicitly talking about the case when there is an empty value.
I thought you would want {a:b} to parse as {"a": "b"} and {a:b: value} still to parse as {"a:b": "value"}. But you simply left out this case and only said an empty value would be uncommon. Please be more clear next time.

But your suggestion is to not require a space after a colon in flow mappings in general. Of course this is not complicated. No proof needed.

You mentioned a use case for it. I mentioned a use case for colons in the keys.
You wrote { http://foo.example, http://example.org } is rare, I say, now that I see what you are actually suggesting:
{ http://foo.example: value, http://example.org: value } might be less rare. Also in some languages class names have colons, e.g. { YAML::Parser: value }

You have a use case, I have a use case. Currently you have to add a space after the colon. With your suggestion, keys with colons would need quotes.
You can't make everyone happy.

@perlpunk
Copy link
Member

perlpunk commented Jan 8, 2022

Currently [a: b] parses the same as [{"a": "b"}].
I read from your screenshot that [a:b] would still be parsed as ["a:b"]. This is a bit inconsistent and might require an extra condition in parsers.

@perlpunk
Copy link
Member

perlpunk commented Jan 8, 2022

If you think, investigating the other possible cases isn't worth: never mind.

No, I just don't get how your suggestion does have anything to do with @revintec 's suggestion, like I said.
You suggest that a :b parses as "a": "b" and everything else should stay the same. @revintec does not want a space at all.

@revintec
Copy link
Author

revintec commented Jan 10, 2022

sorry if there is any misunderstanding
I didn't think of the {a:b: c} case, as one would argue, using naked non-alphabetical keys are bad practice
its hard to tell(esp. for some font) if there is any blank/space present around non-alphabetical characters(aka, symbols like ':')
I suppose {a:b: c} should be parsed to {"a":"b: c"} but it should be written as {a:"b: c"} anyways... or {"a:b":c} if we want {"a:b":"c"}

I think you're right mentioning that {http://xxx: value} is less rare, but my point is:

  1. we should not rely on non-visible characters for controls, like : and/or : (it's a tab). Python's block indentation(mixed with blank/spaces and tab) is always a headache, esp. when copy/pasting and changing old code(that still uses tabs
  2. using naked non-alphabetical keys should be discouraged, they should be quoted with ". if there is a :, it should be considered a key separator in contexts like reading a key. but I deliberately left out the [a:b] case, because I think arrays doesn't need keys, I might just keep the status-quo

I think removing the requirement of having a blank/space after the colon would make the language more compact and easier to use, I could be wrong
anyway, thanks for the input about class names having colons, didn't think of that

@ingydotnet
Copy link
Member

The original intent was to make the key value separator always be : since it would be fairly common to have : characters in plain string values.

JSON allows for {"k":"v"} so we made an exception that flow collection pairs with quoted keys may be separated by :. (to keep JSON as a YAML subset)

See: https://play.yaml.io/main/parser?input=LSB7YTpiOiBjfQotIHtodHRwOi8vb3V0LmlufQotIGE6CWIKLSB7YToJYn0KLSBbYTpiXQotIFthOiBiXQotIFsiYSI6Yl0KLSBbYSA6Yl0K

12 of 15 parsers get these exactly right.

There are many YAML syntax issues we are looking to address in YAML 1.3 but this (colon+ws) is pretty solid.

eg https://play.yaml.io/main/parser?input=IyAzIHBhcnNlIHNhbWU6Ci0gWyB7IGE6IGIgfSBdICMgQmVzdCBwcmFjdGljZQotIFsgPyBhOiBiIF0gICAjIFF1ZXN0aW9uIG1hcmsgc3ludGF4IGlzIHVnbHkKLSBbIGE6IGIgXSAgICAgIyBWZXJ5IGNvbXBsZXggdG8gaW1wbGVtZW50Lgo=

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants