Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty keys in tests 2JQS and PW8X #40

Closed
wants to merge 4 commits into from

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Jul 9, 2018

This fixes #25 by marking 2JQS as an error due to the repeated null key, and adjusts PW8X to avoid repeating null keys, allowing that test to test what it's supposed to test without error.

Tests E76Z and X38W are also mentioned in #25, but their alias node keys are not necessarily equal to their anchor nodes, and hence can be considered correct as is.

@hvr
Copy link
Collaborator

hvr commented Jul 9, 2018

I can see why the yaml2json tests require specifying a schema, but I'm unhappy to require this at the yaml2event level as well, as that's a level that is supposed to be schema agnostic (as the schema application occurs at a higher layer).

PS: Also, I'd like to see the testsuite be extended to test all 3 specified yaml schemas as that's one area where I currently have to write my own tests, as I implement all 3 schemas in HsYAML.

@eemeli
Copy link
Member Author

eemeli commented Jul 9, 2018

I believe that the fixes proposed here are (or at least should be) equivalent for all schemas, as the comparison that we're talking about is whether a node with no explicit tag and an empty string value is equivalent to itself, i.e. null == null or '' == '' depending on the schema. Both of those comparsions should produce the same true result.

@perlpunk
Copy link
Member

perlpunk commented Jul 9, 2018

Thanks.
Like discussed in #25 we can't mark 2JQS as an error, because at the parsing level it's not an error yet.
So we need to find a better way to mark errors, like for example a loader-error tag.

@perlpunk
Copy link
Member

perlpunk commented Jul 9, 2018

@eemeli I think the question is, where this check should be implemented. I currently think the parser should not care at all about this, even if two empty keys look perfectly equal.
If you're going to implement a check in the loader for the more complicated cases, then having a check in the parser and the loader seems like too much work, and you would have to decide which cases are simple enough for the parser to decide and which not.

In the end, the mapping you are loading might not end up at all as a mapping/dictionary. A processor on top of the parser can do a lot of things with the parsing events additionally to just create a mapping data structure.

Changing PW8X might still be a good idea, though.

@eemeli
Copy link
Member Author

eemeli commented Jul 9, 2018

Just to verify what we're talking about, are we agreed that the in-yaml of 2JQS does not represent valid YAML, or is there an argument being made that a schema exists that would allow it to be parsed into a representative data structure?

Now, presuming that the in-yaml is invalid, the question (as I understand it) is then twofold:

  1. Should a key equivalence check be expected to be made during the event stream's emission?
  2. If the error would be only caught in later stages, should the test be marked as an error?

Regarding the first question, I don't really have an opinion. My JS yaml library processes input a bit differently than most other YAML libraries and uses a different internal representation of the nodes, so I'm generating the events using an external harness and from later in the process.

Regarding the second question, I would have a hard time accepting that the test suite should include test input that is in error, but not have that test be marked as an error. Maybe it'd be better to just remove 2JQS to have this problem go away?

@perlpunk
Copy link
Member

perlpunk commented Jul 9, 2018

@eemeli Regarding question 1, that what I was trying to say, I believe that the parser should not be checking key equivalence, so the event stream can still stay in the test. Currently, the error tag (and error block/file) stands for parsing error (or syntax error).
So my suggestion was, that we have different kinds of errors, and it's not clear yet how to mark them as such.
We could add a new tag loader-error (or constructor-error) to say that the syntax is valid but there should be an error in the level further up in the loading process.
But since we also have already tags for 1.3 in the test suite, the tags could get confusing, and I was thinking about adding a data structure instead, for example:

1.2:
  parser: valid
  constructor: invalid
1.3:
  parser: invalid

The default would be valid.

Hopefully that makes sense now =)

@eemeli
Copy link
Member Author

eemeli commented Jul 9, 2018

I think I understand where you're coming from, but I'm not sure that separating parser/constructor/loader errors from each other is necessarily a good idea. The one-line description of the yaml-test-suite is that it's a "Comprehensive, language independent Test Suite for YAML", and to me that implies rather strongly that it should not concern itself with the implementation details of each parser or library.

Quoting from section 3.1 of the 1.2 spec:

A YAML processor need not expose the serialization or representation stages. It may translate directly between native data structures and a character stream.

As I've understood it, the intent moving forward is to make this test suite rather than the written spec be the final authority. In that context, classifying errors as belonging to some specific stage seems... surprising.

Of the current tests, 2JQS is the only one that proposes this quandary, but it is arguably a valid test, given that 18 parsers fail at it when parsing the first : a line. But on the other hand, of the five JSON results, two end up with {"":"a"} and three with {"":"b"}, which is not that good.

So to keep the good bits, I'm going to update this PR by leaving out the second pair, which will evade the key-equality question, keep the empty-key behaviour that's nominally being tested, and allow the test to not be an error. Would that be acceptable?

@eemeli
Copy link
Member Author

eemeli commented Jul 9, 2018

I also added in-json and out-yaml sections to 2JQS, because those are valid now.

@perlpunk
Copy link
Member

perlpunk commented Jul 9, 2018

I think we agree on most things, so currently I just fail to see the problem =)
The test-event format is implementation specific, yes. A processor doesn't have to split the process into levels, yes. And even if it does, it might not use an event based parsing.

If a processor just does everything in one step, then it can't use the test-event file for testing.
If a test is marked as valid syntax, but as a constructor error, then a processor that does everything in one step knows that it should treat it as an error. You don't have to implement the several levels for that.

Look at it from the perspective of libyaml, for example. libyaml is just a parser and emitter. It can give you parsing events or a tree of nodes, where each node represents the parsed mapping, sequence or scalar, with its style and other attributes. It does not implement a schema. For libyaml, everything is just a string. It also doesn't know anything about the meaning of tags.
If we made 2JQS a general error case, then libyaml would be suddenly wrong.

A processor can always just skip certain tests, anyway. There will always be implementation specific things.

Edit: libyaml is actually a bad example in this case, as it cannot parse empty keys anyway

@perlpunk
Copy link
Member

perlpunk commented Jul 9, 2018

The change of PW8X looks ok, but since @flyx added that test: @flyx, any comment on this?

@eemeli
Copy link
Member Author

eemeli commented Jul 9, 2018

I agree that 2JQS should not be a general error case; I'd gotten stuck earlier on looking at its in-yaml and seeing the error in the repeated second empty key, when the real error the test is for is not being able to handle the first empty key. So @perlpunk would you be fine with my updated change to that test? It should let the proper thing be tested, without needing to use (potentially) invalid YAML to do so.

@perlpunk
Copy link
Member

@eemeli yes, to me it looks good, but we still don't want an in-json for 2JQS because JSON doesn't have null as a key and some YAML loaders turn it into "" and some into "null".
Still would like to have @flyx' opinion on the changes as he created the tests

@ingydotnet ingydotnet deleted the branch yaml:master October 9, 2021 13:01
@ingydotnet ingydotnet closed this Oct 9, 2021
@ingydotnet ingydotnet mentioned this pull request Oct 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests with repeated keys should be marked as errors
4 participants