-
-
Notifications
You must be signed in to change notification settings - Fork 660
Draft: #2298/change yaml parser to a maintained one #2392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Draft: #2298/change yaml parser to a maintained one #2392
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor remark, more from an end user than a real code review
cmd/root.go
Outdated
switch yamlParser { | ||
case "goccy", "": | ||
yqlib.ConfiguredYamlPreferences.UseGoccyParser = true | ||
case "v3": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it "legacy-v3", or simply "legacy"
It should be obvious this option is no longer maintained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good note, thanks will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
… place, add datetime preprocessor for date time difference
…l.v3 preserves them while goccy normalises them)
pkg/yqlib/datetime_preprocessor.go
Outdated
// Try to parse date-only format | ||
if _, err := time.Parse("2006-01-02", value); err == nil { | ||
return true | ||
} | ||
|
||
// Try RFC3339 without timezone | ||
if _, err := time.Parse("2006-01-02T15:04:05", value); err == nil { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are constants in time package for this.
You can use usestdlibvars linter to identify them easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly usestdlibvars did not show any errors, maybe my setup has a problem...
Anyway fixed (next commit)
pkg/yqlib/datetime_preprocessor.go
Outdated
// Skip empty lines, comments, directives, document separators | ||
if trimmed == "" || strings.HasPrefix(trimmed, "#") || | ||
strings.HasPrefix(trimmed, "%") || strings.HasPrefix(trimmed, "---") || | ||
strings.HasPrefix(trimmed, "...") { | ||
return true | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a switch with one case (or multiple) it would be more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/yqlib/datetime_preprocessor.go
Outdated
if strings.HasSuffix(trimmed, "|") || strings.HasSuffix(trimmed, ">") || | ||
strings.HasSuffix(trimmed, "|-") || strings.HasSuffix(trimmed, ">-") || | ||
strings.HasSuffix(trimmed, "|+") || strings.HasSuffix(trimmed, ">+") { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/yqlib/datetime_preprocessor.go
Outdated
strings.HasPrefix(trimmedValue, "\"") || | ||
strings.HasPrefix(trimmedValue, "'") || | ||
strings.HasPrefix(trimmedValue, "{") || | ||
strings.HasPrefix(trimmedValue, "[") || | ||
strings.HasPrefix(trimmedValue, "&") || | ||
strings.HasPrefix(trimmedValue, "*") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a switch on trimmedValue[0] would be more readable
pkg/yqlib/datetime_preprocessor.go
Outdated
// Skip if value is empty, quoted, or already complex | ||
if trimmedValue == "" || | ||
strings.HasPrefix(trimmedValue, "\"") || | ||
strings.HasPrefix(trimmedValue, "'") || | ||
strings.HasPrefix(trimmedValue, "{") || | ||
strings.HasPrefix(trimmedValue, "[") || | ||
strings.HasPrefix(trimmedValue, "&") || | ||
strings.HasPrefix(trimmedValue, "*") { | ||
return line | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/yqlib/decoder_goccy_yaml.go
Outdated
var commentLineRegEx = regexp.MustCompile(`^\s*#`) | ||
var yamlDirectiveLineRegEx = regexp.MustCompile(`^\s*%YA`) // e.g., %YAML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this out, otherwise the regexp is computed each time the function is called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/yqlib/datetime_preprocessor.go
Outdated
// ISO8601 date/time patterns that should be automatically tagged as timestamps | ||
var dateTimePatterns = []*regexp.Regexp{ | ||
// RFC3339 / ISO8601 with timezone: 2006-01-02T15:04:05Z or 2006-01-02T15:04:05+07:00 | ||
regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?(?:Z|[+-][0-9]{2}:[0-9]{2}))\s*$`), | ||
// Date only: 2006-01-02 | ||
regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2})\s*$`), | ||
// RFC3339 without timezone: 2006-01-02T15:04:05 | ||
regexp.MustCompile(`^\s*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?)\s*$`), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this aside the function that use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@ccoVeille Thank you very much for the review. I will be addressing the comments tomorrow. And for the parser change, Goccy handles comments differently and this will break workflows for some users. I managed to equalize the behavior around timestamps and dates, but the comments will brake stuff. Also some arrays and anchors are handled differently, which will change the print out but the structure will be kept. So logical flows should not be impacted in contrast to the comments difference where if someone depends on some comment being attached to a specific node, his flow will break with the Goccy parser. I will make a bit longer explanation of all the differences that I found and how they are addressed in this PR. Of course I am very opened for suggestions and comments. I will be working on this this and next week. |
Co-authored-by: ccoVeille <3875889+ccoVeille@users.noreply.github.com>
Co-authored-by: ccoVeille <3875889+ccoVeille@users.noreply.github.com>
You're racing ahead here, I'd slow this down a little - the go-yaml parser now has the support of the official Yaml org - I've been chatting with them and they will be soon addressing the issues raised. I'd prefer to keep go-yaml and maintain comments, anchors, array and formatting etc. |
ThesisYou do make a good point: As I get it: Why to work on this if we can just wait for https://github.com/yaml/go-yaml and then just replace a dependency (theoretically). I can state an argument for having a goccy parser as an alternative behind a flag:
go-yaml vs goccy parsing differences (not mitigated already in this PR)The differences between go-yaml and goccy displayed visually: 1. Comments association on array elements (BREAKING change if someone depends on comments inside the yaml file)Can be mitigated with a pre processor that moves the comments so the user gets the same result in the end. Input: name:
# under-name-comment
- first-array-child In this example go-yaml associates the comment with the first element of the array, while goccy associates the comment with the array itself 2. Flow maps will be expanded:For input: {a: 1, b: 2} go-yaml produces the same as output. goccy produces: a: 1
b: 2 No workflows will be broken as the data structure is the same, formatting is the only change here. It can be handled in some complicated pre + post processing but fairly I don't see this as a braking change, no one should depend on the formatting of the output yaml. 3. Merge anchorGiven: foo: &foo
a: foo_a
thing: foo_thing
c: foo_c
foobar:
c: foobar_c
<<: *foo
thing: foobar_thing go-yaml output for .foobar: c: foobar_c
!!merge <<: *foo
thing: foobar_thing Goccy output for .foobar: <<:
a: foo_a
c: foo_c
thing: foo_thing
c: foobar_c
thing: foobar_thing For me goccy seems to be the cleaner more informative output, again if you really want to this can be handled in post processor. 4. Duplicate keysgo-yaml allows for duplicate keys I think this is a feature rather than a bug. Again pre-processing can be used to level the behavior if this is what you believe is best. 5. Goccy is stricter on some invalid YAML casesI think this is a feature. ConclusionI do think goccy should be supported behind a flag for the use cases that have been waiting for years for a fix in go-yaml and the same will have to wait for years more even with the support of the YAML foundation. Not implementing goccy as a parser or any other parser than go-yaml forces anyone that want to use yq to fix go-yaml and wait for their changes to be merged, which can take months or years even if they have the capacity to fix go-yaml. More or less a parallel parser can save hundreds to thousands of work hours for the users of yq. |
P.S. if you see value in having goccy as a parser and you want some or all of the differences to be handled with pre and/or post processors please do write so and I will add them to this PR. |
There is a |
No description provided.