v3.2: Guidance on searching and evaluating schemas #4743
base: v3.2-dev
Conversation
Some OAS features casually state that they depend on the type of the data being examined, or leave it ambiguous how implementations should determine how to parse the data. This section attempts to provide some guidance and limits, requiring only that implementations follow the unambiguous, statically deterministic keywords `$ref` and `allOf`. It also provides for simply validating the data (when possible) and using the actual in-memory type when a schema is too complex to analyze statically. One use of this is breaking apart schemas to use them with mixed binary and JSON-compatible data, and a new section has been added to address that. Finally, a typo in a related section was fixed.
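For readers trying to picture what "following only `$ref` and `allOf`" means in practice, here is a minimal sketch of such a schema search. The function name and the `resolve_ref` callback are illustrative assumptions, not part of the proposed spec text:

```python
# Minimal sketch of a "schema search" that follows only the statically
# deterministic keywords `$ref` and `allOf`. `resolve_ref` is a hypothetical
# callback mapping a `$ref` URI to the schema object it points to; a real
# implementation would plug in its own reference-resolution machinery.

def search_schema(schema, keyword, resolve_ref, seen=None):
    """Collect every value of `keyword` reachable via `$ref`/`allOf` only."""
    if seen is None:
        seen = set()
    if not isinstance(schema, dict) or id(schema) in seen:
        return []  # boolean schemas and reference cycles end the search
    seen.add(id(schema))

    found = []
    if keyword in schema:
        found.append(schema[keyword])
    if "$ref" in schema:
        found += search_schema(resolve_ref(schema["$ref"]), keyword, resolve_ref, seen)
    for sub in schema.get("allOf", []):
        found += search_schema(sub, keyword, resolve_ref, seen)
    return found
```

Keywords like `anyOf` or `if`/`then` are deliberately not followed, since which branch applies cannot be decided statically; that is the "unambiguous, statically deterministic" limit the description refers to.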
@karenetheridge while I have your attention, do you think this is fine where it is or should it go under the Schema Object somewhere? I really could not decide.
I'm putting this in draft because based on @karenetheridge's feedback I'm going to rework it fairly substantially, but it's still of use when understanding how it fits with the other related PRs. The effect of the rewrite should be the same, but I think the wording and organization will be significantly different. It's clear that the different use cases here need to be separated out and clarified. I think this ended up being a bit oddly abstract because of how I tried to split things up into PRs that don't conflict.
Move things under the Schema Object, organize by use case and by the point in the process at which things occur, and link directly from more parts of the spec so that the parts in the Schema Object section can stay more focused.
I have added a commit that almost totally rewrites this; you probably just want to review the whole thing and not look at the per-commit diff, as it will be a mess. The new version:
I do not think that has changed anything substantial, but it's essentially a new PR now.
@karenetheridge I'm going to mark various threads as resolved since the text is now so different that they are confusing. Please do not take that to mean I'm dismissing open questions; just re-start whatever is needed with comments on the new text, or as new top-level comments. Apologies for the inconvenience.
Co-authored-by: Karen Etheridge <ether@cpan.org>
Also clarify that there is no one set list of keywords to search for, but rather each use case defines what is relevant.
@karenetheridge I trimmed back the multi-valued
@karenetheridge I'm marking various threads resolved as I think subsequent commits addressed them, and it's a lot of somewhat outdated discussion for folks to read through before tomorrow's call. Please feel free to re-raise anything that is still not addressed.
one small edit (not a change introduced by you, but still an improvement I think).
@@ -2599,6 +2601,10 @@ Note that JSON Schema Draft 2020-12 does not require an `x-` prefix for extensions
The [`format` keyword (when using default format-annotation vocabulary)](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-7.2.1) and the [`contentMediaType`, `contentEncoding`, and `contentSchema` keywords](https://www.ietf.org/archive/id/draft-bhutton-json-schema-validation-01.html#section-8.2) define constraints on the data, but are treated as annotations instead of being validated directly.
Extended validation is one way that these constraints MAY be enforced.

In addition to extended validation, annotations are the most effective way to determine whether these keywords impact the type and structure of the fully parsed data.
For example, formats such as `int64` can be applied to JSON strings, as JSON numbers have limitations that make large integers non-portable.
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and MUST document the limitations this imposes.
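As a hedged illustration of the `int64`-on-strings point above (the schema and values here are hypothetical, and the `int()` call stands in for whatever conversion the application layer actually performs):

```python
# Hypothetical example: an int64 carried as a JSON string because JSON
# numbers are not portable beyond 2**53 in many parsers.
import json

schema = {"type": "string", "format": "int64"}  # format annotates, not validates
document = json.loads('"9223372036854775807"')  # parsed as a Python str

# The validation step sees only a string; the application layer, having
# collected the `format` annotation (or found it via a schema search),
# performs the final conversion to a 64-bit integer value.
value = int(document)
assert value == 9223372036854775807
```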
Suggested change (second MUST → SHOULD):
If annotation collection is not available, implementations MUST perform a [schema search](#searching-schemas) for these keywords, and SHOULD document the limitations this imposes.
(removed, commented in wrong section)
Minor suggestions from the TSC call
For example, if `foo` had the schema `{"type": "string", "format": "int64"}`, the data structure used for validation would still be the same, but the application will need to convert the string `"42"` to the 64-bit integer `42`.
Similarly, the `content*` keywords can indicate further structure within a string.

Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and MUST document which approach they implement.
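To make the `content*` case concrete, a small hypothetical sketch: the schema below is invented for illustration, and the decoding steps stand in for application-level handling guided by the collected annotations:

```python
# Sketch of how the `content*` keywords indicate structure inside a string:
# a property declared as base64-encoded JSON, so the validator sees only a
# string while the application peels back two further layers.
import base64
import json

schema = {
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "application/json",
    "contentSchema": {"type": "object"},
}

raw = base64.b64encode(json.dumps({"id": 42}).encode()).decode()
# Validation treats `raw` as an opaque string; the application, guided by
# the annotations above, decodes the encoding and then the media type.
decoded = json.loads(base64.b64decode(raw))
assert decoded == {"id": 42}
```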
Suggested change (second MUST → SHOULD):
Implementations MUST either use [annotation collection](#extended-validation-with-annotations) to gather this information, or perform a [schema search](#searching-schemas), and SHOULD document which approach they implement.
As discussed in the meeting, if implementations don't do this, what would they do instead? If there isn't anything they can do, then I think the MUST would stand.
I really did not expect this PR to get hung up on a debate about how much to require implementations to document their behavior, which I thought would be thoroughly non-controversial. Why would we not want them to do so?
So... I have no idea. I want everyone else to resolve their differences around documentation requirements so it doesn't hang up this PR; that's my opinion on the matter.
Implementations MUST document which strategy or strategies they use, as well as any known limitations.

##### Searching Schemas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about moving this section a little further up the document: who has thoughts?
1. Use a placeholder value, on the assumption that no assertions will apply to the binary data and no conditional schema keywords will cause the schema to treat the placeholder value differently (e.g. a part that could be either plain text or binary might behave unexpectedly if a string is used as a binary placeholder, as it would likely be treated as plain text and subject to different subschemas and keywords).
2. Perform [schema searches](#searching-schemas) to find the appropriate keywords (`properties`, `prefixItems`, etc.) in order to break up the subschemas and apply them separately to binary and JSON-compatible data (see the sketch below).
Implementations MUST document which strategy or strategies they use, as well as any known limitations. |
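A minimal sketch of strategy 2, assuming dict-based schemas; the component layout, `resolve_ref` helper, and function names are invented for illustration:

```python
# Statically walk `allOf`/`$ref` to gather the `properties` subschemas, then
# route each multipart part to its own subschema, so binary parts never need
# a JSON placeholder value.

components = {
    "#/components/schemas/Meta": {"properties": {"name": {"type": "string"}}},
}

def resolve_ref(uri):
    # Toy resolver; a real one would handle arbitrary reference targets.
    return components[uri]

def collect_properties(schema, out=None):
    """Merge `properties` maps reachable via `$ref` and `allOf` only."""
    out = {} if out is None else out
    if "$ref" in schema:
        collect_properties(resolve_ref(schema["$ref"]), out)
    for sub in schema.get("allOf", []):
        collect_properties(sub, out)
    out.update(schema.get("properties", {}))
    return out

multipart_schema = {
    "allOf": [{"$ref": "#/components/schemas/Meta"}],
    "properties": {"image": {"contentMediaType": "image/png"}},
}

per_part = collect_properties(multipart_schema)
# per_part["name"] can be validated as ordinary JSON; per_part["image"]
# applies to the raw binary part, where only annotation keywords such as
# contentMediaType are meaningful.
assert set(per_part) == {"name", "image"}
```

Cycle handling is omitted here for brevity; a production implementation would need it, as in any `$ref`-following traversal.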
Suggested change (MUST → SHOULD):
Implementations SHOULD document which strategy or strategies they use, as well as any known limitations.
Co-authored-by: Karen Etheridge <ether@cpan.org>
@lornajane @karenetheridge @duncanbeevers Can y'all sort out what we should be doing on documentation requirements and why? I have no idea why MUST requirements around documenting behavior are controversial, but all I really care about is that this does not hang up this PR. It sounds like @karenetheridge is disagreeing on one? I just want a broadly applicable rule that tells me what to do here.
NOTE 1: This is intended to clarify requirements that already exist but have never been well-defined, both by making certain things required and stating clearly that other things are not. It is particularly relevant in light of the Encoding Object changes, although the vaguely-defined behavior predates the new features.