Add schemas for all JSON extracts #731

tidoust · 2022-09-14T22:24:52Z

This provides a first level of schema validation for curated data extracts, see #657 for context.

Goal is to make it easier to detect and document (through a changelog, so also useful for #704) situations where we change the structure of data extracts.

Schemas, notably those that deal with parsed IDL structures, could go deeper into details.

Tests are run against the curated version of data. That is not necessary for extracts that aren't actually curated (dfns, headings, ids, links, refs), just more convenient not to have branching logic in the test code.

Creating this as a draft pull request, as 69 of the new tests currently fail, either because the extraction logic in Reffy needs to be slightly improved to create more consistent data structures, or because of actual issues in the specs themselves (e.g. invalid URL fragments).

tidoust · 2022-09-27T08:10:13Z

Superseded by #749.

This makes use of the new schema validation function in Reffy to make sure that the curated data Webref produces follow expected schemas, see w3c/reffy#1075. This replaces #731 and fixes #657. Schemas, notably those that deal with parsed IDL structures, could go deeper into details. To be improved over time. Tests are run against the curated version of data. That is not necessary for extracts that aren't actually curated (dfns, headings, ids, links, refs), just more convenient not to have branching logic in the test code.

tidoust added 8 commits September 15, 2022 00:04

Adjust Ajv options to log more complete errors

e36e03a

Options need to be specified in the constructor, and the old `format` option no longer exists as far as I can tell. The options are set to report all errors and to include the validated data in each error, to make it easier to understand what the error is.
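As a rough sketch of what such a configuration could look like, assuming Ajv v8 (where `allErrors` and `verbose` are constructor options). The `describeError` helper and the sample error object are hypothetical and only illustrate the shape of a verbose error; they are not taken from this PR.

```javascript
// Options handed to the Ajv constructor (assumes Ajv v8).
// allErrors: collect every error instead of stopping at the first one.
// verbose: attach the schema and the validated data to each error.
const ajvOptions = { allErrors: true, verbose: true };

// With Ajv installed, these would be used as: new Ajv(ajvOptions)

// Hypothetical helper: turn one Ajv-style error into a readable line.
function describeError(error) {
  const where = error.instancePath || '(root)';
  return `${where} ${error.message}, got ${JSON.stringify(error.data)}`;
}

// Hand-written sample mimicking a verbose error; not real Ajv output.
const sampleError = {
  instancePath: '/0/href',
  message: 'must match format "url"',
  data: 'not a url'
};
console.log(describeError(sampleError));
```

Having the offending value next to the error message is what makes a failing test readable without opening the extract itself.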

Use "url" for URL format in JSON schema

8995d84

The URI format seems to be more picky about fragments. The update also raises an error when an ID starts with a `#`. That's allowed in theory but the few cases where this happens in practice are clearly unintended.

Forbid empty strings in schemas

f4b37ed

An empty string is usually the sign that extraction failed to work as intended. The update also drops the check on IDs that start with `#`. That kind of analysis is better done in Strudy.
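In JSON Schema, a common way to forbid empty strings is `minLength: 1`. The fragment below is only a sketch: the schema is hypothetical rather than copied from this PR, and `isValidString` is a minimal stand-in for a real validator.

```javascript
// Hypothetical schema fragment for a non-empty string.
const nonEmptyString = { type: 'string', minLength: 1 };

// Minimal stand-in for a validator, covering only
// `type: 'string'` and `minLength`.
function isValidString(schema, value) {
  return typeof value === 'string' && value.length >= (schema.minLength || 0);
}

console.log(isValidString(nonEmptyString, 'Introduction')); // true
console.log(isValidString(nonEmptyString, ''));             // false
```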

Merge branch 'main' into schemas

1fdd4fe

Make id optional for headings in dfns extract

48a93d3

A couple of headings don't have IDs and there is not much that we can do about it.
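Making a property optional in JSON Schema usually means describing it under `properties` while leaving it out of `required`. A minimal sketch, with hypothetical property names and a stand-in check that handles the `required` keyword only:

```javascript
// Hypothetical schema fragment: `id` is described but not required.
const headingSchema = {
  type: 'object',
  properties: {
    id: { type: 'string', minLength: 1 },
    title: { type: 'string', minLength: 1 }
  },
  required: ['title']
};

// Minimal stand-in for the `required` keyword only.
function missingProperties(schema, value) {
  return (schema.required || []).filter(name => !(name in value));
}

// A heading without an ID is accepted...
console.log(missingProperties(headingSchema, { title: 'Abstract' }));
// ...while a heading without a title is still reported.
console.log(missingProperties(headingSchema, { id: 'abstract' }));
```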

Merge branch 'main' into schemas

b7d7c8c

Merge branch 'main' into schemas

fb1bf7b

tidoust mentioned this pull request Sep 25, 2022

Add JSON schemas for extracts export validation function w3c/reffy#1075

Merged

tidoust added 5 commits September 26, 2022 09:28

Use schemas from Reffy

e89882e

Linked to w3c/reffy#1075. This will only work once a version of Reffy has been released that exposes the appropriate schema validation function.

Drop now useless dependency on Ajv

8dbc17b

Drop debug code

4b27aff

Merge branch 'main' into schemas

69e0e7f

Simplify test code

a3e95c2

No need to check whether the `validate` function exists. The test will automatically fail if it doesn't exist while we're expecting one.
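The reasoning can be illustrated with a small sketch. The `getValidator` lookup below is a hypothetical stand-in, not Reffy's actual API: when it returns nothing for an extract type, calling the result throws a `TypeError`, which a test runner reports as a failure anyway.

```javascript
// Hypothetical lookup standing in for whatever returns a validator
// per extract type. A validator returns null when there are no errors.
const validators = {
  headings: data => (Array.isArray(data) ? null : ['must be array'])
};
function getValidator(name) {
  return validators[name];
}

// No explicit existence check: calling `undefined` throws a TypeError.
function checkExtract(name, data) {
  const validate = getValidator(name);
  return validate(data);
}

console.log(checkExtract('headings', [])); // null, meaning no errors
try {
  checkExtract('unknown', []);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```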

tidoust mentioned this pull request Sep 27, 2022

Validate JSON data against schemas #749

Merged

tidoust closed this Sep 27, 2022

dontcallmedom deleted the schemas branch March 22, 2024 20:10
