Add IDL Names generator tool #489
The IDL names generator takes a crawl report as input and creates a report per referenceable IDL name that details the complete parsed IDL structure defining the name across all specs. The parsed IDL structure is a wrapped version of the structure that appears in `idlparsed` extracts. Here is an example:

```json
{
  "defined": {
    "spec": {
      "title": "Media Capture and Streams",
      "url": "https://www.w3.org/TR/mediacapture-streams/"
    },
    "type": "dictionary",
    "name": "ConstraintSet",
    "inheritance": null,
    "members": [],
    "extAttrs": [],
    "partial": false,
    "href": "https://w3c.github.io/mediacapture-main/#dom-constraintset"
  },
  "extended": [],
  "inheritance": null,
  "includes": []
}
```

The meaning of the properties is:

- `defined` contains the base IDL definition of the name and includes a `spec` property that describes where the name is defined. Note that the URL that appears is the spec identifier, equivalent to the `url` field in browser-specs, and not necessarily the crawled URL. The rest of the structure is the `idlparsed` one (with the exception of the `href` property, see below).
- `extended` contains the list of partial definitions that extend the base definition, each of them following the same structure as the one presented here. The order of the list follows the order of appearance in the crawl results, where specs are sorted by URL.
- `inheritance` contains the inherited interface when there is one, again following the same structure. The whole inheritance chain appears, meaning that one can follow `inheritance` properties to get from `HTMLVideoElement` all the way down to `EventTarget`.
- `includes` contains the list of mixins that the name includes, each of them following the same structure as the one presented here. The order of the list follows the order of appearance in the crawl results, where specs are sorted by URL.

Whenever possible, all IDL terms get linked to their definition in the spec through an `href` property (which uses the crawled URL). That property is computed from the dfns extracts. The property appears at the interface level and also for individual IDL members (enum values in this example), as in:

```json
{
  "defined": {
    "spec": {
      "title": "CSS Spatial Navigation Level 1",
      "url": "https://www.w3.org/TR/css-nav-1/"
    },
    "type": "enum",
    "name": "SpatialNavigationDirection",
    "values": [
      {
        "type": "enum-value",
        "value": "up",
        "href": "https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-up"
      },
      {
        "type": "enum-value",
        "value": "down",
        "href": "https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-down"
      },
      {
        "type": "enum-value",
        "value": "left",
        "href": "https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-left"
      },
      {
        "type": "enum-value",
        "value": "right",
        "href": "https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-right"
      }
    ],
    "extAttrs": [],
    "href": "https://drafts.csswg.org/css-nav-1/#enumdef-spatialnavigationdirection"
  },
  "extended": [],
  "includes": []
}
```

Related discussion in #472.
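A minimal sketch of following `inheritance` properties from `HTMLVideoElement` down to `EventTarget`, assuming the wrapped structure described above, where the top-level `inheritance` property holds the parent's wrapped structure (or `null` at the root). The `wrap` helper and the report it builds are hypothetical and hand-made for illustration, not actual generator output.

```javascript
// List the interface names along the inheritance chain of a wrapped
// IDL names report, starting with the name itself.
function getInheritanceChain(desc) {
  const chain = [];
  let node = desc;
  while (node && node.defined) {
    chain.push(node.defined.name);
    // The wrapper-level `inheritance` is the parent's wrapped structure.
    node = node.inheritance;
  }
  return chain;
}

// Hypothetical helper that builds a minimal wrapped structure; real reports
// also carry `spec`, `members`, `extAttrs`, `href`, and so on.
function wrap(name, parent) {
  return { defined: { name }, extended: [], includes: [], inheritance: parent };
}

// Build the chain from the root (EventTarget) upwards.
const video = ['HTMLVideoElement', 'HTMLMediaElement', 'HTMLElement', 'Element', 'Node', 'EventTarget']
  .reduceRight((parent, name) => wrap(name, parent), null);

console.log(getInheritanceChain(video).join(' -> '));
// HTMLVideoElement -> HTMLMediaElement -> HTMLElement -> Element -> Node -> EventTarget
```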
LGTM with a minor stylistic question
Thinking about this a bit more, I believe that this is not a fantastically useful approach for the scenarios that we have in mind, so I'd like to leave this open for the time being. The initial goal was to give a ready-to-use extract per IDL name that would in particular allow people to serialize the IDL in whatever format they might want. That would be straightforward if one could use the serializer in the webidl2.js library. That is not directly possible though, because the serializer actually operates on the "hidden" tokens in the AST ("hidden" in the sense that they disappear when …). As such, there would be no easy way to use these extracts directly in WebIDLPedia or ReSpec. Off the top of my head, several possibilities:
I had imagined option (2) when considering that issue while reviewing the pull request; more specifically, I thought we could add the IDL fragments to the JSON output in a later iteration.
The AST is only half useful because it is not suitable for serialization purposes. Adding the raw IDL fragment is better from that perspective. For the export to be readily usable, it needs to include definitions. Without the AST, these definitions need to be at the root level. In turn, this means that additional logic is needed when one wants to re-serialize the IDL fragment, to associate the definitions back with the appropriate IDL constructs. That logic is not fantastically straightforward. Definitions are not included by default, and are not included in the crawl either. It would be good to have clear feedback on whether they are going to be useful.
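To give an idea of the kind of logic involved, here is a deliberately simplified sketch that finds the definition of a member when definitions sit at the root level, grouped by spec URL. It assumes dfn entries expose `linkingText` and `for` arrays and an `href` string, as in Reffy's dfns extracts; it is not Reffy's `matchIdlDfn`, which also has to deal with definition types, overloads and other subtleties. The function name and data are hypothetical.

```javascript
// Simplified lookup: find the dfn whose linking text matches the member name
// and whose `for` list contains the name of the parent IDL construct.
function findMemberDfn(dfnsBySpec, specUrl, parentName, memberName) {
  const dfns = dfnsBySpec[specUrl] || [];
  const match = dfns.find(dfn =>
    dfn.linkingText.includes(memberName) && dfn.for.includes(parentName));
  return match || null;
}

// Hypothetical root-level definitions, for illustration only.
const dfnsBySpec = {
  'https://example.org/spec/': [
    { linkingText: ['Foo'], type: 'interface', for: [], href: 'https://example.org/spec/#foo' },
    { linkingText: ['bar'], type: 'attribute', for: ['Foo'], href: 'https://example.org/spec/#dom-foo-bar' }
  ]
};

const dfn = findMemberDfn(dfnsBySpec, 'https://example.org/spec/', 'Foo', 'bar');
console.log(dfn ? dfn.href : 'not found'); // https://example.org/spec/#dom-foo-bar
```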
The crawler now also exports one text file per IDL name in an "idlnames" folder. Each text file contains the full interface (without the fragments that define the inherited classes).
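A rough sketch of what "the full interface without the fragments that define the inherited classes" means in terms of the structure: collect the base definition, its partials and its mixins, and skip `inheritance`. It assumes each definition carries its raw IDL text in a `fragment` property, as the serialization script shared in this thread does; the order used here (base, partials, mixins) is a choice of this sketch, not necessarily that of the actual export. `Foo`, `FooMixin` and `Base` are hypothetical names.

```javascript
// Return the underlying definition whether the node is wrapped or not.
function definitionOf(node) {
  return node.defined ? node.defined : node;
}

// Concatenate the IDL fragments of a name, its partials and its mixins,
// without following the inheritance chain.
function assembleIdlText(desc) {
  const parts = [];
  function visit(node) {
    parts.push(definitionOf(node).fragment);
    for (const partial of node.extended || []) visit(partial);
    for (const mixin of node.includes || []) visit(mixin);
    // `node.inheritance` is deliberately not followed.
  }
  visit(desc);
  return parts.join('\n\n');
}

// Hypothetical report: `Foo` has one partial, one mixin, and a parent `Base`.
const foo = {
  defined: { name: 'Foo', fragment: 'interface Foo : Base {};' },
  extended: [{ name: 'Foo', partial: true, fragment: 'partial interface Foo {\n  attribute long bar;\n};' }],
  includes: [{ defined: { name: 'FooMixin', fragment: 'interface mixin FooMixin {};' }, extended: [], includes: [] }],
  inheritance: { defined: { name: 'Base', fragment: 'interface Base {};' }, extended: [], includes: [], inheritance: null }
};

console.log(assembleIdlText(foo));
```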
This update adds an `href` property to `defined` structures that links to the definition of the underlying IDL name in the spec, when known. Note that the full definition would also appear in the `dfns` array if the generator is told to generate definitions.
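For consumers that only need the links, a small sketch that gathers every `href` from a report, at the definition level and on nested members or enum values. It walks plain JSON objects and arrays generically rather than relying on the exact shape of each node, so it also descends into `extended`, `includes` and `inheritance` (and therefore returns ancestor links too). The sample is a trimmed copy of the `SpatialNavigationDirection` example from the PR description.

```javascript
// Collect `href` values depth-first: an object's own href comes before the
// hrefs of its children. Assumes plain JSON data (no cycles).
function collectHrefs(value, found = []) {
  if (Array.isArray(value)) {
    for (const item of value) collectHrefs(item, found);
  } else if (value && typeof value === 'object') {
    if (typeof value.href === 'string') found.push(value.href);
    for (const [key, child] of Object.entries(value)) {
      if (key !== 'href') collectHrefs(child, found);
    }
  }
  return found;
}

// Trimmed version of the enum example (two values only).
const report = {
  defined: {
    spec: { title: 'CSS Spatial Navigation Level 1', url: 'https://www.w3.org/TR/css-nav-1/' },
    type: 'enum',
    name: 'SpatialNavigationDirection',
    values: [
      { type: 'enum-value', value: 'up', href: 'https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-up' },
      { type: 'enum-value', value: 'down', href: 'https://drafts.csswg.org/css-nav-1/#dom-spatialnavigationdirection-down' }
    ],
    extAttrs: [],
    href: 'https://drafts.csswg.org/css-nav-1/#enumdef-spatialnavigationdirection'
  },
  extended: [],
  includes: []
};

console.log(collectHrefs(report));
```

Note that `spec.url` is not collected, since it is the spec identifier rather than a link to a definition.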
I made several updates:
I note that the cleanup job running on webref will have to be completed to detect files that need to be deleted in the

Custom logic to serialize an IDL name with links

Some possible logic to link the members of an IDL fragment to their definitions, using the IDL names generator and the WebIDL writer (relative paths are from the root of the Reffy package and need to be updated if you create the script elsewhere):

```js
// Serialize an IDL name as IDL text where names are wrapped in Markdown
// links to their definitions, when a matching definition is known.
const path = require('path');
const { parse, write } = require('webidl2');
const { requireFromWorkingDirectory, expandCrawlResult } = require('./src/lib/util');
const { generateIdlNames } = require('./src/cli/generate-idlnames.js');
const { matchIdlDfn, getExpectedDfnFromIdlDesc } = require('./src/cli/check-missing-dfns');

// Build webidl2 `write` templates. `dfns` holds the definitions of the spec
// that defines the fragment being written.
function templates(idlName, dfns) {
  // Find the dfn of the IDL construct being written, falling back to
  // interface dfns across all specs.
  function getExpectedDfn(name, context) {
    if (context && context.data) {
      const expected = getExpectedDfnFromIdlDesc(context.data, context.parent);
      if (expected) {
        const dfn = dfns.find(d => matchIdlDfn(expected, d));
        if (dfn) {
          return dfn;
        }
      }
      if (!expected || expected.type === 'interface') {
        return getInterfaceDfn(name);
      }
    }
    return null;
  }

  // Look for an interface dfn with the given name in all specs.
  function getInterfaceDfn(name) {
    const expected = { linkingText: [name], type: 'interface', for: [] };
    for (const list of Object.values(idlName.dfns || {})) {
      const dfn = list.find(d => matchIdlDfn(expected, d));
      if (dfn) {
        return dfn;
      }
    }
    return null;
  }

  // Wrap the name in a Markdown link when the lookup finds a dfn.
  function getWrappingFunction(lookupFunction) {
    return function (name, context) {
      const dfn = lookupFunction(name, context);
      return dfn ? `[${name}](${dfn.href})` : name;
    };
  }

  return {
    name: getWrappingFunction(getExpectedDfn),
    nameless: getWrappingFunction(getExpectedDfn),
    reference: getWrappingFunction(getInterfaceDfn)
  };
}

// Serialize the definition, then recurse into the inherited interface,
// the partials and the mixins (in that order).
function serialize(idlName) {
  const res = [];
  function serializeNode(node) {
    const root = node.defined ? node.defined : node;
    const dfns = (root.spec && idlName.dfns && idlName.dfns[root.spec.url]) || [];
    const idlTree = parse(root.fragment);
    res.push(write(idlTree, { templates: templates(idlName, dfns) }));
    // Only wrapped nodes have a wrapped `inheritance` structure.
    if (node.defined && node.inheritance) {
      serializeNode(node.inheritance);
    }
    for (const partial of node.extended || []) {
      serializeNode(partial);
    }
    for (const mixin of node.includes || []) {
      serializeNode(mixin);
    }
  }
  serializeNode(idlName);
  return res;
}

async function linkify(idlName, crawlPath) {
  const crawlIndex = requireFromWorkingDirectory(path.join(crawlPath, 'index.json'));
  const crawlResults = await expandCrawlResult(crawlIndex, crawlPath);
  const names = generateIdlNames(crawlResults.results, { dfns: true });
  const desc = names[idlName];
  if (!desc) {
    throw new Error(`Unknown IDL name: ${idlName}`);
  }
  return serialize(desc).join('\n\n');
}

// Usage: node <script> [IDL name] [crawl folder]
const idlName = process.argv[2] || 'Document';
const crawlPath = process.argv[3] || 'reports/ed';
linkify(idlName, crawlPath)
  .then(res => {
    console.log('==========');
    console.log(res);
    console.log('==========');
  })
  .catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
```
I haven't reviewed everything in detail, but the direction and the design choices LGTM. Once the webref build is out, I'll try to adapt WebIDLPedia to use this.
The "idlnames" and "idlnamesparsed" folders have recently been added, see: w3c/reffy#489 (comment). They contain one file per IDL name, and these files need to be dropped when the IDL names no longer appear in any of the crawled specs. Also adjust job execution schedules to have the cleanup job run after the weekly TR crawl, see: #86 (review).
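As a rough sketch of the check this commit describes, assuming each folder holds one file per IDL name, named after the IDL name (the extension does not matter here), a cleanup job could compare file names with the IDL names found in the latest crawl. `findStaleFiles` is a hypothetical name; this is not webref's actual cleanup job.

```javascript
const fs = require('fs');
const path = require('path');

// Return the paths of files in `folder` whose base name (without extension)
// is not one of the IDL names in `currentNames`.
function findStaleFiles(folder, currentNames) {
  const keep = new Set(currentNames);
  return fs.readdirSync(folder)
    .filter(file => !keep.has(path.parse(file).name))
    .map(file => path.join(folder, file));
}
```

The job could then pass each returned path to `fs.unlinkSync`, running once per folder ("idlnames" and "idlnamesparsed").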
The crawler calls the IDL names generator to create individual exports per IDL name in `idlnamesparsed`. Individual exports remain relatively small (the largest is 775KB, for the `WebGL2RenderingContext` interface; the average is 25KB). The total folder size is a bit more than 50MB, though.

Partially addresses #472 (this does not create the textual definition).