Learn to parse IETF datatracker pages #1122
Comments
tidoust added a commit that referenced this issue Nov 10, 2023

Note the need to be more explicit about IETF entries. It would be worth improving the code now that we start to add more of these specs. This is tracked in #1122.

This would add:

```json
[
  {
    "url": "https://datatracker.ietf.org/doc/html/draft-zern-webp/",
    "seriesComposition": "full",
    "shortname": "webp",
    "series": {
      "shortname": "webp",
      "currentSpecification": "webp",
      "title": "WebP Image Format",
      "shortTitle": "WebP Image Format",
      "nightlyUrl": "https://www.ietf.org/archive/id/draft-zern-webp-13.html"
    },
    "organization": "IETF",
    "groups": [
      {
        "name": "Network Working Group",
        "url": "https://datatracker.ietf.org/group/app/"
      }
    ],
    "nightly": {
      "url": "https://www.ietf.org/archive/id/draft-zern-webp-13.html",
      "status": "Editor's Draft",
      "alternateUrls": [],
      "filename": "draft-zern-webp-13.html"
    },
    "title": "WebP Image Format",
    "source": "spec",
    "shortTitle": "WebP Image Format",
    "categories": [
      "browser"
    ],
    "standing": "good"
  },
  {
    "url": "https://www.rfc-editor.org/rfc/rfc6386",
    "seriesComposition": "full",
    "shortname": "rfc6386",
    "series": {
      "shortname": "rfc6386",
      "currentSpecification": "rfc6386",
      "title": "VP8 Data Format and Decoding Guide",
      "shortTitle": "VP8 Data Format and Decoding Guide",
      "nightlyUrl": "https://www.rfc-editor.org/rfc/rfc6386"
    },
    "groups": [
      {
        "name": "Independent Submission",
        "url": "https://datatracker.ietf.org/stream/ise/"
      }
    ],
    "organization": "IETF",
    "nightly": {
      "url": "https://www.rfc-editor.org/rfc/rfc6386",
      "status": "Informational",
      "alternateUrls": [],
      "filename": "rfc6386.html"
    },
    "title": "VP8 Data Format and Decoding Guide",
    "source": "specref",
    "shortTitle": "VP8 Data Format and Decoding Guide",
    "categories": [
      "browser"
    ],
    "standing": "good"
  }
]
```
tidoust added a commit that referenced this issue Nov 10, 2023

Note the need to be somewhat explicit about IETF entries. It would be worth improving the code now that we start adding more of these specs. This is tracked in #1122.

This adds:

```json
[
  {
    "url": "https://datatracker.ietf.org/doc/html/draft-zern-webp/",
    "seriesComposition": "full",
    "shortname": "webp",
    "series": {
      "shortname": "webp",
      "currentSpecification": "webp",
      "title": "WebP Image Format",
      "shortTitle": "WebP Image Format",
      "nightlyUrl": "https://www.ietf.org/archive/id/draft-zern-webp-13.html"
    },
    "organization": "IETF",
    "groups": [
      {
        "name": "Network Working Group",
        "url": "https://datatracker.ietf.org/group/app/"
      }
    ],
    "nightly": {
      "url": "https://www.ietf.org/archive/id/draft-zern-webp-13.html",
      "status": "Editor's Draft",
      "alternateUrls": [],
      "filename": "draft-zern-webp-13.html"
    },
    "title": "WebP Image Format",
    "source": "spec",
    "shortTitle": "WebP Image Format",
    "categories": [
      "browser"
    ],
    "standing": "good"
  },
  {
    "url": "https://www.rfc-editor.org/rfc/rfc6386",
    "seriesComposition": "full",
    "shortname": "rfc6386",
    "series": {
      "shortname": "rfc6386",
      "currentSpecification": "rfc6386",
      "title": "VP8 Data Format and Decoding Guide",
      "shortTitle": "VP8 Data Format and Decoding Guide",
      "nightlyUrl": "https://www.rfc-editor.org/rfc/rfc6386"
    },
    "groups": [
      {
        "name": "Independent Submission",
        "url": "https://datatracker.ietf.org/stream/ise/"
      }
    ],
    "organization": "IETF",
    "nightly": {
      "url": "https://www.rfc-editor.org/rfc/rfc6386",
      "status": "Informational",
      "alternateUrls": [],
      "filename": "rfc6386.html"
    },
    "title": "VP8 Data Format and Decoding Guide",
    "source": "specref",
    "shortTitle": "VP8 Data Format and Decoding Guide",
    "categories": [
      "browser"
    ],
    "standing": "good"
  }
]
```

---------

Co-authored-by: Francois Daoust <fd@tidoust.net>
tidoust added a commit that referenced this issue Nov 21, 2023

The code now recognizes IETF draft documents that have a `datatracker.ietf.org` URL:

- It associates them with the IETF organization.
- It can compute a useful shortname (that code can in theory return a truncated shortname because there is no direct way to validate that the Internet Draft name contains a group ID).
- It extracts the group's ID from the nightly URL (that code could be further improved to fetch the actual group name; right now the code only knows about the "HTTP" working group).
- It associates IETF documents from the HTTP WG with the right repository.
- It computes the better-looking nightly URL at `www.ietf.org`, or at `httpwg.org` for HTTP WG documents.

This makes it possible to simplify IETF data in `specs.json` a bit.

Note that the code still cannot automatically process drafts that have been submitted by individuals, even when these drafts are targeted at a group. Such drafts should be associated with the individuals that submitted them and not with any group. A couple of spec entries, which incorrectly referenced the Network WG or the HTTP WG, were fixed accordingly in `specs.json`.

This fixes #1122, but note that the code does not need to fetch the datatracker page for the time being.
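
To make the name-based steps above concrete, here is a minimal Python sketch. It is not the project's actual code. It assumes the usual Internet-Draft naming convention (`draft-ietf-<group>-<title>` for working group drafts, `draft-<author>-<title>` otherwise) and the `www.ietf.org/archive/id/` URL pattern seen in the WebP entry. The function names and the `draft-ietf-httpbis-example` draft are hypothetical.

```python
from urllib.parse import urlparse


def parse_datatracker_url(url):
    """Split a datatracker draft URL into name, group ID, and shortname.

    Returns None when the URL is not a datatracker draft URL.
    Assumption: the second token of the draft name is "ietf" for
    working group drafts, and an author name otherwise.
    """
    parsed = urlparse(url)
    if parsed.hostname != "datatracker.ietf.org":
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 3 or segments[:2] != ["doc", "html"]:
        return None
    name = segments[2]
    parts = name.split("-")
    if parts[0] != "draft" or len(parts) < 3:
        return None
    if parts[1] == "ietf" and len(parts) >= 4:
        # Working group draft: the third token is taken as the group ID.
        group = parts[2]
        shortname = "-".join(parts[3:])
    else:
        # Individual draft: no group can be derived from the name alone.
        group = None
        shortname = "-".join(parts[2:])
    return {"name": name, "group": group, "shortname": shortname}


def archive_url(name, revision):
    """Build the revision-specific URL under www.ietf.org/archive/id/."""
    return f"https://www.ietf.org/archive/id/{name}-{revision:02d}.html"


info = parse_datatracker_url(
    "https://datatracker.ietf.org/doc/html/draft-zern-webp/")
print(info["shortname"], archive_url(info["name"], 13))
```

The revision number is not part of the canonical URL, so this sketch takes it as a parameter. Nothing in the name proves that the third token really is a group ID, which is why a name-only approach can return a truncated shortname.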
tidoust added a commit that referenced this issue Nov 22, 2023

The code now recognizes IETF draft documents that have a `datatracker.ietf.org` URL and fetches all the information it needs for IETF drafts and RFCs from the IETF datatracker using the Simplified Documents API: https://datatracker.ietf.org/api/#simplified-documents

- It associates them with the IETF organization.
- It can compute a useful shortname (that code can in theory return a truncated shortname because there is no direct way to validate that the Internet Draft name contains a group ID).
- It extracts the group from the datatracker API.
- It associates IETF documents from the HTTP WG with the right repository.
- It computes the better-looking nightly URL at `www.ietf.org`, or at `httpwg.org` for HTTP WG documents.

This makes it possible to simplify IETF data in `specs.json` a bit.

This fixes #1122, but note that the code does not need to fetch the datatracker page for the time being.

IETF documents may be linked to a group, an area, or be part of what IETF calls individual submissions. Areas and individual submissions still link to a "group" page at IETF, so the code just takes that info from datatracker as-is. As a result, individual submissions are no longer associated with the author who submitted the document, but that does not seem needed in any case.

The code throws when an IETF document that it knows under a certain name has been published under a different name, to alert us that the canonical URL needs to change in browser-specs. Name changes typically happen when a document transitions to a working group, or when it gets published as an RFC.
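
The API-based flow and the name-change check can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the commit. It assumes that the Simplified Documents API is reachable at `/doc/<name>/doc.json` and that the response carries `name`, `rev`, and `group.name` fields. The function names, the `NameChangedError` class, and the sample response are hypothetical.

```python
import json
from urllib.request import urlopen


class NameChangedError(Exception):
    """Raised when datatracker reports a different name than the known one."""


def fetch_doc(name):
    # Assumed endpoint for the Simplified Documents API.
    with urlopen(f"https://datatracker.ietf.org/doc/{name}/doc.json") as response:
        return json.load(response)


def describe_draft(known_name, doc):
    """Turn an API response (assumed shape) into spec entry fields."""
    current_name = doc["name"]
    if current_name != known_name:
        # Signals that the canonical URL needs to be updated by hand.
        raise NameChangedError(
            f"{known_name} is now published as {current_name}")
    group = doc.get("group") or {}
    return {
        "organization": "IETF",
        "group": group.get("name"),
        "nightlyUrl": "https://www.ietf.org/archive/id/"
                      f"{current_name}-{doc['rev']}.html",
    }


# Illustrative response only, not real datatracker output.
sample = {"name": "draft-zern-webp", "rev": "13",
          "group": {"name": "Example Group"}}
print(describe_draft("draft-zern-webp", sample))
```

Calling `describe_draft(name, fetch_doc(name))` would combine both steps. Keeping the network call separate from the processing makes the name check easy to exercise without hitting the API. RFCs would need a different nightly URL pattern, which this sketch leaves out.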
IETF entries that have not yet been published as RFCs have a canonical URL that looks like `https://datatracker.ietf.org/doc/html/...`. Such entries need to be explicit about the organization, the group, and more often than not, a "better looking" nightly URL, e.g. one under `https://www.ietf.org/archive/`. The problem with the nightly URL is that it typically contains the current revision of the draft and thus becomes outdated as soon as a new revision is published.

As we add more of these specs, the code could rather:
Not sure yet how to extract the group itself though.