Revise terminology to align with current RFCs. #15

Closed
afs opened this issue Feb 17, 2023 · 23 comments · Fixed by #41
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. spec:editorial Minor issue or proposed change in the specification (markup, typo, informative text)

Comments

@afs
Contributor

afs commented Feb 17, 2023

RFC 3987 is a "proposed standard" in the RFC process.

RDF relies on the syntax extensions that RFC 3987 makes to RFC 3986.

RFC 3986 is updated by RFC 6874 and RFC 8820.

This should include RDF 1.1 Erratum 29, which refers to the use of certain URI/IRI terminology ("absolute IRI", "relative IRI reference").
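
As a rough illustration of the two terms named in the erratum, here is a minimal Python sketch of resolving a relative reference against a base. It uses the standard library's `urllib.parse.urljoin`, which follows RFC 3986 style resolution on URI syntax and is used here only as an approximation for IRIs. The base `http://example.org/a/b`, the references, and the helper name `resolve` are assumptions made for the example.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, chosen only for illustration.
BASE = "http://example.org/a/b"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve an IRI reference against a base, RFC 3986 section 5 style."""
    return urljoin(base, reference)


# A relative reference is resolved against the base.
print(resolve("../c"))
# A reference that already carries a scheme is returned unchanged.
print(resolve("http://other.example/x"))
```

The result of `resolve` is what an RDF graph would actually hold; the relative form only exists in a concrete syntax before resolution.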

@Tpt
Contributor

Tpt commented Feb 17, 2023

There is also the WHATWG URL living standard that is mostly RFC 3987 with some changes for web backward compatibility.

@afs
Contributor Author

afs commented Feb 17, 2023

Yes - I nearly put HTML URLs in. I couldn't find a grammar to link to. They do come from a different design space and have various features related to that (e.g. display, normalization of output). They do discuss "c:"! It is "living" so may change.

@gkellogg
Member

There have been issues raised before on using the URL spec rather than IRI. The main advantage of the URL spec is that it is properly citable; the main drawback is the lack of a BNF grammar.

My thought is that we continue to use the term "IRI", defined in RDF Concepts for other specs to cite. That is, in turn, defined in an appendix based on the URL living standard with a historical reference to RFC 3987 (also in changes). The appendix can include the ABNF from RFC 3987, but probably should be non-normative. Validation comes from the URL spec, with an informative reference to the historical ABNF and a note that there are valid IRIs not handled by the ABNF.

We also update our terminology here, and elsewhere as noted in RDF1.1 Erratum 29.

@gkellogg
Member

Alternatively, we could join with consensus and define IRI as a URL referencing the living specification. This would not prohibit us from continuing to define an IRI term, and related, as aliases for the URL-related terminology so that references to those terms still resolve. It's clear that validation and parsing of a URL is too complicated for ABNF, but as that's fully defined elsewhere, that shouldn't concern us. A "valid IRI" is thus a valid URL string.

@gkellogg gkellogg mentioned this issue Feb 17, 2023
3 tasks
gkellogg added a commit that referenced this issue Feb 17, 2023
@afs
Contributor Author

afs commented Feb 18, 2023

An absolute IRI can be defined for historical purposes, but is the same thing as IRI.

Some of the historical naming is not great. It works in the specs (2396) but gets lost when used outside.

An "Absolute URI" (3986, 4.3) is with-scheme and without-fragment. In the RDF context, it is not very useful.
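
The distinction can be sketched in a few lines of Python. This is a simplified check under stated assumptions, not a validator: it relies on `urllib.parse.urlsplit` to detect a scheme and treats any `#` in the string as the start of a fragment. The function names `has_scheme` and `is_absolute_uri` are made up for the example.

```python
from urllib.parse import urlsplit


def has_scheme(s: str) -> bool:
    """True if the string starts with a scheme, as urlsplit sees it."""
    return urlsplit(s).scheme != ""


def is_absolute_uri(s: str) -> bool:
    """RFC 3986 section 4.3 'absolute-URI': with a scheme, without a fragment."""
    # Checking for '#' directly also catches an empty fragment such as "...#".
    return has_scheme(s) and "#" not in s


print(is_absolute_uri("http://example.org/a"))    # scheme, no fragment
print(is_absolute_uri("http://example.org/a#x"))  # scheme, but has a fragment
```

An IRI such as `http://example.org/a#x` is perfectly usable in RDF yet is not an "absolute URI" in the RFC 3986 sense, which is why the term is of limited use here.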

@afs
Contributor Author

afs commented Feb 18, 2023

The URL standard brings in a lot of material that is related to its use domain. It is something RDF applications may wish to use. RDF is compatible with the URL spec.

I don't see what advantages it brings to the RDF platform (RDF specs) to be based on the URL spec. We need less than what the URL spec covers.

We end up with two sets of terminology, confusing the situation. That does not help the material on the web in books and training materials already in existence.

Validation comes from the URL spec

validation == the parsing algorithm? The URL spec defines "validation error".

For example, what will be the RDF position on percent-encoding?

"Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser."

That seems to imply a change to RDF even if we, somehow, explain it does not apply.
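
To show why this matters for RDF, here is a small Python sketch of what percent-encoding does to a string containing a code point above U+007F. It uses `urllib.parse.quote` as a stand-in; it is not the URL standard's parser, and the IRI is a made-up example.

```python
from urllib.parse import quote

# Hypothetical IRI containing U+00E9 (e with acute accent).
iri = "http://example.org/caf\u00e9"

# Percent-encode the UTF-8 bytes of non-ASCII characters,
# leaving ':' and '/' untouched.
encoded = quote(iri, safe=":/")

print(encoded)
print(iri == encoded)  # the two strings are no longer equal
```

If IRIs are compared as strings, the encoded and unencoded forms would be two different identifiers.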

"A URL is a struct that represents a universal identifier". That is confusing for our usage. RDF is a data format, we don't need to be involved with the parsed presentation.

this would not prohibit us from continuing to define an IRI term

I think it would raise questions. One of the stated URL goals is to supersede the terminology of URI and IRIs.

We still have to include the grammar (3987 section 2) and the URI algorithms extended for Unicode (section 5) in RDF Concepts.

"As the editors learn more about the subject matter the goals might increase in scope somewhat."

As far as I can see, we can probably use some of the URL spec (not all of it) if we do the necessary checking.
I'm not seeing what we gain, given where we are today. We would adopt material from RFC 3987 either way.

@gkellogg
Member

I'll add the i18n-tracker tag to the issue. Discussion of this should probably be included in a future meeting with representatives of the I18N group. It seems that the URL spec is the intended direction for W3C, but it remains inadequate for the purposes of RDF, and the IRI nomenclature is pretty widely embedded in various specs.

Note that URL is a living spec, so could be updated to address our needs, given the time and expertise to spend on it.

We end up with two sets of terminology, confusing the situation. That does not help the material on the web in books and training materials already in existence.

For other obsolete terms still in use we've used <span id="..."></span> which would be a way to address the entry points, but probably some descriptive definition of IRI and related would be useful at least to say that they are archaic terms (or similar).

validation == the parsing algorithm? The URL spec defines "validation error".

One of the things to explore with the I18N group is the lack of an ABNF grammar, which I presume has some reasoning behind it. Distinguishing c:/path as part of a file vs a scheme required some Windows-specific logic in my implementation, but they describe that as being non-conforming. Using validation error satisfies the need at the expense of requiring implementations to include a conforming URL parser.
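
The `c:/path` ambiguity is easy to reproduce. The sketch below shows Python's `urllib.parse.urlsplit` reading the drive letter as a scheme, followed by one possible heuristic for spotting Windows drive paths. The heuristic and the name `looks_like_drive_path` are assumptions for illustration, not taken from any spec or from the implementation mentioned above.

```python
from urllib.parse import urlsplit


def looks_like_drive_path(s: str) -> bool:
    """Heuristic (an assumption): one ASCII letter, ':', then '/' or '\\'."""
    return (
        len(s) >= 3
        and s[0].isascii()
        and s[0].isalpha()
        and s[1] == ":"
        and s[2] in "/\\"
    )


# A generic splitter treats the drive letter as the scheme.
print(urlsplit("c:/path").scheme)

# The heuristic flags the string before it is treated as an IRI.
print(looks_like_drive_path("c:/path"))
print(looks_like_drive_path("http://example.org/"))
```

A single-letter scheme is syntactically legal, so any such check is a policy decision by the implementation rather than something the generic syntax can settle.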

For example, what will be the RDF position on percent-encoding?

"Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser."

This seems to be the output of the parser, which I don't believe we would use directly.

Interesting, though, that the description of URL serializing does not decode the characters beyond U+007F, which does not seem very friendly to international users.

We still have to include the grammar (3987 section 2) and the URI algorithms extended for Unicode (section 5) in RDF Concepts.

Presuming that it is compatible with URL. I'd hate to do this without fully understanding why the URL spec does not include such a grammar. Perhaps the spec, or one related, could be updated to deal with the specific needs of RDF.

@gkellogg gkellogg added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Feb 18, 2023
@afs
Contributor Author

afs commented Feb 18, 2023

It seems that the URL spec is the intended direction for W3C,

Pointer please.

We end up with two sets of terminology, confusing the situation. That does not help the material on the web in books and training materials already in existence.

For other obsolete terms still in use we've used <span id="..."></span> which would be a way to address the entry points, but probably some descriptive definition of IRI and related would be useful at least to say that they are archaic terms (or similar).

I wasn't talking about the specs. I mean material that has been blogged, video'ed, and printed.

We still have to include the grammar (3987 section 2) and the URI algorithms extended for Unicode (section 5) in RDF Concepts.

Presuming that it is compatible with URL.

Why would it differ? (aside from the false statements about spaces in the goals section).
Why does W3C want to differ from IETF?

@gkellogg
Member

It seems that the URL spec is the intended direction for W3C,

Pointer please.

Notably IRIStatus from the I18N wiki.

We still have to include the grammar (3987 section 2) and the URI algorithms extended for Unicode (section 5) in RDF Concepts.

Presuming that it is compatible with URL.

Why would it differ? (aside from the false statements about spaces in the goals section).

Why doesn't URL have a grammar? I don't have anything to point to, but I have some recollection that ABNF was inadequate, thus the procedural description of parsing URLs. I'd like to get clarification on this.

Why does W3C want to differ from IETF?

The IRIStatus doc seems to describe the motivation. But it is certainly heavily biased to the needs of browser vendors.

We need to have a good rationale for either subsuming the RFC3987 work relevant to RDF, or adopting/adapting the URL spec.

@afs
Contributor Author

afs commented Feb 19, 2023

IRIStatus was last updated in 2015.

It is mainly concerned about presentation of international domain names and bidi URLs. That, and conversion (i.e. changing the string) to ASCII are important for browsers. There isn't a problem with RFC 3987 ihost.
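
As an example of the kind of string-changing conversion browsers care about, the sketch below converts an internationalized host name to its ASCII form with Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490). The host name is a made-up example.

```python
# Hypothetical internationalized host name containing U+00FC.
host = "b\u00fccher.example"

# Convert each label to its ASCII-compatible ("xn--") form.
ascii_host = host.encode("idna").decode("ascii")

print(ascii_host)
print(ascii_host == host)  # the conversion changes the string
```

RFC 3987 `ihost` allows the original form to be carried as-is, so an RDF system transporting the IRI does not need to perform this step.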

We should be (and, currently, we are) compatible with the URL spec. There are several RFCs and W3C documents for specific URI schemes. We don't reference those nor discuss their specifics.

@TallTed
Member

TallTed commented Feb 19, 2023

Also notable, IRIStatus was authored and edited over ~18 months by a single person (Aphillip who did not see fit to create a user page in the i18n wiki, but appears to have been Addison Phillips whose personal pages all appear to be outdated and/or offline). Addison Phillips's chairmanship of the i18n WG notwithstanding, this does not suggest to me that the apparently unratified IRIStatus is the "intended direction for W3C", a group with hundreds of member organizations and not-easily-counted hundreds if not thousands of Member-representing and Invited Expert individual participants.

@gkellogg
Member

Clearly, we need to understand if W3C has a policy about international identifiers and how to handle the state of RFC3987 which presents an issue; is this a TAG question, or an I18N question?

As previously noted, we can either choose to adopt the pertinent core of RFC3987 and use RDF Concepts as the citable reference for IRI, or we can do something else, including adopting the URL Standard; doing nothing doesn't seem like an option.

Updating or adopting RFC3987 seems like serious scope creep, but not having a normative specification to reference isn't acceptable either.

@duerst

duerst commented Feb 20, 2023

RFC 3987 is still a "proposed standard" in the RFC process.

Please note that most IETF specifications are at Proposed Standard. These get referenced all the time. It's not like a CR or PR at W3C.

RDF relies on the syntax extensions that RFC 3987 makes to RFC 3986.

RFC 3986 is updated by RFC 6874, RFC 7320, RFC 8820.

Please note that RFC 8820 obsoletes RFC 7320, so as a result, RFC 3986 is really only updated by RFC 6874 and RFC 8820.

Also, there's an update to RFC 6874 at https://datatracker.ietf.org/doc/draft-ietf-6man-rfc6874bis/. That also updates RFC 3987 (if it gets approved). But please note that IPv6 zone identifiers are something extremely local, definitely not suited for RDF.

As for RFC 8820, that's essentially just best current practice, and there, RDF has its own considerations.

@afs
Contributor Author

afs commented Feb 20, 2023

@duerst - Thank you for clarifying that RFC 3987 is still referenceable.

I've corrected the issue section and removed mention of 7320 because RFC 8820 obsoletes RFC 7320.

@afs
Contributor Author

afs commented Feb 20, 2023

The URL spec is not the right spec.

It is concerned with processing URLs, and the parsing algorithm changes the input string. It has specific percent-encoding handling which must only be performed once between the original IRI and any dereference action, or the meaning has been changed (and it's a security risk).
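
The "only once" point can be seen in a short Python sketch. It uses `urllib.parse.quote` and `unquote` as generic percent-encoding helpers, not the URL standard's algorithm, and the input string is a made-up example.

```python
from urllib.parse import quote, unquote

original = "a b"

# First pass: the space becomes %20.
once = quote(original)

# Second pass: the '%' itself is encoded, giving %2520.
twice = quote(once)

print(once)
print(twice)

# Decoding the doubly encoded form once does not recover the original.
print(unquote(twice) == original)
```

A component that re-encodes an already encoded string ends up identifying a different resource, which is the risk described above.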

As previously discussed (RDF 1.1 timeframe - whether IRIs be normalized) RDF specs are transporting IRIs and not concerned with "producing" or "dereference" (application use).

The title here is "Revise terminology to align with current RFCs." - e.g. IRI references and absolute IRIs.

#17 mentions the WG may change core terminology based on URL spec - that is completely different and should not be in FPWD when the WG hasn't indicated it will even consider doing this. #17 does not mention "adopt the pertinent core of RFC3987".

We need a grammar for IRIs.

It would be useful (but is not essential) to have a complete updated IRI grammar and not just the changes.
It would be useful to also pick up the mechanical changes that happen as well.

@afs
Contributor Author

afs commented Feb 20, 2023

Please note that most IETF specifications are at Proposed Standard.

Yes. The "Proposed Standard" level is defined in RFC 2026#section-4.1.1 (which is not changed by RFC 6410#section-2.1).

I found URNs are "proposed standard":

gkellogg added a commit that referenced this issue Feb 20, 2023
@gkellogg
Member

The URL spec is not the right spec.

It is concerned with processing URLs, and the parsing algorithm changes the input string. It has specific percent-encoding handling which must only be performed once between the original IRI and any dereference action, or the meaning has been changed (and it's a security risk).

Perhaps it's worth an informative note describing the relationship/compatibility of RFC3987 and the URL standard.

As previously discussed (RDF 1.1 timeframe - whether IRIs be normalized) RDF specs are transporting IRIs and not concerned with "producing" or "dereference" (application use).

The title here is "Revise terminology to align with current RFCs." - e.g. IRI references and absolute IRIs.

#17 mentions the WG may change core terminology based on URL spec - that is completely different and should not be in FPWD when the WG hasn't indicated it will even consider doing this. #17 does not mention "adopt the pertinent core of RFC3987".

We need a grammar for IRIs.

It would be useful (but is not essential) to have a complete updated IRI grammar and not just the changes. It would be useful to also pick up the mechanical changes that happen as well.

Would you suggest an appendix that merges and updates the ABNF grammar in this document?

Please note that most IETF specifications are at Proposed Standard.

Yes. The "Proposed Standard" level is defined in RFC 2026#section-4.1.1 (which is not changed by RFC 6410#section-2.1).

It is great that we can put this to bed and continue to use RFC3987, as the issue of referencing that and using the URL standard instead has come up a number of times in different places.

@afs
Contributor Author

afs commented Feb 20, 2023

Would you suggest an appendix that merges and updates the ABNF grammar in this document?

Good idea - give me a task to do it.

@gkellogg
Member

gkellogg commented Feb 20, 2023

I have tooling to generate HTML for an ABNF grammar; I created #20 for adding an ABNF grammar we can collaborate on.

@afs
Contributor Author

afs commented Mar 9, 2023

This should be split into two issues.

One for aligning terminology and one for including the combined IRI grammar.

@gkellogg
Member

@afs: is the "Define IRI syntax for RDF" half of this issue substantially covered by #21? If so, maybe we can just re-name this issue to "Revise terminology to align with current RFCs".

@afs
Contributor Author

afs commented Mar 14, 2023

Like all non-trivial changes, there should be an issue for work done.

Issue #20 should do to track PR #21.

Then we can use this one for revising terminology.

@gkellogg gkellogg changed the title from "Define IRI syntax for RDF. Revise terminology to align with current RFCs." to "Revise terminology to align with current RFCs." Mar 14, 2023
@afs
Contributor Author

afs commented Mar 15, 2023

From #21.

Ensure that "IRI" in the section title "IRI Grammar" does not have under-dots (the abbreviation style).

@gkellogg gkellogg added Editorial Errata management: this erratum is editorial spec:editorial Minor issue or proposed change in the specification (markup, typo, informative text) labels Mar 16, 2023
@pchampin pchampin removed the Editorial Errata management: this erratum is editorial label Mar 27, 2023
@gkellogg gkellogg mentioned this issue May 8, 2023
gkellogg added a commit that referenced this issue Aug 31, 2023
* Described IRIs as "resolved" rather than "absolute".

For #15.

---------

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Andy Seaborne <andy@apache.org>