diff --git a/.gitignore b/.gitignore index 54e2289..b6cae31 100644 --- a/.gitignore +++ b/.gitignore @@ -1,12 +1,19 @@ +# emacs detritus +*~ +*# + docs/url-problem-statement.html docs/url-problem-statement.txt + evaluate/Cargo.lock evaluate/*.jar evaluate/target/ evaluate/*.class evaluate/urlsettest.json evaluate/urltestdata.json + parser.pegjs + reference-implementation/IdnaMappingTable.txt reference-implementation/punycode.js reference-implementation/unorm.js @@ -18,4 +25,5 @@ reference-implementation/test/urltestparser.js reference-implementation/test/urlsettest.js url.html url.pegjson + node_modules/ diff --git a/README.md b/README.md index c860907..aab4282 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,7 @@ Contents --- * Specification. Source is contained in [url.src](url.src) and [url.pegjs](url.pegjs). + * [IETF document(s)](docs#readme). (Source for) IETF Internet Draft laying out problems and some solutions. * [Reference implementation](reference-implementation#readme) in JavaScript. This directory also contains web pages that demonstrate live parsing of URLs entered in an HTML input field. The latest version is deployed live: @@ -50,3 +51,6 @@ Running `make` in the `evaluate` directory will capture test results for a number of non-web browser URL/URI implementations. See the [Makefile](evaluate/Makefile) for a list of prerequisites. + +Running `make` in the `docs` directory will build the Internet +Draft(s) in HTML and plain text, using xml2rfc. diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..61c1750 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,18 @@ +Running `make` in this `docs` directory will build the Internet +Draft(s) in HTML and plain text, using xml2rfc. + +It will use xml2rfc to convert the .xml file into .html and .txt. You can do this yourself, via: + xml2rfc multipart-form-data.xml --text --html + +or use online. 
+ +Periodically submit an Internet Draft + + +to make a new version of + + +Be sure to update the file version number in the xml file before submitting. + + + diff --git a/docs/url-problem-statement.xml b/docs/url-problem-statement.xml index 994089d..b57a817 100755 --- a/docs/url-problem-statement.xml +++ b/docs/url-problem-statement.xml @@ -14,7 +14,7 @@ - Problem Statement: URL + URL Problem Statement and Directions @@ -73,9 +73,42 @@ +
+ This document lays out the problem space around + standards for URLs and things like + them, and proposes some actions to resolve the conflicts. + From a user or developer point of view, it makes no sense for there to + be a proliferation of definitions of URL nor for there to be a + proliferation of incompatible implementations. This shouldn't be a + competitive feature. Therefore there is a need for the organizations + involved to update and reconcile the various Internet Drafts, + Recommendations, and Standards in this area. + Currently, this document is for discussion purposes, with + possible next steps discussed in . + + Discussions have taken place on + and . The W3C TAG + has discussed these issues in meetings and on their mailing list. + There has been limited discussion on the WebApps mailing list.(?) + + This document, test suite, reference implementation, and + the WebPlatform specification are being developed at + , including + an issue tracker, Wiki, and related resources. + 'Pull' requests for edits to documents or tests are most welcome. + Raising issues in the + GitHub tracker is also helpful. + Comments to the editors or on those mailing lists by email + are also welcome. + + + +
This section contains a very compressed history of URL standards, in sufficient detail to set some context. + REVIEWERS: history is necessarily incomplete, but please + note incorrect or missing essential facts. The first standards-track specification for URLs was @@ -117,10 +150,20 @@ definition was moved out into the "URL - Living Standard" . - The world has also moved on. ICANN has approved non-ASCII top level + When W3C produced the HTML5 + recommendation, the normative + reference to the WHATWG URL standard was a gating issue, and + an + unusual compromise was reached, where the [URL] reference is given a descriptive + paragraph rather than a single document reference. + + + The world has moved on in other ways. ICANN has approved non-ASCII top level domains, but IDNA specs ( and ) did not fully address IRI processing. Subsequently, the Unicode consortium produced . + +
@@ -128,6 +171,10 @@ produced multiple documents, and it's unclear whether there's a trajectory to make them consistent. This section tries to enumerate currently active organizations and specs. + REVIEWERS: are there important ongoing activities we've missed + or gotten wrong? Who are the stakeholders whose current + work might be affected? (This is to scope organizational + coordination needed.) Organizations include the IETF, @@ -140,8 +187,13 @@ Relevant specs under development in each organization include:
- and - are under active development. + has + passed working group last call and entered IESG + review. + + New schemes and updates to old ones continue, including + 'file:' + and 'urn:'. The IRI working group closed, but work can continue in the Applications Area working group. Documents sitting needing update, abandoned now, @@ -149,12 +201,18 @@ , and ), which were originally intended to obsolete . - - In addition, there's quite a bit of activity around URNs and - library identifiers in the URN working group, including - some expressions of desire to update RFC 3986 to better accomodate - desired URN semantics. + + The URNBis + working group has been working to update the definitions + of URNs, but has difficulty with some of the wording in RFC 3986. + In particular, updates RFC3986. + + The web security working group developed + RFC 6454 ("The Web Origin Concept"), which + redefines. Updates + in the IETF were abandoned. +
@@ -169,16 +227,31 @@
The Web - Applications Working Group, in conjuction with the - W3C TAG, + Applications Working Group sporadically have been republishing the WHATWG work with no technical content differences as . There is a proposal to formalize this relationship. + + The W3C TAG developed + + Best Practices for Fragment Identifiers and Media Type Definitions + , which points out several problems with the definitions + for the 'fragment' part of URLs. The TAG is working to + ensure liaison exchange happens. + + Note also the interim solution for the + HTML5 reference to [URL], which should be updated by + the HTML working + group . +
- is being developed on a + WebPlatform.org is an activity sponsored by W3C and web vendors. + is being developed on a develop GitHub branch based on . It currently contains work that has yet to be folded back into the @@ -196,16 +269,20 @@
- The main problem is conflicting specifications that overlap - but don't match each other. - Additionally, the following are issues that need to be resolved to - make URL processing unambiguous and stable. - + This section lays out the problems that we think need a coordinated + solution. REVIEWERS: have we missed some things? Are any of these + non-problems or not worth solving? + + The main problem is conflicting specifications that overlap + but don't match each other. + Additionally, the following are issues that need to be resolved to + make URL processing unambiguous and stable. + Nomenclature: over the years, a number of different sets of terminology have been used. URL / URI / IRI is not the only difference. chronicles a number of differences. - + Parameterization: standards in this area need to define such matters as normalization forms and values for parameters such as UseSTD3ASCIIRules. @@ -215,6 +292,13 @@ and browsers. identifies a number of such differences. + IDNA: RFC 3490 (IDNA) defines processing for 'IDN-aware + domain name slots' (where "the host portion of the URI in the + src attribute of an HTML <IMG> tag" is given as an + example). Later, "IDNA is applicable to all domain names in all + domain name slots". So in mailto:user@host, is the host an + IDN-aware domain name slot? A domain name slot at all? + Specific scheme definitions: some UR* scheme definitions are woefully out of date, incomplete, or don't correspond to current practice, but updating their definitions is unclear. This includes "file:", @@ -224,18 +308,60 @@
-
- This problem clearly requires a cross-organizational solution, - specifically: - - Build a plan to update or obsolete , - , , and - to be consistent with +
+ Many of the problems above require some cross-organizational collaboration. + This section outlines alternatives and possible next steps. + REVIEWERS: Necessary? Sufficient? What are we missing, what + did we get wrong? + +
+ At various times, many have called for replacing + the IETF URI standard RFC 3986, or updating it. How + to approach this is controversial, but at a minimum + the following are needed: + + Make it clear that ASCII-only URIs (as now defined + by RFC 3986) are not what is mainly used on the web. + Updates for URN. + Updates for fragment identifier semantics. + Note terminology issue and resolution. + + +
+ +
+ After ensuring that topics covered in RFC 3987 are + also covered by the W3C URL recommendation, + mark RFC 3987 as obsolete with a short RFC noting + the conditions laid out in this document. + +
+ +
+ + Replaced by . + +
+
+ + Coordinate 'file:' syntax in + and , possibly + moving the file: part of URL-LS into a separate + document. + +
+
+ + + Update , and + to be consistent with and . This may involve working to get the other specifications updated, - if only to clarify nomenclature. + if only to clarify nomenclature. - Change the goals to only obsolete + Obsolete any previous definition of x-url-encoded. + + Change the goals to only obsolete specifications listed above that are not updated. Presuming that is updated, explicitly state that canonical URLs (i.e., the output of the URL parser) not only @@ -251,6 +377,9 @@ Other than responding to any feedback that may be provided, no changes to any Unicode Consortium product are required. + +
+