From 9fdf187036421ff61ba42520d911b3167282102f Mon Sep 17 00:00:00 2001
From: Dominique Hazael-Massieux
Date: Mon, 26 Sep 2016 18:16:34 +0200
Subject: [PATCH 1/2] Mark W3C URL spec as obsolete

---
 url.bs | 2933 +------------------------------------------------------
 1 file changed, 7 insertions(+), 2926 deletions(-)

diff --git a/url.bs b/url.bs
index f6220eed..832d320a 100644
--- a/url.bs
+++ b/url.bs
@@ -1,2938 +1,19 @@
 Title: URL
-Group: webapps
+Group: webplatform
 H1: URL
 Shortname: url
-Status: WD
-TR: http://www.w3.org/TR/url-1/
+Status: NOTE
+TR: https://www.w3.org/TR/url-1/
 ED: https://url.spec.whatwg.org/
-Previous Version: http://www.w3.org/TR/2012/WD-url-20120524/
+Previous Version: http://www.w3.org/TR/2014/WD-url-1-20141209/
 Level: 1
 Editor: Anne van Kesteren, Mozilla, annevk@annevk.nl
 Editor: Sam Ruby, IBM http://www.ibm.com/, rubys@intertwingly.net
 Abstract: The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.
+Status Text: This document is no longer maintained. Please refer to the URL Living Standard for the latest version of this specification.
+Warning: Obsolete
 Logo: https://resources.whatwg.org/logo-url.svg
-!Version History: https://github.com/w3ctag/url/commits/develop
-!Version History: https://github.com/whatwg/url/commits @urlstandard
-!Participate: file a bug (open bugs)
-!Participate: public-webapps@w3.org (archives)
-!Participate: whatwg@whatwg.org (archives)
-!Participate: IRC: #whatwg on Freenode
 Indent: 2
-
- -

Goals

- -

The URL standard takes the following approach towards making URLs fully interoperable: - -

- -

As the editors learn more about the subject matter the goals -might increase in scope somewhat. - - - -

Terminology

- -

Some terms used in this specification are defined in the -DOM, Encoding, IDNA, and Web IDL Standards. -[[!DOM]] -[[!ENCODING]] -[[!IDNA]] -[[!WEBIDL]] - -

The ASCII digits are code points in the range U+0030 to U+0039. - - -

The ASCII hex digits are ASCII digits or are -code points in the range U+0041 to U+0046 or in the range U+0061 to U+0066. - -

The ASCII alpha are code points in the range U+0041 to U+005A -or in the range U+0061 to U+007A. - -

The ASCII alphanumeric are ASCII digits or -ASCII alpha. - - -

Parsers

- -

The EOF code point is a conceptual code point that signifies the end of a -string or code point stream. - -

A parse error indicates a non-fatal mismatch between input and requirements. -User agents are encouraged to expose parse errors -somehow. - -

Within a parser algorithm that uses a pointer variable, c -references the code point the pointer variable points to. - -

Within a string-based parser algorithm that uses a pointer variable, -remaining references the substring after pointer in the string -being processed. - -

If "mailto:username@example" is a string being -processed and pointer points to "@", -c is "@" and remaining is -"example". - - - -
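As an illustration only, the c and remaining conventions can be sketched in Python. The helper name `c_and_remaining` and the use of `None` for the EOF code point are assumptions of this sketch, not part of the specification.

```python
# Stand-in for the conceptual EOF code point (an assumption of this sketch).
EOF = None


def c_and_remaining(string, pointer):
    """Return (c, remaining) for a pointer into string.

    c is the code point the pointer points to, or EOF past the end;
    remaining is the substring after the pointer.
    """
    c = string[pointer] if pointer < len(string) else EOF
    remaining = string[pointer + 1:]
    return c, remaining


s = "mailto:username@example"
print(c_and_remaining(s, s.index("@")))
```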

Percent-encoded bytes

- -

A percent-encoded byte is "%", followed by -two ASCII hex digits. Sequences of -percent-encoded bytes, after -conversion to bytes, should not cause a -utf-8 decoder to run into any -errors. - -

To percent encode a byte into a -percent-encoded byte, return a string consisting of -"%", followed by a double-digit, uppercase, hexadecimal -representation of byte. - -

To percent decode a byte sequence input, run these steps: - -

Using anything but a utf-8 decoder -when the input contains bytes outside the range 0x00 to 0x7F might be -insecure and is not recommended. - -

    -
  1. Let output be an empty byte sequence. - -

  2. -

    For each byte byte in input, run these steps: - -

      -
    1. If byte is not `%`, append - byte to output. - -

    2. Otherwise, if byte is `%` and the next two - bytes after byte in input are not in the ranges - 0x30 to 0x39, 0x41 to 0x46, and 0x61 to 0x66, append byte to - output. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. Let bytePoint be the two bytes after byte in - input, - decoded, and - then interpreted as hexadecimal number. - - -

      2. Append a byte whose value is bytePoint to - output. - -

      3. Skip the next two bytes in input. -

      -
    - -
  3. Return output. -

- - -
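The percent decode steps above can be sketched as follows. This is a minimal, non-normative Python rendering; the function name `percent_decode` and the `HEX_BYTES` constant are hypothetical names chosen for the example.

```python
# Bytes in the ranges 0x30-0x39, 0x41-0x46, and 0x61-0x66.
HEX_BYTES = b"0123456789ABCDEFabcdef"


def percent_decode(data):
    """Percent decode a byte sequence, following the numbered steps."""
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != 0x25:  # not "%": copy through
            output.append(byte)
        elif (len(data) - i < 3
              or data[i + 1] not in HEX_BYTES
              or data[i + 2] not in HEX_BYTES):
            # "%" not followed by two hex digits: copy "%" through
            output.append(byte)
        else:
            # decode the two hex digits into one byte, then skip them
            output.append(int(data[i + 1:i + 3].decode("ascii"), 16))
            i += 2
        i += 1
    return bytes(output)


print(percent_decode(b"a%2Fb"))
```

A malformed sequence such as `%zz` is left untouched rather than rejected, which matches step 2 of the algorithm.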

The simple encode set consists of all code points less than -U+0020 (i.e. excluding U+0020) and all code points greater than U+007E. - -

The default encode set is the -simple encode set and code points U+0020, -'"', -"#", -"<", -">", -"?", -and -"`". - -

The password encode set is the -default encode set and code points -"/", -"@", -and -"\". - -

The username encode set is the -password encode set and code point -":". - -

To utf-8 percent encode a code point, using -an encode set, run these steps: - -

    -
  1. If code point is not in - encode set, return code point. - -

  2. Let bytes be the result of running - utf-8 encode on - code point. - -

  3. Percent encode each byte in bytes, and - then return them concatenated, in the same order. -

- - - -
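The encode sets and the utf-8 percent encode steps can be sketched as below. This is an illustrative Python sketch; the predicate functions and their names are assumptions of the example, and each code point is represented as a one-character string.

```python
def in_simple_encode_set(cp):
    # Code points less than U+0020 or greater than U+007E.
    return ord(cp) < 0x20 or ord(cp) > 0x7E


def in_default_encode_set(cp):
    # Simple encode set plus U+0020, '"', "#", "<", ">", "?", and "`".
    return in_simple_encode_set(cp) or cp in ' "#<>?`'


def in_password_encode_set(cp):
    # Default encode set plus "/", "@", and "\".
    return in_default_encode_set(cp) or cp in "/@\\"


def in_username_encode_set(cp):
    # Password encode set plus ":".
    return in_password_encode_set(cp) or cp == ":"


def percent_encode_byte(byte):
    # "%" followed by a double-digit, uppercase, hexadecimal representation.
    return "%{:02X}".format(byte)


def utf8_percent_encode(cp, in_encode_set):
    """utf-8 percent encode a single code point using an encode set."""
    if not in_encode_set(cp):
        return cp
    return "".join(percent_encode_byte(b) for b in cp.encode("utf-8"))


print(utf8_percent_encode("@", in_password_encode_set))
```

Passing the encode set as a predicate keeps the four sets layered exactly as the definitions layer them.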

Hosts (domains and IP addresses)

- - - -

A host is a network address in the form of a -domain or an -IPv6 address. - -

A domain identifies a realm within a network. - -

An IPv6 address is a 128-bit identifier and -for the purposes of this specification represented as an ordered list of -eight 16-bit pieces. -[[RFC4291]] - - -

IDNA

- -

The domain to ASCII algorithm, given a -domain domain, runs these steps: - -

    -
  1. Let result be the result of running - Unicode ToASCII with - domain_name set to domain, - UseSTD3ASCIIRules set to false, processing_option set to - Transitional_Processing, and VerifyDnsLength set to false. - -

  2. If result is a failure value, return failure. - -

  3. Return result. -

- -

The domain to Unicode algorithm, given a -domain domain, runs these steps: - -

    -
  1. Let result be the result of running - Unicode ToUnicode with - domain_name set to domain, - UseSTD3ASCIIRules set to false. - -

  2. -

    Return result, ignoring any returned errors. - -

    User agents are encouraged to report errors through a developer console. -

- - -

Host writing

- -

A host must be either a -domain or "[", followed -by an IPv6 address, followed by -"]". - -

A domain is a valid domain if these steps return success: - -

    -
  1. Let result be the result of running - Unicode ToASCII with - domain_name set to domain, - UseSTD3ASCIIRules set to true, processing_option set to - Nontransitional_Processing, and VerifyDnsLength set to true. - -

  2. If result is a failure value, return failure. - -

  3. Set result to the result of running - Unicode ToUnicode with - domain_name set to result, - UseSTD3ASCIIRules set to true. - -

  4. If result contains any errors, return failure. - -

  5. Return success. -

- -

Ideally we define this in terms of a sequence of code points that make up a -valid domain rather than through a whack-a-mole: -bug 25334. - -

A domain must be a string that is a -valid domain. - -

An IPv6 address is defined in the -"Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. -[[!RFC4291]] - - - -

Host parsing

- -

The host parser takes a string -input and optionally a Unicode flag, and then runs -these steps: - -

    -
  1. If input is the empty string, return failure. - - -

  2. -

    If input starts with "[", run these - substeps: - -

      -
    1. If input does not end with - "]", parse error, return failure. - -

    2. Return the result of - IPv6 parsing input - with its leading "[" and trailing - "]" removed. -

    - -
  3. Let domain be the result of - utf-8 decode without BOM on the - percent decoding of - utf-8 encode on input. - - -

  4. Let asciiDomain be the result of running - domain to ASCII on domain. - -

  5. If asciiDomain is failure, return failure. - -

  6. -

    If asciiDomain contains one of - U+0000, - U+0009, - U+000A, - U+000D, - U+0020, - "#", - "%", - "/", - ":", - "?", - "@", - "[", - "\", - and - "]", - return failure. - -

  7. Return asciiDomain if the Unicode flag is unset, - and the result of running domain to Unicode - on asciiDomain otherwise. -

- -

The IPv6 parser takes a string -input and then runs these steps: - -

    -
  1. Let address be a new - IPv6 address with its - 16-bit pieces initialized to 0. - -

  2. Let piece pointer be a pointer into - address's - 16-bit pieces, initially zero - (pointing to the first 16-bit piece), - and let piece be the - 16-bit piece it points to. - -

  3. Let compress pointer be another pointer into - address's 16-bit pieces, initially - null and pointing to nothing. - -

  4. Let pointer be a pointer into - input, initially zero (pointing to the first code point). - -

  5. -

    If c is ":", run these substeps: - -

      -
    1. If remaining does not start with - ":", parse error, return failure. - -

    2. Increase pointer by two. - -

    3. Increase piece pointer by one and then set - compress pointer to piece pointer. -

    - -
  6. -

    Main: - While c is not the EOF code point, run these - substeps: - -

      -
    1. If piece pointer is eight, - parse error, return failure. - -

    2. -

      If c is ":", run these inner - substeps: - -

        -
      1. If compress pointer is not null, - parse error, return failure. - -

      2. Increase pointer and piece pointer by one, set - compress pointer to piece pointer, - and then jump to Main. -
      - -
    3. Let value and length be 0. - -

    4. While length is less than 4 and - c is an - ASCII hex digit, set - value to - value × 0x10 + c interpreted as hexadecimal number, - and increase pointer and length by one. - -

    5. -

      Based on c: - -

      -
      "." -
      -

      If length is 0, parse error, - return failure. -

      Decrease pointer by length. -

      Jump to IPv4. - -

      ":" -
      -

      Increase pointer by one. -

      If c is the EOF code point, - parse error, return failure. - -

      Anything but the EOF code point -

      Parse error, return failure. -

      - -
    6. Set piece to value. - -

    7. Increase piece pointer by one. -

    - -
  7. If c is the EOF code point, jump to - Finale. - -

  8. IPv4: - If piece pointer is greater than six, - parse error, return failure. - -

  9. Let dots seen be 0. - -

  10. -

    While c is not the EOF code point, run - these substeps: - -

      -
    1. Let value be null. - -

    2. If c is not an ASCII digit, - parse error, return failure. - -

    3. -

      While c is an - ASCII digit, run these subsubsteps: - -

        -
      1. Let number be c interpreted as decimal number. - -

      2. -

        If value is null, set value to number. - -

        Otherwise, if value is 0, parse error, return failure. - -

        Otherwise, set value to value × 10 + number. - -

      3. Increase pointer by one. - -

      4. If value is greater than 255, parse error, - return failure. -

      - -
    4. If dots seen is less than 3 and - c is not a ".", - parse error, return failure. - -

    5. Set piece to - piece × 0x100 + value. - -

    6. If dots seen is 1 or 3, increase - piece pointer by one. - -

    7. Increase pointer by one. - -

    8. If dots seen is 3 and c is not - the EOF code point, - parse error, return failure. - -

    9. Increase dots seen by one. -

    - -
  11. -

    Finale: - If compress pointer is not null, run these substeps: - -

      -
    1. Let swaps be - piece pointer − compress pointer. - -

    2. Set piece pointer to seven. - -

    3. While piece pointer is not zero and swaps is - greater than zero, swap piece with the - piece at pointer - compress pointer + swaps − 1, and then - decrease both piece pointer and swaps by one. -

    - -
  12. Otherwise, if compress pointer is null and - piece pointer is not eight, parse error, - return failure. - -

  13. Return address. -

- - -
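The IPv6 parser steps above can be followed in a short Python sketch. It is non-normative: the name `parse_ipv6`, the use of `None` for both the EOF code point and failure, and the `ipv4` flag that stands in for the "jump to IPv4" instruction are all assumptions of the example. Parse errors are not reported separately.

```python
HEX = "0123456789abcdefABCDEF"
DIGITS = "0123456789"


def parse_ipv6(input_):
    """Return a list of eight 16-bit pieces, or None on failure."""
    address = [0] * 8
    piece_pointer = 0
    compress_pointer = None
    pointer = 0

    def c():
        # Code point at pointer; None plays the role of EOF.
        return input_[pointer] if pointer < len(input_) else None

    if c() == ":":
        if not input_[pointer + 1:].startswith(":"):
            return None
        pointer += 2
        piece_pointer += 1
        compress_pointer = piece_pointer

    ipv4 = False
    while c() is not None:  # Main
        if piece_pointer == 8:
            return None
        if c() == ":":
            if compress_pointer is not None:
                return None
            pointer += 1
            piece_pointer += 1
            compress_pointer = piece_pointer
            continue
        value = length = 0
        while length < 4 and c() is not None and c() in HEX:
            value = value * 0x10 + int(c(), 16)
            pointer += 1
            length += 1
        if c() == ".":
            if length == 0:
                return None
            pointer -= length
            ipv4 = True  # stands in for "jump to IPv4"
            break
        if c() == ":":
            pointer += 1
            if c() is None:
                return None
        elif c() is not None:
            return None
        address[piece_pointer] = value
        piece_pointer += 1

    if ipv4:
        if piece_pointer > 6:
            return None
        dots_seen = 0
        while c() is not None:
            value = None
            if c() not in DIGITS:
                return None
            while c() is not None and c() in DIGITS:
                number = int(c())
                if value is None:
                    value = number
                elif value == 0:
                    return None
                else:
                    value = value * 10 + number
                pointer += 1
                if value > 255:
                    return None
            if dots_seen < 3 and c() != ".":
                return None
            address[piece_pointer] = address[piece_pointer] * 0x100 + value
            if dots_seen in (1, 3):
                piece_pointer += 1
            pointer += 1
            if dots_seen == 3 and c() is not None:
                return None
            dots_seen += 1

    if compress_pointer is not None:  # Finale
        swaps = piece_pointer - compress_pointer
        piece_pointer = 7
        while piece_pointer != 0 and swaps > 0:
            other = compress_pointer + swaps - 1
            address[piece_pointer], address[other] = (
                address[other], address[piece_pointer])
            piece_pointer -= 1
            swaps -= 1
    elif piece_pointer != 8:
        return None
    return address


print(parse_ipv6("::ffff:1.2.3.4"))
```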

Host serializing

- -

The host serializer takes null or a -host host and then runs -these steps: - -

    -
  1. If host is null, return the empty string. - -

  2. If host is an - IPv6 address, return - "[", followed by the result of running the - IPv6 serializer on host, - followed by "]". - -

  3. Otherwise, host is a domain, - return host. -

- -

The IPv6 serializer takes an -IPv6 address address and -then runs these steps: - -

    -
  1. Let output be the empty string. - -

  2. -

    Let compress pointer be a pointer to the first - 16-bit piece in the first longest - sequence of address's - 16-bit pieces that are 0. - -

    In 0:f:0:0:f:f:0:0 it would point to - the second 0. - -

  3. If there is no sequence of address's - 16-bit pieces that are 0 longer than - one, set compress pointer to null. - -

  4. -

    For each piece in address's - pieces, run these substeps: - -

      -
    1. If compress pointer points to - piece, append "::" to - output if piece is - address's first piece and append - ":" otherwise, and then run these substeps again with all - subsequent pieces in - address's pieces - that are 0 skipped or go to the next step in the overall set of steps if - that leaves no pieces. - -

    2. Append piece, represented as the shortest - possible lowercase hexadecimal number, to output. - -

    3. If piece is not - address's last piece, - append ":" to output. -

    - -
  5. Return output. -

- -

This algorithm requires the recommendation from -A Recommendation for IPv6 Address Text Representation. -[[RFC5952]] - - - - - -
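The IPv6 serializer can be sketched in Python as follows. This is a non-normative illustration that takes the address as a list of eight integers; the function name `serialize_ipv6` and the variables `best_start` and `best_len` are hypothetical.

```python
def serialize_ipv6(address):
    """Serialize a list of eight 16-bit pieces to its text form."""
    # Locate the first longest run of pieces that are 0.
    best_start, best_len = None, 0
    i = 0
    while i < 8:
        if address[i] == 0:
            j = i
            while j < 8 and address[j] == 0:
                j += 1
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1
    # A run of a single 0 piece is not compressed.
    compress = best_start if best_len > 1 else None

    output = ""
    i = 0
    while i < 8:
        if i == compress:
            output += "::" if i == 0 else ":"
            i += best_len  # skip the remaining 0 pieces of the run
            continue
        output += format(address[i], "x")  # shortest lowercase hex
        if i != 7:
            output += ":"
        i += 1
    return output


print(serialize_ipv6([0, 0xF, 0, 0, 0xF, 0xF, 0, 0]))
```

For the address `0:f:0:0:f:f:0:0` used in the note above, only the first of the two equally long runs of zeros is compressed.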

URLs

- - - -

A URL is a universal identifier. - -

A URL consists of components, namely a -scheme, -scheme data, -username, -password, -host, -port, -path, -query, and -fragment. - -

A URL's scheme is -a string that identifies the type of URL and can be used to -dispatch a URL for further processing after -parsing. It is initially the empty string. - -

A URL's -scheme data is a string holding the contents of a -URL. It is initially the empty string. - -

A URL's -scheme data will be its initial value if its -scheme is a relative scheme, and -otherwise will be the only component without an initial value. - -

A URL's username -is a string identifying a user. It is initially the empty string. - -

A URL's password -is either null or a string identifying a user's credentials. It is initially null. - -

A URL's host is -either null or a host. It is initially null. - -

A URL's port is a -string that identifies a networking port. It is initially the empty string. - -

A URL's path is a -list of zero or more strings holding data, usually identifying a location in hierarchical -form. It is initially the empty list. - -

A URL's query is -either null or a string holding data. It is initially null. - -

A URL's fragment -is either null or a string holding data that can be used for further processing on the -resource the URL's other components identify. -It is initially null. - -

A URL also has an associated relative flag. -It is initially unset. - -

The relative flag exists as checking if a -URL's scheme is a -relative scheme can give incorrect results due to the -protocol attribute. - - -

A URL also has an associated -object that is either null or a -Blob. It is initially null. -[[!FILEAPI]] - -

At this point this is used primarily to support "blob" -URLs, but others can be added going forward, hence "object". - - -

A relative scheme is a -scheme listed in the first column of -the following table. A default port is a -relative scheme's optional corresponding -port and is listed in the second column -on the same row. - - -
scheme      port
"ftp"       "21"
"file"
"gopher"    "70"
"http"      "80"
"https"     "443"
"ws"        "80"
"wss"       "443"
- - - -
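The table of relative schemes and default ports maps naturally onto a lookup table. The sketch below is illustrative only; `DEFAULT_PORTS`, `is_relative_scheme`, `default_port`, and `normalize_port` are hypothetical names. `normalize_port` mirrors the first two substeps of the port state in the URL parser, under the assumption that the buffer holds only ASCII digits.

```python
# Relative schemes and their default ports; "file" has no default port.
DEFAULT_PORTS = {
    "ftp": "21",
    "file": None,
    "gopher": "70",
    "http": "80",
    "https": "443",
    "ws": "80",
    "wss": "443",
}


def is_relative_scheme(scheme):
    return scheme in DEFAULT_PORTS


def default_port(scheme):
    # None when the scheme is not relative or has no default port.
    return DEFAULT_PORTS.get(scheme)


def normalize_port(scheme, buffer):
    """Strip leading zeros, then drop the port if it is the default."""
    while len(buffer) > 1 and buffer[0] == "0":
        buffer = buffer[1:]
    if buffer == default_port(scheme):
        buffer = ""
    return buffer


print(normalize_port("http", "080"))
```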

A URL -includes credentials if either its -username is not the empty string or its -password is non-null. - - -

A URL can be designated as -base URL. - -

A base URL is useful for -the URL parser when the input is potentially a -relative URL. - - -

URL writing

- - - -

A URL must be written as either a -relative URL or an -absolute URL, optionally followed by -"#" and a -fragment. - -

An absolute URL must be a -scheme, followed by -":", followed by either a -scheme-relative URL, if -scheme is a relative scheme, or -scheme data otherwise, optionally followed -by "?" and a query. - -

A scheme must be one -ASCII alpha, followed by zero or more of -ASCII alphanumeric, "+", -"-", and ".". A -scheme must be registered -.... - -
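The syntactic part of the scheme requirement can be checked with a regular expression. This is a sketch under the assumption that only the character-level rule is tested; the registration requirement is not checked, and the names `SCHEME_RE` and `is_syntactically_valid_scheme` are hypothetical.

```python
import re

# One ASCII alpha, followed by zero or more of
# ASCII alphanumeric, "+", "-", and ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_syntactically_valid_scheme(scheme):
    return SCHEME_RE.fullmatch(scheme) is not None


print(is_syntactically_valid_scheme("svn+ssh"))
```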

The syntax of scheme data -depends on the scheme and is typically -defined alongside it. Standards must define -scheme data within the constraints of zero or -more URL units, excluding "?". - -

A relative URL must be either a -scheme-relative URL, an -absolute-path-relative URL, -or a path-relative URL that -does not start with a scheme and -":", optionally followed by a "?" and -a query. - -

At the point where a relative URL is -parsed, a -base URL must be in scope. - -

A scheme-relative URL must be -"//", optionally followed by -userinfo and "@", -followed by a host, optionally followed -by ":" and a port, -optionally followed by an -absolute-path-relative URL. - -

Userinfo must be a -username, optionally followed by a -":" and a -password. - -

A username must be zero or more -URL units, excluding "/", -":", "?", and "@". - - 

A password must be zero or more -URL units, excluding "/", -"?", and "@". - -

A port must be zero or more -ASCII digits. - -

An -absolute-path-relative URL -must be "/", followed by a -path-relative URL that does not -start with "/". - -

A path-relative URL must be zero or -more path segments separated from each -other by a "/". - -

A path segment must be zero or more URL units, -excluding "/" and "?". - -

A query must be zero or more -URL units. - -

A fragment must be zero or more -URL units. - -

The URL code points are ASCII alphanumeric, -"!", -"$", -"&", -"'", -"(", -")", -"*", -"+", -",", -"-", -".", -"/", -":", -";", -"=", -"?", -"@", -"_", -"~", -and code points in the ranges -U+00A0 to U+D7FF, -U+E000 to U+FDCF, -U+FDF0 to U+FFFD, -U+10000 to U+1FFFD, -U+20000 to U+2FFFD, -U+30000 to U+3FFFD, -U+40000 to U+4FFFD, -U+50000 to U+5FFFD, -U+60000 to U+6FFFD, -U+70000 to U+7FFFD, -U+80000 to U+8FFFD, -U+90000 to U+9FFFD, -U+A0000 to U+AFFFD, -U+B0000 to U+BFFFD, -U+C0000 to U+CFFFD, -U+D0000 to U+DFFFD, -U+E0000 to U+EFFFD, -U+F0000 to U+FFFFD, -U+100000 to U+10FFFD. - -

Code points higher than U+009F will be converted to -percent-encoded bytes by the -URL parser, except for code points appearing in -fragments. - -

The URL units are URL code points and -percent-encoded bytes. - - -

URL parsing

- -

Add the ability to halt on the first conformance error. - -

The URL parser takes a string -input, optionally with a -base URL base, and -optionally with an encoding -encoding override, and then runs these steps: - -

    -
  1. Let url be the result of running the - basic URL parser on input - with base, and encoding override as provided. - -

  2. If url is failure, return failure. - -

  3. If url's scheme is not - "blob", return url. - -

  4. If url's scheme data - is not in the blob URL store, return - url. [[!FILEAPI]] - -

  5. Set url's object to a - structured clone of the entry in the - blob URL store corresponding to - url's scheme data. - [[!HTML]] - -

  6. Return url. -

- -
- -

The basic URL parser takes a string -input, optionally with a -base URL base, -optionally with an encoding -encoding override, optionally with a -URL url and a state override -state override, and then runs these steps: - -

-

The encoding override argument is a legacy concept only relevant for - HTML. The url and state override arguments are only for - use by methods of objects implementing the URLUtils interface. - [[!HTML]] - -

When the url and state override arguments are not - passed the basic URL parser returns either a - URL or failure. If they are passed the - algorithm simply modifies the passed url and can terminate without - returning anything. -

- -
    -
  1. -

    If url is not given: - -

      -
    1. Set url to a new URL. - -

    2. Remove any leading and trailing - ASCII whitespace from - input. -

    - -
  2. Let state be state override - if given, or scheme start state otherwise. - -

  3. If base is not given, set it to null. - -

  4. If encoding override is not given, set it to - utf-8. - -

  5. Let buffer be the empty string. - -

  6. Let the @ flag and the [] flag be - unset. - -

  7. Let pointer be a pointer to the first code point in - input. - -

  8. -

    Keep running the following state machine by switching on state. If - after a run pointer points to the EOF code point, go to - the next step. Otherwise, increase pointer by one and continue with the - state machine. - -

    -
    scheme start state -
    -
      -
    1. If c is an ASCII alpha, - append c, lowercased, to buffer, and - set state to scheme state. - -

    2. Otherwise, if state override is not given, set - state to no scheme state, and decrease - pointer by one. - -

    3. Otherwise, parse error, terminate this algorithm. -

    - -
    scheme state -
    -
      -
    1. If c is an ASCII alphanumeric, - "+", "-", or - ".", append c, lowercased, to - buffer. - -

    2. -

      Otherwise, if c is ":", set - url's scheme to - buffer, buffer to the empty string, - and then run these substeps: - -

        -
      1. If state override is given, - terminate this algorithm. - -

      2. If url's - scheme is - a relative scheme, set url's - relative flag. - -

      3. If url's - scheme is - "file", set state to - relative state. - -

      4. Otherwise, if url's - relative flag is set, base is not null - and base's - scheme is equal to - url's scheme, - set state to - relative or authority state. - -

      5. Otherwise, if url's - relative flag is set, set state to - authority first slash state. - -

      6. Otherwise, set state to - scheme data state. -

      - -
    3. Otherwise, if state override is not given, set - buffer to the empty string, state to - no scheme state, and start over (from the first code point - in input). - -

    4. Otherwise, if c is the - EOF code point, terminate this algorithm. - - -

    5. Otherwise, parse error, terminate this algorithm. -

    - -
    scheme data state -
    -
      -
    1. If c is "?", set - url's query - to the empty string and state to - query state. - -

    2. Otherwise, if c is "#", set - url's fragment - to the empty string and state to - fragment state. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. If c is not the EOF code point, not a - URL code point, and not - "%", parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. If c is none of - EOF code point, U+0009, U+000A, and U+000D, - utf-8 percent encode c using the - simple encode set, and append the result to - url's - scheme data. -

      -
    - -
    no scheme state -
    -

    If base is null, or base's - scheme is not a - relative scheme, parse error, return failure. - -

    Due to the protocol attribute's - ability to change base's - scheme, base's - relative flag is not used here. - -

    Otherwise, set state to relative state, - and decrease pointer by one. - -

    relative or authority state -
    -

    If c is "/" and - remaining starts with "/", set - state to authority ignore slashes state - and increase pointer by one. - -

    Otherwise, parse error, set state to - relative state and decrease pointer by - one. - -

    relative state -
    -

    Set url's relative flag, set - url's scheme to - base's scheme if - url's scheme is not - "file", and then, based on c: - -

    -
    EOF code point -
    -

    Set url's host - to base's host, - url's port to - base's port, - url's path to - base's path, and - url's query to - base's query. - -

    "/" -
    "\" -
    -
      -
    1. If c is "\", - parse error. -

    2. Set state to - relative slash state. -

    - -
    "?" -

    Set - url's host to - base's host, - url's port to - base's port, - url's path to - base's path, - url's query to the empty string, - and state to query state. - -

    "#" -

    Set - url's host to - base's host, - url's port to - base's port, - url's path to - base's path, - url's query to - base's query, - url's fragment to the empty string, - and state to fragment state. - -

    Otherwise -
    -
      -
    1. -

      If url's scheme is not - "file", or c is not an - ASCII alpha, or remaining does not start with either - ":" or "|", or remaining - consists of one code point, or remaining's second code point is - not one of "/", "\", "?", - and "#", then set - url's host to - base's host, - url's port to - base's port, - url's path to - base's path, and then remove - url's path's last entry. - - -

      This is a (platform-independent) Windows drive letter quirk. - When found at the start of a file URL it is treated as an - absolute path rather than one relative to - base's path. - -

    2. Set state to relative path state, - and decrease pointer by one. -

    -
    - -
    relative slash state -
    -

    If c is either "/" or - "\", run these steps: - -

      -
    1. If c is "\", - parse error. - -

    2. If url's - scheme is - "file", set state to - file host state. - -

    3. Otherwise, set state to - authority ignore slashes state. -

    - -

    Otherwise, run these steps: - -

      -
    1. -

      If url's scheme is not - "file", set - url's host to - base's host and - url's port to - base's port. - -

      file:/path/ will not inherit - base's host. - -

    2. Set state to relative path state, - and decrease pointer by one. -

    - -
    authority first slash state -
    -

    If c is "/", set - state to authority second slash state. - -

    Otherwise, parse error, set state to - authority ignore slashes state, and decrease - pointer by one. - -

    authority second slash state -
    -

    If c is "/", set - state to authority ignore slashes state. - -

    Otherwise, parse error, set state to - authority ignore slashes state, and decrease - pointer by one. - -

    authority ignore slashes state -
    -

    If c is neither "/" nor - "\", set state to - authority state, and decrease pointer by one. - -

    Otherwise, parse error. - -

    authority state -
    -
      -
    1. -

      If c is "@", run these substeps: - -

        -
      1. If the @ flag is set, - parse error, prepend "%40" to - buffer. - -

      2. Set the @ flag. - -

      3. -

        For each code point in buffer, run these substeps: - -

          -
        1. If code point is U+0009, U+000A, or U+000D, - parse error, continue. - -

        2. If code point is not a - URL code point and not - "%", parse error. - -

        3. If code point is "%" and - remaining does not start with two - ASCII hex digits, parse error. - -

        4. If code point is ":" and - url's - password is null, set - url's password - to the empty string and continue. - -

        5. utf-8 percent encode code point using - the default encode set and append the result to - url's password - if url's password - is non-null, and to - url's username - otherwise. -

        -
      4. Set buffer to the empty string. -

      - -
    2. Otherwise, if c is one of EOF code point, - "/", "\", "?", - and "#", decrease pointer by the - number of code points in buffer plus one, set - buffer to the empty string, and - state to host state. - -

    3. Otherwise, append c to buffer. -

    - -
    file host state -
    -
      -
    1. -

      If c is one of EOF code point, - "/", "\", "?", - and "#", decrease pointer by one, - and run these substeps: - -

        -
      1. -

        If buffer consists of two code points, of - which the first is an ASCII alpha and the second is - either ":" or "|", set - state to relative path state. - -

        This is a (platform-independent) Windows drive letter quirk. - buffer is not reset here and instead used in the - relative path state. - -

      2. Otherwise, if buffer is the empty string, set - state to relative path start state. - -

      3. -

        Otherwise, run these steps: - -

          -
        1. Let host be the result of - host parsing - buffer. - -

        2. If host is failure, return failure. - -

        3. Set - url's host to - host, buffer to the empty string, - and state to relative path start state. -

        -
      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. Otherwise, append c to buffer. -

    - -
    host state -
    hostname state -
    -
      -
    1. -

      If c is ":" and the - [] flag is unset, run these substeps: - -

        -
      1. Let host be the result of - host parsing - buffer. - -

      2. If host is failure, return failure. - -

      3. Set url's host to - host, buffer to the empty string, - and state to port state. - -

      4. If state override is hostname state, - terminate this algorithm. -

      - -
    2. -

      Otherwise, if c is the - EOF code point, "/", - "\", "?", or - "#", decrease pointer by one, and - run these substeps: - -

        -
      1. Let host be the result of - host parsing - buffer. - -

      2. If host is failure, return failure. - -

      3. Set url's host to - host, buffer to the empty string, - and state to relative path start state. - -

      4. If state override is given, terminate this - algorithm. -

      - -
    3. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    4. -

      Otherwise, run these substeps: - -

        -
      1. If c is "[", set the - [] flag. - -

      2. If c is "]", unset the - [] flag. - -

      3. Append c to buffer. -

      -
    - -
    port state -
    -
      -
    1. If c is an ASCII digit, - append c to buffer. - -

    2. -

      Otherwise, if c is one of - EOF code point, "/", - "\", "?", and - "#", or state override is given, run - these substeps: -

        -
      1. -

        Remove leading U+0030 code points from buffer - until either the leading code point is not U+0030 or - buffer is one code point. - -

        - -
        Input    Output
        "42"     "42"
        "031"    "31"
        "080"    "80"
        "0000"   "0"
        -
        - -
      2. If buffer is equal to - url's scheme's - default port, set buffer to the empty - string. - -

      3. Set url's - port to buffer. - -

      4. If state override is given, terminate this - algorithm. - -

      5. Set buffer to the empty string, - state to relative path start state, and - decrease pointer by one. -

      - -
    3. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    4. Otherwise, parse error, return failure. -

    - -
    relative path start state -
    -
      -
    1. If c is "\", - parse error. - -

    2. Set state to relative path state - and if c is neither "/" nor - "\", decrease pointer by one. -

    - -
    relative path state -
    -
      -
    1. -

      If either c is one of - EOF code point, "/", and - "\", or state override is not given and - c is one of "?" and - "#", run these substeps: -

        -
      1. If c is "\", parse error. - -

      2. -

        If buffer, lowercased, matches any row in the first column of - the following table, set buffer to the contents of the cell in - the second column of the matched row: - - -
        "%2e" "." -
        ".%2e" ".." -
        "%2e." -
        "%2e%2e" -
        - -

      3. If buffer is "..", remove - url's path's last entry, if - any, and then if c is neither "/" nor - "\", append the empty string to url's - path. - -

      4. Otherwise, if buffer is "." and - c is neither "/" nor "\", - append an empty string to - url's path. - -

      5. -

        Otherwise, if buffer is not - ".", run these subsubsteps: - -

          -
        1. -

          If url's scheme is - "file", url's path - is empty, buffer consists of two - code points, of which the first is an ASCII alpha, - and the second is "|", replace the second code point in - buffer with ":". - -

          This is a (platform-independent) Windows drive letter quirk. - They are beautiful, no? - -

        2. Append buffer to - url's path. -

        - -
      6. Set buffer to the empty string. - -

      7. If c is "?", set - url's query to the empty string, - and state to query state. - -

      8. If c is "#", set - url's fragment to the empty string, - and state to fragment state. -

      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. -

      Otherwise, run these steps: - -

        -
      1. If c is not a - URL code point and not "%", - parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. utf-8 percent encode c using the - default encode set, and append the result to - buffer. -

      -
    - -
    query state -
    -
      -
    1. -

      If c is the EOF code point, or - state override is not given and c - is "#", run these substeps: - -

        -
      1. If url's relative flag is unset or - url's scheme is either - "ws" or "wss", set - encoding override to utf-8. - - -

      2. Set buffer to the result of - encoding - buffer using encoding override. - -

      3. -

        For each byte in buffer run - these subsubsteps: - -

          -
        1. If byte is less than 0x21, greater than - 0x7E, or is one of 0x22, 0x23, 0x3C, 0x3E, and 0x60, append - byte, - percent encoded, to - url's query. - -

        2. Otherwise, append a code point whose value is - byte to url's - query. -

        - -
      4. Set buffer to the empty string. - -

      5. If c is "#", set - url's - fragment to the empty string, - and state to fragment state. -

      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. If c is not a - URL code point and not "%", - parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. Append c to buffer. -

      -
    - -
    fragment state -
    -

    Based on c: -

    -
    EOF code point -

    Do nothing. - -

    U+0000 -
    U+0009 -
    U+000A -
    U+000D -

    Parse error. - -

    Otherwise -
    -
      -
    1. If c is not a URL code point and not - "%", parse error. - -

    2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

    3. -

      Append c to url's fragment. - -

      Unfortunately not using percent-encoding is intentional as - implementations with majority market share exhibit this behavior. - -

    -
    -
    - -
  9. Return url. -

- -
- -
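The dot-segment handling in the relative path state can be sketched roughly as follows. This is a simplified illustration, not the parser itself: `normalizePath` and `DOT_ALIASES` are hypothetical names, the input is assumed to be the path after its leading "/", and percent-encoding, parse errors, and the Windows drive letter quirk are left out.

```javascript
// Hypothetical helper names; mirrors only the "." / ".." substeps of the
// relative path state. Percent-encoding and the drive letter quirk are omitted.
const DOT_ALIASES = new Map([
  ["%2e", "."],
  [".%2e", ".."],
  ["%2e.", ".."],
  ["%2e%2e", ".."],
]);

function normalizePath(input) {
  const path = [];
  // "\" is treated like "/" (the real parser also flags a parse error).
  const segments = input.replace(/\\/g, "/").split("/");
  segments.forEach((raw, index) => {
    // The last segment ends at EOF rather than at a "/" or "\".
    const atEnd = index === segments.length - 1;
    const buffer = DOT_ALIASES.get(raw.toLowerCase()) ?? raw;
    if (buffer === "..") {
      path.pop(); // remove the last entry, if any
      if (atEnd) path.push("");
    } else if (buffer === ".") {
      if (atEnd) path.push("");
    } else {
      path.push(buffer);
    }
  });
  return path;
}
```

A trailing "." or ".." leaves an empty final entry, which is what makes the serialized path end in "/".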

To set the username given a url and -username, run these steps: - -

    -
  1. Set url's username to the - empty string. - -

  2. For each code point in username, - utf-8 percent encode it using the username encode set, and - append the result to url's - username. -

- -

To set the password given a url and -password, run these steps: - -

    -
  1. If password is the empty string, set url's - password to null. - -

  2. -

    Otherwise, run these substeps: - -

      -
    1. Set url's password to - the empty string. - -

    2. For each code point in password, - utf-8 percent encode it using the password encode set, and - append the result to url's - password. -

    -
- - -

URL serializing

- -

The URL serializer takes a -URL url, -optionally an exclude fragment flag, and then runs these steps: - -

    -
  1. Let output be url's - scheme and - ":" concatenated. - -

  2. -

    If url's relative flag is set: - -

      -
    1. Append "//" to output. - -

    2. -

      If url's - username is not the empty string - or url's - password is non-null, run these - substeps: - -

        -
      1. Append url's - username to - output. - -

      2. If url's - password is non-null, append - ":" concatenated with url's - password to - output. - -

      3. Append "@" to output. -

      - -
    3. Append url's - host, - serialized, to - output. - -

    4. If url's port - is not the empty string, append ":" concatenated with - url's port to - output. - -

    5. Append "/" concatenated with the strings in - url's path - (including empty strings), separated from each other by - "/" to output. -

    - -
  3. Otherwise, if url's relative flag is - unset, append url's - scheme data to - output. - -

  4. If url's query is non-null, - append "?" concatenated with url's - query to output. - -

  5. If the exclude fragment flag is unset and - url's fragment is - non-null, append "#" concatenated with - url's fragment to - output. - -

  6. Return output. -

- -
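The serializer steps above can be sketched as a small function. The plain-object record and its field names (`relativeFlag`, `schemeData`, and so on) are assumptions made for illustration, and `host` is assumed to be serialized already.

```javascript
// Sketch of the URL serializer over a hypothetical plain-object URL record.
// Field names are illustrative; `host` is assumed to be already serialized.
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.relativeFlag) {
    output += "//";
    if (url.username !== "" || url.password !== null) {
      output += url.username;
      if (url.password !== null) output += ":" + url.password;
      output += "@";
    }
    output += url.host;
    if (url.port !== "") output += ":" + url.port;
    // Empty path entries are kept, so ["a", "", "b"] yields "/a//b".
    output += "/" + url.path.join("/");
  } else {
    output += url.schemeData;
  }
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}
```

Note that a null query or fragment is omitted entirely, while an empty-string query or fragment still emits the bare "?" or "#".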

Origin

- - -

See origin's definition in HTML for the -necessary background information. [[!HTML]] - -

A URL's origin is -the origin returned by running these steps, switching -on URL's scheme: - -

-
"blob" -
-

Let url be the result of - parsing - URL's - scheme data. - -

If url is failure, return a new globally unique identifier. - Otherwise, return url's origin. - - -

The origin of - blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is - the tuple - (https, whatwg.org, 443). - -

"ftp" -
"gopher" -
"http" -
"https" -
"ws" -
"wss" -

Return a tuple consisting of URL's - scheme, its - host, and its default port if its - port is the empty string, and its - port otherwise. - -

"file" -

Unfortunate as it is, this is left as an exercise to the reader. When in doubt, - return a new globally unique identifier. - -

Otherwise -

Return a new globally unique identifier. -

- - -
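A rough sketch of the origin steps follows. It leans on the platform's `URL` constructor as a stand-in for this document's parser, which is an assumption; `originOf` and `DEFAULT_PORTS` are hypothetical names, a fresh `Symbol` stands in for a "new globally unique identifier", and ports are kept as strings as in this document's model.

```javascript
// Sketch only: uses the platform URL constructor in place of the spec parser.
// A fresh Symbol plays the role of a "new globally unique identifier".
const DEFAULT_PORTS = new Map([
  ["ftp", "21"],
  ["gopher", "70"],
  ["http", "80"],
  ["https", "443"],
  ["ws", "80"],
  ["wss", "443"],
]);

function originOf(input) {
  let url;
  try {
    url = new URL(input);
  } catch (e) {
    return Symbol("unique origin"); // parsing returned failure
  }
  const scheme = url.protocol.slice(0, -1); // drop the trailing ":"
  if (scheme === "blob") {
    // The scheme data of a blob URL is itself parsed as a URL.
    return originOf(url.pathname);
  }
  if (DEFAULT_PORTS.has(scheme)) {
    const port = url.port === "" ? DEFAULT_PORTS.get(scheme) : url.port;
    return [scheme, url.hostname, port];
  }
  // "file" and everything else: when in doubt, a unique identifier.
  return Symbol("unique origin");
}
```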

application/x-www-form-urlencoded

- -

The application/x-www-form-urlencoded format is a simple way to -encode name-value pairs in a byte sequence where all bytes are in the 0x00 to 0x7F range. - -

While this description makes -application/x-www-form-urlencoded sound dated (and really, it is), the -format is in widespread use due to the prevalence of HTML forms. -[[!HTML]] - -

application/x-www-form-urlencoded parsing

- -

The features provided by the -application/x-www-form-urlencoded parser -are mainly relevant for server-oriented implementations. A browser-based implementation -only needs what the -application/x-www-form-urlencoded string parser -requires. - -

The -application/x-www-form-urlencoded parser -takes a byte sequence input, optionally with an -encoding encoding override, -optionally with a use _charset_ flag, and optionally with an -isindex flag, and then runs these steps: - -

    -
  1. If encoding override is not given, set it to - utf-8. - -

  2. -

    If encoding override is not - utf-8 and input contains bytes - whose value is greater than 0x7F, return failure. - -

    This can only happen if input was not - generated through the serializer or - URLSearchParams. - -

  3. Let sequences be the result of splitting - input on `&`. - - -

  4. If the isindex flag is set and the first byte sequence in - sequences does not contain a `=`, prepend - `=` to the first byte sequence in sequences. - -

  5. Let pairs be an empty list of name-value pairs where both name - and value hold a byte sequence. - -

  6. -

    For each byte sequence bytes in sequences, - run these substeps: - -

      -
    1. If bytes is the empty byte sequence, run these substeps for the - next byte sequence. - -

    2. If bytes contains a `=`, then let - name be the bytes from the start of bytes up to but - excluding its first `=`, and let value be the - bytes, if any, after the first `=` up to the end of - bytes. If `=` is the first byte, then - name will be the empty byte sequence. If it is the last, then - value will be the empty byte sequence. - -

    3. Otherwise, let name have the value of bytes - and let value be the empty byte sequence. - -

    4. Replace any `+` in name and - value with 0x20. - -

    5. -

      If the use _charset_ flag is set and name is - `_charset_`, run these substeps: - -

        -
      1. Let result be the result of - getting an encoding - for value, - decoded. - -

      2. If result is not failure, unset use _charset_ flag and - set encoding override to result. -

      - -
    6. Add a pair consisting of name and - value to pairs. -

    - -
  7. Let output be an empty list of name-value pairs where both name - and value hold a string. - -

  8. For each name-value pair in pairs, append a name-value pair to - output where the new name and value appended to output - are the result of running encoding override's - decoder on the - percent decoding of the name and value from - pairs, respectively. - -

  9. Return output. -

- -
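The parsing steps above can be sketched as follows, under simplifying assumptions: the encoding override is always utf-8, and the isindex and use _charset_ flags are not supported. `parseUrlencoded`, `percentDecode`, and `isHexDigit` are hypothetical helper names.

```javascript
// Sketch of application/x-www-form-urlencoded parsing, UTF-8 only.
// The isindex and use _charset_ flags are deliberately not implemented.
const isHexDigit = (b) =>
  (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);

function percentDecode(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length &&
        isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

function parseUrlencoded(input) {
  const bytes = new TextEncoder().encode(input);
  const decoder = new TextDecoder("utf-8");
  // Split the byte sequence on "&" (0x26).
  const sequences = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === 0x26) {
      sequences.push(bytes.subarray(start, i));
      start = i + 1;
    }
  }
  const output = [];
  for (const seq of sequences) {
    if (seq.length === 0) continue; // skip empty sequences
    const eq = seq.indexOf(0x3d); // first "="
    const name = eq === -1 ? seq : seq.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : seq.subarray(eq + 1);
    // "+" becomes a space before percent decoding, so "%2B" stays "+".
    const plusToSpace = (part) => part.map((b) => (b === 0x2b ? 0x20 : b));
    output.push([
      decoder.decode(percentDecode(plusToSpace(name))),
      decoder.decode(percentDecode(plusToSpace(value))),
    ]);
  }
  return output;
}
```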

application/x-www-form-urlencoded serializing

- -

The -application/x-www-form-urlencoded byte serializer -takes a byte sequence input and then runs these steps: - -

    -
  1. Let output be the empty string. -

  2. -

    For each byte in input, depending on - byte: - -

    -
    0x20 -

    Append U+002B to output. - -

    0x2A -
    0x2D -
    0x2E -
    0x30 to 0x39 -
    0x41 to 0x5A -
    0x5F -
    0x61 to 0x7A -

    Append a code point whose value is byte to - output. - -

    Otherwise -

    Append byte, - percent encoded, to - output. -

    -
  3. Return output. -

- - -

The -application/x-www-form-urlencoded serializer -takes a list of name-value pairs pairs, optionally with an -encoding -encoding override, and then runs these steps: - -

    -
  1. If encoding override is not given, set it to - utf-8. - -

  2. Let output be the empty string. - -

  3. -

    For each pair in pairs, run - these substeps: - -

      -
    1. Let outputPair be a copy of pair. - -

    2. Replace outputPair's name and value with the result of running - encode on them using - encoding override, respectively. - -

    3. Replace outputPair's name and value with their - serialization. - -

    4. If pair is not the first pair in pairs, append - "&" to output. - -

    5. Append outputPair's name, followed by "=", - followed by outputPair's value to output. -

    - -
  4. Return output. -
- -
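The two serializers can be sketched together. This assumes the encoding override is utf-8 and that percent-encoded bytes use uppercase hex digits; `serializeBytes` and `serializeUrlencoded` are hypothetical names.

```javascript
// Sketch of the byte serializer and the pair serializer, UTF-8 only.
function serializeBytes(bytes) {
  let output = "";
  for (const b of bytes) {
    if (b === 0x20) {
      output += "+";
    } else if (
      b === 0x2a || b === 0x2d || b === 0x2e || b === 0x5f ||
      (b >= 0x30 && b <= 0x39) ||
      (b >= 0x41 && b <= 0x5a) ||
      (b >= 0x61 && b <= 0x7a)
    ) {
      output += String.fromCharCode(b); // passed through unchanged
    } else {
      output += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return output;
}

function serializeUrlencoded(pairs) {
  const encoder = new TextEncoder();
  return pairs
    .map(([name, value]) =>
      serializeBytes(encoder.encode(name)) + "=" + serializeBytes(encoder.encode(value)))
    .join("&");
}
```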

Hooks

- -

The -application/x-www-form-urlencoded string parser -takes a string input, -utf-8 encodes it, and then -returns the result of -application/x-www-form-urlencoded parsing -it. - - - -

API

- - - -
[Constructor(USVString url, optional USVString base = "about:blank"),
- Exposed=(Window,Worker)]
-interface URL {
-  static USVString domainToASCII(USVString domain);
-  static USVString domainToUnicode(USVString domain);
-};
-URL implements URLUtils;
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtils {
-  stringifier attribute USVString href;
-  readonly attribute USVString origin;
-
-           attribute USVString protocol;
-           attribute USVString username;
-           attribute USVString password;
-           attribute USVString host;
-           attribute USVString hostname;
-           attribute USVString port;
-           attribute USVString pathname;
-           attribute USVString search;
-           attribute URLSearchParams searchParams;
-           attribute USVString hash;
-};
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtilsReadOnly {
-  stringifier readonly attribute USVString href;
-  readonly attribute USVString origin;
-
-  readonly attribute USVString protocol;
-  readonly attribute USVString host;
-  readonly attribute USVString hostname;
-  readonly attribute USVString port;
-  readonly attribute USVString pathname;
-  readonly attribute USVString search;
-  readonly attribute USVString hash;
-};
- -

Except where different, objects implementing URLUtilsReadOnly are identical -to objects implementing URLUtils. - -

Since all members are readonly and certain members from -URLUtils are not exposed, a number of potential optimizations are possible -compared to objects implementing URLUtils. These are left as an exercise to -the reader. - - - -

Specifications defining objects implementing URLUtils or -URLUtilsReadOnly must define a -get the base algorithm, which must return the -appropriate base URL for the object. - -

Specifications defining objects implementing URLUtils may -define update steps to make it possible for an -underlying string (such as an -attribute value) -to be updated. The update steps are passed a string -value for this purpose. - -

An object implementing URLUtils or URLUtilsReadOnly has an -associated input (a string), -query encoding -(an encoding), -query object -(a URLSearchParams object or null), and a -url (a URL or null). - -Unless stated otherwise, query encoding is -utf-8 and -query object is null. The others follow -from the set the input algorithm. - -

The associated -query encoding is a legacy -concept only relevant for HTML. -[[!HTML]] - -

Specifications defining objects implementing URLUtils or -URLUtilsReadOnly must use the -set the input algorithms to set -input, url, and -query object. To -set the input given input -and optionally a url, run these steps: - -

    -
  1. If url is given, set url - to url and input to - input. - -

  2. -

    Otherwise, run these substeps: - -

      -
    1. Set url to null. - -

    2. If input is null, set - input to the empty string. - -

    3. -

      Otherwise, run these subsubsteps: - -

        -
      1. Set input to input. - -

      2. Let url be the result of running the - URL parser on - input with - base URL being the result of running - get the base and - query encoding as - encoding override. - - -

      3. If url is not failure, set - url to url. -

      -
    - -
  3. Let query be url's - query if url - is non-null, and the empty string otherwise. - -

  4. If query object is null, set - query object to a - new URLSearchParams object - using query, and then append the - context object to - query object's list of - url objects. - -

  5. Otherwise, set query object's - list to the result of - parsing query. -

- -

To run the pre-update steps for an object implementing -URLUtils, optionally given a value, run these steps: - -

    -
  1. If value is not given, let value be the result - of serializing the associated - url. - - -

  2. Run the update steps with - value. -

- - -

Constructors

- -

The -URL(url, base) -constructor, when invoked, must run these steps: - -

    -
  1. Let parsedBase be the result of running the - basic URL parser on base. - -

  2. If parsedBase is failure, - throw a TypeError exception. - -

  3. Set parsedURL to the result of running the - basic URL parser on url - with parsedBase. - -

  4. If parsedURL is failure, - throw a TypeError exception. - -

  5. Let result be a new URL object. - -

  6. Let result's - get the base return - parsedBase. - -

  7. -

    Run result's - set the input given the empty string - and parsedURL. - -

    A URL object's - input is never exposed. - -

  8. Return result. -

- -
-

To Basic URL parse a string into a - URL without using a - base URL, invoke the constructor with a single - argument: - -

var input = "https://example.org/💩",
-    url = new URL(input)
-url.pathname // "/%F0%9F%92%A9"
- -

Alternatively you can use the base URL of a - document through - baseURI: - -

var input = "/💩",
-    url = new URL(input, document.baseURI)
-url.href // "https://url.spec.whatwg.org/%F0%9F%92%A9"
-
- - -
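Because the constructor throws a TypeError whenever parsing fails, callers often wrap it. A minimal sketch, where `tryParseURL` is a hypothetical helper and not part of the API:

```javascript
// Hypothetical helper: returns null instead of throwing when parsing fails.
function tryParseURL(input, base) {
  try {
    return base === undefined ? new URL(input) : new URL(input, base);
  } catch (e) {
    if (e instanceof TypeError) return null; // parser returned failure
    throw e;
  }
}
```

A path-only input such as "/x" cannot be parsed without a usable base URL, so it yields null here unless a base is passed.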

URL statics

- -

The -domainToASCII(domain) -static method, when invoked, must run these steps: - -

    -
  1. Let asciiDomain be the result of - host parsing domain. - -

  2. If asciiDomain is an IPv6 address - or failure, return the empty string. - -

  3. Return asciiDomain. -

- -

The -domainToUnicode(domain) -static method, when invoked, must run these steps: - -

    -
  1. Let unicodeDomain be the result of - host parsing domain with the - Unicode flag set. - -

  2. If unicodeDomain is an - IPv6 address or failure, return the empty string. - -

  3. Return unicodeDomain. -

- -

Add domainToUI() which follows the UA conventions for when to use the Unicode -representation? - - -

URLUtils and URLUtilsReadOnly members

- -

The URLUtils and URLUtilsReadOnly interfaces are -not exposed on the global object. They are meant to augment other interfaces, such as -URL. - -

The href attribute's getter must run -these steps: - -

    -
  1. If url is null, return - input. - -

  2. Return the serialization - of url. -

- -

The href attribute's setter must run these steps: - -

    -
  1. Let input be the given value. - -

  2. -

    If the context object is a - URL object, run these substeps: - -

      -
    1. Let parsedURL be the result of running the - basic URL parser on input - with base URL being the result of running - get the base. - -

    2. If parsedURL is failure, - throw a TypeError exception. - -

    3. -

      Run set the input given the empty - string and parsedURL. - -

      A URL object's - input is never exposed. -

    - -
  3. -

    Otherwise, run these substeps: - -

      -
    1. Run the set the input - algorithm for input. - -

    2. Run the pre-update steps with the input. -

    - -
    -

    This means that if the href attribute is set to a - value that would cause the URL parser to return - failure, that value is still passed through unchanged. This is one of those unfortunate - legacy incidents. - -

    var a = document.createElement("a"),
    -    input = "https://test:test/" // invalid port makes the parser return failure
    -a.href = input
    -a.href === input // true
    -
    -
- -

The origin attribute's getter must -run these steps: - -

    -
  1. If url is null, return the empty - string. - -

  2. Return the - Unicode serialization - of url's origin. - [[!HTML]] -

- -

It returns the Unicode rather than the ASCII serialization for -compatibility with HTML's MessageEvent feature. -[[!HTML]] - -

The protocol attribute's getter -must run these steps: - -

    -
  1. If url is null, return - ":". - -

  2. Return scheme and - ":" concatenated. -

- -

The protocol attribute's setter must -run these steps: - -

    -
  1. If url is null, terminate - these steps. - -

  2. Basic URL parse the given value and - ":" concatenated with - url as url and - scheme start state as state override. - -

  3. Run the pre-update steps. -

- -

The username attribute's getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return username. -

- -

The username attribute's setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Set the username given url - and the given value. - -

  3. Run the pre-update steps. -

- -

The password attribute's getter -must run these steps: - -

    -
  1. If url is null or its - password is null, return the empty - string. - -

  2. Return password. -

- -

The password attribute's setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Set the password given url - and the given value. - -

  3. Run the pre-update steps. -

- -

The host attribute's getter must run -these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. If port is the empty string, - return host, - serialized. - -

  3. Return host, - serialized, - ":", and port - concatenated. -

- -

The host attribute's setter must run these -steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Basic URL parse the given value with - url as url and - host state as state override. - -

  3. Run the pre-update steps. -

- -

The hostname attribute's getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return host, - serialized. -

- -

The hostname attribute's setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Basic URL parse the given value with - url as url and - hostname state as state override. - -

  3. Run the pre-update steps. -

- -

The port attribute's getter must run -these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return port. -

- -

The port attribute's setter must run these steps: - -

    -
  1. If url is null, its - relative flag is unset, or its - scheme is "file", - terminate these steps. - -

  2. Otherwise, Basic URL parse - the given value with url as url and - port state as state override. - -

  3. Run the pre-update steps. -

- -

The pathname attribute's getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. If the relative flag is unset, return - scheme data. - -

  3. Return "/" concatenated with the strings in - path (including empty strings), - separated from each other by "/". -

- -

The pathname attribute's setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Empty path. - -

  3. Basic URL parse the given value with - url as url and - relative path start state as state override. - -

  4. Run the pre-update steps. -

- -

The search attribute's getter must -run these steps: - -

    -
  1. If url is null, or its - query is either null or - the empty string, return the empty string. - -

  2. Return "?" concatenated with - query. -

- -

The search attribute's setter must run these steps: - -

    -
  1. If url is null, terminate these steps. - -

  2. If the given value is the empty string, set - query to null, empty - query object's - list, run its - update steps, and terminate these - steps. - -

  3. Let input be the given value with a single leading - "?" removed, if any. - -

  4. Set query to the empty string. - -

  5. Basic URL parse input - with url as url, - query state as state override, and the - associated query encoding as - encoding override. - -

  6. Set query object's - list to the result of - parsing input. - -

  7. Run query object's - update steps. -

- -

The update steps of -query object are run to ensure all -url objects remain synchronized. - -

The searchParams attribute's getter -must return the query object. - -

The searchParams attribute's setter must run -these steps: - -

    -
  1. Let object be the given value. - -

  2. Remove the context object from - query object's list of - url objects. - -

  3. Append the context object to - object's list of - url objects. - -

  4. Set query object to - object. - -

  5. Set query to the - serialization of the - query object's - list. - -

  6. Run the pre-update steps. -

- -

The hash attribute's getter must run -these steps: - -

    -
  1. If url is null, or its - fragment is either null or - the empty string, return the empty string. - -

  2. Return "#" concatenated with - fragment. -

- -

The hash attribute's setter must run these steps: - -

    -
  1. If url is null, or its - scheme is - "javascript", terminate these steps. - -

  2. If the given value is the empty string, set - fragment to null, run the - pre-update steps, and terminate these steps. - -

  3. Let input be the given value with a single leading - "#" removed, if any. - -

  4. Set fragment to - the empty string. - -

  5. Basic URL parse input - with url as url and - fragment state as state override. - -

  6. Run the pre-update steps. -

- - -
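As a rough illustration of the getters and setters above, using the platform's `URL` object (which is assumed here to behave as described for these members):

```javascript
// Reading the decomposed members of a URL object.
const url = new URL("https://user:pass@example.org:8080/a/b?x=1#frag");
const parts = {
  protocol: url.protocol, // scheme plus ":"
  username: url.username,
  password: url.password,
  host: url.host,         // hostname plus ":" and port, since a port is present
  hostname: url.hostname,
  port: url.port,
  pathname: url.pathname,
  search: url.search,     // "?" plus query
  hash: url.hash,         // "#" plus fragment
};

// Setting search or hash to the empty string removes the component entirely.
url.search = "";
url.hash = "";
const stripped = url.href;

// Setting the port to the scheme's default port leaves the port empty.
url.port = "443";
const hostAfterDefaultPort = url.host;
```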

Interface URLSearchParams

- -
[Constructor(optional (USVString or URLSearchParams) init = ""),
- Exposed=(Window,Worker)]
-interface URLSearchParams {
-  void append(USVString name, USVString value);
-  void delete(USVString name);
-  USVString? get(USVString name);
-  sequence<USVString> getAll(USVString name);
-  boolean has(USVString name);
-  void set(USVString name, USVString value);
-  iterable<USVString, USVString>;
-  stringifier;
-};
- -

A URLSearchParams object has an associated -list of name-value pairs, which is initially -empty. - -

A URLSearchParams object has an associated list of zero or more -url objects, which is initially empty. - -

URLSearchParams objects always use -utf-8 as -encoding, despite the existence of -concepts such as -query encoding. This is to -encourage developers to migrate towards -utf-8, which they really ought to -have done a long time ago now. - -

To create a -new URLSearchParams object, optionally -using init, run these steps: - -

    -
  1. Let query be a new URLSearchParams object. - -

  2. If init is the empty string or null, return - query. - - -

  3. If init is a string, - set query's list to the - result of parsing - init. - -

  4. If init is a URLSearchParams object, set - query's list to a copy - of init's list. - -

  5. Return query. -

- -

A URLSearchParams object's -update steps are to run these steps for -each associated url object -urlObject, in order: - -

    -
  1. Set urlObject's url's - query to the - serialization of - URLSearchParams object's - list. - -

  2. Run urlObject's pre-update steps. -

- -

The -URLSearchParams(init) -constructor, when invoked, must return a -new URLSearchParams object -using init if given. - -

The -append(name, value) -method, when invoked, must run these steps: - -

    -
  1. Append a new name-value pair whose name is name and - value is value, to list. - -

  2. Run the update steps. -

- -

The -delete(name) -method, when invoked, must run these steps: - -

    -
  1. Remove all name-value pairs whose name is name from - list. - -

  2. Run the update steps. -

- -

The -get(name) -method, when invoked, must return the value of the first name-value pair whose name is -name in list, and null if -there is no such pair. - -

The -getAll(name) -method, when invoked, must return the values of all name-value pairs whose name is -name, in list, -in list order, and the empty sequence otherwise. - -

The -set(name, value) -method, when invoked, must run these steps: - -

    -
  1. If there are any name-value pairs whose name is name, in - list, set the value of the first such - name-value pair to value and remove the others. - -

  2. Otherwise, append a new name-value pair whose name is name and - value is value, to list. - -

  3. Run the update steps. -

- -

The -has(name) -method, when invoked, must return true if there is a name-value pair whose name is -name in list, and false -otherwise. - -

The value pairs to iterate over are the -list's name-value pairs, with the key being -the name and the value being the value. - -

The stringification behavior must return the -serialization of the -URLSearchParams object's -list. - - -
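A short usage sketch of the methods above, using the platform's `URLSearchParams`. The variable names are illustrative only.

```javascript
// Basic list manipulation on a standalone URLSearchParams object.
const params = new URLSearchParams("a=1&b=2&a=3");
const first = params.get("a");       // value of the first pair named "a"
const all = params.getAll("a");      // every value named "a", in list order
const missing = params.get("zzz");   // null when there is no such pair
params.set("a", "9");                // first "a" is updated, the others removed
params.append("c", "x y");
params.delete("b");
const serialized = params.toString();

// A query object attached to a URL keeps the URL's query in sync.
const url = new URL("https://example.org/?q=1");
url.searchParams.append("lang", "en");
const href = url.href;
```

The space in "x y" is serialized as "+", following the byte serializer.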

URL APIs elsewhere

- -

A standard that exposes URLs should expose the -URL as a string (by serializing an internal -URL). A standard should not expose a URL using a -URL object. URL objects are -meant for URL manipulation. In IDL the USVString type should be used. - -

The higher-level notion here is that values are to be exposed as immutable -data structures. - -

If a standard decides to use a variant of the name "URL" for a feature it defines, it -should name such a feature "url" (i.e. lowercase and with an "l" at the end). Names such -as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" -(i.e. uppercase) is preferred, e.g. "newURL" and "oldURL". - -

The {{EventSource}} and -{{HashChangeEvent}} interfaces in HTML are examples of -proper naming. [[!HTML]] - - - -

Acknowledgments

- -

There have been a lot of people that have helped make -URLs more interoperable over the years and -thereby furthered the goals of this standard. Likewise, many people have helped make this -standard what it is today. - -

With that, many thanks to -Adam Barth, -Albert Wiersch, -Alexandre Morgaut, -Arkadiusz Michalski, -Behnam Esfahbod, -Bobby Holley, -Boris Zbarsky, -Brandon Ross, -Dan Appelquist, -Daniel Bratell, -David Håsäther, -David Sheets, -David Singer, -Erik Arvidsson, -Gavin Carothers, -Geoff Richards, -Glenn Maynard, -Henri Sivonen, -Ian Hickson, -James Graham, -James Manger, -James Ross, -Joshua Bell, -Kevin Grandon, -Larry Masinter, -Mark Davis, -Marcos Cáceres, -Martin Dürst, -Mathias Bynens, -Michael Peick, -Michael™ Smith, -Michel Suignard, -Peter Occil, -Rodney Rehm, -Roy Fielding, -Santiago M. Mola, -Simon Pieters, -Simon Sapin, -Tab Atkins, -Tantek Çelik, -Tim Berners-Lee, -Vyacheslav Matva, and -成瀬ゆい (Yui Naruse) -for being awesome! - -

This standard is written by -Anne van Kesteren -(Mozilla, -annevk@annevk.nl) -and -Sam Ruby -(IBM, -rubys@intertwingly.net). - -

The upstream draft at https://url.spec.whatwg.org/ is licensed under CC0. - -

-{
-    "IDNA": {
-        "authors": [
-            "Mark Davis",
-            "Michel Suignard"
-        ],
-        "href": "http://www.unicode.org/reports/tr46/",
-        "title": "Unicode IDNA Compatibility Processing",
-        "publisher": "Unicode Consortium"
-    },
-    "DOM": {
-        "authors": [
-            "Anne van Kesteren",
-            "Aryeh Gregor",
-            "Ms2ger"
-        ],
-        "href": "https://dom.spec.whatwg.org/",
-        "title": "DOM",
-        "publisher": "WHATWG"
-    },
-    "ENCODING": {
-        "authors": [
-            "Anne van Kesteren"
-        ],
-        "href": "https://encoding.spec.whatwg.org/",
-        "title": "Encoding",
-        "publisher": "WHATWG"
-    },
-    "FILEAPI": {
-        "authors": [
-            "Arun Ranganathan",
-            "Jonas Sicking"
-        ],
-        "href": "http://dev.w3.org/2006/webapi/FileAPI/",
-        "title": "File API",
-        "publisher": "W3C"
-    },
-    "WEBIDL": {
-        "authors": [
-            "Cameron McCormack",
-            "Jonas Sicking"
-        ],
-        "href": "http://heycam.github.io/webidl/",
-        "title": "Web IDL",
-        "publisher": "W3C"
-    }
-}
-
- -
-urlPrefix: http://dev.w3.org/2006/webapi/FileAPI/; type: dfn
-  text: blob
-  text: blob url store; url: #BlobURLStore
-urlPrefix: https://dom.spec.whatwg.org/; type: dfn
-  text: concept-attribute-value
-  text: concept-document
-  text: context object
-  text: dom-node-baseuri
-urlPrefix: https://encoding.spec.whatwg.org/; type: dfn
-  text: ascii whitespace
-  text: concept-encoding-get
-  text: decoder
-  text: encode
-  text: encoding
-  text: error
-  text: utf-8
-  text: utf-8 decode without bom
-  text: utf-8 encode
-  text: utf-8 decoder
-urlPrefix: https://html.spec.whatwg.org/multipage/
-  urlPrefix: comms.html; type: interface;
-    text: EventSource
-    text: HashChangeEvent
-  urlPrefix: browsers.html; type: dfn;
-    text: origin
-    text: unicode serialization of an origin
-  urlPrefix: infrastructure.html; type: dfn;
-    text: structured clone
-urlPrefix: http://heycam.github.io/webidl/#dfn-; type: dfn;
-  text: throw
-  text: value pairs to iterate over
-url: http://www.unicode.org/reports/tr46/#ToASCII; type: dfn; text: toascii
-url: http://www.unicode.org/reports/tr46/#ToUnicode; type: dfn; text: tounicode
+Boilerplate: omit references
 
From 150a84702097833aae68ac11a681a4f62c27e4d4 Mon Sep 17 00:00:00 2001 From: Dominique Hazael-Massieux Date: Mon, 26 Sep 2016 18:16:49 +0200 Subject: [PATCH 2/2] Update HTML from BS --- url.html | 4386 ++++++++++++++++++------------------------------------ 1 file changed, 1424 insertions(+), 2962 deletions(-) diff --git a/url.html b/url.html index aa95bed8..dcedc12d 100644 --- a/url.html +++ b/url.html @@ -1,2965 +1,1427 @@ - - + + + + URL - - - - -
-

W3C -

-

URL

-

W3C Working Draft, - 16 December 2014

-
This version:
http://www.w3.org/TR/2014/WD-url-1-20141216/
Latest version:
http://www.w3.org/TR/url-1/
Editor's Draft:
https://url.spec.whatwg.org/
Previous Versions:
http://www.w3.org/TR/2012/WD-url-20120524/
Editors:
Version History:
https://github.com/w3ctag/url/commits/develop
https://github.com/whatwg/url/commits @urlstandard
Participate:
file a bug (open bugs)
public-webapps@w3.org (archives)
whatwg@whatwg.org (archives)
IRC: #whatwg on Freenode
-
- -
-
- -

Abstract

-

The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.

- -
- -

Status of this document

-

This section describes the status of this document at the time of its -publication. Other documents may supersede this document. A list of current -W3C publications and the latest revision of this technical report can be -found in the W3C technical reports index -at http://www.w3.org/TR/.

- -

This is a Working Draft of the URL -specification. - Please send comments to -public-webapps@w3.org -(archived) -with [url] at the start of the subject -line.

- -

This document is produced by the -Web Applications (WebApps) -Working Group. The WebApps Working Group is part of the -Rich Web Clients Activity -in the W3C Interaction -Domain.

- -

This document is governed by the 14 October 2005 W3C Process -Document.

- -

This document was produced by a group operating under the -5 February 2004 -W3C Patent Policy. -W3C maintains a -public list of any patent disclosures -made in connection with the deliverables of the group; that page also -includes instructions for disclosing a patent. An individual who has actual -knowledge of a patent which the individual believes contains -Essential -Claim(s) -must disclose the information in accordance with -section -6 of the W3C Patent Policy. -

- -

Publication as a Working Draft does not imply endorsement by the W3C -Membership. This is a draft document and may be updated, replaced or obsoleted -by other documents at any time. It is inappropriate to cite this document as -other than work in progress.

-
-
- -

Table of Contents

-
-
- - -

Goals

- -

The URL standard takes the following approach towards making URLs fully interoperable: - -

    -
  • Align RFC 3986 and RFC 3987 with contemporary implementations and - obsolete them in the process. (E.g. spaces, other "illegal" code points, - query encoding, equality, canonicalization, are all concepts not entirely - shared, or defined.) URL parsing needs to become as solid as HTML parsing. - [RFC3986] - [RFC3987] - -

  • Standardize on the term URL. URI and IRI are just confusing. In - practice a single algorithm is used for both so keeping them distinct is - not helping anyone. URL also easily wins the - search result popularity contest. - -

  • Supplanting Origin of a URI [sic]. - [RFC6454] - -

  • Define URL’s existing JavaScript API in full detail and add - enhancements to make it easier to work with. Add a new URL - object as well for URL manipulation without usage of HTML elements. (Useful - for JavaScript worker environments.) -

- -

As the editors learn more about the subject matter the goals -might increase in scope somewhat. - - - -

1. Terminology

- -

Some terms used in this specification are defined in the -DOM, Encoding, IDNA, and Web IDL Standards. -[DOM] -[ENCODING] -[IDNA] -[WEBIDL] - -

The ASCII digits are code points in the range U+0030 to U+0039. - - -

The ASCII hex digits are ASCII digits or are -code points in the range U+0041 to U+0046 or in the range U+0061 to U+0066. - -

The ASCII alpha are code points in the range U+0041 to U+005A -or in the range U+0061 to U+007A. - -

The ASCII alphanumeric are ASCII digits or -ASCII alpha. - - -

1.1. Parsers

- -

The EOF code point is a conceptual code point that signifies the end of a -string or code point stream. - -

A parse error indicates a non-fatal mismatch between input and requirements. -User agents are encouraged to expose parse errors -somehow. - -

Within a parser algorithm that uses a pointer variable, c -references the code point the pointer variable points to. - -

Within a string-based parser algorithm that uses a pointer variable, -remaining references the substring after pointer in the string -being processed. - -

If "mailto:username@example" is a string being -processed and pointer points to "@", -c is "@" and remaining is -"example". - - - -

2. Percent-encoded bytes

- -

A percent-encoded byte is "%", followed by -two ASCII hex digits. Sequences of -percent-encoded bytes, after -conversion to bytes, should not cause a -utf-8 decoder to run into any -errors. - -

To percent encode a byte into a -percent-encoded byte, return a string consisting of -"%", followed by a double-digit, uppercase, hexadecimal -representation of byte. - -

To percent decode a byte sequence input, run these steps: - -

Using anything but a utf-8 decoder -when the input contains bytes outside the range 0x00 to 0x7F might be -insecure and is not recommended. - -

    -
  1. Let output be an empty byte sequence. - -

  2. -

    For each byte byte in input, run these steps: - -

      -
    1. If byte is not `%`, append - byte to output. - -

    2. Otherwise, if byte is `%` and the next two - bytes after byte in input are not in the ranges - 0x30 to 0x39, 0x41 to 0x46, and 0x61 to 0x66, append byte to - output. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. Let bytePoint be the two bytes after byte in - input, - decoded, and - then interpreted as hexadecimal number. - - -

      2. Append a byte whose value is bytePoint to - output. - -

      3. Skip the next two bytes in input. -

      -
    - -
  3. Return output. -

- -

-

The simple encode set are all code points less than -U+0020 (i.e. excluding U+0020) and all code points greater than U+007E. - -

The default encode set is the -simple encode set and code points U+0020, -'"', -"#", -"<", -">", -"?", -and -"`". - -

The password encode set is the -default encode set and code points -"/", -"@", -and -"\". - -

The username encode set is the -password encode set and code point -":". - -

To utf-8 percent encode a code point, using -an encode set, run these steps: - -

    -
  1. If code point is not in - encode set, return code point. - -

  2. Let bytes be the result of running - utf-8 encode on - code point. - -

  3. Percent encode each byte in bytes, and - then return them concatenated, in the same order. -

- - - -

3. Hosts (domains and IP addresses)

- - - -

A host is a network address in the form of a -domain or an -IPv6 address. - -

A domain identifies a realm within a network. - -

An IPv6 address is a 128-bit identifier and -for the purposes of this specification represented as an ordered list of -eight 16-bit pieces. -[RFC4291] - - -

3.1. IDNA

- -

The domain to ASCII given a -domain domain, runs these steps: - -

    -
  1. Let result be the result of running - Unicode ToASCII with - domain_name set to domain, - UseSTD3ASCIIRules set to false, processing_option set to - Transitional_Processing, and VerifyDnsLength set to false. - -

  2. If result is a failure value, return failure. - -

  3. Return result. -

- -

The domain to Unicode given a -domain domain, runs these steps: - -

    -
  1. Let result be the result of running - Unicode ToUnicode with - domain_name set to domain, - UseSTD3ASCIIRules set to false. - -

  2. -

    Return result, ignoring any returned errors. - -

    User agents are encouraged to report errors through a developer console. -

- - -

3.2. Host writing

- -

A host must be either a -domain or "[", followed -by an IPv6 address, followed by -"]". - -

A domain is a valid domain if these steps return success: - -

    -
  1. Let result be the result of running - Unicode ToASCII with - domain_name set to domain, - UseSTD3ASCIIRules set to true, processing_option set to - Nontransitional_Processing, and VerifyDnsLength set to true. - -

  2. If result is a failure value, return failure. - -

  3. Set result to the result of running - Unicode ToUnicode with - domain_name set to result, - UseSTD3ASCIIRules set to true. - -

  4. If result contains any errors, return failure. - -

  5. Return success. -

- -

Ideally we define this in terms of a sequence of code points that make up a -valid domain rather than through a whack-a-mole: -bug 25334. - -

A domain must be a string that is a -valid domain. - -

An IPv6 address is defined in the -"Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. -[RFC4291] -

- - -

3.3. Host parsing

- -

The host parser takes a string -input and optionally a Unicode flag, and then runs -these steps: - -

    -
  1. If input is the empty string, return failure. -

    - -
  2. -

    If input starts with "[", run these - substeps: - -

      -
    1. If input does not end with - "]", parse error, return failure. - -

    2. Return the result of - IPv6 parsing input - with its leading "[" and trailing - "]" removed. -

    - -
  3. Let domain be the result of - utf-8 decode without BOM on the - percent decoding of - utf-8 encode on input. - - -

  4. Let asciiDomain be the result of running - domain to ASCII on domain. - -

  5. If asciiDomain is failure, return failure. - -

  6. -

    If asciiDomain contains one of - U+0000, - U+0009, - U+000A, - U+000D, - U+0020, - "#", - "%", - "/", - ":", - "?", - "@", - "[", - "\", - and - "]", - return failure. - -

  7. Return asciiDomain if the Unicode flag is unset, - and the result of running domain to Unicode - on asciiDomain otherwise. -

- -

The IPv6 parser takes a string -input and then runs these steps: - -

    -
  1. Let address be a new - IPv6 address with its - 16-bit pieces initialized to 0. - -

  2. Let piece pointer be a pointer into - address’s - 16-bit pieces, initially zero - (pointing to the first 16-bit piece), - and let piece be the - 16-bit piece it points to. - -

  3. Let compress pointer be another pointer into - address’s 16-bit pieces, initially - null and pointing to nothing. - -

  4. Let pointer be a pointer into - input, initially zero (pointing to the first code point). - -

  5. -

    If c is ":", run these substeps: - -

      -
    1. If remaining does not start with - ":", parse error, return failure. - -

    2. Increase pointer by two. - -

    3. Increase piece pointer by one and then set - compress pointer to piece pointer. -

    - -
  6. -

    Main: - While c is not the EOF code point, run these - substeps: - -

      -
    1. If piece pointer is eight, - parse error, return failure. - -

    2. -

      If c is ":", run these inner - substeps: - -

        -
      1. If compress pointer is not null, - parse error, return failure. - -

      2. Increase pointer and piece pointer by one, set - compress pointer to piece pointer, - and then jump to Main. -
      - -
    3. Let value and length be 0. - -

    4. While length is less than 4 and - c is an - ASCII hex digit, set - value to - value × 0x10 + c interpreted as hexadecimal number, - and increase pointer and length by one. - -

    5. -

      Based on c: - -

      -
      "." -
      -

      If length is 0, parse error, - return failure. -

      Decrease pointer by length. -

      Jump to IPv4. - -

      ":" -
      -

      Increase pointer by one. -

      If c is the EOF code point, - parse error, return failure. - -

      Anything but the EOF code point -

      Parse error, return failure. -

      - -
    6. Set piece to value. - -

    7. Increase piece pointer by one. -

    - -
  7. If c is the EOF code point, jump to - Finale. - -

  8. IPv4: - If piece pointer is greater than six, - parse error, return failure. - -

  9. Let dots seen be 0. - -

  10. -

    While c is not the EOF code point, run - these substeps: - -

      -
    1. Let value be null. - -

    2. If c is not an ASCII digit, - parse error, return failure. - -

    3. -

      While c is an - ASCII digit, run these subsubsteps: - -

        -
      1. Let number be c interpreted as decimal number. - -

      2. -

        If value is null, set value to number. - -

        Otherwise, if value is 0, parse error, return failure. - -

        Otherwise, set value to value × 10 + number. - -

      3. Increase pointer by one. - -

      4. If value is greater than 255, parse error, - return failure. -

      - -
    4. If dots seen is less than 3 and - c is not a ".", - parse error, return failure. - -

    5. Set piece to - piece × 0x100 + value. - -

    6. If dots seen is 1 or 3, increase - piece pointer by one. - -

    7. Increase pointer by one. - -

    8. If dots seen is 3 and c is not - the EOF code point, - parse error, return failure. - -

    9. Increase dots seen by one. -

    - -
  11. -

    Finale: - If compress pointer is not null, run these substeps: - -

      -
    1. Let swaps be - piece pointercompress pointer. - -

    2. Set piece pointer to seven. - -

    3. While piece pointer is not zero and swaps is - greater than zero, swap piece with the - piece at pointer - compress pointer + swaps − 1, and then - decrease both piece pointer and swaps by one. -

    - -
  12. Otherwise, if compress pointer is null and - piece pointer is not eight, parse error, - return failure. - -

  13. Return address. -

- - -

3.4. Host serializing

- -

The host serializer takes null or a -host host and then runs -these steps: - -

    -
  1. If host is null, return the empty string. - -

  2. If host is an - IPv6 address, return - "[", followed by the result of running the - IPv6 serializer on host, - followed by "]". - -

  3. Otherwise, host is a domain, - return host. -

- -

The IPv6 serializer takes an -IPv6 address address and -then runs these steps: - -

    -
  1. Let output be the empty string. - -

  2. -

    Let compress pointer be a pointer to the first - 16-bit piece in the first longest - sequences of address’s - 16-bit pieces that are 0. - -

    In 0:f:0:0:f:f:0:0 it would point to - the second 0. - -

  3. If there is no sequence of address’s - 16-bit pieces that are 0 longer than - one, set compress pointer to null. - -

  4. -

    For each piece in address’s - pieces, run these substeps: - -

      -
    1. If compress pointer points to - piece, append "::" to - output if piece is - address’s first piece and append - ":" otherwise, and then run these substeps again with all - subsequent pieces in - address’s pieces - that are 0 skipped or go the next step in the overall set of steps if - that leaves no pieces. - -

    2. Append piece, represented as the shortest - possible lowercase hexadecimal number, to output. - -

    3. If piece is not - address’s last piece, - append ":" to output. -

    - -
  5. Return output. -

- -

This algorithm requires the recommendation from -A Recommendation for IPv6 Address Text Representation. -[RFC5952] - - - - - -

4. URLs

- - - -

A URL is a universal identifier. - -

A URL consists of components, namely a -scheme, -scheme data, -username, -password, -host, -port, -path, -query, and -fragment. - -

A URL’s scheme is -a string that identifies the type of URL and can be used to -dispatch a URL for further processing after -parsing. It is initially the empty string. - -

A URL’s -scheme data is a string holding the contents of a -URL. It is initially the empty string. - -

A URL’s -scheme data will be its initial value if its -scheme is a relative scheme, and -otherwise will be the only component without an initial value. - -

A URL’s username -is a string identifying a user. It is initially the empty string. - -

A URL’s password -is either null or a string identifying a user’s credentials. It is initially null. - -

A URL’s host is -either null or a host. It is initially null. - -

A URL’s port is a -string that identifies a networking port. It is initially the empty string. - -

A URL’s path is a -list of zero or more strings holding data, usually identifying a location in hierarchical -form. It is initially the empty list. - -

A URL’s query is -either null or a string holding data. It is initially null. - -

A URL’s fragment -is either null or a string holding data that can be used for further processing on the -resource the URL’s other components identify. -It is initially null. - -

A URL also has an associated relative flag. -It is initially unset. - -

The relative flag exists as checking if a -URL’s scheme is a -relative scheme can give incorrect results due to the -protocol attribute. - - -

A URL also has an associated -object that is either null or a -Blob. It is initially null. -[FILEAPI] - -

At this point this is used primarily to support "blob" -URLs, but others can be added going forward, hence "object". - - -

A relative scheme is a -scheme listed in the first column of -the following table. A default port is a -relative scheme’s optional corresponding -port and is listed in the second column -on the same row. - - -
scheme - port -
"ftp""21" -
"file" -
"gopher""70" -
"http""80" -
"https""443" -
"ws""80" -
"wss""443" -
- -

- -

A URL -includes credentials if either its -username is not the empty string or its -password is non-null. - - -

A URL can be designated as -base URL. - -

A base URL is useful for -the URL parser when the input is potentially a -relative URL. - - -

4.1. URL writing

- - - -

A URL must be written as either a -relative URL or an -absolute URL, optionally followed by -"#" and a -fragment. - -

An absolute URL must be a -scheme, followed by -":", followed by either a -scheme-relative URL, if -scheme is a relative scheme, or -scheme data otherwise, optionally followed -by "?" and a query. - -

A scheme must be one -ASCII alpha, followed by zero or more of -ASCII alphanumeric, "+", -"-", and ".". A -scheme must be registered -.... - -

The syntax of scheme data -depends on the scheme and is typically -defined alongside it. Standards must define -scheme data within the constraints of zero or -more URL units, excluding "?". - -

A relative URL must be either a -scheme-relative URL, an -absolute-path-relative URL, -or a path-relative URL that -does not start with a scheme and -":", optionally followed by a "?" and -a query. - -

At the point where a relative URL is -parsed, a -base URL must be in scope. - -

A scheme-relative URL must be -"//", optionally followed by -userinfo and "@", -followed by a host, optionally followed -by ":" and a port, -optionally followed by an -absolute-path-relative URL. - -

Userinfo must be a -username, optionally followed by a -":" and a -password. - -

A username must be zero or more -URL units, excluding "/", -":, "?", and "@". - - -

A password must be zero or more -URL units, excluding "/", -"?", and "@". - -

A port must be zero or more -ASCII digits. - -

An -absolute-path-relative URL -must be "/", followed by a -path-relative URL that does not -start with "/". - -

A path-relative URL must be zero or -more path segments separated from each -other by a "/". - -

A path segment must be zero or more URL units, -excluding "/" and "?". - -

A query must be zero or more -URL units. - -

A fragment must be zero or more -URL units. - -

The URL code points are ASCII alphanumeric, -"!", -"$", -"&", -"'", -"(", -")", -"*", -"+", -",", -"-", -".", -"/", -":", -";", -"=", -"?", -"@", -"_", -"~", -and code points in the ranges -U+00A0 to U+D7FF, -U+E000 to U+FDCF, -U+FDF0 to U+FFFD, -U+10000 to U+1FFFD, -U+20000 to U+2FFFD, -U+30000 to U+3FFFD, -U+40000 to U+4FFFD, -U+50000 to U+5FFFD, -U+60000 to U+6FFFD, -U+70000 to U+7FFFD, -U+80000 to U+8FFFD, -U+90000 to U+9FFFD, -U+A0000 to U+AFFFD, -U+B0000 to U+BFFFD, -U+C0000 to U+CFFFD, -U+D0000 to U+DFFFD, -U+E0000 to U+EFFFD, -U+F0000 to U+FFFFD, -U+100000 to U+10FFFD. - -

Code points higher than U+009F will be converted to -percent-encoded bytes by the -URL parser, except for code points appearing in -fragments. - -

The URL units are URL code points and -percent-encoded bytes. - - -

4.2. URL parsing

- -

Add the ability to halt on the first conformance error. - -

The URL parser takes a string -input, optionally with a -base URL base, and -optionally with an encoding -encoding override, and then runs these steps: - -

    -
  1. Let url be the result of running the - basic URL parser on input - with base, and encoding override as provided. - -

  2. If url is failure, return failure. - -

  3. If url’s scheme is not - "blob", return url. - -

  4. If url’s scheme data - is not in the blob URL store, return - url. [FILEAPI] - -

  5. Set url’s object to a - structured clone of the entry in the - blob URL store corresponding to - url’s scheme data. - [HTML] - -

  6. Return url. -

- -
- -

The basic URL parser takes a string -input, optionally with a -base URL base, -optionally with an encoding -encoding override, optionally with an -URL url and a state override -state override, and then runs these steps: - -

-

The encoding override argument is a legacy concept only relevant for - HTML. The url and state override arguments are only for - use by methods of objects implementing the URLUtils interface. - [HTML] - -

When the url and state override arguments are not - passed the basic URL parser returns either a - URL or failure. If they are passed the - algorithm simply modifies the passed url and can terminate without - returning anything. -

- -
    -
  1. -

    If url is not given: - -

      -
    1. Set url to a new URL. - -

    2. Remove any leading and trailing - ASCII whitespace from - input. -

    - -
  2. Let state be state override - if given, or scheme start state otherwise. - -

  3. If base is not given, set it to null. - -

  4. If encoding override is not given, set it to - utf-8. - -

  5. Let buffer be the empty string. - -

  6. Let the @ flag and the [] flag be - unset. - -

  7. Let pointer be a pointer to first code point in - input. - -

  8. -

    Keep running the following state machine by switching on state. If - after a run pointer points to the EOF code point, go to - the next step. Otherwise, increase pointer by one and continue with the - state machine. - -

    -
    scheme start state -
    -
      -
    1. If c is an ASCII alpha, - append c, lowercased, to buffer, and - set state to scheme state. - -

    2. Otherwise, if state override is not given, set - state to no scheme state, and decrease - pointer by one. - -

    3. Otherwise, parse error, terminate this algorithm. -

    - -
    scheme state -
    -
      -
    1. If c is an ASCII alphanumeric, - "+", "-", or - ".", append c, lowercased, to - buffer. - -

    2. -

      Otherwise, if c is ":", set - url’s scheme to - buffer, buffer to the empty string, - and then run these substeps: - -

        -
      1. If state override is given, - terminate this algorithm. - -

      2. If url’s - scheme is - a relative scheme, set url’s - relative flag. - -

      3. If url’s - scheme is - "file", set state to - relative state. - -

      4. Otherwise, if url’s - relative flag is set, base is not null - and base’s - scheme is equal to - url’s scheme, - set state to - relative or authority state. - -

      5. Otherwise, if url’s - relative flag is set, set state to - authority first slash state. - -

      6. Otherwise, set state to - scheme data state. -

      - -
    3. Otherwise, if state override is not given, set - buffer to the empty string, state to - no scheme state, and start over (from the first code point - in input). - -

    4. Otherwise, if c is the - EOF code point, terminate this algorithm. - - -

    5. Otherwise, parse error, terminate this algorithm. -

    - -
    scheme data state -
    -
      -
    1. If c is "?", set - url’s query - to the empty string and state to - query state. - -

    2. Otherwise, if c is "#", set - url’s fragment - to the empty string and state to - fragment state. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. If c is not the EOF code point, not a - URL code point, and not - "%", parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. If c is none of - EOF code point, U+0009, U+000A, and U+000D, - utf-8 percent encode c using the - simple encode set, and append the result to - url’s - scheme data. -

      -
    - -
    no scheme state -
    -

    If base is null, or base’s - scheme is not a - relative scheme, parse error, return failure. - -

    Due to the protocol attribute’s - ability to change base’s - scheme, base’s - relative flag is not used here. - -

    Otherwise, set state to relative state, - and decrease pointer by one. - -

    relative or authority state -
    -

    If c is "/" and - remaining starts with "/", set - state to authority ignore slashes state - and increase pointer by one. - -

    Otherwise, parse error, set state to - relative state and decrease pointer by - one. - -

    relative state -
    -

    Set url’s relative flag, set - url’s scheme to - base’s scheme if - url’s scheme is not - "file", and then, based on c: - -

    -
    EOF code point -
    -

    Set url’s host - to base’s host, - url’s port to - base’s port, - url’s path to - base’s path, and - url’s query to - base’s query. - -

    "/" -
    "\" -
    -
      -
    1. If c is "\", - parse error. -

    2. Set state to - relative slash state. -

    - -
    "?" -

    Set - url’s host to - base’s host, - url’s port to - base’s port, - url’s path to - base’s path, - url’s query to the empty string, - and state to query state. - -

    "#" -

    Set - url’s host to - base’s host, - url’s port to - base’s port, - url’s path to - base’s path, - url’s query to - base’s query, - url’s fragment to the empty string, - and state to fragment state. - -

    Otherwise -
    -
      -
    1. -

      If url’s scheme is not - "file", or c is not an - ASCII alpha, or remaining does not start with either - ":" or "|", or remaining - consists of one code point, or remaining’s second code point is - not one of "/", "\", "?", - and "#", then set - url’s host to - base’s host, - url’s port to - base’s port, - url’s path to - base’s path, and then remove - url’s path’s last entry. - - -

      This is a (platform-independent) Windows drive letter quirk. - When found at the start of a file URL it is treated as an - absolute path rather than one relative to - base’s path. - -

    2. Set state to relative path state, - and decrease pointer by one. -

    + + + + + + + + +
    +

    +

    URL

    +

    W3C Working Group Note,

    +
    +
    +
    This version: +
    https://www.w3.org/TR/2016/NOTE-url-1-20160926/ +
    Latest published version: +
    https://www.w3.org/TR/url-1/ +
    Editor's Draft: +
    https://url.spec.whatwg.org/ +
    Previous Versions: +
    http://www.w3.org/TR/2014/WD-url-1-20141209/ +
    Feedback: +
    public-webapps@w3.org with subject line “[url] … message topic …” (archives) +
    Editors: +
    (Mozilla) +
    (IBM)
    - -
    relative slash state -
    -

    If c is either "/" or - "\", run these steps: - -

      -
    1. If c is "\", - parse error. - -

    2. If url’s - scheme is - "file", set state to - file host state. - -

    3. Otherwise, set state to - authority ignore slashes state. -

    - -

    Otherwise, run these steps: - -

      -
    1. -

      If url’s scheme is not - "file", set - url’s host to - base’s host and - url’s port to - base’s port. - -

      file:/path/ will not inherit - base’s host. - -

    2. Set state to relative path state, - and decrease pointer by one. -

    - -
    authority first slash state -
    -

    If c is "/", set - state to authority second slash state. - -

    Otherwise, parse error, set state to - authority ignore slashes state, and decrease - pointer by one. - -

    authority second slash state -
    -

    If c is "/", set - state to authority ignore slashes state. - -

    Otherwise, parse error, set state to - authority ignore slashes state, and decrease - pointer by one. - -

    authority ignore slashes state -
    -

    If c is neither "/" nor - "\", set state to - authority state, and decrease pointer by one. - -

    Otherwise, parse error. - -

    authority state -
    -
      -
    1. -

      If c is "@", run these substeps: - -

        -
      1. If the @ flag is set, - parse error, prepend "%40" to - buffer. - -

      2. Set the @ flag. - -

      3. -

        For each code point in buffer, run these substeps: - -

          -
        1. If code point is U+0009, U+000A, or U+000D, - parse error, continue. - -

        2. If code point is not a - URL code point and not - "%", parse error. - -

        3. If code point is "%" and - remaining does not start with two - ASCII hex digits, parse error. - -

        4. If code point is ":" and - url’s - password is null, set - url’s password - to the empty string and continue. - -

        5. utf-8 percent encode code point using - the default encode set and append the result to - url’s password - if url’s password - is non-null, and to - url’s username - otherwise. -

        -
      4. Set buffer to the empty string. -

      - -
    2. Otherwise, if c is one of EOF code point, - "/", "\", "?", - and "#", decrease pointer by the - number of code points in buffer plus one, set - buffer to the empty string, and - state to host state. - -

    3. Otherwise, append c to buffer. -

    - -
    file host state -
    -
      -
    1. -

      If c is one of EOF code point, - "/", "\", "?", - and "#", decrease pointer by one, - and run these substeps: - -

        -
      1. -

        If buffer consists of two code points, of - which the first is an ASCII alpha and the second is - either ":" or "|", set - state to relative path state. - -

        This is a (platform-independent) Windows drive letter quirk. - buffer is not reset here and instead used in the - relative path state. - -

      2. Otherwise, if buffer is the empty string, set - state to relative path start state. - -

      3. -

        Otherwise, run these steps: - -

          -
        1. Let host be the result of - host parsing - buffer. - -

        2. If host is failure, return failure. - -

        3. Set - url’s host to - host, buffer to the empty string, - and state to relative path start state. -

        -
      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. Otherwise, append c to buffer. -

    - -
    host state -
    hostname state -
    -
      -
    1. -

      If c is ":" and the - [] flag is unset, run these substeps: - -

        -
      1. Let host be the result of - host parsing - buffer. - -

      2. If host is failure, return failure. - -

      3. Set url’s host to - host, buffer to the empty string, - and state to port state. - -

      4. If state override is hostname state, - terminate this algorithm. -

      - -
    2. -

      Otherwise, if c is the - EOF code point, "/", - "\", "?", or - "#", decrease pointer by one, and - run these substeps: - -

        -
      1. Let host be the result of - host parsing - buffer. - -

      2. If host is failure, return failure. - -

      3. Set url’s host to - host, buffer to the empty string, - and state to relative path start state. - -

      4. If state override is given, terminate this - algorithm. -

      - -
    3. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    4. -

      Otherwise, run these substeps: - -

        -
      1. If c is "[", set the - [] flag. - -

      2. If c is "]", unset the - [] flag. - -

      3. Append c to buffer. -

      -
    - -
    port state -
    -
      -
    1. If c is an ASCII digit, - append c to buffer. - -

    2. -

      Otherwise, if c is one of - EOF code point, "/", - "\", "?", and - "#", or state override is given, run - these substeps: -

        -
      1. -

        Remove leading U+0030 code points from buffer - until either the leading code point is not U+0030 or - buffer is one code point. - -

        - -
        InputOutput -
        "42""42" -
        "031""31" -
        "080""80" -
        "0000""0" -
        -
        - -
      2. If buffer is equal to - url’s scheme’s - default port, set buffer to the empty - string. - -

      3. Set url’s - port to buffer. - -

      4. If state override is given, terminate this - algorithm. - -

      5. Set buffer to the empty string, - state to relative path start state, and - decrease pointer by one. -

      - -
    3. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    4. Otherwise, parse error, return failure. -

    - -
    relative path start state -
    -
      -
    1. If c is "\", - parse error. - -

    2. Set state to relative path state - and if c is neither "/" nor - "\", decrease pointer by one. -

    - -
    relative path state -
    -
      -
    1. -

      If either c is one of - EOF code point, "/", and - "\", or state override is not given and - c is one of "?" and - "#", run these substeps: -

        -
      1. If c is "\", parse error. - -

      2. -

        If buffer, lowercased, matches any row in the first column of - the following table, set buffer to the contents of the cell in - the second column of the matched row: - - 
        "%2e" "." -
        ".%2e" ".." -
        "%2e." ".." -
        "%2e%2e" ".." -
        - -

      3. If buffer is "..", remove - url’s path’s last entry, if - any, and then if c is neither "/" nor - "\", append the empty string to url’s - path. - -

      4. Otherwise, if buffer is "." and - c is neither "/" nor "\", - append an empty string to - url’s path. - -

      5. -

        Otherwise, if buffer is not - ".", run these subsubsteps: - -

          -
        1. -

          If url’s scheme is - "file", url’s path - is empty, buffer consists of two - code points, of which the first is an ASCII alpha, - and the second is "|", replace the second code point in - buffer with ":". - -

          This is a (platform-independent) Windows drive letter quirk. - They are beautiful, no? - -

        2. Append buffer to - url’s path. -

        - -
      6. Set buffer to the empty string. - -

      7. If c is "?", set - url’s query to the empty string, - and state to query state. - -

      8. If c is "#", set - url’s fragment to the empty string, - and state to fragment state. -

      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. -

      Otherwise, run these steps: - -

        -
      1. If c is not a - URL code point and not "%", - parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. utf-8 percent encode c using the - default encode set, and append the result to - buffer. -

      -
    - -
    query state -
    -
      -
    1. -

      If c is the EOF code point, or - state override is not given and c - is "#", run these substeps: - -

        -
      1. If url’s relative flag is unset or - url’s scheme is either - "ws" or "wss", set - encoding override to utf-8. - - -

      2. Set buffer to the result of - encoding - buffer using encoding override. - -

      3. -

        For each byte in buffer run - these subsubsteps: - -

          -
        1. If byte is less than 0x21, greater than - 0x7E, or is one of 0x22, 0x23, 0x3C, 0x3E, and 0x60, append - byte, - percent encoded, to - url’s query. - -

        2. Otherwise, append a code point whose value is - byte to url’s - query. -

        - -
      4. Set buffer to the empty string. - -

      5. If c is "#", set - url’s - fragment to the empty string, - and state to fragment state. -

      - -
    2. Otherwise, if c is U+0009, U+000A, or U+000D, - parse error. - -

    3. -

      Otherwise, run these substeps: - -

        -
      1. If c is not a - URL code point and not "%", - parse error. - -

      2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

      3. Append c to buffer. -

      -
    - -
    fragment state -
    -

    Based on c: -

    -
    EOF code point -

    Do nothing. - -

    U+0000 -
    U+0009 -
    U+000A -
    U+000D -

    Parse error. - -

    Otherwise -
    -
      -
    1. If c is not a URL code point and not - "%", parse error. - -

    2. If c is "%" and remaining does - not start with two ASCII hex digits, parse error. - -

    3. -

      Append c to url’s fragment. - -

      Unfortunately not using percent-encoding is intentional as - implementations with majority market share exhibit this behavior. - -

    -
    -
    - -
  9. Return url. -

- -
- -
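
Three of the smaller rules in the parser states above lend themselves to a short sketch: the leading-zero and default-port handling in the port state, the dot-segment table in the relative path state, and the byte test in the query state. The function names and the `DEFAULT_PORTS` table are assumptions made for illustration; they are not defined by this document, and the sketch is not a full parser.

```javascript
// ASSUMPTION: default ports for the relative schemes, kept as strings
// because the parser compares them against a string buffer.
const DEFAULT_PORTS = { ftp: "21", gopher: "70", http: "80", https: "443", ws: "80", wss: "443" };

// Port state: remove leading U+0030 until the buffer no longer starts with
// "0" or is a single code point, then blank it if it is the default port.
function normalizePortBuffer(buffer, scheme) {
  let port = buffer;
  while (port.length > 1 && port[0] === "0") {
    port = port.slice(1);
  }
  return port === DEFAULT_PORTS[scheme] ? "" : port;
}

// Relative path state: the lowercased buffer is looked up in the table;
// a buffer that matches no row is returned unchanged.
const DOT_SEGMENTS = new Map([
  ["%2e", "."],
  [".%2e", ".."],
  ["%2e.", ".."],
  ["%2e%2e", ".."],
]);

function normalizeDotSegment(buffer) {
  const key = buffer.toLowerCase();
  return DOT_SEGMENTS.has(key) ? DOT_SEGMENTS.get(key) : buffer;
}

// Query state: bytes below 0x21, above 0x7E, or one of 0x22, 0x23, 0x3C,
// 0x3E, 0x60 are percent encoded; every other byte is appended as is.
function encodeQueryBytes(bytes) {
  let output = "";
  for (const byte of bytes) {
    const mustEncode =
      byte < 0x21 || byte > 0x7e || [0x22, 0x23, 0x3c, 0x3e, 0x60].includes(byte);
    output += mustEncode
      ? "%" + byte.toString(16).toUpperCase().padStart(2, "0")
      : String.fromCharCode(byte);
  }
  return output;
}
```

For instance, `normalizePortBuffer("080", "https")` yields `"80"`, while the same buffer with scheme `"http"` yields the empty string because 80 is that scheme's default port.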

To set the username given a url and -username, run these steps: - -

    -
  1. Set url’s username to the - empty string. - -

  2. For each code point in username, - utf-8 percent encode it using the username encode set, and - append the result to url’s - username. -

- -

To set the password given a url and -password, run these steps: - -

    -
  1. If password is the empty string, set url’s - password to null. - -

  2. -

    Otherwise, run these substeps: - -

      -
    1. Set url’s password to - the empty string. - -

    2. For each code point in password, - utf-8 percent encode it using the password encode set, and - append the result to url’s - password. -

    -
- - -
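
The two algorithms above can be sketched over a plain object standing in for a URL record. Note that the encode set used here (`demoEncodeSet`) is hypothetical: the username and password encode sets are defined elsewhere in the specification and are not reproduced in this section, so callers are expected to pass in the real predicate.

```javascript
// utf-8 percent encode a single code point if the predicate places it in
// the encode set; otherwise return it unchanged.
function utf8PercentEncode(codePoint, inEncodeSet) {
  if (!inEncodeSet(codePoint)) return codePoint;
  let output = "";
  for (const byte of new TextEncoder().encode(codePoint)) {
    output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return output;
}

// HYPOTHETICAL encode set, for illustration only.
function demoEncodeSet(codePoint) {
  const value = codePoint.codePointAt(0);
  return value <= 0x20 || value > 0x7e || '"#<>?`/@\\:'.includes(codePoint);
}

// "Set the username": reset to the empty string, then append each encoded
// code point. A for...of loop over a string iterates by code point.
function setUsername(url, username, inEncodeSet = demoEncodeSet) {
  url.username = "";
  for (const codePoint of username) {
    url.username += utf8PercentEncode(codePoint, inEncodeSet);
  }
}

// "Set the password": the empty string maps to null rather than to "".
function setPassword(url, password, inEncodeSet = demoEncodeSet) {
  if (password === "") {
    url.password = null;
    return;
  }
  url.password = "";
  for (const codePoint of password) {
    url.password += utf8PercentEncode(codePoint, inEncodeSet);
  }
}
```

The asymmetry is the point worth noticing: an empty username is stored as the empty string, whereas an empty password is stored as null, which is what lets the serializer omit the ":" separator.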

4.3. URL serializing

- -

The URL serializer takes a -URL url, -optionally an exclude fragment flag, and then runs these steps: - -

    -
  1. Let output be url’s - scheme and - ":" concatenated. - -

  2. -

    If url’s relative flag is set: - -

      -
    1. Append "//" to output. - -

    2. -

      If url’s - username is not the empty string - or url’s - password is non-null, run these - substeps: - -

        -
      1. Append url’s - username to - output. - -

      2. If url’s - password is non-null, append - ":" concatenated with url’s - password to - output. - -

      3. Append "@" to output. -

      - -
    3. Append url’s - host, - serialized, to - output. - -

    4. If url’s port - is not the empty string, append ":" concatenated with - url’s port to - output. - -

    5. Append "/" concatenated with the strings in - url’s path - (including empty strings), separated from each other by - "/" to output. -

    - -
  3. Otherwise, if url’s relative flag is - unset, append url’s - scheme data to - output. - -

  4. If url’s query is non-null, - append "?" concatenated with url’s - query to output. - -

  5. If the exclude fragment flag is unset and - url’s fragment is - non-null, append "#" concatenated with - url’s fragment to - output. - -

  6. Return output. -

- -
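
As a minimal sketch, the serializer steps above could be written as follows. The record shape (`scheme`, `relativeFlag`, `username`, `password`, `host`, `port`, `path`, `schemeData`, `query`, `fragment`) is an assumption chosen to mirror the terms used in the steps, and `host` is assumed to hold an already serialized host.

```javascript
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.relativeFlag) {
    output += "//";
    if (url.username !== "" || url.password !== null) {
      output += url.username;
      if (url.password !== null) output += ":" + url.password;
      output += "@";
    }
    output += url.host; // assumed to be serialized already
    if (url.port !== "") output += ":" + url.port;
    // Empty strings in the path are kept, so ["a", "", "b"] gives "/a//b".
    output += "/" + url.path.join("/");
  } else {
    output += url.schemeData;
  }
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}

// Example record; all values are made up for illustration.
const exampleRecord = {
  scheme: "https",
  relativeFlag: true,
  username: "user",
  password: "pw",
  host: "example.org",
  port: "8080",
  path: ["a", "", "b"],
  schemeData: "",
  query: "x=1",
  fragment: "top",
};
```

Calling `serializeURL(exampleRecord)` gives `"https://user:pw@example.org:8080/a//b?x=1#top"`, and passing `true` as the second argument drops the `#top` part.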

4.4. Origin

- - -

See origin’s definition in HTML for the -necessary background information. [HTML] - -

A URL’s origin is -the origin returned by running these steps, switching -on URL’s scheme: - -

-
"blob" -
-

Let url be the result of - parsing - URL’s - scheme data. - -

If url is failure, return a new globally unique identifier. - Otherwise, return url’s origin. - - -

The origin of - blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is - the tuple - (https, whatwg.org, 443). - -

"ftp" -
"gopher" -
"http" -
"https" -
"ws" -
"wss" -

Return a tuple consisting of URL’s - scheme, its - host, and its default port if its - port is the empty string, and its - port otherwise. - -

"file" -

Unfortunate as it is, this is left as an exercise to the reader. When in doubt, - return a new globally unique identifier. - -

Otherwise -

Return a new globally unique identifier. -

- - -
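
The switch on scheme described above might look roughly like this. It is a sketch under several assumptions: the record shape is invented, a fresh `Symbol` stands in for a "new globally unique identifier", the `DEFAULT_PORTS` table is supplied here rather than taken from this section, and the platform's `URL` class is used to parse the scheme data of a "blob" URL, which may differ in detail from the parser in this document.

```javascript
// ASSUMPTION: default ports for the schemes that return a tuple.
const DEFAULT_PORTS = { ftp: "21", gopher: "70", http: "80", https: "443", ws: "80", wss: "443" };

function originOf(url) {
  switch (url.scheme) {
    case "blob": {
      let inner;
      try {
        inner = new URL(url.schemeData); // platform parser, see lead-in
      } catch (error) {
        return Symbol("globally unique identifier");
      }
      return originOf({
        scheme: inner.protocol.slice(0, -1), // strip the trailing ":"
        host: inner.hostname,
        port: inner.port,
        schemeData: inner.pathname,
      });
    }
    case "ftp":
    case "gopher":
    case "http":
    case "https":
    case "ws":
    case "wss":
      return [url.scheme, url.host, url.port === "" ? DEFAULT_PORTS[url.scheme] : url.port];
    default:
      // Covers "file" as well, following the "when in doubt" advice above.
      return Symbol("globally unique identifier");
  }
}

const blobOrigin = originOf({
  scheme: "blob",
  schemeData: "https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f",
});
```

Here `blobOrigin` is the tuple `["https", "whatwg.org", "443"]`, matching the example given for "blob" above.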

5. application/x-www-form-urlencoded

- -

The application/x-www-form-urlencoded format is a simple way to -encode name-value pairs in a byte sequence where all bytes are in the 0x00 to 0x7F range. - -

While this description makes -application/x-www-form-urlencoded sound dated (and really, it is), the -format is in widespread use due to the prevalence of HTML forms. -[HTML] - -

5.1. application/x-www-form-urlencoded parsing

- -

The features provided by the -application/x-www-form-urlencoded parser -are mainly relevant for server-oriented implementations. A browser-based implementation -only needs what the -application/x-www-form-urlencoded string parser -requires. - -

The -application/x-www-form-urlencoded parser -takes a byte sequence input, optionally with an -encoding encoding override, -optionally with a use _charset_ flag, and optionally with an -isindex flag, and then runs these steps: - -

    -
  1. If encoding override is not given, set it to - utf-8. - -

  2. -

    If encoding override is not - utf-8 and input contains bytes - whose value is greater than 0x7F, return failure. - -

    This can only happen if input was not - generated through the serializer or - URLSearchParams. - -

  3. Let sequences be the result of splitting - input on `&`. - - -

  4. If the isindex flag is set and the first byte sequence in - sequences does not contain a `=`, prepend - `=` to the first byte sequence in sequences. - -

  5. Let pairs be an empty list of name-value pairs where both name - and value hold a byte sequence. - -

  6. -

    For each byte sequence bytes in sequences, - run these substeps: - -

      -
    1. If bytes is the empty byte sequence, run these substeps for the - next byte sequence. - -

    2. If bytes contains a `=`, then let - name be the bytes from the start of bytes up to but - excluding its first `=`, and let value be the - bytes, if any, after the first `=` up to the end of - bytes. If `=` is the first byte, then - name will be the empty byte sequence. If it is the last, then - value will be the empty byte sequence. - -

    3. Otherwise, let name have the value of bytes - and let value be the empty byte sequence. - -

    4. Replace any `+` in name and - value with 0x20. - -

    5. -

      If the use _charset_ flag is set and name is - `_charset_`, run these substeps: - -

        -
      1. Let result be the result of - getting an encoding - for value, - decoded. - -

      2. If result is not failure, unset use _charset_ flag and - set encoding override to result. -

      - -
    6. Add a pair consisting of name and - value to pairs. -

    - -
  7. Let output be an empty list of name-value pairs where both name - and value hold a string. - -

  8. For each name-value pair in pairs, append a name-value pair to - output where the new name and value appended to output - are the result of running encoding override’s - decoder on the - percent decoding of the name and value from - pairs, respectively. - -

  9. Return output. -

- -
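
The core of the parser above (splitting on `&`, splitting on the first `=`, replacing `+` with 0x20, then percent decoding and decoding) can be sketched as below. This sketch deliberately handles only utf-8 and leaves out the encoding override, the use _charset_ flag, and the isindex flag; the function names are assumptions for illustration.

```javascript
// Percent decode: "%" followed by two ASCII hex digits becomes one byte;
// anything else is copied through unchanged.
function percentDecode(bytes) {
  const output = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length) {
      const hex = String.fromCharCode(bytes[i + 1], bytes[i + 2]);
      if (/^[0-9A-Fa-f]{2}$/.test(hex)) {
        output.push(parseInt(hex, 16));
        i += 2;
        continue;
      }
    }
    output.push(bytes[i]);
  }
  return Uint8Array.from(output);
}

// input is a Uint8Array; the result is a list of [name, value] strings.
function parseUrlencoded(input) {
  const decoder = new TextDecoder("utf-8");
  // "+" is replaced before percent decoding, so "%2B" survives as "+".
  const decode = (sequence) =>
    decoder.decode(percentDecode(sequence.map((byte) => (byte === 0x2b ? 0x20 : byte))));
  const output = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i < input.length && input[i] !== 0x26) continue; // split on "&"
    const bytes = input.subarray(start, i);
    start = i + 1;
    if (bytes.length === 0) continue; // empty sequences are skipped
    const equals = bytes.indexOf(0x3d);
    const name = equals === -1 ? bytes : bytes.subarray(0, equals);
    const value = equals === -1 ? new Uint8Array(0) : bytes.subarray(equals + 1);
    output.push([decode(name), decode(value)]);
  }
  return output;
}

// The string parser hook: utf-8 encode, then parse.
function parseUrlencodedString(input) {
  return parseUrlencoded(new TextEncoder().encode(input));
}
```

For example, `parseUrlencodedString("b=x+y&&c")` returns `[["b", "x y"], ["c", ""]]`: the empty sequence between the two `&` is dropped, and a sequence without `=` becomes a name with an empty value.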

5.2. application/x-www-form-urlencoded serializing

- -

The -application/x-www-form-urlencoded byte serializer -takes a byte sequence input and then runs these steps: - -

    -
  1. Let output be the empty string. -

  2. -

    For each byte in input, depending on - byte: - -

    -
    0x20 -

    Append U+002B to output. - -

    0x2A -
    0x2D -
    0x2E -
    0x30 to 0x39 -
    0x41 to 0x5A -
    0x5F -
    0x61 to 0x7A -

    Append a code point whose value is byte to - output. - -

    Otherwise -

    Append byte, - percent encoded, to - output. -

    -
  3. Return output. -

- - -

The -application/x-www-form-urlencoded serializer -takes a list of name-value pairs pairs, optionally with an -encoding -encoding override, and then runs these steps: - -

    -
  1. If encoding override is not given, set it to - utf-8. - -

  2. Let output be the empty string. - -

  3. -

    For each pair in pairs, run - these substeps: - -

      -
    1. Let outputPair be a copy of pair. - -

    2. Replace outputPair’s name and value with the result of running - encode on them using - encoding override, respectively. - -

    3. Replace outputPair’s name and value with their - serialization. - -

    4. If pair is not the first pair in pairs, append - "&" to output. - -

    5. Append outputPair’s name, followed by "=", - followed by outputPair’s value to output. -

    - -
  4. Return output. -
- -
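
A minimal sketch of the two serializers above, assuming utf-8 throughout (the encoding override is omitted) and representing name-value pairs as two-element arrays; the function names are illustrative only.

```javascript
// Byte serializer: 0x20 becomes "+", the listed safe bytes are copied,
// and every other byte is percent encoded.
function serializeBytes(input) {
  let output = "";
  for (const byte of input) {
    const isSafe =
      byte === 0x2a || byte === 0x2d || byte === 0x2e || byte === 0x5f ||
      (byte >= 0x30 && byte <= 0x39) ||
      (byte >= 0x41 && byte <= 0x5a) ||
      (byte >= 0x61 && byte <= 0x7a);
    if (byte === 0x20) {
      output += "+";
    } else if (isSafe) {
      output += String.fromCharCode(byte);
    } else {
      output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return output;
}

// Serializer: encode each name and value, serialize the bytes, and join
// the pairs with "&".
function serializeUrlencoded(pairs) {
  const encoder = new TextEncoder();
  return pairs
    .map(([name, value]) =>
      serializeBytes(encoder.encode(name)) + "=" + serializeBytes(encoder.encode(value)))
    .join("&");
}
```

Note that `~` (0x7E) is not in the safe list, so `serializeUrlencoded([["k*", "~"]])` produces `"k*=%7E"`.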

5.3. Hooks

- -

The -application/x-www-form-urlencoded string parser -takes a string input, -utf-8 encodes it, and then -returns the result of -application/x-www-form-urlencoded parsing -it. - - - -

6. API

- - - -
[Constructor(USVString url, optional USVString base = "about:blank"), Exposed=(Window,Worker)]
-interface URL {
-  static USVString domainToASCII(USVString domain);
-  static USVString domainToUnicode(USVString domain);
-};
-URL implements URLUtils;
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtils {
-  stringifier attribute USVString href;
-  readonly attribute USVString origin;
-
-           attribute USVString protocol;
-           attribute USVString username;
-           attribute USVString password;
-           attribute USVString host;
-           attribute USVString hostname;
-           attribute USVString port;
-           attribute USVString pathname;
-           attribute USVString search;
-           attribute URLSearchParams searchParams;
-           attribute USVString hash;
-};
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtilsReadOnly {
-  stringifier readonly attribute USVString href;
-  readonly attribute USVString origin;
-
-  readonly attribute USVString protocol;
-  readonly attribute USVString host;
-  readonly attribute USVString hostname;
-  readonly attribute USVString port;
-  readonly attribute USVString pathname;
-  readonly attribute USVString search;
-  readonly attribute USVString hash;
-};
- -

Except where they differ, objects implementing URLUtilsReadOnly are identical -to objects implementing URLUtils. - -

Since all members are readonly and certain members from -URLUtils are not exposed, a number of potential optimizations are possible -compared to objects implementing URLUtils. These are left as an exercise to -the reader. - -

- -

Specifications defining objects implementing URLUtils or -URLUtilsReadOnly must define a -get the base algorithm, which must return the -appropriate base URL for the object. - -

Specifications defining objects implementing URLUtils may -define update steps to make it possible for an -underlying string (such as an -attribute value) -to be updated. The update steps are passed a string -value for this purpose. - -

An object implementing URLUtils or URLUtilsReadOnly has an -associated input (a string), -query encoding -(an encoding), -query object -(a URLSearchParams object or null), and a -url (a URL or null). - -

Unless stated otherwise, query encoding is -utf-8 and -query object is null. The others follow -from the set the input algorithm.

- -

The associated -query encoding is a legacy -concept only relevant for HTML. -[HTML] - -

Specifications defining objects implementing URLUtils or -URLUtilsReadOnly must use the -set the input algorithm to set -input, url, and -query object. To -set the input given input -and optionally a url, run these steps: - -

    -
  1. If url is given, set url - to url and input to - input. - -

  2. -

    Otherwise, run these substeps: - -

      -
    1. Set url to null. - -

    2. If input is null, set - input to the empty string. - -

    3. -

      Otherwise, run these subsubsteps: - -

        -
      1. Set input to input. - -

      2. Let url be the result of running the - URL parser on - input with - base URL being the result of running - get the base and - query encoding as - encoding override. - - -

      3. If url is not failure, set - url to url. -

      -
    - -
  3. Let query be url’s - query if url - is non-null, and the empty string otherwise. - -

  4. If query object is null, set - query object to a - new URLSearchParams object - using query, and then append the - context object to - query object’s list of - url objects. - -

  5. Otherwise, set query object’s - list to the result of - parsing query. -

- -

To run the pre-update steps for an object implementing -URLUtils, optionally given a value, run these steps: - -

    -
  1. If value is not given, let value be the result - of serializing the associated - url. - - -

  2. Run the update steps with - value. -

- - -

6.1. Constructors

- -

The -URL(url, base) -constructor, when invoked, must run these steps: - -

    -
  1. Let parsedBase be the result of running the - basic URL parser on base. - -

  2. If parsedBase is failure, - throw a TypeError exception. - -

  3. Set parsedURL to the result of running the - basic URL parser on url - with parsedBase. - -

  4. If parsedURL is failure, - throw a TypeError exception. - -

  5. Let result be a new URL object. - -

  6. Let result’s - get the base return - parsedBase. - -

  7. -

    Run result’s - set the input given the empty string - and parsedURL. - -

    A URL object’s - input is never exposed. - -

  8. Return result. -

- -
-

To Basic URL parse a string into a - URL without using a - base URL, invoke the constructor with a single - argument: - -

var input = "https://example.org/💩",    url = new URL(input)
-url.pathname // "/%F0%9F%92%A9"
- -

Alternatively you can use the base URL of a - document through - baseURI: - -

var input = "/💩",    url = new URL(input, document.baseURI)
-url.href // "https://url.spec.whatwg.org/%F0%9F%92%A9"
-
- - -
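
Because the constructor throws a TypeError whenever either parse returns failure, callers that handle untrusted input usually wrap it. The helper below is a sketch; `tryParse` is a hypothetical name, not part of the API defined here.

```javascript
// Returns a URL object, or null if the constructor throws a TypeError
// because url or base failed to parse.
function tryParse(input, base) {
  try {
    return base === undefined ? new URL(input) : new URL(input, base);
  } catch (error) {
    if (error instanceof TypeError) return null;
    throw error; // anything else is unexpected
  }
}

var relativeOnly = tryParse("/path")                          // no usable base
var resolved = tryParse("/path", "https://example.org/dir/")  // resolved against base
```

Here `relativeOnly` is null, since a path-only input cannot be resolved without a suitable base URL, while `resolved.href` is `"https://example.org/path"`.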

6.2. URL statics

- -

The -domainToASCII(domain) -static method, when invoked, must run these steps: - -

    -
  1. Let asciiDomain be the result of - host parsing domain. - -

  2. If asciiDomain is an IPv6 address - or failure, return the empty string. - -

  3. Return asciiDomain. -

- -

The -domainToUnicode(domain) -static method, when invoked, must run these steps: - -

    -
  1. Let unicodeDomain be the result of - host parsing domain with the - Unicode flag set. - -

  2. If unicodeDomain is an - IPv6 address or failure, return the empty string. - -

  3. Return unicodeDomain. -

- -

Add domainToUI() which follows the UA conventions for when to use the Unicode -representation? - - -

6.3. URLUtils and URLUtilsReadOnly members

- -

The URLUtils and URLUtilsReadOnly interfaces are -not exposed on the global object. They are meant to augment other interfaces, such as -URL. - -

The href attribute’s getter must run -these steps: - -

    -
  1. If url is null, return - input. - -

  2. Return the serialization - of url. -

- -

The href attribute’s setter must run these steps: - -

    -
  1. Let input be the given value. - -

  2. -

    If the context object is a - URL object, run these substeps: - -

      -
    1. Let parsedURL be the result of running the - basic URL parser on input - with base URL being the result of running - get the base. - -

    2. If parsedURL is failure, - throw a TypeError exception. - -

    3. -

      Run set the input given the empty - string and parsedURL. - -

      A URL object’s - input is never exposed. -

    - -
  3. -

    Otherwise, run these substeps: - -

      -
    1. Run the set the input - algorithm for input. - -

    2. Run the pre-update steps with the input. -

    - -
    -

    This means that if the href attribute is set to a - value that would cause the URL parser to return - failure, that value is still passed through unchanged. This is one of those unfortunate - legacy incidents. - -

    var a = document.createElement("a"),    input = "https://test:test/" // invalid port makes the parser return failure
    -a.href = input
    -a.href === input // true
    +
    +
    +
    + Obsoletion Notice +

    This specification is not being actively maintained, + and should not be used as a guide for implementations. + It may be revived in the future, + but for now should be considered obsolete.

    +

    If you have questions or comments on this specification, + please send an email to the editors.

    +
    +
    + +
    -
- -

The origin attribute’s getter must -run these steps: - -

    -
  1. If url is null, return the empty - string. - -

  2. Return the - Unicode serialization - of url’s origin. - [HTML] -

- -

It returns the Unicode rather than the ASCII serialization for -compatibility with HTML’s MessageEvent feature. -[HTML] - -

The protocol attribute’s getter -must run these steps: - -

    -
  1. If url is null, return - ":". - -

  2. Return scheme and - ":" concatenated. -

- -

The protocol attribute’s setter must -run these steps: - -

    -
  1. If url is null, terminate - these steps. - -

  2. Basic URL parse the given value and - ":" concatenated with - url as url and - scheme start state as state override. - -

  3. Run the pre-update steps. -

- -

The username attribute’s getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return username. -

- -

The username attribute’s setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Set the username given url - and the given value. - -

  3. Run the pre-update steps. -

- -

The password attribute’s getter -must run these steps: - -

    -
  1. If url is null or its - password is null, return the empty - string. - -

  2. Return password. -

- -

The password attribute’s setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Set the password given url - and the given value. - -

  3. Run the pre-update steps. -

- -

The host attribute’s getter must run -these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. If port is the empty string, - return host, - serialized. - -

  3. Return host, - serialized, - ":", and port - concatenated. -

- -

The host attribute’s setter must run these -steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Basic URL parse the given value with - url as url and - host state as state override. - -

  3. Run the pre-update steps. -

- -

The hostname attribute’s getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return host, - serialized. -

- -

The hostname attribute’s setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Basic URL parse the given value with - url as url and - hostname state as state override. - -

  3. Run the pre-update steps. -

- -

The port attribute’s getter must run -these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. Return port. -

- -

The port attribute’s setter must run these steps: - -

    -
  1. If url is null, its - relative flag is unset, or its - scheme is "file", - terminate these steps. - -

  2. Otherwise, Basic URL parse - the given value with url as url and - port state as state override. - -

  3. Run the pre-update steps. -

- -

The pathname attribute’s getter -must run these steps: - -

    -
  1. If url is null, return the - empty string. - -

  2. If the relative flag is unset, return - scheme data. - -

  3. Return "/" concatenated with the strings in - path (including empty strings), - separated from each other by "/". -

- -

The pathname attribute’s setter must -run these steps: - -

    -
  1. If url is null, or its - relative flag is unset, terminate these steps. - -

  2. Empty path. - -

  3. Basic URL parse the given value with - url as url and - relative path start state as state override. - -

  4. Run the pre-update steps. -

- -

The search attribute’s getter must -run these steps: - -

    -
  1. If url is null, or its - query is either null or - the empty string, return the empty string. - -

  2. Return "?" concatenated with - query. -

- -

The search attribute’s setter must run these steps: - -

    -
  1. If url is null, terminate these steps. - -

  2. If the given value is the empty string, set - query to null, empty - query object’s - list, run its - update steps, and terminate these - steps. - -

  3. Let input be the given value with a single leading - "?" removed, if any. - -

  4. Set query to the empty string. - -

  5. Basic URL parse input - with url as url, - query state as state override, and the - associated query encoding as - encoding override. - -

  6. Set query object’s - list to the result of - parsing input. - -

  7. Run query object’s - update steps. -

- -

The update steps of -query object are run to ensure all -url objects remain synchronized. - -

The searchParams attribute’s getter -must return the query object. - -

The searchParams attribute’s setter must run -these steps: - -

    -
  1. Let object be the given value. - -

  2. Remove the context object from - query object’s list of - url objects. - -

  3. Append the context object to - object’s list of - url objects. - -

  4. Set query object to - object. - -

  5. Set query to the - serialization of the - query object’s - list. - -

  6. Run the pre-update steps. -

- -

The hash attribute’s getter must run -these steps: - -

    -
  1. If url is null, or its - fragment is either null or - the empty string, return the empty string. - -

  2. Return "#" concatenated with - fragment. -

- -

The hash attribute’s setter must run these steps: - -

    -
  1. If url is null, or its - scheme is - "javascript", terminate these steps. - -

  2. If the given value is the empty string, set - fragment to null, run the - pre-update steps, and terminate these steps. - -

  3. Let input be the given value with a single leading - "#" removed, if any. - -

  4. Set fragment to - the empty string. - -

  5. Basic URL parse input - with url as url and - fragment state as state override. - -

  6. Run the pre-update steps. -

- - -
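
The getters and setters above can be observed on a `URL` object. The snippet below is a sketch run against a current implementation of the URL Living Standard, which is assumed to agree with this document for these particular inputs; `components` is a hypothetical helper.

```javascript
// Collect the string-valued component attributes of a parsed URL.
function components(href) {
  var url = new URL(href)
  return {
    protocol: url.protocol, username: url.username, password: url.password,
    host: url.host, hostname: url.hostname, port: url.port,
    pathname: url.pathname, search: url.search, hash: url.hash
  }
}

var parts = components("https://user:pass@example.org:8080/a/b?x=1#frag")
// parts.host is "example.org:8080", parts.search is "?x=1", parts.hash is "#frag"

var emptyParts = components("https://example.org/?#")
// an empty query or fragment is reported as the empty string

var edited = new URL("https://example.org:8080/a/b?x=1#frag")
edited.hash = ""      // empty string sets the fragment to null
edited.search = "y=2" // a leading "?" is optional
edited.port = "443"   // the default port for "https" is stored as empty
```

After these three assignments `edited.href` is `"https://example.org/a/b?y=2"`: the fragment is gone, the query is replaced, and the default port is not serialized.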

6.4. Interface URLSearchParams

- -
[Constructor(optional (USVString or URLSearchParams) init = ""), Exposed=(Window,Worker)]
-interface URLSearchParams {
-  void append(USVString name, USVString value);
-  void delete(USVString name);
-  USVString? get(USVString name);
-  sequence<USVString> getAll(USVString name);
-  boolean has(USVString name);
-  void set(USVString name, USVString value);
-  iterable<USVString, USVString>;
-  stringifier;
-};
- -

A URLSearchParams object has an associated -list of name-value pairs, which is initially -empty. - -

A URLSearchParams object has an associated list of zero or more -url objects, which is initially empty. - -

URLSearchParams objects always use -utf-8 as -encoding, despite the existence of -concepts such as -query encoding. This is to -encourage developers to migrate towards -utf-8, which they really ought to -have done a long time ago now. - -

To create a -new URLSearchParams object, optionally -using init, run these steps: - -

    -
  1. Let query be a new URLSearchParams object. - -

  2. If init is the empty string or null, return - query. - - -

  3. If init is a string, - set query’s list to the - result of parsing - init. - -

  4. If init is a URLSearchParams object, set - query’s list to a copy - of init’s list. - -

  5. Return query. -

- -

A URLSearchParams object’s -update steps are to run these steps for -each associated url object -urlObject, in order: - -

    -
  1. Set urlObject’s url’s - query to the - serialization of - URLSearchParams object’s - list. - -

  2. Run urlObject’s pre-update steps. -

- -
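
The effect of the update steps is that a `URLSearchParams` object and the URL it belongs to stay synchronized in both directions. A short sketch, assuming a current implementation in which `searchParams` is obtained from a `URL` object:

```javascript
var url = new URL("https://example.org/search?q=one")

// Changing the list runs the update steps, which rewrite the URL's query.
url.searchParams.append("lang", "en")
var afterAppend = url.search // "?q=one&lang=en"

// Changing the query through the search setter replaces the list.
url.search = "?page=2"
var page = url.searchParams.get("page") // "2"
var oldKey = url.searchParams.get("q")  // null, the old pair is gone
```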

The -URLSearchParams(init) -constructor, when invoked, must return a -new URLSearchParams object -using init if given. - -

The -append(name, value) -method, when invoked, must run these steps: - -

    -
  1. Append a new name-value pair whose name is name and - value is value, to list. - -

  2. Run the update steps. -

- -

The -delete(name) -method, when invoked, must run these steps: - -

    -
  1. Remove all name-value pairs whose name is name from - list. - -

  2. Run the update steps. -

- -

The -get(name) -method, when invoked, must return the value of the first name-value pair whose name is -name in list, and null if -there is no such pair. - -

The -getAll(name) -method, when invoked, must return the values of all name-value pairs in -list whose name is name, -in list order, or the empty sequence if there are no such pairs. - -

The -set(name, value) -method, when invoked, must run these steps: - -

    -
  1. If there are any name-value pairs whose name is name, in - list, set the value of the first such - name-value pair to value and remove the others. - -

  2. Otherwise, append a new name-value pair whose name is name and - value is value, to list. - -

  3. Run the update steps. -

- -

The -has(name) -method, when invoked, must return true if there is a name-value pair whose name is -name in list, and false -otherwise. - -

The value pairs to iterate over are the name-value pairs in -list, with the key being -the name and the value being the value. - -

The stringification behavior must return the -serialization of the -URLSearchParams object’s -list. - - -
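
The methods above can be exercised as follows. This is a usage sketch with made-up names and values, not an additional requirement.

```javascript
var params = new URLSearchParams("a=1&b=2&a=3")

var allA = params.getAll("a")      // ["1", "3"], in list order
var missing = params.get("missing") // null when there is no such pair

params.set("a", "9")     // first "a" is updated, the other "a" is removed
params.append("c", "x y")
var hasB = params.has("b") // true
params.delete("b")

var serialized = String(params) // "a=9&c=x+y"
var entries = Array.from(params) // [["a", "9"], ["c", "x y"]]
```

The difference between `set()` and `append()` is visible here: `set()` collapses duplicate names down to a single pair, while `append()` always adds a new one.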

6.5. URL APIs elsewhere

- -

A standard that exposes URLs should expose the -URL as a string (by serializing an internal -URL). A standard should not expose a URL using a -URL object. URL objects are -meant for URL manipulation. In IDL the USVString type should be used. - -

The higher-level notion here is that values are to be exposed as immutable -data structures. - -

If a standard decides to use a variant of the name "URL" for a feature it defines, it -should name such a feature "url" (i.e. lowercase and with an "l" at the end). Names such -as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" -(i.e. uppercase) is preferred, e.g. "newURL" and "oldURL". - -

The EventSource and -HashChangeEvent interfaces in HTML are examples of -proper naming. [HTML] - - - -

Acknowledgments

- -

A lot of people have helped make -URLs more interoperable over the years and -thereby furthered the goals of this standard. Likewise, many people have helped make this -standard what it is today. - -

With that, many thanks to -Adam Barth, -Albert Wiersch, -Alexandre Morgaut, -Arkadiusz Michalski, -Behnam Esfahbod, -Bobby Holley, -Boris Zbarsky, -Brandon Ross, -Dan Appelquist, -Daniel Bratell, -David Håsäther, -David Sheets, -David Singer, -Erik Arvidsson, -Gavin Carothers, -Geoff Richards, -Glenn Maynard, -Henri Sivonen, -Ian Hickson, -James Graham, -James Manger, -James Ross, -Joshua Bell, -Kevin Grandon, -Larry Masinter, -Mark Davis, -Marcos Cáceres, -Martin Dürst, -Mathias Bynens, -Michael Peick, -Michael™ Smith, -Michel Suignard, -Peter Occil, -Rodney Rehm, -Roy Fielding, -Santiago M. Mola, -Simon Pieters, -Simon Sapin, -Tab Atkins, -Tantek Çelik, -Tim Berners-Lee, -Vyacheslav Matva, and -成瀬ゆい (Yui Naruse) -for being awesome! - -

This standard is written by -Anne van Kesteren -(Mozilla, -annevk@annevk.nl) -and -Sam Ruby -(IBM, -rubys@intertwingly.net). - -

The upstream draft at https://url.spec.whatwg.org/ is licensed under CC0. - - - - - -

-

-Conformance

- -

- Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. - The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” - in the normative parts of this document - are to be interpreted as described in RFC 2119. - However, for readability, - these words do not appear in all uppercase letters in this specification. - -

- All of the text of this specification is normative - except sections explicitly marked as non-normative, examples, and notes. [RFC2119] - -

- Examples in this specification are introduced with the words “for example” - or are set apart from the normative text with class="example", like this: - -

- This is an example of an informative example. -
- -

- Informative notes begin with the word “Note” - and are set apart from the normative text with class="note", like this: - -

- Note, this is an informative note. - - - -

References

Normative References

[DOM]
Anne van Kesteren; Aryeh Gregor; Ms2ger. DOM. URL: https://dom.spec.whatwg.org/
[ENCODING]
Anne van Kesteren. Encoding. URL: https://encoding.spec.whatwg.org/
[FILEAPI]
Arun Ranganathan; Jonas Sicking. File API. URL: http://dev.w3.org/2006/webapi/FileAPI/
[HTML]
Ian Hickson. HTML. Living Standard. URL: https://html.spec.whatwg.org/
[IDNA]
Mark Davis; Michel Suignard. Unicode IDNA Compatibility Processing. URL: http://www.unicode.org/reports/tr46/
[WEBIDL]
Cameron McCormack; Jonas Sicking. Web IDL. URL: http://heycam.github.io/webidl/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC4291]
R. Hinden; S. Deering. IP Version 6 Addressing Architecture. February 2006. Draft Standard. URL: https://tools.ietf.org/html/rfc4291

Informative References

[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[RFC3987]
M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[RFC5952]
S. Kawamura; M. Kawashima. A Recommendation for IPv6 Address Text Representation. August 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5952
[RFC6454]
A. Barth. The Web Origin Concept. December 2011. Proposed Standard. URL: https://tools.ietf.org/html/rfc6454

Index

IDL Index

[Constructor(USVString url, optional USVString base = "about:blank"), Exposed=(Window,Worker)]
-interface URL {
-  static USVString domainToASCII(USVString domain);
-  static USVString domainToUnicode(USVString domain);
-};
-URL implements URLUtils;
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtils {
-  stringifier attribute USVString href;
-  readonly attribute USVString origin;
-
-           attribute USVString protocol;
-           attribute USVString username;
-           attribute USVString password;
-           attribute USVString host;
-           attribute USVString hostname;
-           attribute USVString port;
-           attribute USVString pathname;
-           attribute USVString search;
-           attribute URLSearchParams searchParams;
-           attribute USVString hash;
-};
-
-[NoInterfaceObject,
- Exposed=(Window,Worker)]
-interface URLUtilsReadOnly {
-  stringifier readonly attribute USVString href;
-  readonly attribute USVString origin;
-
-  readonly attribute USVString protocol;
-  readonly attribute USVString host;
-  readonly attribute USVString hostname;
-  readonly attribute USVString port;
-  readonly attribute USVString pathname;
-  readonly attribute USVString search;
-  readonly attribute USVString hash;
-};
-[Constructor(optional (USVString or URLSearchParams) init = ""), Exposed=(Window,Worker)]
-interface URLSearchParams {
-  void append(USVString name, USVString value);
-  void delete(USVString name);
-  USVString? get(USVString name);
-  sequence<USVString> getAll(USVString name);
-  boolean has(USVString name);
-  void set(USVString name, USVString value);
-  iterable<USVString, USVString>;
-  stringifier;
-};
-
\ No newline at end of file
+

Abstract

+
+

The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.

+
+

Status of this document

+
+

+

This document is no longer maintained. Please refer to the URL Living Standard for the latest version of this specification.

+
+
+ +
+
+

Conformance

+

Document conventions

+

Conformance requirements are expressed with a combination of
+ descriptive assertions and RFC 2119 terminology. The key words “MUST”,
+ “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”,
+ “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this
+ document are to be interpreted as described in RFC 2119.
+ However, for readability, these words do not appear in all uppercase
+ letters in this specification.

+

All of the text of this specification is normative except sections
+ explicitly marked as non-normative, examples, and notes. [RFC2119]

+

Examples in this specification are introduced with the words “for example”
+ or are set apart from the normative text with class="example",
+ like this:

+
+ +

This is an example of an informative example.

+
+

Informative notes begin with the word “Note” and are set apart from the
+ normative text with class="note", like this:

+

Note, this is an informative note.

+

Conformant Algorithms

+

Requirements phrased in the imperative as part of algorithms (such as
+ "strip any leading space characters" or "return false and abort these
+ steps") are to be interpreted with the meaning of the key word ("must",
+ "should", "may", etc) used in introducing the algorithm.

+

Conformance requirements phrased as algorithms or specific steps can be
+ implemented in any manner, so long as the end result is equivalent. In
+ particular, the algorithms defined in this specification are intended to
+ be easy to understand and are not intended to be performant. Implementers
+ are encouraged to optimize.
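
As a rough illustration of the equivalence rule above, the sketch below implements the quoted step "strip any leading space characters" twice: once following prose-style steps, and once as a compact alternative. Both function names and the sample inputs are assumptions made for this example only; neither function is defined by the specification, and "space" is taken here to mean U+0020 only.

```typescript
// Hypothetical helpers for illustration; not part of any specification.

// Step-by-step version, written to mirror how an algorithm is phrased in prose.
function stripLeadingSpacesSpecStyle(input: string): string {
  let position = 0;
  // 1. While position points at a U+0020 SPACE, advance position by one.
  while (position < input.length && input[position] === " ") {
    position += 1;
  }
  // 2. Return the substring from position to the end of input.
  return input.slice(position);
}

// Alternative version: a single anchored regular-expression replace.
function stripLeadingSpacesCompact(input: string): string {
  return input.replace(/^ +/, "");
}

// Both versions must agree on every input to count as equivalent.
const samples: string[] = ["", "   abc", "abc  ", " a b ", "\tabc", "    "];
const allEquivalent: boolean = samples.every(
  (s) => stripLeadingSpacesSpecStyle(s) === stripLeadingSpacesCompact(s),
);
console.log(allEquivalent);
```

An implementation is free to ship either form, because only the observable result is constrained, not the steps used to reach it.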

+
+ +

Index

+

Terms defined by this specification

+
\ No newline at end of file