Refine JSON serialization to use UTF-8 encoding for `user.id` and `userHandle` #2013

MasterKale · 2024-01-10T05:56:56Z

Proposed Change

The JSON serialization logic we added in L3 consistently uses base64url encoding for any value that's an ArrayBuffer in the browser. However, in my experience it makes for easier debugging during authentication when UTF-8 encoding is used specifically for user.id and userHandle as it helps developers immediately see the user ID in a recognizable format.

For example, consider an RP that generates its own string identifiers for users:

const userID = 'USER2ML8P7C08R';

An RP dev may naively specify this value for user.id when calling parseCreationOptionsFromJSON() believing, "it's non-PII so why not?"

var opts = PublicKeyCredential.parseCreationOptionsFromJSON({
  // ...
  user: { id: userID, name: 'TestUser', displayName: 'TestUser' },
});

const credential = await navigator.credentials.get({ publicKey: opts });

If the RP commits to the spec's JSON serialization methods during a subsequent authentication then they'll experience an unintuitive footgun:

const opts = PublicKeyCredential.parseRequestOptionsFromJSON({ ... });

const credential = await navigator.credentials.get({ publicKey: opts });
const credentialJSON = credential.toJSON();

console.log(userID);                                         // USER2ML8P7C08R
console.log(credentialJSON.response.userHandle)              // USER2ML8P7C08Q
console.log(userID === credentialJSON.response.userHandle);  // false

The user ID gets munged! The userHandle becomes unusable, and the dev falls back to using credential ID to determine which user should be logged in (which is a bad idea as @sbweeden re-confirmed recently in #1909 (comment))

In my opinion (gleaned through practical experience) if userHandle is the same then it's easy to pull that value out of the front end and cross-reference it when pulling database records, query logging for the value, use and compare the value programmatically in other areas of the product...currently RP devs (that correctly base64url-encoded the UTF-8 bytes in "USER2ML8P7C08R" as per the current text in L3 and specify user.id as "VVNFUjJNTDhQN0MwOFI" instead) have to base64url-decode userHandle to bytes and then UTF-8 encode those bytes to get back to "USER2ML8P7C08R" before continuing with their troubleshooting.

I therefore assert that it is more useful to RP devs for userHandle out of a call to toJSON() to be the same as the user.id string passed into parseCreationOptionsFromJSON(). We can remove this footgun by updating the JSON serialization logic so that the user.id argument is allowed to be any UTF-8 string when calling parseCreationOptionsFromJSON(), and that UTF-8 encoding is used to serialize userHandle when toJSON() is called.

Potential Impact

While trying to find a browser I could run sample code in to put this all together I noticed that only Firefox currently has fully implemented the JSON serialization methods. Chrome currently only supports parseCreationOptionsFromJSON() and toJSON(), while Safari doesn't currently support any of the methods.

I'm hoping this means there's still a chance to discuss updating this logic, as it would mean breaking any current use of these methods.

Footgun Repro

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <button id="startReg">Step 1: Call .create()</button>
    <button id="startAuth">Step 2: Call .get()</button>
    <script>
      /**
       * `user.id` and `userHandle` will not match with this human-readable value
       */
      const userID = "USER2ML8P7C08R";
      /**
       * This `userID` is the base64url-encoded UTF-8 bytes in the string above. `user.id` and
       * `userHandle` will match when this value is used instead.
       */
      // const userID = "VVNFUjJNTDhQN0MwOFI";

      // Values we'll use during auth
      let credIdBase64URL = undefined;
      let credTransports = [];

      document.getElementById("startReg").addEventListener("click", async () => {
        const opts = PublicKeyCredential.parseCreationOptionsFromJSON({
          user: { id: userID, name: "TestUser", displayName: "TestUser"},
          rp: { name: "localhost" },
          challenge: "AAAA",
          pubKeyCredParams: [ { type: "public-key", alg: -7 }, { type: "public-key", alg: -257 }],
        });

        const credential = await navigator.credentials.create({ publicKey: opts });
        const credentialJSON = credential.toJSON();

        credIdBase64URL = credentialJSON.id;
        credTransports = credentialJSON.response.transports;
      });

      document.getElementById("startAuth").addEventListener("click", async () => {
        const opts = PublicKeyCredential.parseRequestOptionsFromJSON({
          challenge: "AAAA",
          allowCredentials: [{ id: credIdBase64URL, transports: credTransports, type: "public-key" }],
        });

        const credential = await navigator.credentials.get({ publicKey: opts });
        const credentialJSON = credential.toJSON();

        console.log("user.id:", userID);
        console.log("userHandle:", credentialJSON.response.userHandle);
        console.log(userID === credentialJSON.response.userHandle);
      });
    </script>
  </body>
</html>

The text was updated successfully, but these errors were encountered:

MasterKale · 2024-01-10T06:00:28Z

A side effect of this change might be that RP's feel empowered to use PII-containing identifiers here since they're already strings. However I think we could easily enough suggest base64url-encoding 64 random bytes and then using that string as user.id when calling parseCreationOptionsFromJSON() and the user privacy considerations in https://w3c.github.io/webauthn/#sctn-user-handle-privacy are maintained.

emlun · 2024-01-10T11:00:35Z

I don't think a minor convenience improvement is worth breaking the pattern and making user handles behave differently from all other binary values. In my own software development experience, exceptions (AKA broken invariants) are the root of (almost) all evil. I don't think this exception seems valuable enough to defy that principle.

Also in my own experience developing apps around WebAuthn, I find it convenient enough to just use the base64 encoding of the user handle as the canonical representation of it - not least because strings in JavaScript can be compared using the === operator while Uint8Arrays cannot. I don't think it needs special treatment just in case its binary form also happens to be human-readable.

arianvp · 2024-01-10T12:30:12Z

The spec is pretty clear that userId should not be a string identifier but a random array buffer. Those will almost never be valid UTF-8 strings. Encoding as UTF-8 seems incompatible with what the spec wants? Or do we also want to change that?

the Relying Party MUST NOT include personally identifying information, e.g., e-mail addresses or usernames, in the user handle. This includes hash values of personally identifying information, unless the hash function is salted with salt values private to the Relying Party, since hashing does not prevent probing for guessable input values. It is RECOMMENDED to let the user handle be 64 random bytes, and store this value in the user account.

kreichgauer · 2024-01-10T18:26:39Z

It took me a minute to understand what's going on here. Basically the chosen value for userID (USER2ML8P7C08R) is 14 base64 characters (sextets) long. Each group of 4 base64 sextets decodes to 3 octets/bytes, but 14 is not a multiple of 4. And so the final two sextets ('8R') decode to one octet, with remaining 4 bits remaining as padding. The base64 spec requires that any encoder must set these padding bits to zero (yielding 'Q' instead of 'R' for the final sextet); but decoders can choose to ignore or reject the non-zero padding bits. (See RFC 4648, 3.5)

So if you pass the userID string through Go's encoding/base64 in strict mode e.g., it will reject the string. Presumably, browser base64 decoders are more forgiving and accept the non-zero padding bits, but then when the re-encode the user handle bytes, they follow the spec and correctly produce 'Q' as the last sextet.

I agree with the other comments here. Consistency seems more important, and we shouldn't handle encoding of these individual fields differently.

(I will look into whether we can make Chromium's base64 decoder stricter for the JSON parsing methods.)

arianvp · 2024-01-10T18:36:24Z

You need to use base64.RawURLEncoding . That's the one that discards padding

kreichgauer · 2024-01-10T18:40:50Z

The issue is with padding bits within a final sextet. RawURLEncoding I think controls the '=' padding character to pad the string into multiples of 4.

kreichgauer · 2024-01-10T19:34:15Z

(Sorry, I mixed up some things in my original response above. Updated for correctness.)

MasterKale · 2024-01-10T21:13:28Z

Alright, this isn't necessarily a hill I want to die on, but I figured I'd give it a shot. I know WebAuthn has recommended user.id being 64 random bytes for a long time now, I just thought there might be a developer experience win in here for RP's that have effectively random identifiers already in play.

Thanks for humoring me ✌️

emlun · 2024-01-11T10:11:31Z

Yeah, I do agree that this could be useful. Many applications I've worked with use UUIDv4s as primary account identifiers and just use the UTF-8 encoding of those as the user handle. It is sometimes a bit of a hassle to have to juggle the various representations of that, but ultimately I think that introducing a special case for user handles in JSON I/O would replace a small problem with a much bigger one.

lgarron · 2024-01-16T00:04:28Z

I agree with @emlun. The spec treats the user handle as a binary string (rather than an encoded text string) in various ways, and I think it would be a liability to change that.

Further, supporting UTF-8 would imply that it is reasonable to encode something more structured or even user-provided instead of random. This could be sufficiently secure, but the use of an opaque binary string encourages a secure implementation by default.

MasterKale added the type:technical label Jan 10, 2024

MasterKale closed this as not planned Won't fix, can't repro, duplicate, stale Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refine JSON serialization to use UTF-8 encoding for `user.id` and `userHandle` #2013

Refine JSON serialization to use UTF-8 encoding for `user.id` and `userHandle` #2013

MasterKale commented Jan 10, 2024

MasterKale commented Jan 10, 2024

emlun commented Jan 10, 2024

arianvp commented Jan 10, 2024

kreichgauer commented Jan 10, 2024 •

edited

arianvp commented Jan 10, 2024

kreichgauer commented Jan 10, 2024 •

edited

kreichgauer commented Jan 10, 2024

MasterKale commented Jan 10, 2024 •

edited

emlun commented Jan 11, 2024

lgarron commented Jan 16, 2024 •

edited

Refine JSON serialization to use UTF-8 encoding for user.id and userHandle #2013

Refine JSON serialization to use UTF-8 encoding for user.id and userHandle #2013

Comments

MasterKale commented Jan 10, 2024

Proposed Change

Potential Impact

Footgun Repro

MasterKale commented Jan 10, 2024

emlun commented Jan 10, 2024

arianvp commented Jan 10, 2024

kreichgauer commented Jan 10, 2024 • edited

arianvp commented Jan 10, 2024

kreichgauer commented Jan 10, 2024 • edited

kreichgauer commented Jan 10, 2024

MasterKale commented Jan 10, 2024 • edited

emlun commented Jan 11, 2024

lgarron commented Jan 16, 2024 • edited

Refine JSON serialization to use UTF-8 encoding for `user.id` and `userHandle` #2013

Refine JSON serialization to use UTF-8 encoding for `user.id` and `userHandle` #2013

kreichgauer commented Jan 10, 2024 •

edited

kreichgauer commented Jan 10, 2024 •

edited

MasterKale commented Jan 10, 2024 •

edited

lgarron commented Jan 16, 2024 •

edited