Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine JSON serialization to use UTF-8 encoding for user.id and userHandle #2013

Closed
MasterKale opened this issue Jan 10, 2024 · 10 comments
Closed

Comments

@MasterKale
Copy link
Contributor

Proposed Change

The JSON serialization logic we added in L3 consistently uses base64url encoding for any value that's an ArrayBuffer in the browser. However, in my experience it makes for easier debugging during authentication when UTF-8 encoding is used specifically for user.id and userHandle as it helps developers immediately see the user ID in a recognizable format.

For example, consider an RP that generates its own string identifiers for users:

const userID = 'USER2ML8P7C08R';

An RP dev may naively specify this value for user.id when calling parseCreationOptionsFromJSON() believing, "it's non-PII so why not?"

var opts = PublicKeyCredential.parseCreationOptionsFromJSON({
  // ...
  user: { id: userID, name: 'TestUser', displayName: 'TestUser' },
});

const credential = await navigator.credentials.get({ publicKey: opts });

If the RP commits to the spec's JSON serialization methods during a subsequent authentication then they'll experience an unintuitive footgun:

const opts = PublicKeyCredential.parseRequestOptionsFromJSON({ ... });

const credential = await navigator.credentials.get({ publicKey: opts });
const credentialJSON = credential.toJSON();

console.log(userID);                                         // USER2ML8P7C08R
console.log(credentialJSON.response.userHandle)              // USER2ML8P7C08Q
console.log(userID === credentialJSON.response.userHandle);  // false

The user ID gets munged! The userHandle becomes unusable, and the dev falls back to using credential ID to determine which user should be logged in (which is a bad idea as @sbweeden re-confirmed recently in #1909 (comment))

In my opinion (gleaned through practical experience) if userHandle is the same then it's easy to pull that value out of the front end and cross-reference it when pulling database records, query logging for the value, use and compare the value programmatically in other areas of the product...currently RP devs (that correctly base64url-encoded the UTF-8 bytes in "USER2ML8P7C08R" as per the current text in L3 and specify user.id as "VVNFUjJNTDhQN0MwOFI" instead) have to base64url-decode userHandle to bytes and then UTF-8 encode those bytes to get back to "USER2ML8P7C08R" before continuing with their troubleshooting.

I therefore assert that it is more useful to RP devs for userHandle out of a call to toJSON() to be the same as the user.id string passed into parseCreationOptionsFromJSON(). We can remove this footgun by updating the JSON serialization logic so that the user.id argument is allowed to be any UTF-8 string when calling parseCreationOptionsFromJSON(), and that UTF-8 encoding is used to serialize userHandle when toJSON() is called.

Potential Impact

While trying to find a browser I could run sample code in to put this all together I noticed that only Firefox currently has fully implemented the JSON serialization methods. Chrome currently only supports parseCreationOptionsFromJSON() and toJSON(), while Safari doesn't currently support any of the methods.

I'm hoping this means there's still a chance to discuss updating this logic, as it would mean breaking any current use of these methods.

Footgun Repro

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <button id="startReg">Step 1: Call .create()</button>
    <button id="startAuth">Step 2: Call .get()</button>
    <script>
      /**
       * `user.id` and `userHandle` will not match with this human-readable value
       */
      const userID = "USER2ML8P7C08R";
      /**
       * This `userID` is the base64url-encoded UTF-8 bytes in the string above. `user.id` and
       * `userHandle` will match when this value is used instead.
       */
      // const userID = "VVNFUjJNTDhQN0MwOFI";

      // Values we'll use during auth
      let credIdBase64URL = undefined;
      let credTransports = [];

      document.getElementById("startReg").addEventListener("click", async () => {
        const opts = PublicKeyCredential.parseCreationOptionsFromJSON({
          user: { id: userID, name: "TestUser", displayName: "TestUser"},
          rp: { name: "localhost" },
          challenge: "AAAA",
          pubKeyCredParams: [ { type: "public-key", alg: -7 }, { type: "public-key", alg: -257 }],
        });

        const credential = await navigator.credentials.create({ publicKey: opts });
        const credentialJSON = credential.toJSON();

        credIdBase64URL = credentialJSON.id;
        credTransports = credentialJSON.response.transports;
      });

      document.getElementById("startAuth").addEventListener("click", async () => {
        const opts = PublicKeyCredential.parseRequestOptionsFromJSON({
          challenge: "AAAA",
          allowCredentials: [{ id: credIdBase64URL, transports: credTransports, type: "public-key" }],
        });

        const credential = await navigator.credentials.get({ publicKey: opts });
        const credentialJSON = credential.toJSON();

        console.log("user.id:", userID);
        console.log("userHandle:", credentialJSON.response.userHandle);
        console.log(userID === credentialJSON.response.userHandle);
      });
    </script>
  </body>
</html>
@MasterKale
Copy link
Contributor Author

A side effect of this change might be that RP's feel empowered to use PII-containing identifiers here since they're already strings. However I think we could easily enough suggest base64url-encoding 64 random bytes and then using that string as user.id when calling parseCreationOptionsFromJSON() and the user privacy considerations in https://w3c.github.io/webauthn/#sctn-user-handle-privacy are maintained.

@emlun
Copy link
Member

emlun commented Jan 10, 2024

I don't think a minor convenience improvement is worth breaking the pattern and making user handles behave differently from all other binary values. In my own software development experience, exceptions (AKA broken invariants) are the root of (almost) all evil. I don't think this exception seems valuable enough to defy that principle.

Also in my own experience developing apps around WebAuthn, I find it convenient enough to just use the base64 encoding of the user handle as the canonical representation of it - not least because strings in JavaScript can be compared using the === operator while Uint8Arrays cannot. I don't think it needs special treatment just in case its binary form also happens to be human-readable.

@arianvp
Copy link

arianvp commented Jan 10, 2024

The spec is pretty clear that userId should not be a string identifier but a random array buffer. Those will almost never be valid UTF-8 strings. Encoding as UTF-8 seems incompatible with what the spec wants? Or do we also want to change that?

the Relying Party MUST NOT include personally identifying information, e.g., e-mail addresses or usernames, in the user handle. This includes hash values of personally identifying information, unless the hash function is salted with salt values private to the Relying Party, since hashing does not prevent probing for guessable input values. It is RECOMMENDED to let the user handle be 64 random bytes, and store this value in the user account.

@kreichgauer
Copy link
Contributor

kreichgauer commented Jan 10, 2024

It took me a minute to understand what's going on here. Basically the chosen value for userID (USER2ML8P7C08R) is 14 base64 characters (sextets) long. Each group of 4 base64 sextets decodes to 3 octets/bytes, but 14 is not a multiple of 4. And so the final two sextets ('8R') decode to one octet, with remaining 4 bits remaining as padding. The base64 spec requires that any encoder must set these padding bits to zero (yielding 'Q' instead of 'R' for the final sextet); but decoders can choose to ignore or reject the non-zero padding bits. (See RFC 4648, 3.5)

So if you pass the userID string through Go's encoding/base64 in strict mode e.g., it will reject the string. Presumably, browser base64 decoders are more forgiving and accept the non-zero padding bits, but then when the re-encode the user handle bytes, they follow the spec and correctly produce 'Q' as the last sextet.

I agree with the other comments here. Consistency seems more important, and we shouldn't handle encoding of these individual fields differently.

(I will look into whether we can make Chromium's base64 decoder stricter for the JSON parsing methods.)

@arianvp
Copy link

arianvp commented Jan 10, 2024

You need to use base64.RawURLEncoding . That's the one that discards padding

@kreichgauer
Copy link
Contributor

kreichgauer commented Jan 10, 2024

The issue is with padding bits within a final sextet. RawURLEncoding I think controls the '=' padding character to pad the string into multiples of 4.

@kreichgauer
Copy link
Contributor

(Sorry, I mixed up some things in my original response above. Updated for correctness.)

@MasterKale
Copy link
Contributor Author

MasterKale commented Jan 10, 2024

Alright, this isn't necessarily a hill I want to die on, but I figured I'd give it a shot. I know WebAuthn has recommended user.id being 64 random bytes for a long time now, I just thought there might be a developer experience win in here for RP's that have effectively random identifiers already in play.

Thanks for humoring me ✌️

@emlun
Copy link
Member

emlun commented Jan 11, 2024

Yeah, I do agree that this could be useful. Many applications I've worked with use UUIDv4s as primary account identifiers and just use the UTF-8 encoding of those as the user handle. It is sometimes a bit of a hassle to have to juggle the various representations of that, but ultimately I think that introducing a special case for user handles in JSON I/O would replace a small problem with a much bigger one.

@lgarron
Copy link
Contributor

lgarron commented Jan 16, 2024

I agree with @emlun. The spec treats the user handle as a binary string (rather than an encoded text string) in various ways, and I think it would be a liability to change that.

Further, supporting UTF-8 would imply that it is reasonable to encode something more structured or even user-provided instead of random. This could be sufficiently secure, but the use of an opaque binary string encourages a secure implementation by default.

@MasterKale MasterKale closed this as not planned Won't fix, can't repro, duplicate, stale Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants