Use of Multibase, Multicodec, and Multihash #3

Open
msporny opened this issue Aug 17, 2018 · 11 comments
@msporny

msporny commented Aug 17, 2018

Should we use multibase, multicodec, and multihash for the expression of cryptographic identifiers? We're interested in doing this because it might align the IPFS, Bitcoin, Ethereum, Veres One, and Sovrin communities.

There are three decisions we need to make if we are going to do this:

The first decision is whether or not to use multibase. Multibase enables us to easily upgrade base encodings if we need to (or use different base encodings for the same key material). I think this is a fairly easy decision and we should do this.

The second decision is whether or not to use multicodec. Using multicodec enables us to identify public key material as either a multihash (e.g. for RSA keys) or raw bytes (e.g. for ed25519 public keys). I think we should do this, but with some reservations which I'll get to at the end of this post.

The third decision is whether or not to use multihash. If we want to express hashes for public keys, I think we should do this, but again, with some reservations, which I'll get to at the end of this post.

Expressing a base58btc encoded ed25519 cryptographic identifier would look like this:

0x7a 0xed01 ED25519_PUBLIC_KEY_BYTES

Here's what the key looks like when encoded with multibase (0x7a, the ASCII character z, means base58btc) + multicodec_type (0xed01, the varint encoding of 0xed, means ed25519 public key bytes):

z2DhMLJmV8kNQm6zeWUrXQKtmzoh6YkKHSRxVSibscDQ7nq
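
As a rough illustration of the byte layout above, here is a minimal Python sketch. It assumes the multicodec value is written as an unsigned varint (so 0xed becomes the two bytes 0xed 0x01) and that the multibase prefix is the literal character z placed in front of the base58btc text. The helper names (`varint`, `b58encode`, `b58decode`, `ed25519_identifier`) and the placeholder key are made up for this example and are not taken from any particular library.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # base58btc

def varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes become '1'
    return "1" * pad + out

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")

ED25519_PUB = 0xED  # multicodec code for ed25519 public key bytes

def ed25519_identifier(public_key: bytes) -> str:
    if len(public_key) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    return "z" + b58encode(varint(ED25519_PUB) + public_key)

key = bytes(range(32))  # placeholder bytes, NOT a real key
ident = ed25519_identifier(key)
print(ident)
```

Because the key is a placeholder, the printed value will not match the identifier quoted above. Decoding is the reverse: drop the leading z, base58btc-decode, and strip the varint prefix.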

Expressing a cryptographic identifier that is a base58btc encoded RSA SPKI-based public key fingerprint using SHA2-256/256 would look like this:

0x7a 0x5d 0x12 0x20 HASH_BYTES

Here's what it looks like in practice: multibase (0x7a means base58btc) + RSA SPKI public key fingerprint (0x5d, not yet in the multicodec table) + multihash function (0x12 means SHA2-256) + hash byte length (0x20 means 32 bytes):

z2czTJ1VEECSvESEamgp88mBLpqyJvyKvEE4YNamMoY1JWK29sKv
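
A similar sketch for the fingerprint case, under the same assumptions as the proposal: 0x5d is the hypothetical (not yet registered) code for an RSA SPKI fingerprint, followed by a SHA2-256 multihash (0x12, 0x20, digest). The function names and the stand-in SPKI bytes are invented for illustration; a real caller would pass the DER-encoded SubjectPublicKeyInfo.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # base58btc

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")

RSA_SPKI_FINGERPRINT = 0x5D  # hypothetical code from the proposal, not registered
SHA2_256 = 0x12              # multihash function code for SHA2-256

def rsa_fingerprint_identifier(spki_der: bytes) -> str:
    digest = hashlib.sha256(spki_der).digest()
    multihash = bytes([SHA2_256, len(digest)]) + digest  # 0x12 0x20 HASH_BYTES
    # Both codes are below 0x80, so each is a single-byte varint.
    return "z" + b58encode(bytes([RSA_SPKI_FINGERPRINT]) + multihash)

spki_der = b"stand-in for DER-encoded SubjectPublicKeyInfo"  # placeholder input
fingerprint_id = rsa_fingerprint_identifier(spki_der)
print(fingerprint_id)
```

The output depends entirely on the placeholder input, so it is not comparable to the value shown above.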

Now for the stuff that makes me uncomfortable:

  1. Multicodec has an expression for ed25519 public keys... but not RSA key fingerprints (or any other sort of key for that matter). So we would either have to add something to the multicodec table (0x5d), which is easy, or we would have to key off of the JSON-LD type field (which is what the example above does). So if the type is RsaVerificationKey2018, then that spec defines how you hash a public key... which would most likely use one of the standard mechanisms for hashing those sorts of keys (like the style adopted in most browsers). This would mean we don't need to add anything to multicodec, but the detail would be buried in the RSASignature2018 spec (which isn't that terrible). If we decide to use multicodec instead, we may need to add 3-8 entries to multicodec because there are a non-trivial number of ways of hashing public keys.
  2. We need to pick a name for this property, perhaps "publicKeyMulticodec"?

So, there are really two paths we can go down:

  1. Use multicodec for everything, so every cryptographic identifier is uniform (this requires us to expand the multicodec table): multibase + multicodec + key_type + (raw_bytes|multihash)

Pros:

  • Every cryptographic identifier expressed across LD Proofs/Signatures is uniform.

Cons:

  • Requires us to expand the multicodec table with multiple new values.
  2. Use multicodec for cryptographic identifiers based off of ed25519 keys (multibase + multicodec + key_type + raw_bytes), and hashed RSA public keys (multibase + multicodec + multihash).

Pros:

  • Does not require us to expand the multicodec table.

Cons:

  • Buries complexity in the LD Signatures specs.
@msporny msporny self-assigned this Aug 17, 2018
@msporny msporny changed the title Use of Multibase and Multihash Use of Multibase, Multicodec, and Multihash Aug 17, 2018
@msporny
Author

msporny commented Aug 17, 2018

We probably need input from @jbenet, @lgierth, @dlongley, @mattcollier, @dhh1128, @nage, and @mikelodder7 on this.

@msporny
Author

msporny commented Aug 17, 2018

... and before the debate begins, I do want to make the following clear:

We don't have to agree on any of this. The design of the Linked Data Signatures/Proofs stuff is such that each of us could make a radically different decision on this, mint a new JSON-LD term for it, and go our separate ways and not close the door on interoperability (we'd just increase complexity by using different terms, but still enable interop).

There's just a big opportunity here to standardize on compact public key expression and I wanted to see if we could achieve that with the proposal above.

@mikelodder7

I'm not as familiar with this as I probably should be. Where/When is the best place to discuss this more in depth?

@msporny
Author

msporny commented Aug 20, 2018

> Where/When is the best place to discuss this more in depth?

Probably just in here to start. We can take it to the W3C CCG if we think a broader community discussion is warranted. At present, I just need to figure out if it's possible for IPFS, Sovrin, and Veres One to express ed25519 public keys in the same way (we're already doing base58 encoding of ed25519 public keys, IIUC). So, maybe we should start there. How are IPFS and Sovrin expressing their ed25519 public keys?

@mikelodder7

I'm pretty sure Sovrin is using base58 as well.

@msporny
Author

msporny commented Aug 20, 2018

> I'm pretty sure Sovrin is using base58 as well.

Specifically, the Bitcoin/IPFS version of base58 (not the flickr version) ... and you encode the ed25519 public key raw bytes, right? Are the Sovrin pairwise DIDs just base58btc(ed25519PublicKey)?

If so, the only difference between that and IPFS/Veres One is a multiformats 0xed or 0x30ed prefix for the key... I don't know if IPFS prefixes their ed25519 keys with 0xed or 0x30ed... or if they even use ed25519 (I did some reading over the weekend that made it seem like they just use RSA keys?)

@mikelodder7

Sovrin uses bitcoin base58 with no prefix. Just the raw key bytes.

@peacekeeper
Member

I don't have a strong opinion here. I could imagine leaving the JSON-LD term publicKeyBase58 with a value that's only the (base58btc) key bytes, but I could also imagine introducing a JSON-LD term like publicKeyMulticodec, with the pros and cons you listed.

The JOSE community would probably say just do what JWK does.

Regarding IPFS, I don't think it actually exposes keys in any way. I tried creating an ed25519 IPFS address, and got something like this:

12D3KooWSoeYKbpkb5UoL2T5eiomWRHdxR9cPC4tk11gKU89fFwT

The first two bytes here seem to be (0x30 - multicodec) and (0x31 - multihash), but I'm not sure how to interpret the rest.
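
One way to check the leading bytes is to base58btc-decode the address and look at them directly. In base58btc a leading 1 character always stands for a zero byte, so the first decoded byte is 0x00 (the identity code in the multihash table) rather than 0x30. The sketch below only prints the length and the first few bytes; it does not try to interpret the rest, and the helper names are made up for this example.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # base58btc

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))  # each leading '1' is a zero byte
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")

peer_id = "12D3KooWSoeYKbpkb5UoL2T5eiomWRHdxR9cPC4tk11gKU89fFwT"
raw = b58decode(peer_id)

# Show how many bytes there are and what the leading ones look like.
print(len(raw), raw[:6].hex())
```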

@msporny
Author

msporny commented Aug 21, 2018

@peacekeeper wrote:

> I don't have a strong opinion here. I could imagine leaving the JSON-LD term publicKeyBase58 with a value that's only the (base58btc) key bytes

Yes, I think we'd keep this untouched. Veres One is probably going to keep publicKeyBase58 and I expect Sovrin would do the same.

I've modified the proposal above slightly to really talk about cryptographic identifiers... because I think that's actually the common thread here: How do Veres One, IPFS, and Sovrin create cryptographic identifiers? I'm trying to see if we can align all three communities because I think that we're VERY close.

If we do a good job, we may even be able to convince the Bitcoin and Ethereum communities to support these extensible cryptographic identifiers, but I'm not going to hold my breath on that one. :)

> I could also imagine introducing a JSON-LD term like publicKeyMulticodec, with the pros and cons you listed.

Having slept on it, I don't think we'd have to. We could, but I think my original proposal was misguided. We're not talking about expressing public keys... we're really talking about expressing cryptographic identifiers where cryptographic identifier is defined as an identifier tied to cryptographic material in some way (ed25519 public key bytes or RSA SPKI fingerprint) that can be used to cryptographically authenticate a message generated by the holder of the private key material associated with that cryptographic identifier.

> The JOSE community would probably say just do what JWK does.

That would be encoding something like this (ed25519 pubkey) in base58:

{"kty":"OKP","crv":"Ed25519","x":"11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo"}

Which would give you this as a cryptographic identifier (note the double base64/base58 encoding):

2Lpnvt23H6qHswCNPmwCCUSas7YNPHBxibnrGnoEFLTvtJfT4sSFJgUBwVNPGHEE5JFA9djKGmcQzEyJTDNkZhfzVwjV1dC9qdNPR4zDqsCuBX

Now compare that against the equivalent multiformats-based CID:

z2DhMLJmV8kNQm6zeWUrXQKtmzoh6YkKHSRxVSibscDQ7nq

... waaay more compact... and it being compact matters for CIDs.
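
To make the size difference concrete, here is a small sketch that base58btc-encodes the JWK text from above and compares it with a multiformats-style identifier built from the same key bytes (the x value, base64url-decoded). It assumes the 0xed 0x01 varint prefix and a compact JSON serialization; since the exact serialization and prefix bytes behind the strings quoted above are not shown, the outputs are not expected to match them character for character. Helper names are made up for this example.

```python
import base64
import json

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # base58btc

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

jwk_text = '{"kty":"OKP","crv":"Ed25519","x":"11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo"}'

# JWK-style identifier: base58btc over the JSON text itself.
jwk_id = b58encode(jwk_text.encode("utf-8"))

# Multiformats-style identifier: 'z' + base58btc(varint(0xed) + raw key bytes).
x = json.loads(jwk_text)["x"]
key = base64.urlsafe_b64decode(x + "=")  # 43 base64url chars need one '=' of padding
mf_id = "z" + b58encode(b"\xed\x01" + key)

print(len(jwk_id), len(mf_id))
```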

Keep in mind that you can also express the JWK as this (note that the x property is first instead of last):

{"x":"11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo","kty":"OKP","crv":"Ed25519"}

which would generate a completely different identifier. Same if you accidentally add a space, or anything else. So, you'd also need to employ a normalization algorithm... and you'd still end up with a CID that is twice as long as it needs to be.
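
The ordering problem is easy to reproduce. The sketch below hashes the two serializations and then applies a simple normalization (sorted keys, no whitespace) so that both collapse to the same bytes. That normalization is only a stand-in chosen for illustration; it is not a claim about which canonicalization scheme a real spec would pick, and `canonical` is a made-up helper name.

```python
import hashlib
import json

a = '{"kty":"OKP","crv":"Ed25519","x":"11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo"}'
b = '{"x":"11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo","kty":"OKP","crv":"Ed25519"}'

def canonical(jwk_text: str) -> bytes:
    """Illustrative normalization: parse, sort keys, drop optional whitespace."""
    obj = json.loads(jwk_text)
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

# Same key, different byte strings, so any identifier derived from the raw text differs.
raw_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
raw_b = hashlib.sha256(b.encode("utf-8")).hexdigest()

# After normalization both inputs produce identical bytes.
norm_a = hashlib.sha256(canonical(a)).hexdigest()
norm_b = hashlib.sha256(canonical(b)).hexdigest()

print(raw_a == raw_b, norm_a == norm_b)
```

Even with normalization in place, the identifier is still derived from the full JSON text rather than the 32 key bytes, which is where the extra length comes from.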

> Regarding IPFS, I don't think it actually exposes keys in any way.

IPFS has Peer IDs: ipfs/specs#58 (comment)

... and they want to move to CIDs.

So, it looks like this is a common problem with a potentially common solution that would work in a backwards-compatible way for Veres One, Sovrin, IPFS, Ethereum, and Bitcoin.

@mikelodder7

mikelodder7 commented Aug 21, 2018

Looks like IPFS is using some prefix for their keys (see Issue #4240) and protobufs in their code (Ed25519, line 43, and Key, line 273).

Sovrin is just encoding the public keys as Bitcoin Base58.

@msporny
Author

msporny commented Jan 25, 2019

NOTE: I had to edit the original proposal to clean it up based on a new understanding wrt. the way the multicodec stuff works.
