
Documenting or creating a TAP to describe keyid_hash_algorithms #848

Closed
erickt opened this issue Apr 4, 2019 · 30 comments

@erickt

erickt commented Apr 4, 2019

Hello again,

Would you consider documenting, or possibly writing a TAP, to describe the keyid_hash_algorithms field in a key definition? As I upgrade go-tuf (and eventually rust-tuf) to the TUF 1.0 Draft, I am also updating it to be compatible with the metadata produced and consumed by this project. In order to do so, I also need to produce a compatible keyid_hash_algorithms field in the metadata I produce. However, I'm not completely sure what I am supposed to do with this field when I receive it from python-tuf.

It appears this field originally came from secure-systems-lab/securesystemslib#37. As best as I can tell, it is present so that a client knows which algorithm was used to compute a keyid, so it can verify that a keyid is correct. Is this right, and are there other ways the keyid_hash_algorithms is used?

@awwad
Contributor

awwad commented Apr 4, 2019

((Background note for others: KeyIDs matter in TUF. It is technically possible, I think, to construct a slower client for the reference implementation that could verify metadata without knowing how keyids were calculated, without substantial security issues. You could do this by -- to take signature checking as an example -- trying to verify a signature using all authorized keys rather than just the one whose keyid the signature lists.... I'm not recommending that.))

There's been some debate (#442 and #434, for example) about keyid_hash_algorithms and keyid calculation in general. Personally, I don't think the keyid_hash_algorithms field is of use, and I would like to remove it from keys, as I think it's probably sensible to assume across an implementation and deployment that all keyid calculations employ a specific hash function.

As for your question, there are two answers:

  1. The hypothetical feature that keyid_hash_algorithms is supposed to provide is the ability to limit the client's calculation of possible keyids for a key to only the algorithms listed there, to reduce collision space for keys or exclude, say, outdated hash algorithms from keyid calculations for that key. You could have keys that expect to only be hashed to produce keyids using SHA512 or something, I guess. (But really, if a hash algorithm is unacceptable, it should be unacceptable for all keys in the implementation, IMHO.)

  2. The current code in the reference implementation includes keyid_hash_algorithms in the keyid calculation, which means that:

    • changing the values in that field for a key changes the key's keyid
    • removing the field changes all keys' keyids, which is a bit of a backward compatibility headache

Sorry for the headache. ^_^ Does that clear things up enough to ensure compatibility?
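To make point 2 concrete, here is a minimal sketch of the keyid computation. It approximates canonical JSON with a sorted-key, compact `json.dumps` (the reference implementation uses securesystemslib's canonical JSON, which differs in detail), and the public key value below is a placeholder:

```python
import hashlib
import json

def keyid_for(key_dict):
    # Approximate canonical JSON: sorted keys, compact separators.
    # (securesystemslib's canonical JSON differs in detail.)
    cjson = json.dumps(key_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(cjson.encode("utf-8")).hexdigest()

key = {
    "keytype": "ed25519",
    "scheme": "ed25519",
    "keyid_hash_algorithms": ["sha256", "sha512"],
    "keyval": {"public": "aa" * 32},  # placeholder public key value
}

id_with_field = keyid_for(key)

# Removing keyid_hash_algorithms (or changing its values) changes the hash
# input, and therefore the keyid -- the backward-compatibility headache.
stripped = {k: v for k, v in key.items() if k != "keyid_hash_algorithms"}
id_without_field = keyid_for(stripped)

assert id_with_field != id_without_field
```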

@awwad
Contributor

awwad commented Apr 4, 2019

Oh, sorry, I didn't answer the TAP question:

If we thought that keyid_hash_algorithms was an important element of the system, it'd be a good thing to add to the spec (using a TAP or not), but I don't think we do.... We just need to discuss it a bit more at some point and (hopefully) remove it from the reference implementation in a breaking release. So I'd say this is a compatibility-with-the-reference-implementation issue rather than a specification issue, if that makes sense.

Please let me know if I missed anything else. :)

@erickt
Author

erickt commented Sep 24, 2019

@awwad hello! Since it looks like 0.12 is coming up soon, has there been a discussion about removing keyid_hash_algorithms? I'm actually working on a conformance suite to make sure my go-tuf and rust-tuf are compatible with python-tuf, but it's a little unfortunate that python-tuf's hard requirement that keys contain keyid_hash_algorithms forces me to add support for it, even though you think it doesn't add much value.

Is it too late to get the removal of keyid_hash_algorithms into 0.12? Or, to allow for a smoother transition path, could python-tuf at least parse metadata that doesn't contain keyid_hash_algorithms?

@trishankatdatadog
Member

Cc @lukpueh

@trishankatdatadog
Member

trishankatdatadog commented Sep 24, 2019

I'm wary of removing keyid_hash_algorithms in 0.12 for two reasons:

  1. Security. There might be some unexpected issues we haven't thought about.
  2. Backwards-compatibility. I'm not a big fan of removing harmless things at the expense of unnecessarily breaking old clients.

@JustinCappos thoughts?

@JustinCappos
Member

JustinCappos commented Sep 25, 2019 via email

@erickt
Author

erickt commented Sep 25, 2019

If there are security advantages to it, I'm fine implementing it in go-tuf/rust-tuf; I'm just not really sure how to do it. As best as I can tell, even though python-tuf has keyid_hash_algorithms=["sha256","sha512"], I don't see any metadata being generated with sha512 keyids, or anything using them. Is that a bug, or is there something I'm missing?

My first thought is that we're supposed to list keys multiple times, once for each item in keyid_hash_algorithms. The problem I see is what happens with a TUF implementation that doesn't verify keyids and just uses the keyid listed in the signatures block to find the key in the root metadata. If an attacker steals one private key, they could sign metadata twice with the different keyids, thus defeating the threshold count. I tripped over this in go-tuf while migrating from the TUF 0.9 to the 1.0 key structure (fortunately I caught it before committing it to the repo).
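Sketching that attack (with hypothetical keyids and a placeholder key value, not python-tuf's actual data structures):

```python
# Hypothetical: two keyids, computed with different hash algorithms,
# that both refer to the same underlying public key value.
authorized = {
    "keyid-sha256": {"keyval": {"public": "PUBKEY"}},
    "keyid-sha512": {"keyval": {"public": "PUBKEY"}},
}

# An attacker holding the one private key signs twice, once per keyid.
signatures = ["keyid-sha256", "keyid-sha512"]

# Naive count: 2 "distinct" signatures, so a threshold of 2 is wrongly met.
naive = len({kid for kid in signatures if kid in authorized})

# Safe count: deduplicate on the underlying public key value.
unique_keys = {authorized[kid]["keyval"]["public"]
               for kid in signatures if kid in authorized}
safe = len(unique_keys)

assert naive == 2 and safe == 1
```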

Regarding backwards compatibility, at the very least tuf 0.12 could do what I do and treat the field as optional, so python-tuf can parse metadata produced by another library that doesn't emit keyid_hash_algorithms. I'm doing this in go-tuf / rust-tuf.

@SantiagoTorres
Member

As best as I can tell, even though python-tuf has keyid_hash_algorithms=["sha256","sha512"], I don't see any metadata being generated with sha512 key ids, or anything using it. Is that a bug, or is there something I'm missing?

I believe this happens because the runtime settings aren't the same as the ones being checked in, although I could be wrong.

@trishankatdatadog
Member

@erickt Yes, I see what you mean now. I think right now there isn't a good way in TUF to group together otherwise identical keys with different keyids. Until we solve this problem (perhaps in a backwards-incompatible TAP), I agree we should ignore keyid_hash_algorithms (it's not like we use SHA-512 keyids anywhere AFAICT?), simply assume SHA-256, and document it clearly in the spec.

@JustinCappos
Member

JustinCappos commented Sep 25, 2019 via email

@SantiagoTorres
Member

spec-wise, I'm a little bit worried about backwards compatibility and what it entails on the semver angle...

@trishankatdatadog
Member

@SantiagoTorres True, it would break the reference implementation if we remove it from the code. What I'm saying is: keyid_hash_algorithms should not make it into the spec. Rather, we assume that the keyid is the SHA-256 of the canonical JSON of the key object. Other implementations SHOULD ignore keyid_hash_algorithms when they see it in metadata produced by the reference implementation.

This also means that while other implementations can consume reference implementation metadata w/o breaking, the converse is not true (because it silently needs keyid_hash_algorithms). We can update the reference implementation to assume some sane default if it is not present.

As for backwards-incompatibility produced by the TAP, we can worry about that later.

Is this all clear?

@erickt
Author

erickt commented Sep 26, 2019

The complication is that keyid_hash_algorithms is included in the keyid computed by python-tuf, so we couldn't just remove it from python-tuf without forcing the ids to be recalculated. I suppose python-tuf could be written in such a way that, for some period of time, it automatically computes two keyids for a key, one with and one without the keyid_hash_algorithms field, and then makes sure during verification to only count a key once.

This is essentially the approach I'm taking in go-tuf, where I automatically trust both keyids, even if both aren't listed in the metadata, and only count the keys once during verification (looking at the code, though, I really should be ignoring unknown keyid algorithms). Anyway, with this approach, once all the clients are updated to the latest version, you could switch to generating the standard TUF 1.0 keyids. Perhaps 0.12 could implement parsing both IDs, and 0.13 could switch generation to remove keyid_hash_algorithms?

Also, perhaps the spec state machine should be updated to explicitly include rules on how to deduplicate keys. Would it be sufficient to dedup on the keyval field, or should we dedup on the (keytype, scheme, keyval) tuple?
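For illustration, deduplicating on the (keytype, scheme, keyval) tuple could look like this sketch (field names follow TUF key metadata; the dedup rule itself is only the proposal above, not anything specified):

```python
def key_identity(key):
    # Candidate dedup rule: identify a key by (keytype, scheme, keyval),
    # so the same raw key material used under two different schemes still
    # counts as two distinct keys, while keyid_hash_algorithms is ignored.
    return (key["keytype"], key["scheme"], key["keyval"]["public"])

a = {"keytype": "ed25519", "scheme": "ed25519",
     "keyval": {"public": "PUB"},
     "keyid_hash_algorithms": ["sha256"]}
b = {"keytype": "ed25519", "scheme": "ed25519",
     "keyval": {"public": "PUB"},
     "keyid_hash_algorithms": ["sha256", "sha512"]}

# a and b differ only in keyid_hash_algorithms, so they dedup to one key.
assert key_identity(a) == key_identity(b)
```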

@lukpueh
Member

lukpueh commented Sep 26, 2019

Regarding what others have said about python-tuf only generating sha256 keyids: it indeed does not seem straightforward to generate keyids using any other algorithm, at least not with python-tuf/securesystemslib methods. The corresponding function is a barely documented "private" helper with a hardcoded "sha256" default argument (see securesystemslib.keys._get_keyid()), and none of the functions that use it allow changing that argument. The only function that does call _get_keyid with different hash algorithms is securesystemslib.keys.format_metadata_to_key(). It does so by consulting a global setting for hash algorithms and returning the list of keyids as an undocumented second return value, which makes me think this feature was added a bit after the fact. That function is used by python-tuf to add the same key under different keyids (different hash algorithms) to the keydb (tuf's core trust management facility).

I don't think that the flexibility gained by supporting different hash algorithms for keyids justifies the complexity of the (current) implementation and the associated maintenance cost.

@lukpueh
Member

lukpueh commented Sep 26, 2019

Btw., since backwards compatibility has been an argument here: the next tuf release will already be backwards incompatible.

See first line of tentative changelog:

Add backwards incompatible TUF spec version checks (#842, #844, #854, #914)

@trishankatdatadog
Member

@lukpueh We are going off-topic, but would the next release break existing clients that have not updated TUF to the next release?

@lukpueh
Member

lukpueh commented Sep 26, 2019

@trishankatdatadog, not necessarily, but let's discuss this somewhere else.

@erickt
Author

erickt commented Oct 9, 2019

Just checking in, have you had any luck figuring out what to do with keyid_hash_algorithms?

@trishankatdatadog
Member

@erickt We haven't come to a consensus, and we need to fix this.

We should at least remove it from the spec, if not the reference code right now.

@erickt
Author

erickt commented Nov 7, 2019

In theupdateframework/specification#58, @trishankatdatadog and @lukpueh clarified that the core roles always find their keys in the root role, and that delegated target roles find their keys in the delegating target role (and not by looking in any delegation chains). If this is correct, then we might be able to simplify handling of the keyid_hash_algorithms issue.

Could the spec be loosened to leave it up to the TUF implementation on how exactly keyids are defined? They just need to guarantee that:

  • keyids are unique in a role
  • keyids must be stable (e.g., adding or removing a key doesn't cause other keyids to change)
  • there is only ever one keyid for a key

It is not required that keyids be globally unique. The spec could suggest, but not require, that implementations use hexdigest(sha256(cjson(public_keys))), but TUF clients should treat keyids as opaque identifiers used to find the key in the appropriate metadata. It would then be perfectly reasonable for TUF implementations to support human-readable names as keyids, or even the raw public key itself.

In the case of python-tuf, if you wanted to get rid of keyid_hash_algorithms, all existing keys with keyid_hash_algorithms would keep the keyid that's in the metadata, but any new key that's generated wouldn't populate that field, and would therefore conform to the standard keyid hash algorithm. Clients wouldn't try to re-derive the keyid, and so wouldn't be aware that the algorithm was changing out from under them.
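A hedged sketch of that transition: a hypothetical helper (not python-tuf code, with canonical JSON approximated by `json.dumps`) that trusts both keyid variants during the migration window:

```python
import hashlib
import json

def _cjson(obj):
    # Stand-in for canonical JSON (securesystemslib's differs in detail).
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def candidate_keyids(key):
    # Trust both the legacy keyid (keyid_hash_algorithms included in the
    # hashed key) and the post-migration keyid (field stripped). During
    # verification the two must map to the same key and be counted once.
    legacy = hashlib.sha256(_cjson(key)).hexdigest()
    stripped = {k: v for k, v in key.items() if k != "keyid_hash_algorithms"}
    new = hashlib.sha256(_cjson(stripped)).hexdigest()
    return {legacy, new}

key = {"keytype": "ed25519", "scheme": "ed25519",
       "keyid_hash_algorithms": ["sha256", "sha512"],
       "keyval": {"public": "aa" * 32}}

assert len(candidate_keyids(key)) == 2
```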

@trishankatdatadog
Member

@erickt 💯

I've been trying to say the same thing: keyids need not be globally unique, only within a metadata file

@lukpueh
Member

lukpueh commented Nov 8, 2019

Seconding 💯.

However, I'd be very careful about saying that they only need to be unique within a metadata file, because the scope of a keyid always stretches over at least two metadata files: that of the delegating role, where the key is defined, and that of the delegated role, where that key is used to create/verify signatures.

And this gets even trickier if multiple roles delegate to a role. Which, IIUC, is a rather hypothetical but not strictly forbidden scenario in TUF (see @awwad's excellent elaborations on that matter in #660).

So, in a quite roundabout way, I'd phrase @erickt's first condition as

keyids are unique across the definition context ("keys") of all "n" delegating roles and the usage/verification context ("signatures") of all "m" delegated roles, for any "n-to-m" delegation.

I am aware that this wording is not suitable for the spec. But maybe someone can transform it into something more readable? @jhdalek55?

Maybe we can also just get away by saying something simpler, e.g. along the lines of ...

keyids must make it possible to unambiguously attribute signatures in the metadata of a delegated role to the keys defined in the metadata of a delegating role ... need not be globally unique ... recommend to use hexdigest(sha256(cjson(public_keys)))

@lukpueh
Member

lukpueh commented Nov 8, 2019

At any rate, the information about keyids in the spec is pretty sparse so far. There are only these three places.

(1) 4.2. File formats: general principles L497: ...

KEYID is the identifier of the key signing the ROLE dictionary.

(2) ... and L572-L573:

The KEYID of a key is the hexdigest of the SHA-256 hash of the
canonical JSON form of the key.

(3) 4.3. File formats: root.json L630-L633:

The KEYID must be correct for the specified KEY. Clients MUST calculate
each KEYID to verify this is correct for the associated key. Clients MUST
ensure that for any KEYID represented in this key list and in other files,
only one unique key has that KEYID.

I suggest that we

  • leave (1) as it is,
  • replace (2) with uniqueness requirements and recommendations we come up with here, and
  • remove (3).

@jhdalek55
Contributor

I can take a look at the section you referenced above, but it's a bit hard to do out of context. Can you point to the section where this falls?

@lukpueh
Member

lukpueh commented Nov 8, 2019

Thanks, @jhdalek55. You're probably right that it is a lot to ask without additional context. Let's see if @trishankatdatadog or @erickt have feedback on my comment.

@awwad's input here would also be helpful, given that he has spent a lot of thought on "promiscuous delegations". And he has also shown (on a different channel) that he has an opinion about the scope/uniqueness of keyids.

lukpueh referenced this issue Jan 10, 2020
Prior to this commit, metadata signature verification as provided
by `tuf.sig.verify()` and used e.g. in `tuf.client.updater` counted
multiple signatures with identical authorized keyids each
separately towards the threshold. This behavior practically
subverts the signature threshold check.

This commit fixes the issue by counting identical authorized keyids
only once towards the threshold.

The commit further clarifies the behavior of the relevant functions
in the `sig` module, i.e. `get_signature_status` and `verify` in
their respective docstrings. And adds tests for those functions and
also for the client updater.

---

NOTE: With this commit, signatures with different authorized keyids
still each count separately towards the threshold, even if the
keyids identify the same key. If this behavior is not desired, I
propose the following fix instead, which verifies uniqueness of keys
(and not keyids):

```
diff --git a/tuf/sig.py b/tuf/sig.py
index ae9bae15..5392e596 100755
--- a/tuf/sig.py
+++ b/tuf/sig.py
@@ -303,7 +303,14 @@ def verify(signable, role, repository_name='default', threshold=None,
   if threshold is None or threshold <= 0: #pragma: no cover
     raise securesystemslib.exceptions.Error("Invalid threshold: " + repr(threshold))

-  return len(good_sigs) >= threshold
+  # Different keyids might point to the same key
+  # To be safe, check against unique public key values
+  unique_good_sig_keys = set()
+  for keyid in good_sigs:
+    key = tuf.keydb.get_key(keyid, repository_name)
+    unique_good_sig_keys.add(key["keyval"]["public"])
+
+  return len(unique_good_sig_keys) >= threshold

```

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
@lukpueh
Member

lukpueh commented Jan 21, 2020

Another argument for loosening the keyid specification is interoperability with existing PKI-infrastructure. Here are two data points that support that claim.

erickt added a commit to erickt/rust-tuf that referenced this issue Jan 23, 2020
python-tuf is [considering] getting rid of keyid_hash_algorithms, so we
shouldn't default to generating keys with them specified.

[considering]: theupdateframework/python-tuf#848

Change-Id: I2c3af5d5eb7b0cc30793b54e45155320164cf706
vsrinivas pushed a commit to vsrinivas/fuchsia that referenced this issue Jan 29, 2020
Before this patch, the resolver assumed that all tuf keys should have
the `"keyid_hash_algorithms": ["sha256"]` field specified. However, this
field was only added for compatibility with python-tuf, and python-tuf is
[considering] getting rid of it, if they can figure out how to do so
without breaking their current users.

So we'd like to migrate away from using them to avoid having to keep
these fields around for all time. The first step is to allow us to
verify the initial TUF metadata with two variations of the root TUF keys,
one with and one without the keyid_hash_algorithms specified. This is
safe (as in we won't double count a key) as long as the metadata doesn't
list the same key multiple times with different keyids.

Once everyone has migrated over to the new metadata that doesn't mention
`keyid_hash_algorithms`, we can get rid of the call to
`PublicKey::from_ed25519_with_keyid_hash_algorithms`.

[considering]: theupdateframework/python-tuf#848

Fixed: 44490
Change-Id: Ib84ca4551b9d68f322039215ba40996608d6ca58
@awwad
Contributor

awwad commented Mar 13, 2020

IMHO:

Which hashing algorithm is used when a key ID is calculated is an implementation detail. It is not a property of the key. Keys should not each individually define a list of hash algorithms that are acceptable for referring to those keys; that causes the key's identity to change any time implementation details do. Referring to a key is something that signatures, delegating metadata, and clients do. I doubt there is any serious need to permit an implementation to support multiple different key ID hash algorithms for the same key type; however, if there is, the list should still not be in the hashed value of the key itself (see below).

Refresher:
Recall that keys in TUF show up in three places: in key representations themselves, in signatures, and in delegations. Key IDs are important wherever the possibility exists of large key values that are inconvenient to represent repeatedly (basically, whenever you support using something other than elliptic curve crypto, or wrap that in something like OpenPGP).

  • A key does not need to indicate its keyid (or how to determine its keyid).
  • A signature helpfully prompts you with the keyid of the key that made it (so that you don't have to waste time trying every other authorized public key in a delegation).
  • A delegation authorizes keys by keyid.
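To illustrate those three places with a stripped-down, hypothetical fragment (Python dicts standing in for the JSON metadata; all identifiers and values are placeholders):

```python
# Hypothetical, stripped-down metadata fragments in the spirit of TUF's
# JSON formats; "f2d5" stands in for a real keyid, "PUB"/"SIG" for values.
delegating_metadata = {
    "keys": {
        # 1. key representation: the key itself carries no keyid
        "f2d5": {"keytype": "ed25519", "scheme": "ed25519",
                 "keyval": {"public": "PUB"}},
    },
    "roles": [
        # 3. delegation: authorizes keys by keyid
        {"name": "projects", "keyids": ["f2d5"], "threshold": 1},
    ],
}

# 2. signature: prompts the client with the keyid of the key that made it
signature = {"keyid": "f2d5", "sig": "SIG"}

# The signature's keyid is just a handle into the delegating role's keys.
assert signature["keyid"] in delegating_metadata["keys"]
```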

TODO:

  • We should, in the reference implementation, remove keyid_hash_algorithms from key representations as soon as practical, before folks start sticking this stuff in their own implementations.... (It's far less effort to PR the reference implementation than to sign your implementation up for future pain here, I think.)
  • No other implementations should use keyid_hash_algorithms entries in keys. If metadata must define this behavior, it should be a root metadata setting; if that is truly not possible due to diversity (Uptane might be a use case -- ECU limitations), then this behavior should be defined in the key listings in delegating metadata only. In that situation, for convenience, it can be added to signatures (the fact that it would be under attacker control should not be an issue -- LMK if you want more info).
  • If anything should be added to the spec about this (via a TAP or otherwise) it should be the prior point: don't stick this stuff in keys.

I've said previously that the result of the current policy in the reference implementation (aside from the fact that it is confusing) is that trying to refer to a key in a different way requires changing the key's identity and breaking any delegations to it. It will cause (and has caused) headaches. Break metadata delegations once and remove this for the sake of clarity and simplicity, instead of breaking delegations whenever hash algorithms need to change. This change should not cost us any interoperability.

Re. GPG keys: for my part, when I've had to support OpenPGP keys and signatures (basically, for YubiKey interfacing), I've used them as a vector for more typical ed25519 signatures: in signatures, in delegating metadata, and in the verification process, I still refer to and use the public key using its underlying value (not the OpenPGP keyid).

@mnm678
Contributor

mnm678 commented Mar 13, 2020

On a related note, I've been working on a proposal to change the TUF specification to allow for more flexibility in how keyids are calculated. A draft of the document is available here and I'd appreciate any feedback. This change would give implementers (including the reference implementation) more flexibility in how keyids can be used while adhering to the specification.

@MVrachev
Collaborator

@joshuagl @lukpueh do you think this issue is still relevant, given that #1014 (Remove uses of keyid_hash_algorithms) is merged and we have #1084 tracking the TAP 12 implementation?

@lukpueh
Member

lukpueh commented Sep 28, 2020

I suggest we close this issue. There is no longer a need for documenting or creating a TAP to describe keyid_hash_algorithms, given that the field is being removed from the reference implementation.

Let's discuss the removal of the field in the corresponding issue #1084.
