experimental client: Add MetadataBundle #1355
Conversation
From discussion with @sechkova: We do need to expose local metadata loading (instead of doing it automatically) for two reasons:
For delegate handle we also need to have an optional delegator role. So instead of just
Force-pushed from cc3d3f6 to 057649c
Initial draft was very neat and clean but after some impacts with reality (thanks Teodora for early review), it currently looks like this:
I still think this is a good direction. The code is not ready by any means but I'm making this PR now so that Teodora can start using it early if we decide this is a path experimental-client should take. I suggest reviewing the final form and not individual commits -- there were false starts in there.

The second bullet point in the list above is a potential issue: is the internal state tracking understandable and clear (or can it be improved to be that)?

The other potential issue is delegated metadata. Delegates can be added and they are validated on insert, but they are not currently under the same guarantees that top-level metadata is: if there's a top-level "targets" in the bundle it's guaranteed to be valid, but nothing currently stops the caller from updating "role1" so that it no longer delegates to "role2" that is also in the bundle... This can definitely be fixed (I think just reloading all metadata delegated by the just-updated metadata should do it, and might even simplify the internal state tracking; I just haven't had time to try yet).

A third thing to mention might be exceptions: I believe one is thrown where needed now... but the actual exceptions are not chosen at all. My plan is that all issues that arise from metadata content will be under one base exception; the client will likely just handle the base exception. Other exceptions will be internal errors and user errors.
On the API and naming:
MetadataBundle itself is not a great name but works -- I know Teodora is fond of "trusted" or some variation of it in the name to make that part clear. The loader function names could be
Overall, the code is easy to read and follow. The few complicated places (like verifying hashes) have open issues suggesting those checks could be added to the Metadata API.
Also, I agree that helper functions like get_snap_version() in timestamp could be useful.
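A helper like the suggested get_snap_version() might look roughly like this sketch; the function name comes from the comment above, but the dict layout here is a stand-in for the real timestamp metadata, not the actual Metadata API:

```python
# Sketch only: a plain dict stands in for timestamp metadata's "signed" part.
def get_snap_version(timestamp_signed: dict) -> int:
    """Return the snapshot version that timestamp metadata points to."""
    return timestamp_signed["meta"]["snapshot.json"]["version"]

timestamp_signed = {"meta": {"snapshot.json": {"version": 7, "length": 512}}}
assert get_snap_version(timestamp_signed) == 7
```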
One additional suggestion. Right now you are calling self._raise_on_unsupported_state(role_name) when loading and updating, and you also do checks at the beginning of each of the _load_* functions. That way it's easy to forget or duplicate checks. I would suggest using self._raise_on_unsupported_state(role_name) to handle all state checks, and calling it at the beginning of each of the _load_* functions.
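The suggested pattern could be sketched like this. The method names _raise_on_unsupported_state and _load_* come from the discussion; the state rules and bodies below are made up for illustration:

```python
# Sketch: every _load_* function starts with the same single state check,
# so a check can never be forgotten or duplicated. The dependency rules
# here are hypothetical.
class MetadataBundle:
    # which roles must already be loaded before each role may be loaded
    _REQUIRES = {"root": [], "timestamp": ["root"], "snapshot": ["root", "timestamp"]}

    def __init__(self):
        self._loaded = []

    def _raise_on_unsupported_state(self, role_name):
        for dep in self._REQUIRES[role_name]:
            if dep not in self._loaded:
                raise RuntimeError(f"cannot load {role_name} before {dep}")

    def _load_root(self, data):
        self._raise_on_unsupported_state("root")
        self._loaded.append("root")  # real code would verify and store metadata

    def _load_timestamp(self, data):
        self._raise_on_unsupported_state("timestamp")
        self._loaded.append("timestamp")

bundle = MetadataBundle()
bundle._load_root(b"...")
bundle._load_timestamp(b"...")
```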
Yeah this is true
I think I want to only do it early in the public functions, and am 99% sure I can just remove the checks in the actual
There's a lot of if-else inside generic-sounding functions. Is that a good idea? Should we perhaps use specific functions for specific purposes? That would not only improve readability, but also error detection.
You're totally correct (I debated whether to write about this myself but ... I wrote so many comments already :) ). Let me drill down into this a bit. There are two classes of if-elses:
This relates to mirrors in the following way:
So I believe the dispatching code (the if-clauses that look at the role name to decide what to call) is going to exist somewhere anyway (again, assuming current mirroring support). So I guess this brings us to: re-designing mirror support (essentially limiting it drastically) would make this, and the updater download code, much cleaner. We've been really trying to not solve too many issues at once but maybe we do need to look at that as well... (marking draft until the mirrors discussion happens)
Updating this based on the mirrors/downloads changes:
I'm quite happy with this: it's not any shorter than the previous version but it's simple and straightforward. I've tested using the bundle from the client and mostly it feels like a clear improvement: the client code can concentrate on deciding what should happen ("Now load snapshot from local storage: if that one's not valid anymore, then load snapshot from remote") while the bundle verifies that metadata state is valid according to spec at every step. It's definitely not complete (most importantly tests are missing, exceptions are still WIP, and the spec validation needs to be reviewed carefully by someone who's not me) but I think it might be good enough for the experimental-client? Opinions as well as naming suggestions are welcome. On naming: is MetadataBundle ok? load_local_*()? update_*()? TLDR documentation:
(EDIT: rebasing this on top of the branch brings out more linting issues: I'll still be fixing those)
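The "client decides, bundle verifies" flow described above could look roughly like this hypothetical sketch (all names are made up; RepositoryError stands in for the bundle's error type):

```python
# Sketch: the client decides where bytes come from, the bundle only
# accepts or rejects them. FakeBundle simulates verification.
class RepositoryError(Exception):
    """Stand-in for the base error raised on invalid metadata."""

class FakeBundle:
    def update_snapshot(self, data):
        if data != b"valid-snapshot":
            raise RepositoryError("snapshot failed verification")

def refresh_snapshot(bundle, read_local, download_remote):
    data = read_local()
    try:
        bundle.update_snapshot(data)       # bundle verifies per spec
    except RepositoryError:
        data = download_remote()           # local copy no longer valid
        bundle.update_snapshot(data)       # raises again if remote is bad too
    return data

# a stale local snapshot falls back to a remote download
result = refresh_snapshot(
    FakeBundle(), lambda: b"stale", lambda: b"valid-snapshot"
)
```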
tuf/client_rework/metadata_bundle.py (outdated diff)
the load functions will refuse to load the files when they are not signed by
current root keys. Deleting at the specified point is possible but means additional
code with some quirks.
* in general local metadata files are not deleted (they just won't successfully load)
This might be worth a discussion. I don't like deleting local metadata files when they're invalid for two reasons:
- it doesn't provide security (if the metadata is invalid it cannot be loaded -- if it could then our program has a bug that should be fixed: deleting the metadata won't really help)
- it prevents all debugging: "oh the app says update failed because of invalid metadata... well, that file doesn't exist anymore 🤷"
Wondering if there is a possible situation where you could fill up the storage by providing, let's say, thousands of invalid metadata files?
Other than that your arguments are sensible.
Force-pushed from 379d333 to e839931
Had to rebase on current experimental-client to get the linting up-to-date: of course the newer linting revealed lots of new issues that I've fixed or documented :) Even more than before I suggest not reading individual commits: there's a lot of less-than-useful history there.
This is very nice, easy to follow even when attention is depleted. Nice work @jku !
The class level docstring suggests that invalid local metadata is deleted. I think you made a conscious decision not to do that? It looks like the current implementation only ever preserves the most recently verified metadata file for each role? If we did start storing intermediate files on disk, we would need to add a mechanism to prevent the local metadata directory getting too large. But I don't think that's necessary in the current state.
There's a lot of LBYL style code (load_local_*, is_expired() and if not verify_with_threshold()), whereas I think we want to lean into the Pythonic EAFP. Can we reasonably have each of those, especially the latter two, raise exceptions that would be reasonable to surface to the user? Or would we need to catch them and re-raise to provide meaningful errors?
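The contrast between the two styles could be illustrated like this (hypothetical names; not the PR's actual code):

```python
# LBYL ("look before you leap"): check conditions up front...
def load_lbyl(meta):
    if not meta["verified"] or meta["expired"]:
        return None          # ...and the reason for failure is lost
    return meta["payload"]

# EAFP ("easier to ask forgiveness than permission"): just act, and raise
# an error that explains what went wrong and could be surfaced to the user.
class ExpiredMetadataError(Exception):
    pass

def load_eafp(meta):
    if meta["expired"]:
        raise ExpiredMetadataError("metadata has expired")
    return meta["payload"]

meta = {"verified": True, "expired": True, "payload": b"x"}
assert load_lbyl(meta) is None   # silent failure, no explanation
```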
It will be nice to see this without the repeated logic, such as the big block for checking hashes in meta.
Also, this appears to be internal API – did we figure the best way to indicate that? Or is that TBD?
Thanks, good to get "fresh" eyes on this.
Oops.
I've gone into detail about load_local_* in a code comment (you may be right).
Yes, hash checks should also be part of Metadata API (#1377) in my opinion.
My suggestion was to move everything except updater.py and fetcherinterface.py into a tuf/client/_internal/ directory. It would make sense to do that before we merge to develop.
I have one doubt about file system access.
On one hand, by doing it inside the bundle, as it is now, the bundle itself is in control of persisting only verified metadata files that passed all checks, which seems less error prone. It also keeps the Updater code simpler.
But on the other hand, if the caller (Updater) handles both access to the file system and the network download, MetadataBundle will implement only the actual metadata verification steps from the client workflow, will be I/O agnostic, and will provide a single public method per metadata type (basically the now-private _load_* methods).
Other than that, I tried to follow the specification and I think all steps that do not relate to the network download are implemented (besides consistent_snapshot and some TODOs already in the code). But for sure another look is needed when all client components are combined together.
These are valid points. I guess the main idea was to get (almost) all spec steps into the same file so it would be easy to follow. But I can definitely be persuaded on this.
We can come back to this idea when the different PRs are pieced together.
MetadataBundle keeps track of current valid set of metadata for the client, and handles almost every step of the "Detailed client workflow" in the TUF specification (the remaining steps are download related). The bundle takes care of persisting valid metadata on disk, loading local metadata from disk and deleting invalid local metadata. It also verifies any new metadata (downloaded from remote repository) it is given. This is very much a work-in-progress. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
because we are deserializing, not serializing. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
The bundle should now raise:
* derivatives of RepositoryError on failures that are likely a result of server error or a malicious server
* RuntimeErrors if calls were made when they are not possible
* ValueErrors if arguments are invalid

The last two are caller errors and avoidable. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
There's no value in using custom errors when builtins work. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
This is backwards-compatible and means that most (all?) errors resulting from suspicious or broken metadata are now RepositoryErrors. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
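The error taxonomy described in the commits above could be sketched like this. RepositoryError comes from the discussion; the derived class names and client function are hypothetical examples:

```python
# Sketch of the taxonomy: one base for metadata problems, builtins for
# caller errors. Subclass names here are illustrative.
class RepositoryError(Exception):
    """Base: anything caused by bad or malicious repository metadata."""

class ExpiredMetadataError(RepositoryError):
    pass

class UnsignedMetadataError(RepositoryError):
    pass

def client_refresh(step):
    try:
        step()
    except RepositoryError as e:
        # the client can handle every metadata problem in one place
        print(f"update failed: {e}")
    # RuntimeError / ValueError are caller bugs and intentionally NOT caught

def bad_step():
    raise ExpiredMetadataError("timestamp has expired")

client_refresh(bad_step)  # prints: update failed: timestamp has expired
```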
Remove file IO from MetadataBundle:
* This makes the bundle API very clear and easy to understand
* This means the caller must now read from and persist data to disk, but initial prototypes suggest this won't make Updater too complex

This change is something we can still back out of if it turns out to be the wrong decision: the file-persisting MetadataBundle has been tested and works fine. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
I've been thinking about this and I think I agree with you: this makes the bundle really easy to understand (and makes the bundle really do one thing well), and I think the Updater will still be manageable -- its main function during refresh() is to do I/O, either network or filesystem. I'll do this change as part of this PR.
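A rough sketch of the division of labor after this change (method names, the file layout and FakeBundle are assumptions for illustration, not the actual tuf API): the Updater does all file and network I/O, while the bundle only verifies the bytes handed to it.

```python
import os
import tempfile

class FakeBundle:
    """Stand-in for MetadataBundle: records payloads, does no I/O."""
    def __init__(self):
        self.versions = []
    def update_root(self, data):
        self.versions.append(data)   # the real bundle would verify here

def refresh_root(bundle, metadata_dir, fetch):
    """Updater-side sketch: all file and network I/O happens here."""
    path = os.path.join(metadata_dir, "root.json")
    with open(path, "rb") as f:
        bundle.update_root(f.read())     # load what we had on disk
    data = fetch("root.json")            # network download stays in Updater
    bundle.update_root(data)             # bundle only verifies the bytes
    with open(path, "wb") as f:
        f.write(data)                    # persist only verified metadata

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "root.json"), "wb") as f:
        f.write(b"local-root-v1")
    bundle = FakeBundle()
    refresh_root(bundle, tmp, lambda name: b"remote-root-v2")
    result = bundle.versions
```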
Force-pushed from e839931 to f2cff95
First of all, I've rebased so I can use some fixes from newer experimental-client: those who reviewed already can take commit The major changes are:
All of the smaller comments should be handled now. This PR also includes 2 commits of PR #1390 to make things work.
See https://github.com/jku/tuf/blob/experimental-client-use-bundle/tuf/client_rework/updater_rework.py#L241 for a working example of Updater using this code (I'll make that into a PR once this one is merged). In this version we end up with following lines-of-code counts for the complete client (as counted by sloccount so comments not counted):
This looks pretty good to me
I like this simpler version. I left one suggestion so that I don't leave this review blank :)
Force-pushed from 09d377d to b8967e2
Oops, the type hint fix I included in the last commit depended on a Python 3.9 feature: I removed that, and the commit is now just docstrings based on Teodora's last comments.
Overall, it looks like a pretty good implementation.
I left only one suggestion.
Of course, we will iterate over time on this code, but now it looks good to me!
The final result here looks good. I noticed a couple of docstring issues and a comment that may not make much sense given the removal of disk and network I/O. I did not do a comprehensive review against the spec, but did notice one place where we do things in a slightly different order than specified. Curious to hear your thoughts about that.
eb648d1 makes me a bit anxious – why do we have Metadata.to_file() if we cannot rely on it to persist metadata which can be verified? Perhaps a topic for a new issue?
# empty keys and roles
delegations_dict = {"keys": {}, "roles": []}
delegations = Delegations.from_dict(delegations_dict.copy())
self.assertEqual(delegations_dict, delegations.to_dict())
Note: assertDictEqual() gives more meaningful messages when this fails.
"New root is not signed by root", new_root.signed | ||
) | ||
|
||
if new_root.signed.version != self.root.signed.version + 1: |
Per the spec, this comes after verifying with a threshold of keys from the new root metadata. The code is certainly nicer if it happens in this same if... do you think we should move it?
You're correct...
I think I'm happy with it being here (this feels like the spec doing things in the non-optimal order and the code happens to look nicer this way as well) but no strong opinions.
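The ordering under discussion could be sketched like this (hypothetical helper names; RepositoryError stands in for whatever error type is raised):

```python
class RepositoryError(Exception):
    """Stand-in base error for invalid metadata."""

def update_root(current_version, new_root, verify_with_threshold):
    # 1. spec: new root must be signed by a threshold of the *old* root's keys
    if not verify_with_threshold(new_root, use_new_keys=False):
        raise RepositoryError("New root is not signed by root")
    # 2. version must increase by exactly one -- the code under review does
    #    this here; the spec places it after step 3 below
    if new_root["version"] != current_version + 1:
        raise RepositoryError("Unexpected root version")
    # 3. spec: new root must also be signed by a threshold of its *own* keys
    if not verify_with_threshold(new_root, use_new_keys=True):
        raise RepositoryError("New root is not signed by itself")
```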
Document arguments and exceptions, improve prose in general. Remove mention of local file deletion now that file IO is not done here. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
b8967e2
to
377eac1
Compare
This does feel like an oversight in the spec as well: it talks about using canonical forms (and canonical JSON in particular) for signatures... but hashes aren't talked about (and requiring the implementer to parse canonical JSON before checking hashes would be wrong anyway). Even the test files in our repo are not in canonical form WRT whitespace. On the other hand, it's probably not the end of the world: for the client this is not an issue, since writing the original bytes seems a lot safer anyway.
Some clarifications in the spec, or the oft suggested secondary literature, would probably help here.
Agreed. That's what the current client does and what I believe the spec is suggesting when it states what name to download the file as.
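Why "write the original bytes" matters can be shown with a tiny demonstration: parsing and re-serializing JSON generally does not round-trip byte-for-byte, so hashes must be checked against, and files persisted as, the exact downloaded bytes.

```python
import hashlib
import json

raw = b'{ "signed": {"version": 1},   "signatures": [] }'   # note whitespace
parsed = json.loads(raw)
reserialized = json.dumps(parsed).encode()

# same content, different bytes -> different hash
assert hashlib.sha256(raw).hexdigest() != hashlib.sha256(reserialized).hexdigest()

# so: verify hashes against `raw`, and write `raw` to disk unchanged
```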
MetadataBundle keeps track of current valid set of metadata for the
client, and handles almost every step of the "Detailed client workflow"
in the TUF specification (the remaining steps are I/O related).
It also
verifies any new metadata (local or downloaded from remote repository).
EDIT this bit is no longer true in the newest version: The bundle takes care of persisting valid metadata on disk, loading
local metadata from disk and deleting invalid local metadata.