Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata API: make Root roles a Mapping #1668

Merged

Conversation

MVrachev
Copy link
Collaborator

@MVrachev MVrachev commented Nov 9, 2021

Fixes #1356

Description of the changes being introduced by the pull request:

This pr includes a couple of smaller changes:

  • make root roles Mapping to indicate that users should not add or remove
    values from the dictionary during the lifetime of the Root object

EDIT: Those two were included in the initial version of the pr.
Then together with @jku we agreed we don't want to do that.
See my comment #1668 (comment)

  • validate that TARGETPATHs are not empty strings
  • validate that METAPATHs are not empty strings or any of the top-level metadata roles

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Mark roles as a Mapping to indicate that users should not add or remove
values from the dictionary during the lifetime of the Root object)

Signed-off-by: Martin Vrachev <mvrachev@vmware.com>
@MVrachev
Copy link
Collaborator Author

MVrachev commented Nov 9, 2021

@jku your suggestion:

accept roles=None in the constructor (and create the default set in that case). This is not strictly validation but lowers the need for validation since then no-one ever needs to manually create the roles dict.

I tried implementing it, but then I realized that If I am given keys dictionary containing 4 keys I am not sure how to map keys to roles.

@coveralls
Copy link

coveralls commented Nov 9, 2021

Pull Request Test Coverage Report for Build 1472701943

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.463%

Totals Coverage Status
Change from base Build 1470751886: 0.0%
Covered Lines: 3948
Relevant Lines: 4032

💛 - Coveralls

@MVrachev
Copy link
Collaborator Author

MVrachev commented Nov 9, 2021

METPATHs and TARGETPATHs are not validated that are strings as per specification they can't be anything else but a string.

I am still wondering if we should create a sanity check that you can't use targets as a TARGETPATH...

@jku
Copy link
Member

jku commented Nov 9, 2021

@jku your suggestion:

accept roles=None in the constructor (and create the default set in that case). This is not strictly validation but lowers the need for validation since then no-one ever needs to manually create the roles dict.

I tried implementing it, but then I realized that If I am given keys dictionary containing 4 keys I am not sure how to map keys to roles.

sorry, I don't understand the connection here. Why do you need to map keys to roles in a constructor?

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

METPATHs and TARGETPATHs are not validated that are strings as per specification they can't be anything else but a string.

We're trying to validate that data we got is according to spec, and should not assume that things are strings because spec says so... that said, I too am unsure about type checks: I've been negative about them but am beginning to think we might want to do type checks for dictionary keys as those never get checked by other code (as most other types do). More on that in code comment.

I am still wondering if we should create a sanity check that you can't use targets as a TARGETPATH...

Sorry, I don't understand this.

tuf/api/metadata.py Outdated Show resolved Hide resolved
Comment on lines 1029 to 1030
if any(meta_path == "" for meta_path in meta):
raise ValueError("All meta paths must be non-empty strings")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you originally suggested checking that "meta_path is a non-empty string": now you check that "metapath is not an empty string". These are subtly different as the former includes type checking and the implemented check is so specific that I don't think it's worth two lines of code.

I'm still not sure how much type checking we actually should do... maybe we should validate the dictionary key types as that doesn't happen "implicitly" like it tends to happen for other data that is wrongly typed. If we don't do that we could deserialize (and then later serialize) data that is not actually valid metadata

@MVrachev
Copy link
Collaborator Author

MVrachev commented Nov 9, 2021

We're trying to validate that data we got is according to spec, and should not assume that things are strings because spec says so... that said, I too am unsure about type check

METPATHs and TARGETPATHs are not validated that are strings as per specification they can't be anything else but a string.

We're trying to validate that data we got is according to spec, and should not assume that things are strings because spec says so...

I was unclear with my statement. I meant the JSON format spec which doesn't allow non-string keys for dictionaries
and that's why I am not sure if there is a point in validating that TARGETPATHs and METAPATHs are strings.

@jku
Copy link
Member

jku commented Nov 10, 2021

I was unclear with my statement. I meant the JSON format spec which doesn't allow non-string keys for dictionaries and that's why I am not sure if there is a point in validating that TARGETPATHs and METAPATHs are strings.

Oh this is a really good point! we're supposedly format-independent so maybe shouldn't trust this ... but it's still a convincing argument

@MVrachev
Copy link
Collaborator Author

I think you are correct that if we want to be able to easily switch to a new format it's better if we do the type checks.
@jku should I go forward and add a commit about that?

@jku
Copy link
Member

jku commented Nov 16, 2021

I think you are correct that if we want to be able to easily switch to a new format it's better if we do the type checks. @jku should I go forward and add a commit about that?

Supporting a new format is not going to be easy whatever we do (with the way the spec is written)... so I'd rather avoid investing any lines of code in that right now.

Maybe just fix the bug with meta keys and make sure the check that prevents using toplevel filenames (except targets) is as simple as possible: Could build a private list/set of _invalid_snapshot_meta_keys at metadata.py import time (maybe a snapshot private class variable) and then use that list in snapshot init, for example.

Speaking of simplicity I've been thinking about this and I think checking empty string for meta keys (or even checking whether the string ends in ".json") might not be that useful? It does not guarantee the key is a real filename or even that it's a safe filename... In any case meta should only be used like this metainfo = meta[f"{role}.json"] in a client so things like empty string keys are not a problem. IMO checking for empty string meta key is not useful.

For target paths the empty string check could make sense, especially if this is mentioned in the spec... I can't find that reference though, is this really in the spec?

@MVrachev
Copy link
Collaborator Author

Maybe just fix the bug with meta keys

I haven't spent time on this issue for a while, but which bug do you mean?
I don't find where we discussed a bug?
Do you mean this one #1668 (comment)

and make sure the check that prevents using toplevel filenames (except targets) is as simple as possible: Could build a private list/set of _invalid_snapshot_meta_keys at metadata.py import time (maybe a snapshot private class variable) and then use that list in snapshot init, for example.

Will try it, good idea.

Speaking of simplicity I've been thinking about this and I think checking empty string for meta keys (or even checking whether the string ends in ".json") might not be that useful? It does not guarantee the key is a real filename or even that it's a safe filename... In any case meta should only be used like this metainfo = meta[f"{role}.json"] in a client so things like empty string keys are not a problem. IMO checking for empty string meta key is not useful.

I agree. At some point, we would want to open what's inside meta keys and if there is an empty string there will be an OSError.

For target paths the empty string check could make sense, especially if this is mentioned in the spec... I can't find that reference though, is this really in the spec?

I don't find it explicitly mentioned either.
Why do you think it will make sense to check for an empty string compared to the case with meta?

@jku
Copy link
Member

jku commented Nov 17, 2021

I don't find where we discussed a bug?

Current PR is, I believe, preventing meta keys from being toplevel rolenames. That is not relevant: meta keys being toplevel role filenames could be a problem.

That said, now that I look at this and #1558 together I'm wondering if this PR is worth it overall: meta keys are never meant to be iterated over, client always looks up data from meta with a key constructed with f"{role}.json" where the role comes from DelegatedRole. Maybe we should concentrate on validating DelegatedRole name: that seems to have most value as it will in practice guarantee valid meta keys as well.

I think I'm fine with closing this (or just doing the Dict->Mapping change) in favor of #1158. What do you think?

@MVrachev
Copy link
Collaborator Author

After a discussion with @jku we decided that:

  • we don't want to do validation for METAPATHs. We know that they are strings (JSON keys specification) and that's enough. That is the case because snapshot stores information about targets and delegated targets in its meta. The METAPATHs themselves are not used to open a file anywhere but are used to uniquely distinguish the different target and delegated target files. What actually is used is version, length, and hashes from snapshot meta.

  • DelegationRole names are important to be validated as a client only processes target files by:

  1. going through the delegated roles, finding a role it needs
  2. looking up version etc for that role in meta
  3. loading the file for that version and role

meaning DelegatedRole names are actually used to open files

  • TARGETPATHs are strings (JSON keys specification) and it seems it works to use "" as a valid TARGETPATH. The client will end up constructing a download URL that is basically just updater._target_base_url : weird but we don't see a real problem with that either. Basically, we agreed that we shouldn't absolutely need to special-case handling for this weird case out of many possible weird cases

Because of those reasons, I will drop the other commits which validate TARGETPATH and METAPATH and will leave only the first one.

@MVrachev MVrachev changed the title Metadata API: add sanity checks for TARGETPATH and METAPATH Metadata API: make Root roles a Mapping Nov 17, 2021
@MVrachev MVrachev force-pushed the validate-rest-of-metadata-api branch from 8b21020 to 29da5da Compare November 17, 2021 16:45
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for going through the steps so we could figure out what we actually wanted here

@jku jku merged commit 747ec19 into theupdateframework:develop Nov 22, 2021
@MVrachev MVrachev deleted the validate-rest-of-metadata-api branch November 25, 2021 11:26
@lukpueh lukpueh mentioned this pull request Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New metadata API: Validate dictionary keys as well as values
3 participants