Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ngclient: review preorder dfs code #1463

Merged
merged 8 commits into from Aug 10, 2021

Conversation

sechkova
Copy link
Contributor

Fixes #1398
Description of the changes being introduced by the pull request:

  • Reviews and simplifies Updater._preorder_depth_first_walk code.
  • makes _visit_child_role a member of DelegatedRole

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@sechkova
Copy link
Contributor Author

Opened as a draft not only because it is dependent on #1408 but also it is not well tested yet.

@jku
Copy link
Member

jku commented Jun 30, 2021

I've had a first look and the ideas look fine. Some smaller comments:

  • some issues in commit splitting (like visit_child_role() being annotated as returning bool and then returning something else)
  • Likewise terminating child roles are added to child_roles_to_visit twice, then duplication is removed in a "simplification" commit
  • IMO we should avoid doing e.g. pathpattern.lstrip(os.sep) -- these are the sort of workarounds that make life hard at some point... either the path pattern is valid or it isn't valid and we should just refuse to load the metadata (we don't have to support every weird case spec defines as valid). That said, leaving it as is and filing an issue is fine as well
  • more type annotations would be nice: Dict without contained types is not great, role_metadata in _preorder_depth_first_walk should be manually annotated as Targets: without this the whole function is a black box as far as static typing is concerned.

I'm suspicious about how this operates when target is not found:

  • old code does not seem to raise if delegation limit is not reached? I agree the old code return value seems nonsense but is raising really the right thing?
  • new code does not refer to delegation limit in the error message if it is reached

Also, is_in_trusted_paths() returns False if both paths and path_hash_prefixes are unset. Is this correct? I can see how that's the "safe" assumption but it also makes the whole idea of unset paths and unset path_hash_prefixes meaningless: the delegation has no reason to exist...

@trishankatdatadog
Copy link
Member

I really think it would be a lot simpler as a recursive function...

Comment on lines 345 to 347
self._load_targets(role_name, parent_role)

role_metadata = self._bundle[role_name].signed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this implies we should make bundle (or trusted_set as it maybe now) return the loaded metadata to avoid a lookup immediately afterwards. Can you file an issue please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done #1507

@sechkova
Copy link
Contributor Author

sechkova commented Jul 7, 2021

I really think it would be a lot simpler as a recursive function...

I agree, it does make the code look much simpler, let me test implementing those terminating delegations correctly with it :)

@sechkova
Copy link
Contributor Author

sechkova commented Jul 7, 2021

Also, is_in_trusted_paths() returns False if both paths and path_hash_prefixes are unset. Is this correct? I can see how that's the "safe" assumption but it also makes the whole idea of unset paths and unset path_hash_prefixes meaningless: the delegation has no reason to exist...

Both paths and path_hash_prefixes being None should be impossible since there is a check in the constructor which raises an exception but if you think it will be better/safer to duplicate it again here, I can add it.
https://github.com/theupdateframework/tuf/blob/24fa1128116429dc18e185047fbffb009ca509b0/tuf/api/metadata.py#L914

Forget what I wrote, the check is for both being set ...
Which leads me to the question, should the input validation include the check whether both are not set?

@sechkova sechkova force-pushed the review-preorder-dfs branch 2 times, most recently from ead4b19 to 8bd34de Compare July 8, 2021 15:27
@sechkova
Copy link
Contributor Author

sechkova commented Jul 9, 2021

Summarizing the changes made to this PR:

  • rebased on develop
  • improved the commits
  • added more type annotations
  • changed the function to return None when a targetfile is not found (instead of raising an exception)
  • added one possible recursive implementation of the same function e23e64b

Still there are issues pending and I am keeping this a draft to have a chance of reviewing it again myself.

  • File an issue so that self._load_targets(role_name, parent_role) -> TrustedMetadataSet returns the loaded metadata to avoid lookup
  • Avoid doing e.g. pathpattern.lstrip(os.sep) or file an issue about it
  • Revise return values and maybe throw exceptions in corner cases
  • The correct behaviour when both paths and path_hash_prefixes are unset is still unclear.
  • Using fnmatch does not do exactly what is promised in the documentation in terms of UNIX filename pattern matching

Copy link
Collaborator

@MVrachev MVrachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget what I wrote, the check is for both being set ...
Which leads me to the question, should the input validation include the check whether both are not set?

I think we should check for that.
I created an issue where we can discuss that #1497.

I like that it's much simpler and easier to read now and agree that the recursive version simplifies the code.

I think that before making this pr non-draft it's good to have tests for the ngclient updater.
I am a little worried about merging this before confirming we are not missing use-cases with this simplification.

self,
target_filepath: str,
visited_role_names: Set[str],
current_role_pair: List[Tuple[str, ...]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does ... mean in List[Tuple[str, ...]]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotations for both visited_role_names and current_role_pair are wrong

return targetinfo, terminated

# Pop the role name from the top of the stack.
role_name, parent_role = current_role_pair
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the type annotation of current_role_pair I see it's a List[Tuple[str, ...].
Don't you need to get an exact element from current_role_pair which will contain role_name and parent_role pair?

@jku
Copy link
Member

jku commented Jul 16, 2021

  • added one possible recursive implementation of the same function e23e64b

I have some reservations against recursion: when you use it, you then have to think about the call stack depth limitations and how that affects your code... In our case maximum number of delegations now depends on call stack depth.

Many hundreds of delegations is not a use case we should explicitly design for, I accept that. Still, I don't like setting an artificial limit when we don't event know what that limit is: stack depth is implementation specific, cpython uses 1000, but we also don't know how much of that we can use. The application developer might love recursion and may have used half the stack already, and maybe the network stack is special as well and needs a few hundred stack levels...

Basically I think expected call stack depth is larger than maximum reasonable number of delegations... but it's only maybe 5-10 times as large and that sounds uncomfortably close to me.

@trishankatdatadog
Copy link
Member

I have some reservations against recursion: when you use it, you then have to think about the call stack depth limitations and how that affects your code... In our case maximum number of delegations now depends on call stack depth.

As you said, I don't think any reasonable use case has that many delegations to worry about. Furthermore, it makes for more readable code, which is more important, I think.

@sechkova
Copy link
Contributor Author

Rebased on develop + some fixes of stupid errors in last commit.

@sechkova sechkova force-pushed the review-preorder-dfs branch 3 times, most recently from a3ff2b8 to b502434 Compare July 23, 2021 12:24
@sechkova
Copy link
Contributor Author

sechkova commented Jul 23, 2021

Thanks for the reviews and suggestions. I think it is time for this PR to stop being a draft and get a full review. I've intentionally kept the diffs in multiple smaller commits hoping that it may ease the review but you tell me how valuable it is for the history. I do mean to squash whatever feels redundant.

Some issues that were raised meanwhile:

@sechkova sechkova marked this pull request as ready for review July 23, 2021 13:34
@sechkova sechkova changed the title WIP: ngclient: review preorder dfs code ngclient: review preorder dfs code Jul 23, 2021
Copy link
Collaborator

@MVrachev MVrachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the changes seem reasonable and the history indeed is easier to follow when there are multiple smaller commits. Thanks for that @sechkova!

The only missing part in this pr is validation through more tests.
The _preorder_depth_first_walk is not an easy to understand function and we should
make sure we aren't introducing bugs.

We have a couple of tests using get_one_valid_targetinfo() in test_updater_ng.py, but
there are a couple of code paths not tested yet (according to coverage all):

  • delegation role with terminating option turn on
  • test when a target is not found
  • reaching the max limit of visiting delegations, but there are more delegations left

Additionally, we never test DelegatedRole.is_in_trusted_paths() when path_hash_prefixes is set and thus never actually test calculating the hashes.

Make _visit_child_role a public method of
DelegatedRole class. Reduce debug logging.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Copy link
Contributor Author

Rebased on latest changes in develop.

@sechkova
Copy link
Contributor Author

All of the changes seem reasonable and the history indeed is easier to follow when there are multiple smaller commits. Thanks for that @sechkova!

The only missing part in this pr is validation through more tests.

No two ways about that but I don't know whether this PR is the right way given that this is a common issue for the whole Updater: #1499


role_metadata = self._trusted_set[role_name].signed
role_metadata: Targets = self._trusted_set[role_name].signed
target = role_metadata.targets.get(target_filepath)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target = role_metadata.targets.get(target_filepath)
target: Union[TargetFile, None] = role_metadata.targets.get(target_filepath)

According to the metadata API if role_metadata is an instance of Targets, then
role_metadata.targets.get(target_filepath) should be a TargetFile if it exists.
I think it make sense to add annotation here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as reply to Martins comment:) since role_metadata is now a Targets, this does not need an annotation for target: static type check already figures out that target is a TargetFile | None

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a major improvement, much easier to follow now. _preorder_depth_first_walk() could maybe still be simplified a bit but this is so much better that I say let's get it in and do smaller tweaks afterwards (so the testing work can start).

A good set of test cases would be nice but I think we can leave that to #1499 -- the testing needs some serious thinking as well to prevent a mess of testing spagetti so makes sense to do in its own PR

DelegatedRole.is_in_trusted_paths() seems like a good idea. I tried to think of a better name but ... can't come up with anything, I guess this is fine.

the paths/path_hash_prefixes resolution seems reasonable.

I would remove both the recursive implementation commit and the revert if it's not a direction that was chosen.

I Also left some comments in code but overall this LGTM: I'm approving, but please have a look at whether you want to act on the comments.

def get_one_valid_targetinfo(self, target_path: str) -> Dict:
def get_one_valid_targetinfo(
self, target_path: str
) -> Union[Dict[str, Any], None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Union[Dict[str, Any], None]:
) -> Optional[Dict[str, Any]]:

def _preorder_depth_first_walk(self, target_filepath) -> Dict:
def _preorder_depth_first_walk(
self, target_filepath: str
) -> Union[Dict[str, Any], None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> Union[Dict[str, Any], None]:
) -> Optional[Dict[str, Any]]:

these mean union is then likely unused import


role_metadata = self._trusted_set[role_name].signed
role_metadata: Targets = self._trusted_set[role_name].signed
target = role_metadata.targets.get(target_filepath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as reply to Martins comment:) since role_metadata is now a Targets, this does not need an annotation for target: static type check already figures out that target is a TargetFile | None

Comment on lines 1048 to 1044
# Calculate the hash of the filepath to determine which bin
# to find the target. The client currently assumes the repository
# uses sslib's default algorithm to generate hashes.
digest_object = sslib_hash.digest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the comment about assumptions and specify sha256 as algorithm here (as per spec) , no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right ... it does say exactly sha256

Comment on lines 1001 to 1004
raise ValueError(
"At least one of the attributes 'paths' and"
"'path_hash_prefixes' must be set!"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really appreciate if we could squeeze the exceptions and comments to the minimum number of lines that are required to convey the message... in the long term this has a major effect on readability. The content in this one seems a bit misleading as well so maybe:

Suggested change
raise ValueError(
"At least one of the attributes 'paths' and"
"'path_hash_prefixes' must be set!"
)
raise ValueError("One of paths or path_hash_prefixes must be set")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to shorten the exception messages and also make them look consistently. It may have slightly changed the meaning though: 34f59c9

Comment on lines +174 to +175
Returns:
A targetinfo dictionary or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as long as we aim to get rid of the whole dictionary soon... the dict is just bad API.

The goal for this method would be to just return Optional[TargetFile] but I guess we can't do that yet since

  • updated_targets() for some reason takes multiple TargetFiles and needs targetpaths for each -- this will not look good if caller needs to provide the targetpaths ... but some API redesign might help
  • TargetFile does not have targetpath in it (yet at least)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully we can change that with the progress of #1514

Comment on lines -245 to -246
"no hash or path prefix":
'{"keyids": ["keyid"], "name": "a", "terminating": true, "threshold": 3}',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't there now be a test that checks that deserialization fails with this input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one with invalid DelegatedRoles: 34f59c9

@sechkova
Copy link
Contributor Author

sechkova commented Aug 6, 2021

I Also left some comments in code but overall this LGTM: I'm approving, but please have a look at whether you want to act on the comments.

Thanks for the comments, let me try to address them before merging.

@sechkova sechkova force-pushed the review-preorder-dfs branch 2 times, most recently from d49fab1 to fbca198 Compare August 6, 2021 15:35
@sechkova
Copy link
Contributor Author

sechkova commented Aug 6, 2021

I think I addressed everything and removed the commits about the recursive implementation. Should be ready now.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely set of clean ups, nice work @sechkova.

I'd prefer a different name for is_in_trusted_paths(), added a comment in-line. Let me know what you think.

tuf/api/metadata.py Outdated Show resolved Hide resolved
Rename to DelegatedRole.is_delegated_path and
return a boolean flag instead of the role's name.

Minor comments and code style improvements.

Some code simplification.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Remove _get_filepath_hash and call sslib_hash.digest
directly instead.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Simplify the code in _preorder_depth_first_walk.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Reduce redundant logging and simplify code further.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Improve type annotations.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
The specification does not state clearly what is the
behaviour when none of delegation's "paths" and
"path_hash_prefixes" is set. See theupdateframework#1497.

Until this issue is clarified, copy current
Updater which raises an error in such case.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Give 'role_names' the more verbose description
'delegations_to_visit'.

Add some comments and docstrings.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@jku jku merged commit 55c8a78 into theupdateframework:develop Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ngclient: review _preorder_depth_first_walk()
5 participants