Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update targets delegations in generate_targets_metadata #1074

Merged
merged 6 commits into from Aug 26, 2020

Conversation

sechkova
Copy link
Contributor

@sechkova sechkova commented Jul 9, 2020

Fixes issue #: #1037

Description of the changes being introduced by the pull request:

  • Collect keys and threshold of delegated roles and update delegations in generate_targets_metadata() in a similar manner as generate_root_metadata() does for top-level roles.
  • Move key dictionary generation from generate_root_metadata() to a helper function _add_keys_to_keydict()
  • Update tests

Fixing this issue was partially blocked by #574: after loading the repository the delegated roles' 'keyids' and 'threshold' are lost which leads to throwing "missing key" exceptions while trying to update the delegations.

I suggest a short-term solution of #574 by using the recently added logic of hierarchically loading the delegations in load_reposiroty(). This makes it possible to get the delegated roles keyids and threshold from its delegating role metadata.
Clearly this won't work if more than one delegation to the same role exists but ... this is not supported correctly anyway in the current implementation so lets cover the simplest use case here?

Please verify and check that the pull request fulfills the following
requirements
:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @sechkova! Your approach seems sound, but I have a couple of questions in-line about the implementation.

storage_backend, repository_name)

# Update roledb with the latest delegations info
if delegations_update:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the reason we're having generate_targets_metadata() return a boolean value and testing it here is because update_roleinfo() is non-performant due to its use of deepcopy()? What is the impact if we just update_roleinfo() without this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea is to avoid calling update_roleinfo() for each delegation on every write, especially when their number can grow high if delegated hashed bins are used.

Anyway:

So if you are hinting towards simplifying the code at the cost of some negligible efficiency, you may have a point 🙂

In general, the overall overhead of the code in this PR seems to be in the range of 1-2 seconds (too small to actually calculate an exact delta without a dedicated test setup)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for crunching the numbers. I'd absolutely be in favour of the code simplification.

Comment on lines 1479 to 1482
# Do we have a duplicate?
if keyids.count(keyid) > 1:
raise securesystemslib.exceptions.Error('Same keyid listed twice:'
' ' + keyid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels odd. We do not handle this exception anywhere, is this likely to happen in practice?
Furthermore, this is all data we manage. If duplicates really are an issue, should we just prune them? What are we expecting the 'user' to do when they see this exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test case about it 😁
In practice add_verification_key has a check to avoid duplicates.
For the rest of your questions, I don't have a strong opinion but I agree this does not seem to be the right place of the check. Unfortunately, I cannot claim that I can track all possible data entry points for roledb ... while this is the only "exit" point. Maybe this is the historical reason for it.
I would gladly remove the check and its test but I don't have a great insight on what to put instead. Nothing?

for role in delegations['roles']:
rolename = role['name']
role_keyids = tuf.roledb.get_role_keyids(rolename, repository_name)
role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name)
role_threshold = tuf.roledb.get_role_threshold(rolename, repository_name)

role['threshold'] = role_threshold
delegation_update = True

_add_keys_to_keydict(keydict, role_keyids, repository_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be part of the if conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it has to be outside the if. The key dictionary has to contain delegations keys of all roles, updated or not.

Comment on lines 1404 to 1405
if delegation_update:
delegations['keys'] = keydict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in always assigning keydict to delegations['keys']? So far as I can tell delegations['keys'] is only used when delegation_update == True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no obvious harm ...

if keyid not in keydict:

# This appears to be a new keyid. Generate the key for it.
if key['keytype'] in ['rsa', 'ed25519', 'ecdsa-sha2-nistp256']:
Copy link
Member

@jku jku Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a reason for hard coding the key types here: we could just let format_keyval_to_metadata() throw FormatError (or handle it and throw something better) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it :)
After a quick search, format_keyval_to_metadata() is used at a couple of places in tuf the way you described it (just letting it throw FormatError) so I agree applying the same here.

It seems also worth trying to merge the functionality of repository_tool._keys_to_keydict() and repository_lib._add_keys_to_keydict() which seem suspiciously similar 👀

@sechkova sechkova force-pushed the delegations-update branch 2 times, most recently from 50817f5 to 7c474bb Compare August 5, 2020 09:52
@sechkova
Copy link
Contributor Author

sechkova commented Aug 5, 2020

Some time has passed since the last updates on this pull request so I had to rebase and force push. However I think the reformatted set of changes is addressing all the comments up to now.

@joshuagl
Copy link
Member

joshuagl commented Aug 7, 2020

Some time has passed since the last updates on this pull request so I had to rebase and force push. However I think the reformatted set of changes is addressing all the comments up to now.

Thank you for the updates. They look good on a first pass, I'm aiming to do a more thorough review next week.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have a few minor comments, mostly on docstring, but otherwise this is looking good.

Comment on lines 2232 to 2272
"""
Iterate over a list of keys and return a list of keyids and a dict mapping
keyid to key metadata
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have promoted this to a public method (implicitly, by removing the leading _) we should improve the docstring so that it's easier for callers to make use of. Docstrings for public methods should include <Arguments> and <Returns> sections, at least. The Arguments section for this method should help a reader understand the types and any constraints of the objects in the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1acd07c

Comment on lines 1379 to 1381

delegation_update:
Boolean set to True if delegations are updated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this value is no longer returned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops ...
Removed in a18beb2

Comment on lines 1385 to 1399
if delegations is not None:
if delegations:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand this change?

if delegations is not None ensures we only enter this loop if the delegations argument is changed from its default value of None. As I understand it this is the standard pattern for checking whether an argument with a default value of None has been changed.

Copy link
Contributor Author

@sechkova sechkova Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad ... I didn't think it through ... 🤦
Reverted in a18beb2

delegations = targets_signable['signed']['delegations']

roleinfo = {}
roleinfo['name'] = 'role1'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it's probably better read the role name from the loaded data in delegations, as we are for keyids and threshold, in case the generated data changes.

Copy link
Contributor Author

@sechkova sechkova Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done in f1a0fb1

@sechkova
Copy link
Contributor Author

Someone with enough permissions to restart travis-ci? Seems like another timeout 👀

@jku
Copy link
Member

jku commented Aug 17, 2020

Bad news is #1096 was merged and this now has conflicts. Good news is the proxy tests should be more reliable once you rebase :)

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @sechkova, this looks much nicer.

I have made a few minor style suggestions and this PR needs rebasing on the most recent changes in develop, but otherwise this looks good to me.

Comment on lines 1403 to 1406
role['keyids'] = \
tuf.roledb.get_role_keyids(role['name'], repository_name)
role['threshold']= \
tuf.roledb.get_role_threshold(role['name'], repository_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coding style suggests:

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better.

Which suits my personal preference too.

Suggested change
role['keyids'] = \
tuf.roledb.get_role_keyids(role['name'], repository_name)
role['threshold']= \
tuf.roledb.get_role_threshold(role['name'], repository_name)
role['keyids'] = tuf.roledb.get_role_keyids(role['name'],
repository_name)
role['threshold']= tuf.roledb.get_role_threshold(role['name'],
repository_name)

if __name__ == '__main__':
# The interactive sessions of the documentation strings can
# be tested by running repository_tool.py as a standalone module:
# be tested by running repository_lib.py as a standalone module:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 387 to 390
targets_metadata = \
repo_lib.generate_targets_metadata(targets_directory, target_files,
version, expiration_date, delegations,
False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise you've just copied other examples in this file, but it's more inline with the coding style to do something like:

Suggested change
targets_metadata = \
repo_lib.generate_targets_metadata(targets_directory, target_files,
version, expiration_date, delegations,
False)
targets_metadata = repo_lib.generate_targets_metadata(targets_directory,
target_files, version, expiration_date, delegations, False)

@sechkova
Copy link
Contributor Author

sechkova commented Aug 19, 2020

Thanks @joshuagl, applied your code style suggestions and rebased. Let's see the tests :)

Edit: Two of the tests fail due to reasons unknown to me ... would need another look ... seems related to secure-systems-lab/securesystemslib#266

@jku
Copy link
Member

jku commented Aug 21, 2020

lint failure is #1115

Use _keys_to_keydict() for the  key dictionary generation in
generate_root_metadata().
Rename it as a public function keys_to_keydict().

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Collect keys and threshold of delegated roles and update
delegations in generate_targets_metadata in a similar manner
as generate_root_metadata() does for top-level roles.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Use the delegation graph traversal during load_repository()
to load delegated roles' 'keyids' and 'threshold' by reading it
from the delegating role metadata.

If more than one delegation to the same role exists, only the first
one is loaded in roledb for this role.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Tests logic is modified to accommodate for the update of the
delegations during generate_targets_metadata().

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add a new test case in test_generate_targets_metadata
to check if targets metadata is up-to-date with its delegated
roles.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Copy link
Contributor Author

Rebased on develop with the linter failure fix.

Comment on lines -1245 to +1244
for keyid in tuf.roledb.get_role_keyids(rolename, repository_name):
# Collect keys from all roles in a list
keyids = tuf.roledb.get_role_keyids(rolename, repository_name)
for keyid in keyids:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this change to assigning the return value of roledb.get_role_keyids seems superfluous, is keyids used anywhere other than the iterator here? Why not just keep for keyid in tuf.roledb....?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is used in https://github.com/theupdateframework/tuf/pull/1074/files#diff-9f585d5478a17bd9e0f6cd5f7484ab60L1284

It replaces the keyids list that was used to avoid duplicates which we already decided not necessary somewhere in the discussion above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes I see. Thanks!

@joshuagl joshuagl merged commit 7b7a14a into theupdateframework:develop Aug 26, 2020
@joshuagl
Copy link
Member

Good work, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants