
client refactor: simplify mirrors loop #1352

Merged
merged 4 commits into theupdateframework:experimental-client on Apr 28, 2021

Conversation

sechkova
Contributor

@sechkova sechkova commented Apr 15, 2021

Addresses: #1318, #1339

Description of the changes being introduced by the pull request:

This set of changes simplifies the code and makes it easier to follow. As a result, it:

  • Removes the mirror_*_download functions and moves the mirrors loop inside the Updater
  • Replaces file objects used as function arguments with the read file content for metadata files, which are fully read into memory right after download anyway. Temporary files are now handled in a single function.
  • Does some small code cleanup
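The second bullet can be sketched roughly like this (a minimal illustration under assumed names — download_metadata_bytes and the fetcher interface are made up, not the actual updater_rework API):

```python
import tempfile

# Hypothetical sketch: the temporary file lives entirely inside this
# function; callers receive plain bytes instead of an open file object.
def download_metadata_bytes(fetch, url, max_length):
    """Download metadata from url and return its full content as bytes."""
    with tempfile.TemporaryFile() as temp_obj:
        for chunk in fetch(url, max_length):
            temp_obj.write(chunk)
        temp_obj.seek(0)
        return temp_obj.read()

# Usage with a stub fetcher that yields response chunks:
def stub_fetch(url, max_length):
    yield b'{"signed": '
    yield b"{}}"

content = download_metadata_bytes(
    stub_fetch, "https://example.com/root.json", 1024
)
```

With the temp file confined to one function, the callers never have to worry about closing it.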

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@sechkova sechkova added the experimental-client Items related to the development of a new client (see milestone/8 and the experimental-client branch) label Apr 15, 2021
@sechkova sechkova added this to the Experimental client milestone Apr 15, 2021
@jku
Member

jku commented Apr 16, 2021

One of the reasons for using a file object (a streaming object, if you will) is that we have not always needed to read the file into memory. With metadata that's a little moot, as we do have to read it to verify and process it... but with targets, not loading the whole file is a very good idea. So I think the file object commit is not good. This also means the updater_rework _check_hashes() is wrong.

@jku
Member

jku commented Apr 16, 2021

This is not really a review (sorry) but maybe background for future discussion:

Moving the download loop back to the updater looks like the right move (in other words, my original idea was bad): I can understand this code :) . However, I'm not sure the idea of separate loops is better than what the current/old updater does. The download loop is not trivial (we've had multiple bugs with file objects etc.), so is multiplying it going to help?

I think the underlying idea with your design is that

  • in the current updater it is very hard to follow how, e.g., specific metadata gets verified (what actually gets checked before new root metadata is accepted)
  • this becomes easier to understand when the root download code is in its own function and not intermingled with, e.g., the snapshot download code

This is valid... but I think the cost of duplicating the download code is too high -- this is visible in root download, where you already had to refactor that one into another function. I think having a generic metadata download function is fine; we just have to figure out how to make it clear what is verified when. I'm hoping #1313 will help there -- my vision of what would happen is something like this:

  • root download loop starts going through versions N+1
    • it calls generic metadata download loop that start downloading files from mirrors
      • for each candidate file, it tries to create a new metadata object: this verifies the internal state
      • and tries to insert the object into the current valid metadata bundle (aka local repository controller) as new root: this does most(?) of the required tuf spec checks in the root update part
      • if anything fails we continue the loop, otherwise our metadata bundle is now using the new root and we can break the loop
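The envisioned flow might look roughly like this sketch (FakeBundle, update_root and DownloadError are assumed names standing in for the #1313 design, not existing tuf API):

```python
class DownloadError(Exception):
    pass

class FakeBundle:
    """Stand-in for the envisioned trusted metadata bundle."""
    def __init__(self):
        self.root_version = 1

    def update_root(self, data):
        # the real bundle would parse the bytes into a Metadata object
        # and run the spec-mandated root checks (signatures, version,
        # expiry) before accepting it
        self.root_version += 1

def update_root(bundle, download_metadata, max_root_rotations=32):
    """Walk root versions N+1, N+2, ... until no newer root is found."""
    lower = bundle.root_version + 1
    for version in range(lower, lower + max_root_rotations):
        try:
            data = download_metadata("root", version)
        except DownloadError:
            break  # no newer root available: rotation is finished
        bundle.update_root(data)

# Usage with a stub downloader serving two newer root versions:
versions = {2: b"root-v2", 3: b"root-v3"}

def stub_download(role, version):
    try:
        return versions[version]
    except KeyError:
        raise DownloadError(version)

bundle = FakeBundle()
update_root(bundle, stub_download)
```

The point of the sketch is the division of labor: the loop only drives version numbers, while the bundle decides whether each candidate is acceptable.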

Another observation:

  • the Updater.verify*() functions are named confusingly: they create a Metadata object, then they do additional verification and return the object. It's not a verifying method, it's a factory method to create a Metadata object that meets some criteria
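The naming point could be illustrated with a sketch (the classes and the load_timestamp name are invented for illustration): a factory-style name makes it clearer that the method builds and returns a verified object rather than merely checking one.

```python
# Minimal stand-ins for Metadata and a trusted verifier:
class Metadata:
    def __init__(self, data):
        self.data = data

    @classmethod
    def from_bytes(cls, data):
        if not data:
            raise ValueError("empty metadata")
        return cls(data)  # parsing verifies the internal state

class TrustedRoot:
    def verify(self, md):
        pass  # spec-mandated checks would happen here

# Factory-style: constructs a Metadata object that meets some criteria,
# rather than "verifying" an existing one.
def load_timestamp(data, trusted_root):
    md = Metadata.from_bytes(data)
    trusted_root.verify(md)
    return md

md = load_timestamp(b"payload", TrustedRoot())
```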

if self._metadata["root"].keys(
    "timestamp"
) != intermediate_root.keys("timestamp"):
    # FIXME: use abstract storage
Collaborator

Could we use TODO instead of FIXME?
I did a quick search in the tuf repo and found only one place in the code, besides this file, where we use those.
It seems we aren't using FIXME in our code, and if (hopefully) one day we decide to fix our TODOs, this one will be left out.

Member

@jku jku Apr 28, 2021


I don't really want to use abstract storage in the client (or make it a priority, even if Metadata supports it), but as this part of the code is hopefully going to move to the bundle anyway, I'm not looking at this sort of thing too closely now.

@MVrachev
Collaborator

This is valid... but I think the cost of duplicating the download code is too high -- this is visible in root download where you already had to refactor that one to another function. I think having a generic metadata download function is fine we just have to figure out how to make it clear what is verified when: I'm hoping #1313 will help there -- my vision of what would happen is something like this:

I was going to write the same thing and then I saw your comment.
I also feel like there is too much repetition and I hope we find another way to do this.
Will follow your Metadata bundle suggestion.

@sechkova sechkova force-pushed the mirrors-loop branch 3 times, most recently from 6561f7a to 660eb25 Compare April 20, 2021 11:06
@sechkova
Contributor Author

Agree that now that the mirror download code is moved back to the Updater, the code is more readable, but it also makes the repetition obvious. However, replacing it with a single function in the current state of the Updater code does not result in the desired clean implementation. I propose keeping these changes as a base to test the work proposed in #1313, see how it fits in, and at the same time reduce the code repetition.
I force-pushed, fixed a bug that @jku noticed in the mirrors loop, and also split the _check_hashes function in two: one correctly dealing with file objects for target file downloads and one using file content read into memory for metadata files.
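The split might look something like this sketch (the function names and the single-algorithm restriction are simplifications, not the real _check_hashes signatures):

```python
import hashlib
import io

def check_hashes_obj(file_obj, trusted_hash, chunk_size=8192):
    """Verify a file object in chunks, suitable for large target files."""
    digest = hashlib.sha256()
    file_obj.seek(0)
    for chunk in iter(lambda: file_obj.read(chunk_size), b""):
        digest.update(chunk)
    if digest.hexdigest() != trusted_hash:
        raise ValueError("observed hash != trusted hash")

def check_hashes_data(data, trusted_hash):
    """Verify metadata content that is already fully in memory."""
    if hashlib.sha256(data).hexdigest() != trusted_hash:
        raise ValueError("observed hash != trusted hash")

good = hashlib.sha256(b"hello").hexdigest()
check_hashes_data(b"hello", good)             # passes silently
check_hashes_obj(io.BytesIO(b"hello"), good)  # passes silently
```

The chunked variant never materializes the whole target in memory, which is exactly why targets keep using file objects while metadata moves to bytes.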

@sechkova sechkova changed the title WIP: client refactor: simplify mirrors loop client refactor: simplify mirrors loop Apr 20, 2021
@sechkova sechkova marked this pull request as ready for review April 20, 2021 11:54
Member

@jku jku left a comment


So my previous comment still stands:

I think the cost of duplicating the download code is too high

... but we have a plan:

After these steps we will be able to evaluate whether the branch has value or if we should just consider it lessons learned and start over...

So with this idea in mind I've reviewed again: I think I found one actual issue in target_download, and then some comments...

Comment on lines +167 to +174
if temp_obj:
    temp_obj.close()
Member

@jku jku Apr 26, 2021


This applies to every case with close(): the current client does temp_obj = None here (inside the if-clause), and I think it makes sense: otherwise temp_obj might still be defined on the next iteration even when it should not be, leading to us calling close() twice...

Calling close() twice should not break things, so this probably does not fail anything, but it looks fishy.

In fact, now that you've made the metadata handling bytes-based (and not file-object based), all of the metadata download loops could just use a download method that returns bytes -- but that does not need to happen in this PR. Maybe adding temp_obj = None is more straightforward for now?

Contributor Author


You're right, thanks for noticing. I added temp_obj = None for now, since changing the download function would mean a difference between metadata and targets. I'll keep this in mind.
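The resulting pattern can be sketched like this (fetch_from_mirrors and the mirror callables are illustrative stand-ins, not the actual Updater code):

```python
import io

def fetch_from_mirrors(mirrors):
    """Try each mirror until one succeeds, collecting per-mirror errors."""
    file_mirror_errors = {}
    temp_obj = None
    for mirror in mirrors:
        try:
            temp_obj = io.BytesIO(mirror())  # stand-in for the download
            return temp_obj
        except Exception as exception:
            file_mirror_errors[mirror] = exception
            if temp_obj:
                temp_obj.close()
                temp_obj = None  # so a later failure cannot close it twice
    raise RuntimeError(file_mirror_errors)

def broken_mirror():
    raise IOError("unreachable mirror")

def working_mirror():
    return b"target content"

obj = fetch_from_mirrors([broken_mirror, working_mirror])
```

Resetting temp_obj to None right after close() is the detail under discussion: without it, a later iteration's failure would try to close the already-closed object.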

except Exception as exception:
    if temp_obj:
        temp_obj.close()
    raise exceptions.NoWorkingMirrorError({file_mirror: exception})
Member


this should store the exception like the other functions, no?

Contributor Author


Yes, fixed, thanks

tuf/client_rework/updater_rework.py

# Raise an exception if any of the hashes are incorrect.
if trusted_hash != computed_hash:
    raise sslib_exceptions.BadHashError(trusted_hash, computed_hash)
Member


we have a BadHashError of our own and should not raise sslib errors
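The fix boils down to raising tuf's own exception type at this boundary. A generic sketch of the pattern (BadHashError here is a locally defined stand-in, not an import of the real tuf.exceptions class):

```python
class BadHashError(Exception):
    """tuf-style exception carrying the expected and observed digests."""
    def __init__(self, expected_hash, observed_hash):
        super().__init__(
            f"expected hash {expected_hash}, observed {observed_hash}"
        )
        self.expected_hash = expected_hash
        self.observed_hash = observed_hash

def ensure_hash(trusted_hash, computed_hash):
    # Raise our own exception type instead of securesystemslib's, so
    # client code only ever catches the project's exception hierarchy.
    if trusted_hash != computed_hash:
        raise BadHashError(trusted_hash, computed_hash)
```

Keeping library-internal exceptions from leaking into the public API means callers do not need to depend on securesystemslib directly.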

@sechkova
Contributor Author

Oops, my "fixup" commit is not signed off... but I'll want to rebase and squash this one after your review anyway.
On the other hand, do I add the linter fix by running the latest black version in this PR?

@jku
Member

jku commented Apr 28, 2021

Looks good enough to me -- let's get this merged so we can see if streamlining mirrors helps...

I'll merge develop into experimental-client now (#1365): After that if you'd like to rebase this on experimental-client (to get the linter fix) and also rebase --autosquash (to squash the fixup commit), that works for me.

The mirror_target_download and mirror_meta_download functions that
return iterators are causing more trouble than good.

Since there is a need to download and verify each file inside
the mirrors loop (before continuing to the next mirror), it feels
like the correct component to do this is the Updater itself.
This means slightly repetitive code inside the Updater class,
but with the benefit of better readability and a less bug-prone
implementation.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
The current implementation does not bring much benefit and
can be replaced with a simple f-string.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
By replacing file objects passed as function arguments with
the read file content, we simplify the life-cycle management
of temporary file objects. Temporary files are handled in a
single function. This is done for metadata files, which are
fully read into memory right after download anyway.

The same is not true for target files, which should preferably
be processed in chunks, so target download and verification
still deal with file objects. _check_hashes is split in two
functions: one dealing correctly with file objects and one
using the file content directly.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Replace sslib exceptions raised in the Updater with the
corresponding ones defined in tuf.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
@sechkova
Contributor Author

Thanks, it should be ready now.

@jku jku merged commit 79aa42a into theupdateframework:experimental-client Apr 28, 2021