Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049
Conversation
This implements what we discussed, but I would need some time and more evidence to move in that direction.
- The new event loop of the `SoftwareManager` actor is more complex than I would expect. By that I mean "dealing with things unrelated to the main task". Could this complexity be moved to the place where it belongs, i.e. the runtime?
- If an actor event loop is blindly aborted on `^C` without giving the actor a chance to finalize some critical section, what's the point of all these shutdown signals?
// IMO the best thing would be to make `run` take ownership of `self`
// instead of borrowing, this way we could freely move fields out of the
// struct
I fully agree with that, and this was the first design. There is no reason to run an actor twice.
This was changed by #1878, where the need was to make the actors object safe.
However, this PR and the idea to run the main actor event loop in the background, ready to be aborted on shutdown, make me wonder if this can be generalized, i.e. to let the runtime manage shutdown this way for all actors. In that case passing a `&mut self` to the `run` method and then to a `shutdown` method might help.
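One way to reconcile `run` taking ownership with the object-safety requirement from #1878 is a `self: Box<Self>` receiver, which consumes the actor (so fields can be moved out) while keeping the trait object safe. A minimal hypothetical sketch, not the actual tedge trait:

```rust
// Hypothetical sketch: `run` consumes the actor via `Box<Self>`, so fields
// can be moved out, while the trait remains object safe (the receiver is
// still dispatchable through `dyn Actor`).
trait Actor {
    fn run(self: Box<Self>) -> String;
}

struct SoftwareManager {
    name: String,
}

impl Actor for SoftwareManager {
    fn run(self: Box<Self>) -> String {
        // `self` is owned here, so we can freely move fields out.
        self.name
    }
}

fn main() {
    // Dynamic dispatch still works: the trait is object safe.
    let actor: Box<dyn Actor> = Box::new(SoftwareManager { name: "sm".into() });
    assert_eq!(actor.run(), "sm");
}
```

This avoids the `&mut self` workaround entirely while still allowing the runtime to hold actors as trait objects.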
// If the function processing a single SoftwareRequest were
// cancel-aware, we could wait for it to return an error response
// via the channel. As of now, it wouldn't really change anything,
// because most likely the source of the SoftwareRequest has already
// terminated as well, so for now we just terminate.
What do you mean by cancel-aware?
If there was a mechanism for the task to know whether it should cancel, e.g. because the actor needs to be shut down. This could be achieved by, as you have proposed below, using a cancellation token, or perhaps even a plain oneshot channel.
The tricky part may be deciding how deep we want to pass the cancellation token. E.g. in the case of the Software Manager actor, we might want to pass the cancellation token to the function that handles a single `SoftwareRequest`, so that it can send a proper response when it's cancelled.
However, for an operation like download, on cancellation, what should we do with the partially downloaded file? One option is for the downloader to assume that the caller can't do anything with an aborted file, and delete it on abort. But maybe that's not the case and it would be better for the caller to decide what to do with the file. If so, there is no need for the download to be cancel-aware, because there's no cleanup to run.
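By "cancel-aware" I mean something like the following std-only sketch (names and structure are hypothetical; in the async actors this would more likely be tokio's CancellationToken rather than an atomic flag): the handler checks a cancellation signal between units of work and reports how much partial work was done, leaving the cleanup decision to the caller.

```rust
// Hypothetical sketch of a "cancel-aware" request handler: it checks a
// cancellation flag between units of work and returns early when set,
// telling the caller how much partial work (e.g. downloaded chunks)
// exists so the caller can decide whether to keep or delete it.
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    Cancelled { chunks_written: usize },
}

fn handle_request(total_chunks: usize, cancel: &AtomicBool) -> Outcome {
    for chunk in 0..total_chunks {
        if cancel.load(Ordering::Relaxed) {
            // Stop early; report how much partial work was done.
            return Outcome::Cancelled { chunks_written: chunk };
        }
        // ... write one chunk of the download here ...
    }
    Outcome::Done
}

fn main() {
    let cancel = AtomicBool::new(false);
    // Without cancellation, the request runs to completion.
    assert_eq!(handle_request(3, &cancel), Outcome::Done);
    // With the signal set, it stops before doing any work.
    cancel.store(true, Ordering::Relaxed);
    assert_eq!(
        handle_request(3, &cancel),
        Outcome::Cancelled { chunks_written: 0 }
    );
}
```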
> If there was a mechanism for the task to know whether it should cancel, e.g. because the actor needs to be shut down. This could be achieved by, as you have proposed below, using a cancellation token, or perhaps even a plain oneshot channel.
For a centralized shutdown, using a cancellation token might be more appropriate, as the runtime will not have to maintain a channel per actor.
> The tricky part may be deciding how deep we want to pass the cancellation token. E.g. in the case of the Software Manager actor, we might want to pass the cancellation token to the function that handles a single `SoftwareRequest`, so that it can send a proper response when it's cancelled.
If all the actors are each given a cancellation token (when built or along with the `run()` method), then each actor can use this token (or a clone) at different internal levels. But I would keep things simple here, with cancellation of a whole software operation request.
> However, for an operation like download, on cancellation, what should we do with the partially downloaded file? One option is for the downloader to assume that the caller can't do anything with an aborted file, and delete it on abort. But maybe that's not the case and it would be better for the caller to decide what to do with the file. If so, there is no need for the download to be cancel-aware, because there's no cleanup to run.
We can live with the second option till we have an improved cancellation story. In any case, both the downloader and the caller must be prepared that some garbage has been left by a previously cancelled request.
One option is to completely revise the shutdown mechanism. Instead of using
Cancelling any subtasks spawned by the actors can be done under the current approach. The actor could simply create a oneshot channel, or even use a
I'm not sure what the value is of making message boxes run this cancel logic. That would e.g. make it harder for an actor to ignore the shutdown request and process messages that have already been sent. I understand that we want to force the actors to do the right thing, and that's why we're making all these assumptions, but I wonder if we're not making this overly complex.
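To illustrate the point about an actor choosing to process messages that were already sent: a std-only sketch (all names hypothetical; the real actors use async message boxes) of an event loop that honours a shutdown request but first drains what was already queued, rather than being aborted blindly.

```rust
// Hypothetical sketch: an actor loop that, on receiving a shutdown
// message, drains whatever work was already queued before stopping,
// instead of being aborted mid-queue by the runtime.
use std::sync::mpsc;

enum Msg {
    Work(u32),
    Shutdown,
}

// Returns the work items actually processed before exiting.
fn actor_loop(rx: mpsc::Receiver<Msg>) -> Vec<u32> {
    let mut processed = Vec::new();
    while let Ok(msg) = rx.recv() {
        match msg {
            Msg::Work(n) => processed.push(n),
            Msg::Shutdown => {
                // Drain messages that were already sent, then stop.
                while let Ok(Msg::Work(n)) = rx.try_recv() {
                    processed.push(n);
                }
                break;
            }
        }
    }
    processed
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Msg::Work(1)).unwrap();
    tx.send(Msg::Shutdown).unwrap();
    tx.send(Msg::Work(2)).unwrap(); // already queued before the loop runs
    assert_eq!(actor_loop(rx), vec![1, 2]);
}
```

Whether draining is the right policy is exactly the design question here; the sketch only shows that the actor, not the message box, can make that choice.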
Robot Results
Passed Tests
force-pushed from 5dfc26c to aaba38e
Codecov Report
This is indeed not as complex as we thought at first.
${OPERATION}=    Install Software    test-very-large-software,1.0,https://t493319102.eu-latest.cumulocity.com/inventory/binaries/28057693
# waiting for the download to start (so, for "Downloading: ...") to appear
# in the log, but I have no clue how to do "wait until log contains ..."
One possibility is to use the keyword "Wait Until Keyword Succeeds"; here is an example:

Wait Until Log Contains
    [Arguments]    ${expected_message}    ${timeout}
    Wait Until Keyword Succeeds    ${timeout}    2s
    ...    Run Keyword And Return Status    Should Log Contain    ${expected_message}

Should Log Contain
    [Arguments]    ${expected_message}
    ${logs}    Get Logs
    Log Many    ${logs}    level=INFO    console=yes
    Log Many    ${logs}    level=DEBUG    console=yes
    Should Contain    ${logs}    ${expected_message}
Ping me, we can try it together if you want.
I think it will be better to write a keyword in Python to facilitate this, as I want to avoid doing too much programming in the RobotFramework language if we can help it. And I don't think this will be the last time we have this use-case.
So in summary, I can write a new assertion (which will be exposed as a RobotFramework keyword).
I've pushed some updates to the test to add a new keyword to check the presence of log entries (and dynamically wait for them). Just waiting for the CI to run to confirm, however I ran the test 50 times locally and no signs of flakiness.
Approved
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
SoftwareManager actor used in tedge-agent was made to exit immediately when it receives a shutdown request from the runtime. This was achieved by splitting up the message box and concurrently waiting for either completing the request or for Runtime shutdown request. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
The backoff timing was tweaked in order to more consistently generate increasing backoff times. The time we wait to retry the request is also logged, along with the reason why the backoff was triggered. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
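The backoff behaviour this commit describes (consistently increasing, capped, with the delay logged) could be sketched roughly like this (illustrative only, not the actual tedge implementation):

```rust
// Illustrative sketch of a backoff that grows consistently with the
// attempt number and is capped, so the delay logged alongside the retry
// reason is monotonically non-decreasing. Constants are hypothetical.
use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 500;
    const MAX_MS: u64 = 30_000;
    // 500ms, 1s, 2s, 4s, ... capped at 30s.
    let ms = BASE_MS.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(ms.min(MAX_MS))
}

fn main() {
    assert_eq!(backoff_delay(0), Duration::from_millis(500));
    assert_eq!(backoff_delay(1), Duration::from_millis(1000));
    assert_eq!(backoff_delay(7), Duration::from_millis(30_000)); // capped
    // At each retry one would log both the delay and the reason, e.g.:
    // "request failed ({reason}), retrying in {delay:?}"
}
```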
Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>
…ilable url Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>
Confirm Approval
QA has thoroughly checked the feature and here are the results:
Proposed changes
This PR is a follow-up to #1966, improving on some things that came up during the partial download demo.
Types of changes
Paste Link to the issue
Checklist
- cargo fmt as mentioned in CODING_GUIDELINES
- cargo clippy as mentioned in CODING_GUIDELINES

Further comments