Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049

Bravo555 · 2023-06-26T16:49:42Z

Proposed changes

This PR is a follow-up to #1966, improving on some things that came up during the partial download demo.

added more logging statements to keep track of what the downloader is doing
fixed mistakenly lowered backoff delay and added a method to customize the backoff for testing
tweaked error handling in plugin_sm to print error causes
made SoftwareManager immediately process shutdown requests, even if an operation is in progress.

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

didier-wenzek

This implements what we discussed, but I would need some time and more evidence to move in that direction.

The new event loop of the SoftwareManager actor is more complex than I would expect. I mean by that "dealing with things unrelated to the main task".
Could this complexity moved to the place where it should be, i.e. the runtime?
If an actor event loop is blindly aborted on ^C without giving the actor a chance to finalize some critical section, what's the point of all these shutdown signals?

didier-wenzek · 2023-06-27T07:55:00Z

crates/core/tedge_agent/src/software_manager/actor.rs

+    // IMO the best thing would be to make `run` take ownership of `self`
+    // instead of borrowing, this way we could freely move fields out of the
+    // struct


I fully agree with that and this was the first design. There is no reason to run an actor twice.

This has been changed by #1878, where the need was to have the actors object safe.

However, this PR and the idea to run the main actor event loop in the background ready to be aborted on shutdown make me wonder if this can be generalized .i.e. to let the runtime manage shutdown this way for all actors. In that case passing a &mut self to the run method then to a shutdown method might help.

didier-wenzek · 2023-06-27T07:56:33Z

crates/core/tedge_agent/src/software_manager/actor.rs

+            // If the function processing a single SoftwareRequest were
+            // cancel-aware, we could wait for it to return an error response
+            // via the channel. As of now, it wouldn't really change anything,
+            // because most likely the source of the SoftwareRequest has already
+            // terminated as well, so for now we just terminate.


What do you mean by cancel-aware?

If there was a mechanism for the task to know if it should cancel, e.g. because the actor needs to be shut down. This could be achieved by, as you have proposed below, using a cancellation token, or even perhaps a plain oneshot channel.

The tricky part may be to decide how deep do we want to pass the cancellation token. E.g. in the case of Software Manager Actor, we might want to pass the cancellation token to a function that handles a single SoftwareRequest, so that it can send a proper response when it's cancelled.

However for an operation like download, on cancellation, what should we do with the partially downloaded file? One option is for the downloader to assume that the caller can't do anything with an aborted file, and delete it on abort. But maybe that's not the case and it'd be better for the caller to decide what to do with the file. If so, there is no need for the download to be cancel-aware, because there's no cleanup to run.

If there was a mechanism for the task to know if it should cancel, e.g. because the actor needs to be shut down. This could be achieved by, as you have proposed below, using a cancellation token, or even perhaps a plain oneshot channel.

For a centralized shutdown using cancellation token might be more appropriate as the runtime will not have to maintain a channel per actor.

The tricky part may be to decide how deep do we want to pass the cancellation token. E.g. in the case of Software Manager Actor, we might want to pass the cancellation token to a function that handles a single SoftwareRequest, so that it can send a proper response when it's cancelled.

If all the actors are given each a cancellation token (when built or along the run() method), then each actor can uses this token (or a clone) at different internal level. But I would keep things simple here with a whole software operation request cancellation.

However for an operation like download, on cancellation, what should we do with the partially downloaded file? One option is for the downloader to assume that the caller can't do anything with an aborted file, and delete it on abort. But maybe that's not the case and it'd be better for the caller to decide what to do with the file. If so, there is no need for the download to be cancel-aware, because there's no cleanup to run.

We can live with the second option till we have an improved cancellation story. In any case, both the downloader and the caller must be prepared that some garbage has been left by a previously cancelled request.

didier-wenzek · 2023-06-27T11:09:10Z

One option is to completely revise the shutdown mechanism.

Instead of using Shutdown messages and Receivers owned by the actors, an alternative is to use CancellationToken. These tokens can be freely cloned they can be checked both by the message boxes (to get the next messages unless a shutdown is requested) and the body of the event loops (to interrupt any long-running task on shutdown). This would also help to solve an issue we have with ClientMessageBox that ignores shutdown requests while awaiting a response.

Bravo555 · 2023-06-28T12:07:43Z

Instead of using Shutdown messages and Receivers owned by the actors, an alternative is to use CancellationToken.

Cancelling any subtasks spawned by the actors can be done under the current approach. The actor could simply create a oneshot channel, or even use a CancellationToken itself, and pass the receiver into the spawned task. Then if it were to receive a Shutdown message, it could send a value through the channel closing it/cancel the token, and await on the sub-task to exit, perhaps with some timeout. But as I understand you seek a more "one size fits all" approach, to abstract this behaviour in a way that does not require us to explicitly define all this in each actor individually?

These tokens can be freely cloned they can be checked both by the message boxes (to get the next messages unless a shutdown is requested)

I'm not sure what's the value of making message boxes run this cancel logic. That'd e.g. make it harder for an actor to ignore the shutdown request and process messages that have already been sent. I understand that we want to force the actors to do the right thing, and that's why we're making all these assumptions, but I wonder if we're not making this overly complex.

github-actions · 2023-06-28T12:35:54Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %
239	0	5	239	100

Passed Tests

Name	⏱️ Duration	Suite
Define Child device 1 ID	0.014 s	`C8Y Child Alarms Rpi`
Normal case when the child device does not exist on c8y cloud	2.292 s	`C8Y Child Alarms Rpi`
Normal case when the child device already exists	0.8 s	`C8Y Child Alarms Rpi`
Reconciliation when the new alarm message arrives, restart the mapper	1.209 s	`C8Y Child Alarms Rpi`
Reconciliation when the alarm that is cleared	65.363 s	`C8Y Child Alarms Rpi`
Prerequisite Parent	13.984 s	`Child Conf Mgmt Plugin`
Prerequisite Child	0.161 s	`Child Conf Mgmt Plugin`
Child device bootstrapping	13.863 s	`Child Conf Mgmt Plugin`
Snapshot from device	62.264 s	`Child Conf Mgmt Plugin`
Child device config update	61.605 s	`Child Conf Mgmt Plugin`
Configuration types should be detected on file change (without restarting service)	49.207 s	`Inotify Crate`
Check lock file existence in default folder	1.6320000000000001 s	`Lock File`
Check PID number in lock file	1.374 s	`Lock File`
Check PID number in lock file after restarting the services	1.3780000000000001 s	`Lock File`
Check starting same service twice	1.135 s	`Lock File`
Switch off lock file creation	2.624 s	`Lock File`
Set configuration when file exists	6.845 s	`Configuration Operation`
Set configuration when file does not exist	6.086 s	`Configuration Operation`
Set configuration with broken url	5.258 s	`Configuration Operation`
Get configuration	5.138 s	`Configuration Operation`
Get non existent configuration file	4.848 s	`Configuration Operation`
Get non existent configuration type	4.757 s	`Configuration Operation`
Update configuration plugin config via cloud	4.75 s	`Configuration Operation`
Modify configuration plugin config via local filesystem modify inplace	4.609 s	`Configuration Operation`
Modify configuration plugin config via local filesystem overwrite	3.3970000000000002 s	`Configuration Operation`
Update configuration plugin config via local filesystem copy	3.141 s	`Configuration Operation`
Update configuration plugin config via local filesystem move (different directory)	3.255 s	`Configuration Operation`
Update configuration plugin config via local filesystem move (same directory)	2.735 s	`Configuration Operation`
Update the custom operation dynamically	52.75 s	`Dynamically Reload Operation`
Custom operation successful	123.892 s	`Custom Operation`
Custom operation fails	125.949 s	`Custom Operation`
Successful firmware operation	62.425 s	`Firmware Operation`
Install with empty firmware name	39.563 s	`Firmware Operation`
Prerequisite Parent	15.794 s	`Firmware Operation Child Device`
Prerequisite Child	8.195 s	`Firmware Operation Child Device`
Child device firmware update	6.276 s	`Firmware Operation Child Device`
Child device firmware update with cache	5.997 s	`Firmware Operation Child Device`
Firmware plugin supports restart via service manager #1932	5.416 s	`Firmware Operation Child Device Retry`
Update Inventory data via inventory.json	1.734 s	`Inventory Update`
Inventory includes the agent fragment with version information	1.168 s	`Inventory Update`
Retrieve a JWT tokens	35.118 s	`Jwt Request`
Mapper recovers and processes output of ongoing software update request	16.065 s	`Recover And Publish Software Update Message`
Check running collectd	1.079 s	`Monitor Device Collectd`
Is collectd publishing MQTT messages?	2.907 s	`Monitor Device Collectd`
Check thin-edge monitoring	3.447 s	`Monitor Device Collectd`
Check grouping of measurements	8.872 s	`Monitor Device Collectd`
Main device registration	1.808 s	`Device Registration`
Child device registration	2.073 s	`Device Registration`
Supports restarting the device	49.258 s	`Restart Device`
Update tedge version from previous using Cumulocity	69.948 s	`Tedge Self Update`
Test if all c8y services are up	95.922 s	`Service Monitoring`
Test if all c8y services are down	107.449 s	`Service Monitoring`
Test if all c8y services are using configured service type	167.718 s	`Service Monitoring`
Test if all c8y services using default service type when service type configured as empty	147.06 s	`Service Monitoring`
Check health status of tedge-mapper-c8y service on broker stop start	21.77 s	`Service Monitoring`
Check health status of tedge-mapper-c8y service on broker restart	83.367 s	`Service Monitoring`
Check health status of child device service	18.333 s	`Service Monitoring`
Successful shell command with output	3.868 s	`Shell Operation`
Check Successful shell command with literal double quotes output	3.298 s	`Shell Operation`
Execute multiline shell command	3.333 s	`Shell Operation`
Failed shell command	3.291 s	`Shell Operation`
Software list should be populated during startup	32.416 s	`Software`
Install software via Cumulocity	39.667 s	`Software`
tedge-agent should terminate on SIGINT while downloading file	33.878 s	`Software`
Software list should only show currently installed software and not candidates	32.697 s	`Software`
Create and publish the tedge agent supported operations on mapper restart	35.514 s	`Mapper-Publishing-Agent-Supported-Ops`
Agent gets the software list request once it comes up	25.457 s	`Mapper-Publishing-Agent-Supported-Ops`
Child devices support sending simple measurements	1.883 s	`Child Device Telemetry`
Child devices support sending custom measurements	1.191 s	`Child Device Telemetry`
Child devices support sending custom events	1.263 s	`Child Device Telemetry`
Child devices support sending custom events overriding the type	1.226 s	`Child Device Telemetry`
Child devices support sending custom alarms #1699	3.2800000000000002 s	`Child Device Telemetry`
Child devices support sending inventory data via c8y topic	0.966 s	`Child Device Telemetry`
Child device supports sending custom child device measurements directly to c8y	1.529 s	`Child Device Telemetry`
Check retained alarms	98.868 s	`Raise Alarms`
Thin-edge devices support sending simple measurements	1.471 s	`Thin-Edge Device Telemetry`
Thin-edge devices support sending simple measurements with custom type	3.292 s	`Thin-Edge Device Telemetry`
Thin-edge devices support sending custom measurements	1.007 s	`Thin-Edge Device Telemetry`
Thin-edge devices support sending custom events	1.061 s	`Thin-Edge Device Telemetry`
Thin-edge devices support sending custom events overriding the type	1.303 s	`Thin-Edge Device Telemetry`
Thin-edge devices support sending custom alarms #1699	1.189 s	`Thin-Edge Device Telemetry`
Thin-edge device supports sending custom Thin-edge device measurements directly to c8y	1.476 s	`Thin-Edge Device Telemetry`
Thin-edge device support sending inventory data via c8y topic	1.367 s	`Thin-Edge Device Telemetry`
thin-edge components support a custom config-dir location via flags	26.176 s	`Config Dir`
Validate updated data path used by tedge-agent	0.343 s	`Data Path Config`
Validate updated data path used by c8y-firmware-plugin	10.702 s	`Data Path Config`
Validate updated data path used by tedge-agent	0.501 s	`Log Path Config`
Check existence of init directories	0.813 s	`Tedge Init`
Tedge init and check creation of folders	0.986 s	`Tedge Init`
Check ownership of the folders	0.943 s	`Tedge Init`
Change user/group and check the change	1.23 s	`Tedge Init`
Tedge init and check if default values are restored	1.254 s	`Tedge Init`
Install thin-edge via apt	36.976 s	`Install Apt`
Install latest via script (from current branch)	28.231 s	`Install Tedge`
Install specific version via script (from current branch)	22.339 s	`Install Tedge`
Install latest tedge via script (from main branch)	21.7 s	`Install Tedge`
Install then uninstall latest tedge via script (from main branch)	60.926 s	`Install Tedge`
Support starting and stopping services	34.522 s	`Service-Control`
Supports a reconnect	52.547 s	`Test-Commands`
Supports disconnect then connect	34.732 s	`Test-Commands`
Update unknown setting	36.42 s	`Test-Commands`
Update known setting	34.336 s	`Test-Commands`
It checks MQTT messages using a pattern	61.128 s	`Test-Mqtt`
Stop c8y-configuration-plugin	0.117 s	`Health C8Y-Configuration-Plugin`
Update the service file	0.157 s	`Health C8Y-Configuration-Plugin`
Reload systemd files	0.366 s	`Health C8Y-Configuration-Plugin`
Start c8y-configuration-plugin	0.086 s	`Health C8Y-Configuration-Plugin`
Start watchdog service	10.108 s	`Health C8Y-Configuration-Plugin`
Check PID of c8y-configuration-plugin	0.104 s	`Health C8Y-Configuration-Plugin`
Kill the PID	0.202 s	`Health C8Y-Configuration-Plugin`
Recheck PID of c8y-configuration-plugin	6.423 s	`Health C8Y-Configuration-Plugin`
Compare PID change	0.001 s	`Health C8Y-Configuration-Plugin`
Stop watchdog service	0.142 s	`Health C8Y-Configuration-Plugin`
Remove entry from service file	0.127 s	`Health C8Y-Configuration-Plugin`
Stop c8y-log-plugin	0.099 s	`Health C8Y-Log-Plugin`
Update the service file	0.118 s	`Health C8Y-Log-Plugin`
Reload systemd files	0.424 s	`Health C8Y-Log-Plugin`
Start c8y-log-plugin	0.299 s	`Health C8Y-Log-Plugin`
Start watchdog service	10.211 s	`Health C8Y-Log-Plugin`
Check PID of c8y-log-plugin	0.076 s	`Health C8Y-Log-Plugin`
Kill the PID	0.255 s	`Health C8Y-Log-Plugin`
Recheck PID of c8y-log-plugin	6.534 s	`Health C8Y-Log-Plugin`
Compare PID change	0.016 s	`Health C8Y-Log-Plugin`
Stop watchdog service	0.274 s	`Health C8Y-Log-Plugin`
Remove entry from service file	0.153 s	`Health C8Y-Log-Plugin`
Stop tedge-mapper	0.253 s	`Health Tedge Mapper C8Y`
Update the service file	0.112 s	`Health Tedge Mapper C8Y`
Reload systemd files	0.465 s	`Health Tedge Mapper C8Y`
Start tedge-mapper	0.153 s	`Health Tedge Mapper C8Y`
Start watchdog service	10.219 s	`Health Tedge Mapper C8Y`
Check PID of tedge-mapper	0.09 s	`Health Tedge Mapper C8Y`
Kill the PID	0.172 s	`Health Tedge Mapper C8Y`
Recheck PID of tedge-mapper	6.299 s	`Health Tedge Mapper C8Y`
Compare PID change	0.001 s	`Health Tedge Mapper C8Y`
Stop watchdog service	0.188 s	`Health Tedge Mapper C8Y`
Remove entry from service file	0.109 s	`Health Tedge Mapper C8Y`
Stop tedge-agent	0.077 s	`Health Tedge-Agent`
Update the service file	0.107 s	`Health Tedge-Agent`
Reload systemd files	0.277 s	`Health Tedge-Agent`
Start tedge-agent	0.087 s	`Health Tedge-Agent`
Start watchdog service	10.231 s	`Health Tedge-Agent`
Check PID of tedge-mapper	0.162 s	`Health Tedge-Agent`
Kill the PID	0.434 s	`Health Tedge-Agent`
Recheck PID of tedge-agent	6.5 s	`Health Tedge-Agent`
Compare PID change	0 s	`Health Tedge-Agent`
Stop watchdog service	0.132 s	`Health Tedge-Agent`
Remove entry from service file	0.111 s	`Health Tedge-Agent`
Stop tedge-mapper-az	0.151 s	`Health Tedge-Mapper-Az`
Update the service file	0.064 s	`Health Tedge-Mapper-Az`
Reload systemd files	0.321 s	`Health Tedge-Mapper-Az`
Start tedge-mapper-az	0.189 s	`Health Tedge-Mapper-Az`
Start watchdog service	10.248 s	`Health Tedge-Mapper-Az`
Check PID of tedge-mapper-az	0.113 s	`Health Tedge-Mapper-Az`
Kill the PID	0.172 s	`Health Tedge-Mapper-Az`
Recheck PID of tedge-agent	6.432 s	`Health Tedge-Mapper-Az`
Compare PID change	0.001 s	`Health Tedge-Mapper-Az`
Stop watchdog service	0.113 s	`Health Tedge-Mapper-Az`
Remove entry from service file	0.139 s	`Health Tedge-Mapper-Az`
Stop tedge-mapper-collectd	0.122 s	`Health Tedge-Mapper-Collectd`
Update the service file	0.133 s	`Health Tedge-Mapper-Collectd`
Reload systemd files	0.426 s	`Health Tedge-Mapper-Collectd`
Start tedge-mapper-collectd	0.099 s	`Health Tedge-Mapper-Collectd`
Start watchdog service	10.141 s	`Health Tedge-Mapper-Collectd`
Check PID of tedge-mapper-collectd	0.219 s	`Health Tedge-Mapper-Collectd`
Kill the PID	0.25 s	`Health Tedge-Mapper-Collectd`
Recheck PID of tedge-mapper-collectd	6.646 s	`Health Tedge-Mapper-Collectd`
Compare PID change	0.001 s	`Health Tedge-Mapper-Collectd`
Stop watchdog service	0.111 s	`Health Tedge-Mapper-Collectd`
Remove entry from service file	0.108 s	`Health Tedge-Mapper-Collectd`
tedge-collectd-mapper health status	5.651 s	`Health Tedge-Mapper-Collectd`
c8y-log-plugin health status	6.052 s	`MQTT health endpoints`
c8y-configuration-plugin health status	5.6530000000000005 s	`MQTT health endpoints`
Publish on a local insecure broker	0.39 s	`Basic Pub Sub`
Publish on a local secure broker	3.087 s	`Basic Pub Sub`
Publish on a local secure broker with client authentication	2.6959999999999997 s	`Basic Pub Sub`
Publish events to subscribed topic	0.53 s	`Custom Sub Topics Tedge-Mapper-Aws`
Publish measurements to unsubscribed topic	5.37 s	`Custom Sub Topics Tedge-Mapper-Aws`
Publish measurements to subscribed topic	0.315 s	`Custom Sub Topics Tedge-Mapper-Az`
Publish measurements to unsubscribed topic	5.375 s	`Custom Sub Topics Tedge-Mapper-Az`
Publish events to subscribed topic	0.331 s	`Custom Sub Topics Tedge-Mapper-C8Y`
Publish measurements to unsubscribed topic	5.269 s	`Custom Sub Topics Tedge-Mapper-C8Y`
Check remote mqtt broker #1773	2.998 s	`Remote Mqtt Broker`
Apply name filter	0.173 s	`Filter Packages List Output`
Apply maintainer filter	0.157 s	`Filter Packages List Output`
Apply both filters	0.13 s	`Filter Packages List Output`
No filters	0.13 s	`Filter Packages List Output`
Both filters but name filter as empty string	0.142 s	`Filter Packages List Output`
Both filters but maintainer filter as empty string	0.158 s	`Filter Packages List Output`
Both filters as empty string	0.108 s	`Filter Packages List Output`
Wrong package name	0.121 s	`Improve Tedge Apt Plugin Error Messages`
Wrong version	0.191 s	`Improve Tedge Apt Plugin Error Messages`
Wrong type	0.274 s	`Improve Tedge Apt Plugin Error Messages`
tedge_connect_test_positive	0.173 s	`Tedge Connect Test`
tedge_connect_test_negative	60.371 s	`Tedge Connect Test`
tedge_connect_test_sm_services	6.71 s	`Tedge Connect Test`
tedge_disconnect_test_sm_services	60.278 s	`Tedge Connect Test`
Install thin-edge.io	21.483 s	`Call Tedge`
call tedge -V	0.071 s	`Call Tedge`
call tedge -h	0.08 s	`Call Tedge`
call tedge -h -V	0.107 s	`Call Tedge`
call tedge help	0.084 s	`Call Tedge`
tedge config list	0.061 s	`Call Tedge Config List`
tedge config list --all	0.076 s	`Call Tedge Config List`
set/unset device.type	0.299 s	`Call Tedge Config List`
set/unset device.key_path	0.299 s	`Call Tedge Config List`
set/unset device.cert_path	0.339 s	`Call Tedge Config List`
set/unset c8y.root_cert_path	0.429 s	`Call Tedge Config List`
set/unset c8y.smartrest.templates	0.433 s	`Call Tedge Config List`
set/unset c8y.topics	0.359 s	`Call Tedge Config List`
set/unset az.root_cert_path	0.557 s	`Call Tedge Config List`
set/unset az.topics	0.811 s	`Call Tedge Config List`
set/unset aws.topics	0.455 s	`Call Tedge Config List`
set/unset aws.url	0.311 s	`Call Tedge Config List`
set/unset aws.root_cert_path	0.278 s	`Call Tedge Config List`
set/unset aws.mapper.timestamp	0.311 s	`Call Tedge Config List`
set/unset az.mapper.timestamp	0.229 s	`Call Tedge Config List`
set/unset mqtt.bind.address	0.229 s	`Call Tedge Config List`
set/unset mqtt.bind.port	0.242 s	`Call Tedge Config List`
set/unset http.bind.port	0.312 s	`Call Tedge Config List`
set/unset tmp.path	0.405 s	`Call Tedge Config List`
set/unset logs.path	0.227 s	`Call Tedge Config List`
set/unset run.path	0.22 s	`Call Tedge Config List`
set/unset firmware.child.update.timeout	0.224 s	`Call Tedge Config List`
set/unset c8y.url	0.242 s	`Call Tedge Config List`
set/unset az.url	0.2 s	`Call Tedge Config List`
set/unset mqtt.external.bind.port	0.22 s	`Call Tedge Config List`
mqtt.external.bind.address	0.225 s	`Call Tedge Config List`
mqtt.external.bind.interface	0.219 s	`Call Tedge Config List`
set/unset mqtt.external.ca_path	0.226 s	`Call Tedge Config List`
set/unset mqtt.external.cert_file	0.236 s	`Call Tedge Config List`
set/unset mqtt.external.key_file	0.23 s	`Call Tedge Config List`
set/unset software.plugin.default	0.214 s	`Call Tedge Config List`
Get Put Delete	1.796 s	`Http File Transfer Api`
Set keys should return value on stdout	0.11 s	`Tedge Config Get`
Unset keys should not return anything on stdout and warnings on stderr	0.242 s	`Tedge Config Get`
Invalid keys should not return anything on stdout and warnings on stderr	0.248 s	`Tedge Config Get`
Set configuration via environment variables	1.057 s	`Tedge Config Get`
Set configuration via environment variables for topics	0.314 s	`Tedge Config Get`
Set unknown configuration via environment variables	0.185 s	`Tedge Config Get`

codecov · 2023-07-07T09:16:29Z

Codecov Report

Merging #2049 (c3c73b6) into main (119fcaa) will increase coverage by 0.1%.
The diff coverage is 67.1%.

Additional details and impacted files

Impacted Files	Coverage Δ
crates/core/plugin_sm/src/plugin.rs	`11.2% <0.0%> (-0.1%)`	⬇️
crates/core/tedge_api/src/error.rs	`0.0% <ø> (ø)`
crates/core/tedge_actors/src/message_boxes.rs	`75.5% <15.0%> (-7.7%)`	⬇️
...tes/core/tedge_agent/src/software_manager/actor.rs	`64.0% <72.0%> (+3.9%)`	⬆️
crates/common/download/src/download.rs	`75.8% <81.0%> (-0.6%)`	⬇️

... and 66 files with indirect coverage changes

didier-wenzek

This is indeed not as complex as we thought as first.

crates/common/download/src/download.rs

tests/RobotFramework/tests/cumulocity/software.robot

crates/core/tedge_agent/src/software_manager/actor.rs

gligorisaev · 2023-07-10T05:16:16Z

tests/RobotFramework/tests/cumulocity/software.robot

+    ${OPERATION}=    Install Software        test-very-large-software,1.0,https://t493319102.eu-latest.cumulocity.com/inventory/binaries/28057693
+
+    # waiting for the download to start (so, for "Downloading: ...") to appear
+    # in the log, but I have no clue how to do "wait until log contains ..."


One possibility is to use the Keyword "Wait Untill Keyword Succeeds", here one example:

Wait Until Log Contains [Arguments] ${expected_message} ${timeout} Wait Until Keyword Succeeds ${timeout} 2s Run Keyword And Return Status Should Log Contain ${expected_message} Should Log Contain [Arguments] ${expected_message} ${logs} Get Logs Log Many ${logs} level=INFO console=yes Log Many ${logs} level=DEBUG console=yes Should Contain ${logs} ${expected_message}

ping me we can try it together if you want

I think it will be better to write a keyword in python to facilitate this, as I want to avoid doing too much programming in RobotFramework language if we can help it...And I don't think this will be the last time we have this use-case.

So in summary, I can write a new assertion (which will be exposed as a RobotFramework keyword).

I've pushed some updates to the test to add a new keyword to check the presence of log entries (and dynamically wait for them). Just waiting for the CI to run to confirm, however I ran the test 50 times locally and no signs of flakiness.

didier-wenzek

Approved

crates/core/tedge_agent/src/software_manager/actor.rs

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

SoftwareManager actor used in tedge-agent was made to exit immediately when it receives a shutdown request from the runtime. This was achieved by splitting up the message box and concurrently waiting for either completing the request or for Runtime shutdown request. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

The backoff timing was tweaked in order to more consistently generate increasing backoff times. The time we wait to retry the request is also logged, along with the reason why the backoff was triggered. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>

…ilable url Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>

didier-wenzek

Confirm Approval

gligorisaev · 2023-07-19T05:15:59Z

QA has thoroughly checked the feature and here are the results:

Test for ticket exists in the test suite.
/tests/RobotFramework/tests/cumulocity/software.robot
QA has tested the feature and it meets the required specifications.

Bravo555 had a problem deploying to Test Pull Request June 26, 2023 17:00 — with GitHub Actions Failure

didier-wenzek reviewed Jun 27, 2023

View reviewed changes

Bravo555 force-pushed the download-fixes branch from d1ce287 to 7f77cf0 Compare June 28, 2023 12:12

Bravo555 had a problem deploying to Test Pull Request June 28, 2023 12:21 — with GitHub Actions Failure

Bravo555 force-pushed the download-fixes branch from 7f77cf0 to 8e20663 Compare June 29, 2023 12:32

Bravo555 temporarily deployed to Test Pull Request June 29, 2023 12:43 — with GitHub Actions Inactive

Bravo555 force-pushed the download-fixes branch from 8e20663 to 32be743 Compare June 30, 2023 13:07

Bravo555 had a problem deploying to Test Pull Request June 30, 2023 13:16 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request July 4, 2023 09:07 — with GitHub Actions Inactive

Bravo555 force-pushed the download-fixes branch 3 times, most recently from 5dfc26c to aaba38e Compare July 7, 2023 08:53

Bravo555 had a problem deploying to Test Pull Request July 7, 2023 09:02 — with GitHub Actions Failure

Bravo555 force-pushed the download-fixes branch from aaba38e to 4f191c0 Compare July 7, 2023 09:15

Bravo555 temporarily deployed to Test Pull Request July 7, 2023 09:24 — with GitHub Actions Inactive

Bravo555 temporarily deployed to Test Pull Request July 7, 2023 13:59 — with GitHub Actions Inactive

Bravo555 force-pushed the download-fixes branch from a93cdda to 7d62ad0 Compare July 7, 2023 14:09

Bravo555 had a problem deploying to Test Pull Request July 7, 2023 14:18 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request July 7, 2023 15:11 — with GitHub Actions Inactive

didier-wenzek reviewed Jul 7, 2023

View reviewed changes

crates/common/download/src/download.rs Outdated Show resolved Hide resolved

tests/RobotFramework/tests/cumulocity/software.robot Outdated Show resolved Hide resolved

crates/core/tedge_agent/src/software_manager/actor.rs Show resolved Hide resolved

Bravo555 force-pushed the download-fixes branch from 7d62ad0 to b23d57a Compare July 7, 2023 16:26

Bravo555 temporarily deployed to Test Pull Request July 7, 2023 16:35 — with GitHub Actions Inactive

Bravo555 marked this pull request as ready for review July 8, 2023 14:25

gligorisaev reviewed Jul 10, 2023

View reviewed changes

didier-wenzek approved these changes Jul 10, 2023

View reviewed changes

crates/core/tedge_agent/src/software_manager/actor.rs Outdated Show resolved Hide resolved

reubenmiller reviewed Jul 10, 2023

View reviewed changes

crates/core/tedge_agent/src/software_manager/actor.rs Outdated Show resolved Hide resolved

Bravo555 added 4 commits July 10, 2023 10:57

Improve downloader logging and usage in plugin_sm

69db5ab

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Remove downloader timeout from some tests

456dffe

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Bravo555 force-pushed the download-fixes branch from b23d57a to e6f36db Compare July 10, 2023 10:57

Bravo555 temporarily deployed to Test Pull Request July 10, 2023 11:06 — with GitHub Actions Inactive

Bravo555 marked this pull request as draft July 10, 2023 12:13

reubenmiller added 2 commits July 18, 2023 09:23

test(lib): support limiting logs from a specific date

d36d30c

Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>

test(software): refactor test to use new log keyword and publicly ava…

c3c73b6

…ilable url Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>

reubenmiller temporarily deployed to Test Pull Request July 18, 2023 07:35 — with GitHub Actions Inactive

Bravo555 marked this pull request as ready for review July 18, 2023 08:59

didier-wenzek approved these changes Jul 18, 2023

View reviewed changes

didier-wenzek merged commit 4fed216 into thin-edge:main Jul 18, 2023
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049

Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049

Bravo555 commented Jun 26, 2023 •

edited

didier-wenzek left a comment

didier-wenzek Jun 27, 2023

didier-wenzek Jun 27, 2023

Bravo555 Jun 28, 2023

didier-wenzek Jun 28, 2023

didier-wenzek commented Jun 27, 2023

Bravo555 commented Jun 28, 2023

github-actions bot commented Jun 28, 2023 •

edited

codecov bot commented Jul 7, 2023 •

edited

didier-wenzek left a comment

gligorisaev Jul 10, 2023

reubenmiller Jul 10, 2023

reubenmiller Jul 18, 2023

didier-wenzek left a comment

didier-wenzek left a comment

gligorisaev commented Jul 19, 2023

Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049

Improve downloader logging behaviour and make SoftwareManager exit immediately on ^C #2049

Conversation

Bravo555 commented Jun 26, 2023 • edited

Proposed changes

Types of changes

Paste Link to the issue

Checklist

Further comments

didier-wenzek left a comment

Choose a reason for hiding this comment

didier-wenzek Jun 27, 2023

Choose a reason for hiding this comment

didier-wenzek Jun 27, 2023

Choose a reason for hiding this comment

Bravo555 Jun 28, 2023

Choose a reason for hiding this comment

didier-wenzek Jun 28, 2023

Choose a reason for hiding this comment

didier-wenzek commented Jun 27, 2023

Bravo555 commented Jun 28, 2023

github-actions bot commented Jun 28, 2023 • edited

Robot Results

Passed Tests

codecov bot commented Jul 7, 2023 • edited

Codecov Report

didier-wenzek left a comment

Choose a reason for hiding this comment

gligorisaev Jul 10, 2023

Choose a reason for hiding this comment

reubenmiller Jul 10, 2023

Choose a reason for hiding this comment

reubenmiller Jul 18, 2023

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

gligorisaev commented Jul 19, 2023

Bravo555 commented Jun 26, 2023 •

edited

github-actions bot commented Jun 28, 2023 •

edited

codecov bot commented Jul 7, 2023 •

edited