New `c8y_firmware_manager` #1830

rina23q · 2023-03-21T19:14:15Z

Proposed changes

Porting c8y_firnware_plugin to library with actor model.

Todo:

MQTT-based message logic
Operation timeout logic
Download logic including adding JWT token for c8y url
Use new test helper
Create a new downloader crate (done with the PR Add tedge-downloader-actor #1884 )
Address some comments

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#1806

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

Porting c8y_firnware_plugin to library with actor model. Leftovers: - Downloader is missing. It's mocked now. - Timeout doesn't work yet. - Some small unit test need to be ported. Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

crates/bin/c8y-device-management/src/main.rs

crates/extensions/c8y_firmware_manager/src/lib.rs

didier-wenzek · 2023-03-22T13:16:25Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+        // TODO: Do we really need to send 500 from each actor?
+        self.get_pending_operations_from_cloud(&mut message_box)
+            .await?;


Good point!

It would be good to introduce an actor that manages the handshake with Cumulicity, on start up but also when the bridge is down then up. In order to reduce the number of actors this can be the same actor that would notify C8Y the health status of thin-edge.

An actor that performs all such bootstrapping with C8y would/should be part of the c8y mapper refactoring.

crates/extensions/c8y_firmware_manager/src/actor.rs

crates/extensions/c8y_firmware_manager/src/message.rs

didier-wenzek · 2023-03-22T13:39:01Z

crates/extensions/c8y_firmware_manager/src/tests.rs

+// TODO: We don't need mockito???
+const DOWNLOAD_URL: &str = "http://test.domain.com";


It depends on the system under test.

You can:

either test only the firmware actor using a SimpleMessageBox<C8YRestRequest, C8YRestResult> to check the messages sent to C8Y and simulates C8Y responses. See this PR for an example.

or test the combination firmware actor <-> c8y proxy actor <-> http actor. In that case mockito is required to simulate C8Y REST.

Note that you can the same for MQTT: either use a SimpleMessageBox<MqttMessage, MqttMessage> to interact with the firmware actor; or connect and spawn an MQTT actor and then use an mqtt_tests broker.

I would take the simplest approach in your case.

If you re-using tests from the former implementation of the firmware manager => Test the whole chain and use Mockito.

If you are writing new tests then focus on the firmware actor and interact with it using message boxes.

- Remove unnecessary code - Refactor all tests in tests.rs with new test helper - Port unit tests Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

crates/extensions/c8y_firmware_manager/src/lib.rs

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

albinsuresh · 2023-03-28T05:28:27Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+        // TODO: Do we really need to send 500 from each actor?
+        self.get_pending_operations_from_cloud(&mut message_box)
+            .await?;


An actor that performs all such bootstrapping with C8y would/should be part of the c8y mapper refactoring.

albinsuresh · 2023-03-28T05:31:15Z

crates/extensions/c8y_firmware_manager/src/operation.rs

+}
+
+impl FirmwareOperationEntry {
+    pub fn create_status_file(&self, firmware_dir: &Path) -> Result<(), FirmwareManagementError> {


An independent actor that manages the operation store could be extracted out as an independent actor in a follow-up PR.

crates/extensions/c8y_firmware_manager/src/lib.rs

crates/extensions/c8y_firmware_manager/src/config.rs

albinsuresh · 2023-03-28T06:10:38Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+            self.remove_status_file(operation_id)?;
+            self.remove_entry_from_active_operations(&OperationKey::new(child_id, operation_id))
+        } else {
+            ActiveOperationState::Pending


That logic seems probelematic. An operation_key may not be present in the active_child_ops map even when the operation has already completed with successful/failed response, upon which the entries are removed from the active_child_ops map. In that case, assuming that operation to be Pending would be wrong.

albinsuresh · 2023-03-28T06:14:25Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+        if let Some(operation_state) = self.active_child_ops.remove(operation_key) {
+            operation_state
+        } else {
+            ActiveOperationState::Pending


Falling back to Pending could be problematic as mentioned in the comment above in fail_operation_in_cloud. An operation is Pending only when we have added that entry in the map. Non-existence of an entry doesn't necessarily mean Pending. If we receive a non-existent operation id, we can't assume that it'd in pending state and send responses to cloud.

Apparently, some cases call fail_operation_in_cloud before inserting the key to active_child_ops. In this case, this assumption is correct.

I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function. Also, in general, sending 501/502/503 is potentially fragile as long as we cannot use operation ID from c8y. It's impossible to cover all error handling. If we cover one, we will have another hole.

I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function.

Yeah, for pretty much all the cases that I could think of, like duplicate responses from child devices, we have guard rails at other places to prevent this case from happening. As you rightly pointed out, this code can be simplified a lot, avoiding all those guard rails spread all over the place, once we have access to request IDs from the cloud. This is fine for now.

crates/extensions/c8y_firmware_manager/src/actor.rs

albinsuresh · 2023-03-28T06:21:47Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+                self.publish_c8y_failed_message(
+                    &child_id,
+                    "No failure reason provided by child device.",
+                    message_box,
+                )
+                .await?;
+                self.remove_status_file(operation_id)?;
+                self.remove_entry_from_active_operations(&OperationKey::new(
+                    &child_id,
+                    &operation_id,
+                ));


We could just re-use fail_operation_in_cloud.

albinsuresh · 2023-03-28T06:23:31Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+                self.remove_status_file(operation_id)?;
+                self.remove_entry_from_active_operations(&operation_key);


This could be refactored into a single cleanup function which is used here as well as when the operation is failed.

albinsuresh · 2023-03-28T06:26:55Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+        if self.config.health_check_topics.accept(&message) {
+            self.send_health_status_message(message_box).await?;
+            return Ok(());


It is not required at the actor level, at least not yet. The health check contract is for a daemon and not for each and every actor. So, the main that's wrapping this actor will be the one that responds to the health-check requests, which is the c8y-device-management plugin in this case. When you refactor the existing c8y-firmware-plugin with this actor impl, you'll have another main that's expected to respond to tedge/health-check/c8y-firmware-plugin requests. Currently all such health-checks are handled by the tedge-health-check actor which is already wired with the c8y-device-management.

In future, we'd want the daemon/main to really health-check the underlying actors as well, instead of blindly sending a response immeidately.

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

…#1846 Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

didier-wenzek

The actor code is correct. I didn't check the firmware protocol itself.

@rina23q do you plan to refactor the c8_firmware_plugin in this PR? This would be good as there is some duplicated code. e.g. I would prefer a git mv plugins/c8y_firmware_plugin/src/message.rs crates/extensions/c8y_firmware_manager/src/message.rs rather an error prone copy as we currently have.

crates/extensions/c8y_firmware_manager/Cargo.toml

crates/extensions/c8y_firmware_manager/src/actor.rs

crates/extensions/c8y_firmware_manager/src/lib.rs

crates/extensions/c8y_firmware_manager/src/actor.rs

albinsuresh · 2023-04-10T11:18:22Z

crates/extensions/tedge_downloader_ext/src/actor.rs

@@ -69,14 +69,16 @@ impl DownloaderActor {

 #[async_trait]
 impl Server for DownloaderActor {
-    type Request = DownloadRequest;
-    type Response = DownloadResult;
+    type Request = (String, DownloadRequest);


Instead of this, why not make the String id a part of the DownloadRequest itself? Same for DownloadResponse as well.

For the response, (String, Result<DownloadResponse, DownloadError>) is simpler because one also needs a request id for the errors. Doing the same for requests is then more natural.

Another alternative would have been to change the DownloadResult from a type alias to an explicit struct with fields id: String and result: Result<DownloadResponse, DownloadError>). But this is fine for now.

albinsuresh · 2023-04-12T05:07:32Z

crates/core/tedge_api/src/topic.rs

@@ -64,6 +64,18 @@ pub fn get_child_id_from_measurement_topic(topic: &str) -> Option<String> {
    }
 }

+pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {


Suggested change

pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {

pub fn get_child_id_from_commands_topic(topic: &str) -> Option<String> {

albinsuresh · 2023-04-12T05:22:32Z

crates/extensions/c8y_firmware_manager/src/lib.rs

+            mqtt_publisher: None,
+            jwt_retriever: None,
+            timer_sender: None,
+            download_sender: None,


It's unfortunate that we still have to initialise these fields as None and get them set via the set_connection calls immediately. One of the goals of taking all the providers in this new method itself was to avoid this case. I understand that you were forced to do this due to the limitations of the existing connection APIs. To fix this, we'll have to split the current "two-way connection" happening in in set_connection into individual steps:

where you retrieve the sender from the provider and use that to initialise this builder

where this builder is connected to the provider.

But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.

But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.

Yes, we need to take some time to find a nice solution here.

albinsuresh

The actor integration and even the updates to the logic parts of the plugin itself looks fine.

crates/extensions/c8y_firmware_manager/src/actor.rs

albinsuresh · 2023-04-12T06:43:24Z

crates/extensions/c8y_firmware_manager/src/actor.rs

+        if let Some(operation_state) = self.active_child_ops.remove(operation_key) {
+            operation_state
+        } else {
+            ActiveOperationState::Pending


I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function.

Yeah, for pretty much all the cases that I could think of, like duplicate responses from child devices, we have guard rails at other places to prevent this case from happening. As you rightly pointed out, this code can be simplified a lot, avoiding all those guard rails spread all over the place, once we have access to request IDs from the cloud. This is fine for now.

didier-wenzek · 2023-04-12T13:06:04Z

crates/extensions/c8y_firmware_manager/src/tests.rs

+async fn publish_smartrest_firmware_operation(
+    mqtt_message_box: &mut TimedMessageBox<SimpleMessageBox<MqttMessage, MqttMessage>>,
+) -> Result<(), DynError> {


Maybe this kind of function could be wrapped around structs that represent actual entities as c8y or a child device.

So the tests could be written as follows:

c8y.publish_smartrest_firmware_operation(&mut mqtt).await?; let operation_id = child_device.receive_firmware_operation(&mut mqtt).await?; child_device.publish_firmware_update_response("successful", &operation_id, &mut mqtt).await?; c8y.receive_firmware_update_response("successful", &mut mqtt).await?;

These tests will then be not only easier to read but also acting as a documentation of the protocol.

didier-wenzek

Approved.

- Clean up Cargo.toml - Remove FirmwareManagerMessageBox::send() - Renaming functions and variables - Take all required actor builders as arguments of Builder::new() - Add some comments Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q marked this pull request as draft March 21, 2023 19:16

rina23q had a problem deploying to Test Pull Request March 21, 2023 19:22 — with GitHub Actions Failure

rina23q added 2 commits March 22, 2023 13:39

Merge branch 'main' into rewrite-firmware-plugin-with-actor

ce792a5

Update FirmwareManagerMessageBox and FirmwareManagerBuilder

45f4d8b

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q temporarily deployed to Test Pull Request March 22, 2023 12:58 — with GitHub Actions Inactive

didier-wenzek reviewed Mar 22, 2023

View reviewed changes

This comment was marked as outdated.

Sign in to view

Refactor the code

54a4ea0

- Remove unnecessary code - Refactor all tests in tests.rs with new test helper - Port unit tests Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q temporarily deployed to Test Pull Request March 22, 2023 17:53 — with GitHub Actions Inactive

Merge branch 'main' into rewrite-firmware-plugin-with-actor

89e6e70

PradeepKiruvale reviewed Mar 24, 2023

View reviewed changes

crates/extensions/c8y_firmware_manager/src/lib.rs Outdated Show resolved Hide resolved

rina23q added 3 commits March 27, 2023 11:33

Adjust the code based on the upstream API change

463654a

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Merge branch 'main' into rewrite-firmware-plugin-with-actor

eea37ee

Update tests and replace move_file by tedge_utils' one

e78f5d5

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q temporarily deployed to Test Pull Request March 27, 2023 10:54 — with GitHub Actions Inactive

albinsuresh reviewed Mar 28, 2023

View reviewed changes

rina23q added 3 commits April 3, 2023 18:57

Merge branch 'main' into rewrite-firmware-plugin-with-actor

ca8abb5

Merge branch 'main' into rewrite-firmware-plugin-with-actor

5dfbf4e

Update code to adjust new APIs

267250d

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q had a problem deploying to Test Pull Request April 4, 2023 14:12 — with GitHub Actions Failure

rina23q added 3 commits April 5, 2023 13:30

Change downloader API to receive and response also ID (String)

1d43191

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Connect downloader to c8y_firmware_manager actor

36c1752

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Update validate_same_request_in_progress() logic along with thin-edge…

07265f1

…#1846 Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q had a problem deploying to Test Pull Request April 5, 2023 17:34 — with GitHub Actions Failure

Connect JWT actor

44f5739

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q marked this pull request as ready for review April 6, 2023 18:00

rina23q temporarily deployed to Test Pull Request April 6, 2023 18:06 — with GitHub Actions Inactive

didier-wenzek reviewed Apr 7, 2023

View reviewed changes

albinsuresh reviewed Apr 10, 2023

View reviewed changes

rina23q had a problem deploying to Test Pull Request April 11, 2023 12:01 — with GitHub Actions Failure

rina23q temporarily deployed to Test Pull Request April 11, 2023 12:18 — with GitHub Actions Inactive

rina23q temporarily deployed to Test Pull Request April 11, 2023 12:37 — with GitHub Actions Inactive

rina23q temporarily deployed to Test Pull Request April 11, 2023 14:42 — with GitHub Actions Inactive

rina23q temporarily deployed to Test Pull Request April 11, 2023 15:12 — with GitHub Actions Inactive

rina23q had a problem deploying to Test Pull Request April 11, 2023 15:48 — with GitHub Actions Failure

rina23q temporarily deployed to Test Pull Request April 11, 2023 16:10 — with GitHub Actions Inactive

albinsuresh reviewed Apr 12, 2023

View reviewed changes

albinsuresh approved these changes Apr 12, 2023

View reviewed changes

didier-wenzek reviewed Apr 12, 2023

View reviewed changes

didier-wenzek approved these changes Apr 12, 2023

View reviewed changes

rina23q temporarily deployed to Test Pull Request April 12, 2023 14:42 — with GitHub Actions Inactive

rina23q added 3 commits April 12, 2023 15:31

Misc changes

5b6c0da

- Clean up Cargo.toml - Remove FirmwareManagerMessageBox::send() - Renaming functions and variables - Take all required actor builders as arguments of Builder::new() - Add some comments Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Move get_child_id_from_child_topic to tedge_api::topic

87a55b2

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

Rewrite c8y-firmware-plugin with c8y-firmware-manager library

2c37aaa

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>

rina23q force-pushed the rewrite-firmware-plugin-with-actor branch from 0c761a3 to 2c37aaa Compare April 12, 2023 15:35

rina23q temporarily deployed to Test Pull Request April 12, 2023 15:42 — with GitHub Actions Inactive

rina23q merged commit 5f3fad2 into thin-edge:main Apr 12, 2023
15 checks passed

rina23q deleted the rewrite-firmware-plugin-with-actor branch April 12, 2023 17:18

rina23q mentioned this pull request May 30, 2023

Experiment/json mqtt firmware plugin rina23q/thin-edge.io#27

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New `c8y_firmware_manager` #1830

New `c8y_firmware_manager` #1830

rina23q commented Mar 21, 2023 •

edited

didier-wenzek Mar 22, 2023

albinsuresh Mar 28, 2023

didier-wenzek Mar 22, 2023

This comment was marked as outdated.

albinsuresh Mar 28, 2023

albinsuresh Mar 28, 2023

albinsuresh Mar 28, 2023

albinsuresh Mar 28, 2023

rina23q Apr 11, 2023

albinsuresh Apr 12, 2023

albinsuresh Mar 28, 2023

albinsuresh Mar 28, 2023

albinsuresh Mar 28, 2023

didier-wenzek left a comment

albinsuresh Apr 10, 2023

didier-wenzek Apr 11, 2023

albinsuresh Apr 12, 2023

albinsuresh Apr 12, 2023

albinsuresh Apr 12, 2023

didier-wenzek Apr 12, 2023

albinsuresh left a comment

albinsuresh Apr 12, 2023

didier-wenzek Apr 12, 2023 •

edited

didier-wenzek left a comment

		// TODO: We don't need mockito???
		const DOWNLOAD_URL: &str = "http://test.domain.com";

		self.remove_status_file(operation_id)?;
		self.remove_entry_from_active_operations(&operation_key);

	pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {
	pub fn get_child_id_from_commands_topic(topic: &str) -> Option<String> {

New c8y_firmware_manager #1830

New c8y_firmware_manager #1830

Conversation

rina23q commented Mar 21, 2023 • edited

Proposed changes

Types of changes

Paste Link to the issue

Checklist

Further comments

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment was marked as outdated.

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

albinsuresh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

didier-wenzek Apr 12, 2023 • edited

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

New `c8y_firmware_manager` #1830

New `c8y_firmware_manager` #1830

rina23q commented Mar 21, 2023 •

edited

didier-wenzek Apr 12, 2023 •

edited