Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New c8y_firmware_manager #1830

Merged
merged 18 commits into from
Apr 12, 2023

Conversation

rina23q
Copy link
Member

@rina23q rina23q commented Mar 21, 2023

Proposed changes

Porting c8y_firnware_plugin to library with actor model.

Todo:

  • MQTT-based message logic
  • Operation timeout logic
  • Download logic including adding JWT token for c8y url
  • Use new test helper
  • Create a new downloader crate (done with the PR Add tedge-downloader-actor #1884 )
  • Address some comments

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#1806

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Porting c8y_firnware_plugin to library with actor model.

Leftovers:
- Downloader is missing. It's mocked now.
- Timeout doesn't work yet.
- Some small unit test need to be ported.

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
@rina23q rina23q marked this pull request as draft March 21, 2023 19:16
@rina23q rina23q temporarily deployed to Test Pull Request March 22, 2023 12:58 — with GitHub Actions Inactive
crates/bin/c8y-device-management/src/main.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 70 to 72
// TODO: Do we really need to send 500 from each actor?
self.get_pending_operations_from_cloud(&mut message_box)
.await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

It would be good to introduce an actor that manages the handshake with Cumulicity, on start up but also when the bridge is down then up. In order to reduce the number of actors this can be the same actor that would notify C8Y the health status of thin-edge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An actor that performs all such bootstrapping with C8y would/should be part of the c8y mapper refactoring.

crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/message.rs Outdated Show resolved Hide resolved
Comment on lines 29 to 30
// TODO: We don't need mockito???
const DOWNLOAD_URL: &str = "http://test.domain.com";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on the system under test.

You can:

  • either test only the firmware actor using a SimpleMessageBox<C8YRestRequest, C8YRestResult> to check the messages sent to C8Y and simulates C8Y responses. See this PR for an example.
  • or test the combination firmware actor <-> c8y proxy actor <-> http actor. In that case mockito is required to simulate C8Y REST.

Note that you can the same for MQTT: either use a SimpleMessageBox<MqttMessage, MqttMessage> to interact with the firmware actor; or connect and spawn an MQTT actor and then use an mqtt_tests broker.

I would take the simplest approach in your case.

  • If you re-using tests from the former implementation of the firmware manager => Test the whole chain and use Mockito.
  • If you are writing new tests then focus on the firmware actor and interact with it using message boxes.

@github-actions

This comment was marked as outdated.

- Remove unnecessary code
- Refactor all tests in tests.rs with new test helper
- Port unit tests

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
@rina23q rina23q temporarily deployed to Test Pull Request March 22, 2023 17:53 — with GitHub Actions Inactive
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
@rina23q rina23q temporarily deployed to Test Pull Request March 27, 2023 10:54 — with GitHub Actions Inactive
Comment on lines 70 to 72
// TODO: Do we really need to send 500 from each actor?
self.get_pending_operations_from_cloud(&mut message_box)
.await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An actor that performs all such bootstrapping with C8y would/should be part of the c8y mapper refactoring.

}

impl FirmwareOperationEntry {
pub fn create_status_file(&self, firmware_dir: &Path) -> Result<(), FirmwareManagementError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An independent actor that manages the operation store could be extracted out as an independent actor in a follow-up PR.

crates/extensions/c8y_firmware_manager/src/lib.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/config.rs Outdated Show resolved Hide resolved
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&OperationKey::new(child_id, operation_id))
} else {
ActiveOperationState::Pending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That logic seems probelematic. An operation_key may not be present in the active_child_ops map even when the operation has already completed with successful/failed response, upon which the entries are removed from the active_child_ops map. In that case, assuming that operation to be Pending would be wrong.

if let Some(operation_state) = self.active_child_ops.remove(operation_key) {
operation_state
} else {
ActiveOperationState::Pending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Falling back to Pending could be problematic as mentioned in the comment above in fail_operation_in_cloud. An operation is Pending only when we have added that entry in the map. Non-existence of an entry doesn't necessarily mean Pending. If we receive a non-existent operation id, we can't assume that it'd in pending state and send responses to cloud.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, some cases call fail_operation_in_cloud before inserting the key to active_child_ops. In this case, this assumption is correct.

I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function. Also, in general, sending 501/502/503 is potentially fragile as long as we cannot use operation ID from c8y. It's impossible to cover all error handling. If we cover one, we will have another hole.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function.

Yeah, for pretty much all the cases that I could think of, like duplicate responses from child devices, we have guard rails at other places to prevent this case from happening. As you rightly pointed out, this code can be simplified a lot, avoiding all those guard rails spread all over the place, once we have access to request IDs from the cloud. This is fine for now.

crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
Comment on lines 387 to 397
self.publish_c8y_failed_message(
&child_id,
"No failure reason provided by child device.",
message_box,
)
.await?;
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&OperationKey::new(
&child_id,
&operation_id,
));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just re-use fail_operation_in_cloud.

Comment on lines +383 to +384
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&operation_key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be refactored into a single cleanup function which is used here as well as when the operation is failed.

Comment on lines 106 to 108
if self.config.health_check_topics.accept(&message) {
self.send_health_status_message(message_box).await?;
return Ok(());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not required at the actor level, at least not yet. The health check contract is for a daemon and not for each and every actor. So, the main that's wrapping this actor will be the one that responds to the health-check requests, which is the c8y-device-management plugin in this case. When you refactor the existing c8y-firmware-plugin with this actor impl, you'll have another main that's expected to respond to tedge/health-check/c8y-firmware-plugin requests. Currently all such health-checks are handled by the tedge-health-check actor which is already wired with the c8y-device-management.

In future, we'd want the daemon/main to really health-check the underlying actors as well, instead of blindly sending a response immeidately.

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
…#1846

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
@rina23q rina23q marked this pull request as ready for review April 6, 2023 18:00
@rina23q rina23q temporarily deployed to Test Pull Request April 6, 2023 18:06 — with GitHub Actions Inactive
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actor code is correct. I didn't check the firmware protocol itself.

@rina23q do you plan to refactor the c8_firmware_plugin in this PR? This would be good as there is some duplicated code. e.g. I would prefer a git mv plugins/c8y_firmware_plugin/src/message.rs crates/extensions/c8y_firmware_manager/src/message.rs rather an error prone copy as we currently have.

crates/extensions/c8y_firmware_manager/Cargo.toml Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/Cargo.toml Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/Cargo.toml Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/Cargo.toml Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/Cargo.toml Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/lib.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/lib.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
@@ -69,14 +69,16 @@ impl DownloaderActor {

#[async_trait]
impl Server for DownloaderActor {
type Request = DownloadRequest;
type Response = DownloadResult;
type Request = (String, DownloadRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, why not make the String id a part of the DownloadRequest itself? Same for DownloadResponse as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the response, (String, Result<DownloadResponse, DownloadError>) is simpler because one also needs a request id for the errors. Doing the same for requests is then more natural.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative would have been to change the DownloadResult from a type alias to an explicit struct with fields id: String and result: Result<DownloadResponse, DownloadError>). But this is fine for now.

@rina23q rina23q temporarily deployed to Test Pull Request April 11, 2023 12:18 — with GitHub Actions Inactive
@rina23q rina23q temporarily deployed to Test Pull Request April 11, 2023 12:37 — with GitHub Actions Inactive
@rina23q rina23q temporarily deployed to Test Pull Request April 11, 2023 14:42 — with GitHub Actions Inactive
@rina23q rina23q temporarily deployed to Test Pull Request April 11, 2023 15:12 — with GitHub Actions Inactive
@rina23q rina23q temporarily deployed to Test Pull Request April 11, 2023 16:10 — with GitHub Actions Inactive
@@ -64,6 +64,18 @@ pub fn get_child_id_from_measurement_topic(topic: &str) -> Option<String> {
}
}

pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {
pub fn get_child_id_from_commands_topic(topic: &str) -> Option<String> {

Comment on lines +69 to +72
mqtt_publisher: None,
jwt_retriever: None,
timer_sender: None,
download_sender: None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that we still have to initialise these fields as None and get them set via the set_connection calls immediately. One of the goals of taking all the providers in this new method itself was to avoid this case. I understand that you were forced to do this due to the limitations of the existing connection APIs. To fix this, we'll have to split the current "two-way connection" happening in in set_connection into individual steps:

  1. where you retrieve the sender from the provider and use that to initialise this builder
  2. where this builder is connected to the provider.

But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.

Yes, we need to take some time to find a nice solution here.

Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actor integration and even the updates to the logic parts of the plugin itself looks fine.

crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_firmware_manager/src/actor.rs Outdated Show resolved Hide resolved
if let Some(operation_state) = self.active_child_ops.remove(operation_key) {
operation_state
} else {
ActiveOperationState::Pending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot come up with the cases, where the key has been removed already because of the completion but dropping into this function.

Yeah, for pretty much all the cases that I could think of, like duplicate responses from child devices, we have guard rails at other places to prevent this case from happening. As you rightly pointed out, this code can be simplified a lot, avoiding all those guard rails spread all over the place, once we have access to request IDs from the cloud. This is fine for now.

Comment on lines +680 to +682
async fn publish_smartrest_firmware_operation(
mqtt_message_box: &mut TimedMessageBox<SimpleMessageBox<MqttMessage, MqttMessage>>,
) -> Result<(), DynError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this kind of function could be wrapped around structs that represent actual entities as c8y or a child device.

So the tests could be written as follows:

c8y.publish_smartrest_firmware_operation(&mut mqtt).await?;
let operation_id = child_device.receive_firmware_operation(&mut mqtt).await?;
child_device.publish_firmware_update_response("successful", &operation_id, &mut mqtt).await?;
c8y.receive_firmware_update_response("successful", &mut mqtt).await?;

These tests will then be not only easier to read but also acting as a documentation of the protocol.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@rina23q rina23q temporarily deployed to Test Pull Request April 12, 2023 14:42 — with GitHub Actions Inactive
- Clean up Cargo.toml
- Remove FirmwareManagerMessageBox::send()
- Renaming functions and variables
- Take all required actor builders as arguments of Builder::new()
- Add some comments

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
@rina23q rina23q force-pushed the rewrite-firmware-plugin-with-actor branch from 0c761a3 to 2c37aaa Compare April 12, 2023 15:35
@rina23q rina23q temporarily deployed to Test Pull Request April 12, 2023 15:42 — with GitHub Actions Inactive
@rina23q rina23q merged commit 5f3fad2 into thin-edge:main Apr 12, 2023
15 checks passed
@rina23q rina23q deleted the rewrite-firmware-plugin-with-actor branch April 12, 2023 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants