New c8y_firmware_manager
#1830
Conversation
Porting c8y_firmware_plugin to library with actor model.

Leftovers:
- Downloader is missing. It's mocked now.
- Timeout doesn't work yet.
- Some small unit tests need to be ported.

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
```rust
// TODO: Do we really need to send 500 from each actor?
self.get_pending_operations_from_cloud(&mut message_box)
    .await?;
```
Good point!
It would be good to introduce an actor that manages the handshake with Cumulocity, on startup but also when the bridge goes down and comes back up. In order to reduce the number of actors, this could be the same actor that notifies C8Y of the health status of thin-edge.
An actor that performs all such bootstrapping with C8y would/should be part of the c8y mapper refactoring.
```rust
// TODO: We don't need mockito???
const DOWNLOAD_URL: &str = "http://test.domain.com";
```
It depends on the system under test.
You can:
- either test only the firmware actor using a `SimpleMessageBox<C8YRestRequest, C8YRestResult>` to check the messages sent to C8Y and simulate C8Y responses. See this PR for an example.
- or test the combination firmware actor <-> c8y proxy actor <-> http actor. In that case mockito is required to simulate C8Y REST.

Note that you can do the same for MQTT: either use a `SimpleMessageBox<MqttMessage, MqttMessage>` to interact with the firmware actor; or connect and spawn an MQTT actor and then use an `mqtt_tests` broker.

I would take the simplest approach in your case:
- If you are re-using tests from the former implementation of the firmware manager => test the whole chain and use mockito.
- If you are writing new tests, then focus on the firmware actor and interact with it using message boxes.
- Remove unnecessary code
- Refactor all tests in tests.rs with new test helper
- Port unit tests

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
```rust
}

impl FirmwareOperationEntry {
    pub fn create_status_file(&self, firmware_dir: &Path) -> Result<(), FirmwareManagementError> {
```
The logic that manages the operation store could be extracted out as an independent actor in a follow-up PR.
```rust
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&OperationKey::new(child_id, operation_id))
} else {
    ActiveOperationState::Pending
```
That logic seems problematic. An `operation_key` may not be present in the `active_child_ops` map even when the operation has already completed with a successful/failed response, upon which the entries are removed from the `active_child_ops` map. In that case, assuming that operation to be `Pending` would be wrong.
```rust
if let Some(operation_state) = self.active_child_ops.remove(operation_key) {
    operation_state
} else {
    ActiveOperationState::Pending
```
Falling back to `Pending` could be problematic, as mentioned in the comment above on `fail_operation_in_cloud`. An operation is `Pending` only when we have added that entry to the map. Non-existence of an entry doesn't necessarily mean `Pending`. If we receive a non-existent operation id, we can't assume that it'd be in the pending state and send responses to the cloud.
Apparently, some cases call `fail_operation_in_cloud` before inserting the key into `active_child_ops`. In those cases, this assumption is correct.

I cannot come up with a case where the key has already been removed due to completion and we still drop into this function. Also, in general, sending 501/502/503 is potentially fragile as long as we cannot use operation IDs from c8y. It's impossible to cover all error handling: if we cover one hole, we will find another.
> I cannot come up with a case where the key has already been removed due to completion and we still drop into this function.
Yeah, for pretty much all the cases that I could think of, like duplicate responses from child devices, we have guard rails at other places to prevent this case from happening. As you rightly pointed out, this code can be simplified a lot, avoiding all those guard rails spread all over the place, once we have access to request IDs from the cloud. This is fine for now.
```rust
self.publish_c8y_failed_message(
    &child_id,
    "No failure reason provided by child device.",
    message_box,
)
.await?;
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&OperationKey::new(
    &child_id,
    &operation_id,
));
```
We could just re-use `fail_operation_in_cloud`.
```rust
self.remove_status_file(operation_id)?;
self.remove_entry_from_active_operations(&operation_key);
```
This could be refactored into a single cleanup function which is used here as well as when the operation is failed.
```rust
if self.config.health_check_topics.accept(&message) {
    self.send_health_status_message(message_box).await?;
    return Ok(());
```
It is not required at the actor level, at least not yet. The health check contract is for a daemon, not for each and every actor. So the main that's wrapping this actor will be the one that responds to the health-check requests, which is the c8y-device-management plugin in this case. When you refactor the existing c8y-firmware-plugin with this actor impl, you'll have another main that's expected to respond to `tedge/health-check/c8y-firmware-plugin` requests. Currently all such health checks are handled by the tedge-health-check actor, which is already wired with c8y-device-management.

In future, we'd want the daemon/main to really health-check the underlying actors as well, instead of blindly sending a response immediately.
Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
…#1846 Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
The actor code is correct. I didn't check the firmware protocol itself.
@rina23q do you plan to refactor the `c8y_firmware_plugin` in this PR? This would be good as there is some duplicated code. E.g. I would prefer a `git mv plugins/c8y_firmware_plugin/src/message.rs crates/extensions/c8y_firmware_manager/src/message.rs` rather than the error-prone copy we currently have.
```diff
@@ -69,14 +69,16 @@ impl DownloaderActor {

 #[async_trait]
 impl Server for DownloaderActor {
-    type Request = DownloadRequest;
-    type Response = DownloadResult;
+    type Request = (String, DownloadRequest);
```
Instead of this, why not make the `String` id a part of the `DownloadRequest` itself? Same for `DownloadResponse` as well.
For the response, `(String, Result<DownloadResponse, DownloadError>)` is simpler because one also needs a request id for the errors. Doing the same for requests is then more natural.
Another alternative would have been to change the `DownloadResult` from a type alias to an explicit struct with fields `id: String` and `result: Result<DownloadResponse, DownloadError>`. But this is fine for now.
```diff
@@ -64,6 +64,18 @@ pub fn get_child_id_from_measurement_topic(topic: &str) -> Option<String> {
     }
 }

+pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {
```
```diff
-pub fn get_child_id_from_child_topic(topic: &str) -> Option<String> {
+pub fn get_child_id_from_commands_topic(topic: &str) -> Option<String> {
```
```rust
mqtt_publisher: None,
jwt_retriever: None,
timer_sender: None,
download_sender: None,
```
It's unfortunate that we still have to initialise these fields as `None` and get them set via the `set_connection` calls immediately after. One of the goals of taking all the providers in this `new` method itself was to avoid this case. I understand that you were forced to do this due to the limitations of the existing connection APIs. To fix this, we'll have to split the current "two-way connection" happening in `set_connection` into individual steps:

- one where you retrieve the sender from the provider and use that to initialise this builder
- one where this builder is connected to the provider.

But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.
> But that's definitely not under the scope of this PR and can be attempted in a follow-up PR.
Yes, we need to take some time to find a nice solution here.
The actor integration and even the updates to the logic parts of the plugin itself look fine.
```rust
async fn publish_smartrest_firmware_operation(
    mqtt_message_box: &mut TimedMessageBox<SimpleMessageBox<MqttMessage, MqttMessage>>,
) -> Result<(), DynError> {
```
Maybe this kind of function could be wrapped in structs that represent actual entities such as c8y or a child device. The tests could then be written as follows:
```rust
c8y.publish_smartrest_firmware_operation(&mut mqtt).await?;
let operation_id = child_device.receive_firmware_operation(&mut mqtt).await?;
child_device.publish_firmware_update_response("successful", &operation_id, &mut mqtt).await?;
c8y.receive_firmware_update_response("successful", &mut mqtt).await?;
```
These tests will then be not only easier to read but also acting as a documentation of the protocol.
Approved.
- Clean up Cargo.toml
- Remove FirmwareManagerMessageBox::send()
- Rename functions and variables
- Take all required actor builders as arguments of Builder::new()
- Add some comments

Signed-off-by: Rina Fujino <18257209+rina23q@users.noreply.github.com>
0c761a3 to 2c37aaa
Proposed changes
Porting c8y_firmware_plugin to library with actor model.
Todo:
Types of changes
Paste Link to the issue
#1806
Checklist
- `cargo fmt` as mentioned in CODING_GUIDELINES
- `cargo clippy` as mentioned in CODING_GUIDELINES

Further comments