-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Declare child devices again if deleted in cloud #1644
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM,
When the request is sent to the cloud to get the child devices list
from the cloud, how long will it wait to get the child devices list? Is there any timeout? What if it gets stuck and never receives the list from the cloud?
689a51c
to
5ff5ccc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work is going in the right direction. However, I have questions regarding the expected behavior. Previously the reference were the local device. Now, we compute a diff between the cloud view and the local view. It's not clear to me how this diff needs to be resolved. Notably, the current PR pushes local child-device info to the cloud if not declared in the cloud. But nothing is done in the reverse direction. What if a child device is declared in the cloud but not locally known? We also have to double check that nothing has been broken regarding the dynamic declaration of new child devices (when a child device starts to send telemetry data without any declaration).
// if there are any local child devices that are not included in the | ||
// `cloud_child_devices` struct, we create them on the cloud, sending a 101 | ||
// message. Then proceed to declare their supported operations. | ||
let difference: Vec<&String> = local_child_devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the child devices that are declared cloud side but not locally? Do we have to create something on the local device? On the file-system? In the Converter.children: HashMap<String, Operations>
cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it was decided not to update the local state. Just wanted to know if it was just deferred for later or we've decided not to sync state from the cloud for some reason? If it's the latter, curious to know why.
The proposed fix is definitely the right thing to do as such a reconciliation with cloud will be needed soon anyway. But, for the specific reported issue of the "device name getting messed up", a simpler fix was also possible: Simply send a |
I agree with this, sending |
This was what @reubenmiller and I first considered but concluded that doing this does not scale to more than just a few child devices |
26842f0
to
eaf3a1a
Compare
@reubenmiller Agree that it will be a scalability issue. Is there any way to send multiple child device registration/creation messages as one message?, Kind of batch registering? |
Yes. You can send multiple SmartREST 101 commands across multiple lines in a single MQTT message as follows:
We can just blindly send these messages and ignore the "device already exists "responses as we're only interested in recreating any missing devices. So, I'm also curious to understand this scalability issue that you're highlighting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
afeb964
to
7e66f48
Compare
Thanks for approving @didier-wenzek but I made more changes and I would like another review, it should not take too long. The changes I made are the following:
I did not touch anything you reviewed before |
9026e8f
to
0a3660c
Compare
"106" => register_child_device_supported_operations(config_dir, payload, children), | ||
// Ignore any other child device incoming request as not yet supported | ||
_ => { | ||
info!("Ignored. Message not yet supported: {payload}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info!("Ignored. Message not yet supported: {payload}"); | |
debug!("Ignored. Message not yet supported: {payload}"); |
This might be better as a debug message otherwise I see the mapper logging this for every operation request sent to any child device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug!
or info!
? I proposed info!
because it's not convenient when you have to change the log level to discover that the messages sent to the device are ignored. But I agree that if this is a normal case to have these messages ignored by the mapper and processed by a plugin, then debug!
is more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we extend the mapper in future where it performs all the message mapping for all operations(including config mgmt,log mgmt etc), even for child devices, info
or even warn
logging would be the right thing to do as we would want to know when the device receives a message that the mapper can not handle. But today, since the mapper only maps software management operations for the parent device only, I feel that this log will be triggered most of the time.
/// │ └── c8y_LogfileRequest | ||
/// └── child-3 | ||
/// └── c8y_LogfileRequest | ||
fn make_n_child_devices_with_k_operations(n: u8, ttd: &TempTedgeDir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test utility. You've gone to quite some lengths to ensure that randomness across test runs.
// if there are any local child devices that are not included in the | ||
// `cloud_child_devices` struct, we create them on the cloud, sending a 101 | ||
// message. Then proceed to declare their supported operations. | ||
let difference: Vec<&String> = local_child_devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it was decided not to update the local state. Just wanted to know if it was just deferred for later or we've decided not to sync state from the cloud for some reason? If it's the latter, curious to know why.
@@ -1188,6 +1188,8 @@ async fn mapper_publishes_child_device_create_message() { | |||
|
|||
publish_a_fake_jwt_token(broker).await; | |||
|
|||
broker.publish("c8y/s/ds", "106,child-one").await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the mapper init block is not waiting for the child device list response, is this really necessary?
@@ -1212,6 +1214,7 @@ async fn mapper_publishes_supported_operations_for_child_device() { | |||
let (_tmp_dir, sm_mapper) = start_c8y_mapper(broker.port, &cfg_dir).await.unwrap(); | |||
|
|||
publish_a_fake_jwt_token(broker).await; | |||
broker.publish("c8y/s/ds", "106,child-one").await.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as the one above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albinsuresh , yes because now we don't send 114 on startup but only when we receive a 116 from the cloud. The logic is the following:
mapper sends 105
cloud replies with 106
mapper computes diff between cloud and local
mapper sends 101 for new child devices
mapper sends 114 for all child devices
LGTM. Have just added some minor comments/queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm my approval, having already approved 9638859.
/// updated with operations. This child device will not be cached. | ||
/// | ||
/// - Any child device that is present locally but not in the cloud will be created and | ||
/// then supported operations will be published to the cloud and the device will be cached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then supported operations will be published to the cloud
This part is not tested. Neither the messages returned by process_smartrest
.
For that test to be perfect, I would add a check that there is a declaration message for each child-device known locally but not in the cloud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update this test 😄
0cf41d6
to
b600799
Compare
When deleting a child device from the cloud representation the supported operations logic would re-create this child device but fail to notice it is not available on the cloud any longer. Thus resulting in a child device name prefixed by "MQTT Device". This fix, changes the logic of initialising supported operations on startup, to first requesting the cloud representation of child devices, computing a diff between cloud and local, initialising as new child devices anything that is available locally but is not seen on the cloud and then proceeds to add supported operations for the child devices. Signed-off-by: alexandru solomes <solo@softwareag.com>
Signed-off-by: alexandru solomes <solo@softwareag.com>
Signed-off-by: alexandru solomes <solo@softwareag.com>
Added a series of tests that checks a few assumptions. Namely: - the source of true child devices is the parent device (not the cloud) - local child devices are updated to the cloud - remote child devices are ignored - the child device cache is updated according to these rules Signed-off-by: alexandru solomes <solo@softwareag.com>
Signed-off-by: alexandru solomes <solo@softwareag.com>
b600799
to
0901582
Compare
Proposed changes
When deleting a child device from the cloud representation the supported operations logic would re-create this child device but fail to notice it is not available on the cloud any longer. Thus resulting in a child device name prefixed by "MQTT Device".
This fix, changes the logic of initialising supported operations on startup, to first requesting the cloud representation of child devices, computing a diff between cloud and local, initialising as new child devices anything that is available locally but is not seen on the cloud and then proceeds to add supported operations for the child devices.
Types of changes
Paste Link to the issue
Checklist
cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments