-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug-540] tedge agent missing software list request #653
[Bug-540] tedge agent missing software list request #653
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion for a fix would be to always send a dummy software list response on agent startup without waiting for a request. If it receives a request after the startup it will respond again, but I don't see any problem with that.
There is the possibility of mapper not receiving this dummy response if the mapper itself was not started or not ready to receive these messages. But even in that case, we're covered as the mapper will eventually send an explicit request to the agent for which it will get a response.
crates/core/tedge_agent/src/agent.rs
Outdated
@@ -205,12 +210,11 @@ impl SmAgent { | |||
async fn subscribe_and_process( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function doesn't do subscribe
anymore, calling this function subscribe_and_process
looks misleading.
@@ -171,6 +171,9 @@ impl SmAgent { | |||
pub async fn start(&mut self) -> Result<(), AgentError> { | |||
info!("Starting tedge agent"); | |||
|
|||
let mqtt = Client::connect(self.name.as_str(), &self.config.mqtt_client_config).await?; | |||
let mut operations = mqtt.subscribe(self.config.request_topics.clone()).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure if this fix is sufficient (although I agree that this might make the issue less frequent). What if the mapper process sends the request before this agent process execution even reaches this line? That message will still be lost, right?(unless we're sending those requests with retained flag ON, which I doubt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that there are still two race conditions here but these are not related to the retain flag.
The first race condition arises if this is the very first connection of the agent and if messages have already been sent to the input topic of the agent. Since no client has subscribed to this topic the broker discards all these messages. To fix that subscriptions must created on behalf of the agent before any message is sent. This can be done on install for all the daemon.
The second issue is what is mitigated by this PR. With the current mqtt_client
crate, a client connection might start to consume messages before a call to subscribe. The proper fix is a bit more complex (see #575).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix that subscriptions must be created on behalf of the agent before any message is sent. This can be done on install for all the daemon.
Understood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except renaming the function, it looks good to me. I confirmed it works now on RPi4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
@@ -171,6 +171,9 @@ impl SmAgent { | |||
pub async fn start(&mut self) -> Result<(), AgentError> { | |||
info!("Starting tedge agent"); | |||
|
|||
let mqtt = Client::connect(self.name.as_str(), &self.config.mqtt_client_config).await?; | |||
let mut operations = mqtt.subscribe(self.config.request_topics.clone()).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that there are still two race conditions here but these are not related to the retain flag.
The first race condition arises if this is the very first connection of the agent and if messages have already been sent to the input topic of the agent. Since no client has subscribed to this topic the broker discards all these messages. To fix that subscriptions must created on behalf of the agent before any message is sent. This can be done on install for all the daemon.
The second issue is what is mitigated by this PR. With the current mqtt_client
crate, a client connection might start to consume messages before a call to subscribe. The proper fix is a bit more complex (see #575).
Proposed changes
Tedge agent was missing a software list request message on a new device.
The fix here is to move the mqtt connect and subscribe to the mqtt messages upfront in the tedge agent start code before doing any other initializations, so the messages are never lost.
Types of changes
What types of changes does your code introduce to
thin-edge.io
?Put an
x
in the boxes that applyPaste Link to the issue
Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.cargo fmt
as mentioned in CODING_GUIDELINEScargo clippy
as mentioned in CODING_GUIDELINESFurther comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...