Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug-540] tedge agent missing software list request #653

Conversation

PradeepKiruvale
Copy link
Contributor

@PradeepKiruvale PradeepKiruvale commented Nov 30, 2021

Proposed changes

Tedge agent was missing a software list request message on a new device.
The fix here is to move the mqtt connect and subscribe to the mqtt messages upfront in the tedge agent start code before doing any other initializations, so the messages are never lost.

Types of changes

What types of changes does your code introduce to thin-edge.io?
Put an x in the boxes that apply

  • [x ] Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@PradeepKiruvale PradeepKiruvale changed the title [Bug-540] tedge agent missing list request [Bug-540] tedge agent missing software list request Nov 30, 2021
Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion for a fix would be to always send a dummy software list response on agent startup without waiting for a request. If it receives a request after the startup it will respond again, but I don't see any problem with that.

There is the possibility of mapper not receiving this dummy response if the mapper itself was not started or not ready to receive these messages. But even in that case, we're covered as the mapper will eventually send an explicit request to the agent for which it will get a response.

@@ -205,12 +210,11 @@ impl SmAgent {
async fn subscribe_and_process(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function doesn't do subscribe anymore, calling this function subscribe_and_process looks misleading.

@@ -171,6 +171,9 @@ impl SmAgent {
pub async fn start(&mut self) -> Result<(), AgentError> {
info!("Starting tedge agent");

let mqtt = Client::connect(self.name.as_str(), &self.config.mqtt_client_config).await?;
let mut operations = mqtt.subscribe(self.config.request_topics.clone()).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure if this fix is sufficient (although I agree that this might make the issue less frequent). What if the mapper process sends the request before this agent process execution even reaches this line? That message will still be lost, right?(unless we're sending those requests with retained flag ON, which I doubt).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that there are still two race conditions here but these are not related to the retain flag.

The first race condition arises if this is the very first connection of the agent and if messages have already been sent to the input topic of the agent. Since no client has subscribed to this topic the broker discards all these messages. To fix that subscriptions must created on behalf of the agent before any message is sent. This can be done on install for all the daemon.

The second issue is what is mitigated by this PR. With the current mqtt_client crate, a client connection might start to consume messages before a call to subscribe. The proper fix is a bit more complex (see #575).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix that subscriptions must be created on behalf of the agent before any message is sent. This can be done on install for all the daemon.

Understood.

Copy link
Member

@rina23q rina23q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except renaming the function, it looks good to me. I confirmed it works now on RPi4.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

@@ -171,6 +171,9 @@ impl SmAgent {
pub async fn start(&mut self) -> Result<(), AgentError> {
info!("Starting tedge agent");

let mqtt = Client::connect(self.name.as_str(), &self.config.mqtt_client_config).await?;
let mut operations = mqtt.subscribe(self.config.request_topics.clone()).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that there are still two race conditions here but these are not related to the retain flag.

The first race condition arises if this is the very first connection of the agent and if messages have already been sent to the input topic of the agent. Since no client has subscribed to this topic the broker discards all these messages. To fix that subscriptions must created on behalf of the agent before any message is sent. This can be done on install for all the daemon.

The second issue is what is mitigated by this PR. With the current mqtt_client crate, a client connection might start to consume messages before a call to subscribe. The proper fix is a bit more complex (see #575).

@PradeepKiruvale PradeepKiruvale merged commit ff073f8 into thin-edge:main Dec 1, 2021
@rina23q rina23q linked an issue Dec 6, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Software list upload fails for a new device
4 participants