Fix tedge-agent stuck on too many pending operations #2640

Merged
merged 4 commits into from
Feb 1, 2024

Conversation

didier-wenzek
Contributor

Proposed changes

Introduce unbounded receivers. Given a cycle of actors (say A sending messages to B, B to C, and C to A), at least one of these actors must use an unbounded receiver; otherwise the cycle can deadlock once the bounded message boxes fill up.
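
The rule above can be illustrated with a minimal sketch that uses only the Rust standard library rather than the `tedge_actors` channels touched by this PR. The function name `run_cycle`, the capacity of 4, and the two-status-per-request behaviour are assumptions made for illustration. Actor A sends requests to B through a bounded channel, and B replies through an unbounded one, so B never blocks on a reply and the cycle always drains.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender, SyncSender};
use std::thread;

/// Runs a two-actor cycle: A sends requests to B, B sends status updates back to A.
/// B's inbox is bounded; A's inbox is unbounded, so B never blocks when replying.
/// Returns the number of status updates received by A.
fn run_cycle(requests: usize) -> usize {
    // Bounded inbox for B (hypothetical capacity): A blocks once 4 requests are pending.
    let (to_b, b_inbox): (SyncSender<usize>, Receiver<usize>) = sync_channel(4);
    // Unbounded inbox for A: sends from B always complete immediately.
    let (to_a, a_inbox): (Sender<usize>, Receiver<usize>) = channel();

    let actor_b = thread::spawn(move || {
        // B emits two status updates per request (say "executing", then "done").
        for request in b_inbox {
            to_a.send(request).unwrap();
            to_a.send(request).unwrap();
        }
        // `to_a` is dropped when this closure returns, which closes A's inbox.
    });

    // Actor A pushes every request before reading a single status update.
    // If A's inbox were bounded too, B would block on its replies while A
    // blocks on its requests, and neither side would make progress.
    for request in 0..requests {
        to_b.send(request).unwrap();
    }
    drop(to_b); // no more requests: lets B's loop terminate

    let received = a_inbox.iter().count();
    actor_b.join().unwrap();
    received
}
```

Calling `run_cycle(1000)` completes and yields 2000 status updates. Replacing `channel()` with `sync_channel(4)` for A's inbox in this sketch would make the same call hang, which is the situation the unbounded receiver is meant to avoid. The trade-off is memory: an unbounded inbox can grow without limit if its owner falls behind.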

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#2639

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Contributor

github-actions bot commented Jan 31, 2024

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
|---|---|---|---|---|---|
| 387 | 0 | 3 | 387 | 100 | 53m28.617s |


codecov bot commented Feb 1, 2024

Codecov Report

Attention: 67 lines in your changes are missing coverage. Please review.

Comparison is base (6bb5198) 75.8% compared to head (862357a) 75.9%.
Report is 83 commits behind head on main.

Additional details and impacted files
| Files | Coverage Δ |
|---|---|
| ...tedge_agent/src/tedge_operation_converter/actor.rs | 47.1% <ø> (ø) |
| ...dge_agent/src/tedge_operation_converter/builder.rs | 90.2% <100.0%> (ø) |
| crates/core/tedge_api/src/message_log.rs | 84.0% <100.0%> (ø) |
| crates/extensions/c8y_mapper_ext/src/converter.rs | 81.1% <100.0%> (+0.1%) ⬆️ |
| ...s/c8y_mapper_ext/src/operations/firmware_update.rs | 88.6% <80.0%> (+0.4%) ⬆️ |
| ...es/extensions/c8y_mapper_ext/src/operations/mod.rs | 91.5% <94.8%> (+4.8%) ⬆️ |
| crates/core/tedge_api/src/entity_store.rs | 92.9% <0.0%> (ø) |
| ...nsions/c8y_mapper_ext/src/operations/log_upload.rs | 88.8% <89.1%> (+0.4%) ⬆️ |
| crates/extensions/tedge_mqtt_ext/src/lib.rs | 64.7% <16.6%> (-1.4%) ⬇️ |
| crates/core/tedge_actors/src/channels.rs | 61.7% <41.6%> (-2.4%) ⬇️ |
| ... and 2 more | |

... and 5 files with indirect coverage changes

- pub(crate) input_receiver: LoggingReceiver<AgentInput>,
+ pub(crate) input_receiver: UnboundedLoggingReceiver<AgentInput>,
Contributor Author

An alternative to fix the issue could be to use two receivers.

  • One receiver to receive progress feedback from the software actor
  • Another to collect requests from MQTT

Using a biased select, one can then give priority to the feedback messages.
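
The two-receiver alternative can be sketched as follows. This is not the code of the PR, and it does not use an async `select`: it is a standard-library approximation of a biased select, where the feedback receiver is always polled before the request receiver. The `Input` enum and the functions `next_input` and `demo_priority` are hypothetical names introduced for this illustration.

```rust
use std::sync::mpsc::{channel, Receiver};

/// A message tagged with the queue it came from (hypothetical type).
#[derive(Debug, PartialEq)]
enum Input {
    Feedback(String),
    Request(String),
}

/// Returns the next pending message, always preferring feedback over requests.
/// Returns `None` when both queues are currently empty or closed.
fn next_input(feedback: &Receiver<String>, requests: &Receiver<String>) -> Option<Input> {
    // Poll the priority queue first: this is the "biased" part.
    if let Ok(msg) = feedback.try_recv() {
        return Some(Input::Feedback(msg));
    }
    // Only look at requests once no feedback is pending.
    requests.try_recv().ok().map(Input::Request)
}

/// Queues two requests before one feedback message, then drains both queues.
fn demo_priority() -> Vec<Input> {
    let (feedback_tx, feedback_rx) = channel();
    let (request_tx, request_rx) = channel();

    request_tx.send("install".to_string()).unwrap();
    request_tx.send("remove".to_string()).unwrap();
    feedback_tx.send("executing".to_string()).unwrap();

    let mut order = Vec::new();
    while let Some(input) = next_input(&feedback_rx, &request_rx) {
        order.push(input);
    }
    order
}
```

In `demo_priority`, the feedback message is delivered first even though it was sent last. In an async runtime the same ordering would typically come from a select construct with a fixed polling order; the non-blocking polling shown here only illustrates the priority rule.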

Member

@rina23q rina23q left a comment

LGTM

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
At least one such receiver should be used when there is a loop of actors,
say A sending data to B and B sending data to A.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
The tedge_operation_converter sends messages to the
software_manager actor while also receiving messages from that same actor.
This leads to a deadlock as soon as both of their message boxes are full.
This notably happens on startup when too many pending
operations are waiting on MQTT to be processed. With the tedge_operation_converter
using an unbounded receiver, all these messages can be consumed without
blocking the status messages received from the software_manager actor.
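
The failure mode described in this commit message can be observed without hanging a program by using non-blocking sends. The sketch below relies only on the Rust standard library; the function `both_sides_blocked` and the variable names are hypothetical and merely echo the two actors named above. It fills both bounded inboxes and checks that the next send in each direction would block.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fills both bounded inboxes of a two-actor loop and reports whether each side
/// would block on its next send. Two `true` values mean that neither actor can
/// make progress: each one waits for the other to drain its inbox.
fn both_sides_blocked(capacity: usize) -> (bool, bool) {
    // The receivers are kept alive but never read, as if both actors were busy sending.
    let (to_manager, _manager_inbox) = sync_channel::<u32>(capacity);
    let (to_converter, _converter_inbox) = sync_channel::<u32>(capacity);

    // Fill both message boxes up to their capacity.
    for i in 0..capacity as u32 {
        to_manager.try_send(i).unwrap();
        to_converter.try_send(i).unwrap();
    }

    // A blocking `send` would now wait forever on either side.
    let converter_blocked = matches!(to_manager.try_send(0), Err(TrySendError::Full(_)));
    let manager_blocked = matches!(to_converter.try_send(0), Err(TrySendError::Full(_)));
    (converter_blocked, manager_blocked)
}
```

With any capacity, `both_sides_blocked` returns `(true, true)`: once both boxes are full, a larger bound only delays the problem, which is why the fix makes one side of the loop unbounded instead.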

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
@didier-wenzek didier-wenzek added this pull request to the merge queue Feb 1, 2024
Merged via the queue into thin-edge:main with commit 6fb7ee4 Feb 1, 2024
20 checks passed
@didier-wenzek didier-wenzek deleted the fix/stuck-tedge-agent branch February 7, 2024 09:10

2 participants