Fix tedge-agent stuck on too many pending operations #2640

didier-wenzek · 2024-01-31T13:37:23Z

Proposed changes

Introduce unbounded receivers. Given a cycle of actors (say A sending messages to B, B to C and C to A), at least one of these actors must use such an unbounded receiver.
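To illustrate the rule above, here is a minimal sketch of a two-actor cycle in which one side uses an unbounded receiver. It is not the tedge_actors implementation: it uses the standard library's `std::sync::mpsc` channels and threads, and the names `run_cycle`, `a`, `b` and the inbox size of 4 are assumptions made for the example.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

/// Runs a two-actor cycle: `a` sends `pending` requests to `b`,
/// and `b` answers each request with a status sent back to `a`.
/// Returns the number of status messages `a` received.
/// (Hypothetical helper, for illustration only.)
pub fn run_cycle(pending: usize) -> usize {
    // Bounded inbox for `b`: senders block once 4 messages are queued.
    let (to_b, b_inbox) = sync_channel::<usize>(4);
    // Unbounded inbox for `a`: sending to `a` never blocks.
    let (to_a, a_inbox) = channel::<usize>();

    let b = thread::spawn(move || {
        for request in b_inbox {
            // If the inbox of `a` were bounded too, this send could block
            // while `a` is itself blocked sending to `b`: a deadlock.
            to_a.send(request).expect("a is still running");
        }
        // `to_a` is dropped here, which closes the inbox of `a`.
    });

    // `a` pushes all its pending requests before reading any status.
    for id in 0..pending {
        to_b.send(id).expect("b is still running");
    }
    drop(to_b); // close the inbox of `b` so that `b` terminates

    let received = a_inbox.iter().count();
    b.join().expect("b did not panic");
    received
}
```

Calling `run_cycle(1000)` completes even though `a` only starts reading after all requests are sent. The trade-off is that the unbounded inbox can grow with the backlog, which is why only one actor of the cycle needs it.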

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#2639

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

github-actions · 2024-01-31T13:58:14Z

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
| --- | --- | --- | --- | --- | --- |
| 387 | 0 | 3 | 387 | 100 | 53m28.617s |

codecov · 2024-02-01T09:25:40Z

Codecov Report

Attention: 67 lines in your changes are missing coverage. Please review.

Comparison is base (6bb5198) 75.8% compared to head (862357a) 75.9%.
Report is 83 commits behind head on main.

Additional details and impacted files

| Files | Coverage Δ | |
| --- | --- | --- |
| ...tedge_agent/src/tedge_operation_converter/actor.rs | `47.1% <ø> (ø)` | |
| ...dge_agent/src/tedge_operation_converter/builder.rs | `90.2% <100.0%> (ø)` | |
| crates/core/tedge_api/src/message_log.rs | `84.0% <100.0%> (ø)` | |
| crates/extensions/c8y_mapper_ext/src/converter.rs | `81.1% <100.0%> (+0.1%)` | ⬆️ |
| ...s/c8y_mapper_ext/src/operations/firmware_update.rs | `88.6% <80.0%> (+0.4%)` | ⬆️ |
| ...es/extensions/c8y_mapper_ext/src/operations/mod.rs | `91.5% <94.8%> (+4.8%)` | ⬆️ |
| crates/core/tedge_api/src/entity_store.rs | `92.9% <0.0%> (ø)` | |
| ...nsions/c8y_mapper_ext/src/operations/log_upload.rs | `88.8% <89.1%> (+0.4%)` | ⬆️ |
| crates/extensions/tedge_mqtt_ext/src/lib.rs | `64.7% <16.6%> (-1.4%)` | ⬇️ |
| crates/core/tedge_actors/src/channels.rs | `61.7% <41.6%> (-2.4%)` | ⬇️ |
... and 2 more

... and 5 files with indirect coverage changes

didier-wenzek · 2024-02-01T13:52:11Z

crates/core/tedge_agent/src/tedge_operation_converter/actor.rs

-    pub(crate) input_receiver: LoggingReceiver<AgentInput>,
+    pub(crate) input_receiver: UnboundedLoggingReceiver<AgentInput>,


An alternative to fix the issue could be to use two receivers.

One receiver to receive progress feedback from the software actor

Another to collect requests from MQTT

Using a biased select, one can then give priority to the feedback messages over the MQTT requests.
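A rough sketch of this two-receiver alternative, assuming nothing about the actual actor code: the `Input` type and the `next_prioritized` and `demo_priority` functions are hypothetical, and the biased select is emulated with `try_recv` on standard library channels. In async code one would more likely await a biased select macro (such as the ones provided by tokio or futures) rather than poll.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;

/// Hypothetical message type standing in for the converter's input.
#[derive(Debug, PartialEq)]
pub enum Input {
    Feedback(u32),
    Request(u32),
}

/// Returns the next message, always checking `feedback` before `requests`.
/// Returns `None` once both channels are closed and empty.
pub fn next_prioritized(feedback: &Receiver<u32>, requests: &Receiver<u32>) -> Option<Input> {
    loop {
        // Feedback first: this is the "biased" part of the selection.
        let feedback_closed = match feedback.try_recv() {
            Ok(id) => return Some(Input::Feedback(id)),
            Err(TryRecvError::Empty) => false,
            Err(TryRecvError::Disconnected) => true,
        };
        // Requests are only considered when no feedback is waiting.
        let requests_closed = match requests.try_recv() {
            Ok(id) => return Some(Input::Request(id)),
            Err(TryRecvError::Empty) => false,
            Err(TryRecvError::Disconnected) => true,
        };
        if feedback_closed && requests_closed {
            return None;
        }
        // Polling keeps the sketch short; an async select would await here.
        thread::yield_now();
    }
}

/// Queues two requests, then one feedback message, and drains both channels.
pub fn demo_priority() -> Vec<Input> {
    let (feedback_tx, feedback_rx) = channel::<u32>();
    let (request_tx, request_rx) = channel::<u32>();
    request_tx.send(1).unwrap();
    request_tx.send(2).unwrap();
    feedback_tx.send(10).unwrap();
    drop(feedback_tx);
    drop(request_tx);

    let mut seen = Vec::new();
    while let Some(message) = next_prioritized(&feedback_rx, &request_rx) {
        seen.push(message);
    }
    seen
}
```

In `demo_priority` the feedback message is delivered before the two requests even though it was queued last.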

rina23q

LGTM

tests/RobotFramework/tests/tedge_agent/main_tedge_agent.robot

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

At least one such receiver should be used when there is a loop of actors, say A sending data to B and B sending data to A. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

The tedge_operation_converter sends messages to the software_manager actor while also receiving messages from that same actor. This leads to a deadlock if one of their message boxes becomes full. This is notably the case on startup if too many pending operations are waiting on MQTT to be processed. With the tedge_operation_converter using an unbounded receiver, all these messages can be consumed without blocking the status messages received from the software_manager actor.
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
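The blocking behaviour behind this deadlock can be observed in isolation. The sketch below is only an illustration with standard library channels, not the message boxes used by tedge_actors; the function names and the capacity of 16 used in the note are assumptions.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

/// Counts how many of `pending` messages a bounded message box of size
/// `capacity` accepts when nobody is reading from it.
/// (Hypothetical helper, for illustration only.)
pub fn accepted_by_bounded(capacity: usize, pending: usize) -> usize {
    let (sender, _receiver) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for id in 0..pending {
        match sender.try_send(id) {
            Ok(()) => accepted += 1,
            // A blocking `send` would wait here until the reader catches up.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

/// Same experiment with an unbounded message box: every send succeeds
/// as long as the receiver exists.
pub fn accepted_by_unbounded(pending: usize) -> usize {
    let (sender, _receiver) = channel::<usize>();
    (0..pending).filter(|id| sender.send(*id).is_ok()).count()
}
```

With 100 pending messages, `accepted_by_bounded(16, 100)` stops at 16 while `accepted_by_unbounded(100)` accepts all of them. A sender using a blocking `send` on the bounded box would stall at that point, which is the situation described above when both actors of the loop wait on each other.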

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 13:44 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 13:46 — with GitHub Actions Failure

didier-wenzek force-pushed the fix/stuck-tedge-agent branch from e1c7000 to b9e3d85 Compare January 31, 2024 14:00

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 14:07 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 14:11 — with GitHub Actions Failure

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 14:51 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 15:06 — with GitHub Actions Failure

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 16:16 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 16:21 — with GitHub Actions Failure

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 16:51 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 17:02 — with GitHub Actions Failure

didier-wenzek force-pushed the fix/stuck-tedge-agent branch from 2fffde2 to fb0b85e Compare January 31, 2024 17:27

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 17:35 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 17:39 — with GitHub Actions Failure

didier-wenzek temporarily deployed to Test Pull Request January 31, 2024 18:08 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto January 31, 2024 18:16 — with GitHub Actions Failure

didier-wenzek temporarily deployed to Test Pull Request February 1, 2024 09:20 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto February 1, 2024 09:22 — with GitHub Actions Failure

didier-wenzek force-pushed the fix/stuck-tedge-agent branch from b94312b to 11ac977 Compare February 1, 2024 10:51

didier-wenzek temporarily deployed to Test Pull Request February 1, 2024 10:59 — with GitHub Actions Inactive

didier-wenzek had a problem deploying to Test Auto February 1, 2024 11:00 — with GitHub Actions Failure

didier-wenzek marked this pull request as ready for review February 1, 2024 11:24

didier-wenzek requested review from albinsuresh, jarhodes314, rina23q and a team as code owners February 1, 2024 11:24

didier-wenzek commented Feb 1, 2024

didier-wenzek force-pushed the fix/stuck-tedge-agent branch from 11ac977 to f96fb86 Compare February 1, 2024 15:32

didier-wenzek temporarily deployed to Test Pull Request February 1, 2024 15:39 — with GitHub Actions Inactive

didier-wenzek temporarily deployed to Test Auto February 1, 2024 15:47 — with GitHub Actions Inactive

rina23q approved these changes Feb 1, 2024

tests/RobotFramework/tests/tedge_agent/main_tedge_agent.robot (two review threads, outdated and resolved)

didier-wenzek added 4 commits February 1, 2024 17:26

Reproduce tedge-agent stuck on too many pending operations

cb0c46f

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

Impl UnboundedLoggingReceiver

fdab20b

At least one such receiver should be used when there is a loop of actors, say A sending data to B and B sending data to A. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

Cargo +nightly fmt

862357a

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>

didier-wenzek force-pushed the fix/stuck-tedge-agent branch from f96fb86 to 862357a Compare February 1, 2024 16:26

didier-wenzek temporarily deployed to Test Pull Request February 1, 2024 16:33 — with GitHub Actions Inactive

didier-wenzek temporarily deployed to Test Auto February 1, 2024 16:34 — with GitHub Actions Inactive

didier-wenzek added this pull request to the merge queue Feb 1, 2024

Merged via the queue into thin-edge:main with commit 6fb7ee4 Feb 1, 2024
20 checks passed

didier-wenzek deleted the fix/stuck-tedge-agent branch February 7, 2024 09:10
