Move c8y-mapper to new health topics #2276

Bravo555 · 2023-09-20T16:00:08Z

Proposed changes

This PR updates most of the services to use new health status topics:

tedge/health/SERVICE_NAME
-> te/device/main/service/SERVICE_NAME/status/health

tedge/health/CHILD_ID/SERVICE_NAME
-> te/device/CHILD_ID/service/SERVICE_NAME/status/health

Health check commands are not part of this PR and will be done in a follow-up PR.

tedge_to_te_converter mapping of tedge/health to te/+/+/+/+/health/status is still active so e.g. cloud bridges can still publish messages to old topics but be converted to new ones.

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Health check and status for c8y mapper on te/ #2274: This PR completes half of the scope of this ticket by moving c8y-mapper and other services to new health status message
Update tedge-watchdog to use the te MQTT topics. #2308: tedge-watchdog was updated to read health status messages from new topics, but still uses old tedge/health-check topic for health check commands

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

Follow-up tasks

Improve construction of startup registration messages

github-actions · 2023-09-20T17:30:38Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %	⏱️ Duration
316	0	3	316	100	51m23.445s

codecov · 2023-09-21T09:01:56Z

Codecov Report

Merging #2276 (2a8cccc) into main (d39c1c2) will decrease coverage by 0.4%.
Report is 1 commits behind head on main.
The diff coverage is 63.8%.

Additional details and impacted files

Files	Coverage Δ
crates/common/batcher/src/driver.rs	`77.4% <100.0%> (ø)`
.../tedge_config/src/tedge_config_cli/tedge_config.rs	`81.8% <100.0%> (ø)`
crates/core/c8y_api/src/smartrest/inventory.rs	`100.0% <100.0%> (ø)`
crates/core/c8y_api/src/smartrest/message.rs	`97.8% <ø> (ø)`
crates/core/tedge_actors/src/actors.rs	`75.2% <100.0%> (+0.7%)`	⬆️
crates/core/tedge_actors/src/builders.rs	`69.2% <ø> (ø)`
crates/core/tedge_actors/src/converter.rs	`53.9% <100.0%> (ø)`
crates/core/tedge_actors/src/run_actor.rs	`100.0% <100.0%> (ø)`
crates/core/tedge_actors/src/runtime.rs	`86.6% <100.0%> (ø)`
crates/core/tedge_actors/src/servers/actors.rs	`87.0% <100.0%> (ø)`
... and 51 more

... and 8 files with indirect coverage changes

crates/core/tedge_agent/src/agent.rs

crates/extensions/c8y_mapper_ext/src/converter.rs

didier-wenzek

I still need some time to finalize this review.

crates/core/tedge_api/src/mqtt_topics.rs

didier-wenzek · 2023-09-25T10:05:48Z

crates/core/tedge_api/src/mqtt_topics.rs

+///
+/// It's most often in a format `device/DEVICE_NAME/service/SERVICE_NAME`.
+#[derive(Debug, Clone, Hash, PartialEq, Eq)]
+pub struct ServiceTopicId(EntityTopicId);


I'm not convinced by this type.

Why would we have a specific type to identify services but not devices?

Also addressed it here.

We can have a specific type to identify devices as well, just by the scope of this PR I thought it was a good place to add one for a service. I think as we plan to support custom MQTT schemes, where any identifier can be either a device or a service, it makes sense to save this information by encoding it in a type. Granted, producers of these wrappers are currently not well defined, so maybe this is a premature abstraction, but ultimately, I think these ServiceTopicId/DeviceTopicId wrapper types could be produced by the entity store (that is, they can be derived from EntityType which is a part of EntityMetadata), or derived from a default topic scheme.

I'm still not convinced with these abstractions because they assume things about the topic. An EntityTopicId in itself doesn't describe whether it's a device or not. It's the registration message that dictates whether that topic maps to a device or service. So, we shouldn't overload the topic abstraction itself with that information.

And yes, we have default topic schemes for devices and services, but they're just specializations of EntityTopicId and can easily be created with helpers like default_child_device, default_main_service etc. So, I'm really not sure if these additional types are required. There's already enough confusion around all the IDs like EntityTopicId, EntityExternalId etc. Why add more for very little value addition?

Responded in this comment, if you think this abstraction adds little value I can remove ServiceTopicId and use EntityTopicId.

As of now, I see indeed little value in ServiceTopicId and DeviceTopicId. However, this is not blocking.

What really add values is the struct Service which combines the EntityTopicId of a service with the EntityTopicId of its parent device removing the need for guessing one from the other.

I finally understood what is worrying me. This struct ServiceTopicId and even the struct Service are halfway between EntityTopicId and EntityMetadata.

EntityTopicIds are just identifiers of sources & targets of MQTT messages - with no assumption of the nature of these sources and targets.

EntityMetadatas describe these entities - notably providing their type and their parent if any.

So, if a feature needs to know the type of entity or its parent, then the implementation must be given an EntityMetadata not a decorated EntityTopicId.

Hence, now I do think that it would be better to not introduce neither ServiceTopicId nor DeviceTopicId and not even Service.

So, if a feature needs to know the type of entity or its parent, then the implementation must be given an EntityMetadata not a decorated EntityTopicId.

Yeah, I agree that EntityMetadata is probably the closest to what we eventually want to have. One problem with it though is that this type is designed for flat storage in a vector, i.e. parent field being just a string key that one can look in a hashmap - this is far simpler than using references. So currently, if we want to see who is the parent in EntityMetadata, we have to call .get() again on the entity store.

The solution for this I think would be to have an Entity struct, which is models the entity so as to make invalid states unrepresentable and contain any other relevant entities, e.g. a parent, so one wouldn't have to call entity store again. This is similar to .ancestors() method, but instead of returning IDs, it could return objects with all necessary information.

For now though, as agreed, I won't be making any sweeping API changes and just do it with EntityMetadata directly, in order to have full picture of the pain points before resolving them.

crates/extensions/tedge_health_ext/src/lib.rs

didier-wenzek · 2023-09-25T10:25:30Z

crates/core/tedge_api/src/mqtt_topics.rs

+    /// If in a default MQTT scheme, returns a device topic id of this service.
+    pub fn to_device_topic_id(&self) -> Option<DeviceTopicId> {


We really should avoid these methods that assume a default topic scheme.

Yes, this assumption of the default topic scheme, spread everywhere with no good way to indicate it (i.e. using a comment or a custom .expect() message, instead of better utilising the type system) leads to some unreliable code, and I think implementing a solution to this should be a priority. I'll look at the current usages and make a ticket.

This is why I see more values in the struct Service which removes the need for this kind of guessing work.

See this comment #2276 (comment).

Now I do think that any use of a Service should be replaced by an EntityMetadata.

crates/core/tedge_api/src/mqtt_topics.rs

crates/extensions/tedge_health_ext/src/lib.rs

crates/extensions/c8y_mapper_ext/src/converter.rs

crates/core/tedge_api/src/health.rs

crates/extensions/c8y_mapper_ext/src/converter.rs

crates/core/tedge_api/src/health.rs

albinsuresh · 2023-09-26T13:58:01Z

crates/core/tedge_api/src/mqtt_topics.rs

+///
+/// It's most often in a format `device/DEVICE_NAME/service/SERVICE_NAME`.
+#[derive(Debug, Clone, Hash, PartialEq, Eq)]
+pub struct ServiceTopicId(EntityTopicId);


I'm still not convinced with these abstractions because they assume things about the topic. An EntityTopicId in itself doesn't describe whether it's a device or not. It's the registration message that dictates whether that topic maps to a device or service. So, we shouldn't overload the topic abstraction itself with that information.

And yes, we have default topic schemes for devices and services, but they're just specializations of EntityTopicId and can easily be created with helpers like default_child_device, default_main_service etc. So, I'm really not sure if these additional types are required. There's already enough confusion around all the IDs like EntityTopicId, EntityExternalId etc. Why add more for very little value addition?

crates/extensions/tedge_health_ext/src/lib.rs

didier-wenzek

The tedge_to_te_converter must stop converting tedge/health/# messages.

See https://github.com/thin-edge/thin-edge.io/pull/2276/files#r1362176975

albinsuresh · 2023-10-17T15:12:08Z

crates/core/c8y_api/src/smartrest/inventory.rs

+/// to set the status of).
+///
+/// https://cumulocity.com/guides/reference/smartrest-two/#104
+pub fn service_status_update_message(external_ids: &[String], service_status: &str) -> Message {


To avoid any confusion, as described in the comments above, and to keep this API consistent with the other APIs in this module like service_creation_message, I'd just take the service_id and ancestors as separate parameters, the latter without the target service_id in it. But while passing it to publish_topic_from_ancestors, you'll have to append them.

I'd like do do it properly later as part of a follow-up task.

Bravo555 · 2023-10-17T15:59:50Z

The tedge_to_te_converter must stop converting tedge/health/# messages.

Should be fixed in 87ac7f8.

albinsuresh · 2023-10-17T16:45:43Z

crates/core/tedge_agent/src/tedge_to_te_converter/converter.rs

@@ -30,9 +30,6 @@ impl TedgetoTeConverter {
            }
            topic if topic.name.starts_with("tedge/events") => self.convert_event(message),
            topic if topic.name.starts_with("tedge/alarms") => self.convert_alarm(message),
-            topic if topic.name.starts_with("tedge/health") => {


I understand that this was removed to avoid a certain bug discussed in an earlier comment. But, I'm wondering about the impact of this change on backward compatibility. I was of the impression that these conversions exist to make the transition of third-party component written against the old APIs smoother on our next release where they'll continue publishing to the older topics and we'll internally map them with this tedge_to_te_converter. So, removing this conversion support for tedge/health will break all third-party components publishing to the old health topics.

But, if tedge/health was only an internal API in the earlier releases, meant only to be used by our own components and not by third-party components, then this is okay. I'm just not sure if it was just an internal API. @didier-wenzek could you confirm?

That's true that we could only exclude a white-list of tedge daemons. However I don't think this API has been used by many as it was not documented beyond tedge daemons.

albinsuresh · 2023-10-17T16:49:11Z

tests/RobotFramework/tests/cumulocity/service_monitoring/service_monitoring.robot

@@ -99,26 +99,6 @@ Check health status of tedge-mapper-c8y service on broker restart

    Custom Test Teardown

-Check health status of child device service


Why was this test removed and not just adapted to use the new health endpoints? I understand that this test probably was relying on the tedge_to_te_converter earlier and that's why you removed. But, we could have just updated it to use the new te/.../health/status endpoints directly, unless this is duplicate and a test for child device service health status already exists elsewhere.

I may have made a mistake with that one. We agreed with @reubenmiller that health status messages should not change entity properties, so it follows that they shouldn't also create entities by themselves either.
However I forgot about auto-registration mechanism, in which it is expected that an entity sending any message will be automatically registered in C8y, so the same goes for health status messages.

I'll try to restore and update the test.

didier-wenzek

Approved. Thank you for this sustained effort.

Follow-up tasks:

Introduce a new constructor for HealthMonitorBuilder from tedge-config to avoid repeating the same logic in all tedge daemons.
Add a user-configurable service topic id to all tedge daemons, instead of building it after the device id.
Consider to deprecate ServiceTopicId, EntityTopicId and Service, replacing their usage with EntityMetadata or intriducing an Entity type (see Move c8y-mapper to new health topics #2276 (comment)).
Consider to deprecate this idea of a service type at the level of thin-edge Entity. This should be related to the inventory i.e. data attached by the user to devices and services.

albinsuresh

Approving, but still review my last comment on an existing test that was removed, which might reduce our test coverage. I'm just hoping that it was removed for genuine reasons.

Also, create a list of planned follow-up tasks in a comment, or even better as issues. Group trivial and related ones together and feel free to raise multiple issues, if some are going to be more time consuming than others, and unrelated to other changes.

Bravo555 · 2023-10-18T08:53:17Z

Also, create a list of planned follow-up tasks in a comment, or even better as issues. Group trivial and related ones together and feel free to raise multiple issues, if some are going to be more time consuming than others, and unrelated to other changes.

Created a list of follow-up tasks in this issue: #2274

This commit updates tedge-mapper-c8y to subscribe to new health status topics: ``` tedge/health/SERVICE_NAME -> te/device/main/service/SERVICE_NAME/status/health tedge/health/CHILD_ID/SERVICE_NAME -> te/device/CHILD_ID/service/SERVICE_NAME/status/health ``` Producers of health status topics (i.e. different thin-edge services) are not in the scope of this PR, however the tests for tedge-agent were failing because after it terminated, the topic converter could not convert a last will message to the new health status topic. As such, only tedge-agent was updated to publish its health status messages to a new health topic, introducing a new health topic API that will be used by other services, however for now they still publish to old health status topics, and these messages are converted by the tedge-agent's topic converter. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

A new `Service` type was created, containing topic ids of the service itself and the associated device. Functions which need to take an entity topic id of a service can use this type to ensure that the given argument is indeed a service, as well as to obtain the topic id of the associated device. Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

didier-wenzek · 2023-10-18T08:57:27Z

Created a list of follow-up tasks in this issue: #2274

Thank you

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

didier-wenzek · 2023-10-18T09:55:28Z

The test "Successful log operation" is flaky.

Bravo555 mentioned this pull request Sep 20, 2023

tedge-agent: move to new health topic #2271

Closed

13 tasks

Bravo555 had a problem deploying to Test Pull Request September 20, 2023 16:10 — with GitHub Actions Failure

Bravo555 force-pushed the services-new-health-topics branch from eb9bec8 to d042c6f Compare September 21, 2023 08:43

Bravo555 had a problem deploying to Test Pull Request September 21, 2023 08:53 — with GitHub Actions Failure

Bravo555 force-pushed the services-new-health-topics branch 3 times, most recently from a194320 to 1c5a9ec Compare September 22, 2023 10:51

Bravo555 temporarily deployed to Test Pull Request September 22, 2023 11:01 — with GitHub Actions Inactive

Bravo555 force-pushed the services-new-health-topics branch from 1c5a9ec to d8360ae Compare September 22, 2023 12:21

Bravo555 temporarily deployed to Test Pull Request September 22, 2023 12:30 — with GitHub Actions Inactive

Bravo555 force-pushed the services-new-health-topics branch from d8360ae to f6cf40f Compare September 25, 2023 08:50

Bravo555 temporarily deployed to Test Pull Request September 25, 2023 09:00 — with GitHub Actions Inactive

Bravo555 commented Sep 25, 2023

View reviewed changes

crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved

crates/extensions/c8y_mapper_ext/src/converter.rs Outdated Show resolved Hide resolved

Bravo555 marked this pull request as ready for review September 25, 2023 09:41

Bravo555 requested review from albinsuresh, didier-wenzek and PradeepKiruvale September 25, 2023 09:42

didier-wenzek reviewed Sep 25, 2023

View reviewed changes

Bravo555 had a problem deploying to Test Pull Request September 25, 2023 11:57 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request September 25, 2023 14:48 — with GitHub Actions Inactive

Bravo555 force-pushed the services-new-health-topics branch from 825c291 to 4c88266 Compare September 25, 2023 14:54

Bravo555 temporarily deployed to Test Pull Request September 25, 2023 15:05 — with GitHub Actions Inactive

didier-wenzek reviewed Sep 25, 2023

View reviewed changes

PradeepKiruvale reviewed Sep 26, 2023

View reviewed changes

crates/core/tedge_api/src/health.rs Show resolved Hide resolved

crates/extensions/c8y_mapper_ext/src/converter.rs Outdated Show resolved Hide resolved

Bravo555 had a problem deploying to Test Pull Request September 26, 2023 13:28 — with GitHub Actions Failure

albinsuresh reviewed Sep 26, 2023

View reviewed changes

Bravo555 temporarily deployed to Test Pull Request September 27, 2023 14:37 — with GitHub Actions Inactive

Bravo555 force-pushed the services-new-health-topics branch from 93b1e24 to f509718 Compare September 28, 2023 11:46

Bravo555 force-pushed the services-new-health-topics branch 2 times, most recently from 3862236 to 3fbb470 Compare October 17, 2023 11:49

Bravo555 temporarily deployed to Test Pull Request October 17, 2023 12:00 — with GitHub Actions Inactive

didier-wenzek requested changes Oct 17, 2023

View reviewed changes

Bravo555 temporarily deployed to Test Pull Request October 17, 2023 15:07 — with GitHub Actions Inactive

albinsuresh reviewed Oct 17, 2023

View reviewed changes

Bravo555 had a problem deploying to Test Pull Request October 17, 2023 16:06 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request October 17, 2023 16:41 — with GitHub Actions Inactive

albinsuresh reviewed Oct 17, 2023

View reviewed changes

didier-wenzek approved these changes Oct 17, 2023

View reviewed changes

albinsuresh approved these changes Oct 17, 2023

View reviewed changes

Bravo555 force-pushed the services-new-health-topics branch from 87ac7f8 to c2b618e Compare October 18, 2023 07:29

Bravo555 temporarily deployed to Test Pull Request October 18, 2023 07:40 — with GitHub Actions Inactive

Bravo555 mentioned this pull request Oct 18, 2023

Handle remaining references to tedge/health in c8y #2349

Closed

Bravo555 added 3 commits October 18, 2023 08:53

Use EntityStore to check if main device agent up

22d0d32

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Bravo555 added 5 commits October 18, 2023 09:11

Rewrite health message conversion

bedde76

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Remove dead module c8y_api::smartrest::monitor

372b82b

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Use service.type as default service type

83ba035

Services publish registration message on startup

e73be3c

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>

Remove mapping of health status messages

2a8cccc

Bravo555 force-pushed the services-new-health-topics branch from c2b618e to 2a8cccc Compare October 18, 2023 09:17

Bravo555 had a problem deploying to Test Pull Request October 18, 2023 09:29 — with GitHub Actions Failure

Bravo555 temporarily deployed to Test Pull Request October 18, 2023 09:55 — with GitHub Actions Inactive

Bravo555 merged commit f811fad into thin-edge:main Oct 18, 2023
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move c8y-mapper to new health topics #2276

Move c8y-mapper to new health topics #2276

Bravo555 commented Sep 20, 2023 •

edited

github-actions bot commented Sep 20, 2023 •

edited

codecov bot commented Sep 21, 2023 •

edited

didier-wenzek left a comment

didier-wenzek Sep 25, 2023

Bravo555 Sep 25, 2023

albinsuresh Sep 26, 2023

Bravo555 Sep 27, 2023 •

edited

didier-wenzek Sep 28, 2023

didier-wenzek Sep 28, 2023

Bravo555 Sep 29, 2023

didier-wenzek Sep 25, 2023

Bravo555 Sep 25, 2023

didier-wenzek Sep 28, 2023

didier-wenzek Sep 28, 2023

albinsuresh Sep 26, 2023

didier-wenzek left a comment

albinsuresh Oct 17, 2023

Bravo555 Oct 17, 2023

Bravo555 commented Oct 17, 2023

albinsuresh Oct 17, 2023 •

edited

didier-wenzek Oct 17, 2023

albinsuresh Oct 17, 2023

Bravo555 Oct 18, 2023

didier-wenzek left a comment

albinsuresh left a comment •

edited

Bravo555 commented Oct 18, 2023

didier-wenzek commented Oct 18, 2023

didier-wenzek commented Oct 18, 2023

		/// If in a default MQTT scheme, returns a device topic id of this service.
		pub fn to_device_topic_id(&self) -> Option<DeviceTopicId> {

		@@ -99,26 +99,6 @@ Check health status of tedge-mapper-c8y service on broker restart

		Custom Test Teardown

		Check health status of child device service

Move c8y-mapper to new health topics #2276

Move c8y-mapper to new health topics #2276

Conversation

Bravo555 commented Sep 20, 2023 • edited

Proposed changes

Types of changes

Paste Link to the issue

Checklist

Further comments

Follow-up tasks

github-actions bot commented Sep 20, 2023 • edited

Robot Results

codecov bot commented Sep 21, 2023 • edited

Codecov Report

didier-wenzek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bravo555 Sep 27, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bravo555 commented Oct 17, 2023

albinsuresh Oct 17, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

didier-wenzek left a comment

Choose a reason for hiding this comment

albinsuresh left a comment • edited

Choose a reason for hiding this comment

Bravo555 commented Oct 18, 2023

didier-wenzek commented Oct 18, 2023

didier-wenzek commented Oct 18, 2023

Bravo555 commented Sep 20, 2023 •

edited

github-actions bot commented Sep 20, 2023 •

edited

codecov bot commented Sep 21, 2023 •

edited

Bravo555 Sep 27, 2023 •

edited

albinsuresh Oct 17, 2023 •

edited

albinsuresh left a comment •

edited