Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move c8y-mapper to new health topics #2276

Merged
merged 8 commits into from
Oct 18, 2023

Conversation

Bravo555
Copy link
Contributor

@Bravo555 Bravo555 commented Sep 20, 2023

Proposed changes

This PR updates most of the services to use new health status topics:

tedge/health/SERVICE_NAME
-> te/device/main/service/SERVICE_NAME/status/health

tedge/health/CHILD_ID/SERVICE_NAME
-> te/device/CHILD_ID/service/SERVICE_NAME/status/health

Health check commands are not part of this PR and will be done in a follow-up PR.

tedge_to_te_converter mapping of tedge/health to te/+/+/+/+/health/status is still active so e.g. cloud bridges can still publish messages to old topics but be converted to new ones.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Follow-up tasks

  • Improve construction of startup registration messages

@github-actions
Copy link
Contributor

github-actions bot commented Sep 20, 2023

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
316 0 3 316 100 51m23.445s

@codecov
Copy link

codecov bot commented Sep 21, 2023

Codecov Report

Merging #2276 (2a8cccc) into main (d39c1c2) will decrease coverage by 0.4%.
Report is 1 commits behind head on main.
The diff coverage is 63.8%.

Additional details and impacted files
Files Coverage Δ
crates/common/batcher/src/driver.rs 77.4% <100.0%> (ø)
.../tedge_config/src/tedge_config_cli/tedge_config.rs 81.8% <100.0%> (ø)
crates/core/c8y_api/src/smartrest/inventory.rs 100.0% <100.0%> (ø)
crates/core/c8y_api/src/smartrest/message.rs 97.8% <ø> (ø)
crates/core/tedge_actors/src/actors.rs 75.2% <100.0%> (+0.7%) ⬆️
crates/core/tedge_actors/src/builders.rs 69.2% <ø> (ø)
crates/core/tedge_actors/src/converter.rs 53.9% <100.0%> (ø)
crates/core/tedge_actors/src/run_actor.rs 100.0% <100.0%> (ø)
crates/core/tedge_actors/src/runtime.rs 86.6% <100.0%> (ø)
crates/core/tedge_actors/src/servers/actors.rs 87.0% <100.0%> (ø)
... and 51 more

... and 8 files with indirect coverage changes

@Bravo555 Bravo555 force-pushed the services-new-health-topics branch 3 times, most recently from a194320 to 1c5a9ec Compare September 22, 2023 10:51
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 22, 2023 11:01 — with GitHub Actions Inactive
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 22, 2023 12:30 — with GitHub Actions Inactive
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 25, 2023 09:00 — with GitHub Actions Inactive
@Bravo555 Bravo555 marked this pull request as ready for review September 25, 2023 09:41
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need some time to finalize this review.

crates/core/tedge_api/src/mqtt_topics.rs Outdated Show resolved Hide resolved
///
/// It's most often in a format `device/DEVICE_NAME/service/SERVICE_NAME`.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct ServiceTopicId(EntityTopicId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by this type.

Why would we have a specific type to identify services but not devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also addressed it here.

We can have a specific type to identify devices as well, just by the scope of this PR I thought it was a good place to add one for a service. I think as we plan to support custom MQTT schemes, where any identifier can be either a device or a service, it makes sense to save this information by encoding it in a type. Granted, producers of these wrappers are currently not well defined, so maybe this is a premature abstraction, but ultimately, I think these ServiceTopicId/DeviceTopicId wrapper types could be produced by the entity store (that is, they can be derived from EntityType which is a part of EntityMetadata), or derived from a default topic scheme.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced with these abstractions because they assume things about the topic. An EntityTopicId in itself doesn't describe whether it's a device or not. It's the registration message that dictates whether that topic maps to a device or service. So, we shouldn't overload the topic abstraction itself with that information.

And yes, we have default topic schemes for devices and services, but they're just specializations of EntityTopicId and can easily be created with helpers like default_child_device, default_main_service etc. So, I'm really not sure if these additional types are required. There's already enough confusion around all the IDs like EntityTopicId, EntityExternalId etc. Why add more for very little value addition?

Copy link
Contributor Author

@Bravo555 Bravo555 Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded in this comment, if you think this abstraction adds little value I can remove ServiceTopicId and use EntityTopicId.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, I see indeed little value in ServiceTopicId and DeviceTopicId. However, this is not blocking.

What really add values is the struct Service which combines the EntityTopicId of a service with the EntityTopicId of its parent device removing the need for guessing one from the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally understood what is worrying me. This struct ServiceTopicId and even the struct Service are halfway between EntityTopicId and EntityMetadata.

  • EntityTopicIds are just identifiers of sources & targets of MQTT messages - with no assumption of the nature of these sources and targets.
  • EntityMetadatas describe these entities - notably providing their type and their parent if any.

So, if a feature needs to know the type of entity or its parent, then the implementation must be given an EntityMetadata not a decorated EntityTopicId.

Hence, now I do think that it would be better to not introduce neither ServiceTopicId nor DeviceTopicId and not even Service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if a feature needs to know the type of entity or its parent, then the implementation must be given an EntityMetadata not a decorated EntityTopicId.

Yeah, I agree that EntityMetadata is probably the closest to what we eventually want to have. One problem with it though is that this type is designed for flat storage in a vector, i.e. parent field being just a string key that one can look in a hashmap - this is far simpler than using references. So currently, if we want to see who is the parent in EntityMetadata, we have to call .get() again on the entity store.

The solution for this I think would be to have an Entity struct, which is models the entity so as to make invalid states unrepresentable and contain any other relevant entities, e.g. a parent, so one wouldn't have to call entity store again. This is similar to .ancestors() method, but instead of returning IDs, it could return objects with all necessary information.

For now though, as agreed, I won't be making any sweeping API changes and just do it with EntityMetadata directly, in order to have full picture of the pain points before resolving them.

crates/extensions/tedge_health_ext/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 358 to 403
/// If in a default MQTT scheme, returns a device topic id of this service.
pub fn to_device_topic_id(&self) -> Option<DeviceTopicId> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should avoid these methods that assume a default topic scheme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this assumption of the default topic scheme, spread everywhere with no good way to indicate it (i.e. using a comment or a custom .expect() message, instead of better utilising the type system) leads to some unreliable code, and I think implementing a solution to this should be a priority. I'll look at the current usages and make a ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I see more values in the struct Service which removes the need for this kind of guessing work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this comment #2276 (comment).

Now I do think that any use of a Service should be replaced by an EntityMetadata.

crates/core/tedge_api/src/mqtt_topics.rs Outdated Show resolved Hide resolved
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 25, 2023 14:48 — with GitHub Actions Inactive
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 25, 2023 15:05 — with GitHub Actions Inactive
crates/extensions/tedge_health_ext/src/lib.rs Outdated Show resolved Hide resolved
crates/extensions/c8y_mapper_ext/src/converter.rs Outdated Show resolved Hide resolved
crates/core/tedge_api/src/health.rs Outdated Show resolved Hide resolved
crates/core/tedge_api/src/health.rs Show resolved Hide resolved
crates/core/tedge_api/src/health.rs Show resolved Hide resolved
///
/// It's most often in a format `device/DEVICE_NAME/service/SERVICE_NAME`.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct ServiceTopicId(EntityTopicId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced with these abstractions because they assume things about the topic. An EntityTopicId in itself doesn't describe whether it's a device or not. It's the registration message that dictates whether that topic maps to a device or service. So, we shouldn't overload the topic abstraction itself with that information.

And yes, we have default topic schemes for devices and services, but they're just specializations of EntityTopicId and can easily be created with helpers like default_child_device, default_main_service etc. So, I'm really not sure if these additional types are required. There's already enough confusion around all the IDs like EntityTopicId, EntityExternalId etc. Why add more for very little value addition?

crates/extensions/tedge_health_ext/src/lib.rs Outdated Show resolved Hide resolved
@Bravo555 Bravo555 temporarily deployed to Test Pull Request September 27, 2023 14:37 — with GitHub Actions Inactive
@Bravo555 Bravo555 force-pushed the services-new-health-topics branch 2 times, most recently from 3862236 to 3fbb470 Compare October 17, 2023 11:49
@Bravo555 Bravo555 temporarily deployed to Test Pull Request October 17, 2023 12:00 — with GitHub Actions Inactive
Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tedge_to_te_converter must stop converting tedge/health/# messages.

See https://github.com/thin-edge/thin-edge.io/pull/2276/files#r1362176975

@Bravo555 Bravo555 temporarily deployed to Test Pull Request October 17, 2023 15:07 — with GitHub Actions Inactive
/// to set the status of).
///
/// https://cumulocity.com/guides/reference/smartrest-two/#104
pub fn service_status_update_message(external_ids: &[String], service_status: &str) -> Message {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid any confusion, as described in the comments above, and to keep this API consistent with the other APIs in this module like service_creation_message, I'd just take the service_id and ancestors as separate parameters, the latter without the target service_id in it. But while passing it to publish_topic_from_ancestors, you'll have to append them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like do do it properly later as part of a follow-up task.

@Bravo555
Copy link
Contributor Author

The tedge_to_te_converter must stop converting tedge/health/# messages.

Should be fixed in 87ac7f8.

@Bravo555 Bravo555 temporarily deployed to Test Pull Request October 17, 2023 16:41 — with GitHub Actions Inactive
@@ -30,9 +30,6 @@ impl TedgetoTeConverter {
}
topic if topic.name.starts_with("tedge/events") => self.convert_event(message),
topic if topic.name.starts_with("tedge/alarms") => self.convert_alarm(message),
topic if topic.name.starts_with("tedge/health") => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this was removed to avoid a certain bug discussed in an earlier comment. But, I'm wondering about the impact of this change on backward compatibility. I was of the impression that these conversions exist to make the transition of third-party component written against the old APIs smoother on our next release where they'll continue publishing to the older topics and we'll internally map them with this tedge_to_te_converter. So, removing this conversion support for tedge/health will break all third-party components publishing to the old health topics.

But, if tedge/health was only an internal API in the earlier releases, meant only to be used by our own components and not by third-party components, then this is okay. I'm just not sure if it was just an internal API. @didier-wenzek could you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true that we could only exclude a white-list of tedge daemons. However I don't think this API has been used by many as it was not documented beyond tedge daemons.

@@ -99,26 +99,6 @@ Check health status of tedge-mapper-c8y service on broker restart

Custom Test Teardown

Check health status of child device service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed and not just adapted to use the new health endpoints? I understand that this test probably was relying on the tedge_to_te_converter earlier and that's why you removed. But, we could have just updated it to use the new te/.../health/status endpoints directly, unless this is duplicate and a test for child device service health status already exists elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have made a mistake with that one. We agreed with @reubenmiller that health status messages should not change entity properties, so it follows that they shouldn't also create entities by themselves either.
However I forgot about auto-registration mechanism, in which it is expected that an entity sending any message will be automatically registered in C8y, so the same goes for health status messages.

I'll try to restore and update the test.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Thank you for this sustained effort.

Follow-up tasks:

  • Introduce a new constructor for HealthMonitorBuilder from tedge-config to avoid repeating the same logic in all tedge daemons.
  • Add a user-configurable service topic id to all tedge daemons, instead of building it after the device id.
  • Consider to deprecate ServiceTopicId, EntityTopicId and Service, replacing their usage with EntityMetadata or intriducing an Entity type (see Move c8y-mapper to new health topics #2276 (comment)).
  • Consider to deprecate this idea of a service type at the level of thin-edge Entity. This should be related to the inventory i.e. data attached by the user to devices and services.

Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but still review my last comment on an existing test that was removed, which might reduce our test coverage. I'm just hoping that it was removed for genuine reasons.

Also, create a list of planned follow-up tasks in a comment, or even better as issues. Group trivial and related ones together and feel free to raise multiple issues, if some are going to be more time consuming than others, and unrelated to other changes.

@Bravo555 Bravo555 temporarily deployed to Test Pull Request October 18, 2023 07:40 — with GitHub Actions Inactive
@Bravo555
Copy link
Contributor Author

Also, create a list of planned follow-up tasks in a comment, or even better as issues. Group trivial and related ones together and feel free to raise multiple issues, if some are going to be more time consuming than others, and unrelated to other changes.

Created a list of follow-up tasks in this issue: #2274

This commit updates tedge-mapper-c8y to subscribe to new health status
topics:

```
tedge/health/SERVICE_NAME
-> te/device/main/service/SERVICE_NAME/status/health

tedge/health/CHILD_ID/SERVICE_NAME
-> te/device/CHILD_ID/service/SERVICE_NAME/status/health
```

Producers of health status topics (i.e. different thin-edge services)
are not in the scope of this PR, however the tests for tedge-agent were
failing because after it terminated, the topic converter could not
convert a last will message to the new health status topic. As such,
only tedge-agent was updated to publish its health status messages to a
new health topic, introducing a new health topic API that will be used
by other services, however for now they still publish to old health
status topics, and these messages are converted by the tedge-agent's
topic converter.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
A new `Service` type was created, containing topic ids of the service
itself and the associated device. Functions which need to take an entity
topic id of a service can use this type to ensure that the given
argument is indeed a service, as well as to obtain the topic id of the
associated device.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
@didier-wenzek
Copy link
Contributor

Created a list of follow-up tasks in this issue: #2274

Thank you

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
@didier-wenzek
Copy link
Contributor

@Bravo555 Bravo555 temporarily deployed to Test Pull Request October 18, 2023 09:55 — with GitHub Actions Inactive
@Bravo555 Bravo555 merged commit f811fad into thin-edge:main Oct 18, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants