Implement a health-check actor for c8y-device-management plugin #1815

Merged
merged 1 commit into from
Mar 17, 2023

@PradeepKiruvale PradeepKiruvale commented Mar 16, 2023

Proposed changes

Code changes for adding a health-check actor.
This actor responds to MQTT health-check requests received on the tedge/health-check/# topics by publishing the status of the service on the tedge/health/<service-name> topic.
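
As an illustration of the request/response flow described above, here is a minimal sketch. The function name health_response, the Status enum, and the JSON payload shape are assumptions made for this example and are not the code in this PR; it also assumes that a request on the bare tedge/health-check topic addresses every service.

```rust
/// Status a service can report about itself (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Up,
    Down,
}

/// Returns the (topic, payload) to publish in response to a health-check request,
/// or `None` when the request is addressed to another service.
pub fn health_response(
    service: &str,
    request_topic: &str,
    status: Status,
) -> Option<(String, String)> {
    const PREFIX: &str = "tedge/health-check";
    // Accept either the bare topic (assumed to mean "all services")
    // or `tedge/health-check/<service-name>`.
    let addressed_to_us = match request_topic.strip_prefix(PREFIX) {
        Some("") => true,
        Some(rest) => rest.strip_prefix('/') == Some(service),
        None => false,
    };
    if !addressed_to_us {
        return None;
    }
    let status = match status {
        Status::Up => "up",
        Status::Down => "down",
    };
    // Assumed payload shape; the real message may carry more fields.
    Some((
        format!("tedge/health/{service}"),
        format!(r#"{{"status":"{status}"}}"#),
    ))
}
```

A request on tedge/health-check/c8y-log-plugin would then yield a response on tedge/health/c8y-log-plugin, while a request addressed to another service yields nothing.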

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#1768

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@PradeepKiruvale PradeepKiruvale temporarily deployed to Test Pull Request March 16, 2023 13:41 — with GitHub Actions Inactive
github-actions bot commented Mar 16, 2023

Robot Results

✅ Passed: 152 | ❌ Failed: 0 | ⏭️ Skipped: 5 | Total: 152 | Pass %: 100

Passed Tests

Name ⏱️ Duration Suite
Define Child device 1 ID 0.006 s C8Y Child Alarms Rpi
Normal case when the child device does not exist on c8y cloud 3.017 s C8Y Child Alarms Rpi
Normal case when the child device already exists 0.875 s C8Y Child Alarms Rpi
Reconciliation when the new alarm message arrives, restart the mapper 1.398 s C8Y Child Alarms Rpi
Reconciliation when the alarm that is cleared 6.116 s C8Y Child Alarms Rpi
Prerequisite Parent 21.387 s Child Conf Mgmt Plugin
Prerequisite Child 0.293 s Child Conf Mgmt Plugin
Child device bootstrapping 16.667 s Child Conf Mgmt Plugin
Snapshot from device 19.381 s Child Conf Mgmt Plugin
Child device config update 17.564 s Child Conf Mgmt Plugin
Configuration types should be detected on file change (without restarting service) 44.406 s Inotify Crate
Child devices support sending simple measurements 58.081 s Child Device Telemetry
Child devices support sending custom measurements 54.385 s Child Device Telemetry
Child devices support sending custom events 56.067 s Child Device Telemetry
Child devices support sending custom events overriding the type 47.162 s Child Device Telemetry
Child devices support sending custom alarms #1699 45.179 s Child Device Telemetry
Child devices support sending inventory data via c8y topic 23.895 s Child Device Telemetry
Main device support sending inventory data via c8y topic 24.763 s Child Device Telemetry
Successful firmware operation 71.476 s Firmware Operation
Install with empty firmware name 61.105 s Firmware Operation
Prerequisite Parent 22.989 s Firmware Operation Child Device
Prerequisite Child 7.903 s Firmware Operation Child Device
Child device firmware update 6.172 s Firmware Operation Child Device
Child device firmware update with cache 6.344 s Firmware Operation Child Device
Retrieve a JWT tokens 53.687 s Jwt Request
Supports restarting the device 90.367 s Restart Device
Update tedge version from previous using Cumulocity 131.954 s Tedge Self Update
Test if all c8y services are up 71.337 s Service Monitoring
Test if all c8y services are down 65.404 s Service Monitoring
Test if all c8y services are using configured service type 59.817 s Service Monitoring
Test if all c8y services using default service type when service type configured as empty 60.192 s Service Monitoring
Check health status of tedge-mapper-c8y service on broker restart 33.396 s Service Monitoring
Check health status of child device service 23.901 s Service Monitoring
Successful shell command with output 3.423 s Shell Operation
Check Successful shell command with literal double quotes output 3.23 s Shell Operation
Execute multiline shell command 3.038 s Shell Operation
Failed shell command 3.539 s Shell Operation
Software list should be populated during startup 58.456 s Software
Install software via Cumulocity 78.253 s Software
Software list should only show currently installed software and not candidates 48.397 s Software
Stop tedge-agent service 0.592 s Log Path Config
Customize the log path 0.337 s Log Path Config
Initialize tedge-agent 0.284 s Log Path Config
Check created folders 0.129 s Log Path Config
Remove created custom folders 0.17 s Log Path Config
Install thin-edge via apt 62.65 s Install Apt
Install latest via script (from current branch) 29.652 s Install Tedge
Install specific version via script (from current branch) 31.517 s Install Tedge
Install latest tedge via script (from main branch) 22.355 s Install Tedge
Support starting and stopping services 49.112 s Service-Control
Supports a reconnect 59.538 s Test-Commands
Supports disconnect then connect 57.176 s Test-Commands
Update unknown setting 43.344 s Test-Commands
Update known setting 33.851 s Test-Commands
Stop c8y-configuration-plugin 0.525 s Health C8Y-Configuration-Plugin
Update the service file 0.275 s Health C8Y-Configuration-Plugin
Reload systemd files 1.073 s Health C8Y-Configuration-Plugin
Start c8y-configuration-plugin 0.299 s Health C8Y-Configuration-Plugin
Start watchdog service 10.311 s Health C8Y-Configuration-Plugin
Check PID of c8y-configuration-plugin 0.173 s Health C8Y-Configuration-Plugin
Kill the PID 0.22 s Health C8Y-Configuration-Plugin
Recheck PID of c8y-configuration-plugin 2.342 s Health C8Y-Configuration-Plugin
Compare PID change 0.002 s Health C8Y-Configuration-Plugin
Stop watchdog service 0.18 s Health C8Y-Configuration-Plugin
Remove entry from service file 0.188 s Health C8Y-Configuration-Plugin
Stop c8y-log-plugin 0.363 s Health C8Y-Log-Plugin
Update the service file 0.26 s Health C8Y-Log-Plugin
Reload systemd files 1.021 s Health C8Y-Log-Plugin
Start c8y-log-plugin 0.294 s Health C8Y-Log-Plugin
Start watchdog service 10.378 s Health C8Y-Log-Plugin
Check PID of c8y-log-plugin 0.271 s Health C8Y-Log-Plugin
Kill the PID 0.208 s Health C8Y-Log-Plugin
Recheck PID of c8y-log-plugin 2.446 s Health C8Y-Log-Plugin
Compare PID change 0.002 s Health C8Y-Log-Plugin
Stop watchdog service 0.147 s Health C8Y-Log-Plugin
Remove entry from service file 0.272 s Health C8Y-Log-Plugin
Stop tedge-mapper 0.23 s Health Tedge Mapper C8Y
Update the service file 0.345 s Health Tedge Mapper C8Y
Reload systemd files 1.089 s Health Tedge Mapper C8Y
Start tedge-mapper 0.457 s Health Tedge Mapper C8Y
Start watchdog service 10.44 s Health Tedge Mapper C8Y
Check PID of tedge-mapper 0.189 s Health Tedge Mapper C8Y
Kill the PID 0.331 s Health Tedge Mapper C8Y
Recheck PID of tedge-mapper 2.441 s Health Tedge Mapper C8Y
Compare PID change 0.01 s Health Tedge Mapper C8Y
Stop watchdog service 0.34 s Health Tedge Mapper C8Y
Remove entry from service file 0.418 s Health Tedge Mapper C8Y
Stop tedge-agent 0.327 s Health Tedge-Agent
Update the service file 0.169 s Health Tedge-Agent
Reload systemd files 0.783 s Health Tedge-Agent
Start tedge-agent 0.176 s Health Tedge-Agent
Start watchdog service 10.338 s Health Tedge-Agent
Check PID of tedge-mapper 0.128 s Health Tedge-Agent
Kill the PID 0.142 s Health Tedge-Agent
Recheck PID of tedge-agent 2.272 s Health Tedge-Agent
Compare PID change 0.001 s Health Tedge-Agent
Stop watchdog service 0.218 s Health Tedge-Agent
Remove entry from service file 0.213 s Health Tedge-Agent
Stop tedge-mapper-az 0.144 s Health Tedge-Mapper-Az
Update the service file 0.112 s Health Tedge-Mapper-Az
Reload systemd files 0.381 s Health Tedge-Mapper-Az
Start tedge-mapper-az 0.158 s Health Tedge-Mapper-Az
Start watchdog service 10.168 s Health Tedge-Mapper-Az
Check PID of tedge-mapper-az 0.132 s Health Tedge-Mapper-Az
Kill the PID 0.213 s Health Tedge-Mapper-Az
Recheck PID of tedge-agent 2.251 s Health Tedge-Mapper-Az
Compare PID change 0.001 s Health Tedge-Mapper-Az
Stop watchdog service 0.164 s Health Tedge-Mapper-Az
Remove entry from service file 0.146 s Health Tedge-Mapper-Az
Stop tedge-mapper-collectd 0.386 s Health Tedge-Mapper-Collectd
Update the service file 0.227 s Health Tedge-Mapper-Collectd
Reload systemd files 0.716 s Health Tedge-Mapper-Collectd
Start tedge-mapper-collectd 0.258 s Health Tedge-Mapper-Collectd
Start watchdog service 10.285 s Health Tedge-Mapper-Collectd
Check PID of tedge-mapper-collectd 0.291 s Health Tedge-Mapper-Collectd
Kill the PID 0.46 s Health Tedge-Mapper-Collectd
Recheck PID of tedge-mapper-collectd 2.458 s Health Tedge-Mapper-Collectd
Compare PID change 0.001 s Health Tedge-Mapper-Collectd
Stop watchdog service 0.385 s Health Tedge-Mapper-Collectd
Remove entry from service file 0.31 s Health Tedge-Mapper-Collectd
c8y-log-plugin health status 6.097 s MQTT health endpoints
c8y-configuration-plugin health status 6.007 s MQTT health endpoints
Wrong package name 0.533 s Improve Tedge Apt Plugin Error Messages
Wrong version 0.287 s Improve Tedge Apt Plugin Error Messages
Wrong type 0.921 s Improve Tedge Apt Plugin Error Messages
tedge_connect_test_positive 0.873 s Tedge Connect Test
tedge_connect_test_negative 3.191 s Tedge Connect Test
tedge_connect_test_sm_services 9.432 s Tedge Connect Test
tedge_disconnect_test_sm_services 1.869 s Tedge Connect Test
Install thin-edge.io 20.453 s Call Tedge
call tedge -V 0.122 s Call Tedge
call tedge -h 0.15 s Call Tedge
call tedge -h -V 0.161 s Call Tedge
call tedge help 0.116 s Call Tedge
tedge config list 0.164 s Call Tedge Config List
tedge config list --all 0.183 s Call Tedge Config List
set/unset device.type 0.728 s Call Tedge Config List
set/unset device.key.path 0.84 s Call Tedge Config List
set/unset device.cert.path 0.945 s Call Tedge Config List
set/unset c8y.root.cert.path 0.648 s Call Tedge Config List
set/unset c8y.smartrest.templates 0.73 s Call Tedge Config List
set/unset az.root.cert.path 0.894 s Call Tedge Config List
set/unset az.mapper.timestamp 1.316 s Call Tedge Config List
set/unset mqtt.bind_address 1.313 s Call Tedge Config List
set/unset mqtt.port 1.344 s Call Tedge Config List
set/unset tmp.path 0.901 s Call Tedge Config List
set/unset logs.path 0.802 s Call Tedge Config List
set/unset run.path 0.594 s Call Tedge Config List
Get Put Delete 7.352 s Http File Transfer Api
Set keys should return value on stdout 0.195 s Tedge Config Get
Unset keys should not return anything on stdout and warnings on stderr 0.399 s Tedge Config Get
Invalid keys should not return anything on stdout and warnings on stderr 0.549 s Tedge Config Get

@didier-wenzek didier-wenzek left a comment

The actor implementation itself is correct but its integration must be fixed.

Comment on lines 13 to 17
anyhow = "1.0.69"
async-trait = "0.1"
futures = { version = "0.3" }
log = "0.4"
mqtt_channel = { path = "../../common/mqtt_channel" }
Contributor

Please remove these two extra dependencies on anyhow and mqtt_channel

Comment on lines 57 to 59

#[error(transparent)]
MqttConnectionError(#[from] mqtt_channel::MqttError),
Contributor

This error variant is too specific for an error type defined at this level.

Suggested change
- #[error(transparent)]
- MqttConnectionError(#[from] mqtt_channel::MqttError),

Contributor Author

Not needed anymore, removed.

@@ -17,8 +17,10 @@ c8y_log_manager = { path = "../../extensions/c8y_log_manager" }
env_logger = "0.10"
log = "0.4"
tedge_actors = { path = "../../core/tedge_actors" }
tedge_api = { path = "../../core/tedge_api" }
Contributor

It would be better to avoid a direct dependency on tedge_api.

A simple option to fix that is to re-export health_status_down_message in tedge_health_ext.

A better option could be to add a set_init_and_last_will method to the health actor builder to update the MQTT config with the init and last-will messages.

Contributor Author

Added a function set_init_and_last_will. I tried re-exporting those health status helper functions from tedge_api, but could not avoid the Cargo.toml entry.

repository = { workspace = true }

[dependencies]
anyhow = "1.0"
Contributor

A library should not depend on anyhow. See anyhow comparison to thiserror.

Suggested change
- anyhow = "1.0"

Contributor Author

fixed
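
To make the point about libraries and anyhow concrete, here is a minimal sketch of a library-style error defined as a concrete enum. HealthError and parse_service are hypothetical names introduced for this example and do not come from the PR. The trait impls are written by hand so the sketch needs only the standard library; thiserror can derive equivalent Display, Error, and From impls.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical concrete error type for a library crate:
/// callers can match on the variants, which an opaque error would not allow.
#[derive(Debug)]
pub enum HealthError {
    InvalidTopic(String),
    Io(std::io::Error),
}

impl fmt::Display for HealthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HealthError::InvalidTopic(topic) => write!(f, "invalid health topic: {topic}"),
            HealthError::Io(err) => write!(f, "I/O error: {err}"),
        }
    }
}

impl Error for HealthError {
    // Expose the underlying cause, if any, to error-reporting code.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            HealthError::Io(err) => Some(err),
            HealthError::InvalidTopic(_) => None,
        }
    }
}

impl From<std::io::Error> for HealthError {
    fn from(err: std::io::Error) -> Self {
        HealthError::Io(err)
    }
}

/// Extracts the service name from a `tedge/health/<service-name>` topic.
pub fn parse_service(topic: &str) -> Result<&str, HealthError> {
    topic
        .strip_prefix("tedge/health/")
        .filter(|name| !name.is_empty())
        .ok_or_else(|| HealthError::InvalidTopic(topic.to_string()))
}
```

An application at the top of the stack can still wrap such errors in anyhow::Error, since the enum implements std::error::Error.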

Comment on lines 47 to 50
- #[derive(Error, Debug, Clone, PartialEq, Eq)]
+ #[derive(Error, Debug)]
Contributor

There is no reason to remove these trait implementations.

Contributor Author

fixed

Contributor

This HealthMonitorActor actor is correctly implemented.

crates/extensions/tedge_health_ext/src/lib.rs
mqtt_config
    .clone()
    .with_session_name(PLUGIN_NAME)
    .with_last_will_message(health_status_down_message(PLUGIN_NAME)),
Contributor

The init message is missing and must be added.

One way to avoid this kind of omission is to add a set_init_and_last_will method to the health actor builder to update the MQTT config with the init and last-will messages.
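
A rough sketch of what such a builder method could look like follows. Message and MqttConfig are hypothetical stand-ins for the real MQTT types, and the payload shape is an assumption; only the names HealthMonitorBuilder and set_init_and_last_will come from this thread.

```rust
/// Hypothetical stand-in for an MQTT message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub topic: String,
    pub payload: String,
}

/// Hypothetical stand-in for the MQTT connection config.
#[derive(Debug, Clone, Default)]
pub struct MqttConfig {
    pub initial_message: Option<Message>,
    pub last_will_message: Option<Message>,
}

pub struct HealthMonitorBuilder {
    service_name: String,
}

impl HealthMonitorBuilder {
    pub fn new(service_name: &str) -> Self {
        Self {
            service_name: service_name.to_string(),
        }
    }

    // Assumed topic and payload shape for a health status message.
    fn status_message(&self, status: &str) -> Message {
        Message {
            topic: format!("tedge/health/{}", self.service_name),
            payload: format!(r#"{{"status":"{status}"}}"#),
        }
    }

    /// Registers both messages in one call, so neither can be forgotten.
    pub fn set_init_and_last_will(&self, mut config: MqttConfig) -> MqttConfig {
        config.initial_message = Some(self.status_message("up"));
        config.last_will_message = Some(self.status_message("down"));
        config
    }
}
```

Setting both messages from a single method keeps the "up" announcement and the "down" last will on the same topic by construction.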

Contributor Author

Fixed

@PradeepKiruvale PradeepKiruvale temporarily deployed to Test Pull Request March 16, 2023 17:18 — with GitHub Actions Inactive
@@ -68,6 +69,11 @@ async fn main() -> anyhow::Result<()> {
log_actor.with_c8y_http_proxy(&mut c8y_http_proxy_actor)?;
log_actor.with_mqtt_connection(&mut mqtt_actor)?;

//Instantiate health monitor actor
let health_actor = HealthMonitorBuilder::new(PLUGIN_NAME);
mqtt_actor.mqtt_config = health_actor.set_init_and_last_will(mqtt_actor.mqtt_config);
Contributor

Although there's nothing wrong with this approach, I wish these init and last-will messages were also exchanged with the MQTT actor via the Config object of the ServiceConsumer, similar to how the TopicFilters are exchanged. But since the init and last-will messages can only be registered once with the MQTT actor, this is probably fine.

Contributor

Good point. It would indeed be nice to have this done when the health actor is connected to the MQTT actor. Regarding the issue that at most one last-will message can be set, one can simply raise a LinkError if two actors attempt to set a last will. This can be done as a follow-up task.
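
The follow-up idea could be sketched roughly as below. The LinkError enum and MqttActorBuilder shown here are hypothetical stand-ins written for this example (the real tedge_actors LinkError has its own variants); the point is only that a second attempt to register a last will is rejected instead of silently overwriting the first.

```rust
/// Hypothetical error raised when two actors compete for the single last will.
#[derive(Debug, PartialEq, Eq)]
pub enum LinkError {
    LastWillAlreadySet { by: String },
}

/// Hypothetical stand-in for the MQTT actor builder.
#[derive(Default)]
pub struct MqttActorBuilder {
    // (owner, payload) of the registered last will, if any.
    last_will: Option<(String, String)>,
}

impl MqttActorBuilder {
    /// Registers the last will on behalf of `owner`.
    /// Fails if another actor has already registered one.
    pub fn set_last_will(&mut self, owner: &str, payload: &str) -> Result<(), LinkError> {
        if let Some((existing_owner, _)) = &self.last_will {
            return Err(LinkError::LastWillAlreadySet {
                by: existing_owner.clone(),
            });
        }
        self.last_will = Some((owner.to_string(), payload.to_string()));
        Ok(())
    }
}
```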

@@ -32,6 +32,9 @@ pub enum RuntimeError {

#[error(transparent)]
LinkError(#[from] LinkError),

#[error(transparent)]
HealthMonitorError(#[from] anyhow::Error),
Contributor

Why are we adding an actor-specific error to the core actor crate? If a health-check related error really needs to be a first-class citizen in the actor crate (I doubt that's the case), then it is better to define it as a concrete type rather than as a wrapper over anyhow::Error.

Contributor Author

removed

@@ -10,6 +10,7 @@ homepage = { workspace = true }
repository = { workspace = true }

[dependencies]
anyhow = "1.0.69"
Contributor

That looks wrong. We've been trying to remove this dependency so far.

Contributor Author

This was removed; I forgot to push the changes.

Comment on lines 19 to 20
pub use health::health_status_down_message;
pub use health::health_status_up_message;
Contributor

It really feels like these 2 functions better belong in the tedge_health_ext crate itself as every plugin wanting the health-check functionality is expected to depend on this crate. But, since all the older plugins are also relying on the same, we can consider moving these in future.

Contributor

These pub use re-exports are useless here.

use tedge_api::health_status_up_message;

pub struct HealthMonitorActor {
daemon_to_be_monitored: String,
Contributor

Suggested change
- daemon_to_be_monitored: String,
+ daemon_name: String,

Optional

@didier-wenzek didier-wenzek left a comment

Happy to approve once the useless change in tedge_api is reverted.

Comment on lines 8 to 9
use tedge_api::health_status_down_message;
use tedge_api::health_status_up_message;
Contributor

Note: these two use declarations could be pub.

This is no longer required with HealthMonitorBuilder::set_init_and_last_will().

Contributor Author

done

Signed-off-by: Pradeep Kumar K J <pradeepkumar.kj@softwareag.com>
@didier-wenzek didier-wenzek left a comment

Approved. The tests can be improved though.

dbg!(message.payload_str().unwrap());
assert!(message.payload_str()?.contains("up"))
Contributor

This check could be a bit more strict. What about the topic for instance?

  • [minor] please remove the dbg!
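
One possible shape for a stricter check is sketched below. Message and check_health_up are hypothetical names for this example, and the expected payload fragment is an assumption about the message format. The sketch verifies the topic as well as the status field, so a payload that merely contains the substring "up" no longer passes.

```rust
/// Hypothetical stand-in for a received MQTT message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub topic: String,
    pub payload: String,
}

/// Checks both the topic and the status field of a health message.
pub fn check_health_up(message: &Message, service: &str) -> Result<(), String> {
    let expected_topic = format!("tedge/health/{service}");
    if message.topic != expected_topic {
        return Err(format!("unexpected topic: {}", message.topic));
    }
    // Assumed payload fragment; stricter than `contains("up")`,
    // which would also accept e.g. "startup".
    if !message.payload.contains(r#""status":"up""#) {
        return Err(format!("unexpected payload: {}", message.payload));
    }
    Ok(())
}
```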

Comment on lines 47 to 46
let runtime_events_logger = None;
let mut runtime = Runtime::try_new(runtime_events_logger).await?;
Contributor

It's okay. However, technically, you don't need a runtime to test the actor.
Doing without one would even be better because here you do not control when the runtime is finalized.

Suggested change
- let runtime_events_logger = None;
- let mut runtime = Runtime::try_new(runtime_events_logger).await?;

Contributor Author

Refactored; it's not required anymore.

let health_actor = HealthMonitorBuilder::new(service_to_be_monitored);

let health_actor = health_actor.with_connection(&mut health_mqtt_builder);
runtime.spawn(health_actor).await?;
Contributor

This is okay. However, using a runtime is not required.

Suggested change
- runtime.spawn(health_actor).await?;
+ let (actor, message_box) = health_mqtt_builder.build();
+ tokio::spawn(actor.run(message_box));

Contributor Author

Updated the code as suggested above.
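
The idea of driving an actor directly, without a runtime, can be sketched in a dependency-free way. In this example std threads and channels stand in for tokio tasks and the actor's message box, and HealthActor and check_actor_once are hypothetical names; it is not the test code from this PR. The test owns both ends of the channels, so it decides exactly when the actor stops.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical actor that answers every health-check request with "up".
pub struct HealthActor {
    service: String,
}

impl HealthActor {
    /// Runs until the request channel is closed by the test.
    pub fn run(self, requests: Receiver<String>, responses: Sender<(String, String)>) {
        for _request in requests {
            let topic = format!("tedge/health/{}", self.service);
            if responses.send((topic, "up".to_string())).is_err() {
                break; // Nobody is listening anymore.
            }
        }
    }
}

/// Spawns the actor directly, sends one request, and returns its response.
pub fn check_actor_once(service: &str) -> (String, String) {
    let (request_tx, request_rx) = channel();
    let (response_tx, response_rx) = channel();
    let actor = HealthActor {
        service: service.to_string(),
    };
    // No runtime involved: the actor loop runs on a plain spawned thread.
    let handle = thread::spawn(move || actor.run(request_rx, response_tx));

    request_tx.send("tedge/health-check".to_string()).unwrap();
    let response = response_rx.recv().unwrap();

    // Closing the request channel ends the actor loop deterministically.
    drop(request_tx);
    handle.join().unwrap();
    response
}
```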

@PradeepKiruvale PradeepKiruvale force-pushed the health-actor branch 2 times, most recently from 7bc2a74 to 15fefe9 Compare March 17, 2023 14:15
@PradeepKiruvale PradeepKiruvale temporarily deployed to Test Pull Request March 17, 2023 14:23 — with GitHub Actions Inactive
@PradeepKiruvale PradeepKiruvale temporarily deployed to Test Pull Request March 17, 2023 14:51 — with GitHub Actions Inactive
@PradeepKiruvale PradeepKiruvale merged commit d0f248d into thin-edge:main Mar 17, 2023
didier-wenzek pushed a commit to didier-wenzek/thin-edge.io that referenced this pull request Mar 21, 2023
Signed-off-by: Pradeep Kumar K J <pradeepkumar.kj@softwareag.com>