Fix the flaky alarms tests #1765

Merged: 1 commit, Mar 2, 2023

Conversation

PradeepKiruvale
Contributor

Proposed changes

This PR fixes the flaky alarm tests:

  • Run the tests serially
  • Fix issues in the way the results are validated

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Contributor

@didier-wenzek left a comment

Seems good. I have some questions though.

@@ -1274,6 +1244,7 @@ fn extract_child_id(in_topic: &str, expected_child_id: Option<String>) {
}

#[tokio::test]
#[serial]
Contributor

Why do we need to have all these tests serial?

They behave correctly even without serial when both c8y_mapper_alarm_empty_payload and c8y_mapper_child_alarm_empty_payload are ignored.

Contributor Author

Yes, that's right. However, I observed that the c8y-converter tests need the broker, so to be on the safe side I made them run serially.
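
As a rough illustration of what running tests serially buys when they share one resource, here is a minimal sketch using only the standard library. It is not how the `#[serial]` attribute is implemented; `BROKER_LOCK` and `exclusive_broker_access` are hypothetical names, and the lock merely stands in for "only one test talks to the broker at a time".

```rust
use std::sync::{Mutex, MutexGuard};

/// Process-wide lock standing in for exclusive use of the shared broker.
static BROKER_LOCK: Mutex<()> = Mutex::new(());

/// Each test would call this first and keep the guard until it finishes,
/// so two such tests never overlap.
fn exclusive_broker_access() -> MutexGuard<'static, ()> {
    // A test that panicked while holding the lock poisons it. The lock
    // protects no data, so recover the guard and carry on.
    BROKER_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

A test body would start with `let _guard = exclusive_broker_access();` and release the lock implicitly when the guard goes out of scope.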

Comment on lines -518 to +512
"tedge/alarms/major/temperature_alarm",
"tedge/alarms/major/temperature_alarm/external_sensor",
Contributor

Why is this instance of tedge/alarms/major/temperature_alarm replaced by tedge/alarms/major/temperature_alarm/external_sensor, while the others (above) are replaced by tedge/alarms/major/custom_temperature_alarm?

Contributor Author

Because there was a typo in the topic: the alarm that was sent on tedge/alarms/major/temperature_alarm/external_sensor had to be cleared by sending an empty message on the same topic.
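
To make the topic mismatch concrete, here is a minimal sketch of the retained-message rule involved, assuming the alarms are published as retained messages. `RetainedStore` is a hypothetical toy stand-in for a broker, not a real MQTT client or the project's code: an empty retained payload only clears what is stored under exactly the same topic.

```rust
use std::collections::HashMap;

/// Toy stand-in for a broker's retained-message store.
#[derive(Default)]
struct RetainedStore {
    retained: HashMap<String, String>,
}

impl RetainedStore {
    /// A retained publish is stored under its exact topic; an empty
    /// payload removes whatever was retained on that topic.
    fn publish_retained(&mut self, topic: &str, payload: &str) {
        if payload.is_empty() {
            self.retained.remove(topic);
        } else {
            self.retained.insert(topic.to_string(), payload.to_string());
        }
    }

    /// True if a message is still retained on this exact topic.
    fn is_retained(&self, topic: &str) -> bool {
        self.retained.contains_key(topic)
    }
}
```

With this model, an empty message sent to tedge/alarms/major/temperature_alarm leaves an alarm raised on tedge/alarms/major/temperature_alarm/external_sensor untouched, which is the effect the typo had.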

Signed-off-by: Pradeep Kumar K J <pradeepkumar.kj@softwareag.com>
while let Ok(Some(msg)) = messages.next().with_timeout(TEST_TIMEOUT_MS).await {
assert_json_include!(actual:serde_json::from_str::<serde_json::Value>(&msg).unwrap(), expected:expected_msg);
}
let expected_msg = r#"{"severity":"MAJOR","type":"custom_temperature_alarm","time":"2023-01-25T18:41:14.776170774Z","text":"Temperature high","customFragment":{"nested":{"value":"extra info"}}}"#;
Contributor

I'm assuming that you converted this to a raw string in order to use the assert_received_all_expected function, which expects a string. You could also have used the to_string() function on the earlier json::Value, right? That way you get a string without affecting code readability. It doesn't matter in this specific case, as the JSON was not formatted properly in the first place.

Contributor Author

Yes, that's right. I felt that converting from JSON to a string is not necessary if a raw string is used directly.

Contributor

Using json! is just for better readability, especially with multi-level JSON structs. It's more of a nice-to-have than a must.


// Expect converted temperature alarm message
mqtt_tests::assert_received_all_expected(&mut messages, TEST_TIMEOUT_MS, &[expected_msg]).await;
Contributor

This is still problematic, as an assertion failure will prevent the cleanup logic below (clearing alarms) from executing. Not just this test but all alarm tests relying on this cleanup behaviour suffer from the same problem, as you discovered yesterday.
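
One common way to make cleanup survive a failed assertion is a guard whose `Drop` implementation does the cleanup. The sketch below is illustrative only and is not what this PR does: `AlarmCleanup` is a hypothetical type, and an atomic flag stands in for actually clearing the alarm. It assumes the default unwinding panic strategy; in an async test the `drop` body cannot await, so a real guard would need a synchronous way to publish.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical guard: its cleanup runs when it is dropped, including
/// during the unwinding triggered by a failed assertion.
struct AlarmCleanup {
    cleared: Arc<AtomicBool>,
}

impl Drop for AlarmCleanup {
    fn drop(&mut self) {
        // A real test would clear the alarm here, for example by
        // publishing an empty retained message on the alarm topic.
        self.cleared.store(true, Ordering::SeqCst);
    }
}

/// Runs a body whose assertion fails and reports whether the cleanup
/// still executed.
fn cleanup_runs_despite_failure() -> bool {
    let cleared = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cleared);
    let outcome = std::panic::catch_unwind(move || {
        let _guard = AlarmCleanup { cleared: flag };
        assert_eq!(1, 2, "simulated assertion failure");
    });
    // The body panicked, yet the guard's drop has already run.
    outcome.is_err() && cleared.load(Ordering::SeqCst)
}
```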

Contributor Author

What I observed is that these tests run serially, and when the next test starts it creates a new broker instance, which starts clean with no persisted messages.

Contributor

So, we don't even need that "alarm cleanup" step in all these tests, huh? Fine then. You can remove those cleanup steps if they're useless anyway. But it's up to you.

@didier-wenzek
Contributor

This PR has to be rebased to get the fix for test report generation.

Member

@rina23q left a comment

I only have a question.

r#"{ "text": "Temperature high","time":"2023-01-25T18:41:14.776170774Z","customFragment": {"nested":{"value": "extra info"}} }"#,
mqtt_channel::QoS::AtLeastOnce,
true,
)
.await
.unwrap();

let expected_msg = json!({"severity":"MAJOR","type":"temperature_alarm","time":"2023-01-25T18:41:14.776170774Z","text":"Temperature high","customFragment":{"nested":{"value":"extra info"}}});

while let Ok(Some(msg)) = messages.next().with_timeout(TEST_TIMEOUT_MS).await {
Member

So the sporadic test failure is fixed by replacing this block with mqtt_tests::assert_received_all_expected? It looks like all the other tests have this replacement.

I am just curious, because places other than the alarm tests also use this while let block for testing.

Contributor Author

Yup, using mqtt_tests::assert_received_all_expected fixed the issue. Previously I was asserting on the JSON objects and relying on a timeout error when no message was received. But the with_timeout call does not return an error, and because of this the tests were always passing.
I did check at least two other places where while let is used, and they looked okay.
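
The pitfall is easy to reproduce outside the project. The sketch below uses a standard-library channel in place of the MQTT message stream; `check_inside_loop` and `check_all_received` are hypothetical helpers written for this illustration and are not the implementation of mqtt_tests::assert_received_all_expected. When assertions live only inside a `while let` loop, a timeout just ends the loop, so a test that receives nothing still passes.

```rust
use std::sync::mpsc::Receiver;
use std::time::Duration;

/// Flawed pattern: if no message arrives before the timeout, the loop
/// body never runs and nothing is checked.
fn check_inside_loop(rx: &Receiver<String>, timeout: Duration, expected: &str) {
    while let Ok(msg) = rx.recv_timeout(timeout) {
        assert_eq!(msg, expected);
    }
    // Reaching this point does not prove that any message was received.
}

/// Stricter pattern: collect what arrives, then require every expected
/// message to be present. A missing message is reported as an error.
fn check_all_received(
    rx: &Receiver<String>,
    timeout: Duration,
    expected: &[&str],
) -> Result<(), String> {
    let mut received: Vec<String> = Vec::new();
    while received.len() < expected.len() {
        match rx.recv_timeout(timeout) {
            Ok(msg) => received.push(msg),
            Err(_) => break, // timed out or the sender disconnected
        }
    }
    for want in expected {
        if !received.iter().any(|got| got.as_str() == *want) {
            return Err(format!("missing expected message: {}", want));
        }
    }
    Ok(())
}
```

On a channel where nothing is ever sent, `check_inside_loop` returns normally, while `check_all_received` returns an `Err` naming the missing message.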

@github-actions
Contributor

Robot Results

✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass %
137 | 0 | 5 | 137 | 100

Passed Tests

Name ⏱️ Duration Suite
Define Child device 1 ID 0.005 s C8Y Child Alarms Rpi
Normal case when the child device does not exist on c8y cloud 1.655 s C8Y Child Alarms Rpi
Normal case when the child device already exists 0.73 s C8Y Child Alarms Rpi
Reconciliation when the new alarm message arrives, restart the mapper 1.882 s C8Y Child Alarms Rpi
Reconciliation when the alarm that is cleared 5.484 s C8Y Child Alarms Rpi
Prerequisite Parent 18.434 s Child Conf Mgmt Plugin
Prerequisite Child 0.369 s Child Conf Mgmt Plugin
Child device bootstrapping 13.758 s Child Conf Mgmt Plugin
Snapshot from device 23.288 s Child Conf Mgmt Plugin
Child device config update 16.303 s Child Conf Mgmt Plugin
Configuration types should be detected on file change (without restarting service) 45.992 s Inotify Crate
Child devices support sending simple measurements 45.258 s Child Device Telemetry
Child devices support sending custom measurements 47.257 s Child Device Telemetry
Child devices support sending custom events 41.641 s Child Device Telemetry
Child devices support sending custom events overriding the type 34.64 s Child Device Telemetry
Child devices support sending custom alarms #1699 32.949 s Child Device Telemetry
Child devices support sending inventory data via c8y topic 23.435 s Child Device Telemetry
Main device support sending inventory data via c8y topic 21.586 s Child Device Telemetry
Successful firmware operation 59.382 s Firmware Operation
Install with empty firmware name 53.464 s Firmware Operation
Supports restarting the device 73.997 s Restart Device
Update tedge version from previous using Cumulocity 105.273 s Tedge Self Update
Successful shell command with output 3.755 s Shell Operation
Check Successful shell command with literal double quotes output 3.171 s Shell Operation
Execute multiline shell command 3.006 s Shell Operation
Failed shell command 3.018 s Shell Operation
Software list should be populated during startup 51.834 s Software
Install software via Cumulocity 66.688 s Software
Software list should only show currently installed software and not candidates 42.299 s Software
Stop tedge-agent service 0.254 s Log Path Config
Customize the log path 0.109 s Log Path Config
Initialize tedge-agent 0.15 s Log Path Config
Check created folders 0.107 s Log Path Config
Remove created custom folders 0.098 s Log Path Config
Install latest via script (from current branch) 28.404 s Install Tedge
Install specific version via script (from current branch) 15.576 s Install Tedge
Install latest tedge via script (from main branch) 24.422 s Install Tedge
Support starting and stopping services 37.792 s Service-Control
Supports a reconnect 47.017 s Test-Commands
Supports disconnect then connect 47.801 s Test-Commands
Update unknown setting 27.375 s Test-Commands
Update known setting 23.384 s Test-Commands
Stop c8y-configuration-plugin 0.115 s Health C8Y-Configuration-Plugin
Update the service file 0.128 s Health C8Y-Configuration-Plugin
Reload systemd files 0.715 s Health C8Y-Configuration-Plugin
Start c8y-configuration-plugin 0.132 s Health C8Y-Configuration-Plugin
Start watchdog service 10.389 s Health C8Y-Configuration-Plugin
Check PID of c8y-configuration-plugin 0.105 s Health C8Y-Configuration-Plugin
Kill the PID 0.151 s Health C8Y-Configuration-Plugin
Recheck PID of c8y-configuration-plugin 2.349 s Health C8Y-Configuration-Plugin
Compare PID change 0.001 s Health C8Y-Configuration-Plugin
Stop watchdog service 0.274 s Health C8Y-Configuration-Plugin
Remove entry from service file 0.178 s Health C8Y-Configuration-Plugin
Stop c8y-log-plugin 0.195 s Health C8Y-Log-Plugin
Update the service file 0.243 s Health C8Y-Log-Plugin
Reload systemd files 0.899 s Health C8Y-Log-Plugin
Start c8y-log-plugin 0.311 s Health C8Y-Log-Plugin
Start watchdog service 10.364 s Health C8Y-Log-Plugin
Check PID of c8y-log-plugin 0.117 s Health C8Y-Log-Plugin
Kill the PID 0.084 s Health C8Y-Log-Plugin
Recheck PID of c8y-log-plugin 2.155 s Health C8Y-Log-Plugin
Compare PID change 0.001 s Health C8Y-Log-Plugin
Stop watchdog service 0.122 s Health C8Y-Log-Plugin
Remove entry from service file 0.118 s Health C8Y-Log-Plugin
Stop tedge-mapper 0.224 s Health Tedge Mapper C8Y
Update the service file 0.245 s Health Tedge Mapper C8Y
Reload systemd files 0.894 s Health Tedge Mapper C8Y
Start tedge-mapper 0.249 s Health Tedge Mapper C8Y
Start watchdog service 10.35 s Health Tedge Mapper C8Y
Check PID of tedge-mapper 0.098 s Health Tedge Mapper C8Y
Kill the PID 0.093 s Health Tedge Mapper C8Y
Recheck PID of tedge-mapper 2.167 s Health Tedge Mapper C8Y
Compare PID change 0.002 s Health Tedge Mapper C8Y
Stop watchdog service 0.072 s Health Tedge Mapper C8Y
Remove entry from service file 0.058 s Health Tedge Mapper C8Y
Stop tedge-agent 0.22 s Health Tedge-Agent
Update the service file 0.115 s Health Tedge-Agent
Reload systemd files 0.688 s Health Tedge-Agent
Start tedge-agent 0.142 s Health Tedge-Agent
Start watchdog service 10.363 s Health Tedge-Agent
Check PID of tedge-mapper 0.046 s Health Tedge-Agent
Kill the PID 0.066 s Health Tedge-Agent
Recheck PID of tedge-agent 2.259 s Health Tedge-Agent
Compare PID change 0.002 s Health Tedge-Agent
Stop watchdog service 0.276 s Health Tedge-Agent
Remove entry from service file 0.127 s Health Tedge-Agent
Stop tedge-mapper-az 0.19 s Health Tedge-Mapper-Az
Update the service file 0.223 s Health Tedge-Mapper-Az
Reload systemd files 0.716 s Health Tedge-Mapper-Az
Start tedge-mapper-az 0.158 s Health Tedge-Mapper-Az
Start watchdog service 10.233 s Health Tedge-Mapper-Az
Check PID of tedge-mapper-az 0.064 s Health Tedge-Mapper-Az
Kill the PID 0.158 s Health Tedge-Mapper-Az
Recheck PID of tedge-agent 2.209 s Health Tedge-Mapper-Az
Compare PID change 0.001 s Health Tedge-Mapper-Az
Stop watchdog service 0.212 s Health Tedge-Mapper-Az
Remove entry from service file 0.189 s Health Tedge-Mapper-Az
Stop tedge-mapper-collectd 0.179 s Health Tedge-Mapper-Collectd
Update the service file 0.177 s Health Tedge-Mapper-Collectd
Reload systemd files 0.766 s Health Tedge-Mapper-Collectd
Start tedge-mapper-collectd 0.175 s Health Tedge-Mapper-Collectd
Start watchdog service 10.236 s Health Tedge-Mapper-Collectd
Check PID of tedge-mapper-collectd 0.105 s Health Tedge-Mapper-Collectd
Kill the PID 0.164 s Health Tedge-Mapper-Collectd
Recheck PID of tedge-mapper-collectd 2.431 s Health Tedge-Mapper-Collectd
Compare PID change 0.001 s Health Tedge-Mapper-Collectd
Stop watchdog service 0.173 s Health Tedge-Mapper-Collectd
Remove entry from service file 0.164 s Health Tedge-Mapper-Collectd
c8y-log-plugin health status 5.633 s MQTT health endpoints
c8y-configuration-plugin health status 5.606 s MQTT health endpoints
Wrong package name 0.173 s Improve Tedge Apt Plugin Error Messages
Wrong version 0.128 s Improve Tedge Apt Plugin Error Messages
Wrong type 0.317 s Improve Tedge Apt Plugin Error Messages
tedge_connect_test_positive 0.669 s Tedge Connect Test
tedge_connect_test_negative 1.268 s Tedge Connect Test
tedge_connect_test_sm_services 6.633 s Tedge Connect Test
tedge_disconnect_test_sm_services 0.395 s Tedge Connect Test
Install thin-edge.io 11.539 s Call Tedge
call tedge -V 0.149 s Call Tedge
call tedge -h 0.145 s Call Tedge
call tedge -h -V 0.154 s Call Tedge
call tedge help 0.091 s Call Tedge
tedge config list 0.149 s Call Tedge Config List
tedge config list --all 0.12 s Call Tedge Config List
set/unset device.type 0.817 s Call Tedge Config List
set/unset device.key.path 0.587 s Call Tedge Config List
set/unset device.cert.path 0.502 s Call Tedge Config List
set/unset c8y.root.cert.path 0.419 s Call Tedge Config List
set/unset c8y.smartrest.templates 0.367 s Call Tedge Config List
set/unset az.root.cert.path 0.421 s Call Tedge Config List
set/unset az.mapper.timestamp 0.742 s Call Tedge Config List
set/unset mqtt.bind_address 0.522 s Call Tedge Config List
set/unset mqtt.port 0.588 s Call Tedge Config List
set/unset tmp.path 0.335 s Call Tedge Config List
set/unset logs.path 0.245 s Call Tedge Config List
set/unset run.path 0.245 s Call Tedge Config List
Get Put Delete 4.006 s Http File Transfer Api


@PradeepKiruvale PradeepKiruvale merged commit f95be3a into thin-edge:main Mar 2, 2023