Support for software list & update on child devices #2307

@didier-wenzek didier-wenzek commented Sep 29, 2023

Proposed changes

Implement software_list support as described by #2235

  • Use the te MQTT topic scheme
  • Use commands instead of requests / responses for software list
  • Update the agent to publish software-list capability on start
  • Update the agent to respond to software-list requests
  • Update the c8y mapper to forward software list request to child devices when their agent starts.
  • Update the c8y mapper to request a software list after a software-update operation (successful or not)

Implement software_update support as described by #2235

  • Use commands instead of requests / responses for software update
  • Update the agent to publish software-update capability on start
  • Update the agent to respond to software-update requests
  • Update the c8y mapper to forward software update requests
  • Publish the list of supported software types along with the capability message. See #1352 (Use Softwaretype according to new Advanced Software Management in 10.14)
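
To make the capability and command flow above concrete, here is a minimal sketch of how the topics could be derived for a child device. The topic layout (`<root>/<entity topic id>/cmd/<command type>[/<command id>]`), the helper names, and the `{"types": [...]}` payload shape are assumptions made for illustration; they are not taken from this PR's code.

```rust
/// Topic root, as configured with `tedge config set mqtt.topic_root te`.
const TOPIC_ROOT: &str = "te";

/// Topic on which an agent advertises that it supports a command type.
/// Assumed layout: <root>/<entity topic id>/cmd/<command type>
pub fn capability_topic(entity_topic_id: &str, cmd_type: &str) -> String {
    format!("{}/{}/cmd/{}", TOPIC_ROOT, entity_topic_id, cmd_type)
}

/// Topic of one command instance: the capability topic plus a command id.
pub fn command_topic(entity_topic_id: &str, cmd_type: &str, cmd_id: &str) -> String {
    format!("{}/{}", capability_topic(entity_topic_id, cmd_type), cmd_id)
}

/// Capability payload listing the supported software types.
/// The JSON shape is an assumption, built by hand to avoid extra crates.
pub fn capability_payload(software_types: &[&str]) -> String {
    let quoted: Vec<String> = software_types
        .iter()
        .map(|t| format!("\"{}\"", t))
        .collect();
    format!("{{\"types\":[{}]}}", quoted.join(","))
}
```

With these helpers, a child device registered as `device/child01//` would advertise software-list support on `capability_topic("device/child01//", "software_list")` and receive individual requests on the matching `command_topic`.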

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

crates/core/tedge_api/src/messages.rs (outdated, resolved)
crates/core/tedge_api/src/messages.rs (outdated, resolved)
codecov bot commented Sep 29, 2023

Codecov Report

Merging #2307 (352388d) into main (af49e4d) will decrease coverage by 0.4%.
The diff coverage is 79.3%.

Additional details and impacted files
| Files | Coverage Δ |
|---|---|
| crates/core/c8y_api/src/json_c8y.rs | 87.0% <100.0%> (+0.2%) ⬆️ |
| crates/core/c8y_api/src/smartrest/error.rs | 0.0% <ø> (ø) |
| ...re/c8y_api/src/smartrest/smartrest_deserializer.rs | 93.3% <100.0%> (-0.2%) ⬇️ |
| ...core/c8y_api/src/smartrest/smartrest_serializer.rs | 83.7% <ø> (-5.2%) ⬇️ |
| crates/core/c8y_api/src/smartrest/topic.rs | 92.1% <100.0%> (+4.1%) ⬆️ |
| ...s/core/tedge_agent/src/software_manager/builder.rs | 75.0% <100.0%> (ø) |
| ...dge_agent/src/tedge_operation_converter/builder.rs | 91.3% <100.0%> (-0.7%) ⬇️ |
| crates/core/tedge_api/src/lib.rs | 100.0% <100.0%> (ø) |
| crates/core/tedge_api/src/mqtt_topics.rs | 87.6% <100.0%> (-0.7%) ⬇️ |
| crates/core/tedge_api/src/topic.rs | 100.0% <ø> (+2.8%) ⬆️ |
... and 15 more

... and 5 files with indirect coverage changes

github-actions bot commented Sep 29, 2023

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
|---|---|---|---|---|---|
| 345 | 0 | 3 | 345 | 100 | 1h1m46.986999999s |

@albinsuresh albinsuresh left a comment

LGTM. Happy to approve after a test run once the merge conflicts are resolved.

let smartrest_set_operation_status =
SmartRestSetOperationToExecuting::from_thin_edge_json(response)?.to_smartrest()?;
Ok(vec![Message::new(&topic, smartrest_set_operation_status)])
async fn register_software_list_operation(
Contributor

Should we use this PR as an opportunity to refactor out all the software management related functions from this converter.rs module to a dedicated software_operation.rs, like log_upload.rs or config_operation.rs, so that this module is leaner?

Contributor Author

Indeed, it would be good to have operation-specific modules or even actor extensions. However, I would prefer to address that later, once both software list and software update are fully ported to support child devices.

crates/extensions/c8y_mapper_ext/src/tests.rs (resolved)
@albinsuresh albinsuresh left a comment

Happy to approve once the system test failures are sorted.

@didier-wenzek
Contributor Author

The failing system test is related to self-update: thin-edge used to update thin-edge on the device.

The issue is that this test doesn't update tedge-agent but only tedge-mapper (and other tedge executables unrelated to the issue). The new tedge-mapper, however, is not compatible with the old tedge-agent in how a software list is requested. Hence, the software list is not updated on c8y, and the test fails.

@reubenmiller
Contributor

The failing system test is related to self-update: thin-edge used to update thin-edge on the device.

The issue is that this test doesn't update tedge-agent but only tedge-mapper (and other tedge executables unrelated to the issue). The new tedge-mapper, however, is not compatible with the old tedge-agent in how a software list is requested. Hence, the software list is not updated on c8y, and the test fails.

So this is the first problem that has popped up due to the fact that the tedge-agent service is not restarted during the installation process, so the old instance continues running after the new version has been installed. We will have to solve this in the workflow, otherwise it will lead to very unexpected results for users.

@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from 86d799f to d87ed9d Compare October 18, 2023 09:21
@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from d87ed9d to 4f0655e Compare October 18, 2023 12:33
@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch 2 times, most recently from 21d3e01 to 8372b7d Compare October 18, 2023 16:30
@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from 8372b7d to 2062b6c Compare October 18, 2023 17:03
@albinsuresh albinsuresh left a comment

The core changes look good. Just some refactoring suggestions.

crates/core/tedge_agent/src/software_manager/actor.rs (outdated, resolved)
Execute Command sudo tedge config set mqtt.topic_root te
Execute Command sudo tedge config set mqtt.device_topic_id "device/${CHILD_SN}//"

# Install plugin after the default settings have been updated to prevent it from starting up as the main plugin
Contributor

That's a huge UX issue that we'll have to solve somehow. Most customers wouldn't do it this way. They'd install everything first and then try to configure things, right? Maybe we should consider not starting tedge-agent by default, unless it was already up and running (during updates). It's probably not that easy considering self-update and other update paths.

Contributor

Yes we still have some conceptual work to do here as a follow up, however I wouldn't look at the exact solution until we consolidate the number of services/processes.

Contributor

However, this is easier to change when running thin-edge.io in a container, as you can set an environment variable to affect the behaviour of each component. systemd does not support inheriting environment variables.

Payload: Default,
{
/// Build a new command with a random id
pub fn new(target: &EntityTopicId) -> Self {
Contributor

Suggested change
- pub fn new(target: &EntityTopicId) -> Self {
+ pub fn new(schema: MqttSchema, target: &EntityTopicId) -> Self {

Seeing the topic(), capability_message(), command_message() and clearing_message() all taking the schema separately, it feels like the schema should have been a part of the constructor. It's not something that's gonna change dynamically between these function calls, right?

Contributor Author

Having the schema attached to a message looks upside down.

I would prefer to address this in a more generic manner as proposed here #2357
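
The trade-off discussed in this thread can be sketched as follows. All types here are simplified, hypothetical stand-ins (the real `MqttSchema`, `EntityTopicId` and command types carry more than this, and the real constructor generates a random id), and the topic layout is an assumption. The sketch shows the shape the author argues for: the command stays independent of MQTT, and the schema is only supplied at the point where a topic is needed.

```rust
/// Hypothetical stand-in for the MQTT schema: it only carries the topic root.
pub struct MqttSchema {
    pub root: String,
}

/// Hypothetical stand-in for an entity topic id such as "device/child01//".
pub struct EntityTopicId(pub String);

/// A command addressed to an entity. It holds no MQTT-specific state.
pub struct Command {
    pub target: EntityTopicId,
    pub cmd_type: String,
    pub cmd_id: String,
}

impl Command {
    /// Unlike the constructor under review, this sketch takes an explicit id
    /// instead of generating a random one, to keep the example deterministic.
    pub fn new(target: EntityTopicId, cmd_type: &str, cmd_id: &str) -> Self {
        Command {
            target,
            cmd_type: cmd_type.to_string(),
            cmd_id: cmd_id.to_string(),
        }
    }

    /// The schema is passed in only when the command is mapped to a topic.
    pub fn topic(&self, schema: &MqttSchema) -> String {
        format!(
            "{}/{}/cmd/{}/{}",
            schema.root, self.target.0, self.cmd_type, self.cmd_id
        )
    }
}
```

The reviewer's alternative would store the schema inside `Command` at construction time, which shortens each call but ties every message value to one transport configuration.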

crates/extensions/c8y_http_proxy/src/actor.rs (outdated, resolved)
@@ -108,8 +107,15 @@ impl C8yMapperConfig {
let mut topics = Self::default_internal_topic_filter(&config_dir)?;

// Add feature topic filters
topics.add_all(mqtt_schema.topics(AnyEntity, Command(OperationType::Restart)));
topics.add_all(mqtt_schema.topics(AnyEntity, CommandMetadata(OperationType::Restart)));
for cmd in [
Contributor

Minor:

Suggested change
- for cmd in [
+ for cmd_type in [
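
For context, the loop under review replaces a pair of `add_all` calls per operation type with one iteration per command type. A rough, self-contained sketch of that idea is given below; the function name, the `+/+/+/+` entity wildcard, and the exact filter layout are assumptions for illustration rather than the mapper's actual code.

```rust
/// Wildcard matching any four-segment entity topic id (assumed layout).
const ANY_ENTITY: &str = "+/+/+/+";

/// Build the MQTT topic filters for a set of command types:
/// one filter for command instances and one for command metadata.
pub fn command_filters(root: &str, cmd_types: &[&str]) -> Vec<String> {
    let mut filters = Vec::new();
    for cmd_type in cmd_types {
        // Command instances carry one extra segment: the command id.
        filters.push(format!("{}/{}/cmd/{}/+", root, ANY_ENTITY, cmd_type));
        // Command metadata (capability) messages have no command id.
        filters.push(format!("{}/{}/cmd/{}", root, ANY_ENTITY, cmd_type));
    }
    filters
}
```

Calling `command_filters("te", &["restart", "software_list", "software_update"])` yields six filters, two per command type.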

Comment on lines +926 to +929
/*

#[test]
fn using_a_software_update_response() {
Contributor

Remove that test? Looks like an irrelevant test as software update response doesn't carry the software list anymore.

@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from e81959f to ddbe805 Compare October 26, 2023 11:21
@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from 92af9ad to 3e0466f Compare October 27, 2023 08:17
@reubenmiller reubenmiller added the theme:software Theme: Software management label Oct 27, 2023
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request October 27, 2023 10:32 — with GitHub Actions Inactive
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request October 27, 2023 12:02 — with GitHub Actions Inactive
@reubenmiller reubenmiller left a comment

Approved. Great effort!

@albinsuresh albinsuresh left a comment

LGTM

Device Should Have Installed Software tedge,${NEW_VERSION_ESCAPED}::apt tedge-mapper,${NEW_VERSION_ESCAPED}::apt tedge-agent,${NEW_VERSION_ESCAPED}::apt tedge-watchdog,${NEW_VERSION_ESCAPED}::apt tedge-configuration-plugin,${NEW_VERSION_ESCAPED}::apt tedge-log-plugin,${NEW_VERSION_ESCAPED}::apt tedge-apt-plugin,${NEW_VERSION_ESCAPED}::apt

# Software list reported by the new agent
Restart Service tedge-agent
Contributor

Does this mean that the end user needs to explicitly restart tedge-agent after every update? Or has that always been the case?

Contributor Author

Yes. This has been the case since the work on self-update. Restarting the agent during a software update operation leads to a failed operation status.

This is something we will need to fix, sure.

return Ok(None);
}
request.insert("id".to_string(), Value::String(cmd_id.to_string()));
request.remove("status"); // as the old agent denies unknown fields
Contributor

Minor:

Suggested change
- request.remove("status"); // as the old agent denies unknown fields
+ request.remove("status"); // as the old agent denies unknown status values like `init`

"Fail to extract command 'id' from agent {cmd_type} response: {}",
std::str::from_utf8(payload).unwrap_or("non utf8 payload")
))
}
Contributor

I'd add some unit test coverage, if there's time, especially for the convert_to_old_agent_request function which has some deep nested logic.
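
As a starting point for such coverage, the conversion can be reduced to a small pure function that is easy to assert on. The sketch below is a hypothetical simplification, not the PR's `convert_to_old_agent_request`: it models the JSON object as a string map, and it assumes that only commands in the `init` state are forwarded to an old agent (the condition behind the early `Ok(None)` return is not visible in this thread).

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON object: string keys and string values only.
pub type Payload = BTreeMap<String, String>;

/// Hypothetical, simplified conversion of a new-style command into the
/// request expected by an old agent.
/// Assumption: only commands whose status is "init" are forwarded.
pub fn to_old_agent_request(cmd_id: &str, mut request: Payload) -> Option<Payload> {
    if request.get("status").map(String::as_str) != Some("init") {
        return None;
    }
    // The old agent expects the command id inside the payload.
    request.insert("id".to_string(), cmd_id.to_string());
    // The old agent rejects unknown status values like `init`.
    request.remove("status");
    Some(request)
}
```

A unit test can then check three things: the id is injected, the `status` field is dropped, and payloads in any other state are skipped.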

This is a first step toward their deprecation.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
A huge change but straightforward.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
The aim is to be able to handle the case where the mapper has been updated but not the agent.
This will typically arise during a self-update, where the tedge packages are updated by the agent.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
@didier-wenzek didier-wenzek force-pushed the feat/tedge-agent-software-update-child-devices branch from 9555293 to 352388d Compare October 27, 2023 15:48
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request October 27, 2023 16:03 — with GitHub Actions Inactive
@didier-wenzek didier-wenzek merged commit 1fcce2c into thin-edge:main Oct 27, 2023
18 checks passed
@didier-wenzek didier-wenzek deleted the feat/tedge-agent-software-update-child-devices branch October 27, 2023 18:31
Labels
theme:software Theme: Software management
5 participants