Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make entity store persistent #2428 #2522

Merged

Conversation

albinsuresh
Copy link
Contributor

@albinsuresh albinsuresh commented Dec 14, 2023

Proposed changes

Persist the entity store as a JSON lines file.
Every registration message and twin data message is persisted as JSON lines. On startup the in-memory entity store is rebuilt by replaying these messages.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link

codecov bot commented Dec 14, 2023

Codecov Report

Merging #2522 (93c0cda) into main (19c39b7) will increase coverage by 0.0%.
Report is 1 commits behind head on main.
The diff coverage is 86.2%.

Additional details and impacted files
Files Coverage Δ
crates/core/c8y_api/src/json_c8y.rs 90.0% <100.0%> (+0.2%) ⬆️
crates/core/tedge_api/src/lib.rs 100.0% <ø> (ø)
crates/extensions/c8y_mapper_ext/src/config.rs 44.3% <100.0%> (+0.7%) ⬆️
crates/extensions/c8y_mapper_ext/src/converter.rs 81.8% <100.0%> (+<0.1%) ⬆️
...s/extensions/c8y_mapper_ext/src/service_monitor.rs 97.0% <100.0%> (+0.1%) ⬆️
crates/extensions/c8y_mapper_ext/src/tests.rs 91.7% <100.0%> (+<0.1%) ⬆️
crates/extensions/tedge_mqtt_ext/src/lib.rs 67.8% <ø> (ø)
crates/extensions/c8y_mapper_ext/src/actor.rs 77.7% <0.0%> (-0.4%) ⬇️
crates/extensions/c8y_mapper_ext/src/inventory.rs 83.7% <66.6%> (ø)
crates/core/tedge_api/src/message_log.rs 84.0% <84.0%> (ø)
... and 2 more

... and 5 files with indirect coverage changes

Copy link
Contributor

github-actions bot commented Dec 14, 2023

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
376 0 3 376 100 57m55.892s

crates/core/tedge_api/src/message_log.rs Outdated Show resolved Hide resolved
/// );
/// ```
pub struct EntityStore {
mqtt_schema: MqttSchema,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is useful to couple the entity store to a schema. The less coupling, the better.

It seems that the schema is used only to store/load EntityTwinMessage and EntityRegistrationMessage using their MQTT representation. Why not simply encapsulate these 2 cases under some EntityStoreMessage enum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was one of the options that I had evaluated in the last PR, as described in this comment. The main reason why I moved away from that PersistentMessage enum to raw MqttMessage was to cover the possibility of making more message types persistent in future, especially the ones like measurements that don't have a concrete struct representation.

crates/core/tedge_api/src/entity_store.rs Outdated Show resolved Hide resolved
Comment on lines 208 to 210
pub fn load_from_message_log(&mut self) {
info!("Loading the entity store from the log");
match self.message_log.reader() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be better to transform this method into a function taking mutable references to an entity store and a log of messages; populating the store from these messages.

A bit more involved, but I would also try to persist the entity store from outside, i.e with a function taking the store and the message log. This can be done by using an internal vector of non-persisted-yet messages.

Doing so:

  • The entity store will stay a pure data structure with no side effects.
  • The load and persist methods can be made async.
  • The logic would be simpler avoiding this usage of boolean flags to control in depth when something needs to be persisted or not.
  • The coupling to the mqtt_schema could also be removed that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, this would be deferred for later.

@didier-wenzek
Copy link
Contributor

didier-wenzek commented Dec 17, 2023

Persisting the raw MQTT messages is a bit opaque:

{"topic":{"name":"te/device/main///twin/c8y_Agent"},"payload":[123,34,110,97,109,101,34,58,34,116,104,105,110,45,101,100,103,101,46,105,111,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,
104,105,110,45,101,100,103,101,46,105,111,34,44,34,118,101,114,115,105,111,110,34,58,34,48,46,49,51,46,49,34,125],"qos":1,"retain":true}

This is even strange as what matters is cryptic (payload), while what is useless is clear (qos, retain).

@albinsuresh
Copy link
Contributor Author

Persisting the raw MQTT messages is a bit opaque:

{"topic":{"name":"te/device/main///twin/c8y_Agent"},"payload":[123,34,110,97,109,101,34,58,34,116,104,105,110,45,101,100,103,101,46,105,111,34,44,34,117,114,108,34,58,34,104,116,116,112,115,58,47,47,116,
104,105,110,45,101,100,103,101,46,105,111,34,44,34,118,101,114,115,105,111,110,34,58,34,48,46,49,51,46,49,34,125],"qos":1,"retain":true}

This is even strange as what matters is cryptic (payload), while what is useless is clear (qos, retain).

Will update the serialization logic of DebugPayload to cover this case, once we finalize on this point whether to continue persisting raw MqttMessgae instances or introduce a dedicated PersistentMessage enum for the same.

let mut writer = BufWriter::new(file);

if file_is_empty {
let version = env!("CARGO_PKG_VERSION");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reubenmiller suggested that we use an independent version scheme for this persistence store file format rather than using the tedge version itself, which will make the validations easier in future, even when the same file format is used across multiple releases. So, this will be version 1.0 of the schema. And it will only be updated when there's a breaking change to the schema.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Comment on lines +80 to +82
// If the file is empty append the version information as a header
let metadata = file.metadata()?;
let file_is_empty = metadata.len() == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds a bit fragile, at least unusual.

As this is an append-only log of messages - tracking version is a bit more involved. The first messages are not necessarily written by the same version of the software as the following messages.

What is proposed is okay, but the whole log will have to be rewritten if a new format is introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My expectation was that it would be better to discard the log and start fresh if there is a major version mismatch between the version expected by the software and the file read from the disk. Even though I've defined a minor version as well, I don't really have a clear plan on when that would change, keeping the formats compatible between those minor versions. TBH I've only thought about breaking major version changes and kept the minor version number just to enable any possibilities in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My expectation was that it would be better to discard the log and start fresh if there is a major version mismatch between the version expected by the software and the file read from the disk.

It's why I don't bother to much to have something simple for now - even if a bit fragile.

Even though I've defined a minor version as well, I don't really have a clear plan on when that would change, keeping the formats compatible between those minor versions. TBH I've only thought about breaking major version changes and kept the minor version number just to enable any possibilities in future.

Yeah, this sounds premature, but this is also harmless. So let's go with that for the rc.

Persist the entity store as a JSON lines file.
Every registration message and twin data message is persisted as JSON lines.
On startup the in-memory entity store is rebuilt by replaying these messages.
@albinsuresh albinsuresh merged commit 141c1d0 into thin-edge:main Dec 19, 2023
18 checks passed
@albinsuresh
Copy link
Contributor Author

Test Plan

The externally visible effects of this work are the following:

  1. All the entities registered and their twin data are now persisted on the disk at /etc/tedge/.tedge-mapper-c8y/entity_store.jsonl as a JSON lines file consisting of all the entity registration messages and twin messages. No duplicate messages expected in this file.
  2. The entity store of the mapper is rebuilt using the contents of this file on every startup, before it starts processing any MQTT messages. So, when the mapper re-delivers the same retained messages on restart, the mapper will just ignore those, avoiding the duplicate conversion of these messages to Cumulocity as well.
  3. Any new messages published while the mapper was down, should still be processed/converted by the mapper after the startup. Only the metadata messages that it had previously seen/registered/updated, before going down, are ignored.

@gligorisaev
Copy link
Contributor

QA has thoroughly checked the feature and here are the results:

  • Test for ticket exists in the test suite.
  • tests/RobotFramework/tests/cumulocity/registration/device_registration.robot
  • QA has tested the function and it's functioning according description.

@albinsuresh albinsuresh deleted the feat/2428/entity-store-persistence branch January 19, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants