Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RSDK-8187] config cache changes #262

Merged
Merged
45 changes: 38 additions & 7 deletions micro-rdk/src/common/config_monitor.rs
Original file line number Diff line number Diff line change
@@ -1,19 +1,39 @@
use super::app_client::{AppClient, AppClientError, PeriodicAppClientTask};
#[cfg(feature = "provisioning")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should remove this now, since the types are no longer gated on provisioning.

use crate::common::{
grpc::ServerError,
provisioning::storage::{RobotConfigurationStorage, WifiCredentialStorage},
};
use crate::proto::app::v1::ConfigResponse;
use futures_lite::Future;
use std::fmt::Debug;
acmorrow marked this conversation as resolved.
Show resolved Hide resolved
use std::pin::Pin;
use std::time::Duration;

pub struct ConfigMonitor<'a> {
restart_hook: Option<Box<dyn FnOnce() + 'a>>,
pub struct ConfigMonitor<'a, S>
where
S: RobotConfigurationStorage + WifiCredentialStorage + Clone + 'static,
<S as RobotConfigurationStorage>::Error: Debug,
ServerError: From<<S as RobotConfigurationStorage>::Error>,
<S as WifiCredentialStorage>::Error: Sync + Send + 'static,
acmorrow marked this conversation as resolved.
Show resolved Hide resolved
{
curr_config: ConfigResponse, //config for robot gotten from last robot startup, aka inputted from entry
storage: S,
restart_hook: Option<Box<dyn FnOnce() + 'a>>,
}

impl<'a> ConfigMonitor<'a> {
pub fn new(restart_hook: impl FnOnce() + 'a, curr_config: ConfigResponse) -> Self {
impl<'a, S> ConfigMonitor<'a, S>
where
S: RobotConfigurationStorage + WifiCredentialStorage + Clone + 'static,
<S as RobotConfigurationStorage>::Error: Debug,
ServerError: From<<S as RobotConfigurationStorage>::Error>,
<S as WifiCredentialStorage>::Error: Sync + Send + 'static,
{
pub fn new(curr_config: ConfigResponse, storage: S, restart_hook: impl FnOnce() + 'a) -> Self {
Self {
restart_hook: Some(Box::new(restart_hook)),
curr_config,
storage,
restart_hook: Some(Box::new(restart_hook)),
}
}

Expand All @@ -23,7 +43,13 @@ impl<'a> ConfigMonitor<'a> {
unreachable!();
}
}
impl<'a> PeriodicAppClientTask for ConfigMonitor<'a> {
impl<'a, S> PeriodicAppClientTask for ConfigMonitor<'a, S>
where
S: RobotConfigurationStorage + WifiCredentialStorage + Clone + 'static,
<S as RobotConfigurationStorage>::Error: Debug,
ServerError: From<<S as RobotConfigurationStorage>::Error>,
<S as WifiCredentialStorage>::Error: Sync + Send + 'static,
{
fn name(&self) -> &str {
"ConfigMonitor"
}
Expand All @@ -43,7 +69,12 @@ impl<'a> PeriodicAppClientTask for ConfigMonitor<'a> {
app_client.clone().get_config(None).await
{
if self.curr_config != *new_config {
self.restart()
if let Some(config) = new_config.config {
if let Err(e) = self.storage.store_robot_configuration(config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we consider erasing the config then restarting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. That would leave open the possibility that if you can't connect when you come back online, you won't be able to build a robot until you talk to app again. That might be a risk worth taking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could work, its either a empty config or an invalid config if you can't come back online which both wouldn't work. Should I implement this solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make the assumption that network will be available until the next time we poll config after a reboot. The big risk would be an invalid config but that's on the user

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. @allisonschiang please take @npmenard's suggestion to just erase the config instead of storing it here.

log::warn!("Failed to store new robot configuration from app: {}", e);
}
}
self.restart();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the crash loop bug as a TODO comment here once it's made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to restart if the config cache wasn't reset or to not reset? I'm leaning towards reset because it could fix the reason why the cache failed to erase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree. If you fail to clear the cache, if you restart you will run the stale config, and if you don't restart you will continue to run the stale config.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if NVS writes aren't working, your best hope is that a reboot will help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile, we spent a lot of time talking about why this clears the cache rather than writing the configuration to it. I think a fairly detailed comment is warranted explaining why resetting the cache and rebooting is our current best strategy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we prefer a (possible reboot loop on an NVS write fail) over a (basically permanently stale config cache assuming NVS doesn't fix itself)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized when writing my comment the alternative, which is nvs fail -> reboot -> nvs fail -> reboot ... and the config cache never changing. While rebooting may fix the issue, maybe a stale config is preferred over a reboot loop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I think I agree then: if you can't clear NVS you are better off continuing with your existing config than entering a boot loop driven by continually detecting a new config. I also expect that being unable to write to NVS probably represents a badly degraded device or runtime, and our options are limited there. It is a fussy edge case though, and I'd be interested to hear if @npmenard has thoughts on the best outcome here.

}
}
Ok(Some(self.get_default_period()))
Expand Down
3 changes: 2 additions & 1 deletion micro-rdk/src/esp32/entry.rs
Original file line number Diff line number Diff line change
Expand Up @@ -161,8 +161,9 @@ pub async fn serve_web_inner<S>(
crate::esp32::esp_idf_svc::sys::esp_restart()
})))
.with_periodic_app_client_task(Box::new(ConfigMonitor::new(
|| unsafe { crate::esp32::esp_idf_svc::sys::esp_restart() },
*(cfg_response.clone()),
storage.clone(),
|| unsafe { crate::esp32::esp_idf_svc::sys::esp_restart() },
)))
.build(&cfg_response)
.unwrap();
Expand Down
3 changes: 2 additions & 1 deletion micro-rdk/src/native/entry.rs
Original file line number Diff line number Diff line change
Expand Up @@ -129,8 +129,9 @@ pub async fn serve_web_inner<S>(
.with_app_client(app_client)
.with_periodic_app_client_task(Box::new(RestartMonitor::new(|| std::process::exit(0))))
.with_periodic_app_client_task(Box::new(ConfigMonitor::new(
|| std::process::exit(0),
*(cfg_response.clone()),
storage.clone(),
|| std::process::exit(0),
)))
.build(&cfg_response)
.unwrap();
Expand Down
Loading