RSDK-12900: use viam-defaults.json only when cloud config is unavailable (remove merging) #177

Merged
aldenh-viam merged 7 commits into viamrobotics:main from aldenh-viam:push-wurztlmpnnsn on Jan 14, 2026

Conversation

@aldenh-viam
Contributor

We currently apply the cloud config over viam-defaults.json. This PR changes that: viam-defaults.json is used only when the cloud config is either unavailable or broken.
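A minimal sketch of the fallback described above, not the code from this PR. `Config`, `chooseConfig`, `errCloudUnavailable`, and the field names are hypothetical stand-ins. The only point is that the defaults are used wholesale when the cloud config is missing or failed to parse, and are otherwise ignored rather than merged.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, heavily simplified stand-in for the agent config.
type Config struct {
	HotspotPassword string
	WifiPowerSave   bool
}

// errCloudUnavailable is a hypothetical error used only for this illustration.
var errCloudUnavailable = errors.New("cloud config unavailable")

// chooseConfig returns the cloud config when it was fetched and parsed
// successfully, and the defaults otherwise. Nothing is merged.
// The second return value names the source so the caller can log it.
func chooseConfig(cloud *Config, cloudErr error, defaults Config) (Config, string) {
	if cloudErr != nil || cloud == nil {
		return defaults, "viam-defaults.json"
	}
	return *cloud, "cloud"
}

func main() {
	defaults := Config{HotspotPassword: "from-defaults"}

	// Cloud fetch failed: the defaults are used as-is.
	cfg, src := chooseConfig(nil, errCloudUnavailable, defaults)
	fmt.Printf("source=%s cfg=%+v\n", src, cfg)

	// Cloud fetch succeeded: the defaults are not consulted at all.
	cfg, src = chooseConfig(&Config{WifiPowerSave: true}, nil, defaults)
	fmt.Printf("source=%s cfg=%+v\n", src, cfg)
}
```

Note that in the second case the hotspot password from the defaults is dropped, which is the behavior the later provisioning discussion in this thread revisits.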

utils/config.go Outdated
Comment on lines +270 to +273
  cacheBytes, err := os.ReadFile(cachePath)
  if err != nil {
  	if errors.Is(err, fs.ErrNotExist) {
- 		return StackConfigs(&pb.DeviceAgentConfigResponse{})
+ 		return StackConfigs(nil)
Contributor Author

LoadConfigFromCache is only called once at startup. It will apply viam-defaults.json if the cache is unavailable. However, GetConfig will soon run and replace it with the cloud config (if possible) before the subsystems start up and begin running (after #170).
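For illustration, the startup behavior described in this comment could look roughly like the sketch below. It is not the repository's `LoadConfigFromCache`: `loadFromCache`, `Config`, and the JSON field name are hypothetical, and the choice to report non-"missing file" errors while still returning the defaults is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// Config is a hypothetical, simplified stand-in for the agent config.
type Config struct {
	HotspotPassword string `json:"hotspot_password"`
}

// loadFromCache returns the cached config when it can be read and parsed.
// A missing cache file is treated as a normal first boot: defaults, no error.
// Any other failure also yields the defaults, but the error is reported.
func loadFromCache(cachePath string, defaults Config) (Config, error) {
	data, err := os.ReadFile(cachePath)
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			return defaults, nil
		}
		return defaults, err
	}
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return defaults, err
	}
	return cfg, nil
}

func main() {
	defaults := Config{HotspotPassword: "from-defaults"}
	// The path is deliberately one that should not exist.
	cfg, err := loadFromCache("/nonexistent-dir-for-example/agent-cache.json", defaults)
	fmt.Printf("cfg=%+v err=%v\n", cfg, err)
}
```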

@aldenh-viam aldenh-viam marked this pull request as ready for review December 30, 2025 20:57
@aldenh-viam aldenh-viam requested a review from cheukt January 2, 2026 16:53
Member

@cheukt cheukt left a comment

can you add a test as well?

errOut = errors.Join(errOut, err)
} else {
jsonBytes, err = json.Marshal(cloudCfg)
if !cloudCfgSuccess {
Member

how soon will this trip if the connection is flaky? like one config fetch fails but the next succeeds? will we end up with the config constantly switching between two states?

Contributor Author

I think in practice it's unlikely to be an issue, and the risk of config flip flop is largely unchanged from what we have now:

agent/manager.go

Lines 692 to 710 in 5fa9720

resp, err := agentDeviceServiceClient.DeviceAgentConfig(timeoutCtx, req)
if err != nil {
	m.logger.Warn(errw.Wrapf(err, "fetching %s config", SubsystemName))
	return minimalDeviceAgentConfigCheckInterval, err
}
fixWindowsPaths(resp)
// Store update data in cache, actual binaries are updated later
err = m.cache.Update(resp.GetAgentUpdateInfo(), SubsystemName)
if err != nil {
	m.logger.Error(errw.Wrapf(err, "processing update data for %s", SubsystemName))
}
err = m.cache.Update(resp.GetViamServerUpdateInfo(), viamserver.SubsysName)
if err != nil {
	m.logger.Error(errw.Wrapf(err, "processing update data for %s", viamserver.SubsysName))
}
cfgFromCloud, err := utils.StackConfigs(resp)

If the DeviceAgentConfig grpc call fails with an error, this code doesn't run at all (we just continue with the existing config).

If the call succeeds but returns corrupted data, we run it through a few marshal/unmarshal round trips (ProtoToConfig), and only if everything is still good up to there do we unmarshal to the final cfg. The one difference is that currently we keep whatever the final unmarshal gives us, whereas in this PR we keep it only if it succeeds without error.

agent/utils/config.go

Lines 337 to 345 in 5fa9720

cloudCfg, err := ProtoToConfig(proto)
if err != nil {
	errOut = errors.Join(errOut, err)
} else {
	jsonBytes, err = json.Marshal(cloudCfg)
	if err != nil {
		errOut = errors.Join(errOut, err)
	} else {
		if err := json.Unmarshal(jsonBytes, &cfg); err != nil {

I'll see if any more ideas to make this more resilient come to mind, but I generally don't think this is a major cause for concern.
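The "only keep it if it succeeds" point can be sketched as decoding into a scratch value and committing it only on success. This is an illustration under assumptions, not the PR's code: `applyIfValid`, `Config`, and the `retry_limit` field are hypothetical. The motivation is that `encoding/json` can leave its target partially populated when it returns an error (for example on a type mismatch), so decoding straight into the live config risks keeping a half-applied result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, simplified stand-in for the agent config.
type Config struct {
	WifiPowerSave bool `json:"wifi_power_save"`
	RetryLimit    int  `json:"retry_limit"`
}

// applyIfValid decodes raw into a fresh scratch Config and returns it only
// when decoding succeeds. On any error the current config is returned
// untouched, so a partially decoded value is never kept.
func applyIfValid(current Config, raw []byte) (Config, error) {
	var candidate Config
	if err := json.Unmarshal(raw, &candidate); err != nil {
		return current, err
	}
	return candidate, nil
}

func main() {
	current := Config{RetryLimit: 3}

	// retry_limit has the wrong type, so decoding fails and current is kept.
	bad := []byte(`{"wifi_power_save": true, "retry_limit": "oops"}`)
	cfg, err := applyIfValid(current, bad)
	fmt.Printf("cfg=%+v failed=%v\n", cfg, err != nil)

	// A well-formed payload replaces the current config entirely.
	good := []byte(`{"wifi_power_save": true, "retry_limit": 5}`)
	cfg, err = applyIfValid(current, good)
	fmt.Printf("cfg=%+v failed=%v\n", cfg, err != nil)
}
```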

Member

@cheukt cheukt Jan 8, 2026

will we fall back to viam-defaults if provisioning turns on? e.g. you move a previously connected pi to a new location.

we should do that because otherwise the hotspot name/password will be unset

Contributor Author

Good catch, no. IIUC the way it works now is when you first get online, it applies the cloud config over viam-defaults.json, so the resulting config still contains the hotspot settings (via viam-defaults), and it gets saved to disk. So next startup, if it needs to enter provisioning, it will read the cached config which still has the hotspot info.

In this case of moving a robot to a new location, do we still prefer using the cached config if it's available, or just revert to using viam-defaults.json only? I'd think the former because it has more info & is closer to the user's desired state, but then we may have to go back to merging and selectively choose the fields to include (perhaps just the provisioning ones?).

Contributor Author

@aldenh-viam aldenh-viam Jan 9, 2026

Lmk what you think about a91fb64. I think added complexity is unavoidable whether we go with this method or restore the merging in StackConfigs but selectively choose which fields to include.

Member

I think it's a good start - thinking about it some more, maybe a better course is not to stack the cloud/viam-defaults together, but use either the cloud or viam-defaults depending on whether the cloud config differs from the default config (so if there are no differences, we can safely assume the user doesn't care about the networking configuration, but if there are, we only use the cloud config and don't stack them, since there could be weird interactions).

Then we can log which config we're using and it should be more clear when we are using which config

Contributor Author

> but use either the cloud or viam-defaults depending on whether the cloud config differs from the default config

I'm not sure this would help for the case we're trying to solve: if the hotspot settings are only in viam-defaults, but the cloud cfg has one additional option, e.g. wifi_power_save: false, this would opt to only use the cloud config and miss the hotspot settings. If the cloud config is either completely empty or perfectly matches viam-defaults, using only the latter is no different from the merge in a91fb64.
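The objection above can be made concrete with a small sketch that models both configs as flat key/value maps. Everything here is hypothetical (`eitherOr`, `mergeProvisioning`, the `hotspot_password` key, and the flat-map model); only `wifi_power_save` comes from the comment. It also assumes that cloud values win over defaults for the merged keys, which may not match a91fb64.

```go
package main

import (
	"fmt"
	"reflect"
)

// eitherOr models the "use one or the other" proposal: the defaults are used
// only when the cloud config is empty or identical to them.
func eitherOr(cloud, defaults map[string]string) map[string]string {
	if len(cloud) == 0 || reflect.DeepEqual(cloud, defaults) {
		return defaults
	}
	return cloud
}

// mergeProvisioning models the selective merge: start from the cloud config
// and fill in only the listed provisioning keys that the cloud did not set.
func mergeProvisioning(cloud, defaults map[string]string, keys []string) map[string]string {
	out := make(map[string]string, len(cloud)+len(keys))
	for k, v := range cloud {
		out[k] = v
	}
	for _, k := range keys {
		if _, set := out[k]; set {
			continue
		}
		if v, ok := defaults[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	defaults := map[string]string{"hotspot_password": "secret"}
	cloud := map[string]string{"wifi_power_save": "false"}

	// One extra cloud option is enough to drop the hotspot settings.
	fmt.Println(eitherOr(cloud, defaults))

	// The selective merge keeps both.
	fmt.Println(mergeProvisioning(cloud, defaults, []string{"hotspot_password"}))
}
```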

Member

ya, you're right, let's just keep the current impl (merge cloud + defaults for provisioning)

Contributor Author

@aldenh-viam aldenh-viam Jan 13, 2026

731572c changed to only log & apply when there are diffs (functionally same as before) + comments for clarity.
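A rough sketch of "only log & apply when there are diffs", assuming the provisioning settings can be compared as a plain struct. `Provisioning`, `applyIfChanged`, and the field names are hypothetical and not taken from 731572c.

```go
package main

import "log"

// Provisioning is a hypothetical, simplified set of provisioning settings.
type Provisioning struct {
	HotspotPrefix   string
	HotspotPassword string
}

// applyIfChanged overwrites *current with candidate only when they differ,
// logging once in that case. It reports whether anything was applied.
func applyIfChanged(current *Provisioning, candidate Provisioning, logf func(format string, args ...any)) bool {
	if *current == candidate {
		return false
	}
	logf("applying updated provisioning settings")
	*current = candidate
	return true
}

func main() {
	cur := Provisioning{HotspotPrefix: "viam-setup"}

	// Identical settings: nothing is logged or applied.
	applyIfChanged(&cur, cur, log.Printf)

	// A real difference: logged once, then applied.
	applyIfChanged(&cur, Provisioning{HotspotPrefix: "viam-setup", HotspotPassword: "example"}, log.Printf)
}
```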

utils/config.go Outdated
  err = json.Unmarshal(cacheBytes, &cfg)
  if err != nil {
- 	cfg, newErr := StackConfigs(&pb.DeviceAgentConfigResponse{})
+ 	cfg, newErr := StackConfigs(nil)
Member

seems like we use StackConfigs(nil) enough where we can make a specific function to only get the viam-defaults + default config

Contributor Author

In 724754d I refactored StackConfigs into StackProtoConfig(proto) and StackOfflineConfig(), and split the viam-provisioning.json & viam-defaults.json JSON handling into separate smaller functions.

Co-authored-by: Cheuk <90270663+cheukt@users.noreply.github.com>
@aldenh-viam aldenh-viam force-pushed the push-wurztlmpnnsn branch 2 times, most recently from dfaefa5 to 1f0cc86 Compare January 13, 2026 19:20
Member

@cheukt cheukt left a comment

LGTM

@aldenh-viam aldenh-viam merged commit 1da667e into viamrobotics:main Jan 14, 2026
3 checks passed
cheukt added a commit that referenced this pull request Mar 20, 2026
…unavailable (remove merging) (#177)"

This reverts commit 1da667e.