This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

Design: Prevent race conditions in Pri<->Sec communication from delivering old images that will be rejected, absent any malice #137

Open
awwad opened this issue Sep 19, 2017 · 0 comments


@awwad
Contributor

awwad commented Sep 19, 2017

Issue Summary

(This is less an implementation issue than a design and/or deployment issue, which should be addressed in those documents; however, GitHub provides a convenient place to note and discuss it.)

Communications between the Primary and Secondaries, if not carefully engineered, can result in race conditions in which, with no malice from any party, a Primary delivers to a Secondary an image that the Secondary will deem invalid. This may impair or delay the vehicle's operation.

Details

The race condition is apparent in this workflow:

  1. Secondary obtains updated metadata from the Primary.
  2. Secondary determines whether the metadata it has obtained is valid.
  3. Secondary determines whether that valid metadata indicates that it should update itself.
  4. If so, Secondary obtains the indicated firmware image from the Primary.
  5. Secondary determines whether the image it has obtained is valid per the valid metadata.
  6. If so, Secondary allows the new firmware to be run (moving it into place, flipping a bit, installing it, and so on).
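The workflow above can be sketched as follows. This is a minimal illustration, not Uptane reference code: `TargetInfo`, `FakePrimary`, `Secondary`, and `update_cycle` are hypothetical names, metadata validation is stubbed out, and "should update" is reduced to a hash comparison. The point it shows is that steps 1 and 4 are two separate requests to the Primary, and the race lives in the gap between them.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInfo:
    # Hypothetical, simplified view of the metadata entry naming this
    # Secondary's image; real Uptane metadata carries much more.
    filename: str
    sha256: str


class FakePrimary:
    # Hypothetical stand-in for the Primary: metadata and images are served
    # by two separate calls, mirroring steps 1 and 4.
    def __init__(self, target: TargetInfo, images: dict):
        self.target = target
        self.images = images

    def get_metadata(self) -> TargetInfo:
        return self.target

    def get_image(self, filename: str) -> bytes:
        return self.images[filename]


class Secondary:
    def __init__(self, primary, firmware: bytes):
        self.primary = primary
        self.firmware = firmware

    def metadata_is_valid(self, target: TargetInfo) -> bool:
        # Placeholder: real verification checks signatures, expiration, versions, etc.
        return True

    def update_cycle(self) -> str:
        target = self.primary.get_metadata()                      # step 1
        if not self.metadata_is_valid(target):                    # step 2
            return "metadata rejected"
        if hashlib.sha256(self.firmware).hexdigest() == target.sha256:
            return "up to date"                                   # step 3: nothing to do
        image = self.primary.get_image(target.filename)           # step 4: separate request
        if hashlib.sha256(image).hexdigest() != target.sha256:    # step 5
            return "image rejected"
        self.firmware = image                                     # step 6
        return "installed"


new_image = b"firmware v2"
primary = FakePrimary(
    TargetInfo("some_image.tgz", hashlib.sha256(new_image).hexdigest()),
    {"some_image.tgz": new_image},
)
ecu = Secondary(primary, firmware=b"firmware v1")
print(ecu.update_cycle())  # installed
```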

If the Primary finishes validating new metadata and images and moves them into place after step 1 but before step 4, the Secondary may retrieve a new image that no longer matches the now-outdated metadata it obtained in step 1. In many non-Uptane applications this is not a substantial problem, because bandwidth and storage are less constrained. (Even so, TUF's consistent snapshots solve this problem for such applications.)

For an Uptane Secondary, the problem is much larger. A Secondary without sufficient spare storage may overwrite its old firmware while obtaining new firmware in step 4. If that firmware from the Primary then fails to match the older metadata from the Primary, the Secondary is left without fully functional firmware that it is allowed to run. This can leave it in an impaired state, possibly delaying or otherwise interfering with proper function of the vehicle.

The problem must then be resolved by arranging to obtain new metadata again. This may be difficult, in part because the entity performing the final check (possibly a boot loader) may be much less elaborate than the entity that originally received, validated, and processed the new metadata. That makes the problem especially unappealing, and a way to prevent these race conditions is important.

In this case, the state is reached without any attack occurring. Attacks may also result in this state, but they require a certain amount of control, namely any of the following:

  • compromise of the Primary, in any scenario;
  • if the Secondary performs partial verification: compromise of the Director's targets role;
  • if the Secondary performs full verification: compromise of the Director's targets (or root) role, together with compromise of the appropriate Image Repository delegated role (or a higher delegated role, the top-level targets role, or the root role).

While this problem can be readily solved (for example, with TUF's Consistent Snapshots feature), neither the problem nor its solutions are addressed in the Uptane Design, Implementation, or Deployment documents. I think the problem should be noted there and an example solution provided.

Example / Reproduction

In particular, here is a problematic timeline for a Secondary without extra storage:

  • Secondary obtains metadata from the Primary instructing it to update to some_image.tgz.
  • Primary obtains (or otherwise puts into place) fresher metadata and an associated, updated some_image.tgz file.
  • Secondary retrieves some_image.tgz from the Primary, receiving the newly updated version, which displaces its existing firmware.
  • Secondary rejects some_image.tgz because it does not match the slightly older metadata it obtained.
  • Secondary now lacks fully functioning firmware.
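The timeline above can be reproduced with a small simulation. It assumes a hypothetical single-slot Secondary whose download overwrites its only firmware copy, and a hypothetical `Primary` whose `put` installs fresher metadata and the matching image; metadata is reduced to one SHA-256 digest per filename. The `between_steps` callback stands in for the Primary updating after the Secondary has read the metadata but before it fetches the image.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Primary:
    # Hypothetical Primary: one image and one metadata hash per filename.
    def __init__(self):
        self.metadata = {}   # filename -> expected sha256 hex digest
        self.images = {}     # filename -> image bytes

    def put(self, filename: str, image: bytes) -> None:
        # Install fresher metadata and the matching image.
        self.images[filename] = image
        self.metadata[filename] = sha256(image)


class SingleSlotSecondary:
    # Hypothetical Secondary with no spare storage: downloading an image
    # overwrites its only firmware slot.
    def __init__(self, firmware: bytes):
        self.slot = firmware
        self.bootable = True

    def run_timeline(self, primary: Primary, filename: str, between_steps=None) -> str:
        expected = primary.metadata[filename]   # obtain metadata naming the image
        if between_steps is not None:
            between_steps()                     # Primary updates in the gap
        self.slot = primary.images[filename]    # retrieve image, displacing old firmware
        if sha256(self.slot) != expected:       # check against the older metadata
            self.bootable = False
            return "rejected; no functioning firmware"
        return "installed"


primary = Primary()
primary.put("some_image.tgz", b"v2")
ecu = SingleSlotSecondary(firmware=b"v1")
result = ecu.run_timeline(
    primary, "some_image.tgz",
    between_steps=lambda: primary.put("some_image.tgz", b"v3"),
)
print(result)  # rejected; no functioning firmware
```

Without the callback, the same Secondary would read matching metadata and image and install normally; the failure comes purely from the timing.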

Solutions

There are several ways to solve this. Here are some straightforward ones (including one to avoid):

  • Locking: The Primary could lock its metadata and images until all Secondaries are done updating. In the general case this seems a needless exposure to denial-of-service attacks, but in some workflows it could make sense (for example, if update timing is predictable, restricted, or synchronized for other reasons). It may be better instead to have the Primary ignore or reject update requests that Secondaries make while the Primary itself is updating, so that the Secondaries are constrained instead of the Primary.
  • (Inadvisable) The Primary could always deliver metadata and images together. This avoids race conditions between metadata delivery and image delivery. One problem with this is that in certain cases only the Secondary can know for sure that it has to update, so delivering an image before the Secondary has processed the metadata may be premature and may likewise lead to needlessly disabled Secondaries. For example, checking whether the new image's release counter is lower than that of the image the Secondary currently has installed is a check that only the Secondary itself is fully trusted to perform.
  • The Secondary could provide the hash (or a very short portion of it), taken from the metadata it has just validated, to the Primary when requesting an image, so as to avoid retrieving a file it is not expecting from an uncompromised Primary. The Primary could then either provide the sought-after, slightly older image, or refuse, causing the Secondary to try refreshing its metadata again (right away or on the next cycle, as appropriate to the implementer's workflow) while retaining its existing, functioning firmware. (This is similar to TUF's Consistent Snapshots feature.) This approach is likely less error-prone than relying on timing constraints, and it does not add synchronization constraints to what, for some implementers' workflows, may already be a rigid system.
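The hash-based option can be sketched as follows. This assumes a hypothetical Primary that keeps the last few images keyed by filename and SHA-256 digest (a simplified stand-in for consistent snapshots) and a hypothetical Secondary that requests an image by filename plus an 8-character hash prefix. The prefix only selects the file; the Secondary still checks the full hash before accepting it. The `keep` retention count and all class and method names are illustrative assumptions.

```python
import hashlib
from typing import Optional


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class HashAwarePrimary:
    # Hypothetical Primary that stores images keyed by (filename, hash), so a
    # slightly older image can still be served after newer metadata arrives.
    def __init__(self, keep: int = 2):
        self.keep = keep
        self.metadata = {}   # filename -> hash named by the current metadata
        self.images = {}     # (filename, hash) -> bytes, oldest first

    def put(self, filename: str, image: bytes) -> None:
        digest = sha256(image)
        self.metadata[filename] = digest
        self.images[(filename, digest)] = image
        while len(self.images) > self.keep:
            # Drop the oldest stored image; this retention policy is an assumption.
            del self.images[next(iter(self.images))]

    def get_image(self, filename: str, hash_prefix: str) -> Optional[bytes]:
        # Serve only an image whose hash matches what the Secondary asked for;
        # return None (refuse) if that version is no longer available.
        for (name, digest), image in self.images.items():
            if name == filename and digest.startswith(hash_prefix):
                return image
        return None


class Secondary:
    def __init__(self, firmware: bytes):
        self.firmware = firmware

    def fetch_and_install(self, primary: HashAwarePrimary, filename: str,
                          expected_hash: str) -> str:
        # expected_hash comes from metadata this Secondary has already validated.
        image = primary.get_image(filename, expected_hash[:8])
        if image is None or sha256(image) != expected_hash:
            # Nothing was written, so the existing firmware stays in place.
            return "retry later"
        self.firmware = image
        return "installed"


primary = HashAwarePrimary(keep=2)
primary.put("some_image.tgz", b"v2")
expected = primary.metadata["some_image.tgz"]  # from metadata the Secondary validated
primary.put("some_image.tgz", b"v3")           # Primary updates in the gap

ecu = Secondary(firmware=b"v1")
print(ecu.fetch_and_install(primary, "some_image.tgz", expected))  # installed
```

Here the Secondary installs the slightly older `v2` image its metadata names. If the Primary had already evicted it (for example with `keep=1`), it would refuse, and the Secondary would keep `v1` and refresh its metadata rather than end up without working firmware.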

Recommended Solution

The Uptane Deployment Considerations or Uptane Implementation Specification should address this issue and recommend example solutions such as those above.

Additional, related issue

Please note that there is a similar race condition if the Primary makes metadata and images available to Secondaries at different times. The two should be made available (moved into place or unlocked) simultaneously.
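One way to make metadata and images available simultaneously is to stage a complete snapshot and then publish it with a single reference swap, so that no reader ever sees new metadata paired with old images or vice versa. The sketch below assumes an in-memory Primary with hypothetical `Snapshot` and `Publisher` types and a single publishing thread; a file-based Primary might instead stage a new directory and switch to it atomically. On its own it does not address the race between a Secondary's separate metadata and image requests, which the hash-based approach handles.

```python
import hashlib
import threading
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical unit of publication: metadata plus the images it names.
    # The Primary treats a published Snapshot as read-only.
    version: int
    metadata: dict = field(default_factory=dict)   # filename -> sha256 hex digest
    images: dict = field(default_factory=dict)     # filename -> image bytes


class Publisher:
    # Assumes only one thread calls publish(); readers may be concurrent.
    def __init__(self):
        self._lock = threading.Lock()
        self._current = Snapshot(version=0)

    def publish(self, images: dict) -> Snapshot:
        # Build (and, in a real Primary, fully verify) the new snapshot off to
        # the side; Secondaries cannot see any of it yet.
        staged = Snapshot(
            version=self._current.version + 1,
            metadata={n: hashlib.sha256(d).hexdigest() for n, d in images.items()},
            images=dict(images),
        )
        # One reference swap makes metadata and images visible together.
        with self._lock:
            self._current = staged
        return staged

    def current(self) -> Snapshot:
        with self._lock:
            return self._current


pub = Publisher()
first = pub.publish({"some_image.tgz": b"v2"})
snap = pub.current()
# Metadata and image are read from the same snapshot, so they agree.
print(snap.metadata["some_image.tgz"] == hashlib.sha256(snap.images["some_image.tgz"]).hexdigest())  # True
```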
