Skip to content

Commit

Permalink
overlord,o/devicestate: restrict cloud-init on Ubuntu Core (#2)
Browse files Browse the repository at this point in the history
* o/devicestate: add ensureCloudInitRestricted to DeviceManager

This function will disable or restrict cloud-init after seeding is complete on
an Ubuntu Core system. This is the full implementation of the mitigation of
CVE-2020-11933 and https://bugs.launchpad.net/snapd/+bug/1879530.

We want to allow some uses of cloud-init to continue to work, but we don't want
cloud-init to be able to trigged arbitrarily after first boot because at that
point the device should be considered "locked down".

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* o/devicestate/devicemgr: set cloudInitAlreadyRestricted bool, use to skip early

If we already restricted cloud-init we can set a manager bool to prevent
unnecessary checking again for the duration of the manager. If snapd is
restarted for example, it will ping cloud-init again to check the status and
re-set the variable.

Also add tests around this, including a full call to devicemgr.Ensure() to test
that Ensure() calls ensureCloudInitRestricted().

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* o/devicestate/devicemgr, tests: adjust error msg, test names

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* tests/o/devicestate: disable device registration in cloud-init Ensure() test

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* tests/o/devicestate/cloudinit: simplify tests

* Move duplicated setup to the SetUpTest preamble
* Use a shared mockLogger instead of creating individual ones
* Fix import ordering
* Add missing MockCommand restores

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* o/devicestate/devicemgr: adjust EnsureBefore logic to use time instead of count

Previously, we were using the number of times we were called with a counter to
determine if a specific time had elapsed, but this is fragile since EnsureBefore
does not guarantee that we run in the specified time exactly, it ensures only
that we will get re-run at some point before the specified time. This means that
we could get scheduled to re-run very soon in time and incorrectly conclude that
a significant amount of time had elapsed when in fact it had not. As a matter of
fact, this bug was actually being relied upon (unknowingly in my defense) in the
unit tests, where a call to settle() would just call Ensure repeatedly and cause
the logic to assume that the specific time had elapsed.

The solution to all of this is to keep track of time instead of count. Here, we
save the first time we encounter cloud-init in one of the states we track it in,
and check in all future instances if the time has actually elapsed before
continuing. This does require significant mocking of time in the unit tests, but
otherwise is a good solution.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>

* o/devicestate/devicemgr: while waiting in error state, only output msg once

This previously would output a message every time EnsureCloudInitRestricted was
called before hitting the timeout, but really we should only output it on the
first time we find cloud-init in this state.

Also add a test for this scenario, with Ensure getting scheduled before the
3 minute timeout has elapsed, so that we also have some test coverage of the
timeout logic for error, previously we only had this logic covered for the
CloudInitEnabled state.

Signed-off-by: Ian Johnson <ian.johnson@canonical.com>
  • Loading branch information
anonymouse64 authored and mvo5 committed Jul 16, 2020
1 parent 620ee3e commit 0fff96f
Show file tree
Hide file tree
Showing 6 changed files with 1,147 additions and 0 deletions.
151 changes: 151 additions & 0 deletions overlord/devicestate/devicemgr.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,9 +44,15 @@ import (
"github.com/snapcore/snapd/overlord/storecontext"
"github.com/snapcore/snapd/release"
"github.com/snapcore/snapd/snapdenv"
"github.com/snapcore/snapd/sysconfig"
"github.com/snapcore/snapd/timings"
)

var (
cloudInitStatus = sysconfig.CloudInitStatus
restrictCloudInit = sysconfig.RestrictCloudInit
)

// DeviceManager is responsible for managing the device identity and device
// policies.
type DeviceManager struct {
Expand All @@ -65,6 +71,10 @@ type DeviceManager struct {

ensureInstalledRan bool

cloudInitAlreadyRestricted bool
cloudInitErrorAttemptStart *time.Time
cloudInitEnabledInactiveAttemptStart *time.Time

lastBecomeOperationalAttempt time.Time
becomeOperationalBackoff time.Duration
registered bool
Expand Down Expand Up @@ -528,6 +538,143 @@ func (m *DeviceManager) ensureBootOk() error {
return nil
}

func (m *DeviceManager) ensureCloudInitRestricted() error {
m.state.Lock()
defer m.state.Unlock()

if m.cloudInitAlreadyRestricted {
return nil
}

var seeded bool
err := m.state.Get("seeded", &seeded)
if err != nil && err != state.ErrNoState {
return err
}

// On Ubuntu Core devices that have been seeded, we want to restrict
// cloud-init so that its more dangerous (for an IoT device at least)
// features are not exploitable after a device has been seeded. This allows
// device administrators and other tools (such as multipass) to still
// configure an Ubuntu Core device on first boot, and also allows cloud
// vendors to run cloud-init with only a specific data-source on subsequent
// boots but disallows arbitrary cloud-init {user,meta,vendor}-data to be
// attached to a device via a USB drive and inject code onto the device.

// TODO: expand the scope to run on any device with a gadget, so i.e.
// classic devices connected to a brand store also have this run?
if seeded && !release.OnClassic {
opts := &sysconfig.CloudInitRestrictOptions{}

// check the current state of cloud-init, if it is disabled or already
// restricted then we have nothing to do
cloudInitStatus, err := cloudInitStatus()
if err != nil {
return err
}
statusMsg := ""

switch cloudInitStatus {
case sysconfig.CloudInitDisabledPermanently, sysconfig.CloudInitRestrictedBySnapd:
// already been permanently disabled, nothing to do
m.cloudInitAlreadyRestricted = true
return nil
case sysconfig.CloudInitUntriggered:
// hasn't been used
statusMsg = "reported to be in disabled state"
case sysconfig.CloudInitDone:
// is done being used
statusMsg = "reported to be done"
case sysconfig.CloudInitErrored:
// cloud-init errored, so we give the device admin / developer a few
// minutes to reboot the machine to re-run cloud-init and try again,
// otherwise we will disable cloud-init permanently
// initialize
if m.cloudInitErrorAttemptStart == nil {
// save the time we started the attempt to restrict
now := timeNow()
m.cloudInitErrorAttemptStart = &now
logger.Noticef("System initialized, cloud-init reported to be in error state, will disable in 3 minutes")
}

// check if 3 minutes have elapsed since we first saw cloud-init in
// error state
timeSinceFirstAttempt := timeNow().Sub(*m.cloudInitErrorAttemptStart)
if timeSinceFirstAttempt <= 3*time.Minute {
// we need to keep waiting for cloud-init, up to 3 minutes
nextCheck := 3*time.Minute - timeSinceFirstAttempt
m.state.EnsureBefore(nextCheck)
return nil
}
// otherwise, we timed out waiting for cloud-init to be fixed or
// rebooted and should restrict cloud-init
// we will restrict cloud-init below, but we need to force the
// disable, as by default RestrictCloudInit will error on state
// CloudInitErrored
opts.ForceDisable = true
statusMsg = "reported to be in error state after 3 minutes"
default:
// in unknown states we are conservative and let the device run for
// a while to see if it transitions to a known state, but eventually
// will disable anyways
fallthrough
case sysconfig.CloudInitEnabled:
// we will give cloud-init up to 5 minutes to try and run, if it
// still has not transitioned to some other known state, then we
// will give up waiting for it and disable it anyways

// initialize the first time we saw cloud-init in enabled state
if m.cloudInitEnabledInactiveAttemptStart == nil {
// save the time we started the attempt to restrict
now := timeNow()
m.cloudInitEnabledInactiveAttemptStart = &now
}

// keep re-scheduling again in 10 seconds until we hit 5 minutes
timeSinceFirstAttempt := timeNow().Sub(*m.cloudInitEnabledInactiveAttemptStart)
if timeSinceFirstAttempt <= 5*time.Minute {
// TODO: should we log a message here about waiting for cloud-init
// to be in a "known state"?
m.state.EnsureBefore(10 * time.Second)
return nil
}

// otherwise, we gave cloud-init 5 minutes to run, if it's still not
// done disable it anyways
// note we we need to force the disable, as by default
// RestrictCloudInit will error on state CloudInitEnabled
opts.ForceDisable = true
statusMsg = "failed to transition to done or error state after 5 minutes"
}

// now restrict/disable cloud-init
res, err := restrictCloudInit(cloudInitStatus, opts)
if err != nil {
return err
}

// log a message about what we did
actionMsg := ""
switch res.Action {
case "disable":
actionMsg = "disabled permanently"
case "restrict":
// log different messages depending on what datasource was used
if res.Datasource == "NoCloud" {
actionMsg = "set datasource_list to [ NoCloud ] and disabled auto-import by filesystem label"
} else {
// all other datasources just log that we limited it to that datasource
actionMsg = fmt.Sprintf("set datasource_list to [ %s ]", res.Datasource)
}
}
logger.Noticef("System initialized, cloud-init %s, %s", statusMsg, actionMsg)

m.cloudInitAlreadyRestricted = true
}

return nil
}

func (m *DeviceManager) ensureInstalled() error {
m.state.Lock()
defer m.state.Unlock()
Expand Down Expand Up @@ -678,6 +825,10 @@ func (m *DeviceManager) Ensure() error {
}

if !m.preseed {
if err := m.ensureCloudInitRestricted(); err != nil {
errs = append(errs, err)
}

if err := m.ensureOperational(); err != nil {
errs = append(errs, err)
}
Expand Down

0 comments on commit 0fff96f

Please sign in to comment.